According to [Bellman1966], "an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." Applied recursively, this means that the decision taken at every state along an optimal trajectory must itself be optimal with respect to the future behavior of the system. Although the statement may sound rather vague, Bellman's principle is the foundation of optimal control, and it applies to both continuous-time and discrete-time systems and to trajectories analyzed on finite as well as infinite time horizons.

In the following subsections we give an overview of the Bellman equation and the Hamilton-Jacobi-Bellman (HJB) equation, which follow from Bellman's principle when it is applied to discrete-time and continuous-time optimal control problems, respectively. Regardless of the representation, the goal is to find a sequence of control inputs that minimizes the total cost of a trajectory defined on a horizon. Instead of searching for the entire sequence at once, Bellman's principle allows us to break the problem down into a sequence of smaller problems, described by the aforementioned equations. For their derivation we introduce the concept of the cost-to-go, i.e. the remainder of the total cost from any point on the horizon to its end, assuming a particular sequence of inputs. For the optimal sequence of inputs we call the corresponding cost-to-go the value function. The value function appears on the left-hand side of both equations, where it is expressed recursively.
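As a preview of the discrete-time case, consider the following sketch; the notation is assumed here for illustration and may differ from the one used in the derivations below: $x_k$ denotes the state, $u_k$ the control input, $f$ the dynamics $x_{k+1} = f(x_k, u_k)$, $\ell$ the stage cost, and $\ell_N$ the terminal cost. The value function $V_k$ then satisfies the recursion
\[
V_N(x_N) = \ell_N(x_N), \qquad
V_k(x_k) = \min_{u_k} \Big[ \ell(x_k, u_k) + V_{k+1}\big(f(x_k, u_k)\big) \Big],
\quad k = N-1, \ldots, 0,
\]
which is solved backward from the end of the horizon. At each step the minimizing $u_k$ is the optimal input as a function of the current state, so the single search over a whole input sequence is replaced by the sequence of smaller, per-step problems mentioned above.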

This is a high-level overview which, without additional context, might be hard to follow. The following subsections provide that context in the form of rigorous derivations and, later, applications. The derivations in particular are sparse on text, so feel free to jump back here at any point.


To emphasize the relevance of Bellman's principle in present-day research, the differential dynamic programming algorithm deserves a mention. While the original algorithm was introduced in [Mayne1966], its extensions remain a vibrant area of contemporary research.