Bellman Equation

Let us assume we are trying to minimize the total cost $J = g_N(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)$ of a discrete-time system's trajectory with dynamics in the form $x_{k+1} = f(x_k, u_k)$, starting from the state $x_0$.
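As a concrete illustration, here is a minimal sketch of such a system and its total cost in Python. The double-integrator dynamics and quadratic costs below are our own hypothetical choices, not taken from the text:

```python
import numpy as np

def f(x, u):
    """Discrete-time dynamics x_{k+1} = f(x_k, u_k): a hypothetical 1-D double integrator."""
    A = np.array([[1.0, 0.1], [0.0, 1.0]])  # position and velocity update over one step
    B = np.array([0.005, 0.1])              # effect of the (scalar) control input
    return A @ x + B * u

def g(x, u):
    """Hypothetical quadratic stage cost g(x_k, u_k)."""
    return float(x @ x + 0.1 * u ** 2)

def g_N(x):
    """Hypothetical quadratic terminal cost g_N(x_N)."""
    return float(x @ x)

def total_cost(x0, controls):
    """Total cost J of the trajectory generated from x0 by a control sequence."""
    x, J = np.asarray(x0, dtype=float), 0.0
    for u in controls:
        J += g(x, u)      # accumulate the stage cost
        x = f(x, u)       # advance the state along the trajectory
    return J + g_N(x)     # add the terminal cost at the final state
```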

The concept of the total cost can be generalized for any $k \in \{0, \dots, N\}$ to a cost-to-go

$$J_k(x_k) = g_N(x_N) + \sum_{i=k}^{N-1} g(x_i, u_i),$$

for which we may define a value function

$$V_k(x_k) = \min_{u_k, \dots, u_{N-1}} J_k(x_k)$$

and an optimal control policy $u_k^* = \pi_k^*(x_k)$, the application of which results in the system following an optimal trajectory $x_0^*, x_1^*, \dots, x_N^*$.
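Under the same hypothetical f, g, and g_N as above, the value function and policy can be written directly from their definitions as a brute-force minimization over all remaining control sequences, drawn here from a small finite candidate set (again our own simplification):

```python
from itertools import product

U = (-1.0, 0.0, 1.0)  # hypothetical finite set of admissible controls

def value_bruteforce(x, steps):
    """V_k(x): the minimal cost-to-go over every control sequence of the remaining length.

    Note that total_cost(x, seq) evaluated from an intermediate state x
    is exactly the cost-to-go J_k(x)."""
    return min(total_cost(x, seq) for seq in product(U, repeat=steps))

def policy_bruteforce(x, steps):
    """pi_k*(x): the first control of a cost-minimizing remaining sequence."""
    best = min(product(U, repeat=steps), key=lambda seq: total_cost(x, seq))
    return best[0]
```

This enumerates all |U|^steps control sequences, which is exactly the exponential blow-up the recursive formulation below avoids.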

The so-called Bellman equation can then be derived by formulating the value function for step $k$ recursively (using the value function for step $k+1$) as

$$V_k(x_k) = \min_{u_k} \left[ g(x_k, u_k) + V_{k+1}(x_{k+1}) \right],$$

with the boundary condition $V_N(x_N) = g_N(x_N)$, and substituting for $x_{k+1}$ using the system's dynamics to attain the final form

$$V_k(x_k) = \min_{u_k} \left[ g(x_k, u_k) + V_{k+1}\bigl(f(x_k, u_k)\bigr) \right].$$
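As a minimal sketch of how the recursion is used in practice, assuming the hypothetical f, g, g_N, and control set U from above plus a coarse nearest-grid-point state discretization (our own choice, a crude stand-in for proper interpolation), the value function can be tabulated backward from the boundary condition:

```python
def bellman_backward(grid, N):
    """Tabulate V_k on a finite list of grid states via the backward Bellman recursion.

    Returns V with V[k][i] ~ V_k(grid[i]) and the greedy policy pi[k][i]."""
    def nearest(x):
        # index of the grid state closest to x (stand-in for interpolation)
        return min(range(len(grid)), key=lambda i: float(np.sum((grid[i] - x) ** 2)))

    V = [[0.0] * len(grid) for _ in range(N + 1)]
    pi = [[0.0] * len(grid) for _ in range(N)]
    for i, x in enumerate(grid):
        V[N][i] = g_N(x)                     # boundary condition V_N(x_N) = g_N(x_N)
    for k in range(N - 1, -1, -1):           # sweep backward in time
        for i, x in enumerate(grid):
            best_u, best_q = None, float("inf")
            for u in U:                      # minimize over the control at step k
                q = g(x, u) + V[k + 1][nearest(f(x, u))]   # g + V_{k+1}(f(x, u))
                if q < best_q:
                    best_q, best_u = q, u
            V[k][i], pi[k][i] = best_q, best_u
    return V, pi
```

For a grid of M states this costs on the order of N · M · |U| evaluations of f and g, in contrast to the |U|^N sequences enumerated by the brute-force definition above; this reduction is the practical payoff of the recursive formulation.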