Hamilton-Jacobi-Bellman Equation

Let us assume we are trying to minimize the total cost of a continuous-time system's trajectory $x(t)$, $t \in [0, t_f]$,

$$ J\big(x(0), 0\big) = \int_0^{t_f} \mathcal{L}\big(x(\tau), u(\tau)\big)\,\mathrm{d}\tau + Q\big(x(t_f)\big), $$

with dynamics in the form $\dot{x} = f(x, u)$, starting from the state $x(0) = x_0$.

The concept of the total cost can be generalized, for any $t \in [0, t_f]$, to a cost-to-go

$$ J\big(x(t), t\big) = \int_t^{t_f} \mathcal{L}\big(x(\tau), u(\tau)\big)\,\mathrm{d}\tau + Q\big(x(t_f)\big), $$

for which we may define a value function

$$ V\big(x(t), t\big) = \min_{u(\cdot)} J\big(x(t), t\big) $$

and an optimal control policy $u^*(\tau)$, the application of which results in the system following an optimal trajectory $x^*(\tau)$, $\tau \in [t, t_f]$.

For the previously defined value function, the Hamilton-Jacobi-Bellman (HJB) equation can be derived¹ as

$$ -\frac{\partial V}{\partial t} = \min_{u} \left[ \left(\frac{\partial V}{\partial x}\right)^{\top} f(x, u) + \mathcal{L}(x, u) \right]. $$
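As a quick sanity check of the equation above (an illustration, not part of the original text), consider a scalar linear-quadratic problem with dynamics $\dot{x} = ax + bu$ and running cost $\mathcal{L}(x, u) = qx^2 + ru^2$ over an infinite horizon, so that $V$ is stationary and the $\partial V/\partial t$ term drops out. The quadratic ansatz $V(x) = px^2$ then reduces the HJB equation to a scalar algebraic Riccati equation that can be solved in closed form. All constants below are illustrative:

```python
import math

# Illustrative scalar LQR problem: dynamics x' = a*x + b*u,
# running cost L(x, u) = q*x**2 + r*u**2 (constants chosen arbitrarily).
a, b, q, r = -1.0, 2.0, 3.0, 0.5

# The ansatz V(x) = p*x**2 turns the stationary HJB equation
#   0 = min_u [ V'(x) * (a*x + b*u) + q*x**2 + r*u**2 ]
# into the algebraic Riccati equation (b**2/r)*p**2 - 2*a*p - q = 0,
# whose positive root is:
p = r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2

def hjb_bracket(x, u):
    """The expression inside min_u of the stationary HJB equation."""
    return 2 * p * x * (a * x + b * u) + q * x**2 + r * u**2

x = 1.7                      # arbitrary test state
u_star = -(p * b / r) * x    # minimizer of the bracket: 2*p*b*x + 2*r*u = 0

# At the optimal control the bracket vanishes, i.e. V solves the HJB ...
assert abs(hjb_bracket(x, u_star)) < 1e-9
# ... and any perturbed control gives a strictly larger value.
assert hjb_bracket(x, u_star + 0.1) > hjb_bracket(x, u_star)
```

The feedback law $u^* = -(pb/r)\,x$ recovered here is exactly the LQR gain, which is the special case in which the HJB equation admits a closed-form solution.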

  1. An overview of the derivation is presented by Steven Brunton in one of his videos (Brunton2022-HJB), also available here. Note that there is a small mistake, acknowledged by the presenter in the comments, at 9:11 of the video, where “” should be replaced with “”.