Nonlinear Trajectory Optimization Problems

Unconstrained Problem

The unconstrained nonlinear trajectory optimization problem is most commonly solved in the discrete-time form

$$\begin{aligned}
\min_{x_{0:N},\,u_{0:N-1}} \quad & \Phi(x_N) + \sum_{k=0}^{N-1} l(x_k, u_k, k) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k, k), \quad k \in \{0, \dots, N-1\}, \\
& x_0 = \hat{x}_0.
\end{aligned}$$

This form is certainly more typical than its continuous-time counterpart. We will cover direct transcription and iterative LQR (iLQR) / differential dynamic programming (DDP) as means of solving this problem.
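As a concrete reading of this problem, the following minimal sketch evaluates the objective for a given control sequence by rolling the dynamics forward from $\hat{x}_0$, so the equality constraints hold by construction. It assumes a scalar state and input for brevity; the function name `rollout_cost` and the dynamics and costs in the usage lines are hypothetical placeholders.

```python
from typing import Callable, List, Tuple


def rollout_cost(
    x0: float,
    us: List[float],
    f: Callable[[float, float, int], float],
    l: Callable[[float, float, int], float],
    phi: Callable[[float], float],
) -> Tuple[List[float], float]:
    """Simulate x_{k+1} = f(x_k, u_k, k) from x_0 and return (x_{0:N}, total cost)."""
    xs = [x0]
    total = 0.0
    for k, u in enumerate(us):
        total += l(xs[k], u, k)      # running cost l(x_k, u_k, k)
        xs.append(f(xs[k], u, k))    # dynamics x_{k+1} = f(x_k, u_k, k)
    total += phi(xs[-1])             # terminal cost Phi(x_N)
    return xs, total


# Usage with a hypothetical integrator x_{k+1} = x_k + 0.1 u_k and quadratic costs
xs, J = rollout_cost(
    1.0, [0.0, 0.0],
    f=lambda x, u, k: x + 0.1 * u,
    l=lambda x, u, k: 0.5 * (x * x + u * u),
    phi=lambda x: 0.5 * x * x,
)
print(xs, J)  # [1.0, 1.0, 1.0] 1.5
```

Treating only the controls as unknowns like this is what the shooting-style methods below build on, whereas direct transcription keeps the states as decision variables too.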

Direct Transcription

Direct transcription simply treats the problem as any other nonlinear program, typically solved using sequential quadratic programming (SQP). In SQP we start by initializing the algorithm with a nominal trajectory $\bar{x}_{0:N}$, $\bar{u}_{0:N-1}$ (important if the problem is non-convex), around which we form a QP that locally approximates the original problem:

$$\begin{aligned}
\min_{\delta x_{0:N},\,\delta u_{0:N-1}} \quad & \Phi_{x_N}^\top \delta x_N + \tfrac{1}{2}\delta x_N^\top \Phi_{xx_N} \delta x_N + \sum_{k=0}^{N-1} \left( \begin{bmatrix} l_{x_k} \\ l_{u_k} \end{bmatrix}^\top \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix} + \tfrac{1}{2} \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix}^\top \begin{bmatrix} l_{xx_k} & l_{xu_k} \\ l_{ux_k} & l_{uu_k} \end{bmatrix} \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix} \right) \\
\text{s.t.} \quad & \delta x_{k+1} = f_{x_k} \delta x_k + f_{u_k} \delta u_k + f_k - \bar{x}_{k+1}, \quad \forall k \in \{0, \dots, N-1\}, \\
& \delta x_0 = \hat{x}_0 - \bar{x}_0,
\end{aligned}$$

where $f_k = f(\bar{x}_k, \bar{u}_k, k)$. Its terms are obtained using first-order Taylor expansions of the system's dynamics and second-order expansions of the cost functions, evaluated at $\bar{x}_{0:N}$, $\bar{u}_{0:N-1}$. The solution $\delta x_{0:N}$, $\delta u_{0:N-1}$ represents a search direction that is already scaled based on the local curvature of the problem. Direct transcription faces the challenges common to nonlinear programming (NLP) in general:

The quadratic approximation is non-convex
Taking the full step does not improve the objective function
Constraints are not satisfied after updating the trajectory

As a resource on suitable strategies for handling these issues, I generally point to Nocedal2006.
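The second issue is commonly addressed with a backtracking line search, one of the globalization strategies discussed in that reference. The sketch below shrinks the step length $\alpha$ until the Armijo sufficient-decrease condition holds. It assumes the trajectory is flattened into a list `z` and uses the objective $J$ directly; in constrained SQP one would instead apply it to a merit function that also penalizes constraint violation. The function and parameter names are illustrative.

```python
from typing import Callable, List


def backtracking_step(
    J: Callable[[List[float]], float],
    z: List[float],
    dz: List[float],
    slope: float,
    c1: float = 1e-4,
    beta: float = 0.5,
    max_iter: int = 30,
) -> float:
    """Return alpha with J(z + alpha*dz) <= J(z) + c1*alpha*slope (Armijo condition).

    `slope` is the directional derivative grad J(z)^T dz and must be negative.
    Returns 0.0 if no acceptable step is found within max_iter halvings.
    """
    J0 = J(z)
    alpha = 1.0
    for _ in range(max_iter):
        trial = [zi + alpha * dzi for zi, dzi in zip(z, dz)]
        if J(trial) <= J0 + c1 * alpha * slope:
            return alpha
        alpha *= beta  # full step rejected: shrink it
    return 0.0


# Usage: J(z) = z^2 at z = 1 with an overly long direction dz = -4 (slope = 2*1*(-4))
alpha = backtracking_step(lambda z: z[0] ** 2, [1.0], [-4.0], slope=-8.0)
print(alpha)  # 0.25
```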

iLQR & DDP

An entirely different philosophy is followed in the closely linked iLQR and DDP algorithms. Both algorithms have a two-pass structure. First, in the backward pass, quadratic approximations of the Bellman equation

$$V(x_k, k) = \min_{u_k} \big( l(x_k, u_k, k) + V(f(x_k, u_k, k), k+1) \big)$$

are solved at each timestep, moving backward along the horizon starting from $V(x_N, N) = \Phi(x_N)$. This not only produces a local approximation of the value function but also, as a by-product, a policy update $\delta u_k(\delta x_k) = d_k + K_k \delta x_k$. In the forward pass, the updated control policy is applied while simulating the system's behavior forward in time:

$$\begin{aligned}
x_0 &= \hat{x}_0, \\
u_k &= \bar{u}_k + d_k + K_k (x_k - \bar{x}_k), \\
x_{k+1} &= f(x_k, u_k, k).
\end{aligned}$$
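A minimal sketch of this forward pass, assuming a scalar state and input to keep it dependency-free. The optional step size `alpha` scaling the feedforward term $d_k$ is a common addition in iLQR implementations but is not part of the equations above; `alpha = 1` reproduces them exactly. The function name and usage dynamics are hypothetical.

```python
from typing import Callable, List, Tuple


def forward_pass(
    x0: float,
    x_bar: List[float],
    u_bar: List[float],
    d: List[float],
    K: List[float],
    f: Callable[[float, float, int], float],
    alpha: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Apply u_k = u_bar_k + alpha*d_k + K_k (x_k - x_bar_k) while simulating f."""
    xs = [x0]
    us: List[float] = []
    for k in range(len(u_bar)):
        u = u_bar[k] + alpha * d[k] + K[k] * (xs[k] - x_bar[k])  # updated policy
        us.append(u)
        xs.append(f(xs[k], u, k))  # simulate with the new input, not the nominal one
    return xs, us


# Usage with hypothetical dynamics x_{k+1} = x_k + u_k
xs, us = forward_pass(0.0, [0.0, 1.0, 2.0], [1.0, 1.0], [0.5, 0.5], [-1.0, -1.0],
                      f=lambda x, u, k: x + u)
print(xs, us)  # [0.0, 1.5, 2.5] [1.5, 1.0]
```

The feedback term is what distinguishes this from simply adding $d_k$ to the nominal inputs: deviations of the new trajectory from $\bar{x}_k$ are corrected by $K_k$ as they arise.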

Backward Pass

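At each timestep, the backward pass minimizes a quadratic model of the right-hand side of the Bellman equation:

$$V_k + V_{x_k}^\top \delta x_k + \tfrac{1}{2}\delta x_k^\top V_{xx_k} \delta x_k = \min_{\delta u_k} \left( Q_k + \begin{bmatrix} Q_{x_k} \\ Q_{u_k} \end{bmatrix}^\top \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix} + \tfrac{1}{2} \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix}^\top \begin{bmatrix} Q_{xx_k} & Q_{xu_k} \\ Q_{ux_k} & Q_{uu_k} \end{bmatrix} \begin{bmatrix} \delta x_k \\ \delta u_k \end{bmatrix} \right)$$

where $Q_{xu_k} = Q_{ux_k}^\top$ and

$$\begin{aligned}
Q_{x_k} &= l_{x_k} + f_{x_k}^\top V_{x_{k+1}}, \\
Q_{u_k} &= l_{u_k} + f_{u_k}^\top V_{x_{k+1}}, \\
Q_{xx_k} &= l_{xx_k} + f_{x_k}^\top V_{xx_{k+1}} f_{x_k} + V_{x_{k+1}} \cdot f_{xx_k}, \\
Q_{uu_k} &= l_{uu_k} + f_{u_k}^\top V_{xx_{k+1}} f_{u_k} + V_{x_{k+1}} \cdot f_{uu_k}, \\
Q_{ux_k} &= l_{ux_k} + f_{u_k}^\top V_{xx_{k+1}} f_{x_k} + V_{x_{k+1}} \cdot f_{ux_k},
\end{aligned}$$

with the minimizer

$$\delta u_k(\delta x_k) = d_k + K_k \delta x_k, \qquad d_k = -Q_{uu_k}^{-1} Q_{u_k}, \qquad K_k = -Q_{uu_k}^{-1} Q_{ux_k}.$$

Substituting the minimizer back yields $V_{x_k} = Q_{x_k} - Q_{xu_k} Q_{uu_k}^{-1} Q_{u_k}$ and $V_{xx_k} = Q_{xx_k} - Q_{xu_k} Q_{uu_k}^{-1} Q_{ux_k}$, which closes the recursion. The tensor terms $V_{x_{k+1}} \cdot f_{xx_k}$, $V_{x_{k+1}} \cdot f_{uu_k}$ and $V_{x_{k+1}} \cdot f_{ux_k}$ are the second-order dynamics terms retained by DDP; iLQR drops them.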
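To tie the two passes together, here is a minimal iLQR sketch for a hypothetical scalar system $x_{k+1} = x_k + h(\sin x_k + u_k)$ with running cost $\tfrac{1}{2}(x_k^2 + 0.1 u_k^2)$ and terminal cost $5 x_N^2$, all chosen purely for illustration. The backward pass evaluates the $Q$ terms above and propagates the value function via the scalar form $V_{x_k} = Q_{x_k} + K_k Q_{u_k}$, $V_{xx_k} = Q_{xx_k} + K_k Q_{ux_k}$; passing `ddp=True` keeps the second-order dynamics term $V_{x_{k+1}} f_{xx_k}$. Halving the feedforward step until the cost decreases is an added safeguard assumed here, not part of the text, and no regularization of $Q_{uu}$ is included (in the default iLQR variant $Q_{uu_k} > 0$ holds for this example).

```python
import math
from typing import List, Tuple

H = 0.1  # step size of the hypothetical system


def f(x: float, u: float) -> float:
    return x + H * (math.sin(x) + u)


def cost(xs: List[float], us: List[float]) -> float:
    running = sum(0.5 * (x * x + 0.1 * u * u) for x, u in zip(xs, us))
    return running + 5.0 * xs[-1] ** 2  # terminal cost Phi(x_N) = 5 x_N^2


def rollout(x0: float, us: List[float]) -> List[float]:
    xs = [x0]
    for u in us:
        xs.append(f(xs[-1], u))
    return xs


def backward_pass(xs: List[float], us: List[float],
                  ddp: bool = False) -> Tuple[List[float], List[float]]:
    N = len(us)
    Vx, Vxx = 10.0 * xs[-1], 10.0  # Phi_x and Phi_xx at x_N
    d, K = [0.0] * N, [0.0] * N
    for k in reversed(range(N)):
        x, u = xs[k], us[k]
        lx, lu, lxx, luu, lux = x, 0.1 * u, 1.0, 0.1, 0.0
        fx, fu, fxx = 1.0 + H * math.cos(x), H, -H * math.sin(x)  # f_uu = f_ux = 0
        Qx = lx + fx * Vx
        Qu = lu + fu * Vx
        Qxx = lxx + fx * Vxx * fx + (Vx * fxx if ddp else 0.0)
        Quu = luu + fu * Vxx * fu
        Qux = lux + fu * Vxx * fx
        d[k], K[k] = -Qu / Quu, -Qux / Quu
        Vx, Vxx = Qx + K[k] * Qu, Qxx + K[k] * Qux  # value function at step k
    return d, K


def ilqr(x0: float, us: List[float],
         iters: int = 50) -> Tuple[List[float], List[float], float]:
    xs = rollout(x0, us)
    J = cost(xs, us)
    for _ in range(iters):
        d, K = backward_pass(xs, us)
        alpha = 1.0
        while alpha > 1e-4:
            new_xs, new_us = [x0], []
            for k in range(len(us)):
                u = us[k] + alpha * d[k] + K[k] * (new_xs[k] - xs[k])
                new_us.append(u)
                new_xs.append(f(new_xs[k], u))
            new_J = cost(new_xs, new_us)
            if new_J < J:
                break
            alpha *= 0.5
        else:
            break  # no decrease found: treat as converged
        xs, us, J = new_xs, new_us, new_J
    return xs, us, J


x0, us0 = 1.0, [0.0] * 20
J0 = cost(rollout(x0, us0), us0)
xs, us, J = ilqr(x0, us0)
print(J < J0)  # True
```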

Unconstrained Continuous-Time Problem

Let us now tackle the continuous-time problem

$$\begin{aligned}
\min_{x(t),\,u(t)} \quad & \Phi(x(T)) + \int_{0}^{T} l(x(t), u(t), t)\,dt \\
\text{s.t.} \quad & \dot{x}(t) = f(x(t), u(t), t), \quad t \in [0, T], \\
& x(0) = \hat{x}_0.
\end{aligned}$$

As optimizing directly over functions of continuous time is rather impractical, we generally need to discretize the problem at some point. This is often done even before stating the trajectory optimization problem. For instance, we may discretize $\dot{x} = f(x, u, t)$ with step size $h$ using a Runge-Kutta method (with a zero-order hold on the input $u$):

$$x_{n+1} = x_n + h \sum_{i=1}^{s} b_i k_{i,n}, \qquad k_{i,n} = f\Big(x_n + h \sum_{j=1}^{s} a_{ij} k_{j,n},\; u_n,\; t_n + c_i h\Big),$$

resulting in standard discrete-time dynamics. This is typically done using explicit schemes, whose Butcher tableau is strictly lower triangular, so that each stage $k_{i,n}$ depends only on the previous stages:

$$\begin{array}{c|cccc}
0 & & & & \\
c_2 & a_{21} & & & \\
\vdots & \vdots & \ddots & & \\
c_s & a_{s1} & \cdots & a_{s,s-1} & \\
\hline
 & b_1 & \cdots & b_{s-1} & b_s
\end{array}$$

Then, approximating the integral with a sum, we obtain the discrete-time trajectory optimization problem, which can be solved using direct transcription or iLQR/DDP.
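A minimal sketch of one explicit RK step driven by a Butcher tableau `(A, b, c)`, assuming a scalar state. Following the text, the input `u` is held constant over the step (zero-order hold) and the dynamics are passed as `f(x, u, t)`. The classical fourth-order tableau in the usage lines is one standard choice; the function name is illustrative.

```python
from typing import Callable, List, Sequence


def explicit_rk_step(
    f: Callable[[float, float, float], float],
    x: float, u: float, t: float, h: float,
    A: Sequence[Sequence[float]], b: Sequence[float], c: Sequence[float],
) -> float:
    """One step x_n -> x_{n+1} of an explicit RK scheme with zero-order-hold input u."""
    k: List[float] = []
    for i in range(len(b)):
        # Explicit scheme: stage i only uses the already computed stages j < i.
        xi = x + h * sum(A[i][j] * k[j] for j in range(i))
        k.append(f(xi, u, t + c[i] * h))
    return x + h * sum(bi * ki for bi, ki in zip(b, k))


# Classical RK4 tableau
A_RK4 = [[0.0, 0.0, 0.0, 0.0],
         [0.5, 0.0, 0.0, 0.0],
         [0.0, 0.5, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
b_RK4 = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
c_RK4 = [0.0, 0.5, 0.5, 1.0]

x1 = explicit_rk_step(lambda x, u, t: -x + u, 1.0, 0.0, 0.0, 0.1, A_RK4, b_RK4, c_RK4)
print(x1)  # ≈ 0.9048375, close to exp(-0.1)
```

For linear dynamics $\dot{x} = \lambda x$, RK4 reproduces the Taylor polynomial $1 + z + z^2/2 + z^3/6 + z^4/24$ of $e^{z}$ with $z = h\lambda$, which is a quick sanity check for a tableau.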

Alternatively, we can aim for a higher degree of accuracy (and improved stability) by using implicit RK schemes, whose tableau is in general full:

$$\begin{array}{c|cccc}
c_1 & a_{11} & a_{12} & \cdots & a_{1s} \\
c_2 & a_{21} & a_{22} & \cdots & a_{2s} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
c_s & a_{s1} & a_{s2} & \cdots & a_{ss} \\
\hline
 & b_1 & b_2 & \cdots & b_s
\end{array}$$

In standard forward simulation, this comes with the disadvantage of having to solve a set of nonlinear equations for the stages $k_{i,n}$ at each step. However, as we are already solving a nonlinear optimization problem, we may incorporate these equations directly into its constraints.
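To illustrate the extra work an implicit scheme needs in plain forward simulation, here is a sketch of the implicit midpoint rule (the one-stage Gauss-Legendre method with $a_{11} = 1/2$, $b_1 = 1$, $c_1 = 1/2$) for a scalar state. The stage equation $k = f(x_n + \tfrac{h}{2} k, u_n, t_n + \tfrac{h}{2})$ is solved with Newton's method. The user-supplied derivative `dfdx` and the function name are assumptions of this sketch.

```python
from typing import Callable


def implicit_midpoint_step(
    f: Callable[[float, float, float], float],
    dfdx: Callable[[float, float, float], float],
    x: float, u: float, t: float, h: float,
    tol: float = 1e-12, max_iter: int = 50,
) -> float:
    """One implicit midpoint step; the stage equation is solved by Newton iteration."""
    k = f(x, u, t)  # initial guess for the stage derivative
    for _ in range(max_iter):
        xi = x + 0.5 * h * k
        r = k - f(xi, u, t + 0.5 * h)                # residual of the stage equation
        dr = 1.0 - 0.5 * h * dfdx(xi, u, t + 0.5 * h)  # derivative of the residual in k
        step = r / dr
        k -= step
        if abs(step) < tol:
            break
    return x + h * k


# Usage: x' = -2x, exact implicit-midpoint update is x * (1 - 0.1) / (1 + 0.1) for h = 0.1
x1 = implicit_midpoint_step(lambda x, u, t: -2.0 * x, lambda x, u, t: -2.0,
                            1.0, 0.0, 0.0, 0.1)
print(x1)  # ≈ 0.8181818
```

In direct collocation, this inner Newton loop disappears: the stage value becomes a decision variable and the stage equation becomes a constraint.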

Direct Collocation

This is exactly what direct collocation methods exploit. Here we typically reserve the symbol $k$ for timesteps and use $\dot{x}_{i,k}$ and $x_{i,k}$ in place of $k_{i,n}$ and $x_k + h \sum_{j=1}^{s} a_{ij} \dot{x}_{j,k}$. This is in line with the idea, behind the formulation of many implicit RK methods, of fitting a polynomial that approximates $x(t)$ on each interval. Additionally, the control inputs $u(t)$, $t \in [t_k, t_{k+1}]$, take the form of polynomials

$$\Pi(\tau; u_{1:r,k}) = \sum_{j=1}^{r} \sigma_j(\tau)\, u_{j,k}, \qquad \tau \in [0, 1],$$

where $\sigma_j(\tau)$ are polynomial basis functions. In most cases, however, the running cost is for convenience evaluated only at the end points of the timesteps. The overall problem can then be stated as

$$\begin{aligned}
\min_{\substack{x_{0:N},\; u_{1:r,0:N-1}, \\ x_{1:s,0:N-1},\; \dot{x}_{1:s,0:N-1}}} \quad & \Phi(x_N) + \sum_{k=0}^{N-1} l(x_k, u_k, k) \\
\text{s.t.} \quad & x_{k+1} = x_k + h \sum_{i=1}^{s} b_i \dot{x}_{i,k}, && k \in \{0, \dots, N-1\}, \\
& \dot{x}_{i,k} = f\big(x_{i,k}, \Pi(c_i; u_{1:r,k}), t_k + c_i h\big), && (i,k) \in \{1, \dots, s\} \times \{0, \dots, N-1\}, \\
& x_{i,k} = x_k + h \sum_{j=1}^{s} a_{ij} \dot{x}_{j,k}, && (i,k) \in \{1, \dots, s\} \times \{0, \dots, N-1\}, \\
& x_0 = \hat{x}_0,
\end{aligned}$$

where $u_k = \Pi(0; u_{1:r,k})$ is the input at $t_k$. This problem is then again typically solved using SQP, similarly to direct transcription.
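The constraints of a single collocation interval can be written as a residual ("defect") vector that the SQP solver drives to zero. This sketch assumes a scalar state and a hypothetical piecewise-linear input parameterization ($r = 2$, $\sigma_1(\tau) = 1 - \tau$, $\sigma_2(\tau) = \tau$); the function names are illustrative. Unlike forward simulation, no nonlinear equations are solved here: the stage values are decision variables and only the residuals are evaluated.

```python
from typing import Callable, List, Sequence


def control_poly(tau: float, u_coeffs: Sequence[float]) -> float:
    """Pi(tau; u_{1:2,k}) with the hypothetical basis sigma_1 = 1 - tau, sigma_2 = tau."""
    return (1.0 - tau) * u_coeffs[0] + tau * u_coeffs[1]


def collocation_defects(
    f: Callable[[float, float, float], float],
    t_k: float, h: float,
    x_k: float, x_next: float,
    x_stage: Sequence[float], xdot_stage: Sequence[float],
    u_coeffs: Sequence[float],
    A: Sequence[Sequence[float]], b: Sequence[float], c: Sequence[float],
) -> List[float]:
    """Residuals of the collocation constraints on [t_k, t_k + h]; all zero when feasible."""
    s = len(b)
    res: List[float] = []
    for i in range(s):
        u_i = control_poly(c[i], u_coeffs)
        # xdot_{i,k} = f(x_{i,k}, Pi(c_i; u_{1:r,k}), t_k + c_i h)
        res.append(xdot_stage[i] - f(x_stage[i], u_i, t_k + c[i] * h))
        # x_{i,k} = x_k + h * sum_j a_ij xdot_{j,k}
        res.append(x_stage[i] - x_k - h * sum(A[i][j] * xdot_stage[j] for j in range(s)))
    # x_{k+1} = x_k + h * sum_i b_i xdot_{i,k}
    res.append(x_next - x_k - h * sum(b[i] * xdot_stage[i] for i in range(s)))
    return res


# Usage: implicit midpoint tableau, f = -2x + u, zero input, feasible stage values
xd = -2.0 / 1.1
r = collocation_defects(lambda x, u, t: -2.0 * x + u, 0.0, 0.1,
                        1.0, 1.0 + 0.1 * xd, [1.0 + 0.05 * xd], [xd],
                        [0.0, 0.0], [[0.5]], [1.0], [0.5])
print(max(abs(v) for v in r))  # zero up to floating-point rounding
```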

Constrained Problems

Similarly to how we linearized the system's dynamics, it is common to also linearize additional constraints, e.g.

$$g(x_k, u_k, k) \leq 0, \qquad h(x_k, u_k, k) = 0,$$

so that they fit within SQP. However, some types of constraints, such as second-order cone constraints, can be handled more effectively by specialized methods (e.g. ADMM).
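As an illustration of why second-order cone constraints suit ADMM-style methods: such methods typically handle a constraint set through Euclidean projection, and the projection onto the cone $\{(v, s) : \lVert v \rVert_2 \le s\}$ has a closed form. The sketch below shows only this projection, not the surrounding ADMM iteration; the function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple


def project_soc(v: Sequence[float], s: float) -> Tuple[List[float], float]:
    """Euclidean projection of (v, s) onto the second-order cone ||v||_2 <= s."""
    nv = math.sqrt(sum(vi * vi for vi in v))
    if nv <= s:
        return list(v), s               # already inside the cone
    if nv <= -s:
        return [0.0] * len(v), 0.0      # inside the polar cone: project to the apex
    scale = 0.5 * (nv + s) / nv         # otherwise land on the cone boundary
    return [scale * vi for vi in v], scale * nv


print(project_soc([3.0, 4.0], 0.0))  # ([1.5, 2.0], 2.5)
```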

Due to the special structure of the trajectory optimization problem, many specialized solvers capable of handling additional constraints exist. As nonlinear MPC is still extremely computationally demanding, these solvers are often tailored to specific types of constraints and make deliberate choices about how their algorithms converge (e.g. Altro, Crocoddyl, MJPC).


Optimal and Predictive Control