These are notes for tutorials of the subject "Optimal and Predictive Control System" taught at the Faculty of Mechanical Engineering at the Czech Technical University in Prague.
Dating back to the 1950s, optimal control is an ever-evolving field, currently fueled by the incredible amounts of computational power at our disposal. Although in this course we only cover the fundamentals as applied to linear systems, which can be successfully implemented on microcontrollers, cutting-edge approaches push hardware to its limits, allowing for the control of walking robots, autonomous vehicles, etc.
The materials are loosely divided into three blocks. We cover the basics of numerical optimization, which will become relevant for optimal control only at the very end of the course. Regardless, we will attempt to provide examples that demonstrate their utility as we go. Then we will move on to the main content of the course. Starting from the quite abstract concept of the Bellman principle, we cover the topic of the linear-quadratic regulator (LQR), from which we will transition to model predictive control (MPC).
We will apply each control strategy to the system of a bi-rotor, i.e. a quad-rotor simplified into a single vertical plane. As supplementary material we will show how its dynamic model can be derived.
DISCLAIMER: These notes have not gone through a peer review process and are likely to contain (hopefully only minor) mistakes.
The vast majority of numerical optimization problems can be written in the form $\min_x f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$, where $x$ are the optimized variables, $f$ is the objective function, $g(x) \le 0$ are inequality constraints and $h(x) = 0$ are equality constraints. Here we assume that $f$, $g$, and $h$ are sufficiently smooth and that the constraints $g$ and $h$ define a non-empty set of feasible solutions.
Based on the properties of $f$, $g$, and $h$, different approaches can be taken to solve the optimization problem. For example, if the objective function is convex and the constraints are affine, the problem has a unique solution (optimal value of the objective function) that can be found using gradient descent. In this course we will focus on this type of problem, which is classified as convex.
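As a minimal illustration of gradient descent on a convex objective, the sketch below minimizes an unconstrained quadratic; the matrix and vector used here are arbitrary values chosen purely for this example.

```python
import numpy as np

# Minimal sketch: gradient descent on an unconstrained convex quadratic
# f(x) = 0.5 * x^T Q x - b^T x  (Q and b chosen here purely for illustration).
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite -> f is convex
b = np.array([1.0, -1.0])

x = np.zeros(2)              # initial guess
alpha = 0.1                  # fixed step size
for _ in range(200):
    grad = Q @ x - b         # gradient of f
    x -= alpha * grad

print(x, np.linalg.solve(Q, b))  # both should be (nearly) identical
```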
We will start with a general overview of Lagrangian Duality and KKT conditions which are applicable to both convex and non-convex optimization problems. Then we will describe two specific types of convex optimization problems: linear programming (LP) and quadratic programming (QP).
Lagrangian Duality
In the following section we will derive the so-called dual problem of a (primal) optimization problem, following the derivation demonstrated in a series of short lectures which accompany the book [Bierlaire2018]. Possibly the most important property of the dual problem (from a practical point of view) is that it is convex even if the primal optimization problem is not [Boyd2004].
Let us consider a constrained optimization problem in the form $\min_x f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$, to which we will refer as the primal problem. With regard to this problem, let us define the Lagrangian $L(x, \lambda, \mu) = f(x) + \lambda^T g(x) + \mu^T h(x)$, where $\lambda$ and $\mu$ are Lagrange multipliers, and the dual function $q(\lambda, \mu) = \min_x L(x, \lambda, \mu)$.
Theorem
Let $x^*$ be the solution to the primal problem. If $\lambda \ge 0$, then $q(\lambda, \mu) \le f(x^*)$.
Proof
$q(\lambda, \mu) = \min_x L(x, \lambda, \mu) \le L(x^*, \lambda, \mu)$
(For any $\lambda$ and $\mu$, we can find an $x$ that minimizes the Lagrangian as well as or better than $x^*$, e.g. by violating the constraints.)
and
$L(x^*, \lambda, \mu) = f(x^*) + \lambda^T g(x^*) + \mu^T h(x^*) \le f(x^*)$
(Because if $g(x^*) \le 0$, $h(x^*) = 0$, and $\lambda \ge 0$, then $\lambda^T g(x^*) \le 0$ and $\mu^T h(x^*) = 0$.)
Corollary
Let be the solution to the primal problem and feasible w.r.t its constraints. If and , , then
Dual problem
As was proven, $q(\lambda, \mu)$ provides a lower bound on the solution of the primal problem when $\lambda \ge 0$. The dual problem can then be regarded as maximizing this lower bound by penalizing violations of the primal problem's constraints using the Lagrange multipliers: $\max_{\lambda, \mu} q(\lambda, \mu)$ subject to $\lambda \ge 0$ and $q(\lambda, \mu) > -\infty$.
The second condition states that we consider only $\lambda$ and $\mu$ for which the dual function is bounded. This is necessary for the problem to be well posed.
(Weak duality) theorem
If $x^*$ is the solution to the primal problem and $(\lambda^*, \mu^*)$ is the solution to the dual problem, then $q(\lambda^*, \mu^*) \le f(x^*)$.
Corollary
If one problem is unbounded the other is infeasible.
Corollary
If a feasible $x^*$ and multipliers $\lambda^* \ge 0$, $\mu^*$ exist such that $q(\lambda^*, \mu^*) = f(x^*)$, then they are optimal.
Karush-Kuhn-Tucker Conditions
The Karush-Kuhn-Tucker (KKT) conditions are first-order necessary conditions for finding the solution of an optimization problem in the form $\min_x f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$. Furthermore, they are also sufficient conditions for convex problems, i.e. those where $f$ and $g$ are convex and $h$ is affine (linear) [Boyd2004].
A solution satisfying these conditions gives us not only $x^*$ but also the Lagrange multipliers $\lambda^*$ and $\mu^*$ present in the Lagrangian $L(x, \lambda, \mu) = f(x) + \lambda^T g(x) + \mu^T h(x)$.
The conditions are as follows:
- stationarity (1): $\nabla f(x^*) + \nabla g(x^*)\,\lambda^* + \nabla h(x^*)\,\mu^* = 0$ (The linear combination of the constraints' gradients has to be equal to the objective function's gradient.)
- primal feasibility (2-3): $g(x^*) \le 0$, $h(x^*) = 0$ (Constraints of the stated optimization problem have to be satisfied.)
- dual feasibility (4): $\lambda^* \ge 0$ (For a given $x$, the columns of $\nabla g(x)$ define a polyhedral cone in which the negative of the objective function's gradient must lie.)
- complementary slackness (5): $\lambda_i^*\, g_i(x^*) = 0$ (If $x^*$ lies strictly inside the feasible set w.r.t. the condition $g_i(x) \le 0$, then $g_i(x^*) < 0$ and therefore $\lambda_i^* = 0$.)
Physical interpretation
The KKT conditions can be intuitively derived through an analogy where the objective function is a potential field, for example of gravity or magnetism. Let us first address only the inequality-constrained case, where the inequality constraints can be regarded as physical barriers. If we then recall the Lagrangian, the terms containing the multipliers $\lambda$ can be viewed as contact forces acting against the influence of the potential field.
From this analogy we can immediately recover the dual feasibility and complementary slackness conditions (4-5), as contact forces can only be positive and do not act at a distance. The stationarity condition (1) (without the term related to the equality constraint) can then be interpreted as the contact forces and the pull of the potential field being in equilibrium, which must occur at a local minimum. Lastly, we are considering only solutions within the bounds of the inequality constraints, as if we were initially placed within them, satisfying the primal feasibility condition (2).
To incorporate equality constraints into this analogy we can simply reformulate each of them as a pair of inequality constraints, likening them to being sandwiched between two surfaces. As we are always in contact, there is no need for a complementary slackness condition, and due to its bi-directional nature, elements of $\mu$ can be both positive and negative. Therefore we must only include a primal feasibility condition (2) and add an additional term to the stationarity condition (1).
Linear Programming
Let us consider a constrained optimization problem in the form
As both the objective function and the constraints are linear, the problem is convex. For this reason, the KKT conditions are not only necessary but also sufficient, provided a point feasible w.r.t. the constraints exists.
KKT conditions
The KKT conditions of this problem can be stated as:
Dual problem
The dual problem of the LP problem can be written as where the Lagrangian can be manipulated into the form As we are only looking for bounded solutions (last constraint), the condition must be satisfied and therefore When substituted back into the original dual problem, after basic manipulations, we attain
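To make the LP form concrete, here is a minimal sketch using CVXPY (one of the modeling languages listed at the end of these notes); the cost vector $c$, constraint matrix $A$ and right-hand side $b$ are arbitrary illustrative values, and the particular inequality-form LP below is our own choice, not necessarily the exact form used above.

```python
import cvxpy as cp
import numpy as np

# Minimal sketch: a small LP, min c^T x  s.t.  A x <= b, x >= 0.
# The data below are arbitrary illustrative values.
c = np.array([-1.0, -2.0])
A = np.array([[-1.0, 1.0],
              [ 2.0, 1.0]])
b = np.array([1.0, 4.0])

x = cp.Variable(2)
problem = cp.Problem(cp.Minimize(c @ x), [A @ x <= b, x >= 0])
problem.solve()

print("optimal value:", problem.value)
print("x*:", x.value)
# The multipliers of the inequality constraints (dual variables) are also
# available, e.g. problem.constraints[0].dual_value.
```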
Quadratic Programming
Let us consider a constrained optimization problem in the form where .
As the objective function is quadratic and the constraints are linear, the problem is convex. For this reason, the KKT conditions are not only necessary but also sufficient, provided a point feasible w.r.t. the constraints exists.
KKT Conditions
The KKT conditions of this problem can be stated as:
Dual problem
The dual problem of the QP problem can be written as where the Lagrangian can be manipulated into the form Its critical point can be found by solving which yields When substituted back into the original dual problem, after basic manipulations, we attain
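Analogously, a small QP can be posed and solved with CVXPY; the matrices $P$, $q$, $G$, $h$, $A$, $b$ and their symbols are our own illustrative choices and need not match the notation used above.

```python
import cvxpy as cp
import numpy as np

# Minimal sketch: a small QP, min 0.5 x^T P x + q^T x  s.t.  G x <= h, A x == b.
# All data are arbitrary illustrative values; P is symmetric positive definite.
P = np.array([[4.0, 1.0],
              [1.0, 2.0]])
q = np.array([1.0, 1.0])
G = -np.eye(2)
h = np.zeros(2)            # encodes x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(x, P) + q @ x)
problem = cp.Problem(objective, [G @ x <= h, A @ x == b])
problem.solve()

print("optimal value:", problem.value)
print("x*:", x.value)
```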
According to [Bellman1966] "an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." When applied recursively, this means that decisions at all states of an optimal trajectory must appear to be optimal when considering only the future behavior of the system. While the statement might sound rather vague, Bellman's principle is the foundation of optimal control, applicable to both continuous-time and discrete-time systems and trajectories analyzed not only on finite but also infinite time horizons.
In the following subsections we will overview the Bellman and the Hamilton-Jacobi-Bellman (HJB) equations, which are the consequence of Bellman's principle when applied to the discrete-time and continuous-time optimal control problems. Regardless of the representation, the idea is to find a sequence of control inputs that minimizes the total cost of a trajectory, defined on a horizon. Instead of looking for the entire sequence at once, the Bellman principle allows us to break it down into a sequence of problems, described by the aforementioned equations. For their derivation we introduce the concept of the cost-to-go, i.e. the remainder of the total cost from any point on the horizon to its end, assuming a particular sequence of inputs. In the case of the optimal sequence of inputs we refer to the corresponding cost-to-go as the value function. The value function figures on the left-hand side of both equations, where it is expressed recursively.
This is a high-level overview which, without additional context, might be incomprehensible. Hopefully the following subsections will provide the necessary context in the form of rigorous derivations and later also applications. The derivations especially are likely to be sparse on text, so always feel free to jump back.
To emphasize the relevance of Bellman's principle in present-day research, a mention of the differential dynamic programming algorithm is called for. While the original algorithm was introduced in [Mayne1966], its extensions are a vibrant area of contemporary research.
Bellman Equation
For a discrete-time linear system, with dynamics in the form let us assume we are trying to minimize the total cost of a trajectory where the initial state is prescribed. The concept of the total cost can be generalized for any to a cost-to-go for which we may define a value function and an optimal control policy the application of which results in the system following the optimal sequence of states .
The so-called Bellman equation can then be derived by formulating the value function for step $k$ recursively, using the value function for step $k+1$. From its definition, it is apparent that where Assuming we don't know what the optimal input at step $k$ is, we may pose the right-hand side of (1) as an optimization problem Equation (2) is then referred to as the Bellman equation.
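To make the recursion tangible, the following sketch runs the backward pass on a small, entirely artificial finite problem; the states, inputs, dynamics and costs are randomly generated for illustration only.

```python
import numpy as np

# Minimal sketch of the Bellman recursion on an artificial finite problem:
# 5 states, 2 inputs, deterministic dynamics, horizon N = 4.
n_x, n_u, N = 5, 2, 4
rng = np.random.default_rng(0)

# next_state[x, u] and stage_cost[x, u] are made up for illustration.
next_state = rng.integers(0, n_x, size=(n_x, n_u))
stage_cost = rng.random((n_x, n_u))
final_cost = rng.random(n_x)

V = np.zeros((N + 1, n_x))          # value function V_k(x)
policy = np.zeros((N, n_x), dtype=int)
V[N] = final_cost                   # boundary condition at the horizon's end

# Backward pass: V_k(x) = min_u [ l(x, u) + V_{k+1}(f(x, u)) ]
for k in range(N - 1, -1, -1):
    q = stage_cost + V[k + 1][next_state]   # cost of each (state, input) pair
    policy[k] = np.argmin(q, axis=1)
    V[k] = np.min(q, axis=1)

print(V[0])        # optimal cost-to-go from every initial state
print(policy[0])   # optimal first input for every initial state
```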
Hamilton-Jacobi-Bellman Equation
Let us assume we are trying to minimize the total cost of a continuous-time system's trajectory , with dynamics in the form starting from the state .
The concept of the total cost can be generalized for any to a cost-to-go for which we may define a value function and optimal control policy the application of which results in the system following an optimal trajectory , .
For the previously defined value function the Hamilton-Jacobi-Bellman (HJB) equation can be derived1 as
An overview of the derivation is presented by Steven Brunton in one of his videos [Brunton2022-HJB] also available here. As a note, there is a small mistake, acknowledged by the presenter in the comments, at 9:11 of the video where "" should be replaced with "".
Linear-Quadratic Regulator
The linear-quadratic regulator gets its name from the linear model of the system and the quadratic cost function used for its design. For these, the optimal controller can be derived analytically and takes the form of linear state feedback making its implementation particularly simple.
The systems can be described in both the discrete-time and continuous-time domain and particular variants of the LQR allow for time-variant systems. Cost functions must then reflect the type of the system. For time-invariant systems, the cost function can either take the form of a time-invariant running cost integrated over an infinite horizon, or a sum of a final cost and a running cost (optionally time-variant) integrated over a finite horizon. If the system is time-variant, only the second option is available.
In the following sections we will cover four variants based on the time domain and the horizon:
- discrete-time x continuous-time,
- finite-horizon x infinite-horizon.
Many more variants exist including those for reference tracking, dead-beat control, and even nonlinear trajectory optimization.
Discrete-Time Infinite-Horizon Linear-Quadratic Regulator
For a linear time-invariant discrete-time system and a quadratic total cost of its trajectory the optimal controller can be derived based on the assumption that the value function on the infinite horizon takes the time-invariant form
When substituted into the Bellman Equation along with the system's dynamics we attain To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to , set it to zero and find the solution (optimal control input)
The input can then be substituted back into (1). As the equation must hold for all , through basic manipulations we then attain the discrete-time algebraic Riccati equation (DARE)
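Numerically, the DARE can be solved with standard library routines. The sketch below does so for an illustrative double-integrator model, assuming the standard notation $x_{k+1} = A x_k + B u_k$, stage cost $x^T Q x + u^T R u$ and feedback $u_k = -K x_k$; the numerical values are our own and only serve as an example.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Minimal sketch of an infinite-horizon discrete-time LQR (standard notation
# assumed).  A, B, Q, R are illustrative values for a double integrator
# sampled at 0.1 s.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

# Solve the DARE:  P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed-loop matrix A - B K should have all eigenvalues inside the unit circle.
print("K =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```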
Discrete-Time Finite-Horizon Linear-Quadratic Regulator
For a linear time-variant discrete-time system and a quadratic total cost of its trajectory the optimal controller can be derived based on the assumption that the value function takes the form When substituted into the Bellman Equation along with the system's dynamics we attain To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to , set it to zero and find the solution (optimal control input)
The input can then be substituted back into (1). As the equation must hold for all , through basic manipulations we then attain the discrete-time dynamic Riccati equation (DDRE) for a finite horizon .
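A minimal sketch of the resulting backward recursion is shown below; it again assumes the standard symbols ($A$, $B$, $Q$, $R$, final weight $Q_f$) with illustrative values and produces a time-varying gain sequence $K_k$.

```python
import numpy as np

# Minimal sketch of the finite-horizon backward Riccati recursion, assuming
# x_{k+1} = A x_k + B u_k, stage cost x^T Q x + u^T R u, final cost x^T Q_f x,
# and the time-varying feedback u_k = -K_k x_k.  All matrices are illustrative.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])
Q_f = np.diag([10.0, 1.0])
N = 50                                   # horizon length

P = Q_f                                  # boundary condition P_N = Q_f
K = [None] * N
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # DDRE step: P_k = Q + A^T P_{k+1} A - A^T P_{k+1} B K_k
    P = Q + A.T @ P @ A - A.T @ P @ B @ K[k]

# Simulate the closed loop from an illustrative initial state.
x = np.array([1.0, 0.0])
for k in range(N):
    u = -K[k] @ x
    x = A @ x + B @ u
print("final state:", x)
```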
Continuous-Time Infinite-Horizon Linear-Quadratic Regulator
For a linear time-invariant continuous-time system and a quadratic total cost where and , of its trajectory , the optimal controller can be derived based on the assumption that the value function takes the form When substituted into the Hamilton-Jacobi-Bellman Equation along with the system's dynamics we attain To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to , set it to zero and find the solution (optimal control input)
The input can then be substituted back into (1). As the equation must hold for all , through basic manipulations we then attain the continuous-time algebraic Riccati equation (CARE)
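As with the DARE, library routines exist for the CARE. The sketch below assumes the usual symbols ($\dot{x} = A x + B u$, weights $Q$, $R$, gain $K = R^{-1} B^T P$) and an illustrative double-integrator model.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Minimal sketch of a continuous-time infinite-horizon LQR (standard notation
# assumed).  A, B, Q, R are illustrative values for a double integrator.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

# Solve the CARE:  A^T P + P A - P B R^{-1} B^T P + Q = 0
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

print("K =", K)
# The closed-loop eigenvalues should all have negative real parts.
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```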
Continuous-Time Finite-Horizon Linear-Quadratic Regulator
For a linear time-variant continuous-time system and a quadratic total cost where , , and , of its trajectory , the optimal controller can be derived based on the assumption that the value function takes the form When substituted into the Hamilton-Jacobi-Bellman Equation along with the system's dynamics we attain To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to , set it to zero and find the solution (optimal control input)
The input can then be substituted back into (1). As the equation must hold for all , through basic manipulations we then attain the continuous-time differential Riccati equation (CDRE) for a finite horizon . Supplemented with the boundary condition , the CDRE forms an initial value problem (IVP) which can be solved using numerical integration. Numerical errors in the integration process may lead to the loss of positive semi-definiteness. To overcome this, instead of integrating directly, we may use its factorized form a.k.a. the "square-root form" where As must be invertible must be (at least numerically) positive definite. Consequently we may use the Cholesky factorization in order to form the boundary condition
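For illustration, the following sketch integrates the plain (non-square-root) CDRE backwards in time with SciPy; the assumed form $-\dot{P} = A^T P + P A - P B R^{-1} B^T P + Q$ with $P(t_f) = Q_f$, as well as all matrices and the horizon, are our own choices for this example.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Minimal sketch: integrating the (plain, non-square-root) CDRE backwards in
# time, assuming -dP/dt = A^T P + P A - P B R^{-1} B^T P + Q with P(t_f) = Q_f.
# Matrices and the horizon are illustrative.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])
Q_f = np.diag([10.0, 1.0])
t_f = 2.0

def cdre_backwards(tau, p_flat):
    # Substituting tau = t_f - t turns the terminal condition into an initial
    # condition and flips the sign of the time derivative.
    P = p_flat.reshape(2, 2)
    dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
    return dP.ravel()

sol = solve_ivp(cdre_backwards, (0.0, t_f), Q_f.ravel())
P0 = sol.y[:, -1].reshape(2, 2)   # P at t = 0 (i.e. tau = t_f)
print("P(0) =", P0)
```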
Linear-Quadratic MPC
Model predictive control (MPC) is a control strategy which relies on the system's model to predict and optimize its trajectory online, periodically re-optimizing the trajectory as a form of feedback. If a linear representation of the system and quadratic running and final costs are used, it is essentially an extension of the finite-horizon LQR that includes constraints on the states and inputs. This is in fact what is commonly referred to as MPC. If the system's dynamics are nonlinear or the cost functions contain terms that are neither linear nor quadratic, the strategy is referred to as nonlinear MPC.
In this course we will limit ourselves only to the former case which can be transcribed into a QP problem. We will go through two formulations of this problem based on how the system's dynamics are incorporated into the optimization problem:
- explicitly constrained - both states and inputs are the decision variables,
- implicitly constrained - only inputs are decision variables.
Direct Linear-Quadratic MPC
The standard realization of an MPC controller is one that solves the optimization problem where the initial state is given1.
Canonical QP form
Optimized variables
Objective function's terms
Constraints' terms
If the dynamics represent the linearized dynamics of a nonlinear system (i.e. and are in fact and ), the value of must be considered in the choice of limits and .
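Below is a minimal sketch of the explicitly constrained formulation in CVXPY, with both states and inputs as decision variables; the model, weights, input limit, horizon and initial state are all illustrative values of our own choosing.

```python
import cvxpy as cp
import numpy as np

# Minimal sketch of the "explicitly constrained" linear-quadratic MPC problem:
# both the state and the input trajectories are decision variables and the
# dynamics enter as equality constraints.  All numbers are illustrative.
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])
Q_f = np.diag([10.0, 1.0])
u_max = 2.0
x0 = np.array([1.0, 0.0])

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))

cost = cp.quad_form(x[:, N], Q_f)                 # final cost
constraints = [x[:, 0] == x0]                     # prescribed initial state
for k in range(N):
    cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],   # dynamics
                    cp.abs(u[:, k]) <= u_max]                    # input limits
problem = cp.Problem(cp.Minimize(cost), constraints)
problem.solve()

# In closed loop, only the first input u[:, 0] would be applied before
# re-solving the problem from the newly measured state.
print("first input:", u[:, 0].value)
```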
Controllability/Reachability
Controllability and reachability are defined for discrete-time systems as:
- A system is controllable if from any state one can achieve the state using a series of inputs .
- A state is reachable if it can be achieved by applying a series of inputs when starting from the initial state .
For continuous-time systems they are similarly defined as:
- A system is controllable if from any state one can achieve the state using a control policy , .
- A state is reachable if it can be achieved by applying a control policy , when starting from the initial state .
For linear systems all states are reachable if the system is controllable.
Controllability of LTI Discrete-Time Systems
The first states of a linear time-invariant system, assuming an initial state and a series of inputs , can be expressed as Since every square matrix satisfies its own characteristic equation (the Cayley-Hamilton theorem), for we may express as a linear combination of the lower matrix powers of : Therefore, the state can be rewritten as which can be manipulated into the form where is the controllability matrix.
The same substitution can be applied also for subsequent timesteps up to infinity, changing only the particular form of the vector of inputs' linear combinations. This has two consequences:
- All states reachable in steps are also reachable in steps (with unlimited inputs).
- If is rank deficient, some directions in the state-space cannot be affected by the inputs and therefore the system is uncontrollable (a numerical rank check is sketched below).
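A minimal numerical check of this rank condition is sketched below; the matrices $A$ and $B$ are illustrative and deliberately chosen so that one mode cannot be influenced by the input.

```python
import numpy as np

# Minimal sketch: building the controllability matrix [B, AB, ..., A^{n-1}B]
# and checking its rank.  A and B are illustrative (one uncontrollable mode).
A = np.array([[1.0, 0.0],
              [0.0, 0.5]])
B = np.array([[1.0],
              [0.0]])
n = A.shape[0]

C = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
print("controllability matrix:\n", C)
print("controllable:", np.linalg.matrix_rank(C) == n)   # False for this pair
```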
Controllability of LTI Continuous-Time Systems
The state of a linear time-invariant system at time , starting from the initial state and influenced by a continuous input , can be expressed as Since every square matrix satisfies its own characteristic equation, the Cayley-Hamilton theorem has the consequence that the infinite series can be expressed using a finite number of terms After substituting it back into the second term of (1) we attain which can be further manipulated into the form where is the controllability matrix. If is rank deficient, some directions in the state-space cannot be affected by the inputs and therefore the system is uncontrollable.
Discretization of LTI Systems
Let us consider a linear time-invariant system in the form
Integration of the system's dynamics over an interval gives us To express its state at with respect to the state we may perform a series of manipulations If we then set , the integral can be manipulated into By substituting (2) into (1) we obtain where , , and .
If we use the first two terms of the Taylor expansion in (3) we get the system representation
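The sketch below compares the exact zero-order-hold discretization (computed via the matrix exponential of an augmented matrix) with the first-order Euler approximation; the system matrices and sampling period are illustrative values.

```python
import numpy as np
from scipy.linalg import expm

# Minimal sketch: discretizing x' = A x + B u with a zero-order hold on u.
# The exact discretization and the first-order (Euler) approximation are
# compared; A, B and the sampling period are illustrative.
A = np.array([[0.0, 1.0],
              [0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
dt = 0.1
n, m = A.shape[0], B.shape[1]

# Exact: the exponential of the augmented matrix [[A, B], [0, 0]] * dt yields
# [[A_d, B_d], [0, I]], so A_d and B_d can be read off its top block row.
M = expm(np.block([[A, B], [np.zeros((m, n + m))]]) * dt)
A_d, B_d = M[:n, :n], M[:n, n:]

# Euler (first two Taylor terms): A_d ~ I + dt*A,  B_d ~ dt*B.
A_euler = np.eye(n) + dt * A
B_euler = dt * B

print("A_d:\n", A_d, "\nA_euler:\n", A_euler)
print("B_d:\n", B_d, "\nB_euler:\n", B_euler)
```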
Optimal Luenberger Observer
Finite horizon
Let us consider an LTI system with additive Gaussian noise and a Luenberger observer The state estimate's error and its covariance can be expressed recursively as The gain can then be designed such that is minimized. This can be achieved by finding the stationary point , where Substituting (3) back into (2b) we attain the DDRE which governs the evolution of the state error's covariance.
The state of the observer then consists of and , its dynamics described by (1) and (4). This observer can also be used in the case where the dynamics of the system are time variant.
Infinite horizon
In the case where the covariances of the process and measurement noise are time-invariant, i.e. we can also design the gain of the observer based on the steady-state error covariance at . The covariance can be obtained by solving the DARE based on which we evaluate the gain
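A minimal sketch of this steady-state design is shown below; it exploits the duality with the LQR by handing $A^T$ and $C^T$ to SciPy's DARE solver, and the noise-covariance symbols ($V$ for process, $W$ for measurement) as well as all numerical values are our own illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Minimal sketch: steady-state observer gain for x_{k+1} = A x_k + B u_k + w_k,
# y_k = C x_k + v_k, with process/measurement noise covariances V and W
# (symbols assumed here).  Using the LQR duality, the error-covariance DARE is
# solved with A^T and C^T in place of A and B.  Values are illustrative.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
V = np.diag([1e-4, 1e-3])     # process noise covariance
W = np.array([[1e-2]])        # measurement noise covariance

P = solve_discrete_are(A.T, C.T, V, W)            # steady-state error covariance
L = A @ P @ C.T @ np.linalg.inv(C @ P @ C.T + W)  # predictor-form observer gain

print("L =", L)
# The estimation-error dynamics A - L C should be stable (eigenvalues inside
# the unit circle).
print("error dynamics eigenvalues:", np.linalg.eigvals(A - L @ C))
```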
Duality with LQR
It is of note that the form of (4) and (5) is identical to that of the LQR equations with different matrices:
| LQR | Luenberger |
| --- | --- |
| state matrix $A$ | $A^T$ |
| input matrix $B$ | $C^T$ |
| state cost weight | process noise covariance |
| input cost weight | measurement noise covariance |
Kalman Filter
Let us consider an LTI system with additive Gaussian noise In addition to the state estimate the state of the Kalman Filter includes the error covariance where is the state estimate's error.
The Kalman filter estimates the state in two steps, first creating an a priori state estimate and error covariance where and then performing a correction based on the measurement and Kalman gain , to attain the a posteriori state estimate and error covariance as The Kalman gain is designed to minimize the a posteriori error covariance by finding the stationary point of (1b) After substituting (2) into (1b) we can manipulate the equation into the "Joseph" form or alternatively the simplified form which is numerically less stable.
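One predict-correct iteration can be sketched as follows; the system matrices, the noise-covariance symbols ($V$, $W$) and all values are illustrative assumptions, and the covariance update uses the Joseph form mentioned above.

```python
import numpy as np

# Minimal sketch of one Kalman filter iteration (predict + update), assuming
# x_{k+1} = A x_k + B u_k + w_k,  y_k = C x_k + v_k, with noise covariances
# V (process) and W (measurement).  All symbols and values are illustrative.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])
V = np.diag([1e-4, 1e-3])
W = np.array([[1e-2]])

def kalman_step(x_est, P, u, y):
    # Prediction: a priori estimate and covariance.
    x_pri = A @ x_est + B @ u
    P_pri = A @ P @ A.T + V
    # Update: Kalman gain, a posteriori estimate and covariance (Joseph form).
    K = P_pri @ C.T @ np.linalg.inv(C @ P_pri @ C.T + W)
    x_post = x_pri + K @ (y - C @ x_pri)
    I_KC = np.eye(2) - K @ C
    P_post = I_KC @ P_pri @ I_KC.T + K @ W @ K.T
    return x_post, P_post

x_est, P = np.zeros(2), np.eye(2)
x_est, P = kalman_step(x_est, P, u=np.array([0.1]), y=np.array([0.05]))
print(x_est, P)
```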
State-Space Representation of Mechanical Systems
Equations of Motion
Let us consider manipulator equations in the form whose left-hand side may for example be derived from the system's kinetic and potential energy expressed in generalized coordinates by expanding the Lagrange equations of the second kind where . First by separating the Lagrangian into its components and then applying the chain rule If the external forces are linearly dependent on inputs , such that , the terms of (1) as derived by this approach are
Continuous-time dynamics
The most commonly used state-space representation of a mechanical system in continuous time is attained by concatenating and to form the system's state
Linearization
The system's dynamics can be linearized by performing a Taylor expansion around a point which yields where also .
If is a fixed point, i.e. , first order partial derivatives of the system's dynamics simplify to as terms involving drop out because for all fixed points. Partial derivatives of also disappear as all of its terms contain second degree products of velocities and all velocities must be equal to zero at a fixed point.
Discretization via integration using the explicit Euler scheme
The most basic approach with which we may discretize the continuous-time dynamics of a nonlinear system is to integrate the system's state with a timestep of using the explicit Euler scheme
Linearization
Compared to more advanced integration methods, linearization of these discrete-time dynamics around a point is trivial:
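As a concrete, purely illustrative example, the sketch below applies the explicit Euler scheme and the corresponding discrete-time linearization to a simple pendulum, chosen only because its Jacobians are easy to write by hand.

```python
import numpy as np

# Minimal sketch: explicit-Euler discretization of nonlinear dynamics and the
# corresponding discrete-time linearization A_d = I + dt * df/dx,
# B_d = dt * df/du.  A simple pendulum is used purely for illustration.
g, l, dt = 9.81, 1.0, 0.01

def f(x, u):
    # Continuous-time dynamics x' = f(x, u); x = [angle, angular velocity].
    return np.array([x[1], -g / l * np.sin(x[0]) + u[0]])

def step_euler(x, u):
    # Explicit Euler: x_{k+1} = x_k + dt * f(x_k, u_k).
    return x + dt * f(x, u)

def linearize_discrete(x, u):
    # Jacobians of f, written analytically for this simple example.
    dfdx = np.array([[0.0, 1.0],
                     [-g / l * np.cos(x[0]), 0.0]])
    dfdu = np.array([[0.0],
                     [1.0]])
    return np.eye(2) + dt * dfdx, dt * dfdu

x_eq, u_eq = np.array([np.pi, 0.0]), np.array([0.0])   # upright fixed point
A_d, B_d = linearize_discrete(x_eq, u_eq)
print(A_d, B_d)
print(step_euler(x_eq, u_eq))   # a fixed point maps to itself
```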
Cart-Pole Equations of Motion
\
/\
/ \
/ \
l \
/ >< m_p
/ //
/ //
\/ //
\ //
\ //
| |------\--//------|
| | \//\ |
g | m_c ( ) theta |
| | |_/ |
v |========|========|
_________ooo___|___ooo_________
|
//|-----s----->|
For generalized coordinates the system's kinetic and potential energy are where The individual terms of the manipulator equations1 are then assuming a single input is acting in the direction of .
Linearization in the upright configuration
From the equations we may see that for the state and input vectors all points are stationary. In this configuration the relevant terms for the system's linearization are
The python script for the derivation of the following terms can be found here. A very similar form of these equations of motion can also be found in section 3.2.1 of [Underactuated2023].
Bi-Rotor Equations of Motion
As a bi-rotor (quadcopter simplified to 2D) is essentially a rigid body, its equations of motion can be easily derived as
Linearization in a hovering configuration
From the EoM we may see that for the state description all points are stationary points, for which the system's linearization is trivial
Derivation in manipulator equation form
For those who want to validate the equations by deriving the manipulator equations, kinetic and potential energy for generalized coordinates are which should yield The manipulation matrix can then be derived separately by evaluating thrust vectors and differentiating moment arms.
Optimization Modeling Languages
CVXPY
"Open source Python-embedded modeling language for convex optimization problems"
Pyomo
"Python-based, open-source optimization modeling language with a diverse set of optimization capabilities"
JuMP
"JuMP is a modeling language and collection of supporting packages for mathematical optimization in Julia"
Solvers
OSQP
"Numerical optimization package for solving convex quadratic programs"
- Uses the alternating direction method of multipliers (ADMM)
IPOPT
"Open source software package for large-scale nonlinear optimization"
- Uses the interior point (IP) method