These are notes for the tutorials of the subject "Optimal and Predictive Control System" taught at the Faculty of Mechanical Engineering at the Czech Technical University in Prague.

The materials are loosely divided into three blocks. The first focuses on the classical approaches in optimal control, namely the four basic variants of the linear-quadratic regulator (LQR), which are derived from Bellman's principle. The second covers the basics of numerical optimization, building up to the third and final block, which focuses on model predictive control (MPC).

DISCLAIMER: These notes have not gone through a peer review process and are likely to contain (hopefully only minor) mistakes.

According to [Bellman1966] "an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision." When applied recursively, this means that decisions at all states of an optimal trajectory must appear to be optimal when considering only the future behavior of the system. While the statement might sound rather vague, Bellman's principle is the foundation of optimal control, applicable to both continuous-time and discrete-time systems and trajectories analyzed not only on finite but also infinite time horizons.

In the following subsections we will overview the Bellman and the Hamilton-Jacobi-Bellman (HJB) equations, which are the consequences of Bellman's principle applied to the discrete-time and continuous-time optimal control problems, respectively. Later we will use both the Bellman and HJB equations to derive variants of the linear-quadratic regulator (LQR), a staple in optimal control.

To emphasize the relevance of Bellman's principle in present-day research, a mention of the differential dynamic programming (DDP) algorithm is called for. While the original algorithm was introduced in [Mayne1966], its extensions are the product of research in the current decade.

Bellman Equation

Let us assume we are trying to minimize the total cost $$J = \sum_{k=0}^{N} \ell(x_k, u_k)$$ of a discrete-time system's trajectory with dynamics in the form $$x_{k+1} = f(x_k, u_k),$$ starting from the state $x_0$.

The concept of the total cost can be generalized for any step $n$ to a cost-to-go $$J_n = \sum_{k=n}^{N} \ell(x_k, u_k),$$ for which we may define a value function $$V_n(x_n) = \min_{u_n, \dots, u_N} J_n$$ and an optimal control policy $u_k^* = \pi_k^*(x_k)$, the application of which results in the system following an optimal trajectory $x_n^*, x_{n+1}^*, \dots, x_N^*$.

The so-called Bellman equation can then be derived by formulating the value function for step $n$ recursively (using the value function for step $n+1$) as $$V_n(x_n) = \min_{u_n} \left[ \ell(x_n, u_n) + V_{n+1}(x_{n+1}) \right]$$ and substituting $x_{n+1}$ using the system's dynamics to attain the final form $$V_n(x_n) = \min_{u_n} \left[ \ell(x_n, u_n) + V_{n+1}(f(x_n, u_n)) \right].$$
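
To make the recursion tangible, below is a minimal numerical sketch of the backward Bellman recursion on a toy problem; the scalar dynamics $x_{k+1} = x_k + u_k$, the grids, and the quadratic stage cost are our own choices for illustration and do not come from the lectures.

```python
import numpy as np

# Toy Bellman recursion: scalar system x_{k+1} = x_k + u_k on a state grid,
# stage cost x^2 + u^2, terminal cost x^2.
X = np.linspace(-2.0, 2.0, 41)   # state grid
U = np.linspace(-1.0, 1.0, 21)   # input grid
N = 20                           # horizon length

V = X**2                         # terminal value function V_N(x)
for k in reversed(range(N)):
    V_next = np.empty_like(V)
    for i, x in enumerate(X):
        x_next = x + U                                # candidate successors
        cost = x**2 + U**2 + np.interp(x_next, X, V)  # l(x,u) + V_{k+1}(f(x,u))
        V_next[i] = cost.min()                        # minimize over u
    V = V_next

print(V[20])  # approximate cost-to-go from x = 0 (grid midpoint)
```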

Hamilton-Jacobi-Bellman Equation

Let us assume we are trying to minimize the total cost $$J = \int_{0}^{T} \ell(x(t), u(t)) \, \mathrm{d}t$$ of a continuous-time system's trajectory $x(t)$, $t \in [0, T]$, with dynamics in the form $$\dot{x}(t) = f(x(t), u(t)),$$ starting from the state $x(0) = x_0$.

The concept of the total cost can be generalized for any time $t$ to a cost-to-go $$J(t) = \int_{t}^{T} \ell(x(\tau), u(\tau)) \, \mathrm{d}\tau,$$ for which we may define a value function $$V(x, t) = \min_{u(\cdot)} J(t)$$ and an optimal control policy $u^*(t) = \pi^*(x(t), t)$, the application of which results in the system following an optimal trajectory $x^*(t)$, $t \in [t_0, T]$.

For the previously defined value function the Hamilton-Jacobi-Bellman (HJB) equation can be derived1 as $$-\frac{\partial V}{\partial t} = \min_{u} \left[ \ell(x, u) + \left( \frac{\partial V}{\partial x} \right)^{\!\top} f(x, u) \right].$$


1

An overview of the derivation is presented by Steven Brunton in one of his videos [Brunton2022-HJB]. As a note, there is a small mistake at 9:11 of the video, acknowledged by the presenter in the comments.

Discrete-Time Linear-Quadratic Regulator

Infinite horizon

For a linear time-invariant discrete-time system $$x_{k+1} = A x_k + B u_k$$ and a quadratic total cost of its trajectory $$J = \sum_{k=0}^{\infty} \left( x_k^\top Q x_k + u_k^\top R u_k \right)$$ the optimal controller can be derived based on the assumption that the value function takes the form $$V(x) = x^\top S x.$$

When substituted into the Bellman equation along with the system's dynamics we attain $$x^\top S x = \min_{u} \left[ x^\top Q x + u^\top R u + (A x + B u)^\top S (A x + B u) \right]. \tag{1}$$ To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to $u$, set it to zero, $$2 R u + 2 B^\top S (A x + B u) = 0,$$ and find the solution (optimal control input) $$u^* = -\left( R + B^\top S B \right)^{-1} B^\top S A \, x.$$

The input $u^*$ can then be substituted back into (1). As the equation must hold for all $x$, through basic manipulations we then attain the discrete-time algebraic Riccati equation (DARE) $$S = Q + A^\top S A - A^\top S B \left( R + B^\top S B \right)^{-1} B^\top S A.$$
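
In practice the DARE is rarely iterated by hand; SciPy ships a solver for it. A minimal sketch with hypothetical double-integrator data (the matrices below are our own illustrative choices):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical discretized double integrator; Q, R chosen arbitrarily.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

S = solve_discrete_are(A, B, Q, R)                  # solves the DARE for S
K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)   # u* = -K x
print(K)
```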

Finite horizon

For a linear time-invariant discrete-time system $$x_{k+1} = A x_k + B u_k$$ and a quadratic total cost of its trajectory on a finite horizon $N$ $$J = x_N^\top Q_N x_N + \sum_{k=0}^{N-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right)$$ the optimal controller can be derived based on the assumption that the value function takes the form $$V_k(x) = x^\top S_k x.$$ When substituted into the Bellman equation along with the system's dynamics we attain $$x^\top S_k x = \min_{u} \left[ x^\top Q x + u^\top R u + (A x + B u)^\top S_{k+1} (A x + B u) \right]. \tag{1}$$ To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to $u$, set it to zero and find the solution (optimal control input) $$u_k^* = -\left( R + B^\top S_{k+1} B \right)^{-1} B^\top S_{k+1} A \, x_k.$$

The input $u_k^*$ can then be substituted back into (1). As the equation must hold for all $x$, through basic manipulations we then attain the discrete-time dynamic Riccati equation (DDRE) $$S_k = Q + A^\top S_{k+1} A - A^\top S_{k+1} B \left( R + B^\top S_{k+1} B \right)^{-1} B^\top S_{k+1} A$$ for a finite horizon $N$, solved backwards in time from the boundary condition $S_N = Q_N$.
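
The backward recursion is short enough to write out directly; a minimal numpy sketch (the function name and interface are ours):

```python
import numpy as np

def ddre(A, B, Q, R, Q_N, N):
    """Backward recursion of the DDRE, returning the gains K_0 .. K_{N-1}."""
    S = Q_N
    gains = []
    for _ in range(N):
        # K computed from S_{k+1} is the gain for step k
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = Q + A.T @ S @ A - A.T @ S @ B @ K
        gains.append(K)
    return list(reversed(gains))  # gains ordered k = 0 .. N-1
```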

Continuous-Time Linear-Quadratic Regulator

Infinite horizon

For a linear time-invariant continuous-time system $$\dot{x} = A x + B u$$ and a quadratic total cost $$J = \int_{0}^{\infty} \left( x^\top Q x + u^\top R u \right) \mathrm{d}t,$$ where $Q \succeq 0$ and $R \succ 0$, of its trajectory $x(t)$, $u(t)$, the optimal controller can be derived based on the assumption that the value function takes the form $$V(x) = x^\top S x.$$ When substituted into the Hamilton-Jacobi-Bellman equation along with the system's dynamics we attain $$0 = \min_{u} \left[ x^\top Q x + u^\top R u + 2 x^\top S (A x + B u) \right], \tag{1}$$ where $\frac{\partial V}{\partial t} = 0$ and $\frac{\partial V}{\partial x} = 2 S x$. To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to $u$, set it to zero and find the solution (optimal control input) $$u^* = -R^{-1} B^\top S x.$$

The input $u^*$ can then be substituted back into (1). As the equation must hold for all $x$, through basic manipulations we then attain the continuous-time algebraic Riccati equation (CARE) $$0 = Q + A^\top S + S A - S B R^{-1} B^\top S.$$
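
As in the discrete-time case, SciPy provides a direct solver; a sketch mirroring the earlier hypothetical example:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical continuous-time double integrator; Q, R chosen arbitrarily.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

S = solve_continuous_are(A, B, Q, R)   # solves the CARE for S
K = np.linalg.solve(R, B.T @ S)        # u* = -R^{-1} B^T S x = -K x
print(K)
```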

Finite horizon

For a linear time-invariant continuous-time system $$\dot{x} = A x + B u$$ and a quadratic total cost $$J = x(T)^\top Q_T x(T) + \int_{0}^{T} \left( x^\top Q x + u^\top R u \right) \mathrm{d}t,$$ where $Q \succeq 0$, $Q_T \succeq 0$, and $R \succ 0$, of its trajectory $x(t)$, $u(t)$, the optimal controller can be derived based on the assumption that the value function takes the form $$V(x, t) = x^\top S(t) \, x.$$ When substituted into the Hamilton-Jacobi-Bellman equation along with the system's dynamics we attain $$-x^\top \dot{S} x = \min_{u} \left[ x^\top Q x + u^\top R u + 2 x^\top S (A x + B u) \right], \tag{1}$$ where $\frac{\partial V}{\partial t} = x^\top \dot{S} x$ and $\frac{\partial V}{\partial x} = 2 S x$. To find the minimum, we may take the gradient of its argument (which is by design quadratic and convex) with respect to $u$, set it to zero and find the solution (optimal control input) $$u^* = -R^{-1} B^\top S(t) \, x.$$

The input $u^*$ can then be substituted back into (1). As the equation must hold for all $x$, through basic manipulations we then attain the continuous-time differential Riccati equation (CDRE) $$-\dot{S} = Q + A^\top S + S A - S B R^{-1} B^\top S$$ for a finite horizon $T$. Supplemented with the boundary condition $S(T) = Q_T$, the CDRE forms an initial value problem (IVP) in reverse time which can be solved using numerical integration. Numerical errors in the integration process may lead to the loss of positive semi-definiteness of $S$. To overcome this, instead of integrating $S$ directly, we may use its factorized form, a.k.a. the "square-root form", where $$S(t) = G(t)^\top G(t).$$ As $G$ must be invertible, $Q_T$ must be (at least numerically) positive definite. Consequently we may use the Cholesky factorization $Q_T = G_T^\top G_T$ in order to form the boundary condition $G(T) = G_T$.
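
For a small, well-conditioned problem the plain (non-square-root) route already works; the sketch below integrates the CDRE backwards with SciPy, reusing the hypothetical matrices from the CARE example (all names and values are ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.1]); R = np.array([[0.01]]); Q_T = np.eye(2)
T = 1.0

def cdre_rhs(tau, s_flat):
    # integrate in reverse time tau = T - t, so -dS/dt becomes +dS/dtau
    S = s_flat.reshape(2, 2)
    dS = Q + A.T @ S + S @ A - S @ B @ np.linalg.solve(R, B.T @ S)
    return dS.ravel()

sol = solve_ivp(cdre_rhs, (0.0, T), Q_T.ravel(), rtol=1e-8)
S0 = sol.y[:, -1].reshape(2, 2)   # S at t = 0, i.e. tau = T
print(S0)
```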

Lagrangian Duality

In the following section we will derive the so-called dual problem of a (primal) optimization problem, following the derivation demonstrated in a series of short lectures which accompany the book [Bierlaire2018]. Possibly the most important property of the dual problem (from a practical point of view) is that it is convex even if the primal optimization problem is not [Boyd2004].

Let us consider a constrained optimization problem in the form $$\begin{aligned} \min_{x} \quad & f(x) \\ \text{s.t.} \quad & g(x) \le 0, \\ & h(x) = 0, \end{aligned}$$ to which we will refer as the primal problem. With regard to this problem, let us define the Lagrangian $$L(x, \lambda, \mu) = f(x) + \lambda^\top g(x) + \mu^\top h(x),$$ where $\lambda$ and $\mu$ are Lagrange multipliers, and the dual function $$q(\lambda, \mu) = \min_{x} L(x, \lambda, \mu).$$

Theorem

Let $x^*$ be the solution to the primal problem. If $\lambda \ge 0$, then $$q(\lambda, \mu) \le f(x^*).$$

Proof

$$q(\lambda, \mu) = \min_{x} L(x, \lambda, \mu) \le L(x^*, \lambda, \mu)$$ (For any $\lambda$ and $\mu$, we can find an $x$ that minimizes the Lagrangian as well as or better than $x^*$, e.g. by violating the constraints.)

and

$$L(x^*, \lambda, \mu) = f(x^*) + \lambda^\top g(x^*) + \mu^\top h(x^*) \le f(x^*)$$ (Because if $g(x^*) \le 0$, $h(x^*) = 0$, and $\lambda \ge 0$, then $\lambda^\top g(x^*) \le 0$ and $\mu^\top h(x^*) = 0$.)

Corollary

Let $x^*$ be the solution to the primal problem and $\bar{x}$ feasible w.r.t. its constraints. If $\lambda \ge 0$ and $q(\lambda, \mu) = f(\bar{x})$, then $\bar{x}$ is optimal, i.e. $$f(\bar{x}) = f(x^*).$$

Dual problem

As was proven, $q(\lambda, \mu)$ provides a lower bound on the solution of the primal problem when $\lambda \ge 0$. The dual problem can then be regarded as maximizing this lower bound by penalizing violations of the primal problem's constraints using the Lagrange multipliers: $$\begin{aligned} \max_{\lambda, \mu} \quad & q(\lambda, \mu) \\ \text{s.t.} \quad & \lambda \ge 0, \\ & (\lambda, \mu) \in \operatorname{dom} q. \end{aligned}$$

The second condition states that we consider only $\lambda$ and $\mu$ such that the dual function is bounded. This is necessary for the problem to be well posed.

(Weak duality) theorem

If $x^*$ is the solution to the primal problem and $(\lambda^*, \mu^*)$ is the solution to the dual problem, then $$q(\lambda^*, \mu^*) \le f(x^*).$$

Corollary

If one problem is unbounded the other is infeasible.

Corollary

If there exist a feasible $\bar{x}$ and $(\bar{\lambda}, \bar{\mu})$ with $\bar{\lambda} \ge 0$ such that $f(\bar{x}) = q(\bar{\lambda}, \bar{\mu})$, they are optimal.

Karush-Kuhn-Tucker Conditions

The Karush-Kuhn-Tucker (KKT) conditions are first-order necessary conditions for finding the solution of an optimization problem in the form $$\begin{aligned} \min_{x} \quad & f(x) \\ \text{s.t.} \quad & g(x) \le 0, \\ & h(x) = 0. \end{aligned}$$ Furthermore, they are also sufficient conditions for convex problems, i.e. those where $f$ and $g$ are convex and $h$ is affine (linear) [Boyd2004].

A solution satisfying these conditions gives us not only $x^*$ but also the Lagrange multipliers $\lambda^*$ and $\mu^*$ present in the Lagrangian $$L(x, \lambda, \mu) = f(x) + \lambda^\top g(x) + \mu^\top h(x).$$

The conditions are as follows:

  • stationarity $$\nabla f(x^*) + \sum_{i} \lambda_i^* \nabla g_i(x^*) + \sum_{j} \mu_j^* \nabla h_j(x^*) = 0 \tag{1}$$ (The linear combination of the constraints' gradients has to be equal to the objective function's negative gradient.)

  • primal feasibility $$g(x^*) \le 0, \tag{2}$$ $$h(x^*) = 0 \tag{3}$$ (Constraints of the stated optimization problem have to be satisfied.)

  • dual feasibility $$\lambda^* \ge 0 \tag{4}$$ (For a given $x^*$, the columns of $\nabla g(x^*)$ define a polyhedral cone in which $-\nabla f(x^*)$ must lie.)

  • complementary slackness $$\lambda_i^* \, g_i(x^*) = 0 \quad \forall i \tag{5}$$ (If $x^*$ lies inside the feasible set w.r.t. the condition $g_i(x) \le 0$, then $g_i(x^*) < 0$ and therefore $\lambda_i^* = 0$.)

Physical interpretation

The KKT conditions can be intuitively derived through an analogy where $f(x)$ is a potential field, for example of gravity or magnetism. Let us first address only the inequality-constrained case, where the inequality constraints $g(x) \le 0$ can be regarded as physical barriers. If we then recall the Lagrangian $$L(x, \lambda) = f(x) + \lambda^\top g(x),$$ the multipliers $\lambda$ can be viewed as contact forces acting against the influence of the potential field.

From this analogy we can immediately recover the dual feasibility and complementary slackness conditions (4-5), as contact forces can only be positive and do not act at a distance. The stationarity condition (1) (without the term related to the equality constraints) can then be interpreted as the contact forces and the pull of the potential field being in equilibrium, which must occur at a local minimum. Lastly, we are considering only solutions within the bounds of the inequality constraints, as if we were initially placed within them, satisfying the primal feasibility condition (2).

To incorporate equality constraints into this analogy, we can simply reformulate them as two opposing inequalities $$h(x) \le 0, \qquad -h(x) \le 0,$$ likening them to being sandwiched between two surfaces. As we are always in contact, there is no need for a complementary slackness condition, and due to its bi-directional nature, elements of $\mu$ can be both positive and negative. Therefore we must only include a primal feasibility condition (3) and add an additional term to the stationarity condition (1).

Linear Programming

Let us consider a constrained optimization problem in the form $$\begin{aligned} \min_{x} \quad & c^\top x \\ \text{s.t.} \quad & A x \le b, \\ & x \ge 0. \end{aligned}$$

As both the objective function and constraints are linear, the problem is convex. For this reason the KKT conditions are not only necessary but also sufficient, provided a point feasible w.r.t. the constraints exists.

KKT conditions

The KKT conditions of this problem can be stated as: $$\begin{aligned} & c + A^\top \mu - \lambda = 0 && \text{(stationarity)} \\ & A x \le b, \quad x \ge 0 && \text{(primal feasibility)} \\ & \mu \ge 0, \quad \lambda \ge 0 && \text{(dual feasibility)} \\ & \mu_i (A x - b)_i = 0, \quad \lambda_i x_i = 0 \quad \forall i && \text{(complementary slackness)} \end{aligned}$$

Dual problem

The dual problem of the LP problem can be written as $$\max_{\lambda \ge 0, \, \mu \ge 0} \min_{x} L(x, \lambda, \mu),$$ where the Lagrangian $$L(x, \lambda, \mu) = c^\top x + \mu^\top (A x - b) - \lambda^\top x$$ can be manipulated into the form $$L(x, \lambda, \mu) = -b^\top \mu + \left( c + A^\top \mu - \lambda \right)^\top x.$$ As we are only looking for bounded solutions (last constraint), the condition $$c + A^\top \mu - \lambda = 0$$ must be satisfied and therefore $$q(\lambda, \mu) = -b^\top \mu.$$ When substituted back into the original dual problem, after basic manipulations, we attain $$\begin{aligned} \max_{\mu} \quad & -b^\top \mu \\ \text{s.t.} \quad & A^\top \mu \ge -c, \\ & \mu \ge 0. \end{aligned}$$

LP Problem Examples

Car importer

A car importer is planning to import two models of JDM cars. In one shipment he can transport 60 cars, and he has an investment budget of 50 million CZK. The purchase prices of the two models are 2.5 million and 0.5 million CZK per unit. He then expects a profit of 250 thousand CZK per unit sold of the first model and 75 thousand CZK per unit of the second. How many units of each model should he buy to maximize his profit, expecting every car will be sold?

Mathematical model

$$\begin{aligned} \max_{x_1, x_2} \quad & 0.25\, x_1 + 0.075\, x_2 \\ \text{s.t.} \quad & x_1 + x_2 \le 60, \\ & 2.5\, x_1 + 0.5\, x_2 \le 50, \\ & x_1, x_2 \ge 0, \end{aligned}$$ where $x_1$ and $x_2$ are the numbers of units bought and the objective is expressed in millions of CZK.

Code
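
A minimal CVXPY sketch of the model above (the LP relaxation happens to have an integral optimum here, so no integer variables are needed):

```python
import cvxpy as cp

# LP for the car importer example; money in millions of CZK.
x = cp.Variable(2, nonneg=True)          # units bought of model 1 and 2
profit = 0.25 * x[0] + 0.075 * x[1]      # expected profit per unit
constraints = [
    x[0] + x[1] <= 60,                   # shipment capacity (cars)
    2.5 * x[0] + 0.5 * x[1] <= 50,       # investment limit (M CZK)
]
problem = cp.Problem(cp.Maximize(profit), constraints)
problem.solve()
print(x.value, problem.value)            # expect 10 and 50 cars, 6.25 M CZK
```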

Flow optimization

We have a network (represented as a directed graph). The graph has a "Source" of a medium (it can be, for example, gas, water, or electricity) and a "Sink", with additional nodes in between. Each edge between the source and the sink has a defined maximal capacity.

graph LR;
    A["Source"];
    D["Sink"];
    A-->|"10"|B1;
    A-->|"8"|B2;
    A-->|"5"|C1;
    B1-->|"6"|C1;
    B1-->|"2"|C2;
    B2-->|"4"|C1;
    B2-->|"5"|C2;
    C1-->|"8"|D;
    C2-->|"9"|D;

What is the maximum amount of medium we can transmit from the source to the sink?

Mathematical model

Let us first define our optimization variables as individual flows through the edges of our graph

graph LR;
    A["Source"];
    D["Sink"];
    A-->|"x₂ ≤ 10"|B1;
    A-->|"x₃ ≤ 8"|B2;
    A-->|"x₁ ≤ 5"|C1;
    B1-->|"x₄ ≤ 6"|C1;
    B1-->|"x₅ ≤ 2"|C2;
    B2-->|"x₆ ≤ 4"|C1;
    B2-->|"x₇ ≤ 5"|C2;
    C1-->|"x₈ ≤ 8"|D;
    C2-->|"x₉ ≤ 9"|D;

we may then write the problem as $$\begin{aligned} \max_{x} \quad & x_8 + x_9 \\ \text{s.t.} \quad & x_2 - x_4 - x_5 = 0, \\ & x_3 - x_6 - x_7 = 0, \\ & x_1 + x_4 + x_6 - x_8 = 0, \\ & x_5 + x_7 - x_9 = 0, \\ & 0 \le x_i \le x_{i,\max}, \quad i = 1, \dots, 9, \end{aligned}$$

where the equality constraints enforce the conservation of flow at the intermediate nodes (B1, B2, C1, C2) and $x_{i,\max}$ are the edge capacities from the figure.

Canonical form

Manipulating the objective into the canonical form can be done simply by negating it, $$c = -\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix}^\top.$$ To deal with the constraints, let us first formulate the equality constraints in matrix form as $$A_{\mathrm{eq}} \, x = 0,$$ where $$A_{\mathrm{eq}} = \begin{bmatrix} 0 & 1 & 0 & -1 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 & -1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & -1 \end{bmatrix}.$$ Equivalently we may write it as the pair of inequalities $A_{\mathrm{eq}} x \le 0$ and $-A_{\mathrm{eq}} x \le 0$. Together with the upper bounds $x \le x_{\max}$ they can now be rewritten into the canonical form $$\begin{bmatrix} A_{\mathrm{eq}} \\ -A_{\mathrm{eq}} \\ I \end{bmatrix} x \le \begin{bmatrix} 0 \\ 0 \\ x_{\max} \end{bmatrix}.$$

Code
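
A minimal CVXPY sketch of the max-flow LP, keeping the edge numbering from the figure:

```python
import cvxpy as cp
import numpy as np

c = np.array([5, 10, 8, 6, 2, 4, 5, 8, 9])   # edge capacities x_1 .. x_9
x = cp.Variable(9, nonneg=True)

A_eq = np.array([
    [0, 1, 0, -1, -1, 0, 0, 0, 0],   # node B1: inflow = outflow
    [0, 0, 1, 0, 0, -1, -1, 0, 0],   # node B2
    [1, 0, 0, 1, 0, 1, 0, -1, 0],    # node C1
    [0, 0, 0, 0, 1, 0, 1, 0, -1],    # node C2
])
problem = cp.Problem(cp.Maximize(x[7] + x[8]),   # flow into the sink
                     [A_eq @ x == 0, x <= c])
problem.solve()
print(problem.value, x.value)                    # expect a max flow of 15
```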

Quadratic Programming

Let us consider a constrained optimization problem in the form $$\begin{aligned} \min_{x} \quad & \tfrac{1}{2} x^\top Q x + c^\top x \\ \text{s.t.} \quad & A x \le b, \end{aligned}$$ where $Q \succeq 0$.

As the objective function is quadratic (convex) and the constraints are linear, the problem is convex. For this reason the KKT conditions are not only necessary but also sufficient, provided a point feasible w.r.t. the constraints exists.

KKT Conditions

The KKT conditions of this problem can be stated as: $$\begin{aligned} & Q x + c + A^\top \lambda = 0 && \text{(stationarity)} \\ & A x \le b && \text{(primal feasibility)} \\ & \lambda \ge 0 && \text{(dual feasibility)} \\ & \lambda_i (A x - b)_i = 0 \quad \forall i && \text{(complementary slackness)} \end{aligned}$$

Dual problem

The dual problem of the QP problem can be written as $$\max_{\lambda \ge 0} \min_{x} L(x, \lambda),$$ where the Lagrangian $$L(x, \lambda) = \tfrac{1}{2} x^\top Q x + c^\top x + \lambda^\top (A x - b)$$ can be manipulated into the form $$L(x, \lambda) = \tfrac{1}{2} x^\top Q x + \left( c + A^\top \lambda \right)^\top x - b^\top \lambda.$$ Its critical point can be found by solving $$\nabla_x L = Q x + c + A^\top \lambda = 0,$$ which yields $$x = -Q^{-1} \left( c + A^\top \lambda \right).$$ When substituted back into the original dual problem, after basic manipulations, we attain $$\begin{aligned} \max_{\lambda} \quad & -\tfrac{1}{2} \left( c + A^\top \lambda \right)^\top Q^{-1} \left( c + A^\top \lambda \right) - b^\top \lambda \\ \text{s.t.} \quad & \lambda \ge 0. \end{aligned}$$

QP Problem Examples

Circular paraboloid with a single constraint

Let's visualize and solve a very simple QP problem of minimizing, over $x \in \mathbb{R}^2$, an objective function in the form of a circular paraboloid subject to a single linear inequality constraint.

Mathematical model

Canonical form

Code
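
The notes do not pin down the numbers here, so the sketch below assumes the paraboloid $f(x) = (x_1 - 1)^2 + (x_2 - 2)^2$ and the constraint $x_1 + x_2 \le 1$ purely for illustration:

```python
import cvxpy as cp

x = cp.Variable(2)
objective = cp.Minimize(cp.square(x[0] - 1) + cp.square(x[1] - 2))
problem = cp.Problem(objective, [x[0] + x[1] <= 1])
problem.solve()
print(x.value)   # projection of (1, 2) onto the half-plane: (0, 1)
```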

Economic dispatch

We have three power plants, each producing electricity to satisfy a total demand of $P_{\mathrm{dem}}$ MW. Each of the power plants has a minimum and maximum capacity, $P_{i,\min}$ and $P_{i,\max}$, and a quadratic cost model $$C_i(P_i) = a_i P_i^2 + b_i P_i + c_i$$ associated with its power output $P_i$. The capacity limits and cost coefficients are provided in the table below.

| Plant $i$ | $P_{i,\min}$ (MW) | $P_{i,\max}$ (MW) | $a_i$ | $b_i$ | $c_i$ |
|---|---|---|---|---|---|
| 1 | 20 | 50 | 0.02 | 200 | 1000 |
| 2 | 25 | 40 | 0.015 | 300 | 1500 |
| 3 | 30 | 30 | 0.01 | 100 | 800 |

Mathematical model

$$\begin{aligned} \min_{P_1, P_2, P_3} \quad & \sum_{i=1}^{3} a_i P_i^2 + b_i P_i + c_i \\ \text{s.t.} \quad & \sum_{i=1}^{3} P_i = P_{\mathrm{dem}}, \\ & P_{i,\min} \le P_i \le P_{i,\max}. \end{aligned}$$

OSQP form

OSQP solves problems of the form $\min_x \tfrac{1}{2} x^\top P x + q^\top x$ s.t. $l \le A x \le u$; here $$P = \operatorname{diag}(2 a_1, 2 a_2, 2 a_3), \quad q = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad A = \begin{bmatrix} \mathbf{1}^\top \\ I \end{bmatrix}, \quad l = \begin{bmatrix} P_{\mathrm{dem}} \\ P_{\min} \end{bmatrix}, \quad u = \begin{bmatrix} P_{\mathrm{dem}} \\ P_{\max} \end{bmatrix},$$ with the constant terms $c_i$ dropped as they do not affect the minimizer.

Code
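
A sketch using the OSQP Python interface with the table values as reconstructed above; the demand value is our assumption, as the notes leave it out:

```python
import numpy as np
import osqp
from scipy import sparse

a = np.array([0.02, 0.015, 0.01])        # quadratic cost coefficients
b = np.array([200.0, 300.0, 100.0])      # linear cost coefficients
P_min = np.array([20.0, 25.0, 30.0])
P_max = np.array([50.0, 40.0, 30.0])
P_dem = 100.0                            # assumed demand (MW)

P = sparse.diags(2 * a).tocsc()          # 1/2 x^T diag(2a) x = sum a_i P_i^2
q = b
A = sparse.vstack([sparse.csc_matrix(np.ones((1, 3))), sparse.eye(3)],
                  format="csc")
l = np.hstack([P_dem, P_min])            # demand row: l = u => equality
u = np.hstack([P_dem, P_max])

solver = osqp.OSQP()
solver.setup(P=P, q=q, A=A, l=l, u=u, verbose=False)
result = solver.solve()
print(result.x)                          # expect roughly [45, 25, 30]
```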

Model Predictive Control (MPC)

Optimal control of linear systems on a finite horizon can, more often than not, be stated as a QP. The formulation of these problems is the topic of the following chapters.

NLP Problem Examples

Proportions of a canister

We are tasked with designing the proportions of a 0.5 liter cylindrical canister to minimize its surface area and therefore the amount of material used.

The cylinder's volume and surface area are $$V = \pi r^2 h, \qquad S = 2 \pi r^2 + 2 \pi r h,$$ where $r$ is the radius and $h$ the height.

Mathematical model

$$\begin{aligned} \min_{r, h} \quad & 2 \pi r^2 + 2 \pi r h \\ \text{s.t.} \quad & \pi r^2 h = 0.0005 \ \mathrm{m}^3. \end{aligned}$$

As we are dealing with a quadratic objective function and a cubic equality constraint, the Python package CVXPY does not have an interface to a suitable solver. Fortunately, the Julia package JuMP does.
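
As a Python alternative to the JuMP route used in the notes, a general NLP method such as SciPy's SLSQP handles the nonlinear equality constraint directly; a minimal sketch:

```python
import numpy as np
from scipy.optimize import minimize

V_target = 0.0005  # 0.5 l in m^3

def surface(z):
    r, h = z
    return 2 * np.pi * r**2 + 2 * np.pi * r * h

volume = {"type": "eq", "fun": lambda z: np.pi * z[0]**2 * z[1] - V_target}
result = minimize(surface, x0=[0.05, 0.1], method="SLSQP",
                  constraints=[volume], bounds=[(1e-4, 1), (1e-4, 1)])
print(result.x)   # optimum has h = 2r: roughly r = 4.3 cm, h = 8.6 cm
```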

Tree nursery

We are tasked with planting a new patch of forest to offset logging activity. Because of the wildlife living there, the young trees need to be fenced off from the rest of the forest until they mature.

As we will be planting adjacent plots in the future, we have decided upon fencing off a rectangular area. The factor limiting its size is that we only have 100 meters of suitable fencing. What is the largest rectangular area that we can plant?

Mathematical model

$$\begin{aligned} \max_{x_1, x_2} \quad & x_1 x_2 \\ \text{s.t.} \quad & 2 (x_1 + x_2) = 100, \end{aligned}$$ where $x_1$ and $x_2$ are the side lengths of the rectangle in meters.

QP-like form

$$\begin{aligned} \min_{x} \quad & \tfrac{1}{2} x^\top Q x \\ \text{s.t.} \quad & \begin{bmatrix} 2 & 2 \end{bmatrix} x = 100, \end{aligned} \qquad Q = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}.$$

The reason why this problem falls under nonlinear programming is that $Q$ is not positive semi-definite (its eigenvalues are $\pm 1$) and therefore the problem is non-convex.

Code
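
Since the indefinite $Q$ rules out convex QP solvers, the sketch below again falls back on SciPy's SLSQP (our choice, not necessarily the notes'):

```python
from scipy.optimize import minimize

area = lambda z: -z[0] * z[1]            # maximize x1 * x2
perimeter = {"type": "eq", "fun": lambda z: 2 * (z[0] + z[1]) - 100}
result = minimize(area, x0=[10.0, 10.0], method="SLSQP",
                  constraints=[perimeter], bounds=[(0, None), (0, None)])
print(result.x)   # expect the 25 m x 25 m square, i.e. 625 m^2
```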

Linear-Quadratic MPC

Model predictive control (MPC) is a control strategy which relies on the system's model to predict and optimize its trajectory online, applying only the inputs from the beginning of the predicted horizon before re-optimizing. There are many algorithms for trajectory optimization, each adapted to a specific use-case. When talking about nonlinear systems in general, the main factors determining the applicability of a specific algorithm are whether the system is represented in continuous or discrete time and whether its dynamics are continuous or hybrid.

In this course we will limit ourselves to the specific case of linear systems1 represented in discrete time, with a quadratic objective (the LQ control problem) and box constraints on the system's inputs2. This specific control problem can be transcribed into a QP problem, for which specialized solvers exist. We will go through two formulations of this problem:

  • direct - both states and inputs are the decision variables,
  • indirect - only inputs are decision variables.

1

With some slight modifications it can also be applied to the control of nonlinear systems near a fixed point.

2

There are no obstacles preventing us from also adding linear inequality constraints to states.

Direct Linear-Quadratic MPC

The standard realization of an MPC controller solves, at every sampling instant, the optimization problem $$\begin{aligned} \min_{x_1, \dots, x_N, \, u_0, \dots, u_{N-1}} \quad & x_N^\top Q_N x_N + \sum_{k=0}^{N-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right) \\ \text{s.t.} \quad & x_{k+1} = A x_k + B u_k, \quad k = 0, \dots, N-1, \\ & u_{\min} \le u_k \le u_{\max}, \quad k = 0, \dots, N-1, \end{aligned}$$ where the initial state $x_0$ is given1.

Canonical QP form

Optimized variables

$$z = \begin{bmatrix} x_1^\top & \cdots & x_N^\top & u_0^\top & \cdots & u_{N-1}^\top \end{bmatrix}^\top$$

Objective function's terms

$$H = \operatorname{blkdiag}(2Q, \dots, 2Q, \, 2Q_N, \, 2R, \dots, 2R),$$ so that the objective equals $\tfrac{1}{2} z^\top H z$.

Constraints' terms

The dynamics enter as equality constraints: the first block row is $x_1 - B u_0 = A x_0$ and the subsequent ones are $-A x_k + x_{k+1} - B u_k = 0$, i.e. $$A_{\mathrm{eq}} z = \begin{bmatrix} A x_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$ while the input box constraints $u_{\min} \le u_k \le u_{\max}$ become simple bounds on the input part of $z$. A sketch of this formulation in code follows.
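
The transcription can be prototyped with CVXPY, which assembles the canonical QP matrices internally; a minimal sketch (the function name and signature are ours):

```python
import cvxpy as cp

def direct_mpc(A, B, Q, R, Q_N, x0, N, u_min, u_max):
    """Direct LQ-MPC: states and inputs are both decision variables."""
    n, m = B.shape
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cost = cp.quad_form(x[:, N], Q_N)                  # terminal cost
    constraints = [x[:, 0] == x0]                      # given initial state
    for k in range(N):
        cost += cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
        constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                        u_min <= u[:, k], u[:, k] <= u_max]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u.value[:, 0]                               # apply only u_0
```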


1

If the dynamics represent the linearized dynamics of a nonlinear system (i.e. and are in-fact and ) the value of must be considered in the choice of limits and .

Indirect Linear-Quadratic MPC

In the indirect (condensed) formulation the states are eliminated from the decision variables by expressing the predicted trajectory directly as a function of the initial state and the stacked inputs $$\bar{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} = \Phi x_0 + \Gamma \bar{u}, \qquad \bar{u} = \begin{bmatrix} u_0 \\ \vdots \\ u_{N-1} \end{bmatrix},$$ where $$\Phi = \begin{bmatrix} A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}, \qquad \Gamma = \begin{bmatrix} B & & \\ A B & B & \\ \vdots & & \ddots \\ A^{N-1} B & \cdots & B \end{bmatrix}.$$ Substituting into the objective yields the QP $$\min_{\bar{u}} \ \tfrac{1}{2} \bar{u}^\top H \bar{u} + x_0^\top F^\top \bar{u}, \qquad H = 2 \left( \Gamma^\top \bar{Q} \Gamma + \bar{R} \right), \quad F = 2 \, \Gamma^\top \bar{Q} \Phi,$$ where $\bar{Q} = \operatorname{blkdiag}(Q, \dots, Q, Q_N)$ and $\bar{R} = \operatorname{blkdiag}(R, \dots, R)$, subject only to the box constraints $u_{\min} \le u_k \le u_{\max}$.
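
A minimal numpy sketch of building $\Phi$ and $\Gamma$ (the function name is ours):

```python
import numpy as np

def prediction_matrices(A, B, N):
    """Stacked prediction x_bar = Phi x0 + Gamma u_bar for x_1 .. x_N."""
    n, m = B.shape
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Gamma = np.zeros((N * n, N * m))
    for row in range(N):                 # block row for x_{row+1}
        for col in range(row + 1):       # influenced by u_0 .. u_row
            Gamma[row*n:(row+1)*n, col*m:(col+1)*m] = \
                np.linalg.matrix_power(A, row - col) @ B
    return Phi, Gamma
```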

Linear-Quadratic MPC Examples

2-DOF double integrator

Let us control a 2-DOF double integrator with an MPC controller with a prediction horizon of one second and a quadratic running cost penalizing the state error and input level.

We will use the first-order (explicit Euler) approximation with a timestep of $\Delta t$ seconds to discretize its continuous dynamics $$\ddot{q} = u, \qquad q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix},$$ i.e., with the state $x = \begin{bmatrix} q^\top & \dot{q}^\top \end{bmatrix}^\top$, $$A = \begin{bmatrix} I & \Delta t \, I \\ 0 & I \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ \Delta t \, I \end{bmatrix}.$$

We will formulate the running cost as $$\ell(x_k, u_k) = x_k^\top Q x_k + u_k^\top R u_k$$ and place additional box constraints $u_{\min} \le u_k \le u_{\max}$ on the controller's inputs.

Code
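
A closed-loop sketch with CVXPY; the timestep $\Delta t = 0.1$ s (giving $N = 10$ over the one-second horizon), the weights, and the input limit are our assumptions, as the notes leave them unspecified:

```python
import cvxpy as cp
import numpy as np

dt, N = 0.1, 10
I2, Z2 = np.eye(2), np.zeros((2, 2))
A = np.block([[I2, dt * I2], [Z2, I2]])      # explicit-Euler double integrator
B = np.vstack([Z2, dt * I2])
Q, R = np.diag([10, 10, 1, 1]), 0.1 * np.eye(2)
u_max = 1.0

x = cp.Variable((4, N + 1))
u = cp.Variable((2, N))
x0 = cp.Parameter(4)
cost = sum(cp.quad_form(x[:, k], Q) + cp.quad_form(u[:, k], R)
           for k in range(N))
constraints = [x[:, 0] == x0, cp.abs(u) <= u_max]
constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k] for k in range(N)]
problem = cp.Problem(cp.Minimize(cost), constraints)

state = np.array([1.0, -1.0, 0.0, 0.0])      # start 1 m off in q1, -1 m in q2
for _ in range(30):                          # receding-horizon loop
    x0.value = state
    problem.solve()
    state = A @ state + B @ u[:, 0].value    # apply only the first input
print(state)                                 # should approach the origin
```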

State-Space Representation of Mechanical Systems

Equations of Motion

Let us consider manipulator equations in the form $$M(q) \, \ddot{q} + c(q, \dot{q}) + g(q) = B u, \tag{1}$$ whose left-hand side may, for example, be derived from the system's kinetic and potential energy expressed in generalized coordinates $q$ by expanding Lagrange equations of the second kind $$\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial \mathcal{L}}{\partial \dot{q}} - \frac{\partial \mathcal{L}}{\partial q} = Q^{\mathrm{ext}},$$ where $\mathcal{L} = T - V$. First by separating the Lagrangian into its components $$\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial T}{\partial \dot{q}} - \frac{\partial T}{\partial q} + \frac{\partial V}{\partial q} = Q^{\mathrm{ext}}$$ and then applying the chain rule $$\frac{\partial^2 T}{\partial \dot{q}^2} \, \ddot{q} + \frac{\partial^2 T}{\partial q \, \partial \dot{q}} \, \dot{q} - \frac{\partial T}{\partial q} + \frac{\partial V}{\partial q} = Q^{\mathrm{ext}}.$$ If the external forces are linearly dependent on inputs $u$, such that $Q^{\mathrm{ext}} = B u$, the terms of (1) as derived by this approach are $$M(q) = \frac{\partial^2 T}{\partial \dot{q}^2}, \qquad c(q, \dot{q}) = \frac{\partial^2 T}{\partial q \, \partial \dot{q}} \, \dot{q} - \frac{\partial T}{\partial q}, \qquad g(q) = \frac{\partial V}{\partial q}.$$

Continuous-time dynamics

The most commonly used state-space representation of a mechanical system in continuous time is attained by concatenating $q$ and $\dot{q}$ to form the system's state $$x = \begin{bmatrix} q \\ \dot{q} \end{bmatrix}, \qquad \dot{x} = f(x, u) = \begin{bmatrix} \dot{q} \\ M(q)^{-1} \left( B u - c(q, \dot{q}) - g(q) \right) \end{bmatrix}.$$

Linearization

The system's dynamics can be linearized by performing a Taylor expansion around a point $(x_0, u_0)$, which yields $$\dot{x} \approx f(x_0, u_0) + \frac{\partial f}{\partial x} \bigg|_{x_0, u_0} \delta x + \frac{\partial f}{\partial u} \bigg|_{x_0, u_0} \delta u,$$ where also $\delta x = x - x_0$ and $\delta u = u - u_0$.

If $(x_0, u_0)$ is a fixed point, i.e. $f(x_0, u_0) = 0$, the first-order partial derivatives of the system's dynamics simplify to $$\frac{\partial f}{\partial x} = \begin{bmatrix} 0 & I \\ -M^{-1} \frac{\partial g}{\partial q} & 0 \end{bmatrix}, \qquad \frac{\partial f}{\partial u} = \begin{bmatrix} 0 \\ M^{-1} B \end{bmatrix},$$ as terms involving $\frac{\partial M^{-1}}{\partial q}$ drop out because $B u_0 - c - g = M \ddot{q} = 0$ for all fixed points. Partial derivatives of $c(q, \dot{q})$ also disappear, as all of its terms contain second-degree products of velocities and all velocities must be equal to zero at a fixed point.

Discretization via integration using the explicit Euler scheme

The most basic approach with which we may discretize the continuous-time dynamics of a nonlinear system is by integrating the system's state with a timestep of $\Delta t$ using the explicit Euler scheme $$x_{k+1} = x_k + \Delta t \, f(x_k, u_k).$$

Linearization

Compared to more advanced integration methods, linearization of these discrete-time dynamics around a point $(x_0, u_0)$ is trivial: $$A_d = \frac{\partial x_{k+1}}{\partial x_k} = I + \Delta t \, \frac{\partial f}{\partial x}, \qquad B_d = \frac{\partial x_{k+1}}{\partial u_k} = \Delta t \, \frac{\partial f}{\partial u}.$$
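
A minimal numpy sketch of both formulas; the function names are ours and the Jacobians $\partial f / \partial x$, $\partial f / \partial u$ are supplied by the caller:

```python
import numpy as np

def euler_step(f, x, u, dt):
    """One explicit-Euler step: x_{k+1} = x_k + dt * f(x_k, u_k)."""
    return x + dt * f(x, u)

def euler_linearization(fx, fu, x0, u0, dt):
    """Discrete-time Jacobians of the Euler step around (x0, u0)."""
    A_d = np.eye(x0.shape[0]) + dt * fx(x0, u0)   # d x_{k+1} / d x_k
    B_d = dt * fu(x0, u0)                         # d x_{k+1} / d u_k
    return A_d, B_d
```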

Cart-Pole Equations of Motion

                  \
                  /\
                 /  \
                /    \
               l      \
              /       >< m_p
             /       //
            /       //
          \/       //
           \      //
            \    //
  |   |------\--//------|
  |   |       \//\      |
  g   |  m_c  ( ) theta |
  |   |        |_/      |
  v   |========|========|
_________ooo___|___ooo_________
               |
//|-----s----->|

For generalized coordinates $q = \begin{bmatrix} s & \theta \end{bmatrix}^\top$, with $\theta$ measured from the upright configuration as in the figure, the system's kinetic and potential energy are $$T = \tfrac{1}{2} (m_c + m_p) \dot{s}^2 + m_p l \dot{s} \dot{\theta} \cos\theta + \tfrac{1}{2} m_p l^2 \dot{\theta}^2, \qquad V = m_p g l \cos\theta.$$ The individual terms of the manipulator equations1 are then $$M(q) = \begin{bmatrix} m_c + m_p & m_p l \cos\theta \\ m_p l \cos\theta & m_p l^2 \end{bmatrix}, \qquad c(q, \dot{q}) = \begin{bmatrix} -m_p l \dot{\theta}^2 \sin\theta \\ 0 \end{bmatrix}, \qquad g(q) = \begin{bmatrix} 0 \\ -m_p g l \sin\theta \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$ assuming a single input $u$ (a force) is acting in the direction of $s$.

Linearization in the upright configuration

From the equations we may see that for the state and input vectors $$x_0 = \begin{bmatrix} s & 0 & 0 & 0 \end{bmatrix}^\top, \qquad u_0 = 0,$$ all points (for an arbitrary cart position $s$) are stationary. In this configuration the relevant terms for the system's linearization are $$M(q_0) = \begin{bmatrix} m_c + m_p & m_p l \\ m_p l & m_p l^2 \end{bmatrix}, \qquad \frac{\partial g}{\partial q} \bigg|_{q_0} = \begin{bmatrix} 0 & 0 \\ 0 & -m_p g l \end{bmatrix}, \qquad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$


1

The python script for the derivation of the following terms can be found here. A very similar form of these equations of motion can also be found in section 3.2.1 of [Underactuated2023].

Bi-Rotor Equations of Motion

As a bi-rotor (a quadcopter simplified to 2D) is essentially a rigid body, its equations of motion can be easily derived as $$\begin{aligned} m \ddot{x} &= -(u_1 + u_2) \sin\theta, \\ m \ddot{y} &= (u_1 + u_2) \cos\theta - m g, \\ I \ddot{\theta} &= r (u_1 - u_2), \end{aligned}$$ where $m$ is the bi-rotor's mass, $I$ its moment of inertia, $r$ the moment arm of the rotors' thrusts $u_1$, $u_2$, and $\theta$ its pitch angle.

Linearization in a hovering configuration

From the EoM we may see that for the state description $\begin{bmatrix} x & y & \theta & \dot{x} & \dot{y} & \dot{\theta} \end{bmatrix}^\top$ all points $$x_0 = \begin{bmatrix} x & y & 0 & 0 & 0 & 0 \end{bmatrix}^\top, \qquad u_0 = \begin{bmatrix} \tfrac{1}{2} m g & \tfrac{1}{2} m g \end{bmatrix}^\top$$ are stationary points, for which the system's linearization is trivial: $$\delta\ddot{x} = -g \, \delta\theta, \qquad \delta\ddot{y} = \frac{\delta u_1 + \delta u_2}{m}, \qquad \delta\ddot{\theta} = \frac{r (\delta u_1 - \delta u_2)}{I}.$$

Derivation in manipulator equation form

For those who want to validate the equations by deriving the manipulator equations, the kinetic and potential energy for generalized coordinates $q = \begin{bmatrix} x & y & \theta \end{bmatrix}^\top$ are $$T = \tfrac{1}{2} m (\dot{x}^2 + \dot{y}^2) + \tfrac{1}{2} I \dot{\theta}^2, \qquad V = m g y,$$ which should yield $$M = \operatorname{diag}(m, m, I), \qquad c(q, \dot{q}) = 0, \qquad g(q) = \begin{bmatrix} 0 \\ m g \\ 0 \end{bmatrix}.$$ The input matrix $$B(q) = \begin{bmatrix} -\sin\theta & -\sin\theta \\ \cos\theta & \cos\theta \\ r & -r \end{bmatrix}$$ can then be derived separately by evaluating the thrust vectors and differentiating their moment arms.

Controllability/Reachability of an LTI system

Controllability and reachability are defined for discrete-time systems as:

  • A system is controllable if from any initial state $x_0$ one can achieve the state $x_N = 0$ using a series of inputs $u_0, \dots, u_{N-1}$.
  • A state $x_r$ is reachable if it can be achieved by applying a series of inputs $u_0, \dots, u_{N-1}$ when starting from the initial state $x_0 = 0$.

For continuous-time systems they are similarly defined as:

  • A system is controllable if from any initial state $x(0) = x_0$ one can achieve the state $x(T) = 0$ using a control policy $u(t)$, $t \in [0, T]$.
  • A state $x_r$ is reachable if it can be achieved by applying a control policy $u(t)$, $t \in [0, T]$, when starting from the initial state $x(0) = 0$.

For linear systems all states are reachable if the system is controllable.

Discrete time

The first $n$ states of a linear time-invariant system $$x_{k+1} = A x_k + B u_k, \qquad x_k \in \mathbb{R}^n,$$ assuming an initial state $x_0$ and a series of inputs $u_0, u_1, \dots$, can be expressed as $$x_k = A^k x_0 + \sum_{i=0}^{k-1} A^{k-1-i} B u_i.$$ For square matrices, which satisfy their own characteristic equation, the Cayley-Hamilton theorem states that for $k \ge n$ we may express $A^k$ as a linear combination of the lower matrix powers of $A$: $$A^k = \sum_{i=0}^{n-1} \alpha_i A^i.$$ Therefore, the state $x_k$ for any $k \ge n$ can be rewritten as $$x_k = A^k x_0 + \sum_{i=0}^{n-1} A^i B \, \tilde{u}_i,$$ where $\tilde{u}_i$ are linear combinations of the applied inputs, which can be manipulated into the form $$x_k = A^k x_0 + \mathcal{C} \begin{bmatrix} \tilde{u}_0 \\ \vdots \\ \tilde{u}_{n-1} \end{bmatrix}, \qquad \mathcal{C} = \begin{bmatrix} B & A B & \cdots & A^{n-1} B \end{bmatrix},$$ where $\mathcal{C}$ is the controllability matrix.

The same substitution can be applied for all subsequent timesteps up to infinity, changing only the particular form of the vector of the inputs' linear combinations $\tilde{u}$. This has two consequences:

  • All states reachable in more than $n$ steps are also reachable in $n$ steps (with unlimited inputs).
  • If $\mathcal{C}$ is rank deficient, some directions in the state-space cannot be affected by the inputs and therefore the system is uncontrollable.

Continuous time

The state of a linear time-invariant system $$\dot{x} = A x + B u, \qquad x \in \mathbb{R}^n,$$ at time $T$, starting from the initial state $x_0$ and influenced by a continuous input $u(t)$, can be expressed as $$x(T) = e^{A T} x_0 + \int_0^T e^{A (T - \tau)} B u(\tau) \, \mathrm{d}\tau. \tag{1}$$ For square matrices, which satisfy their own characteristic equation, the Cayley-Hamilton theorem has the consequence that the infinite series $$e^{A t} = \sum_{i=0}^{\infty} \frac{t^i}{i!} A^i$$ can be expressed using a finite number of terms $$e^{A t} = \sum_{i=0}^{n-1} \alpha_i(t) A^i.$$ After substituting it back into the second term of (1) we attain $$\int_0^T \sum_{i=0}^{n-1} \alpha_i(T - \tau) A^i B u(\tau) \, \mathrm{d}\tau,$$ which can be further manipulated into the form $$\mathcal{C} \begin{bmatrix} \int_0^T \alpha_0(T - \tau) \, u(\tau) \, \mathrm{d}\tau \\ \vdots \\ \int_0^T \alpha_{n-1}(T - \tau) \, u(\tau) \, \mathrm{d}\tau \end{bmatrix}, \qquad \mathcal{C} = \begin{bmatrix} B & A B & \cdots & A^{n-1} B \end{bmatrix},$$ where $\mathcal{C}$ is the controllability matrix. If $\mathcal{C}$ is rank deficient, some directions in the state-space cannot be affected by the inputs and therefore the system is uncontrollable.
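
The rank test is a one-liner in numpy; a minimal sketch (the function name is ours):

```python
import numpy as np

def controllability_matrix(A, B):
    """C = [B, AB, ..., A^{n-1} B]; full rank <=> the system is controllable."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Example: double integrator, controllable through the force input.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = controllability_matrix(A, B)
print(np.linalg.matrix_rank(C))   # 2 = full rank -> controllable
```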

Optimization Modeling Languages

CVXPY

"Open source Python-embedded modeling language for convex optimization problems"

Pyomo

"Python-based, open-source optimization modeling language with a diverse set of optimization capabilities"

JuMP

"JuMP is a modeling language and collection of supporting packages for mathematical optimization in Julia"

Solvers

OSQP

"Numerical optimization package for solving convex quadratic programs"

IPOPT

"Open source software package for large-scale nonlinear optimization"

Bibliography

[Boyd2004] - Boyd, Stephen and Vandenberghe, Lieven - Convex Optimization. - 2004.


[Bierlaire2018] - Bierlaire, Michel - Optimization: Principles and Algorithms. - 2018.


[Underactuated2023] - Tedrake, Russ - Underactuated Robotics. - 2023.


[Brunton2022-HJB] - Brunton, Steven L. - Nonlinear Control: Hamilton Jacobi Bellman (HJB) and Dynamic Programming. - 2022.

Summary/Abstract

This video discusses optimal nonlinear control using the Hamilton Jacobi Bellman (HJB) equation, and how to solve this using dynamic programming.

[Bellman1966] - Bellman, Richard - Dynamic Programming. - 1966.

Summary/Abstract

Little has been done in the study of these intriguing questions, and I do not wish to give the impression that any extensive set of ideas exists that could be called a theory. What is quite surprising, as far as the histories of science and philosophy are concerned, is that the major impetus for the fantastic growth of interest in brain processes, both psychological and physiological , has come from a device, a machine, the digital computer. In dealing with a human being and a human society, we enjoy the luxury of being irrational, illogical, inconsistent, and incomplete, and yet of coping. In operating a computer, we must meet the rigorous requirements for detailed instructions and absolute precision. If we understood the ability of the human mind to make effective decisions when confronted by complexity, uncertainty, and irrationality, then we could use computers a million times more effectively than we do. Recognition of this fact has been a motivation for the spurt of research in the field of neurophysiology. The more we study the information-processing aspects of the mind, the more perplexed and impressed we become. It will be a very long time before we understand these processes sufficiently to reproduce them. In any case, the mathematician sees hundreds and thousands of formidable new problems in dozens of blossoming areas, puzzles galore, and challenges to his heart's content. He may never resolve some of these, but he will never be bored. What more can he ask?
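
[Mayne1966] - Mayne, David Q. - A Second-Order Gradient Method for Determining Optimal Trajectories of Non-Linear Discrete-Time Systems. - 1966.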