# MPC with model estimated from data

Sometimes, the control engineer finds themselves in a situation in which no model of the process they have to control is available. This is often the case in industry, since plants may be complicated, and creating models from first principles is an expensive endeavour. If the task is *regulation*, i.e., keeping some output at a desired set point, a linear model is often sufficient. Indeed, the controller is doing all that it can to keep the process operating around the operating point, so a linear assumption may be valid despite the overall process being nonlinear.

In situations like these, we may be able to estimate a linear model from input-output data coming from the system during operation, i.e., through *system identification*. In this example, we will consider an industrial heating process, similar in nature to a hair dryer, controlled using linear MPC.

The data we will use for this identification comes from STADIUS's Identification Database, which gives the following description:

> Laboratory setup acting like a hair dryer. Air is fanned through a tube and heated at the inlet. The air temperature is measured by a thermocouple at the output. The input is the voltage over the heating device (a mesh of resistor wires).

More details on the system-identification part of this example are available in the extended tutorial System identification for an industrial dryer. The tutorial you are reading now will only give sparse details on the identification, and then move on to the control design.

This tutorial is also available in video form.

## System identification

We start by downloading the data, reading it using `readdlm`, and packaging it into an `iddata` object:

```
using DelimitedFiles, Plots, Statistics, LinearAlgebra
using SparseArrays, Distributions # for spdiagm and MvNormal used below
using JuliaSimControl, ControlSystemIdentification
using JuliaSimControl.MPC
url = "https://ftp.esat.kuleuven.be/pub/SISTA/data/mechanical/dryer.dat.gz"
zipfilename = "/tmp/dryer.dat.gz"
path = Base.download(url, zipfilename)
run(`gunzip -f $path`)
data = readdlm(path[1:end-3])
Ts = 0.08
u = data[:, 1]' # voltage
y = data[:, 2]' # air temp
d = iddata(y, u, Ts)
plot(plot(d), coherenceplot(d))
```

The figures above show the data available for identification, as well as the estimated coherence function.^{[coherence]}

To get a feeling for the dynamics of the process, we estimate and plot its impulse response. Before doing this, we save the mean input and then *detrend* the data, i.e., remove the operating point around which the process is operating.

```
u0 = mean(u, dims = 2)[:]
d = detrend(d)
impulseestplot(d, 40, σ = 3)
```
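For reference, what `detrend` does with its default arguments is simply to remove the mean of each signal channel; a minimal sketch of the operation (the data below is made up for illustration, and `detrend` can also remove higher-order trends):

```julia
using Statistics

# Made-up single-channel signal stored as a 1×N row matrix, like y above
yraw = [1.0 2.0 3.0 4.0]

# Zeroth-order detrending: subtract the mean of each channel
ydetrended = yraw .- mean(yraw, dims = 2)
```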

The system appears to have an input delay of a couple of samples. This must be handled in one way or another; we choose to handle it explicitly by specifying the input delay when we estimate the model below.

In this case, we fit the model using all available training data. See the system-identification tutorial mentioned above for more details on how to split up data for training and validation, a good practice for model selection.

We estimate a model using the `arx` routine, and specify that we want two free parameters in both the numerator and the denominator:

```
model_arx = arx(d, 2, 2, inputdelay = 3)
```

```
TransferFunction{Discrete{Float64}, ControlSystemsBase.SisoRational{Float64}}
0.06551795934506961z + 0.043826000210192124
------------------------------------------------------
1.0z^4 - 1.2887298646631014z^3 + 0.4066586853805532z^2
Sample Time: 0.08 (seconds)
Discrete-time transfer function model
```
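Under the hood, ARX estimation amounts to a linear least-squares problem: the output at each sample is regressed on past outputs and (delayed) past inputs. A minimal sketch of the idea on noise-free synthetic data (this illustrates the principle only, not the exact implementation used by `arx`):

```julia
using LinearAlgebra

# Simulate a known ARX system: y[k] = a1*y[k-1] + a2*y[k-2] + b1*u[k-1]
a1, a2, b1 = 1.2, -0.4, 0.5
usim = randn(200)
ysim = zeros(200)
for k in 3:200
    ysim[k] = a1 * ysim[k-1] + a2 * ysim[k-2] + b1 * usim[k-1]
end

# Stack the regressors and solve the least-squares problem Φθ ≈ y
Φ = [ysim[2:199] ysim[1:198] usim[2:199]]
θ = Φ \ ysim[3:200] # ≈ [a1, a2, b1] on noise-free data
```

On real, noisy data the estimate is no longer exact, and questions of model order, input delay, and validation (as discussed in the linked tutorial) become important.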

This gave us a discrete-time transfer function; we will later convert it to a state-space model using the `ss` function before we use it for MPC design.

We may plot the prediction and simulation performance of the estimated model:

```
predplot(model_arx, d, sysname = "ARX")
simplot!(model_arx, d, ploty = false, sysname = "ARX")
```

This looks reasonably good, so we move on!

## Control design

Next up, we define operating points for the linear MPC controller. The input operating point `u0` was extracted from the data above; to compute the corresponding operating point for the states and outputs, we perform a steady-state simulation using `step` (a step-response simulation) and extract the operating point from the final values.^{[1]} We use this information to create an `OperatingPoint`, as well as reference arrays for the MPC controller (these will be the same as the operating point in this case). These are the set points at which we want the controller to maintain the output.

```
sys = named_ss(ss(model_arx), u = :voltage, y = :temp) # give names to signals for nicer plot labels
stepres = step(sys, 100)
x0 = stepres.x[:, end] * u0[]
y0 = stepres.y[:, end] * u0[]
op = OperatingPoint(x0, u0, y0)
xr = copy(x0)
ur = copy(u0)[:]
yr = sys.C * xr
```
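As an alternative to the step-response simulation, the steady state of a discrete-time model $x^+ = Ax + Bu$ can be computed directly from the fixed-point condition $x_0 = Ax_0 + Bu_0$, i.e., $x_0 = (I - A)^{-1} B u_0$. A sketch with made-up matrices (not the identified model):

```julia
using LinearAlgebra

# Illustrative discrete-time system matrices
A = [0.5 0.1; 0.0 0.8]
B = [1.0, 0.5]
C = [1.0 0.0]
u0 = 2.0

# Fixed point of x⁺ = A*x + B*u0
x0 = (I - A) \ (B * u0)
y0 = C * x0
```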

The data we used for identification only included inputs in a narrow range of voltages. Without further information on how the process works, we take the safe route and constrain the inputs available to the MPC controller to be within this range. We do this by constructing an `MPCConstraints` object:

```
umin = minimum(u, dims = 2)[:]
umax = maximum(u, dims = 2)[:]
constraints = MPCConstraints(; umin, umax)
```

We now move on to define the solver parameters and the cost function for the MPC controller. In this case, we are interested in *penalizing the outputs* of the process rather than the states; since we estimated a black-box LTI model, we don't even know what the states represent! Our cost-function matrix $Q_1$ is thus of size $n_y$, i.e., the number of outputs of the process. We also specify covariance matrices $R_1, R_2$ for a Kalman filter that will be used to estimate the state of the process from the available measurements.

```
solver = OSQPSolver(
    verbose = false,
    eps_rel = 1e-6,
    max_iter = 5000,
    check_termination = 5,
    polish = true,
)
Q1 = Diagonal(ones(sys.ny)) # output cost matrix
Q2 = 0.01 * spdiagm(ones(sys.nu)) # control cost matrix
R1 = I(sys.nx)
R2 = I(sys.ny)
kf = KalmanFilter(ssdata(sys)..., R1, R2, MvNormal(x0, R1))
```

The last step before simulating the MPC controller is to define the prediction model and the MPC-problem structure. We create a `LinearMPCModel` containing the system model and the Kalman filter. This object also keeps track of the operating point. Since we want to control the *output* of the system rather than the state, we also pass the $C$ matrix of the system using `z = sys.C`. This tells the prediction model that we are interested in regulating $z = Cx$ instead of $x$ directly, and we may thus provide references for $z = y$ instead of for $x$. We do this by passing the argument `r = yr` when creating the problem.

We also define the prediction horizon for the MPC controller, $N$. We set this to `N = 10` based on the impulse response we saw above; after about 10 samples, the impulse response has mostly dissipated.
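One simple heuristic for this choice is to pick the horizon as the first sample at which the impulse response has decayed below some fraction of its peak. A sketch on a made-up impulse-response vector (the 5% threshold is arbitrary):

```julia
# Made-up exponentially decaying impulse response
h = [0.8^k for k in 0:50]

# Pick the horizon as the first sample where |h| is below 5% of its peak
peak = maximum(abs, h)
N_heuristic = findfirst(x -> abs(x) < 0.05 * peak, h)
```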

```
N = 10
predmodel = LinearMPCModel(sys, kf; constraints, op, x0, z = sys.C)
prob = LQMPCProblem(predmodel; Q1, Q2, N, solver, r = yr)
```

## Simulation

Our MPC controller is now set up, and we specify some parameters related to the simulation. Let's simulate `T = 80` time steps from a random initial condition:

```
T = 80 # Simulation length (time steps)
x0sim = 0 * x0 .+ randn.()
hist = MPC.solve(prob; x0 = x0sim, T)
plot(hist, ploty = true)
hline!([umin[] umax[]], sp = 2, l = (:black, :dash), primary = false)
```

Looks pretty good! But this was an easy task: we did not simulate any model error, and there were no disturbances acting on the system.

In the next scenario, we simulate a periodic disturbance acting on the process input. The disturbance is given by

\[d(t) = \sin(ωt)\]

```
ω = 2π
disturbance = function (u, i)
    sin(ω * i * Ts)
end
histd = MPC.solve(prob; x0 = x0sim, T, verbose = false, disturbance)
plot(histd, ploty = true)
hline!([umin[] umax[]], sp = 2, l = (:black, :dash), primary = false)
```

This time, we see that the output oscillates around the set-point. Not too bad, but not great either. Can we do better?

## Add disturbance model

If the disturbance acting on the system is measurable, we can include it as an input to the system and thus improve the regulation performance. However, even if we cannot measure the disturbance, there's still hope to perform better as long as we know some properties of the disturbance!

Say that we know the frequency of the disturbance, but it is otherwise unknown. In this scenario, which is common in practice, we can augment the system model with a model of the disturbance. When we apply a state estimator to the augmented system, the state estimator will be nice enough to estimate the disturbance for us, based on its effect on the observable measurements. To do this, we make use of the function `add_resonant_disturbance`. We indicate the frequency of the disturbance as well as the relative damping $ζ$, which we set to something small. Lastly, we indicate how the disturbance enters the system; since it enters at the input, it has the same input matrix as the system, `sys.B`.

```
ζ = 1e-3
sys2_ = add_resonant_disturbance(sys, ω, ζ, sys.B)
sys2 = named_ss(sys2_, u = :voltage, y = :temp)
```

```
NamedStateSpace{Discrete{Float64}, Float64}
A =
0.0 0.5 0.0 0.0 0.0 0.0
0.0 0.0 0.5 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 -0.4066586853805532 1.2887298646631014 0.5 0.0
0.0 0.0 0.0 0.0 0.8762365783135667 -0.4817151353493578
0.0 0.0 0.0 0.0 0.4817151353493578 0.8762365783135666
B =
0.0
0.0
0.0
0.5
0.0
0.0
C =
0.350608001681537 0.26207183738027845 0.0 0.0 0.0 0.0
D =
0.0
Sample Time: 0.08 (seconds)
Discrete-time state-space model
With state names: x1 x2 x3 x4 x5 x6
input names: voltage
output names: temp
```

This will create an augmented model with two additional states, modeling a resonance.
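The two extra states implement a lightly damped oscillator at frequency $ω$; in discrete time this is a damped rotation matrix, which closely matches the lower-right 2×2 block of the `A` matrix printed above (the exact values depend on the discretization used internally by `add_resonant_disturbance`, so treat this as a sketch of the idea):

```julia
using LinearAlgebra

ω, ζ, Ts = 2π, 1e-3, 0.08

# Continuous-time resonance with relative damping ζ: poles at ω(-ζ ± i)
Ac = [-ζ*ω -ω; ω -ζ*ω]

# Exact discretization via the matrix exponential: a damped rotation by ωTs
Ad = exp(Ac * Ts)
```

With $ζ = 10^{-3}$ the eigenvalues of `Ad` lie just inside the unit circle, so the estimator treats the disturbance as a nearly pure sinusoid at frequency $ω$.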

With the augmented system `sys2`, we must provide an augmented covariance matrix `R1` for the Kalman filter. The covariance we assign to the two additional states will determine how fast the observer is at estimating the disturbance. Higher values give faster estimation, but also more sensitivity to measurement noise.

We also create new initial conditions since the state of the system now has two additional components.

```
R12 = cat(R1, 0.2I(2), dims = (1, 2))
kf2 = KalmanFilter(ssdata(sys2)..., R12, R2)
x02 = [x0; zeros(2)]
x02sim = [x0sim; zeros(2)]
op2 = OperatingPoint(x02, u0, y0)
predmodel2 = LinearMPCModel(sys2, kf2; constraints, op = op2, x0 = x02, z = sys2.C)
prob2 = LQMPCProblem(predmodel2; Q1, Q2, N, solver, r = yr)
```

```
LQMPCProblem{LinearMPCModel{Discrete{Float64}, NamedStateSpace{Discrete{Float64}, StateSpace{Discrete{Float64}, Float64}}, …}, …}
(remainder of the printed problem structure, including the augmented A, B, C, D matrices shown above, omitted for brevity)
```

This time, we need to provide two different models to the `MPC.solve` function. The prediction model that the MPC controller uses is augmented with a disturbance model, but the "true system" that we are controlling does not have any such state components. We thus define the three arguments `dyn_actual`, `x0_actual`, and `Cz_actual`, which specify the dynamics, initial state, and output matrix of the true system. This mechanism allows us to simulate MPC controllers where the prediction model of the controller has model errors or a different number of states, like here.

```
dyn_actual = sys
x0_actual = x0sim
Cz_actual = sys.C
hist2 = MPC.solve(
    prob2;
    x0 = x02sim,
    T,
    verbose = false,
    disturbance,
    dyn_actual,
    x0_actual,
    Cz_actual,
)
plot(hist2, ploty = true, legend=:bottomright)
hline!([umin[] umax[]], sp = 2, l = (:black, :dash), primary = false)
```

This time, we see that the controller does a much better job at rejecting the periodic disturbance. We also see that it takes a while before the observer (state estimator) has estimated the disturbance; this is because the covariance for the disturbance states in `R12` was set very low, `0.2I`. Try increasing this value and the controller will reject the disturbance much faster!

## Adding output constraints

Since we are dealing with a black-box model identified from data, we also show how one can add *output constraints* to the problem (for simplicity, we go back to the problem formulation without the disturbance). The *constrained output* $v_{min} ≤ v ≤ v_{max}$ can be any linear function of the state and the input, $v = C_vx + D_vu$. In this case, we want to limit the rate at which the *output* changes, i.e., constrain

\[\begin{aligned} \Delta y &= y_{k+1} - y_k \\ &= C(Ax_k + Bu_k) - Cx_k \\ &= C(A-I)x_k + CBu_k \end{aligned}\]

For this, we make use of `MPC.LinearMPCConstraints`, which expects the matrices $C_v$ and $D_v$ as well as the upper and lower bounds for $v$. Since we want to keep our constraints on $u$, we let $v = [\Delta y, u]$, and assemble the matrices $C_v$ and $D_v$ accordingly:

```
Δymin = [-1*Ts]
Δymax = [1*Ts]
vmin = [Δymin; umin]
vmax = [Δymax; umax]
soft_indices = 1:1 # Indicate that we consider the Δy constraint a soft constraint
A,B,C,D = ssdata(sys)
Cv = [C*(A-I); zeros(sys.nu, sys.nx)]
Dv = [C*B; I(sys.nu)]
constraints3 = MPC.LinearMPCConstraints(; vmin, vmax, Cv, Dv, soft_indices)
predmodel3 = LinearMPCModel(sys, kf; constraints = constraints3, op, x0, z = sys.C)
prob3 = LQMPCProblem(predmodel3; Q1, Q2, N, solver, r = yr)
hist3 = MPC.solve(prob3; x0 = x0sim, T)
plot(hist, ploty = true, lab="Original problem")
plot!(hist3, ploty = true, lab="With Δy constraint")
hline!([umin[] umax[]], sp = 2, l = (:black, :dash), primary = false)
```

In this case, we see that the controller actually works in the opposite direction in the beginning in order to slow down the rate of change of the output, thus fighting the natural tendency of the system to approach the set point. Since the control signal is also constrained, the controller cannot meet the $\Delta y$ constraint during the first 0.7s of the simulation. After this, the rate of change of the output remains constant for a while until it has reached the set point.
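The constrained-output construction is pure linear algebra; the identity $\Delta y = C(A-I)x + CBu$ used above can be sanity-checked numerically with random matrices standing in for the identified model:

```julia
using LinearAlgebra

# Random matrices standing in for the identified model
A = randn(4, 4); B = randn(4, 1); C = randn(1, 4)
x = randn(4);    u = randn(1)

# Δy obtained by stepping the system one sample forward
Δy_direct = C * (A * x + B * u) - C * x

# Δy expressed as the constrained output v = Cv*x + Dv*u
Δy_constraint = C * (A - I) * x + C * B * u
```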

## Summary

This tutorial has gone through the steps of

- Estimating a model from data
- Designing an MPC controller
- Simulating a disturbance response
- Augmenting the model with a disturbance model

The final product was an MPC controller that successfully rejected the periodic disturbance, without having access to measurements of it.

- [coherence]: The coherence function gives an indication of how much of the power transfer between input and output can be explained by a linear system. It indicates for which frequencies a linear model may be valid, and it is good system-identification practice to inspect this function before proceeding to estimate an LTI model.
- [1]: The stationary point $x_0$ for a given $u_0$ can also be computed from $0 = \dot x = Ax_0 + Bu_0$.