Skip to content

Surrogate Analysis

Many industrial users unlock faster runtime when writing models in Julia, but there are cases where even this improved runtime is not enough for bottlenecks in a product's development life-cycle. When simulations are too computationally expensive or time consuming to evaluate, engineers have limited time for experimentation and exploration. In many cases, these long-running models can be replaced by much faster surrogate models, cutting down workflows from days and months to minutes and hours.

Method Overview

For a given differential-algebraic equation (DAE) described by a Dyad model

x=f(x,y,p,t)0=g(x,y,p,t)

If we call z(t,p) to be the subset of the states x(t,p) and or the observables y(t,p) of the system which we wish to predict with respect to the tunable parameters p, the surrogate model will find an approximation h(t,p) such that:

h(t,p)z(t,p)

for all p in the training space. This space is defined by your own needs, and can be thought of as what ranges of values of parameters you are most interested in exploring. The boundary of the space is defined by the upper and lower bound of that range.

NOTE

The predicted values from the surrogate, z, will generally be a small subset of the state. For example, we may want to build a surrogate of a high-fidelity battery model with 10,000 differential and algebraic equations, but we may only need to know the average temperature of the battery, its output voltage, and output current. In this case, z would be dimension 3 even though the original model is 10,000. Much of the acceleration of the surrogate can be given by choosing to train to an appropriately chosen subset, though surrogates can still provide faster simulation even if this is not the case in some instances.

Additionally, not all parameters need to be tunable parameters. Training the surrogate over a smaller subset of parameters will give a more accurate or faster surrogate. Thus for the purpose of the surrogate, it is wise to choose the tunable parameters to only be the parameters which are often changed or are of most interest.

Underneath the hood, a neural network parameterized by weights θ approximates a dynamical system, which we denote as NNθ, such that:

h(u,p,t)=NNθ(u,p,t)

Which is trained on a dataset of solutions that strategically samples the parameter space given the lower and upper bounds you provide. Both the dataset generation process and the training of the surrogate are implemented to use hardware as efficiently as possible. In particular, when using JuliaHub compute, data generation will scale horizontally depending on the number of samples requested and perform an ensemble of solutions in parallel across nodes. Training the surrogate on this data will use batching strategies that ensure GPU utilization is maximized while minimizing the time required to generate a high-performance surrogate.

NOTE

Because the generated surrogate is an ordinary differential equation h(u,p,t)=NNθ(u,p,t), it itself can be represented as a Dyad model! As such, the surrogate model can be used as a component in place of the original system.

Setup

In order to get started with building a surrogate in Dyad, make sure your environment is set up according to the Getting Started instructions. In general, building a surrogate requires two things:

  1. A model defined in Dyad which you wish to build a surrogate for.

  2. A SurrogateAnalysis that lets you configure the job of generating a surrogate for that model.

To demonstrate, we will build a surrogate for the Train-Car-Stopper system. At a high level you will:

  1. Create a new component library (which is a Julia package) and name it TrainCarSurrogateAnalysis

  2. Add any required dependencies to Project.toml — in this case, only DyadSurrogateInterface — using the standard julia package manager.

  3. Write the Dyad code for the Train-Car-Stopper model and your SurrogateAnalysis inside the package’s dyad folder, and use the command pallet to compile it when you’re done.

Here’s how your library directory should look after creating and naming your component library:

.
└── TrainCarSurrogateAnalysis/
    ├── dyad/
    │   └── lv.dyad    # Contains all the Dyad code that we will write
    ├── src/
    │   ├── TrainCarSurrogateAnalysis.jl
    │   └── generated/
    ├── test/
    └── Project.toml   # The project file where we'll add `DyadSurrogateInterface`

For more help on how to create a component library and compile it, see the getting started instructions.

Example Definition

For this demonstration, we will design a stopper for a runaway train. The train is modeled as a series of connected train cars, and the stopper is modeled as a simple damped spring. We model each train car as a mass, coupled via a spring damper on each side to another train car. The mass is modeled as a system with two flanges, mechanical connection points that transmit force. A simpler version of this model is defined in the Design Optimization Tutorial. Here our engine is the last car i.e. cars[24].

Lets define the TrainCarsArray component in dyad. We set the ncars to 24.

dyad
component TrainCarsArray
  # The spring constant of the connectors
  parameter c::Real = 1e5
  # The damping constant of the connectors
  parameter d::Real = 1e5

  # The number of train cars.
  # This is defined as a structural parameter, so that the model has a known
  # number of components on construction.
  structural parameter ncars::Integer = 24
  structural parameter nconnectors::Integer = ncars - 1
  # The cars
  cars = [TranslationalComponents.Mass(m = if i == 24 then 2000 else 1000, L = 0.9) for i in 1:ncars]
  # The connectors between the cars, modeled as spring dampers
  connectors = [TranslationalComponents.SpringDamper(c=1e5, d=1e5) for i in 1:nconnectors]
  # The flange at the end of the train, that will "connect" to the stopper.  
  flange = Flange()
  # The initial velocity of the train
  parameter v0::Velocity = 10
  # The initial acceleration of the train
  parameter a0::Acceleration = 0

relations
  # The initial position of the cars
  for i in 1:ncars
    initial cars[i].s = 0.55*i
    initial cars[i].v = v0
  end
  # The initial acceleration of the engine, which is the last car in the array
  initial cars[24].a = a0
  # Connect the cars and connectors
  for i in 1:nconnectors-1
    initial connectors[i].initial_stretch = 0
  end

  for i in 1:nconnectors
    connect(cars[i].flange_b, connectors[i].flange_a)
    connect(connectors[i].flange_b, cars[i+1].flange_a)
  end
  # Connect the last car to the flange
  connect(cars[ncars].flange_b, flange)
end

Now, let's design the stopper,

dyad
component Stopper
  extends TranslationalComponents.PartialTwoFlanges
  parameter c::TranslationalSpringConstant
  parameter d::DampingCoefficient
  variable v_a::Velocity
  variable v_b::Velocity
  variable s_a::Dyad.Position
  variable s_b::Dyad.Position
  variable f::Force
relations
  flange_a.s = s_a
  flange_b.s = s_b

  flange_a.f = f
  flange_b.f = -f

  D(s_a) = v_a
  D(s_b) = v_b

  f = if s_a >= s_b then
    (v_a - v_b)*d + c*(s_a - s_b)
  else
    0
end

Finally, we can unify these two components in a TrainCarsArrayWithStopper component.

dyad
component TrainCarWithStopper
  train = TrainCarsArray()
  stopper = Stopper(c=1e1, d=1e4)
  reference = TranslationalComponents.Fixed(s0=10)
relations
  connect(train.flange, stopper.flange_a)
  connect(stopper.flange_b, reference.flange)
end

Now we can write a surrogate analysis for this TrainCarsArrayWithStopper component, extending the SurrogateAnalysis from DyadSurrogatesInterface.jl. Because this functionality is provided by DyadSurrogates.jl, first you have to explicitly add that package.

julia
Pkg.add("DyadSurrogatesInterface")
dyad
using DyadSurrogatesInterface

In order to generate a surrogate for this model, we will create a new analysis, name it TrainCarSurrAnalysis and extend it with SurrogateAnalysis. This will give you an instance of a SurrogateAnalysis job that you can configure to generate a surrogate for your particular model.

The rest of the code from here is straight forward. We provide an array with parameter_names and their upper and lower bound with parameter_lb and parameter_ub. Since we want to build the surrogate over a parameter space defined by two parameters from the system stopper₊c and stopper₊c, we define num_params as 2.

We also need to provide an array with variable_names that specifies the variables that we are interested in. If no variable is specified, the surrogate will train on and predict all the variables present in the system. Since we want to train on

  • train₊cars_23₊a(t): The acceleration of the last car

  • stopper₊s_a(t) : The flange_a position for the stopper.

  • stopper₊s_b(t) : The flange_b position for the stopper.

  • train₊cars_24₊v(t) : The velocity of the engine.

we define the num_variables as 4.

Next, we configure the hardware the job should use. We can run SurrogateAnalysis in two compute modes: local and JuliaHub compute. When using local compute we specify the keyword compute_instance as "local". When using JuliaHub compute, this should be the url of the JuliaHub instance. (for eg. "https://juliahub.com"). When we use JuliaHub compute, we can leverage parallel runners for generating data and GPU runners for training the model. The final analysis filled out should look something like the following. See the Required Arguments for the minimum configuration required.

dyad
analysis TrainCarSurrAnalysis
  extends DyadSurrogatesInterface.SurrogateAnalysis(
              surrogate_model="SolutionOperator", 
              abstol=1e-8, 
              reltol=1e-8,
              stop=2, 
              seed=1032, 
              num_samples=100, 
              num_runners=1, 
              num_params=2, 
              saveat = 10,
              RSIZE=30, 
              tau=0.1, 
              n_epochs=10, 
              epochs_per_lr=100, 
              parameter_lb=[80.0, 800.0], 
              parameter_ub=[120.0, 1200.0],
              parameter_names = ["stopper₊c", "stopper₊d"], 
              alg = "Rodas4",
              num_variables = 4, 
              variable_names = ["train₊cars_23₊a(t)", "stopper₊s_a(t)", "stopper₊s_b(t)", "train₊cars_24₊v(t)"],
              ground_truth_field="all",
              compute_instance = "local"
)

  model = TrainCarWithStopper()

metadata {"Dyad": {"using": "DyadSurrogatesInterface: AbstractSurrogateAnalysisSpec, SurrogateAnalysisSpec, SurrogateAnalysis"}}
end

Now we compile our library, and run our TrainCarSurrAnalysis:

julia
using DyadInterface, DyadSurrogatesInterface

sol = TrainCarSurrAnalysis();
[ Info: Constructing Model
[ Info: PreProcessing Data
┌ Warning: No functional GPU backend found! Defaulting to CPU.

│ 1. If no GPU is available, nothing needs to be done. Set `MLDATADEVICES_SILENCE_WARN_NO_GPU=1` to silence this warning.
│ 2. If GPU is available, load the corresponding trigger package.
│     a. `CUDA.jl` and `cuDNN.jl` (or just `LuxCUDA.jl`) for  NVIDIA CUDA Support.
│     b. `AMDGPU.jl` for AMD GPU ROCM Support.
│     c. `Metal.jl` for Apple Metal GPU Support. (Experimental)
│     d. `oneAPI.jl` for Intel oneAPI GPU Support. (Experimental)
└ @ MLDataDevices.Internal ~/.julia/packages/MLDataDevices/gyWcF/src/internal.jl:112
[ Info: Epoch : 100, LR : 0.001, TrainingLoss : 0.025982535768454273, mae : 0.024767740257461517, RegularizationLoss : 0.001214795510992758
[ Info: ValidationLoss: mae : 0.022869821669186653
[ Info: Epoch : 200, LR : 0.0005, TrainingLoss : 0.012131753432711076, mae : 0.010961587754596724, RegularizationLoss : 0.0011701656781143524
[ Info: ValidationLoss: mae : 0.008677415989656986
[ Info: Epoch : 300, LR : 0.0003, TrainingLoss : 0.007324533610944507, mae : 0.006173715586969214, RegularizationLoss : 0.001150818023975293
[ Info: ValidationLoss: mae : 0.005432350552359104
[ Info: Epoch : 400, LR : 0.0001, TrainingLoss : 0.004678641197877578, mae : 0.0035330936991149323, RegularizationLoss : 0.0011455474987626458
[ Info: ValidationLoss: mae : 0.003825122845454292
[ Info: Epoch : 500, LR : 5.0e-5, TrainingLoss : 0.002780961007607896, mae : 0.0016380222598763165, RegularizationLoss : 0.0011429387477315794
[ Info: ValidationLoss: mae : 0.0017392916484727612
[ Info: Epoch : 600, LR : 3.0e-5, TrainingLoss : 0.003062175967583445, mae : 0.0019208608871432737, RegularizationLoss : 0.0011413150804401712
[ Info: ValidationLoss: mae : 0.0023468674486398933
[ Info: Epoch : 700, LR : 1.0e-5, TrainingLoss : 0.0020084012401064915, mae : 0.0008676218357486495, RegularizationLoss : 0.001140779404357842
[ Info: ValidationLoss: mae : 0.000983777218072926

Lets plot some results from our analysis: We can see how we performed on our training data with:

julia
artifacts(sol, :SurrogateAnalysisTrainingDataPlot)

We can see how we perform on our validation data with:

julia
artifacts(sol, :SurrogateAnalysisValDataPlot)

We can also export our surrogate as a ModelingToolkit component with:

julia
surrogate_component = artifacts(sol, :SurrogateMTKDownload)
u(t)=model(u0,,p,[t])output.u(t)=u(t)

Analysis Arguments

The following arguments define a SurrogateAnalysis:

Required Arguments

  • model: the Dyad model that the analysis is being applied to.

  • abstol::Real: chooses the absolute tolerance for the solver. Defaults to 1e-6.

  • reltol::Real: chooses the relative tolerance for the solver. Defaults to 1e-3.

  • start::Time: chooses the start time for the interval. Defaults to 0.

  • stop::Time: chooses the end time for the interval.

  • num_params: The number of parameters in the model for which to vary and simulate data.

  • parameter_names: A vector of strings representing the names of the model parameters to vary. Must have num_params elements.

  • parameter_lb: A vector of real values for the lower bound of each corresponding parameter in parameter_names. Must have num_params elements.

  • parameter_ub: A vector of real values for the upper bound of each corresponding parameter in parameter_names. Must have num_params elements.

  • num_samples: The total number of trajectories (simulation data sets) to generate by sampling parameters. This data is split by train_ratio.

Optional Arguments

  • model: An ODESystem representing the model to simulate. Defaults to Empty().

  • surrogate_model: The type of surrogate model to be used. Can be either "SolutionOperator" or "DigitalEcho".

  • alg::String: chooses the solver algorithm for the data generation process. Seealg` in the Transient Analysis section.

  • abstol::Real: chooses the absolute tolerance for the solver. Defaults to 1e-6.

  • reltol::Real: chooses the relative tolerance for the solver. Defaults to 1e-3.

  • saveat::Real: chooses the saving interval for the solving process. The default is 0, which corresponds to saving a dense output with a continuous time interpolation for high-fidelity solution at all points. If a non-zero value is chosen, for example, 0.1, then it will save at only the interval points (0.1, 0.2, ...) and use a linear interpolation between those points. This should only be used as a performance and memory optimization.

  • dtmax::Time: sets the dtmax of the ODE solver. Defaults to 0 which uses the default of the solver. Setting this lower can be used to force smaller steps and thus a more accurate solution.

  • seed: The random seed to be used for simulations and data generation.

  • num_runners: The number of parallel runners to launch for data generation.

  • RSIZE: Reservoir size, typically used when surrogate_model is "DigitalEcho".

  • tau: Time constant, potentially used in the surrogate model's dynamics (e.g., for "DigitalEcho").

  • n_epochs: The number of epochs to train the surrogate model. Must be at least 7.

  • epochs_per_lr: The number of epochs to train per learning rate step. Must be at least 7.

  • regularization_constant: The regularization constant used during surrogate model training. Defaults to 1e-6.

  • train_ratio: The proportion of num_samples to use for training (the rest for validation). Must be at least 0.1. Defaults to 0.8.

  • ground_truth_field: The field to train the surrogate on. Can be either "states" or "observables".

  • compute_instance: Specifies where to run computations. Defaults to "local". Other options might include cloud identifiers like "juliahub.com" or "yourcompany.juliahub.com".

Artifacts

A SurrogateAnalysis returns the following artifacts:

Standard Plots

  • SurrogateAnalysisTrainingDataPlot: Plots the first trajectory in training data vs the prediction made by the surrogate.

  • SurrogateAnalysisValDataPlot: Plots the first trajectory in validaiton data vs the prediction made by the surrogate.

  • SurrogateAnalysisLossPlot: Plots the mean training and validaiton loss against the number of epochs.

Customizable Plot

The customizable plot of a SurrogateAnalysis supports plotting for specific variables in training against the predictions from the surrogate.

Downloadable Artifacts

  • SurrogateMTKDownload: Returns the surrogate deployed as a ModelingToolkit component.

  • SurrogateModelDownload: Returns the surrogate object.

Further Reading