Surrogate Analysis
Many industrial users unlock faster runtime when writing models in Julia, but there are cases where even this improved runtime is not enough for bottlenecks in a product's development life-cycle. When simulations are too computationally expensive or time consuming to evaluate, engineers have limited time for experimentation and exploration. In many cases, these long-running models can be replaced by much faster surrogate models, cutting down workflows from days and months to minutes and hours.
Method Overview
For a given differential-algebraic equation (DAE) described by a Dyad model
If we call
for all
NOTE
The predicted values from the surrogate,
Additionally, not all parameters need to be tunable parameters. Training the surrogate over a smaller subset of parameters will give a more accurate or faster surrogate. Thus for the purpose of the surrogate, it is wise to choose the tunable parameters to only be the parameters which are often changed or are of most interest.
Underneath the hood, a neural network parameterized by weights
Which is trained on a dataset of solutions that strategically samples the parameter space given the lower and upper bounds you provide. Both the dataset generation process and the training of the surrogate are implemented to use hardware as efficiently as possible. In particular, when using JuliaHub compute, data generation will scale horizontally depending on the number of samples requested and perform an ensemble of solutions in parallel across nodes. Training the surrogate on this data will use batching strategies that ensure GPU utilization is maximized while minimizing the time required to generate a high-performance surrogate.
NOTE
Because the generated surrogate is an ordinary differential equation
Setup
In order to get started with building a surrogate in Dyad, make sure your environment is set up according to the Getting Started instructions. In general, building a surrogate requires two things:
A model defined in Dyad which you wish to build a surrogate for.
A
SurrogateAnalysis
that lets you configure the job of generating a surrogate for that model.
To demonstrate, we will build a surrogate for the Train-Car-Stopper system. At a high level you will:
Create a new component library (which is a Julia package) and name it
TrainCarSurrogateAnalysis
Add any required dependencies to
Project.toml
— in this case, onlyDyadSurrogateInterface
— using the standard julia package manager.Write the Dyad code for the Train-Car-Stopper model and your
SurrogateAnalysis
inside the package’sdyad
folder, and use the command pallet to compile it when you’re done.
Here’s how your library directory should look after creating and naming your component library:
.
└── TrainCarSurrogateAnalysis/
├── dyad/
│ └── lv.dyad # Contains all the Dyad code that we will write
├── src/
│ ├── TrainCarSurrogateAnalysis.jl
│ └── generated/
├── test/
└── Project.toml # The project file where we'll add `DyadSurrogateInterface`
For more help on how to create a component library and compile it, see the getting started instructions.
Example Definition
For this demonstration, we will design a stopper for a runaway train. The train is modeled as a series of connected train cars, and the stopper is modeled as a simple damped spring. We model each train car as a mass, coupled via a spring damper on each side to another train car. The mass is modeled as a system with two flanges, mechanical connection points that transmit force. A simpler version of this model is defined in the Design Optimization Tutorial. Here our engine is the last car i.e. cars[24]
.
Lets define the TrainCarsArray
component in dyad. We set the ncars
to 24.
component TrainCarsArray
# The spring constant of the connectors
parameter c::Real = 1e5
# The damping constant of the connectors
parameter d::Real = 1e5
# The number of train cars.
# This is defined as a structural parameter, so that the model has a known
# number of components on construction.
structural parameter ncars::Integer = 24
structural parameter nconnectors::Integer = ncars - 1
# The cars
cars = [TranslationalComponents.Mass(m = if i == 24 then 2000 else 1000, L = 0.9) for i in 1:ncars]
# The connectors between the cars, modeled as spring dampers
connectors = [TranslationalComponents.SpringDamper(c=1e5, d=1e5) for i in 1:nconnectors]
# The flange at the end of the train, that will "connect" to the stopper.
flange = Flange()
# The initial velocity of the train
parameter v0::Velocity = 10
# The initial acceleration of the train
parameter a0::Acceleration = 0
relations
# The initial position of the cars
for i in 1:ncars
initial cars[i].s = 0.55*i
initial cars[i].v = v0
end
# The initial acceleration of the engine, which is the last car in the array
initial cars[24].a = a0
# Connect the cars and connectors
for i in 1:nconnectors-1
initial connectors[i].initial_stretch = 0
end
for i in 1:nconnectors
connect(cars[i].flange_b, connectors[i].flange_a)
connect(connectors[i].flange_b, cars[i+1].flange_a)
end
# Connect the last car to the flange
connect(cars[ncars].flange_b, flange)
end
Now, let's design the stopper,
component Stopper
extends TranslationalComponents.PartialTwoFlanges
parameter c::TranslationalSpringConstant
parameter d::DampingCoefficient
variable v_a::Velocity
variable v_b::Velocity
variable s_a::Dyad.Position
variable s_b::Dyad.Position
variable f::Force
relations
flange_a.s = s_a
flange_b.s = s_b
flange_a.f = f
flange_b.f = -f
D(s_a) = v_a
D(s_b) = v_b
f = if s_a >= s_b then
(v_a - v_b)*d + c*(s_a - s_b)
else
0
end
Finally, we can unify these two components in a TrainCarsArrayWithStopper
component.
component TrainCarWithStopper
train = TrainCarsArray()
stopper = Stopper(c=1e1, d=1e4)
reference = TranslationalComponents.Fixed(s0=10)
relations
connect(train.flange, stopper.flange_a)
connect(stopper.flange_b, reference.flange)
end
Now we can write a surrogate analysis for this TrainCarsArrayWithStopper
component, extending the SurrogateAnalysis
from DyadSurrogatesInterface.jl. Because this functionality is provided by DyadSurrogates.jl, first you have to explicitly add that package.
Pkg.add("DyadSurrogatesInterface")
using DyadSurrogatesInterface
In order to generate a surrogate for this model, we will create a new analysis
, name it TrainCarSurrAnalysis
and extend it with SurrogateAnalysis
. This will give you an instance of a SurrogateAnalysis
job that you can configure to generate a surrogate for your particular model.
The rest of the code from here is straight forward. We provide an array with parameter_names
and their upper and lower bound with parameter_lb
and parameter_ub
. Since we want to build the surrogate over a parameter space defined by two parameters from the system stopper₊c
and stopper₊c
, we define num_params
as 2.
We also need to provide an array with variable_names
that specifies the variables that we are interested in. If no variable is specified, the surrogate will train on and predict all the variables present in the system. Since we want to train on
train₊cars_23₊a(t)
: The acceleration of the last carstopper₊s_a(t)
: The flange_a position for the stopper.stopper₊s_b(t)
: The flange_b position for the stopper.train₊cars_24₊v(t)
: The velocity of the engine.
we define the num_variables
as 4.
Next, we configure the hardware the job should use. We can run SurrogateAnalysis
in two compute modes: local and JuliaHub compute. When using local compute we specify the keyword compute_instance
as "local". When using JuliaHub compute, this should be the url of the JuliaHub instance. (for eg. "https://juliahub.com"). When we use JuliaHub compute, we can leverage parallel runners for generating data and GPU runners for training the model. The final analysis filled out should look something like the following. See the Required Arguments
for the minimum configuration required.
analysis TrainCarSurrAnalysis
extends DyadSurrogatesInterface.SurrogateAnalysis(
surrogate_model="SolutionOperator",
abstol=1e-8,
reltol=1e-8,
stop=2,
seed=1032,
num_samples=100,
num_runners=1,
num_params=2,
saveat = 10,
RSIZE=30,
tau=0.1,
n_epochs=10,
epochs_per_lr=100,
parameter_lb=[80.0, 800.0],
parameter_ub=[120.0, 1200.0],
parameter_names = ["stopper₊c", "stopper₊d"],
alg = "Rodas4",
num_variables = 4,
variable_names = ["train₊cars_23₊a(t)", "stopper₊s_a(t)", "stopper₊s_b(t)", "train₊cars_24₊v(t)"],
ground_truth_field="all",
compute_instance = "local"
)
model = TrainCarWithStopper()
metadata {"Dyad": {"using": "DyadSurrogatesInterface: AbstractSurrogateAnalysisSpec, SurrogateAnalysisSpec, SurrogateAnalysis"}}
end
Now we compile our library, and run our TrainCarSurrAnalysis
:
using DyadInterface, DyadSurrogatesInterface
sol = TrainCarSurrAnalysis();
[ Info: Constructing Model
[ Info: PreProcessing Data
┌ Warning: No functional GPU backend found! Defaulting to CPU.
│
│ 1. If no GPU is available, nothing needs to be done. Set `MLDATADEVICES_SILENCE_WARN_NO_GPU=1` to silence this warning.
│ 2. If GPU is available, load the corresponding trigger package.
│ a. `CUDA.jl` and `cuDNN.jl` (or just `LuxCUDA.jl`) for NVIDIA CUDA Support.
│ b. `AMDGPU.jl` for AMD GPU ROCM Support.
│ c. `Metal.jl` for Apple Metal GPU Support. (Experimental)
│ d. `oneAPI.jl` for Intel oneAPI GPU Support. (Experimental)
└ @ MLDataDevices.Internal ~/.julia/packages/MLDataDevices/gyWcF/src/internal.jl:112
[ Info: Epoch : 100, LR : 0.001, TrainingLoss : 0.025982535768454273, mae : 0.024767740257461517, RegularizationLoss : 0.001214795510992758
[ Info: ValidationLoss: mae : 0.022869821669186653
[ Info: Epoch : 200, LR : 0.0005, TrainingLoss : 0.012131753432711076, mae : 0.010961587754596724, RegularizationLoss : 0.0011701656781143524
[ Info: ValidationLoss: mae : 0.008677415989656986
[ Info: Epoch : 300, LR : 0.0003, TrainingLoss : 0.007324533610944507, mae : 0.006173715586969214, RegularizationLoss : 0.001150818023975293
[ Info: ValidationLoss: mae : 0.005432350552359104
[ Info: Epoch : 400, LR : 0.0001, TrainingLoss : 0.004678641197877578, mae : 0.0035330936991149323, RegularizationLoss : 0.0011455474987626458
[ Info: ValidationLoss: mae : 0.003825122845454292
[ Info: Epoch : 500, LR : 5.0e-5, TrainingLoss : 0.002780961007607896, mae : 0.0016380222598763165, RegularizationLoss : 0.0011429387477315794
[ Info: ValidationLoss: mae : 0.0017392916484727612
[ Info: Epoch : 600, LR : 3.0e-5, TrainingLoss : 0.003062175967583445, mae : 0.0019208608871432737, RegularizationLoss : 0.0011413150804401712
[ Info: ValidationLoss: mae : 0.0023468674486398933
[ Info: Epoch : 700, LR : 1.0e-5, TrainingLoss : 0.0020084012401064915, mae : 0.0008676218357486495, RegularizationLoss : 0.001140779404357842
[ Info: ValidationLoss: mae : 0.000983777218072926
Lets plot some results from our analysis: We can see how we performed on our training data with:
artifacts(sol, :SurrogateAnalysisTrainingDataPlot)
We can see how we perform on our validation data with:
artifacts(sol, :SurrogateAnalysisValDataPlot)
We can also export our surrogate as a ModelingToolkit
component with:
surrogate_component = artifacts(sol, :SurrogateMTKDownload)
Analysis Arguments
The following arguments define a SurrogateAnalysis
:
Required Arguments
model
: the Dyad model that the analysis is being applied to.abstol::Real
: chooses the absolute tolerance for the solver. Defaults to1e-6
.reltol::Real
: chooses the relative tolerance for the solver. Defaults to1e-3
.start::Time
: chooses the start time for the interval. Defaults to 0.stop::Time
: chooses the end time for the interval.num_params
: The number of parameters in the model for which to vary and simulate data.parameter_names
: A vector of strings representing the names of the model parameters to vary. Must havenum_params
elements.parameter_lb
: A vector of real values for the lower bound of each corresponding parameter inparameter_names
. Must havenum_params
elements.parameter_ub
: A vector of real values for the upper bound of each corresponding parameter inparameter_names
. Must havenum_params
elements.num_samples
: The total number of trajectories (simulation data sets) to generate by sampling parameters. This data is split bytrain_ratio
.
Optional Arguments
model
: AnODESystem
representing the model to simulate. Defaults toEmpty()
.surrogate_model
: The type of surrogate model to be used. Can be either"SolutionOperator"
or"DigitalEcho"
.alg::String: chooses the solver algorithm for the data generation process. See
alg` in the Transient Analysis section.abstol::Real
: chooses the absolute tolerance for the solver. Defaults to1e-6
.reltol::Real
: chooses the relative tolerance for the solver. Defaults to1e-3
.saveat::Real
: chooses the saving interval for the solving process. The default is 0, which corresponds to saving a dense output with a continuous time interpolation for high-fidelity solution at all points. If a non-zero value is chosen, for example,0.1
, then it will save at only the interval points (0.1, 0.2, ...) and use a linear interpolation between those points. This should only be used as a performance and memory optimization.dtmax::Time
: sets thedtmax
of the ODE solver. Defaults to 0 which uses the default of the solver. Setting this lower can be used to force smaller steps and thus a more accurate solution.seed
: The random seed to be used for simulations and data generation.num_runners
: The number of parallel runners to launch for data generation.RSIZE
: Reservoir size, typically used whensurrogate_model
is"DigitalEcho"
.tau
: Time constant, potentially used in the surrogate model's dynamics (e.g., for"DigitalEcho"
).n_epochs
: The number of epochs to train the surrogate model. Must be at least 7.epochs_per_lr
: The number of epochs to train per learning rate step. Must be at least 7.regularization_constant
: The regularization constant used during surrogate model training. Defaults to1e-6
.train_ratio
: The proportion ofnum_samples
to use for training (the rest for validation). Must be at least 0.1. Defaults to0.8
.ground_truth_field
: The field to train the surrogate on. Can be either"states"
or"observables"
.compute_instance
: Specifies where to run computations. Defaults to"local"
. Other options might include cloud identifiers like"juliahub.com" or "yourcompany.juliahub.com"
.
Artifacts
A SurrogateAnalysis
returns the following artifacts:
Standard Plots
SurrogateAnalysisTrainingDataPlot
: Plots the first trajectory in training data vs the prediction made by the surrogate.SurrogateAnalysisValDataPlot
: Plots the first trajectory in validaiton data vs the prediction made by the surrogate.SurrogateAnalysisLossPlot
: Plots the mean training and validaiton loss against the number of epochs.
Customizable Plot
The customizable plot of a SurrogateAnalysis
supports plotting for specific variables in training against the predictions from the surrogate.
Downloadable Artifacts
SurrogateMTKDownload
: Returns the surrogate deployed as a ModelingToolkit component.SurrogateModelDownload
: Returns the surrogate object.
Further Reading
Composing Modeling And Simulation With Machine Learning In Julia
Active Learning Enhanced Surrogate Modeling of Jet Engines in JuliaSim
Automated surrogate training performance by incorporating simulator information
Systems and methods of component-based modeling using trained surrogates
Transforming a model in a first language to a surrogate in a second language for simulation
Systems and methods for automatically generating high-fidelity surrogates of dynamical systems
Scientific Machine Learning (SciML) Surrogates for Industry, Part 1: The Guiding Questions