Data Generation
In JuliaSimSurrogates, training algorithms operate on samples, where each sample is a simulation run of the original model. Generating a single sample therefore requires every input needed for a simulation run: initial conditions, parameter values, and control functions. Training a surrogate model requires many samples, however, and it is not feasible to prepare the configuration for each one by hand. Instead, JuliaSimSurrogates provides user-friendly interfaces for describing sampling spaces and combining them into a SimulatorConfig, which can help users generate many samples in parallel.
Sampling Spaces
There are 3 different sampling spaces:
- Parameter Space
- Initial Conditions Space
- Control Functions Space
JuliaSimSurrogates provides a consistent interface for defining each of these spaces in order to generate the desired samples for surrogate training.
Parameter Space
Parameter spaces describe the model parameter values used for simulation runs. A parameter space can be created by specifying only the lower and upper bounds of the parameters.
DataGeneration.ParameterSpace (Type)
ParameterSpace(lb, ub) -> ParameterSpace
ParameterSpace(lb, ub, nsamples) -> ParameterSpace
ParameterSpace(lb, ub, nsamples, alg; labels) -> ParameterSpace
Generate a parameter space within the lower and upper bounds using the specified sampling algorithm.
Optional Arguments
nsamples::Integer: number of samples to generate (defaults to 100).
alg<:SamplingAlgorithm: algorithm used to generate the samples (defaults to a Sobol sequence).
labels::Vector{String}: names for each parameter of the sampled space (defaults to ["p_1", "p_2", ..., "p_n"], where n is the number of parameters).
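As a minimal sketch, the inputs for ParameterSpace can be prepared in plain Julia. The bounds below are made-up example values for a hypothetical three-parameter model, and the label expression only reproduces the documented default pattern; the ParameterSpace call itself is left as a comment because it requires JuliaSimSurrogates to be loaded.

```julia
# Example bounds for a hypothetical three-parameter model
lb = [0.5, 1.0, 0.01]
ub = [1.5, 3.0, 0.10]

# Basic sanity checks: one lower and one upper bound per parameter, lb ≤ ub
@assert length(lb) == length(ub)
@assert all(lb .<= ub)

# Reproduces the documented default label pattern "p_1", "p_2", ..., "p_n"
default_labels = ["p_$i" for i in eachindex(lb)]

# With JuliaSimSurrogates loaded, the space could then be created with, e.g.:
# ps = ParameterSpace(lb, ub, 200)   # 200 samples, default sampling algorithm
```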
ParameterSpace(lb, ub, samples::AbstractMatrix) -> ParameterSpace
ParameterSpace(lb, ub, samples::AbstractMatrix, alg; labels) -> ParameterSpace
Generate a parameter space from a collection of pre-existing samples.
Note that samples must be given as a matrix and alg = nothing, since the samples already exist.
Optional Arguments
labels::Vector{String}: names for each parameter of the sampled space (defaults to ["p_1", "p_2", ..., "p_n"], where n is the number of parameters).
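One way to obtain pre-existing samples is a full-factorial grid built by hand. The sketch below uses a hypothetical helper, grid_samples, and assumes one row per parameter and one column per sample; check the ParameterSpace docstring for the orientation the library actually expects before passing the matrix in.

```julia
# Hypothetical helper: full-factorial grid of samples within [lb, ub].
# Assumed layout: one row per parameter, one column per sample.
function grid_samples(lb::AbstractVector{<:Real}, ub::AbstractVector{<:Real}, npoints::Integer)
    @assert length(lb) == length(ub) "lb and ub must have the same length"
    # Evenly spaced values along each parameter axis
    axis_vals = [collect(range(l, u; length = npoints)) for (l, u) in zip(lb, ub)]
    # Every combination of axis values (the first axis varies fastest)
    combos = vec(collect(Iterators.product(axis_vals...)))
    # Turn each combination (a tuple) into one column of the matrix
    return reduce(hcat, [collect(Float64, c) for c in combos])
end

lb = [0.1, 1.0]
ub = [0.5, 2.0]
samples = grid_samples(lb, ub, 3)   # 2 parameters × 9 samples

# With JuliaSimSurrogates loaded:
# ps = ParameterSpace(lb, ub, samples)
```

A grid grows as npoints^n with the number of parameters n, which is one reason the default Sobol sequence is usually preferable for more than a few parameters.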
Initial Condition Space
Initial condition spaces contain the starting values required for simulation runs. An initial condition space can be created by specifying only the lower and upper bounds of the initial conditions; optional arguments allow further control over the resulting sample space.
DataGeneration.ICSpace (Type)
ICSpace(lb, ub) -> ICSpace
ICSpace(lb, ub, nsamples) -> ICSpace
ICSpace(lb, ub, nsamples, alg; labels) -> ICSpace
Generate an initial condition space within the lower and upper bounds using the specified sampling algorithm.
Optional Arguments
nsamples::Integer: number of samples to generate (defaults to 100).
alg<:SamplingAlgorithm: algorithm used to generate the samples (defaults to a Sobol sequence).
labels::Vector{String}: names for each dimension of the sampled space (defaults to ["p_1", "p_2", ..., "p_n"], where n is the number of dimensions).
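A common way to choose initial condition bounds is a band around a nominal state. The sketch below is only one possible convention, chosen for illustration: a ±10 % relative band with a small absolute floor so that states whose nominal value is zero still vary. The nominal state u0 and both widths are hypothetical example values.

```julia
# Hypothetical nominal initial state of a two-state system (position, velocity)
u0 = [1.0, 0.0]

rel = 0.1         # ±10 % relative band around each nominal value
abs_floor = 0.05  # minimum half-width, so zero-valued states still get sampled

halfwidth = max.(abs.(u0) .* rel, abs_floor)
lb = u0 .- halfwidth
ub = u0 .+ halfwidth

# With JuliaSimSurrogates loaded:
# ics = ICSpace(lb, ub, 100)
```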
ICSpace(lb, ub, samples::AbstractMatrix) -> ICSpace
ICSpace(lb, ub, samples::AbstractMatrix, alg; labels) -> ICSpace
Generate an initial condition space from a collection of pre-existing samples.
Note that samples must be given as a matrix and alg = nothing, since the samples already exist.
Optional Arguments
labels::Vector{String}: names for each dimension of the sampled space (defaults to ["p_1", "p_2", ..., "p_n"], where n is the number of dimensions).
Control Space
Control spaces determine the input parameterization for simulation runs. A control space is created by specifying the parameter bounds and a function that describes how the inputs vary over the course of a simulation run. For example, one could define a two-parameter control function of the following form.
f(u, p, t) = [p[1]*sin(t), p[2]*cos(t)]
This function, f, can be passed to construct a CtrlSpace.
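To see what such a control function produces, it can be evaluated directly in plain Julia for one parameter vector over a time grid. The parameter values and the time grid below are example values; since f does not use the state u, nothing is passed in its place. The CtrlSpace call is shown only as a comment, with made-up bounds, because it requires JuliaSimSurrogates.

```julia
# Two-input control function with the (u, p, t) signature from the text
f(u, p, t) = [p[1] * sin(t), p[2] * cos(t)]

p = [2.0, 3.0]      # one sampled parameter vector (example values)
ts = 0.0:0.5:2.0    # example time grid

# Column k holds both inputs at time ts[k]: a 2 × length(ts) matrix
inputs = reduce(hcat, [f(nothing, p, t) for t in ts])

# With JuliaSimSurrogates loaded, bounds on p[1] and p[2] define the space:
# cs = CtrlSpace([0.5, 0.5], [3.0, 3.0], f)
```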
DataGeneration.CtrlSpace (Type)
CtrlSpace(lb, ub, prob_func) -> CtrlSpace
CtrlSpace(lb, ub, prob_func, nsamples) -> CtrlSpace
CtrlSpace(lb, ub, prob_func, nsamples, alg; labels) -> CtrlSpace
Generate a control space within the specified bounds using the provided control function.
A control space consists of samples of pre-defined time-varying inputs that drive a system. For example, a pre-defined time-varying input could be a*sin(t + b), where a and b are parameters, each with a lower and upper bound to sample from. In contrast to a system's state space, which represents the possible values the states can take depending on model variables, a control space is the bounded set of values allowed for the controls applied to the system.
Optional Arguments
nsamples::Integer: number of samples to generate (defaults to 100).
alg<:SamplingAlgorithm: algorithm used to generate the samples (defaults to a Sobol sequence).
labels::Vector{String}: names for each parameter of the sampled space (defaults to ["p_1", "p_2", ..., "p_n"], where n is the number of parameters).
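The a*sin(t + b) example can be made concrete: every sampled pair (a, b) inside the bounds defines a different input trajectory. The sketch below picks three (a, b) samples by hand (one per column, an assumed layout) rather than using the library's Sobol sampling, and evaluates each resulting input on a time grid. Bounds, samples, and the time grid are example values.

```julia
# Control family from the text, a * sin(t + b), with p = [a, b]
g(u, p, t) = [p[1] * sin(t + p[2])]

lb = [0.5, 0.0]   # example lower bounds for a and b
ub = [2.0, π]     # example upper bounds for a and b

# Three hand-picked (a, b) samples inside the bounds, one per column
psamples = [0.5 1.25 2.0;
            0.0 π/2  π]

ts = range(0.0, 2π; length = 50)

# One input trajectory per sample: rows are samples, columns are time points
trajectories = [g(nothing, psamples[:, j], t)[1] for j in axes(psamples, 2), t in ts]
```

Each row of trajectories is one input signal that a separate simulation run would be driven by; the amplitude of every row is bounded by its sampled a.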
CtrlSpace(lb, ub, prob_func, samples::AbstractMatrix) -> CtrlSpace
CtrlSpace(lb, ub, prob_func, samples::AbstractMatrix, alg; labels) -> CtrlSpace
Generate a control space from a collection of pre-existing samples.
Note that samples must be given as a matrix and alg = nothing, since the samples already exist.
Additional Samples
Users may wish to add samples to an existing sample space. This can be achieved by collecting the additional samples in a matrix and passing it to the add_samples function.
DataGeneration.add_samples (Function)
add_samples(sample_space::AbstractSampleSpace, samples; alg)
Function to add manual samples to an existing sample space.
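Before adding manual samples, it can help to check that they match the existing space. The sketch below uses a hypothetical helper, check_new_samples, which is not part of JuliaSimSurrogates; it assumes one row per dimension and one column per sample, and that added samples should stay within the space's bounds. The add_samples call is left as a comment, and the alg keyword from its signature may also need to be supplied.

```julia
# Hypothetical helper: validate extra samples before calling add_samples.
# Assumed layout: one row per dimension, one column per sample.
function check_new_samples(lb, ub, samples::AbstractMatrix)
    size(samples, 1) == length(lb) ||
        throw(DimensionMismatch("expected $(length(lb)) rows, got $(size(samples, 1))"))
    all(lb .<= samples .<= ub) ||
        throw(ArgumentError("some samples lie outside [lb, ub]"))
    return samples
end

lb = [0.1, 1.0]
ub = [0.5, 2.0]
extra = [0.2 0.45;
         1.1 1.90]   # two additional samples, one per column

check_new_samples(lb, ub, extra)

# With JuliaSimSurrogates loaded, ps being an existing sample space:
# add_samples(ps, extra)
```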
Simulators
Once all sampling spaces are defined, Simulators can be configured to describe all the scenarios to be simulated and to run those simulations. Simulator configurations make it easy to distribute the simulations across any number of machines. For example, JuliaHub can spin up one machine per simulation and generate thousands of samples in roughly the time it takes to generate one.
SimulatorConfig objects are callable and describe all the scenarios to be simulated for a given problem definition. Compatible problem definitions can take various forms, such as:
- ODEProblem (Julia code)
- FMU (an FMI-compliant model)
DataGeneration.SimulatorConfig (Type)
struct SimulatorConfig{IC, C, P} <: AbstractSpaceConfig
Simulator configurations contain information for all three sampling spaces: initial conditions, controls, and parameters.
Simulators help users run simulation ensembles over these sampling spaces in parallel. Once a SimulatorConfig object is created, it can be called as a function with an FMU or ODEProblem as its argument (keyword arguments for running the simulations may also be passed).
Note that Simulator configurations do not require all sampling spaces. For example, if a system does not have a control space, then a Simulator configuration can be created without one.
Experiment Data
Whether the problem definition is Julia code or an FMU, calling a SimulatorConfig on it always produces an ExperimentData object. This is the common format needed to surrogatize a model.
What if a problem cannot easily be described by its mechanics, but its behaviour can be captured in a dataset? Real-world data, for example from sensors, may be useful for constructing surrogate models. In such cases, a dataset can be converted directly to ExperimentData, making surrogate generation possible without a formal mathematical problem definition. This flexibility in defining problems creates many new opportunities to generate surrogate models.
JSSBase.ExperimentData (Type)
ExperimentData(dict::AbstractDict)
Constructs an ExperimentData object from a dictionary of the following format.
Note that the keys in the dictionary must be exactly as shown.
* "states_labels": Vector{String}
* "states": Vector{Matrix{Float64}}, with every element matrix of size (state_num, time_num)
* "observable_labels": Vector{String}
* "observables": Vector{Matrix{Float64}}, with every element matrix of size (observable_num, time_num)
* "param_labels": Vector{String}, where every element is the name of a parameter
* "params": Vector{Vector{Float64}}, with every element being a vector of real values
* "control_labels": Vector{String}, where every element is the name of a control
* "controls": Vector{Matrix}, with every element matrix of size (state_num, time_num)
* "ts": Vector{Vector}, where every element is a vector of real values giving the time steps at which the simulation was evaluated
Each of states, params, controls and ts must have a length equal to the number of trajectories in the experiment.
Each of states_labels, param_labels and control_labels must have a length equal to the number of states, parameters and controls in the experiment, respectively.
Note: If any of the fields states, controls or params does not exist, it must be set to nothing, along with its corresponding labels field.
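As an illustration of this format, the dictionary below is built in plain Julia from synthetic data: two trajectories of a hypothetical two-state decay model, x(t) = exp(-k t) and its derivative, with one parameter k per trajectory and no controls, so both control fields are set to nothing. All values and labels are made up for the example; only the keys and shapes follow the format described above. The ExperimentData call is left as a comment because it requires JuliaSimSurrogates.

```julia
ks = [0.5, 2.0]                          # one decay rate k per trajectory
ts = [collect(0.0:0.1:1.0) for _ in ks]  # time points of each trajectory

# states: (state_num, time_num) matrices holding x and dx/dt
states = [vcat(exp.(-k .* t)', (-k .* exp.(-k .* t))') for (k, t) in zip(ks, ts)]
# observables: (observable_num, time_num) matrices, here just x
observables = [s[1:1, :] for s in states]

data = Dict{String, Any}(
    "states_labels"     => ["x", "dx"],
    "states"            => states,
    "observable_labels" => ["x_obs"],
    "observables"       => observables,
    "param_labels"      => ["k"],
    "params"            => [[k] for k in ks],
    "control_labels"    => nothing,   # no controls in this experiment
    "controls"          => nothing,
    "ts"                => ts,
)

# With JuliaSimSurrogates loaded:
# ed = ExperimentData(data)
```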