Workflows

When training a DigitalEcho to reproduce a dynamical system, the data must be pre-processed before it enters the surrogate generation pipeline. The pipeline itself handles the more important pre-processing steps on your behalf, but there are still some transformations that you can apply before you generate a surrogate.

In this tutorial, we will go through the different pre-processing workflows that this library enables, as well as how they can be chained together.

Specifically, we will use PreProcessingChain to set up a preprocessing pipeline on an ExperimentData object. Calling a PreProcessingChain on an ExperimentData produces a new ExperimentData that is preprocessed according to the transformations defined in the chain.

Environment Setup

To begin, we will need DataGeneration, a module of JuliaSimSurrogates, to generate an ExperimentData object. OrdinaryDiffEq provides the interface for declaring our ODE problem.

using DataGeneration
using OrdinaryDiffEq

Problem Definition

We will then define our toy problem for this tutorial: the Lotka-Volterra system, with a control input x added to each equation. An in-depth tutorial on generating an ExperimentData for an ODEProblem can be found in Generating Data for an ODEProblem.

function lv(u, x, p, t)
    u₁, u₂ = u
    α, β, γ, δ = p
    x₁, x₂ = x
    dx = α * u₁ - β * u₁ * u₂ + x₁
    dy = δ * u₁ * u₂ - γ * u₂ - x₂
    [dx, dy]
end
lv (generic function with 1 method)
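
As a quick sanity check of the right-hand side, the sketch below evaluates it at the initial state and parameters used later in this tutorial. The function is repeated so the block runs on its own, and the zero control vector is an illustrative assumption, not something the tutorial prescribes.

```julia
# Same right-hand side as above, repeated so this block is self-contained.
function lv(u, x, p, t)
    u₁, u₂ = u
    α, β, γ, δ = p
    x₁, x₂ = x
    dx = α * u₁ - β * u₁ * u₂ + x₁
    dy = δ * u₁ * u₂ - γ * u₂ - x₂
    [dx, dy]
end

u = [1.0, 1.0]               # state values
x = [0.0, 0.0]               # controls switched off (assumption for this check)
p = [1.75, 1.8, 2.0, 1.8]    # α, β, γ, δ

# At u = [1, 1] with zero controls: du[1] = α - β and du[2] = δ - γ.
du = lv(u, x, p, 0.0)
```

With these values both derivatives are slightly negative, roughly -0.05 and -0.2.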

Following the common SciML procedure, we finish defining the ODE problem by specifying the function, initial conditions, timespan and parameters.

tstop = 12.5
p = [1.75, 1.8, 2.0, 1.8]
u0 = [1.0, 1.0]
tspan = (0.0, tstop)

prob = ODEProblem{false}(lv, u0, tspan, p);
ODEProblem with uType Vector{Float64} and tType Float64. In-place: false
timespan: (0.0, 12.5)
u0: 2-element Vector{Float64}:
 1.0
 1.0

Next, we define the sampling spaces that describe the different simulation configurations we want to run. For an overview of sampling spaces, please see Sampling Spaces.

nsamples_x0 = 3
x0_space = [(0.98, 0.98), (1.2, 1.2)]
x0_lb = first.(x0_space)
x0_ub = last.(x0_space)
ic_space = ICSpace(x0_lb, x0_ub, nsamples_x0)

nsamples_p = 4
param_space = [(1.5, 2.5), (1.75, 2.0), (1.5, 2.5), (1.75, 2.0)]
p_lb = first.(param_space)
p_ub = last.(param_space)
param_space = ParameterSpace(p_lb, p_ub, nsamples_p)

nsamples_ctrl = 7
ctrl_lb = [0.0, 0.0]
ctrl_ub = [0.1, 0.1]
input_func(u, p, t) = [p[1] * sin(t), p[2] * sin(t)]
ctrl_space = CtrlSpace(ctrl_lb, ctrl_ub, input_func, nsamples_ctrl)

simconfig = SimulatorConfig(ic_space, ctrl_space, param_space)
ed = simconfig(prob)
 Number of Trajectories in ExperimentData: 84
  Basic Statistics for Given Dynamical System's Specifications
  Number of u0s in the ExperimentData: 2
  Number of ps in the ExperimentData: 4

  Field   Labels   LowerBound   UpperBound   Mean    StdDev        ...
  u0s     ic_1     0.98         0.98         0.98    2.23378e-16   ...
          ...
          ic_2     1.2          1.2          1.2     1.78703e-15   ...
  ps      p_1      1.625        2.375        2       0.281187      ...
          ...
          p_4      1.78125      1.96875      1.875   0.0702968     ...

  Basic Statistics for Given Dynamical System's Continuous Fields
  Number of states in the ExperimentData: 2
  Number of controls in the ExperimentData: 2

  Field      Labels       LowerBound   UpperBound   Mean         ...
  states     ic_1         0.654267     1.81879      1.05424      ...
             ...
             ic_2         0.462616     1.4295       1.05651      ...
  controls   controls_1   -0.0928494   0.0928571    0.00352886   ...
             ...
             controls_2   -0.0928036   0.0928287    0.00342777   ...
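
The trajectory count reported above is consistent with one simulation per combination of initial-condition, parameter, and control sample. The sketch below only reproduces that arithmetic from the sample counts used in this tutorial. It does not call the library, and the full-factorial interpretation is an assumption inferred from the printed count.

```julia
nsamples_x0 = 3     # initial-condition samples
nsamples_p = 4      # parameter samples
nsamples_ctrl = 7   # control samples

# One trajectory per (initial condition, parameter set, control) combination.
ntrajectories = nsamples_x0 * nsamples_p * nsamples_ctrl
```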

Now that we have our ExperimentData object, ed, we can use different transformations to preprocess the data. We will add the PreProcessing module of JuliaSimSurrogates for pre-processing capability as well as JSSBase for some helper functions. We will also import the _display_table function that will allow us to visually inspect an ExperimentData before and after the pre-processing transformations.

using JuliaSimSurrogates.JSSBase
using PreProcessing
import DataGeneration: _display_table

Normalization

The first preprocessing workflow we demonstrate is the scaling of values, also known as normalization. The PreProcessing module provides predefined normalizations such as MinMaxNorm and ZScore. We can also define a custom transformation using CustomTransform. Let's start by defining a MinMaxNorm for the states. MinMaxNorm requires the lower bound lb and the upper bound ub of the states. These can be obtained using JSSBase.get_lb and JSSBase.get_ub.

lb, ub = JSSBase.get_lb(ed, :states), JSSBase.get_ub(ed, :states)
([0.6542674075166783, 0.46261606868927996], [1.8187878228149477, 1.4294966850608666])

The next step is to choose a scale to normalize to, as well as the category of values in an ExperimentData to normalize. In this example, we will choose our scale to be (-1.0, 1.0) and apply this normalization to the states.

minmax_norm = MinMaxNorm(lb, ub, (-1.0, 1.0), :states)
MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}([0.6542674075166783, 0.46261606868927996], [1.8187878228149477, 1.4294966850608666], (-1.0, 1.0), :states)
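
To make the arithmetic concrete, here is a minimal sketch of the usual affine min-max rescaling in plain Julia. It assumes that MinMaxNorm applies this standard formula; the helper name `minmax_scale` is hypothetical and not part of PreProcessing. The bounds are those of the first state, copied from the output above.

```julia
# Hypothetical helper: map x from [lb, ub] onto the target interval scale = (a, b).
function minmax_scale(x, lb, ub, scale)
    a, b = scale
    return a + (x - lb) * (b - a) / (ub - lb)
end

lb1 = 0.6542674075166783   # lower bound of the first state
ub1 = 1.8187878228149477   # upper bound of the first state

at_lb = minmax_scale(lb1, lb1, ub1, (-1.0, 1.0))   # the lower bound maps to -1.0
at_ub = minmax_scale(ub1, lb1, ub1, (-1.0, 1.0))   # the upper bound maps to 1.0
z0 = minmax_scale(0.98, lb1, ub1, (-1.0, 1.0))     # the constant initial condition 0.98
```

Under this assumption, `z0` comes out at about -0.440572, which agrees with the value the normalized ExperimentData reports for ic_1 in its u0s table.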

To produce an ExperimentData object that is now normalized, all we have to do is call minmax_norm with the ExperimentData object, ed.

preprocessed_ed = minmax_norm(ed)
 Number of Trajectories in ExperimentData: 84
  Basic Statistics for Given Dynamical System's Specifications
  Number of u0s in the ExperimentData: 2
  Number of ps in the ExperimentData: 4

  Field   Labels   LowerBound   UpperBound   Mean        StdDev
  u0s     ic_1     -0.440572    -0.440572    -0.440572   1.1168...
          ...
          ic_2     0.525284     0.525284     0.525284    1.1168...
  ps      p_1      1.625        2.375        2           0.28118...
          ...
          p_4      1.78125      1.96875      1.875       0.07029...

  Basic Statistics for Given Dynamical System's Continuous Fields
  Number of states in the ExperimentData: 2
  Number of controls in the ExperimentData: 2

  Field      Labels       LowerBound   UpperBound   Mean         ...
  states     ic_1         -1           1            -0.313076    ...
             ...
             ic_2         -1           1            0.228483     ...
  controls   controls_1   -0.0928494   0.0928571    0.00352886   ...
             ...
             controls_2   -0.0928036   0.0928287    0.00342777   ...

We can now examine the ExperimentData object before and after the transformation to observe the difference:

Original ExperimentData:

_display_table(ed.results, stdout; compact = false)
2-element Vector{Matrix{Any}}:
 ["ic_1" 0.6542674075166783 … 1.0542357015963708 0.23527327701690967; "ic_2" 0.46261606868927996 … 1.056514451270074 0.2592692602609363]
 ["controls_1" -0.09284935169435195 … 0.0035288573443786183 0.03827675446959262; "controls_2" -0.09280356936789723 … 0.0034277744691327944 0.03846633486151819]

PreProcessed ExperimentData:

_display_table(preprocessed_ed.results, stdout; compact = false)
2-element Vector{Matrix{Any}}:
 ["ic_1" -1.0 … -0.3130763723412248 0.4040689607947308; "ic_2" -1.0 … 0.22848337741947622 0.5363004612376997]
 ["controls_1" -0.09284935169435195 … 0.0035288573443786183 0.03827675446959262; "controls_2" -0.09280356936789723 … 0.0034277744691327944 0.03846633486151819]

Notice that the lb and ub of the states in preprocessed_ed are now -1.0 and 1.0, respectively, for all states, while the controls are unchanged.

Filtering Indices Out of an ExperimentData

The second common workflow is filtering values out of an ExperimentData. In some dynamical systems, there are observables, states, parameters or controls that are constant across the different simulation configurations you run the model for. In these cases, it is necessary to drop them before passing the ExperimentData on to the DigitalEcho surrogate generation pipeline, in order to avoid the numerical instability that constant values cause in the normalization procedures within the pipeline. The API follows the same pattern as the normalization above. We specify the category of data in the ExperimentData that we want to filter (in our case, the states) and the indices we want to keep; all other indices are dropped.

filt = FilterFields(ed, :states, [1])
filtered_ed = filt(ed)
 Number of Trajectories in ExperimentData: 84
  Basic Statistics for Given Dynamical System's Specifications
  Number of u0s in the ExperimentData: 1
  Number of ps in the ExperimentData: 4

  Field   Labels   LowerBound   UpperBound   Mean    StdDev        ...
  u0s     ic_1     0.98         0.98         0.98    2.23378e-16   ...
  ps      p_1      1.625        2.375        2       0.281187      ...
          ...
          p_4      1.78125      1.96875      1.875   0.0702968     ...

  Basic Statistics for Given Dynamical System's Continuous Fields
  Number of states in the ExperimentData: 1
  Number of controls in the ExperimentData: 2

  Field      Labels       LowerBound   UpperBound   Mean         ...
  states     ic_1         0.654267     1.81879      1.05424      ...
  controls   controls_1   -0.0928494   0.0928571    0.00352886   ...
             ...
             controls_2   -0.0928036   0.0928287    0.00342777   ...
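
To see why constant fields are a problem, consider min-max rescaling when the lower and upper bounds coincide: the denominator becomes zero. The sketch below uses plain Julia with made-up bound vectors to find the indices worth keeping. The helper name `varying_indices` and the `_demo` values are hypothetical and not part of the library.

```julia
# Hypothetical bounds for three fields; the first field is constant.
lb_demo = [0.98, 0.46, -0.09]
ub_demo = [0.98, 1.43, 0.09]

# Rescaling a constant field divides zero by zero, which yields NaN.
bad = (0.98 - lb_demo[1]) / (ub_demo[1] - lb_demo[1])

# Indices of the fields whose bounds differ, i.e. those that are safe to rescale.
varying_indices(lb, ub) = findall(lb .!= ub)
keep = varying_indices(lb_demo, ub_demo)
```

A list like `keep` is the kind of index vector one would then pass as the indices to retain.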

Chaining Transformations

The PreProcessingChain can take in multiple transformations that are defined to operate on the same or different fields of an ExperimentData. Let's add another transformation: ZScore normalization for our controls. ZScore requires the mean and the standard deviation of the data it will be applied to. These can be obtained using JSSBase.get_mean and JSSBase.get_std.

# Named ctrl_mean and ctrl_std to avoid shadowing Statistics.mean and Statistics.std
ctrl_mean = JSSBase.get_mean(ed, :controls)
ctrl_std = JSSBase.get_std(ed, :controls)
2-element Vector{Float64}:
 0.03827675446959262
 0.03846633486151819

We reuse the min-max normalization on :states that we defined earlier.

minmax_norm = MinMaxNorm(lb, ub, (-1.0, 1.0), :states)
MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}([0.6542674075166783, 0.46261606868927996], [1.8187878228149477, 1.4294966850608666], (-1.0, 1.0), :states)

Next, we define the ZScore normalization on :controls.

zscore_norm = ZScore(ctrl_mean, ctrl_std, :controls)
ZScore{Vector{Float64}, Vector{Float64}, Symbol}([0.0035288573443786183, 0.0034277744691327944], [0.03827675446959262, 0.03846633486151819], :controls)
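
For reference, a minimal sketch of the standard z-score in plain Julia follows. It assumes ZScore subtracts the mean and divides by the standard deviation; the helper name `zscore` is hypothetical and not part of PreProcessing. The numbers are the statistics of controls_1 shown in this tutorial.

```julia
# Hypothetical helper: standard z-score.
zscore(x, μ, σ) = (x - μ) / σ

μ1 = 0.0035288573443786183   # mean of controls_1
σ1 = 0.03827675446959262     # standard deviation of controls_1

z_lb = zscore(-0.09284935169435195, μ1, σ1)   # lower bound of controls_1
z_mean = zscore(μ1, μ1, σ1)                   # the mean itself maps to 0.0
```

Under this assumption, the lower bound of controls_1 maps to roughly -2.518.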

Now we will add the minmax_norm and the zscore_norm to a PreProcessingChain in order to apply the transformations to an ExperimentData, one after the other:

chain = PreProcessingChain(minmax_norm, zscore_norm)
PreProcessingChain{Tuple{MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}, ZScore{Vector{Float64}, Vector{Float64}, Symbol}}}((MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}([0.6542674075166783, 0.46261606868927996], [1.8187878228149477, 1.4294966850608666], (-1.0, 1.0), :states), ZScore{Vector{Float64}, Vector{Float64}, Symbol}([0.0035288573443786183, 0.0034277744691327944], [0.03827675446959262, 0.03846633486151819], :controls)))
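
Conceptually, applying transformations one after the other is a left fold over the list of steps. The sketch below illustrates this with ordinary functions on a vector. `apply_chain`, `double` and `shift` are hypothetical stand-ins, not the library's implementation.

```julia
# Hypothetical stand-in for a chain: feed the result of each step into the next one.
apply_chain(steps, data) = foldl((acc, step) -> step(acc), steps; init = data)

double(x) = 2 .* x
shift(x) = x .+ 1

a = apply_chain((double, shift), [1.0, 2.0])   # computes shift(double(x))
b = apply_chain((shift, double), [1.0, 2.0])   # computes double(shift(x))
```

The two results differ, which is a reminder that the order of steps matters whenever several of them act on the same field.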

We can now pass ed to this chain to get a new preprocessed ExperimentData with the states normalized using MinMaxNorm and the controls normalized using ZScore.

preprocessed_ed = chain(ed)
_display_table(preprocessed_ed.results, stdout; compact = false)
2-element Vector{Matrix{Any}}:
 ["ic_1" -1.0 … -0.3130763723412248 0.4040689607947308; "ic_2" -1.0 … 0.22848337741947622 0.5363004612376997]
 ["controls_1" -2.5179305396776583 … -1.2044144077991962e-17 0.9999999999999984; "controls_2" -2.5017029613939146 … 1.3734956612017756e-16 1.0000000000000004]

Notice that the lb and ub of the states in preprocessed_ed are as displayed earlier, i.e., -1.0 and 1.0 respectively for all states, and that the mean and the standard deviation of all the controls are now 0.0 and 1.0 respectively, up to floating-point rounding.

We can also add a custom transformation by using CustomTransform. First, define a transformation function such as:

f(x) = x .* sin.(x)
f (generic function with 1 method)
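
Because f uses broadcasting, it acts elementwise on whatever array it receives. A small standalone check, with input values chosen purely for illustration:

```julia
f(x) = x .* sin.(x)   # same function as above, repeated so the block runs on its own

# Elementwise: 0 * sin(0) and (π/2) * sin(π/2)
y = f([0.0, π / 2])
```

The first entry stays at zero and the second is left essentially unchanged, since sin(π/2) is 1.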

We will define this CustomTransform on the parameters. CustomTransform takes two arguments: the transformation function and the field that it should be applied to, which in our case is :ps.

transform = CustomTransform(f, :ps)
CustomTransform{typeof(Main.f), Symbol}(Main.f, :ps)

We then define the PreProcessingChain as usual and pass the transform as an additional step.

chain = PreProcessingChain(minmax_norm, zscore_norm, transform)
PreProcessingChain{Tuple{MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}, ZScore{Vector{Float64}, Vector{Float64}, Symbol}, CustomTransform{typeof(Main.f), Symbol}}}((MinMaxNorm{Vector{Float64}, Vector{Float64}, Tuple{Float64, Float64}, Symbol}([0.6542674075166783, 0.46261606868927996], [1.8187878228149477, 1.4294966850608666], (-1.0, 1.0), :states), ZScore{Vector{Float64}, Vector{Float64}, Symbol}([0.0035288573443786183, 0.0034277744691327944], [0.03827675446959262, 0.03846633486151819], :controls), CustomTransform{typeof(Main.f), Symbol}(Main.f, :ps)))

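Then we pass ed to the PreProcessingChain. Note that the table displayed below lists only the states and controls, so the effect of the transform on :ps is not visible in it.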

preprocessed_ed = chain(ed)
_display_table(preprocessed_ed.results, stdout; compact = false)
2-element Vector{Matrix{Any}}:
 ["ic_1" -1.0 … -0.3130763723412248 0.4040689607947308; "ic_2" -1.0 … 0.22848337741947622 0.5363004612376997]
 ["controls_1" -2.5179305396776583 … -1.2044144077991962e-17 0.9999999999999984; "controls_2" -2.5017029613939146 … 1.3734956612017756e-16 1.0000000000000004]

PreProcessingChain also supports defining multiple transformations for a single field. We will define a MinMaxNorm normalization followed by a CustomTransform on :states. We will use the same minmax_norm as step1 and define a CustomTransform with a cosine transformation on the states as step2.

step1 = minmax_norm
step2 = CustomTransform(x -> cos.(x), :states)

# Define the chain
chain = PreProcessingChain(step1, step2)

# And then pass ed to this chain.
preprocessed_ed_states = chain(ed)
 Number of Trajectories in ExperimentData: 84
  Basic Statistics for Given Dynamical System's Specifications
  Number of u0s in the ExperimentData: 2
  Number of ps in the ExperimentData: 4

  Field   Labels   LowerBound   UpperBound   Mean       StdDev
  u0s     ic_1     0.904508     0.904508     0.904508   8.93513...
          ...
          ic_2     0.865181     0.865181     0.865181   1.11689...
  ps      p_1      1.625        2.375        2          0.281187...
          ...
          p_4      1.78125      1.96875      1.875      0.070296...

  Basic Statistics for Given Dynamical System's Continuous Fields
  Number of states in the ExperimentData: 2
  Number of controls in the ExperimentData: 2

  Field      Labels       LowerBound   UpperBound   Mean         ...
  states     ic_1         0.540302     1            0.873529     ...
             ...
             ic_2         0.540302     0.999988     0.83732      ...
  controls   controls_1   -0.0928494   0.0928571    0.00352886   ...
             ...
             controls_2   -0.0928036   0.0928287    0.00342777   ...

We can verify the second step by comparing against preprocessed_ed from the previous chain. Its states were already transformed using MinMaxNorm, so applying the cosine to them should reproduce the states of preprocessed_ed_states exactly.

state = preprocessed_ed_states.results.states.vals[1]
state_previous = preprocessed_ed.results.states.vals[1]

cos.(state_previous) == state
true
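
The bounds reported for the transformed states can also be cross-checked by hand. After min-max scaling the states lie in [-1, 1], and on that interval the cosine ranges from cos(1) up to cos(0). A minimal sketch in plain Julia, using the normalized ic_1 value printed earlier in this tutorial:

```julia
# cos is even and decreasing on [0, 1], so on [-1, 1] it ranges from cos(1) to cos(0).
lo = cos(1.0)   # smallest possible value, reached at the normalized bounds -1 and 1
hi = cos(0.0)   # largest possible value, reached only if a normalized state is exactly 0

# The constant initial condition ic_1 was normalized to about -0.440572.
u0_check = cos(-0.440572)
```

Here `lo` is about 0.540302, matching the LowerBound reported for both states, and `u0_check` is about 0.904508, matching the ic_1 entry in the u0s table.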