Surrogatize Dataset

JuliaSim Surrogates provides the ability to generate surrogate models from datasets without explicit problem definitions. This tutorial walks through the process of:

  1. Setting up the Julia environment
  2. Transforming a dataset into ExperimentData
  3. Surrogatization
  4. Visualisation

Julia Environment

First we prepare the environment by loading the packages used in this example. We also set the working directory, which contains the raw dataset we'll be transforming.

cd(@__DIR__)
using CSV, BSON, DataFrames
using DataGeneration, Surrogatize, Visualisations, JSSBase
using Plots
using LinearAlgebra
using Distributions
plotlyjs()

Next we define a few constants: the size of the reservoir used for training and the paths to our data.

RSIZE = 1000
RESULTS = "results_n_samples=1000"
PARAMS = "sampled_params_n_samples=1000.bson"

Creating ExperimentData from a dataset

This dataset is made up of many separate files. We will consolidate those now into a single structure.

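# Load the sampled parameters (one column per sample)
BSON.@load PARAMS sampled_params
# Read each sample's result file ("1.csv", "2.csv", ...) into a plain array
outs = [CSV.read(joinpath(RESULTS, "$i.csv"), DataFrame) |> Array
        for i in 1:size(sampled_params, 2)]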
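
To follow along without the raw files, the layout of the consolidated data can be mimicked with a small synthetic stand-in. This is only a sketch: the names mock_params and mock_outs, the number of time points and the values themselves are made up, and the column layout (time in column 1, the two states in columns 2 and 3) is inferred from how the data is indexed later in this tutorial.

```julia
n_samples = 5   # stand-in for the 1000 samples in the tutorial
n_params = 4    # matches the four parameter labels used later
n_steps = 11    # hypothetical number of saved time points per run

# One column per sample, as implied by `size(sampled_params, 2)`.
mock_params = [0.1 * (p + n_params * (s - 1)) for p in 1:n_params, s in 1:n_samples]

# One matrix per sample: column 1 is time, columns 2:3 hold the two states.
mock_outs = map(1:size(mock_params, 2)) do i
    t = collect(range(0.0, 1.0; length = n_steps))
    hcat(t, 290 .+ i .* t, 295 .- i .* t)
end

println(size(mock_params))
println(length(mock_outs))
println(size(first(mock_outs)))
```

Each element of mock_outs plays the role of one CSV file after conversion to an array.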

Now that we have collected all our data, we need to transform it into an ExperimentData object. We do this by arranging the data in a dictionary with the specific keys shown below. Any external dataset can be transformed into ExperimentData the same way and ingested by the JuliaSimSurrogates modules.

dict = Dict("states_labels" => [
                "fan_coils_1₊HEX₊pipe₊T[2](t)",
                "fan_coils_1₊HEX₊pipe₊T[3](t)",
            ],
            "states" => [Matrix(out[:, 2:3]') for out in outs],
            "param_labels" => [
                "Q_sensibles[1].k",
                "Q_sensibles[2].k",
                "m_flow_fluids[1].k",
                "m_flow_fluids[2].k",
            ],
            "params" => collect(Array.(eachcol(sampled_params))),
            "control_labels" => nothing,
            "controls" => nothing,
            "ts" => [out[:, 1] for out in outs])
data = ExperimentData(dict)
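
Before handing a dictionary like this to ExperimentData, it can help to confirm that its entries agree with each other. The sketch below uses a hypothetical helper, check_dataset_dict, which is not part of JuliaSimSurrogates. It assumes each "states" entry is a (number of states) × (number of time points) matrix, which is what the transpose in the code above produces. The labels and values are made up, and the control entries are left out for brevity.

```julia
# Hypothetical helper (not a JuliaSimSurrogates function): returns true when
# labels, states, parameters and time vectors have mutually consistent sizes.
function check_dataset_dict(d)
    n_states = length(d["states_labels"])
    n_params = length(d["param_labels"])
    n_runs = length(d["states"])
    length(d["params"]) == n_runs || return false
    length(d["ts"]) == n_runs || return false
    for (x, p, t) in zip(d["states"], d["params"], d["ts"])
        size(x) == (n_states, length(t)) || return false  # states × time points
        length(p) == n_params || return false             # one value per label
    end
    return true
end

# Small synthetic dataset: 3 runs, 5 time points, 2 states, 4 parameters.
outs = [hcat(collect(0.0:0.5:2.0), fill(290.0 + i, 5), fill(295.0 - i, 5)) for i in 1:3]
params = [0.1 * (p + 4 * (s - 1)) for p in 1:4, s in 1:3]

d = Dict("states_labels" => ["T2", "T3"],
         "states" => [Matrix(out[:, 2:3]') for out in outs],
         "param_labels" => ["q1", "q2", "m1", "m2"],
         "params" => collect(Array.(eachcol(params))),
         "ts" => [out[:, 1] for out in outs])

println(check_dataset_dict(d))
```

A mismatch, such as a time vector that is shorter than its state matrix, makes the helper return false.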

Surrogatization

We have ExperimentData! Now there are two more steps to generating a surrogate:

  • Declare the model
  • Call the surrogatize function

For this example, we will use the CTESN model. CTESN has sensible defaults, but it also exposes many hyperparameters for fine-tuning the modelling process, as shown below.

model = CTESN(RSIZE;
              alpha = 1.0,
              driver_sol = (lb = nothing, ub = nothing, count = 1, order = nothing,
                            idxs = nothing),
              solver_kwargs = (abstol = 1e-8, reltol = 1e-8, progress = true,
                               progress_steps = 10),
              initial_condition_initializer = (x...) -> rand(Uniform(-1.0, 1.0), x...),
              weight_initializer = randn)
@time surrogate = surrogatize(data, model; verbose = true)
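
To get a feel for the two initializer options, we can call comparable functions directly. This sketch assumes the initializers are called with dimension arguments; how CTESN invokes them internally is not shown in this tutorial. The name uniform_init and the size 8 are made up, and the Base-only expression replaces the Distributions-based closure used above.

```julia
# Base-only stand-in for `(x...) -> rand(Uniform(-1.0, 1.0), x...)`:
# uniform samples on [-1, 1) with the requested dimensions.
uniform_init = (dims...) -> 2 .* rand(dims...) .- 1

n = 8                  # hypothetical small size (the tutorial uses RSIZE = 1000)
x0 = uniform_init(n)   # vector of n values drawn uniformly from [-1, 1)
W = randn(n, n)        # n × n matrix of standard normal draws

println(size(x0))
println(extrema(x0))
println(size(W))
```

Any function that accepts dimensions and returns an array of that size could be tried in the same way.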

Visualisation

The surrogate has been generated, and it is time to perform some analysis! Since the surrogate itself can be difficult to reason about, we generate a dashboard designed to provide insight into the performance and accuracy of our final product. The two function calls below are all that is needed to prepare the data and generate the dashboard.

dashboard_data = generate_dashboard_data(surrogate, data)
visualise_surrogate_results(dashboard_data)

This page was generated using Literate.jl.