Surrogatize Dataset
JuliaSim Surrogates can generate surrogate models directly from datasets, without an explicit problem definition. This tutorial walks through:
- Setting up the Julia environment
- Transforming a dataset into ExperimentData
- Surrogatization
- Visualisation
Julia Environment
First, we prepare the environment by loading the packages used in this example. We also set the working directory to the one containing the raw dataset we'll be transforming.
cd(@__DIR__)
using CSV, BSON, DataFrames
using DataGeneration, Surrogatize, Visualisations, JSSBase
using Plots
using LinearAlgebra
using Distributions
plotlyjs()
We now define some constants: the reservoir size used for training and the paths to our data.
RSIZE = 1000
RESULTS = "results_n_samples=1000"
PARAMS = "sampled_params_n_samples=1000.bson"
Creating ExperimentData from a dataset
This dataset is made up of many separate files. We will now consolidate them into a single structure.
BSON.@load PARAMS sampled_params
outs = [CSV.read(joinpath(RESULTS, "$i.csv"), DataFrame) |> Array
        for i in 1:size(sampled_params, 2)]
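Before going further, it can help to confirm that the loaded runs line up with the sampled parameters. The following is a minimal sketch using only Base Julia and synthetic stand-ins for `outs` and `sampled_params`. The helper `check_runs` and the `fake_*` names are hypothetical and not part of JuliaSim. It assumes, as the loading code above suggests, one results file per column of `sampled_params`.

```julia
# Hypothetical helper (not part of JuliaSim): sanity-check the loaded runs.
function check_runs(outs::Vector{<:AbstractMatrix}, sampled_params::AbstractMatrix)
    # Expect one simulation output per sampled parameter column.
    length(outs) == size(sampled_params, 2) ||
        error("expected $(size(sampled_params, 2)) runs, got $(length(outs))")
    # Every run should have the same number of columns (time plus states).
    ncols = size(first(outs), 2)
    all(out -> size(out, 2) == ncols, outs) ||
        error("runs have differing numbers of columns")
    return ncols
end

# Synthetic stand-ins: 3 runs of different lengths, each with a time column
# followed by 2 state columns, and 4 parameters per run.
fake_params = rand(4, 3)
fake_outs = [hcat(collect(range(0.0, 1.0; length = n)), rand(n, 2)) for n in (5, 6, 7)]

ncols = check_runs(fake_outs, fake_params)
```

With the real data, the same call would be `check_runs(outs, sampled_params)`. Runs may differ in the number of time points, which is why only the column count is compared.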
Now that we have collected all our data, we need to transform it into an ExperimentData object. We do this by arranging the data into a dictionary with the specific keys shown below. In fact, any external dataset can be transformed into ExperimentData the same way and ingested by our JuliaSimSurrogates modules.
dict = Dict("states_labels" => [
        "fan_coils_1₊HEX₊pipe₊T[2](t)",
        "fan_coils_1₊HEX₊pipe₊T[3](t)",
    ],
    "states" => [Matrix(out[:, 2:3]') for out in outs],
    "param_labels" => [
        "Q_sensibles[1].k",
        "Q_sensibles[2].k",
        "m_flow_fluids[1].k",
        "m_flow_fluids[2].k",
    ],
    "params" => collect(Array.(eachcol(sampled_params))),
    "control_labels" => nothing,
    "controls" => nothing,
    "ts" => [out[:, 1] for out in outs])
data = ExperimentData(dict)
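The layout implied by the dictionary above is one entry per run: each `"states"` entry is a matrix with one row per state label and one column per time point, each `"ts"` entry is the matching time vector, and each `"params"` entry has one value per parameter label. The sketch below builds the same structure from synthetic data and checks those shapes. It uses Base Julia only and does not call `ExperimentData`. The `fake_*` names and the short labels are placeholders, not names from the original dataset.

```julia
# Synthetic stand-ins for `outs` and `sampled_params` (placeholder data).
n_runs = 3
fake_outs = [hcat(collect(range(0.0, 1.0; length = 5)), rand(5, 2)) for _ in 1:n_runs]
fake_params = rand(4, n_runs)

# Same keys as in the tutorial, with placeholder labels.
fake_dict = Dict(
    "states_labels" => ["x1(t)", "x2(t)"],
    "states" => [Matrix(out[:, 2:3]') for out in fake_outs],  # states × time
    "param_labels" => ["p1", "p2", "p3", "p4"],
    "params" => collect(Array.(eachcol(fake_params))),        # one vector per run
    "control_labels" => nothing,
    "controls" => nothing,
    "ts" => [out[:, 1] for out in fake_outs],                 # one time vector per run
)

# Check that every run is internally consistent.
for (x, t, p) in zip(fake_dict["states"], fake_dict["ts"], fake_dict["params"])
    @assert size(x) == (length(fake_dict["states_labels"]), length(t))
    @assert length(p) == length(fake_dict["param_labels"])
end
```

Running the same loop over the real `dict` before constructing `ExperimentData` is a cheap way to catch a transposed matrix or a mismatched label list early.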
Surrogatization
We now have ExperimentData! Two more steps remain to generate a surrogate:
- Declare the model
- Call the surrogatize function
For this example, we will use the CTESN model. CTESN has sensible defaults, but it also exposes many hyperparameters for fine-tuning the modelling process, as shown below.
model = CTESN(RSIZE;
    alpha = 1.0,
    driver_sol = (lb = nothing, ub = nothing, count = 1, order = nothing,
        idxs = nothing),
    solver_kwargs = (abstol = 1e-8, reltol = 1e-8, progress = true,
        progress_steps = 10),
    initial_condition_initializer = (x...) -> rand(Uniform(-1.0, 1.0), x...),
    weight_initializer = randn)
@time surrogate = surrogatize(data, model; verbose = true)
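To make the two initializer options above more concrete, here is a small sketch of what such callables return when given array dimensions. It assumes the initializers are called with integer dimensions, which the `x...` splat suggests but which is not confirmed here. It uses only Base Julia and the `Random` standard library, so the uniform draw is written by hand rather than with `Distributions`. The names `uniform_init` and `weight_init` are illustrative.

```julia
using Random

Random.seed!(1234)  # fixed seed so the sketch is repeatable

# Draws from the uniform distribution on [-1, 1), written with Base only.
# This plays the role of (x...) -> rand(Uniform(-1.0, 1.0), x...).
uniform_init = (dims...) -> 2 .* rand(dims...) .- 1

# randn already accepts dimensions and returns standard normal draws.
weight_init = randn

x0 = uniform_init(5)   # vector of 5 values in [-1, 1)
W = weight_init(5, 5)  # 5×5 matrix of standard normal draws
```

Any function with the same calling convention could be swapped in, for example a narrower uniform range or a scaled `randn`.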
Visualisation
The surrogate has been generated, and it is time to perform some analysis! Since the surrogate itself can be difficult to reason about, we generate a dashboard designed to provide insight into the performance and accuracy of the surrogate. The two function calls below are all that's needed to prepare the data and generate the dashboard.
dashboard_data = generate_dashboard_data(surrogate, data)
visualise_surrogate_results(dashboard_data)
This page was generated using Literate.jl.