Virtual Population: MCMC
Julia environment
For this tutorial we will need the following packages:
Module | Description |
---|---|
PumasQSP | PumasQSP is used to formulate the QSP workflow. |
ModelingToolkit | The symbolic modeling environment |
DifferentialEquations | The numerical differential equation solvers |
DataSets | We will load our experimental data from datasets on JuliaHub. |
DataFrames | Handling tabular data manipulation |
Turing | Bayesian inference with probabilistic programming |
Plots and StatsPlots | The plotting and visualization libraries used for this tutorial. |
First we prepare the environment by listing the packages we are using within the example.
# Packages used for this example
using PumasQSP
using ModelingToolkit
using DifferentialEquations
using DataSets
using DataFrames
using CSV
using Turing
using Plots
using StatsPlots
Model setup
As the example model we are using the Hires model with eight variables. Their dynamics are described by eight equations and default values are set for the variables and parameters. We use the ModelingTookit.jl
package to specify the model in Julia.
# Model states and their default values
@variables begin
t
y1(t) = 1.0
y2(t) = 0.0
y3(t) = 0.0
y4(t) = 0.0
y5(t) = 0.0
y6(t) = 0.0
y7(t) = 0.0
y8(t) = 0.0057
end
# Model parameters and their default values
@parameters begin
k1 = 1.71
k2 = 280.0
k3 = 8.32
k4 = 0.69
k5 = 0.43
k6 = 1.81
end
# Differential operator
D = Differential(t)
# Differential equations
eqs = [
D(y1) ~ (-k1*y1 + k5*y2 + k3*y3 + 0.0007),
D(y2) ~ (k1*y1 - 8.75*y2),
D(y3) ~ (-10.03*y3 + k5*y4 + 0.035*y5),
D(y4) ~ (k3*y2 + k1*y3 - 1.12*y4),
D(y5) ~ (-1.745*y5 + k5*y6 + k5*y7),
D(y6) ~ (-k2*y6*y8 + k4*y4 + k1*y5 - k5*y6 + k4*y7),
D(y7) ~ (k2*y6*y8 - k6*y7),
D(y8) ~ (-k2*y6*y8 + k6*y7)
]
# Model definition
@named model = ODESystem(eqs)
\[ \begin{align} \frac{\mathrm{d} \mathrm{y1}\left( t \right)}{\mathrm{d}t} =& 0.0007 + k5 \mathrm{y2}\left( t \right) + k3 \mathrm{y3}\left( t \right) - k1 \mathrm{y1}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y2}\left( t \right)}{\mathrm{d}t} =& k1 \mathrm{y1}\left( t \right) - 8.75 \mathrm{y2}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y3}\left( t \right)}{\mathrm{d}t} =& - 10.03 \mathrm{y3}\left( t \right) + k5 \mathrm{y4}\left( t \right) + 0.035 \mathrm{y5}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y4}\left( t \right)}{\mathrm{d}t} =& - 1.12 \mathrm{y4}\left( t \right) + k3 \mathrm{y2}\left( t \right) + k1 \mathrm{y3}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y5}\left( t \right)}{\mathrm{d}t} =& - 1.745 \mathrm{y5}\left( t \right) + k5 \mathrm{y6}\left( t \right) + k5 \mathrm{y7}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y6}\left( t \right)}{\mathrm{d}t} =& k1 \mathrm{y5}\left( t \right) + k4 \mathrm{y4}\left( t \right) + k4 \mathrm{y7}\left( t \right) - k5 \mathrm{y6}\left( t \right) - k2 \mathrm{y6}\left( t \right) \mathrm{y8}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y7}\left( t \right)}{\mathrm{d}t} =& - k6 \mathrm{y7}\left( t \right) + k2 \mathrm{y6}\left( t \right) \mathrm{y8}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y8}\left( t \right)}{\mathrm{d}t} =& k6 \mathrm{y7}\left( t \right) - k2 \mathrm{y6}\left( t \right) \mathrm{y8}\left( t \right) \end{align} \]
We are considering two clinical conditions. The first trial (Condition 1) provides a steady state for the initial conditions of the second trial (Condition 2). First we define all parameters of the first trial.
# Read data for Condition 1
data1 = dataset("pumasqsptutorials/hires_trial_1")
trial1 = SteadyStateTrial(data1, model;
params=[k6 => 20.0],
alg=DynamicSS(TRBDF2())
)
SteadyStateTrial with parameters:
k6 => 20.0.
The second condition has altered parameters trial2_params
and uses the steady state solution of trial1
as its initial value. Let's also assume that only states y1(t)
, y2(t)
and y5(t)
are observed in this condition and set the prior distributions for the respective noise terms for these states.
# Read data for Condition 2
data2 = dataset("pumasqsptutorials/hires_trial_2")
# Clinical condition 2
trial2_params = [k3 => 4.0, k4 => 0.4, k6 => 8.0]
trial2_tspan = (0.0, 20.0)
trial2_save_names = [y1, y2, y5]
trial2_noise_priors = [y1 => InverseGamma(2,2),
y2 => InverseGamma(2,3),
y5 => InverseGamma(2,1)]
3-element Vector{Pair{Symbolics.Num, Distributions.InverseGamma{Float64}}}:
y1(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=0.5)
θ: 2.0
)
y2(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=0.3333333333333333)
θ: 3.0
)
y5(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=1.0)
θ: 1.0
)
We are now ready to define trial2
. By default likelihood in this case is a multivariate Normal distribution with a vector of mean values equal to the saved states ([y1(t), y2(t), y5(t)]
) at each timepoint and a vector of standard deviations, derived from the respective priors (trial2_noise_priors
).
# Create the Trial object for trial 2
trial2 = Trial(data2, model;
forward_u0 = true,
params = trial2_params,
tspan = trial2_tspan,
save_names = trial2_save_names,
noise_priors = trial2_noise_priors,
alg = Rosenbrock23(),
abstol=1e-9, reltol=1e-8
)
Trial with parameters:
k3 => 4.0
k4 => 0.4
k6 => 8.0
and initial conditions obtained from another trial.
timespan: (0.0, 20.0)
Analysis
Next we define the InverseProblem
, which, for MCMC methods, holds the prior distributions of all parameters and initial conditions to be optimized, as its search space.
# Define a search space
example_ss = [k1 => TransformedBeta(Beta=Beta(2,6), lb=1.0, ub=3.0),
k2 => TransformedBeta(Beta=Beta(2,2), lb=200.0, ub=400.0),
y1 => TransformedBeta(Beta=Beta(2,5), lb=0.0, ub=3.0)
]
# Create sequence of Steady State Trials
trials = SteadyStateTrials(trial1, trial2)
# Create an InverseProblem object
prob = InverseProblem(trials, model, example_ss)
InverseProblem for model with 2 trials optimizing 3 parameters and states.
Once the QSP model and all trials are defined, the package can generate potential patients, i.e. find parameter combinations for the provided model, which describe the trial data sufficiently well. Particularly in the case of MCMC methods, the posterior distribution of the unknown parameters given the data will be approximated in a sampling-based manner.
# Generate potential population
vp = vpop(prob,
MCMCOpt(maxiters = 1000, sampler = NUTS(0.65), discard_initial = 500);
population_size = 500)
MCMC result of length 500 computed in 2 minutes and 9 seconds.
┌────────────┬─────────┬─────────┐
│ y1(t) │ k1 │ k2 │
├────────────┼─────────┼─────────┤
│ 0.0057917 │ 1.28888 │ 296.801 │
│ 0.00264983 │ 1.58909 │ 268.097 │
│ 0.00224626 │ 1.22059 │ 313.01 │
│ 0.00439438 │ 1.27885 │ 283.965 │
│ 0.00370099 │ 1.40809 │ 348.382 │
│ 0.00587055 │ 1.74316 │ 309.192 │
│ 0.00826058 │ 1.92291 │ 222.915 │
│ 0.00315886 │ 1.68269 │ 291.937 │
│ 0.00139706 │ 1.66325 │ 353.79 │
│ 0.00966692 │ 1.47832 │ 373.281 │
│ 0.00538329 │ 1.39459 │ 337.151 │
│ 0.0030097 │ 2.14366 │ 317.561 │
│ 0.00479104 │ 1.09172 │ 245.31 │
│ 0.00535953 │ 1.101 │ 309.826 │
│ 0.00542087 │ 1.4192 │ 329.763 │
│ 0.00460814 │ 2.02418 │ 319.681 │
│ ⋮ │ ⋮ │ ⋮ │
└────────────┴─────────┴─────────┘
484 rows omitted
Finally, the original MCMCChain
object that was created during Bayesian inference can be accessed via:
vp.original
Chains MCMC chain (1000×19×1 Array{Float64, 3}):
Iterations = 501:1:1500
Number of chains = 1
Samples per chain = 1000
Wall duration = 112.76 seconds
Compute duration = 112.76 seconds
parameters = y1(t), k1, k2, noise[1], noise[2, y1(t)], noise[2, y2(t)], noise[2, y5(t)]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std mcse ess_bulk ess_tail r ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Floa ⋯
y1(t) 0.0041 0.0021 0.0001 727.4417 613.4829 1.0 ⋯
k1 1.5250 0.2944 0.0090 1043.3271 591.5385 0.9 ⋯
k2 301.8013 43.1041 1.4642 844.7945 699.6500 0.9 ⋯
noise[1] 0.5947 0.2627 0.0088 1071.4067 526.4931 1.0 ⋯
noise[2, y1(t)] 0.0161 0.0015 0.0001 594.3668 766.5140 0.9 ⋯
noise[2, y2(t)] 0.0240 0.0021 0.0001 957.2142 687.7433 1.0 ⋯
noise[2, y5(t)] 0.0080 0.0007 0.0000 954.5615 632.8908 1.0 ⋯
2 columns omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
y1(t) 0.0007 0.0025 0.0040 0.0054 0.0087
k1 1.0872 1.2939 1.4944 1.6995 2.1825
k2 222.0308 270.4760 301.8101 334.5487 378.4413
noise[1] 0.2722 0.4170 0.5345 0.7106 1.2671
noise[2, y1(t)] 0.0135 0.0151 0.0160 0.0171 0.0190
noise[2, y2(t)] 0.0202 0.0226 0.0239 0.0254 0.0283
noise[2, y5(t)] 0.0067 0.0075 0.0079 0.0084 0.0095
The first index of the noise
parameters is the trial index and the second one is the state that it corresponds to. In the first trial, it was assumed that the same noise parameter is applied to all states, so there is only one index in noise[1]
.
Visualization
We can conveniently analyze and visualize the results by relying on the ploting recipes for Pumas-QSP.
# Viusalization (1/2): Simulation results for Condition 2
plot(vp, trials[2], legend = :outerright, grid = "off")
# Viusalization (2/2): Parameters
params = DataFrame(vp)[!,"y1(t)"]
plt = violin(params, grid = :off, legend = :outerright, label = "VP", ylabel = "y1", color = "purple")
Simulating Virtual Trials
Once we have created and validated our VP, we can use it to explore the system for new clinical conditions. For instance, what would happen to trial2
if the initial condition for y3
and the parameter k1
were different? For this, we can use the virtual_trial
function.
# Create virtual trial
vt = virtual_trial(trials[2], model;
u0 = [y3 =>2.1],
params = [k1 => 1.9],
tspan = (0, 100))
# Visualize simulation results
plot(vp, vt, legend = :outerright, grid = "off")