Virtual Population: MCMC

Julia environment

For this tutorial we will need the following packages:

ModuleDescription
PumasQSPPumasQSP is used to formulate the QSP workflow.
ModelingToolkitThe symbolic modeling environment
DifferentialEquationsThe numerical differential equation solvers
DataSetsWe will load our experimental data from datasets on JuliaHub.
DataFramesHandling tabular data manipulation
TuringBayesian inference with probabilistic programming
Plots and StatsPlotsThe plotting and visualization libraries used for this tutorial.

First we prepare the environment by listing the packages we are using within the example.

# Packages used for this example
using PumasQSP
using ModelingToolkit
using DifferentialEquations
using DataSets
using DataFrames
using CSV
using Turing
using Plots
using StatsPlots

Model setup

As the example model we are using the Hires model with eight variables. Their dynamics are described by eight equations and default values are set for the variables and parameters. We use the ModelingTookit.jl package to specify the model in Julia.

# Model states and their default values
@variables begin
    t
    y1(t) = 1.0
    y2(t) = 0.0
    y3(t) = 0.0
    y4(t) = 0.0
    y5(t) = 0.0
    y6(t) = 0.0
    y7(t) = 0.0
    y8(t) = 0.0057
end

# Model parameters and their default values
@parameters begin
    k1 = 1.71
    k2 = 280.0
    k3 = 8.32
    k4 = 0.69
    k5 = 0.43
    k6 = 1.81
end

# Differential operator
D = Differential(t)

# Differential equations
eqs = [
    D(y1) ~ (-k1*y1 + k5*y2 + k3*y3 + 0.0007),
    D(y2) ~ (k1*y1 - 8.75*y2),
    D(y3) ~ (-10.03*y3 + k5*y4 + 0.035*y5),
    D(y4) ~ (k3*y2 + k1*y3 - 1.12*y4),
    D(y5) ~ (-1.745*y5 + k5*y6 + k5*y7),
    D(y6) ~ (-k2*y6*y8 + k4*y4 + k1*y5 - k5*y6 + k4*y7),
    D(y7) ~ (k2*y6*y8 - k6*y7),
    D(y8) ~ (-k2*y6*y8 + k6*y7)
]

# Model definition
@named model = ODESystem(eqs)

\[ \begin{align} \frac{\mathrm{d} \mathrm{y1}\left( t \right)}{\mathrm{d}t} =& 0.0007 - k1 \mathrm{y1}\left( t \right) + k3 \mathrm{y3}\left( t \right) + k5 \mathrm{y2}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y2}\left( t \right)}{\mathrm{d}t} =& - 8.75 \mathrm{y2}\left( t \right) + k1 \mathrm{y1}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y3}\left( t \right)}{\mathrm{d}t} =& 0.035 \mathrm{y5}\left( t \right) - 10.03 \mathrm{y3}\left( t \right) + k5 \mathrm{y4}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y4}\left( t \right)}{\mathrm{d}t} =& - 1.12 \mathrm{y4}\left( t \right) + k1 \mathrm{y3}\left( t \right) + k3 \mathrm{y2}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y5}\left( t \right)}{\mathrm{d}t} =& - 1.745 \mathrm{y5}\left( t \right) + k5 \mathrm{y7}\left( t \right) + k5 \mathrm{y6}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y6}\left( t \right)}{\mathrm{d}t} =& k1 \mathrm{y5}\left( t \right) + k4 \mathrm{y7}\left( t \right) + k4 \mathrm{y4}\left( t \right) - k5 \mathrm{y6}\left( t \right) - k2 \mathrm{y8}\left( t \right) \mathrm{y6}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y7}\left( t \right)}{\mathrm{d}t} =& - k6 \mathrm{y7}\left( t \right) + k2 \mathrm{y8}\left( t \right) \mathrm{y6}\left( t \right) \\ \frac{\mathrm{d} \mathrm{y8}\left( t \right)}{\mathrm{d}t} =& k6 \mathrm{y7}\left( t \right) - k2 \mathrm{y8}\left( t \right) \mathrm{y6}\left( t \right) \end{align} \]

We are considering two clinical conditions. The first trial (Condition 1) provides a steady state for the initial conditions of the second trial (Condition 2). First we define all parameters of the first trial.

# Read data for Condition 1
data1 = dataset("pumasqsptutorials/hires_trial_1")

trial1 = SteadyStateTrial(data1, model;
                        params=[k6 => 20.0],
                        alg=DynamicSS(TRBDF2())
)
SteadyStateTrial with parameters:
k6 => 20.0.

The second condition has altered parameters trial2_params and uses the steady state solution of trial1 as its initial value. Let's also assume that only states y1(t), y2(t) and y5(t) are observed in this condition and set the prior distributions for the respective noise terms for these states.

# Read data for Condition 2
data2 = dataset("pumasqsptutorials/hires_trial_2")

# Clinical condition 2
trial2_params = [k3 => 4.0, k4 => 0.4, k6 => 8.0]
trial2_tspan = (0.0, 20.0)

trial2_save_names = [y1, y2, y5]

trial2_noise_priors = [y1 => InverseGamma(2,2),
                        y2 => InverseGamma(2,3),
                        y5 => InverseGamma(2,1)]
3-element Vector{Pair{Symbolics.Num, Distributions.InverseGamma{Float64}}}:
 y1(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=0.5)
θ: 2.0
)

 y2(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=0.3333333333333333)
θ: 3.0
)

 y5(t) => Distributions.InverseGamma{Float64}(
invd: Distributions.Gamma{Float64}(α=2.0, θ=1.0)
θ: 1.0
)

We are now ready to define trial2. By default likelihood in this case is a multivariate Normal distribution with a vector of mean values equal to the saved states ([y1(t), y2(t), y5(t)]) at each timepoint and a vector of standard deviations, derived from the respective priors (trial2_noise_priors).

# Create the Trial object for trial 2
trial2 = Trial(data2, model;
                forward_u0 = true,
                params = trial2_params,
                tspan = trial2_tspan,
                save_names = trial2_save_names,
                noise_priors = trial2_noise_priors,
                alg = Rosenbrock23(),
                abstol=1e-9, reltol=1e-8
)
Trial with parameters:
k3 => 4.0
k4 => 0.4
k6 => 8.0
and initial conditions obtained from another trial.
timespan: (0.0, 20.0)

Analysis

Next we define the InverseProblem, which, for MCMC methods, holds the prior distributions of all parameters and initial conditions to be optimized, as its search space.

# Define a search space
example_ss = [k1 => TransformedBeta(Beta=Beta(2,6), lb=1.0, ub=3.0),
              k2 => TransformedBeta(Beta=Beta(2,2), lb=200.0, ub=400.0),
              y1 => TransformedBeta(Beta=Beta(2,5), lb=0.0, ub=3.0)
]

# Create sequence of Steady State Trials
trials = SteadyStateTrials(trial1, trial2)

# Create an InverseProblem object
prob = InverseProblem(trials, model, example_ss)
InverseProblem for model with 2 trials optimizing 3 parameters and states.

Once the QSP model and all trials are defined, the package can generate potential patients, i.e. find parameter combinations for the provided model, which describe the trial data sufficiently well. Particularly in the case of MCMC methods, the posterior distribution of the unknown parameters given the data will be approximated in a sampling-based manner.

# Generate potential population
vp = vpop(prob,
        MCMCOpt(maxiters = 1000, sampler = NUTS(0.65), discard_initial = 500);
        population_size = 500)
MCMC result of length 500 computed in 2 minutes and 9 seconds.
┌────────────┬─────────┬─────────┐
│      y1(t) │      k1 │      k2 │
├────────────┼─────────┼─────────┤
│ 0.00427508 │ 1.83434 │ 388.899 │
│ 0.00334971 │ 2.04516 │ 323.648 │
│ 0.00179258 │ 1.78033 │ 282.529 │
│ 0.00143438 │ 2.15601 │ 280.299 │
│ 0.00594984 │ 1.13362 │ 258.108 │
│ 0.00605919 │ 1.34116 │ 333.104 │
│ 0.00448565 │ 1.63396 │  316.99 │
│ 0.00135613 │ 1.28026 │ 369.443 │
│ 0.00666752 │ 1.11552 │ 202.178 │
│ 0.00287581 │ 1.28228 │ 374.275 │
│ 0.00407808 │ 1.70871 │ 282.428 │
│ 0.00488257 │ 1.95376 │ 319.516 │
│ 0.00583159 │ 1.52107 │ 312.403 │
│ 0.00343793 │ 1.09455 │ 284.731 │
│ 0.00300695 │ 1.46933 │ 264.837 │
│ 0.00316345 │ 1.38336 │ 293.117 │
│     ⋮      │    ⋮    │    ⋮    │
└────────────┴─────────┴─────────┘
                  484 rows omitted

Finally, the original MCMCChain object that was created during Bayesian inference can be accessed via:

vp.original
Chains MCMC chain (1000×19×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 112.96 seconds
Compute duration  = 112.96 seconds
parameters        = y1(t), k1, k2, noise[1], noise[2, y1(t)], noise[2, y2(t)], noise[2, y5(t)]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
       parameters       mean       std      mcse    ess_bulk   ess_tail      r ⋯
           Symbol    Float64   Float64   Float64     Float64    Float64   Floa ⋯

            y1(t)     0.0042    0.0022    0.0001    868.7778   538.8274    1.0 ⋯
               k1     1.5437    0.3028    0.0099    854.1603   475.2077    0.9 ⋯
               k2   301.1963   44.0558    1.3299   1099.2150   671.2659    1.0 ⋯
         noise[1]     0.5808    0.3183    0.0125    857.2296   675.3714    0.9 ⋯
  noise[2, y1(t)]     0.0161    0.0015    0.0001    695.9440   623.2499    1.0 ⋯
  noise[2, y2(t)]     0.0240    0.0021    0.0001   1050.0662   633.3368    0.9 ⋯
  noise[2, y5(t)]     0.0080    0.0007    0.0000   1001.0273   640.2412    1.0 ⋯
                                                               2 columns omitted

Quantiles
       parameters       2.5%      25.0%      50.0%      75.0%      97.5%
           Symbol    Float64    Float64    Float64    Float64    Float64

            y1(t)     0.0009     0.0025     0.0039     0.0055     0.0088
               k1     1.0911     1.3152     1.5046     1.7268     2.2315
               k2   216.0155   267.9218   302.6804   336.3110   378.7000
         noise[1]     0.2683     0.4117     0.5149     0.6581     1.2588
  noise[2, y1(t)]     0.0136     0.0151     0.0161     0.0170     0.0192
  noise[2, y2(t)]     0.0202     0.0225     0.0239     0.0253     0.0284
  noise[2, y5(t)]     0.0067     0.0075     0.0080     0.0084     0.0095

The first index of the noise parameters is the trial index and the second one is the state that it corresponds to. In the first trial, it was assumed that the same noise parameter is applied to all states, so there is only one index in noise[1].

Visualization

We can conveniently analyze and visualize the results by relying on the ploting recipes for Pumas-QSP.

# Viusalization (1/2): Simulation results for Condition 2
plot(vp, trials[2], legend = :outerright, grid = "off")
# Viusalization (2/2): Parameters
params = DataFrame(vp)[!,"y1(t)"]
plt = violin(params, grid = :off, legend = :outerright, label = "VP", ylabel = "y1", color = "purple")

Simulating Virtual Trials

Once we have created and validated our VP, we can use it to explore the system for new clinical conditions. For instance, what would happen to trial2 if the initial condition for y3 and the parameter k1 were different? For this, we can use the virtual_trial function.

# Create virtual trial
vt = virtual_trial(trials[2], model;
	                u0 = [y3 =>2.1],
	                params = [k1 => 1.9],
	                tspan = (0, 100))

# Visualize simulation results
plot(vp, vt, legend = :outerright, grid = "off")