PEtab Format

Motivation

With the PEtab format([1]) users can specify probabilistically motivated (log-likelihood and log-posterior) objective functions for parameter estimation problems in systems biology. PEtab offers a zero-code way to define an InverseProblem in PumasQSP.

One of the big advantages of the PEtab format is its modular design that facilitates reusing and modifying the inverse problem. In particular PEtab defines

  • an (SBML) model that describes the system dynamics.
  • a so-called observation model that describes how system states map to experimental readouts.
  • a so-called noise model to define the likelihood of the data given the expected (i.e. simulated) observation.
  • a treatment model, that defines perturbations of the model.

Here, we will first provide a high-level explanation of the PEtab format, followed by a tutorial for creating an inverse problem.

Background: The PEtab format

The two most critical parts to define an inverse problem are the model and experimental data. In PEtab, the model is specified in the Systems Biology Markup Language (SBML). The experimental data is defined in a measurement table. In its simplest form and interpreted via DataFrames.jl, it looks like this:

2×4 DataFrame
RowobservableIdsimulationConditionIdmeasurementtime
StringStringAnyAny
1obs_Atrial13.50.0
2............

You will first notice, that the measurement table is in the stacked or long form. You may also wonder what the entries in the observableId, and conditionId mean. These entries have to be defined in an observable table and a condition table. In this sense, a PEtab problem is like a Relational Database.

Following this concept, PEtab also specifies the search space and other parameter-related information in a separate parameter table. A detailed description of all the tables, columns and their relations can be found in the documentation of the PEtab data format. If this all sounds abstract, the following tutorial will make the format and its application to QSP more concrete.

Petab Note that PumasQSP does currently not support Preequilibration, time point-specific observable and noise parameter overrides as well as the visualization table.

PumasQSP extends the core PEtab features with

  • A dosing table.
  • Use of ModelingToolkit ODESystems instead of SBML models.
  • Support of observableIds in the noiseFormula.

Defining an InverseProblem with PEtab

Julia environment

For this tutorial we will need the following package:

ModuleDescription
PumasQSPPumasQSP is used to formulate the QSP workflow.

First we prepare the environment by listing the package we are using within the example.

# Packages used for this example
using PumasQSP

Defining the ODESystem

For the purpose of this tutorial, we will use a very simple model of receptor-ligand association that triggers the expression of some protein of interest. If you want to use the files in this tutorial as a template for your own petab problem, simply call

create_petab_template("reclig_petab")
14

To define an SBML model from scratch, we recommend Copasi, RuleBender or any other modeling tool of your choice with SBML export functionality. SBML is a widely used format, so it is quite likely that your favourite modeling environment can exprot SBML. However, PumasQSP also allows you to use ModelingToolkit ODESystems instead of SBML models:

import_petab(my_problem.yaml, models = my_odesys)

Defining Trials

Let us now define four different trials in a condition table.

4×4 DataFrame
RowconditionIdReckAskDePOI
String7String31String15String31
1trial_1ic_Rec_normalkAs_nodrugkDePOI_nodrug
2trial_2ic_Rec_overexpressedkAs_nodrugkDePOI_nodrug
3trial_3ic_Rec_normalkAs_inhibitedkDePOI_sideeffect
4trial_4ic_Rec_overexpressedkAs_inhibitedkDePOI_sideeffect

This table contains a column for the conditionId and columns for one state and two parameters. If states are provided to columns of the condition table, their initial conditions from the SBML model are overridden with the indicated values. Here, we are not setting numeric values, but override with parameters that we will have to define in the parameter table. The parameter table will also allow us to choose whether these parameters are fixed or estimated. For the purpose of this tutorial, we will investigate conditions where Rec is either at normal levels or overexpressed. We will combine this with a drug treatment that has a known effect on the receptor ligand assosiation (kAs) and an potential effect of unknown magnitude on the degradation rate of the protein of interest kDePOI parameter. I.e. kDePOI will have to be estimated from our measurements, as we will indicate in the paramter table.

Note: To work with TSV files, you can add the Edit csv extension to your JuliaHub VS Code IDE, or simply use Microsoft Excel or any other spreadsheet editor locally and copy-paste your TSV files into the Explorer of the VS Code IDE.

Defining Observables

Before we can define measurements, we need to specify what we are measuring. This is because in many cases, we cannot directly observe the model states, but rather some metric that is a direct function of the model state. For the purpose of this tutorial, we assume that we can observe two different metrics.

2×5 DataFrame
RowobservableIdobservableFormulaobservableTransformationnoiseDistributionnoiseFormula
String15StringString3String7String
1obs_Lig_totaloffset_Lig + scale_Lig * (Lig + RecLig)linnormalnoise_additive + noise_multiplicative * obs_Lig_total
2obs_POIoffset_POI + scale_POI * POIloglaplacenoise_additive + noise_multiplicative * obs_POI

The observableFormula specifies how the observables are calculated from the states. For instance, obs_Lig_total is equal to offset_Lig + scale_Lig * (Lig + RecLig), where Lig and RecLig are states of the SBML model. offset_Lig and scale_Lig are parameters. Again, we could have chosen numeric values for these parameters. However, here we chose strings that we will have to define in the parameter table, where we can also specify if we want to estimate them or set them to a fixed nominalValue.

The optional noiseDistribution tells us the distribution of the residuals (allowed values are normal (default) and laplace). PEtab assumes a mean of zero and a standard deviation (for normal) or scale (for laplaceian) noise as defined in the noiseFormula column. To obtain the equivalent of a simple least-square objective function, we could simply set the noiseFormula to 1. However, in our example, we want to assume an additive and a multiplicative noise component. The corresponding parameters noise_additive and noise_multiplicative will have to be defined in the parameter table, where we can define if these parameters shall be estimated or not.

Optionally, PEtab also allows you to define an observableTransformation, meaning that observables and datapoints are transformed prior to calculation of the residuals. Allowed values are lin (default), log and log10.

PumasQSP will use the information from the observable table to calculate the likelihood of the measurement given the simulation.

Defining Measurement Data

The measurement table defines the measured data for each observable, trial and time point.

48×4 DataFrame
RowobservableIdsimulationConditionIdmeasurementtime
String15String7Float64Int64
1obs_Lig_totaltrial_1-0.04018090
2obs_Lig_totaltrial_11.5090710
3obs_Lig_totaltrial_11.3375120
4obs_Lig_totaltrial_12.5051730
5obs_Lig_totaltrial_12.3809340
6obs_Lig_totaltrial_12.1492550
7obs_POItrial_10.108750
8obs_POItrial_10.48143810
9obs_POItrial_11.4080520
10obs_POItrial_11.8309930
11obs_POItrial_11.9049840
12obs_POItrial_12.1126950
13obs_Lig_totaltrial_20.110220
14obs_Lig_totaltrial_21.2335610
15obs_Lig_totaltrial_21.8823320
16obs_Lig_totaltrial_22.5446230
17obs_Lig_totaltrial_22.6945640
18obs_Lig_totaltrial_22.9306350
19obs_POItrial_20.1244610
20obs_POItrial_20.85877210
21obs_POItrial_21.6085220
22obs_POItrial_22.7608530
23obs_POItrial_22.94640
24obs_POItrial_24.0722250
25obs_Lig_totaltrial_30.1131280
26obs_Lig_totaltrial_31.1233210
27obs_Lig_totaltrial_31.5285920
28obs_Lig_totaltrial_31.8558130
29obs_Lig_totaltrial_31.6326640
30obs_Lig_totaltrial_31.7114250
31obs_POItrial_30.06369310
32obs_POItrial_30.23612610
33obs_POItrial_30.46593720
34obs_POItrial_30.55420830
35obs_POItrial_30.65855140
36obs_POItrial_30.67861550
37obs_Lig_totaltrial_40.1748050
38obs_Lig_totaltrial_41.3715210
39obs_Lig_totaltrial_41.4060120
40obs_Lig_totaltrial_41.5894630
41obs_Lig_totaltrial_41.8775240
42obs_Lig_totaltrial_41.9005350
43obs_POItrial_40.07143080
44obs_POItrial_40.22395510
45obs_POItrial_40.81993120
46obs_POItrial_40.80911230
47obs_POItrial_40.94589340
48obs_POItrial_41.5155150

If there is more than one row with identical observableId, simulationConditionId and time the losses of all measurements are added up.

Defining Search Space and Parameter Priors

To complete the specification of the inverse problem, we have to create a parameter table.

16×8 DataFrame
RowparameterIdparameterScalelowerBoundupperBoundnominalValueestimateobjectivePriorTypeobjectivePriorParameters
String31String7Float64Float64Float64Int64String7String15
1kDilog0.220.02.01normal2; 1
2kSyLiglog0.011.00.11normal0.1; 0.05
3kDeLiglog100.011.00.11normal0.1; 0.05
4kSyPOIlog100.022.00.21normal0.2; 0.1
5ic_Rec_normallin0.110.01.01normal1; 0.5
6ic_Rec_overexpressedlin0.220.02.01normal2; 1
7kAs_nodruglin0.220.02.00normal2; 1
8kAs_inhibitedlin0.022.00.20normal0.2; 0.1
9kDePOI_nodruglin0.011.00.11normal0.1; 0.05
10kDePOI_sideeffectlin0.0050.50.051normal0.05; 0.025
11offset_Liglin0.011.00.11normal0.1; 0.05
12scale_Liglin0.1515.01.51normal1.5; 0.75
13offset_POIlin0.011.00.11laplace0.1; 0.05
14scale_POIlin0.220.02.01laplace2; 1
15noise_additivelin0.0250.10.051uniform0.025; 0.1
16noise_multiplicativelin0.050.20.11uniform0.05; 0.2

This parameter table has to include all parameters from the SBML model that we want to estimate, plus the parameters we have introduced in the condition and observable tables as parameterIds. For each parameter, we also have to define lowerBound, upperBound and nominalValue.

In this particular model, we see that some parameter values are negative. This is because this particular SBML model assumes parameters are log-transformed. In general, we recommend defining the biological parameter is linear space and making use of the parameterScale column of the paramter table for transformations (allowed values are lin, log and log10).

The nominal value will be used to set non-estimated parameters to a fixed value and depending on the optimization algorithm may be used as initial guess for estimated parameters. For the purpose of this tutorial, we want to estimate

  • all parameters in the SBML model.
  • all observation and noise parameters introduced in the observation table.
  • the unknown side effect of our drug on kDePOI (kDePOI_sideeffect) and its baseline value (kDePOI_nodrug).
  • the unknown level of receptor expression (ic_Rec_normal and ic_Rec_overexpressed).

The perturbations describing

  • the effect of the drug on kAs (kAs_nodrug and kAs_inhibited)

are assumed to be known and shall not be estimated. This is indicated in the estimate column.

Bundling all Files to a PEtab Problem

Finally, the path to all files is indicated in a YAML file. Note that an optimisation problem may contain identical-length lists of all files except for the parameter table. This is because an optimisation problem can only have one parameter vector, but multiple conditions, simulations, observables and even model.

format_version: 1
parameter_file: parameters.tsv
problems:
- condition_files:
  - conditions.tsv
  measurement_files:
  - measurements.tsv
  observable_files:
  - observables.tsv
  sbml_files:
  - model.xml

Importing a PEtab Problem

Once the PEtab problem is defined, it can be imported with only one line of code.

invprob = import_petab(joinpath("reclig_petab", "petab.yaml"))
InverseProblem for ##SBML#11895 with 4 trials optimizing 14 parameters.

Note that PumasQSP assumes valid PEtab problems. import_petab does currently not perform validity checks. We recommend using the official petablint Command Line tool for validity checks. Note that using observableIds in the noiseFormula will lead to failing checks. However, this failure can be ignored, as PumasQSP supports observableIds in the noiseFormula.

You may find that the default values for trials do not work for you. For instance, you may want to chance the solver or tolerances. This can be easily done with remake_trials:

using OrdinaryDiffEq
solver = TRBDF2()
invprob = remake_trials(invprob; alg=solver)

Now you are all set to take the imported invprob through the PumasQSP modelling pipeline, including calibrating the model, generating a virtual population, subsampling the virtual population and plotting the results. You can also export the solution back to PEtab.

If you have created a virtual population from a PEtab problem, you can also export the VpopResult back to PEtab with export_petab("my_problem.yaml", my_solution).

PumasQSP extensions to PEtab

The dosing table

In principle, perturbations of the dynamic system such as medical treatments can be handled with barebones PEtab/SBML using the condition table and SBML events. PumasQSP further increases the degree of mudularization by defining these perturbations in a separate dosing table.

2×5 DataFrame
RowsimulationConditionIdspeciesIdvaluetimeduration
String7String3Int64Int64Int64
1trial_1Lig110
2trial_1Lig222

Like all other files that make up the PEtab problem, the path of the dosing table has to be indicated in the YAML file.

dosing_files:
  - my_doses.tsv

Detailed field description

  • simulationConditionId [STRING]
    ID of the simulation condition to which the dose is applied.
  • speciesId [STRING]
    SBML species ID that is affected by the dosing.
  • value [NUMBER]
    Amount or concentration (depending on the units of the affected species) added to the system over the whole duration of the dosing event.
  • time [NUMBER]
    Start time of the dosing event.
  • duration [NUMBER]
    Duration of the dosing. 0 corresponds to a Bolus, positive values to an infusion with a rate of value/duration.

FAQ

  • How can I represent replicate measurements?
    • Just duplicate the time-points for which replicate measurements are available in the measurement table. As with any replicate measurement, all corresponding residuals will enter the cost function.
  • What if one or more of the covariates of a patient have changed between replicate experiments? Can I treat this as the same patient?
    • No, this is not possible. A change in covariates means that the patient has become a different person from the perspective of the optimization problem. However, different patients can share fixed or estimated covariates.
  • How can I set or estimate initial conditions u0?
    • Simply include a column with the SBML species ID in the contition table and put numeric values or parameterId from the parameter table in the cells.

References