With the PEtab format() users can specify probabilistically motivated (log-likelihood and log-posterior) objective functions for parameter estimation problems in systems biology. PEtab offers a zero-code way to define an
InverseProblem in PumasQSP.
One of the big advantages of the PEtab format is its modular design that facilitates reusing and modifying the inverse problem. In particular PEtab defines
- an (SBML) model that describes the system dynamics.
- a so-called observation model that describes how system states map to experimental readouts.
- a so-called noise model to define the likelihood of the data given the expected (i.e. simulated) observation.
- a treatment model, that defines perturbations of the model.
Here, we will first provide a high-level explanation of the PEtab format, followed by a tutorial for creating an inverse problem.
The two most critical parts to define an inverse problem are the model and experimental data. In PEtab, the model is specified in the Systems Biology Markup Language (SBML). The experimental data is defined in a measurement table. In its simplest form and interpreted via
DataFrames.jl, it looks like this:
You will first notice, that the measurement table is in the stacked or long form. You may also wonder what the entries in the
conditionId mean. These entries have to be defined in an observable table and a condition table. In this sense, a PEtab problem is like a Relational Database.
Following this concept, PEtab also specifies the search space and other parameter-related information in a separate parameter table. A detailed description of all the tables, columns and their relations can be found in the documentation of the PEtab data format. If this all sounds abstract, the following tutorial will make the format and its application to QSP more concrete.
Note that PumasQSP does currently not support Preequilibration, time point-specific observable and noise parameter overrides as well as the visualization table.
PumasQSP extends the core PEtab features with
- A dosing table.
- Use of ModelingToolkit
ODESystems instead of SBML models.
- Support of
observableIds in the
For this tutorial we will need the following package:
|PumasQSP||PumasQSP is used to formulate the QSP workflow.|
First we prepare the environment by listing the package we are using within the example.
# Packages used for this example using PumasQSP
For the purpose of this tutorial, we will use a very simple model of receptor-ligand association that triggers the expression of some protein of interest. If you want to use the files in this tutorial as a template for your own petab problem, simply call
To define an SBML model from scratch, we recommend Copasi, RuleBender or any other modeling tool of your choice with SBML export functionality. SBML is a widely used format, so it is quite likely that your favourite modeling environment can exprot SBML. However, PumasQSP also allows you to use ModelingToolkit ODESystems instead of SBML models:
import_petab(my_problem.yaml, models = my_odesys)
Let us now define four different trials in a condition table.
This table contains a column for the
conditionId and columns for one state and two parameters. If states are provided to columns of the condition table, their initial conditions from the SBML model are overridden with the indicated values. Here, we are not setting numeric values, but override with parameters that we will have to define in the parameter table. The parameter table will also allow us to choose whether these parameters are fixed or estimated. For the purpose of this tutorial, we will investigate conditions where
Rec is either at normal levels or overexpressed. We will combine this with a drug treatment that has a known effect on the receptor ligand assosiation (
kAs) and an potential effect of unknown magnitude on the degradation rate of the protein of interest
kDePOI parameter. I.e.
kDePOI will have to be estimated from our measurements, as we will indicate in the paramter table.
Note: To work with TSV files, you can add the Edit csv extension to your JuliaHub VS Code IDE, or simply use Microsoft Excel or any other spreadsheet editor locally and copy-paste your TSV files into the Explorer of the VS Code IDE.
Before we can define measurements, we need to specify what we are measuring. This is because in many cases, we cannot directly observe the model states, but rather some metric that is a direct function of the model state. For the purpose of this tutorial, we assume that we can observe two different metrics.
|1||obs_Lig_total||offset_Lig + scale_Lig * (Lig + RecLig)||lin||normal||noise_additive + noise_multiplicative * obs_Lig_total|
|2||obs_POI||offset_POI + scale_POI * POI||log||laplace||noise_additive + noise_multiplicative * obs_POI|
observableFormula specifies how the observables are calculated from the states. For instance,
obs_Lig_total is equal to
offset_Lig + scale_Lig * (Lig + RecLig), where
RecLig are states of the SBML model.
scale_Lig are parameters. Again, we could have chosen numeric values for these parameters. However, here we chose strings that we will have to define in the parameter table, where we can also specify if we want to estimate them or set them to a fixed
noiseDistribution tells us the distribution of the residuals (allowed values are
normal (default) and
laplace). PEtab assumes a mean of zero and a standard deviation (for
normal) or scale (for
laplaceian) noise as defined in the
noiseFormula column. To obtain the equivalent of a simple least-square objective function, we could simply set the
1. However, in our example, we want to assume an additive and a multiplicative noise component. The corresponding parameters
noise_multiplicative will have to be defined in the parameter table, where we can define if these parameters shall be estimated or not.
Optionally, PEtab also allows you to define an
observableTransformation, meaning that observables and datapoints are transformed prior to calculation of the residuals. Allowed values are
PumasQSP will use the information from the observable table to calculate the likelihood of the measurement given the simulation.
The measurement table defines the measured data for each observable, trial and time point.
If there is more than one row with identical
time the losses of all
measurements are added up.
To complete the specification of the inverse problem, we have to create a parameter table.
This parameter table has to include all parameters from the SBML model that we want to estimate, plus the parameters we have introduced in the condition and observable tables as
parameterIds. For each parameter, we also have to define
In this particular model, we see that some parameter values are negative. This is because this particular SBML model assumes parameters are log-transformed. In general, we recommend defining the biological parameter is linear space and making use of the
parameterScale column of the paramter table for transformations (allowed values are
The nominal value will be used to set non-estimated parameters to a fixed value and depending on the optimization algorithm may be used as initial guess for estimated parameters. For the purpose of this tutorial, we want to estimate
- all parameters in the SBML model.
- all observation and noise parameters introduced in the observation table.
- the unknown side effect of our drug on
kDePOI_sideeffect) and its baseline value (
- the unknown level of receptor expression (
The perturbations describing
- the effect of the drug on
are assumed to be known and shall not be estimated. This is indicated in the
Finally, the path to all files is indicated in a YAML file. Note that an optimisation problem may contain identical-length lists of all files except for the parameter table. This is because an optimisation problem can only have one parameter vector, but multiple conditions, simulations, observables and even model.
format_version: 1 parameter_file: parameters.tsv problems: - condition_files: - conditions.tsv measurement_files: - measurements.tsv observable_files: - observables.tsv sbml_files: - model.xml
Once the PEtab problem is defined, it can be imported with only one line of code.
invprob = import_petab(joinpath("reclig_petab", "petab.yaml"))
InverseProblem for ##SBML#11895 with 4 trials optimizing 14 parameters.
Note that PumasQSP assumes valid PEtab problems.
import_petab does currently not perform validity checks. We recommend using the official petablint Command Line tool for validity checks. Note that using
observableIds in the
noiseFormula will lead to failing checks. However, this failure can be ignored, as PumasQSP supports
observableIds in the
You may find that the default values for trials do not work for you. For instance, you may want to chance the solver or tolerances. This can be easily done with
using OrdinaryDiffEq solver = TRBDF2() invprob = remake_trials(invprob; alg=solver)
Now you are all set to take the imported
invprob through the PumasQSP modelling pipeline, including calibrating the model, generating a virtual population, subsampling the virtual population and plotting the results. You can also export the solution back to PEtab.
If you have created a virtual population from a PEtab problem, you can also export the
VpopResult back to PEtab with
In principle, perturbations of the dynamic system such as medical treatments can be handled with barebones PEtab/SBML using the condition table and SBML events. PumasQSP further increases the degree of mudularization by defining these perturbations in a separate dosing table.
Like all other files that make up the PEtab problem, the path of the dosing table has to be indicated in the YAML file.
dosing_files: - my_doses.tsv
ID of the simulation condition to which the dose is applied.
SBML species ID that is affected by the dosing.
Amount or concentration (depending on the units of the affected species) added to the system over the whole
durationof the dosing event.
Start time of the dosing event.
Duration of the dosing.
0corresponds to a Bolus, positive values to an infusion with a rate of
- How can I represent replicate measurements?
- Just duplicate the time-points for which replicate measurements are available in the measurement table. As with any replicate measurement, all corresponding residuals will enter the cost function.
- What if one or more of the covariates of a patient have changed between replicate experiments? Can I treat this as the same patient?
- No, this is not possible. A change in covariates means that the patient has become a different person from the perspective of the optimization problem. However, different patients can share fixed or estimated covariates.
- How can I set or estimate initial conditions u0?
- Simply include a column with the SBML species ID in the contition table and put numeric values or parameterId from the parameter table in the cells.
- 1Leonard Schmiester et al., "PEtab—Interoperable specification of parameter estimation problems in systems biology.", PLoS computational biology, 2021.