Collocation Methods

Introduction

Many methods for solving inverse problems involving ODEs require a costly ODE solve at each iteration to compute the loss, e.g. for the L2 loss:

$\mathrm{Loss}_{\mathrm{L2}} = \left\|u - \hat{u}_{\mathrm{solver}}(u_0, t,p) \right\|_2$

Where $u$ is the observed data and $\hat{u}_{\mathrm{solver}}(u_0, t,p)$ is the ODE solver's prediction of $u$, given the initial conditions $u_0$, the timespan $t$ and the parameters $p$.

To reduce this computational cost, a collocation method may be used [3], [1]. Collocation methods circumvent the need for an ODE solver by estimating the derivatives of the data; the loss is then computed between these estimated derivatives and the system dynamics, e.g.

$\mathrm{Loss}_{\mathrm{L2}} = \left\|\hat{u}' - f(u, t,p) \right\|_2$

Where $\hat{u}'$ denotes the estimated derivatives of the observed data.
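As a minimal illustration of the idea (a sketch, not the package's implementation), the snippet below evaluates such a collocation loss for a Lotka-Volterra model; the derivative estimates `du_est` are assumed to have been produced by one of the methods described below, and all names are illustrative.

```julia
# Right-hand side of a Lotka-Volterra model: du/dt = f(u, p, t)
function lotka_volterra(u, p, t)
    α, β, γ, δ = p
    x, y = u
    return [α * x - β * x * y, δ * x * y - γ * y]
end

# Collocation loss: L2 norm of the residual between the estimated derivatives
# `du_est` (d × n matrix) and the model right-hand side evaluated on the data `u`.
function collocation_loss(du_est, u, t, p, f)
    sq = 0.0
    for i in eachindex(t)
        sq += sum(abs2, du_est[:, i] - f(u[:, i], p, t[i]))
    end
    return sqrt(sq)
end
```

Note that no ODE solve is needed: evaluating `f` on the observed states is all the loss requires, which is where the computational savings come from.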

JuliaSimModelOptimizer implements three different collocation methods to estimate the derivative of the data: Kernel Collocation, Spline Collocation and Noise Robust Collocation.

Kernel Collocation

JuliaSimModelOptimizer.KernelCollocate (Type)
KernelCollocate(; maxiters=nothing, maxtime=nothing,  optimizer = Optim.LBFGS(linesearch = BackTracking()), kernel = EpanechnikovKernel())

A kernel-based collocation method for solving the inverse problem. The derivatives of the data are estimated with a local linear regression using a kernel. The loss is then computed from the error between these estimated derivatives and the time derivatives given by the model's right-hand side.

Keyword Arguments

  • maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
  • optimizer: An Optimization.jl algorithm for the global optimization. Defaults to Optim.LBFGS(linesearch = BackTracking()).
  • kernel: The kernel used to estimate the derivatives of the data. For more information, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/
source
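As a usage sketch, the method can be constructed with the keyword arguments from the signature above; the `calibrate` call and the previously defined inverse problem `invprob` are assumptions about the surrounding workflow, not part of this docstring.

```julia
using JuliaSimModelOptimizer
using Optim, LineSearches

# Construct the collocation algorithm with an explicit kernel and optimizer choice.
alg = KernelCollocate(maxiters = 1000,
                      optimizer = Optim.LBFGS(linesearch = BackTracking()),
                      kernel = EpanechnikovKernel())

# `invprob` is assumed to be an inverse problem defined elsewhere.
result = calibrate(invprob, alg)
```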

Theory

Assume that, in a neighborhood of the time point $t_i$ corresponding to snapshot $y_i$, the time series $\mathbf{y}$ can be approximated by a polynomial of degree $p$:

\[y(t) \approx \sum_{j=0}^{p} \alpha_j (t - t_i)^{j}\]

Where $\alpha_j$ are the coefficients of the local polynomial expansion around $t_i$.

Following Eq. 2.9 in [2], we obtain an estimate of $y_i'$ as

\[\hat{y}_i' = \hat{\alpha}_1\]

Where $\hat{\alpha}_1$ is the coefficient of the linear term in $\hat{\alpha}$, the vector that minimizes the weighted least squares problem:

\[\min_{\alpha}\left(\left(\mathbf{y} - \mathbf{T}\alpha\right)^{\top}\mathbf{W}\left(\mathbf{y} - \mathbf{T}\alpha\right)\right)\]

Where

\[\mathbf{T}=\left[\begin{array}{cccc} 1 & t_1-t_i & \ldots & \left(t_1-t_i\right)^p \\ \vdots & \vdots & & \vdots \\ 1 & t_n-t_i & \cdots & \left(t_n-t_i\right)^p \end{array}\right]\]

and $\mathbf{W}=\operatorname{diag}\left(K_h\left(\mathbf{t}-t_i\right)\right)$, where $K_h$ is the kernel function with bandwidth $h$ and $\mathbf{t}$ is the vector of time stamps.

To calculate $\hat{\alpha}$, we use the standard weighted least squares solution:

\[\hat{\alpha} = \left(\mathbf{T}^{\top}\mathbf{W}\mathbf{T}\right)^{-1} \mathbf{T}^{\top}\mathbf{W}\mathbf{y}\]

The implementation in JuliaSimModelOptimizer uses a polynomial of degree 1 to smooth the data and a polynomial of degree 2 to estimate the derivatives of the data.

For reference, see [3]. For a more in-depth treatment, see Sections 2.1-2.3 and 3.1-3.8 of [2].
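To make the construction concrete, here is a small self-contained sketch of local polynomial derivative estimation (degree 2, Epanechnikov weights), written directly from the equations above rather than taken from the package source; the bandwidth `h = 0.5` is an arbitrary illustrative choice.

```julia
using LinearAlgebra

# Kernel weight as a function of the scaled distance.
epanechnikov(t) = abs(t) <= 1 ? 0.75 * (1 - t^2) : 0.0

# Local polynomial estimate of y'(t_i) via weighted least squares.
# `y` and `t` are vectors of observations and time stamps.
function local_poly_derivative(y, t, i; h = 0.5, p = 2)
    ti = t[i]
    T = hcat(((t .- ti) .^ j for j in 0:p)...)    # n × (p+1) design matrix
    W = Diagonal(epanechnikov.((t .- ti) ./ h))   # kernel weights with bandwidth h
    α_hat = (T' * W * T) \ (T' * W * y)           # weighted least squares solution
    return α_hat[2]                               # linear-term coefficient ≈ y'(t_i)
end
```

Evaluating `local_poly_derivative(y, t, i)` at every index `i` yields the derivative estimates that enter the collocation loss.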

Available Kernels

The table below lists the kernels available in JuliaSimModelOptimizer.

Kernel | Equation
EpanechnikovKernel (default) | $k(t) = \begin{cases} 0.75 \cdot (1 - t^2), & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
UniformKernel | $k(t) = \begin{cases} 0.5, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
TriangularKernel | $k(t) = \begin{cases} 1 - |t|, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
QuarticKernel | $k(t) = \begin{cases} 15 \cdot (1 - t^2)^2 / 16, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
TriweightKernel | $k(t) = \begin{cases} 35 \cdot (1 - t^2)^3 / 32, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
TricubeKernel | $k(t) = \begin{cases} 70 \cdot (1 - |t|^3)^3 / 80, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
GaussianKernel | $k(t) = \exp(-0.5 \cdot t^2) / \sqrt{2 \pi}$
CosineKernel | $k(t) = \begin{cases} \pi \cdot \cos(\pi \cdot t / 2) / 4, & |t| \leq 1 \\ 0, & |t| > 1 \end{cases}$
LogisticKernel | $k(t) = 1 / (\exp(t) + 2 + \exp(-t))$
SigmoidKernel | $k(t) = 2 / (\pi \cdot (\exp(t) + \exp(-t)))$
SilvermanKernel | $k(t) = 0.5 \cdot \exp(-|t| / \sqrt{2}) \cdot \sin(|t| / 2 + \pi / 4)$
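Each kernel is simply a scalar function of the scaled distance $(t_j - t_i)/h$; two of the table rows transcribed directly into Julia look like this (illustrative transcriptions, not the package's internal definitions).

```julia
# Direct transcriptions of two table rows; t is the scaled distance (t_j - t_i) / h.
tricube(t)  = abs(t) <= 1 ? 70 * (1 - abs(t)^3)^3 / 80 : 0.0
gaussian(t) = exp(-0.5 * t^2) / sqrt(2π)
```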

Spline Collocation

JuliaSimModelOptimizer.SplineCollocate (Type)
SplineCollocate(; maxiters=nothing, maxtime=nothing, optimizer = Optim.LBFGS(linesearch = BackTracking()), interp = DataInterpolations.CubicSpline)

A spline-based collocation method for solving the inverse problem. The derivatives of the data are estimated by interpolating the data with a spline. The loss is then computed from the error between these estimated derivatives and the time derivatives given by the model's right-hand side.

Keyword Arguments

  • maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
  • optimizer: An Optimization.jl algorithm for the global optimization. Defaults to Optim.LBFGS(linesearch = BackTracking()).
  • interp: The interpolation function used to estimate the derivatives of the data. For more information, see: DataInterpolations.jl
source
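The derivative-estimation step can be reproduced directly with DataInterpolations.jl; the snippet below sketches only that step (interpolation plus the package's `derivative` function), with `y` and `t` assumed to be the data vector and its time stamps.

```julia
using DataInterpolations

# y: observed values, t: corresponding time stamps (assumed defined already)
itp = CubicSpline(y, t)                                        # spline interpolant of the data
du_est = [DataInterpolations.derivative(itp, ti) for ti in t]  # estimated derivatives at the data points
```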

Noise Robust Collocation

JuliaSimModelOptimizer.NoiseRobustCollocate (Type)
NoiseRobustCollocate(; maxiters=nothing, maxtime=nothing, diff_iters, dx, α, optimizer = Optim.LBFGS(linesearch = BackTracking()))

A noise-robust collocation method for solving the inverse problem. The derivatives of the data are estimated using total variation regularized numerical differentiation (tvdiff). The loss is then computed from the error between these estimated derivatives and the time derivatives given by the model's right-hand side.

For more information about the tvdiff implementation, see: NoiseRobustDifferentiation.jl

Keyword Arguments

  • maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
  • optimizer: An Optimization.jl algorithm for the global optimization. Defaults to Optim.LBFGS(linesearch = BackTracking()).
  • diff_iters: Required. Number of iterations to run the main loop of the differentiation algorithm.
  • α: Required. Regularization parameter. Higher values increase regularization strength and improve conditioning.
source
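The derivative estimates in this method come from `tvdiff`; called standalone via NoiseRobustDifferentiation.jl, with illustrative parameter values, the step looks roughly like this.

```julia
using NoiseRobustDifferentiation

# y: noisy samples on a uniform grid with spacing dx (assumed defined already)
dx = 0.01
du_est = tvdiff(y, 100, 0.2; dx = dx)  # diff_iters = 100, regularization α = 0.2
```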
  • [1] Roesch, Elisabeth, Christopher Rackauckas, and Michael P. H. Stumpf. "Collocation based training of neural ordinary differential equations." Statistical Applications in Genetics and Molecular Biology 20, no. 2 (2021): 37-49.
  • [2] Fan, Jianqing, and Irene Gijbels. Local Polynomial Modelling and Its Applications. Routledge, 2018. https://doi.org/10.1201/9780203748725
  • [3] Liang, Hua, and Hulin Wu. "Parameter estimation for differential equation models using a framework of measurement error in regression models." Journal of the American Statistical Association 103, no. 484 (2008): 1570-1583. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/