Collocation Methods
Introduction
Many methods for solving inverse problems involving ODEs require costly ODE solves at each iteration to compute the loss. For example, for the L2 loss:
$\mathrm{Loss}_{\mathrm{L2}} = \left\|u - \hat{u}_{\mathrm{solver}}(u_0, t,p) \right\|_2$
where $u$ is the observed data and $\hat{u}_{\mathrm{solver}}(u_0, t, p)$ is the ODE solver's prediction of $u$, given initial conditions $u_0$, timespan $t$ and parameters $p$.
To reduce the computational cost, a collocation method may be used [3], [1]. Collocation methods circumvent the need for an ODE solver by estimating the derivatives of the data directly; the loss is then computed between these estimated derivatives and the system dynamics. For example:
$\mathrm{Loss}_{\mathrm{L2}} = \left\|\hat{u}' - f(u, t,p) \right\|_2$
where $\hat{u}'$ denotes the estimated derivatives of the observed data.
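As a minimal sketch of the difference, the snippet below evaluates such a collocation loss for a hypothetical Lotka-Volterra system, assuming observed data `u` (states × snapshots), derivative estimates `du_est` and a time grid `t` are already available; the dynamics are written in the $f(u, t, p)$ argument order used above.

```julia
using LinearAlgebra

# Hypothetical Lotka-Volterra dynamics, written in the f(u, t, p) order used above.
f(u, t, p) = [p[1] * u[1] - p[2] * u[1] * u[2],
              -p[3] * u[2] + p[4] * u[1] * u[2]]

# Collocation loss: compare the estimated derivatives (one column per snapshot)
# against the dynamics evaluated on the data; no ODE solve is required.
collocation_loss(p, u, du_est, t) =
    norm(du_est - reduce(hcat, f(u[:, i], t[i], p) for i in eachindex(t)))
```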
JuliaSimModelOptimizer implements three different collocation methods to estimate the derivative of the data: Kernel Collocation, Spline Collocation and Noise Robust Collocation.
Kernel Collocation
JuliaSimModelOptimizer.KernelCollocate — Type

KernelCollocate(; maxiters = nothing, maxtime = nothing, optimizer = Optim.LBFGS(linesearch = BackTracking()), kernel = EpanechnikovKernel())
A kernel based collocation method for solving the inverse problem. The derivatives of the data are estimated with a local linear regression using a kernel. The loss is then generated from the error between the estimated derivatives and the time derivatives predicted by solving the ODE system.
Keyword Arguments
- maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
- optimizer: An Optimization.jl algorithm for the global optimization. Defaults to NLopt.LD_LBFGS().
- kernel: The kernel used to estimate the derivatives of the data. For more information, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/
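As an illustration (not taken from the package manual), the algorithm object can be built from the keywords documented above roughly as follows; the `invprob` inverse problem and the `calibrate` call are placeholders for whatever setup your JuliaSimModelOptimizer workflow uses.

```julia
using JuliaSimModelOptimizer
using Optim, LineSearches   # provides LBFGS and BackTracking for the optimizer keyword

# Kernel collocation algorithm built from the documented keywords.
alg = KernelCollocate(maxiters = 1000,
                      optimizer = Optim.LBFGS(linesearch = BackTracking()),
                      kernel = EpanechnikovKernel())

# Hypothetical usage: pass the algorithm to the package's calibration routine
# together with a previously constructed inverse problem.
# result = calibrate(invprob, alg)
```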
Theory
Assume that each snapshot $y_k$ of the time series $\mathbf{y}$ can be approximated by a polynomial of degree $p$ in the neighborhood of the time $t_i$ at which we want the derivative:
\[y_k \approx \sum_{j=0}^{p} \alpha_j (t_k - t_i)^{j}\]
where $\alpha_j$ are the polynomial coefficients and $t_k$ is the time stamp of snapshot $y_k$.
Following Eq. 2.9 in [2], we obtain an estimate of $y_i'$ as
\[\hat{y}_i' = \hat{\alpha}_1\]
where $\hat{\alpha}_1$ is the coefficient corresponding to $j = 1$ in $\hat{\alpha} = (\hat{\alpha}_0, \ldots, \hat{\alpha}_p)^{\top}$, the vector that minimizes the weighted least squares problem:
\[\min_{\alpha}\left(\left(\mathbf{y} - \mathbf{T}\alpha\right)^{\top}\mathbf{W}\left(\mathbf{y} - \mathbf{T}\alpha\right)\right)\]
where
\[\mathbf{T}=\left[\begin{array}{cccc} 1 & t_1-t_i & \ldots & \left(t_1-t_i\right)^p \\ \vdots & \vdots & & \vdots \\ 1 & t_n-t_i & \cdots & \left(t_n-t_i\right)^p \end{array}\right]\]
and $\mathbf{W}=\operatorname{diag}\left(K_h\left(\mathbf{t}-t_i\right)\right)$, where $K_h$ is the kernel function with bandwidth $h$ and $\mathbf{t}$ is the vector of time stamps.
To calculate $\hat{\alpha}$, we use the standard weighted least squares solution:
\[\hat{\alpha} = \left(\mathbf{T}^{\top}\mathbf{W}\mathbf{T}\right)^{-1} \mathbf{T}^{\top}\mathbf{W}\mathbf{y}\]
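To make this concrete, here is a small self-contained sketch (a didactic reconstruction, not the package's internal code) that estimates $y_i'$ at a single time stamp by solving the weighted least squares problem above, using the Epanechnikov kernel and an arbitrary bandwidth:

```julia
using LinearAlgebra

# Epanechnikov kernel and its bandwidth-scaled version K_h.
k_epa(t) = abs(t) > 1 ? 0.0 : 0.75 * (1 - t^2)
K_h(t, h) = k_epa(t / h) / h

# Estimate y'(t_i) from samples (t, y) with a local polynomial of degree p.
function local_poly_derivative(t, y, i, h; p = 2)
    T = [(t[k] - t[i])^j for k in eachindex(t), j in 0:p]   # design matrix T
    W = Diagonal(K_h.(t .- t[i], h))                        # kernel weight matrix W
    α̂ = (T' * W * T) \ (T' * W * y)                         # weighted least squares solution
    return α̂[2]                                             # coefficient for j = 1, i.e. the derivative estimate
end

# Usage: noisy samples of sin(t); the estimate at index 50 should be close to cos(t[50]).
t = collect(range(0, 2π, length = 100))
y = sin.(t) .+ 0.01 .* randn(length(t))
local_poly_derivative(t, y, 50, 0.3)
```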
The implementation in JuliaSimModelOptimizer uses a polynomial of degree 1 to smooth the data and a polynomial of degree 2 to estimate the derivatives of the data.
For reference, see [3]. For a more in-depth treatment, see Sections 2.1-2.3 and 3.1-3.8 in [2].
Available Kernels
The table below shows the available kernels in JuliaSimModelOptimizer:

| Kernel | Equation |
|---|---|
| EpanechnikovKernel (default) | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 0.75 \cdot (1 - t^2), & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| UniformKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 0.5, & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| TriangularKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 1 - \mathrm{abs}(t), & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| QuarticKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 15 \cdot (1 - t^2)^2 / 16, & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| TriweightKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 35 \cdot (1 - t^2)^3 / 32, & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| TricubeKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ 70 \cdot (1 - \mathrm{abs}(t)^3)^3 / 80, & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| GaussianKernel | $k(t) = \exp(-0.5 \cdot t^2) / \sqrt{2 \pi}$ |
| CosineKernel | $k(t) = \begin{cases} 0, & \mathrm{abs}(t) > 1 \\ \pi \cdot \cos(\pi \cdot t / 2) / 4, & \mathrm{abs}(t) \leq 1 \end{cases}$ |
| LogisticKernel | $k(t) = 1 / (\exp(t) + 2 + \exp(-t))$ |
| SigmoidKernel | $k(t) = 2 / (\pi \cdot (\exp(t) + \exp(-t)))$ |
| SilvermanKernel | $k(t) = \sin(\mathrm{abs}(t) / 2 + \pi / 4) \cdot 0.5 \cdot \exp(-\mathrm{abs}(t) / \sqrt{2})$ |
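Each $k(t)$ is evaluated on the bandwidth-scaled distance, i.e. $K_h(t_k - t_i) = k((t_k - t_i)/h)/h$ as in the theory section. As a sketch of how the table reads as code (a direct, assumed transcription of three rows):

```julia
# Direct transcriptions of three table rows; t is already scaled by the bandwidth.
uniform(t)    = abs(t) > 1 ? 0.0 : 0.5
triangular(t) = abs(t) > 1 ? 0.0 : 1 - abs(t)
gaussian(t)   = exp(-0.5 * t^2) / sqrt(2π)

# Example: triangular weights for five equally spaced scaled distances.
triangular.(-1.0:0.5:1.0)   # [0.0, 0.5, 1.0, 0.5, 0.0]
```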
Spline Collocation
JuliaSimModelOptimizer.SplineCollocate — Type

SplineCollocate(; maxiters = nothing, maxtime = nothing, optimizer = Optim.LBFGS(linesearch = BackTracking()), interp = DataInterpolations.CubicSpline)
A spline based collocation method for solving the inverse problem. The derivatives of the data are estimated by interpolating the data with a spline. The loss is then generated from the error between the estimated derivatives and the time derivatives predicted by solving the ODE system.
Keyword Arguments
- maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
- optimizer: An Optimization.jl algorithm for the global optimization. Defaults to BFGS().
- interp: The interpolation function used to estimate the derivatives of the data. For more information, see: DataInterpolations.jl
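As a sketch of what the spline-based derivative estimate looks like, assuming hypothetical scalar data and the CubicSpline interpolant (the derivative call is DataInterpolations.jl's generic derivative function, not a JuliaSimModelOptimizer API):

```julia
using JuliaSimModelOptimizer
using DataInterpolations

# Hypothetical noisy scalar observations on a time grid.
t = collect(range(0, 2π, length = 50))
u = sin.(t) .+ 0.01 .* randn(length(t))

# Interpolate the data and differentiate the interpolant at each time stamp (≈ cos.(t)).
itp = CubicSpline(u, t)
du_est = DataInterpolations.derivative.(Ref(itp), t)

# The corresponding algorithm object built from the documented keywords.
alg = SplineCollocate(maxiters = 1000, interp = DataInterpolations.CubicSpline)
```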
Noise Robust Collocation
JuliaSimModelOptimizer.NoiseRobustCollocate — Type

NoiseRobustCollocate(; maxiters = nothing, maxtime = nothing, diff_iters, dx, α, optimizer = Optim.LBFGS(linesearch = BackTracking()))
A noise robust collocation method for solving the inverse problem. The derivatives of the data are estimated using total variational regularized numerical differentiation (tvdiff). The loss is then generated from the error between the estimated derivatives and the time derivatives predicted by the ODE solver.
For more information about the tvdiff implementation, see: NoiseRobustDifferentiation.jl
Keyword Arguments
- maxiters: Required. Maximum number of iterations when using the method with a time stepping optimizer.
- optimizer: An Optimization.jl algorithm for the global optimization. Defaults to BFGS().
- diff_iters: Required. Number of iterations to run the main loop of the differentiation algorithm.
- α: Required. Regularization parameter. Higher values increase regularization strength and improve conditioning.
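As a sketch, the snippet below shows the underlying tvdiff call from NoiseRobustDifferentiation.jl on hypothetical noisy data, together with an algorithm object built from the documented keywords (the regularization strength, iteration counts, and grid spacing are illustrative guesses):

```julia
using JuliaSimModelOptimizer
using NoiseRobustDifferentiation

# Hypothetical noisy observations on a uniform grid with spacing dx.
dx = 0.01
t  = 0:dx:2π
u  = sin.(t) .+ 0.05 .* randn(length(t))

# Total variation regularized derivative estimate (should be close to cos.(t)).
du_est = tvdiff(u, 100, 0.2; dx = dx)

# The corresponding collocation algorithm built from the documented keywords.
alg = NoiseRobustCollocate(maxiters = 1000, diff_iters = 100, dx = dx, α = 0.2)
```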
References
- [1] Roesch, Elisabeth, Christopher Rackauckas, and Michael P. H. Stumpf. "Collocation based training of neural ordinary differential equations." Statistical Applications in Genetics and Molecular Biology 20, no. 2 (2021): 37-49.
- [2] Fan, Jianqing, and Irene Gijbels. Local Polynomial Modelling and Its Applications. Routledge, 2018. https://doi.org/10.1201/9780203748725
- [3] Liang, Hua, and Hulin Wu. "Parameter estimation for differential equation models using a framework of measurement error in regression models." Journal of the American Statistical Association 103, no. 484 (2008): 1570-1583. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631937/