Paper digest: Simulation-based Inference for Partially Observed Markov Process Models with Spatial Coupling (Kidus Asfaw)


This is the PhD dissertation of Kidus Asfaw. Here I record some notes from reading it. There is also a preprint based on this dissertation: Partially observed Markov processes with spatial structure via the R package spatPomp.

In this thesis, Kidus Asfaw introduces the SpatPOMP model class and its associated R package, spatPomp. He also describes in detail several algorithms for calibrating the parameters, e.g. GIRF, UBF, ABF, and EnKF. While studying the thesis, I found that there may be other useful (perhaps better, in some cases) algorithms developed more recently, e.g. IBPF (this is mentioned in the preprint but not in the thesis). There are also slides online about IBPF.

In the sections below, I take notes from the thesis and paper that I think will be helpful to me.

SpatPOMP models and their representation in spatPomp

SpatPOMP models

  • Suppose there are $U$ units labelled $1:U=\{1,2,\dots,U\}$.
  • Let $t_1<t_2<\cdots<t_N$ be the collection of times at which measurements are taken.
  • We observe a measurement $y_{u,n}^{*}$ at time $t_n$ on unit $u$.
  • We postulate a latent stochastic process $\bm{X}_{n} = (X_{1,n},\dots,X_{U,n})$, where $X_{u,n}$ is the latent state of unit $u$ at time $t_n$.
  • The observation $y_{u,n}^{*}$ is modeled as a realization of an observable random variable $Y_{u,n}$.
  • The process $\bm{X}{0:N} = (\bm{X}_0, \bm{X}_1, …, \bm{X}_N )$ is required to have the Markov property, i.e., $\bm{X}{0:n−1}$ and $\bm{X}_{n+1:N}$ are conditionally independent given $\bm{X}_n$

Implementation of SpatPOMP models

  • spatPomp extends pomp by the addition of unit-level specification of the measurement model.
  • There are five unit-level functionalities of class ‘spatPomp’ objects: dunit_measure, runit_measure, eunit_measure, vunit_measure and munit_measure.
  • Data and observation times: The only mandatory arguments to the spatPomp() constructor are data (class data.frame object), times, units and t0.
  • Initial conditions: initial value parameters (IVPs) are components of $\theta$ whose sole function is to specify $\bm{X}_0$, which is a draw from the initial distribution $f_{\bm{X}_0}(\bm{x}_0;\theta)$.
  • Parameters involved in the transition density or measurement density are called regular parameters (RPs).
  • Covariates: In spatPomp, covariate processes can be supplied as a class data.frame object to the covar argument of the spatPomp() constructor function. This data.frame requires a column for time, spatial unit, and each of the covariates.
  • Specifying model components using C snippets:
    • the names of the parameters and latent variables must be supplied to spatPomp using the paramnames and unit_statenames arguments.
    • unit-specific variable names can be supplied as needed via arguments to spatPomp_Csnippet. These can be used to specify the five unit_measure model components which specify properties of the spatially structured measurement model characteristic of a SpatPOMP.
    • For a unit_measure Csnippet, automatically defined variables also include the number of units, $\text{U}$, and the current unit index $\text{u}$, an integer ranging from 0 to $\text{U}-1$.
    • A spatPomp Csnippet for rprocess will typically involve a computation looping through the units, which requires access to location data used to specify the interaction between units.
    • The location data can be made available to the Csnippet using the globals argument.
  • Simulation is carried out by simulate() which requires specification of rprocess and rmeasure.

Likelihood evaluation

Four filters

In the spatiotemporal context, successful particle filtering requires state-of-the-art algorithms. Below, we introduce four such algorithms implemented in the spatPomp package: a guided intermediate resampling filter (GIRF) implemented as girf, an adapted bagged filter (ABF) implemented as abf, an ensemble Kalman filter (EnKF) implemented as enkf, and a block particle filter (BPF) implemented as bpfilter.

The filtering problem can be decomposed into two steps, prediction and filtering. For all the filters we consider here, the prediction step involves simulating from the latent process model. The algorithms differ primarily in their approaches to the filtering step, also known as the data assimilation step or the analysis step.
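In symbols (suppressing $\theta$), the two steps are the standard filtering recursion, which I note here for reference:

\[f_{\bm{X}_n|\bm{Y}_{1:n-1}}(\bm{x}_n\mid \bm{y}^*_{1:n-1}) = \int f_{\bm{X}_n|\bm{X}_{n-1}}(\bm{x}_n\mid\bm{x}_{n-1})\, f_{\bm{X}_{n-1}|\bm{Y}_{1:n-1}}(\bm{x}_{n-1}\mid \bm{y}^*_{1:n-1})\, d\bm{x}_{n-1}\]

\[f_{\bm{X}_n|\bm{Y}_{1:n}}(\bm{x}_n\mid \bm{y}^*_{1:n}) \propto f_{\bm{Y}_n|\bm{X}_n}(\bm{y}^*_n\mid\bm{x}_n)\, f_{\bm{X}_n|\bm{Y}_{1:n-1}}(\bm{x}_n\mid \bm{y}^*_{1:n-1})\]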

  • For the basic particle filter (PF), the filtering step is a weighted resampling of the prediction particles, and the instability of these weights in high dimensions is the fundamental scalability issue with the algorithm.
  • GIRF carries out this resampling at many intermediate timepoints with the goal of breaking an intractable resampling problem into a sequence of tractable ones.
  • EnKF estimates variances and covariances of the prediction simulations, and carries out an update rule that would be exact for a Gaussian system.
  • BPF carries out the resampling independently over a partition of the units, aiming for an inexact but numerically tractable approximation.
  • ABF combines together many high-variance filters using local weights to beat the curse of dimensionality.
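To see why global weights collapse while block-local weights stay usable, here is a toy numpy sketch (my own illustration, not spatPomp code; all names and the synthetic weights are made up) contrasting the PF resampling step with the BPF one:

```python
import numpy as np

rng = np.random.default_rng(1)

def ess(log_w):
    """Effective sample size of a set of log weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def global_resample(particles, unit_logliks, rng):
    """PF step: one weight per particle, the product over all U units.
    In high dimensions these product weights degenerate."""
    log_w = unit_logliks.sum(axis=1)
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    idx = rng.choice(len(w), size=len(w), p=w)
    return particles[idx]

def block_resample(particles, unit_logliks, blocks, rng):
    """BPF step: resample each block of units independently with its own
    local weights, gluing blocks from different particles together."""
    out = particles.copy()
    for b in blocks:
        log_w = unit_logliks[:, b].sum(axis=1)
        w = np.exp(log_w - log_w.max()); w /= w.sum()
        idx = rng.choice(len(w), size=len(w), p=w)
        out[:, b] = particles[idx][:, b]
    return out

# J = 100 particles, U = 20 units; unit_logliks[j, u] plays the role of
# log f(y*_u | X_u) for particle j (synthetic values here).
J, U = 100, 20
particles = rng.normal(size=(J, U))
unit_logliks = rng.normal(scale=2.0, size=(J, U))
blocks = np.array_split(np.arange(U), 10)

global_ess = ess(unit_logliks.sum(axis=1))  # typically collapses toward 1
block_ess = np.mean([ess(unit_logliks[:, b].sum(axis=1)) for b in blocks])
```

The gluing step is the source of BPF's bias: a resampled particle can combine blocks that never coexisted in one prediction simulation, which is also why conservation laws across blocks can be broken.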

Considerations for choosing a filter

Of the four filters described above, only GIRF provides an unbiased estimate of the likelihood. However, GIRF has relatively weak theoretical scaling support, beating the curse of dimensionality only in the impractical situation of an ideal guide function (Park and Ionides 2020). EnKF, ABF and BPF gain scalability by making different approximations that may or may not be appropriate for a given situation.

EnKF has low variance but is relatively sensitive to deviations from normality. BPF can break conservation laws satisfied by the latent process, such as a constraint on the total population in all units; ABF satisfies such constraints but has been found to have higher variance than BPF on some benchmark problems (Ionides et al. 2021). For the measles model built by measles(), BPF and ABF have been found to perform better than EnKF and GIRF (Ionides et al. 2021).

Through personal communication with Prof. Ionides, I learned that the IBPF algorithm seems efficient and suitable for my study. On a new problem, it is advantageous to compare various algorithms to reveal unexpected limitations of the different approximations inherent in each algorithm.

Likelihood maximization and inference for SpatPOMP models

We focus on iterated filtering methods (Ionides et al. 2015) which provide a relatively simple way to coerce filtering algorithms to carry out parameter inference, applicable to the general class of SpatPOMP models considered by spatPomp.

The main idea of iterated filtering is to extend a POMP model to include dynamic parameter perturbations. Repeated filtering, with parameter perturbations of decreasing magnitude, approaches the maximum likelihood estimate.
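The perturb-filter-cool loop can be sketched on a deliberately trivial one-parameter problem (my own toy illustration in Python, not spatPomp's igirf/ibpf; the model, particle count, and cooling factor are all made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: latent state X_n ~ Normal(theta, 1) at each time, observed as
# y*_n ~ Normal(x_n, 1); the true parameter is theta = 3.
true_theta = 3.0
y_star = rng.normal(true_theta, np.sqrt(2.0), size=50)  # marginal sd sqrt(2)

J = 200                                # particles
theta = rng.normal(0.0, 5.0, size=J)   # initial parameter swarm
sigma = 1.0                            # initial perturbation sd
for m in range(30):                    # repeated filtering passes
    for y in y_star:
        theta = theta + rng.normal(0.0, sigma, size=J)  # perturb parameters
        x = rng.normal(theta, 1.0)                      # simulate the process
        log_w = -0.5 * (y - x) ** 2                     # measurement log-weights
        w = np.exp(log_w - log_w.max()); w /= w.sum()
        idx = rng.choice(J, size=J, p=w)                # resample
        theta = theta[idx]
    # geometric cooling: the perturbed model approaches the fixed-parameter model
    sigma *= 0.9
theta_hat = theta.mean()  # swarm mean approaches the MLE, near theta = 3
```

The selection step pulls parameter particles toward values that explain the data, and the shrinking perturbations let the swarm settle rather than keep diffusing.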

Iterated block particle filter (IBPF): Any estimated parameter (whether shared or unit-specific) must be coded as a unit-specific parameter in order to apply this method. The spatiotemporal perturbations are used only as an optimization tool for model parameters which are fixed through time and space (for shared parameters) or just through time (for unit-specific parameters). The algorithm uses decreasing perturbation magnitudes so that the perturbed model approaches the fixed-parameter model as the optimization proceeds. An example model compatible with ibpf is constructed by the he10() function. This builds a measles model similar to the measles() example discussed in Section 6, with the difference that the user can select which parameters are unit-specific.

Data analysis tools on a toy model

We illustrate key capabilities of spatPomp using the bm10 model for correlated Brownian motion.

Computing the likelihood

bpfilter(bm10, Np=2000, block_size=2)
enkf(bm10, Np=2000)

These generate objects of class bpfiltered_spatPomp and enkfd_spatPomp, respectively. A plot method provides diagnostics, and the resulting log-likelihood estimate is extracted by logLik.

Parameter inference

We start with a test of igirf, estimating the parameters $ρ$, $σ$ and $τ$ but not the initial value parameters. We use a computational intensity variable, i, to switch between algorithmic parameter settings. For debugging, testing and code development we use i=1. For a final version of the manuscript, we use i=2.

start_params <- c(rho = 0.8, sigma = 0.4, tau = 0.2,
  X1_0 = 0, X2_0 = 0, X3_0 = 0, X4_0 = 0, X5_0 = 0,
  X6_0 = 0, X7_0 = 0, X8_0 = 0, X9_0 = 0, X10_0 = 0)
i <- 2
ig1 <- igirf(bm10, params = start_params,
  cooling.type = "geometric"
  # remaining algorithmic settings omitted in these notes
)

ig1 is an object of class igirfd_spatPomp, which inherits from class girfd_spatPomp. A useful diagnostic of the parameter search is a plot of the parameter estimates during the course of an igirf() run. Each iteration within an igirf run provides a parameter estimate and a likelihood evaluation at that estimate. The plot method for a class igirfd_spatPomp object shows the convergence record of parameter estimates and their likelihood evaluations.

Monte Carlo profiles

Proper interpretation of a parameter estimate requires understanding its uncertainty. Here, we construct a profile likelihood 95% confidence interval for the coupling parameter, $ρ$, in the bm10 model. This entails calculation of the maximized likelihood over all parameters excluding $ρ$, for a range of fixed values of $ρ$. We use Monte Carlo adjusted profile (MCAP) methodology to accommodate Monte Carlo error in maximization and likelihood evaluation (Ionides et al. 2017; Ning et al. 2021).

In practice, we carry out multiple searches for each value of $ρ$, with the other parameters drawn at random from a specified hyperbox. We build this box on a transformed scale suitable for optimization: it is generally convenient to optimize non-negative parameters on a log scale and (0, 1)-valued parameters on a logit scale. We set this up using the pomp function profile_design, taking advantage of the partrans method defined by the partrans argument to spatPomp.
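As a reminder of what these transformations do, here is a plain-Python sketch of the log/logit idea (my own illustration of the estimation scale, not spatPomp's partrans internals; the function names are made up):

```python
import math

# Estimation scale: log for non-negative parameters (sigma, tau),
# logit for the (0,1)-valued coupling parameter (rho).
def to_est(name, value):
    if name in ("sigma", "tau"):
        return math.log(value)
    if name == "rho":
        return math.log(value / (1.0 - value))  # logit
    return value

def from_est(name, value):
    if name in ("sigma", "tau"):
        return math.exp(value)
    if name == "rho":
        return 1.0 / (1.0 + math.exp(-value))   # inverse logit
    return value

# The hyperbox of starting values is built on the estimation scale, e.g.
# halving/doubling sigma corresponds to -/+ log(2) on the log scale.
```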

bm10 <- spatPomp(bm10,
  partrans = parameter_trans(log = c("sigma", "tau"), logit = c("rho")),
  paramnames = c("sigma", "tau", "rho")
)

This provides access to the partrans method which we use when constructing starting points for the search:

theta_lo_trans <- partrans(bm10, coef(bm10), dir="toEst") - log(2)
theta_hi_trans <- partrans(bm10, coef(bm10), dir="toEst") + log(2)
profile_design(
  rho = seq(from=0.2, to=0.6, length=20),  # profile grid (exact values not recorded in these notes)
  lower = partrans(bm10, theta_lo_trans, dir="fromEst")[c("sigma","tau")],
  upper = partrans(bm10, theta_hi_trans, dir="fromEst")[c("sigma","tau")],
  nprof = 5                                # placeholder value
) -> pd

The argument nprof sets the number of searches, each started at a random starting point, for each value of the profiled parameter, rho.

Here, we demonstrate using igirf (for likelihood maximization) and enkf (for likelihood evaluation).

foreach (p=iter(pd,"row"), .combine=dplyr::bind_rows) %dopar% {
  ig2 <- igirf(ig1, params=p,
    rw.sd=rw_sd(sigma=0.02, tau=0.02))  # perturb only the estimated parameters
  ef <- replicate(switch(i,2,10), enkf(ig2, Np=switch(i,50,2000)))
  ll <- logmeanexp(sapply(ef, logLik), se=TRUE)
  data.frame(as.list(coef(ig2)), loglik=ll[1], loglik.se=ll[2])
} -> rho_prof

rho_mcap <- mcap(rho_prof[,"loglik"],parameter=rho_prof[,"rho"])
rho_mcap$ci
[1] 0.2568569 0.5083083

Above, calling igirf on ig1 imports all the previous algorithmic settings except for those that we explicitly modify. Each row of rho_prof now contains a parameter estimate and its log likelihood, with $ρ$ values fixed along a grid. The MCAP 95% confidence interval constructed by mcap uses loess to obtain a smoothed estimate of the profile likelihood function and then determines a confidence interval using a cutoff based on the delta method applied to a local quadratic regression. This cutoff is typically slightly larger than the asymptotic 1.92 cutoff for a standard profile likelihood confidence interval constructed assuming error-free likelihood maximization and evaluation. Note that the data in bm10 are generated from a model with $ρ = 0.4$.
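The logmeanexp step above averages the replicated likelihood estimates on the natural scale. A minimal Python sketch of that computation (the jackknife standard error shown here is one common choice, assumed rather than copied from the package's internals):

```python
import math

def logmeanexp(logliks):
    """Log of the average of likelihoods supplied on the log scale;
    subtracting the max first avoids overflow."""
    m = max(logliks)
    return m + math.log(sum(math.exp(x - m) for x in logliks) / len(logliks))

def logmeanexp_se(logliks):
    """Jackknife standard error for logmeanexp (a common choice)."""
    n = len(logliks)
    jack = [logmeanexp(logliks[:i] + logliks[i+1:]) for i in range(n)]
    mean_jack = sum(jack) / n
    return math.sqrt((n - 1) / n * sum((j - mean_jack) ** 2 for j in jack))

# Averaging likelihoods exp(0) = 1 and exp(log 3) = 3 gives mean 2,
# so logmeanexp([0, log 3]) equals log 2.
```

Averaging on the natural scale keeps the estimator unbiased for the likelihood itself, which is why a log-of-means rather than a mean-of-logs is used.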

A spatiotemporal model of measles transmission

A spatPomp data analysis may consist of the following major steps: (i) obtain data, postulate a class of models that could have generated the data and bring these two pieces together via a call to spatPomp(); (ii) employ the tools of likelihood-based inference, evaluating the likelihood at specific parameter sets, maximizing likelihoods under the postulated class of models, constructing Monte Carlo adjusted confidence intervals, or performing likelihood ratio hypothesis tests of nested models; (iii) criticize the model by comparing simulations to data, or by considering rival models.

The measles() and he10() models are similar, but differ in details. For measles(), the data consist of biweekly counts and parameters are all shared between units, matching the analysis of Park and Ionides (2020) and Ionides et al. (2021). For he10(), data are weekly and parameters can be shared or unit-specific, matching the analysis of He et al. (2010) and Ionides et al. (2022). We write the model for the general case where all parameters are unit-specific, noting that it is a relevant data analysis question to determine when parameter dependence on u can be omitted.

$μ_{EI,u}$ is the rate at which an individual in $E$ progresses to $I$ in city $u$, so $1/μ_{EI,u}$ is the mean latent period. Similarly, $1/μ_{IR,u}$ is the mean infectious period.

The transmission rate $\overline\beta_{u}$ is parameterized as $\overline\beta_u = R_{0,u}(μ_{IR,u} + μ_{D})$ with $R_{0,u}$ being the basic reproduction number; $v_{u\tilde{u}} = g_uV_{u\tilde{u}}$ is the number of travelers from city $u$ to city $\tilde{u}$ (symmetric). $μ_{SE,u}$ is the disease transmission rate, parameterized as:

\[\mu_{SE,u}(t) = \overline\beta_u\, seas_u(t)\left[\left(\frac{I_u(t)+\iota_{u}}{P_u(t)}\right)^{\alpha_{u}}+\sum_{\tilde{u}\ne{u}}\frac{v_{u\tilde{u}}}{P_u(t)}\left\{ \left(\frac{I_{\tilde{u}}(t)}{P_{\tilde{u}}(t)}\right)^{\alpha_{\tilde{u}}} - \left(\frac{I_{u}(t)}{P_{u}(t)}\right)^{\alpha_{u}} \right\} \right]\frac{d\Gamma_{SE,u}}{dt}\]

$seas_u(t)$ is a periodic step function taking value $(1 − A_u)$ during school vacations and $(1 + 0.2411A_u/0.7589)$ during school terms, defined so that the average value of $seas_u(t)$ is 1; $ι_u$ describes infected individuals arriving from outside the study population; $α_u$ is an exponent describing non-homogeneous mixing of individuals; the multiplicative white noise $\frac{dΓ_{SE,u}}{dt}$ is a derivative of a gamma process $Γ_{SE,u}(t)$ having independent gamma distributed increments. Multiplicative white noise provides a way to model over-dispersion, a phenomenon where data variability is larger than can be explained by binomial or Poisson approximations. Over-dispersion on a multiplicative scale is also called environmental stochasticity, or logarithmic noise, or extra-demographic stochasticity.
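The mean-one normalization of $seas_u(t)$ can be checked directly: with a vacation fraction 0.2411 and term fraction 0.7589 of the year (the constants used in the measles rprocess code), the time average is 1 for any amplitude $A_u$. A quick check:

```python
# seas_u(t) takes (1 - A) for the vacation fraction p_vac of the year
# and (1 + A * p_vac / p_term) for the term fraction p_term, so the
# amplitude contributions cancel and the time average is exactly 1.
p_vac, p_term = 0.2411, 0.7589

def seas_average(A):
    term_value = 1.0 + A * p_vac / p_term
    vac_value = 1.0 - A
    return p_term * term_value + p_vac * vac_value
```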

To model the observation process, we define $Y_{u,n}$ as a normal approximation to an over-dispersed binomial sample of $C_{u,n}$ with reporting rate $ρ_u$.

\[Y_{u,n} \sim \mathrm{Normal}\big(ρ_{u}c_{u,n},\ ρ_u(1−ρ_u)c_{u,n} + τ^2ρ^2_{u}c_{u,n}^2\big)\]

where $τ$ is a measurement overdispersion parameter.

Construction of a measles spatPomp object

We construct the model described in Section 6.1 for the simplified situation where $α_u = 1$, $ι_u = 0$ and $c_u = 0$.

measles6 <- spatPomp(
  units = 'city',  # column name in the data
  times = 'year'   # column name in the data
  # data, t0 and other arguments omitted in these notes
)

Internally, unit names are mapped to an index $1,…,U$. The number assigned to each unit can be checked by inspecting its position in unit_names(measles6).

First, we suppose that we have covariate time series, consisting of census population, $P_u(t)$, and lagged birthrate, $b_u(t − t_b)$ (as used in $\mu_{BS}$), in a class data.frame object called measles_covar. The required format is similar to the data argument, though the times do not have to correspond to observation times since spatPomp will interpolate the covariates as needed.

We define the movement matrix $(v_{u\tilde{u}})_{u,\tilde{u}\in 1:U}$ as a global variable in C, accessible to all model components via the globals argument to spatPomp().

measles_globals <- spatPomp_Csnippet("
  const double V[6][6] = {{0}}; // placeholder; the actual entries are omitted in these notes
")

We now construct a Csnippet for initializing the latent process at time $t_0$. This is done using unit-specific IVPs. Here, the IVPs are S1_0, ... ,S6_0, E1_0, ... ,E6_0, and I1_0, ... ,I6_0. These code for the initial values of the corresponding states, S1, ... ,S6, E1, ... ,E6, and I1, ... ,I6. Additional book-keeping states, C1, ... ,C6, count accumulated cases during an observation interval and so are initialized to zero. The arguments unit_ivpnames = c('S','E','I') and unit_statenames = c('S','E','I','C') enable spatPomp() to expect these variables and define them as needed when compiling the Csnippets. Similarly, unit_covarnames = 'P' declares the corresponding unit-specific population covariate. This is demonstrated in the following Csnippet specifying rinit.

measles_rinit <- spatPomp_Csnippet(
  unit_statenames = c('S','E','I','C'),
  unit_ivpnames = c('S','E','I'),  # these expand to S_0, E_0, I_0 below
  unit_covarnames = c('P'),
  code = "
    for (int u=0; u<U; u++) {
      S[u] = round(P[u]*S_0[u]);
      E[u] = round(P[u]*E_0[u]);
      I[u] = round(P[u]*I_0[u]);
      C[u] = 0;
    }
  "
)

The rprocess Csnippet has to encode only a rule for a single Euler increment from the process model. C definitions are provided by spatPomp for all parameters, state variables, covariates, t, dt and U. Any additional variables required must be declared as C variables within the Csnippet.

measles_rprocess <- spatPomp_Csnippet(
  unit_statenames = c('S','E','I','C'),
  unit_covarnames = c('P','lag_birthrate'),
  code = "
    double beta, seas, Ifrac, mu[7], dN[7];
    int u, v;
    int BS=0, SE=1, SD=2, EI=3, ED=4, IR=5, ID=6;
    beta = R0*(muIR+muD);
    t = (t-floor(t))*365.25;
    seas = (t>=7&&t<=100)||(t>=115&&t<=199)||(t>=252&&t<=300)||(t>=308&&t<=356)
      ? 1.0 + A * 0.2411/0.7589 : 1.0 - A;
    for (u = 0; u < U; u++) {
      Ifrac = I[u]/P[u];
      for (v=0; v < U; v++) if(v != u)
        Ifrac += g * V[u][v]/P[u] * (I[v]/P[v] - I[u]/P[u]);
      mu[BS] = lag_birthrate[u];
      mu[SE] = beta*seas*Ifrac*rgammawn(sigmaSE,dt)/dt;
      mu[SD] = muD;
      mu[EI] = muEI;
      mu[ED] = muD;
      mu[IR] = muIR;
      mu[ID] = muD;
      dN[BS] = rpois(mu[BS]*dt);
      reulermultinom(2, S[u], &mu[SE], dt, &dN[SE]);  // S -> E or death
      reulermultinom(2, E[u], &mu[EI], dt, &dN[EI]);  // E -> I or death
      reulermultinom(2, I[u], &mu[IR], dt, &dN[IR]);  // I -> R or death
      S[u] += dN[BS] - dN[SE] - dN[SD];
      E[u] += dN[SE] - dN[EI] - dN[ED];
      I[u] += dN[EI] - dN[IR] - dN[ID];
      C[u] += dN[EI];
    }
  "
)

Here, we show the Csnippet defining the unit measurement model. The lik variable is pre-defined and is set to the evaluation of the unit measurement density in either the log or natural scale depending on the value of give_log.

measles_dunit_measure <- spatPomp_Csnippet("
  double m = rho*C;
  double v = m*(1.0-rho+psi*psi*m);
  lik = dnorm(cases,m,sqrt(v),give_log);
")

The user may also directly supply dmeasure, which returns the product of the unit-specific measurement densities; this is needed to apply pomp functions that require dmeasure rather than dunit_measure. We create the corresponding Csnippet in measles_dmeasure, but do not display the code here. Next, we construct a Csnippet to code runit_measure.

measles_runit_measure <- spatPomp_Csnippet("
  double cases;
  double m = rho*C;
  double v = m*(1.0-rho+psi*psi*m);
  cases = rnorm(m,sqrt(v));
  if (cases > 0.0) cases = nearbyint(cases);
  else cases = 0.0;
")

We also construct, but do not display, a Csnippet measles_rmeasure coding the class ‘pomp’ version, rmeasure. Next, we supply some preset parameter values.

IVPs <- rep(c(0.032,0.00005,0.00004,0.96791),each=6)
names(IVPs) <- paste0(rep(c('S','E','I','R'),each=6),1:6,"_0")
measles_params <- c(R0=30, A=0.5, muEI=52, muIR=52, muD=0.02,
  alpha=1, sigmaSE=0.01, rho=0.5, psi=0.1, g=1500, IVPs)

measles6 <- spatPomp(
  data = measles6,
  covar = measles_covar,
  unit_statenames = c('S','E','I','R','C'),
  unit_accumvars = c('C'),
  paramnames = names(measles_params),
  rinit = measles_rinit,
  rprocess = euler(measles_rprocess, delta.t=1/365),
  dunit_measure = measles_dunit_measure,
  eunit_measure = measles_eunit_measure,
  vunit_measure = measles_vunit_measure,
  runit_measure = measles_runit_measure,
  dmeasure = measles_dmeasure,
  rmeasure = measles_rmeasure,
  globals = measles_globals
)