# Paper digest: Simulation-based Inference for Partially Observed Markov Process Models with Spatial Coupling (Kidus Asfaw)

**Last modified:**Post views:

This dissertation is the PhD thesis from Kidus Asfaw. Here I recorded some notes when I read it. There is also a preprint paper of this dissertation: Partially observed Markov processes with spatial structure via the R package spatPomp.

Basically in this thesis, Kidus Asfaw introduced the **SpatPomp** model and its associated package. He also described several algorithms for calibrating the parameters e.g. GIRF, UBF, ABF, EnKF etc. in details in the paper. While I was studying this paper, I also found that there maybe some other useful (maybe better in some cases?) algorithms recently developed, e.g. IBPF (This was mentioned in the preprint paper but not in the thesis). There is also slides online about IBPF.

In the below section, I will take some notes which I think would be helpful to me from the thesis and paper.

## SpatPOMP models and their representation in spatPomp

### SpatPOMP models

- Suppose there are $U$ units labelled $1:U={1,2,…U}$.
- Let $t_1<t_2<t_3<\cdots<t_N$ be a collection of time at which measurements are taken.
- We observe a measurement $y_{u,n}^{*}$ at time $t_n$ on unit $u$.
- We postulate a latent stochastic process $\bm{X}
*{n} = (X*{1,n},…,X_{U,n})$ on unit $u$ at time $t_n$. - The observation $y_{u,n}^{*}$ is modeled as a realization of an observable random variable $Y_{u,n}$.
- The process $\bm{X}
*{0:N} = (\bm{X}_0, \bm{X}_1, …, \bm{X}_N )$ is required to have the Markov property, i.e., $\bm{X}*{0:n−1}$ and $\bm{X}_{n+1:N}$ are conditionally independent given $\bm{X}_n$

### Implementation of SpatPOMP models

**spatPomp**extends**pomp**by the addition of unit-level specification of the measurement model.- There are five unit-level functionalities of class ‘spatPomp’ objects:
`dunit_measure`

,`runit_measure`

,`eunit_measure`

,`vunit_measure`

and`munit_measure`

. - Data and observation times: The only mandatory arguments to the spatPomp() constructor are
`data`

(class`data.frame`

object),`times`

,`units`

and`t0`

. - Initial conditions:
*initial value parameters*(IVPs) are components of the $\theta$ having the sole function of specifying $\bm{X}*0$. $\bm{X}_0$ is a draw from the initial distribution $f*{\bm{X}_0}(\bm{x}_0;\theta)$. - Parameters involved in the transition density or measurement density are called
*regular parameters*(RPs) - Covariates: In
**spatPomp**, covariate processes can be supplied as a class`data.frame`

object to the covar argument of the`spatPomp()`

constructor function. This data.frame requires a column for time, spatial unit, and each of the covariates. - Specifying model components using C snippets:
- the names of the parameters and latent variables must be supplied to spatPomp using the
`paramnames`

and`unit_statenames`

arguments. - unit-specific variable names can be supplied as needed via arguments to spatPomp_Csnippet. These can be used to specify the five
`unit_measure`

model components which specify properties of the spatially structured measurement model characteristic of a SpatPOMP. - For a
`unit_measure`

Csnippet, automatically defined variables also include the number of units, $\text{U}$, and an integer $\text{u}$ corresponding to a numeric unit from 0 to $\text{U}-1$. - A
**spatPomp**Csnippet for`rprocess`

will typically involve a computation looping through the units, which requires access to location data used to specify the interaction between units. - The location data can be made available to the Csnippet using the
`globals`

argument.

- the names of the parameters and latent variables must be supplied to spatPomp using the
- Simulation is carried out by
`simulate()`

which requires specification of`rprocess`

and`rmeasure`

.

## Likelihood evaluation

### Four filters

In the spatiotemporal context, successful particle filtering requires state-of-the-art algorithms. Below, we introduce four such algorithms implemented in the **spatPomp** package: a guided intermediate resampling filter (GIRF) implemented as `girf`

, an adapted bagged filter (ABF) implemented as `abf`

, an ensemble Kalman filter (EnKF) implemented as `enkf`

, and a block particle filter (BPF) implemented as `bpfilter`

.

The filtering problem can be decomposed into two steps, **prediction** and **filtering**. For all the filters we consider here, the prediction step involves simulating from the latent process model. The algorithms differ primarily in their approaches to the filtering step, also known as the data assimilation step or the analysis step.

- For PF, the filtering step is a weighted resampling from the prediction particles, and the instability of these weights in high dimensions is the fundamental scalability issue with the algorithm.
- GIRF carries out this resampling at many intermediate timepoints with the goal of breaking an intractable resampling problem into a sequence of tractable ones.
- EnKF estimates variances and covariances of the prediction simulations, and carries out an update rule that would be exact for a Gaussian system.
- BPF carries out the resampling independently over a partition of the units, aiming for an inexact but numerically tractable approximation.
- ABF combines together many high-variance filters using local weights to beat the curse of dimensionality.

### Considerations for choosing a filter

Of the four filters described above, only GIRF provides an unbiased estimate of the likeli- hood. However, GIRF has a relatively weak theoretical scaling support, beating the curse of dimensionality only in the impractical situation of an ideal guide function (Park and Ionides 2020). EnKF, ABF and BPF gain scalability by making different approximations that may or may not be appropriate for a given situation.

**EnKF** has low variance but is relatively sensitive to deviations from normality. **BPF** can break conservation laws satisfied by the latent process, such as a constraint on the total population in all units; **ABF** satisfies such constraints but has been found to have higher variance than BPF on some benchmark problems (Ionides et al. 2021). For the measles model built by `measles()`

, **BPF and ABF** have been found to perform better than EnKF and GIRF (Ionides et al. 2021).

Through personal communication with Prof Iondies, I learned that the IBPF algorithm seems efficient and suitable for my study. On a new problem, it is advantageous to compare various algorithms to reveal unexpected limitations of the different approximations inherent in each algorithm.

## Likelihood maximization and inference for SpatPOMP models

We focus on iterated filtering methods (Ionides et al. 2015) which provide a relatively simple way to coerce filtering algorithms to carry out parameter inference, applicable to the general class of SpatPOMP models considered by spatPomp.

The main idea of iterated filtering is to extend a POMP model to include dynamic parameter perturbations. Repeated filtering, with parameter perturbations of decreasing magnitude, approaches the maximum likelihood estimate.

Iterated block particle filter (IBPF): Any estimated parameter (whether shared or unit-specific) must be coded as a unit-specific parameter in order to apply this method. The spatiotemporal perturbations are used only as an optimization tool for model parameters which are fixed though time and space (for shared parameters) or just through time (for unit-specific parameters). The algorithm uses decreasing perturbation magnitudes so that the perturbed model approaches the fixed parameter model as the optimization proceeds. An example model compatible with `ibpf`

is constructed by the `he10()`

function. This builds a measles model similar to the `measles()`

example discussed in Section 6, with the difference that the user can select which parameters are unit-specific.

## Data analysis tools on a toy model

We illustrate key capabilities of spatPomp using the `bm10`

model for correlated Brownian motion.

### Computing the likelihood

```
girf(bm10,Np=500,Nguide=50,Ninter=5,lookahead=1)
bpfilter(bm10, Np=2000, block_size=2)
enkf(bm10, Np=2000)
```

This generates objects of class `girfd_spatPomp`

, `bpfiltered_spatPomp`

and `enkfd_spatPomp`

respectively. A plot method provides diagnostics, and the resulting log-likelihood estimate is extracted by `logLik`

.

### Parameter inference

We start with a test of `igirf`

, estimating the parameters $ρ$, $σ$ and $τ$ but not the initial value parameters. We use a computational intensity variable, `i`

, to switch between algorithmic parameter settings. For debugging, testing and code development we use `i=1`

. For a final version of the manuscript, we use `i=2`

.

```
start_params <- c(rho = 0.8, sigma = 0.4, tau = 0.2,
X1_0 = 0, X2_0 = 0, X3_0 = 0, X4_0 = 0, X5_0 = 0,
X6_0 = 0, X7_0 = 0, X8_0 = 0, X9_0 = 0, X10_0 = 0)
i <- 2
ig1 <- igirf(
bm10,
params=start_params,
Ngirf=switch(i,2,50),
Np=switch(i,10,1000),
Ninter=switch(i,2,5),
lookahead=1,
Nguide=switch(i,5,50),
rw.sd=rw.sd(rho=0.02,sigma=0.02,tau=0.02),
cooling.type = "geometric",
cooling.fraction.50=0.5
)
```

`ig1`

is an object of class `igirfd_spatpomp`

which inherits from class `girfd_spatpomp`

. A useful diagnostic of the parameter search is a plot of the change of the parameter estimates during the course of an `igirf()`

run. Each iteration within an `igirf`

run provides a parameter estimate and a likelihood evaluation at that estimate. The plot method for a class `igirfd_spatPomp`

object shows the convergence record of parameter estimates and their likelihood evaluations.

### Monte Carlo profiles

Proper interpretation of a parameter estimate requires understanding its uncertainty. Here, we construct a profile likelihood 95% confidence interval for the coupling parameter, $ρ$, in the `bm10`

model. This entails calculation of the maximized likelihood over all parameters excluding $ρ$, for a range of fixed values of $ρ$. We use Monte Carlo adjusted profile (MCAP) methodology to accommodate Monte Carlo error in maximization and likelihood evaluation (Ionides et al. 2017; Ning et al. 2021).

In practice, we carry out multiple searches for each value of $ρ$, with other parameters drawn at random from a specified hyperbox. We build this box on a transformed scale suitable for optimization, taking advantage of the `partrans`

method. It is generally convenient to optimize non-negative parameters on a log scale and (0, 1) valued parameter on a logit scale. We set this up using the **pomp** function `profile_design`

, taking advantage of the `partrans`

method defined by the `partrans`

argument to `spatPomp`

.

```
bm10 <- spatPomp(bm10,
partrans = parameter_trans(log = c("sigma", "tau"), logit = c("rho")),
paramnames = c("sigma","tau","rho")
)
```

This provides access to the partrans method which we use when constructing starting points for the search:

```
theta_lo_trans <- partrans(bm10,coef(bm10),dir="toEst") - log(2)
theta_hi_trans <- partrans(bm10,coef(bm10),dir="toEst") + log(2)
profile_design(
rho=seq(from=0.2,to=0.6,length=10),
lower=partrans(bm10,theta_lo_trans,dir="fromEst"),
upper=partrans(bm10,theta_hi_trans,dir="fromEst"),
nprof=switch(i,2,10)
) -> pd
```

The argument `nprof`

sets the number of searches, each started at a random starting point, for each value of the profiled parameter, `rho`

.

here, we demonstrate using `igirf`

(for likelihood maximization) and `enkf`

(for likelihood evaluation).

```
foreach (p=iter(pd,"row"),.combine=dplyr::bind_rows) %dopar% {
library(spatPomp)
ig2 <- igirf(ig1,params=p,rw.sd=rw.sd(sigma=0.02,tau=0.02))
ef <- replicate(switch(i,2,10),enkf(ig2,Np=switch(i,50,2000)))
ll <- sapply(ef,logLik)
ll <- logmeanexp(ll,se=TRUE)
data.frame(as.list(coef(ig2)),loglik=ll[1],loglik.se=ll[2])
} -> rho_prof
rho_mcap <- mcap(rho_prof[,"loglik"],parameter=rho_prof[,"rho"])
rho_mcap$ci
[1] 0.2568569 0.5083083
```

Above, calling `igirf`

on `ig1`

imports all the previous algorithmic settings except for those that we explicitly modify. Each row of `rho_prof`

now contains a parameter estimate its log likelihood, with $ρ$ values fixed along a grid. The MCAP 95% confidence interval constructed by `mcap`

uses `loess`

to obtain a smoothed estimate of the profile likelihood function and then determines a confidence interval using by a cutoff based on the delta method applied to a local quadratic regression. This cutoff is typically slightly larger than the asymptotic 1.92 cutoff for a standard profile likelihood confidence interval constructed assuming error-free likelihood maximization and evaluation. Note that the data in bm10 are generated from a model with $ρ = 0.4$.

## A spatiotemporal model of measles transmission

A **spatPomp** data analysis may consist of the following major steps: **(i)** obtain data, postulate a class of models that could have generated the data and bring these two pieces together via a call to spatPomp(); **(ii)** employ the tools of likelihood-based inference, evaluating the likelihood at specific parameter sets, maximizing likelihoods under the postulated class of models, constructing Monte Carlo adjusted confidence intervals, or performing likelihood ratio hypothesis tests of nested models; **(iii)** criticize the model by comparing simulations to data, or by considering rival models.

The `measles()`

and `he10()`

models are similar, but differ in details. For `measles()`

, the data consist of biweekly counts and parameters are all shared between units, matching the analysis of Park and Ionides (2020) and Ionides et al. (2021). For `he10()`

, data are weekly and parameters can be shared or unit-specific, matching the analysis of He et al. (2010) and Ionides et al. (2022). We write the model for the general case where all parameters are unit-specific, noting that it is a relevant data analysis question to determine when parameter dependence on u can be omitted.

$μ_{EI,u}$ is the rate at which an individual in $E$ progresses to $I$ in city $u$, $1/μ_{EI,u}$ is called mean disease latency. Similarly, $1/μ_{IR,u}$ is the mean infectious period.

The transmission rate $\overline\beta_{u}$ is parameterized as $\overlineβ_u = R_{0,u}(μ_{IR,u} + μ_{D})$ with $R_{0,u}$ being the basic reproduction rate; $v_{u\tilde{u}} = g_uV_{u\tilde{u}}$ is the number of travelers from city $u$ to city $\tilde{u}$ (symmetric). $μ_{SE,u}$ is the disease transmission rate, parameterized as:

\[\mu_{SE,u}(t) = \overline\beta_u seas_u(t)\left[{(\frac{I_u(t)+\iota_{u}}{P_u(t)})^{\alpha_{u}}+\sum_{\tilde{u}\ne{u}}\frac{v_{u\tilde{u}}}{P_u(t)}\{ (\frac{I_{\tilde{u}}(t)}{P_{\tilde{u}}(t)})^{a_{\tilde{u}}} - (\frac{I_{u}(t)}{P_{u}(t)})^{a_{u}} \}} \right]\frac{d\Gamma_{SE,u}}{dt}\]$seas_u(t)$ is a periodic step function taking value $(1 − A_u)$
during school vacations and $(1 + 0.381 A_u)$ during school terms, defined so that the average
value of $seas_u(t)$ is 1; $ι_u$ describes infected individuals arriving from *outside the study population*; $α_u$ is an exponent describing non-homogeneous mixing of individuals; the multiplicative white noise $\frac{dΓ_{SE,u}}{dt}$ is a derivative of a gamma process $Γ_{SE,u}(t)$ having independent gamma distributed increments. Multiplicative white noise provides a way to model over-dispersion, a phenomenon where data variability is larger than can be explained by binomial or Poisson approximations. Over- dispersion on a multiplicative scale is also called environmental stochasticity, or logarithmic noise, or extra-demographic stochasticity.

To model the observation process, we define $Y_{u,n}$ as a normal approximation to an over-dispersed binomial sample of $C_{u,n}$ with reporting rate ρu.

\[Y_{u,n} ∼ Normal(ρ_{u}c_{u,n}, ρ(1−ρ_u)c_{u,n} + τ^2ρ^2_{u}c_{u,n}^2)\]where $τ$ is a measurement overdispersion parameter.

### Construction of a measles spatPomp object

We construction the model described in Section 6.1 for the simplified situation where $α_u = 1$, $ι_u = 0$ and $c_u = 0$.

```
measles6 <- spatPomp(
data=measles_cases,
units='city', # colnames in table
times='year', # colnames in table
t0=min(measles_cases$year)-1/26
)
```

Internally, unit names are mapped to an index $1,…,U$. The number assigned to each unit can be checked by inspecting their position in `unit_names(measles)`

.

First, we suppose that we have covariate time series, consisting of census population, $P_u(t)$, and lagged birthrate, $b_u(t − t_b)$ (as used in $\mu_{BS}$), in a class `data.frame`

object called `measles_covar`

. The required format is similar to the data argument, though the times do not have to correspond to observation times since **spatPomp** will interpolate the covariates as needed.

we define the movement matrix $(v_{u,\tilde{u}})_{u,\tilde{u}\in 1:U}$ as a global variable in C that will be accessible to all
model components, via the `globals`

argument to `spatPomp()`

.

```
measles_globals <- spatPomp_Csnippet("
const double V[6][6] = {
{0,2.42,0.950,0.919,0.659,0.786},
{2.42,0,0.731,0.722,0.412,0.590},
{0.950,0.731,0,1.229,0.415,0.432},
{0.919,0.722,1.229,0,0.638,0.708},
{0.659,0.412,0.415,0.638,0,0.593},
{0.786,0.590,0.432,0.708,0.593,0}
};
")
```

We now construct a Csnippet for initializing the latent process at time $t_0$. This is done using unit-specific IVPs. Here, the IVPs are `S1_0, ... ,S6_0, E1_0, ... ,E6_0, and I1_0, ... ,I6_0.`

These code for the initial value of the corresponding states, `S1, ... ,S6, E1, ... ,E6, and I1, ... ,I6.`

Additional book-keeping states, `C1, ... ,C6`

, count accumulated cases during an observation interval and so are initialized to zero. The arguments `unit_ivpnames = c(’S’,’E’,’I’)`

and `unit_statenames = c(’S’,’E’,’I’,’C’)`

enable `spatPomp()`

to expect these variables and define then as needed when compiling the Csnippets. Similarly, unit_covarnames = ’P’ declares the corresponding unit-specific population covariate. This is demonstrated in the following Csnippet specifying rinit.

```
measles_rinit <- spatPomp_Csnippet(
unit_statenames = c('S','E','I','C'),
unit_ivpnames = c('S','E','I'), # these three corresponds to S_0, E_0, I_0 below
unit_covarnames = c('P'),
code = "
for (int u=0; u<U; u++) {
S[u] = round(P[u]*S_0[u]);
E[u] = round(P[u]*E_0[u]);
I[u] = round(P[u]*I_0[u]);
C[u] = 0; +}"
)
```

The `rprocess`

Csnippet has to encode only a rule for a single Euler increment from the process model. C definitions are provided by spatPomp for all parameters, state variables, covariates, `t`

, `dt`

and `U`

. Any additional variables required must be declared as C variables within the Csnippet.

```
measles_rprocess <- spatPomp_Csnippet(
unit_statenames = c('S','E','I','C'),
unit_covarnames = c('P','lag_birthrate'),
code="
double beta, seas, Ifrac, mu[7], dN[7];
int u, v;
int BS=0, SE=1, SD=2, EI=3, ED=4, IR=5, ID=6;
beta = R0*(muIR+muD);
t = (t-floor(t))*365.25;
seas = (t>=7&&t<=100)||(t>=115&&t<=199)||(t>=252&&t<=300)||(t>=308&&t<=356)
? 1.0 + A * 0.2411/0.7589 : 1.0 - A;
for(u=0;u<U;u++){
Ifrac = I[u]/P[u];
for (v=0; v < U ; v++) if(v != u)
Ifrac += g * V[u][v]/P[u] * (I[v]/P[v] - I[u]/P[u]);
mu[BS] = lag_birthrate[u];
mu[SE] = beta*seas*Ifrac*rgammawn(sigmaSE,dt)/dt;
mu[SD] = muD;
mu[EI] = muEI;
mu[ED] = muD;
mu[IR] = muIR;
mu[ID] = muD;
dN[BS] = rpois(mu[BS]*dt);
reulermultinom(2,S[u],&mu[SE],dt,&dN[SE]);
reulermultinom(2,E[u],&mu[EI],dt,&dN[EI]);
reulermultinom(2,I[u],&mu[IR],dt,&dN[IR]);
S[u] += dN[BS] - dN[SE] - dN[SD];
E[u] += dN[SE] - dN[EI] - dN[ED];
I[u] += dN[EI] - dN[IR] - dN[ED];
C[u] += dN[EI];
}
"
)
```

Here, we show the Csnippet defining the unit measurement model. The `lik`

variable is pre-defined and is set to the evaluation of the unit measurement density in either the log or natural scale depending on the value of `give_log`

.

```
measles_dunit_measure <- spatPomp_Csnippet("
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
lik = dnorm(cases,m,sqrt(v),give_log);
")
```

The user may also directly supply `dmeasure`

that returns the product of unit-specific measurement densities. The latter is needed to apply **pomp** functions which require dmeasure rather than `dunit_measure`

. We create the corresponding Csnippet in `measles_dmeasure`

, **but do not display the code here**. Next, we construct a Csnippet to code `runit_measure`

```
measles_runit_measure <- spatPomp_Csnippet("
double cases;
double m = rho*C;
double v = m*(1.0-rho+psi*psi*m);
cases = rnorm(m,sqrt(v));
if (cases > 0.0) cases = nearbyint(cases);
else cases = 0.0;
")
```

We also construct, **but do not display**, a Csnippet `measles_rmeasure`

coding the class ‘pomp’ version rmeasure. Next we fill the model with some preset parameters.

```
IVPs <- rep(c(0.032,0.00005,0.00004,0.96791),each=6)
names(IVPs) <- paste0(rep(c('S','E','I','R'),each=6),1:6,"_0")
measles_params <- c(R0=30,A=0.5,muEI=52,muIR=52,muD=0.02, + alpha=1,sigmaSE=0.01,rho=0.5,psi=0.1,g=1500,IVPs)
measles6 <- spatPomp(
data = measles6,
covar = measles_covar,
unit_statenames = c('S','E','I','R','C'),
unit_accumvars = c('C'),
paramnames = names(measles_params),
rinit = measles_rinit,
rprocess = euler(measles_rprocess, delta.t=1/365),
dunit_measure = measles_dunit_measure,
eunit_measure = measles_eunit_measure,
vunit_measure = measles_vunit_measure,
runit_measure = measles_runit_measure,
dmeasure = measles_dmeasure,
rmeasure = measles_rmeasure,
globals = measles_globals
)
```

** Tags: **
Infectious Disease Modelling

** Categories: **
Paper digest

## Comments