Introduction
Let s_t be the vector of state (predetermined) variables in a DSGE model (we implicitly assume that the set of state variables is finite) and y_t be the vector of observed variables. This last vector may contain both predetermined and non-predetermined variables. We assume that the reduced-form DSGE model can be cast into the following state-space model:
\begin{aligned}
& s_t = f( s_{t-1}, \varepsilon_t; \bm\theta ) \\
& y_t = g( s_t; \bm\theta) + e_t \\
\end{aligned}
with f(.) the set of state equations, g(.) the set of measurement equations, \bm\theta\in\Theta\subseteq \mathbb R^m a vector of m parameters (assumed known for now), and \varepsilon_t and e_t respectively the structural shocks and the additive measurement errors. These innovations are assumed to be Gaussian.
⚠ Measurement errors are mandatory. To estimate a model with a nonlinear filter, you must have as many measurement errors as observed variables; otherwise you will obtain a singularity error (for the covariance matrix of the expectation errors).
This reduced form can be obtained using local or global approximation methods. The functions f(.) and g(.) need not be explicitly defined; the only requirement is an algorithm that updates the state variables and determines the observed variables from the state variables. In Dynare, a k-order perturbation approach is used to build these functions. Given the aforementioned state-space model, the state variables s_t are clearly driven by a first-order Markov process:
p\left(s_t | s_{0:t-1} \right) = p\left(s_t | s_{t-1} \right)
meaning that all the information about s_t is embodied in s_{t-1}, and the observations are conditionally independent:
p\left( y_t | y_{1:t-1}, s_{0:t} \right) = p\left(y_t | s_t \right)
These two properties imply many simplifications that considerably alleviate the derivation of the nonlinear filters.
The likelihood of the model is the density of the sample y_{1:T} = \left\{ y_t \right\}_{t=1}^{T} conditional on the parameters \bm\theta, which, in principle, can be written as a product of conditional densities of y_t|y_{1:t-1}. Evaluating these densities requires knowledge of the state variables s_t, but in general not all of them are observed. We need to track (infer) the unobserved (latent) state variables, and this is where nonlinear filters come in. A nonlinear filter is a recursive Bayesian algorithm that tracks the dynamics of the distribution of the latent variables. We describe the different filtering algorithms assuming that the parameters \bm\theta are known, keeping the issues related to estimation for later. For the sake of simplicity, we do not express the distributions as functions of \bm\theta; this vector will reappear explicitly when inference about the parameters is discussed.
Using all the available information, the model and the sample, we need to infer the distribution of the latent variables s_t. More formally, we need to build the density of s_t conditional on the sample up to time t, y_{1:t}. This can be done recursively using Bayes' theorem. We have:
p\left( s_t| y_{1:t} \right) = \frac{ p\left( y_t | s_t \right) p\left( s_t | y_{1:t-1} \right)}{p\left(y_t | y_{1:t-1} \right)}
with
p\left( y_t | y_{1:t-1}\right) = \int p\left( y_t | s_t \right)p\left( s_t | y_{1:t-1} \right)\mathrm d s_t
where the density of y_t conditional on s_t, p(y_t|s_t), is interpreted as the likelihood of s_t, while the density of s_t conditional on the sample up to time t-1, p(s_t|y_{1:t-1}), can be interpreted as a prior belief about the state variables at time t given the information available at time t-1.
These equations are not analytically tractable, except for continuous state variables in a linear/Gaussian model or for discrete state variables in a linear/conditionally Gaussian model. In these two cases, the equations allow one to derive the Kalman filter and the Hamilton filter, respectively. In all other cases, approximations are required, such as sequential importance sampling.
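In the linear/Gaussian case, the Bayes recursion above can be carried out in closed form. The following minimal Python sketch works through a scalar example (an AR(1) state observed with noise; the model and all numbers are illustrative, not a DSGE reduced form) and shows the prediction/update cycle that the Kalman filter derives from these equations.

```python
# Minimal scalar Kalman filter: s_t = rho*s_{t-1} + eps_t, y_t = s_t + e_t.
# Illustrates the Bayes recursion in the linear/Gaussian case, where the
# prediction and update densities stay Gaussian. Values are illustrative.
import math

def kalman_step(m, P, y, rho=0.9, q=0.1, r=0.05):
    """One prediction/update step; returns the posterior mean/variance and
    the log of p(y_t | y_{1:t-1})."""
    # Prediction: p(s_t | y_{1:t-1}) = N(m_pred, P_pred)
    m_pred = rho * m
    P_pred = rho**2 * P + q
    # Update via Bayes' rule: p(s_t | y_{1:t}) ∝ p(y_t | s_t) p(s_t | y_{1:t-1})
    S = P_pred + r                     # variance of the prediction error
    K = P_pred / S                     # Kalman gain
    m_post = m_pred + K * (y - m_pred)
    P_post = (1.0 - K) * P_pred
    loglik = -0.5 * (math.log(2 * math.pi * S) + (y - m_pred)**2 / S)
    return m_post, P_post, loglik

m, P, loglik = 0.0, 1.0, 0.0
for y in [0.3, 0.1, -0.2]:
    m, P, li = kalman_step(m, P, y)
    loglik += li
```

Note how the posterior variance P shrinks as observations accumulate, and how the same recursion also delivers the log-likelihood increments.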
Numerical approximations and sequential importance sampling
Assume that the continuous distribution of s_t conditional on y_{1:t} can be approximated by a set of particles \left\{s_t^i\right\}_{i=1:N} and associated weights \left\{w_t^i\right\}_{i=1:N} summing to one. Any moment of this conditional distribution can then be approximated by a weighted average as follows:
\begin{aligned}
\mathbb E_{p(s_t|y_{1:t})} \left[ h(s_t) \right]
&= \int h(s_t)p(s_t | y_{1:t} ) \mathrm d s_t\\
&\approx \sum_{i=1}^N w_t^i h\left(s_t^i\right)
\end{aligned}
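As an illustration, this weighted-average approximation takes only a few lines of Python (the particles and weights below are made-up numbers; the weights sum to one):

```python
# Weighted average over particles approximating E[h(s_t) | y_{1:t}].
particles = [0.5, 1.0, 1.5, 2.0]
weights = [0.1, 0.4, 0.3, 0.2]

def particle_moment(h):
    return sum(w * h(s) for s, w in zip(particles, weights))

mean = particle_moment(lambda s: s)                  # first moment, ≈ 1.3
variance = particle_moment(lambda s: (s - mean)**2)  # ≈ 0.21
```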
If we were able to sample the state variables directly from the distribution of s_t conditional on y_{1:t}, we would approximate the moment with uniform weights: w_t^i=\frac{1}{N} for all i. Since the density of s_t conditional on the sample up to time t is unknown, we cannot do that. An importance sampling algorithm can be used to recover this information and build the non-uniform weights. It consists in choosing an easy-to-sample proposal distribution q\left(s_t| y_{1:t} \right), ideally not too different from p\left(s_t| y_{1:t} \right), and correcting the weights for the difference between the target and the proposal conditional distributions. Given such an easy-to-sample proposal distribution q(s_t| y_{1:t}), we have:
\begin{aligned}
\mathbb E_{p(s_t| y_{1:t})}\left[ h(s_t) \right]
& = \int h(s_t)\frac{p(s_t | y_{1:t})}{q(s_t | y_{1:t})} q(s_t | y_{1:t})\,\mathrm d s_t\\
& = \mathbb E_{q(s_t | y_{1:t})}\left[ \tilde{w}_t(s_t)h(s_t) \right]
\end{aligned}
with normalized weights defined as:
\tilde{w}_t(s_t) \equiv \frac{p(s_t | y_{1:t})}{q(s_t| y_{1:t} )}.
These new weights can be viewed as importance ratios, namely the correction needed because we sample from the proposal distribution. They can be computed recursively if the proposal distribution satisfies the following condition:
q\left(s_t | y_{1:t} \right) = q(s_t|s_{t-1},y_t)q\left(s_{t-1}| y_{1:t-1} \right)
In this case, s_t is simply drawn from q(s_t|s_{t-1},y_t) and the (unnormalized) weights can be recursively computed as:
\hat{w}_t(s_t) \propto \tilde{w}_{t-1}(s_{t-1})\frac{p\left(y_t | s_t \right)p\left(s_t | s_{t-1} \right)}{q\left(s_t| s_{t-1},y_t \right)}
However, a degeneracy problem then occurs: as t increases, all but one particle have negligible weights (especially in large samples). That is why systematic resampling was initially proposed in the literature. It consists in randomly drawing, with replacement, particles from their empirical distribution \left\{\tilde{s}_t^i,\tilde{w}_t^i\right\}_{i=1:N}. This amounts to discarding particles with low weights and replicating particles with high weights, so as to focus on interesting areas of the distribution with a constant number of particles. Doucet et al. proposed measures indicating when resampling is necessary: particles are resampled when the degeneracy measure
N_{eff,t} = \frac{1}{\sum_{i=1}^{N}{(\tilde{w}_t^i)^2}}
falls below a particular threshold (generally \frac{N}{2}).
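The degeneracy measure is straightforward to compute from the normalized weights; a minimal Python illustration (the weight vectors are made-up examples):

```python
# N_eff = 1 / sum_i (w_i^2), computed from normalized weights.
def effective_sample_size(weights):
    return 1.0 / sum(w * w for w in weights)

uniform = [0.25] * 4                   # N_eff = N = 4: no degeneracy
degenerate = [0.97, 0.01, 0.01, 0.01]  # N_eff close to 1: time to resample
```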
We can then state the recursive iterations of a general particle filter with systematic resampling. For all t=1,\ldots,T and all i=1,\ldots,N, knowing \left\{s_{t-1}^i,w_{t-1}^i\right\}_{i=1:N} that approximates p(s_{t-1}|y_{1:t-1}):

- Draw \left\{\tilde{s}_t^i\right\}_{i=1:N} from q(s_t|s_{t-1}^i,y_t).
- Evaluate the weights: \hat{w}_t^i \propto w_{t-1}^i\frac{p(y_t | \tilde{s}_t^i)p(\tilde{s}_t^i | s_{t-1}^i )}{q(\tilde{s}_t^i | s_{t-1}^i,y_t)}.
- In case of systematic resampling, or if N_{eff,t} is lower than the chosen threshold, resample the particles and replace \left\{\tilde{s}_t^i,\tilde{w}_t^i=\frac{\hat{w}_t^i}{\sum_{i=1}^{N}\hat{w}_t^i}\right\} with \left\{ s_t^i,w_t^i=\frac{1}{N} \right\}, which approximates p(s_t|y_{1:t}).
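These recursions can be sketched in Python on a toy nonlinear model (the model, its coefficients and the data below are illustrative, not a DSGE reduced form). The proposal is the transition density, so the incremental weight reduces to p(y_t | \tilde{s}_t^i), and the sketch also accumulates an estimate of the log-likelihood.

```python
# Bootstrap particle filter with systematic resampling on a toy model:
#   s_t = 0.5 s_{t-1} + tanh(s_{t-1}) + eps_t,   y_t = s_t + e_t.
import math, random

random.seed(0)
N, SIG_EPS, SIG_E = 1000, 0.5, 0.3

def loglik_obs(y, s):
    return -0.5 * (math.log(2 * math.pi * SIG_E**2) + (y - s)**2 / SIG_E**2)

def systematic_resample(particles, weights):
    # Kitagawa's scheme: one uniform draw, then N evenly spaced points.
    n = len(particles)
    u = random.random() / n
    out, cum, i = [], weights[0], 0
    for k in range(n):
        while u + k / n > cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
    return out

def bootstrap_filter(data):
    particles = [random.gauss(0.0, 1.0) for _ in range(N)]
    loglik = 0.0
    for y in data:
        # 1. Propagate through the state equation (proposal = transition).
        particles = [0.5 * s + math.tanh(s) + random.gauss(0.0, SIG_EPS)
                     for s in particles]
        # 2. Weight with the measurement density (log scale for stability).
        logw = [loglik_obs(y, s) for s in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        sw = sum(w)
        loglik += m + math.log(sw / N)  # approximates log p(y_t | y_{1:t-1})
        # 3. Resample; weights are reset to 1/N.
        particles = systematic_resample(particles, [x / sw for x in w])
    return loglik

ll = bootstrap_filter([0.2, 0.5, 0.9, 1.2])
```

The log-sum-exp trick in step 2 avoids underflow when the weights are tiny, which happens quickly in larger samples.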
The choice for the proposal explains the diversity of filters implemented in the literature.
Estimation
The likelihood expression
From now on, the parameter vector \bm\theta reappears explicitly in the distribution expressions. In this general framework, the sample likelihood can be written:
p\left(y_{1:T} |\bm \theta \right) = p\left(y_1 | s_0;\bm\theta \right)p\left(s_0 |\bm\theta \right) \prod_{t=2}^{T}{p\left(y_t | y_{1:t-1};\bm\theta \right)}
with the evidence (or unconditional likelihood) p\left(y_t | y_{1:t-1};\bm\theta \right), which can be approximated, for all the filters discussed below, as:
\begin{aligned}
p\left(y_t | y_{1:t-1};\bm\theta \right)
& \approx \sum_{i=1}^{N}{\hat{w}_t^i} \\
& = \sum_{i=1}^{N} {w_{t-1}^i \frac{ p(y_t |\tilde{s}_t^i ; \bm\theta) p(\tilde{s}_t^i | s_{t-1}^i ; \bm\theta)}{q(\tilde{s}_t^i|s_{t-1}^i,y_t ; \bm\theta)}}
\end{aligned}
In the case of the standard proposal, where q(s_t|s_{t-1},y_t;\bm\theta)=p(s_t | s_{t-1};\bm\theta):
p(y_t | y_{1:t-1};\bm\theta )\approx \sum_{i=1}^{N}{ w_{t-1}^i}\, p(y_t | \tilde{s}_t^i ; \bm\theta)
In case of systematic resampling, since w_{t-1}^i = 1/N, we simply get:
p(y_t | y_{1:t-1} ; \bm\theta ) \approx \frac{1}{N} \sum_{i=1}^{N}{ p(y_t | \tilde{s}_t^i ; \bm\theta ) }
While necessary to avoid degeneracy, resampling raises an important issue: it makes maximizing the likelihood or posterior kernel quite difficult (Pitt (2002), Kantas et al. (2015)). Even when the seed for the random draws is fixed across simulations, the traditional likelihood estimator depends on both the resampled particles and the unknown parameters. A small change in the parameter values causes a small change in the importance weights, which will potentially generate a different set of resampled particles. This produces a discontinuity in the likelihood criterion and explains why applied approaches depart from the usual likelihood-based approach.

- A first alternative consists in resampling but maximizing the estimation criterion with gradient-free methods (such as the downhill simplex, simulated annealing, or CMA-ES).
- A second possibility consists in using an MCMC approach to build the posterior distribution of the parameters, using the unbiased likelihood estimator provided by a particle filter. The generic denomination of these methods is P-MCMC (Andrieu et al. (2010)).
- A third alternative consists in considering the structural parameters as extra state variables. Parameters are then estimated with a specific particle filter at each date, which allows treating issues such as structural breaks or changes in behavior (see Yano (2010), Yano et al. (2010) or Chen et al. (2010)). Besides, it is relatively fast since it requires only one pass over the sample. This is the online approach proposed by Liu and West (2001).
Particle MCMC
The idea is to build the posterior distribution of the parameters \bm\theta. From Bayes' rule, we know that the posterior density is proportional to the product of the sample likelihood p\left(y_{1:T} | \bm\theta \right) and the prior density over the parameters p(\bm\theta):
p\left(\bm\theta | y_{1:T} \right) \propto p\left(y_{1:T} | \bm\theta \right) p(\bm\theta)
In the case of a linear model, the sample likelihood is evaluated with a Kalman filter. In the case of a nonlinear model, an unbiased estimator of p\left(y_{1:T} | \bm\theta \right) is provided by particle filtering (Del Moral (2004)). Except for this difference, we can follow the same approach. In an MCMC framework, such as the random-walk Particle Marginal Metropolis-Hastings (PMMH) algorithm, a candidate is drawn from a proposal distribution:
\bm\theta^\star_j= \bm\theta_{j-1} + \epsilon_j
with \epsilon_j \sim \mathcal{N}(0,\gamma_{RW} V(\Theta_0)), where the scale parameter \gamma_{RW} is set so as to obtain an acceptance ratio of around 25%. The posterior distribution can then be approximated using the following acceptance rule:
\bm\theta_j =
\begin{cases}
\bm\theta^\star_j & \text{if $U_{[0,1]} \leq \min
\left\{1,\frac{p\left(\bm\theta^\star_j | y_{1:T}
\right)}{p\left(\bm\theta_{j-1} | y_{1:T} \right)}\right\} $} \\
\bm\theta_{j-1} & \text{otherwise}
\end{cases}
For further developments, see Andrieu et al. (2010) (with discussions).
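A minimal Python sketch of the random-walk PMMH recursion follows. For brevity, the particle-filter likelihood estimator is replaced by a toy exact Gaussian log-likelihood (a stand-in: in practice this is where the filter's unbiased estimate of p(y_{1:T}|\bm\theta) enters), and the prior, scale and data are made up.

```python
# Random-walk (Particle Marginal) Metropolis-Hastings sketch. `loglik` stands
# in for the particle-filter estimate of log p(y_{1:T} | theta).
import math, random

random.seed(1)
data = [0.9, 1.1, 1.0, 1.2, 0.8]

def loglik(theta):
    # Placeholder for the unbiased particle-filter estimator.
    return sum(-0.5 * (math.log(2 * math.pi) + (y - theta)**2) for y in data)

def logprior(theta):
    return -0.5 * theta**2           # standard normal prior (illustrative)

def pmmh(n_draws=2000, scale=0.3):
    theta = 0.0
    post = loglik(theta) + logprior(theta)
    chain = []
    for _ in range(n_draws):
        cand = theta + random.gauss(0.0, scale)            # random-walk step
        cand_post = loglik(cand) + logprior(cand)
        if math.log(random.random()) <= cand_post - post:  # MH acceptance
            theta, post = cand, cand_post
        chain.append(theta)
    return chain

chain = pmmh()
post_mean = sum(chain[500:]) / len(chain[500:])
```

Because the acceptance rule only compares posterior kernels at two parameter values, the discontinuities induced by resampling are harmless here, which is the point of the P-MCMC approach.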
The online approach
This approach has been proposed by Liu and West (2001) as an alternative way to estimate the parameters of nonlinear/non-Gaussian models. Instead of maximizing a likelihood function or posterior kernel, they consider the unknown parameters as extra state variables and reveal them, like any state variable, with a nonlinear filter. For that purpose, they combine the auxiliary particle filter introduced by Pitt and Shephard (1999) with an assumed transition distribution for the parameters, which serves both as an extra set of state equations and as an importance density:
p\left(\bm\theta_t | \bm\theta_{t-1} \right) = \mathcal{N}(m_{t-1},b^2V_{t-1})
This equation produces time-varying parameters (note however that this is not internalized by the agents in the DSGE model, who consider the parameters as fixed when solving for the expectations in each period). The authors adopt a kernel shrinkage technique, based on a parameter a, to produce slowly time-varying parameters and to control their variability. Suppose we have a particle swarm \left\{s_{t-1}^i,\bm\theta_{t-1}^i, w_{t-1}^i\right\}_{i=1:N} from the preceding period. This particle swarm is updated using:
\begin{aligned}
& \bar{\bm\theta}_{t-1} = \sum_{i=1}^{N}{ w_{t-1}^i \bm\theta_{t-1}^i }\\
& m_{t-1}^i = a\bm\theta_{t-1}^i + (1-a)\bar{\bm\theta}_{t-1} \\
& V_{t-1} = \sum_{i=1}^{N}{ w_{t-1}^i
(\bm\theta_{t-1}^i-\bar{\bm\theta}_{t-1} )
(\bm\theta_{t-1}^i-\bar{\bm\theta}_{t-1} )' }\\
& \bar{s}_t^i = f(s_{t-1}^i,0;m_{t-1}^i )
\end{aligned}
As in the auxiliary particle filter, we build a resampling index k^l. The proposals for the parameters \left\{\tilde{\bm\theta}_t^l\right\}_{l=1:N} are drawn from \mathcal{N}(m_{t-1}^{k^l},b^2V_{t-1}). The proposals for the state variables \left\{\tilde{s}_t^l\right\}_{l=1:N} are drawn from p(s_t | s_{t-1}^{k^l},\tilde{\bm\theta}_t^l). Finally, the weights are updated as
\hat{w}_t^l \propto p(y_t | \tilde{s}_t^l,\tilde{\bm\theta}_t^l)\frac{{w}_{t-1}^{k^l}}{\tilde{\tau}_{t-1}^{k^l}}
and normalized. The new particle swarm is then \left\{\tilde{s}_t^{l},\tilde{\bm\theta}_t^{l},\tilde{w}_t^l\right\}_{l=1:N}. An extra resampling step can be added.
The parameter \delta is key to this approach, since it controls the shrinkage and smoothness parameters a and b:
\begin{aligned}
& b^2 = 1-\left( \frac{3\delta-1}{2\delta} \right)^2\\
& a = \sqrt{1-b^2}
\end{aligned}
In the literature, \delta is generally chosen in the range [0.9;0.99]. Two outputs can be exploited in this framework: the evolution of the parameters along the sample, and the distribution of the parameters at the last observation (incorporating full-sample information), which can be summarized by the usual statistics such as the mean, the median and some empirical quantiles.
The estimation command and its options
Use for instance the following instruction:
estimation(datafile=extreme,order=2,mode_compute=0);
For order=1, Dynare estimates the linearized model using the Kalman filter. When order>1, Dynare switches automatically to the nonlinear estimation routines. Here, since mode_compute=0, P-MCMC builds the posterior distribution using 20,000 draws (by default) starting from the initial conditions, the likelihood being evaluated with the default nonlinear filter, namely the bootstrap particle filter with systematic resampling using Kitagawa's standard approach and 5,000 particles.
Contrary to linear estimation, it is not possible to compute the posterior mode accurately in the presence of resampling, because resampling induces discontinuities in the likelihood function. However, a gradient-free method (such as the Nelder-Mead downhill simplex) can be used (mode_compute=7, 8 or 9 for instance). For any other choice, a warning message asks the user to confirm his/her choice.
The following table summarizes the options included in estimation.
| Option names | Values ([default]) |
|---|---|
| filter_algorithm | [sis], apf, nlkf, gf, gmf, cpf |
| proposal_approximation | [cubature], unscented, montecarlo |
| distribution_approximation | [cubature], unscented, montecarlo |
| number_of_particles | [5000] |
| resampling | [systematic], none, generic |
| resampling_method | [kitagawa], residual |
| mode_compute | 7, 8, 9 |
| mh_replic | [20000], 0 |
| online_particle_filter | |
First of all, the choice of the filter is made with the keyword filter_algorithm. Sequential importance sampling (sis) is the default filter, but one can also choose the auxiliary particle filter (apf), the nonlinear Kalman filter (nlkf), the Gaussian filter (gf), the Gaussian-mixture filter (gmf) or the conditional particle filter (cpf).
The keyword online_particle_filter triggers the online estimation of the model, using the method developed by Liu and West. It works for order=1 as well as order>1. options_.particle.liu_west_delta controls the value of the \delta parameter (set equal to 0.9 by default).
Some dependencies among other keywords should be clarified. They are summarized by the following table.
| Keyword | Options | [sis] | apf | nlkf | gf | gmf | cpf | online |
|---|---|---|---|---|---|---|---|---|
| number_of_particles | [5000] | x | x | x | x | x | x | x |
| proposal_approximation | [cubature], unscented, montecarlo | x | | | | | | |
| distribution_approximation | [cubature], unscented, montecarlo | | | | | | | |
| resampling | [systematic], none, generic | x | x | x | x | x | x | |
| resampling_method | [kitagawa], residual | x | x | x | x | x | x | |
| mode_compute | 7, 8, 9 | x | x | if resampling | if resampling | x | 0 | |
Some filters require Gaussian approximations, which can be performed with either sparse-grid methods (cubature or the unscented transform) or Monte Carlo methods. Note that the unscented transform is controlled by three parameters: \alpha and \kappa, which determine the spread of the sigma points, and \beta, which characterizes the (non-Gaussian) distribution. By default, we set \alpha = \kappa=1 and \beta=2. They can be modified by redefining options_.particle.unscented.alpha, options_.particle.unscented.kappa and options_.particle.unscented.beta.
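To fix ideas, here is a Python sketch of the scaled unscented transform in the scalar case, using the default values \alpha=\kappa=1 and \beta=2 quoted above; by construction the sigma points and weights reproduce the mean and variance of the input distribution exactly.

```python
# Scaled unscented transform, scalar case: 2n+1 = 3 sigma points whose spread
# is set by alpha and kappa, with beta correcting the zeroth covariance weight.
import math

def sigma_points_1d(mean, var, alpha=1.0, kappa=1.0, beta=2.0):
    n = 1
    lam = alpha**2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)
    pts = [mean, mean + spread, mean - spread]
    wm = [lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)]  # mean weights
    wc = [wm[0] + 1.0 - alpha**2 + beta] + wm[1:]             # covariance weights
    return pts, wm, wc

pts, wm, wc = sigma_points_1d(0.0, 1.0)
m = sum(w * p for w, p in zip(wm, pts))         # recovered mean
v = sum(w * (p - m)**2 for w, p in zip(wc, pts))  # recovered variance
```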
- The number of particles can be chosen with the keyword number_of_particles.
- The keyword proposal_approximation selects the Gaussian approximation for the proposal distribution. One can choose sparse-grid approximations (cubature by default, or unscented) or an approximation using particles (by choosing montecarlo); in the latter case, the number of particles should be specified. It is only compatible with filter_algorithm=nlkf, gf, gmf, or cpf, which use the posterior state distributions from a nonlinear Kalman filter as proposal.
- distribution_approximation selects the Gaussian approximation for the state variables. It is only effective with the marginal filters (gf or gmf). One can choose sparse-grid approximations (cubature by default, or unscented) or an approximation using particles (by choosing montecarlo). For the other filters, state distributions are approximated with particles by default.
- The resampling keyword controls the resampling step applied to the current particles. It is set by default to systematic, which is highly recommended in the literature for filter_algorithm=sis,cpf. For methods based on a pre-selection step (like the auxiliary particle filter and the online filter), the literature advises choosing resampling=none; this does not affect the compulsory pre-selection step but only skips the second one, which is generally considered optional. Finally, there is no clear consensus concerning the marginal filters (Gaussian and Gaussian-mixture filters). If resampling=none, the current means and variances of the states that feed the next time iteration are calculated using the normalized weights of the particles drawn from the NLKF posterior distribution. With the default, they are calculated as the empirical moments of the resampled particles. If resampling=generic, the option Neff_threshold can be defined as a fraction of the sample (generally between 0 and 0.5).
- Two resampling methods are available for the moment. They can be chosen with the keyword resampling_method when resampling is used.
- options_.particle.initialization controls the initial state distribution of the filter. Three possibilities are offered to the user. If options_.particle.initialization=1 (the default), the initial state vector covariance is the ergodic variance associated with the first-order Taylor approximation of the model. If it equals 2, the initial state vector covariance is a Monte Carlo based estimate of the ergodic variance (consistent with a k-order Taylor approximation of the model). Finally, if it equals 3, the covariance is a diagonal matrix whose values are determined by options_.particle.initial_state_prior_std.
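As an illustration of the second resampling scheme, here is a Python sketch of residual resampling (deterministic copies of \lfloor N w_i \rfloor of each particle, then multinomial draws from the residual weights); this is a textbook version, not Dynare's implementation.

```python
# Residual resampling: copy each particle floor(N*w_i) times, then fill the
# remaining slots with multinomial draws from the residual weights.
import math, random

random.seed(2)

def residual_resample(particles, weights):
    n = len(particles)
    counts = [math.floor(n * w) for w in weights]
    out = [p for p, c in zip(particles, counts) for _ in range(c)]
    n_rest = n - len(out)
    if n_rest > 0:
        resid = [n * w - c for w, c in zip(weights, counts)]
        total = sum(resid)
        cdf, cum = [], 0.0
        for r in resid:
            cum += r / total
            cdf.append(cum)
        for _ in range(n_rest):
            u = random.random()
            i = next((k for k, c in enumerate(cdf) if u <= c), n - 1)
            out.append(particles[i])
    return out

new = residual_resample([1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4])
```

The deterministic part reduces the Monte Carlo noise of the resampling step, which is why residual resampling is sometimes preferred to purely multinomial schemes.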
References
Amisano G. and Tristani O. (2010), Euro Area Inflation Persistence in an Estimated Nonlinear DSGE Model, Journal of Economic Dynamics and Control, 34, 1837-1858.
An S. and Schorfheide F. (2007), Bayesian Analysis of DSGE Models, Econometric Reviews 26(2-4), 113-172.
Andrieu C., Doucet A. and Holenstein R. (2010), Particle Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(3), 269-342.
Arasaratnam I. and Haykin S. (2009a), Cubature Kalman Filters, IEEE Transactions on Automatic Control, 54(6), 1254-1269.
Arasaratnam I. and Haykin S. (2009b), Hybrid Cubature Filter: Theory and Tracking Application, McMaster University, Technical Report CSL-2009:4, 1-29.
Arulampalam S., Maskell S., Gordon N. and Clapp T. (2002), A Tutorial on Particle Filters for on-line Non-linear / Non-gaussian Bayesian Tracking, IEEE Trans, Signal Process, 50, 241-254.
Cappé O., Godsill S.J. and Moulines E. (2007), An Overview of Existing Methods and Recent Advances in Sequential Monte Carlo, Proceedings of the IEEE, 95(5), 899-924.
Creal D. (2009), A Survey of Sequential Monte Carlo Methods for Economics and Finance, Econometric Reviews, 31(3), 245-296.
Del Moral P. (2004), Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, New York.
Douc R., Cappé O. and Moulines E. (2005), Comparison of Resampling Schemes for Particle Filtering, 4th International Symposium on Image and Signal Processing and Analysis (ISPA), Zagreb, Croatia.
Doucet A., Freitas J.G. and Gordon J. (2001), Sequential Monte Carlo Methods in Practice, Springer Verlag, New York.
Doucet A., Godsill S. and Andrieu C. (2000), On Sequential Monte Carlo Sampling Methods for Bayesian Filtering, Statistics and Computing, 10, 197-208.
Doucet, A. and Johansen A. (2009), A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later, The Oxford Handbook of Nonlinear filtering, Oxford University Press.
Fernandez-Villaverde J. and Rubio-Ramirez J.F. (2005), Estimating Dynamic Equilibrium Economies: Linear versus Nonlinear Likelihood, Journal of Applied Econometrics 20, 891-910.
Fernandez-Villaverde, J. and Rubio-Ramirez J.F. (2007), Estimating Macroeconomic Models: a Likelihood Approach, The Review of Economic Studies 74(4), 1059-1087.
Fernandez-Villaverde, J., Rubio-Ramirez J.F. and Schorfheide F. (2015), Solution and Estimation Methods for DSGE Models, Handbook of Macroeconomics, Vol.2.
Gordon N., Salmond D. and Smith A.F.M. (1993), Novel Approach to Nonlinear and Non-Gaussian Bayesian State Estimation, IEE Proceedings-F, 140, 107-113.
Herbst E. and Schorfheide F. (2015), Bayesian Estimation of DSGE Models, online version.
Julier S.J. and Uhlmann J.K. (1997), A New Extension of the Kalman Filter to Nonlinear Systems, Proceedings of AeroSense, the 11th Int. Symp. on Aerospace/Defense Sensing, Simulation and Controls.
Kantas N., Doucet A., Singh S.S., Maciejowski J., and Chopin N. (2015), On Particle Methods for Parameter Estimation in State-Space Models, Statistical Science, 30(3), 328-351.
Kitagawa G. (1987), Non-Gaussian State Space Modeling of Nonstationary Time Series, Journal of the American Statistical Association 82(400), 1023-1063.
Kitagawa G. (1996), Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models, Journal of Computational and Graphical Statistics 5(1), 1-25.
Kotecha J.H. and Djuric P.M. (2003a), Gaussian Particle Filtering, IEEE transactions on signal processing, 51(10),2592-2601.
Kotecha J.H. and Djuric P.M. (2003b), Gaussian Sum Particle Filtering, IEEE transactions on signal processing, 51(10), 2602-2612.
Liu J. and West M. (2001), Combined Parameter and State Estimation in Simulation-Based Filtering, in Sequential Monte Carlo Methods in Practice, eds Doucet, Freitas and Gordon, Springer Verlag, New York.
Malik S. and Pitt M. (2011), Particle Filters for Continuous Likelihood Evaluation and Maximisation, Journal of Econometrics, 165(2), 190-209.
Murray L.M., Jones E.M. and Parslow J. (2013), On Disturbance State-Space Models and the Particle Marginal Metropolis-Hastings Sampler, working paper, arXiv:1202.6159v3.
Pitt M. and Shephard N. (1999), Filtering via Simulation: Auxiliary Particle Filters. Journal of the American Statistical Association, 94(446), 590-599.
van der Merwe R. and Wan E. (2003), Gaussian Mixture Sigma-Point Particle Filters for Sequential Probabilistic Inference in Dynamic State-Space Models, mimeo.
Winschel V. and Krätzig M. (2010), Solving, Estimating, and Selecting Nonlinear Dynamic Models without the Curse of Dimensionality, Econometrica, 78(2), 803-821.