# Introduction

Let $`s_t`$ be the vector of state (predetermined) variables in a DSGE
model (we implicitly assume that the set of state variables is finite)
and $`y_t`$ be the vector of observed variables. The latter may
contain both predetermined and non-predetermined variables. We assume
that the reduced-form DSGE model can be cast into the following state
space model:

```math
\begin{aligned}
& s_t = f( s_{t-1}, \varepsilon_t; \bm\theta ) \\
& y_t = g( s_t; \bm\theta) + e_t \\
\end{aligned}
```

with $`f(.)`$ the set of state equations, $`g(.)`$ the set of
measurement equations, $`\bm\theta\in\Theta\subseteq \mathbb R^m`$ a
vector of $`m`$ parameters, which are assumed to be known, and
$`\varepsilon_t`$ and $`e_t`$ respectively the structural shocks and
the additive measurement errors. These innovations are assumed to be
Gaussian.
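
As an illustration, the state-space form above can be simulated for a toy
nonlinear model. The functions `f` and `g` below are hypothetical
stand-ins chosen for demonstration, not output of Dynare's perturbation
solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s, eps, theta):
    # Hypothetical nonlinear state equation: AR(1) with a small quadratic term.
    rho, sigma = theta
    return rho * s + 0.05 * s**2 + sigma * eps

def g(s, theta):
    # Hypothetical nonlinear measurement equation.
    return np.exp(0.1 * s)

def simulate(T, theta, sigma_e=0.01):
    # Iterate s_t = f(s_{t-1}, eps_t; theta), y_t = g(s_t; theta) + e_t.
    s = np.zeros(T + 1)
    y = np.zeros(T)
    for t in range(T):
        s[t + 1] = f(s[t], rng.standard_normal(), theta)
        y[t] = g(s[t + 1], theta) + sigma_e * rng.standard_normal()
    return s[1:], y

s, y = simulate(100, theta=(0.9, 0.02))
```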

> :warning: **Measurement errors are mandatory**
> *You must have as many measurement errors as observed
> variables to estimate a model with a nonlinear filter; otherwise you
> will obtain a singularity error (for the covariance matrix of the
> expectation errors).*

This reduced form can be obtained using local or global approximation
methods. Functions $`f(.)`$ and $`g(.)`$ need not be explicitly
defined; the only requirement is an algorithm that updates the state
variables and determines the observed variables from the state
variables. In Dynare, a k-order perturbation approach is used to build
these functions. Given the aforementioned state-space model, the state
variables, $`s_t`$, are clearly driven by a first-order Markov
process:

```math
p\left(s_t | s_{0:t-1} \right) = p\left(s_t | s_{t-1} \right)
```

meaning that all the information about $`s_t`$ is embodied in
$`s_{t-1}`$, and that the observations are conditionally independent:

```math
p\left( y_t | y_{1:t-1}, s_{0:t} \right) = p\left(y_t | s_t \right)
```

These two properties imply many simplifications that considerably
alleviate the derivation of the nonlinear filters.

The likelihood of the model is the density of the sample $`y_{1:T} =
\left\{ y_t \right\}_{t=1}^{T}`$ conditional on the parameters
$`\bm\theta`$, which, in principle, can be written as a product of
conditional densities of $`y_t|y_{1:t-1}`$. Evaluating these densities
requires knowledge of the state variables, $`s_t`$, but in general not
all of them are observed. We need to track (infer) the unobserved
(latent) state variables, and this is where the nonlinear filters come
in. A nonlinear filter is a recursive Bayesian algorithm that allows
us to track the dynamics of the distribution of the latent
variables. We describe the different filtering algorithms assuming
that the parameters $`\bm\theta`$ are known, keeping the issues
related to estimation for later. For the sake of simplicity, we do not
express the distributions as functions of $`\bm\theta`$. This vector
will appear explicitly when inference about the parameters is
discussed.

Using all the available information, the model and the sample, we need
to infer the distribution of the latent variables $`s_t`$. More
formally, we need to build the density of $`s_t`$ conditional on the
sample up to time $`t`$, $`y_{1:t}`$. This can be done recursively
using Bayes' theorem. We have:

```math
p\left( s_t| y_{1:t} \right) =
\frac{ p\left( y_t | s_t \right) p\left( s_t | y_{1:t-1}
\right)}{p\left(y_t | y_{1:t-1} \right)}
```

with

```math
p\left( y_t | y_{1:t-1}\right) = \int p\left( y_t | s_t
\right)p\left( s_t | y_{1:t-1} \right)\mathrm d s_t
```

where the density of $`y_t`$ conditional on $`s_t`$, $`p(y_t|s_t)`$,
is interpreted as the likelihood of $`s_t`$, while the density of
$`s_t`$ conditional on the sample up to time $`t-1`$,
$`p(s_t|y_{1:t-1})`$, can be interpreted as a *prior* belief about the
state variables at time $`t`$ given the information available at time
$`t-1`$.

These equations are not analytically tractable, except for continuous
state variables in linear/Gaussian models or for discrete state
variables in linear/conditionally Gaussian models. In these two cases,
the equations yield respectively the Kalman filter and the Hamilton
filter. In all other cases, approximations are required, such as
sequential importance sampling.

# Numerical approximations and sequential importance sampling

Assume that the continuous distribution of $`s_t`$ conditional on
$`y_{1:t}`$ can be approximated by a set of particles
$`\left\{s_t^i\right\}_{i=1:N}`$ and associated weights
$`\left\{w_t^i\right\}_{i=1:N}`$ summing to one. Any moment of this
conditional distribution can be approximated by a weighted average as
follows:

```math
\begin{aligned}
\mathbb E_{p(s_t|y_{1:t})} \left[ h(s_t) \right]
&= \int h(s_t)p(s_t | y_{1:t} ) \mathrm d s_t\\
&\approx \sum_{i=1}^N w_t^i h\left(s_t^i\right)
\end{aligned}
```
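
A minimal numerical illustration of this weighted average; the particle
swarm and weights below are arbitrary, for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary particle swarm approximating some conditional distribution.
particles = rng.normal(loc=2.0, scale=0.5, size=10_000)
weights = rng.random(10_000)
weights /= weights.sum()          # normalize so the weights sum to one

def weighted_moment(h, particles, weights):
    # E[h(s_t) | y_{1:t}] ~= sum_i w_t^i h(s_t^i)
    return np.sum(weights * h(particles))

mean = weighted_moment(lambda s: s, particles, weights)
var = weighted_moment(lambda s: s**2, particles, weights) - mean**2
```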

If we were able to sample the state variables directly from the
distribution of $`s_t`$ conditional on $`y_{1:t}`$, we would then
approximate the moment with uniform weights: $`w_t^i=\frac{1}{N}`$ for
all $`i`$. Since the density of $`s_t`$ conditional on the sample up
to time $`t`$ is unknown, we cannot do that. An importance sampling
algorithm can be used to recover this information and build the
non-uniform weights. It consists in choosing an easy-to-sample
proposal distribution $`q\left(s_t| y_{1:t} \right)`$, ideally not too
different from $`p\left(s_t| y_{1:t} \right)`$, and correcting the
weights for the difference between the target and proposal conditional
distributions:

```math
\begin{aligned}
\mathbb E_{p(s_{t}| y_{1:t})}\left[ h(s_t) \right]
& = \int h(s_t) \frac{p(s_t | y_{1:t})}{q(s_t | y_{1:t})}
q(s_t | y_{1:t}) \mathrm d s_t\\
& = \mathbb E_{q(s_t | y_{1:t})}\left[ \tilde{w}_t(s_t) h(s_t) \right]
\end{aligned}
```

with *normalized* weights defined as:

```math
\tilde{w}_t(s_t) \equiv \frac{p(s_t | y_{1:t})}{q(s_t| y_{1:t} )}.
```

These new weights can be viewed as importance ratios, namely the
correction required because we sample from the proposal distribution
rather than the target. They can be computed recursively if the
proposal distribution satisfies the following condition:

```math
q\left(s_t | y_{1:t} \right) = q(s_t|s_{t-1},y_t)q\left(s_{t-1}|
y_{1:t-1} \right)
```

In this case, $`s_t`$ is simply drawn from $`q(s_t|s_{t-1},y_t)`$ and
the (unnormalized) weights can be computed recursively as:

```math
\hat{w}_t(s_t) \propto \tilde{w}_{t-1}(s_{t-1})\frac{p\left(y_t | s_t
\right)p\left(s_t | s_{t-1} \right)}{q\left(s_t| s_{t-1},y_t \right)}
```

However, in this case, a degeneracy problem occurs: as $`t`$
increases, all but one of the particles have negligible weights
(especially in large samples). That is the reason why **systematic
resampling** was initially proposed in the literature. It consists in
randomly drawing, with replacement, particles from their empirical
distribution $`\left\{\tilde{s}_t^i,\tilde{w}_t^i\right\}_{i=1:N}`$.
This amounts to discarding particles with low weights and replicating
particles with high weights, so as to focus on the interesting areas
of the distribution with a constant number of particles. Doucet et
al. (2000) proposed measures indicating when resampling is necessary.
Particles are resampled when the degeneracy measure

```math
N_{eff,t} = \frac{1}{\sum_{i=1}^{N}{{{(\tilde{w}_t^i)}^2}}}
```

falls below a particular threshold (generally $`\frac{N}{2}`$).
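
The effective sample size and a systematic (Kitagawa-style) resampling
scheme can be sketched as follows; this is a minimal illustration, not
Dynare's implementation.

```python
import numpy as np

def effective_sample_size(weights):
    # N_eff = 1 / sum_i (w_i)^2, with normalized weights.
    return 1.0 / np.sum(weights**2)

def systematic_resample(particles, weights, rng):
    # Systematic scheme: one uniform draw, N evenly spaced points,
    # inversion of the empirical CDF of the weights.
    n = len(weights)
    u = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), u)
    return particles[idx]

rng = np.random.default_rng(2)
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.7, 0.1, 0.1, 0.1])

ess = effective_sample_size(weights)     # below the threshold N/2 = 2
if ess < len(weights) / 2:
    particles = systematic_resample(particles, weights, rng)
```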

We can then obtain the recursive iterations for a general particle
filter with systematic resampling: $`\forall t=1,\ldots,T`$ and
$`\forall i=1,\ldots,N`$, knowing
$`\left\{s_{t-1}^i,w_{t-1}^i\right\}_{i=1:N}`$ that approximates
$`p(s_{t-1}|y_{1:t-1})`$:

- Draw $`\left\{\tilde{s}_t^i\right\}_{i=1:N}`$ from $`q(s_t|s_{t-1}^i,y_t)`$.

- Evaluate the weights: $`\hat{w}_t^i \propto w_{t-1}^i\frac{p(y_t |
  \tilde{s}_t^i)p(\tilde{s}_t^i | s_{t-1}^i )}{q(\tilde{s}_t^i |
  s_{t-1}^i,y_t)}`$.

- In case of systematic resampling, or if $`N_{eff,t}`$ is less than
  the chosen threshold, resample the particles and replace
  $`\left\{\tilde{s}_t^i,\tilde{w}_t^i=\frac{{\hat{w}_t^i}}{\sum_{i=1}^{N}{{\hat{w}_t^i}}}\right\}`$
  with $`\left\{ s_t^i,w_t^i=\frac{1}{N}\; \right\}`$ that approximates
  $`p(s_t|y_{1:t})`$.
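
The steps above can be sketched for the bootstrap filter, where the
proposal is the state transition itself, so the incremental weight
reduces to the measurement density. The model here is a toy
linear-Gaussian AR(1) with identity measurement and hypothetical
parameters, not a DSGE solution.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
rho, sigma_eps, sigma_e = 0.9, 0.1, 0.2   # hypothetical parameters

def particle_filter_step(particles, weights, y, rng):
    # 1) Draw from the proposal; here q = p(s_t | s_{t-1}) (bootstrap).
    proposed = rho * particles + sigma_eps * rng.standard_normal(N)
    # 2) Incremental weights: w_{t-1} * p(y_t | s_t), since the
    #    transition density cancels against the proposal.
    lik = np.exp(-0.5 * ((y - proposed) / sigma_e) ** 2)
    w_hat = weights * lik
    w_tilde = w_hat / w_hat.sum()
    # 3) Systematic resampling back to uniform weights.
    u = (rng.random() + np.arange(N)) / N
    idx = np.searchsorted(np.cumsum(w_tilde), u)
    return proposed[idx], np.full(N, 1.0 / N)

particles = rng.standard_normal(N)
weights = np.full(N, 1.0 / N)
particles, weights = particle_filter_step(particles, weights, y=0.5, rng=rng)
```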

The choice of the proposal distribution explains the diversity of
filters implemented in the literature.

# Estimation

## The likelihood expression

From now on, the parameter set $`\bm\theta`$ reappears explicitly in
the distribution expressions. In the general framework, we can derive
the sample likelihood expression:

```math
p\left(y_{1:T} | \bm\theta \right) = p\left(y_1 | s_0;\bm\theta
\right)p\left(s_0 |\bm\theta \right) \prod_{t=2}^{T}{p\left(y_t |
y_{1:t-1};\bm\theta \right)}
```

with the evidence (or unconditional likelihood) $`p\left(y_t |
y_{1:t-1} ; \bm\theta \right)`$ that can be approximated, for all the
filters discussed below, with:

```math
\begin{aligned}
p\left(y_t | y_{1:t-1};\bm\theta \right)
& \approx \sum_{i=1}^{N}{\hat{w}_t^i} \\
& = \sum_{i=1}^{N} {w_{t-1}^i \frac{ p(y_t |\tilde{s}_t^i ; \bm\theta
) p(\tilde{s}_t^i | s_{t-1}^i ; \bm\theta
)}{q(\tilde{s}_t^i|s_{t-1}^i,y_t ; \bm\theta)}}
\end{aligned}
```

In the case of the standard proposal, where
$`q(s_t|s_{t-1},y_t;\bm\theta)=p(s_t | s_{t-1};\bm\theta)`$:

```math
p(y_t | y_{1:t-1};\bm\theta )\approx \sum_{i=1}^{N}{ w_{t-1}^i \,
p(y_t | \tilde{s}_t^i ; \bm\theta)}
```

In case of systematic resampling, since $`w_{t-1}^i = 1/N`$, we simply
get:

```math
p(y_t | y_{1:t-1} ; \bm\theta ) \approx \frac{1}{N} \sum_{i=1}^{N}{
p(y_t | \tilde{s}_t^i ; \bm\theta ) }
```
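
Accumulating these increments over the sample gives a (log-)likelihood
estimate. A sketch for the bootstrap/systematic-resampling case, again
on a toy linear-Gaussian AR(1) with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000
rho, sigma_eps, sigma_e = 0.9, 0.1, 0.2   # hypothetical parameters

def loglik_estimate(y_obs, rng):
    particles = rng.standard_normal(N)
    loglik = 0.0
    for y in y_obs:
        # Bootstrap proposal: draw from p(s_t | s_{t-1}).
        particles = rho * particles + sigma_eps * rng.standard_normal(N)
        dens = np.exp(-0.5 * ((y - particles) / sigma_e) ** 2) \
               / (sigma_e * np.sqrt(2 * np.pi))
        # p(y_t | y_{1:t-1}) ~= (1/N) sum_i p(y_t | s_t^i) under
        # systematic resampling (uniform incoming weights).
        loglik += np.log(dens.mean())
        # Systematic resampling back to uniform weights.
        w = dens / dens.sum()
        u = (rng.random() + np.arange(N)) / N
        particles = particles[np.searchsorted(np.cumsum(w), u)]
    return loglik

ll = loglik_estimate(np.array([0.1, -0.2, 0.05]), rng)
```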

While necessary to avoid degeneracy, an important issue with
resampling is that it makes maximizing the likelihood or *posterior*
kernel quite difficult (Pitt (2002), Kantas et al. (2015)). Even when
the seed for the random draws is fixed across simulations, the
traditional likelihood estimator depends on both the resampled
particles and the unknown parameters. A small change in the parameter
values will cause a small change in the importance weights, which will
potentially generate a different set of resampled particles. This
produces a discontinuity in the likelihood criterion and explains why
applied approaches depart from the usual likelihood-based approach. A
first alternative consists in resampling but maximizing the estimation
criterion with gradient-free methods (such as the downhill simplex,
simulated annealing, or CMA-ES). A second possibility consists in
using an MCMC approach to build the *posterior* distribution of the
parameters, using the unbiased likelihood estimator provided by a
particle filter. The generic denomination of these methods is
***P-MCMC*** (Andrieu et al. (2010)). A third alternative consists in
considering the structural parameters as extra state variables. The
parameters are then estimated with a specific particle filter at each
date, which makes it possible to treat issues such as structural
breaks or changes in behavior (see Yano (2010), Yano et al. (2010) or
Chen et al. (2010)). Besides, it is relatively fast since it requires
only one pass over the sample. This is the ***online*** approach
proposed by Liu and West (2001).

## Particle MCMC

The idea is to build the *posterior* distribution of the parameters
$`\bm\theta`$. From Bayes' rule, we know that the *posterior* density
is proportional to the product of the sample likelihood $`p\left(y_{1:T} |
\bm\theta \right)`$ and the *prior* density over the parameters
$`p(\bm\theta)`$:

```math
p\left(\bm\theta | y_{1:T} \right) \propto p\left(y_{1:T} | \bm\theta
\right) p(\bm\theta)
```

In the case of a linear model, the sample likelihood is evaluated with
a Kalman filter. In the case of a nonlinear model, an unbiased
estimator of $`p\left(y_{1:T} | \bm\theta \right)`$ is provided by
particle filtering (Del Moral (2004)). Except for this difference, we
can follow the same approach. In an MCMC framework, such as the random
walk Particle Marginal Metropolis-Hastings (PMMH) algorithm, a
candidate is drawn from a proposal distribution:

```math
\bm\theta^\star_j= \bm\theta_{j-1} + \epsilon_j
```

with $`\epsilon_j \sim \mathcal{N}(0,\gamma_{RW} V(\Theta_0))`$, where
the scale parameter $`\gamma_{RW}`$ is set so as to obtain an
acceptance ratio around 25%. The *posterior* distribution can then be
approximated using the following acceptance rule:

```math
\bm\theta_j =
\begin{cases}
\bm\theta^\star_j & \text{if $U_{[0,1]} \leq \min
\left\{1,\frac{p\left(\bm\theta^\star_j | y_{1:T}
\right)}{p\left(\bm\theta_{j-1} | y_{1:T} \right)}\right\} $} \\
\bm\theta_{j-1} & \text{otherwise}
\end{cases}
```
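
One iteration of this random-walk scheme can be sketched as follows.
The `log_posterior` function below is a deliberately simple stand-in;
in PMMH proper, its likelihood term would be a particle-filter estimate
of $`p(y_{1:T}|\bm\theta)`$, and the prior and scale here are
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_posterior(theta):
    # Stand-in for log p(y_{1:T}|theta) + log p(theta): a Gaussian
    # centered at 1 (here the particle-filter estimate would go).
    return -0.5 * np.sum((theta - 1.0) ** 2)

def pmmh_step(theta, logpost, scale, rng):
    # Random-walk proposal: theta* = theta_{j-1} + eps_j.
    proposal = theta + scale * rng.standard_normal(theta.shape)
    logpost_prop = log_posterior(proposal)
    # Accept with probability min{1, posterior ratio}.
    if np.log(rng.random()) <= logpost_prop - logpost:
        return proposal, logpost_prop
    return theta, logpost

theta = np.zeros(2)
logpost = log_posterior(theta)
draws = []
for _ in range(5000):
    theta, logpost = pmmh_step(theta, logpost, scale=0.5, rng=rng)
    draws.append(theta)
draws = np.array(draws)
```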

For further developments, see Andrieu et al. (2010) (with
discussions).

## The online approach

This approach was proposed by Liu and West (2001) as an alternative
for estimating the parameters of nonlinear/non-Gaussian
models. Instead of maximizing a likelihood function or posterior
kernel, they consider the unknown parameters as extra state variables
and infer them like any state variable using a nonlinear filter. For
that purpose, they combine the auxiliary particle filter introduced by
Pitt and Shephard (1999) with an assumed transition distribution for
the parameters that serves as both an extra state equation and an
importance density:

```math
p\left(\bm\theta_t | \bm\theta_{t-1} \right) = \mathcal{N}(m_{t-1},b^2V_{t-1})
```

This equation produces time-varying parameters (note, however, that
this is not internalized by the agents in the DSGE model, who consider
the parameters to be fixed when solving for the expectations in each
period). The authors adopt a kernel shrinkage technique based on a
parameter $`a`$ to produce slowly time-varying parameters and to
control their variability. Suppose we have a particle swarm
$`\left\{s_{t-1}^i,\bm\theta_{t-1}^i, w_{t-1}^i\right\}_{i=1:N}`$ from
the preceding period. This particle swarm is updated using:

```math
\begin{aligned}
& \bar{\bm\theta}_{t-1} = \sum_{i=1}^{N}{ w_{t-1}^i \bm\theta_{t-1}^i }\\
& m_{t-1}^i = a\bm\theta_{t-1}^i + (1-a)\bar{\bm\theta}_{t-1} \\
& V_{t-1} = \sum_{i=1}^{N}{ w_{t-1}^i
(\bm\theta_{t-1}^i-\bar{\bm\theta}_{t-1} )
(\bm\theta_{t-1}^i-\bar{\bm\theta}_{t-1} )' }\\
& \bar{s}_t^i = f(s_{t-1}^i,0;m_{t-1}^i )
\end{aligned}
```

As in the auxiliary particle filter, we build a resampling index
$`k^l`$. The proposals for the parameters
$`\left\{\tilde{\bm\theta}_t^l\right\}_{l=1:N}`$ are drawn from
$`\mathcal{N}(m_{t-1}^{k^l},b^2V_{t-1})`$. The proposals for the state
variables $`\left\{\tilde{s}_t^l\right\}_{l=1:N}`$ are drawn from
$`p(s_t | s_{t-1}^{k^l},\tilde{\bm\theta}_t^l)`$. Finally, the weights
are updated as $`\hat{w}_t^l \propto p(y_t |
\tilde{s}_t^l,\tilde{\bm\theta}_t^l)\frac{{w}_{t-1}^{k^l}}{\tilde{\tau}_{t-1}^{k^l}}`$
and normalized. The new particle swarm is then
$`\left\{\tilde{s}_t^{l},\tilde{\bm\theta}_t^{l},\tilde{w}_t^l\right\}_{l=1:N}`$. An
extra resampling step can be added.

The parameter $`\delta`$ is key to this approach, since it controls
the shrinkage and smoothness parameters $`a`$ and $`b`$:

```math
\begin{aligned}
& b^2 = 1-\left( \frac{3\delta-1}{2\delta} \right)^2\\
& a = \sqrt{1-b^2}
\end{aligned}
```
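
These mappings, and the shrinkage of parameter particles toward their
weighted mean, can be illustrated as follows (the particle values and
weights are arbitrary demonstration inputs):

```python
import numpy as np

def liu_west_ab(delta):
    # a = (3*delta - 1)/(2*delta), b = sqrt(1 - a^2), so a^2 + b^2 = 1.
    b2 = 1.0 - ((3.0 * delta - 1.0) / (2.0 * delta)) ** 2
    return np.sqrt(1.0 - b2), np.sqrt(b2)

a, b = liu_west_ab(0.99)

# Kernel shrinkage of parameter particles toward their weighted mean.
rng = np.random.default_rng(6)
theta = rng.normal(0.5, 0.1, size=1000)   # arbitrary parameter particles
w = np.full(1000, 1e-3)                   # uniform weights, summing to one
theta_bar = np.sum(w * theta)
m = a * theta + (1.0 - a) * theta_bar     # shrunk kernel locations
```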

In the literature, $`\delta`$ is generally chosen in the range
$`[0.9;0.99]`$. Two outputs can be exploited in this framework: the
evolution of the parameters along the sample, and the distribution of
the parameters at the last observation (incorporating full-sample
information), which can be summarized by the usual statistics such as
the mean, the median and some empirical quantiles.

# The `estimation` command and its options

Use for instance the following instruction:

```example
estimation(datafile=extreme,order=2,mode_compute=0);
```

For **order=1**, Dynare estimates the linearized model using the
Kalman filter. When **order>1**, Dynare switches automatically to the
nonlinear estimation routines. Here, as **mode_compute=0**, P-MCMC
will build the *posterior* distribution using 20,000 draws (by
default) starting from the initial conditions, the likelihood being
computed with the default nonlinear filter, namely the bootstrap
particle filter with systematic resampling using Kitagawa's standard
approach and 5,000 particles.

Contrary to linear estimation, it is not possible to accurately
compute the *posterior* mode in the presence of resampling, because
resampling induces discontinuities in the likelihood function.
However, a gradient-free method (such as the Nelder-Mead downhill
simplex) can be used (**mode_compute=7**, **8** or **9** for
instance). For any other choice, a warning message appears and asks
the user to confirm their choice.

The following table summarizes the options included in **estimation**.

| Option names | Values ([default]) |
| ------ | ------ |
| filter_algorithm | [sis], apf, nlkf, gf, gmf, cpf |
| proposal_approximation | [cubature], unscented, montecarlo |
| distribution_approximation | [cubature], unscented, montecarlo |
| number_of_particles | [5000] |
| resampling | [systematic], none, generic |
| resampling_method | [kitagawa], residual |
| mode_compute | 7, 8, 9 |
| mh_replic | [20000], 0 |
| online_particle_filter | |

First of all, the choice of filter is made with the keyword
**filter_algorithm**. Sequential importance sampling (**sis**) is the
default filter, but one can also choose the auxiliary particle filter
(**apf**), the nonlinear Kalman filter (**nlkf**), the Gaussian filter
(**gf**), the Gaussian-mixture filter (**gmf**), or the conditional
particle filter (**cpf**).

The keyword **online_particle_filter** triggers the online estimation
of the model, using the method developed by Liu and West. It works for
**order=1** as well as
**order>1**. **options_.particle.liu_west_delta** controls the value
of the $`\delta`$ parameter (set equal to 0.9 by default).

Some dependencies among the other keywords should be clarified. They
are summarized in the following table.

| Keyword | Options | [sis] | apf | nlkf | gf | gmf | cpf | online |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| number_of_particles | [5000] | x | x | x | x | x | x | x |
| proposal_approximation | [cubature], unscented, montecarlo | | | x | | | | |
| distribution_approximation | [cubature], unscented, montecarlo | | | | x | x | | |
| resampling | [systematic], none, generic | x | x | | x | x | x | x |
| resampling_method | [kitagawa], residual | x | x | | x | x | x | x |
| mode_compute | 7, 8, 9 | x | x | | if resampling | if resampling | x | 0 |

Some filters require Gaussian approximations, which can be done with
either sparse-grid methods (cubature or the unscented transform) or
Monte Carlo methods. Note that the unscented transform is controlled
by three parameters: $`\alpha`$ and $`\kappa`$, which determine the
spread of the sigma points, and $`\beta`$, which characterizes the
(non-Gaussian) distribution. By default, we set $`\alpha =
\kappa=1`$ and $`\beta=2`$. They can be modified by redefining
**options_.particle.unscented.alpha**,
**options_.particle.unscented.kappa** and
**options_.particle.unscented.beta**.

- The number of particles can be chosen with the keyword
  **number_of_particles**.

- The keyword **proposal_approximation** selects the Gaussian
  approximation for the proposal distribution. One can choose
  sparse-grid approximations (**cubature** by default, or
  **unscented**) or an approximation using particles (by choosing
  **montecarlo**). In the latter case, the number of particles should
  be specified. It is only compatible with
  **filter_algorithm=nlkf**, **gf**, **gmf**, or **cpf**, which use
  the *posterior* state distributions from a nonlinear Kalman filter
  as proposal.

- **distribution_approximation** selects the Gaussian approximation
  for the state variables. It is only effective with the marginal
  filters (**gf** or **gmf**). One can choose sparse-grid
  approximations (**cubature** by default, or **unscented**) or an
  approximation using particles (by choosing **montecarlo**). For the
  other filters, state distributions are approximated with particles
  by default.

- The **resampling** keyword controls the resampling step of the
  **current** particles. It is set by default to **systematic**, which
  is highly recommended in the literature for
  **filter_algorithm=sis** and **cpf**. For methods based on a
  pre-selection step (like the auxiliary particle filter and the
  online filter), the literature advises choosing
  **resampling=none**. In these cases, it will not affect the
  pre-selection step, which is compulsory, but only skip the second
  one, which is generally considered optional. Finally, there is no
  clear consensus concerning the marginal filters (the Gaussian and
  Gaussian-mixture filters). If **resampling=none**, the current means
  and variances of the states that feed the next time iteration are
  calculated using the normalized weights of the particles drawn from
  the NLKF *posterior* distribution. With the default, they are
  calculated as the empirical moments of the resampled particles. If
  **resampling=generic**, the option **Neff_threshold** can be defined
  as a fraction of the sample (generally between 0 and 0.5).

- Two resampling methods are available for the moment. They can be
  chosen with the keyword **resampling_method** when resampling is
  used.

- **options_.particle.initialization** controls the initial state
  distribution of the filter. Three possibilities are offered to the
  user. If **options_.particle.initialization=1** (the default), the
  initial state vector covariance is the ergodic variance associated
  with the first-order Taylor approximation of the model. If it equals
  2, the initial state vector covariance is a Monte Carlo based
  estimate of the ergodic variance (consistent with a k-order Taylor
  approximation of the model). Finally, if it equals 3, the covariance
  is a diagonal matrix, whose value is determined by
  **options_.particle.initial_state_prior_std**.

# References

**Amisano G. and Tristani O. (2010)**, Euro Area Inflation Persistence
in an Estimated Nonlinear DSGE Model, Journal of Economic Dynamics
and Control, 34, 1837-1858.

**An S. and Schorfheide F. (2007)**, Bayesian Analysis of DSGE Models,
Econometric Reviews, 26(2-4), 113-172.

**Andrieu C., Doucet A. and Holenstein R. (2010)**, Particle Markov
Chain Monte Carlo Methods, Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 72(3), 269-342.

**Arasaratnam I. and Haykin S. (2009a)**, Cubature Kalman Filters,
IEEE Transactions on Automatic Control, 54(6), 1254-1269.

**Arasaratnam I. and Haykin S. (2009b)**, Hybrid Cubature Filter:
Theory and Tracking Application, McMaster University, Technical
Report CSL-2009:4, 1-29.

**Arulampalam S., Maskell S., Gordon N. and Clapp T. (2002)**, A
Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian
Bayesian Tracking, IEEE Transactions on Signal Processing, 50,
241-254.

**Cappé O., Godsill S.J. and Moulines E. (2007)**, An Overview of
Existing Methods and Recent Advances in Sequential Monte Carlo,
Proceedings of the IEEE, 95(5), 899-924.

**Creal D. (2009)**, A Survey of Sequential Monte Carlo Methods for
Economics and Finance, Econometric Reviews, 31(3), 245-296.

**Del Moral P. (2004)**, Feynman-Kac Formulae: Genealogical and
Interacting Particle Systems with Applications, Springer, New York.

**Douc R., Cappé O. and Moulines E. (2005)**, Comparison of Resampling
Schemes for Particle Filtering, 4th International Symposium on Image
and Signal Processing and Analysis (ISPA), Zagreb, Croatia.

**Doucet A., de Freitas J.G. and Gordon N. (2001)**, Sequential Monte
Carlo Methods in Practice, Springer Verlag, New York.

**Doucet A., Godsill S. and Andrieu C. (2000)**, On Sequential Monte
Carlo Sampling Methods for Bayesian Filtering, Statistics and
Computing, 10, 197-208.

**Doucet A. and Johansen A. (2009)**, A Tutorial on Particle
Filtering and Smoothing: Fifteen Years Later, The Oxford Handbook of
Nonlinear Filtering, Oxford University Press.

**Fernandez-Villaverde J. and Rubio-Ramirez J.F. (2005)**, Estimating
Dynamic Equilibrium Economies: Linear versus Nonlinear Likelihood,
Journal of Applied Econometrics, 20, 891-910.

**Fernandez-Villaverde J. and Rubio-Ramirez J.F. (2007)**, Estimating
Macroeconomic Models: a Likelihood Approach, The Review of Economic
Studies, 74(4), 1059-1087.

**Fernandez-Villaverde J., Rubio-Ramirez J.F. and Schorfheide
F. (2015)**, Solution and Estimation Methods for DSGE Models,
Handbook of Macroeconomics, Vol. 2.

**Gordon N., Salmond D. and Smith A.F.M. (1993)**, Novel Approach to
Nonlinear and Non-Gaussian Bayesian State Estimation, IEE
Proceedings-F, 140, 107-113.

**Herbst E. and Schorfheide F. (2015)**, Bayesian Estimation of DSGE
Models, online version.

**Julier S.J. and Uhlmann J.K. (1997)**, A New Extension of the Kalman
Filter to Nonlinear Systems, Proceedings of AeroSense, the 11th Int.
Symp. on Aerospace/Defense Sensing, Simulation and Controls.

**Kantas N., Doucet A., Singh S.S., Maciejowski J. and Chopin
N. (2015)**, On Particle Methods for Parameter Estimation in
State-Space Models, Statistical Science, 30(3), 328-351.

**Kitagawa G. (1987)**, Non-Gaussian State Space Modeling of
Nonstationary Time Series, Journal of the American Statistical
Association, 82(400), 1023-1063.

**Kitagawa G. (1996)**, Monte Carlo Filter and Smoother for
Non-Gaussian Nonlinear State Space Models, Journal of Computational
and Graphical Statistics, 5(1), 1-25.

**Kotecha J.H. and Djuric P.M. (2003a)**, Gaussian Particle Filtering,
IEEE Transactions on Signal Processing, 51(10), 2592-2601.

**Kotecha J.H. and Djuric P.M. (2003b)**, Gaussian Sum Particle
Filtering, IEEE Transactions on Signal Processing, 51(10), 2602-2612.

**Liu J. and West M. (2001)**, Combined Parameter and State Estimation
in Simulation-Based Filtering, in Sequential Monte Carlo Methods in
Practice, eds Doucet, de Freitas and Gordon, Springer Verlag, New
York.

**Malik S. and Pitt M. (2011)**, Particle Filters for Continuous
Likelihood Evaluation and Maximisation, Journal of Econometrics,
165(2), 190-209.

**Murray L.M., Jones E.M. and Parslow J. (2013)**, On Disturbance
State-Space Models and the Particle Marginal Metropolis-Hastings
Sampler, working paper, arXiv:1202.6159v3.

**Pitt M. and Shephard N. (1999)**, Filtering via Simulation:
Auxiliary Particle Filters, Journal of the American Statistical
Association, 94(446), 590-599.

**van der Merwe R. and Wan E. (2003)**, Gaussian Mixture Sigma-Point
Particle Filters for Sequential Probabilistic Inference in Dynamic
State-Space Models, mimeo.

**Winschel V. and Krätzig M. (2010)**, Solving, Estimating, and
Selecting Nonlinear Dynamic Models without the Curse of
Dimensionality, Econometrica, 78(2), 803-821.