Added parameter derivatives of perturbation solution up to 3 order
Preliminary comments
I finished the identification toolbox at orders two and three using the pruned state space system, but before I merge request this, I decided to first merge the new functionality to compute parameter derivatives of perturbation solution matrices at orders two and three. So after this is approved, I would merge the improved identification toolbox.
I guess @rattoma, @sebastien, and @MichelJuillard are best choices to review this. Note that @rattoma already implemented the parameter derivatives of g_x
and g_u
analytically (and numerically), and I rely heavily on his work in get_first_order_solution_params_derivs.m
(previously getH.m
). My additions are mainly to this function and thus it is renamed to get_perturbation_params_derivs.m
.
I outline the main idea first and then provide some more detailed changes I made to the functions.
Main idea
This merge request is concerned with the analytical computation of the parameter derivatives of first, second and third order perturbation solution matrices, i.e. using closed-form expressions to efficiently compute the derivative of g_x
, g_u
, g_{xx}
, g_{xu}
, g_{uu}
, g_{\sigma\sigma}
, g_{xxx}
, g_{xxu}
, g_{xuu}
, g_{uuu}
, g_{x\sigma\sigma}
, g_{u\sigma\sigma}
with respect to model parameters \theta
. Note that \theta
contains model parameters, stderr and corr parameters of shocks. stderr and corr parameters of measurement errors are not yet supported, (they can easily be included as exogenous shocks). The availability of such derivatives is beneficial in terms of more reliable analysis of model sensitivity and parameter identifiability as well as more efficient estimation methods, in particular for models solved up to third order, as it is well-known that numerical derivatives are a tricky business, especially for large models.
References for my approach are:
- Iskrev (2008, 2010) and Schmitt-Grohé and Uribe (2012, Appendix) who were the first to compute the parameter derivatives analytically at first order, however, using inefficient (sparse) Kronecker products.
- Mutschler (2015) who provides the expressions for a second-order, but again using inefficient (sparse) Kronecker products.
- Ratto and Iskrev (2012) who show how the first-order system can be solved accurately, fast and efficiently using existing numerical algorithms for generalized Sylvester equations by taking the parameter derivative with respect to each parameter separately.
- Julliard and Kamenik (2004) who provide the perturbation solution equation system in tensor notation at any order k.
- Levintal (2017) who introduces permutation matrices to express the perturbation solution equation system in matrix notation up to fifth order.
The basic idea of this merge request is to take the second- and third-order perturbation solution systems in Julliard and Kamenik (2004), unfold these into an equivalent matrix representation using permutation matrices as in Levintal (2017). Then extending Ratto and Iskrev (2012) one takes the derivative with respect to each parameter separately and gets a computational problem that is linear, albeit large, as it involves either solving generalized Sylvester equations or taking inverses of highly sparse matrices. I will now briefly summarize the perturbation solution system at third order and the system that results when taking the derivative with respect to parameters.
Perturbation Solution
The following systems arise at first, second, and third order:
(ghx): f_{x} z_{x} = f_{y_{-}^*} + f_{y_0} g_{x} + f_{y_{+}^{**}} g^{**}_{x} g^{*}_{x}= A g_{x} + f_{y_{-}^*}=0
(ghu): f_{z} z_{u} = f_{y_0} g_{u} + f_{y_{+}^{**}} g^{**}_{x} g^{*}_{u} + f_{u}= A g_u + f_u = 0
(ghxx) : A g_{xx} + B g_{xx} \left(g^{*}_{x} \otimes g^{*}_{x}\right) + f_{zz} \left( z_{x} \otimes z_{x} \right) = 0
(ghxu) : A g_{xu} + B g_{xx} \left(g^{*}_{x} \otimes g^{*}_{u}\right) + f_{zz} \left( z_{x} \otimes z_{u} \right) = 0
(ghuu) : A g_{uu} + B g_{xx} \left(g^{*}_{u} \otimes g^{*}_{u}\right) + f_{zz} \left( z_{u} \otimes z_{u} \right) = 0
(ghs2) : (A+B) g_{\sigma\sigma} + \left( f_{y^{**}_{+}y^{**}_{+}} \left(g^{**}_{u} \otimes g^{**}_{u}\right) + f_{y^{**}_{+}} g^{**}_{uu}\right)vec(\Sigma) = 0
(ghxxx) : A g_{xxx} + B g_{xxx} \left(g^{*}_{x} \otimes g^{*}_{x} \otimes g^{*}_{x}\right) + f_{y_{+}}g^{**}_{xx} \left(g^{*}_x \otimes g^{*}_{xx}\right)P_{x\_xx} + f_{zz} \left( z_{x} \otimes z_{xx} \right)P_{x\_xx} + f_{zzz} \left( z_{x} \otimes z_{x} \otimes z_{x} \right) = 0
(ghxxu) : A g_{xxu} + B g_{xxx} \left(g^{*}_{x} \otimes g^{*}_{x} \otimes g^{*}_{u}\right) + f_{zzz} \left( z_{x} \otimes z_{x} \otimes z_{u} \right) + f_{zz} \left( \left( z_{x} \otimes z_{xu} \right)P_{x\_xu} + \left(z_{xx} \otimes z_{u}\right) \right) + f_{y_{+}}g^{**}_{xx} \left( \left(g^{*}_{x} \otimes g^{*}_{xu}\right)P_{x\_xu} + \left(g^{*}_{xx} \otimes g^{*}_{u}\right) \right) = 0
(ghxuu) : A g_{xuu} + B g_{xxx} \left(g^{*}_{x} \otimes g^{*}_{u} \otimes g^{*}_{u}\right) + f_{zzz} \left( z_{x} \otimes z_{u} \otimes z_{u} \right)+ f_{zz} \left( \left( z_{xu} \otimes z_{u} \right)P_{xu\_u} + \left(z_{x} \otimes z_{uu}\right) \right) + f_{y_{+}}g^{**}_{xx} \left( \left(g^{*}_{xu} \otimes g^{*}_{u}\right)P_{xu\_u} + \left(g^{*}_{x} \otimes g^{*}_{uu}\right) \right) = 0
(ghuuu) : A g_{uuu} + B g_{xxx} \left(g^{*}_{u} \otimes g^{*}_{u} \otimes g^{*}_{u}\right) + f_{zzz} \left( z_{u} \otimes z_{u} \otimes z_{u} \right)+ f_{zz} \left( z_{u} \otimes z_{uu} \right)P_{u\_uu} + f_{y_{+}}g^{**}_{xx} \left(g^{*}_{u} \otimes g^{*}_{uu}\right)P_{u\_uu} = 0
(ghx\sigma\sigma) : A g_{x\sigma\sigma} + B g_{x\sigma\sigma} g^{*}_x + f_{y_{+}} g^{**}_{xx}\left(g^{*}_{x} \otimes g^{*}_{\sigma\sigma}\right) + f_{zz} \left(z_{x} \otimes z_{\sigma\sigma}\right) + F_{xu_{+}u_{+}}\left(I_{n_x} \otimes vec(\Sigma)\right) = 0
F_{xu_{+}u_{+}} = f_{y_{+}^{\ast\ast}} g_{xuu}^{\ast\ast} (g_x^{\ast} \otimes I_{n_u^2}) + f_{zz} \left( \left( z_{xu_{+}} \otimes z_{u_{+}} \right)P_{xu\_u} + \left(z_{x} \otimes z_{u_{+}u_{+}}\right) \right) + f_{zzz}\left(z_{x} \otimes z_{u_{+}} \otimes z_{u_{+}}\right)
(ghu\sigma\sigma) : A g_{u\sigma\sigma} + B g_{x\sigma\sigma} g^{*}_{u} + f_{y_{+}} g^{**}_{xx}\left(g^{*}_{u} \otimes g^{*}_{\sigma\sigma}\right) + f_{zz} \left(z_{u} \otimes z_{\sigma\sigma}\right) + F_{uu_{+}u_{+}}\left(I_{n_u} \otimes vec(\Sigma_u)\right) = 0
F_{uu_{+}u_{+}} = f_{y_{+}^{\ast\ast}} g_{xuu}^{\ast\ast} (g_u^{\ast} \otimes I_{n_u^2}) + f_{zz} \left( \left( z_{uu_{+}} \otimes z_{u_{+}} \right)P_{uu\_u} + \left(z_{u} \otimes z_{u_{+}u_{+}}\right) \right) + f_{zzz}\left(z_{u} \otimes z_{u_{+}} \otimes z_{u_{+}}\right)
A and B are the common perturbation matrices:
A = f_{y_0} + \begin{pmatrix} \underbrace{0}_{n\times n_{static}} &\vdots& \underbrace{f_{y^{**}_{+}} \cdot g^{**}_{x}}_{n \times n_{spred}} &\vdots& \underbrace{0}_{n\times n_{frwd}} \end{pmatrix}
and
B = \begin{pmatrix} \underbrace{0}_{n \times n_{static}}&\vdots & \underbrace{0}_{n \times n_{pred}} & \vdots & \underbrace{f_{y^{**}_{+}}}_{n \times n_{sfwrd}} \end{pmatrix}
and
z=(y_{-}^{\ast}; y; y_{+}^{\ast\ast}; u)
denotes the dynamic model variables as in M_.lead_lag_incidence
, y^\ast
denote state variables, y^{\ast\ast}
denote forward looking variables, y_+
denote the variables with a lead, y_{-}
denote variables with a lag, y_0
denote variables at period t, f
the model equations, and f_z
the first-order dynamic model derivatives g1
, f_{zz}
the second-order dynamic derivatives g2
, and f_{zzz}
the third-order dynamic model derivatives g3
. Then:
z_{x} = \begin{pmatrix}I\\g_{x}\\g^{**}_{x} g^{*}_{x}\\0\end{pmatrix}
, z_{u} =\begin{pmatrix}0\\g_{u}\\g^{**}_{x} \cdot g^{*}_{u}\\I\end{pmatrix}
, z_{u_{+}} =\begin{pmatrix}0\\0\\g^{**}_{u}\\0\end{pmatrix}
z_{xx} = \begin{pmatrix} 0\\g_{xx}\\g^{**}_{x} \left( g^{*}_x \otimes g^{*}_{x} \right) + g^{**}_{x} g^{*}_{x}\\0\end{pmatrix}
, z_{xu} =\begin{pmatrix}0\\g_{xu}\\g^{**}_{xx} \left( g^{*}_x \otimes g^{*}_{u} \right) + g^{**}_{x} g^{*}_{xu}\\0\end{pmatrix}
, z_{uu} =\begin{pmatrix}0\\g_{uu}\\g^{**}_{xx} \left( g^{*}_u \otimes g^{*}_{u} \right) + g^{**}_{x} g^{*}_{uu}\\0\end{pmatrix}
,
z_{xu_{+}} =\begin{pmatrix}0\\0\\g^{**}_{xu} \left( g^{*}_x \otimes I \right)\\0\end{pmatrix}
, z_{uu_{+}} =\begin{pmatrix}0\\0\\g^{**}_{xu} \left( g^{*}_{u} \otimes I \right)\\0\end{pmatrix}
, $z_{u_{+}u_{+}} =\begin{pmatrix}0\\0\\g^{\ast\ast}_{uu}\\0\end{pmatrix}
$, $z_{\sigma\sigma} = \begin{pmatrix}0\\ g_{\sigma\sigma}\\ g^{\ast\ast}_{x}g^{\ast}_{\sigma\sigma} + g^{\ast\ast}_{\sigma\sigma}\\0 \end{pmatrix}
$
Note that $P
$ are permutation matrices that can be computed using Matlab's ipermute
function.
Parameter derivatives of perturbation solutions
First, we need the parameter derivatives of first, second, third, and fourth derivatives of the dynamic model (i.e. g1,g2,g3,g4 in dynamic files), I make use of the implicit function theorem: Let $f_{z^k}
$ denote the kth derivative (wrt all dynamic variables) of the dynamic model, then let $df_{z^k}
$ denote the first-derivative (wrt all model parameters) of $f_{z^k}
$ evaluated at the steady state. Note that $f_{z^k}
$ is a function of both the model parameters $\theta
$ and of the steady state of all dynamic variables $\bar{z}
$, which also depend on the parameters. Hence, implicitly $f_{z^k}=f_{z^k}(\theta,\bar{z}(\theta))
$ and $df_{z^k}
$ consists of two parts:
- direct derivative wrt to all model parameters given by the preprocessor in the
_params_derivs.m
files - contribution of derivative of steady state of dynamic variables (wrt all model parameters): $
f_{z^{k+1}} \cdot d\bar{z}
$ Note that we already have functionality to compute $d\bar{z}
$ analytically.
Having this, the above perturbation systems are basically equations of the following types
$AX +BXC = RHS
$ or $AX = RHS
$
Now when taking the derivative (wrt to one single parameter $\theta_j
$), we get
$A\mathrm{d}\{X\} + B\mathrm{d}\{X\}C = \mathrm{d}\{RHS\} - \mathrm{d}\{A\}X - \mathrm{d}\{B\}XC - BX\mathrm{d}\{C\}
$
or
$A\mathrm{d}\{X\} = \mathrm{d}\{RHS\} - \mathrm{d}\{A\}X
$
The first one is a Sylvester type equation, the second one can be solved by taking the inverse of $A
$. The only difficulty and tedious work arises in computing (the highly sparse) derivatives of $RHS
$.
New functions:
get_perturbation_params_derivs.m
and get_perturbation_params_derivs_numerical_objective.m
- The parameter derivatives up to third order are computed in the new function
get_perturbation_params_derivs.m
both analytically and numerically. For numerical derivativesget_perturbation_params_derivs_numerical_objective.m
is the objective forfjaco.m
orhessian_sparse.m
orhessian.m
. -
get_perturbation_params_derivs.m
is basically an extended version of the previousget_first_order_solution_params_derivs.m
function. -
get_perturbation_params_derivs_numerical_objective.m
builds uponidentification_numerical_objective.m
. It is used for numerical derivatives, wheneveranalytic_derivation_mode=-1|-2
. It takes fromidentification_numerical_objective.m
the parts that compute numerical parameter Jacobians of steady state, dynamic model equations, and perturbation solution matrices. Hence, these parts are removed inidentification_numerical_objective.m
and it only computes numerical parameter Jacobian of moments and spectrum which are needed for identification analysis inget_identification_jacobians.m
, whenanalytic_derivation_mode=-1
only. - Detailed changes:
- Most important: notation of this function is now in accordance to the
k_order_solver
or Dynare++, i.e. we do not compute derivatives of Kalman transition matrices A and B, but rather the solution matrices ghx,ghu,ghxx,ghxu,ghuu,ghs2,ghxxx,ghxxu,ghxuu,ghuuu,ghxss,ghuss in accordance with notation used inoo_.dr
. As a byproduct at first-order, focusing on ghx and ghu instead of Kalman transition matrices A and B makes the computations slightly faster for large models (e.g. for Quest the computations were faster by a couple of seconds, not much, but okay). - Removed use of
kstate
, see also #1653 (closed) and !1656 (merged) - Output arguments are stored in a structure
DERIVS
, there is also a flagd2flag
that computes parameter hessians needed only indsge_likelihood.m
. - Removed
kronflag
as input.options_.analytic_derivation_mode
is now used instead ofkronflag
. - Removed
indvar
, an index that was used to selected specific variables in the derivatives. This is not needed, as we always compute the parameter derivatives for all variables first and then select a subset of variables. The selection now takes place in other functions, like indsge_likelihood.m
. - Some additional checks: (i) deterministic exogenous variables are not supported, (ii) Kronecker method only compatible with first-order approximation so reset to sylvester method, (iii) for purely backward or forward models we need to be careful with the rows in
M_.lead_lag_incidence
, (iv) if_params_derivs.m
files are missing an error is thrown. - For numerical derivatives, if mod file does not contain an
estimated_params_block
, a temporary one with the most important parameter information is created.
- Most important: notation of this function is now in accordance to the
unfold_g4.m
- When evaluating g3 and g4 one needs to take into account that these do not contain symmetric elements, so one needs to use
unfold_g3.m
and the new functionunfold_g4.m
. This returns an unfolded version of the same matrix (i.e. with symmetric elements).
New test models
-
.gitignore
andMakefile.am
are changed accordingly. Also now it is possible to run test suite on analytic_derivatives, i.e. runmake check m/analytic_derivatives
- Removed
analytic_derivatives/ls2003.mod
as this mod file is neither in the testsuite nor does it work.
analytic_derivatives/BrockMirman_PertParamsDerivs.mod
- This is the Brock Mirman model, where we know the exact policy function $
g
$ for capital and consumption. As this does not imply a nonzero $g_{\sigma\sigma}
$, $g_{x\sigma\sigma}
$, $g_{u\sigma\sigma}
$ I added some artificial equations to get nonzero solution matrices with respect to $\sigma
$. The true perturbation solution matrices $g_x
$ , $g_u
$, $g_{xx}
$, $g_{xu}
$, $g_{uu}
$, $g_{\sigma\sigma}
$, $g_{xxx}
$, $g_{xxu}
$, $g_{xuu}
$, $g_{uuu}
$, $g_{x\sigma\sigma}
$, $g_{u\sigma\sigma}
$ are then computed analytically with Matlab's symbolic toolbox and saved innBrockMirmanSYM.mat
. There is a preprocessor flag that recreates these analytical computations if changes are needed (and to check whether I made some errors here ;-) ) - Then solution matrices up to third order and their parameter Jacobians are then compared to the ones computed by Dynare's
k_order_solver
and byget_perturbation_params_derivs
for allanalytic_derivation_mode
's. There will be an error if the maximum absolute deviation is too large, i.e. for numerical derivatives (analytic_derivation_mode=-1|-2
) the tolerance is choosen lower (around 1e-5); for analytical methods we are stricter: around 1e-13 for first-order, 1e-12 for second order, and 1e-11 for third-order. - As a side note, this mod file also checks Dynare's
k_order_solver
algorithm and throws an error if something is wrong. - This test model shows that the new functionality works well. And analytical derivatives perform way better and accurate than numerical ones, even for this small model.
analytic_derivatives/burnside_3_order_PertParamsDerivs.mod
- This builds upon
tests/k_order_perturbation/burnside_k_order.mod
and computes the true parameter derivatives analytically by hand. - This test model also shows that the new functionality works well.
analytic_derivatives/LindeTrabandt2019.mod
- Shows that the new functionality also works for medium-sized models, i.e. a SW type model solved at third order with 35 variables (11 states). 2 shocks and 20 parameters.
- This mod file can be used to tweak the speed of the computations in the future.
- Compares numerical versus analytical parameter derivatives (for first, second and third order). Note that this model clearly shows that numerical ones are quite different than analytical ones even at first order!
identification/LindeTrabandt2019_xfail.mod
- This model is a check for (confidential) issue Dynare/dynare#1595, see fjaco.m below for a description, and will fail.
Detailed changes in other functions
get_first_order_solution_params_derivs.m
- Deleted, or actually, renamed to
get_perturbation_params_derivs.m
, as this function now is able to compute the derivatives up to third order
identification_numerical_objective.m
-
get_perturbation_params_derivs_numerical_objective.m
builds uponidentification_numerical_objective.m
. It takes fromidentification_numerical_objective.m
the parts that compute numerical parameter Jacobians of steady state, dynamic model equations, and perturbation solution matrices. Hence, these parts are removed inidentification_numerical_objective.m
and it only computes numerical parameter Jacobian of moments and spectrum which are needed for identification analysis inget_identification_jacobians.m
, whenanalytic_derivation_mode=-1
only.
dsge_likelihood.m
- As
get_first_order_solution_params_derivs.m
is renamed toget_perturbation_params_derivs.m
, the call is adapted. That is,get_perturbation_params_derivs
does not compute the derivatives of the Kalman transitionT
matrix anymore, but instead of the Dynare solution matrixghx
. So we recreateT
here as this amounts to adding some zeros and focusing on selected variables only. - Added some checks to make sure the first-order approximation is selected.
- Removed
kron_flag
as input, asget_perturbation_params_derivs
looks intooptions_.analytic_derivation_mode
forkron_flag
.
dynare_identification.m
- make sure that setting
analytic_derivation_mode
is set both inoptions_ident
andoptions_
. Note that at the end of the function we restore theoptions_
structure, so all changes are local. In a next merge request, I will remove the global variables to make all variables local.
get_identification_jacobians.m
- As
get_first_order_solution_params_derivs.m
is renamed toget_perturbation_params_derivs.m
, the call is adapted. That is,get_perturbation_params_derivs
does not compute the derivatives of the Kalman transitionA
andB
matrix anymore, but instead of the dynare solution matrixghx
andghu
. So we recreate these matrices here instead of inget_perturbation_params_derivs.m
. - Added
str2func
for better function handles infjaco.m
.
fjaco.m
- make
tol
an option, which can be adjusted by changingoptions_.dynatol.x
for identification and parameter derivatives purposes. - include a check and an informative error message, if numerical derivatives (two-sided finite difference method) yield errors in
resol.m
for identification and parameter derivatives purposes. This closes issue Dynare/dynare#1595. - Changed year of copyright to 2010-2017,2019
Further suggestions and questions
- Once this is merged, I will merge an improvement of the identification toolbox, which will work up to third order using the pruned state space. This will also remove some issues and bugs, and also I will remove global variables in this commit.
- The third-order derivatives can be further improved by taking sparsity into account and use mex versions for Kronecker products etc. I leave this for further testing (and if anybody actually uses this ;-) )