Depth issue

@stepan-a. This remind me of something. I take a look

@stepan-a I believe now that this issue is related to another change that occurred when we separated the temporary terms between the computation of the residuals and the computation of derivatives as various order. I don't remember who did it and how the issue of the commit is called.

Do you mean https://github.com/DynareTeam/dynare/commit/127637ffd672fdd8143ee13f2a38234e597d94d3

Thanks @JohannesPfeifer. That is the one. I think @houtanb did it before @stepan-a merged it in master. I will contact @houtanb to see how to handle it

@houtanb Here is a short example. Let's call it mod1.mod:

var x y;
varexo e;

model;
y = (1/x(+1))*(1/x(+2))*(1/x(+3));
x = exp(e);
end;

steady_state_model;
y = 1;
x = 1;
end;

steady;

stoch_simul(order=1,irf=0);

Look at the temporary term at the start of Jacobian computation in mod1_static.m. If this expression used temporary terms for common terms, it would solve the depth problem when there are lots of terms in the original equation. Note: it would be more efficient to write the equation with a single ratio and the products at the denominator, but we need Dynare to support this less efficient writing as well.

NB: This bug does not exist in 127637ff rather it was introduced in 6e514b7d, and seems to belong to the static model, as the dynamic model is the same between master and 127637ff. That said, the CMR model referenced above broke in unstable before 127637ff for a reference to logncdf which is not found. I will keep on looking for the source of the bug.

It seemed as though the bug came from substituteStaticAuxiliaryDefinition() but I could not immediately find the fault in the logic.

That said, removing the call to this function results in a _static.m file that contains the correct temporary variables. As we are substituting out the auxiliary variables in the static function, we don't need the associated auxiliary equations (after we change the Matlab code). In a conversation with @MichelJuillard we decided to remove the aux equations, pushing the change to the static_aux_vars branch so he can take a look when he has the time and work on the Matlab changes if he thinks this is the correct way to go.

Actually, this is a problem with the temporary variables. I will work on this.

This does not appear to be fixed. With the latest unstable (dynare-2016-07-24-win, untouched) and the CMR files downloaded from https://www.aeaweb.org/articles?id=10.1257/aer.104.1.27 if I run dynare cmr in the table2 subdir, I get the error:

Error: File: cmr_static.m Line: 1292 Column: 598
Nesting of {, [, and ( cannot exceed a depth of 32.

as reported above.

I have looked into this a bit more. The problem was indeed introduced with the changes of #1133 (closed), starting with 5199638b. Before this, the temporary term output was analogous to what was produced by the previous version of temporary terms.

@MichelJuillard do you want to take care of this or would you like me to look into the auxiliary variable changes?

@houtanb I will look at it unless you know exactly what needs to be changed

@MichelJuillard I don't. I would need to familiarize myself with the changes that were made. I offered in case you were short of time.

OK @houtanb I will look at it in the coming week

The problem appears with the change in the treatment of auxiliary variables, but it has to do with the computation of temporary terms for the derivatives and goes back at least to the stable version. Consider

var x y;
varexo e;

model;
y = (1/x(-1))*(1/x(-1))*(1/x(-1));
x = exp(e);
end;

steady_state_model;
y = 1;
x = 1;
end;

steady;

stoch_simul(order=1,irf=0);

even with the stable version, the temporary terms of the first equation of the Jacobian are not identified and factorized properly. I will follow that lead.

@MichelJuillard, to be clear, the first equation of the Jacobian produced is (spaced to see similar terms more clearly):

  T10 = 1/y(1);
  T11 = T10*T10;
  g1(1,1)=(-(T11*(-1)/(y(1)*y(1)) ...
       +T10*(T10*(-1)/(y(1)*y(1)) ...
            +T10*(-1)/(y(1)*y(1)))));

Are you saying that this should be:

  T10 = 1/y(1);
  T11 = T10*T10;
  g1(1,1)=(-(T11*(-1)*T11 ...
       +T10*(T10*(-1)*T11  ...
            +T10*(-1)*T11)));

If so, I'm not sure the preprocessor is smart enough to simplify 1/x*1/x into 1/x^2 so that it could be recognized as the same temporary term as that shown in T11. Second, even if this simplification were made it would only delay the problem we were encountering with the repeating parenthesis at the end of the equation.

Furthermore, I guess it's clear that a separate temporary term is not created for (-1)/(y(1)*y(1) because the cost would come to 3*(990+90)=3240 which is less than the (admittedly arbitrary) 3600 (defined in MIN_COST_MATLAB) cutoff for creating temporary terms.

Finally, and, I guess, most practically, this does not answer why the temporary terms did not crash the cmr.mod file before the auxiliary variable changes.

@houtanb yu are right my example is not discriminant, but there still a problem if you expand the first equations with more identical products I believe now that it has to do with the way that we compute the cost of an operation between temporary terms We didn' t see it before because the substitution of auxiliary variables can create very long expressions

@MichelJuillard if the first equation is changed to (repeating 5 times (1/x(-1))):

y = (1/x(-1))*(1/x(-1))*(1/x(-1))*(1/x(-1))*(1/x(-1));

the resulting _static.m file contains a temporary term for (-1)/(y(1)*y(1)):

  T10 = 1/y(1);
  T20 = (-1)/(y(1)*y(1));
  g1(1,1)=(-(T10*T10*T10*T10*T20 ...
           +T10*(T10*T10*T10*T20 ...
               +T10*(T10*T10*T20 ...
           +T10*(T10*T20+T10*T20)))));

(NB: my comment above should have listed the cost as 2*(990+90)=2160 as the first time an expression is encountered, a cost is not associated with it--it is simply added to the reference_count map. Thus, repeating (1/x(-1)) 5 times has a cost of 4*(990+90)=4320>3600 thereby creating a temporary term)

Such an equation would still create a problem with repeated parenthesis resulting from the application of the chain rule.

Is your proposal to decrease MIN_COST_MATLAB (or increase the cost of +, -, and *) such that temporary variables are made sooner or did you have something else in mind?

It is true that we can have several identical nodes before it is worth it to make a temporary terms (i.e. 40 products)
Nevertheless, I believe now that we are missing some useful temporary nodes. I will make a branch with a proposal for changes
However, it is not sufficient to solve the problem with the CMR model that started this issue. For complicated expressions it is indeed possible to hit the limit of 32 nested parentheses when one substitutes back the definition of auxiliary variables in the static model. We have to think about a way to detect and handle this problem. Going back to the previous way of handling auxiliary variables in the static model is not option as it didn't handle correctly some other models.

@MichelJuillard have you decided on the solution of replacing a sub-tree with 32 unclosed parenthesis with a temporary term? If so, I can implement that.

@houtanb OK, but it is not that simple. I will show you what I have so far

@MichelJuillard were you going to post your changes here or communicate them in some other medium?

@houtanb when I have time

I'm giving up the idea of fixing this problem. There doesn't seem to be any simple solution:

if we handle it at output time (in writeOutput function), we have to remove the "const" in all the output chain, blurring the distinction between building the expressions and writing them
if we handle the transformation pass, we have to add the entire logic for parenthesis handling, basically adding parenthesis as a new kind of node I take this amount of deep modification of the preprocessor code as an indication that we shouldn't attend to fix the problem. If we go back to the original problem, there is one model whose static representation has a deeper depth of parentheses than supported by Matlab. It was not a problem in previous Dynare version because of the way that we were using auxiliary variables for lead and lags in the static model. However, our strategy for auxiliary variables was erroneous in some cases (expectation operator for one) and the solution that we adopted made appear the parenthesis depth problem in that particular model. I prefer the new way to handle auxiliary variables in the static model (replacing it with their definition in terms of original variables). I think that we should just provide a new version of the Risk Shocks model working with version 4.5

@MichelJuillard Is it a problem to have a post-preprocessor step that does this? A simple program that would be run on any matlab code created by the preprocessor. It would parse it counting parenthesis, creating temporary variables as needed and substituting them in the appropriate places?

@houtanb yes, that would be indeed simpler from the preprocessor point of view. But, we should make it optional not to slow down things to much. Isn't it a lot of efforts for a rare problem? I was willing to fix it because until now I believed there was a simple fix

@MichelJuillard we can definitely make it optional. I was just putting it forward as a possible solution. We would need to decide together whether the time spent on it and the resultant dynare.git clutter and additional distribution work are worth it for this problem.

Edit: the above statement was made with the idea that it would be completely separate from the preprocessor. Though, it could also be a post-writeOutput method in the ModFile class that performs this work as well, in which case the clutter would be in the preprocessor/ folder and there would be no extra executable to distribute

Let see what the others are thinking

Providing a new version of the CMR model does not strike me as a feasible and scalable solution. Models are going to be ever more complex in the future and we need to cope with this. The problem in the CMR model comes from the definition of the 10-year term structure. This is an increasingly common feature in New Keynesian DSGE models. Forcing users to rewrite this manually seems not desireable. Moreover, the same problem can appear in other contexts like identification (http://www.dynare.org/phpBB3/viewtopic.php?f=1&t=3167), so we should aim at a transferable solution.

For that reason, I would prefer @houtanb's solution of having a function that checks for a nesting depth above 32 and replaces the expression using a temporary variable. That function should be triggered by a command line option like restrict_nesting_depth. I am agnostic on whether we should do this when outputing (writeOutput) or run a separate program on the created m-files, although I guess the former would be easier to handle as we need to run this only on variable definitions, not the function headers etc.

@MichelJuillard and @houtanb, I do not really see what would be the post-preprocessor step... We would have to parse the generated matlab functions and change them to remove nested parenthesis... I think this is a lot of work. Also this would be an awful hack (I have no problem if one uses post preprocessor hooks, I do that, but I do not think we should do that in Dynare).

If we go along this road, I do not think that this should be optional. I suppose we can detect the problem from the preprocessor and trigger the transformation of the generated matlab routines with a post preprocessor hook.

For models of this size I always use the use_dll model's option... And gcc does not have this issue. I rewrote different versions of CMR, and never encountered any problem with the nested parenthesis. My understanding from previous discussions with @JohannesPfeifer is that it is now very easy to install mingw/gcc from matlab on windows... So my preferred solution, if we are able to count the nested parenthesis from the preprocessor (even if we do not fix the problem there), would be to issue a message saying that the user should install gcc and add the use_dll option in the model block. If models are going to be ever more complex and bigger, users should install and use gcc.

@stepan-a you are right in that it it rather straightforward to detect if an output stream has more than a certain number of open parenthesis. If we encountered this problem, we could exit with an error telling the user to install gcc. If we wanted to put the work in at a later point, we could add code to introduce a temporary variable and make the substitution.

@JohannesPfeifer given that Matlab has a 32 deep parenthesis limit and that users somehow "choose" to use Dynare Matlab instead of "use_dll", it doesn't strike me that we have to provide a version of Dynare that makes up for Matlab limitation. Again, unless I'm missing an easier solution, I don't see how to do it without considerable amount of work and probably weakening the preprocessor, making it more difficult to maintain and debug in the future, short of extended rewrite. We have to balance the benefits against the cost of fixing this problem

@MichelJuillard does that mean that you agree with @stepan-a's proposal?

@MichelJuillard I am perfectly content if we have a workaround. Let me try whether use_dll solves the issue on Windows.

The CMR model works flawlessly with use_dll on Windows.

@houtanb Can you implement the test on the number of nested parenthesis and stop the preprocessor, with a message refering to use_dll if we have more than 32 nested parenthesis. Maybe we should also point to some documentation for installing gcc (or another compiler, I found that the installation of intel/icc was easier) in the reference manual. This should depend on the version of matlab.

@houtanb, please, before starting to code, explain how you plan to increment a counter inside ExpNode::writeOutput without violating the const characterization of these functions. I still think it is not simple.

@MichelJuillard I was planning to create a function in the ModelTree class that receives as input the ostringstream specified in DynamicModel::writeDynamicModel and StaticModel::writeStaticModel. The function in ModelTree will be called right after the check if (output_type == oMatlabStaticModel) or if (output_type == oMatlabDynamicModel) after they have been written to but before they are actually written to a file. I'll simply go through these streams counting parenthesis and issue an error if I see one.

In other words, we don't need to modify any const objects themselves.

@houtanb OK thanks

@stepan-a feel free to replace the link I included in the error message with the one you were talking about

@houtanb Does your commit also apply to the m-files generated for identification? And does the use_dll option work for these?

@JohannesPfeifer no, just for the static and dynamic files and use_dll does not work for the params_derivs file.

That said, I think it'd be rather straightforward to make non-efficient substitutions of nested parenthesis. We could essentially modify the ostringstream output. Whenever a line contains 32 or more nested parenthesis, we can create variables that break up these parenthesis, inserting their definitions at the beginning of the ostringstream and substituting them into the original equation.

We can still issue a preprocessor warning telling users it would be better to use the use_dll option, but this allows them to continue without setting up a compiler.

When we have a bit more free time, we can revisit this and make it more efficient by keeping a table that keeps track of nodes so we don't create two variables that make the same calculation.

@stepan-a @MichelJuillard if you're ok with this, I can take care of it. This solution could also then be used for the params_derivs file.

@houtanb you are right. I wasn't looking at the right place, trying to do the substitution on the fly inside ExpNode::writeOutput() I think that you could do it after we get 4.5 out

@MichelJuillard @houtanb If it's not too complicated, I would advocate having it in 4.5. The CMR model is now quite frequently used and many people use older Matlab versions on Windows where use_dll can be tricky (this is evident from numerous posts in the forum).

This was more complicated than expected, so it would be good to do more testing.

The final step will be to call this function for the parms derivatives file. It's not as simple as for the dynamic and static models because the output is written directly to the file (instead of being written to string streams which are then written to the file). I'll change this to write to string streams and apply the fix next week.

Also note that the solution I implemented does not recreate variables unnecessarily and it stops substituting once the original string drops below 32 nested parenthesis.

There is no Windows snapshot after your change. Did your commit break the compilation?

NB: even with the changes above the original .mod file specified in this issue does not work

@houtanb

Which issue are you experiencing? I could run the mod-file.
Would it be possible to only print the warning message once? For CMR, it appears 4 times. I guess once for every file

@JohannesPfeifer

When you don't have the necessary toolboxes (my situation), the steady state is not calculated and an empty vector of endogenous variables is passed to the static file. @stepan-a fixed this in 515e080f
Yes, when I have the time

To make clear what is left to be done on this issue:

Apply this change to the derivatives wrt the parameters for identification
Reduce the warning output to once per file (as opposed to once per equation encountered). This means that the warning would be emitted a maximum of 3 times.

I hope to have the time to do these later this week or next.

Depth issue

Designs

Child items ...

Activity