...improve robustness at the expense of slightly lower efficiency; (2) fix the gradient computation for special cases in which parameter values get stuck at the prior boundary.