15.4.5 The Reason Why Fail to Converge
The nonlinear fitting process is iterative. The process completes when the difference between the reduced chi-square (χ²) values of two successive iterations is less than a certain tolerance value. When the process completes, we say that the fit has converged. To illustrate, consider a model with one parameter.
The curve of reduced χ² against the parameter value shows how reduced χ² changes during a fitting procedure. Starting from an initial value, the parameter value is adjusted so that the reduced χ² value decreases. The fit has converged when ∆χ² ≤ Tolerance.
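The stopping rule above can be sketched in a few lines. This is a minimal illustration, not the fitter's actual algorithm: it assumes a hypothetical one-parameter model y = exp(−k·x), noise-free synthetic data, and a plain Gauss-Newton update. The names `fit_decay_rate`, `tolerance` and `max_iter` are made up for this example.

```python
import math

def model(x, k):
    # Hypothetical one-parameter model used only for illustration.
    return math.exp(-k * x)

def reduced_chi_square(k, xs, ys):
    # Unweighted sum of squared residuals divided by the degrees of freedom.
    dof = len(xs) - 1  # n points minus one fitted parameter
    return sum((y - model(x, k)) ** 2 for x, y in zip(xs, ys)) / dof

def fit_decay_rate(xs, ys, k0, tolerance=1e-9, max_iter=100):
    k = k0
    chi2 = reduced_chi_square(k, xs, ys)
    history = [chi2]
    for _ in range(max_iter):
        # Gauss-Newton step for a single parameter: (J.r) / (J.J)
        jac = [-x * model(x, k) for x in xs]
        res = [y - model(x, k) for x, y in zip(xs, ys)]
        k += sum(j * r for j, r in zip(jac, res)) / sum(j * j for j in jac)
        new_chi2 = reduced_chi_square(k, xs, ys)
        history.append(new_chi2)
        # Converged when reduced chi-square changes by no more than the tolerance.
        converged = abs(chi2 - new_chi2) <= tolerance
        chi2 = new_chi2
        if converged:
            return k, history, True
    return k, history, False

xs = [0.5 * i for i in range(7)]
ys = [model(x, 0.8) for x in xs]  # synthetic data generated with k = 0.8
k_fit, chi2_history, converged = fit_decay_rate(xs, ys, k0=0.5)
print(converged, round(k_fit, 4), len(chi2_history) - 1)
```

Printing `chi2_history` shows the reduced χ² value at each iteration, which is the quantity the tolerance test is applied to.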
The purpose of the nonlinear curve fitting procedure is to find the absolute minimum value of reduced χ². However, sometimes the minimization procedure does not reach this absolute minimum value (the fit does not converge). Failure to converge can frequently be traced to one of the following:
Poor initial parameter values
Initial values are very important to a fitting procedure. They can be empirical values from previous work or estimated values derived by transforming the formula into an approximate form. With good initial values, it will take less time to perform the fitting.
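One common way to transform a formula into an approximate form is to linearize it. The sketch below assumes a hypothetical model y = A·exp(−k·x) with strictly positive data: taking logarithms gives ln y = ln A − k·x, so a straight-line least-squares fit to (x, ln y) yields starting values for A and k. The function name `initial_guess_exponential` is invented for this example.

```python
import math

def initial_guess_exponential(xs, ys):
    # Straight-line least-squares fit to (x, ln y); requires every y > 0.
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_x
    # ln y = ln A - k*x, so A = exp(intercept) and k = -slope.
    return math.exp(intercept), -slope

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * math.exp(-0.5 * x) for x in xs]  # synthetic data, A = 3, k = 0.5
A0, k0 = initial_guess_exponential(xs, ys)
print(A0, k0)
```

With noisy data the log transform distorts the error distribution, so these values are best treated as starting points for the nonlinear fit rather than as final estimates.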
Relative vs absolute minima
It is not uncommon for the iterative procedure to find a relative, as opposed to an absolute, minimum. In this case, the procedure actually converges in the sense that a further decrease of reduced χ² seems impossible.
The problem is that you do not even know if the routine has reached an absolute or a relative minimum. The only way to be certain that the iterative procedure is reaching an absolute minimum is to start fitting from several different initial parameter values and observe the results. If you repeatedly get the same final result, it is unlikely that a local minimum has been found.
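The multi-start check can be sketched as follows. The example assumes a hypothetical model y = sin(w·x), whose reduced χ² surface has many relative minima in w, and uses a deliberately simple step-halving local search as a stand-in for a real fitter. All names (`local_minimize`, `starts`) are illustrative.

```python
import math

xs = [0.1 * i for i in range(100)]
ys = [math.sin(2.0 * x) for x in xs]  # synthetic data, true w = 2.0

def reduced_chi_square(w):
    dof = len(xs) - 1  # one fitted parameter
    return sum((y - math.sin(w * x)) ** 2 for x, y in zip(xs, ys)) / dof

def local_minimize(f, w0, step=0.05, min_step=1e-7):
    # Move to a neighbouring point only if it lowers f; otherwise halve the step.
    w, best = w0, f(w0)
    while step > min_step:
        for candidate in (w - step, w + step):
            value = f(candidate)
            if value < best:
                w, best = candidate, value
                break
        else:
            step /= 2
    return w, best

starts = [0.5, 1.2, 2.1, 2.8, 3.5]
results = [local_minimize(reduced_chi_square, w0) for w0 in starts]
best_w, best_chi2 = min(results, key=lambda r: r[1])

for w0, (w, chi2) in zip(starts, results):
    print(f"start {w0:.1f} -> w = {w:.4f}, reduced chi-square = {chi2:.4g}")
```

Starts that end at clearly different reduced χ² values have landed in different minima; only agreement across many starts gives some confidence that the lowest one is the absolute minimum.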
Nonunique parameter values
The most common problem that arises in nonlinear fitting is that no matter how you choose the initial parameter values, the fit does not seem to converge. Some or all parameter values continue to change with successive iterations and they eventually diverge, producing arithmetic overflow or underflow. This indicates that you need to do something about the fitting function and/or the data you are using. There is simply no single set of parameter values that best fits your data.
Overparameterized functions
If the function parameters have the same differential with respect to independent variables, it may suggest that the function is overparameterized. In such cases, the fitting procedure will not converge. For example, in the following model
y = A·exp(x − x0)
A is the amplitude and x0 is the horizontal offset. However, you can rewrite the function as
y = A·exp(x)/exp(x0) = B·exp(x), where B = A/exp(x0)
In other words, if, during the fitting procedure, the values of A and x0 change so that the combination B remains the same, the reduced χ² value will not change. Any attempts to further improve the fit are not likely to be productive.
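A short numerical check makes the redundancy concrete. The sketch assumes the model has the exponential form y = A·exp(x − x0), so that only B = A/exp(x0) affects the curve, and uses synthetic data with a small deterministic perturbation. It also shows that the two parameter sensitivities are proportional, which makes the 2×2 normal matrix singular.

```python
import math

def model(x, A, x0):
    # Assumed overparameterized form: only B = A / exp(x0) matters.
    return A * math.exp(x - x0)

xs = [0.5 * i for i in range(7)]
B_true = 2.0 / math.e
# Synthetic data: exact curve plus a small alternating offset.
ys = [B_true * math.exp(x) + 0.01 * (-1) ** i for i, x in enumerate(xs)]

def reduced_chi_square(A, x0):
    dof = len(xs) - 2  # two nominal parameters
    return sum((y - model(x, A, x0)) ** 2 for x, y in zip(xs, ys)) / dof

# Three different (A, x0) pairs that all share the same B.
pairs = [(2.0, 1.0), (2.0 * math.e, 2.0), (2.0 / math.e, 0.0)]
chi2_values = [reduced_chi_square(A, x0) for A, x0 in pairs]
print(chi2_values)

# Sensitivities at the first pair: dy/dx0 = -A * dy/dA, so the columns are proportional.
A, x0 = pairs[0]
d_A = [math.exp(x - x0) for x in xs]
d_x0 = [-A * math.exp(x - x0) for x in xs]
s11 = sum(a * a for a in d_A)
s22 = sum(b * b for b in d_x0)
s12 = sum(a * b for a, b in zip(d_A, d_x0))
relative_det = abs(s11 * s22 - s12 * s12) / (s11 * s22)
print(relative_det)  # essentially zero: the normal matrix cannot be inverted
```

Fixing x0 (or A) at a constant, or fitting B directly, removes the redundancy.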
If you see one of the following, it indicates that something is wrong:
 The parameter error is very large relative to the parameter value. For example, if the width of the Gaussian curve is 0.5 while the error is 10, the result for the width will be meaningless as the fit has not converged.
 The parameter dependence (for one or more parameters) is very close to one. You should probably remove or fix the value of parameters whose dependency is close to one, since the fit does not seem to depend upon the parameter.
Note, however, that overparameterization does not necessarily mean that the parameters in the model have no physical meaning. It may suggest that there are infinitely many solutions and that you should apply constraints to the fit process.
Bad data
Even when the function is not theoretically overparameterized, the iterative procedure may behave as if it were, due to the fact that the data do not contain enough information for some or all of the parameters to be determined. This usually happens when the data are available only in a limited interval of the independent variable(s). For example, if you are fitting a nonmonotonic function such as the Gaussian to monotonic data, the nonlinear fitter will experience difficulties in determining the peak center and peak width, since the data can describe only one flank of the Gaussian peak.
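The Gaussian case can be illustrated by comparing how the model responds to the peak center and to the peak width. The sketch below is an illustrative diagnostic, not the dependency value reported by any particular fitter: it assumes a unit-amplitude Gaussian y = exp(−(x − c)²/(2w²)) and computes the cosine of the angle between the two sensitivity vectors ∂y/∂c and ∂y/∂w. A value near ±1 means the data can barely tell the two parameters apart.

```python
import math

def sensitivity_cosine(xs, center=0.0, width=1.0):
    # Partial derivatives of y = exp(-(x - c)^2 / (2 w^2)) at each data point.
    d_center, d_width = [], []
    for x in xs:
        u = x - center
        y = math.exp(-u * u / (2 * width ** 2))
        d_center.append(y * u / width ** 2)      # dy/dc
        d_width.append(y * u * u / width ** 3)   # dy/dw
    dot = sum(a * b for a, b in zip(d_center, d_width))
    norm = math.sqrt(sum(a * a for a in d_center) * sum(b * b for b in d_width))
    return dot / norm

full_peak = [i / 10 for i in range(-30, 31)]   # x from -3.0 to 3.0, both flanks
one_flank = [i / 10 for i in range(-30, -14)]  # x from -3.0 to -1.5, monotonic data

cos_full = sensitivity_cosine(full_peak)
cos_flank = sensitivity_cosine(one_flank)
print(cos_full, cos_flank)
```

When the data cover the whole peak, the two sensitivities are orthogonal and the parameters are well separated. On a single flank they are nearly collinear, so a shift of the center can be compensated by a change in width with almost no change in the fit.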
