2.65.4.2 Algorithm for Data Transformation
Origin provides three functions for transforming data so that it follows a normal distribution: the Box-Cox transformation, the Johnson transformation, and the Yeo-Johnson transformation.
The Box-Cox and Yeo-Johnson transformations are both power transformations. They differ in that the Box-Cox transformation applies only to data in which all values are positive, whereas the Yeo-Johnson transformation can be used on any data without restriction. The Johnson transformation instead uses the Johnson distribution system; it checks the normality of the original data and then transforms the data.
Box-Cox Transformation
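The Box-Cox transformation is a kind of power transformation, and it works only for positive data. In its classical form, the transformed value $y_i^{(\lambda)}$ of a data value $y_i$ is:

$$y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln y_i, & \lambda = 0 \end{cases}$$

Here $\lambda$ is in the range $[-5, 5]$.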
Optimal λ
Origin estimates the optimal $\lambda$ within the range $[-5, 5]$ as the value that minimizes the standard deviation of the transformed data. Because the scale of the transformed data depends on $\lambda$, the transformed data must be standardized before the standard deviations for different $\lambda$ values are compared. The following formula is used for the standardization:

$$z_i = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda\, G^{\lambda - 1}}, & \lambda \neq 0 \\ G \ln y_i, & \lambda = 0 \end{cases}$$

where $y_i$ is the $i$-th data value and $G$ is the geometric mean of the original data. The standardized values $z_i$ are then used for the standard deviation calculation.
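The standardization step can be sketched in Python as follows. This is a minimal illustration rather than Origin's implementation: it assumes the usual geometric-mean scaling, in which the power-transformed value is divided by λ·G^(λ−1) (and G·ln y is used when λ = 0). The helper names `geometric_mean` and `standardized_boxcox` are hypothetical.

```python
import math

def geometric_mean(data):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(y) for y in data) / len(data))

def standardized_boxcox(data, lam):
    """Standardized power-transformed values for one lambda (hypothetical helper)."""
    if any(y <= 0 for y in data):
        raise ValueError("all data values must be positive")
    g = geometric_mean(data)
    if lam == 0:
        # Limit of the power form as lambda approaches 0.
        return [g * math.log(y) for y in data]
    scale = lam * g ** (lam - 1.0)
    return [(y ** lam - 1.0) / scale for y in data]

sample = [1.2, 0.8, 3.5, 2.1, 5.7]
# With lambda = 1 the scaling factor is 1, so each value is only shifted by -1.
print(standardized_boxcox(sample, 1.0))
```

Because every λ is put on a comparable scale, the standard deviations of the outputs for different λ values can be compared directly.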
The optimization uses the golden section search algorithm. The detailed steps are:
 1. Initialize the range of $\lambda$ for the optimization, here from $-5$ to $5$, and the tolerance for stopping the iteration.
 2. Narrow down the range by the golden ratio. For the old range $[a, b]$, compute

$$r = \frac{\sqrt{5} - 1}{2} \approx 0.618$$
$$\lambda_1 = b - r\,(b - a)$$
$$\lambda_2 = a + r\,(b - a)$$

    then take $[\lambda_1, \lambda_2]$ as a smaller new range.
 3. Take the two end points of the new range as two values of $\lambda$, calculate the standardized values for each, and then the standard deviation for each.
 4. Compare the two standard deviations.
    If the standard deviation at the small end point of the new range is larger than the one at the large end point of the new range, the minimum lies toward the large end: update the range to run from the small end point of the new range to the large end point of the old range.
    Otherwise, update the range to run from the small end point of the old range to the large end point of the new range.
 5. Take the updated range from step 4 as the old range, and repeat steps 2 to 4 until the length of the old range is smaller than the specified tolerance. This old range is the final range.
 6. The midpoint of the final range is taken as the optimal $\lambda$.
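The search above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Origin's code: the function names, the default tolerance of 1e-4, and the use of the plain sample standard deviation as the objective are all assumptions. At each iteration the sketch keeps the sub-range that must contain the minimum of the objective.

```python
import math
import statistics

R = (math.sqrt(5.0) - 1.0) / 2.0  # golden ratio conjugate, about 0.618

def standardized_boxcox(data, lam):
    """Power-transformed data scaled by the geometric mean (data must be > 0)."""
    g = math.exp(sum(math.log(y) for y in data) / len(data))
    if lam == 0:
        return [g * math.log(y) for y in data]
    return [(y ** lam - 1.0) / (lam * g ** (lam - 1.0)) for y in data]

def objective(data, lam):
    # Assumption: the plain sample standard deviation stands in for the
    # pooled or moving-range estimators.
    return statistics.stdev(standardized_boxcox(data, lam))

def optimal_lambda(data, low=-5.0, high=5.0, tol=1e-4):
    """Golden section search for the lambda with the smallest objective."""
    while high - low > tol:
        width = high - low
        lam1 = high - R * width  # small end point of the new range
        lam2 = low + R * width   # large end point of the new range
        if objective(data, lam1) > objective(data, lam2):
            low = lam1   # the minimum lies in [lam1, high]
        else:
            high = lam2  # the minimum lies in [low, lam2]
    return (low + high) / 2.0

# Data whose logarithms are symmetric around zero: lambda should be close to 0.
data = [math.exp(v) for v in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
print(optimal_lambda(data))
```

Each pass shrinks the range by a factor of about 0.618, so covering a range of length 10 down to a tolerance of 1e-4 takes roughly 24 iterations.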
How is the standard deviation calculated?
 For subgroup data, that is, when the subgroup size is greater than 1, the unbiased pooled standard deviation is used.
 For individuals data, that is, when the subgroup size is 1, the standard deviation is estimated from the average moving range, with a moving range of length 2.
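The two estimators can be sketched as follows. The details are assumptions based on common statistical process control conventions rather than on Origin's documentation: the pooled standard deviation is divided by the unbiasing constant c4, and the average moving range is divided by d2 = 2/√π (about 1.128) for a moving range of length 2. All function names are hypothetical.

```python
import math

def c4(n):
    """Unbiasing constant c4 for a sample of size n (n >= 2)."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )

def pooled_std(subgroups, unbiased=True):
    """Pooled standard deviation over subgroups of size greater than 1."""
    dof = sum(len(g) - 1 for g in subgroups)
    ss = 0.0
    for g in subgroups:
        mean = sum(g) / len(g)
        ss += sum((x - mean) ** 2 for x in g)
    sp = math.sqrt(ss / dof)
    # Assumption: the unbiased version divides by c4(dof + 1).
    return sp / c4(dof + 1) if unbiased else sp

def moving_range_std(values):
    """Standard deviation estimate from the average moving range of length 2."""
    d2 = 2.0 / math.sqrt(math.pi)  # about 1.128 for a moving range of length 2
    ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(ranges) / len(ranges)) / d2

print(pooled_std([[1, 2, 3], [4, 5, 6]]))
print(moving_range_std([1, 2, 4, 7]))
```

Since the constants c4 and d2 do not depend on λ, they do not change which λ gives the smallest standard deviation; they only affect the reported value.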
Origin also provides an option to round the optimal $\lambda$ to the nearest multiple of 0.5.
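As a small illustration of the rounding option, the helper below (a hypothetical name) maps a value to the nearest multiple of 0.5. How Origin breaks exact ties, such as 0.25, is not stated in this section; the sketch assumes ties round upward.

```python
import math

def round_to_half(lam):
    """Round lam to the nearest multiple of 0.5 (ties round upward; an assumption)."""
    return math.floor(lam * 2.0 + 0.5) / 2.0

print(round_to_half(0.37))   # nearest multiple of 0.5 is 0.5
print(round_to_half(-1.3))   # nearest multiple of 0.5 is -1.5
```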
Johnson Transformation
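The Johnson system includes three families of distributions, SB, SL, and SU, in which the variable is bounded (SB), lognormal (SL), and unbounded (SU), respectively. The transformation functions of the three families are:

$$\text{SB:}\quad z = \gamma + \eta \ln\!\left(\frac{x - \varepsilon}{\lambda + \varepsilon - x}\right), \quad \varepsilon < x < \varepsilon + \lambda$$
$$\text{SL:}\quad z = \gamma + \eta \ln(x - \varepsilon), \quad x > \varepsilon$$
$$\text{SU:}\quad z = \gamma + \eta \sinh^{-1}\!\left(\frac{x - \varepsilon}{\lambda}\right)$$

where $x$ is the original data value, $z$ is the transformed value, and $\gamma$, $\eta$, $\varepsilon$, and $\lambda$ are the parameters of the transformation, with $\eta > 0$ and $\lambda > 0$.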
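For illustration, the three transformation functions can be written in Python as below, assuming the standard Johnson-system forms (a logit-type function for SB, a logarithm for SL, and an inverse hyperbolic sine for SU). The function and parameter names are hypothetical, and the parameter values in the usage lines are arbitrary.

```python
import math

def johnson_sb(x, gamma, eta, epsilon, lam):
    """Bounded family; requires epsilon < x < epsilon + lam."""
    return gamma + eta * math.log((x - epsilon) / (lam + epsilon - x))

def johnson_sl(x, gamma, eta, epsilon):
    """Lognormal family; requires x > epsilon."""
    return gamma + eta * math.log(x - epsilon)

def johnson_su(x, gamma, eta, epsilon, lam):
    """Unbounded family; defined for every real x."""
    return gamma + eta * math.asinh((x - epsilon) / lam)

print(johnson_sb(0.5, 0.0, 1.0, 0.0, 1.0))  # midpoint of (0, 1) maps to gamma
print(johnson_sl(math.e, 1.0, 2.0, 0.0))
print(johnson_su(0.0, 0.5, 2.0, 0.0, 1.0))
```

Values outside the domain of SB or SL make `math.log` raise `ValueError`, which is one simple way to detect that a candidate does not fit the data.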
The goal of this algorithm is to select the best transformation function from the three Johnson families. Here "best" means:
 After transformation, the Anderson-Darling test is performed on the transformed data, and the corresponding p-value is the largest among all candidates.
 The largest p-value is greater than the specified p-value criterion (default is 0.1).
The general flow for picking the best transformation function is:
 Almost all potential transformation functions from the three Johnson families above are considered as candidates.
 For each candidate:
   Estimate the parameters using the method described in Youn-Min Chou, Alan M. Polansky & Robert L. Mason (1998) Transforming Non-Normal Data to Normality in Statistical Process Control, Journal of Quality Technology, 30:2, 133-141, DOI: 10.1080/00224065.1998.11979832
   Transform the original data by the candidate function with the estimated parameters.
   Perform the Anderson-Darling test on the transformed data and get the p-value. (Note: the reference above uses the Shapiro-Wilk normality test instead.)
 According to the "best" criterion described above, pick the best transformation function. If no candidate meets the criterion, no transformation is chosen for the data.
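The selection step can be sketched as follows. This is a simplified illustration, not Origin's implementation: parameter estimation by the method of Chou et al. is not reproduced, so each candidate is passed in as a ready-made function, and the p-value uses the commonly cited D'Agostino-Stephens approximation for the Anderson-Darling test with estimated mean and standard deviation. The names `anderson_darling_pvalue` and `select_best` and the two example candidates are hypothetical.

```python
import math

def anderson_darling_pvalue(sample):
    """Approximate p-value of the Anderson-Darling normality test."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    z = sorted((v - mean) / sd for v in sample)
    # Normal CDF, clamped so that the logarithms below stay finite.
    cdf = [min(max(0.5 * (1.0 + math.erf(v / math.sqrt(2.0))), 1e-300), 1.0 - 1e-15)
           for v in z]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a = (-n - total / n) * (1.0 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a > 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)

def select_best(data, candidates, p_criterion=0.1):
    """Return (name, p_value) of the best candidate, or None if none qualifies."""
    best = None
    for name, transform in candidates:
        try:
            transformed = [transform(x) for x in data]
        except (ValueError, ZeroDivisionError):
            continue  # the data falls outside this candidate's domain
        p = anderson_darling_pvalue(transformed)
        if best is None or p > best[1]:
            best = (name, p)
    if best is not None and best[1] > p_criterion:
        return best
    return None

# Two hypothetical candidates with fixed, already "estimated" parameters.
candidates = [
    ("SL", lambda x: math.log(x)),
    ("SU", lambda x: math.asinh(x / 100.0)),
]
data = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0, 1.3, 1.8, 2.6, 4.1, 6.5]
print(select_best(data, candidates))
```

Returning `None` corresponds to the case in which no candidate meets the criterion, so no transformation is chosen.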
Yeo-Johnson Transformation
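The Yeo-Johnson transformation is another kind of power transformation. Unlike the Box-Cox transformation, it works for any data: positive, negative, and zero. The transformed value $y_i^{(\lambda)}$ of a data value $y_i$ is:

$$y_i^{(\lambda)} = \begin{cases}
\dfrac{(y_i + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y_i \geq 0 \\
\ln(y_i + 1), & \lambda = 0,\ y_i \geq 0 \\
-\dfrac{(1 - y_i)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ y_i < 0 \\
-\ln(1 - y_i), & \lambda = 2,\ y_i < 0
\end{cases}$$

Here $\lambda$ is also restricted to the range $[-5, 5]$.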
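A minimal Python sketch of the standard Yeo-Johnson definition is shown below. The function name `yeo_johnson` is hypothetical, and the sketch does not enforce the restriction on the range of λ.

```python
import math

def yeo_johnson(y, lam):
    """Standard Yeo-Johnson transform of a single value y for a given lambda."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)                 # ln(y + 1)
        return ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-y)                   # -ln(1 - y)
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)

# With lambda = 1 the transform is the identity for both signs.
print(yeo_johnson(3.0, 1.0), yeo_johnson(-3.0, 1.0))
```

The negative branch mirrors the positive one with exponent 2 − λ, which is what lets the transform accept zero and negative values.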
Optimal λ
Origin uses the same algorithm as for the Box-Cox transformation to estimate the optimal $\lambda$. However, that algorithm needs the geometric mean of the original data, which cannot be calculated if the original data contains negative values or zeros. To make the optimization work for nonpositive data, a positive value is added to the original data to produce a new data set in which all values are positive, and the algorithm is run on this new data set.
For more details about the optimization, see the description of the optimal λ in the Box-Cox Transformation section.
