Interpreting Results of Repeated Measures ANOVA

Multivariate Tests

Origin automatically performs multivariate tests along with repeated measures ANOVA. In most cases, multivariate tests are less powerful than repeated measures ANOVA, so repeated measures ANOVA should be preferred. However, under certain circumstances, for example a large sample size combined with a serious violation of the sphericity assumption, the multivariate tests are a better choice.

In its report of multivariate tests, Origin outputs four rows, each showing the statistics of a separate multivariate test method: Pillai's trace, Wilks' lambda, Hotelling's trace, and Roy's largest root. For each of these test methods, the value of the statistic, F, Num DF, DF, and Prob > F are provided, where Prob > F is the p-value of the test. When Prob > F is smaller than the significance level of 0.05, we can conclude that the means are significantly different.

Normally, Wilks' lambda is the test to use, but it may not always be the best choice. Pillai's trace is also used quite often because of its power and robustness.

Please note that for Roy's largest root test, only a lower-bound estimate of the probability of F is given. Therefore, if its Prob>F value is smaller than 0.05 while the values from the other tests are not, the result from Roy's largest root test can be disregarded.
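As a rough illustration of how the four statistics relate to one another, the sketch below computes them from the eigenvalues of inv(E)·H, where H is the hypothesis SSCP matrix and E is the error SSCP matrix. These are the usual textbook definitions, not Origin's code; the function name and its input are hypothetical, and the form used for Roy's largest root is an assumption (some packages report max/(1 + max) instead of the largest eigenvalue).

```python
from math import prod


def multivariate_statistics(eigenvalues):
    """Compute the four multivariate test statistics from eigenvalues.

    `eigenvalues` (hypothetical input) are the non-negative eigenvalues of
    inv(E) @ H, with H the hypothesis SSCP matrix and E the error SSCP matrix.
    """
    lams = list(eigenvalues)
    if not lams or any(lam < 0 for lam in lams):
        raise ValueError("expected a non-empty list of non-negative eigenvalues")
    return {
        "pillai_trace": sum(lam / (1.0 + lam) for lam in lams),
        "wilks_lambda": prod(1.0 / (1.0 + lam) for lam in lams),
        "hotelling_trace": sum(lams),
        # Assumption: Roy's statistic is reported as the largest eigenvalue.
        "roy_largest_root": max(lams),
    }


stats = multivariate_statistics([1.0, 0.25])
```

Because Roy's statistic uses only the largest eigenvalue, it can look significant when the other three, which pool all eigenvalues, do not.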

Mauchly's Test of Sphericity

Mauchly's test is commonly used to determine whether the sphericity assumption holds. In the Mauchly's Test of Sphericity table of the Origin result sheet, if the value of Prob>ChiSq is greater than or equal to 0.05, sphericity can be assumed. In contrast, when Prob>ChiSq is less than 0.05, sphericity cannot be assumed, and the uncorrected F-test has an inflated Type I error rate. The degrees of freedom therefore need to be adjusted to obtain a valid F-test. The epsilon statistic reported for the three corrections in the Tests of Within-Subjects Effects table can be used both to evaluate the degree to which sphericity has been violated and to adjust the degrees of freedom. In Origin, epsilon is computed using three methods: Greenhouse-Geisser, Huynh-Feldt, and Lower-bound. When epsilon is equal to 1, sphericity is perfectly met. The smaller the value of epsilon, the more serious the violation of sphericity.
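To make the three epsilon estimates concrete, here is a minimal sketch using the standard textbook formulas. It is not Origin's implementation. The function name and arguments are hypothetical, and the Huynh-Feldt formula shown assumes a one-way within-subjects design with a single group of subjects.

```python
def sphericity_epsilons(cov, n_subjects):
    """Estimate epsilon from the k x k covariance matrix of the repeated measures.

    `cov` is a list of lists (symmetric); `n_subjects` is the number of subjects.
    Assumes a one-way within-subjects design with one group.
    """
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    # Double-center the covariance matrix.
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    # For a symmetric matrix, the trace of its square is the sum of squared entries.
    sum_sq = sum(v * v for row in centered for v in row)

    gg = trace ** 2 / ((k - 1) * sum_sq)
    hf = (n_subjects * (k - 1) * gg - 2) / ((k - 1) * (n_subjects - 1 - (k - 1) * gg))
    return {
        "greenhouse_geisser": gg,
        "huynh_feldt": min(hf, 1.0),  # epsilon is capped at 1
        "lower_bound": 1.0 / (k - 1),
    }


# Equal variances and equal covariances: sphericity holds exactly.
spherical = sphericity_epsilons([[2, 1, 1], [1, 2, 1], [1, 1, 2]], n_subjects=10)
# One measure with a much larger variance: sphericity is violated.
violated = sphericity_epsilons([[1, 0, 0], [0, 1, 0], [0, 0, 4]], n_subjects=10)
```

In the first case the Greenhouse-Geisser epsilon equals 1, and in the second it drops below 1 while staying above the lower bound of 1/(k - 1).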

Tests of Within-Subjects Effects

Tests of within-subjects effects can be performed by four methods in Origin: Sphericity Assumed, Greenhouse-Geisser, Huynh-Feldt, and Lower-bound. Basically, we can use the Sphericity Assumed method when sphericity is assumed (the value of Prob>ChiSq in Mauchly's test is no less than 0.05). However, some statisticians believe that a statistical correction is still needed even when sphericity is assumed. For details of the three corrections, please refer to the following table:

Correction method     Comparison            When to use
Greenhouse-Geisser                          epsilon < 0.75
Huynh-Feldt           least conservative    epsilon near or above 0.75
Lower-bound           most conservative     the worst possible case
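The effect of a correction can be sketched as follows: the F value itself is unchanged, but both degrees of freedom are multiplied by epsilon before the p-value is looked up. The helper names below are hypothetical, the degrees of freedom assume a one-way within-subjects design, and the 0.75 threshold simply mirrors the table above.

```python
def corrected_df(epsilon, k_levels, n_subjects):
    """Scale the within-subjects degrees of freedom by epsilon.

    Assumes a one-way within-subjects design with `k_levels` repeated
    measures and `n_subjects` subjects.
    """
    df_effect = epsilon * (k_levels - 1)
    df_error = epsilon * (k_levels - 1) * (n_subjects - 1)
    return df_effect, df_error


def suggest_correction(gg_epsilon, threshold=0.75):
    """Pick a correction from the Greenhouse-Geisser epsilon (rule of thumb)."""
    return "Greenhouse-Geisser" if gg_epsilon < threshold else "Huynh-Feldt"


df_effect, df_error = corrected_df(0.8, k_levels=3, n_subjects=10)
choice = suggest_correction(0.8)
```

With epsilon equal to 1 the degrees of freedom are the same as in the Sphericity Assumed row, and with the lower-bound epsilon of 1/(k - 1) they shrink to 1 and n - 1.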

Tests of Between-Subjects Effects

Tests of Between-Subjects Effects provide tests for each between-subjects factor in your design (in two-way repeated measures ANOVA, one factor can be set as a between-subjects factor) as well as any interactions that involve only the between-subjects factors (which requires at least two between-subjects factors). In the Origin result sheet, you get summary information that includes the values of Sum of Squares, DF, Mean Square, F, and Prob>F.

Descriptive Statistics

This table lists the descriptive statistics for the factor and subject.

Pairwise Comparison

Multiple comparison procedures are commonly used in an ANOVA after obtaining a significant omnibus test result. A significant ANOVA result means that the global null hypothesis, H0, is rejected. H0 states that the means are the same across the groups being compared. Multiple comparisons can then be used to determine which means differ.

Origin provides eight different methods for means comparison. They are Tukey, Bonferroni, Dunn-Sidak, Fisher's LSD, Scheffé, Dunnett, Holm-Bonferroni, and Holm-Sidak.

Tukey: The Tukey method controls the overall Type I error. When Tukey is used with equal sample sizes, the overall confidence level is 1 - \alpha, that is, the risk of a Type I error is exactly \alpha; for unequal sample sizes, the risk of a Type I error is less than \alpha.
Bonferroni: The Bonferroni method controls the overall Type I error and is more conservative than Tukey. It is commonly used for all pairwise comparisons.
Fisher's LSD: Fisher's LSD test does not control the overall Type I error. Therefore, it should be used only when the overall F-test is significant and the number of comparisons is small.
Scheffé: When the number of comparisons is small, Scheffé is very conservative (more so than Bonferroni). It is more powerful for complex multiple comparisons, so it is typically used in those cases.
Dunnett: Dunnett is a powerful test for comparing each treatment to a control, and in that setting it is better able to detect real differences.
Dunn-Sidak: This is a more powerful method than the Bonferroni method, especially when the number of comparisons is large.
Holm-Bonferroni: This method is less conservative and more powerful than the Bonferroni method. Hence you have more chances to reject null hypotheses with the Holm-Bonferroni method.
Holm-Sidak: This method is more powerful than the Holm-Bonferroni method. However, it cannot be used to compute a set of confidence intervals.
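The relationship between the Bonferroni-type methods listed above can be sketched with adjusted p-values. This is a generic illustration of the standard formulas, not Origin's implementation; the function name, method labels, and example p-values are all hypothetical.

```python
def adjust_p_values(p_values, method="holm-bonferroni"):
    """Return adjusted p-values in the same order as the input."""
    m = len(p_values)
    if method == "bonferroni":
        return [min(1.0, m * p) for p in p_values]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in p_values]
    if method not in ("holm-bonferroni", "holm-sidak"):
        raise ValueError(f"unknown method: {method}")

    # Step-down methods: visit p-values from smallest to largest and
    # shrink the number of remaining comparisons at each step.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank
        p = p_values[idx]
        if method == "holm-bonferroni":
            step = min(1.0, remaining * p)
        else:  # holm-sidak
            step = 1.0 - (1.0 - p) ** remaining
        # Keep adjusted p-values non-decreasing along the sorted order.
        running_max = max(running_max, step)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up p-values for three pairwise comparisons
bonferroni = adjust_p_values(raw, "bonferroni")
holm = adjust_p_values(raw, "holm-bonferroni")
```

For any set of p-values, each Holm-Bonferroni adjusted value is no larger than the corresponding Bonferroni value, which is why the step-down method rejects at least as many hypotheses.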


Origin provides three plots: Bar Chart, Means Plot (SE as Error), and Means Comparison Plot.

Note: These three plots are supported only for one-way repeated measures ANOVA, not for two-way repeated measures ANOVA.