17.1.3 Cross Tabulation and Chi-Square (Pro Only)
Cross tabulation produces a table, also known as a contingency table, that reveals the frequency distribution of the variables. Analysis based on the table can determine whether there is a significant relationship between the variables, measure the strength and direction of that relationship, and measure and test the agreement of matched-pairs data. It is widely used to analyze categorical data.

Goals
There are four main goals for cross tabulation:
 Frequency analysis
 To display the frequency distribution of the variables in table form, with counts, percentages, or residuals calculated for each cell.
 Test of independence
 To determine whether there is a significant relationship between variables in the contingency table.
 Measuring association
 To assess the strength and direction of the relationship between the variables in the contingency table.
 Measuring agreement
 To test or measure the degree to which two different raters, or two different systems of evaluation, agree. For example, it can be used to assess how consistently survey takers answer, such as agreeing or disagreeing with a statement.
Processing Procedure
Preparing Analysis Data
Cross tabulation and Chi-Square analysis can be performed on raw data or on frequency data.
 Raw data
 There is a column for each variable and each row represents an observation.
 Frequency data
 There is a column for each variable and a column of frequencies. Each row represents a level in the group. The column of frequencies gives the number of observations at that level in the data.
 For example:
 Sex       Frequencies
 Female    15
 Male      21
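The two input layouts carry the same information. As a minimal sketch in Python (not Origin's own code; the sample observations and the helper names `to_frequency` and `to_table` are made up for illustration), raw data can be collapsed into frequency data and then arranged as a table of counts:

```python
from collections import Counter

# Hypothetical raw data: one (sex, answer) pair per observation.
raw = [
    ("Female", "Agree"), ("Male", "Disagree"), ("Female", "Agree"),
    ("Male", "Agree"), ("Female", "Disagree"), ("Male", "Agree"),
]

def to_frequency(rows):
    """Collapse raw observations into {level combination: frequency}."""
    return Counter(rows)

def to_table(freq):
    """Arrange frequency data as a table of counts (rows by columns)."""
    row_levels = sorted({r for r, _ in freq})
    col_levels = sorted({c for _, c in freq})
    counts = [[freq.get((r, c), 0) for c in col_levels] for r in row_levels]
    return row_levels, col_levels, counts

freq = to_frequency(raw)
row_levels, col_levels, counts = to_table(freq)
```

Here `counts` holds one list per row level, which is the layout the later sketches on this page assume.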

Selecting Marginal and Cell Statistics For Contingency Table
Counts
 Counts
 The observed frequency for each cell
 Expected Counts
 The expected frequency for each cell under the assumption that the column and row variables are independent
Percentages
 Percentages of Row Counts (Row%)
 Each cell count as a percentage of its row total
 Percentages of Column Counts (Col%)
 Each cell count as a percentage of its column total
 Percentages of Total Counts (Total%)
 Each cell count and margin total as a percentage of the grand total
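The counts and percentages above follow directly from the margins of the table. The sketch below is illustrative Python rather than Origin's implementation. It assumes a made-up 2*2 table with no empty row or column, and uses the standard rule that an expected count is the row total times the column total divided by the grand total:

```python
def margins(counts):
    """Return row totals, column totals, and the grand total."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals, sum(row_totals)

def expected_counts(counts):
    """Expected count = row total * column total / grand total."""
    row_totals, col_totals, n = margins(counts)
    return [[r * c / n for c in col_totals] for r in row_totals]

def percentages(counts):
    """Return the Row%, Col%, and Total% tables."""
    row_totals, col_totals, n = margins(counts)
    row_pct = [[100 * x / r for x in row] for row, r in zip(counts, row_totals)]
    col_pct = [[100 * x / c for x, c in zip(row, col_totals)] for row in counts]
    total_pct = [[100 * x / n for x in row] for row in counts]
    return row_pct, col_pct, total_pct

counts = [[10, 20], [30, 40]]  # made-up 2*2 table
expected = expected_counts(counts)
row_pct, col_pct, total_pct = percentages(counts)
```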
Residuals
Origin provides three kinds of residuals. In general, the closer the values are to zero, the more likely it is that the column and row variables have no association. See the Interpreting Results page for more information.
 Residuals
 The difference between the observed count and the expected count.
 Standardized Residuals
 Also called the Pearson residual. It standardizes the residual by dividing it by the square root of the expected count.
 Adjusted Residuals
 A residual that is further standardized by taking the overall size of the sample into account. It is the most useful residual for comparing different cells.
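The three residuals can be sketched as follows. This is illustrative Python, not Origin's code. The adjusted residual uses the common textbook form, which divides by sqrt(expected * (1 - row total/n) * (1 - column total/n)); that formula is an assumption here, since this page does not state the one Origin applies. The table is made up.

```python
import math

def residual_tables(counts):
    """Return raw, standardized (Pearson), and adjusted residual tables."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    raw, std, adj = [], [], []
    for row, r in zip(counts, row_totals):
        raw_row, std_row, adj_row = [], [], []
        for obs, c in zip(row, col_totals):
            exp = r * c / n                      # expected count under independence
            diff = obs - exp                     # raw residual
            raw_row.append(diff)
            std_row.append(diff / math.sqrt(exp))
            # Assumed textbook adjustment using the row and column proportions.
            adj_row.append(diff / math.sqrt(exp * (1 - r / n) * (1 - c / n)))
        raw.append(raw_row)
        std.append(std_row)
        adj.append(adj_row)
    return raw, std, adj

raw, std, adj = residual_tables([[10, 20], [30, 40]])  # made-up table
```

For a 2*2 table, every adjusted residual has the same absolute value, and its square equals the Pearson chi-square statistic.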
Selecting Methods for Test of Independence
Origin provides two different methods to test whether there is a significant relationship between the variables in the contingency table.
 Chi-Square tests
 A commonly used test of the hypothesis that the row and column variables are independent.
 Fisher's Exact test
 Fisher's exact test is available only for a 2*2 table. It is particularly useful when sample sizes are small (even zero in some cells) and the Chi-square test is not appropriate.
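Both tests can be sketched in a few lines of Python. This is not Origin's implementation. The function names are hypothetical, the p-value helper covers only one degree of freedom (a 2*2 table), and the two-sided Fisher p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
import math

def pearson_chi_square(counts):
    """Return (statistic, degrees of freedom) for an n*m table."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    stat = sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(counts, rows)
               for obs, c in zip(row, cols))
    return stat, (len(rows) - 1) * (len(cols) - 1)

def chi_square_p_value_df1(stat):
    """Upper-tail p-value, valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2*2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

stat, df = pearson_chi_square([[10, 20], [30, 40]])  # made-up table
p_fisher = fisher_exact_two_sided(3, 1, 1, 3)        # small-sample example
```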
Selecting Methods for Measuring Association
Measures for Nominal Variables
Chi-Square Based Measurements
 Phi
 Phi is a statistic that adjusts the chi-square by taking the sample size into account. It is usually used for comparing 2*2 tables.
 Contingency coefficient
 The contingency coefficient is another statistic that adjusts the chi-square by the sample size. Like Phi, it is not recommended for comparing tables of different dimensions, but it is useful when the tables have the same dimensions (the same n and the same m, with n, m > 2). Unlike Phi, it always lies between 0 and 1, so when n > 2 and m > 2 its value can be judged against 1 to gauge the association between the variables.
 Cramer's V
 A statistic that adjusts the chi-square by both the sample size and the dimensions of the table (n*m). It is commonly used for comparing the association in tables that have different dimensions.
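The three chi-square based measures differ only in how they rescale the statistic. The sketch below uses the standard textbook formulas and is not Origin's code. It assumes a table with no empty row or column, and the function names are illustrative.

```python
import math

def chi_square(counts):
    """Return the Pearson chi-square statistic and the grand total."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    stat = sum((obs - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(counts, rows)
               for obs, c in zip(row, cols))
    return stat, n

def nominal_measures(counts):
    """Phi, contingency coefficient, and Cramer's V for an n*m table."""
    stat, n = chi_square(counts)
    k = min(len(counts), len(counts[0])) - 1   # smaller dimension minus 1
    return {
        "phi": math.sqrt(stat / n),
        "contingency": math.sqrt(stat / (stat + n)),
        "cramers_v": math.sqrt(stat / (n * k)),
    }

m = nominal_measures([[10, 0], [0, 10]])  # made-up, perfectly associated table
```

Even for this perfectly associated table the contingency coefficient stays below 1 (about 0.707), which is one reason Cramer's V is preferred when tables of different dimensions are compared.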
PRE Measurements
 Lambda
 The most commonly used measure of proportional reduction in error (PRE): the percentage by which the prediction error is reduced when the independent variable is used to predict the dependent variable. If the dependent variable is not predictable from the independent variable (Lambda = 0), it is more likely that the two variables have no association.
 Uncertainty Coefficient
 Another PRE measure. It is more conservative than Lambda.
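As a sketch of the PRE idea, the Python below treats the column variable as dependent and the row variable as independent. That choice, the function names, and the sample table are assumptions made for illustration; this is not Origin's implementation. Lambda compares the prediction errors made with and without knowledge of the row, and the uncertainty coefficient does the same with entropy in place of misclassification counts.

```python
import math

def lambda_column_given_row(counts):
    """Goodman-Kruskal lambda, predicting the column from the row."""
    cols = [sum(c) for c in zip(*counts)]
    n = sum(cols)
    errors_without = n - max(cols)                    # always guess the modal column
    errors_with = n - sum(max(row) for row in counts) # guess the modal column per row
    return (errors_without - errors_with) / errors_without

def entropy(totals, n):
    """Shannon entropy of a set of counts that sum to n."""
    return -sum(t / n * math.log(t / n) for t in totals if t > 0)

def uncertainty_column_given_row(counts):
    """Share of the column variable's entropy explained by the row."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    cells = [x for row in counts for x in row]
    h_r, h_c, h_rc = entropy(rows, n), entropy(cols, n), entropy(cells, n)
    return (h_r + h_c - h_rc) / h_c

lam = lambda_column_given_row([[30, 10], [10, 30]])  # made-up table
```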
Measures for Ordinal Variables
 Gamma
 A classic statistic for ordinal variables, with no correction for ties.
 Kendall's tau-b and tau-c
 The most commonly used statistics for ordinal variables. They are similar to Gamma but are corrected for ties. Kendall's tau-b is used for n*n tables, while tau-c can be used for n*m tables.
 Somers' D
 Unlike Gamma and Kendall's tau-b and tau-c, Somers' D is an asymmetric statistic. It is appropriate when you want to identify which variable depends on the other. For example, it is useful for detecting whether there is an association between examination scores (1, 2, 3, 4, 5) and weekly study time outside school (5~10 hr, 10~15 hr, etc.). The examination score is the dependent variable and the weekly study time outside school is the independent variable.
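All four ordinal measures are built from counts of concordant and discordant pairs and differ only in how they handle ties. The following Python is a sketch using the standard textbook definitions, not Origin's code. It assumes the row and column levels are already in ascending order and, for Somers' D, that the column variable is the dependent one. The table is made up.

```python
import math

def pair_counts(counts):
    """Count concordant, discordant, row-tied, and column-tied pairs."""
    cells = [(i, j, x) for i, row in enumerate(counts) for j, x in enumerate(row)]
    conc = disc = tie_row = tie_col = 0
    for a, (i, j, x) in enumerate(cells):
        for k, l, y in cells[a + 1:]:
            if i == k:
                tie_row += x * y      # tied on the row variable only
            elif j == l:
                tie_col += x * y      # tied on the column variable only
            elif (i - k) * (j - l) > 0:
                conc += x * y
            else:
                disc += x * y
    return conc, disc, tie_row, tie_col

def ordinal_measures(counts):
    p, q, t_row, t_col = pair_counts(counts)
    n = sum(sum(row) for row in counts)
    m = min(len(counts), len(counts[0]))
    return {
        "gamma": (p - q) / (p + q),
        "tau_b": (p - q) / math.sqrt((p + q + t_row) * (p + q + t_col)),
        "tau_c": 2 * m * (p - q) / (n ** 2 * (m - 1)),
        # Column variable dependent: pairs tied on the row are excluded.
        "somers_d": (p - q) / (p + q + t_col),
    }

om = ordinal_measures([[10, 5], [5, 10]])  # made-up 2*2 ordinal table
```

Gamma ignores tied pairs entirely, so it is never smaller in magnitude than the tie-corrected statistics computed from the same table.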
Selecting Methods for Measuring Agreement
 Kappa
 Kappa is also known as Cohen's Kappa. It tests whether two raters agree with each other, and measures the degree of their agreement, when they are called upon to evaluate the same objects.
 Bowker's Test
 Also called the McNemar-Bowker test of symmetry. It is known as the McNemar test for a 2*2 table and as Bowker's test for an n*n table. It measures the agreement of matched-pairs data, in which each observation from one rater is matched with an observation from the other. For example, it tests whether the proportion of patients that rater 1 evaluates as normal equals the proportion that rater 2 evaluates as normal when both evaluate the same group of patients.
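Both agreement statistics need a square table in which rows are one rater's categories and columns are the other's. The sketch below is illustrative Python based on the usual textbook formulas, not Origin's implementation. The table is made up, and skipping off-diagonal pairs whose counts sum to zero is an assumption.

```python
def cohen_kappa(counts):
    """Agreement beyond chance for a square rater-by-rater table."""
    k = len(counts)
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    p_observed = sum(counts[i][i] for i in range(k)) / n
    p_expected = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

def bowker_statistic(counts):
    """Return (statistic, degrees of freedom) for the symmetry test."""
    k = len(counts)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            total = counts[i][j] + counts[j][i]
            if total > 0:  # assumed: empty off-diagonal pairs are skipped
                stat += (counts[i][j] - counts[j][i]) ** 2 / total
                df += 1
    return stat, df

ratings = [[20, 5], [10, 15]]  # made-up table: rater 1 in rows, rater 2 in columns
kappa = cohen_kappa(ratings)
stat, df = bowker_statistic(ratings)
```

For this 2*2 table the Bowker statistic reduces to the McNemar form (b - c)^2 / (b + c), computed from the two off-diagonal cells.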
Selecting Other Measures
 Odds Ratio
 It is available only for a 2*2 table. The Odds Ratio is the ratio of the odds of an event occurring in one group to the odds of it occurring in a comparison group, where the odds are the probability that the event occurs divided by the probability that it does not.
 Relative Risk
 It is available only for a 2*2 table. Relative Risk is the ratio of the probability of an event occurring in one group to the probability of it occurring in a comparison group.
 Cochran-Mantel-Haenszel
 Cochran-Mantel-Haenszel tests are used to assess whether there is any relationship between the row and column variables after controlling for the layer variable. They consist of two types of test (the Conditional Independence Test and the Odds Ratios' Homogeneity Tests) and an estimator of the Common Odds Ratio.
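For a 2*2 table [[a, b], [c, d]] with groups in rows and the event in the first column, the sketch below uses the usual textbook definitions: the odds ratio compares odds between the groups, relative risk compares probabilities, and the Mantel-Haenszel estimator pools the odds ratio across layers. It is illustrative Python, not Origin's code. The tables are made up, and the tests of conditional independence and homogeneity are not shown.

```python
def odds_ratio(a, b, c, d):
    """Odds of the event in group 1 divided by the odds in group 2."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Probability of the event in group 1 divided by that in group 2."""
    return (a / (a + b)) / (c / (c + d))

def mantel_haenszel_odds_ratio(layers):
    """Common odds ratio across 2*2 layers given as (a, b, c, d) tuples."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in layers)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in layers)
    return num / den

layers = [(20, 80, 10, 90), (30, 20, 10, 40)]  # made-up layers
or_first = odds_ratio(*layers[0])
rr_first = relative_risk(*layers[0])
or_common = mantel_haenszel_odds_ratio(layers)
```

With these numbers the first layer has an odds ratio of 2.25 but a relative risk of 2.0, which shows that the two measures are not interchangeable.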
Performing Cross Tabulation and Chi-Square
 Select Statistics: Descriptive Statistics: Cross Tabulation and Chi-Square
 Or
 Type crosstab -d in the Script window.