The chi-square test is a statistical hypothesis test used to compare observed data with expected data based on a specific hypothesis. It helps us decide whether there is a significant association between categorical variables.
It is non-parametric, which means it does not assume the data follow a particular distribution, such as the normal distribution.
This test is suitable for nominal (categorical) data.
The result is a chi-square statistic, which is compared with a critical value (or converted to a p-value) to interpret the outcome.
The chi-square test is defined as follows:
"A chi-square test is a statistical method used to check if observed frequencies differ significantly from expected frequencies."
In simpler terms, it helps you answer questions like:
Is there a connection between age and brand preference?
Do the results of a dice roll match what we expect?
Do voting patterns vary by region?
The chi-square goodness-of-fit test checks whether a sample matches an expected distribution.
It is used when you have one categorical variable.
Example: Rolling a die 60 times and seeing if the outcomes are evenly distributed.
The chi-square test of independence assesses whether two categorical variables are independent.
It is used when you have two variables.
Example: Analysing whether gender affects smartphone brand preference.
Here is the standard chi-square test formula:
χ² = Σ [(O - E)² / E]
Where:
χ²: Chi-square statistic
O: Observed frequency
E: Expected frequency
Key points:
Calculate the difference between observed and expected.
Square the difference.
Divide by the expected value.
Add up the results for all categories.
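The formula translates directly into a few lines of code. The following is a minimal Python sketch; the function name `chi_square_statistic` and the sample counts are illustrative assumptions, not part of any library or of the examples in this article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        # Difference, squared, divided by the expected value
        total += (o - e) ** 2 / e
    return total


# Hypothetical counts: three categories with 20 expected in each.
statistic = chi_square_statistic([18, 22, 20], [20, 20, 20])
print(round(statistic, 2))
```

Each pass through the loop performs the subtract, square, and divide steps for one category, and the running total is the Σ in the formula.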
Example:
A teacher wants to check if students prefer four fruits equally: Apple, Mango, Orange, and Banana.
Observed (O): 25, 30, 20, 25
Total students = 100
Expected (E): 100 ÷ 4 = 25 each
Step 1: Write the formula
χ² = (25-25)²/25 + (30-25)²/25 + (20-25)²/25 + (25-25)²/25
Step 2: Subtract O - E
χ² = (0)²/25 + (5)²/25 + (-5)²/25 + (0)²/25
Step 3: Square the differences
χ² = 0/25 + 25/25 + 25/25 + 0/25
Step 4: Simplify
χ² = 0 + 1 + 1 + 0
Step 5: Final Answer
χ² = 2
State the Hypotheses
Null hypothesis (H₀): There is no difference (observed = expected).
Alternative hypothesis (H₁): There is a difference (observed ≠ expected).
Find Expected Frequencies
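Expected = (Total observations ÷ Number of categories) when all categories are assumed equally likely; otherwise, Expected = Total observations × hypothesised proportion for that category.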
Apply the Formula
χ² = Σ (O - E)² / E
Check the Critical Value
Use a chi-square table with degrees of freedom = (categories - 1).
Compare your result with the critical value.
Make a Decision
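If χ² is larger than the critical value → Reject H₀ (there is a significant difference).
If χ² is smaller than or equal to the critical value → Do not reject H₀ (no significant difference was found).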
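The steps above can be sketched as a single function. This is only an illustration for the equal-preference case: the name `equal_preference_test`, the small lookup table of α = 0.05 critical values, and the sample counts are assumptions made for the sketch.

```python
# Critical values at alpha = 0.05, copied from a standard chi-square table.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def equal_preference_test(observed):
    """Goodness-of-fit test against equally likely categories (alpha = 0.05)."""
    k = len(observed)
    total = sum(observed)
    expected = total / k                       # expected count per category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1                                 # degrees of freedom
    critical = CRITICAL_VALUES_05[df]          # table lookup
    reject_h0 = statistic > critical           # decision
    return statistic, df, critical, reject_h0


# Hypothetical counts for three categories assumed equally likely.
statistic, df, critical, reject_h0 = equal_preference_test([10, 20, 30])
print(statistic, df, critical, reject_h0)
```

The lookup table only covers 1 to 5 degrees of freedom; a larger table or a p-value calculation would be needed for more categories.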
Example
A teacher checks if students like four colours equally: Red, Blue, Green, and Yellow.
Observed (O): 20, 30, 25, 25
Total = 100
Expected (E) = 100 ÷ 4 = 25 each
Step 1: Apply the formula
χ² = (20-25)²/25 + (30-25)²/25 + (25-25)²/25 + (25-25)²/25
Step 2: Simplify differences
χ² = (-5)²/25 + (5)²/25 + (0)²/25 + (0)²/25
Step 3: Square and divide
χ² = 25/25 + 25/25 + 0 + 0
Step 4: Add
χ² = 1 + 1 = 2
Final Answer: χ² = 2
The chi-square test has various applications, including:
Education: Evaluating student performance across different teaching methods.
Medicine: Testing the effect of treatments across patient groups.
Marketing: Checking if brand preference depends on age group.
Politics: Analysing voting behaviour by demographic.
Retail: Studying the connection between product types and returns.
Incorrect. The chi-square test is meant for categorical data only.
Wrong. All expected frequencies must be greater than 0, and a common rule of thumb is that they should be at least 5.
False. It only shows an association, not causation.
Not necessarily. It may exaggerate minor differences.
It's also useful in business, education, and daily decision-making.
British statistician Karl Pearson first introduced it.
The symbol χ² comes from the Greek letter “chi.”
Although it is non-parametric, the chi-square approximation becomes more accurate as the sample size grows.
Used in genetics to find out if traits follow expected Mendelian ratios.
Exit polls rely on the chi-square test to analyse voter demographics.
Question: Do four fruit choices occur equally often? Observed counts: Apple 25, Mango 30, Orange 20, Banana 25. Total = 100.
Step 1: Hypotheses
H₀: All four fruits are equally preferred (expected proportions equal).
H₁: Preferences are not equal.
Step 2: Expected frequencies
Expected for each = 100 ÷ 4 = 25.
Step 3: Compute (O-E)² / E for each category
Apple: (25 − 25)² / 25 = 0
Mango: (30 − 25)² / 25 = 25 / 25 = 1
Orange: (20 − 25)² / 25 = 25 / 25 = 1
Banana: (25 − 25)² / 25 = 0
Step 4: Sum → χ² = 0 + 1 + 1 + 0 = 2.00
Step 5: Degrees of freedom = k − 1 = 4 − 1 = 3. Critical value at α = 0.05: 7.815.
Decision: χ² = 2.00 < 7.815 → do not reject H₀. Conclusion: No significant difference; data are consistent with equal preference.
Question: Are these counts from a fair six-sided die? Observed counts for faces 1–6: 4, 6, 8, 7, 5, 10. Total = 40.
Step 1: H₀: Die is fair (each face probability = 1/6). H₁: Die is not fair.
Step 2: Expected for each face = 40 ÷ 6 = 6.6667.
Step 3: Compute (O-E)² / E for each face and sum
(4 − 6.6667)² / 6.6667 ≈ 1.067
(6 − 6.6667)² / 6.6667 ≈ 0.067
(8 − 6.6667)² / 6.6667 ≈ 0.267
(7 − 6.6667)² / 6.6667 ≈ 0.0167
(5 − 6.6667)² / 6.6667 ≈ 0.417
(10 − 6.6667)² / 6.6667 ≈ 1.667
Sum → χ² ≈ 3.50
Step 4: Degrees of freedom = 6 − 1 = 5. Critical value at α = 0.05: 11.070.
Decision: χ² ≈ 3.50 < 11.070 → do not reject H₀. Conclusion: No evidence the die is unfair.
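Because the expected count 40 ÷ 6 is a repeating decimal, rounding can creep into the hand calculation. As a cross-check, the sketch below redoes the die example with Python's standard `fractions` module so that every term is exact; the variable names are illustrative.

```python
from fractions import Fraction

observed = [4, 6, 8, 7, 5, 10]                     # counts for faces 1 to 6
expected = Fraction(sum(observed), len(observed))  # 40/6 = 20/3 per face

# Contribution of each face, kept as an exact fraction.
contributions = [(o - expected) ** 2 / expected for o in observed]
statistic = sum(contributions)

for face, term in enumerate(contributions, start=1):
    print(f"face {face}: {float(term):.4f}")
print("chi-square =", statistic)                   # exact value as a fraction

critical_value = 11.070                            # df = 5, alpha = 0.05
print("reject H0:", statistic > critical_value)
```

The exact sum is 7/2, that is 3.5, which agrees with the χ² ≈ 3.50 used in the decision.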
Question: Is subject preference independent of gender? Observed table (rows = Gender, columns = Preference A/B):
Male: A = 20, B = 30
Female: A = 25, B = 25
Total = 100
Step 1: H₀: Preference is independent of gender. H₁: Not independent.
Step 2: Row sums and column sums
Row sums: Male = 50, Female = 50. Column sums: A = 45, B = 55.
Step 3: Expected counts = (row sum × column sum) / total
Expected(Male,A) = 50 × 45 / 100 = 22.5
Expected(Male,B) = 50 × 55 / 100 = 27.5
Expected(Female,A) = 22.5; Expected(Female,B) = 27.5
Step 4: Compute (O-E)² / E and sum
(Male, A): (20 − 22.5)² / 22.5 = 6.25 / 22.5 = 0.2778
(Male, B): (30 − 27.5)² / 27.5 = 6.25 / 27.5 = 0.2273
(Female, A): (25 − 22.5)² / 22.5 = 6.25 / 22.5 = 0.2778
(Female, B): (25 − 27.5)² / 27.5 = 6.25 / 27.5 = 0.2273
Sum → χ² ≈ 1.0101
Step 5: Degrees of freedom = (rows − 1)(cols − 1) = 1. Critical value at α = 0.05: 3.841.
Decision: χ² ≈ 1.01 < 3.841 → do not reject H₀. Conclusion: There is no significant evidence that preference depends on gender.
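For a 2 × 2 table with cells a, b (first row) and c, d (second row), the same statistic can also be obtained from the standard shortcut χ² = N(ad − bc)² / (row₁ × row₂ × col₁ × col₂), where N is the grand total. The sketch below applies it to the gender table; the function name is an illustrative choice, and no continuity correction is applied, to match the hand calculation.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Male: A = 20, B = 30; Female: A = 25, B = 25.
statistic = chi_square_2x2(20, 30, 25, 25)
print(round(statistic, 4))
```

The result, about 1.0101, matches the cell-by-cell sum, which makes the shortcut a handy check on the arithmetic.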
Question: A blood-type model expects proportions A:0.42, B:0.10, AB:0.04, O:0.44. Observed counts in sample of 200: A = 110, B = 10, AB = 2, O = 78. Test fit.
Step 1: H₀: Observed follow the given proportions. H₁: Not following them.
Step 2: Expected counts = total × proportions
E_A = 200 × 0.42 = 84.0
E_B = 200 × 0.10 = 20.0
E_AB = 200 × 0.04 = 8.0
E_O = 200 × 0.44 = 88.0
Step 3: Compute (O − E)² / E
A: (110 − 84)² / 84 = 26² / 84 = 676 / 84 ≈ 8.0476
B: (10 − 20)² / 20 = 100 / 20 = 5.0
AB: (2 − 8)² / 8 = 36 / 8 = 4.5
O: (78 − 88)² / 88 = 100 / 88 ≈ 1.1364
Sum → χ² ≈ 8.0476 + 5.0 + 4.5 + 1.1364 = 18.684
Step 4: Degrees of freedom = k − 1 = 4 − 1 = 3. Critical value at α = 0.05: 7.815.
Decision: χ² ≈ 18.684 > 7.815 → reject H₀. Conclusion: The observed counts differ significantly from the expected proportions.
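When the hypothesised proportions are unequal, the only change is how the expected counts are built. A minimal sketch for the blood-type example follows; the function name `goodness_of_fit` is an assumption made for illustration.

```python
def goodness_of_fit(observed, proportions):
    """Return (statistic, expected counts) for hypothesised proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    total = sum(observed)
    # Expected count = sample size x hypothesised proportion
    expected = [total * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


observed = [110, 10, 2, 78]               # A, B, AB, O
proportions = [0.42, 0.10, 0.04, 0.44]
statistic, expected = goodness_of_fit(observed, proportions)
print([round(e, 1) for e in expected])
print(round(statistic, 3))
```

Checking that the proportions sum to 1 guards against a common data-entry slip, since a wrong proportion would otherwise silently distort every expected count.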
Question: Does response type depend on group? Observed table (rows = Group1, Group2; columns = Response X, Y, Z):
Group1: X = 30, Y = 10, Z = 10
Group2: X = 20, Y = 25, Z = 25
Total = 120
Step 1: H₀: Response is independent of group. H₁: Not independent.
Step 2: Row and column sums
Row sums: Group1 = 50, Group2 = 70. Column sums: X = 50, Y = 35, Z = 35.
Step 3: Expected counts = (row sum × column sum) / total
Expected(Group1,X) = 50 × 50 / 120 = 20.8333
Expected(Group1,Y) = 50 × 35 / 120 = 14.5833
Expected(Group1,Z) = 14.5833
Expected(Group2,X) = 70 × 50 / 120 = 29.1667
Expected(Group2,Y) = 70 × 35 / 120 = 20.4167
Expected(Group2,Z) = 20.4167
Step 4: Compute (O − E)² / E for each cell and sum
(Group1,X): (30 − 20.8333)² / 20.8333 ≈ 4.0333
(Group1,Y): (10 − 14.5833)² / 14.5833 ≈ 1.4405
(Group1,Z): (10 − 14.5833)² / 14.5833 ≈ 1.4405
(Group2,X): (20 − 29.1667)² / 29.1667 ≈ 2.8810
(Group2,Y): (25 − 20.4167)² / 20.4167 ≈ 1.0289
(Group2,Z): (25 − 20.4167)² / 20.4167 ≈ 1.0289
Sum → χ² ≈ 11.853
Step 5: Degrees of freedom = (2 − 1)(3 − 1) = 2. Critical value at α = 0.05: 5.991.
Decision: χ² ≈ 11.853 > 5.991 → reject H₀. Conclusion: Response type depends on group (significant association).
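For tables larger than 2 × 2, building the expected counts by hand becomes tedious. The sketch below generalises the calculation to any table given as a list of rows; `independence_test` is an illustrative name, not a library function.

```python
def independence_test(table):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # Expected count for each cell = row sum x column sum / grand total
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return statistic, df, expected


table = [[30, 10, 10],   # Group1: X, Y, Z
         [20, 25, 25]]   # Group2: X, Y, Z
statistic, df, expected = independence_test(table)
print(round(statistic, 3), df)
```

Returning the expected table alongside the statistic makes it easy to confirm that no expected count is too small for the test to be reliable.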
The chi-square test is an essential tool in statistics. Whether in academics, marketing, healthcare, or politics, its ability to assess relationships between variables is crucial for data analysis. By understanding the chi-square test formula, its applications, and how to calculate it with or without a chi-square test calculator, learners can make informed decisions based on data.
Answer: The chi-square test is used to find out if there is a significant association between two categorical variables.
Answer: The chi-square test is a statistical method used to test relationships between categorical variables, with types including the Chi-Square Test of Independence and the Chi-Square Goodness-of-Fit Test.
Answer: Yes, the chi-square test provides a p-value to help determine the statistical significance of the observed results.
Answer: The main goal of a chi-square test is to assess if observed frequencies differ significantly from expected frequencies.
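The p-value mentioned in the answers above can be computed without a table. The sketch below uses the closed-form expressions for the upper tail of the chi-square distribution with whole-number degrees of freedom; the function name is an illustrative assumption, and in practice a statistics library would normally be used instead.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable."""
    if statistic < 0 or df < 1:
        raise ValueError("statistic must be >= 0 and df must be >= 1")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    term, total = math.sqrt(statistic), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# Chi-square = 2 with 3 degrees of freedom.
p = chi_square_p_value(2.0, 3)
print(round(p, 4))
```

A p-value above the chosen significance level (for example 0.05) leads to the same decision as a statistic below the critical value: do not reject H₀.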