Chi-Square Test: A Complete Learning Guide

Introduction  

The Chi-Square Test is a helpful statistical method for analysing categorical data. It is widely used in research, business, and education to check whether the distribution of observed data differs from what we expect, or whether two categorical variables are connected.
For example, it can be applied in surveys, market studies, or experiments to find patterns and relationships.
In this guide, you will learn what the chi-square test means, how to use its formula, how to read results with a chi-square calculator, and how it works with solved examples. By the end, you will clearly understand the chi-square test and how to use it in real-life situations.

What is the Chi-Square Test?

The chi-square test is a statistical hypothesis test used to compare observed data with expected data based on a specific hypothesis. It helps us decide whether there is a significant association between categorical variables.  

It is non-parametric, which means it does not assume that the data follow a particular distribution, such as the normal distribution.  

  • This test is suitable for nominal (categorical) data.  

  • The result is a chi-square statistic, which is compared with a critical value (or converted into a p-value) to interpret the outcome.  

 

Chi-Square Test Definition

The chi-square test is defined as follows:  

"A chi-square test is a statistical method used to check if observed frequencies differ significantly from expected frequencies." 

In simpler terms, it helps you answer questions like:  

  • Is there a connection between age and brand preference?  

  • Do the results of a dice roll match what we expect?  

  • Do voting patterns vary by region?  

 

Types of Chi-Square Tests

Goodness-of-Fit Test  

  • This type of chi-square test checks if a sample matches the expected distribution.  

  • It is used when you have one categorical variable.  

  • Example: Rolling a die 60 times and seeing if the outcomes are evenly distributed.  

 

Test for Independence  

  • This test assesses whether two categorical variables are independent.  

  • It is used when you have two categorical variables.  

  • Example: Analysing whether gender is associated with smartphone brand preference.  

 

Chi-Square Test Formula

Here is the standard chi-square test formula:  

χ² = Σ [(O - E)² / E]  

Where:  

  • χ²: Chi-square statistic  

  • O: Observed frequency  

  • E: Expected frequency  

Key points:  

  • Calculate the difference between observed and expected.  

  • Square the difference.  

  • Divide by the expected value.  

  • Add the results for all categories to get χ².  

Example:

A teacher wants to check if students prefer four fruits equally: Apple, Mango, Orange, and Banana.

Observed (O): 25, 30, 20, 25
Total students = 100
Expected (E): 100 ÷ 4 = 25 each

Step 1: Substitute the values into the formula
χ² = (25-25)²/25 + (30-25)²/25 + (20-25)²/25 + (25-25)²/25

Step 2: Subtract O - E
χ² = (0)²/25 + (5)²/25 + (-5)²/25 + (0)²/25

Step 3: Square the differences
χ² = 0/25 + 25/25 + 25/25 + 0/25

Step 4: Simplify
χ² = 0 + 1 + 1 + 0

Step 5: Final Answer
χ² = 2
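The five steps above can be reproduced in a few lines of code. The sketch below is a minimal Python version of χ² = Σ (O − E)² / E; the function name `chi_square_statistic` is our own illustration, not a library function.

```python
# Illustrative helper (hypothetical name): chi-square statistic from
# lists of observed and expected frequencies.
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("every expected frequency must be greater than 0")
        total += (o - e) ** 2 / e  # squared difference, divided by the expected count
    return total


# Fruit example: Apple, Mango, Orange, Banana
observed = [25, 30, 20, 25]
expected = [sum(observed) / len(observed)] * len(observed)  # 100 / 4 = 25 each
print(chi_square_statistic(observed, expected))  # 2.0
```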

 

How to Calculate the Chi-Square Test

  • State the Hypotheses

    • Null hypothesis (H₀): There is no difference (observed = expected).

    • Alternative hypothesis (H₁): There is a difference (observed ≠ expected).

  • Find Expected Frequencies

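    • Equal proportions (goodness of fit): Expected = Total observations ÷ Number of categories.

    • Given proportions (goodness of fit): Expected = Total observations × Expected proportion.

    • Two variables (test of independence): Expected = (Row total × Column total) ÷ Grand total.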

  • Apply the Formula
    χ² = Σ (O - E)² / E

  • Check the Critical Value

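    • Use a chi-square table with degrees of freedom = (categories − 1) for a goodness-of-fit test, or (rows − 1) × (columns − 1) for a test of independence.

    • Compare your result with the critical value at the chosen significance level (commonly α = 0.05).

  • Make a Decision

    • If χ² is larger than the critical value → Reject H₀ (there is a significant difference).

    • Otherwise → Fail to reject H₀ (the data show no significant difference; this does not prove H₀ is true).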

Example

A teacher checks if students like four colours equally: Red, Blue, Green, and Yellow.

  • Observed (O): 20, 30, 25, 25

  • Total = 100

  • Expected (E) = 100 ÷ 4 = 25 each

Step 1: Apply the formula
χ² = (20-25)²/25 + (30-25)²/25 + (25-25)²/25 + (25-25)²/25

Step 2: Simplify differences
χ² = (-5)²/25 + (5)²/25 + (0)²/25 + (0)²/25

Step 3: Square and divide
χ² = 25/25 + 25/25 + 0 + 0

Step 4: Add
χ² = 1 + 1 = 2

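Final Answer: χ² = 2

Decision: With 4 − 1 = 3 degrees of freedom, the critical value at α = 0.05 is 7.815. Since 2 < 7.815, we fail to reject H₀: the data are consistent with students liking all four colours equally.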
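The steps in this section can be sketched as a short Python function. It assumes equal expected proportions and a significance level of α = 0.05, and it stores standard table critical values only for 1 to 5 degrees of freedom; `goodness_of_fit_equal` and `CRITICAL_05` are illustrative names, not a library API.

```python
# Illustrative sketch (hypothetical names): goodness-of-fit test with equal
# expected proportions at alpha = 0.05.

# Standard chi-square table values at alpha = 0.05 for df = 1..5.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit_equal(observed):
    """Test H0: every category is equally likely. Returns (chi2, df, critical, reject)."""
    k = len(observed)
    expected = sum(observed) / k                                  # expected count per category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)  # apply the formula
    df = k - 1                                                    # degrees of freedom
    if df not in CRITICAL_05:
        raise ValueError("no stored critical value for this df; use a full chi-square table")
    critical = CRITICAL_05[df]
    reject = chi2 > critical                                      # decision rule
    return chi2, df, critical, reject


# Colour example: Red, Blue, Green, Yellow
chi2, df, critical, reject = goodness_of_fit_equal([20, 30, 25, 25])
print(f"chi2 = {chi2:.3f}, df = {df}, critical value = {critical}")
print("Reject H0" if reject else "Fail to reject H0")
```

For the colour counts this prints χ² = 2.000 with df = 3, below 7.815, so H₀ is not rejected. For other degrees of freedom or significance levels, the critical value has to come from a full chi-square table.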

 

Application of the Chi-Square Test

The chi-square test has various applications, including:  

  • Education: Evaluating student performance across different teaching methods.  

  • Medicine: Testing the effect of treatments across patient groups.  

  • Marketing: Checking if brand preference depends on age group.  

  • Politics: Analysing voting behaviour by demographic.  

  • Retail: Studying the connection between product types and returns.  

 

Common Misconceptions

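  • It can be used for numerical data.  

Incorrect. The chi-square test works on frequency counts of categorical data, not on raw numerical measurements such as heights or test scores. Numerical data must first be grouped into categories.  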

  • Expected values can be zero.  

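Wrong. Every expected frequency must be greater than 0, because E appears in the denominator of the formula. A common rule of thumb is that each expected frequency should be at least 5 for the test to be reliable.  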

  • The chi-square test tells us the cause.  

False. It only shows an association, not causation.  

  • A larger sample size always means significance.  

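Not necessarily. With very large samples, even tiny, unimportant differences can become statistically significant, so a significant result is not always a meaningful one.  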

  • It’s only useful in research.  

Not true. It is also useful in business, education, and everyday decision-making.  
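Some of these pitfalls can be checked before running the test. The Python sketch below flags expected frequencies that are zero (which would make (O − E)² / E undefined) or below 5, a common rule of thumb for when the chi-square approximation becomes unreliable. The function name `check_expected` and the `minimum` parameter are illustrative assumptions.

```python
# Illustrative check (hypothetical name): warn about expected frequencies
# that are zero or below a rule-of-thumb minimum before running the test.
def check_expected(expected, minimum=5):
    """Return a list of warning strings; an empty list means no problems found."""
    warnings = []
    for i, e in enumerate(expected):
        if e <= 0:
            warnings.append(f"category {i}: expected count {e} is not greater than 0")
        elif e < minimum:
            warnings.append(f"category {i}: expected count {e} is below {minimum}")
    return warnings


print(check_expected([84.0, 20.0, 8.0, 88.0]))  # [] (all expected counts are at least 5)
print(check_expected([12.0, 3.5, 0.0]))          # two warnings
```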

 

Fun Facts About the Chi-Square Test

  • Developed in 1900.

British statistician Karl Pearson first introduced it.   

  • Symbol Origin  

The symbol χ² comes from the Greek letter “chi.”  

  • Versatile Use  

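Because it does not assume that the data are normally distributed, it suits many kinds of categorical data. It works best with reasonably large samples, because the chi-square distribution of the statistic is a large-sample approximation.  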

  • Real-life Use  

Used in genetics to find out if traits follow expected Mendelian ratios.  

  • Use in Elections  

Pollsters can use the chi-square test to check whether voting preferences differ across demographic groups.  

 

Solved Examples of Chi-Square Test

Example 1: Goodness of Fit (equal proportions)

Question: Do four fruit choices occur equally often? Observed counts: Apple 25, Mango 30, Orange 20, Banana 25. Total = 100.
Step 1: Hypotheses
H₀: All four fruits are equally preferred (expected proportions equal).
H₁: Preferences are not equal.

Step 2: Expected frequencies
Expected for each = 100 ÷ 4 = 25.

Step 3: Compute (O-E)² / E for each category
Apple: (25 − 25)² / 25 = 0
Mango: (30 − 25)² / 25 = 25 / 25 = 1
Orange: (20 − 25)² / 25 = 25 / 25 = 1
Banana: (25 − 25)² / 25 = 0

Step 4: Sum → χ² = 0 + 1 + 1 + 0 = 2.00

Step 5: Degrees of freedom = k − 1 = 4 − 1 = 3. Critical value at α = 0.05: 7.815.

Decision: χ² = 2.00 < 7.815 → do not reject H₀. Conclusion: No significant difference; data are consistent with equal preference.


Example 2: Goodness of Fit (fair die)

Question: Are these counts from a fair six-sided die? Observed counts for faces 1–6: 4, 6, 8, 7, 5, 10. Total = 40.
Step 1: H₀: Die is fair (each face probability = 1/6). H₁: Die is not fair.

Step 2: Expected for each face = 40 ÷ 6 ≈ 6.6667.

Step 3: Compute (O-E)² / E for each face and sum
(4 − 6.6667)² / 6.6667 ≈ 1.067
(6 − 6.6667)² / 6.6667 ≈ 0.067
(8 − 6.6667)² / 6.6667 ≈ 0.267
(7 − 6.6667)² / 6.6667 ≈ 0.017
(5 − 6.6667)² / 6.6667 ≈ 0.417
(10 − 6.6667)² / 6.6667 ≈ 1.667
Sum → χ² ≈ 3.50

Step 4: Degrees of freedom = 6 − 1 = 5. Critical value at α = 0.05: 11.070.

Decision: χ² ≈ 3.50 < 11.070 → do not reject H₀. Conclusion: No evidence the die is unfair.
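Because 40 is not divisible by 6, the expected count is a repeating decimal, and rounding each term by hand can make the contributions drift. One way to avoid that, sketched below in Python, is to keep the arithmetic exact with the standard-library `fractions.Fraction` type.

```python
from fractions import Fraction

observed = [4, 6, 8, 7, 5, 10]                      # counts for faces 1-6
expected = Fraction(sum(observed), len(observed))   # 40/6, stored exactly as 20/3

# Exact (O - E)^2 / E for each face
contributions = [(o - expected) ** 2 / expected for o in observed]
for face, c in enumerate(contributions, start=1):
    print(face, c, round(float(c), 3))              # exact fraction and 3-decimal value

chi2 = sum(contributions)
print("chi2 =", chi2)                               # exact total as a fraction
```

The exact total is 7/2, so χ² = 3.50 is not a rounding artefact.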


Example 3: Test of Independence (2 × 2 table)


Question: Is subject preference independent of gender? Observed table (rows = Gender, columns = Preference A/B):

Male: A = 20, B = 30
Female: A = 25, B = 25
Total = 100

Step 1: H₀: Preference is independent of gender. H₁: Not independent.

Step 2: Row sums and column sums
Row sums: Male = 50, Female = 50. Column sums: A = 45, B = 55.

Step 3: Expected counts = (row sum × column sum) / total
Expected(Male,A) = 50 × 45 / 100 = 22.5
Expected(Male,B) = 50 × 55 / 100 = 27.5
Expected(Female,A) = 22.5; Expected(Female,B) = 27.5

Step 4: Compute (O-E)² / E and sum
(Male, A): (20 − 22.5)² / 22.5 = 6.25 / 22.5 = 0.2778
(Male, B): (30 − 27.5)² / 27.5 = 6.25 / 27.5 = 0.2273
(Female, A): (25 − 22.5)² / 22.5 = 6.25 / 22.5 = 0.2778
(Female, B): (25 − 27.5)² / 27.5 = 6.25 / 27.5 = 0.2273
Sum → χ² ≈ 1.0101

Step 5: Degrees of freedom = (rows − 1)(cols − 1) = 1. Critical value at α = 0.05: 3.841.

Decision: χ² ≈ 1.01 < 3.841 → do not reject H₀. Conclusion: Preference appears independent of gender.
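For a contingency table, the only new ingredient is the expected count for each cell. A minimal Python sketch of the calculation above follows; the function names are our own illustration, not from a library.

```python
# Illustrative helpers (hypothetical names) for a test of independence.
def expected_counts(table):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi2, df) for an r x c table of observed counts (no continuity correction)."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: Male, Female; columns: Preference A, Preference B
table = [[20, 30],
         [25, 25]]
chi2, df = chi_square_independence(table)
print(round(chi2, 4), df)  # 1.0101 1
```

Some statistics software (for example SciPy's `chi2_contingency`) applies Yates' continuity correction to 2 × 2 tables by default, which gives a smaller statistic; this sketch, like the hand calculation, does not.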


Example 4: Goodness of Fit (given probabilities): reject H₀ case

Question: A blood-type model expects proportions A: 0.42, B: 0.10, AB: 0.04, O: 0.44. Observed counts in a sample of 200: A = 110, B = 10, AB = 2, O = 78. Test the fit.
Step 1: H₀: The observed counts follow the given proportions. H₁: They do not.

Step 2: Expected counts = total × proportions
E_A = 200 × 0.42 = 84.0
E_B = 200 × 0.10 = 20.0
E_AB = 200 × 0.04 = 8.0
E_O = 200 × 0.44 = 88.0

Step 3: Compute (O − E)² / E
A: (110 − 84)² / 84 = 26² / 84 = 676 / 84 ≈ 8.0476
B: (10 − 20)² / 20 = 100 / 20 = 5.0
AB: (2 − 8)² / 8 = 36 / 8 = 4.5
O: (78 − 88)² / 88 = 100 / 88 ≈ 1.1364
Sum → χ² ≈ 8.0476 + 5.0 + 4.5 + 1.1364 = 18.684

Step 4: Degrees of freedom = k − 1 = 4 − 1 = 3. Critical value at α = 0.05: 7.815.

Decision: χ² ≈ 18.684 > 7.815 → reject H₀. Conclusion: The observed counts differ significantly from the expected proportions.
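When H₀ specifies proportions rather than equal shares, each expected count is simply the total multiplied by its proportion. The Python sketch below repeats this example with dictionaries keyed by blood type; the critical value 7.815 is the table value for df = 3 at α = 0.05 used above.

```python
observed = {"A": 110, "B": 10, "AB": 2, "O": 78}
proportions = {"A": 0.42, "B": 0.10, "AB": 0.04, "O": 0.44}  # model under H0
assert abs(sum(proportions.values()) - 1.0) < 1e-9           # proportions must add up to 1

n = sum(observed.values())                                    # sample size (200)
expected = {k: n * p for k, p in proportions.items()}         # total x proportion

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1
critical = 7.815  # chi-square table value for alpha = 0.05, df = 3

print({k: round(v, 1) for k, v in expected.items()})
print(f"chi2 = {chi2:.3f}, df = {df}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```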


Example 5: Test of Independence (2 × 3 table): reject H₀ case

Question: Does response type depend on group? Observed table (rows = Group1, Group2; columns = Response X, Y, Z):

Group1: X = 30, Y = 10, Z = 10
Group2: X = 20, Y = 25, Z = 25
Total = 120

Step 1: H₀: Response is independent of group. H₁: Not independent.

Step 2: Row and column sums
Row sums: Group1 = 50, Group2 = 70. Column sums: X = 50, Y = 35, Z = 35.

Step 3: Expected counts = (row sum × column sum) / total
Expected(Group1,X) = 50 × 50 / 120 = 20.8333
Expected(Group1,Y) = 50 × 35 / 120 = 14.5833
Expected(Group1,Z) = 14.5833
Expected(Group2,X) = 70 × 50 / 120 = 29.1667
Expected(Group2,Y) = 70 × 35 / 120 = 20.4167
Expected(Group2,Z) = 20.4167

Step 4: Compute (O − E)² / E for each cell and sum
(Group1,X): (30 − 20.8333)² / 20.8333 ≈ 4.033
(Group1,Y): (10 − 14.5833)² / 14.5833 ≈ 1.440
(Group1,Z): (10 − 14.5833)² / 14.5833 ≈ 1.440
(Group2,X): (20 − 29.1667)² / 29.1667 ≈ 2.881
(Group2,Y): (25 − 20.4167)² / 20.4167 ≈ 1.029
(Group2,Z): (25 − 20.4167)² / 20.4167 ≈ 1.029
Sum → χ² ≈ 11.853

Step 5: Degrees of freedom = (2 − 1)(3 − 1) = 2. Critical value at α = 0.05: 5.991.

Decision: χ² ≈ 11.853 > 5.991 → reject H₀. Conclusion: Response type depends on group (significant association).
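Printing the contribution of each cell is a quick way to catch arithmetic slips in a hand calculation of a larger table. A Python sketch for this example:

```python
groups = ["Group1", "Group2"]
responses = ["X", "Y", "Z"]
table = [[30, 10, 10],   # Group1
         [20, 25, 25]]   # Group2

row_totals = [sum(row) for row in table]          # [50, 70]
col_totals = [sum(col) for col in zip(*table)]    # [50, 35, 35]
grand = sum(row_totals)                           # 120

chi2 = 0.0
for i, row in enumerate(table):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected count for this cell
        contribution = (o - e) ** 2 / e
        chi2 += contribution
        print(f"({groups[i]},{responses[j]}): E = {e:.4f}, (O-E)^2/E = {contribution:.3f}")

df = (len(table) - 1) * (len(table[0]) - 1)       # (2 - 1)(3 - 1) = 2
print(f"chi2 = {chi2:.3f}, df = {df}")
```

The cells with the largest contributions, (Group1,X) and (Group2,X), show where the observed counts depart most from what independence would predict.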

 

Conclusion

The chi-square test is an essential tool in statistics. Whether in academics, marketing, healthcare, or politics, its ability to assess relationships between categorical variables makes it valuable for data analysis. By understanding the chi-square test formula, its applications, and how to calculate it with or without a chi-square test calculator, learners can make informed decisions based on data.  

 

Frequently Asked Questions on the Chi-Square Test

1. What is the chi-square test used for?  

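Answer: The chi-square test is used to find out whether there is a significant association between two categorical variables (test of independence) or whether observed frequencies match an expected distribution (goodness-of-fit test).  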

 

2. What is the chi-square test and its types?  

Answer: The chi-square test is a statistical method used to test relationships between categorical variables, with types including the Chi-Square Test of Independence and the Chi-Square Goodness-of-Fit Test.  

 

3. Does chi-square give a p-value?  

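Answer: Yes. The chi-square statistic and its degrees of freedom can be converted into a p-value: the probability of getting a statistic at least this large if H₀ is true. If the p-value is below the chosen significance level (for example, 0.05), H₀ is rejected.  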
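Software computes this p-value as the area under the chi-square distribution to the right of the observed statistic; in practice a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` does the job. As a dependency-free sketch, assuming integer degrees of freedom, the same tail area can be computed with the closed-form series below. `chi2_p_value` is a hypothetical helper name.

```python
import math


# Hypothetical helper: upper-tail probability of the chi-square distribution
# for integer degrees of freedom, using the standard closed-form series.
def chi2_p_value(x, df):
    """Return P(X >= x) for a chi-square variable with df degrees of freedom."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail, erfc(sqrt(x/2)), then add
    # sqrt(2/pi) * exp(-x/2) * x^(r - 1/2) / (1 * 3 * ... * (2r - 1)) for r = 1 .. (df - 1)/2
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)  # next term in the series
    return p


print(round(chi2_p_value(2.0, 3), 3))     # fruit example
print(round(chi2_p_value(18.684, 3), 4))  # blood-type example: well below 0.05
```

For the fruit example (χ² = 2, df = 3) this gives p ≈ 0.572, far above 0.05, which matches the decision not to reject H₀.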

 

4. What is the main objective of a chi-square test?  

Answer: The main goal of a chi-square test is to assess if observed frequencies differ significantly from expected frequencies.

 

Understand the Chi Square Test easily with Orchids The International School - Your trusted guide for mastering statistics!
