Normality Test: the Kolmogorov–Smirnov (KS) Test, with a One-Sample Example

Anthony Kwok
Jan 25, 2023

Normality is an important assumption behind many statistical inference methods. There are several statistical tests that are useful for testing normality. For example:

1. Shapiro-Wilk (SW) test
2. Kolmogorov-Smirnov (KS) test
3. Anderson-Darling (AD) test

Today, we are going to introduce the Kolmogorov-Smirnov test (known as the K-S test), which is one of the most popular statistical tests for normality.


In this article, we will go through the following aspects of the Kolmogorov-Smirnov test, with an example:

1. Objective of K-S Test

2. Assumptions

3. Null Hypothesis of K-S Test

4. How does the K-S Test work?

5. K-S Test Statistic

6. Characteristics of K-S Test

Objective of K-S Test

The K-S test can be used for one-sample or two-sample testing.

One-sample testing means that we are comparing the empirical cumulative distribution of the sample with a hypothesized distribution (e.g. a normal or exponential distribution). If we fail to reject the null hypothesis, the data are consistent with the hypothesized distribution.

Two-sample testing means that we are comparing the empirical cumulative distributions of two different samples of data. If we fail to reject the null hypothesis, the data are consistent with both samples coming from the same distribution; however, the test does not tell us what that distribution is.
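
As a quick illustration of both modes, here is a minimal sketch using SciPy's scipy.stats.kstest and scipy.stats.ks_2samp; the normal samples below are made up purely for demonstration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One-sample: compare a sample against a fully specified N(0, 1).
sample = rng.normal(loc=0.0, scale=1.0, size=200)
stat, p_value = stats.kstest(sample, "norm")
print(f"one-sample: D = {stat:.4f}, p = {p_value:.4f}")

# Two-sample: compare two samples without naming any distribution.
sample_a = rng.normal(size=200)
sample_b = rng.normal(size=150)
stat2, p2 = stats.ks_2samp(sample_a, sample_b)
print(f"two-sample: D = {stat2:.4f}, p = {p2:.4f}")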

Assumptions

  • [For both tests] Data are independent and identically distributed (i.i.d.) within each sample.
  • [In two-sample testing] The two samples are mutually independent.

Null Hypothesis of K-S Test

  • [In one-sample testing] The sample follows the hypothesized distribution.
  • [In two-sample testing] The two samples follow the same distribution.

How does the K-S Test work?

What the K-S test does is calculate the maximum distance between two cumulative distribution functions:

  • [In One-sample testing] the empirical cumulative distribution function of the sample and the cumulative distribution function of the hypothesized distribution
  • [In Two-sample testing] the empirical cumulative distribution functions of two samples

If the two distributions are the same or similar, this maximum distance should be small. In the graph below, the red line is the hypothesized cumulative distribution and the blue line is the empirical cumulative distribution.

Graphic Example of K-S Test. Source: Wikipedia

Note that we are using cumulative distribution functions for the comparison. Therefore, we need to sort the sample data in ascending order before constructing the empirical cumulative distribution function.

The following is a one-sample testing example:

Sample data from Gamma(2, 4):
7, 10, 1, 8, 5, 9, 8, 5, 4, 2, 7, 9, 1, 0, 2, 10, 12, 19, 6, 12
After sorting (ascending):
0, 1, 1, 2, 2, 4, 5, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 12, 12, 19

The empirical cumulative distribution function of the sampled data is given by

F_n(x) = (1/n) Σ_{i=1}^{n} I(X_i ≤ x)

where the indicator I(X_i ≤ x) equals 1 when X_i ≤ x and 0 otherwise, so F_n(x) is simply the fraction of observations that are at most x. For our sorted sample of n = 20 points, F_n steps up by 1/20 at each observation (by 2/20 at tied values).
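
As a sanity check, here is a short sketch that computes the ECDF at each sorted observation of the article's sample:

import numpy as np

# The 20 observations from the article, sorted ascending.
data = np.sort(np.array([7, 10, 1, 8, 5, 9, 8, 5, 4, 2,
                         7, 9, 1, 0, 2, 10, 12, 19, 6, 12]))
n = len(data)

# i/n at the i-th sorted point; for tied values, the last (largest)
# i/n is the ECDF value at that point.
ecdf = np.arange(1, n + 1) / n
for x, f in zip(data, ecdf):
    print(f"F_n({x:2d}) = {f:.2f}")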

After calculating the empirical cumulative distribution, you can calculate the hypothesized cumulative distribution. In our example, the hypothesized distribution is the gamma distribution, whose cumulative distribution function (with shape k and scale θ) is

F(x; k, θ) = γ(k, x/θ) / Γ(k)

where γ is the lower incomplete gamma function

γ(s, x) = ∫_0^x t^(s−1) e^(−t) dt

and Γ is the gamma function

Γ(s) = ∫_0^∞ t^(s−1) e^(−t) dt

From the gamma cumulative distribution function, we can evaluate F(x) at every sorted observation in the sample.
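
Rather than evaluating the incomplete gamma integral by hand, we can let SciPy do it. A sketch, assuming Gamma(2, 4) means shape k = 2 and scale θ = 4 (the article does not spell out the parametrization):

from scipy import stats

# Hypothesized Gamma(2, 4), read here as shape k = 2, scale θ = 4.
# (SciPy parametrization: stats.gamma(a=shape, scale=scale).)
hypothesized = stats.gamma(a=2, scale=4)

# Hypothesized CDF at each distinct sorted observation.
for x in [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 12, 19]:
    print(f"F({x:2d}) = {hypothesized.cdf(x):.4f}")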

K-S Test Statistic

[In one-sample testing] The Kolmogorov–Smirnov statistic for a given cumulative distribution function F(x) is

D_n = sup_x |F_n(x) − F(x)|

where sup_x is the supremum over all x, i.e. the largest vertical distance between the empirical and hypothesized curves.

[In two-sample testing] The Kolmogorov–Smirnov statistic for two empirical cumulative distribution functions F_{1,n} and F_{2,m} is

D_{n,m} = sup_x |F_{1,n}(x) − F_{2,m}(x)|
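
Because the ECDF is a step function, the supremum is attained at one of its jumps, so it suffices to check both sides of each step. A sketch of the one-sample statistic for our data, again assuming the shape-2 / scale-4 reading of Gamma(2, 4):

import numpy as np
from scipy import stats

data = np.sort(np.array([7, 10, 1, 8, 5, 9, 8, 5, 4, 2,
                         7, 9, 1, 0, 2, 10, 12, 19, 6, 12]))
n = len(data)
cdf_vals = stats.gamma(a=2, scale=4).cdf(data)

# Check the ECDF on both sides of each jump:
# D+ = max_i ( i/n − F(x_(i)) ),  D− = max_i ( F(x_(i)) − (i−1)/n ).
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
d_n = max(d_plus, d_minus)
print(f"D_n = {d_n:.4f}")

Under this reading of the parameters, D_n comes out to roughly 0.16.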

In our example, we evaluate both F_n(x) and the hypothesized gamma cumulative distribution at every sorted observation and take the absolute difference at each point.

As a result, the largest of these differences is our K-S test statistic D_n, which we can now compare against the critical value.

With a sample size of n = 20 and a significance level (α) of 0.05, the critical value obtained from the K-S table below is 0.29407.

Kolmogorov-Smirnov Table

As our K-S test statistic is smaller than the critical value, we fail to reject the null hypothesis that the sample data follow the hypothesized gamma distribution.
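
To double-check the whole procedure, a one-liner with scipy.stats.kstest (again assuming the shape-2 / scale-4 parametrization) should agree with the manual calculation:

from scipy import stats

data = [7, 10, 1, 8, 5, 9, 8, 5, 4, 2, 7, 9, 1, 0, 2, 10, 12, 19, 6, 12]

# kstest accepts any callable CDF as the hypothesized distribution.
result = stats.kstest(data, stats.gamma(a=2, scale=4).cdf)
print(f"D = {result.statistic:.4f}, p = {result.pvalue:.4f}")

# Compare against the tabulated critical value at α = 0.05, n = 20.
print("reject H0" if result.statistic > 0.29407 else "fail to reject H0")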

Characteristics of K-S Test

  • Serves as a goodness-of-fit test
  • Does not report any confidence interval

Advantage:

  • Distribution-free (the null distribution of the test statistic does not depend on which continuous distribution is hypothesized)
  • Applicable to data of any sample size
  • The K-S test statistic is easy to calculate

Disadvantage:

  • [For one-sample testing] The parameters of the hypothesized distribution have to be fully specified in advance; if they are estimated from the same data, the tabulated critical values are no longer valid
  • Not applicable to discrete distributions

If you like this article, give me a clap and share it with your friends. You may also leave me a comment or message.

And don't forget to follow me for more Python / AI / Statistics content.
