How to Test Whether a Continuous Variable Is Normally Distributed in Stata


Many statistical tests require one or more variables to be normally distributed in order for the results of the test to be reliable.

This tutorial explains several methods you can use to test for normality among variables in Stata.

For each of these methods, we will use the built-in Stata dataset called auto. You can load this dataset using the following command:

sysuse auto

Method 1: Histograms

One informal way to see if a variable is normally distributed is to create a histogram to view the distribution of the variable.

If the variable is normally distributed, the histogram should take on a "bell" shape, with more values located near the center and fewer values located out in the tails.

We can use the hist command to create a histogram of the variable displacement:

hist displacement

[Figure: Histogram of displacement in Stata]

We can overlay a normal density curve on the histogram by adding the normal option:

hist displacement, normal

[Figure: Histogram of displacement with a normal density curve in Stata]
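To make the overlay concrete, here is a minimal Python sketch (not Stata code) of what the two curves compare: a density-scaled histogram and a normal density built from the data's own mean and standard deviation, evaluated at each bin midpoint. The function names, the fixed number of bins, and the sample values are illustrative assumptions; Stata chooses its own default bins.

```python
import statistics


def density_histogram(values, bins):
    """Return (left_edges, bin_width, densities) for an equal-width density histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot build bins")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value goes into the last bin instead of an extra one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    edges = [lo + i * width for i in range(bins)]
    # Scale counts so that the bar areas sum to 1, like a density histogram.
    densities = [c / (n * width) for c in counts]
    return edges, width, densities


def normal_overlay(values, edges, width):
    """Normal density with the sample mean and SD, evaluated at each bin midpoint."""
    dist = statistics.NormalDist(statistics.mean(values), statistics.stdev(values))
    return [dist.pdf(e + width / 2) for e in edges]


if __name__ == "__main__":
    # Hypothetical right-skewed sample, made up for illustration only.
    sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12]
    edges, width, dens = density_histogram(sample, bins=4)
    curve = normal_overlay(sample, edges, width)
    for e, d, c in zip(edges, dens, curve):
        print(f"bin from {e:5.2f}: histogram={d:.3f}  normal={c:.3f}")
```

Where the histogram bars sit well above the normal values on one side and well below on the other, the data depart from the bell shape.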

It's pretty obvious that the variable displacement is skewed to the right (i.e., most values are concentrated on the left and a long "tail" of values extends to the right) and does not follow a normal distribution.
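The direction of the skew can also be summarized with a single number. The Python sketch below computes the moment coefficient of skewness, m3 / m2^(3/2): a positive value indicates a right skew like the one in displacement, and a negative value indicates a left skew. The function name and sample values are illustrative assumptions, not Stata's own code.

```python
def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using n in the denominators."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Hypothetical data: a long tail to the right gives a positive coefficient.
    right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12]
    print(sample_skewness(right_skewed))
    # Mirroring the data flips the sign of the skewness.
    print(sample_skewness([-v for v in right_skewed]))
```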


Method 2: Shapiro-Wilk Test

A formal way to test for normality is to use the Shapiro-Wilk Test.

The null hypothesis for this test is that the variable is normally distributed. If the p-value of the test is less than some significance level (common choices include 0.01, 0.05, and 0.10), then we can reject the null hypothesis and conclude that there is sufficient evidence to say that the variable is not normally distributed.
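The decision rule is a single comparison between the p-value and the significance level. In this minimal Python sketch, reject_normality is a hypothetical helper; the example reuses the p-value of 0.0547 that the Skewness and Kurtosis Test reports for displacement later in this tutorial, to show how the conclusion can change with the chosen significance level.

```python
def reject_normality(p_value, alpha=0.05):
    """Return True if the null hypothesis of normality is rejected at level alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return p_value < alpha


if __name__ == "__main__":
    p = 0.0547
    for alpha in (0.01, 0.05, 0.10):
        decision = "reject" if reject_normality(p, alpha) else "fail to reject"
        print(f"p = {p}, alpha = {alpha:.2f}: {decision} the null hypothesis")
```

With p = 0.0547, the null hypothesis is rejected only at the 0.10 level.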

*This test can be used when the total number of observations is between 4 and 2,000.

We can use the swilk command to perform a Shapiro-Wilk Test on the variable displacement:

swilk displacement

[Figure: Shapiro-Wilk Test output in Stata]

Here is how to interpret the output of the test:

Obs: 74. This is the number of observations used in the test.

W: 0.92542. This is the test statistic for the test.

Prob>z: 0.00031. This is the p-value associated with the test statistic.

Since the p-value is less than 0.05, we can reject the null hypothesis of the test. We have sufficient evidence to say that the variable displacement is not normally distributed.

We can also perform the Shapiro-Wilk Test on more than one variable at once by listing several variables after the swilk command:

swilk displacement mpg length

[Figure: Shapiro-Wilk Test output for displacement, mpg, and length in Stata]

Using a 0.05 significance level, we would conclude that displacement and mpg are both non-normally distributed, but we don't have sufficient evidence to say that length is non-normally distributed.

Method 3: Shapiro-Francia Test

Another formal way to test for normality is to use the Shapiro-Francia Test.

The null hypothesis for this test is that the variable is normally distributed. If the p-value of the test is less than some significance level, then we can reject the null hypothesis and conclude that there is sufficient evidence to say that the variable is not normally distributed.

*This test can be used when the total number of observations is between 10 and 5,000.

We can use the sfrancia command to perform a Shapiro-Francia Test on the variable displacement:

sfrancia displacement

[Figure: Shapiro-Francia Test output in Stata]

Here is how to interpret the output of the test:

Obs: 74. This is the number of observations used in the test.

W': 0.93011. This is the test statistic for the test.

Prob>z: 0.00094. This is the p-value associated with the test statistic.

Since the p-value is less than 0.05, we can reject the null hypothesis of the test. We have sufficient evidence to say that the variable displacement is not normally distributed.
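For intuition about what W' measures, the Python sketch below computes the textbook Shapiro-Francia statistic: the squared correlation between the sorted data and approximate expected normal order statistics, here taken from Blom's formula. That choice of scores is an assumption; Stata's sfrancia may compute the scores differently, so this sketch is not guaranteed to reproduce the W' of 0.93011 reported above, and it does not compute a p-value. The function names and the sample values are illustrative.

```python
import statistics


def blom_scores(n):
    """Approximate expected standard normal order statistics using Blom's formula."""
    std_normal = statistics.NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def shapiro_francia_w(values):
    """W' = squared correlation between the sorted data and the Blom scores."""
    x = sorted(values)
    m = blom_scores(len(x))
    x_bar = statistics.fmean(x)
    m_bar = statistics.fmean(m)  # essentially 0, because the scores are symmetric
    sxm = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, m))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    smm = sum((mi - m_bar) ** 2 for mi in m)
    return sxm ** 2 / (sxx * smm)


if __name__ == "__main__":
    # Hypothetical right-skewed sample; values well below 1 suggest non-normality.
    print(shapiro_francia_w([1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12]))
```

Because W' is a squared correlation, it is at most 1, and it equals 1 only when the sorted data fall exactly on a straight line against the normal scores.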

Similar to the Shapiro-Wilk Test, you can perform the Shapiro-Francia Test on more than one variable at once by listing several variables after the sfrancia command.

Method 4: Skewness and Kurtosis Test

Another way to test for normality is to use the Skewness and Kurtosis Test, which determines whether the skewness and kurtosis of a variable are consistent with the normal distribution.
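As a rough illustration of how skewness and kurtosis can be combined into one chi-square statistic, the Python sketch below computes the Jarque-Bera statistic, which under normality is approximately chi-square with 2 degrees of freedom (a normal distribution has skewness 0 and kurtosis 3). This is a swapped-in, simpler test: Stata's sktest uses different transformations and an adjustment, so its statistic and p-value will not match this sketch. The function names and data are assumptions for illustration.

```python
import math


def skewness_kurtosis(values):
    """Moment-based skewness m3/m2**1.5 and kurtosis m4/m2**2 (normal kurtosis is 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("undefined when all values are equal")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(values):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    n = len(values)
    skew, kurt = skewness_kurtosis(values)
    stat = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # The survival function of a chi-square with 2 df is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical right-skewed sample, made up for illustration only.
    print(jarque_bera([1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 12]))
```

The chi-square approximation is poor for small samples, which is one reason dedicated tests such as sktest apply extra corrections.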

The null hypothesis for this test is that the variable is normally distributed. If the p-value of the test is less than some significance level, then we can reject the null hypothesis and conclude that there is sufficient evidence to say that the variable is not normally distributed.

*This test requires at least 8 observations.
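The sample-size requirements noted in this tutorial can be collected into a small lookup. The Python helper below is hypothetical; its ranges are copied from the notes in this tutorial (4 to 2,000 observations for swilk, 10 to 5,000 for sfrancia, at least 8 for sktest), and they may differ across Stata versions.

```python
# Sample-size ranges as stated in this tutorial; None means no upper limit is stated.
TEST_RANGES = {
    "swilk": (4, 2000),
    "sfrancia": (10, 5000),
    "sktest": (8, None),
}


def applicable_tests(n):
    """Return the normality tests whose stated sample-size range includes n."""
    names = []
    for name, (low, high) in TEST_RANGES.items():
        if n >= low and (high is None or n <= high):
            names.append(name)
    return names


if __name__ == "__main__":
    # The auto dataset used here has 74 observations.
    print(applicable_tests(74))
```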

We can use the sktest command to perform a Skewness and Kurtosis Test on the variable displacement:

sktest displacement

[Figure: Skewness and Kurtosis Test output in Stata]

Here is how to interpret the output of the test:

Obs: 74. This is the number of observations used in the test.

adj chi2(2): 5.81. This is the adjusted chi-square test statistic, with 2 degrees of freedom.

Prob>chi2: 0.0547. This is the p-value associated with the test statistic.

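Since the p-value is not less than 0.05, we fail to reject the null hypothesis of the test. We don't have sufficient evidence to say that displacement is not normally distributed. Failing to reject does not prove normality, though: the p-value is only slightly above 0.05, and the Shapiro-Wilk and Shapiro-Francia tests both rejected normality for the same variable.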

Similar to the other normality tests, you can perform the Skewness and Kurtosis Test on more than one variable at once by listing several variables after the sktest command.


Source: https://www.statology.org/normality-test-stata/
