Testing Whether Data Are Normally Distributed

Visual methods

Box-and-whisker plots show whether the data are symmetric or skewed

A Q-Q plot compares the data against a specified distribution

Histogram

Hypothesis-testing methods

Shapiro-Wilk test

D’Agostino-Pearson

Kolmogorov-Smirnov

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import quiz_tests

# Set plotting options
%matplotlib inline
plt.rc('figure', figsize=(16, 9))

Create a normal and a non-normal sample

# Sample A: Normal distribution
sample_a = stats.norm.rvs(loc=0.0, scale=1.0, size=(1000,))

# Sample B: Non-normal distribution
sample_b = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=(1000,))

Visualizing with boxplots and histograms

# Sample A: Normal distribution
fig, axes = plt.subplots(2, 1, figsize=(16, 9), sharex=True)
# Boxplot
axes[0].boxplot(sample_a, vert=False)
# Histogram
axes[1].hist(sample_a, bins=50)
axes[0].set_title("Boxplot of a Normal Distribution");

# Sample B: Non-normal distribution
fig, axes = plt.subplots(2, 1, figsize=(16, 9), sharex=True)
axes[0].boxplot(sample_b, vert=False)
axes[1].hist(sample_b, bins=50)
axes[0].set_title("Boxplot of a Lognormal Distribution");
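
The asymmetry visible in the boxplots can also be summarized as a number. The following is a minimal sketch, not part of the original notebook, that computes the sample skewness with scipy.stats.skew. The seeds and the variable names (normal_sample, lognormal_sample) are assumptions chosen only so the example is reproducible and self-contained.

```python
from scipy import stats

# Assumed seeds, used only to make this illustration reproducible.
normal_sample = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)
lognormal_sample = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=1000, random_state=1)

# Skewness near 0 suggests a symmetric sample; a clearly positive value
# indicates a longer right tail, as seen in the lognormal boxplot.
skew_normal = stats.skew(normal_sample)
skew_lognormal = stats.skew(lognormal_sample)

print("Skewness of normal sample:    {:.3f}".format(skew_normal))
print("Skewness of lognormal sample: {:.3f}".format(skew_lognormal))
```

A skewness close to zero does not by itself establish normality; it only rules out strong asymmetry.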

Visualizing with a Q-Q plot

# Q-Q plot of normally-distributed sample
plt.figure(figsize=(10, 10)); plt.axis('equal')
stats.probplot(sample_a, dist='norm', plot=plt);

# Q-Q plot of non-normally-distributed sample
plt.figure(figsize=(10, 10)); plt.axis('equal')
stats.probplot(sample_b, dist='norm', plot=plt);
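
How closely the points follow the straight line in a Q-Q plot can be read off numerically as well. As a rough sketch (the seeds and variable names are assumptions for illustration), stats.probplot called without a plot argument returns the ordered values together with the least-squares fit, including the correlation coefficient r of that fit.

```python
from scipy import stats

# Assumed seeds, used only to make this illustration reproducible.
normal_sample = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)
lognormal_sample = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=1000, random_state=1)

# With plot omitted, probplot only computes the quantiles and the fitted line.
# It returns ((theoretical_quantiles, ordered_values), (slope, intercept, r)).
_, (slope_n, intercept_n, r_normal) = stats.probplot(normal_sample, dist='norm')
_, (slope_l, intercept_l, r_lognormal) = stats.probplot(lognormal_sample, dist='norm')

# r close to 1 means the points lie close to the fitted straight line.
print("r for normal sample:    {:.4f}".format(r_normal))
print("r for lognormal sample: {:.4f}".format(r_lognormal))
```

The lognormal sample should give a visibly lower r, matching the curvature in its Q-Q plot.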

Hypothesis tests for normality

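Many hypothesis tests for normality exist. The function below uses the Kolmogorov-Smirnov test from scipy (scipy.stats.kstest); the Shapiro-Wilk test (scipy.stats.shapiro, documented at the link below) follows the same decision rule. The null hypothesis is that the sample comes from a normal distribution. If the p-value is greater than the chosen $\alpha$, we fail to reject the null hypothesis; otherwise we reject it.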

https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.stats.shapiro.html
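
The decision rule described above can be sketched with the Shapiro-Wilk test directly, and with the D'Agostino-Pearson test (scipy.stats.normaltest) for comparison. This is an illustrative sketch rather than the notebook's own solution: the helper name likely_normal, the constant ALPHA, and the seeds are hypothetical.

```python
from scipy import stats

ALPHA = 0.05  # assumed significance level for this illustration

# Assumed seeds, used only to make this illustration reproducible.
normal_sample = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)
lognormal_sample = stats.lognorm.rvs(s=0.5, loc=0.0, scale=1.0, size=1000, random_state=1)

def likely_normal(sample, test=stats.shapiro, alpha=ALPHA):
    """Hypothetical helper: True if `test` fails to reject normality at `alpha`."""
    # Both stats.shapiro and stats.normaltest return (statistic, p_value).
    statistic, p_value = test(sample)
    return bool(p_value > alpha)

shapiro_lognormal_ok = likely_normal(lognormal_sample, test=stats.shapiro)
dagostino_lognormal_ok = likely_normal(lognormal_sample, test=stats.normaltest)

print("Shapiro-Wilk, normal sample:          ", likely_normal(normal_sample))
print("Shapiro-Wilk, lognormal sample:       ", shapiro_lognormal_ok)
print("D'Agostino-Pearson, lognormal sample: ", dagostino_lognormal_ok)
```

A True result means only that the test found no evidence against normality at the chosen level, not that the data are proven normal.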

def is_normal_ks(sample, test=stats.kstest, p_level=0.05, **kwargs):
    """
    sample: a sample of observations
    test: a function that tests for normality (Kolmogorov-Smirnov by default)
    p_level: if the test returns a p-value greater than p_level, do not reject normality

    return: True if the test fails to reject normality, False otherwise
    """
    # Compare against a normal distribution whose mean and standard deviation
    # are estimated from the sample itself. Because the parameters are fitted
    # to the same data, the reported p-value tends to be too large.
    normal_args = (np.mean(sample), np.std(sample))

    t_stat, p_value = test(sample, 'norm', normal_args, **kwargs)
    print("Test statistic: {}, p-value: {}".format(t_stat, p_value))
    print("Is the distribution Likely Normal? {}".format(p_value > p_level))
    return p_value > p_level

quiz_tests.test_is_normal_ks(is_normal_ks)

Test statistic: 0.014762337753813415, p-value: 0.9813208156284505
Is the distribution Likely Normal? True
Test statistic: 0.11969044583541244, p-value: 6.108447081487611e-13
Is the distribution Likely Normal? False
Tests Passed

# Using Kolmogorov-Smirnov test
print("Sample A:-"); is_normal_ks(sample_a);
print("Sample B:-"); is_normal_ks(sample_b);

Sample A:-
Test statistic: 0.016918963065188558, p-value: 0.9370534969191464
Is the distribution Likely Normal? True
Sample B:-
Test statistic: 0.10934501078187586, p-value: 7.222222819791568e-11
Is the distribution Likely Normal? False