Central Limit Theorem: Normalization of Random Variable Series

The Central Limit Theorem (CLT) states that the sum S_n of n independent, identically distributed random variables with finite mean μ and finite variance σ² > 0 is approximately normally distributed, N(nμ, nσ²), for large n. Formally, the standardized sum Z_n = (S_n − nμ) / (σ√n) converges in distribution to the standard normal variable Z ~ N(0, 1) as n → ∞. This property is a cornerstone of statistical inference, because it permits the use of normal-theory methods on aggregated data largely regardless of the distribution of the individual observations, provided that distribution has finite variance. The i.i.d. version stated here is the Lindeberg–Lévy theorem; the Lindeberg–Feller theorem generalizes it to independent, non-identically distributed summands under the Lindeberg condition.
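The convergence of Z_n can be checked numerically. The following is a minimal sketch, assuming Exp(1) summands (so μ = σ = 1) as an arbitrary skewed example; the function names and the small grid of x values are illustrative choices, not part of the theorem.

```python
import math
import random

def standardized_sum(n, rng, mu=1.0, sigma=1.0):
    """Z_n = (S_n - n*mu) / (sigma*sqrt(n)) for n Exp(1) draws (mu = sigma = 1)."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

def phi(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_cdf_gap(n, trials=20000, seed=0):
    """Largest |empirical CDF of Z_n - Phi| over a small grid of x values."""
    rng = random.Random(seed)
    zs = [standardized_sum(n, rng) for _ in range(trials)]
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    return max(abs(sum(z <= x for z in zs) / trials - phi(x)) for x in grid)

if __name__ == "__main__":
    for n in (1, 5, 30, 100):
        print(n, round(empirical_cdf_gap(n), 4))
```

The printed gap should shrink as n grows, up to simulation noise. The grid covers only five points, so the value is a lower estimate of the true supremum distance.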

A critical question in the practical application of the CLT is how large the sample size n must be for the normal approximation to be acceptable. The empirical rule n ≥ 30 is merely a rough guideline: for skewed or heavy-tailed source distributions, substantially larger samples are required. The skewness and kurtosis of the source distribution largely govern the convergence rate: the further the original shape departs from a symmetric unimodal form, the slower the convergence. The Berry–Esseen theorem gives an upper bound on the approximation error: sup_x |P(Z_n ≤ x) − Φ(x)| ≤ C·ρ / (σ³√n), where ρ = E|X − μ|³ is the third absolute central moment of a single summand (assumed finite) and C is an absolute constant.
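The Berry–Esseen bound can be turned into a conservative sample-size estimate. The sketch below assumes C = 0.4748, a published upper bound on the constant for the i.i.d. case; the text above does not fix a value, so treat it as an assumption. The Exp(1) example uses ρ = 12/e − 2 and σ = 1, and the function names are illustrative.

```python
import math

# Assumed value: a published upper bound on the Berry-Esseen constant
# for i.i.d. summands. It is not specified in the text above.
C_IID = 0.4748

def berry_esseen_bound(rho, sigma, n, c=C_IID):
    """Upper bound on sup_x |P(Z_n <= x) - Phi(x)|: c * rho / (sigma^3 * sqrt(n))."""
    return c * rho / (sigma ** 3 * math.sqrt(n))

def min_n_for_bound(rho, sigma, eps, c=C_IID):
    """Smallest n for which the bound itself is at most eps."""
    return math.ceil((c * rho / (sigma ** 3 * eps)) ** 2)

if __name__ == "__main__":
    rho_exp = 12 / math.e - 2   # E|X - 1|^3 for X ~ Exp(1)
    print(berry_esseen_bound(rho_exp, 1.0, 30))
    print(min_n_for_bound(rho_exp, 1.0, 0.01))
```

The bound is a worst-case guarantee, so the n it returns is typically far larger than what simulation suggests is needed in practice. It is most useful as a guaranteed ceiling, not as an estimate of the actual error.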

The convergence rate to the normal distribution has direct implications for confidence intervals and statistical tests. With insufficient sample size, confidence intervals constructed via the normal approximation exhibit systematic coverage error: the actual coverage probability differs from the nominal level. Bootstrap methods are often used to reduce this effect; they estimate the sampling distribution of a statistic by resampling the observed data instead of assuming normality. Permutation tests offer an alternative that requires no parametric distributional assumptions, only exchangeability of the observations under the null hypothesis.
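As an illustration of the resampling idea, here is a minimal sketch of a percentile bootstrap confidence interval for the mean. The percentile method is only one of several bootstrap variants and is chosen here for brevity; the function name, the number of resamples, and the Exp(1) test data are assumptions made for the example.

```python
import random
import statistics

def bootstrap_ci_mean(data, level=0.95, n_resamples=5000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.expovariate(1.0) for _ in range(40)]  # small skewed sample
    print(bootstrap_ci_mean(sample))
```

For skewed data the resulting interval is generally asymmetric around the sample mean, unlike the interval from the normal approximation. The percentile interval can still undercover in very small samples.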

In the aggregated analysis of stochastic sequences, the CLT justifies the use of z-tests and t-tests for evaluating the deviation of observed means from the theoretical parameters of a generator. When testing a PRNG, the sum of N consecutive outputs is standardized, and the resulting z-statistic is compared against critical values of the standard normal distribution. A value |z| > 2.576 (two-sided significance level α = 0.01) leads to rejection of the hypothesis that the generator's mean matches the theoretical value and points to possible systematic bias; a sound generator will still exceed this threshold in about 1% of tests by chance, so repeated failures are more informative than a single one. This methodology is integrated into automated validation pipelines, providing continuous quality control of pseudo-random sequences at industrial data volumes.
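The test described above can be sketched as follows, assuming the generator is meant to produce Uniform(0, 1) values, so that μ = 1/2 and σ² = 1/12. The function names and the artificially biased generator in the demo are hypothetical.

```python
import math
import random

Z_CRIT_01 = 2.576  # two-sided critical value for alpha = 0.01

def mean_z_statistic(samples, mu=0.5, sigma=math.sqrt(1.0 / 12.0)):
    """z = (S_N - N*mu) / (sigma*sqrt(N)); defaults assume Uniform(0, 1) output."""
    n = len(samples)
    return (sum(samples) - n * mu) / (sigma * math.sqrt(n))

def passes_mean_test(samples, z_crit=Z_CRIT_01):
    """True if the observed mean is consistent with the theoretical mean."""
    return abs(mean_z_statistic(samples)) <= z_crit

if __name__ == "__main__":
    rng = random.Random(12345)
    good = [rng.random() for _ in range(100_000)]
    # Hypothetical faulty generator: output shifted so its mean is 0.51.
    biased = [0.02 + 0.98 * rng.random() for _ in range(100_000)]
    print(mean_z_statistic(good), passes_mean_test(good))
    print(mean_z_statistic(biased), passes_mean_test(biased))
```

This checks only the mean. A generator can pass it while failing tests of variance, serial correlation, or higher-order structure, so in a validation pipeline it would be one test among several.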
