Library "QuantifyPS"
normdist(z)
Parameters:
z (float): (float): The z-score for which the CDF is to be calculated.
Returns: (float): The cumulative probability corresponding to the input z-score.
Notes:
- Uses an approximation method for the normal distribution CDF, which is computationally efficient.
- The result is accurate for most practical purposes but may have minor deviations for extreme values of `z`.
Formula:
- Based on the approximation formula:
`Φ(z) ≈ 1 - f(z) * P(t)` if `z > 0`, otherwise `Φ(z) ≈ f(z) * P(t)`,
where:
`f(z) = 0.3989423 * exp(-z^2 / 2)` (PDF of standard normal distribution)
`P(t) = Σ [c * t^i]` with constants `c` and `t = 1 / (1 + 0.2316419 * |z|)`.
Implementation details:
- The approximation uses five coefficients for the polynomial part of the CDF.
- Handles both positive and negative values of `z` symmetrically.
Constants:
- The coefficients and scaling factors are chosen to minimize approximation errors.
gamma(x)
Parameters:
x (float): (float): The input value for which the Gamma function is to be calculated.
Must be greater than 0. For x <= 0, the function returns `na` as it is undefined.
Returns: (float): Approximation of the Gamma function for the input `x`.
Notes:
- The Lanczos approximation provides a numerically stable and efficient method to compute the Gamma function.
- The function is not defined for `x <= 0` and will return `na` in such cases.
- Uses precomputed Lanczos coefficients for accuracy.
- Includes handling for small numerical inaccuracies.
Formula:
- The Gamma function is approximated as:
`Γ(x) ≈ sqrt(2π) * t^(x + 0.5) * e^(-t) * Σ(p[k] / (x + k))`
where `t = x + g + 0.5` and `p` is the array of Lanczos coefficients.
Implementation details:
- Lanczos coefficients (`p`) are precomputed and stored in an array.
- The summation iterates over these coefficients to compute the final result.
- The constant `g` controls the precision of the approximation (commonly `g = 7`).
t_cdf(t, df)
Parameters:
t (float): (float): The t-statistic for which the CDF value is to be calculated.
df (int): (int): Degrees of freedom of the t-distribution.
Returns: (float): Approximate CDF value for the given t-statistic.
Notes:
- This function computes a one-tailed p-value.
- Relies on an approximation formula using gamma functions and standard t-distribution properties.
- May not be as accurate as specialized statistical libraries for extreme values or very high degrees of freedom.
Formula:
- Let `x = df / (t^2 + df)`.
- The approximation formula is derived using:
`CDF(t, df) ≈ 1 - [sqrt(df * π) * Γ(df / 2) / Γ((df + 1) / 2)] * x^((df + 1) / 2) / 2`,
where Γ represents the gamma function.
Implementation details:
- Computes the gamma ratio for normalization.
- Applies the t-distribution formula for one-tailed probabilities.
tStatForPValue(p, df)
Parameters:
p (float): (float): P-value for which the t-statistic needs to be calculated.
Must be in the interval (0, 1).
df (int): (int): Degrees of freedom of the t-distribution.
Returns: (float): The t-statistic corresponding to the given p-value.
Notes:
- If `p` is outside the interval (0, 1), the function returns `na` as an error.
- The function uses binary search with a fixed number of iterations and a defined tolerance.
- The result is accurate to within the specified tolerance (default: 0.0001).
- Relies on the cumulative density function (CDF) `t_cdf` for the t-distribution.
Formula:
- Uses the cumulative density function (CDF) of the t-distribution to iteratively find the t-statistic.
Implementation details:
- `low` and `high` define the search interval for the t-statistic.
- The midpoint (`mid`) is iteratively refined until the difference between the cumulative probability
and the target p-value is smaller than the tolerance.
jarqueBera(n, s, k)
Parameters:
n (float): (series float): Number of observations in the dataset.
s (float): (series float): Skewness of the dataset.
k (float): (series float): Kurtosis of the dataset.
Returns: (float): The Jarque-Bera test statistic.
Formula:
JB = n * [(S^2 / 6) + ((K - 3)^2 / 24)]
Notes:
- A higher JB value suggests that the data deviates more from a normal distribution.
- The test is asymptotically distributed as a chi-squared distribution with 2 degrees of freedom.
- Use this value to calculate a p-value to determine the significance of the result.
skewness(data)
Parameters:
data (float): (series float): Input data series.
Returns: (float): The skewness value.
Notes:
- Handles missing values (`na`) by ignoring invalid points.
- Includes error handling for zero variance to avoid division-by-zero scenarios.
- Skewness is calculated as the normalized third central moment of the data.
kurtosis(data)
Parameters:
data (float): (series float): Input data series.
Returns: (float): The kurtosis value.
Notes:
- Handles missing values (`na`) by ignoring invalid points.
- Includes error handling for zero variance to avoid division-by-zero scenarios.
- Kurtosis is calculated as the normalized fourth central moment of the data.
regression(y, x, lag)
Parameters:
y (float): (series float): Dependent series (observed values).
x (float): (series float): Independent series (explanatory variable).
lag (int): (int): Number of lags applied to the independent series (x).
Returns: (tuple): Returns a tuple containing the following values:
- n: Number of valid observations.
- alpha: Intercept of the regression line.
- beta: Slope of the regression line.
- t_stat: T-statistic for the beta coefficient.
- p_value: Two-tailed p-value for the beta coefficient.
- r_squared: Coefficient of determination (R²) indicating goodness of fit.
- skew: Skewness of the residuals.
- kurt: Kurtosis of the residuals.
Notes:
- Handles missing data (`na`) by ignoring invalid points.
- Includes basic error handling for zero variance and division-by-zero scenarios.
- Computes residual-based statistics (skewness and kurtosis) for model diagnostics.