Source code for interpreTS.core.features.feature_entropy

import numpy as np
from scipy.stats import gaussian_kde

def calculate_entropy(data):
    """
    Calculate the normalized Shannon entropy of a dataset.

    This function estimates the probability density of the data using Kernel
    Density Estimation (KDE) and calculates the Shannon entropy based on the
    estimated probabilities.

    Parameters
    ----------
    data : array-like
        A 1D array or list of numerical data points.

    Returns
    -------
    float
        The normalized Shannon entropy of the dataset. If the dataset is
        empty, NaN is returned. If the dataset consists of identical values
        (i.e., no variability), the entropy is 0. If the KDE assigns zero
        probability to any evaluation point, NaN is returned to indicate that
        the entropy could not be calculated properly.

    Notes
    -----
    - If the range (peak-to-peak value) of the data is zero (i.e., all values
      are identical), the entropy is directly returned as 0.
    - The KDE is evaluated at 100 evenly spaced points between the minimum
      and maximum values of the data.
    - The Shannon entropy is normalized by dividing by `np.log2(len(x))`,
      where `len(x)` is the number of points used for KDE evaluation, to
      scale the entropy between 0 and 1.
    - If any probability in the KDE is zero, the function returns NaN,
      indicating a problematic or poorly estimated probability distribution.

    Examples
    --------
    >>> data = [1, 1, 1, 1, 2, 2, 2]
    >>> calculate_entropy(data)
    0.9182958340544894  # Example output, depending on the data distribution.
    """
    if len(data) == 0:
        return np.nan

    # All values identical: no variability, so the entropy is zero.
    if np.ptp(data) == 0:
        return 0.0

    # Evaluate the KDE on 100 evenly spaced points over the data range and
    # normalize the densities into a discrete probability distribution.
    x = np.linspace(min(data), max(data), 100)
    probabilities = gaussian_kde(data)(x)
    probabilities /= probabilities.sum()

    # A zero probability would make log2 undefined; treat it as a failed estimate.
    if np.any(probabilities == 0):
        return np.nan

    shannon_entropy = -np.sum(probabilities * np.log2(probabilities))
    return shannon_entropy / np.log2(len(x))
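
# A minimal usage sketch, not part of the original module: it exercises the
# edge cases documented above (empty input, constant input) and a regular
# sample. The exact value printed for real data depends on the KDE estimate.
if __name__ == "__main__":
    print(calculate_entropy([]))                     # nan: empty input
    print(calculate_entropy([5, 5, 5, 5]))           # 0.0: all values identical
    print(calculate_entropy([1, 1, 1, 1, 2, 2, 2]))  # normalized entropy in (0, 1]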