Feature Extractors#
This section provides an overview of the FeatureExtractor and the features available in the library.
The FeatureExtractor is the core component that extracts meaningful metrics and features from time-series data.
- class interpreTS.core.feature_extractor.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)[source]#
Bases:
object
- CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']#
- DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']#
- DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']#
- FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']#
- FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']#
- add_custom_feature(name, function, metadata=None, params=None)[source]#
Add a custom feature to the FeatureExtractor with optional parameters.
- Parameters:
name (str) – The name of the custom feature.
function (callable) – A function that computes the feature. It should accept a Pandas Series and optional parameters as input.
metadata (dict, optional) – A dictionary containing metadata about the feature, such as its interpretability level and description:
- level (str): Interpretability level ('easy', 'moderate', 'advanced').
- description (str): Description of the feature.
Example: {'level': 'easy', 'description': 'Description of the feature.'}
params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.
- Raises:
ValueError – If the feature name already exists or the function is not callable.
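A custom feature is simply a callable that accepts a pd.Series (plus any keyword parameters) and returns a scalar. The sketch below shows the expected shape with a hypothetical peak-to-peak range feature (not part of the library); it would then be registered with something like extractor.add_custom_feature("range", calculate_range, params={"scale": 1.0}).

```python
import pandas as pd

# Hypothetical custom feature: scaled peak-to-peak range of a window.
# It accepts a pd.Series plus optional keyword parameters (supplied via
# the `params` argument at registration) and returns a scalar.
def calculate_range(data, scale=1.0):
    return scale * (data.max() - data.min())

data = pd.Series([1, 5, 3, 9, 2])
print(calculate_range(data))             # 8.0
print(calculate_range(data, scale=2.0))  # 16.0
```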
- extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)[source]#
Extract features from a time series dataset.
- Parameters:
data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.
progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).
mode (str, optional) – The mode of processing. Can be ‘parallel’ for multi-threaded processing or ‘sequential’ for single-threaded processing with real-time progress reporting.
n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).
- Returns:
A DataFrame containing calculated features for each window.
- Return type:
pd.DataFrame
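Conceptually, extraction slides a window of window_size over each series with step stride and computes every requested feature per window, yielding one row per window position. A minimal pandas-only sketch of that windowing (the helper name and feature set here are illustrative; the real extractor additionally handles id/sort columns, grouping, and parallel execution):

```python
import pandas as pd

# Pandas-only sketch of the windowing behind extract_features:
# one row of features per window position (hypothetical helper).
def sliding_window_features(series, window_size=3, stride=1):
    rows = []
    for start in range(0, len(series) - window_size + 1, stride):
        window = series.iloc[start:start + window_size]
        rows.append({
            "mean": window.mean(),
            "variance": window.var(ddof=0),  # population variance
        })
    return pd.DataFrame(rows)

data = pd.Series([1, 2, 3, 4, 5])
print(sliding_window_features(data, window_size=3))
```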
- extract_features_stream(data_stream, progress_callback=None)[source]#
Extract features from a stream of time series data.
- Parameters:
data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.
progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.
- Yields:
dict – A dictionary containing the calculated features for the current window.
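The streaming path can be pictured as buffering incoming points and yielding one feature dictionary per full window. A simplified sketch under that assumption (hypothetical helper and feature set; the real method computes the extractor's configured features and windowing):

```python
import pandas as pd

# Simplified sketch of stream extraction: buffer points from an iterable
# of dicts and yield a feature dict once a full window is available.
def stream_features(data_stream, window_size=3):
    buffer = []
    for point in data_stream:
        buffer.append(point["value"])
        if len(buffer) >= window_size:
            window = pd.Series(buffer[-window_size:])
            yield {"mean": window.mean(), "length": len(window)}

stream = ({"value": v} for v in [1, 2, 3, 4])
for features in stream_features(stream):
    print(features)
```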
- group_data(data)[source]#
Group data based on the group_by column.
- Parameters:
data (pd.DataFrame) – Input data.
- Returns:
Grouped data.
- Return type:
iterable
- group_features_by_interpretability()[source]#
Group features by their interpretability levels.
- Returns:
A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.
- Return type:
dict
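The grouping can be reproduced by hand from per-feature metadata. A sketch assuming each feature's metadata carries a 'level' key (the feature-to-level mapping below is illustrative, not the library's actual assignment):

```python
# Illustrative metadata; the library derives the real mapping from each
# feature's registered metadata.
feature_metadata = {
    "mean": {"level": "easy"},
    "variance": {"level": "easy"},
    "stability": {"level": "moderate"},
    "entropy": {"level": "advanced"},
}

# Bucket feature names by their interpretability level.
groups = {"easy": [], "moderate": [], "advanced": []}
for name, meta in feature_metadata.items():
    groups[meta["level"]].append(name)

print(groups)
```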
- head(features_df, n=5)[source]#
Returns the first n rows of the resulting DataFrame from the extract_features function.
- Parameters:
features_df (pd.DataFrame) – The resulting DataFrame from the extract_features function.
n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.
- Returns:
The first n rows of the DataFrame.
- Return type:
pd.DataFrame
Available Features#
Below is a list of the available features in the library. Each feature is automatically documented from the code, with a brief description.
Length#
Extracts the total length of a time series.
- interpreTS.core.features.feature_length.calculate_length(data)[source]#
Calculate the number of data points in a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the length feature is to be calculated.
- Returns:
The number of data points in the provided time series.
- Return type:
int
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or is not one-dimensional.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_length(data)
5
Mean#
Calculates the mean value of a time series.
- interpreTS.core.features.feature_mean.calculate_mean(data)[source]#
Calculate the mean value of a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the mean value is to be calculated.
- Returns:
The mean value of the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_mean(data)
3.0
Peak#
Identifies the maximum peak value.
- interpreTS.core.features.feature_peak.calculate_peak(data, start=None, end=None)[source]#
Calculate the local maximum of a time series within an optional range.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the maximum value is to be calculated.
start (int, str, or None, optional) – The starting index, timestamp, or position for slicing the data. If None, the series starts from the beginning.
end (int, str, or None, optional) – The ending index, timestamp, or position for slicing the data. If None, the series ends at the last value.
- Returns:
The local maximum of the specified range in the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 5, 4, 7])
>>> calculate_peak(data)
7.0
>>> calculate_peak(data, start=1, end=3)
5.0
Spikeness#
Measures the level of spikeness in the time series.
- interpreTS.core.features.feature_spikeness.calculate_spikeness(data)[source]#
Calculate the spikeness (skewness) of a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the spikeness is to be calculated.
- Returns:
The spikeness (skewness) of the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type or contains non-numeric values.
ValueError – If the data is empty.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_spikeness(data)
0.0
Standard Deviation of the First Derivative (Std_1st_der)#
Calculates the standard deviation of the first derivative of the series.
- interpreTS.core.features.feature_std_1st_der.calculate_std_1st_der(data)[source]#
Calculate the standard deviation of the first derivative of a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the standard deviation of the first derivative is to be calculated.
- Returns:
The standard deviation of the first derivative of the provided time series. Returns np.nan if the input data is empty.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_std_1st_der(data)
0.0
Trough#
Identifies the lowest point in the time series.
- interpreTS.core.features.feature_trough.calculate_trough(data, start=None, end=None)[source]#
Calculate the local minimum of a time series within an optional range.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the minimum value is to be calculated.
start (int, str, or None, optional) – The starting index, timestamp, or position for slicing the data. If None, the series starts from the beginning.
end (int, str, or None, optional) – The ending index, timestamp, or position for slicing the data. If None, the series ends at the last value.
- Returns:
The local minimum of the specified range in the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 5, 4, 3])
>>> calculate_trough(data)
1.0
>>> calculate_trough(data, start=1, end=3)
2.0
Variance#
Computes the variance of the series.
- interpreTS.core.features.feature_variance.calculate_variance(data, ddof=0)[source]#
Calculate the variance value of a time series with specified degrees of freedom.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the variance is to be calculated.
ddof (int, optional) – Delta degrees of freedom. The divisor used in calculations is N - ddof, where N is the number of elements. A ddof of 1 provides the sample variance, and a ddof of 0 provides the population variance. Default is 0.
- Returns:
The variance of the provided time series with specified degrees of freedom.
- Return type:
float
- Raises:
TypeError – If the data is not numeric.
ValueError – If the data contains NaN values or is not one-dimensional.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.Series([10, 12, 14, 16, 18])
>>> calculate_variance(data)
10.0
>>> data = np.array([2, 4, 6, 8, 10])
>>> calculate_variance(data, ddof=0)
8.0
>>> data = pd.Series([5])
>>> calculate_variance(data)
0.0
Dominant#
Finds the most dominant value in a histogram representation.
- interpreTS.core.features.feature_histogram_dominant.calculate_dominant(data, bins=10, return_bin_center=False)[source]#
Calculate the dominant value (mode) of a time series histogram.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the dominant value is to be calculated.
bins (int, optional) – The number of bins to use for creating the histogram, by default 10.
return_bin_center (bool, optional) – If True, return the center of the bin with the maximum frequency. Otherwise, return the lower bound of the bin (default is False).
- Returns:
The dominant value of the histogram (either the center or the lower bound of the bin).
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import numpy as np
>>> data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5])
>>> calculate_dominant(data, bins=5)
2.6
>>> data = np.array([10, 20, 20, 30, 30, 30, 40, 40, 50])
>>> calculate_dominant(data, bins=5, return_bin_center=True)
30.0
>>> data = np.array([1, 1, 1, 1, 1])
>>> calculate_dominant(data, bins=3)
0.8333333333333333
Seasonality Strength#
Assesses the strength of seasonality patterns in the data.
- interpreTS.core.features.feature_seasonality_strength.calculate_seasonality_strength(data, period=2, max_lag=12)[source]#
Calculate the strength of the seasonality in a time series based on autocorrelation.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the seasonality strength is to be calculated.
period (int, optional) – The periodic interval to check for seasonality (default is 2).
max_lag (int, optional) – The maximum number of lags to consider for autocorrelation (default is 12).
- Returns:
The seasonality strength, ranging from 0 to 1, where 1 indicates strong seasonality. Returns np.nan if the data is insufficient or invalid.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values, is too short to calculate seasonality, or if the period is invalid.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1, 2, 3, 2], index=pd.date_range("2023-01-01", periods=12, freq="M"))
>>> calculate_seasonality_strength(data, period=3)
1.0
Trend Strength#
Measures the strength of the overall trend in the series.
- interpreTS.core.features.feature_trend_strength.calculate_trend_strength(data)[source]#
Calculate the strength of the trend in a time series using linear regression.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the trend strength is to be calculated.
- Returns:
The R-squared value representing the strength of the trend (0 to 1).
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or is empty.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_trend_strength(data)
1.0
Above 9th Decile#
Calculates the fraction of values in the time series that lie above the 9th decile of the training data.
- interpreTS.core.features.feature_above_9th_decile.calculate_above_9th_decile(data, training_data)[source]#
Calculate the fraction of values in the window above the 9th decile of the training data.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the fraction is to be calculated.
training_data (pd.Series or np.ndarray) – The training data to determine the 9th decile.
- Returns:
The fraction of values in the window above the 9th decile, in the range [0, 1].
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or is empty.
Examples
>>> import pandas as pd
>>> data = pd.Series([8, 9, 10, 11, 12])
>>> training_data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> calculate_above_9th_decile(data, training_data)
0.6
Distance to the Last Change Point#
Measures the distance to the last change point in the time series.
- interpreTS.core.features.feature_distance_to_the_last_change_point.calculate_distance_to_last_trend_change(data, window_size=5)[source]#
Calculate the distance (in terms of indices) to the last trend change point in a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the distance to the last trend change is to be calculated.
window_size (int, optional) – The size of the rolling window to calculate mean (default is 5).
- Returns:
The distance (in terms of indices) to the last trend change point. If no change is detected, returns None.
- Return type:
int or None
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or window_size is invalid.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
>>> calculate_distance_to_last_trend_change(data, window_size=2)
1
Absolute Energy#
Calculates the absolute energy of the time series.
- interpreTS.core.features.feature_absolute_energy.calculate_absolute_energy(data, start=None, end=None)[source]#
Calculate the absolute energy of a time series within an optional range.
Absolute energy is defined as the sum of squared values in the time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the absolute energy is to be calculated.
start (int, str, or None, optional) – The starting index, timestamp, or position for slicing the data. If None, the series starts from the beginning.
end (int, str, or None, optional) – The ending index, timestamp, or position for slicing the data. If None, the series ends at the last value.
- Returns:
The absolute energy of the specified range in the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4])
>>> calculate_absolute_energy(data)
30.0
>>> calculate_absolute_energy(data, start=1, end=3)
13.0
Below 1st Decile#
Calculates the fraction of values in the time series that lie below the 1st decile of the training data.
- interpreTS.core.features.feature_below_1st_decile.calculate_below_1st_decile(data, training_data)[source]#
Calculate the fraction of values in the window below the 1st decile of the training data.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the fraction is to be calculated.
training_data (pd.Series or np.ndarray) – The training data to determine the 1st decile.
- Returns:
The fraction of values in the window below the 1st decile, in the range [0, 1].
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or is empty.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> training_data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> calculate_below_1st_decile(data, training_data)
0.2
Binarize Mean#
Binarizes the time series around its mean value.
- interpreTS.core.features.feature_binarize_mean.calculate_binarize_mean(data)[source]#
Calculate the binarized mean of a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the binarized mean is to be calculated.
- Returns:
The binarized mean of the provided time series: the fraction of values at or above the series mean.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_binarize_mean(data)
0.6
Crossing Points#
Counts the points at which the time series crosses its mean.
- interpreTS.core.features.feature_crossing_points.calculate_crossing_points(data)[source]#
Calculate the number of times and the list of indices where the time series crosses its mean.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which mean crossings are to be calculated.
- Returns:
A dictionary containing:
- 'crossing_count': The total number of crossings.
- 'crossing_points': A list of indices where crossings occur.
- Return type:
dict
- Raises:
ValueError – If the input data is empty or contains NaN values.
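The returned dictionary can be checked by hand: a crossing occurs wherever consecutive points fall on opposite sides of the series mean. A pandas sketch of that definition (the exact sign-change convention is an assumption about the implementation):

```python
import pandas as pd

# Hand computation of mean crossings: count index positions where the
# series moves from one side of its mean to the other.
data = pd.Series([1, 5, 1, 5, 1])
above = data > data.mean()  # mean is 2.6
crossing_points = [i for i in range(1, len(data)) if above.iloc[i] != above.iloc[i - 1]]
result = {"crossing_count": len(crossing_points), "crossing_points": crossing_points}
print(result)  # {'crossing_count': 4, 'crossing_points': [1, 2, 3, 4]}
```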
Entropy#
Calculates the entropy of the time series.
- interpreTS.core.features.feature_entropy.calculate_entropy(data)[source]#
Calculate the normalized Shannon entropy of a dataset.
This function estimates the probability density of the data using Kernel Density Estimation (KDE) and calculates the Shannon entropy based on the estimated probabilities.
- Parameters:
data (array-like) – A 1D array or list of numerical data points.
- Returns:
The normalized Shannon entropy of the dataset. If the dataset consists of identical values (i.e., no variability), the entropy is 0. If the KDE results in zero probability for any point, NaN is returned to indicate that the entropy could not be calculated properly.
- Return type:
float
Notes
If the range (peak-to-peak value) of the data is zero (i.e., all values are identical), the entropy is directly returned as 0.
The dataset is evaluated at 100 evenly spaced points between the minimum and maximum values of the data for KDE estimation.
The Shannon entropy is normalized by dividing by np.log2(len(x)), where len(x) is the number of points used for KDE evaluation, to scale the entropy between 0 and 1.
If any probability in the KDE is zero, the function returns NaN, indicating a problematic or poorly estimated probability distribution.
Examples
>>> data = [1, 1, 1, 1, 2, 2, 2]
>>> calculate_entropy(data)
0.9182958340544894  # Example output, depending on the data distribution.
Flat Spots#
Identifies flat spots within the time series.
- interpreTS.core.features.feature_flat_spots.calculate_flat_spots(data, window_size=5)[source]#
Calculate the number of flat spots in the time series.
Flat spots are defined as maximum run-lengths across equally-sized segments of the time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which flat spots are to be calculated.
window_size (int, optional) – The size of the window to look for flat spots (default is 5).
- Returns:
The number of flat spots in the time series.
- Return type:
int
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4, 5, 1, 1])
>>> calculate_flat_spots(data)
4
Heterogeneity#
Measures the heterogeneity of the time series.
- interpreTS.core.features.feature_heterogeneity.calculate_heterogeneity(data)[source]#
Calculate the heterogeneity (coefficient of variation) of a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the heterogeneity is to be calculated.
- Returns:
The heterogeneity (coefficient of variation) of the provided time series.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or if the mean of the series is zero.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_heterogeneity(data)
0.5270462766947299
Linearity#
Calculates the linearity of the time series.
- interpreTS.core.features.feature_linearity.calculate_linearity(data, normalize=True, use_derivative=True)[source]#
Calculate the linearity of a time series, similar to tsflex or sktime implementations.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the linearity is to be calculated.
normalize (bool, optional) – Whether to normalize the data before calculating linearity (default is True).
use_derivative (bool, optional) – Whether to calculate linearity on the first derivative of the data (default is True).
- Returns:
The R-squared value representing the linearity of the time series. A value closer to 1 indicates higher linearity.
- Return type:
float
- Raises:
TypeError – If the data is not a valid time series type or contains non-numeric values.
ValueError – If the data is empty or contains insufficient unique points for regression.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> calculate_linearity(data)
1.0
>>> data = pd.Series([1, 2, 1, 2, 1, 2, 1, 2, 1, 2])
>>> calculate_linearity(data)
0.0
Missing Points#
Identifies missing points within the time series.
- interpreTS.core.features.feature_missing_points.calculate_missing_points(data, percentage=True)[source]#
Calculate the percentage or count of missing (NaN or None) values in a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which missing information is to be calculated.
percentage (bool, optional) – If True, returns the percentage of missing values. If False, returns the count of missing values. Default is True.
- Returns:
The percentage or count of missing values in the provided time series.
- Return type:
float or int
- Raises:
TypeError – If the data is not a valid time series type.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.Series([1, 2, np.nan, 4, None])
>>> calculate_missing_points(data)
0.4
>>> calculate_missing_points(data, percentage=False)
2
Outliers IQR#
Identifies outliers based on the interquartile range (IQR).
- interpreTS.core.features.feature_outliers_iqr.calculate_outliers_iqr(data, training_data, epsilon=1e-06)[source]#
Calculates the percentage of observations in a given window that fall below (Q1 - 1.5 * IQR) or above (Q3 + 1.5 * IQR) using the Interquartile Range (IQR) method.
- Parameters:
data (np.ndarray or pd.Series) – The data window to analyze for outliers.
training_data (np.ndarray or pd.Series) – The training data used to calculate Q1 (25th percentile), Q3 (75th percentile), and IQR.
epsilon (float, optional) – A small tolerance added to bounds when training data contains a single unique value (default is 1e-6).
- Returns:
The fraction (in the range [0, 1]) of observations in the window that are considered outliers.
- Return type:
float
Examples
>>> import numpy as np
>>> training_data = np.array([10, 12, 14, 15, 16, 18, 19])
>>> data = np.array([9, 15, 20, 25])
>>> calculate_outliers_iqr(data, training_data)
0.25
Outliers STD#
Identifies outliers based on standard deviation (STD).
- interpreTS.core.features.feature_outliers_std.calculate_outliers_std(data, training_data)[source]#
Calculates the percentage of observations in a window that are above or below 3 standard deviations from the mean, based on the training dataset.
- Parameters:
data (np.ndarray or pd.Series) – Window data to analyze.
training_data (np.ndarray or pd.Series) – Training data used to calculate the mean and standard deviation.
- Returns:
The fraction (in the range [0, 1]) of observations in the window that deviate by more than 3 standard deviations.
- Return type:
float
Examples
>>> import numpy as np
>>> import pandas as pd
>>> training_data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> data = pd.Series([0, 10, 2, 3, 15])
>>> calculate_outliers_std(data, training_data)
0.2
Significant Changes#
Detects significant changes in the time series.
- interpreTS.core.features.feature_significant_changes.calculate_significant_changes(data)[source]#
Calculate the proportion of significant increases or decreases in the signal within the given window.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the significant change is to be calculated.
- Returns:
The proportion of significant changes in the window, in the range [0, 1].
- Return type:
float
Examples
>>> import numpy as np
>>> data = np.array([1, 2, 1, 3, 10, 2, 1])
>>> calculate_significant_changes(data)
0.0
Stability#
Measures the stability of the time series.
- interpreTS.core.features.feature_stability.calculate_stability(data, max_lag=None)[source]#
Calculate the stability of a time series based on autocorrelation.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the stability is to be calculated.
max_lag (int, optional) – The maximum number of lags to consider for autocorrelation. If None, it will be set to min(12, len(data) - 1).
- Returns:
The stability strength, ranging from 0 to 1, where 1 indicates high stability.
- Return type:
float
Examples
>>> import pandas as pd
>>> data = pd.Series([10, 12, 11, 13, 12, 14, 11, 13, 12, 14, 13])
>>> calculate_stability(data)
0.8410385081084804
>>> # Example with less stable data
>>> data = pd.Series([5, 20, 3, 18, 1, 25, 2, 22, 0, 19])
>>> calculate_stability(data)
0.6144144729613819
Variance Change#
Calculates the variance change over time.
- interpreTS.core.features.feature_variance_change.calculate_change_in_variance(data, window_size=5)[source]#
Calculate the change in variance over time in a time series.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the change in variance is to be calculated.
window_size (int, optional) – The size of the rolling window to calculate variance (default is 5).
- Returns:
A series containing the change in variance over time, with the same index as the input.
- Return type:
pd.Series
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or is too short to calculate variance.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> calculate_change_in_variance(data, window_size=3)
0     NaN
1     NaN
2     NaN
3    0.00
4    0.00
dtype: float64
Variability in Sub-Periods#
Measures variability within sub-periods of the time series.
- interpreTS.core.features.feature_variability_in_sub_periods.calculate_variability_in_sub_periods(data, window_size, step_size=None, ddof=0)[source]#
Calculate the variance within sub-periods of a time series, providing a measure of variability.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the variability is to be calculated.
window_size (int) – The size of each sub-period window (number of points in each window).
step_size (int, optional) – The step size between sub-periods. If None, it defaults to window_size (non-overlapping windows).
ddof (int, optional) – The degrees of freedom to use when calculating variance within each sub-period. Default is 0 (population variance).
- Returns:
A series of variance values representing the variability in each sub-period.
- Return type:
pd.Series
- Raises:
TypeError – If the data is not a valid time series type.
ValueError – If the data contains NaN values or if window_size is larger than the data length.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
>>> calculate_variability_in_sub_periods(data, window_size=5)
0    2.5
1    2.5
dtype: float64
Amplitude Change Rate#
Calculates the rate of amplitude change in the time series.
- interpreTS.core.features.feature_amplitude_change_rate.calculate_amplitude_change_rate(data)[source]#
Calculate the average amplitude change rate in a time series, defined as the mean change in amplitude between consecutive local peaks and troughs.
- Parameters:
data (pd.Series or np.ndarray) – The time series data for which the amplitude change rate is to be calculated.
- Returns:
The average amplitude change rate. Returns NaN if no peaks/troughs are found.
- Return type:
float
- Raises:
TypeError – If the data is not numeric or not a valid type.
ValueError – If the data contains NaN values or is not one-dimensional.
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 3, 2, 4, 1, 5, 2, 6, 3])
>>> calculate_amplitude_change_rate(data)
2.0
>>> data = pd.Series([1, 1, 1, 1])
>>> calculate_amplitude_change_rate(data)
nan
Notes#
Each feature is designed to provide specific insights into time-series data. For detailed usage, refer to the module documentation linked above.