interpreTS.core package#
Subpackages#
- interpreTS.core.features package
- Submodules
- interpreTS.core.features.feature_length module
- interpreTS.core.features.feature_mean module
- interpreTS.core.features.feature_peak module
- interpreTS.core.features.feature_spikeness module
- interpreTS.core.features.feature_std_1st_der module
- interpreTS.core.features.feature_trough module
- interpreTS.core.features.feature_variance module
- interpreTS.core.features.histogram_dominant module
- interpreTS.core.features.seasonality_strength module
- interpreTS.core.features.trend_strength module
- Module contents
Submodules#
interpreTS.core.feature_extractor module#
- class interpreTS.core.feature_extractor.FeatureExtractor(features=None, feature_params=None, window_size=nan, stride=1, id_column=None, sort_column=None, feature_column=None, group_by=None)[source]#
Bases:
object
- CAN_USE_NAN = ['missing_points', 'peak', 'spikeness', 'trough', 'seasonality_strength']#
- DEFAULT_FEATURES_BIG = ['absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'change_in_variance']#
- DEFAULT_FEATURES_SMALL = ['length', 'mean', 'variance', 'stability', 'entropy', 'spikeness', 'seasonality_strength']#
- FEATURES_ALL = ['above_9th_decile', 'below_1st_decile', 'absolute_energy', 'binarize_mean', 'change_in_variance', 'crossing_points', 'distance_to_last_trend_change', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'outliers_iqr', 'outliers_std', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'mean_change', 'seasonality_strength', 'trend_strength', 'variability_in_sub_periods', 'change_in_variance']#
- FOR_ML = ['absolute_energy', 'binarize_mean', 'dominant', 'entropy', 'flat_spots', 'heterogeneity', 'linearity', 'length', 'mean', 'missing_points', 'peak', 'significant_changes', 'spikeness', 'stability', 'std_1st_der', 'trough', 'variance', 'seasonality_strength', 'trend_strength']#
- add_custom_feature(name, function, metadata=None, params=None)[source]#
Add a custom feature to the FeatureExtractor with optional parameters.
- Parameters:
name (str) – The name of the custom feature.
function (callable) – A function that computes the feature. It should accept a Pandas Series and optional parameters as input.
metadata (dict, optional) – A dictionary containing metadata about the feature, such as its interpretability level and description. - level (str): Interpretability level (‘easy’, ‘moderate’, ‘advanced’). - description (str): Description of the feature. Example: {'level': 'easy', 'description': 'Description of the feature.'}
params (dict, optional) – A dictionary of parameters to be passed to the feature function when it is executed.
- Raises:
ValueError – If the feature name already exists or the function is not callable.
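As a rough sketch of what such a custom feature looks like, the snippet below defines a callable that accepts a pandas Series plus an optional keyword parameter, as the docstring requires. The function name `signal_range`, the `scale` parameter, and the commented-out registration call are illustrative assumptions, not part of the library.

```python
import pandas as pd

def signal_range(series, scale=1.0):
    """Hypothetical custom feature: scaled peak-to-trough range of a window."""
    return scale * (series.max() - series.min())

# Registration would look roughly like this (assuming an existing
# FeatureExtractor instance named `extractor`):
# extractor.add_custom_feature(
#     name="signal_range",
#     function=signal_range,
#     metadata={"level": "easy", "description": "Scaled peak-to-trough range."},
#     params={"scale": 2.0},
# )

window = pd.Series([3.0, 7.0, 5.0, 1.0])
result = signal_range(window, scale=2.0)  # (7 - 1) * 2 = 12.0
```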
- extract_features(data, progress_callback=None, mode='sequential', n_jobs=-1)[source]#
Extract features from a time series dataset.
- Parameters:
data (pd.DataFrame or pd.Series) – The time series data for which features are to be extracted.
progress_callback (function, optional) – A function to report progress, which takes a single argument: progress percentage (0-100).
mode (str, optional) – The mode of processing. Can be ‘parallel’ for multi-threaded processing or ‘sequential’ for single-threaded processing with real-time progress reporting. Default is ‘sequential’.
n_jobs (int, optional) – The number of jobs (processes) to run in parallel. Default is -1 (use all available CPUs).
- Returns:
A DataFrame containing calculated features for each window.
- Return type:
pd.DataFrame
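To illustrate the per-window output described above, here is a minimal, simplified re-implementation of the sliding-window idea (not the library's actual code): a fixed-size window moves over the series with a given stride, and each window produces one row of features. The helper name `extract_windowed_features` and the chosen features are assumptions for illustration.

```python
import pandas as pd

def extract_windowed_features(values, window_size=3, stride=1):
    """Simplified sketch of windowed feature extraction: slide a fixed-size
    window over the series with the given stride and compute one row of
    features per window."""
    rows = []
    for start in range(0, len(values) - window_size + 1, stride):
        window = values.iloc[start:start + window_size]
        rows.append({
            "length": len(window),
            "mean": window.mean(),
            "variance": window.var(),
        })
    return pd.DataFrame(rows)

series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
features = extract_windowed_features(series, window_size=3, stride=2)
# Two windows: [1, 2, 3] and [3, 4, 5] -> one feature row each.
```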
- extract_features_stream(data_stream, progress_callback=None)[source]#
Extract features from a stream of time series data.
- Parameters:
data_stream (iterable) – An iterable that yields incoming data points as dictionaries with keys corresponding to column names.
progress_callback (function, optional) – A function to report progress, which takes a single argument: the total number of processed points.
- Yields:
dict – A dictionary containing the calculated features for the current window.
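The streaming variant can be sketched as a generator that buffers incoming points and yields one feature dict per full window, matching the iterable-of-dicts input and dict-per-window output described above. The buffer policy, window size, and `feature_column` name here are illustrative assumptions about how such a stream might be consumed, not the library's implementation.

```python
from collections import deque
from statistics import fmean, pvariance

def stream_features(data_stream, window_size=3, feature_column="value"):
    """Sketch of streaming extraction: buffer incoming points and yield a
    feature dict once a full window has accumulated."""
    buffer = deque(maxlen=window_size)
    for point in data_stream:
        buffer.append(point[feature_column])
        if len(buffer) == window_size:
            yield {"mean": fmean(buffer), "variance": pvariance(buffer)}

points = [{"value": v} for v in [1.0, 2.0, 3.0, 4.0]]
results = list(stream_features(points, window_size=3))
# Windows [1, 2, 3] and [2, 3, 4] -> means 2.0 and 3.0.
```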
- group_data(data)[source]#
Group data based on the group_by column.
- Parameters:
data (pd.DataFrame) – Input data.
- Returns:
Grouped data.
- Return type:
iterable
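Conceptually, grouping lets each group be treated as an independent time series before feature extraction. A minimal sketch of this behavior with plain pandas (the column name "id" and the generator shape are assumptions for illustration) might look like:

```python
import pandas as pd

def iter_groups(data, group_by="id"):
    """Sketch of per-group iteration: yield (key, sub-frame) pairs so each
    group can be processed as an independent time series."""
    for key, frame in data.groupby(group_by):
        yield key, frame

df = pd.DataFrame({
    "id": ["a", "a", "b"],
    "value": [1.0, 2.0, 10.0],
})
groups = dict(iter_groups(df))  # {"a": 2-row frame, "b": 1-row frame}
```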
- group_features_by_interpretability()[source]#
Group features by their interpretability levels.
- Returns:
A dictionary where keys are interpretability levels (‘easy’, ‘moderate’, ‘advanced’), and values are lists of feature names.
- Return type:
dict
- head(features_df, n=5)[source]#
Returns the first n rows of the resulting DataFrame from the extract_features function.
- Parameters:
features_df (pd.DataFrame) – The resulting DataFrame from the extract_features function.
n (int, optional (default 5)) – The number of rows to return. If n is negative, returns all rows except the last |n| rows.
- Returns:
The first n rows of the DataFrame.
- Return type:
pd.DataFrame
interpreTS.core.time_series_data module#
- class interpreTS.core.time_series_data.TimeSeriesData(data)[source]#
Bases:
object
A class to manage and process time series data.
- resample(interval)[source]#
Resample the time series data to a specified interval.
- Parameters:
interval (str) – The interval to resample the data, e.g., ‘D’ for daily, ‘H’ for hourly.
- Returns:
A new TimeSeriesData object with resampled data.
- Return type:
TimeSeriesData
Examples
>>> data = pd.Series([1, 2, 3, 4, 5], index=pd.date_range("2023-01-01", periods=5, freq="D"))
>>> ts_data = TimeSeriesData(data)
>>> resampled_data = ts_data.resample("2D")
- split(train_size=0.7)[source]#
Split the time series data into training and test sets.
- Parameters:
train_size (float, optional) – The proportion of the data to use for training, by default 0.7.
- Returns:
A tuple containing the training and test sets as TimeSeriesData objects.
- Return type:
tuple of TimeSeriesData
Examples
>>> data = pd.Series([1, 2, 3, 4, 5])
>>> ts_data = TimeSeriesData(data)
>>> train, test = ts_data.split(0.6)