All APIs of BenchPOTS

BenchPOTS

benchpots.datasets

benchpots.datasets.preprocess_physionet2012(subset, rate, pattern='point', features=None, random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset PhysioNet2012.

Parameters:

subset (str) – The name of the subset dataset to be loaded. Must be one of [‘all’, ‘set-a’, ‘set-b’, ‘set-c’].
rate (float) – The missing rate.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
features (Optional[list]) – The features to be used in the dataset. If None, all features except the static features will be used.
random_state (Optional[int]) – Controls the randomness of the train/validation/test split. Pass an int for reproducible splits across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed PhysioNet2012 dataset.

Return type:

dict
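As a rough usage sketch, the documented constraints can be checked before the loader is called. The helper check_physionet2012_args and the wrapper load_physionet2012 are hypothetical and not part of BenchPOTS; the allowed values are copied from the parameter list above, and the requirement that rate lies in [0, 1) is an assumption. The keys of the returned dictionary are not documented here, so the sketch does not rely on any.

```python
SUBSETS = ("all", "set-a", "set-b", "set-c")
PATTERNS = ("point", "subseq", "block")
TASK_TYPES = (
    "imputation",
    "forecasting",
    "classification",
    "clustering",
    "anomaly_detection",
)


def check_physionet2012_args(subset, rate, pattern="point", task_type="imputation"):
    """Hypothetical helper: validate arguments against the documented values."""
    if subset not in SUBSETS:
        raise ValueError(f"subset must be one of {SUBSETS}, got {subset!r}")
    if pattern not in PATTERNS:
        raise ValueError(f"pattern must be one of {PATTERNS}, got {pattern!r}")
    if task_type not in TASK_TYPES:
        raise ValueError(f"task_type must be one of {TASK_TYPES}, got {task_type!r}")
    # Assumption: the missing rate is a fraction in [0, 1).
    if not 0 <= rate < 1:
        raise ValueError(f"rate must be in [0, 1), got {rate!r}")
    return {"subset": subset, "rate": rate, "pattern": pattern, "task_type": task_type}


def load_physionet2012(subset, rate, pattern="point", task_type="imputation", random_state=None):
    """Hypothetical wrapper: validate, then call the real loader.

    Requires benchpots to be installed; the import is deferred so the
    validation helper above can be used without it.
    """
    from benchpots.datasets import preprocess_physionet2012

    kwargs = check_physionet2012_args(subset, rate, pattern, task_type)
    return preprocess_physionet2012(random_state=random_state, **kwargs)
```

With benchpots installed, load_physionet2012("set-a", 0.1, random_state=42) would forward the validated arguments to preprocess_physionet2012.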

benchpots.datasets.preprocess_physionet2019(subset, rate, pattern='point', features=None, random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset PhysioNet2019.

Parameters:

subset (str) – The name of the subset dataset to be loaded. Must be one of [‘all’, ‘training_setA’, ‘training_setB’].
rate (float) – The missing rate.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
features (Optional[list]) – The features to be used in the dataset. If None, all features except the static features will be used.
random_state (Optional[int]) – Controls the randomness of the train/validation/test split. Pass an int for reproducible splits across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed PhysioNet2019 dataset.

Return type:

dict

benchpots.datasets.preprocess_beijing_air_quality(rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset Beijing Multi-site Air Quality.

Parameters:

rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed Beijing Multi-site Air Quality dataset.

Return type:

dict
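The role of n_steps as a sliding-window size can be illustrated with a small, library-free sketch. The function sliding_windows and its stride parameter are hypothetical; the documentation above only states that n_steps is the window size, so the default of non-overlapping windows and the dropping of a trailing incomplete window are assumptions.

```python
def sliding_windows(series, n_steps, stride=None):
    """Cut a sequence of time steps into windows of exactly n_steps steps.

    `stride` is a hypothetical parameter. It defaults to n_steps, which
    yields non-overlapping windows (an assumption, not documented behavior).
    A trailing remainder shorter than n_steps is dropped.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    stride = n_steps if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    last_start = len(series) - n_steps
    return [series[start:start + n_steps] for start in range(0, last_start + 1, stride)]


# Ten time steps cut into windows of four steps each.
windows = sliding_windows(list(range(10)), n_steps=4)
print(windows)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In a multivariate series each element of the sequence would itself be a row of feature values; the slicing logic stays the same.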

benchpots.datasets.preprocess_italy_air_quality(rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset Italy Air Quality.

Parameters:

rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed Italy Air Quality dataset.

Return type:

dict
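How rate, the 'point' pattern, and random_state interact can be sketched as follows. This is an illustration only: it assumes that 'point' means each value is dropped independently with probability rate and that missing values are marked with NaN. The functions point_missing_mask and apply_mask are hypothetical and do not reproduce the generator BenchPOTS uses internally.

```python
import math
import random


def point_missing_mask(n_steps, n_features, rate, random_state=None):
    """Return a mask of shape n_steps x n_features; True marks a value to drop.

    Assumption: each cell is dropped independently with probability `rate`.
    Passing an int as random_state makes the mask reproducible.
    """
    rng = random.Random(random_state)
    return [[rng.random() < rate for _ in range(n_features)] for _ in range(n_steps)]


def apply_mask(window, mask):
    """Replace masked values with NaN (assumed missing-value marker)."""
    return [
        [math.nan if drop else value for value, drop in zip(row, mask_row)]
        for row, mask_row in zip(window, mask)
    ]


# The same seed produces the same mask on every run.
mask_a = point_missing_mask(n_steps=24, n_features=5, rate=0.3, random_state=0)
mask_b = point_missing_mask(n_steps=24, n_features=5, rate=0.3, random_state=0)
print(mask_a == mask_b)  # True
```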

benchpots.datasets.preprocess_electricity_load_diagrams(rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset Electricity Load Diagrams.

Parameters:

rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed Electricity Load Diagrams dataset.

Return type:

dict
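After preprocessing, it can be useful to check how close the observed missing rate is to the requested rate. The sketch below assumes the data is available as nested lists (samples x time steps x features) with NaN marking missing values; that layout and the function observed_missing_rate are assumptions for illustration, not the documented output format. Values that were already missing in the raw data would also be counted.

```python
import math


def observed_missing_rate(samples):
    """Fraction of NaN entries in nested lists shaped samples x steps x features."""
    total = 0
    missing = 0
    for sample in samples:
        for row in sample:
            for value in row:
                total += 1
                if math.isnan(value):
                    missing += 1
    if total == 0:
        raise ValueError("samples contains no values")
    return missing / total


nan = math.nan
toy = [
    [[1.0, nan], [nan, 2.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
print(observed_missing_rate(toy))  # 0.25
```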

benchpots.datasets.preprocess_ett(subset, rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset ETT.

Parameters:

subset (str) – The name of the subset dataset to be loaded. Must be one of [‘ETTm1’, ‘ETTm2’, ‘ETTh1’, ‘ETTh2’].
rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed ETT dataset.

Return type:

dict
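The interplay of n_pred_steps and forecast_feature_indices can be sketched without the library. The sketch assumes that the last n_pred_steps time steps of a window serve as forecasting labels and the preceding steps as input; how BenchPOTS actually arranges forecasting labels is not specified above. Both functions are hypothetical.

```python
def normalize_feature_indices(forecast_feature_indices, n_features):
    """Turn an int, a sequence of ints, or None into a list of indices.

    None selects all features, as described for forecast_feature_indices.
    """
    if forecast_feature_indices is None:
        return list(range(n_features))
    if isinstance(forecast_feature_indices, int):
        return [forecast_feature_indices]
    return list(forecast_feature_indices)


def split_for_forecasting(window, n_pred_steps=1, forecast_feature_indices=None):
    """Split one window (a list of rows, one row per time step).

    Assumption: the final n_pred_steps rows are the labels and only the
    selected feature columns are kept in them.
    """
    if not 0 < n_pred_steps < len(window):
        raise ValueError("n_pred_steps must be positive and shorter than the window")
    indices = normalize_feature_indices(forecast_feature_indices, len(window[0]))
    history = window[:-n_pred_steps]
    future = [[row[i] for i in indices] for row in window[-n_pred_steps:]]
    return history, future


# Five time steps with two features; forecast the last two steps of feature 1.
window = [[t, 10 * t] for t in range(5)]
history, future = split_for_forecasting(window, n_pred_steps=2, forecast_feature_indices=1)
print(history)  # [[0, 0], [1, 10], [2, 20]]
print(future)   # [[30], [40]]
```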

benchpots.datasets.preprocess_pems_traffic(rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset PeMS traffic.

Parameters:

rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed PeMS traffic dataset.

Return type:

dict

benchpots.datasets.preprocess_ucr_uea_datasets(dataset_name, rate, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess a dataset from UCR&UEA.

Parameters:

dataset_name (str) – The name of the UCR_UEA dataset to be loaded. Must start with ‘ucr_uea_’. Use tsdb.list() to get all available datasets.
rate (float) – The missing rate.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of the train/validation split. Pass an int for reproducible splits across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed UCR&UEA dataset.

Return type:

dict
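Because dataset_name must start with 'ucr_uea_', a list of candidate names can be narrowed down by prefix. The helper select_ucr_uea_names is hypothetical, and the names in the example are illustrative placeholders rather than a verified listing.

```python
UCR_UEA_PREFIX = "ucr_uea_"


def select_ucr_uea_names(available_names):
    """Keep only names that carry the required 'ucr_uea_' prefix, sorted."""
    return sorted(name for name in available_names if name.startswith(UCR_UEA_PREFIX))


# Illustrative names only; in practice the list would come from tsdb.list().
example_names = ["ucr_uea_ECG200", "physionet_2012", "ucr_uea_Adiac"]
print(select_ucr_uea_names(example_names))  # ['ucr_uea_Adiac', 'ucr_uea_ECG200']
```

With tsdb installed, select_ucr_uea_names(tsdb.list()) would give the candidate values for dataset_name.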

benchpots.datasets.preprocess_solar_alabama(rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess the dataset Solar Alabama.

Parameters:

rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed Solar Alabama dataset.

Return type:

dict

benchpots.datasets.preprocess_random_walk(n_steps=24, n_features=10, n_classes=2, n_samples_each_class=1000, anomaly_rate=0, missing_rate=0.1, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Generate random-walk data.

Parameters:

n_steps (int, default=24) – Number of time steps in each sample.
n_features (int, default=10) – Number of features.
n_classes (int, default=2) – Number of classes (types) of the generated data.
n_samples_each_class (int, default=1000) – Number of samples for each class to generate.
anomaly_rate (float, default=0) – Proportion of anomaly samples among all samples. The default of 0 means no anomaly samples are generated.
missing_rate (float, default=0.1) – The rate of randomly missing values to generate. Must be in [0, 1).
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness for generated samples and train/validation/test splits. Pass an int for reproducible outputs across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

data – A dictionary containing the generated data.

Return type:

dict
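For intuition about what random-walk data looks like, here is a minimal generator using only the standard library. It assumes Gaussian unit-variance steps starting from zero and ignores classes, anomalies, and missingness, so it is not the generator used by preprocess_random_walk; random_walk_samples is a hypothetical name.

```python
import random


def random_walk_samples(n_samples, n_steps=24, n_features=10, random_state=None):
    """Generate n_samples random walks shaped n_steps x n_features.

    Assumption: every feature starts at 0 and moves by an independent
    standard-normal increment at each time step.
    """
    rng = random.Random(random_state)
    samples = []
    for _ in range(n_samples):
        position = [0.0] * n_features
        sample = []
        for _ in range(n_steps):
            # Build a new list each step so stored rows are never mutated.
            position = [p + rng.gauss(0.0, 1.0) for p in position]
            sample.append(position)
        samples.append(sample)
    return samples


walks = random_walk_samples(n_samples=3, n_steps=24, n_features=10, random_state=0)
print(len(walks), len(walks[0]), len(walks[0][0]))  # 3 24 10
```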

benchpots.datasets.preprocess_nl_benchmarks(dataset_name, rate, n_steps, pattern='point', random_state=None, task_type='imputation', n_pred_steps=1, forecast_feature_indices=None, **kwargs)

Load and preprocess a dataset from the nonlinear benchmarks.

Parameters:

dataset_name (str) – The name of the nonlinear benchmark dataset to be loaded. Must be one of [‘EMPS’, ‘CED’, ‘WienerHammerBenchMark’, ‘Silverbox’, ‘F16’, ‘ParWH’, ‘Cascaded_Tanks’].
rate (float) – The missing rate.
n_steps (int) – The number of time steps in each generated data sample. Also the size of the sliding window.
pattern (str) – The missing pattern to apply to the dataset. Must be one of [‘point’, ‘subseq’, ‘block’].
random_state (Optional[int]) – Controls the randomness of missingness generation. Pass an int for reproducible missingness masks across runs.
task_type (str) – Task type for postprocessing. Supported values are [‘imputation’, ‘forecasting’, ‘classification’, ‘clustering’, ‘anomaly_detection’].
n_pred_steps (int) – Forecasting horizon. Effective only when task_type is ‘forecasting’.
forecast_feature_indices (Union[int, Sequence[int], None]) – Target feature indices for forecasting labels. If None, all features are used.

Returns:

processed_dataset – A dictionary containing the processed nonlinear benchmark dataset.

Return type:

dict