pypots.utils

pypots.utils.file

Utilities for checking things.

pypots.utils.file.extract_parent_dir(path)

Extract the given path’s parent directory.

Parameters:: path (str) – The path whose parent directory is to be extracted.
Returns:: The path to the parent dir of the given path.
Return type:: parent_dir
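
As a rough illustration of the behaviour described above, the sketch below resolves a path's parent directory with the standard library. The helper name `parent_dir_sketch` is hypothetical; this is not the PyPOTS implementation, which may normalise paths differently.

```python
import os


def parent_dir_sketch(path: str) -> str:
    """Hypothetical stand-in: return the absolute path of `path`'s parent."""
    # Joining with ".." and normalising walks one level up from `path`.
    return os.path.abspath(os.path.join(path, ".."))


# Made-up example path; nothing is read from or written to disk.
example = os.path.join(os.sep, "tmp", "pypots", "model.pypots")
print(parent_dir_sketch(example))
```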

pypots.utils.file.create_dir_if_not_exist(path, is_dir=True)

Create the given directory if it doesn’t exist.

Parameters:

path (str) – The path to check.
is_dir (bool) – Whether the given path points to a directory. If is_dir is False, the given path points to a file or an object, and that file’s parent directory is checked instead.

Return type:

None
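
The `is_dir` semantics described above can be sketched with `os.makedirs`. The helper `ensure_dir_sketch` and the file names are hypothetical and only mirror the documented behaviour; the actual PyPOTS function may log or handle errors differently.

```python
import os
import tempfile


def ensure_dir_sketch(path: str, is_dir: bool = True) -> None:
    """Hypothetical stand-in mirroring the documented `is_dir` semantics."""
    # When `path` points at a file, it is the parent directory that must exist.
    target = path if is_dir else os.path.dirname(os.path.abspath(path))
    os.makedirs(target, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    file_path = os.path.join(root, "runs", "exp1", "model.pypots")
    ensure_dir_sketch(file_path, is_dir=False)
    # The parent directories now exist, but the file itself is not created.
    parent_created = os.path.isdir(os.path.join(root, "runs", "exp1"))
    file_created = os.path.exists(file_path)
```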

pypots.utils.file.get_class_full_path(cls)

Get the full path of the given class.

Parameters:: cls – The class whose full path is to be returned.
Returns:: The full path of the given class.
Return type:: path
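
One plausible reading of "full path" is the dotted import path of the class. The sketch below builds that string from standard class attributes; `class_full_path_sketch` is a hypothetical name, and whether PyPOTS formats the result exactly this way is an assumption.

```python
from collections import OrderedDict


def class_full_path_sketch(cls: type) -> str:
    """Hypothetical stand-in: build 'module.ClassName' for a class."""
    # `__module__` is where the class was defined; `__qualname__` is its name,
    # including any enclosing classes.
    return f"{cls.__module__}.{cls.__qualname__}"


full_path = class_full_path_sketch(OrderedDict)
print(full_path)
```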

pypots.nn.functional

pypots.nn.functional.autocast(**kwargs)

pypots.nn.functional.gather_listed_dicts(dict_list)

Gather the batched dict outputs from the model’s forward pass.

Parameters:: dict_list (list) – A list of dict output from model forward. Each dict should have the same keys.
Returns:: A dict with the same keys as the input dict, but with values concatenated along the batch dimension.
Return type:: gathered_dict
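
The gathering step can be pictured as a key-by-key concatenation. The sketch below uses plain Python lists in place of tensors; `gather_listed_dicts_sketch` and the keys `imputation` and `score` are made up for illustration and are not taken from PyPOTS.

```python
from typing import Dict, List


def gather_listed_dicts_sketch(dict_list: List[Dict[str, list]]) -> Dict[str, list]:
    """Hypothetical stand-in: concatenate per-batch values key by key."""
    gathered: Dict[str, list] = {key: [] for key in dict_list[0]}
    for batch_output in dict_list:
        for key in gathered:
            # Each value holds one entry per sample, so extending the list
            # concatenates along the batch (first) dimension.
            gathered[key].extend(batch_output[key])
    return gathered


# Two batches with 2 and 1 samples respectively.
batches = [
    {"imputation": [[0.1], [0.2]], "score": [0.5, 0.4]},
    {"imputation": [[0.3]], "score": [0.3]},
]
gathered = gather_listed_dicts_sketch(batches)
```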

pypots.nn.functional.nonstationary_norm(X, missing_mask=None)

Normalization from Non-stationary Transformer. Please refer to [30] for more details.

Parameters:

X (torch.Tensor) – Input data to be normalized. Shape: (n_samples, n_steps (seq_len), n_features).
missing_mask (torch.Tensor, optional) – Missing mask has the same shape as X. 1 indicates observed and 0 indicates missing.

Return type:

Tuple[Tensor, Tensor, Tensor]

Returns:

X_enc (torch.Tensor) – Normalized data. Shape: (n_samples, n_steps (seq_len), n_features).
means (torch.Tensor) – Mean values for de-normalization. Shape: (n_samples, n_features) or (n_samples, 1, n_features).
stdev (torch.Tensor) – Standard deviation values for de-normalization. Shape: (n_samples, n_features) or (n_samples, 1, n_features).

pypots.nn.functional.nonstationary_denorm(X, means, stdev)

De-Normalization from Non-stationary Transformer. Please refer to [30] for more details.

Parameters:

X (torch.Tensor) – Input data to be de-normalized. Shape: (n_samples, n_steps (seq_len), n_features).
means (torch.Tensor) – Mean values for de-normalization. Shape: (n_samples, n_features) or (n_samples, 1, n_features).
stdev (torch.Tensor) – Standard deviation values for de-normalization. Shape: (n_samples, n_features) or (n_samples, 1, n_features).

Returns:

X_denorm – De-normalized data. Shape: (n_samples, n_steps (seq_len), n_features).

Return type:

torch.Tensor
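
To make the normalize/de-normalize round trip concrete, here is a minimal pure-Python sketch for a single series (one sample, one feature) across its time steps. The helper names, the `EPS` constant, and the way missing positions are zeroed out are assumptions for illustration; the PyPOTS functions operate on batched tensors and may differ in these details.

```python
import math
from typing import List, Optional, Tuple

EPS = 1e-5  # assumed stabiliser; the value used by PyPOTS may differ


def norm_sketch(
    x: List[float], mask: Optional[List[int]] = None
) -> Tuple[List[float], float, float]:
    """Normalise one series over its time steps, using observed values only."""
    mask = mask if mask is not None else [1] * len(x)
    n_observed = max(sum(mask), 1)
    mean = sum(v * m for v, m in zip(x, mask)) / n_observed
    var = sum(m * (v - mean) ** 2 for v, m in zip(x, mask)) / n_observed
    stdev = math.sqrt(var + EPS)
    # Missing positions (mask == 0) are set to 0 so they carry no signal.
    x_enc = [m * (v - mean) / stdev for v, m in zip(x, mask)]
    return x_enc, mean, stdev


def denorm_sketch(x_enc: List[float], mean: float, stdev: float) -> List[float]:
    """Invert `norm_sketch` using the stored statistics."""
    return [v * stdev + mean for v in x_enc]


series = [2.0, 4.0, 6.0, 8.0]
x_enc, mean, stdev = norm_sketch(series)
restored = denorm_sketch(x_enc, mean, stdev)
```

Returning the statistics alongside the normalized data is what lets a model work in the normalized space and map its output back to the original scale afterwards.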

pypots.nn.functional.calc_mae(predictions, targets, masks=None)

Calculate the Mean Absolute Error between predictions and targets. masks can be used for filtering. For values==0 in masks, values at their corresponding positions in predictions will be ignored.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor, None]) – The masks for filtering the specific values in inputs and target from evaluation. When given, only values at corresponding positions where values ==1 in masks will be used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.nn.functional import calc_mae
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mae = calc_mae(predictions, targets)

mae = 0.6 here, the error is from the 3rd and 5th elements and is $|3-1|+|5-6|=3$ , so the result is 3/5=0.6.

If we want to exclude some values from the MAE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mae = calc_mae(predictions, targets, masks)

mae = 0.5 here, the first three elements are ignored, the error is from the 5th element and is $|5-6|=1$ , so the result is 1/2=0.5.

pypots.nn.functional.calc_mse(predictions, targets, masks=None)

Calculate the Mean Square Error between predictions and targets. masks can be used for filtering. For values==0 in masks, values at their corresponding positions in predictions will be ignored.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor, None]) – The masks for filtering the specific values in inputs and target from evaluation. When given, only values at corresponding positions where values ==1 in masks will be used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.nn.functional import calc_mse
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mse = calc_mse(predictions, targets)

mse = 1 here, the error is from the 3rd and 5th elements and is $|3-1|^2+|5-6|^2=5$ , so the result is 5/5=1.

If we want to exclude some values from the MSE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mse = calc_mse(predictions, targets, masks)

mse = 0.5 here, the first three elements are ignored, the error is from the 5th element and is $|5-6|^2=1$ , so the result is 1/2=0.5.

pypots.nn.functional.calc_rmse(predictions, targets, masks=None)

Calculate the Root Mean Square Error between predictions and targets. masks can be used for filtering. For values==0 in masks, values at their corresponding positions in predictions will be ignored.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor, None]) – The masks for filtering the specific values in inputs and target from evaluation. When given, only values at corresponding positions where values ==1 in masks will be used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.nn.functional import calc_rmse
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> rmse = calc_rmse(predictions, targets)

rmse = 1 here, the error is from the 3rd and 5th elements and is $|3-1|^2+|5-6|^2=5$ , so the result is $\sqrt{5/5}=1$ .

If we want to exclude some values from the RMSE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> rmse = calc_rmse(predictions, targets, masks)

rmse = 0.707 here, the first three elements are ignored, the error is from the 5th element and is $|5-6|^2=1$ , so the result is $\sqrt{1/2}\approx 0.707$ .

pypots.nn.functional.calc_mre(predictions, targets, masks=None)

Calculate the Mean Relative Error between predictions and targets. masks can be used for filtering. For values==0 in masks, values at their corresponding positions in predictions will be ignored.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor, None]) – The masks for filtering the specific values in inputs and target from evaluation. When given, only values at corresponding positions where values ==1 in masks will be used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.nn.functional import calc_mre
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mre = calc_mre(predictions, targets)

mre = 0.2 here, the error is from the 3rd and 5th elements and is $|3-1|+|5-6|=3$ , so the result is $3/(1+2+3+4+5)=0.2$ .

If we want to exclude some values from the MRE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mre = calc_mre(predictions, targets, masks)

mre = 0.111 here, the first three elements are ignored, the error is from the 5th element and is $|5-6|=1$ , so the result is $1/(4+5)\approx 0.111$ .
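
The arithmetic behind the MAE, MSE, RMSE, and MRE examples can be checked with plain Python. The function `masked_errors_sketch` below is a hypothetical helper that follows the definitions used in those examples; it is not the PyPOTS implementation, which works on arrays and tensors and may add a small constant to the denominators.

```python
import math
from typing import Dict, List, Optional


def masked_errors_sketch(
    predictions: List[float],
    targets: List[float],
    masks: Optional[List[int]] = None,
) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and MRE over positions where the mask is 1."""
    masks = masks if masks is not None else [1] * len(targets)
    rows = list(zip(predictions, targets, masks))
    abs_err = sum(abs(p - t) * m for p, t, m in rows)
    sq_err = sum((p - t) ** 2 * m for p, t, m in rows)
    n_used = sum(masks)  # number of evaluated positions
    target_mass = sum(abs(t) * m for _, t, m in rows)  # MRE denominator
    mse = sq_err / n_used
    return {
        "mae": abs_err / n_used,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mre": abs_err / target_mass,
    }


targets = [1, 2, 3, 4, 5]
predictions = [1, 2, 1, 4, 6]
unmasked = masked_errors_sketch(predictions, targets)
masked = masked_errors_sketch(predictions, targets, [0, 0, 0, 1, 1])
```

Note that MRE divides the summed absolute error by the summed absolute target values rather than by the number of evaluated positions, which is why its masked value (1/9) differs from the masked MAE (1/2).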

pypots.nn.functional.calc_quantile_crps(predictions, targets, masks, scaler_mean=0, scaler_stddev=1)

Continuous ranked probability score (CRPS) for distributional predictions.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor]) – The masks for filtering the specific values in inputs and target from evaluation. Only values at corresponding positions where values ==1 in masks will be used for evaluation.
scaler_mean – Mean value of the scaler used to scale the data.
scaler_stddev – Standard deviation value of the scaler used to scale the data.

Returns:

Value of the continuous ranked probability score.

Return type:

CRPS

pypots.nn.functional.calc_quantile_crps_sum(predictions, targets, masks, scaler_mean=0, scaler_stddev=1)

Sum continuous ranked probability score (CRPS-sum) for distributional predictions.

Parameters:

predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.
targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.
masks (Union[ndarray, Tensor]) – The masks for filtering the specific values in inputs and target from evaluation. Only values at corresponding positions where values ==1 in masks will be used for evaluation.
scaler_mean – Mean value of the scaler used to scale the data.
scaler_stddev – Standard deviation value of the scaler used to scale the data.

Returns:

Sum value of the continuous ranked probability score.

Return type:

CRPS

pypots.nn.functional.calc_binary_classification_metrics(prob_predictions, targets, pos_label=1)

Calculate the evaluation metrics for the binary classification task, including accuracy, precision, recall, f1 score, area under ROC curve, and area under Precision-Recall curve. If targets contains multiple categories, please set the positive category as pos_label.

Parameters:

prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.
targets (ndarray) – Ground truth (correct) classification results.
pos_label (int) – The label of the positive class. Note that pos_label is also the index used to extract binary prediction probabilities from predictions.

Returns:

A dictionary containing classification metrics and useful results:

predictions: binary categories of the prediction results;

accuracy: prediction accuracy;

precision: prediction precision;

recall: prediction recall;

f1: F1-score;

precisions: precision values of Precision-Recall curve

recalls: recall values of Precision-Recall curve

pr_auc: area under Precision-Recall curve

fprs: false positive rates of ROC curve

tprs: true positive rates of ROC curve

roc_auc: area under ROC curve

Return type:

classification_metrics

pypots.nn.functional.calc_precision_recall_f1(prob_predictions, targets, pos_label=1)

Calculate precision, recall, and F1-score of model predictions.

Parameters:

prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.
targets (ndarray) – Ground truth (correct) classification results.
pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, float, float]

Returns:

precision – The precision value of model predictions.
recall – The recall value of model predictions.
f1 – The F1 score of model predictions.
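
How the three values relate can be seen in a small pure-Python sketch. It assumes `prob_predictions` is a flat list of positive-class probabilities and that a fixed threshold of 0.5 turns them into class decisions; both are assumptions for illustration, and `precision_recall_f1_sketch` is a hypothetical name, not the PyPOTS function.

```python
from typing import List, Tuple


def precision_recall_f1_sketch(
    prob_predictions: List[float],
    targets: List[int],
    pos_label: int = 1,
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[float, float, float]:
    """Threshold probabilities, then count true/false positives and negatives."""
    predicted_pos = [p >= threshold for p in prob_predictions]
    actual_pos = [t == pos_label for t in targets]
    pairs = list(zip(predicted_pos, actual_pos))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


probs = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1_sketch(probs, labels)
```

With these made-up inputs there are two true positives, one false positive, and one false negative, so all three values equal 2/3.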

pypots.nn.functional.calc_pr_auc(prob_predictions, targets, pos_label=1)

Calculate precisions, recalls, and area under PR curve of model predictions.

Parameters:

prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.
targets (ndarray) – Ground truth (correct) classification results.
pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, ndarray, ndarray, ndarray]

Returns:

pr_auc – Value of area under Precision-Recall curve.
precisions – Precision values of Precision-Recall curve.
recalls – Recall values of Precision-Recall curve.
thresholds – Increasing thresholds on the decision function used to compute precision and recall.

pypots.nn.functional.calc_roc_auc(prob_predictions, targets, pos_label=1)

Calculate false positive rates, true positive rates, and area under ROC curve of model predictions.

Parameters:

prob_predictions (ndarray) – Estimated probabilities/predictions returned by a decision function.
targets (ndarray) – Ground truth (correct) classification results.
pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, ndarray, ndarray, ndarray]

Returns:

roc_auc – The area under ROC curve.
fprs – False positive rates of ROC curve.
tprs – True positive rates of ROC curve.
thresholds – Increasing thresholds on the decision function used to compute FPR and TPR.
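
The area under the ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes the area that way instead of sweeping thresholds; `roc_auc_sketch` is a hypothetical helper that assumes flat positive-class probabilities, and it does not return the `fprs`, `tprs`, or `thresholds` that the documented function provides.

```python
from typing import List


def roc_auc_sketch(
    prob_predictions: List[float], targets: List[int], pos_label: int = 1
) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for p, t in zip(prob_predictions, targets) if t == pos_label]
    neg = [p for p, t in zip(prob_predictions, targets) if t != pos_label]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


probs = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
auc = roc_auc_sketch(probs, labels)
```

In this made-up example, four of the six positive/negative pairs are ordered correctly, so the area is 4/6.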

pypots.nn.functional.calc_acc(class_predictions, targets)

Calculate accuracy score of model predictions.

Parameters:

class_predictions (ndarray) – Estimated classification predictions returned by a classifier.
targets (ndarray) – Ground truth (correct) classification results.

Returns:

The accuracy of model predictions.

Return type:

acc_score

pypots.nn.functional.calc_rand_index(class_predictions, targets)

Calculate Rand Index, a measure of the similarity between two data clusterings. Refer to [54].

Parameters:

class_predictions (ndarray) – Clustering results returned by a clusterer.
targets (ndarray) – Ground truth (correct) clustering results.

Returns:

Rand index.

Return type:

References
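
The Rand index is the fraction of sample pairs on which two clusterings agree, that is, pairs placed together in both or apart in both. The pair-counting sketch below illustrates that definition; `rand_index_sketch` is a hypothetical helper written for clarity, not the PyPOTS implementation.

```python
from itertools import combinations
from typing import List


def rand_index_sketch(class_predictions: List[int], targets: List[int]) -> float:
    """Fraction of sample pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(targets)), 2))
    agreements = 0
    for i, j in pairs:
        same_in_pred = class_predictions[i] == class_predictions[j]
        same_in_target = targets[i] == targets[j]
        # A pair agrees when it is together in both or apart in both.
        if same_in_pred == same_in_target:
            agreements += 1
    return agreements / len(pairs)


# Cluster ids are arbitrary, so a pure relabelling still scores 1.0.
relabelled = rand_index_sketch([0, 0, 1, 1], [1, 1, 0, 0])
partial = rand_index_sketch([0, 0, 1, 1], [0, 1, 1, 1])
```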

pypots.nn.functional.calc_adjusted_rand_index(class_predictions, targets)

Calculate adjusted Rand Index.

Parameters:

class_predictions (ndarray) – Clustering results returned by a clusterer.
targets (ndarray) – Ground truth (correct) clustering results.

Returns:

Adjusted Rand index.

Return type:

aRI

pypots.nn.functional.calc_cluster_purity(class_predictions, targets)

Calculate cluster purity.

Parameters:

class_predictions (ndarray) – Clustering results returned by a clusterer.
targets (ndarray) – Ground truth (correct) clustering results.

Returns:

Cluster purity.

Return type:

cluster_purity

Notes

This function is from the answer https://stackoverflow.com/a/51672699 on StackOverflow.
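
Purity assigns each predicted cluster to its most frequent ground-truth class and reports the fraction of samples covered by those majority classes. The following is a minimal pure-Python sketch of that standard definition; `cluster_purity_sketch` is a hypothetical name and the code is not copied from PyPOTS or the linked answer.

```python
from collections import Counter, defaultdict
from typing import List


def cluster_purity_sketch(class_predictions: List[int], targets: List[int]) -> float:
    """Sum the majority-class count of every cluster, divided by sample count."""
    members = defaultdict(list)
    for cluster, label in zip(class_predictions, targets):
        members[cluster].append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(targets)


# Each cluster holds three samples, two of which share a class.
purity = cluster_purity_sketch([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0])
```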

pypots.nn.functional.calc_nmi(class_predictions, targets)

Calculate Normalized Mutual Information between two clusterings.

Parameters:

class_predictions (ndarray) – Clustering results returned by a clusterer.
targets (ndarray) – Ground truth (correct) clustering results.

Returns:

NMI – Normalized Mutual Information

Return type:

float
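
For intuition, NMI divides the mutual information between the two labelings by a normalizer built from their entropies. The sketch below uses the arithmetic mean of the two entropies as the normalizer; that choice, like the helper name `nmi_sketch`, is an assumption, and the PyPOTS function may normalize differently.

```python
import math
from collections import Counter
from typing import List


def nmi_sketch(class_predictions: List[int], targets: List[int]) -> float:
    """Mutual information divided by the mean of the two label entropies."""
    n = len(targets)
    joint = Counter(zip(class_predictions, targets))
    pred_counts = Counter(class_predictions)
    target_counts = Counter(targets)
    mutual_info = sum(
        (c / n) * math.log(c * n / (pred_counts[a] * target_counts[b]))
        for (a, b), c in joint.items()
    )
    h_pred = -sum((c / n) * math.log(c / n) for c in pred_counts.values())
    h_target = -sum((c / n) * math.log(c / n) for c in target_counts.values())
    normaliser = (h_pred + h_target) / 2  # assumed: arithmetic-mean normalisation
    # Two single-cluster labelings carry no information; treat them as identical.
    return mutual_info / normaliser if normaliser > 0 else 1.0


identical_up_to_relabelling = nmi_sketch([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi_sketch([0, 1, 0, 1], [0, 0, 1, 1])
```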

pypots.nn.functional.calc_chs(X, predicted_labels)

Compute the Calinski and Harabasz score (also known as the Variance Ratio Criterion).

Parameters:

X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.
predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

calinski_harabasz_score – The resulting Calinski-Harabasz score. In short, the higher, the better.

Return type:

float

pypots.nn.functional.calc_dbs(X, predicted_labels)

Compute the Davies-Bouldin score.

Parameters:

X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.
predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

davies_bouldin_score – The resulting Davies-Bouldin score. In short, the lower, the better.

Return type:

float

pypots.nn.functional.calc_silhouette(X, predicted_labels)

Compute the mean Silhouette Coefficient of all samples.

Parameters:

X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.
predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

silhouette_score – Mean Silhouette Coefficient for all samples. In short, the higher, the better.

Return type:

float

pypots.nn.functional.calc_internal_cluster_validation_metrics(X, predicted_labels)

Compute all internal cluster validation metrics available in PyPOTS and return them as a dictionary.

Parameters:

X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.
predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

internal_cluster_validation_metrics – A dictionary containing all internal cluster validation metrics available in PyPOTS.

Return type:

dict

pypots.nn.functional.calc_external_cluster_validation_metrics(class_predictions, targets)

Compute all external cluster validation metrics available in PyPOTS and return them as a dictionary.

Parameters:

class_predictions (ndarray) – Clustering results returned by a clusterer.
targets (ndarray) – Ground truth (correct) clustering results.

Returns:

external_cluster_validation_metrics – A dictionary containing all external cluster validation metrics available in PyPOTS.

Return type:

dict

pypots.utils.random

PyPOTS util module about random seed setting.

pypots.utils.random.set_random_seed(random_seed=2022)

Manually set the random state to make PyPOTS output reproducible results.

Parameters:: random_seed (int) – The seed to be set for generating random numbers in PyPOTS.
Return type:: None
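
The effect of fixing a seed can be demonstrated with Python's built-in generator alone. The helper `set_seed_sketch` is hypothetical and seeds only the `random` module; which generators the PyPOTS function seeds is not stated here, so treating this as equivalent would be an assumption.

```python
import random


def set_seed_sketch(seed: int = 2022) -> None:
    """Hypothetical stand-in: seed Python's built-in generator only."""
    random.seed(seed)


set_seed_sketch(2022)
first_draw = [random.random() for _ in range(3)]

# Re-seeding with the same value replays the same sequence.
set_seed_sketch(2022)
second_draw = [random.random() for _ in range(3)]
```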

pypots.utils.random.get_random_seed()

Get the random seed used in PyPOTS.

Returns:: The random seed used in PyPOTS.
Return type:: random_seed