pypots.utils package#

pypots.utils.file#

Utilities for file and directory handling.

pypots.utils.file.extract_parent_dir(path)[source]#

Extract the given path’s parent directory.

Parameters:

path (str) – The path whose parent directory will be extracted.

Returns:

The path to the parent dir of the given path.

Return type:

parent_dir
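
Examples

A minimal usage sketch (the path below is hypothetical):

>>> from pypots.utils.file import extract_parent_dir
>>> parent_dir = extract_parent_dir("/tmp/pypots/saved_model.pypots")

parent_dir should be "/tmp/pypots" here, assuming the usual dirname-style behaviour.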

pypots.utils.file.create_dir_if_not_exist(path, is_dir=True)[source]#

Create the given directory if it doesn’t exist.

Parameters:
  • path (str) – The path to check.

  • is_dir (bool) – Whether the given path points to a directory. If is_dir is False, the path is assumed to point to a file or another object, and that file’s parent directory will be checked instead.

Return type:

None
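
Examples

A minimal usage sketch (both paths are hypothetical):

>>> from pypots.utils.file import create_dir_if_not_exist
>>> create_dir_if_not_exist("/tmp/pypots_runs")  # creates /tmp/pypots_runs if it does not exist
>>> create_dir_if_not_exist("/tmp/pypots_runs/metrics.csv", is_dir=False)  # ensures the parent dir exists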

pypots.utils.metrics#

pypots.utils.metrics.calc_mae(predictions, targets, masks=None)[source]#

Calculate the Mean Absolute Error between predictions and targets. masks can be used for filtering: positions where masks == 0 are excluded from the calculation.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor, None]) – The masks for filtering specific values in predictions and targets out of the evaluation. When given, only values at positions where masks == 1 are used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.utils.metrics import calc_mae
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mae = calc_mae(predictions, targets)

mae = 0.6 here, the error is from the 3rd and 5th elements and is |3-1|+|5-6|=3, so the result is 3/5=0.6.

If we want to exclude some values from the MAE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mae = calc_mae(predictions, targets, masks)

mae = 0.5 here, the first three elements are ignored, the error is from the 5th element and is |5-6|=1, so the result is 1/2=0.5.

pypots.utils.metrics.calc_mse(predictions, targets, masks=None)[source]#

Calculate the Mean Square Error between predictions and targets. masks can be used for filtering: positions where masks == 0 are excluded from the calculation.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor, None]) – The masks for filtering specific values in predictions and targets out of the evaluation. When given, only values at positions where masks == 1 are used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.utils.metrics import calc_mse
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mse = calc_mse(predictions, targets)

mse = 1 here, the error is from the 3rd and 5th elements and is |3-1|^2+|5-6|^2=5, so the result is 5/5=1.

If we want to exclude some values from the MSE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mse = calc_mse(predictions, targets, masks)

mse = 0.5 here, the first three elements are ignored, the error is from the 5th element and is |5-6|^2=1, so the result is 1/2=0.5.

pypots.utils.metrics.calc_rmse(predictions, targets, masks=None)[source]#

Calculate the Root Mean Square Error between predictions and targets. masks can be used for filtering: positions where masks == 0 are excluded from the calculation.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor, None]) – The masks for filtering specific values in predictions and targets out of the evaluation. When given, only values at positions where masks == 1 are used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.utils.metrics import calc_rmse
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> rmse = calc_rmse(predictions, targets)

rmse = 1 here, the error is from the 3rd and 5th elements and is |3-1|^2+|5-6|^2=5, so the result is \sqrt{5/5}=1.

If we want to exclude some values from the RMSE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> rmse = calc_rmse(predictions, targets, masks)

rmse ≈ 0.707 here, the first three elements are ignored, the error is from the 5th element and is |5-6|^2=1, so the result is \sqrt{1/2}≈0.707.

pypots.utils.metrics.calc_mre(predictions, targets, masks=None)[source]#

Calculate the Mean Relative Error between predictions and targets. masks can be used for filtering: positions where masks == 0 are excluded from the calculation.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor, None]) – The masks for filtering specific values in predictions and targets out of the evaluation. When given, only values at positions where masks == 1 are used for evaluation.

Return type:

Union[float, Tensor]

Examples

>>> import numpy as np
>>> from pypots.utils.metrics import calc_mre
>>> targets = np.array([1, 2, 3, 4, 5])
>>> predictions = np.array([1, 2, 1, 4, 6])
>>> mre = calc_mre(predictions, targets)

mre = 0.2 here, the error is from the 3rd and 5th elements and is |3-1|+|5-6|=3, so the result is 3/(1+2+3+4+5)=0.2.

If we want to exclude some values from the MRE calculation, e.g. the first three elements here, we can use masks to filter them out:

>>> masks = np.array([0, 0, 0, 1, 1])
>>> mre = calc_mre(predictions, targets, masks)

mre ≈ 0.111 here, the first three elements are ignored, the error is from the 5th element and is |5-6|=1, so the result is 1/(4+5)≈0.111.

pypots.utils.metrics.calc_quantile_crps(predictions, targets, masks, scaler_mean=0, scaler_stddev=1)[source]#

Calculate the continuous ranked probability score (CRPS) for distributional predictions.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor]) – The masks for filtering specific values in predictions and targets out of the evaluation. Only values at positions where masks == 1 are used for evaluation.

  • scaler_mean – Mean value of the scaler used to scale the data.

  • scaler_stddev – Standard deviation value of the scaler used to scale the data.

Returns:

Value of the continuous ranked probability score.

Return type:

CRPS
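
For reference, a common quantile-based approximation of CRPS (the exact aggregation used inside PyPOTS may differ slightly) averages the masked quantile (pinball) loss over a set of quantile levels \mathcal{Q} and normalizes by the magnitude of the targets:

\mathrm{CRPS} \approx \frac{1}{|\mathcal{Q}|} \sum_{q \in \mathcal{Q}} \frac{2 \sum_i m_i \, |(\hat{y}_i^{(q)} - y_i)(\mathbb{1}[y_i \le \hat{y}_i^{(q)}] - q)|}{\sum_i m_i \, |y_i|}

where y_i are the targets, \hat{y}_i^{(q)} is the q-th quantile of the predictions, m_i are the masks, and scaler_mean / scaler_stddev allow the score to be computed on the original data scale.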

pypots.utils.metrics.calc_quantile_crps_sum(predictions, targets, masks, scaler_mean=0, scaler_stddev=1)[source]#

Calculate the summed continuous ranked probability score (CRPS) for distributional predictions.

Parameters:
  • predictions (Union[ndarray, Tensor]) – The prediction data to be evaluated.

  • targets (Union[ndarray, Tensor]) – The target data for helping evaluate the predictions.

  • masks (Union[ndarray, Tensor]) – The masks for filtering specific values in predictions and targets out of the evaluation. Only values at positions where masks == 1 are used for evaluation.

  • scaler_mean – Mean value of the scaler used to scale the data.

  • scaler_stddev – Standard deviation value of the scaler used to scale the data.

Returns:

Summed value of the continuous ranked probability score.

Return type:

CRPS

pypots.utils.metrics.calc_binary_classification_metrics(prob_predictions, targets, pos_label=1)[source]#

Calculate the evaluation metrics for the binary classification task, including accuracy, precision, recall, F1 score, area under the ROC curve, and area under the Precision-Recall curve. If targets contains multiple categories, please specify the positive category with pos_label.

Parameters:
  • prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.

  • targets (ndarray) – Ground truth (correct) classification results.

  • pos_label (int) – The label of the positive class. Note that pos_label is also the index used to extract binary prediction probabilities from predictions.

Returns:

A dictionary containing classification metrics and useful results:

predictions: binary categories of the prediction results;

accuracy: prediction accuracy;

precision: prediction precision;

recall: prediction recall;

f1: F1-score;

precisions: precision values of Precision-Recall curve;

recalls: recall values of Precision-Recall curve;

pr_auc: area under Precision-Recall curve;

fprs: false positive rates of ROC curve;

tprs: true positive rates of ROC curve;

roc_auc: area under ROC curve.

Return type:

classification_metrics
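
Examples

A minimal usage sketch, assuming prob_predictions holds per-class probabilities of shape (n_samples, n_classes) so that column pos_label contains the positive-class probability (other input shapes may also be accepted):

>>> import numpy as np
>>> from pypots.utils.metrics import calc_binary_classification_metrics
>>> targets = np.array([0, 0, 1, 1])
>>> prob_predictions = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
>>> metrics = calc_binary_classification_metrics(prob_predictions, targets)

metrics["accuracy"], metrics["pr_auc"], and metrics["roc_auc"] can then be read directly from the returned dictionary.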

pypots.utils.metrics.calc_precision_recall_f1(prob_predictions, targets, pos_label=1)[source]#

Calculate precision, recall, and F1-score of model predictions.

Parameters:
  • prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.

  • targets (ndarray) – Ground truth (correct) classification results.

  • pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, float, float]

Returns:

  • precision – The precision value of model predictions.

  • recall – The recall value of model predictions.

  • f1 – The F1 score of model predictions.

pypots.utils.metrics.calc_pr_auc(prob_predictions, targets, pos_label=1)[source]#

Calculate precisions, recalls, and area under PR curve of model predictions.

Parameters:
  • prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.

  • targets (ndarray) – Ground truth (correct) classification results.

  • pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, ndarray, ndarray, ndarray]

Returns:

  • pr_auc – Value of area under Precision-Recall curve.

  • precisions – Precision values of Precision-Recall curve.

  • recalls – Recall values of Precision-Recall curve.

  • thresholds – Increasing thresholds on the decision function used to compute precision and recall.
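
Examples

A minimal usage sketch, assuming 1-D positive-class probabilities are accepted (the exact shape handling may differ):

>>> import numpy as np
>>> from pypots.utils.metrics import calc_pr_auc
>>> targets = np.array([0, 0, 1, 1])
>>> prob_predictions = np.array([0.1, 0.4, 0.35, 0.8])
>>> pr_auc, precisions, recalls, thresholds = calc_pr_auc(prob_predictions, targets)

The unpacking order follows the Returns list above; a higher pr_auc indicates a better ranking of the positive class.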

pypots.utils.metrics.calc_roc_auc(prob_predictions, targets, pos_label=1)[source]#

Calculate false positive rates, true positive rates, and the area under the ROC curve of model predictions.

Parameters:
  • prob_predictions (ndarray) – Estimated probability predictions returned by a decision function.

  • targets (ndarray) – Ground truth (correct) classification results.

  • pos_label (int, default=1) – The label of the positive class.

Return type:

Tuple[float, ndarray, ndarray, ndarray]

Returns:

  • roc_auc – The area under ROC curve.

  • fprs – False positive rates of ROC curve.

  • tprs – True positive rates of ROC curve.

  • thresholds – Increasing thresholds on the decision function used to compute FPR and TPR.

pypots.utils.metrics.calc_acc(class_predictions, targets)[source]#

Calculate accuracy score of model predictions.

Parameters:
  • class_predictions (ndarray) – Estimated classification predictions returned by a classifier.

  • targets (ndarray) – Ground truth (correct) classification results.

Returns:

The accuracy of model predictions.

Return type:

acc_score
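
Examples

A minimal usage sketch:

>>> import numpy as np
>>> from pypots.utils.metrics import calc_acc
>>> targets = np.array([0, 1, 1, 0])
>>> class_predictions = np.array([0, 1, 0, 0])
>>> acc = calc_acc(class_predictions, targets)

acc should be 0.75 here, since 3 of the 4 predictions match the targets.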

pypots.utils.metrics.calc_rand_index(class_predictions, targets)[source]#

Calculate Rand Index, a measure of the similarity between two data clusterings. Refer to [33].

Parameters:
  • class_predictions (ndarray) – Clustering results returned by a clusterer.

  • targets (ndarray) – Ground truth (correct) clustering results.

Returns:

Rand index.

Return type:

RI
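
Examples

A minimal usage sketch; the expected value follows directly from the Rand Index definition (consistently treated sample pairs over all pairs):

>>> import numpy as np
>>> from pypots.utils.metrics import calc_rand_index
>>> targets = np.array([0, 0, 1, 1])
>>> class_predictions = np.array([0, 0, 1, 2])
>>> RI = calc_rand_index(class_predictions, targets)

RI should be 5/6 ≈ 0.833 here: of the 6 sample pairs, 5 are grouped consistently (together or apart) by both clusterings.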

pypots.utils.metrics.calc_adjusted_rand_index(class_predictions, targets)[source]#

Calculate adjusted Rand Index.

Parameters:
  • class_predictions (ndarray) – Clustering results returned by a clusterer.

  • targets (ndarray) – Ground truth (correct) clustering results.

Returns:

Adjusted Rand index.

Return type:

aRI

pypots.utils.metrics.calc_cluster_purity(class_predictions, targets)[source]#

Calculate cluster purity.

Parameters:
  • class_predictions (ndarray) – Clustering results returned by a clusterer.

  • targets (ndarray) – Ground truth (correct) clustering results.

Returns:

cluster purity.

Return type:

cluster_purity

Notes

This function is from the answer https://stackoverflow.com/a/51672699 on StackOverflow.
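
Examples

A minimal usage sketch; the expected value follows directly from the purity definition (each predicted cluster is credited with its most frequent ground-truth label):

>>> import numpy as np
>>> from pypots.utils.metrics import calc_cluster_purity
>>> targets = np.array([0, 1, 1, 1])
>>> class_predictions = np.array([0, 0, 1, 1])
>>> purity = calc_cluster_purity(class_predictions, targets)

purity should be 0.75 here: predicted cluster 0 contributes 1 majority-label sample, cluster 1 contributes 2, so (1+2)/4 = 0.75.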

pypots.utils.metrics.calc_nmi(class_predictions, targets)[source]#

Calculate Normalized Mutual Information between two clusterings.

Parameters:
  • class_predictions (ndarray) – Clustering results returned by a clusterer.

  • targets (ndarray) – Ground truth (correct) clustering results.

Returns:

NMI – Normalized Mutual Information

Return type:

float
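
Examples

A minimal usage sketch; NMI reaches 1.0 when the two clusterings induce the same partition, regardless of the label names used:

>>> import numpy as np
>>> from pypots.utils.metrics import calc_nmi
>>> targets = np.array([0, 0, 1, 1])
>>> class_predictions = np.array([1, 1, 0, 0])
>>> NMI = calc_nmi(class_predictions, targets)

NMI should be 1.0 here because the predicted clusters match the ground-truth partition exactly.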

pypots.utils.metrics.calc_chs(X, predicted_labels)[source]#

Compute the Calinski and Harabasz score (also known as the Variance Ratio Criterion).

Parameters:
  • X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.

  • predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

calinski_harabasz_score – The resulting Calinski-Harabasz score. In short, the higher, the better.

Return type:

float
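
Examples

A minimal usage sketch with two well-separated clusters (any feature array with at least two predicted clusters works):

>>> import numpy as np
>>> from pypots.utils.metrics import calc_chs
>>> X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
>>> predicted_labels = np.array([0, 0, 1, 1])
>>> score = calc_chs(X, predicted_labels)

The better separated and more compact the clusters, the larger the score.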

pypots.utils.metrics.calc_dbs(X, predicted_labels)[source]#

Compute the Davies-Bouldin score.

Parameters:
  • X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.

  • predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

davies_bouldin_score – The resulting Davies-Bouldin score. In short, the lower, the better.

Return type:

float

pypots.utils.metrics.calc_silhouette(X, predicted_labels)[source]#

Compute the mean Silhouette Coefficient of all samples.

Parameters:
  • X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.

  • predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

silhouette_score – Mean Silhouette Coefficient for all samples. In short, the higher, the better.

Return type:

float

pypots.utils.metrics.calc_internal_cluster_validation_metrics(X, predicted_labels)[source]#

Compute all internal cluster validation metrics available in PyPOTS and return them as a dictionary.

Parameters:
  • X (array-like of shape (n_samples_a, n_features)) – A feature array, or learned latent representation, that can be used for clustering.

  • predicted_labels (array-like of shape (n_samples)) – Predicted labels for each sample.

Returns:

internal_cluster_validation_metrics – A dictionary containing all internal cluster validation metrics available in PyPOTS.

Return type:

dict
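
Examples

A minimal usage sketch; the returned dictionary is expected to bundle the internal metrics listed above (Calinski-Harabasz, Davies-Bouldin, and Silhouette), though the exact keys depend on the PyPOTS version:

>>> import numpy as np
>>> from pypots.utils.metrics import calc_internal_cluster_validation_metrics
>>> X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
>>> predicted_labels = np.array([0, 0, 1, 1])
>>> internal_metrics = calc_internal_cluster_validation_metrics(X, predicted_labels)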

pypots.utils.metrics.calc_external_cluster_validation_metrics(class_predictions, targets)[source]#

Compute all external cluster validation metrics available in PyPOTS and return them as a dictionary.

Parameters:
  • class_predictions (ndarray) – Clustering results returned by a clusterer.

  • targets (ndarray) – Ground truth (correct) clustering results.

Returns:

external_cluster_validation_metrics – A dictionary containing all external cluster validation metrics available in PyPOTS.

Return type:

dict

pypots.utils.random#

PyPOTS utility module for random seed setting.

pypots.utils.random.set_random_seed(random_seed=2204)[source]#

Manually set the random state to make PyPOTS output reproducible results.

Parameters:

random_seed (int) – The seed to be set for generating random numbers in PyPOTS.

Return type:

None
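
Examples

A minimal usage sketch; call this once before building and training a model to make results reproducible:

>>> from pypots.utils.random import set_random_seed
>>> set_random_seed(2204)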

pypots.utils.random.get_random_seed()[source]#

Get the random seed used in PyPOTS.

Returns:

The random seed used in PyPOTS.

Return type:

random_seed
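
Examples

A minimal usage sketch:

>>> from pypots.utils.random import get_random_seed
>>> current_seed = get_random_seed()

current_seed holds the seed currently used by PyPOTS (e.g. the value previously passed to set_random_seed()).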