Source code for pygrinder.missing_completely_at_random.mcar
"""
Corrupt data by adding missing values to it with MCAR (missing completely at random) pattern.
"""

# Created by Wenjie Du <wenjay.du@gmail.com>
# License: BSD-3-Clause

from typing import Union

import numpy as np
import torch


def _mcar_numpy(
    X: np.ndarray,
    p: float,
) -> np.ndarray:
    assert 0 < p < 1, f"p must be in range (0, 1), but got {p}"

    # copy X so that the caller's array is not modified
    X = np.copy(X)
    # draw one uniform sample per element; elements whose sample is below p get masked
    mcar_missing_mask = np.asarray(np.random.rand(np.prod(X.shape)) < p)
    mcar_missing_mask = mcar_missing_mask.reshape(X.shape)
    X[mcar_missing_mask] = np.nan  # mask values selected by mcar_missing_mask
    return X


def _mcar_torch(
    X: torch.Tensor,
    p: float,
) -> torch.Tensor:
    assert 0 < p < 1, f"p must be in range (0, 1), but got {p}"

    # clone X so that the caller's tensor is not modified
    X = torch.clone(X)
    # draw one uniform sample per element; elements whose sample is below p get masked
    mcar_missing_mask = torch.rand(X.shape) < p
    X[mcar_missing_mask] = torch.nan  # mask values selected by mcar_missing_mask
    return X


def mcar(
    X: Union[np.ndarray, torch.Tensor],
    p: float,
) -> Union[np.ndarray, torch.Tensor]:
    """Create completely random missing values (MCAR case).

    Parameters
    ----------
    X :
        Data vector. If X has any missing values, they should be numpy.nan.
        A list is converted to numpy.ndarray before masking.

    p :
        The probability that values may be masked as missing completely at random.
        Note that the values are randomly selected no matter if they are originally missing or observed.
        If the selected values are originally missing, they will be kept as missing.
        If the selected values are originally observed, they will be masked as missing.
        Therefore, if the given X already contains missing data, the final missing rate in the output X
        could be in range [original_missing_rate, original_missing_rate + p], but not strictly equal to
        `original_missing_rate + p`, because some of the values selected to be artificially masked out
        may be originally missing, and masking them changes nothing.

    Returns
    -------
    corrupted_X :
        Original X with artificial missing values.
        Both originally-missing and artificially-missing values are left as NaN.

    """
    assert 0 < p < 1, f"p must be in range (0, 1), but got {p}"

    # lists are accepted for convenience and handled by the NumPy branch
    if isinstance(X, list):
        X = np.asarray(X)

    if isinstance(X, np.ndarray):
        corrupted_X = _mcar_numpy(X, p)
    elif isinstance(X, torch.Tensor):
        corrupted_X = _mcar_torch(X, p)
    else:
        raise TypeError(
            f"X must be type of list/numpy.ndarray/torch.Tensor, but got {type(X)}"
        )
    return corrupted_X
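
The note on the final missing rate in the docstring of `mcar` can be checked empirically. The following is a minimal, NumPy-only sketch and not part of pygrinder: `mcar_sketch` is a hypothetical stand-in that mirrors the NumPy branch above but takes a seeded `numpy.random.Generator` for reproducibility, and the array size, seed, and rates are arbitrary assumptions. Because the mask is drawn independently of whether a value is already missing, the final rate should land near `original_rate + p * (1 - original_rate)`, which lies inside the range given in the docstring.

```python
import numpy as np


def mcar_sketch(X, p, rng):
    """Hypothetical stand-in for the NumPy branch above (not the library function)."""
    assert 0 < p < 1, f"p must be in range (0, 1), but got {p}"
    X = np.array(X, dtype=float)   # copy, so the caller's array is untouched
    mask = rng.random(X.shape) < p  # each element is selected with probability p
    X[mask] = np.nan                # selected values become missing
    return X


rng = np.random.default_rng(0)      # assumed seed, for reproducibility only
p = 0.3

# Build data that already has roughly 20% missing values.
X = rng.normal(size=(1000, 50))
X[rng.random(X.shape) < 0.2] = np.nan
original_rate = np.isnan(X).mean()

corrupted = mcar_sketch(X, p, rng)
final_rate = np.isnan(corrupted).mean()

# Only originally observed values can become newly missing.
expected_rate = original_rate + p * (1 - original_rate)

print(f"original missing rate: {original_rate:.3f}")
print(f"final missing rate:    {final_rate:.3f}")
print(f"expected (approx.):    {expected_rate:.3f}")
```

Values that were already NaN in the input stay NaN in the output, so the final rate never drops below the original one, and it reaches `original_rate + p` only if no selected value was already missing.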