.. PyPOTS developer documentation - Non-NN Integration Path Non-NN Integration Path ========================= Use this path for models that should **not** use ``BaseNNModel`` at all. ``LOCF`` is the cleanest example. When This Path Is Correct ---------------------------- Choose the non-NN path when: - There is no gradient-based training loop - There is no optimizer - The model is rule-based, statistical, or algorithmic - Wrapping it in a neural-network base class would add fake complexity Good examples in PyPOTS: - ``LOCF`` (Last Observed Carried Forward) - ``Mean`` (fill with mean values) - ``Median`` (fill with median values) - ``Lerp`` (linear interpolation) - ``TRMF`` (Temporal Regularized Matrix Factorization) - ``BTTF`` (Bayesian Temporal Tensor Factorization) ``LOCF`` as the Reference Pattern -------------------------------------- ``LOCF`` inherits ``BaseImputer``, **not** ``BaseNNImputer``. That choice immediately removes: - Optimizer setup - ``_train_model()`` - ``_assemble_input_*`` hooks - Checkpoint-selection logic tied to NN training Its implementation is direct and clean: .. code-block:: python # pypots/imputation/locf/model.py (simplified) import warnings from typing import Union, Optional import h5py import numpy as np import torch from .core import locf_numpy, locf_torch from ..base import BaseImputer class LOCF(BaseImputer): """LOCF imputation: fills missing values with the last observed value. Parameters ---------- first_step_imputation : str, default='zero' Strategy for imputing missing values at the beginning of sequences. Can be 'backward', 'zero', 'median', or 'nan'. """ def __init__( self, first_step_imputation: str = "zero", device: Optional[Union[str, torch.device, list]] = None, ): super().__init__(device=device) assert first_step_imputation in ["nan", "zero", "backward", "median"] self.first_step_imputation = first_step_imputation def fit( self, train_set: Union[dict, str], val_set: Optional[Union[dict, str]] = None, file_type: str = "hdf5", ) -> None: """LOCF does not need training. Issues a warning.""" warnings.warn( "LOCF has no parameter to train. " "Please run func `predict()` directly." ) def predict( self, test_set: Union[dict, str], file_type: str = "hdf5", **kwargs, ) -> dict: # Handle both dict and file input if isinstance(test_set, str): with h5py.File(test_set, "r") as f: X = f["X"][:] else: X = test_set["X"] assert len(X.shape) == 3, ( f"Input X should have 3 dimensions " f"[n_samples, n_steps, n_features], " f"but got shape: {X.shape}" ) if isinstance(X, np.ndarray): imputed_data = locf_numpy(X, self.first_step_imputation) elif isinstance(X, torch.Tensor): imputed_data = locf_torch(X, self.first_step_imputation) result_dict = { "imputation": imputed_data, } return result_dict This is exactly what a non-NN wrapper should look like: clean, explicit, and contract-driven. Two Valid Non-NN Styles -------------------------- Stateless Models ^^^^^^^^^^^^^^^^^^ Examples: ``LOCF``, ``Mean``, ``Median``, ``Lerp`` These models do not learn parameters from data. ``fit()`` is an explicit no-op with a warning. .. code-block:: python class StatelessModel(BaseImputer): def fit(self, train_set, val_set=None, file_type="hdf5"): warnings.warn("This model has no parameters to train.") def predict(self, test_set, file_type="hdf5", **kwargs): X = test_set["X"] imputed_data = self._apply_algorithm(X) return {"imputation": imputed_data} Stateful Models ^^^^^^^^^^^^^^^^^ Examples: ``TRMF``, ``BTTF`` These models still do **not** use the NN training loop, but they do learn algorithm state in ``fit()``. .. code-block:: python class StatefulModel(BaseImputer): def fit(self, train_set, val_set=None, file_type="hdf5"): X = train_set["X"] # Learn parameters from training data self.learned_params = self._fit_algorithm(X) def predict(self, test_set, file_type="hdf5", **kwargs): X = test_set["X"] imputed_data = self._apply_algorithm(X, self.learned_params) return {"imputation": imputed_data} In both cases, the public contract is the same: ``predict()`` must return the task-level result key (e.g. ``"imputation"``). Step-by-Step Implementation Guide ==================================== Step 1: Choose the Base Class -------------------------------- Inherit the correct non-NN task base: .. list-table:: :header-rows: 1 :widths: 30 30 40 * - Task - Non-NN Base - Result Key * - Imputation - ``BaseImputer`` - ``"imputation"`` * - Forecasting - ``BaseForecaster`` - ``"forecasting"`` * - Classification - ``BaseClassifier`` - ``"classification"`` * - Anomaly Detection - ``BaseDetector`` - ``"anomaly_detection"`` * - Clustering - ``BaseClusterer`` - ``"clustering"`` Step 2: Implement ``fit()`` ------------------------------ Be explicit, even if it only warns: .. code-block:: python def fit(self, train_set, val_set=None, file_type="hdf5"): """Train the model. For stateless models, this is a no-op.""" warnings.warn("This model has no parameters to train.") Step 3: Implement ``predict()`` ---------------------------------- Keep it simple and contract-driven: .. code-block:: python def predict(self, test_set, file_type="hdf5", **kwargs): # Handle both dict and file input if isinstance(test_set, str): with h5py.File(test_set, "r") as f: X = f["X"][:] else: X = test_set["X"] # Validate input shape assert len(X.shape) == 3, ( f"Input X should have 3 dimensions, got {X.shape}" ) # Apply your algorithm imputed_data = your_algorithm(X) return {"imputation": imputed_data} Step 4: Implement Helper Methods ----------------------------------- Make helper methods like ``impute()`` or ``forecast()`` return the raw array users expect: .. code-block:: python def impute(self, test_set, file_type="hdf5", **kwargs): result = self.predict(test_set, file_type, **kwargs) return result["imputation"] Step 5: Wire the Package --------------------------- Same as the standard NN path: .. code-block:: python # pypots/imputation/your_model/__init__.py from .model import YourModel __all__ = ["YourModel"] Common Mistake ----------------- Do **not** force a non-NN model into ``BaseNNModel`` just because most folders around it are neural models. That usually creates: - Fake hooks that do nothing - Fake optimizers that are never used - Confusing tests with unnecessary training loops - Review confusion for maintainers If there is no gradient, there should be no ``BaseNNModel``. Definition of Done --------------------- Your non-NN integration is done when: - The chosen base class matches the real algorithm - ``fit()`` behavior is explicit (even if it's a no-op) - ``predict()`` returns the correct task result key - Helper methods return the expected array - Targeted tests cover the advertised input modes (both dict and file input)