.. PyPOTS developer documentation - Testing and CI

Testing Checklist and CI Guide
================================

Run this checklist before opening a model-related PR.


Environment Setup
-------------------

.. code-block:: bash

   # Clone the repository
   git clone https://github.com/WenjieDu/PyPOTS.git
   cd PyPOTS

   # Install in development mode
   pip install -e ".[dev]"

   # Generate test data (required before running any test)
   python tests/global_test_config.py


Understanding the Test Infrastructure
-----------------------------------------

Test Configuration: ``global_test_config.py``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The file ``tests/global_test_config.py`` sets up shared test data and configuration:

.. code-block:: python

   # Key constants used across all tests:
   RANDOM_SEED = 2023
   EPOCHS = 2                  # Very few epochs for fast testing
   N_STEPS = 6
   N_PRED_STEPS = 2
   N_FEATURES = 5
   N_CLASSES = 2
   N_SAMPLES_PER_CLASS = 100
   MISSING_RATE = 0.1

   # Pre-generated data splits:
   TRAIN_SET = {"X": ..., "y": ...}
   VAL_SET = {"X": ..., "X_ori": ..., "y": ...}
   TEST_SET = {"X": ..., "X_ori": ..., "y": ...}

   # HDF5 file paths for lazy-loading tests:
   GENERAL_H5_TRAIN_SET_PATH = "..."
   GENERAL_H5_VAL_SET_PATH = "..."
   GENERAL_H5_TEST_SET_PATH = "..."

   # For forecasting tasks:
   FORECASTING_TRAIN_SET = {"X": ..., "X_pred": ...}
   FORECASTING_VAL_SET = {"X": ..., "X_pred": ...}
   FORECASTING_TEST_SET = {"X": ..., "X_pred": ...}

   # Device selection (auto-detects CUDA):
   DEVICE = None  # or cuda device if available

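The test data is generated by ``benchpots.datasets.preprocess_random_walk()``,
which creates a synthetic random-walk dataset with a configurable missing rate.
The test template later in this guide also imports ``DATA`` from this module and
reads entries such as ``DATA["n_steps"]`` and ``DATA["test_X_ori"]`` from it.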
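
To make the relationship between ``X``, ``X_ori``, and an indicating mask
concrete, the sketch below builds one random-walk sample with the standard
library, hides a fraction of its values, and scores a naive imputation only on
the hidden entries. It is an illustration under assumptions, not the
``benchpots`` implementation: ``make_random_walk``, ``hold_out``, and
``masked_mse`` are hypothetical helpers, and the real generator may draw steps
and masks differently.

.. code-block:: python

   import math
   import random

   N_STEPS = 6
   N_FEATURES = 5
   MISSING_RATE = 0.1


   def make_random_walk(n_steps, n_features, rng):
       """Return an n_steps x n_features walk built from cumulative Gaussian steps."""
       walk = []
       previous = [0.0] * n_features
       for _ in range(n_steps):
           previous = [value + rng.gauss(0.0, 1.0) for value in previous]
           walk.append(previous)
       return walk


   def hold_out(x_ori, rate, rng):
       """Hide roughly `rate` of the values; return (x, indicating_mask)."""
       x, mask = [], []
       for row in x_ori:
           hidden = [rng.random() < rate for _ in row]
           x.append([math.nan if h else v for v, h in zip(row, hidden)])
           mask.append([1 if h else 0 for h in hidden])
       return x, mask


   def masked_mse(prediction, target, mask):
       """Mean squared error over the entries where mask == 1."""
       errors = [
           (p - t) ** 2
           for p_row, t_row, m_row in zip(prediction, target, mask)
           for p, t, m in zip(p_row, t_row, m_row)
           if m == 1
       ]
       return sum(errors) / len(errors) if errors else 0.0


   rng = random.Random(2023)
   x_ori = make_random_walk(N_STEPS, N_FEATURES, rng)
   x, indicating_mask = hold_out(x_ori, MISSING_RATE, rng)

   # Naive imputation: replace every hidden value with 0.0.
   imputed = [[0.0 if math.isnan(v) else v for v in row] for row in x]
   print(masked_mse(imputed, x_ori, indicating_mask))

Observed entries are copied through unchanged, so only the hidden positions
contribute to the error. The test template follows the same pattern when it
passes ``DATA["test_X_indicating_mask"]`` to ``calc_mse``.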


Test File Structure
^^^^^^^^^^^^^^^^^^^^^

Each model has a dedicated test file under ``tests/<task>/``:

.. code-block:: text

   tests/
   ├── global_test_config.py       # Shared configuration
   ├── imputation/
   │   ├── saits.py                # SAITS test cases
   │   ├── brits.py                # BRITS test cases
   │   ├── locf.py                 # LOCF test cases
   │   ├── usgan.py                # USGAN test cases
   │   └── ...                     # One file per model
   ├── classification/
   ├── forecasting/
   ├── clustering/
   ├── anomaly_detection/
   └── representation/


Writing Tests for Your Model
-------------------------------

Use the SAITS test as a reference. Here is a complete test template:

.. code-block:: python

   # tests/imputation/your_model.py

   import os
   import unittest

   import numpy as np
   import pytest

   from pypots.imputation import YourModel
   from pypots.nn.functional import calc_mse
   from pypots.optim import Adam
   from pypots.utils.logging import logger
   from tests.global_test_config import (
       DATA,
       EPOCHS,
       DEVICE,
       TRAIN_SET,
       VAL_SET,
       TEST_SET,
       GENERAL_H5_TRAIN_SET_PATH,
       GENERAL_H5_VAL_SET_PATH,
       GENERAL_H5_TEST_SET_PATH,
       RESULT_SAVING_DIR_FOR_IMPUTATION,
       check_tb_and_model_checkpoints_existence,
   )


   class TestYourModel(unittest.TestCase):
       logger.info("Running tests for YourModel...")

       # Set paths
       saving_path = os.path.join(
           RESULT_SAVING_DIR_FOR_IMPUTATION, "YourModel"
       )
       model_save_name = "saved_your_model.pypots"

       # Initialize optimizer
       optimizer = Adam(lr=0.001, weight_decay=1e-5)

       # Initialize model with small hyperparameters for fast testing
       model = YourModel(
           DATA["n_steps"],
           DATA["n_features"],
           d_model=32,
           epochs=EPOCHS,
           saving_path=saving_path,
           optimizer=optimizer,
           device=DEVICE,
       )

       @pytest.mark.xdist_group(name="imputation-your_model")
       def test_0_fit(self):
           """Test that the model trains successfully."""
           self.model.fit(TRAIN_SET, VAL_SET)

       @pytest.mark.xdist_group(name="imputation-your_model")
       def test_1_impute(self):
           """Test that predict() returns valid imputation results."""
           results = self.model.predict(TEST_SET)
           assert not np.isnan(results["imputation"]).any(), (
               "Output still has missing values after imputation."
           )

           test_MSE = calc_mse(
               results["imputation"],
               DATA["test_X_ori"],
               DATA["test_X_indicating_mask"],
           )
           logger.info(f"YourModel test_MSE: {test_MSE}")

       @pytest.mark.xdist_group(name="imputation-your_model")
       def test_2_parameters(self):
           """Test that model parameters are properly initialized."""
           assert hasattr(self.model, "model") and self.model.model is not None
           assert hasattr(self.model, "optimizer") and self.model.optimizer is not None
           assert hasattr(self.model, "best_loss")
           self.assertNotEqual(self.model.best_loss, float("inf"))
           assert hasattr(self.model, "best_model_dict")
           assert self.model.best_model_dict is not None

       @pytest.mark.xdist_group(name="imputation-your_model")
       def test_3_saving_path(self):
           """Test model save and load functionality."""
           # Check tensorboard and checkpoint files
           assert os.path.exists(self.saving_path)
           check_tb_and_model_checkpoints_existence(self.model)

           # Test save/load round trip
           saved_model_path = os.path.join(
               self.saving_path, self.model_save_name
           )
           self.model.save(saved_model_path)
           self.model.load(saved_model_path)

       @pytest.mark.xdist_group(name="imputation-your_model")
       def test_4_lazy_loading(self):
           """Test with HDF5 file-backed input (lazy loading)."""
           self.model.fit(
               GENERAL_H5_TRAIN_SET_PATH,
               GENERAL_H5_VAL_SET_PATH
           )
           results = self.model.predict(GENERAL_H5_TEST_SET_PATH)
           assert not np.isnan(results["imputation"]).any(), (
               "Output still has missing values with lazy loading."
           )

           test_MSE = calc_mse(
               results["imputation"],
               DATA["test_X_ori"],
               DATA["test_X_indicating_mask"],
           )
           logger.info(f"Lazy-loading YourModel test_MSE: {test_MSE}")


   if __name__ == "__main__":
       unittest.main()

Key points about the test structure:

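- **Test numbering**: Tests are named ``test_0_``, ``test_1_``, and so on so that they run in a fixed order; later tests reuse the class-level model trained in ``test_0_fit``
- **xdist_group marker**: Keeps all tests of one model on the same ``pytest-xdist`` worker when pytest runs with ``--dist=loadgroup``, which the shared class-level model requires
- **Lazy loading test**: Exercises HDF5 file-path input in addition to in-memory dict input
- **Save/load test**: Checks that training artifacts exist, then saves the model and loads it back
- **MSE calculation**: ``calc_mse`` receives the indicating mask so that the error is computed only on the held-out entries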
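
The ordering and shared-state points can be seen in isolation with the standard
library. The sketch below uses a hypothetical ``FakeModel`` in place of a PyPOTS
model. It shows that ``unittest`` orders test methods by name, and that a later
test depends on state left on the class-level object by an earlier one.

.. code-block:: python

   import unittest


   class FakeModel:
       """Hypothetical stand-in for a PyPOTS model; not part of the library."""

       def __init__(self):
           self.fitted = False

       def fit(self):
           self.fitted = True

       def predict(self):
           if not self.fitted:
               raise RuntimeError("predict() called before fit()")
           return {"imputation": [[0.0]]}


   class TestFakeModel(unittest.TestCase):
       # Class attribute: one instance is shared by all test methods.
       model = FakeModel()

       def test_1_impute(self):
           # Only passes if test_0_fit has already run on the same object.
           self.assertIn("imputation", self.model.predict())

       def test_0_fit(self):
           self.model.fit()
           self.assertTrue(self.model.fitted)


   # unittest sorts method names, so test_0_fit runs first even though it
   # is defined second in the class body.
   ordered_names = unittest.TestLoader().getTestCaseNames(TestFakeModel)
   print(ordered_names)  # ['test_0_fit', 'test_1_impute']

   suite = unittest.TestLoader().loadTestsFromTestCase(TestFakeModel)
   result = unittest.TextTestRunner(verbosity=0).run(suite)
   print(result.wasSuccessful())

Because the sort is lexicographic, a name such as ``test_10_`` would sort before
``test_2_``; keep the numeric prefix to a single digit or zero-pad it. If the two
tests ran in different worker processes, each process would hold its own
``FakeModel`` and ``test_1_impute`` would fail, which is the situation the
``xdist_group`` marker is meant to avoid.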


Minimum Required Checks
--------------------------


1. Run the Targeted Model Test
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Your specific model
   pytest -rA tests/imputation/your_model.py -n 1

   # Example reference models
   pytest -rA tests/imputation/saits.py -n 1
   pytest -rA tests/imputation/usgan.py -n 1
   pytest -rA tests/imputation/locf.py -n 1


2. Verify the Real Contract
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At minimum, confirm all of these:

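- ``fit()`` completes if the model has a training phase
- ``predict()`` returns a dict containing the task's result key (for example ``"imputation"`` for imputation models)
- Helper methods (e.g. ``impute()``, ``forecast()``) return an array of the expected shape
- Task-specific assumptions are asserted explicitly (for example, imputation output contains no NaN values)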
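
The key and shape checks above can be collected into one small helper. This is a
minimal sketch, not a PyPOTS utility: ``contract_problems`` is a hypothetical
name, the batch size of 4 is arbitrary, and the helper only assumes that the
result object exposes a ``.shape`` attribute, as NumPy arrays do.

.. code-block:: python

   from types import SimpleNamespace


   def contract_problems(results, key, expected_shape):
       """Return a list of contract violations; an empty list means OK."""
       if not isinstance(results, dict):
           return [f"predict() returned {type(results).__name__}, expected dict"]
       if key not in results:
           return [f"missing result key {key!r}; got {sorted(results)}"]
       actual = tuple(getattr(results[key], "shape", ()))
       if actual != tuple(expected_shape):
           return [f"{key!r} has shape {actual}, expected {tuple(expected_shape)}"]
       return []


   # Stand-in for an array; only the .shape attribute matters here.
   fake_output = SimpleNamespace(shape=(4, 6, 5))

   print(contract_problems({"imputation": fake_output}, "imputation", (4, 6, 5)))
   print(contract_problems({"forecasting": fake_output}, "imputation", (4, 6, 5)))
   print(contract_problems({"imputation": fake_output}, "imputation", (4, 2, 5)))

Returning messages instead of raising lets a test report the violation in its
assertion message.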


3. Verify Save/Load When State Exists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the model is stateful, verify the full round trip:

.. code-block:: python

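   import numpy as np

   from tests.global_test_config import TRAIN_SET, VAL_SET, TEST_SET

   # `model` is the model instance under test (see the template above)

   # 1. Train the model
   model.fit(TRAIN_SET, VAL_SET)

   # 2. Save
   model.save("checkpoint.pypots")

   # 3. Load
   model.load("checkpoint.pypots")

   # 4. Predict again; the loaded model should still produce valid output
   results = model.predict(TEST_SET)
   assert not np.isnan(results["imputation"]).any()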
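
A stricter variant of the round trip compares predictions before and after
loading, so that a checkpoint which silently drops state is caught. The sketch
below shows the pattern with a hypothetical ``StubModel`` that stores its state
as JSON; it does not use the PyPOTS checkpoint format.

.. code-block:: python

   import json
   import os
   import tempfile


   class StubModel:
       """Hypothetical stateful model: learns one fill value per feature."""

       def __init__(self):
           self.fill_values = None

       def fit(self, train_set):
           columns = list(zip(*train_set["X"]))
           self.fill_values = [sum(col) / len(col) for col in columns]

       def predict(self, test_set):
           filled = [
               [fill if value is None else value
                for value, fill in zip(row, self.fill_values)]
               for row in test_set["X"]
           ]
           return {"imputation": filled}

       def save(self, path):
           with open(path, "w", encoding="utf-8") as handle:
               json.dump({"fill_values": self.fill_values}, handle)

       def load(self, path):
           with open(path, encoding="utf-8") as handle:
               self.fill_values = json.load(handle)["fill_values"]


   train_set = {"X": [[1.0, 10.0], [3.0, 30.0]]}
   test_set = {"X": [[None, 5.0], [2.0, None]]}

   model = StubModel()
   model.fit(train_set)
   before = model.predict(test_set)["imputation"]

   with tempfile.TemporaryDirectory() as tmp_dir:
       checkpoint = os.path.join(tmp_dir, "checkpoint.json")
       model.save(checkpoint)

       restored = StubModel()  # fresh object, so nothing leaks from `model`
       restored.load(checkpoint)
       after = restored.predict(test_set)["imputation"]

   # The restored model must reproduce the original predictions.
   assert before == after
   print(after)

With floating-point arrays from a real model, a tolerance-based comparison is
usually more appropriate than ``==``.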


4. Verify Every Claimed Input Mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the model claims to support file-path input, **test file-path input**.
Do not stop after dict input passes.
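
One way to keep both input modes under test is to loop over them in a single
test method. The sketch below uses only the standard library: ``load_input`` and
``count_missing`` are hypothetical stand-ins for a model that accepts either a
dict or a file path, and a JSON file stands in for the HDF5 files used by the
real tests.

.. code-block:: python

   import json
   import os
   import tempfile
   import unittest


   def load_input(data):
       """Hypothetical loader: accept an in-memory dict or a path to a JSON file."""
       if isinstance(data, dict):
           return data
       if isinstance(data, str):
           with open(data, encoding="utf-8") as handle:
               return json.load(handle)
       raise TypeError(f"unsupported input type: {type(data).__name__}")


   def count_missing(data):
       """Stand-in for model logic; counts missing (None) values in X."""
       return sum(value is None for row in load_input(data)["X"] for value in row)


   class TestInputModes(unittest.TestCase):
       def test_all_input_modes(self):
           dataset = {"X": [[1.0, None], [None, 4.0]]}
           with tempfile.TemporaryDirectory() as tmp_dir:
               path = os.path.join(tmp_dir, "test_set.json")
               with open(path, "w", encoding="utf-8") as handle:
                   json.dump(dataset, handle)

               # Each claimed input mode gets its own subtest, so a failure
               # names the mode that broke.
               for mode, data in {"dict": dataset, "file path": path}.items():
                   with self.subTest(input_mode=mode):
                       self.assertEqual(count_missing(data), 2)


   suite = unittest.TestLoader().loadTestsFromTestCase(TestInputModes)
   result = unittest.TextTestRunner(verbosity=0).run(suite)
   print(result.wasSuccessful())

In the real test files the same idea appears as a separate ``test_4_lazy_loading``
method that repeats ``fit()`` and ``predict()`` with the HDF5 paths.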


When to Run Broader Regression
---------------------------------

Run broader regression when you change shared modules:

- ``pypots/base.py``
- ``pypots/data/``
- ``pypots/nn/``
- ``pypots/optim/``

.. code-block:: bash

   pytest -rA -s tests/*/* -n 1 --cov=pypots --dist=loadgroup --cov-config=.coveragerc
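
The rule above can be expressed as a small check over the files changed in a
branch. This is a sketch under assumptions: ``needs_broad_regression`` is a
hypothetical helper, paths are repository-relative with forward slashes, and the
shared locations are exactly the four listed above.

.. code-block:: python

   from pathlib import PurePosixPath

   # Shared locations taken from the list above.
   SHARED_PATHS = ("pypots/base.py", "pypots/data", "pypots/nn", "pypots/optim")


   def touches_shared_module(changed_file):
       """True if the repository-relative path is, or lies under, a shared path."""
       path = PurePosixPath(changed_file)
       return any(
           path == PurePosixPath(shared) or PurePosixPath(shared) in path.parents
           for shared in SHARED_PATHS
       )


   def needs_broad_regression(changed_files):
       """True if any changed file belongs to a shared module."""
       return any(touches_shared_module(name) for name in changed_files)


   print(needs_broad_regression(["pypots/imputation/saits/model.py"]))
   print(needs_broad_regression(["pypots/nn/functional/error.py"]))

The list of changed files could come from, for example, the output of
``git diff --name-only`` against the target branch. Matching on path components
rather than string prefixes keeps a directory such as ``pypots/nnx/`` from being
mistaken for ``pypots/nn/``.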


CI and Lint
=============

This section maps real PyPOTS CI behavior to local commands.


What CI Checks
-----------------

The CI workflows currently perform these core checks:

1. ``flake8 .``: code style linting
2. ``python -m build``: package build
3. Full pytest run with coverage, executed through ``pytest-xdist`` with ``--dist=loadgroup``


Local Commands That Match CI
-------------------------------

Lint
^^^^^^

.. code-block:: bash

   flake8 .


Test Environment Setup
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   python tests/global_test_config.py


Targeted Model Test
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   pytest -rA tests/imputation/your_model.py -n 1


Full Regression
^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   pytest -rA -s tests/*/* -n 1 --cov=pypots --dist=loadgroup --cov-config=.coveragerc


Package Build
^^^^^^^^^^^^^^^^

.. code-block:: bash

   python -m build

Run this when packaging or install behavior may be affected.


Fast Triage Rules
--------------------

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Problem
     - Action
   * - Lint failure only
     - Start with ``flake8 .``
   * - One-model failure
     - Run that model's test file directly
   * - Shared-module change
     - Run the full regression command
   * - Packaging suspicion
     - Run ``python -m build``


Review-Ready Evidence
-----------------------

A PR is **not** review-ready unless it includes:

- The **exact commands** you ran
- Whether the run was **targeted or broad**
- The **result** of those commands
- Any **remaining gap** you did not cover

Example PR evidence:

.. code-block:: text

   ## Testing Evidence

   ### Environment
   - Python 3.10, PyTorch 2.1, CUDA 12.1

   ### Commands Run
   ```
   python tests/global_test_config.py
   pytest -rA tests/imputation/my_model.py -n 1
   flake8 .
   ```

   ### Results
   - All 5 tests passed
   - No lint errors
   - Scope: targeted (only my_model changed)

   ### Remaining Gaps
   - Full regression not run (no shared modules changed)
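
The required items can also be assembled programmatically. The sketch below is a
hypothetical helper (``format_evidence`` is not part of PyPOTS) that renders them
as a Markdown section for a PR description. The values in the usage lines are
taken from the example above, plus an illustrative gap entry.

.. code-block:: python

   def format_evidence(environment, commands, results, scope, gaps=()):
       """Render a Markdown testing-evidence section for a PR description."""
       fence = "`" * 3  # Markdown code fence
       gap_lines = [f"- {gap}" for gap in gaps] or ["- None known"]
       lines = [
           "## Testing Evidence",
           "",
           "### Environment",
           f"- {environment}",
           "",
           "### Commands Run",
           fence,
           *commands,
           fence,
           "",
           "### Results",
           *[f"- {item}" for item in results],
           f"- Scope: {scope}",
           "",
           "### Remaining Gaps",
           *gap_lines,
       ]
       return "\n".join(lines)


   evidence = format_evidence(
       environment="Python 3.10, PyTorch 2.1, CUDA 12.1",
       commands=[
           "python tests/global_test_config.py",
           "pytest -rA tests/imputation/my_model.py -n 1",
           "flake8 .",
       ],
       results=["All 5 tests passed", "No lint errors"],
       scope="targeted (only my_model changed)",
       gaps=["Full regression not run"],
   )
   print(evidence)

Making ``scope`` a required argument and always emitting a gaps section means
neither item can be left out by accident.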