.. PyPOTS developer documentation - Codebase Map Codebase Map and Folder Intent ================================ Use this page to find the right place before reading code in depth. Put code where another contributor would expect to find it — that rule prevents many review problems. Repository Overview --------------------- .. code-block:: text PyPOTS/ ├── pypots/ # Main source package │ ├── base.py # Shared model abstractions (BaseModel, BaseNNModel) │ ├── version.py # Version information │ ├── imputation/ # Imputation task models │ ├── forecasting/ # Forecasting task models │ ├── classification/ # Classification task models │ ├── clustering/ # Clustering task models │ ├── anomaly_detection/ # Anomaly detection task models │ ├── representation/ # Representation learning models │ ├── data/ # Dataset and IO helpers │ │ ├── dataset/base.py # BaseDataset class │ │ ├── checking.py # Data validation functions │ │ ├── saving/ # Data saving utilities (e.g. HDF5) │ │ └── utils.py # Data transformation utilities │ ├── nn/ # Reusable neural network modules │ │ ├── functional/ # Utility functions (error, classification, etc.) │ │ └── modules/ # PyTorch model components and backbones │ ├── optim/ # Optimizer abstractions │ ├── cli/ # Command-line interface │ └── utils/ # Utility functions (logging, file ops, etc.) ├── tests/ # Model and task tests │ ├── global_test_config.py # Shared test configuration and data │ ├── imputation/ # Imputation model tests │ ├── forecasting/ # Forecasting model tests │ ├── classification/ # Classification model tests │ ├── clustering/ # Clustering model tests │ ├── anomaly_detection/ # Anomaly detection model tests │ └── representation/ # Representation learning tests ├── docs/ # Sphinx documentation source ├── requirements/ # Dependency specifications └── .github/workflows/ # CI behavior used in pull requests Core Directories Explained ---------------------------- ``pypots//`` — Task Wrappers and Model Folders ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Each task package (e.g. ``pypots/imputation/``) contains: - ``base.py`` — Task-specific base classes (e.g. ``BaseImputer``, ``BaseNNImputer``) - ``template/`` — Scaffolding folder to help new contributors get started - One subfolder per model (e.g. ``saits/``, ``brits/``, ``locf/``) Each model subfolder typically has: .. code-block:: text pypots/imputation/saits/ ├── __init__.py # Exports the public model class ├── model.py # User-facing wrapper API ├── core.py # Forward computation and result dict └── data.py # Custom dataset (only if needed) ``pypots/base.py`` — Framework Contracts ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This file defines the shared model abstractions: - ``BaseModel`` — device setup, AMP switch, checkpoint IO, abstract ``fit()``/``predict()`` - ``BaseNNModel`` — training loop state, early stopping, best checkpoint tracking, TensorBoard logging All models ultimately inherit from one of these. ``pypots/data/`` — Dataset and IO ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - ``dataset/base.py`` — ``BaseDataset`` class that all datasets inherit from - ``checking.py`` — Functions to validate data keys and structure - ``saving/`` — Utilities for saving data to HDF5 files - ``utils.py`` — Data transformation utilities ``pypots/nn/`` — Reusable Neural Modules ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Contains 60+ reusable PyTorch modules: - ``modules/base_model_core.py`` — ``ModelCore`` base class for all NN model cores - ``modules/loss.py`` — Loss functions: ``Criterion``, ``MAE``, ``MSE``, ``RMSE``, ``MRE``, ``CrossEntropy``, ``NLL`` - ``modules/metric.py`` — Metric evaluation - ``modules//`` — Model-specific backbone implementations (e.g. ``saits/``, ``transformer/``) - ``functional/`` — Utility functions (``calc_mse``, ``calc_mae``, ``gather_listed_dicts``, etc.) If you implement reusable NN components, put them here instead of inside a model folder. ``pypots/optim/`` — Optimizer Abstractions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Provides ``Optimizer`` base class and concrete implementations (e.g. ``Adam``, ``SGD``). All PyPOTS optimizers wrap PyTorch optimizers with a consistent interface. Template Directories ---------------------- If you are adding a new model, check the task template first: .. code-block:: text pypots/imputation/template/ pypots/forecasting/template/ pypots/classification/template/ pypots/clustering/template/ Treat templates as **scaffolding**. The task base class still defines the real contract. Always verify result keys and helper method behavior against the base class. Three Example Folders Worth Reading First ------------------------------------------- .. list-table:: :header-rows: 1 :widths: 35 25 40 * - Folder - Path Type - Key Lesson * - ``pypots/imputation/saits/`` - Standard NN - One optimizer, default training loop, custom dataset * - ``pypots/imputation/usgan/`` - Complex NN - Dual optimizer GAN, custom ``_train_model()`` * - ``pypots/imputation/locf/`` - Non-NN - No training, inherits ``BaseImputer`` directly Reading those three folders gives you a fast mental map of the main extension styles in PyPOTS. Dependency Direction ---------------------- Task packages depend on shared infrastructure. Shared infrastructure should **not** depend on task packages. In practice: - Task-specific orchestration stays in task folders - Reusable blocks move to ``pypots/nn/`` - Cross-task utilities should not be hidden inside one model folder Module Boundary Rules ----------------------- .. list-table:: :header-rows: 1 :widths: 20 80 * - File - What Belongs Here * - ``model.py`` - Public wrapper API, dataset/dataloader setup, optimizer creation, training orchestration, stage-specific input assembly * - ``core.py`` - Forward computation, result dict creation, loss and metric outputs. Should **not** become a hidden wrapper. * - ``data.py`` - Custom dataset class. **Only** add when ``BaseDataset`` cannot express your model's sample contract. Quick Self-Check Before Commit ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Can someone infer intent from the file location alone? - Did reusable code go to shared modules instead of one model folder? - Does the wrapper own orchestration and the core own math? - Is a custom dataset really necessary?