Getting Started

Welcome to the PyPOTS developer documentation! This guide helps contributors understand the codebase and integrate new models, algorithms, and features.

If you are new to PyPOTS, do not start from a random model folder. Start from understanding the contracts — the base classes and their responsibilities.

Setting Up the Development Environment

git clone https://github.com/WenjieDu/PyPOTS.git
cd PyPOTS
pip install -e ".[dev]"

Or with conda:

conda create -n pypots python=3.10
conda activate pypots
git clone https://github.com/WenjieDu/PyPOTS.git
cd PyPOTS
pip install -e ".[dev]"

Key Concepts

Before diving into the code, understand these three concepts that define how PyPOTS works.

Three-Layer Model Architecture

Every model in PyPOTS follows a three-layer architecture:

File

Layer

Responsibility

model.py

Wrapper

User-facing API, dataloaders, optimizers, training orchestration, input assembly

core.py

Core

Forward computation, result dict creation, loss and metric outputs

data.py

Dataset

Custom dataset class (only when BaseDataset is not enough)

Three Integration Paths

Before writing any code, decide which integration path your model belongs to. This is the most important decision — changing paths late usually means you started from the wrong contract.

Path

When to Use

Reference Model

Standard NN

One optimizer, default training loop. Most models fall here.

SAITS (pypots/imputation/saits/)

Complex NN

Multiple optimizers, alternating updates, or pretraining stages.

USGAN (pypots/imputation/usgan/)

Non-NN

Rule-based, statistical, or algorithmic. No gradients.

LOCF (pypots/imputation/locf/)

Six Supported Tasks

PyPOTS organizes models by task. Each task has its own base class and result contract:

Task

NN Base

Non-NN Base

Result Key

Imputation

BaseNNImputer

BaseImputer

"imputation"

Forecasting

BaseNNForecaster

BaseForecaster

"forecasting"

Classification

BaseNNClassifier

BaseClassifier

"classification"

Anomaly Detection

BaseNNDetector

BaseDetector

"anomaly_detection"

Clustering

BaseNNClusterer

BaseClusterer

"clustering"

Representation

BaseNNRepresentor

BaseRepresentor

"representation"

How to Read a Reference Model

When reading an example model implementation, follow this order:

  1. Task base class — understand the contract (result keys, helper methods)

  2. model.py — the public wrapper API, dataloaders, optimizers, training orchestration

  3. core.py — forward computation and result dict contract

  4. data.py — only if it exists; the custom dataset class

  5. The matching test file — under tests/<task>/

End-to-End Development Journey

The shortest safe path from idea to merged PR.

Step 1: Define the Contract

Before touching implementation code, decide:

  • The task: imputation, forecasting, classification, anomaly_detection, clustering, or representation

  • The correct base class: e.g. BaseNNImputer for an NN imputation model

  • The public result key: e.g. "imputation" for imputation models

  • The integration path: standard NN, complex NN, or non-NN

Step 2: Start From a Scaffold

Use the task template as a starting folder:

pypots/imputation/template/
pypots/forecasting/template/
pypots/classification/template/
pypots/clustering/template/

Then compare it with the matching reference model (SAITS, USGAN, or LOCF). The template gives structure; the reference model gives the actual contract.

Step 3: Implement

Follow the detailed guide for your chosen path:

Step 4: Wire the Package

  • Export the model in the task package __init__.py

  • Add the matching test file under tests/<task>/

Step 5: Validate Locally

# Generate test data
python tests/global_test_config.py

# Run your model's targeted test
pytest -rA tests/imputation/your_model.py -n 1

# Lint
flake8 .

Step 6: Submit with Evidence

Your PR should state:

  • The chosen integration path and reason

  • Exact local commands you ran and their results

  • Known limitations, if any

See Testing Checklist and CI Guide for the full testing checklist and CI guide.