Developer Guide

Architecture Overview

PyMARS is organized into focused modules, each with a clear responsibility:

pymars/
├── __init__.py           # Package exports
├── mars.py               # Main MARS class (fit/predict)
├── basis.py              # Basis function classes
├── model.py              # Forward/Backward algorithm implementations
├── gcv.py                # GCV calculation & selection
├── utils.py              # Utility functions (knots, solvers)
├── cubic.py              # Cubic spline extension
├── interactions.py       # ANOVA decomposition
└── plots.py              # Visualization tools

Design Principles

  1. Faithfulness to Friedman (1991) - Every algorithm reproduced from paper - Mathematical formulas directly implemented - No algorithmic shortcuts

  2. Numerical Stability - QR/Cholesky/pseudoinverse fallback chain - Standardization default-enabled - Condition number awareness

  3. scikit-learn Compatibility - fit() / predict() / score() interface - Parameter validation - Transformer-compatible API

  4. Clarity Over Speed - Readable code valued - Comments for non-obvious logic - Educational quality

  5. Modularity - Each class has single responsibility - Composable components - Easy to extend

Core Classes

MARS Class

File: pymars/mars.py

Responsibility: Main user-facing class

class MARS:
    """Multivariate Adaptive Regression Splines."""

    def __init__(self, max_terms=30, max_degree=1, ...):
        """Initialize parameters."""

    def fit(self, X, y):
        """Forward + Backward pass."""

    def predict(self, X):
        """Evaluate basis functions."""

    def score(self, X, y):
        """R² score."""

    def summary(self):
        """Print detailed summary."""

Key Methods: - fit() - Run forward & backward passes - predict() - Evaluate model on new data - score() - Compute R² score - get_anova_decomposition() - ANOVA decomposition

Attributes (after fitting): - basis_functions_ - List of fitted bases - coefficients_ - Fitted coefficients - gcv_score_ - Final GCV score - feature_importances_ - Importance per feature

BasisFunction Class

File: pymars/basis.py

Responsibility: Represent a single basis function

class BasisFunction:
    """Product of hinge functions."""

    def __init__(self, hinges, basis_id):
        """Initialize with list of hinges."""
        self.hinges = hinges  # List of HingeFunction
        self.basis_id = basis_id  # Unique stable ID

    def evaluate(self, X):
        """Evaluate on data matrix X."""
        result = np.ones(len(X))
        for hinge in self.hinges:
            result *= hinge(X)
        return result

    def degree(self):
        """Number of hinges (interaction order)."""
        return len(self.hinges)

HingeFunction Class

File: pymars/basis.py

Responsibility: Single univariate hinge

class HingeFunction:
    """Univariate hinge: max(0, d*(x-t))"""

    def __init__(self, variable, knot, direction):
        self.variable = variable  # Which feature
        self.knot = knot          # Knot location
        self.direction = direction  # +1 or -1

    def __call__(self, X):
        """Evaluate on data."""
        x = X[:, self.variable]
        diff = self.direction * (x - self.knot)
        return np.maximum(diff, 0)

ForwardPass Class

File: pymars/model.py

Responsibility: Implement forward expansion

class ForwardPass:
    """Greedy basis expansion."""

    def run(self, X, y, max_terms, ...):
        """Execute forward pass."""

        # Initialize
        basis_functions = [self._constant_basis()]

        # Iterate
        for m in range(1, max_terms):
            # Find best split
            split = self._find_best_split(basis_functions, X, y)

            if split is None:
                break

            # Add pair
            basis_functions.extend(split.pair)

        return basis_functions

BackwardPass Class

File: pymars/model.py

Responsibility: Implement backward pruning

class BackwardPass:
    """GCV-based pruning."""

    def run(self, basis_functions, X, y, penalty):
        """Execute backward pass."""

        sequence = []  # Models at each step

        while len(basis_functions) > 1:
            # Try removing each basis
            best_removal = self._find_best_removal(
                basis_functions, X, y
            )

            # Remove
            basis_functions.remove(best_removal)

            # Compute GCV
            gcv = self.gcv_calc.calculate(...)
            sequence.append((basis_functions, gcv))

        return sequence

GCVCalculator Class

File: pymars/gcv.py

Responsibility: GCV computation

class GCVCalculator:
    """Compute GCV scores."""

    def __init__(self, penalty=3.0):
        self.penalty = penalty

    def calculate(self, B, y):
        """GCV = RSS / [N(1 - df/N)²]"""

        c = solve_least_squares(B, y)
        rss = np.sum((y - B @ c)**2)
        df = self.complexity(B)

        N = len(y)
        return rss / (N * (1 - df/N)**2)

    def complexity(self, B):
        """df = trace(B(B^T B)^-1 B^T) + d*M"""

        BtB_inv = np.linalg.pinv(B.T @ B)
        trace_term = np.trace(B @ BtB_inv @ B.T)

        return trace_term + self.penalty * B.shape[1]

Development Workflow

Setting Up Development Environment

# Clone repository
git clone https://github.com/abder111/pymars.git
cd pymars

# Create virtual environment
python -m venv dev_env
source dev_env/bin/activate  # Linux/macOS
# OR
dev_env\Scripts\activate  # Windows

# Install in editable mode with dev deps
pip install -e ".[dev,plot]"

Running Tests

# Run all tests
pytest tests/ -v

# Run specific test
pytest tests/test_mars.py::TestMARS::test_fit_predict -v

# With coverage
pytest tests/ --cov=pymars --cov-report=html

# Open coverage report
open htmlcov/index.html

Code Quality

# Format code
black pymars/ tests/

# Check style
flake8 pymars/ tests/ --max-line-length=88

# Type checking
mypy pymars/

Building Documentation

cd docs
pip install sphinx sphinx-rtd-theme sphinxcontrib-bibtex
make html

# View locally
open _build/html/index.html

Adding a New Feature

Example: Adding L1 Regularization

Step 1: Create new module

touch pymars/regularization.py

Step 2: Implement class

# pymars/regularization.py

class L1PenalizedMARSprinciples:
    """MARS with L1 penalty on coefficients."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def solve(self, B, y):
        """Solve with L1 + L2 (elastic net)."""
        from sklearn.linear_model import ElasticNet

        en = ElasticNet(alpha=self.alpha, l1_ratio=0.5)
        en.fit(B, y)
        return en.coef_

Step 3: Add tests

# tests/test_regularization.py

import pytest
from pymars.regularization import L1PenalizedMARS

def test_l1_mars():
    """Test L1-regularized MARS."""

    X, y = generate_test_data()
    model = L1PenalizedMARS(alpha=0.1)
    # ... test implementation

Step 4: Update docs

# docs/advanced_topics.rst

L1 Regularization
------------------

PyMARS supports optional L1 regularization...

Example:

.. code-block:: python

   model = L1PenalizedMARS(alpha=0.1)

Step 5: Update __init__.py

# pymars/__init__.py

from .mars import MARS
from .regularization import L1PenalizedMARS  # NEW

__all__ = ['MARS', 'L1PenalizedMARS']

Testing Guidelines

Test Structure

# tests/test_module.py

import pytest
import numpy as np
from pymars import MARS

@pytest.fixture
def sample_data():
    """Provide test data."""
    X = np.random.randn(100, 5)
    y = X[:, 0] + np.sin(X[:, 1])
    return X, y

class TestMARS:
    """Tests for MARS class."""

    def test_fit_predict(self, sample_data):
        """Test basic fit/predict."""
        X, y = sample_data

        model = MARS(max_terms=10)
        model.fit(X, y)
        y_pred = model.predict(X)

        assert y_pred.shape == y.shape
        assert not np.isnan(y_pred).any()

Test Categories

  1. Unit Tests – Single function/method

  2. Integration Tests – Multiple components together

  3. Regression Tests – Known results (Friedman dataset)

  4. Edge Case Tests – Boundary conditions

  5. Performance Tests – Speed benchmarks (optional)

Writing Good Tests

Good:

def test_minspan_increases_with_sample_size():
    """Minspan should increase as N grows."""

    L_small = calculate_minspan(N=10, alpha=0.05)
    L_large = calculate_minspan(N=1000, alpha=0.05)

    assert L_large > L_small

Bad:

def test_utils():
    """Test utils."""
    calculate_minspan(100, 0.05)  # No assertion!

Documentation Standards

Docstring Format

Use NumPy style docstrings:

def solve_least_squares(B, y):
    """Solve least squares system robustly.

    Uses QR decomposition with fallback to Cholesky
    and pseudoinverse for numerical stability.

    Parameters
    ----------
    B : ndarray
        Design matrix (n_samples, n_features).
    y : ndarray
        Target vector (n_samples,).

    Returns
    -------
    c : ndarray
        Coefficients (n_features,).

    Notes
    -----
    Uses three-tier fallback:
    1. QR (preferred, numerically stable)
    2. Cholesky (medium stability)
    3. Pseudoinverse (always works)

    Raises
    ------
    ValueError
        If dimensions don't match.

    Examples
    --------
    >>> B = np.array([[1, 0], [0, 1], [1, 1]])
    >>> y = np.array([1, 2, 3])
    >>> c = solve_least_squares(B, y)
    >>> np.allclose(B @ c, y, atol=1e-10)
    True
    """

Code Style

  • 4-space indentation

  • Max 88 characters per line (Black standard)

  • PEP 8 compliance (flake8)

  • Type hints where helpful

def forward_pass(
    X: np.ndarray,
    y: np.ndarray,
    max_terms: int = 30
) -> list:
    """Run forward pass.

    Parameters
    ----------
    X : ndarray
        Features (n, d).
    y : ndarray
        Target (n,).
    max_terms : int
        Maximum iterations.

    Returns
    -------
    basis_functions : list
        Fitted basis functions.
    """

Contributing

How to Contribute

  1. Fork the repository

  2. Create a feature branch: git checkout -b feature/my-feature

  3. Make changes and add tests

  4. Format code: black pymars/ tests/

  5. Test thoroughly: pytest tests/ -v

  6. Commit with clear message: git commit -m "Add feature X"

  7. Push to branch: git push origin feature/my-feature

  8. Submit pull request

Commit Message Guidelines

Short description (max 50 chars)

Longer explanation if needed. Describe:
- What was changed
- Why it was changed
- How it works

Fixes #123  (reference issues)

Performance Optimization

Profiling

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# ... code to profile ...

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)

Common Bottlenecks

  1. Least squares solver - O(N²M) per iteration

  2. Knot search - Quadratic in candidate count

  3. Design matrix construction - For large M

Optimization Ideas

  • Cache basis function evaluations

  • Vectorize knot search (NumPy broadcasting)

  • Use sparse matrices for large M

  • Parallelize variable search

Release Checklist

Before releasing version X.Y.Z:

  • [ ] All tests pass locally

  • [ ] Code review completed

  • [ ] Documentation updated

  • [ ] Changelog updated

  • [ ] Version number bumped

  • [ ] Commit tagged: git tag v0.X.Y

  • [ ] Built distributions: python -m build

  • [ ] Verified distributions: twine check dist/*

  • [ ] Pushed tag: git push origin v0.X.Y

  • [ ] Released on GitHub

  • [ ] Uploaded to PyPI (when ready)

Troubleshooting Development

ImportError: No module ‘pymars’

pip install -e .

Test Failures

# Clear cache
pytest --cache-clear tests/

# Run with verbose output
pytest tests/ -vv

Documentation Build Fails

cd docs
rm -rf _build/
pip install -r requirements.txt
make html

Roadmap

Planned Improvements

  • v0.2: Categorical support, missing values

  • v0.3: Multi-output, classification, parallelization

  • v1.0: Stable API, performance optimizations

  • 1.x: Extended features, domain extensions

Contributing to Roadmap

See GitHub Issues and Discussions for community input on priorities.

Questions?

Thank you for contributing to PyMARS! 🎉