===============================
MARS Algorithms
===============================

.. contents::
   :local:
   :backlinks: none

Overview
========

This section details the three core algorithms in MARS (Friedman 1991):

1. **Algorithm 1** – Recursive partitioning basis generation
2. **Algorithm 2** – Forward pass (greedy basis expansion)
3. **Algorithm 3** – Backward pass (GCV-based pruning)

These are reproduced directly from the paper with implementation notes.

Algorithm 1: Recursive Partitioning
===================================

**Purpose:** Generate candidate basis functions through recursive partitioning.

**Conceptual Algorithm:**

.. code-block:: text

   RECURSIVEPARTITION(data, depth=0):
       if (depth >= max_depth) or (n_samples < min_samples):
           return {constant_function}
       
       best_split = None
       best_gain = 0
       
       for each variable j:
           for each candidate knot t in valid_knots(x_j):
               split_left = data where x_j <= t
               split_right = data where x_j > t
               gain = reduction_in_error(left, right)
               
               if gain > best_gain:
                   best_gain = gain
                   best_split = (j, t)
       
       if best_split is None:
           return {constant_function}
       
       left_basis = RECURSIVEPARTITION(split_left, depth+1)
       right_basis = RECURSIVEPARTITION(split_right, depth+1)
       
       return left_basis ∪ right_basis

**Notes:**

- Forms a binary tree of partitions
- Each partition becomes a basis function candidate
- MARS uses this conceptually but computes it more efficiently

**Connection to Implementation:**

In `pymars/basis.py`, the `HingeFunction` class represents these partitions as products of univariate hinges, avoiding explicit tree structures.

Algorithm 2: Forward Pass
==========================

**Purpose:** Greedily expand basis functions to fit the residuals.

**Formal Statement** (Friedman 1991, Algorithm 2):

.. code-block:: text

   FORWARD(X, y, M_max, d_max, minspan, endspan, α):
       /* Input: N×d data matrix X, response vector y
                 M_max = max basis functions
                 d_max = max interaction degree
                 minspan, endspan = spacing constraints
                 α = significance level
       */
       
       /* Initialize */
       B = {B_1 = 1}                    /* Constant basis */
       coef = solve_least_squares(B, y) /* Fit constant */
       RSS = sum((y - coef)^2)
       RSS_previous = ∞
       
       /* Main loop: add basis function pairs */
       m = 1
       while m < M_max:
           RSS_best = ∞
           pair_best = None
           
           /* For each existing basis function */
           for m' in 1...|B|:
               B_m' = B_{m'}
               
               /* For each variable */
               for j in 1..d:
                   /* Find valid knot locations */
                   knots = valid_knots(X[:, j], minspan, endspan)
                   
                   /* For each knot */
                   for t in knots:
                       /* Check degree constraint */
                       if degree(B_m') < d_max:
                           /* Try both directions */
                           for dir in {+, -}:
                               /* Create hinge pair */
                               h_plus = B_m' * h(X[:, j], t, +)
                               h_minus = B_m' * h(X[:, j], t, -)
                               
                               /* Temporary basis with new pair */
                               B_temp = B ∪ {h_plus, h_minus}
                               
                               /* Fit least squares */
                               coef_temp = solve_least_squares(B_temp, y)
                               RSS_temp = sum((y - coef_temp)^2)
                               
                               /* Track best */
                               if RSS_temp < RSS_best:
                                   RSS_best = RSS_temp
                                   pair_best = (h_plus, h_minus, j, t, dir)
           
           /* Add best pair */
           if pair_best is not None:
               B = B ∪ {pair_best.h_plus, pair_best.h_minus}
               coef = solve_least_squares(B, y)
               RSS = RSS_best
               m = m + 1
           
           /* Check improvement */
           if RSS >= RSS_previous:
               break                    /* No improvement */
           RSS_previous = RSS
       
       return B, coef

**Key Implementation Details:**

.. list-table::
   :header-rows: 1

   * - Component
     - Description
     - Implementation Location
   * - Valid knots
     - Respect minspan/endspan
     - `pymars/utils.py`
     - `get_candidate_knots()`
   * - Hinge function
     - :math:`h(x_j, t, d) = \max(0, d(x_j - t))`
     - `pymars/basis.py`
     - `HingeFunction.__call__()`
   * - Least squares
     - :math:`\hat{c} = (B^T B)^{-1} B^T y`
     - `pymars/utils.py`
     - `solve_least_squares()`
   * - RSS reduction
     - :math:`\Delta RSS = RSS_{\text{old}} - RSS_{\text{new}}`
     - `pymars/model.py`
     - `ForwardPass._find_best_split()`

**Complexity:**

- Per iteration: :math:`O(M \cdot d \cdot N \cdot |\text{knots}|)`
- Total (M iterations): :math:`O(M^2 \cdot d \cdot N \cdot |\text{knots}|)`
- Typically 5–30 iterations, so :math:`O(100 \text{ to } 1000) \times N \times d`

Algorithm 3: Backward Pruning
=============================

**Purpose:** Remove redundant basis functions using GCV criterion.

**Formal Statement** (Friedman 1991, Algorithm 3):

.. code-block:: text

   BACKWARD(B, X, y, d_penalty):
       /* Input: Basis functions B from forward pass
                 Data X, y
                 GCV penalty d_penalty (typically 2–3)
       */
       
       /* Initialize */
       gcv_best = ∞
       sequence = []
       
       /* Pruning loop */
       while |B| > 1:
           /* Try removing each basis */
           gcv_min = ∞
           b_remove = None
           
           for each b in B:
               /* Remove basis b */
               B_temp = B \ {b}
               
               /* Fit model */
               coef = solve_least_squares(B_temp, y)
               
               /* Compute GCV */
               RSS = sum((y - coef)^2)
               df = trace(B_temp * (B_temp^T * B_temp)^{-1} * B_temp^T) 
                    + d_penalty * |B_temp|
               
               gcv = RSS / (N * (1 - df/N)^2)
               
               /* Track minimum */
               if gcv < gcv_min:
                   gcv_min = gcv
                   b_remove = b
           
           /* Remove best candidate */
           if b_remove is not None:
               B = B \ {b_remove}
               
               /* Store state */
               sequence.append((B, gcv_min))
               
               /* Check if improved */
               if gcv_min >= gcv_best:
                   break                /* No improvement; restore previous */
               gcv_best = gcv_min
       
       return sequence

**Select Model:** Choose model with minimum GCV from sequence.

**Key Implementation Details:**

.. list-table::
   :header-rows: 1

   * - Component
     - Description
     - Location
   * - GCV score
     - :math:`\text{GCV} = \frac{\text{RSS}}{N(1 - \text{df}/N)^2}`
     - `pymars/gcv.py`
     - `GCVCalculator.calculate()`
   * - Degrees of freedom
     - :math:`\text{df} = \text{trace}(B(B^T B)^{-1}B^T) + dM`
     - `pymars/gcv.py`
     - `GCVCalculator.complexity()`
   * - Least squares
     - Solve for each candidate removal
     - `pymars/utils.py`
     - `solve_least_squares()`

**Complexity:**

- Per removal: :math:`O(M \cdot N^2 + M^3)` (least squares)
- Total (M removals): :math:`O(M^2 \cdot N^2 + M^4)`
- Typically fast (M ≤ 50)

Pseudocode: Complete MARS Algorithm
====================================

Combining the three algorithms:

.. code-block:: text

   MARS(X, y, M_max, d_max, penalty, minspan, endspan, α):
       
       /* Forward pass */
       B_forward, coef_forward = FORWARD(X, y, M_max, d_max, 
                                          minspan, endspan, α)
       
       /* Backward pass */
       B_sequence = BACKWARD(B_forward, X, y, penalty)
       
       /* Select best model */
       gcv_min = ∞
       B_best = None
       
       for (B_m, gcv_m) in B_sequence:
           if gcv_m < gcv_min:
               gcv_min = gcv_m
               B_best = B_m
       
       /* Fit final model */
       coef_best = solve_least_squares(B_best, y)
       
       return B_best, coef_best, gcv_min

Knot Selection Details
======================

Valid Knot Computation
----------------------

For variable :math:`j`, valid knots respect:

**Minspan constraint:**

.. math::

   \#\{i : x_{ij} < t\} \geq L \quad \text{and} \quad \#\{i : x_{ij} > t\} \geq L

where :math:`L = \lfloor -\log_2(\alpha/N) / 2.5 \rfloor`

**Endspan constraint:**

.. math::

   \min_i \{x_{ij}\} + L_e \leq t \leq \max_i \{x_{ij}\} - L_e

where :math:`L_e = \lceil 3 - \log_2(\alpha/N) \rceil`

**Implementation** (`pymars/utils.py`):

.. code-block:: python

   def get_candidate_knots(X_j, minspan, endspan):
       """
       Returns sorted array of valid knot locations.
       """
       x_min, x_max = X_j.min(), X_j.max()
       
       # Unique values in middle section
       x_sorted = np.sort(X_j)
       n = len(x_sorted)
       
       # Skip minspan obs from each end
       start_idx = minspan
       end_idx = n - minspan - 1
       
       # Apply endspan constraint
       if endspan is not None:
           x_min_valid = x_sorted[endspan - 1]
           x_max_valid = x_sorted[n - endspan]
       else:
           x_min_valid = x_min
           x_max_valid = x_max
       
       # Valid knot candidates
       knots = []
       for i in range(start_idx, end_idx + 1):
           t = (x_sorted[i] + x_sorted[i+1]) / 2
           if x_min_valid <= t <= x_max_valid:
               knots.append(t)
       
       return np.array(knots)

Least Squares Solver
====================

The core operation repeated many times:

.. code-block:: text

   Given: Design matrix B (N × M), response y (N,)
   Solve: min_c ||y - B @ c||_2^2
   Return: c = (B^T B)^{-1} B^T y
   
   Stability chain:
   1. Try: QR decomposition (most stable)
   2. Else: Cholesky on B^T B
   3. Else: Pseudoinverse (most robust)

Implementation (`pymars/utils.py`):

.. code-block:: python

   def solve_least_squares(B, y):
       """Stable least squares solver with fallback chain."""
       try:
           # First try: QR decomposition
           Q, R = np.linalg.qr(B)
           c = np.linalg.solve(R, Q.T @ y)
       except:
           try:
               # Second try: Cholesky
               L = np.linalg.cholesky(B.T @ B)
               c = np.linalg.solve(L.T, 
                   np.linalg.solve(L, B.T @ y))
           except:
               # Last resort: Pseudoinverse
               c = np.linalg.pinv(B) @ y
       return c

GCV Calculation
===============

**Step 1: Fit model**

.. code-block:: python

   coef = solve_least_squares(B, y)
   y_pred = B @ coef

**Step 2: Compute RSS**

.. code-block:: python

   residuals = y - y_pred
   RSS = np.sum(residuals**2)

**Step 3: Compute degrees of freedom**

.. code-block:: python

   # Effective degrees of freedom
   # df = trace(B @ inv(B^T @ B) @ B^T) + d * M
   
   def compute_df(B, d_penalty):
       BtB_inv = np.linalg.pinv(B.T @ B)
       trace_term = np.trace(B @ BtB_inv @ B.T)
       df = trace_term + d_penalty * B.shape[1]
       return df

**Step 4: Compute GCV**

.. code-block:: python

   df = compute_df(B, d_penalty)
   N = B.shape[0]
   gcv = RSS / (N * (1 - df/N)**2)

Implementation Location
=======================

.. list-table::
   :header-rows: 1

   * - Algorithm
     - File
     - Class/Function
   * - Forward Pass
     - `pymars/model.py`
     - `ForwardPass`
   * - Backward Pass
     - `pymars/model.py`
     - `BackwardPass`
     - Combined in `MARS.fit()`
   * - Knot Selection
     - `pymars/utils.py`
     - `get_candidate_knots()`
   * - Minspan/Endspan
     - `pymars/utils.py`
     - `calculate_minspan()`
     - `calculate_endspan()`
   * - Least Squares
     - `pymars/utils.py`
     - `solve_least_squares()`
   * - GCV
     - `pymars/gcv.py`
     - `GCVCalculator`
   * - Basis Functions
     - `pymars/basis.py`
     - `HingeFunction`
     - `BasisFunction`

Numerical Example
=================

Small Example: Friedman 1-D
---------------------------

Data: :math:`y = \sin(\pi x_1) + 0.5 x_2^2`, N=50

**Forward Pass:**

Iteration 1:
   - Try: h(x_1, +, 0.2), h(x_1, -, 0.2)
   - RSS drops from 10.5 to 5.2
   - RSS reduction = 5.3

   Iteration 2:
   - Try: h(x_2, +, 0.0), h(x_2, -, 0.0) [on residual]
   - RSS drops from 5.2 to 2.8
   - RSS reduction = 2.4

   Iteration 3: Continue until RSS plateaus

**Backward Pass:**

   Remove h(x_1, 0.8, -)?
       GCV_without = 1.2
   
   Remove h(x_2, 0.1, +)?
       GCV_without = 3.5 (worse)
   
   Keep all; GCV = 0.8 is minimum

**Result:**

   :math:`\hat{f}(x) = -0.2 + 1.1 h(x_1, +, 0.2) + 0.8 h(x_2, +, 0.1)`

References
==========

See :doc:`references` for full citations.

Primary: Friedman (1991) sections 3.4–3.9.