================
Advanced Topics
================

.. contents::
   :local:
   :backlinks: none

Numerical Solver Details
========================

Least Squares Stability Chain
-----------------------------

PyMARS uses a robust three-tier fallback strategy for solving least squares:

.. code-block:: python

   def solve_least_squares(B, y):
       """
       Solve: min_c ||y - B @ c||_2^2
       
       Uses fallback chain:
       1. QR decomposition (preferred, numerically stable)
       2. Cholesky on B^T B
       3. Pseudoinverse (most robust)
       """
       try:
           # Tier 1: QR decomposition
           Q, R = np.linalg.qr(B, mode='reduced')
           c = np.linalg.solve(R, Q.T @ y)
           
       except np.linalg.LinAlgError:
           try:
               # Tier 2: Cholesky
               L = np.linalg.cholesky(B.T @ B)
               c = np.linalg.solve(
                   L.T,
                   np.linalg.solve(L, B.T @ y)
               )
           except np.linalg.LinAlgError:
               # Tier 3: Pseudoinverse (always works)
               c = np.linalg.pinv(B) @ y
       
       return c

**Why This Matters:**

- QR: Fast, backward stable, preferred
- Cholesky: Medium speed, less stable
- Pseudoinverse: Slowest but always works

Condition Number Management
----------------------------

For ill-conditioned problems:

.. code-block:: python

   import numpy as np
   
   # Check condition number
   BtB = B.T @ B
   cond = np.linalg.cond(BtB)
   
   if cond > 1e10:
       print("WARNING: Ill-conditioned problem")
       print(f"Condition number: {cond:.2e}")
       
       # Solutions:
       # 1. Use pseudoinverse
       c = np.linalg.pinv(B) @ y
       
       # 2. Standardize features (default in MARS)
       # 3. Remove collinear features
       # 4. Increase regularization (not in MARS)

Computational Complexity
=========================

Algorithm Complexity Analysis
-----------------------------

**Forward Pass:**

.. math::

   O(M_{\max} \cdot d \cdot N \cdot |\text{knots}| \cdot N^2)

Breakdown:
- :math:`M_{\max}` iterations
- :math:`d` variables per iteration
- :math:`N` knot candidates per variable
- :math:`N^2` for least squares (per candidate)

**For typical MARS:**

- Forward: :math:`O(30 \times 10 \times 200 \times 30 \times 40000) \approx 7.2 \times 10^{11}` ops
- But: Knot search is pruned, least squares is fast with QR

**Backward Pass:**

.. math::

   O(M \cdot M \cdot N^2)

Faster than forward, dominated by least squares refitting.

**Total:** O(1-2 minutes) for typical dataset (N=1000, d=20)

Memory Usage
------------

Peak memory scales with:

.. math::

   \text{Memory} \approx 8 \times (N \times M + d \times M + \text{workspace})

For N=1000, M=50, d=20:
- Design matrix B: 1000 × 50 = 50k elements = 400 KB
- Coefficients: 50 elements = 400 B
- QR workspace: ~20 MB
- **Total: ~20-30 MB**

Optimization Strategies
=======================

For Large Datasets
-------------------

If N > 10,000:

.. code-block:: python

   model = MARS(
       max_terms=20,          # Reduce complexity
       minspan=50,            # Fewer knots to test
       endspan=10,            # Fewer endpoint candidates
       verbose=True           # Monitor progress
   )

For High-Dimensional Data
---------------------------

If d > 50:

.. code-block:: python

   # 1. Dimension reduction first
   from sklearn.decomposition import PCA
   pca = PCA(n_components=20)
   X_reduced = pca.fit_transform(X)
   
   # 2. Or use sparse MARS
   model = MARS(
       max_terms=30,
       max_degree=1,          # No interactions for speed
       penalty=2.0            # More pruning
   )

For Real-Time Prediction
------------------------

.. code-block:: python

   # Prediction is fast: O(M) per sample
   # M = basis functions (typically < 50)
   
   # Example: 100,000 predictions in < 1 second
   import time
   
   start = time.time()
   y_pred = model.predict(X_test)  # X_test.shape = (100000, 20)
   elapsed = time.time() - start
   
   rate = len(X_test) / elapsed
   print(f"Prediction rate: {rate:.0f} samples/sec")
   # Output: ~200,000-500,000 samples/sec

Caching Strategies
==================

If Refitting Models Repeatedly
-------------------------------

Cache basis function evaluations:

.. code-block:: python

   # Evaluate all bases once
   B_cache = {}
   for bf in model.basis_functions_:
       B_cache[bf.id] = bf.evaluate(X)
   
   # Now comparisons are O(M) instead of O(M*N)
   for penalty in [2.0, 2.5, 3.0]:
       # Use cached B instead of recomputing
       gcv = compute_gcv(B_cache, y, penalty)

Parallel Processing
===================

Knot Search Parallelization
----------------------------

Forward pass can be parallelized:

.. code-block:: python

   from joblib import Parallel, delayed
   
   def search_variable(X_j, y, current_basis):
       """Search all knots for one variable."""
       knots = get_candidate_knots(X_j, minspan, endspan)
       
       best_rss = float('inf')
       best_pair = None
       
       for t in knots:
           # Fit and evaluate pair
           # ... (as in forward pass)
       
       return best_pair, best_rss
   
   # Parallel loop over variables
   results = Parallel(n_jobs=-1)(
       delayed(search_variable)(X[:, j], y, current_basis)
       for j in range(X.shape[1])
   )

**Note:** PyMARS doesn't use built-in parallelization by default to maintain simplicity. Users can add it as shown above.

Debugging & Profiling
======================

Verbose Output
--------------

.. code-block:: python

   model = MARS(verbose=True)
   model.fit(X, y)
   
   # Shows iteration progress

Profiling Fit Time
------------------

.. code-block:: python

   import cProfile
   import pstats
   
   profiler = cProfile.Profile()
   profiler.enable()
   
   model = MARS()
   model.fit(X, y)
   
   profiler.disable()
   stats = pstats.Stats(profiler)
   stats.sort_stats('cumulative')
   stats.print_stats(10)  # Top 10 functions by time

Memory Profiling
----------------

.. code-block:: python

   from memory_profiler import profile
   
   @profile
   def fit_mars():
       model = MARS()
       model.fit(X, y)
       return model
   
   fit_mars()
   
   # Output shows memory usage per line

Edge Cases & Robustness
=======================

Constant Features
-----------------

If a feature has no variance:

.. code-block:: python

   # Detection
   if np.std(X[:, j]) < 1e-10:
       print(f"Feature {j} is constant, removing...")
       X = np.delete(X, j, axis=1)

Single Knot Problem
-------------------

If all data values are identical:

.. code-block:: python

   # Add small noise or handle specially
   if len(np.unique(X[:, j])) == 1:
       # Can't place knots; skip this variable
       pass

Singular Design Matrix
----------------------

If B^T B is singular:

.. code-block:: python

   # Automatic fallback to pseudoinverse
   # (handled in solve_least_squares)
   
   # Or manually use pseudoinverse
   c = np.linalg.pinv(B) @ y

Perfect Fit (RSS = 0)
---------------------

If model fits perfectly (rare):

.. code-block:: python

   # GCV formula becomes 0 / 0
   # Handled gracefully:
   
   if RSS < 1e-10:
       gcv = 0.0  # Perfect fit
   else:
       gcv = RSS / (N * (1 - df/N)**2)

Extension Points
================

Custom Basis Functions
----------------------

Implement your own by subclassing:

.. code-block:: python

   from pymars.basis import HingeFunction
   
   class SineHinge(HingeFunction):
       """Custom: sin(x) instead of hinge"""
       
       def __call__(self, X):
           x = X[:, self.variable]
           return np.sin(x - self.knot)

Custom Loss Functions
---------------------

MARS uses squared loss by default. For custom losses:

.. code-block:: python

   def fit_with_custom_loss(X, y, loss_fn, loss_grad):
       """
       Gradient descent instead of least squares.
       (Not in standard MARS)
       """
       # Iterative refinement using loss_grad
       pass

Custom Penalty Functions
------------------------

Instead of constant penalty :math:`d`:

.. code-block:: python

   def adaptive_penalty(M, n_interactions):
       """Penalty that increases with interactions."""
       base_penalty = 3.0
       interaction_penalty = 0.5 * n_interactions
       return base_penalty + interaction_penalty

Advanced: Fine-Tuning the Algorithm
====================================

Modifying Knot Density
-----------------------

Change how many knots are tested:

.. code-block:: python

   # Default: test all unique sorted values
   # Custom: test every k-th value
   
   def sparse_knot_candidates(X_j, k=5):
       """Test every k-th unique value."""
       unique_vals = np.unique(X_j)
       return unique_vals[::k]

Interaction Constraints
-----------------------

Add domain-specific constraints:

.. code-block:: python

   class ConstrainedMARS(MARS):
       """MARS with interaction constraints."""
       
       def _find_best_split(self, basis, y):
           """Override to add constraints."""
           
           # ... standard search ...
           
           # Check constraint before adding
           if self._is_valid_interaction(basis, new_hinge):
               # Keep candidate
               pass

Algorithm Variants
===================

Boosting MARS
-------------

Stack multiple MARS models on residuals:

.. code-block:: python

   def boost_mars(X, y, n_boosters=5, learning_rate=0.1):
       """Boosted MARS ensemble."""
       
       residuals = y.copy()
       models = []
       
       for i in range(n_boosters):
           model = MARS(max_terms=10)
           model.fit(X, residuals)
           
           pred = model.predict(X)
           residuals = residuals - learning_rate * pred
           
           models.append(model)
       
       return models
   
   def predict_boosted(models, X, learning_rate=0.1):
       """Predict with boosted ensemble."""
       
       y_pred = np.zeros(len(X))
       for model in models:
           y_pred += learning_rate * model.predict(X)
       
       return y_pred

Sparse MARS
-----------

Force sparsity via L1 penalty:

.. code-block:: python

   # Not built-in; requires modifying least squares
   # Use Elastic Net instead of OLS:
   
   from sklearn.linear_model import ElasticNet
   
   def solve_sparse(B, y, alpha=0.1, l1_ratio=0.5):
       """Solve with L1 + L2 penalty."""
       
       en = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
       en.fit(B, y)
       return en.coef_

Summary
=======

**Key Advanced Topics:**

- ✅ Numerical stability via fallback chain
- ✅ Complexity analysis and optimization
- ✅ Memory-efficient computation
- ✅ Caching and parallelization strategies
- ✅ Debugging and profiling tools
- ✅ Extension points for customization
- ✅ Algorithm variants (boosting, sparse)

**When to Use Advanced Features:**

- Large-scale datasets: Optimize memory and time
- Custom applications: Extend with domain knowledge
- Research: Implement algorithm variants
- Production: Add robustness and monitoring