Advanced Topics
Numerical Solver Details
Least Squares Stability Chain
PyMARS uses a robust three-tier fallback strategy for solving least squares:
def solve_least_squares(B, y):
"""
Solve: min_c ||y - B @ c||_2^2
Uses fallback chain:
1. QR decomposition (preferred, numerically stable)
2. Cholesky on B^T B
3. Pseudoinverse (most robust)
"""
try:
# Tier 1: QR decomposition
Q, R = np.linalg.qr(B, mode='reduced')
c = np.linalg.solve(R, Q.T @ y)
except np.linalg.LinAlgError:
try:
# Tier 2: Cholesky
L = np.linalg.cholesky(B.T @ B)
c = np.linalg.solve(
L.T,
np.linalg.solve(L, B.T @ y)
)
except np.linalg.LinAlgError:
# Tier 3: Pseudoinverse (always works)
c = np.linalg.pinv(B) @ y
return c
Why This Matters:
QR: Fast, backward stable, preferred
Cholesky: Medium speed, less stable
Pseudoinverse: Slowest but always works
Condition Number Management
For ill-conditioned problems:
import numpy as np
# Check condition number
BtB = B.T @ B
cond = np.linalg.cond(BtB)
if cond > 1e10:
print("WARNING: Ill-conditioned problem")
print(f"Condition number: {cond:.2e}")
# Solutions:
# 1. Use pseudoinverse
c = np.linalg.pinv(B) @ y
# 2. Standardize features (default in MARS)
# 3. Remove collinear features
# 4. Increase regularization (not in MARS)
Computational Complexity
Algorithm Complexity Analysis
Forward Pass:
Breakdown: - \(M_{\max}\) iterations - \(d\) variables per iteration - \(N\) knot candidates per variable - \(N^2\) for least squares (per candidate)
For typical MARS:
Forward: \(O(30 \times 10 \times 200 \times 30 \times 40000) \approx 7.2 \times 10^{11}\) ops
But: Knot search is pruned, least squares is fast with QR
Backward Pass:
Faster than forward, dominated by least squares refitting.
Total: O(1-2 minutes) for typical dataset (N=1000, d=20)
Memory Usage
Peak memory scales with:
For N=1000, M=50, d=20: - Design matrix B: 1000 × 50 = 50k elements = 400 KB - Coefficients: 50 elements = 400 B - QR workspace: ~20 MB - Total: ~20-30 MB
Optimization Strategies
For Large Datasets
If N > 10,000:
model = MARS(
max_terms=20, # Reduce complexity
minspan=50, # Fewer knots to test
endspan=10, # Fewer endpoint candidates
verbose=True # Monitor progress
)
For High-Dimensional Data
If d > 50:
# 1. Dimension reduction first
from sklearn.decomposition import PCA
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X)
# 2. Or use sparse MARS
model = MARS(
max_terms=30,
max_degree=1, # No interactions for speed
penalty=2.0 # More pruning
)
For Real-Time Prediction
# Prediction is fast: O(M) per sample
# M = basis functions (typically < 50)
# Example: 100,000 predictions in < 1 second
import time
start = time.time()
y_pred = model.predict(X_test) # X_test.shape = (100000, 20)
elapsed = time.time() - start
rate = len(X_test) / elapsed
print(f"Prediction rate: {rate:.0f} samples/sec")
# Output: ~200,000-500,000 samples/sec
Caching Strategies
If Refitting Models Repeatedly
Cache basis function evaluations:
# Evaluate all bases once
B_cache = {}
for bf in model.basis_functions_:
B_cache[bf.id] = bf.evaluate(X)
# Now comparisons are O(M) instead of O(M*N)
for penalty in [2.0, 2.5, 3.0]:
# Use cached B instead of recomputing
gcv = compute_gcv(B_cache, y, penalty)
Parallel Processing
Knot Search Parallelization
Forward pass can be parallelized:
from joblib import Parallel, delayed
def search_variable(X_j, y, current_basis):
"""Search all knots for one variable."""
knots = get_candidate_knots(X_j, minspan, endspan)
best_rss = float('inf')
best_pair = None
for t in knots:
# Fit and evaluate pair
# ... (as in forward pass)
return best_pair, best_rss
# Parallel loop over variables
results = Parallel(n_jobs=-1)(
delayed(search_variable)(X[:, j], y, current_basis)
for j in range(X.shape[1])
)
Note: PyMARS doesn’t use built-in parallelization by default to maintain simplicity. Users can add it as shown above.
Debugging & Profiling
Verbose Output
model = MARS(verbose=True)
model.fit(X, y)
# Shows iteration progress
Profiling Fit Time
import cProfile
import pstats
profiler = cProfile.Profile()
profiler.enable()
model = MARS()
model.fit(X, y)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10) # Top 10 functions by time
Memory Profiling
from memory_profiler import profile
@profile
def fit_mars():
model = MARS()
model.fit(X, y)
return model
fit_mars()
# Output shows memory usage per line
Edge Cases & Robustness
Constant Features
If a feature has no variance:
# Detection
if np.std(X[:, j]) < 1e-10:
print(f"Feature {j} is constant, removing...")
X = np.delete(X, j, axis=1)
Single Knot Problem
If all data values are identical:
# Add small noise or handle specially
if len(np.unique(X[:, j])) == 1:
# Can't place knots; skip this variable
pass
Singular Design Matrix
If B^T B is singular:
# Automatic fallback to pseudoinverse
# (handled in solve_least_squares)
# Or manually use pseudoinverse
c = np.linalg.pinv(B) @ y
Perfect Fit (RSS = 0)
If model fits perfectly (rare):
# GCV formula becomes 0 / 0
# Handled gracefully:
if RSS < 1e-10:
gcv = 0.0 # Perfect fit
else:
gcv = RSS / (N * (1 - df/N)**2)
Extension Points
Custom Basis Functions
Implement your own by subclassing:
from pymars.basis import HingeFunction
class SineHinge(HingeFunction):
"""Custom: sin(x) instead of hinge"""
def __call__(self, X):
x = X[:, self.variable]
return np.sin(x - self.knot)
Custom Loss Functions
MARS uses squared loss by default. For custom losses:
def fit_with_custom_loss(X, y, loss_fn, loss_grad):
"""
Gradient descent instead of least squares.
(Not in standard MARS)
"""
# Iterative refinement using loss_grad
pass
Custom Penalty Functions
Instead of constant penalty \(d\):
def adaptive_penalty(M, n_interactions):
"""Penalty that increases with interactions."""
base_penalty = 3.0
interaction_penalty = 0.5 * n_interactions
return base_penalty + interaction_penalty
Advanced: Fine-Tuning the Algorithm
Modifying Knot Density
Change how many knots are tested:
# Default: test all unique sorted values
# Custom: test every k-th value
def sparse_knot_candidates(X_j, k=5):
"""Test every k-th unique value."""
unique_vals = np.unique(X_j)
return unique_vals[::k]
Interaction Constraints
Add domain-specific constraints:
class ConstrainedMARS(MARS):
"""MARS with interaction constraints."""
def _find_best_split(self, basis, y):
"""Override to add constraints."""
# ... standard search ...
# Check constraint before adding
if self._is_valid_interaction(basis, new_hinge):
# Keep candidate
pass
Algorithm Variants
Boosting MARS
Stack multiple MARS models on residuals:
def boost_mars(X, y, n_boosters=5, learning_rate=0.1):
"""Boosted MARS ensemble."""
residuals = y.copy()
models = []
for i in range(n_boosters):
model = MARS(max_terms=10)
model.fit(X, residuals)
pred = model.predict(X)
residuals = residuals - learning_rate * pred
models.append(model)
return models
def predict_boosted(models, X, learning_rate=0.1):
"""Predict with boosted ensemble."""
y_pred = np.zeros(len(X))
for model in models:
y_pred += learning_rate * model.predict(X)
return y_pred
Sparse MARS
Force sparsity via L1 penalty:
# Not built-in; requires modifying least squares
# Use Elastic Net instead of OLS:
from sklearn.linear_model import ElasticNet
def solve_sparse(B, y, alpha=0.1, l1_ratio=0.5):
"""Solve with L1 + L2 penalty."""
en = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
en.fit(B, y)
return en.coef_
Summary
Key Advanced Topics:
✅ Numerical stability via fallback chain
✅ Complexity analysis and optimization
✅ Memory-efficient computation
✅ Caching and parallelization strategies
✅ Debugging and profiling tools
✅ Extension points for customization
✅ Algorithm variants (boosting, sparse)
When to Use Advanced Features:
Large-scale datasets: Optimize memory and time
Custom applications: Extend with domain knowledge
Research: Implement algorithm variants
Production: Add robustness and monitoring