API Reference
Core Module
Main MARS regression class
Public API for Multivariate Adaptive Regression Splines.
- class pymars.mars.MARS(max_terms: int = 30, max_degree: int = 1, penalty: float = 3.0, minspan: str = 'auto', endspan: str = 'auto', alpha: float = 0.05, standardize: bool = True, verbose: bool = True, smooth: bool = False)[source]
Bases:
objectMultivariate Adaptive Regression Splines (MARS)
Nonparametric regression method that automatically selects relevant variables and detects interactions through adaptive basis function construction.
- Parameters:
max_terms (int, default=30) – Maximum number of basis functions to create in forward pass. The final model will typically have fewer due to pruning.
max_degree (int, default=1) – Maximum interaction order (number of variables in a product). - max_degree=1: Additive model (no interactions) - max_degree=2: Pairwise interactions allowed - max_degree=n: Full interactions allowed
penalty (float, default=3.0) – GCV penalty parameter (d in Friedman 1991). - penalty=2: Recommended for additive models - penalty=3: Recommended for general MARS Larger values produce smoother models.
minspan (int or 'auto', default='auto') – Minimum number of observations between knots. ‘auto’ uses Friedman’s formula: -log2(alpha/n) / 2.5
endspan (int or 'auto', default='auto') – Minimum observations from data endpoints. ‘auto’ uses Friedman’s formula: 3 - log2(alpha/n)
alpha (float, default=0.05) – Significance level for automatic span calculation. Only used if minspan or endspan is ‘auto’.
standardize (bool, default=True) – Whether to standardize features before fitting. Recommended for numerical stability.
verbose (bool, default=True) – Whether to print progress messages during fitting.
- basis_functions_
Selected basis functions after pruning
- Type:
- coefficients_
Fitted coefficients for basis functions
- Type:
array
- feature_importances_
Importance score for each feature
- Type:
array
Examples
>>> from pymars import MARS >>> import numpy as np >>> >>> # Generate synthetic data >>> X = np.random.randn(100, 5) >>> y = X[:, 0]**2 + 2*X[:, 1] + np.random.randn(100)*0.1 >>> >>> # Fit MARS model >>> model = MARS(max_terms=20, max_degree=2) >>> model.fit(X, y) >>> >>> # Make predictions >>> y_pred = model.predict(X) >>> >>> # View summary >>> model.summary()
- __init__(max_terms: int = 30, max_degree: int = 1, penalty: float = 3.0, minspan: str = 'auto', endspan: str = 'auto', alpha: float = 0.05, standardize: bool = True, verbose: bool = True, smooth: bool = False)[source]
- fit(X: ndarray, y: ndarray) MARS[source]
Fit MARS model to data
- Parameters:
X (array-like, shape (n_samples, n_features)) – Training input features
y (array-like, shape (n_samples,)) – Training target values
- Returns:
self – Fitted estimator
- Return type:
- predict(X: ndarray) ndarray[source]
Make predictions using fitted MARS model
- Parameters:
X (array-like, shape (n_samples, n_features)) – Input features (in original scale, not standardized)
- Returns:
y_pred – Predicted values
- Return type:
array, shape (n_samples,)
Notes
All knot locations in basis_functions_ are stored in the standardized domain (after mean/std transformation if self.standardize=True). Input X is automatically scaled to match before evaluation.
- score(X: ndarray, y: ndarray) float[source]
Calculate R² score on test data
- Parameters:
X (array-like, shape (n_samples, n_features)) – Test features
y (array-like, shape (n_samples,)) – True values
- Returns:
r2 – R² score
- Return type:
- summary() Dict[source]
Get model summary with statistics and basis functions
- Returns:
summary – Model information including: - n_basis: Number of basis functions - gcv_score: GCV score - train_mse: Training MSE (if available) - train_r2: Training R² (if available) - basis_info: List of basis function descriptions - feature_importances: Feature importance scores
- Return type:
- get_anova_decomposition() Dict[int, List[BasisFunction]][source]
Get ANOVA decomposition of model
Returns functions grouped by interaction order: - order 0: constant - order 1: main effects (single variables) - order 2: two-way interactions - etc.
- Returns:
anova – Maps interaction order to list of basis functions
- Return type:
Basis Functions
Basis functions for MARS
Implements hinge functions and basis function combinations for building multivariate adaptive regression splines.
- class pymars.basis.HingeFunction(variable: int, knot: float, direction: int)[source]
Bases:
objectSingle hinge function: [s(x - t)]+ where s ∈ {-1, +1}
- The function equals:
s(x - t) if s(x - t) > 0
0 otherwise
- Parameters:
- class pymars.basis.BasisFunction(hinges: List[HingeFunction] | None = None, parent_id: int | None = None, basis_id: int | None = None)[source]
Bases:
objectMARS basis function: product of hinge functions
B_m(x) = ∏ [s_k(x_v(k,m) - t_k)]+
- Parameters:
hinges (list of HingeFunction) – Individual hinge functions to multiply
parent_id (int, optional) – ID of parent basis function (for tracking hierarchy)
basis_id (int, optional) – Unique stable ID for this basis (generated if not provided)
- __init__(hinges: List[HingeFunction] | None = None, parent_id: int | None = None, basis_id: int | None = None)[source]
- evaluate(X: ndarray) ndarray[source]
Evaluate basis function on data matrix X
- Parameters:
X (array-like, shape (n_samples, n_features)) – Input data
- Returns:
values – Basis function values
- Return type:
array, shape (n_samples,)
- add_hinge(hinge: HingeFunction) BasisFunction[source]
Create new basis function by adding a hinge
- Parameters:
hinge (HingeFunction) – Hinge to add to product
- Returns:
new_basis – New basis function with added hinge
- Return type:
- class pymars.basis.InteractionConstraint(max_degree: int = 1, allowed_interactions: List[Tuple[int, ...]] | None = None, forbidden_variables: List[int] | None = None)[source]
Bases:
objectManages constraints on variable interactions
- Parameters:
- __init__(max_degree: int = 1, allowed_interactions: List[Tuple[int, ...]] | None = None, forbidden_variables: List[int] | None = None)[source]
- is_valid(basis: BasisFunction, new_variable: int) bool[source]
Check if adding new_variable to basis is valid
- Parameters:
basis (BasisFunction) – Current basis function
new_variable (int) – Variable to potentially add
- Returns:
valid – True if the combination is allowed
- Return type:
- pymars.basis.build_design_matrix(X: ndarray, basis_functions: List[BasisFunction]) ndarray[source]
Build design matrix from basis functions
- Parameters:
X (array-like, shape (n_samples, n_features)) – Input data
basis_functions (list of BasisFunction) – Basis functions to evaluate
- Returns:
B – Design matrix where each column is a basis function evaluation
- Return type:
array, shape (n_samples, n_basis)
Model Selection (GCV)
Generalized Cross-Validation (GCV) for MARS
- Implements the GCV criterion for model selection:
GCV = RSS / [N * (1 - C(M)/N)^2]
Where C(M) is the complexity penalty accounting for both fitted parameters and the adaptive basis function selection.
- class pymars.gcv.GCVCalculator(penalty: float = 3.0)[source]
Bases:
objectCalculate Generalized Cross-Validation scores
- Parameters:
penalty (float, default=3.0) –
Penalty parameter d for each basis function optimization Friedman recommends:
d = 2 for additive models
d = 3 for general MARS
- complexity(n_basis: int, design_matrix: ndarray) float[source]
Calculate complexity cost C(M)
C(M) = trace[B(B’B)^-1 B’] + d*M
The first term is the effective number of parameters in linear fit (trace). The second term accounts for adaptive basis selection.
- calculate(residuals: ndarray, design_matrix: ndarray, n_basis: int) float[source]
Calculate GCV score
GCV = (RSS/N) / (1 - C(M)/N)^2
- class pymars.gcv.CVCalculator(n_folds: int = 10, random_state: int | None = None)[source]
Bases:
objectStandard k-fold cross-validation calculator
- Parameters:
- pymars.gcv.calculate_rss(residuals: ndarray) float[source]
Calculate Residual Sum of Squares
- Parameters:
residuals (array-like) – Residuals from model fit
- Returns:
rss – Sum of squared residuals
- Return type:
Utilities
Utility functions for MARS
Matrix operations, knot selection, and helper functions.
- pymars.utils.solve_least_squares(A: ndarray, b: ndarray, ridge: float = 1e-10) ndarray[source]
Solve least squares problem: min ||Ax - b||^2
Uses np.linalg.lstsq for robustness (handles under/overdetermined systems). Falls back to regularized normal equations if needed.
- Parameters:
A (array, shape (n_samples, n_features)) – Design matrix
b (array, shape (n_samples,)) – Target values
ridge (float, default=1e-10) – Ridge parameter for fallback regularization
- Returns:
x – Least squares solution
- Return type:
array, shape (n_features,)
- pymars.utils.get_candidate_knots(X: ndarray, variable: int, minspan: int = 0, existing_knots: List[float] | None = None) ndarray[source]
Get candidate knot locations for a variable
Selects knot candidates ensuring minimum spacing in sorted data. minspan is the minimum number of sorted samples between consecutive knots.
- Parameters:
- Returns:
knots – Candidate knot locations (unique, sorted)
- Return type:
array
- pymars.utils.calculate_minspan(n_samples: int, alpha: float = 0.05) int[source]
Calculate minimum span between knots
- From Friedman (1991):
L = -log2(alpha/n) / 2.5
- pymars.utils.calculate_endspan(n_samples: int, n_features: int, alpha: float = 0.05) int[source]
Calculate minimum span from endpoints
- From Friedman (1991):
Le = 3 - log2(alpha/n)
- pymars.utils.apply_endspan_constraint(knots: ndarray, x_values: ndarray, endspan: int) ndarray[source]
Remove knots too close to data endpoints
Ensures knots are at least endspan observations from the start and end of the sorted data.
- Parameters:
knots (array) – Candidate knot locations
x_values (array) – Data values for this variable
endspan (int) – Minimum observations from endpoints
- Returns:
valid_knots – Knots satisfying endspan constraint (strictly between boundaries)
- Return type:
array
- pymars.utils.standardize_data(X: ndarray, mean: ndarray | None = None, std: ndarray | None = None) Tuple[ndarray, ndarray, ndarray][source]
Standardize features to zero mean and unit variance
- Parameters:
X (array, shape (n_samples, n_features)) – Input data
mean (array, optional) – Pre-computed means (for transform)
std (array, optional) – Pre-computed standard deviations (for transform)
- Returns:
X_scaled (array) – Standardized data
mean (array) – Feature means
std (array) – Feature standard deviations
- pymars.utils.unstandardize_predictions(y: ndarray, y_mean: float, y_std: float) ndarray[source]
Transform predictions back to original scale
- pymars.utils.linear_extrapolation(x: float, x1: float, y1: float, x2: float, y2: float) float[source]
Linear extrapolation through two points
Model Algorithms
Forward and backward stepwise algorithms for MARS
Implements the core forward selection and backward pruning procedures.
- class pymars.model.ForwardPass(max_terms: int = 30, max_degree: int = 1, minspan: int = 0, endspan: int = 0)[source]
Bases:
objectForward stepwise basis function selection
Builds an overfitted model by iteratively adding basis function pairs that best reduce the residual sum of squares.
- Parameters:
- fit(X: ndarray, y: ndarray) Tuple[List[BasisFunction], ndarray][source]
Run forward pass to build basis function set
- Parameters:
X (array, shape (n_samples, n_features)) – Input features
y (array, shape (n_samples,)) – Target values
- Returns:
basis_functions (list of BasisFunction) – Selected basis functions (including constant)
coefficients (array) – Fitted coefficients
- class pymars.model.BackwardPass(gcv_calculator: GCVCalculator)[source]
Bases:
objectBackward stepwise basis function pruning
Removes basis functions one at a time to optimize GCV score.
- Parameters:
gcv_calculator (GCVCalculator) – GCV calculator instance
- __init__(gcv_calculator: GCVCalculator)[source]
- prune(X: ndarray, y: ndarray, basis_functions: List[BasisFunction], coefficients: ndarray) Tuple[List[BasisFunction], ndarray, float][source]
Prune basis functions to minimize GCV
- Parameters:
X (array, shape (n_samples, n_features)) – Input features
y (array, shape (n_samples,)) – Target values
basis_functions (list) – Initial (overfitted) basis functions
coefficients (array) – Initial coefficients
- Returns:
best_basis (list) – Pruned basis functions
best_coef (array) – Pruned coefficients
best_gcv (float) – Best GCV score achieved
Cubic Extension
Cubic Spline Conversion for MARS
Converts piecewise-linear MARS model to cubic splines with continuous first derivatives.
- class pymars.cubic.CubicHingeFunction(variable: int, knot_center: float, knot_left: float, knot_right: float, direction: int)[source]
Bases:
objectCubic truncated power hinge function with continuous derivatives
C(x; s, t-, t, t+) where: - t is the central knot - t-, t+ are side knots for smoothness
- class pymars.cubic.CubicBasisFunction(cubic_hinges: List[CubicHingeFunction])[source]
Bases:
objectBasis function composed of cubic hinges
- __init__(cubic_hinges: List[CubicHingeFunction])[source]
- pymars.cubic.place_side_knots(central_knots: List[float], x_min: float, x_max: float) Dict[float, Tuple[float, float]][source]
Place side knots (t-, t+) for each central knot
Side knots are placed at midpoints between adjacent central knots
- pymars.cubic.convert_to_cubic(linear_basis: List[BasisFunction], X: ndarray, y: ndarray) Tuple[List[CubicBasisFunction], ndarray][source]
Convert linear MARS model to cubic splines with continuous derivatives
- Parameters:
linear_basis (list of BasisFunction) – Linear basis functions from MARS
X (array) – Training data
y (array) – Training targets
- Returns:
cubic_basis (list of CubicBasisFunction) – Cubic basis functions
coefficients (array) – New coefficients fitted to cubic basis
Plotting
Visualization tools for MARS models
Functions for plotting basis functions, model predictions, and diagnostics.
- pymars.plots.plot_univariate_effects(model, X: ndarray, feature_idx: int, n_points: int = 100, ax: Axes | None = None) Axes[source]
Plot the effect of a single feature on predictions
- pymars.plots.plot_bivariate_effect(model, X: ndarray, feature1: int, feature2: int, n_points: int = 50, plot_type: str = 'contour', ax: Axes | None = None) Axes[source]
Plot interaction effect between two features
- Parameters:
- Returns:
ax
- Return type:
matplotlib axes
- pymars.plots.plot_basis_functions(model, X: ndarray, max_plot: int = 6, figsize: Tuple[int, int] = (12, 8))[source]
Plot individual basis functions
- pymars.plots.plot_feature_importances(model, feature_names: List[str] | None = None, figsize: Tuple[int, int] = (8, 5)) Figure[source]
Bar plot of feature importances
Interactions
Advanced interaction analysis for MARS
Tools for analyzing and visualizing variable interactions in MARS models.
- class pymars.interactions.InteractionAnalyzer(model)[source]
Bases:
objectAnalyze interactions in fitted MARS model
- Parameters:
model (MARS) – Fitted MARS model
- get_interaction_strength() Dict[Tuple[int, ...], float][source]
Calculate strength of each variable interaction
Returns sum of absolute coefficients for each unique variable set.
- Returns:
interactions – Maps variable tuple to interaction strength
- Return type:
- get_pairwise_interactions() ndarray[source]
Get matrix of pairwise interaction strengths
- Returns:
matrix – Symmetric matrix where [i,j] is strength of x_i * x_j interaction
- Return type:
array, shape (n_features, n_features)
- find_pure_additive_effects() List[int][source]
Find variables that only appear in additive terms (degree=1) and never in interactions (degree > 1)
- decompose_prediction(x: ndarray) Dict[str, float][source]
Decompose a single prediction into contributions
- Parameters:
x (array, shape (n_features,)) – Single input vector (in original, non-standardized scale)
- Returns:
contributions – Maps component name to its contribution value
- Return type:
Notes
If model.standardize=True, input x is automatically standardized to match the domain where basis functions were trained.
- interaction_test(var1: int, var2: int, X: ndarray, y: ndarray) float[source]
Test if interaction between two variables improves fit
Compares model with and without the specific interaction.
- hierarchical_interaction_map() Dict[int, Set[Tuple[int, ...]]][source]
Create hierarchical map of interactions
- Returns:
hierarchy – Maps degree to set of variable tuples at that degree
- Return type:
Quick API Summary
MARS Class
Main entry point for all MARS operations.
from pymars import MARS
model = MARS(
max_terms=30,
max_degree=1,
penalty=3.0,
minspan='auto',
endspan='auto',
alpha=0.05,
standardize=True,
smooth=False,
verbose=True
)
model.fit(X, y)
y_pred = model.predict(X)
score = model.score(X_test, y_test)
model.summary()
Key Methods
- fit(X, y)
Fit the MARS model to data.
- param X:
Feature matrix (n_samples, n_features)
- param y:
Target vector (n_samples,)
- predict(X)
Make predictions on new data.
- param X:
Feature matrix
- return:
Predicted values
- score(X, y)
Compute R² score.
- param X:
Feature matrix
- param y:
Target values
- return:
R² score (0 to 1, higher is better)
- summary()
Print detailed model summary.
- get_anova_decomposition()
Get ANOVA decomposition by interaction order.
- return:
Dictionary {order: [basis_functions]}
Key Attributes
- basis_functions_
List of fitted basis functions.
- coefficients_
Fitted model coefficients (length = len(basis_functions_)).
- feature_importances_
Importance score for each feature (0 to 1).
- gcv_score_
Final GCV score of selected model.
- n_features_in_
Number of input features.
- n_basis_functions_
Number of basis functions in final model.
Basis Functions
from pymars.basis import HingeFunction, BasisFunction
# Hinge: h(x, t, d) = max(0, d*(x - t))
hinge = HingeFunction(variable=0, knot=1.5, direction=1)
# Basis: product of hinges
bf = BasisFunction(hinges=[hinge], basis_id=1)
GCV Calculator
from pymars.gcv import GCVCalculator
gcv_calc = GCVCalculator(penalty=3.0)
gcv_score = gcv_calc.calculate(B, y) # Design matrix B, target y
Utilities
from pymars.utils import (
calculate_minspan,
calculate_endspan,
get_candidate_knots,
solve_least_squares
)
L = calculate_minspan(N=200, alpha=0.05)
Le = calculate_endspan(N=200, alpha=0.05)
knots = get_candidate_knots(X_j, minspan=L, endspan=Le)
c = solve_least_squares(B, y)
Cubic Splines
from pymars.cubic import convert_to_cubic
# Convert fitted linear model to cubic
model_cubic = convert_to_cubic(model)
Interactions
from pymars.interactions import InteractionAnalysis
ia = InteractionAnalysis(model)
ia.summary()
effects = ia.get_interaction_effects()
Plotting
from pymars.plots import (
plot_basis_functions,
plot_predictions,
plot_residuals,
plot_feature_importance
)
plot_basis_functions(model, X, y)
plot_feature_importance(model)
Parameter Details
max_terms
Type: int Default: 30 Range: 5–500
Maximum number of basis functions in forward pass. Higher values allow more complex models but increase fitting time.
Small data: 10–20
Medium data: 20–50
Large data: 50–100
max_degree
Type: int Default: 1 Range: 1–5
Maximum interaction order.
1: Main effects only
2: Up to 2-way interactions
3: Up to 3-way interactions
penalty
Type: float Default: 3.0 Range: 2.0–5.0
GCV penalty per basis function. Controls pruning aggressiveness.
2.0: Aggressive (fewer basis functions)
3.0: Standard (balanced)
4.0: Conservative (more basis functions)
minspan & endspan
Type: int or ‘auto’ Default: ‘auto’
Knot spacing constraints. ‘auto’ uses Friedman formulas:
alpha
Type: float Default: 0.05 Range: 0.01–0.10
Significance level for span calculations.
standardize
Type: bool Default: True
Standardize features to mean 0, std 1. Highly recommended for numerical stability.
smooth
Type: bool Default: False
Use cubic splines instead of linear hinges. Produces smoother models with continuous 2nd derivatives.
verbose
Type: bool Default: True
Print iteration progress during fitting.
Common Patterns
Pattern 1: Additive Model
No interactions, simple fit:
model = MARS(
max_terms=20,
max_degree=1, # No interactions
penalty=2.0 # More pruning
)
model.fit(X, y)
Pattern 2: With Interactions
Allow 2-way interactions:
model = MARS(
max_terms=40,
max_degree=2, # 2-way interactions
penalty=3.0
)
model.fit(X, y)
Pattern 3: Smooth Model
Cubic splines:
model = MARS(
max_terms=25,
max_degree=1,
smooth=True # Cubic instead of linear
)
model.fit(X, y)
Pattern 4: High-Dimensional
Feature selection with many variables:
model = MARS(
max_terms=50, # Higher limit
max_degree=1, # Focus on main effects
penalty=2.5 # Medium pruning
)
model.fit(X, y)
# See selected features
important = np.where(model.feature_importances_ > 0)[0]
Exceptions & Errors
ValueError: max_terms must be >= 2
The model needs at least 2 basis functions (constant + 1 hinge).
Solution: Increase max_terms to ≥ 2.
ValueError: max_degree must be >= 1
Interactions require at least degree 1.
Solution: Set max_degree=1.
LinAlgError: Singular matrix
Least-squares matrix is rank-deficient.
- Causes:
Constant feature (all same value)
Collinear features
Too many basis functions
- Solutions:
Remove constant/near-constant features
Standardize (default)
Reduce max_terms
Increase penalty
ConvergenceWarning
Forward pass did not reach max_terms.
Meaning: RSS improvement plateaued before max_terms.
This is normal behavior – indicates model has reached optimal complexity.
Logging & Debugging
Enable verbose output:
model = MARS(verbose=True)
model.fit(X, y)
Check GCV progression:
print(f"GCV Score: {model.gcv_score_:.6f}")
print(f"Training R²: {model.score(X, y):.6f}")
Inspect basis functions:
for i, bf in enumerate(model.basis_functions_):
print(f"B_{i}: {bf}")
print(f" Coefficient: {model.coefficients_[i]:.6f}")
Performance Considerations
Fitting Time
Typical execution times on standard hardware:
Samples |
Features |
Max Terms |
Time |
|---|---|---|---|
100 |
5 |
20 |
~0.5 s |
200 |
10 |
30 |
~2 s |
500 |
15 |
40 |
~15 s |
1000 |
20 |
50 |
~60 s |
Memory Usage
Approximate peak memory (MB):
where N = samples, d = features, M = basis functions.
Prediction Speed
Once fitted, prediction is very fast:
import time
start = time.time()
y_pred = model.predict(X_test) # 10,000 samples
elapsed = time.time() - start
print(f"Time: {elapsed:.3f}s, Rate: {len(X_test)/elapsed:.0f} samples/sec")
Typical: 50,000–100,000 predictions/second.