API Reference

Core Module

Main MARS regression class

Public API for Multivariate Adaptive Regression Splines.

class pymars.mars.MARS(max_terms: int = 30, max_degree: int = 1, penalty: float = 3.0, minspan: str = 'auto', endspan: str = 'auto', alpha: float = 0.05, standardize: bool = True, verbose: bool = True, smooth: bool = False)[source]

Bases: object

Multivariate Adaptive Regression Splines (MARS)

Nonparametric regression method that automatically selects relevant variables and detects interactions through adaptive basis function construction.

Parameters:
  • max_terms (int, default=30) – Maximum number of basis functions to create in forward pass. The final model will typically have fewer due to pruning.

  • max_degree (int, default=1) – Maximum interaction order (number of variables in a product). - max_degree=1: Additive model (no interactions) - max_degree=2: Pairwise interactions allowed - max_degree=n: Full interactions allowed

  • penalty (float, default=3.0) – GCV penalty parameter (d in Friedman 1991). - penalty=2: Recommended for additive models - penalty=3: Recommended for general MARS Larger values produce smoother models.

  • minspan (int or 'auto', default='auto') – Minimum number of observations between knots. ‘auto’ uses Friedman’s formula: -log2(alpha/n) / 2.5

  • endspan (int or 'auto', default='auto') – Minimum observations from data endpoints. ‘auto’ uses Friedman’s formula: 3 - log2(alpha/n)

  • alpha (float, default=0.05) – Significance level for automatic span calculation. Only used if minspan or endspan is ‘auto’.

  • standardize (bool, default=True) – Whether to standardize features before fitting. Recommended for numerical stability.

  • verbose (bool, default=True) – Whether to print progress messages during fitting.

basis_functions_

Selected basis functions after pruning

Type:

list of BasisFunction

coefficients_

Fitted coefficients for basis functions

Type:

array

gcv_score_

Final GCV score

Type:

float

n_features_in_

Number of input features

Type:

int

feature_importances_

Importance score for each feature

Type:

array

Examples

>>> from pymars import MARS
>>> import numpy as np
>>>
>>> # Generate synthetic data
>>> X = np.random.randn(100, 5)
>>> y = X[:, 0]**2 + 2*X[:, 1] + np.random.randn(100)*0.1
>>>
>>> # Fit MARS model
>>> model = MARS(max_terms=20, max_degree=2)
>>> model.fit(X, y)
>>>
>>> # Make predictions
>>> y_pred = model.predict(X)
>>>
>>> # View summary
>>> model.summary()
__init__(max_terms: int = 30, max_degree: int = 1, penalty: float = 3.0, minspan: str = 'auto', endspan: str = 'auto', alpha: float = 0.05, standardize: bool = True, verbose: bool = True, smooth: bool = False)[source]
fit(X: ndarray, y: ndarray) MARS[source]

Fit MARS model to data

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Training input features

  • y (array-like, shape (n_samples,)) – Training target values

Returns:

self – Fitted estimator

Return type:

MARS

predict(X: ndarray) ndarray[source]

Make predictions using fitted MARS model

Parameters:

X (array-like, shape (n_samples, n_features)) – Input features (in original scale, not standardized)

Returns:

y_pred – Predicted values

Return type:

array, shape (n_samples,)

Notes

All knot locations in basis_functions_ are stored in the standardized domain (after mean/std transformation if self.standardize=True). Input X is automatically scaled to match before evaluation.

score(X: ndarray, y: ndarray) float[source]

Calculate R² score on test data

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Test features

  • y (array-like, shape (n_samples,)) – True values

Returns:

r2 – R² score

Return type:

float

summary() Dict[source]

Get model summary with statistics and basis functions

Returns:

summary – Model information including: - n_basis: Number of basis functions - gcv_score: GCV score - train_mse: Training MSE (if available) - train_r2: Training R² (if available) - basis_info: List of basis function descriptions - feature_importances: Feature importance scores

Return type:

dict

get_anova_decomposition() Dict[int, List[BasisFunction]][source]

Get ANOVA decomposition of model

Returns functions grouped by interaction order: - order 0: constant - order 1: main effects (single variables) - order 2: two-way interactions - etc.

Returns:

anova – Maps interaction order to list of basis functions

Return type:

dict

Basis Functions

Basis functions for MARS

Implements hinge functions and basis function combinations for building multivariate adaptive regression splines.

class pymars.basis.HingeFunction(variable: int, knot: float, direction: int)[source]

Bases: object

Single hinge function: [s(x - t)]+ where s ∈ {-1, +1}

The function equals:
  • s(x - t) if s(x - t) > 0

  • 0 otherwise

Parameters:
  • variable (int) – Index of the variable this hinge acts on

  • knot (float) – Knot location t

  • direction (int) – Sign s: +1 for right hinge, -1 for left hinge

__init__(variable: int, knot: float, direction: int)[source]
evaluate(X: ndarray) ndarray[source]

Evaluate hinge function on data matrix X

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data

Returns:

values – Hinge function values

Return type:

array, shape (n_samples,)

class pymars.basis.BasisFunction(hinges: List[HingeFunction] | None = None, parent_id: int | None = None, basis_id: int | None = None)[source]

Bases: object

MARS basis function: product of hinge functions

B_m(x) = ∏ [s_k(x_v(k,m) - t_k)]+

Parameters:
  • hinges (list of HingeFunction) – Individual hinge functions to multiply

  • parent_id (int, optional) – ID of parent basis function (for tracking hierarchy)

  • basis_id (int, optional) – Unique stable ID for this basis (generated if not provided)

__init__(hinges: List[HingeFunction] | None = None, parent_id: int | None = None, basis_id: int | None = None)[source]
property degree: int

Number of hinges (interaction order)

property variables: List[int]

Variables involved in this basis function

evaluate(X: ndarray) ndarray[source]

Evaluate basis function on data matrix X

Parameters:

X (array-like, shape (n_samples, n_features)) – Input data

Returns:

values – Basis function values

Return type:

array, shape (n_samples,)

add_hinge(hinge: HingeFunction) BasisFunction[source]

Create new basis function by adding a hinge

Parameters:

hinge (HingeFunction) – Hinge to add to product

Returns:

new_basis – New basis function with added hinge

Return type:

BasisFunction

get_knot_info() List[Tuple[int, float, int]][source]

Get information about all knots in this basis

Returns:

knots – Each tuple: (variable, knot_value, direction)

Return type:

list of tuples

class pymars.basis.InteractionConstraint(max_degree: int = 1, allowed_interactions: List[Tuple[int, ...]] | None = None, forbidden_variables: List[int] | None = None)[source]

Bases: object

Manages constraints on variable interactions

Parameters:
  • max_degree (int) – Maximum interaction order (number of variables in a product)

  • allowed_interactions (list of tuples, optional) – Specific allowed variable combinations

  • forbidden_variables (list of int, optional) – Variables that cannot interact with others

__init__(max_degree: int = 1, allowed_interactions: List[Tuple[int, ...]] | None = None, forbidden_variables: List[int] | None = None)[source]
is_valid(basis: BasisFunction, new_variable: int) bool[source]

Check if adding new_variable to basis is valid

Parameters:
  • basis (BasisFunction) – Current basis function

  • new_variable (int) – Variable to potentially add

Returns:

valid – True if the combination is allowed

Return type:

bool

pymars.basis.build_design_matrix(X: ndarray, basis_functions: List[BasisFunction]) ndarray[source]

Build design matrix from basis functions

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Input data

  • basis_functions (list of BasisFunction) – Basis functions to evaluate

Returns:

B – Design matrix where each column is a basis function evaluation

Return type:

array, shape (n_samples, n_basis)

Model Selection (GCV)

Generalized Cross-Validation (GCV) for MARS

Implements the GCV criterion for model selection:

GCV = RSS / [N * (1 - C(M)/N)^2]

Where C(M) is the complexity penalty accounting for both fitted parameters and the adaptive basis function selection.

class pymars.gcv.GCVCalculator(penalty: float = 3.0)[source]

Bases: object

Calculate Generalized Cross-Validation scores

Parameters:

penalty (float, default=3.0) –

Penalty parameter d for each basis function optimization Friedman recommends:

  • d = 2 for additive models

  • d = 3 for general MARS

__init__(penalty: float = 3.0)[source]
complexity(n_basis: int, design_matrix: ndarray) float[source]

Calculate complexity cost C(M)

C(M) = trace[B(B’B)^-1 B’] + d*M

The first term is the effective number of parameters in linear fit (trace). The second term accounts for adaptive basis selection.

Parameters:
  • n_basis (int) – Number of basis functions (including constant)

  • design_matrix (array, shape (n_samples, n_cols)) – Design matrix B

Returns:

complexity – Total complexity cost

Return type:

float

calculate(residuals: ndarray, design_matrix: ndarray, n_basis: int) float[source]

Calculate GCV score

GCV = (RSS/N) / (1 - C(M)/N)^2

Parameters:
  • residuals (array, shape (n_samples,)) – Residuals from model fit

  • design_matrix (array, shape (n_samples, n_basis)) – Design matrix

  • n_basis (int) – Number of basis functions (including constant)

Returns:

gcv_score – GCV criterion value (lower is better)

Return type:

float

calculate_from_fit(y_true: ndarray, y_pred: ndarray, design_matrix: ndarray, n_basis: int) float[source]

Calculate GCV from predictions

Parameters:
  • y_true (array, shape (n_samples,)) – True target values

  • y_pred (array, shape (n_samples,)) – Predicted values

  • design_matrix (array, shape (n_samples, n_basis)) – Design matrix

  • n_basis (int) – Number of basis functions

Returns:

gcv_score – GCV criterion value

Return type:

float

class pymars.gcv.CVCalculator(n_folds: int = 10, random_state: int | None = None)[source]

Bases: object

Standard k-fold cross-validation calculator

Parameters:
  • n_folds (int, default=10) – Number of CV folds

  • random_state (int, optional) – Random seed for fold generation

__init__(n_folds: int = 10, random_state: int | None = None)[source]
split_folds(n_samples: int) list[source]

Generate fold indices

Parameters:

n_samples (int) – Number of samples

Returns:

folds – Each tuple contains (train_idx, test_idx)

Return type:

list of tuples

pymars.gcv.calculate_rss(residuals: ndarray) float[source]

Calculate Residual Sum of Squares

Parameters:

residuals (array-like) – Residuals from model fit

Returns:

rss – Sum of squared residuals

Return type:

float

pymars.gcv.calculate_mse(y_true: ndarray, y_pred: ndarray) float[source]

Calculate Mean Squared Error

Parameters:
  • y_true (array-like) – True values

  • y_pred (array-like) – Predicted values

Returns:

mse – Mean squared error

Return type:

float

pymars.gcv.calculate_r2(y_true: ndarray, y_pred: ndarray) float[source]

Calculate R-squared score

Parameters:
  • y_true (array-like) – True values

  • y_pred (array-like) – Predicted values

Returns:

r2 – R-squared score

Return type:

float

Utilities

Utility functions for MARS

Matrix operations, knot selection, and helper functions.

pymars.utils.solve_least_squares(A: ndarray, b: ndarray, ridge: float = 1e-10) ndarray[source]

Solve least squares problem: min ||Ax - b||^2

Uses np.linalg.lstsq for robustness (handles under/overdetermined systems). Falls back to regularized normal equations if needed.

Parameters:
  • A (array, shape (n_samples, n_features)) – Design matrix

  • b (array, shape (n_samples,)) – Target values

  • ridge (float, default=1e-10) – Ridge parameter for fallback regularization

Returns:

x – Least squares solution

Return type:

array, shape (n_features,)

pymars.utils.get_candidate_knots(X: ndarray, variable: int, minspan: int = 0, existing_knots: List[float] | None = None) ndarray[source]

Get candidate knot locations for a variable

Selects knot candidates ensuring minimum spacing in sorted data. minspan is the minimum number of sorted samples between consecutive knots.

Parameters:
  • X (array, shape (n_samples, n_features) or (n_samples,)) – Input data

  • variable (int) – Variable index (0 if X is 1D)

  • minspan (int) – Minimum number of sorted observations between knots

  • existing_knots (list of float, optional) – Already used knot locations to avoid duplicates

Returns:

knots – Candidate knot locations (unique, sorted)

Return type:

array

pymars.utils.calculate_minspan(n_samples: int, alpha: float = 0.05) int[source]

Calculate minimum span between knots

From Friedman (1991):

L = -log2(alpha/n) / 2.5

Parameters:
  • n_samples (int) – Number of samples

  • alpha (float, default=0.05) – Significance level for run resistance

Returns:

minspan – Minimum number of observations between knots

Return type:

int

pymars.utils.calculate_endspan(n_samples: int, n_features: int, alpha: float = 0.05) int[source]

Calculate minimum span from endpoints

From Friedman (1991):

Le = 3 - log2(alpha/n)

Parameters:
  • n_samples (int) – Number of samples

  • n_features (int) – Number of features

  • alpha (float, default=0.05) – Significance level

Returns:

endspan – Minimum observations from endpoints

Return type:

int

pymars.utils.apply_endspan_constraint(knots: ndarray, x_values: ndarray, endspan: int) ndarray[source]

Remove knots too close to data endpoints

Ensures knots are at least endspan observations from the start and end of the sorted data.

Parameters:
  • knots (array) – Candidate knot locations

  • x_values (array) – Data values for this variable

  • endspan (int) – Minimum observations from endpoints

Returns:

valid_knots – Knots satisfying endspan constraint (strictly between boundaries)

Return type:

array

pymars.utils.standardize_data(X: ndarray, mean: ndarray | None = None, std: ndarray | None = None) Tuple[ndarray, ndarray, ndarray][source]

Standardize features to zero mean and unit variance

Parameters:
  • X (array, shape (n_samples, n_features)) – Input data

  • mean (array, optional) – Pre-computed means (for transform)

  • std (array, optional) – Pre-computed standard deviations (for transform)

Returns:

  • X_scaled (array) – Standardized data

  • mean (array) – Feature means

  • std (array) – Feature standard deviations

pymars.utils.unstandardize_predictions(y: ndarray, y_mean: float, y_std: float) ndarray[source]

Transform predictions back to original scale

Parameters:
  • y (array) – Standardized predictions

  • y_mean (float) – Original target mean

  • y_std (float) – Original target std

Returns:

y_original – Predictions in original scale

Return type:

array

pymars.utils.linear_extrapolation(x: float, x1: float, y1: float, x2: float, y2: float) float[source]

Linear extrapolation through two points

Parameters:
  • x (float) – Point to extrapolate to

  • x1 (float) – First point

  • y1 (float) – First point

  • x2 (float) – Second point

  • y2 (float) – Second point

Returns:

y – Extrapolated value

Return type:

float

pymars.utils.cubic_interpolation(x: float, knots: List[float], values: List[float]) float[source]

Piecewise cubic interpolation with continuous derivatives

Parameters:
  • x (float) – Point to interpolate

  • knots (list of float) – Knot locations (sorted)

  • values (list of float) – Function values at knots

Returns:

y – Interpolated value

Return type:

float

Model Algorithms

Forward and backward stepwise algorithms for MARS

Implements the core forward selection and backward pruning procedures.

class pymars.model.ForwardPass(max_terms: int = 30, max_degree: int = 1, minspan: int = 0, endspan: int = 0)[source]

Bases: object

Forward stepwise basis function selection

Builds an overfitted model by iteratively adding basis function pairs that best reduce the residual sum of squares.

Parameters:
  • max_terms (int) – Maximum number of basis functions to create

  • max_degree (int) – Maximum interaction order

  • minspan (int) – Minimum samples between knots

  • endspan (int) – Minimum samples from data endpoints

__init__(max_terms: int = 30, max_degree: int = 1, minspan: int = 0, endspan: int = 0)[source]
fit(X: ndarray, y: ndarray) Tuple[List[BasisFunction], ndarray][source]

Run forward pass to build basis function set

Parameters:
  • X (array, shape (n_samples, n_features)) – Input features

  • y (array, shape (n_samples,)) – Target values

Returns:

  • basis_functions (list of BasisFunction) – Selected basis functions (including constant)

  • coefficients (array) – Fitted coefficients

class pymars.model.BackwardPass(gcv_calculator: GCVCalculator)[source]

Bases: object

Backward stepwise basis function pruning

Removes basis functions one at a time to optimize GCV score.

Parameters:

gcv_calculator (GCVCalculator) – GCV calculator instance

__init__(gcv_calculator: GCVCalculator)[source]
prune(X: ndarray, y: ndarray, basis_functions: List[BasisFunction], coefficients: ndarray) Tuple[List[BasisFunction], ndarray, float][source]

Prune basis functions to minimize GCV

Parameters:
  • X (array, shape (n_samples, n_features)) – Input features

  • y (array, shape (n_samples,)) – Target values

  • basis_functions (list) – Initial (overfitted) basis functions

  • coefficients (array) – Initial coefficients

Returns:

  • best_basis (list) – Pruned basis functions

  • best_coef (array) – Pruned coefficients

  • best_gcv (float) – Best GCV score achieved

Cubic Extension

Cubic Spline Conversion for MARS

Converts piecewise-linear MARS model to cubic splines with continuous first derivatives.

class pymars.cubic.CubicHingeFunction(variable: int, knot_center: float, knot_left: float, knot_right: float, direction: int)[source]

Bases: object

Cubic truncated power hinge function with continuous derivatives

C(x; s, t-, t, t+) where: - t is the central knot - t-, t+ are side knots for smoothness

__init__(variable: int, knot_center: float, knot_left: float, knot_right: float, direction: int)[source]
evaluate(X: ndarray) ndarray[source]

Evaluate cubic hinge function

class pymars.cubic.CubicBasisFunction(cubic_hinges: List[CubicHingeFunction])[source]

Bases: object

Basis function composed of cubic hinges

__init__(cubic_hinges: List[CubicHingeFunction])[source]
property degree: int
property variables: List[int]
evaluate(X: ndarray) ndarray[source]

Evaluate cubic basis function

pymars.cubic.place_side_knots(central_knots: List[float], x_min: float, x_max: float) Dict[float, Tuple[float, float]][source]

Place side knots (t-, t+) for each central knot

Side knots are placed at midpoints between adjacent central knots

Parameters:
  • central_knots (list of float) – Central knot locations (sorted)

  • x_min (float) – Data range on this variable

  • x_max (float) – Data range on this variable

Returns:

side_knots – Maps central_knot -> (knot_left, knot_right)

Return type:

dict

pymars.cubic.convert_to_cubic(linear_basis: List[BasisFunction], X: ndarray, y: ndarray) Tuple[List[CubicBasisFunction], ndarray][source]

Convert linear MARS model to cubic splines with continuous derivatives

Parameters:
  • linear_basis (list of BasisFunction) – Linear basis functions from MARS

  • X (array) – Training data

  • y (array) – Training targets

Returns:

  • cubic_basis (list of CubicBasisFunction) – Cubic basis functions

  • coefficients (array) – New coefficients fitted to cubic basis

pymars.cubic.evaluate_cubic_basis(cubic_basis: List[CubicBasisFunction], X: ndarray) ndarray[source]

Evaluate cubic basis functions on data

Parameters:
  • cubic_basis (list) – Cubic basis functions

  • X (array) – Data to evaluate on

Returns:

B – Design matrix

Return type:

array

Plotting

Visualization tools for MARS models

Functions for plotting basis functions, model predictions, and diagnostics.

pymars.plots.plot_univariate_effects(model, X: ndarray, feature_idx: int, n_points: int = 100, ax: Axes | None = None) Axes[source]

Plot the effect of a single feature on predictions

Parameters:
  • model (MARS) – Fitted MARS model

  • X (array) – Reference data for other features

  • feature_idx (int) – Index of feature to plot

  • n_points (int) – Number of points to evaluate

  • ax (matplotlib axes, optional) – Axes to plot on

Returns:

ax

Return type:

matplotlib axes

pymars.plots.plot_bivariate_effect(model, X: ndarray, feature1: int, feature2: int, n_points: int = 50, plot_type: str = 'contour', ax: Axes | None = None) Axes[source]

Plot interaction effect between two features

Parameters:
  • model (MARS) – Fitted MARS model

  • X (array) – Reference data

  • feature1 (int) – Indices of features to plot

  • feature2 (int) – Indices of features to plot

  • n_points (int) – Grid resolution

  • plot_type (str) – ‘contour’ or ‘surface’

  • ax (matplotlib axes, optional)

Returns:

ax

Return type:

matplotlib axes

pymars.plots.plot_basis_functions(model, X: ndarray, max_plot: int = 6, figsize: Tuple[int, int] = (12, 8))[source]

Plot individual basis functions

Parameters:
  • model (MARS) – Fitted MARS model

  • X (array) – Data to evaluate on

  • max_plot (int) – Maximum number of basis functions to plot

  • figsize (tuple) – Figure size

pymars.plots.plot_feature_importances(model, feature_names: List[str] | None = None, figsize: Tuple[int, int] = (8, 5)) Figure[source]

Bar plot of feature importances

Parameters:
  • model (MARS) – Fitted model

  • feature_names (list of str, optional) – Names for features

  • figsize (tuple) – Figure size

Returns:

fig

Return type:

matplotlib figure

pymars.plots.plot_predictions(model, X: ndarray, y: ndarray, figsize: Tuple[int, int] = (8, 6)) Figure[source]

Scatter plot of predictions vs actual values

Parameters:
  • model (MARS) – Fitted model

  • X (arrays) – Data to predict on

  • y (arrays) – Data to predict on

  • figsize (tuple) – Figure size

Returns:

fig

Return type:

matplotlib figure

pymars.plots.plot_anova_summary(model, figsize: Tuple[int, int] = (10, 6)) Figure[source]

Summary plot of ANOVA decomposition

Parameters:
  • model (MARS) – Fitted model

  • figsize (tuple) – Figure size

Returns:

fig

Return type:

matplotlib figure

Interactions

Advanced interaction analysis for MARS

Tools for analyzing and visualizing variable interactions in MARS models.

class pymars.interactions.InteractionAnalyzer(model)[source]

Bases: object

Analyze interactions in fitted MARS model

Parameters:

model (MARS) – Fitted MARS model

__init__(model)[source]
get_interaction_strength() Dict[Tuple[int, ...], float][source]

Calculate strength of each variable interaction

Returns sum of absolute coefficients for each unique variable set.

Returns:

interactions – Maps variable tuple to interaction strength

Return type:

dict

get_pairwise_interactions() ndarray[source]

Get matrix of pairwise interaction strengths

Returns:

matrix – Symmetric matrix where [i,j] is strength of x_i * x_j interaction

Return type:

array, shape (n_features, n_features)

rank_interactions(top_k: int = 10) List[Tuple][source]

Rank interactions by strength

Parameters:

top_k (int) – Number of top interactions to return

Returns:

ranked – Each tuple: (variables, strength)

Return type:

list of tuples

find_pure_additive_effects() List[int][source]

Find variables that only appear in additive terms (degree=1) and never in interactions (degree > 1)

Returns:

variables – Variable indices with only additive effects (sorted)

Return type:

list of int

decompose_prediction(x: ndarray) Dict[str, float][source]

Decompose a single prediction into contributions

Parameters:

x (array, shape (n_features,)) – Single input vector (in original, non-standardized scale)

Returns:

contributions – Maps component name to its contribution value

Return type:

dict

Notes

If model.standardize=True, input x is automatically standardized to match the domain where basis functions were trained.

interaction_test(var1: int, var2: int, X: ndarray, y: ndarray) float[source]

Test if interaction between two variables improves fit

Compares model with and without the specific interaction.

Parameters:
  • var1 (int) – Variable indices

  • var2 (int) – Variable indices

  • X (arrays) – Test data

  • y (arrays) – Test data

Returns:

improvement – R² improvement from including interaction

Return type:

float

hierarchical_interaction_map() Dict[int, Set[Tuple[int, ...]]][source]

Create hierarchical map of interactions

Returns:

hierarchy – Maps degree to set of variable tuples at that degree

Return type:

dict

compute_h_statistic(var1: int, var2: int, X: ndarray) float[source]

Compute Friedman’s H-statistic for interaction strength

H measures the proportion of variance in the joint effect that cannot be explained by the sum of main effects.

Parameters:
  • var1 (int) – Variable indices

  • var2 (int) – Variable indices

  • X (array) – Input data

Returns:

h_stat – H-statistic (0 = no interaction, 1 = pure interaction)

Return type:

float

pymars.interactions.analyze_interactions_full(model, X: ndarray, threshold: float = 0.01) Dict[source]

Comprehensive interaction analysis

Parameters:
  • model (MARS) – Fitted model

  • X (array) – Input data for analysis

  • threshold (float) – Minimum strength threshold

Returns:

analysis – Complete interaction analysis results

Return type:

dict

pymars.interactions.print_interaction_report(analysis: Dict)[source]

Print formatted interaction analysis report

Parameters:

analysis (dict) – Output from analyze_interactions_full

Quick API Summary

MARS Class

Main entry point for all MARS operations.

from pymars import MARS

model = MARS(
    max_terms=30,
    max_degree=1,
    penalty=3.0,
    minspan='auto',
    endspan='auto',
    alpha=0.05,
    standardize=True,
    smooth=False,
    verbose=True
)
model.fit(X, y)
y_pred = model.predict(X)
score = model.score(X_test, y_test)
model.summary()

Key Methods

fit(X, y)

Fit the MARS model to data.

param X:

Feature matrix (n_samples, n_features)

param y:

Target vector (n_samples,)

predict(X)

Make predictions on new data.

param X:

Feature matrix

return:

Predicted values

score(X, y)

Compute R² score.

param X:

Feature matrix

param y:

Target values

return:

R² score (0 to 1, higher is better)

summary()

Print detailed model summary.

get_anova_decomposition()

Get ANOVA decomposition by interaction order.

return:

Dictionary {order: [basis_functions]}

Key Attributes

basis_functions_

List of fitted basis functions.

coefficients_

Fitted model coefficients (length = len(basis_functions_)).

feature_importances_

Importance score for each feature (0 to 1).

gcv_score_

Final GCV score of selected model.

n_features_in_

Number of input features.

n_basis_functions_

Number of basis functions in final model.

Basis Functions

from pymars.basis import HingeFunction, BasisFunction

# Hinge: h(x, t, d) = max(0, d*(x - t))
hinge = HingeFunction(variable=0, knot=1.5, direction=1)

# Basis: product of hinges
bf = BasisFunction(hinges=[hinge], basis_id=1)

GCV Calculator

from pymars.gcv import GCVCalculator

gcv_calc = GCVCalculator(penalty=3.0)
gcv_score = gcv_calc.calculate(B, y)  # Design matrix B, target y

Utilities

from pymars.utils import (
    calculate_minspan,
    calculate_endspan,
    get_candidate_knots,
    solve_least_squares
)

L = calculate_minspan(N=200, alpha=0.05)
Le = calculate_endspan(N=200, alpha=0.05)
knots = get_candidate_knots(X_j, minspan=L, endspan=Le)
c = solve_least_squares(B, y)

Cubic Splines

from pymars.cubic import convert_to_cubic

# Convert fitted linear model to cubic
model_cubic = convert_to_cubic(model)

Interactions

from pymars.interactions import InteractionAnalysis

ia = InteractionAnalysis(model)
ia.summary()
effects = ia.get_interaction_effects()

Plotting

from pymars.plots import (
    plot_basis_functions,
    plot_predictions,
    plot_residuals,
    plot_feature_importance
)

plot_basis_functions(model, X, y)
plot_feature_importance(model)

Parameter Details

max_terms

Type: int Default: 30 Range: 5–500

Maximum number of basis functions in forward pass. Higher values allow more complex models but increase fitting time.

  • Small data: 10–20

  • Medium data: 20–50

  • Large data: 50–100

max_degree

Type: int Default: 1 Range: 1–5

Maximum interaction order.

  • 1: Main effects only

  • 2: Up to 2-way interactions

  • 3: Up to 3-way interactions

penalty

Type: float Default: 3.0 Range: 2.0–5.0

GCV penalty per basis function. Controls pruning aggressiveness.

  • 2.0: Aggressive (fewer basis functions)

  • 3.0: Standard (balanced)

  • 4.0: Conservative (more basis functions)

minspan & endspan

Type: int or ‘auto’ Default: ‘auto’

Knot spacing constraints. ‘auto’ uses Friedman formulas:

\[L = \lfloor -\log_2(\alpha/N) / 2.5 \rfloor\]
\[L_e = \lceil 3 - \log_2(\alpha/N) \rceil\]

alpha

Type: float Default: 0.05 Range: 0.01–0.10

Significance level for span calculations.

standardize

Type: bool Default: True

Standardize features to mean 0, std 1. Highly recommended for numerical stability.

smooth

Type: bool Default: False

Use cubic splines instead of linear hinges. Produces smoother models with continuous 2nd derivatives.

verbose

Type: bool Default: True

Print iteration progress during fitting.

Common Patterns

Pattern 1: Additive Model

No interactions, simple fit:

model = MARS(
    max_terms=20,
    max_degree=1,  # No interactions
    penalty=2.0    # More pruning
)
model.fit(X, y)

Pattern 2: With Interactions

Allow 2-way interactions:

model = MARS(
    max_terms=40,
    max_degree=2,  # 2-way interactions
    penalty=3.0
)
model.fit(X, y)

Pattern 3: Smooth Model

Cubic splines:

model = MARS(
    max_terms=25,
    max_degree=1,
    smooth=True  # Cubic instead of linear
)
model.fit(X, y)

Pattern 4: High-Dimensional

Feature selection with many variables:

model = MARS(
    max_terms=50,   # Higher limit
    max_degree=1,   # Focus on main effects
    penalty=2.5     # Medium pruning
)
model.fit(X, y)

# See selected features
important = np.where(model.feature_importances_ > 0)[0]

Exceptions & Errors

ValueError: max_terms must be >= 2

The model needs at least 2 basis functions (constant + 1 hinge).

Solution: Increase max_terms to ≥ 2.

ValueError: max_degree must be >= 1

Interactions require at least degree 1.

Solution: Set max_degree=1.

LinAlgError: Singular matrix

Least-squares matrix is rank-deficient.

Causes:
  • Constant feature (all same value)

  • Collinear features

  • Too many basis functions

Solutions:
  • Remove constant/near-constant features

  • Standardize (default)

  • Reduce max_terms

  • Increase penalty

ConvergenceWarning

Forward pass did not reach max_terms.

Meaning: RSS improvement plateaued before max_terms.

This is normal behavior – indicates model has reached optimal complexity.

Logging & Debugging

Enable verbose output:

model = MARS(verbose=True)
model.fit(X, y)

Check GCV progression:

print(f"GCV Score: {model.gcv_score_:.6f}")
print(f"Training R²: {model.score(X, y):.6f}")

Inspect basis functions:

for i, bf in enumerate(model.basis_functions_):
    print(f"B_{i}: {bf}")
    print(f"  Coefficient: {model.coefficients_[i]:.6f}")

Performance Considerations

Fitting Time

Typical execution times on standard hardware:

Samples

Features

Max Terms

Time

100

5

20

~0.5 s

200

10

30

~2 s

500

15

40

~15 s

1000

20

50

~60 s

Memory Usage

Approximate peak memory (MB):

\[\text{Memory} \approx 8 \times N \times (d + M)\]

where N = samples, d = features, M = basis functions.

Prediction Speed

Once fitted, prediction is very fast:

import time

start = time.time()
y_pred = model.predict(X_test)  # 10,000 samples
elapsed = time.time() - start
print(f"Time: {elapsed:.3f}s, Rate: {len(X_test)/elapsed:.0f} samples/sec")

Typical: 50,000–100,000 predictions/second.