Models API¶
Module contents¶
class Model (model, backend, preprocessor=None)
Bases:
objectClass wrapper for pre-trained models with optional preprocessing pipeline.
Provides a unified interface for machine learning models from different frameworks (scikit-learn and PyTorch) with consistent
predict()andpredict_proba()methods.
- Parameters:
model (object) – Pre-trained model (sklearn or PyTorch model). Can also be a sklearn Pipeline object.
backend (str) – Model framework identifier. Options:
"sklearn"or"pytorch".preprocessor (sklearn.compose.ColumnTransformer or sklearn.pipeline.Pipeline*, **optional*) – Preprocessing pipeline to apply before model prediction. If model is already a Pipeline, this parameter will be ignored. Default is None.
- Raises:
ValueError – If backend is not one of the supported frameworks.
Note:
For sklearn models, if your model is a Pipeline, the Model class will automatically detect it and handle preprocessing internally. The preprocessor parameter is only needed when you have a separate preprocessor and classifier.
Example:
from xai_cola.ce_sparsifier.models import Model from sklearn.ensemble import GradientBoostingClassifier from sklearn.preprocessing import StandardScaler from sklearn.pipeline import Pipeline # Option 1: Model with separate preprocessor scaler = StandardScaler() scaler.fit(X_train) clf = GradientBoostingClassifier() clf.fit(scaler.transform(X_train), y_train) model = Model(model=clf, backend="sklearn", preprocessor=scaler) # Option 2: Model with Pipeline (recommended) pipe = Pipeline([("scaler", StandardScaler()), ("clf", GradientBoostingClassifier())]) pipe.fit(X_train, y_train) model = Model(model=pipe, backend="sklearn") # Option 3: PyTorch model import torch.nn as nn class NeuralNet(nn.Module): def __init__(self): super().__init__() self.layers = nn.Sequential( nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2) ) def forward(self, x): return self.layers(x) torch_model = NeuralNet() # ... train model ... model = Model(model=torch_model, backend="pytorch")predict (x_factual)
Generate predictions using the pre-trained model.
- Parameters:
x_factual (np.ndarray or pd.DataFrame) – Input data for prediction (raw data)
- Returns:
Predictions
- Return type:
np.ndarray
Example:
model = Model(model=pipe, backend="sklearn") predictions = model.predict(X_test)predict_proba (X)
Predict probability function that returns the probability distribution for each class.
- Parameters:
X (np.ndarray or pd.DataFrame) – Input data for which to predict probabilities (raw data)
- Returns:
Probability of positive class (class 1) for binary classification
- Return type:
np.ndarray
Example:
model = Model(model=pipe, backend="sklearn") probabilities = model.predict_proba(X_test)to (device)
Move PyTorch model to specified device.
- Parameters:
device (str or torch.device) – Device to move model to (e.g., ‘cuda’, ‘cpu’)
- Returns:
Returns self for method chaining
- Return type:
Model
- Raises:
NotImplementedError – If backend is not ‘pytorch’
Example:
model = Model(model=torch_model, backend="pytorch") model.to("cuda") # Move to GPU predictions = model.predict(X_test)__call__ (x)
Call the model directly (mainly for PyTorch models).
- Parameters:
x (torch.Tensor or np.ndarray) – Input data
- Returns:
Model output
- Return type:
torch.Tensor or np.ndarray
- Raises:
NotImplementedError – If backend is not ‘pytorch’
Example:
model = Model(model=torch_model, backend="pytorch") import torch x = torch.randn(32, 10) output = model(x) # Direct call
Supported Backends¶
COLA supports the following machine learning frameworks:
Scikit-learn¶
Backend string: "sklearn"
Requirements:
scikit-learn installed
Model must have
predict()andpredict_proba()methods
Supported Model Types:
All scikit-learn classifiers (RandomForest, GradientBoosting, LogisticRegression, etc.)
Scikit-learn Pipeline objects (automatically detected and handled)
Any custom classifier implementing the scikit-learn interface
Key Features:
Automatic Pipeline detection
Handles both raw features and preprocessed data
Supports separate preprocessor parameter for non-Pipeline models
PyTorch¶
Backend string: "pytorch"
Requirements:
PyTorch installed
Model must be a
torch.nn.ModuleModel should output logits or probabilities for classification
Key Features:
GPU support via
.to(device)methodAutomatic tensor conversion from numpy arrays/pandas DataFrames
Direct model call support via
__call__()methodHandles NaN values in predictions
Note:
For PyTorch models, the Model class automatically:
Converts input to torch.FloatTensor
Sets model to evaluation mode
Handles device placement (CPU/GPU)
Applies softmax for probability predictions
Examples¶
Basic Usage with sklearn Pipeline¶
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from xai_cola.ce_sparsifier.models import Model
# Create and train pipeline
pipe = Pipeline([
('scaler', StandardScaler()),
('clf', RandomForestClassifier())
])
pipe.fit(X_train, y_train)
# Wrap for COLA
ml_model = Model(model=pipe, backend="sklearn")
# Use with COLA
from xai_cola import COLA
from xai_cola.ce_sparsifier.data import COLAData
data = COLAData(factual_data=df, label_column='Risk')
data.add_counterfactuals(cf_df)
cola = COLA(data=data, ml_model=ml_model)
sklearn with Separate Preprocessor¶
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from xai_cola.ce_sparsifier.models import Model
# Fit preprocessor
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Train classifier on scaled data
clf = LogisticRegression()
clf.fit(X_train_scaled, y_train)
# Wrap both
ml_model = Model(
model=clf,
backend="sklearn",
preprocessor=scaler
)
# Automatically scales input before prediction
predictions = ml_model.predict(X_test) # X_test is raw data
sklearn with ColumnTransformer¶
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.ensemble import GradientBoostingClassifier
from xai_cola.ce_sparsifier.models import Model
# Define custom preprocessor
preprocessor = ColumnTransformer([
('num', StandardScaler(), ['Age', 'Income']),
('cat', OrdinalEncoder(), ['Gender', 'Education'])
])
preprocessor.fit(X_train)
# Train classifier on transformed data
X_train_transformed = preprocessor.transform(X_train)
clf = GradientBoostingClassifier()
clf.fit(X_train_transformed, y_train)
# Wrap both
ml_model = Model(
model=clf,
backend="sklearn",
preprocessor=preprocessor
)
# Use with raw data
predictions = ml_model.predict(X_test)
PyTorch Neural Network (CPU)¶
import torch
import torch.nn as nn
from xai_cola.ce_sparsifier.models import Model
class NeuralNet(nn.Module):
def __init__(self, input_dim):
super().__init__()
self.layers = nn.Sequential(
nn.Linear(input_dim, 64),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(64, 32),
nn.ReLU(),
nn.Linear(32, 2)
)
def forward(self, x):
return self.layers(x)
# Train model
model = NeuralNet(input_dim=10)
# ... training code ...
# Wrap for COLA
ml_model = Model(model=model, backend="pytorch")
# Predictions work with numpy arrays or DataFrames
predictions = ml_model.predict(X_test)
probabilities = ml_model.predict_proba(X_test)
PyTorch Neural Network (GPU)¶
import torch
import torch.nn as nn
from xai_cola.ce_sparsifier.models import Model
# Define and train model
model = NeuralNet(input_dim=10)
# ... training code on GPU ...
# Wrap for COLA
ml_model = Model(model=model, backend="pytorch")
# Move to GPU
ml_model.to("cuda")
# Predictions automatically handle device placement
predictions = ml_model.predict(X_test)
# Or use directly with tensors
x_tensor = torch.randn(32, 10).to("cuda")
output = ml_model(x_tensor)
PyTorch with Preprocessing¶
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler
from xai_cola.ce_sparsifier.models import Model
# Train preprocessor
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Train PyTorch model on scaled data
model = NeuralNet(input_dim=X_train_scaled.shape[1])
# ... training code using X_train_scaled ...
# Wrap both
ml_model = Model(
model=model,
backend="pytorch",
preprocessor=scaler
)
# Automatically scales input before prediction
predictions = ml_model.predict(X_test) # X_test is raw data
Comparing Different Model Types¶
from xai_cola import COLA
from xai_cola.ce_sparsifier.data import COLAData
from xai_cola.ce_sparsifier.models import Model
# Prepare data
data = COLAData(factual_data=df, label_column='Risk')
data.add_counterfactuals(cf_df)
# Scikit-learn model
sklearn_model = Model(model=sklearn_pipe, backend="sklearn")
cola_sklearn = COLA(data=data, ml_model=sklearn_model)
sklearn_results = cola_sklearn.get_refined_counterfactual(limited_actions=10)
# PyTorch model
pytorch_model = Model(model=torch_model, backend="pytorch")
cola_pytorch = COLA(data=data, ml_model=pytorch_model)
pytorch_results = cola_pytorch.get_refined_counterfactual(limited_actions=10)
# Both work seamlessly with COLA!