Models API¶

Module contents¶

class Model (model, backend, preprocessor=None)

Bases: object

Class wrapper for pre-trained models with optional preprocessing pipeline.

Provides a unified interface for machine learning models from different frameworks (scikit-learn and PyTorch) with consistent predict() and predict_proba() methods.

Parameters:

model (object) – Pre-trained model (sklearn or PyTorch model). Can also be a sklearn Pipeline object.

backend (str) – Model framework identifier. Options: "sklearn" or "pytorch".

preprocessor (sklearn.compose.ColumnTransformer or sklearn.pipeline.Pipeline*, **optional*) – Preprocessing pipeline to apply before model prediction. If model is already a Pipeline, this parameter will be ignored. Default is None.

Raises:
ValueError – If backend is not one of the supported frameworks.

Note:

For sklearn models, if your model is a Pipeline, the Model class will automatically detect it and handle preprocessing internally. The preprocessor parameter is only needed when you have a separate preprocessor and classifier.

Example:
from xai_cola.ce_sparsifier.models import Model
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Option 1: Model with separate preprocessor
scaler = StandardScaler()
scaler.fit(X_train)
clf = GradientBoostingClassifier()
clf.fit(scaler.transform(X_train), y_train)

model = Model(model=clf, backend="sklearn", preprocessor=scaler)

# Option 2: Model with Pipeline (recommended)
pipe = Pipeline([("scaler", StandardScaler()), ("clf", GradientBoostingClassifier())])
pipe.fit(X_train, y_train)

model = Model(model=pipe, backend="sklearn")

# Option 3: PyTorch model
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 2)
        )

    def forward(self, x):
        return self.layers(x)

torch_model = NeuralNet()
# ... train model ...

model = Model(model=torch_model, backend="pytorch")
predict (x_factual)
Generate predictions using the pre-trained model.

Parameters:
x_factual (np.ndarray or pd.DataFrame) – Input data for prediction (raw data)

Returns:
Predictions

Return type:
np.ndarray

Example:
model = Model(model=pipe, backend="sklearn")
predictions = model.predict(X_test)
predict_proba (X)
Predict probability function that returns the probability distribution for each class.

Parameters:
X (np.ndarray or pd.DataFrame) – Input data for which to predict probabilities (raw data)

Returns:
Probability of positive class (class 1) for binary classification

Return type:
np.ndarray

Example:
model = Model(model=pipe, backend="sklearn")
probabilities = model.predict_proba(X_test)
to (device)
Move PyTorch model to specified device.

Parameters:
device (str or torch.device) – Device to move model to (e.g., ‘cuda’, ‘cpu’)

Returns:
Returns self for method chaining

Return type:
Model

Raises:
NotImplementedError – If backend is not ‘pytorch’

Example:
model = Model(model=torch_model, backend="pytorch")
model.to("cuda")  # Move to GPU
predictions = model.predict(X_test)
__call__ (x)
Call the model directly (mainly for PyTorch models).

Parameters:
x (torch.Tensor or np.ndarray) – Input data

Returns:
Model output

Return type:
torch.Tensor or np.ndarray

Raises:
NotImplementedError – If backend is not ‘pytorch’

Example:
model = Model(model=torch_model, backend="pytorch")
import torch
x = torch.randn(32, 10)
output = model(x)  # Direct call

Supported Backends¶

COLA supports the following machine learning frameworks:

Scikit-learn¶

Backend string: "sklearn"

Requirements:

scikit-learn installed
Model must have predict() and predict_proba() methods

Supported Model Types:

All scikit-learn classifiers (RandomForest, GradientBoosting, LogisticRegression, etc.)
Scikit-learn Pipeline objects (automatically detected and handled)
Any custom classifier implementing the scikit-learn interface

Key Features:

Automatic Pipeline detection
Handles both raw features and preprocessed data
Supports separate preprocessor parameter for non-Pipeline models

PyTorch¶

Backend string: "pytorch"

Requirements:

PyTorch installed
Model must be a torch.nn.Module
Model should output logits or probabilities for classification

Key Features:

GPU support via .to(device) method
Automatic tensor conversion from numpy arrays/pandas DataFrames
Direct model call support via __call__() method
Handles NaN values in predictions

Note:

For PyTorch models, the Model class automatically:

Converts input to torch.FloatTensor
Sets model to evaluation mode
Handles device placement (CPU/GPU)
Applies softmax for probability predictions

Examples¶

Basic Usage with sklearn Pipeline¶

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from xai_cola.ce_sparsifier.models import Model

# Create and train pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier())
])
pipe.fit(X_train, y_train)

# Wrap for COLA
ml_model = Model(model=pipe, backend="sklearn")

# Use with COLA
from xai_cola import COLA
from xai_cola.ce_sparsifier.data import COLAData

data = COLAData(factual_data=df, label_column='Risk')
data.add_counterfactuals(cf_df)

cola = COLA(data=data, ml_model=ml_model)

sklearn with Separate Preprocessor¶

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from xai_cola.ce_sparsifier.models import Model

# Fit preprocessor
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Train classifier on scaled data
clf = LogisticRegression()
clf.fit(X_train_scaled, y_train)

# Wrap both
ml_model = Model(
    model=clf,
    backend="sklearn",
    preprocessor=scaler
)

# Automatically scales input before prediction
predictions = ml_model.predict(X_test)  # X_test is raw data

sklearn with ColumnTransformer¶

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from sklearn.ensemble import GradientBoostingClassifier
from xai_cola.ce_sparsifier.models import Model

# Define custom preprocessor
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['Age', 'Income']),
    ('cat', OrdinalEncoder(), ['Gender', 'Education'])
])
preprocessor.fit(X_train)

# Train classifier on transformed data
X_train_transformed = preprocessor.transform(X_train)
clf = GradientBoostingClassifier()
clf.fit(X_train_transformed, y_train)

# Wrap both
ml_model = Model(
    model=clf,
    backend="sklearn",
    preprocessor=preprocessor
)

# Use with raw data
predictions = ml_model.predict(X_test)

PyTorch Neural Network (CPU)¶

import torch
import torch.nn as nn
from xai_cola.ce_sparsifier.models import Model

class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 2)
        )

    def forward(self, x):
        return self.layers(x)

# Train model
model = NeuralNet(input_dim=10)
# ... training code ...

# Wrap for COLA
ml_model = Model(model=model, backend="pytorch")

# Predictions work with numpy arrays or DataFrames
predictions = ml_model.predict(X_test)
probabilities = ml_model.predict_proba(X_test)

PyTorch Neural Network (GPU)¶

import torch
import torch.nn as nn
from xai_cola.ce_sparsifier.models import Model

# Define and train model
model = NeuralNet(input_dim=10)
# ... training code on GPU ...

# Wrap for COLA
ml_model = Model(model=model, backend="pytorch")

# Move to GPU
ml_model.to("cuda")

# Predictions automatically handle device placement
predictions = ml_model.predict(X_test)

# Or use directly with tensors
x_tensor = torch.randn(32, 10).to("cuda")
output = ml_model(x_tensor)

PyTorch with Preprocessing¶

import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler
from xai_cola.ce_sparsifier.models import Model

# Train preprocessor
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Train PyTorch model on scaled data
model = NeuralNet(input_dim=X_train_scaled.shape[1])
# ... training code using X_train_scaled ...

# Wrap both
ml_model = Model(
    model=model,
    backend="pytorch",
    preprocessor=scaler
)

# Automatically scales input before prediction
predictions = ml_model.predict(X_test)  # X_test is raw data

Comparing Different Model Types¶

from xai_cola import COLA
from xai_cola.ce_sparsifier.data import COLAData
from xai_cola.ce_sparsifier.models import Model

# Prepare data
data = COLAData(factual_data=df, label_column='Risk')
data.add_counterfactuals(cf_df)

# Scikit-learn model
sklearn_model = Model(model=sklearn_pipe, backend="sklearn")
cola_sklearn = COLA(data=data, ml_model=sklearn_model)
sklearn_results = cola_sklearn.get_refined_counterfactual(limited_actions=10)

# PyTorch model
pytorch_model = Model(model=torch_model, backend="pytorch")
cola_pytorch = COLA(data=data, ml_model=pytorch_model)
pytorch_results = cola_pytorch.get_refined_counterfactual(limited_actions=10)

# Both work seamlessly with COLA!