Frequently Asked Questions¶
General Questions¶
What is COLA?¶
COLA (COunterfactual explanations with Limited Actions) is a Python framework that refines counterfactual explanations by reducing the number of feature changes needed while maintaining the same outcome.
Key benefits:
Reduces feature changes by 30-70%
Works with any ML model
Compatible with various CF explainers (DiCE, DisCount, Alibi, etc.)
Provides rich visualizations
When should I use COLA?¶
Use COLA when:
You need actionable counterfactual explanations
Users complain CFs require too many changes
You want to provide minimal action plans
You need to compare different CF methods
You want theoretically grounded refinements
Installation & Setup¶
How do I install COLA?¶
pip install xai-cola
See Installation for detailed instructions.
Which Python versions are supported?¶
Python 3.8 to 3.12 are supported.
Do I need PyTorch or TensorFlow?¶
No, they’re optional. COLA works with scikit-learn by default. Install PyTorch/TensorFlow only if you’re using models from those frameworks.
Data & Models¶
What data format does COLA accept?¶
COLA accepts:
Pandas DataFrame (recommended)
NumPy arrays (must provide column names)
# Option 1: DataFrame (easiest)
data = COLAData(factual_data=df, label_column='target')
# Option 2: NumPy array
data = COLAData(
factual_data=X,
label_column='target',
column_names=['feature1', 'feature2', 'target']
)
Which ML frameworks are supported?¶
✅ Scikit-learn - All classifiers
✅ PyTorch - Neural networks
✅ TensorFlow 1.x & 2.x - Keras models
✅ Any custom model with
predict()andpredict_proba()methods
Can I use COLA with regression models?¶
Currently, COLA is designed for classification tasks only. Regression support is planned for future releases.
Does COLA work with multi-class classification?¶
Yes! COLA supports both binary and multi-class classification.
# Works for any number of classes
explainer.generate_counterfactuals(
data=data,
factual_class=2, # Any class
total_cfs=2
)
Counterfactual Generation¶
Which CF explainers can I use?¶
COLA includes:
DiCE - Instance-wise CFs
DisCount - Distributional CFs
External explainers also work:
Alibi (CounterfactualProto, etc.)
Any custom explainer that outputs DataFrames
# Use any explainer, then refine with COLA
cf_df = your_explainer.generate(...)
data.add_counterfactuals(cf_df)
sparsifier = COLA(data=data, ml_model=ml_model)
How many counterfactuals should I generate per instance?¶
Recommendation:
Start with
total_cfs=1for speedUse
total_cfs=2-3for better qualityUse
total_cfs=5+for maximum diversity
# Balance between quality and speed
explainer.generate_counterfactuals(
data=data,
total_cfs=2 # Good default
)
More CFs give COLA more options, leading to potentially better refinements.
What if no counterfactuals are found?¶
Common causes:
Too many immutable features
Too strict feature ranges
Model is very confident
Solutions:
# Solution 1: Relax constraints
explainer.generate_counterfactuals(
features_to_keep=['Age'], # Fewer immutable features
permitted_range={'Income': [0, None]} # Wider range
)
# Solution 2: Increase total_cfs
explainer.generate_counterfactuals(total_cfs=5)
# Solution 3: Use different desired_class
explainer.generate_counterfactuals(factual_class=0) # Try different class
COLA Refinement¶
What does “limited_actions” mean?¶
limited_actions specifies the maximum number of features that can be changed in the refined counterfactual.
# Allow up to 5 feature changes
refined = sparsifier.refine_counterfactuals(limited_actions=5)
Lower values = more sparse (fewer changes) Higher values = less restrictive
How do I choose the right limited_actions value?¶
Option 1: Use query_minimum_actions (recommended)
min_actions = sparsifier.query_minimum_actions()
refined = sparsifier.refine_counterfactuals(limited_actions=min_actions)
Option 2: Try different values
for k in [3, 5, 7]:
refined = sparsifier.refine_counterfactuals(limited_actions=k)
# Evaluate results...
Option 3: Domain knowledge
Based on what’s realistic for your use case (e.g., “users can change at most 3 things”).
Which matching policy should I use?¶
Recommendation: Start with ect for exploration, use ot for final results.
# Quick exploration
sparsifier.set_policy(matcher="ect", attributor="pshap")
# Best quality
sparsifier.set_policy(matcher="ot", attributor="pshap")
Can I restrict which features can be modified?¶
Yes! Use features_to_vary:
# Only these features can change
refined = sparsifier.refine_counterfactuals(
limited_actions=5,
features_to_vary=['Income', 'Duration', 'LoanAmount']
)
This is useful for:
Enforcing immutable features (age, gender)
Focusing on actionable features (income, education)
Domain-specific constraints
Visualization¶
How do I save visualizations?¶
import os
os.makedirs('./results', exist_ok=True)
# Save all visualizations
sparsifier.heatmap_direction(save_path='./results')
sparsifier.stacked_bar_chart(save_path='./results')
# Highlighted DataFrames to HTML
_, ce_style, ace_style = sparsifier.highlight_changes_final()
ce_style.to_html('./results/original_cf.html')
ace_style.to_html('./results/refined_ace.html')
What do the visualization colors mean?¶
Direction Heatmap:
🟦 Blue = Feature increased
🟧 Red/Orange = Feature decreased
⬜ White = No change
Binary Heatmap:
⬛ Black = Feature changed
⬜ White = No change
Highlighted DataFrames:
🟦 Blue background = Value increased
🟧 Orange background = Value decreased
⬜ White background = No change
How can I customize figure sizes?¶
# For papers (high resolution)
fig = sparsifier.heatmap_direction(
figsize=(10, 6),
dpi=300
)
# For presentations (larger)
fig = sparsifier.stacked_bar_chart(
figsize=(16, 10),
dpi=150
)
Performance¶
How long does COLA take to run?¶
Typical timings:
DiCE generation: 1-30 seconds (depends on data size)
COLA refinement with ECT: <1 second
COLA refinement with OT: 1-10 seconds
Visualization: <1 second
For 100 instances:
# Fast (ECT matcher)
sparsifier.set_policy(matcher="ect", attributor="pshap")
# ~2 seconds total
# Best quality (OT matcher)
sparsifier.set_policy(matcher="ot", attributor="pshap")
# ~5 seconds total
How can I speed up COLA?¶
Option 1: Use faster matcher
# ECT is much faster than OT
sparsifier.set_policy(matcher="ect", attributor="pshap")
Option 2: Reduce data size
# Process subset
data = COLAData(factual_data=df.head(100), label_column='target')
Option 3: Generate fewer CFs per instance
# 1 CF per instance (fastest)
explainer.generate_counterfactuals(total_cfs=1)
Can COLA handle large datasets?¶
Yes, but with considerations:
<1000 instances: No problem with any matcher
1000-10000 instances: Use ECT or NN matcher
>10000 instances: Batch processing recommended
# Batch processing for large datasets
import pandas as pd
batch_size = 1000
all_refined = []
for i in range(0, len(df), batch_size):
batch_df = df.iloc[i:i+batch_size]
# ... process batch ...
all_refined.append(refined)
final_refined = pd.concat(all_refined)
Troubleshooting¶
Error: “Counterfactual data not set”¶
Cause: Forgot to add counterfactuals to COLAData.
Solution:
# Must add CFs before creating COLA
data.add_counterfactuals(cf_df, with_target_column=True)
sparsifier = COLA(data=data, ml_model=ml_model) # Now works!
Error: “Must call set_policy before refining”¶
Cause: Trying to refine without setting matching policy.
Solution:
# Always set policy first
sparsifier.set_policy(matcher="ot", attributor="pshap")
refined = sparsifier.refine_counterfactuals(limited_actions=5)
Error: “Column mismatch between factual and counterfactual”¶
Cause: Counterfactual DataFrame has different columns than factual.
Solution:
# Verify columns match
print("Factual columns:", data.factual_df.columns.tolist())
print("CF columns:", cf_df.columns.tolist())
# Ensure they're the same (order doesn't matter)
assert set(data.factual_df.columns) == set(cf_df.columns)
Visualizations are blank/all white¶
Cause: No features actually changed.
Solution:
# Check if any changes occurred
factual, ce, ace = sparsifier.get_all_results(limited_actions=5)
changes = (factual != ace).sum().sum()
print(f"Total changes: {changes}")
if changes == 0:
# Increase limited_actions
refined = sparsifier.refine_counterfactuals(limited_actions=10)
Best Practices¶
What’s the recommended workflow?¶
# 1. Prepare data with numerical features specified
data = COLAData(
factual_data=df,
label_column='target',
numerical_features=['feature1', 'feature2'] # Explicit!
)
# 2. Use pipeline for model (recommended)
from sklearn.pipeline import Pipeline
pipe = Pipeline([('prep', preprocessor), ('clf', classifier)])
pipe.fit(X_train, y_train)
ml_model = Model(model=pipe, backend="sklearn")
# 3. Generate multiple CFs per instance
explainer.generate_counterfactuals(total_cfs=2)
# 4. Always add CFs before COLA
data.add_counterfactuals(cf, with_target_column=True)
# 5. Use OT for best quality
sparsifier.set_policy(matcher="ot", attributor="pshap", random_state=42)
# 6. Query minimum actions
min_actions = sparsifier.query_minimum_actions()
# 7. Refine
refined = sparsifier.refine_counterfactuals(limited_actions=min_actions)
# 8. Visualize
sparsifier.heatmap_direction(save_path='./results')
sparsifier.stacked_bar_chart(save_path='./results')
How do I ensure reproducible results?¶
# Set all random seeds
import random
import numpy as np
random.seed(42)
np.random.seed(42)
# Set random_state in COLA
sparsifier.set_policy(
matcher="ot",
attributor="pshap",
random_state=42 # Reproducibility!
)
Should I use a pipeline or separate preprocessing?¶
Recommended: Use Pipeline
# ✅ Best practice
pipe = Pipeline([
('preprocessor', preprocessor),
('classifier', classifier)
])
pipe.fit(X_train, y_train)
ml_model = Model(model=pipe, backend="sklearn")
Alternative: PreprocessorWrapper
# If you must separate them
X_train_prep = preprocessor.fit_transform(X_train)
classifier.fit(X_train_prep, y_train)
ml_model = PreprocessorWrapper(
model=classifier,
backend="sklearn",
preprocessor=preprocessor
)
Advanced Topics¶
Can I implement custom matchers?¶
Yes! Inherit from BaseCounterfactualMatchingPolicy:
from xai_cola.ce_sparsifier.policies.matching import (
BaseCounterfactualMatchingPolicy
)
class MyCustomMatcher(BaseCounterfactualMatchingPolicy):
def match(self, factual_df, cf_df):
# Your matching logic
return matching_dict
Can I use COLA with time-series data?¶
COLA is designed for tabular data. For time-series, you’d need to:
Extract tabular features from time-series
Use those features with COLA
Map refined CFs back to time-series
How does COLA compare to other CF methods?¶
COLA is a refinement method, not a generation method. It works on top of existing CF generators to make them more actionable.
Comparison:
DiCE alone: Generates diverse CFs (may require many changes)
COLA + DiCE: Refines DiCE’s CFs to require fewer changes
Result: Same or similar outcome with 30-70% fewer actions
Getting Help¶
Where can I find more examples?¶
Tutorial 1: Basic COLA Workflow - Complete tutorial
Data Interface - Detailed guides
How do I report bugs?¶
Check existing GitHub Issues
Open a new issue with: - Python version - COLA version (
print(xai_cola.__version__)) - Minimal code to reproduce - Full error message
How can I contribute?¶
See Contributing for contribution guidelines.
Who do I contact for questions?¶
Lin Zhu: s232291@student.dtu.dk
Lei You: leiyo@dtu.dk
GitHub Issues: https://github.com/understanding-ml/COLA/issues
Citation¶
If you use COLA in your research, please cite:
@article{you2024refining,
title={Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality},
author={You, Lei and Bian, Yijun and Cao, Lele},
journal={arXiv preprint arXiv:2410.05419},
year={2024}
}
See Also¶
Installation - Installation guide
Quick Start - Quick start guide
COLA API - API reference
Paper - Research paper