Frequently Asked Questions¶

General Questions¶

What is COLA?¶

COLA (COunterfactual explanations with Limited Actions) is a Python framework that refines counterfactual explanations by reducing the number of feature changes needed while maintaining the same outcome.

Key benefits:

Reduces feature changes by 30-70%
Works with any ML model
Compatible with various CF explainers (DiCE, DisCount, Alibi, etc.)
Provides rich visualizations
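To make "reducing the number of feature changes" concrete, the sketch below counts how many features differ from the factual instance before and after a refinement. It is a toy illustration only: the rows are hypothetical plain dicts, `changed_features` is a helper invented here rather than part of COLA, and the sketch does not check that the refined row still changes the model's prediction.

```python
# Toy illustration only: plain dicts stand in for rows of a DataFrame.
factual = {"age": 35, "income": 40000, "hours": 40, "education": 12}
original_cf = {"age": 35, "income": 52000, "hours": 45, "education": 16}
refined_cf = {"age": 35, "income": 52000, "hours": 40, "education": 12}


def changed_features(before, after):
    """Return the names of features whose values differ between two rows."""
    return [name for name in before if before[name] != after[name]]


n_original = len(changed_features(factual, original_cf))  # income, hours, education
n_refined = len(changed_features(factual, refined_cf))    # income only
reduction = 1 - n_refined / n_original
print(f"{n_original} -> {n_refined} changes ({reduction:.0%} fewer)")
```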

When should I use COLA?¶

Use COLA when:

You need actionable counterfactual explanations
Users complain CFs require too many changes
You want to provide minimal action plans
You need to compare different CF methods
You want theoretically grounded refinements

Installation & Setup¶

How do I install COLA?¶

pip install xai-cola

See Installation for detailed instructions.

Which Python versions are supported?¶

Python versions 3.8 through 3.12 are supported.
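If you want a script to fail early on an unsupported interpreter, a check along these lines may help. The bounds are copied from the answer above; `is_supported` is a hypothetical helper written for this sketch, not something COLA provides.

```python
import sys

# Supported range taken from the answer above.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside 3.8-3.12")
```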

Do I need PyTorch or TensorFlow?¶

No, they’re optional. COLA works with scikit-learn by default. Install PyTorch/TensorFlow only if you’re using models from those frameworks.
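To see which of these frameworks are importable in the current environment, something like the following sketch can be used. It relies only on the standard library; the mapping of display names to import names is the usual one for these packages, and `available_backends` is a hypothetical helper, not a COLA function.

```python
from importlib.util import find_spec

# Import names for the frameworks mentioned above.
BACKENDS = {"scikit-learn": "sklearn", "PyTorch": "torch", "TensorFlow": "tensorflow"}


def available_backends():
    """Map each framework's display name to whether it can be imported."""
    return {name: find_spec(module) is not None for name, module in BACKENDS.items()}


for name, installed in available_backends().items():
    print(f"{name}: {'installed' if installed else 'not installed'}")
```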

Data & Models¶

What data format does COLA accept?¶

COLA accepts:

Pandas DataFrame (recommended)
NumPy arrays (must provide column names)

# Option 1: DataFrame (easiest)
data = COLAData(factual_data=df, label_column='target')

# Option 2: NumPy array
data = COLAData(
    factual_data=X,
    label_column='target',
    column_names=['feature1', 'feature2', 'target']
)
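A bare NumPy array carries no column labels, which is why names must be supplied alongside it. The sketch below shows the equivalent conversion to a DataFrame with pandas, using hypothetical data; it does not call COLAData, and whether COLA performs the conversion this way internally is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two features plus the label in the last column.
X = np.array([[25.0, 1.0, 0], [40.0, 3.0, 1], [31.0, 2.0, 0]])
column_names = ["feature1", "feature2", "target"]

# The names must match the array's column count and order.
assert X.shape[1] == len(column_names)

df = pd.DataFrame(X, columns=column_names)
df["target"] = df["target"].astype(int)  # the array stored the labels as floats
print(df.dtypes)
```

Once the data is a DataFrame, Option 1 applies and the column names no longer need to be passed separately.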

Which ML frameworks are supported?¶

✅ Scikit-learn - All classifiers
✅ PyTorch - Neural networks
✅ TensorFlow 1.x & 2.x - Keras models
✅ Any custom model with predict() and predict_proba() methods
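For the last item, a custom model only needs to expose the two methods named above. The class below is a minimal, hypothetical example that follows the scikit-learn convention for their return shapes; how such an object is registered with COLA (for example, which backend value to pass) is not covered here and should be checked against the API reference.

```python
import numpy as np


class ThresholdModel:
    """Hypothetical binary classifier: class 1 when the first feature exceeds a threshold."""

    def __init__(self, threshold=0.5, sharpness=10.0):
        self.threshold = threshold
        self.sharpness = sharpness

    def predict_proba(self, X):
        """Return an (n_samples, 2) array of [P(class 0), P(class 1)]."""
        X = np.asarray(X, dtype=float)
        p1 = 1.0 / (1.0 + np.exp(-self.sharpness * (X[:, 0] - self.threshold)))
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        """Return the most probable class label for each row."""
        return self.predict_proba(X).argmax(axis=1)


model = ThresholdModel()
print(model.predict([[0.2, 1.0], [0.9, 1.0]]))
```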

Can I use COLA with regression models?¶

Currently, COLA is designed for classification tasks only. Regression support is planned for future releases.

Does COLA work with multi-class classification?¶

Yes! COLA supports both binary and multi-class classification.

# Works for any number of classes
explainer.generate_counterfactuals(
    data=data,
    factual_class=2,  # Any class
    total_cfs=2
)

Counterfactual Generation¶

Which CF explainers can I use?¶

COLA includes:

DiCE - Instance-wise CFs
DisCount - Distributional CFs

External explainers also work:

Alibi (CounterfactualProto, etc.)
Any custom explainer that outputs DataFrames

# Use any explainer, then refine with COLA
cf_df = your_explainer.generate(...)
data.add_counterfactuals(cf_df)
sparsifier = COLA(data=data, ml_model=ml_model)

How many counterfactuals should I generate per instance?¶

Recommendation:

Start with total_cfs=1 for speed
Use total_cfs=2-3 for better quality
Use total_cfs=5+ for maximum diversity

# Balance between quality and speed
explainer.generate_counterfactuals(
    data=data,
    total_cfs=2  # Good default
)

More CFs give COLA more options, leading to potentially better refinements.

What if no counterfactuals are found?¶

Common causes:

Too many immutable features
Too strict feature ranges
Model is very confident

Solutions:

# Solution 1: Relax constraints
explainer.generate_counterfactuals(
    features_to_keep=['Age'],  # Fewer immutable features
    permitted_range={'Income': [0, None]}  # Wider range
)

# Solution 2: Increase total_cfs
explainer.generate_counterfactuals(total_cfs=5)

# Solution 3: Explain a different factual class
explainer.generate_counterfactuals(factual_class=0)  # Try a different class

Visualization¶

How do I save visualizations?¶

import os
os.makedirs('./results', exist_ok=True)

# Save all visualizations
sparsifier.heatmap_direction(save_path='./results')
sparsifier.stacked_bar_chart(save_path='./results')

# Highlighted DataFrames to HTML
_, ce_style, ace_style = sparsifier.highlight_changes_final()
ce_style.to_html('./results/original_cf.html')
ace_style.to_html('./results/refined_ace.html')

What do the visualization colors mean?¶

Direction Heatmap:

🟦 Blue = Feature increased
🟧 Red/Orange = Feature decreased
⬜ White = No change

Binary Heatmap:

⬛ Black = Feature changed
⬜ White = No change

Highlighted DataFrames:

🟦 Blue background = Value increased
🟧 Orange background = Value decreased
⬜ White background = No change
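The legend above amounts to taking the sign of the difference between the counterfactual and factual values. The sketch below illustrates that mapping for numeric features on hypothetical rows; it is not COLA's plotting code, and categorical features would need a different comparison.

```python
def direction(factual_value, cf_value):
    """Return +1 if the value increased, -1 if it decreased, 0 if unchanged."""
    return (cf_value > factual_value) - (cf_value < factual_value)


# Hypothetical rows: one factual instance and its counterfactual.
factual = {"age": 35, "income": 40000, "hours": 40}
counterfactual = {"age": 35, "income": 52000, "hours": 36}

COLORS = {1: "blue", -1: "orange", 0: "white"}  # legend from the lists above
cells = {name: COLORS[direction(factual[name], counterfactual[name])] for name in factual}
print(cells)
```

The binary heatmap corresponds to the simpler test `direction(...) != 0`.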

How can I customize figure sizes?¶

# For papers (high resolution)
fig = sparsifier.heatmap_direction(
    figsize=(10, 6),
    dpi=300
)

# For presentations (larger)
fig = sparsifier.stacked_bar_chart(
    figsize=(16, 10),
    dpi=150
)
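When choosing these values it can help to know the resulting image size in pixels. Assuming `figsize` is given in inches and `dpi` in dots per inch, as in Matplotlib (the source does not state this explicitly), the arithmetic is a simple product. `pixel_size` is a helper written for this sketch.

```python
def pixel_size(figsize, dpi):
    """Convert a (width, height) size in inches to pixels at the given DPI."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


print(pixel_size((10, 6), 300))   # paper settings from above -> (3000, 1800)
print(pixel_size((16, 10), 150))  # presentation settings from above -> (2400, 1500)
```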

Performance¶

How long does COLA take to run?¶

Typical timings:

DiCE generation: 1-30 seconds (depends on data size)
COLA refinement with ECT: <1 second
COLA refinement with OT: 1-10 seconds
Visualization: <1 second

For 100 instances:

# Fast (ECT matcher)
sparsifier.set_policy(matcher="ect", attributor="pshap")
# ~2 seconds total

# Best quality (OT matcher)
sparsifier.set_policy(matcher="ot", attributor="pshap")
# ~5 seconds total
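The figures above are indicative, so it is worth measuring on your own data. The sketch below is one way to time a step with the standard library; the `timed` context manager is a helper defined here, and the summation is only a stand-in workload to be replaced with the calls you want to measure.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("refinement", timings):
    # Stand-in workload; replace with the set_policy / refine calls being measured.
    sum(i * i for i in range(100_000))

print(f"refinement: {timings['refinement']:.3f} s")
```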

How can I speed up COLA?¶

Option 1: Use faster matcher

# ECT is much faster than OT
sparsifier.set_policy(matcher="ect", attributor="pshap")

Option 2: Reduce data size

# Process subset
data = COLAData(factual_data=df.head(100), label_column='target')

Option 3: Generate fewer CFs per instance

# 1 CF per instance (fastest)
explainer.generate_counterfactuals(total_cfs=1)

Can COLA handle large datasets?¶

Yes, though the right approach depends on the number of instances:

<1000 instances: No problem with any matcher
1000-10000 instances: Use ECT or NN matcher
>10000 instances: Batch processing recommended

# Batch processing for large datasets
import pandas as pd

batch_size = 1000
all_refined = []

for i in range(0, len(df), batch_size):
    batch_df = df.iloc[i:i+batch_size]
    # ... process batch_df and assign the result to `refined` ...
    all_refined.append(refined)

final_refined = pd.concat(all_refined)
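The batching logic itself can be checked independently of COLA. In this sketch, `batch_bounds` and `process_batch` are hypothetical helpers, and a plain list stands in for the DataFrame; the point is only to show that the slices cover every instance exactly once, including a final partial batch.

```python
def batch_bounds(n_rows, batch_size):
    """Yield (start, stop) index pairs that cover range(n_rows) without overlap."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


def process_batch(rows):
    """Hypothetical placeholder for the per-batch refinement step."""
    return list(rows)


rows = list(range(2500))  # stand-in for 2,500 instances
all_refined = []
for start, stop in batch_bounds(len(rows), 1000):
    all_refined.extend(process_batch(rows[start:stop]))

assert len(all_refined) == len(rows)  # no instance dropped or duplicated
```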

Troubleshooting¶

Error: “Counterfactual data not set”¶

Cause: Counterfactuals were not added to the COLAData object before COLA was created.

Solution:

# Must add CFs before creating COLA
data.add_counterfactuals(cf_df, with_target_column=True)
sparsifier = COLA(data=data, ml_model=ml_model)  # Now works!

Error: “Must call set_policy before refining”¶

Cause: refine_counterfactuals() was called before set_policy().

Solution:

# Always set policy first
sparsifier.set_policy(matcher="ot", attributor="pshap")
refined = sparsifier.refine_counterfactuals(limited_actions=5)

Error: “Column mismatch between factual and counterfactual”¶

Cause: The counterfactual DataFrame does not have the same columns as the factual DataFrame.

Solution:

# Verify columns match
print("Factual columns:", data.factual_df.columns.tolist())
print("CF columns:", cf_df.columns.tolist())

# Ensure they're the same (order doesn't matter)
assert set(data.factual_df.columns) == set(cf_df.columns)
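A bare assertion reports that the columns differ but not which ones. The sketch below, using a hypothetical helper named `compare_columns` and made-up column lists, reports the differences in both directions so the offending columns are easy to spot.

```python
def compare_columns(factual_columns, cf_columns):
    """Return (missing, extra): columns absent from, or unexpected in, the CF data."""
    factual_set, cf_set = set(factual_columns), set(cf_columns)
    missing = [c for c in factual_columns if c not in cf_set]
    extra = [c for c in cf_columns if c not in factual_set]
    return missing, extra


# Hypothetical column lists; here the counterfactuals lack the label column.
factual_columns = ["age", "income", "target"]
cf_columns = ["income", "age"]

missing, extra = compare_columns(factual_columns, cf_columns)
print("Missing from CF:", missing)
print("Unexpected in CF:", extra)
```

With pandas objects, pass `df.columns.tolist()` for each argument.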

Visualizations are blank/all white¶

Cause: No features actually changed.

Solution:

# Check if any changes occurred
factual, ce, ace = sparsifier.get_all_results(limited_actions=5)
changes = (factual != ace).sum().sum()
print(f"Total changes: {changes}")

if changes == 0:
    # Increase limited_actions
    refined = sparsifier.refine_counterfactuals(limited_actions=10)

Best Practices¶

What’s the recommended workflow?¶

# 1. Prepare data with numerical features specified
data = COLAData(
    factual_data=df,
    label_column='target',
    numerical_features=['feature1', 'feature2']  # Explicit!
)

# 2. Use pipeline for model (recommended)
from sklearn.pipeline import Pipeline
pipe = Pipeline([('prep', preprocessor), ('clf', classifier)])
pipe.fit(X_train, y_train)
ml_model = Model(model=pipe, backend="sklearn")

# 3. Generate multiple CFs per instance
explainer.generate_counterfactuals(total_cfs=2)

# 4. Always add CFs before COLA
data.add_counterfactuals(cf, with_target_column=True)

# 5. Use OT for best quality
sparsifier.set_policy(matcher="ot", attributor="pshap", random_state=42)

# 6. Query minimum actions
min_actions = sparsifier.query_minimum_actions()

# 7. Refine
refined = sparsifier.refine_counterfactuals(limited_actions=min_actions)

# 8. Visualize
sparsifier.heatmap_direction(save_path='./results')
sparsifier.stacked_bar_chart(save_path='./results')

How do I ensure reproducible results?¶

# Set all random seeds
import random
import numpy as np

random.seed(42)
np.random.seed(42)

# Set random_state in COLA
sparsifier.set_policy(
    matcher="ot",
    attributor="pshap",
    random_state=42  # Reproducibility!
)
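A quick way to confirm that seeding behaves as expected is to draw the same sequence twice. This sketch uses only the standard library generator and a hypothetical `sample` helper; it says nothing about which generators COLA uses internally, which is why passing random_state explicitly is still recommended.

```python
import random


def sample(seed, n=5):
    """Draw n pseudo-random numbers after seeding the stdlib generator."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# Same seed, same sequence; a different seed gives a different one.
assert sample(42) == sample(42)
assert sample(42) != sample(43)
```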

Should I use a pipeline or separate preprocessing?¶

Recommended: Use Pipeline

# ✅ Best practice
pipe = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', classifier)
])
pipe.fit(X_train, y_train)
ml_model = Model(model=pipe, backend="sklearn")

Alternative: PreprocessorWrapper

# If you must separate them
X_train_prep = preprocessor.fit_transform(X_train)
classifier.fit(X_train_prep, y_train)

ml_model = PreprocessorWrapper(
    model=classifier,
    backend="sklearn",
    preprocessor=preprocessor
)
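The pipeline snippet above leaves `preprocessor` and `classifier` undefined. As a self-contained illustration of why the pipeline form is convenient, the sketch below builds one with scikit-learn on synthetic data; the choice of StandardScaler and LogisticRegression is an assumption made for the example, and the COLA Model wrapper is intentionally left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with very different feature scales; the label depends on feature 0.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 3)) * [1.0, 100.0, 0.01]
y_train = (X_train[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("preprocessor", StandardScaler()),
    ("classifier", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# Raw, unscaled rows go straight in; the pipeline applies the scaling itself.
proba = pipe.predict_proba(X_train[:5])
print(proba.shape)
```

Because the fitted pipeline accepts raw feature values, counterfactuals expressed in the original feature space can be scored without a separate transform step.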

Advanced Topics¶

Can I implement custom matchers?¶

Yes! Inherit from BaseCounterfactualMatchingPolicy:

from xai_cola.ce_sparsifier.policies.matching import (
    BaseCounterfactualMatchingPolicy
)

class MyCustomMatcher(BaseCounterfactualMatchingPolicy):
    def match(self, factual_df, cf_df):
        # Your matching logic
        return matching_dict
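The matching logic itself is up to you. As one illustration, the sketch below pairs each factual row with its nearest counterfactual row by Euclidean distance. It is deliberately standalone: it does not subclass BaseCounterfactualMatchingPolicy, it works on plain tuples rather than DataFrames, and the index-to-index dict it returns is an assumption, since the structure COLA expects from match() is not described here.

```python
import math


def nearest_neighbor_match(factual_rows, cf_rows):
    """Map each factual row index to the index of its closest counterfactual row."""
    matching = {}
    for i, factual in enumerate(factual_rows):
        distances = [math.dist(factual, cf) for cf in cf_rows]
        matching[i] = distances.index(min(distances))
    return matching


# Hypothetical numeric rows (already scaled to comparable ranges).
factual_rows = [(0.1, 0.2), (0.9, 0.8)]
cf_rows = [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5)]
print(nearest_neighbor_match(factual_rows, cf_rows))
```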

Can I use COLA with time-series data?¶

COLA is designed for tabular data. For time-series, you’d need to:

Extract tabular features from time-series
Use those features with COLA
Map refined CFs back to time-series
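The first step above could look something like the following sketch, which collapses each series into a handful of summary statistics. The choice of features and the `summarize_series` helper are illustrative assumptions; the final step, mapping a refined counterfactual back to a full series, is problem-specific and not shown.

```python
from statistics import mean, pstdev


def summarize_series(values):
    """Collapse one time series into a flat dict of tabular features."""
    n = len(values)
    slope = (values[-1] - values[0]) / (n - 1) if n > 1 else 0.0
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
        "slope": slope,  # average change per step, from first to last value
    }


# Hypothetical series: one entry per instance.
series_by_instance = {"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 5.0, 5.0, 5.0]}
table = {key: summarize_series(vals) for key, vals in series_by_instance.items()}
print(table["a"])
```

The resulting dict of dicts can be turned into a DataFrame with one row per instance and passed to COLA like any other tabular data.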

How does COLA compare to other CF methods?¶

COLA is a refinement method, not a generation method. It works on top of existing CF generators to make them more actionable.

Comparison:

DiCE alone: Generates diverse CFs (may require many changes)
COLA + DiCE: Refines DiCE’s CFs to require fewer changes
Result: Same or similar outcome with 30-70% fewer actions

Getting Help¶

Where can I find more examples?¶

Tutorial 1: Basic COLA Workflow - Complete tutorial
GitHub examples/ directory
Data Interface - Detailed guides

How do I report bugs?¶

Check existing GitHub Issues
Open a new issue with:

Python version
COLA version (print(xai_cola.__version__))
Minimal code to reproduce
Full error message
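The version details for a report can be gathered in one go. This sketch uses only the standard library and looks up the distribution name from the pip command given earlier (xai-cola); `environment_report` is a hypothetical helper, not part of COLA.

```python
import platform
from importlib import metadata


def environment_report(package="xai-cola"):
    """Collect the version details a bug report usually needs."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        package: package_version,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```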

How can I contribute?¶

See Contributing for contribution guidelines.

Who do I contact for questions?¶

Lin Zhu: s232291@student.dtu.dk
Lei You: leiyo@dtu.dk
GitHub Issues: https://github.com/understanding-ml/COLA/issues

Citation¶

If you use COLA in your research, please cite:

@article{you2024refining,
  title={Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality},
  author={You, Lei and Bian, Yijun and Cao, Lele},
  journal={arXiv preprint arXiv:2410.05419},
  year={2024}
}
