Frequently Asked Questions

General Questions

What is COLA?

COLA (COunterfactual explanations with Limited Actions) is a Python framework that refines counterfactual explanations (CFs) by reducing the number of feature changes they require while preserving the desired model outcome.

Key benefits:

  • Typically reduces the number of feature changes by 30-70%

  • Works with any classifier that exposes predict() and predict_proba() (scikit-learn, PyTorch, TensorFlow, or custom)

  • Compatible with various CF explainers (DiCE, DisCount, Alibi, etc.)

  • Provides rich visualizations

When should I use COLA?

Use COLA when:

  • You need actionable counterfactual explanations

  • Users complain CFs require too many changes

  • You want to provide minimal action plans

  • You need to compare different CF methods

  • You want theoretically grounded refinements

Installation & Setup

How do I install COLA?

pip install xai-cola

See Installation for detailed instructions.

Which Python versions are supported?

Python 3.8 to 3.12 are supported.

Do I need PyTorch or TensorFlow?

No, they’re optional. COLA works with scikit-learn by default. Install PyTorch/TensorFlow only if you’re using models from those frameworks.

Data & Models

What data format does COLA accept?

COLA accepts:

  • Pandas DataFrame (recommended)

  • NumPy arrays (must provide column names)

# Option 1: DataFrame (easiest)
data = COLAData(factual_data=df, label_column='target')

# Option 2: NumPy array
# X holds the features and the label; column_names names every column, including the label
data = COLAData(
    factual_data=X,
    label_column='target',
    column_names=['feature1', 'feature2', 'target']
)
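If you would rather use Option 1 with NumPy data, you can build the DataFrame yourself with pandas before passing it in. A minimal sketch, assuming the label is the last column of the array; the values and column names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: two feature columns followed by the label column
X = np.array([
    [25.0, 3000.0, 0],
    [40.0, 5200.0, 1],
    [33.0, 4100.0, 0],
])
column_names = ['feature1', 'feature2', 'target']

# There must be one name per column, including the label column
assert X.shape[1] == len(column_names)
df = pd.DataFrame(X, columns=column_names)

# NumPy upcast the whole array to float; restore an integer label column
df['target'] = df['target'].astype(int)
print(df)
```

The resulting `df` can then be passed as `factual_data`, exactly as in Option 1.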

Which ML frameworks are supported?

  • Scikit-learn - All classifiers

  • PyTorch - Neural networks

  • TensorFlow 1.x & 2.x - Keras models

  • Any custom model with predict() and predict_proba() methods
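For the last item, here is a minimal sketch of a hypothetical custom classifier that exposes both methods. The class name, the fixed weights, and the logistic scoring are illustrative assumptions; how COLA wraps such an object (for example, which backend string to pass to Model) is not shown here, so check the API reference for that part.

```python
import numpy as np

class ThresholdClassifier:
    """Hypothetical binary classifier exposing predict() and predict_proba()."""

    def __init__(self, weights, bias=0.0):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def predict_proba(self, X):
        # Logistic score for class 1; columns are [P(class 0), P(class 1)]
        X = np.asarray(X, dtype=float)
        p1 = 1.0 / (1.0 + np.exp(-(X @ self.weights + self.bias)))
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        # Predicted class = column with the highest probability
        return self.predict_proba(X).argmax(axis=1)


model = ThresholdClassifier(weights=[1.0, -1.0])
print(model.predict([[2.0, 0.0], [0.0, 2.0]]))  # [1 0]
```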

Can I use COLA with regression models?

Currently, COLA is designed for classification tasks only. Regression support is planned for future releases.

Does COLA work with multi-class classification?

Yes! COLA supports both binary and multi-class classification.

# Works for any number of classes
explainer.generate_counterfactuals(
    data=data,
    factual_class=2,  # Any class
    total_cfs=2
)

Counterfactual Generation

Which CF explainers can I use?

COLA includes:

  • DiCE - Instance-wise CFs

  • DisCount - Distributional CFs

External explainers also work:

  • Alibi (CounterfactualProto, etc.)

  • Any custom explainer that outputs DataFrames

# Use any explainer, then refine with COLA
cf_df = your_explainer.generate(...)
data.add_counterfactuals(cf_df)
sparsifier = COLA(data=data, ml_model=ml_model)
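"Any custom explainer that outputs DataFrames" means the explainer only has to return counterfactual rows with the same columns as the factual data. As an illustration, here is a deliberately naive, hypothetical explainer that nudges a single feature until the prediction flips. The function name, its parameters, and the toy threshold model are assumptions for this sketch and are not part of COLA:

```python
import pandas as pd

def single_feature_counterfactuals(factual_df, predict_fn, feature, step, max_steps=100):
    """Hypothetical naive explainer: shift one feature by `step` until the label flips."""
    found = []
    for _, row in factual_df.iterrows():
        original = predict_fn(row.to_frame().T)[0]
        candidate = row.copy()
        for _ in range(max_steps):
            candidate[feature] += step
            if predict_fn(candidate.to_frame().T)[0] != original:
                found.append(candidate)  # label flipped: keep this counterfactual
                break
    # Same columns as the factual data; rows with no CF within max_steps are omitted
    return pd.DataFrame(found, columns=factual_df.columns)


factual_df = pd.DataFrame({'Income': [30.0, 45.0], 'Age': [25.0, 40.0]})
predict_fn = lambda d: (d['Income'] > 50).astype(int).to_numpy()  # toy model

cf_df = single_feature_counterfactuals(factual_df, predict_fn, 'Income', step=10.0)
print(cf_df)
```

Whatever explainer you use, its output should have the same columns as the factual data before it is handed to COLA.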

How many counterfactuals should I generate per instance?

Recommendation:

  • Start with total_cfs=1 for speed

  • Use total_cfs=2 or 3 for better quality

  • Use total_cfs=5 or more for maximum diversity

# Balance between quality and speed
explainer.generate_counterfactuals(
    data=data,
    total_cfs=2  # Good default
)

More CFs give COLA more candidates to choose from, which can lead to better refinements at the cost of longer runtimes.

What if no counterfactuals are found?

Common causes:

  1. Too many immutable features

  2. Overly strict feature ranges

  3. The model is very confident about the factual instances (they lie far from the decision boundary)

Solutions:

# Solution 1: Relax constraints
explainer.generate_counterfactuals(
    data=data,
    features_to_keep=['Age'],  # Fewer immutable features
    permitted_range={'Income': [0, None]}  # Wider range
)

# Solution 2: Increase total_cfs
explainer.generate_counterfactuals(data=data, total_cfs=5)

# Solution 3: Use a different factual_class
explainer.generate_counterfactuals(data=data, factual_class=0)  # Try a different class

COLA Refinement

What does “limited_actions” mean?

limited_actions specifies the maximum number of features that can be changed in the refined counterfactual.

# Allow up to 5 feature changes
refined = sparsifier.refine_counterfactuals(limited_actions=5)

  • Lower values = more sparse (fewer changes)

  • Higher values = less restrictive
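To make the idea concrete, here is a simplified sketch of what a budget of k actions means for a single factual/counterfactual pair: keep the k changed features with the highest importance scores and revert every other feature to its factual value. The scores are supplied by hand and the helper name is hypothetical; COLA derives its attributions with its own method (for example the pshap attributor), which this sketch does not implement.

```python
import pandas as pd

def limit_actions(factual, counterfactual, scores, k):
    """Keep at most k changed features (highest score first); revert the others."""
    changed = [f for f in factual.index if factual[f] != counterfactual[f]]
    keep = sorted(changed, key=lambda f: scores[f], reverse=True)[:k]
    refined = factual.copy()
    for f in keep:
        refined[f] = counterfactual[f]  # apply only the selected actions
    return refined


factual = pd.Series({'Income': 30, 'Duration': 24, 'LoanAmount': 5000, 'Age': 35})
cf = pd.Series({'Income': 45, 'Duration': 12, 'LoanAmount': 4000, 'Age': 35})
scores = {'Income': 0.6, 'Duration': 0.3, 'LoanAmount': 0.1, 'Age': 0.0}

print(limit_actions(factual, cf, scores, k=2))
```

A row refined this way is only useful if the model still predicts the desired class for it; this sketch trusts the hand-written scores and does not check that.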

How do I choose the right limited_actions value?

Option 1: Use query_minimum_actions (recommended)

min_actions = sparsifier.query_minimum_actions()
refined = sparsifier.refine_counterfactuals(limited_actions=min_actions)

Option 2: Try different values

for k in [3, 5, 7]:
    refined = sparsifier.refine_counterfactuals(limited_actions=k)
    # Evaluate results...

Option 3: Domain knowledge

Choose a value that is realistic for your use case (e.g., “users can change at most 3 things”).
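Conceptually, the minimum number of actions is the smallest k for which a refined counterfactual still reaches the desired outcome. A brute-force sketch of that search, assuming the features are already ranked by importance and using a made-up threshold model; the helper name is hypothetical and this is not how query_minimum_actions is implemented, only an illustration of the idea behind it:

```python
import pandas as pd

def minimum_actions(factual, counterfactual, ranked_features, predict_fn, desired):
    """Smallest k such that applying the top-k ranked changes yields `desired`."""
    changed = [f for f in ranked_features if factual[f] != counterfactual[f]]
    for k in range(len(changed) + 1):
        candidate = factual.copy()
        for f in changed[:k]:
            candidate[f] = counterfactual[f]
        if predict_fn(candidate) == desired:
            return k
    return None  # even the full counterfactual does not reach the desired class


factual = pd.Series({'Income': 30.0, 'Duration': 24.0, 'LoanAmount': 5000.0})
cf = pd.Series({'Income': 45.0, 'Duration': 12.0, 'LoanAmount': 4000.0})
# Toy model: approve (1) when income is high enough and the loan is short
predict = lambda x: int(x['Income'] >= 40 and x['Duration'] <= 18)

print(minimum_actions(factual, cf, ['Income', 'Duration', 'LoanAmount'], predict, desired=1))  # 2
```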

Which matching policy should I use?

Recommendation: start with ect (faster) for exploration, then use ot (higher quality) for final results.

# Quick exploration
sparsifier.set_policy(matcher="ect", attributor="pshap")

# Best quality
sparsifier.set_policy(matcher="ot", attributor="pshap")
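Assuming "ot" stands for optimal transport, the difference from a nearest-neighbor (NN) matcher can be illustrated in a few lines. With equal-sized sets and uniform weights, optimal transport reduces to a one-to-one assignment that minimizes total distance, while nearest-neighbor matching lets several factuals share the same counterfactual. The brute-force search below is only practical for tiny sets (scipy.optimize.linear_sum_assignment solves the same assignment problem efficiently), and neither function is COLA's matcher:

```python
import itertools
import numpy as np

def ot_assignment(factuals, cfs):
    """Exact OT for equal-sized sets with uniform weights: best permutation by total distance."""
    cost = np.linalg.norm(factuals[:, None, :] - cfs[None, :, :], axis=2)
    n = len(factuals)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: cost[np.arange(n), list(perm)].sum())
    return list(best)

def nn_assignment(factuals, cfs):
    """Nearest-neighbor matching: each factual takes its closest CF (CFs may be reused)."""
    cost = np.linalg.norm(factuals[:, None, :] - cfs[None, :, :], axis=2)
    return cost.argmin(axis=1).tolist()


factuals = np.array([[0.0, 0.0], [1.0, 0.0]])
cfs = np.array([[0.9, 0.0], [3.0, 0.0]])
print(nn_assignment(factuals, cfs))  # [0, 0]: both factuals share the first CF
print(ot_assignment(factuals, cfs))  # [0, 1]: one-to-one, lowest total distance
```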

Can I restrict which features can be modified?

Yes! Use features_to_vary:

# Only these features can change
refined = sparsifier.refine_counterfactuals(
    limited_actions=5,
    features_to_vary=['Income', 'Duration', 'LoanAmount']
)

This is useful for:

  • Enforcing immutable features (age, gender)

  • Focusing on actionable features (income, education)

  • Domain-specific constraints
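The effect of features_to_vary can be pictured as a mask: a change to any feature outside the list is reverted to the factual value. A minimal pandas sketch of that masking step, with a hypothetical helper name and made-up values; COLA applies the restriction itself when you pass features_to_vary, so this is only for intuition:

```python
import pandas as pd

def restrict_changes(factual_df, cf_df, features_to_vary):
    """Revert every column not in features_to_vary to its factual value."""
    restricted = cf_df.copy()
    frozen = [c for c in factual_df.columns if c not in features_to_vary]
    # Rows are assumed to be aligned by position (row i of cf_df is the CF for row i)
    restricted[frozen] = factual_df[frozen].to_numpy()
    return restricted


factual_df = pd.DataFrame({'Age': [35, 52], 'Income': [30, 41], 'Duration': [24, 36]})
cf_df = pd.DataFrame({'Age': [30, 52], 'Income': [45, 50], 'Duration': [12, 24]})

print(restrict_changes(factual_df, cf_df, ['Income', 'Duration']))
```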

Visualization

How do I save visualizations?

import os
os.makedirs('./results', exist_ok=True)

# Save the direction heatmap and the stacked bar chart
sparsifier.heatmap_direction(save_path='./results')
sparsifier.stacked_bar_chart(save_path='./results')

# Highlighted DataFrames to HTML
_, ce_style, ace_style = sparsifier.highlight_changes_final()
ce_style.to_html('./results/original_cf.html')
ace_style.to_html('./results/refined_ace.html')

What do the visualization colors mean?

Direction Heatmap:

  • 🟦 Blue = Feature increased

  • 🟧 Red/Orange = Feature decreased

  • ⬜ White = No change

Binary Heatmap:

  • ⬛ Black = Feature changed

  • ⬜ White = No change

Highlighted DataFrames:

  • 🟦 Blue background = Value increased

  • 🟧 Orange background = Value decreased

  • ⬜ White background = No change
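The three categories above come from comparing each refined value with its factual value. A sketch of that comparison with pandas and NumPy, using made-up data; the +1/-1/0 encoding is an assumption for illustration, not necessarily the format COLA uses internally:

```python
import numpy as np
import pandas as pd

factual = pd.DataFrame({'Income': [30, 50], 'Duration': [24, 12], 'Age': [35, 40]})
refined = pd.DataFrame({'Income': [45, 50], 'Duration': [12, 12], 'Age': [35, 40]})

# +1 = increased (blue), -1 = decreased (red/orange), 0 = unchanged (white)
direction = pd.DataFrame(
    np.sign(refined.to_numpy() - factual.to_numpy()),
    index=factual.index, columns=factual.columns,
)
# Binary heatmap view: True = changed (black), False = unchanged (white)
changed = direction != 0
print(direction)
```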

How can I customize figure sizes?

# For papers (high resolution)
fig = sparsifier.heatmap_direction(
    figsize=(10, 6),
    dpi=300
)

# For presentations (larger)
fig = sparsifier.stacked_bar_chart(
    figsize=(16, 10),
    dpi=150
)

Performance

How long does COLA take to run?

Typical timings (actual times depend on hardware, data size, and model):

  • DiCE generation: 1-30 seconds (depends on data size)

  • COLA refinement with ECT: <1 second

  • COLA refinement with OT: 1-10 seconds

  • Visualization: <1 second

For 100 instances:

# Fast (ECT matcher)
sparsifier.set_policy(matcher="ect", attributor="pshap")
# ~2 seconds total

# Best quality (OT matcher)
sparsifier.set_policy(matcher="ot", attributor="pshap")
# ~5 seconds total

How can I speed up COLA?

Option 1: Use faster matcher

# ECT is much faster than OT
sparsifier.set_policy(matcher="ect", attributor="pshap")

Option 2: Reduce data size

# Process only the first 100 rows
data = COLAData(factual_data=df.head(100), label_column='target')

Option 3: Generate fewer CFs per instance

# 1 CF per instance (fastest)
explainer.generate_counterfactuals(data=data, total_cfs=1)

Can COLA handle large datasets?

Yes, but with considerations:

  • <1000 instances: No problem with any matcher

  • 1000-10000 instances: Use ECT or NN matcher

  • >10000 instances: Batch processing recommended

# Batch processing for large datasets
import pandas as pd

batch_size = 1000
all_refined = []

for i in range(0, len(df), batch_size):
    batch_df = df.iloc[i:i+batch_size]
    # ... build COLAData for batch_df, generate CFs, refine, and store the result in `refined` ...
    all_refined.append(refined)

final_refined = pd.concat(all_refined)

Troubleshooting

Error: “Counterfactual data not set”

Cause: Forgot to add counterfactuals to COLAData.

Solution:

# Must add CFs before creating COLA
data.add_counterfactuals(cf_df, with_target_column=True)
sparsifier = COLA(data=data, ml_model=ml_model)  # Now works!

Error: “Must call set_policy before refining”

Cause: Trying to refine without setting matching policy.

Solution:

# Always set policy first
sparsifier.set_policy(matcher="ot", attributor="pshap")
refined = sparsifier.refine_counterfactuals(limited_actions=5)

Error: “Column mismatch between factual and counterfactual”

Cause: Counterfactual DataFrame has different columns than factual.

Solution:

# Verify columns match
print("Factual columns:", data.factual_df.columns.tolist())
print("CF columns:", cf_df.columns.tolist())

# Ensure they're the same (order doesn't matter)
assert set(data.factual_df.columns) == set(cf_df.columns)
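If you want a clearer error message than a bare assert, or the CF columns are simply in a different order, a small helper can report the difference and reorder the counterfactuals to match. The helper name is hypothetical; it is plain pandas and runs independently of COLA:

```python
import pandas as pd

def align_cf_columns(factual_df, cf_df):
    """Return cf_df with factual_df's column order, or raise listing the mismatch."""
    missing = set(factual_df.columns) - set(cf_df.columns)
    extra = set(cf_df.columns) - set(factual_df.columns)
    if missing or extra:
        raise ValueError(f"missing in CF: {sorted(missing)}; unexpected in CF: {sorted(extra)}")
    return cf_df[list(factual_df.columns)]


factual_df = pd.DataFrame({'Age': [35], 'Income': [30], 'target': [0]})
cf_df = pd.DataFrame({'Income': [45], 'target': [1], 'Age': [35]})
print(align_cf_columns(factual_df, cf_df).columns.tolist())  # ['Age', 'Income', 'target']
```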

Visualizations are blank/all white

Cause: No features actually changed.

Solution:

# Check if any changes occurred
factual, ce, ace = sparsifier.get_all_results(limited_actions=5)
changes = (factual != ace).sum().sum()
print(f"Total changes: {changes}")

if changes == 0:
    # Increase limited_actions
    refined = sparsifier.refine_counterfactuals(limited_actions=10)

Best Practices

How do I ensure reproducible results?

# Seed Python's and NumPy's random number generators (also seed PyTorch/TensorFlow if you use them)
import random
import numpy as np

random.seed(42)
np.random.seed(42)

# Set random_state in COLA
sparsifier.set_policy(
    matcher="ot",
    attributor="pshap",
    random_state=42  # Reproducibility!
)

Should I use a pipeline or separate preprocessing?

Recommended: Use Pipeline

# ✅ Best practice
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', classifier)
])
pipe.fit(X_train, y_train)
ml_model = Model(model=pipe, backend="sklearn")

Alternative: PreprocessorWrapper

# If you must separate them
X_train_prep = preprocessor.fit_transform(X_train)
classifier.fit(X_train_prep, y_train)

ml_model = PreprocessorWrapper(
    model=classifier,
    backend="sklearn",
    preprocessor=preprocessor
)
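Conceptually, a wrapper of this kind applies the fitted preprocessor to raw input before calling the classifier, so that predictions (and therefore counterfactuals) can be expressed in the original feature values. A minimal plain-Python sketch of that pattern; the class names and the toy scaler and classifier are hypothetical stand-ins, not the actual PreprocessorWrapper implementation:

```python
import numpy as np

class ComposedModel:
    """Hypothetical wrapper: transform raw features, then delegate to the classifier."""

    def __init__(self, preprocessor, classifier):
        self.preprocessor = preprocessor
        self.classifier = classifier

    def predict(self, X_raw):
        return self.classifier.predict(self.preprocessor.transform(X_raw))

    def predict_proba(self, X_raw):
        return self.classifier.predict_proba(self.preprocessor.transform(X_raw))


class MeanCenterer:
    """Toy preprocessor with a scikit-learn-style fit/transform interface."""
    def fit(self, X):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


class SignClassifier:
    """Toy classifier: class 1 when the first (centered) feature is positive."""
    def predict_proba(self, X):
        p1 = (np.asarray(X)[:, 0] > 0).astype(float)
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)


X_train = np.array([[10.0], [20.0], [30.0]])
model = ComposedModel(MeanCenterer().fit(X_train), SignClassifier())
print(model.predict([[25.0], [15.0]]))  # [1 0]
```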

Advanced Topics

Can I implement custom matchers?

Yes! Inherit from BaseCounterfactualMatchingPolicy:

from xai_cola.ce_sparsifier.policies.matching import (
    BaseCounterfactualMatchingPolicy
)

class MyCustomMatcher(BaseCounterfactualMatchingPolicy):
    def match(self, factual_df, cf_df):
        # Your matching logic
        return matching_dict

Can I use COLA with time-series data?

COLA is designed for tabular data. For time-series, you’d need to:

  1. Extract tabular features from time-series

  2. Use those features with COLA

  3. Map refined CFs back to time-series
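Step 1 can be as simple as summarizing each series with a few statistics, one row per series. A pandas sketch with hypothetical column names (series_id, t, value) and made-up data; which statistics are meaningful, and how a refined change maps back to the raw series in step 3, depends on your domain:

```python
import numpy as np
import pandas as pd

# Long-format time series: one row per (series, time step)
ts = pd.DataFrame({
    'series_id': [1, 1, 1, 2, 2, 2],
    't': [0, 1, 2, 0, 1, 2],
    'value': [1.0, 2.0, 3.0, 5.0, 5.0, 2.0],
})

def slope(group):
    # Least-squares linear trend of value over time
    return np.polyfit(group['t'], group['value'], deg=1)[0]

# One tabular row per series: summary statistics plus trend
features = ts.groupby('series_id')['value'].agg(['mean', 'std', 'min', 'max'])
features['slope'] = ts.groupby('series_id')[['t', 'value']].apply(slope)
print(features)
```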

How does COLA compare to other CF methods?

COLA is a refinement method, not a generation method. It works on top of existing CF generators to make them more actionable.

Comparison:

  • DiCE alone: Generates diverse CFs (may require many changes)

  • COLA + DiCE: Refines DiCE’s CFs to require fewer changes

  • Result: Same or similar outcome with 30-70% fewer actions
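The reduction figure is a simple ratio of action counts. For example, if the original CFs change 5 feature values in total and the refined ones change 2, the reduction is (5 - 2) / 5 = 60%. A sketch that computes it from factual, original CF, and refined DataFrames; the helper name and data are made up for illustration:

```python
import pandas as pd

def action_reduction(factual, ce, ace):
    """Fraction of feature changes removed by refinement (0.0 = none, 1.0 = all)."""
    ce_changes = int((factual != ce).to_numpy().sum())
    ace_changes = int((factual != ace).to_numpy().sum())
    return (ce_changes - ace_changes) / ce_changes if ce_changes else 0.0


factual = pd.DataFrame({'Income': [30, 50], 'Duration': [24, 36], 'Age': [35, 40]})
ce = pd.DataFrame({'Income': [45, 60], 'Duration': [12, 24], 'Age': [30, 40]})
ace = pd.DataFrame({'Income': [45, 60], 'Duration': [24, 36], 'Age': [35, 40]})

print(f"{action_reduction(factual, ce, ace):.0%}")  # 60%
```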

Getting Help

Where can I find more examples?

How do I report bugs?

  1. Check existing GitHub Issues

  2. Open a new issue with:

     - Python version

     - COLA version (import xai_cola; print(xai_cola.__version__))

     - Minimal code to reproduce

     - Full error message

How can I contribute?

See Contributing for contribution guidelines.

Who do I contact for questions?

Citation

If you use COLA in your research, please cite:

@article{you2024refining,
  title={Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality},
  author={You, Lei and Bian, Yijun and Cao, Lele},
  journal={arXiv preprint arXiv:2410.05419},
  year={2024}
}

See Also