========================== Frequently Asked Questions ========================== General Questions ================= What is COLA? ------------- COLA (COunterfactual explanations with Limited Actions) is a Python framework that refines counterfactual explanations by reducing the number of feature changes needed while maintaining the same outcome. **Key benefits:** - Reduces feature changes by 30-70% - Works with any ML model - Compatible with various CF explainers (DiCE, DisCount, Alibi, etc.) - Provides rich visualizations When should I use COLA? ------------------------ Use COLA when: - You need **actionable** counterfactual explanations - Users complain CFs require **too many changes** - You want to provide **minimal action plans** - You need to **compare** different CF methods - You want **theoretically grounded** refinements Installation & Setup ==================== How do I install COLA? ----------------------- .. code-block:: bash pip install xai-cola See :doc:`installation` for detailed instructions. Which Python versions are supported? ------------------------------------- Python 3.8 to 3.12 are supported. Do I need PyTorch or TensorFlow? --------------------------------- No, they're optional. COLA works with scikit-learn by default. Install PyTorch/TensorFlow only if you're using models from those frameworks. Data & Models ============= What data format does COLA accept? ----------------------------------- COLA accepts: - **Pandas DataFrame** (recommended) - **NumPy arrays** (must provide column names) .. code-block:: python # Option 1: DataFrame (easiest) data = COLAData(factual_data=df, label_column='target') # Option 2: NumPy array data = COLAData( factual_data=X, label_column='target', column_names=['feature1', 'feature2', 'target'] ) Which ML frameworks are supported? ----------------------------------- - ✅ **Scikit-learn** - All classifiers - ✅ **PyTorch** - Neural networks - ✅ **TensorFlow 1.x & 2.x** - Keras models - ✅ **Any custom model** with ``predict()`` and ``predict_proba()`` methods Can I use COLA with regression models? --------------------------------------- Currently, COLA is designed for **classification** tasks only. Regression support is planned for future releases. Does COLA work with multi-class classification? ------------------------------------------------ Yes! COLA supports both binary and multi-class classification. .. code-block:: python # Works for any number of classes explainer.generate_counterfactuals( data=data, factual_class=2, # Any class total_cfs=2 ) Counterfactual Generation ========================== Which CF explainers can I use? ------------------------------- COLA includes: - **DiCE** - Instance-wise CFs - **DisCount** - Distributional CFs External explainers also work: - Alibi (CounterfactualProto, etc.) - Any custom explainer that outputs DataFrames .. code-block:: python # Use any explainer, then refine with COLA cf_df = your_explainer.generate(...) data.add_counterfactuals(cf_df) sparsifier = COLA(data=data, ml_model=ml_model) How many counterfactuals should I generate per instance? --------------------------------------------------------- **Recommendation:** - Start with ``total_cfs=1`` for speed - Use ``total_cfs=2-3`` for better quality - Use ``total_cfs=5+`` for maximum diversity .. code-block:: python # Balance between quality and speed explainer.generate_counterfactuals( data=data, total_cfs=2 # Good default ) More CFs give COLA more options, leading to potentially better refinements. What if no counterfactuals are found? -------------------------------------- **Common causes:** 1. Too many immutable features 2. Too strict feature ranges 3. Model is very confident **Solutions:** .. code-block:: python # Solution 1: Relax constraints explainer.generate_counterfactuals( features_to_keep=['Age'], # Fewer immutable features permitted_range={'Income': [0, None]} # Wider range ) # Solution 2: Increase total_cfs explainer.generate_counterfactuals(total_cfs=5) # Solution 3: Use different desired_class explainer.generate_counterfactuals(factual_class=0) # Try different class COLA Refinement =============== What does "limited_actions" mean? ---------------------------------- ``limited_actions`` specifies the maximum number of features that can be changed in the refined counterfactual. .. code-block:: python # Allow up to 5 feature changes refined = sparsifier.refine_counterfactuals(limited_actions=5) Lower values = more sparse (fewer changes) Higher values = less restrictive How do I choose the right limited_actions value? ------------------------------------------------- **Option 1: Use query_minimum_actions (recommended)** .. code-block:: python min_actions = sparsifier.query_minimum_actions() refined = sparsifier.refine_counterfactuals(limited_actions=min_actions) **Option 2: Try different values** .. code-block:: python for k in [3, 5, 7]: refined = sparsifier.refine_counterfactuals(limited_actions=k) # Evaluate results... **Option 3: Domain knowledge** Based on what's realistic for your use case (e.g., "users can change at most 3 things"). Which matching policy should I use? ------------------------------------ +----------------+------------------+--------------------+ | Matcher | When to Use | Speed | +================+==================+====================+ | **ot** | Best quality | Slower (O(n³)) | +----------------+------------------+--------------------+ | **ect** | Fast, binary | Fast (O(n)) | +----------------+------------------+--------------------+ | **nn** | Prototyping | Very fast (O(n log n)) | +----------------+------------------+--------------------+ | **softcem** | Probabilistic | Medium | +----------------+------------------+--------------------+ **Recommendation:** Start with **ect** for exploration, use **ot** for final results. .. code-block:: python # Quick exploration sparsifier.set_policy(matcher="ect", attributor="pshap") # Best quality sparsifier.set_policy(matcher="ot", attributor="pshap") Can I restrict which features can be modified? ----------------------------------------------- Yes! Use ``features_to_vary``: .. code-block:: python # Only these features can change refined = sparsifier.refine_counterfactuals( limited_actions=5, features_to_vary=['Income', 'Duration', 'LoanAmount'] ) This is useful for: - Enforcing immutable features (age, gender) - Focusing on actionable features (income, education) - Domain-specific constraints Visualization ============= How do I save visualizations? ------------------------------ .. code-block:: python import os os.makedirs('./results', exist_ok=True) # Save all visualizations sparsifier.heatmap_direction(save_path='./results') sparsifier.stacked_bar_chart(save_path='./results') # Highlighted DataFrames to HTML _, ce_style, ace_style = sparsifier.highlight_changes_final() ce_style.to_html('./results/original_cf.html') ace_style.to_html('./results/refined_ace.html') What do the visualization colors mean? --------------------------------------- **Direction Heatmap:** - 🟦 Blue = Feature increased - 🟧 Red/Orange = Feature decreased - ⬜ White = No change **Binary Heatmap:** - ⬛ Black = Feature changed - ⬜ White = No change **Highlighted DataFrames:** - 🟦 Blue background = Value increased - 🟧 Orange background = Value decreased - ⬜ White background = No change How can I customize figure sizes? ---------------------------------- .. code-block:: python # For papers (high resolution) fig = sparsifier.heatmap_direction( figsize=(10, 6), dpi=300 ) # For presentations (larger) fig = sparsifier.stacked_bar_chart( figsize=(16, 10), dpi=150 ) Performance =========== How long does COLA take to run? -------------------------------- **Typical timings:** - DiCE generation: 1-30 seconds (depends on data size) - COLA refinement with ECT: <1 second - COLA refinement with OT: 1-10 seconds - Visualization: <1 second **For 100 instances:** .. code-block:: python # Fast (ECT matcher) sparsifier.set_policy(matcher="ect", attributor="pshap") # ~2 seconds total # Best quality (OT matcher) sparsifier.set_policy(matcher="ot", attributor="pshap") # ~5 seconds total How can I speed up COLA? ------------------------- **Option 1: Use faster matcher** .. code-block:: python # ECT is much faster than OT sparsifier.set_policy(matcher="ect", attributor="pshap") **Option 2: Reduce data size** .. code-block:: python # Process subset data = COLAData(factual_data=df.head(100), label_column='target') **Option 3: Generate fewer CFs per instance** .. code-block:: python # 1 CF per instance (fastest) explainer.generate_counterfactuals(total_cfs=1) Can COLA handle large datasets? -------------------------------- Yes, but with considerations: - **<1000 instances**: No problem with any matcher - **1000-10000 instances**: Use ECT or NN matcher - **>10000 instances**: Batch processing recommended .. code-block:: python # Batch processing for large datasets import pandas as pd batch_size = 1000 all_refined = [] for i in range(0, len(df), batch_size): batch_df = df.iloc[i:i+batch_size] # ... process batch ... all_refined.append(refined) final_refined = pd.concat(all_refined) Troubleshooting =============== Error: "Counterfactual data not set" ------------------------------------- **Cause:** Forgot to add counterfactuals to COLAData. **Solution:** .. code-block:: python # Must add CFs before creating COLA data.add_counterfactuals(cf_df, with_target_column=True) sparsifier = COLA(data=data, ml_model=ml_model) # Now works! Error: "Must call set_policy before refining" ---------------------------------------------- **Cause:** Trying to refine without setting matching policy. **Solution:** .. code-block:: python # Always set policy first sparsifier.set_policy(matcher="ot", attributor="pshap") refined = sparsifier.refine_counterfactuals(limited_actions=5) Error: "Column mismatch between factual and counterfactual" ------------------------------------------------------------ **Cause:** Counterfactual DataFrame has different columns than factual. **Solution:** .. code-block:: python # Verify columns match print("Factual columns:", data.factual_df.columns.tolist()) print("CF columns:", cf_df.columns.tolist()) # Ensure they're the same (order doesn't matter) assert set(data.factual_df.columns) == set(cf_df.columns) Visualizations are blank/all white ----------------------------------- **Cause:** No features actually changed. **Solution:** .. code-block:: python # Check if any changes occurred factual, ce, ace = sparsifier.get_all_results(limited_actions=5) changes = (factual != ace).sum().sum() print(f"Total changes: {changes}") if changes == 0: # Increase limited_actions refined = sparsifier.refine_counterfactuals(limited_actions=10) Best Practices ============== What's the recommended workflow? --------------------------------- .. code-block:: python # 1. Prepare data with numerical features specified data = COLAData( factual_data=df, label_column='target', numerical_features=['feature1', 'feature2'] # Explicit! ) # 2. Use pipeline for model (recommended) from sklearn.pipeline import Pipeline pipe = Pipeline([('prep', preprocessor), ('clf', classifier)]) pipe.fit(X_train, y_train) ml_model = Model(model=pipe, backend="sklearn") # 3. Generate multiple CFs per instance explainer.generate_counterfactuals(total_cfs=2) # 4. Always add CFs before COLA data.add_counterfactuals(cf, with_target_column=True) # 5. Use OT for best quality sparsifier.set_policy(matcher="ot", attributor="pshap", random_state=42) # 6. Query minimum actions min_actions = sparsifier.query_minimum_actions() # 7. Refine refined = sparsifier.refine_counterfactuals(limited_actions=min_actions) # 8. Visualize sparsifier.heatmap_direction(save_path='./results') sparsifier.stacked_bar_chart(save_path='./results') How do I ensure reproducible results? -------------------------------------- .. code-block:: python # Set all random seeds import random import numpy as np random.seed(42) np.random.seed(42) # Set random_state in COLA sparsifier.set_policy( matcher="ot", attributor="pshap", random_state=42 # Reproducibility! ) Should I use a pipeline or separate preprocessing? --------------------------------------------------- **Recommended: Use Pipeline** .. code-block:: python # ✅ Best practice pipe = Pipeline([ ('preprocessor', preprocessor), ('classifier', classifier) ]) pipe.fit(X_train, y_train) ml_model = Model(model=pipe, backend="sklearn") **Alternative: PreprocessorWrapper** .. code-block:: python # If you must separate them X_train_prep = preprocessor.fit_transform(X_train) classifier.fit(X_train_prep, y_train) ml_model = PreprocessorWrapper( model=classifier, backend="sklearn", preprocessor=preprocessor ) Advanced Topics =============== Can I implement custom matchers? --------------------------------- Yes! Inherit from ``BaseCounterfactualMatchingPolicy``: .. code-block:: python from xai_cola.ce_sparsifier.policies.matching import ( BaseCounterfactualMatchingPolicy ) class MyCustomMatcher(BaseCounterfactualMatchingPolicy): def match(self, factual_df, cf_df): # Your matching logic return matching_dict Can I use COLA with time-series data? -------------------------------------- COLA is designed for tabular data. For time-series, you'd need to: 1. Extract tabular features from time-series 2. Use those features with COLA 3. Map refined CFs back to time-series How does COLA compare to other CF methods? ------------------------------------------- COLA is a **refinement** method, not a generation method. It works **on top of** existing CF generators to make them more actionable. **Comparison:** - **DiCE alone**: Generates diverse CFs (may require many changes) - **COLA + DiCE**: Refines DiCE's CFs to require fewer changes - **Result**: Same or similar outcome with 30-70% fewer actions Getting Help ============ Where can I find more examples? -------------------------------- - :doc:`tutorials/01_basic_tutorial` - Complete tutorial - `GitHub examples/ directory `_ - :doc:`user_guide/data_interface` - Detailed guides How do I report bugs? ---------------------- 1. Check existing `GitHub Issues `_ 2. Open a new issue with: - Python version - COLA version (``print(xai_cola.__version__)``) - Minimal code to reproduce - Full error message How can I contribute? ---------------------- See :doc:`contributing` for contribution guidelines. Who do I contact for questions? -------------------------------- - Lin Zhu: s232291@student.dtu.dk - Lei You: leiyo@dtu.dk - GitHub Issues: https://github.com/understanding-ml/COLA/issues Citation ======== If you use COLA in your research, please cite: .. code-block:: bibtex @article{you2024refining, title={Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality}, author={You, Lei and Bian, Yijun and Cao, Lele}, journal={arXiv preprint arXiv:2410.05419}, year={2024} } See Also ======== - :doc:`installation` - Installation guide - :doc:`quickstart` - Quick start guide - :doc:`api/cola` - API reference - `Paper `_ - Research paper