Using GALE to compare sets of ML model explanations

Exploring a novel approach to machine learning explainability using topological data analysis.

Updated September 27, 2023

Imagine you’re dropped into an unknown area and handed a stack of unlabeled maps. A visual inspection of the terrain (identifying features like steep inclines, open valleys and bodies of water) can give you confidence in selecting the correct map to navigate your way. Now imagine you’re using a similar set of maps to try to explain a complex machine learning model, where feature importance stands in for elevation. In this case, we have to move from topography to topology, the mathematical study of the properties of geometric shapes, which offers a novel approach to advancing explainability in machine learning.

My Applied ML research team at Capital One, in collaboration with partners including researchers at NYU, recently published new findings that propose a method using topological data analysis to explore the space of explanations and provide a stable, robust view of them. The research, called GALE: Globally Assessing Local Explanations, offers a way to compare sets of model explanations and determine how similar they are.

Using an ablation studies framework for topological data analysis

To revisit the map analogy, if you’re following a topographic map, you would expect the contours of the landscape on the map to reflect what you’re experiencing in real life: here I am at that ridge, over there is the base of the mountain. The challenge in explaining machine learning decisions is that we’re working with complex, cumbersome, multi-dimensional maps called manifolds. But what if we turned those manifolds into approximate graph (network) representations? We could then compare the graphs to determine whether the underlying manifolds are similar.
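To make that idea concrete, the sketch below shows a Mapper-style construction, the standard topological data analysis tool for summarizing a high-dimensional point cloud as a graph. It is a minimal illustration under assumed choices (an L2-norm lens, overlapping intervals, DBSCAN clustering), not GALE’s actual implementation.

```python
# Minimal Mapper-style sketch: turn a cloud of local explanations
# (one attribution vector per instance) into a graph whose nodes are
# clusters of similar explanations. Illustrative only; not GALE's code.
import numpy as np
import networkx as nx
from sklearn.cluster import DBSCAN

def mapper_graph(explanations, n_intervals=10, overlap=0.3, eps=0.5):
    """explanations: (n_samples, n_features) array of attributions."""
    # Lens: project each explanation to 1D (here, its L2 norm).
    lens = np.linalg.norm(explanations, axis=1)
    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / n_intervals
    graph, node_members = nx.Graph(), {}

    for i in range(n_intervals):
        # Overlapping interval of the lens range (the "cover").
        start = lo + i * length - overlap * length
        end = lo + (i + 1) * length + overlap * length
        idx = np.where((lens >= start) & (lens <= end))[0]
        if len(idx) == 0:
            continue
        # Cluster the explanations that fall in this interval.
        labels = DBSCAN(eps=eps, min_samples=2).fit_predict(explanations[idx])
        for label in set(labels) - {-1}:
            node = f"{i}_{label}"
            node_members[node] = set(idx[labels == label])
            graph.add_node(node)

    # Connect clusters that share instances (possible in overlapping intervals).
    nodes = list(node_members)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            if node_members[nodes[a]] & node_members[nodes[b]]:
                graph.add_edge(nodes[a], nodes[b])
    return graph

# Two explanation methods on the same data yield two graphs to compare,
# e.g. g_a = mapper_graph(method_a_values); g_b = mapper_graph(method_b_values)
```

Running this on the attributions produced by two different explanation methods gives two graphs whose shapes can be compared, which is the spirit of comparing the underlying explanation manifolds.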

This becomes important because machine learning model explanations don’t have a singular ground truth; there’s no training and no loss minimization for explanations. Instead, they’re built on axioms and assumptions that we expect them to uphold. In practice, the “correctness” of an explanation can vary based on how the method is translated from theory into code. Explanations can also be sensitive to hyperparameter choices, but with no ground truth, there is no notion of optimizing those hyperparameters. Our work offers a novel approach to compare multiple sets of model explanations and to determine whether they agree or disagree; consensus among differing methods’ explanations provides a mechanism to build trust.
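For contrast, a simple non-topological agreement check is sketched below: per-instance rank correlation of attributions between two methods. The function name and inputs are hypothetical, and this is only a naive baseline; GALE instead compares graph summaries of each method’s explanation space.

```python
# Naive agreement check between two explanation methods: average
# per-instance rank correlation of their feature attributions.
# Illustrative baseline only, not GALE's comparison mechanism.
import numpy as np
from scipy.stats import spearmanr

def mean_rank_agreement(expl_a, expl_b):
    """expl_a, expl_b: (n_samples, n_features) attributions from two methods."""
    scores = [spearmanr(a, b)[0] for a, b in zip(expl_a, expl_b)]
    return float(np.nanmean(scores))

# e.g. agreement = mean_rank_agreement(method_a_values, method_b_values)
```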

This research was recently presented at the TAG in Machine Learning Workshop at ICML and is closely related to our work on ablation studies (a common way ML practitioners test feature importance), which could serve as a useful reference for gauging GALE’s performance. Our work on ablation studies for explainability, which will be presented at KDD’s Machine Learning in Finance Workshop, provides a framework to assess the faithfulness of a set of explanations to their model.
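The ablation idea can be sketched in a few lines: mask the top-k features that an explanation ranks as most important and measure how much the model’s score drops. The helper below is a minimal sketch, assuming a mean-value baseline for “removing” features and a scikit-learn-style model with a score method; it is not the framework from the KDD paper.

```python
# Sketch of an ablation-style faithfulness check: mask the top-k features
# named by each explanation and measure how much the model's score drops.
# A faithful explanation should cause a larger drop than a random ranking.
# The mean-imputation baseline is an assumption for illustration.
import numpy as np

def ablation_drop(model, X, y, explanations, k=3):
    """model: fitted classifier with .score; explanations: (n, d) attributions."""
    baseline = X.mean(axis=0)                  # value used to "remove" a feature
    X_ablated = X.copy()
    for i, expl in enumerate(explanations):
        top_k = np.argsort(-np.abs(expl))[:k]  # k most important features
        X_ablated[i, top_k] = baseline[top_k]
    return model.score(X, y) - model.score(X_ablated, y)
```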

Leveraging machine learning models to inform topological data research

As machine learning advances, so do the breadth and depth of its applications in high-stakes, automated decision systems. Whether mitigating financial risk, making important medical diagnoses or teaching vehicles how to drive themselves, there is a growing, cross-industry need for faithful explanations of machine learning models. So it’s important to understand what these consequential models look at and how they make their decisions.

Conclusion: Using topological data analysis for machine learning explainability

While not yet heavily explored within machine learning, the study of topology is an emerging approach to address this multifaceted challenge, and it has considerable potential. By leveraging topological data analysis at every step in the machine learning model pipeline, researchers may discover a level of tooling and introspection that has otherwise not been possible. That’s an exciting possibility — and could be one of the north stars on the map toward better models and greater explainability.


Brian Barr, Sr. Lead Machine Learning Engineer

Brian Barr is a Sr. Lead Machine Learning Engineer on Capital One’s Center for Machine Learning Applied Research team. For the past three years, he has been working on explainable AI (XAI) as well as using explanations to improve models. By training, Brian is a mechanical engineer with a background in computational fluid dynamics and experience in multi-objective optimization using surrogate models. That background provided a predictable trajectory to the world of machine learning. Over the course of 15 years at GE’s research lab, he helped forecast losses during the financial crisis, built deep-learned surrogates for airfoil design, and kept really big lasers in spec during long 3D print jobs. When he unplugs, he likes to hike, canoe, and ski.
