Layer-wise Integrated Gradients
for within-layer flow in Transformers

Set-to-set Integrated Gradients at Attention and MLP boundaries inside each Transformer layer — with an interactive demo and a Python package. Theory and experiments are in the arXiv preprint.

Eight Suzuki¹,³ · Hideitsu Hino¹,² · Noboru Murata¹

¹ Waseda University
² The Institute of Statistical Mathematics
³ Fujitsu Limited

Paper (soon) · arXiv · Code · PyPI · Interactive demo · Contact

Overview

What is LIG?

Layer-wise Integrated Gradients (LIG) attributes how tokens influence each other inside a Transformer layer — not only across layers — by applying Integrated Gradients at Attention and MLP module boundaries.
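For reference, the block below states the standard Integrated Gradients attribution of Sundararajan et al. (2017) for a scalar function F, an input x, and a baseline x'. This is the generic formula only. The set-to-set form that LIG applies at the Attention and MLP boundaries, and its choice of baselines, are defined in the paper and are not reproduced here.

```latex
\[
\mathrm{IG}_i(x) \;=\; (x_i - x'_i)\int_0^1
  \frac{\partial F\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\,d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x) \;=\; F(x) - F(x').
\]
```

The second identity is the completeness property: the attributions add up to the change in F between the baseline and the input.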

Definitions, baselines, L₂ diagnostics, and PTB experiments are in the paper. This page covers notation, figures, installation, and the visualization.

LIG is model-agnostic at the block level: the same API covers BERT-style encoders (ATT/MLP split), block-only models (layer granularity only), and GPT-2 decoders. Reproducible PTB experiments are in the release repository. Boundary detection is automatic: describe_boundaries(model_id) reports residual-stream nodes z, attention outputs u, and IG hook points.

Notation

Reading the demo

Each Transformer layer processes a set of token vectors on the residual stream. z_i^(l) is the representation of token i at the input of layer l. After multi-head attention (ATT), head h produces u_i^(l,h) for each token; MLP then updates the stream to z^(l+1).

With layer l fixed, one block is z^(l) → ATT → u → MLP → z^(l+1) (see the flow below). LIG attributes token-to-token influence inside that block using Integrated Gradients at the ATT and MLP module boundaries.
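To make the idea of token-to-token attribution concrete, here is a minimal sketch of Integrated Gradients on a toy map from a set of inputs to a set of outputs. It is not the package's implementation. The names toy_layer, toy_jacobian, and token_to_token_ig are hypothetical, each token is a single scalar instead of a vector, the "layer" is a tanh of a weighted sum, and the all-zero baseline is an assumption made for illustration (the paper discusses its own baselines).

```python
import math


def toy_layer(z, W):
    """Hypothetical stand-in for a layer map: out_j = tanh(sum_i W[j][i] * z_i)."""
    return [math.tanh(sum(w * zi for w, zi in zip(row, z))) for row in W]


def toy_jacobian(z, W):
    """Analytic Jacobian of toy_layer: J[j][i] = d out_j / d z_i."""
    out = toy_layer(z, W)
    return [[(1.0 - o * o) * w for w in row] for o, row in zip(out, W)]


def token_to_token_ig(z, baseline, W, num_steps=32):
    """Midpoint-rule Integrated Gradients from every input i to every output j.

    attr[i][j] approximates
    (z_i - baseline_i) * integral_0^1 d out_j / d z_i (baseline + a * (z - baseline)) da.
    """
    n_in, n_out = len(z), len(W)
    attr = [[0.0] * n_out for _ in range(n_in)]
    for k in range(num_steps):
        alpha = (k + 0.5) / num_steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (zi - b) for zi, b in zip(z, baseline)]
        jac = toy_jacobian(point, W)
        for i in range(n_in):
            for j in range(n_out):
                attr[i][j] += (z[i] - baseline[i]) * jac[j][i] / num_steps
    return attr


# Three scalar "tokens" and an assumed all-zero baseline.
z = [1.0, -0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
W = [
    [0.5, 0.2, -0.1],
    [0.0, 1.0, 0.3],
    [-0.4, 0.1, 0.6],
]
attr = token_to_token_ig(z, baseline, W)

# Completeness check: contributions into output j sum to out_j(z) - out_j(baseline).
out, base = toy_layer(z, W), toy_layer(baseline, W)
for j in range(len(W)):
    column_sum = sum(attr[i][j] for i in range(len(z)))
    print(j, round(column_sum, 6), round(out[j] - base[j], 6))
```

Each entry attr[i][j] plays the role of one cell in a contribution map: source token i on one axis, target token j on the other. Increasing num_steps tightens the approximation of the path integral at the cost of more gradient evaluations.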

The interactive demo plots within-layer z→z maps (route label z2z): how much each input token contributes to each output token after the full layer block. Labels z2u and u2z refer to the ATT and MLP steps in between when you switch to the composed route in the demo.

Flow diagram: Layer l−1 → z^(l) → [Layer l: ATT → u^(l,h) → MLP] → z^(l+1) → Layer l+1

z → u z2u · Attention

How much each input token z contributes to each attention output u (per head).

u → z u2z · MLP

How much each attention output u (per head) contributes to the next-layer token vector.

z → z z2z · Layer

Token-to-token contribution for the whole layer block, measured either directly (top path) or as the per-head product of z2u and u2z summed over heads (bottom path).

Two views within layer l (Fig. 2)

LIG can attribute within-layer flow in two ways. They answer the same question — how tokens influence each other inside one layer — but follow different paths through the block.

Layer-direct · z2z top path: Apply Integrated Gradients once on the layer-whole map z^(l) → z^(l+1). The demo’s default view uses this route: each cell shows how much source token i contributes to target token j within the same layer.
Composed · z2u × u2z bottom path: Measure ATT (z→u, z2u) and MLP (u→z, u2z) separately, multiply per-head contributions, then sum over heads. The paper compares this composition with the layer-direct map under an L₂ criterion (details in the paper).

Top: layer-direct z-to-z path. Bottom: composed Attention z-to-u and MLP u-to-z paths.
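The composed route can be sketched as follows. This is one plausible reading of "multiply per-head contributions, then sum over heads", written under stated assumptions: the array layouts z2u[h][i][k] and u2z[h][k][j], the function names compose_routes and l2_gap, and the use of a plain Frobenius distance are all hypothetical. The paper's exact definition, including any normalization and the precise L₂ criterion, may differ.

```python
import math


def compose_routes(z2u, u2z):
    """Combine per-head ATT and MLP attributions into one z-to-z map.

    Assumed layouts (hypothetical):
      z2u[h][i][k]: contribution of input token i to attention output k in head h
      u2z[h][k][j]: contribution of attention output k in head h to output token j
    Returns composed[i][j] = sum over h and k of z2u[h][i][k] * u2z[h][k][j].
    """
    n_heads = len(z2u)
    n_src = len(z2u[0])
    n_mid = len(u2z[0])
    n_tgt = len(u2z[0][0])
    composed = [[0.0] * n_tgt for _ in range(n_src)]
    for h in range(n_heads):
        for i in range(n_src):
            for k in range(n_mid):
                for j in range(n_tgt):
                    composed[i][j] += z2u[h][i][k] * u2z[h][k][j]
    return composed


def l2_gap(direct, composed):
    """Frobenius (entry-wise L2) distance between two maps of equal shape."""
    return math.sqrt(
        sum(
            (d - c) ** 2
            for d_row, c_row in zip(direct, composed)
            for d, c in zip(d_row, c_row)
        )
    )


# Made-up numbers for two heads and two tokens, for illustration only.
z2u = [
    [[1.0, 0.0], [0.0, 1.0]],  # head 0
    [[0.5, 0.5], [0.5, 0.5]],  # head 1
]
u2z = [
    [[0.2, 0.1], [0.0, 0.3]],  # head 0
    [[0.4, 0.0], [0.0, 0.4]],  # head 1
]
composed = compose_routes(z2u, u2z)
direct = [[0.4, 0.3], [0.2, 0.2]]  # stand-in for a layer-direct z2z map
print(composed)
print(l2_gap(direct, composed))
```

A gap near zero would mean the two routes agree on this input; a large gap would mean the composition loses or adds contribution relative to the layer-direct map.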

Visualization

Within-layer contribution map

Samples 00016 / 00410. Use Display options (top-right) to toggle paths and adjust circle sizes. Click the preview below or use Open in new tab if you prefer a separate window.

Open in new tab · DATA_NOTICE.md

Open interactive demo

Data

Penn Treebank (PTB)

The paper's Experiment A evaluates within-layer flow consistency on Penn Treebank development sentences in Stanford Dependencies format (indices 0–1699). This site ships only two excerpt sentences for the interactive demo (00016 and 00410).

The full Treebank-3 corpus is not included in this repository. Reproducing Experiment A requires an LDC license for Treebank-3 (LDC99T42). See DATA_NOTICE.md for licensing details and what is redistributed here.

DATA_NOTICE.md · Reproduction guide

References

Paper bibliography

Key citations from the LIG paper — methods, models, baselines, and evaluation data. Full bibliography: arXiv preprint.

Sundararajan, Mukund; Taly, Ankur; Yan, Qiqi (2017). Axiomatic Attribution for Deep Networks. Proceedings of ICML.

BibTeX (sundararajan2017)

@inproceedings{sundararajan2017,
  title        = {Axiomatic Attribution for Deep Networks},
  author       = {Sundararajan, Mukund and Taly, Ankur and Yan, Qiqi},
  booktitle    = {Proceedings of the 34th International Conference on Machine Learning},
  year         = {2017},
  pages        = {3319--3328}
}

Bach, Sebastian; Binder, Alexander; Montavon, Grégoire; et al. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE.

BibTeX (bach2015)

@article{bach2015,
  title        = {On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation},
  author       = {Bach, Sebastian and Binder, Alexander and Montavon, Gr{\'e}goire and Klauschen, Frederick and M{\"u}ller, Klaus-Robert and Samek, Wojciech},
  journal      = {PLOS ONE},
  year         = {2015},
  volume       = {10},
  number       = {7},
  pages        = {e0130140}
}

Montavon, Grégoire; Lapuschkin, Sebastian; Binder, Alexander; et al. (2019). Layer-Wise Relevance Propagation: An Overview. Springer (Explainable AI).

BibTeX (montavon2019)

@incollection{montavon2019,
  title        = {Layer-Wise Relevance Propagation: An Overview},
  author       = {Montavon, Gr{\'e}goire and Lapuschkin, Sebastian and Binder, Alexander and Samek, Wojciech and M{\"u}ller, Klaus-Robert},
  booktitle    = {Explainable {AI}: Interpreting, Explaining and Visualizing Deep Learning},
  year         = {2019},
  pages        = {193--209},
  publisher    = {Springer}
}

Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; et al. (2017). Attention Is All You Need. NeurIPS.

BibTeX (vaswani2017)

@inproceedings{vaswani2017,
  title        = {Attention Is All You Need},
  author       = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, {\L}ukasz and Polosukhin, Illia},
  booktitle    = {Advances in Neural Information Processing Systems (NeurIPS)},
  year         = {2017},
  pages        = {5998--6008}
}

Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT.

BibTeX (devlin2019)

@inproceedings{devlin2019,
  title        = {{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author       = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  booktitle    = {Proceedings of NAACL-HLT},
  year         = {2019},
  pages        = {4171--4186}
}

Achtibat, Reduan; Hatefi, Sayed Mohammad Vakilzadeh; Dreyer, Maximilian; et al. (2024). AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers. ICML.

BibTeX (achtibat2024attnlrp)

@inproceedings{achtibat2024attnlrp,
  title        = {{AttnLRP}: {Attention-Aware} {Layer-Wise} {Relevance} {Propagation} for {Transformers}},
  author       = {Achtibat, Reduan and Hatefi, Sayed Mohammad Vakilzadeh and Dreyer, Maximilian and Samek, Wojciech and Lapuschkin, Sebastian},
  booktitle    = {Proceedings of the International Conference on Machine Learning (ICML)},
  year         = {2024}
}

Sturmfels, Pascal; Lundberg, Scott; Lee, Su-In (2020). Visualizing the Impact of Feature Attribution Baselines. Distill.

BibTeX (sturmfels2020)

@article{sturmfels2020,
  title        = {Visualizing the Impact of Feature Attribution Baselines},
  author       = {Sturmfels, Pascal and Lundberg, Scott and Lee, Su-In},
  journal      = {Distill},
  year         = {2020}
}

Marcus, Mitchell P.; Santorini, Beatrice; Marcinkiewicz, Mary Ann; Taylor, Ann (1999). Treebank-3. Linguistic Data Consortium, Philadelphia. LDC Catalog No. LDC99T42.

BibTeX (marcus1999treebank)

@misc{marcus1999treebank,
  author       = {Marcus, Mitchell P. and Santorini, Beatrice and Marcinkiewicz, Mary Ann and Taylor, Ann},
  title        = {Treebank-3},
  howpublished = {Web Download},
  publisher    = {Linguistic Data Consortium},
  address      = {Philadelphia},
  year         = {1999},
  note         = {LDC Catalog No. LDC99T42. https://catalog.ldc.upenn.edu/LDC99T42}
}

Resources

Install & reproduction

Install from PyPI

Install PyTorch first (CUDA or CPU wheel), then the package. CLI: lig explain "…"

pip install torch  # pick your CUDA/CPU index
pip install layer-wise-integrated-gradients
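After installing, a quick sanity check can confirm that both pieces are visible to the current interpreter. The sketch below uses only the Python standard library; the helper names installed_version and check_environment are hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Report whether PyTorch is importable and which LIG version is installed."""
    return {
        "torch_importable": util.find_spec("torch") is not None,
        "lig_version": installed_version("layer-wise-integrated-gradients"),
    }


if __name__ == "__main__":
    print(check_environment())
```

A lig_version of None means the package is not installed in this environment, which usually points to pip having targeted a different interpreter or virtual environment.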

PyPI · GitHub

Python API

One-call attribution to JSON — z→u, u→z, and z→z inside each layer.

from lig import explain

explain(
    "The cat sat on the mat.",
    model="bert-base-uncased",
    num_steps=32,
    granularity="all",
    layers=[0, 11],
)

README · Computation guide

Experiment A (PTB)

Reproduce the paper's PTB dev evaluation (indices 0–1699) with the scripts in the repository. Data licensing and the Treebank-3 citation are summarized in the Penn Treebank section.

PTB & licensing · Reproduction guide

Model boundaries

Inspect residual-stream nodes z, attention outputs u, and hook points without running IG.

from lig import describe_boundaries

describe_boundaries("gpt2", load_weights=False)

Boundary guide

Cite

BibTeX

Cite the arXiv preprint (2606.21564). Conference paper BibTeX will be added when available.

@article{suzuki2026lig,
  title         = {LIG: Layer-wise Integrated Gradients for Within-Layer Flow Analysis in Transformers},
  author        = {Suzuki, Eight and Hino, Hideitsu and Murata, Noboru},
  year          = {2026},
  eprint        = {2606.21564},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG}
}
