Publications

Revising and Falsifying Sparse Autoencoder Feature Explanations

Published in NeurIPS, 2025

We developed new methods to refine and falsify sparse autoencoder feature explanations, yielding higher-quality explanations for interpreting large language models.

Recommended citation: George Ma, Samuel Pfrommer, Somayeh Sojoudi (2025). Revising and Falsifying Sparse Autoencoder Feature Explanations. In Thirty-ninth Conference on Neural Information Processing Systems. https://openreview.net/forum?id=OJAW2mHVND

A Canonicalization Perspective on Invariant and Equivariant Learning

Published in NeurIPS, 2024

We analyzed the efficiency and expressiveness of invariant and equivariant networks from a canonicalization perspective.

Recommended citation: George Ma, Yifei Wang, Derek Lim, Stefanie Jegelka, Yisen Wang (2024). A Canonicalization Perspective on Invariant and Equivariant Learning. In Thirty-eighth Conference on Neural Information Processing Systems. https://openreview.net/forum?id=jjcY92FX4R&noteId=jjcY92FX4R

Baking Symmetry into GFlowNets

Published in NeurIPS-AI4Science, 2023

We proposed incorporating state and action symmetries into GFlowNets.

Recommended citation: Jiangyan Ma, Emmanuel Bengio, Yoshua Bengio, Dinghuai Zhang (2023). Baking Symmetry into GFlowNets. In NeurIPS 2023 AI for Science: from Theory to Practice. https://openreview.net/forum?id=CZGHAeeBk3

Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding

Published in NeurIPS, 2023

We designed the Laplacian Canonization algorithm to address the sign and basis ambiguities of Laplacian eigenvectors.

Recommended citation: Jiangyan Ma, Yifei Wang, Yisen Wang (2023). Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding. In Thirty-seventh Conference on Neural Information Processing Systems. https://openreview.net/forum?id=1mAYtdoYw6