My name is George Ma (Jiangyan Ma). I am an EECS PhD student at UC Berkeley, advised by Prof. Somayeh Sojoudi. Previously, I was an undergraduate at Peking University, where I did research on graph learning in Prof. Yisen Wang's lab. My email: george_ma@berkeley.edu. My homepage: George Ma's Homepage. I actively write blogs on Zhihu: George Ma's Zhihu Homepage.
Undergraduate; Information and Computing Science
I studied at the School of Electronics Engineering and Computer Science at Peking University. I majored in Applied Physics for my first two years and switched to Information and Computing Science thereafter.
PhD student; Electrical Engineering and Computer Sciences
Currently, I am an EECS PhD student at UC Berkeley.
George Ma, Samuel Pfrommer, Somayeh Sojoudi (2025). Revising and Falsifying Sparse Autoencoder Feature Explanations. In Thirty-ninth Conference on Neural Information Processing Systems.
George Ma, Yifei Wang, Derek Lim, Stefanie Jegelka, Yisen Wang (2024). A Canonicalization Perspective on Invariant and Equivariant Learning. In Thirty-eighth Conference on Neural Information Processing Systems.
Jiangyan Ma, Emmanuel Bengio, Yoshua Bengio, Dinghuai Zhang (2023). Baking Symmetry into GFlowNets. In NeurIPS 2023 AI for Science: from Theory to Practice.
Jiangyan Ma, Yifei Wang, Yisen Wang (2023). Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding. In Thirty-seventh Conference on Neural Information Processing Systems.
George Ma, Anurag Koul, Qi Chen, Yawen Wu, Sachit Kuhar, Yu Yu, Aritra Sengupta, Varun Kumar, Murali Krishna Ramanathan (2025). SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion. arXiv preprint arXiv:2510.17925.
Samuel Pfrommer, George Ma, Yixiao Huang, Somayeh Sojoudi (2025). Spooky Action at a Distance: Normalization Layers Enable Side-Channel Spatial Communication. arXiv preprint arXiv:2507.04709.
May 2025 – Sep 2025; SpecAgent for Code Completion
Interned at Amazon and proposed SpecAgent, an indexing-time retrieval agent that anticipates future edits in code repositories to reduce inference-time latency. Designed a leakage-free benchmark that prevents future repository context from contaminating evaluation, yielding more realistic measurements. Experiments demonstrated 9–11% absolute improvements in code completion performance.
Nov 2024 – May 2025; Revising SAE Feature Explanations
Investigated mechanistic interpretability of LLMs using sparse autoencoders (SAEs), which disentangle hidden representations into interpretable features. Proposed structured explanations, a tree-based explainer, and hard-negative sampling to address biases in current methods. The resulting paper was published at NeurIPS 2025.
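To give a flavor of the setup, here is a minimal SAE sketch (hypothetical dimensions, a generic L1-penalized objective, and illustrative names; not the exact architecture or explainer from the paper):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Encodes LLM hidden states into an overcomplete, sparsely-activating feature basis."""
    def __init__(self, d_model=768, d_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h):
        f = torch.relu(self.encoder(h))   # ReLU induces sparse, non-negative features
        return self.decoder(f), f

sae = SparseAutoencoder()
h = torch.randn(4, 768)                   # a batch of hidden states
h_hat, f = sae(h)
# A common objective: reconstruction error plus an L1 sparsity penalty on features
loss = (h_hat - h).pow(2).mean() + 1e-3 * f.abs().mean()
```

Each coordinate of f is a candidate interpretable feature; the paper revises and stress-tests the natural-language explanations attached to such features.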
Dec 2024 – Feb 2025; Normalization Layers and Side-Channel Communication
Studied the role of normalization layers in CNNs and discovered that they enable long-range spatial communication beyond local receptive fields. Analyzed this effect in a toy localization task, showing normalization layers act as iterative message-passing mechanisms. The work highlights risks in applications requiring spatial locality.
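A toy illustration of the mechanism (my own minimal example, not the paper's experimental setup): because normalization statistics are pooled over the whole feature map, perturbing one pixel shifts the output at every other position, regardless of the convolutional receptive field.

```python
import torch

def norm_over_map(x, eps=1e-5):
    # Normalize with statistics computed over channels and all spatial positions
    mu = x.mean(dim=(1, 2, 3), keepdim=True)
    sigma = x.std(dim=(1, 2, 3), keepdim=True)
    return (x - mu) / (sigma + eps)

x = torch.randn(1, 8, 32, 32)             # (batch, channels, H, W)
x_pert = x.clone()
x_pert[0, 0, 0, 0] += 10.0                # perturb a single corner pixel

y, y_pert = norm_over_map(x), norm_over_map(x_pert)
# The opposite corner changed too: information crossed the entire map in one layer
print((y - y_pert)[0, :, -1, -1].abs().max())   # strictly positive
```

Stacking such layers repeats this global coupling, which is what makes it resemble iterative message passing.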
Jun 2023 – May 2024; Canonicalization for Invariant & Equivariant Learning
Collaborated with MIT researchers to introduce a canonicalization framework that unifies invariant and equivariant learning. This framework resolved an open problem on the expressiveness of invariant networks with equivariance constraints. We also designed new canonicalization algorithms for eigenvectors, leading to a publication at NeurIPS 2024.
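The core idea is easy to state with a toy example (sorting as a canonicalization for permutation invariance; my own illustration, not the eigenvector algorithms from the paper): map every input to a fixed representative of its orbit, and any downstream network becomes invariant for free.

```python
import torch
import torch.nn as nn

def canonicalize(x):
    # Sorting sends every permutation of x to the same canonical representative
    return torch.sort(x, dim=-1).values

mlp = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(5)
perm = torch.randperm(5)
# The composition mlp(canonicalize(.)) is permutation-invariant, although mlp alone is not
print(torch.allclose(mlp(canonicalize(x)), mlp(canonicalize(x[perm]))))  # True
```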
Jun 2023 – Sep 2023; Baking Symmetry into GFlowNets
Interned at Prof. Yoshua Bengio’s lab (Mila) and applied my background in invariant networks to address symmetric actions in GFlowNets. Developed methods to incorporate symmetries into the generation process, improving both diversity and reward. The paper was presented as an oral at NeurIPS 2023 AI4Science.
Oct 2022 – Aug 2023; Laplacian Canonization for GNNs
Explored Laplacian eigenvectors as universal graph positional encodings, which suffer from sign and basis ambiguity. Proposed Laplacian Canonization, a preprocessing algorithm that resolves these ambiguities. The work was published as a poster at NeurIPS 2023.
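For intuition, here is a simplified sign canonicalization (my own sketch, not the exact algorithm from the paper): an eigenvector v and its negation -v describe the same graph, so we pick the sign deterministically, e.g. by making the largest-magnitude entry positive.

```python
import numpy as np

def canonicalize_sign(V):
    """Flip each eigenvector column so that v and -v map to the same output."""
    V = V.copy()
    for j in range(V.shape[1]):
        i = np.argmax(np.abs(V[:, j]))    # index of the largest-magnitude entry
        V[:, j] *= np.sign(V[i, j])       # make that entry positive
    return V

# Laplacian eigenvectors of a path graph on 4 nodes
A = np.eye(4, k=1) + np.eye(4, k=-1)
L = np.diag(A.sum(axis=0)) - A
_, V = np.linalg.eigh(L)
assert np.allclose(canonicalize_sign(V), canonicalize_sign(-V))
```

Ties among entries (which arise for symmetric graphs) and repeated eigenvalues (basis ambiguity) are what make the full problem subtle.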