Gemma 4B IT · Layer 22 · Residual Post · 65K SAE · L0 Medium
Metric: Pearson correlation · 8 cognitive domains · Adaptive dimension selection
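Instead of a fixed top-K, each domain selects its own reliable SAE dimensions: those with strong, consistent activation across that domain's questions (CV < 0.5, mean > 10).
The union of the per-domain selections gives the dimensions used for visualization.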
Silhouette score: domain separability in UMAP space.
Each point = 1 question. Colored by domain. Hover for details.
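As a rough illustration of what the silhouette score measures, the sketch below computes it from 2-D coordinates and domain labels with plain NumPy. It assumes the coordinates have already been produced by UMAP (the projection itself is not shown), and the function name `mean_silhouette` and the toy points are hypothetical, not taken from the analysis.

```python
import numpy as np

def mean_silhouette(points, labels) -> float:
    """Mean silhouette over all points; needs at least two distinct labels.

    points: (n_points, 2) coordinates, assumed to be UMAP output.
    labels: (n_points,) domain label of each point.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        # a: mean distance to the other points of the same domain
        # (the self-distance is 0, so dividing by n_same - 1 excludes it).
        a = dists[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the points of any other domain.
        b = min(dists[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))

# Toy layout: two tight, well-separated groups of two questions each.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
toy_labels = ["coding", "coding", "math", "math"]
toy_score = mean_silhouette(toy_points, toy_labels)
```

Values close to 1 mean each question sits much nearer to its own domain than to any other; values near 0 mean the domains overlap in the projection.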
Pearson correlation of domain mean activations on the union of reliable dimensions.
For domains with 2+ independent datasets: how similar is each dataset's "similarity profile" to other datasets in the same domain? A high value means the domain's position in the similarity space is stable regardless of which benchmark you use.
✅ > 0.7 consistent · ⚠️ 0.4–0.7 mixed · ❌ < 0.4 unstable
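One plausible reading of this consistency check is sketched below. It makes assumptions that the description above does not confirm: a dataset's "similarity profile" is taken to be its Pearson correlation with every domain mean, and a domain's score is the mean pairwise Pearson correlation between the profiles of its datasets. All function names and toy vectors are hypothetical, and the vectors are assumed to be already restricted to the union of reliable dimensions.

```python
import numpy as np
from itertools import combinations

def similarity_profile(dataset_mean, domain_means) -> np.ndarray:
    """Pearson r between one dataset's mean vector and each domain mean."""
    return np.array([np.corrcoef(dataset_mean, m)[0, 1] for m in domain_means])

def domain_consistency(dataset_means, domain_means) -> float:
    """Mean pairwise Pearson r between the profiles of a domain's datasets."""
    profiles = [similarity_profile(d, domain_means) for d in dataset_means]
    pairs = [np.corrcoef(p, q)[0, 1] for p, q in combinations(profiles, 2)]
    return float(np.mean(pairs))

def consistency_label(score: float) -> str:
    """Map a score to the three bands used in the legend above."""
    if score > 0.7:
        return "consistent"
    if score >= 0.4:
        return "mixed"
    return "unstable"

# Toy data: three domain means and two datasets from the same domain.
domain_means = [np.array([10.0, 0.0, 5.0, 1.0]),
                np.array([0.0, 10.0, 1.0, 5.0]),
                np.array([5.0, 5.0, 5.0, 0.0])]
dataset_1 = np.array([9.0, 1.0, 4.0, 2.0])
dataset_2 = 2.0 * dataset_1  # same pattern, larger overall magnitude

# Pearson ignores positive rescaling, so both profiles coincide and the
# score is 1.0 up to floating-point rounding.
score = domain_consistency([dataset_1, dataset_2], domain_means)
```

A domain needs at least two datasets for this score to exist, and at least three domains are needed for the profile correlation to be informative.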
For each question in each dataset, we run Gemma 4B IT, hook the Layer 22 residual stream, and apply the SAE encoder (65K features, L0 Medium). Activations at or below the threshold are zeroed (feature_acts × (feature_acts > threshold)), and the result is averaged across all tokens. This yields one 65,536-dimensional vector per question.
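The pooling step can be sketched in NumPy as follows. This is a minimal illustration, not the pipeline's actual code: it assumes `feature_acts` is a (tokens × features) array that the SAE encoder has already produced, the model hook and encoder are not shown, and the function name `pool_question` is hypothetical. Whether the threshold is a single scalar or one value per feature is not stated, so the sketch accepts either.

```python
import numpy as np

def pool_question(feature_acts: np.ndarray, threshold) -> np.ndarray:
    """Mean over tokens of thresholded SAE activations for one question.

    feature_acts: (n_tokens, n_features) encoder outputs.
    threshold: scalar or (n_features,) array (assumption, see lead-in).
    """
    # Zero out every activation that does not exceed the threshold.
    gated = feature_acts * (feature_acts > threshold)
    # Average across all tokens to get one vector per question.
    return gated.mean(axis=0)

# Toy example: 3 tokens, 4 features (the real vector has 65,536 entries).
acts = np.array([[0.0, 2.0, 0.5, 4.0],
                 [0.0, 4.0, 0.2, 0.0],
                 [0.0, 0.0, 0.9, 2.0]])
question_vec = pool_question(acts, threshold=1.0)
# question_vec is [0.0, 2.0, 0.0, 2.0]: feature 2 never exceeds the threshold.
```

Note that zeroed tokens still count in the denominator, so a feature that fires on only a few tokens gets a small mean.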
Previous versions used a fixed top-K (e.g., top-200 active dimensions). This was problematic because:
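Now each domain independently selects the dimensions whose activation is both strong and consistent across its 120 questions (coefficient of variation CV < 0.5, mean activation > 10). The same criterion applies to every domain, so the number of selected dimensions varies instead of being fixed at K: coding gets 55 dims and sentiment gets 370, reflecting genuine cognitive structure.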
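A minimal sketch of this selection rule, assuming each domain's pooled activations are stacked into a (questions × features) array. The function names are hypothetical, and the use of the population standard deviation for the CV is an assumption, since the exact estimator is not specified.

```python
import numpy as np

def select_reliable_dims(acts: np.ndarray, max_cv: float = 0.5,
                         min_mean: float = 10.0) -> np.ndarray:
    """Indices of features that are strong and consistent within one domain.

    acts: (n_questions, n_features) pooled activations for a single domain.
    """
    mean = acts.mean(axis=0)
    std = acts.std(axis=0)  # population std (assumption)
    # CV = std / mean; features with a non-positive mean get CV = inf.
    cv = np.divide(std, mean, out=np.full_like(mean, np.inf), where=mean > 0)
    return np.flatnonzero((cv < max_cv) & (mean > min_mean))

def union_of_selections(domain_acts: dict) -> np.ndarray:
    """Sorted union of every domain's reliable dimensions."""
    selected = [select_reliable_dims(a) for a in domain_acts.values()]
    return np.unique(np.concatenate(selected))

# Toy data: 3 questions x 3 features per domain (invented values).
coding = np.array([[20.0, 0.0, 50.0],
                   [22.0, 30.0, 5.0],
                   [18.0, 0.0, 50.0]])
sentiment = np.array([[0.0, 40.0, 12.0],
                      [1.0, 44.0, 11.0],
                      [0.0, 36.0, 13.0]])

coding_dims = select_reliable_dims(coding)        # only feature 0 qualifies
sentiment_dims = select_reliable_dims(sentiment)  # features 1 and 2 qualify
union_dims = union_of_selections({"coding": coding, "sentiment": sentiment})
```

In the toy data, coding's feature 2 has a high mean but fails the CV test because it fires on only two of the three questions, which is the kind of dimension a fixed top-K could have kept.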
Pearson correlation preserves activation magnitude. If two domains both strongly activate the same SAE feature (e.g., dim 839: coding=187, geography=579), that shared feature pushes their Pearson correlation up, which correctly reflects that they share a "brain region." Spearman is rank-based and discards this magnitude information.
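The contrast between the two metrics can be seen in a small sketch. The values below are invented for illustration and are not taken from the analysis; `pearson`, `spearman`, and `domain_similarity` are hypothetical helper names, and the Spearman helper is a simplified version that does not average tied ranks.

```python
import numpy as np

def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b) -> float:
    """Pearson on ranks. Ties are not averaged, so use tie-free inputs."""
    def rank(v):
        return np.argsort(np.argsort(v))
    return pearson(rank(a), rank(b))

def domain_similarity(domain_means: dict, dims):
    """Pearson matrix between domain mean vectors, restricted to `dims`."""
    names = list(domain_means)
    rows = np.stack([np.asarray(domain_means[n], dtype=float)[dims]
                     for n in names])
    return names, np.corrcoef(rows)  # correlation between rows

# Two toy domains that disagree on three weak features but share one
# strongly active feature (the last one).
domain_a = np.array([1.0, 2.0, 3.0, 200.0])
domain_b = np.array([3.0, 1.0, 2.0, 500.0])

r_pearson = pearson(domain_a, domain_b)    # above 0.99, driven by the shared feature
r_spearman = spearman(domain_a, domain_b)  # 0.4, only the ordering counts

names, sim = domain_similarity({"a": domain_a, "b": domain_b},
                               dims=[0, 1, 2, 3])
```

Here `dims` would be the union of reliable dimensions, and `sim` is the matrix shown in the similarity heatmap under these assumptions.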
| Domain | Datasets | Description |
|---|---|---|
| coding | HumanEval, MBPP | Code generation benchmarks |
| math | MMLU College Math, GSM8K, MMLU HS Math | Math reasoning benchmarks |
| geography | MMLU Geography, MMLU HS Geography | Geography knowledge |
| history | MMLU US History, MMLU HS History | Historical knowledge |
| sentiment | DAIR-Emotion, IMDb, SST-2 | Sentiment analysis |
| language | GLUE/CoLA, GLUE/RTE | Language understanding |
| symbolic | ReClor, OpenBookQA | Logical/scientific reasoning |
| spatial | HellaSwag | Commonsense/physical reasoning |
All datasets come from public benchmarks. No self-built questions are included.