Gabriele Sarti

gsarti · https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

authored a paper about 24 hours ago

Agents of Chaos

authored a paper about 24 hours ago

A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

upvoted a collection about 20 hours ago

Qwen3.5

17 items • Updated 1 day ago • 764

upvoted 2 papers 1 day ago

A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

Paper • 2602.08964 • Published 22 days ago • 1

Agents of Chaos

Paper • 2602.20021 • Published 8 days ago • 28

upvoted a paper 21 days ago

Faithful Persona-based Conversational Dataset Generation with Large Language Models

Paper • 2312.10007 • Published Dec 15, 2023 • 11

upvoted 2 papers about 1 month ago

Language Models Change Facts Based on the Way You Talk

Paper • 2507.14238 • Published Jul 17, 2025 • 1

Demographic Probing of Large Language Models Lacks Construct Validity

Paper • 2601.18486 • Published Jan 26 • 1

upvoted an article about 1 month ago

🪄 Interpreto: A Unified Toolkit for Interpretability of Transformer Models

Article • Jan 20 • 37

upvoted a collection 2 months ago

Activation Oracles

12 items • Updated Dec 26, 2025 • 14

upvoted a paper 2 months ago

GIM: Improved Interpretability for Large Language Models

Paper • 2505.17630 • Published May 23, 2025 • 1

upvoted a collection 3 months ago

Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability

A compilation of sparse auto-encoders trained on large language models. • 37 items • Updated Dec 16, 2025 • 24

upvoted a paper 3 months ago

Accumulating Context Changes the Beliefs of Language Models

Paper • 2511.01805 • Published Nov 3, 2025 • 2

upvoted an article 4 months ago

SYNTH: the new data frontier

Article • Nov 10, 2025 • 9

upvoted a collection 4 months ago

🧩 Word games

A collection of resources for word games in various languages • 16 items • Updated Sep 24, 2025 • 2

upvoted 2 papers 4 months ago

Latent Reasoning in LLMs as a Vocabulary-Space Superposition

Paper • 2510.15522 • Published Oct 17, 2025 • 3

Language Models are Injective and Hence Invertible

Paper • 2510.15511 • Published Oct 17, 2025 • 69

upvoted 3 papers 5 months ago

Eliciting Secret Knowledge from Language Models

Paper • 2510.01070 • Published Oct 1, 2025 • 6

Interpreting Language Models Through Concept Descriptions: A Survey

Paper • 2510.01048 • Published Oct 1, 2025 • 2

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?

Paper • 2507.08802 • Published Jul 11, 2025 • 1

upvoted an article 5 months ago

There is no such thing as a tokenizer-free lunch

Article • Sep 25, 2025 • 95

upvoted a collection 6 months ago

Hallucination Probes

https://arxiv.org/abs/2509.03531 • 5 items • Updated Oct 15, 2025 • 2