I started out in undergrad trying to explain black-box rumour-detection models: why does this LSTM flag a post as false? That question has followed me through every project since, and it has only sharpened.
If a model's output can't be audited — traced back to evidence, to a rationale, to a circuit — then for high-stakes work it may as well be a coin flip with confidence.
My MSc pushed me toward graph neural networks: structure is a useful prior when you're trying to reason about relations between claims and evidence. During the Fatima Fellowship I put that into practice: first as a cross-graph GNN, then as an agentic pipeline (MERIT) in which LVLMs plan, call web-search tools, and build verifiable evidence graphs before committing to an answer.
What I want to do next, in a PhD, is go one level deeper: not just build reasoning systems but understand them. Mechanistic interpretability of multi-step reasoning, especially in agentic settings where one model delegates to tools or to other models. How does reasoning compose? Where does it break? Can we build agents whose chains we can audit the way we audit circuits?
i. Reasoning in LLMs
Chain-of-thought is convenient but often unfaithful: the verbalised scratchpad need not reflect the computation that produced the answer. I'm interested in what counts as actual reasoning inside a forward pass, and how the scratchpad relates to it.
ii. Mechanistic interpretability
Circuit-level methods (sparse autoencoders, causal mediation, activation patching) applied to the reasoning traces I care about, not just toy tasks.
iii. Agentic reasoning & communication
What happens when one reasoner delegates to another? Tool calls as a window into compositional structure; inter-agent communication as a proxy for intermediate representations.
iv. Verifiable evidence
Outputs are only as good as the rationales they sit on. Structured evidence (graphs, citations, tool traces) is how I keep systems honest.