Zanwen Fu
Currently · Machine Learning Engineer Intern @ Robinhood

Agents that survive production.

Engineer & founder building production-grade autonomous agents, grounded in strong software engineering to solve real-world problems.

I build agentic systems — reliable, observable, and designed to survive production, not just demos.

MLE (Agentic) @ Robinhood · 51.6% SWE-bench Verified · Acquired by Sonar · ~500 pilot users · Agent OS
01

About

From NUS to Duke — shipping production systems at every stop.

I'm a Software Engineer and MS Computer Science (AI/ML) student at Duke University, focused on building production-grade autonomous systems — from multi-agent orchestration and LLM tooling to the distributed backends that run them reliably at scale.

I'm the sole founder and engineer of VYNN AI, an agentic financial analyst platform built end-to-end and deployed to ~500 pilot users.

Previously, I designed core components of AutoCodeRover, an autonomous code repair system acquired by Sonar, integrating agentic reasoning directly into JetBrains IDEs. In parallel, I've led research as sole first author on multi-agent LLM frameworks for medical text mining, achieving 98.2% sensitivity across 15 systematic reviews (~150K citations).

This summer, I’m at Robinhood in Menlo Park as a Machine Learning Engineer on the Central AI team, continuing my focus on building autonomous systems that operate reliably at real-world scale.

What drives me

Systems that are reliable, observable, and production-ready — not just demo-ready. I care deeply about turning ideas into robust software that solves real problems and serves real users.

Current obsession

Agent harness design — the infrastructure layer that makes agents actually work in production. The agent itself is the easy part; the harness that makes it reliable, observable, and debuggable is what I want to build.

Git-based Memory · Orchestration · Context Engineering · Eval Pipelines · Self-Monitoring · Failure Recovery
Duke University

M.S. Computer Science (AI/ML)

2025 – 2027 · Graduate Teaching Assistant

National University of Singapore

B.Comp. in Computer Science (Honours)

2021 – 2025 · Distinction

Distinction in Software Engineering

Exchange Semester

The University of Hong Kong · Fall 2023

02

Experience

Startups, big tech, research, and teaching.

Machine Learning Engineer (Agentic)

CURRENT
Robinhood · Central AI Team
May 2026 – Aug 2026 · Menlo Park, CA

On the Central AI team, working on agent reliability and evaluation: the infrastructure layer that lets Robinhood ship AI products into a regulated financial domain. Building Kafka-based news/market data pipelines, Braintrust-driven evals, and post-training workflows (SFT on Databricks) spanning closed-source (GPT-4.1) and open-source models.

Python · OpenAI Agents SDK · Braintrust · Kubernetes · Kafka · Databricks · SFT
Founder & Software Engineer (Agentic)

VYNN AI
Jul 2025 – Dec 2025 · Durham, NC

Designed, built, and deployed a full-stack agentic financial analysis platform as sole engineer — from LangGraph multi-agent backend and FastAPI orchestration layer to React dashboard and production infrastructure on Hetzner Cloud. Serves ~500 pilot users with institutional-quality equity research (DCF modeling, news intelligence, automated reports) in under 7 minutes end-to-end.

Python · LangGraph · FastAPI · React · TypeScript · MongoDB · Redis · Kafka · Docker · Hetzner
Graduate Teaching Assistant

Duke University
Aug 2025 – Apr 2026 · Durham, NC

Architected and led CS 590 (Software Development Studio), where graduate students build AI debugging agents inspired by AutoCodeRover and deploy full-stack applications. Also mentored teams in CS 408 and CS 390 on software architecture, DevOps, and LLM-oriented programming — shipping production software for real clients.

Python · Docker · CI/CD · Git/GitLab · WebAssembly
Research Software Engineer

AutoCodeRover (acquired by Sonar)
Aug 2024 – May 2025 · Singapore

Built the JetBrains IDE plugin end-to-end for autonomous code repair — GumTree-based 3-way AST merge, embedded SonarLint analysis, and real-time SSE streaming with per-step developer feedback. Enhanced the agentic repair backend with LLM-as-a-Judge self-improvement, lifting SWE-bench Verified to 51.6% (state-of-the-art among open-source agents). Core technology acquired by Sonar.

Kotlin · Python · IntelliJ Platform SDK · GumTree · SonarLint · JGit · OkHttp
Software Engineer

Binance · Web3 Wallet Team
Jul 2025 – Oct 2025 · Singapore

Built backend validation infrastructure for Binance's Boosters campaign — automated API regression suites in CI, load-tested services to ~500K concurrent transactions via JMeter, and instrumented monitoring to catch consistency failures before production. Worked directly with backend and Web3 Wallet engineers to root-cause and patch defects, cutting resolution time by 40%.

Java · REST APIs · iOS/Android SDKs · Postman · JMeter · CI/CD
AI Researcher

NUS Undergraduate Research
Jan 2024 – Jul 2025 · Singapore

Led research as first author on a multi-agent AI framework for medical evidence synthesis. Designed and built LUMINA, a four-agent LLM framework that automates citation screening for medical systematic reviews, achieving 98.2% sensitivity and 87.9% specificity across 15 systematic reviews and meta-analyses (SRMAs; ~150K citations) with a 35× reduction in false negatives vs. the prior state of the art.

Python · LangChain · CI/CD · GPT-4o-mini · o3-mini
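For readers outside evidence synthesis, the two screening metrics above follow the standard definitions, treating a citation that truly belongs in the review as the positive class. Reading the 35× figure as a ratio of false-negative counts is an assumption about how it was computed.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{FN reduction} = \frac{FN_{\text{prior SOTA}}}{FN_{\text{LUMINA}}} \approx 35
```

At 98.2% sensitivity, roughly 18 of every 1,000 relevant citations are missed (1 − 0.982 = 0.018).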

Earlier Experience

Full-Stack Software Engineer · ST Engineering
May 2023 – Aug 2023 · May 2024 – Dec 2024

Web Developer · NUS Computing
Feb 2024 – Nov 2024
03

Selected Work

FOUNDER · SOLE ENGINEER · ~500 PILOT USERS · FULL-STACK · PRODUCTION

VYNN AI

Bloomberg-grade equity research, built for retail.

A LangGraph supervisor orchestrates 7 specialized agents, including fundamentals, news intelligence, DCF modeling, report generation, and a 3-layer recommendation engine with deterministic validation. Every figure traces to a deterministic source, and every recommendation enforces ≥97% citation coverage. Built end-to-end as sole engineer in Python and React/TypeScript on Docker infrastructure (Hetzner Cloud) and shipped to ~500 pilot users in production.
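The regex layer of the recommendation engine can be pictured roughly as follows. This is a minimal sketch, not VYNN AI's code. It assumes a hypothetical convention in which every figure in the LLM narrative is followed by a source tag such as `[src:dcf.wacc]`, and it measures coverage as tagged figures divided by all figures. The `threshold` default mirrors the ≥97% figure quoted above.

```python
import re

# Hypothetical convention: every figure in the LLM narrative must be followed by a
# source tag such as "[src:dcf.wacc]" that points at a deterministic calculator output.
TAG = re.compile(r"\[src:[\w.\-]+\]")
# A figure: optional "$", digits with optional thousands commas and decimals,
# then an optional %, K, M, B, or x suffix. The lookbehind skips digits inside words.
NUM = r"(?<![\w$.])\$?\d[\d,]*(?:\.\d+)?[%KMBx]?"
FIGURE = re.compile(NUM)
CITED_FIGURE = re.compile(NUM + r"\s*\[src:[\w.\-]+\]")


def citation_coverage(narrative: str) -> float:
    """Fraction of figures in `narrative` that carry a source tag."""
    cited = len(CITED_FIGURE.findall(narrative))
    # Remove tags before counting so digits inside tag names are not counted.
    total = len(FIGURE.findall(TAG.sub("", narrative)))
    return 1.0 if total == 0 else cited / total


def passes_validation(narrative: str, threshold: float = 0.97) -> bool:
    """Reject a narrative whose figures are not traceable enough."""
    return citation_coverage(narrative) >= threshold
```

Stripping tags before counting keeps digits inside tag names out of the denominator. A production validator would also need to handle dates and years, which this pattern counts as figures.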

System Architecture

Frontend: React / TypeScript + Vite SPA (AI Chat with SSE streaming · Market Dashboard · Portfolio Mgmt · Daily Reports · News Feed), served as static files by Nginx
Edge: SSE, REST, and two WSS connections pass through Caddy (auto-HTTPS + reverse proxy)
FastAPI Backend: Auth (OAuth) · Chat / SSE / Jobs · WS Hub (prices + news) · Daily Scheduler (8:30 AM ET cron) · Portfolio CRUD · Docker SDK, which spawns ephemeral containers
Ephemeral Docker Container (~975MB): Supervisor Agent (LangGraph cyclical state graph) runs Ticker Extraction → Intent Classification → Dependency-Aware Routing into COMPREHENSIVE | MODEL_ONLY | QUICK_NEWS | CUSTOM modes, with a deterministic fallback
Agents: Financial Data Agent (yfinance · fundamentals) · DCF Model Agent (Generic · SaaS · REIT · Bank · Utility · Energy; Formula Evaluator, 1,293L) · News Intelligence Agent (Scrape → Filter → Screen) · Report Generator Agent (HTML / PDF / XLSX)
Recommendation Engine (3-layer validation): Deterministic Calculator → LLM Narrative → Regex Validator (≥95% citation coverage)
Output Artifacts: XLSX (10-tab DCF) · PDF Analyst Report · JSON API
Shared state: FinancialState (blackboard pattern) · 33 externalized prompt templates · OpenAI + Anthropic (provider-agnostic)
Storage: Redis (queues + cache + sessions) · MongoDB (documents + news)
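The supervisor's intent classification and dependency-aware routing can be sketched with the standard library's `graphlib`. The agent names, the dependency map, and the keyword rules below are hypothetical, and CUSTOM mode is left out. The point is the shape of the logic: trust the LLM's label only when it is one of the known modes, fall back to deterministic rules otherwise, then pull in every upstream agent a mode needs and order them topologically.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional, Set

# Hypothetical dependency map: each agent lists the agents whose outputs it reads
# from the shared state. Names are illustrative, not VYNN AI's actual identifiers.
DEPENDS_ON: Dict[str, Set[str]] = {
    "financial_data": set(),
    "news_intel": set(),
    "dcf_model": {"financial_data"},
    "recommendation": {"financial_data", "dcf_model", "news_intel"},
    "report": {"financial_data", "dcf_model", "news_intel", "recommendation"},
}

# Hypothetical mode -> agents the user actually asked for (CUSTOM omitted).
MODE_TARGETS: Dict[str, Set[str]] = {
    "COMPREHENSIVE": {"report"},
    "MODEL_ONLY": {"dcf_model"},
    "QUICK_NEWS": {"news_intel"},
}

# Deterministic keyword rules used when the LLM label is missing or invalid.
KEYWORD_RULES = [("news", "QUICK_NEWS"), ("dcf", "MODEL_ONLY"), ("valuation", "MODEL_ONLY")]


def classify(query: str, llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """Return a routing mode, trusting the LLM only if it emits a known label."""
    if llm_classify is not None:
        try:
            label = llm_classify(query)
        except Exception:
            label = None  # provider error: fall through to the rules
        if label in MODE_TARGETS:
            return label
    lowered = query.lower()
    for keyword, mode in KEYWORD_RULES:
        if keyword in lowered:
            return mode
    return "COMPREHENSIVE"


def plan(mode: str) -> List[str]:
    """Expand the mode's targets with transitive dependencies, in a valid run order."""
    needed: Set[str] = set()
    stack = list(MODE_TARGETS[mode])
    while stack:
        agent = stack.pop()
        if agent not in needed:
            needed.add(agent)
            stack.extend(DEPENDS_ON[agent])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    graph = {agent: DEPENDS_ON[agent] for agent in needed}
    return list(TopologicalSorter(graph).static_order())
```

For example, `plan(classify("DCF for MSFT"))` runs the financial data agent before the DCF agent and skips news entirely.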

< 7 min

End-to-end equity analysis — fundamentals, news intel, DCF modeling, validated PDF report

0.985

Reproducibility score across paired runs (CV 0.016) — symbolic outputs match exactly under identical inputs

97%

Citation coverage enforced on every recommendation — zero invented numbers, every figure traceable

~500

Pilot users on production Hetzner Cloud infrastructure with zero-downtime deployments

Python · LangGraph · FastAPI · WebSocket · React · TypeScript · MongoDB · Redis · Docker · Hetzner VPS
SONAR · SOFTWARE ENGINEER · ACQUIRED BY SONAR · ISSTA 2024 + arXiv

AutoCodeRover — IDE Plugin + Repair Agent

Brought autonomous code repair from research to a production developer tool. AutoCodeRover is a multi-agent system that resolves real GitHub issues end-to-end: reproducing bugs, searching codebases across 7 languages via tree-sitter, generating patches with iterative refinement, and self-correcting through an LLM-as-a-Judge reviewer.

I built the JetBrains IDE plugin end-to-end in Kotlin: a conversational agent UI with real-time SSE streaming, GumTree-based three-way AST merge for conflict-free patch application, embedded SonarLint static analysis, and a feedback loop where developers can critique any reasoning step to trigger guided re-runs.

On the backend, I designed the self-fix agent that diagnoses inapplicable patches and autonomously replays the pipeline from the most suspicious stage, lifting the SWE-bench Verified resolve rate to 51.6%. The core technology was acquired by Sonar. Sonar Foundation Agent, built on the AutoCodeRover core, has since reached 79.2% on SWE-bench Verified, #1 on the leaderboard (Feb 2026).
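The replay idea behind the self-fix agent can be illustrated with a small checkpointed pipeline. This is a hypothetical sketch, not AutoCodeRover's implementation: `run_with_self_fix`, the stage names, and the `judge` signature are invented for illustration. The state entering each stage is snapshotted. When the judge names a suspicious stage, the pipeline restores that stage's input, attaches the critique as feedback, and re-runs only from there, so earlier work such as bug reproduction is not repeated.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

Stage = Callable[[Dict], Dict]
# A judge returns None if the result looks good, else (suspicious_stage, critique).
Judge = Callable[[Dict], Optional[Tuple[str, str]]]


def run_with_self_fix(
    stages: List[Tuple[str, Stage]], state: Dict, judge: Judge, max_replays: int = 2
) -> Tuple[Dict, bool]:
    """Run stages in order; on a bad verdict, replay from the stage the judge blames."""
    names = [name for name, _ in stages]
    snapshots: List[Dict] = []  # snapshots[i] = state entering stage i
    start = 0
    for attempt in range(max_replays + 1):
        del snapshots[start:]  # discard snapshots from the stages being replayed
        for _name, stage in stages[start:]:
            snapshots.append(copy.deepcopy(state))
            state = stage(state)
        verdict = judge(state)
        if verdict is None:
            return state, True
        if attempt == max_replays:
            break
        suspect, critique = verdict
        start = names.index(suspect)
        # Roll back to the suspicious stage's input and attach the critique for the re-run.
        state = copy.deepcopy(snapshots[start])
        state.setdefault("feedback", []).append(f"{suspect}: {critique}")
    return state, False
```

The same mechanism covers developer feedback from the IDE: a human critique of a reasoning step plays the role of the judge's verdict.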

Repair Pipeline Architecture

JetBrains IDE (IntelliJ / PyCharm): PSI Traversal · Build/Test Listeners · Git4Idea · Editor API
ACR Plugin (Kotlin): Chat UI + SSE (real-time streaming with typewriter animation) · SonarLint Engine (embedded Java/Python static analysis) · GumTree 3-Way Merge (baseline → modified → patched AST alignment) · Context Enrichment (PSI refs + cursor history + open files) · User Feedback per Reasoning Step (critique any agent step → guided re-run)
Plugin ↔ backend: context and feedback go out over a REST API (OkHttp); real-time logs and the resulting patch come back over an SSE stream
AutoCodeRover Backend (Python · Docker): Meta-Agent Orchestrator (hardcoded or LLM-driven) · Reproducer Agent · Context Retrieval (7-language tree-sitter) · Patching Agent · Reviewer Agent · Selection Agent (best-of-N + regression) · Self-Fix Agent (LLM-as-a-Judge, drives the replay loop) · SWE-bench Verified 51.6%
Output: Patched Code + Evidence (specs + reproducer + review)
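The plugin consumes the backend's real-time logs as Server-Sent Events. The sketch below, written in Python rather than the plugin's Kotlin, shows the core of the `text/event-stream` format: `event:` and `data:` fields accumulate until a blank line dispatches the event, multi-line `data` is joined with newlines, and lines starting with `:` are comments often used as keep-alives. `parse_sse` is a hypothetical helper. It ignores `id` and `retry`, and it drops an unterminated final event, as the format specifies.

```python
from typing import Dict, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} dicts from a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event (if any) and reset.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, commonly a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # "id" and "retry" fields are ignored in this sketch.
```

In a real client, `lines` would be the decoded lines of a streaming HTTP response body.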

51.6%

SWE-bench Verified (Jan 2025)

Highest among open-source agents on the 500 human-validated GitHub issues in SWE-bench Verified

+13.2 pts

Resolve Rate Improvement

Lifted SWE-bench Verified from 38.4% (Jun 2024) to 51.6% (Jan 2025), a gain of 13.2 percentage points, via the Self-Fix Agent with LLM-as-a-Judge and interactive feedback loops

3-Way

AST Merge (GumTree)

Conflict-free patch application when local code has diverged from the agent's baseline

7

Languages Supported

Tree-sitter search across Python, Java, JS, TS, C/C++, Go, PHP

Autonomous Repair

Describe a bug → ACR localizes, patches, and validates autonomously

SonarLint Integration

Embedded static analysis for Java/Python with one-click ACR fixes

3-Way AST Merge

GumTree conflict resolution across baseline/modified/patched

Interactive Feedback

Critique any agent reasoning step — triggers guided pipeline re-run

Self-Fix Agent

LLM-as-a-Judge diagnoses inapplicable patches and replays from failure point

Build/Test Capture

Auto-captures IDE build and test failures with one-click ACR submission

Kotlin · Python · JetBrains PSI · GumTree · tree-sitter · SonarLint · OkHttp · SSE · REST APIs · JGit · Docker · Claude 3.5 Sonnet · GPT-4o
CURRENTLY BUILDING · v0 · AGENT INFRASTRUCTURE · OPEN SOURCE · MIT

taste — An Operating System for Agents

The industry keeps hitting the same wall: every agent team rebuilds the same plumbing (context management, rollback, multi-agent coordination, transparency) and glues it together with progress files and ad-hoc scripts. The accompanying blog post argues that what's actually needed isn't another framework. It's an operating system: a general-purpose substrate where developers plug in agents and the kernel handles orchestration, memory, state, and auditability.

taste is the implementation. A three-core CPU model (Opus 4.7 planner, Sonnet 4.6 workers, Haiku 4.5 monitor) runs on git as the memory substrate: branches are execution contexts, commits are checkpoints, git reset --hard is rollback, and git worktree gives every parallel worker its own working directory, the filesystem-level analogue of process isolation. Three end-to-end demos shipped. More coming soon.
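One step of the commit-or-rollback loop can be sketched with plain git commands driven from Python. The function names and the injectable `run` callback are hypothetical, not taste's kernel API, and the pytest gate stands in for the Monitor. The sketch also runs `git clean -fd` on failure, because `git reset --hard` restores tracked files but leaves behind any untracked files a worker created. taste itself retries in a fresh worktree instead, which sidesteps that issue.

```python
import subprocess
import sys
from typing import Callable, Sequence

# A runner takes (argv, cwd) and returns an exit code; injectable for testing.
Runner = Callable[[Sequence[str], str], int]


def sh(cmd: Sequence[str], cwd: str) -> int:
    """Run a command in `cwd` and return its exit code."""
    return subprocess.run(list(cmd), cwd=cwd).returncode


def new_worktree(repo: str, path: str, branch: str, base: str, run: Runner = sh) -> bool:
    """Give one worker its own checkout on a fresh branch forked from `base`."""
    return run(["git", "worktree", "add", "-q", "-b", branch, path, base], repo) == 0


def commit_or_rollback(worktree: str, step: str, run: Runner = sh) -> bool:
    """Gate one step: checkpoint it if the tests pass, otherwise discard its changes."""
    if run([sys.executable, "-m", "pytest", "-q"], worktree) == 0:
        run(["git", "add", "-A"], worktree)
        # --allow-empty keeps one checkpoint per step even if nothing changed.
        msg = f"taste: {step} passed"
        return run(["git", "commit", "-q", "--allow-empty", "-m", msg], worktree) == 0
    # reset --hard restores tracked files to the last checkpoint (HEAD); clean -fd
    # also deletes untracked files and directories, which reset alone leaves behind.
    run(["git", "reset", "--hard", "-q", "HEAD"], worktree)
    run(["git", "clean", "-fdq"], worktree)
    return False
```

Injecting `run` lets the gate logic be exercised with a recording fake instead of a real repository.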

Kernel Loop & Git Substrate

taste (Agent OS): git as the memory substrate · three specialized Claude cores · atomic commit-or-rollback per step
CPU (3 specialized cores; reasoning sandwich: xhigh on plan & verify, standard on implementation):
Planner (Claude Opus 4.7): task → DAG of steps + deps; plan.json committed to the session branch; waves = steps sharing dep-sets
Worker (Claude Sonnet 4.6): executes one subtask with minimal context; tools: read / write / shell (CLI-first); runs inside its own worktree
Monitor (Claude Haiku 4.5 · pytest): gates every commit, no self-evaluation; writes its verdict to monitor/step-NN.json; opt-in (composable, not load-bearing)
Wave execution: the kernel dispatches each wave to parallel, isolated worktrees. Each step (step-01, step-02, step-03) gets its own `git worktree` on branch session/step-NN, where a Worker (Sonnet 4.6) runs and the Monitor gate commits on pass or runs reset --hard on fail. Passing steps are merged back atomically into the session branch.
Memory (git as the substrate): branches = execution contexts · commits = checkpoints · `git show` = demand paging · `reset --hard` = rollback. Session branch: init → step-01 → step-02 → step-02' → step-03 (HEAD), where a rollback (git reset --hard) is followed by a retry in a fresh worktree.
Event stream: .git/taste/events.jsonl is stored out-of-tree, survives `git reset --hard`, preserves the audit trail, and gives the dashboard a replayable timeline
Build to delete: every subsystem is opt-in. When next-gen models self-evaluate, turn off the Monitor; the kernel survives. The stable layer is the git abstractions.
v0 demos: todo_api (real Claude · $0.096 · 43s · 15/15 tests) · refactor (hermetic rollback on step-2 failure) · parallel (3 worktrees · 21.5s vs ~32s serial)
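Turning the planner's DAG into waves can be sketched with `graphlib.TopologicalSorter`, which hands out every step whose dependencies have finished. This layers the plan by readiness, which is an assumption about what "steps sharing dep-sets" means in the diagram; taste's own grouping rule may differ. Plan steps are modeled as a hypothetical `{step: set_of_deps}` dict, such as might be read from plan.json.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Layer a plan DAG so each wave's steps depend only on earlier waves."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    layers: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every step whose deps are all done
        layers.append(ready)
        # A real kernel would call done() only after the wave merges back.
        ts.done(*ready)
    return layers
```

Each inner list is a set of steps that could run in parallel worktrees.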

$0.096

Real-Claude end-to-end demo

todo_api: 43s, 15/15 tests green, zero rollbacks on Sonnet 4.6 (7 calls, 16.5K input / 3.1K output tokens)

33%

Wall-clock reduction (parallel)

Three worktrees running concurrently: 21.5s vs ~32s sequential on matched three-step task

Atomic

Commit-or-rollback per step

Failed step → git reset --hard to last passing checkpoint; session branch stays clean, no zombie commits

Opt-in

Every subsystem is composable

Build to delete: when models self-evaluate reliably, disable the Monitor — kernel API survives untouched

Multi-core CPU

Planner (Opus 4.7) → Worker (Sonnet 4.6) → Monitor (Haiku 4.5) — the reasoning sandwich, structurally separated

Git-native memory

Every kernel artifact is a commit. plan.json, monitor/step-NN.json, agent spec — all versioned, navigable, audit-replayable

Worktree isolation

Each parallel worker gets a real filesystem-level git worktree. No virtual branches, no shared working directory

Monitor gates every commit

pytest (or LLM-judge) evaluates each step before commit. On fail: git reset --hard, retry in fresh worktree

Off-tree event stream

.git/taste/events.jsonl survives git reset --hard — rollback doesn't erase the audit trail

Dashboard from git state

taste dashboard renders timeline, per-step outcomes, and branch topology into one self-contained HTML — htop for agents

Python · Git · Worktree · Claude Opus 4.7 · Claude Sonnet 4.6 · Claude Haiku 4.5 · pytest
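The out-of-tree event stream works because git never treats files under `.git/` as part of the working tree, so neither `git reset --hard` nor `git clean` touches them. Below is a minimal sketch of an append-only JSONL log in that location. The `emit` and `replay` helpers and the record fields are hypothetical. The sketch assumes `repo` is the main checkout: inside a linked worktree `.git` is a file rather than a directory, so a real kernel would resolve the shared directory with `git rev-parse --git-common-dir`.

```python
import json
import time
from pathlib import Path
from typing import Dict, Iterator


def log_path(repo: Path) -> Path:
    """Location of the event stream inside the repository's .git directory."""
    # Files under .git/ are not working-tree files, so this log survives
    # `git reset --hard` and `git clean -fd` in the checkout.
    return repo / ".git" / "taste" / "events.jsonl"


def emit(repo: Path, kind: str, **fields: object) -> None:
    """Append one event as a single JSON line (append-only audit trail)."""
    path = log_path(repo)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "kind": kind, **fields}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def replay(repo: Path) -> Iterator[Dict]:
    """Yield events in the order they were written, e.g. to render a timeline."""
    path = log_path(repo)
    if not path.exists():
        return
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because the log is append-only, a rollback shows up as one more event rather than as erased history.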

Actively building · expect more soon

Architecture deep dive
The full archive

Every project, one level deeper.

Per-repo architectural deep-dives across three flagship products and five reviewer-verifiable research studies.

VYNN AI

3 repos

AutoCodeRover

2 repos

taste · Agent OS

1 repo

Research

5 studies

04

Writing & Thoughts

Ideas, architecture, engineering, and lessons learned.

05

Teaching

Graduate TA at Duke — three CS courses.

Architected and led hands-on production engineering for 48 students across 15 teams — Docker, CI/CD, and agentic AI systems shipped as reproducible labs and debugger benchmarks, not slides.

Open-source classroom

Every artifact, pull and run.

Labs, benchmarks, and the LLM-teammate pipeline — the reproducible infrastructure shipped to 48 students.

llm-teammate

MCP · GitLab pipeline

debugger-benchmarks

Seeded bugs · pytest

lab_docker / lab_pipeline

Production labs

example_ai_agents

FastAPI · Nginx · routing
