Colin Zhou blog

Posts

Jun 27, 2026
NanoGPT: Tree-Based Speculative Decoding
Jun 25, 2026
NanoGPT: Inference Profiler
Jun 23, 2026
NanoGPT: Guided Decoding
Jun 21, 2026
NanoGPT: CUDA Graph Replay
Jun 21, 2026
NanoGPT: Evaluation Harness
Jun 17, 2026
NanoGPT: Fused Multi-Head Attention
Jun 16, 2026
NanoGPT: Disaggregated Prefill and Decode
Jun 13, 2026
NanoGPT: Sliding Window Eviction
Jun 11, 2026
Comparing NanoGPT vs SGLang
Jun 11, 2026
Comparing NanoGPT vs vLLM
Jun 10, 2026
NanoGPT - Adding a Simple HTTP Server
Jun 9, 2026
NanoGPT - Radix Tree Prefix Caching
Jun 8, 2026
NanoGPT - Correctness Tests for an Inference Engine
Jun 6, 2026
Part 3: Speculative Decoding - Trading Accuracy for Parallelism
Jun 5, 2026
Adding Trigram to Speculative Decoding
Jun 5, 2026
Testing Correctness Across Every Inference Optimization
Jun 4, 2026
Benchmarking NanoGPT Inference, Part 2: Teaching The Server To Spend Its Next Forward Pass
May 29, 2026
Part 1: Benchmarking KV Cache, Continuous Batching, and Chunked Prefill
May 29, 2026
Adding Interleaving to NanoGPT
May 26, 2026
Adding Speculative Decoding to NanoGPT
May 24, 2026
Adding Paged Attention to NanoGPT
May 22, 2026
Adding Prefix Caching to NanoGPT
May 20, 2026
Quantizing NanoGPT
May 17, 2026
Adding Scheduling to NanoGPT
May 13, 2026
Adding Chunked Prefill to NanoGPT
May 11, 2026
Adding Continuous Batching to NanoGPT
May 10, 2026
Adding KV Cache to NanoGPT
Apr 18, 2026
How I use AI Agents for coding in 2026
Apr 16, 2026
Building a Knowledge Base for AI Agents
Apr 7, 2026
I Trained an AI to Speak Like JFK
Apr 5, 2026
Building a Local Voice Agent on CPU
Apr 4, 2026
making money should be harder
Mar 21, 2026
the mental cost
Mar 21, 2026
agents doing research? it's too early...
Mar 21, 2026
i made my personal website ai-friendly!
Mar 21, 2026
my life is already simple without ai, what now?
Mar 20, 2026
america's biggest problem