Posts
- NanoGPT: Tree-Based Speculative Decoding
- NanoGPT: Inference Profiler
- NanoGPT: Guided Decoding
- NanoGPT: CUDA Graph Replay
- NanoGPT: Evaluation Harness
- NanoGPT: Fused Multi-Head Attention
- NanoGPT: Disaggregated Prefill and Decode
- NanoGPT: Sliding Window Eviction
- Comparing NanoGPT vs SGLang
- Comparing NanoGPT vs vLLM
- NanoGPT - Adding a Simple HTTP Server
- NanoGPT - Radix Tree Prefix Caching
- NanoGPT - Correctness Tests for an Inference Engine
- Part 3: Speculative Decoding - Trading Accuracy for Parallelism
- Adding Trigram to Speculative Decoding
- Testing Correctness Across Every Inference Optimization
- Benchmarking NanoGPT Inference, Part 2: Teaching The Server To Spend Its Next Forward Pass
- Part 1: Benchmarking KV Cache, Continuous Batching, and Chunked Prefill
- Adding Interleaving to NanoGPT
- Adding Speculative Decoding to NanoGPT
- Adding Paged Attention to NanoGPT
- Adding Prefix Caching to NanoGPT
- Quantizing NanoGPT
- Adding Scheduling to NanoGPT
- Adding Chunked Prefill to NanoGPT
- Adding Continuous Batching to NanoGPT
- Adding KV Cache to NanoGPT
- How I use AI Agents for coding in 2026
- Building a Knowledge Base for AI Agents
- I Trained an AI to Speak Like JFK
- Building a Local Voice Agent on CPU
- making money should be harder
- the mental cost
- agents doing research? it's too early...
- i made my personal website ai-friendly!
- my life is already simple without ai, what now?
- america's biggest problem