Browser-native speculative decoding for Qwen3-1.7B using AngelSlim's pretrained EAGLE-3 head. 100% WebGPU, no server. Click Race to watch it run against the greedy baseline on your GPU.
The pre-EAGLE-3 SpecController, run against a 0.5B draft model from the same family. Qwen2.5-1.5B (target) + Qwen2.5-0.5B-Instruct (draft) hits a real 1.45× mean speedup. Kept as a working comparison and as proof that the spec-decoding plumbing works end-to-end.
Ran the EAGLE-3 race above first? The browser may hit its storage quota while loading these models. Click Clear browser cache below before running them (the page will reload).
Individual model runs and decoder paths, useful when debugging the fork.