Spectra: EAGLE-3 in the browser.

Browser-native speculative decoding for Qwen3-1.7B using AngelSlim's pretrained EAGLE-3 head. 100% WebGPU, no server. Click Race to watch it run against the greedy baseline on your GPU.

GitHub → Postmortem → AngelSlim head →

Baseline

67 tok/s

Qwen3-1.7B greedy (M3 Mac)

EAGLE-3

40 tok/s

α = 0.39, γ = 2

Correctness

✓ identical

byte-equal to baseline

idle — click Race to load models (~30s first time)

max tokens γ

Baseline

Qwen3-1.7B greedy decode

—

(waiting)

EAGLE-3 · Spectra V2

+ head KV reuse + carry-over verify

—

(waiting)

Earlier work · 1.45× speedup on Qwen2.5 (Phase 4)

Pre-EAGLE-3 SpecController against a 0.5B draft of the same family. Qwen2.5-1.5B + Qwen2.5-0.5B-Instruct, hits a real 1.45× mean speedup. Kept as a working comparison and proof the spec-decoding plumbing works end-to-end.

Ran the EAGLE-3 race above first? The browser may hit its storage quota loading these models. Click Clear browser cache below first (page will reload).

max tokens γ

Baseline

Qwen2.5-1.5B greedy

—

(waiting)

Spectra · EAGLE-1 spec

1.5B target + 0.5B draft

—

(waiting)

Developer debug controls

Individual model runs and decoder paths, useful when debugging the fork.

γ (decoder default) 2

(no output yet)

Console log