
PLE LayerLookup

Gemma-style Per-Layer Embeddings trained from scratch. Every transformer layer gets its own tiny token table, streamed per token from a memory-mapped file. Beats a dense baseline at matched inference VRAM: 149.88 PPL vs 160.39 PPL.
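The lookup itself is simple enough to sketch. Below is a minimal, hypothetical PyTorch version, not the project's actual code: it assumes one fp16 file holding a `(num_layers, vocab, ple_dim)` tensor, and the names (`PerLayerEmbedding`, `ple_tables.bin`, the shape constants) are invented for illustration.

```python
# Minimal sketch of per-layer embedding lookup from an mmap.
# Assumed file layout: one fp16 array of shape (NUM_LAYERS, VOCAB, PLE_DIM).
# All names and sizes here are illustrative, not the project's real API.
import numpy as np
import torch
import torch.nn as nn

NUM_LAYERS, VOCAB, PLE_DIM, D_MODEL = 12, 32_000, 64, 512  # assumed sizes


class PerLayerEmbedding(nn.Module):
    """Streams one layer's tiny token table from a memory-mapped file."""

    def __init__(self, path: str, layer_idx: int):
        super().__init__()
        # np.memmap keeps the tables on disk; rows are paged in on demand,
        # so resident memory grows only with the tokens actually touched.
        self.table = np.memmap(path, dtype=np.float16, mode="r",
                               shape=(NUM_LAYERS, VOCAB, PLE_DIM))
        self.layer_idx = layer_idx
        self.proj = nn.Linear(PLE_DIM, D_MODEL, bias=False)

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Fancy indexing copies only this layer's rows for the current
        # tokens out of the mmap, then a small projection folds them
        # into the hidden state.
        rows = self.table[self.layer_idx, token_ids.cpu().numpy()]  # (B, T, PLE_DIM)
        ple = torch.from_numpy(np.ascontiguousarray(rows)).to(hidden.device, hidden.dtype)
        return hidden + self.proj(ple)


if __name__ == "__main__":
    # Create a zero-filled dummy table file so the sketch runs end to end.
    np.memmap("ple_tables.bin", dtype=np.float16, mode="w+",
              shape=(NUM_LAYERS, VOCAB, PLE_DIM)).flush()
    ple = PerLayerEmbedding("ple_tables.bin", layer_idx=0)
    ids = torch.randint(0, VOCAB, (2, 5))
    hidden = torch.zeros(2, 5, D_MODEL)
    print(ple(ids, hidden).shape)  # torch.Size([2, 5, 512])
```

The point of the mmap is that the per-layer tables never live on the GPU: resident VRAM stays flat at the dense baseline's budget, and only the rows for tokens actually seen in a batch get paged in and transferred.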

pytorch, python, transformers, mmap, architecture

Related projects

More things from the same systems rabbit hole.