Systems of Ashish
PLE LayerLookup
Gemma-style Per-Layer Embeddings trained from scratch. Every transformer layer gets its own tiny token table, streamed per token from a memory-mapped file. Beats the dense baseline at matched inference VRAM: 149.88 vs 160.39 perplexity.
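A minimal sketch of the lookup idea: each layer owns a small embedding table in one memory-mapped file, and only the rows for the current tokens are paged in. The file name, shapes, dtype, and layout below are illustrative assumptions, not the project's actual format.

```python
# Sketch: per-layer token embeddings streamed from a memory-mapped file.
# num_layers, vocab_size, ple_dim, and the file layout are toy assumptions.
import numpy as np
import torch

num_layers, vocab_size, ple_dim = 4, 256, 8  # toy sizes (assumed)

# Write a toy table file: one small embedding table per transformer layer.
table = np.random.rand(num_layers, vocab_size, ple_dim).astype(np.float32)
table.tofile("ple_tables.bin")

# Memory-map the file so only the rows actually indexed get paged in.
mm = np.memmap("ple_tables.bin", dtype=np.float32, mode="r",
               shape=(num_layers, vocab_size, ple_dim))

def ple_lookup(layer_idx: int, token_ids: torch.Tensor) -> torch.Tensor:
    """Fetch this layer's embeddings for the given tokens; copies only those rows."""
    rows = np.asarray(mm[layer_idx][token_ids.numpy()])
    return torch.from_numpy(rows)

tokens = torch.tensor([1, 42, 7])
vec = ple_lookup(layer_idx=2, token_ids=tokens)
print(vec.shape)  # torch.Size([3, 8])
```

Because the tables live on disk and are indexed per token, resident VRAM/RAM stays roughly constant as vocabulary or layer count grows, which is the point of the matched-VRAM comparison above.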
pytorch · python · transformers · mmap · architecture