// infrastructure · performance · ml
Matthew Leone
I build and tune the systems that make machine learning run in production — high-performance AI infrastructure, GPU-level optimization, and the MLOps plumbing in between. My background spans virtualization, kernel development, and systems performance, which is mostly a long way of saying I'm comfortable wherever the bottleneck happens to be.
I write about it at The Hypervisor.
open work
- [01]
research
A working notebook of latency-sensitive infrastructure experiments — CUDA, load balancers, deployment patterns. Where blog posts get their proofs of concept.
github · public
- [02]
the hypervisor
Technical writing on AI infrastructure, performance, and the operational side of running ML systems. CUDA primers, local LLM deployment, reverse-proxy internals.
substack · ongoing
writing
- 2025.08
Running your first CUDA kernel
A from-scratch walkthrough — host code, device code, why GPUs matter for the math you actually do.
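  The shape of that walkthrough — host code, device code, and a launch — can be sketched as a vector add. This is an illustrative sketch, not code from the post itself; the kernel name, block size, and problem size are all assumptions.

  ```cuda
  #include <cstdio>

  // Device code: each thread adds one element.
  __global__ void vec_add(const float *a, const float *b, float *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) c[i] = a[i] + b[i];
  }

  int main() {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(float);

      // Host code: managed allocations are visible to both CPU and GPU.
      float *a, *b, *c;
      cudaMallocManaged(&a, bytes);
      cudaMallocManaged(&b, bytes);
      cudaMallocManaged(&c, bytes);
      for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

      // Launch enough 256-thread blocks to cover all n elements.
      int threads = 256;
      int blocks = (n + threads - 1) / threads;
      vec_add<<<blocks, threads>>>(a, b, c, n);
      cudaDeviceSynchronize();

      printf("c[0] = %f\n", c[0]);
      cudaFree(a); cudaFree(b); cudaFree(c);
      return 0;
  }
  ```

  Compiles with `nvcc`; the host/device split and the `<<<blocks, threads>>>` launch syntax are the parts the post unpacks.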
- 2025.01
How to run DeepSeek-R1 locally
Practical guide to getting a frontier-class open-weights model running on consumer hardware via Ollama.
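  The core workflow is a few commands against a local Ollama daemon. A minimal sketch — the `deepseek-r1:8b` tag is one of several distilled sizes; pick whatever your GPU or RAM can hold.

  ```shell
  # Pull a distilled R1 variant sized for consumer hardware
  ollama pull deepseek-r1:8b

  # Chat with it interactively
  ollama run deepseek-r1:8b

  # Or hit the local HTTP API Ollama serves on port 11434
  curl http://localhost:11434/api/generate \
    -d '{"model": "deepseek-r1:8b", "prompt": "Why is the sky blue?"}'
  ```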
- 2025.01
Getting started with Traefik
What reverse proxies actually do, why service-oriented systems need them, and a working example.
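  The "working example" pattern, in miniature: Traefik routing by Host header to a demo backend. A sketch under assumed names and ports — `traefik/whoami` is a stock container that echoes request info, and `whoami.localhost` is an arbitrary hostname.

  ```yaml
  services:
    traefik:
      image: traefik:v3.0
      command:
        - "--providers.docker.exposedByDefault=false"
        - "--entryPoints.web.address=:80"
      ports:
        - "80:80"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro

    whoami:
      image: traefik/whoami
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
        - "traefik.http.routers.whoami.entryPoints=web"
  ```

  Traefik watches the Docker socket and builds routes from labels, so adding a service is just adding labels — no proxy restart.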