// infrastructure · performance · ml
Matthew Leone
I build and tune the systems that make machine learning run in production — high-performance AI infrastructure, GPU-level optimization, and the MLOps plumbing in between. My background spans virtualization, kernel development, and systems performance, which is mostly a long way of saying I'm comfortable wherever the bottleneck happens to be.
I write about it at The Hypervisor.
open work
- [01]
research
A working notebook of latency-sensitive infrastructure experiments — CUDA, load balancers, deployment patterns. Where blog posts get their proofs of concept.
github · public
- [02]
the hypervisor
Technical writing on AI infrastructure, performance, and the operational side of running ML systems. CUDA primers, local LLM deployment, reverse-proxy internals.
substack · ongoing
writing
- 2025.08
Running your first CUDA kernel
A from-scratch walkthrough — host code, device code, why GPUs matter for the math you actually do.
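  The shape of that walkthrough — host code, device code, and a launch — can be sketched as a vector add. This is an illustrative sketch, not code from the post itself; the kernel name, block size, and problem size are all assumptions.

  ```cuda
  #include <cstdio>

  // Device code: each thread adds one element.
  __global__ void vec_add(const float *a, const float *b, float *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) c[i] = a[i] + b[i];
  }

  int main() {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(float);

      // Host code: managed allocations are visible to both CPU and GPU.
      float *a, *b, *c;
      cudaMallocManaged(&a, bytes);
      cudaMallocManaged(&b, bytes);
      cudaMallocManaged(&c, bytes);
      for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

      // Launch enough 256-thread blocks to cover all n elements.
      int threads = 256;
      int blocks = (n + threads - 1) / threads;
      vec_add<<<blocks, threads>>>(a, b, c, n);
      cudaDeviceSynchronize();

      printf("c[0] = %f\n", c[0]);
      cudaFree(a); cudaFree(b); cudaFree(c);
      return 0;
  }
  ```

  Compiles with `nvcc`; the host/device split and the `<<<blocks, threads>>>` launch syntax are the parts the post unpacks.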
- 2025.01
How to run DeepSeek-R1 locally
Practical guide to getting a frontier-class open-weights model running on consumer hardware via Ollama.
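  The core workflow is a few commands against a local Ollama daemon. A minimal sketch — the `deepseek-r1:8b` tag is one of several distilled sizes; pick whatever your GPU or RAM can hold.

  ```shell
  # Pull a distilled R1 variant sized for consumer hardware
  ollama pull deepseek-r1:8b

  # Chat with it interactively
  ollama run deepseek-r1:8b

  # Or hit the local HTTP API Ollama serves on port 11434
  curl http://localhost:11434/api/generate \
    -d '{"model": "deepseek-r1:8b", "prompt": "Why is the sky blue?"}'
  ```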
- 2025.01
Getting started with Traefik
What reverse proxies actually do, why service-oriented systems need them, and a working example.
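  The "working example" pattern, in miniature: Traefik routing by Host header to a demo backend. A sketch under assumed names and ports — `traefik/whoami` is a stock container that echoes request info, and `whoami.localhost` is an arbitrary hostname.

  ```yaml
  services:
    traefik:
      image: traefik:v3.0
      command:
        - "--providers.docker.exposedByDefault=false"
        - "--entryPoints.web.address=:80"
      ports:
        - "80:80"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro

    whoami:
      image: traefik/whoami
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
        - "traefik.http.routers.whoami.entryPoints=web"
  ```

  Traefik watches the Docker socket and builds routes from labels, so adding a service is just adding labels — no proxy restart.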