Project · 2025

titan-engine

High-performance LLM inference, in C++ and CUDA.

In progress · C++ · CUDA

Overview

titan-engine is a performance-focused inference engine for large language models, built on custom CUDA kernels, careful memory management, and a lightweight C++ runtime.
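The overview does not say how titan-engine manages memory. A common approach in inference runtimes is to reserve one large device buffer up front and carve it into aligned slices, so that cudaMalloc/cudaFree are not called on the hot path. The sketch below illustrates that idea with a hypothetical `Arena` class. It is not taken from the project, and offsets stand in for device pointers so the logic runs on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a bump ("arena") allocator over a buffer reserved once.
// In a real engine the buffer would come from a single cudaMalloc call;
// here, offsets into that buffer stand in for device pointers.
class Arena {
public:
    explicit Arena(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of a block of `bytes` bytes aligned to `align`
    // (which must be a power of two), or SIZE_MAX if the arena is full.
    std::size_t allocate(std::size_t bytes, std::size_t align = 256) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || bytes > capacity_ - start) return SIZE_MAX;
        used_ = start + bytes;
        return start;
    }

    // Frees every allocation at once, e.g. between batches.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

Freeing everything with one `reset()` fits workloads where temporary activations share a lifetime, such as a single forward pass.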


Features

  • CUDA kernels
  • C++ runtime
  • Memory management
  • Dynamic batching
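Dynamic batching is listed without further detail. As an illustration only, here is a minimal scheduler in the style of continuous batching. It assumes requests are served FIFO and that each decode step emits one token per running request. `Request`, `BatchScheduler`, `next_batch` and `step` are hypothetical names, not titan-engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical request: an id plus the number of tokens still to generate.
struct Request {
    int id;
    int remaining_tokens;
};

// Sketch of dynamic batching: after every decode step, finished requests
// leave the running batch and waiting requests fill the freed slots,
// up to max_batch_size.
class BatchScheduler {
public:
    explicit BatchScheduler(std::size_t max_batch_size) : max_batch_(max_batch_size) {
        assert(max_batch_size > 0);
    }

    void submit(Request r) { waiting_.push_back(r); }

    // Tops up the running batch from the waiting queue and returns it.
    const std::vector<Request>& next_batch() {
        while (running_.size() < max_batch_ && !waiting_.empty()) {
            running_.push_back(waiting_.front());
            waiting_.pop_front();
        }
        return running_;
    }

    // Simulates one decode step: each running request emits one token,
    // and requests with no tokens left are retired.
    void step() {
        std::vector<Request> still_running;
        for (Request r : running_) {
            if (--r.remaining_tokens > 0) still_running.push_back(r);
        }
        running_.swap(still_running);
    }

    bool idle() const { return running_.empty() && waiting_.empty(); }

private:
    std::size_t max_batch_;
    std::deque<Request> waiting_;
    std::vector<Request> running_;
};
```

Capping the batch at `max_batch_size` bounds per-step memory use. Refilling freed slots on every step keeps the GPU busy even when requests finish at different times.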

How it works

01 Load the model weights into GPU memory.
02 Schedule incoming inference requests into batches.
03 Generate tokens with the optimized CUDA kernels.
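Step 03 can be pictured as a decode loop. The sketch below uses greedy decoding (always pick the highest logit) purely for simplicity; the project does not state which sampling strategy it uses. `ForwardFn` and `generate_greedy` are hypothetical. `ForwardFn` stands in for the model's forward pass, which in the engine would run CUDA kernels over weights already loaded in step 01.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a forward pass: given the tokens so far,
// return one logit per vocabulary entry.
using ForwardFn = std::function<std::vector<float>(const std::vector<int>&)>;

// Greedy decoding: run the model, append the highest-scoring token,
// and repeat until max_new_tokens tokens are added or eos_token appears.
std::vector<int> generate_greedy(const ForwardFn& forward, std::vector<int> tokens,
                                 int max_new_tokens, int eos_token) {
    assert(!tokens.empty());
    for (int i = 0; i < max_new_tokens; ++i) {
        std::vector<float> logits = forward(tokens);
        assert(!logits.empty());
        int next = static_cast<int>(
            std::max_element(logits.begin(), logits.end()) - logits.begin());
        tokens.push_back(next);
        if (next == eos_token) break;
    }
    return tokens;
}
```

With a toy model whose logits always favor `last + 1` over a five-token vocabulary, `generate_greedy(toy, {0}, 10, 3)` returns `{0, 1, 2, 3}`, stopping at the end-of-sequence token 3.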