Project · 2025

titan-engine

High-performance LLM inference, in C++ and CUDA.

In progress · C++ · CUDA

Overview

titan-engine is an inference engine for language models focused on performance: custom CUDA kernels, careful memory management and a lightweight C++ runtime.


Features

→ CUDA kernels
→ C++ runtime
→ Memory management
→ Dynamic batching

How it works

01 Load the model weights into GPU memory.

02 Schedule the batch of inference requests.

03 Generate tokens with optimized kernels.
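
To make step 02 more concrete, here is a minimal host-side sketch of how pending requests might be grouped into batches. It is an illustration only, not titan-engine's actual scheduler: the names Request, BatchScheduler, next_batch and drain_batch_sizes are all hypothetical, and the policy shown is plain first-in, first-out with a size cap. A real dynamic batcher would likely also weigh wait time and sequence length.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request type; the engine's real request structure is not documented.
struct Request {
    int id;
    std::string prompt;
};

// Hypothetical scheduler: groups pending requests into batches of at most max_batch.
class BatchScheduler {
public:
    explicit BatchScheduler(std::size_t max_batch) : max_batch_(max_batch) {
        assert(max_batch > 0 && "batch size must be positive");
    }

    // Enqueue a request in arrival order.
    void submit(Request r) { pending_.push_back(std::move(r)); }

    // Pop up to max_batch_ requests, oldest first; returns an empty batch if none are pending.
    std::vector<Request> next_batch() {
        std::vector<Request> batch;
        while (!pending_.empty() && batch.size() < max_batch_) {
            batch.push_back(std::move(pending_.front()));
            pending_.pop_front();
        }
        return batch;
    }

    bool empty() const { return pending_.empty(); }

private:
    std::size_t max_batch_;
    std::deque<Request> pending_;
};

// Drain the scheduler and report the size of each batch it formed.
// In a real engine, each batch would be handed to the GPU kernels (step 03) here.
std::vector<std::size_t> drain_batch_sizes(BatchScheduler& scheduler) {
    std::vector<std::size_t> sizes;
    while (!scheduler.empty()) {
        sizes.push_back(scheduler.next_batch().size());
    }
    return sizes;
}
```

With a cap of 2 and five submitted requests, draining the scheduler forms batches of 2, 2 and 1. The cap trades latency for throughput: larger batches keep the GPU busier, while smaller ones return the first tokens sooner.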