Project · 2025
titan-engine
High-performance LLM inference, in C++ and CUDA.
in progress · C++ · CUDA
Overview
titan-engine is an inference engine for language models focused on performance: custom CUDA kernels, careful memory management and a lightweight C++ runtime.
Features
- CUDA kernels
- C++ runtime
- Memory management
- Dynamic batching
How it works
01 Load the model weights into GPU memory.
02 Schedule the batch of inference requests.
03 Generate tokens with optimized kernels.
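
The scheduling and generation steps above can be sketched as a small dynamic-batching loop. This is a minimal illustration, not titan-engine's actual code: `Request`, `Scheduler` and `next_token_stub` are hypothetical names, weight loading (step 01) is left out, and the kernel launch in step 03 is replaced by a CPU stub so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical request type; the engine's real types are not shown in the source.
struct Request {
    int id;
    std::vector<int> tokens;      // prompt tokens followed by generated tokens
    std::size_t max_new_tokens;   // generation budget for this request
    std::size_t generated;        // tokens generated so far
};

// Stand-in for step 03. A real engine would launch CUDA kernels here;
// this stub returns (last token + 1) so the sketch runs on a CPU.
inline int next_token_stub(const Request& r) {
    return r.tokens.empty() ? 0 : r.tokens.back() + 1;
}

// Assumed scheduling policy: first come, first served, with a fixed batch cap.
class Scheduler {
public:
    explicit Scheduler(std::size_t max_batch) : max_batch_(max_batch) {
        assert(max_batch > 0);
    }

    void submit(Request r) { waiting_.push_back(std::move(r)); }

    // One decode step. Returns the number of requests batched in this step.
    std::size_t step() {
        // Step 02: fill free batch slots from the waiting queue.
        while (active_.size() < max_batch_ && !waiting_.empty()) {
            active_.push_back(std::move(waiting_.front()));
            waiting_.pop_front();
        }
        const std::size_t batch_size = active_.size();

        // Step 03: generate one token per active request, then retire
        // the requests that have used up their budget.
        std::vector<Request> still_running;
        for (auto& r : active_) {
            if (r.generated < r.max_new_tokens) {
                r.tokens.push_back(next_token_stub(r));
                ++r.generated;
            }
            if (r.generated >= r.max_new_tokens) {
                finished_.push_back(std::move(r));
            } else {
                still_running.push_back(std::move(r));
            }
        }
        active_ = std::move(still_running);
        return batch_size;
    }

    bool idle() const { return waiting_.empty() && active_.empty(); }
    const std::vector<Request>& finished() const { return finished_; }

private:
    std::size_t max_batch_;
    std::deque<Request> waiting_;
    std::vector<Request> active_;
    std::vector<Request> finished_;
};
```

A caller would `submit` requests as they arrive and call `step()` until `idle()` returns true. Because slots are refilled at every step, a short request that finishes early frees its slot for a waiting request instead of holding the batch until the longest request completes, which is the usual motivation for dynamic batching.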