We are looking for strong engineers with experience training production machine learning models. If you are interested in contributing to open-source projects and evolving Modal's infrastructure to train the next generation of language models, we'd love to hear from you!
5+ years of experience writing high-quality, high-performance code.
Experience working with PyTorch and high-level training frameworks (Hugging Face, verl, slime).
Experience with ML training optimization (tell us a story about eliminating data loading bottlenecks, overlapping communication with compute, rewriting a trainer to handle off-policy rollouts, etc.).
Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Ability to work in person in our NYC or San Francisco office.