FriendliAI, a Redwood City, CA-based startup, is building the next-generation AI inference platform that accelerates the deployment of large language and multimodal models with unmatched performance and efficiency. Our infrastructure supports high-throughput, low-latency AI workloads for organizations worldwide. We are also integrated with the Hugging Face platform, allowing instant access to over 400,000 open-source models. We are on a mission to deliver the world's best platform for generative and agentic AI.
The Role
We are seeking a highly technical Inference Engine Engineer to optimize the performance and efficiency of our core inference engine. You will focus on designing, implementing, and optimizing GPU kernels and supporting infrastructure for next-generation generative and agentic AI workloads. Your work will directly power the most latency-critical and compute-intensive systems deployed by our customers.
The Person
You are an exceptional engineer with a strong foundation in GPU programming and compiler infrastructure. You enjoy pushing performance boundaries and have experience supporting production-scale machine learning applications.
Key Responsibilities
Qualifications
Preferred Experience
Employment Details