Job Details

Machine Learning Engineer - Model Performance

2025-09-08 · inference.net · San Francisco, CA
Description:

Inference.net is seeking a Machine Learning Engineer to join our team, focusing on optimizing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently at scale. You will be responsible for deploying these models in production and implementing optimizations that increase throughput and enable new features. This position offers the chance to collaborate closely with our engineering team and to make significant contributions to open source projects like SGLang and vLLM.

About Inference.net

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models like DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network.

We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems. We primarily work in-person from our office in downtown San Francisco. Our investors include A16z CSX and Multicoin. We are high-agency, adaptable, and collaborative. We value creativity alongside technical prowess and humility. We work hard, and deeply enjoy the work that we do.

Responsibilities

  • Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models
  • Deploy and maintain large language models at scale in production environments
  • Deploy new models as they are released by frontier labs
  • Implement techniques like quantization, speculative decoding, and KV cache reuse
  • Contribute regularly to open source projects such as SGLang and vLLM
  • Dive deep into the underlying codebases of PyTorch, TensorRT, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues
  • Collaborate with the engineering team to bring new features and capabilities to our inference platform
  • Develop robust and scalable infrastructure for AI model serving
  • Create and maintain technical documentation for inference systems
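To give a flavor of one technique named above, here is a toy, dependency-free sketch of prefix KV-cache reuse. Every name in it is illustrative, not taken from SGLang or vLLM; production systems (e.g. SGLang's RadixAttention) key cached KV blocks with a radix tree over token IDs, whereas this sketch uses a plain dict of token-ID prefixes:

```python
# Toy sketch of prefix KV-cache reuse (illustrative names only).

class PrefixKVCache:
    def __init__(self):
        self._store = {}  # tuple of token IDs -> list of (fake) KV entries

    def longest_prefix(self, tokens):
        """Return (matched_len, cached_kv) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                return end, self._store[key]
        return 0, []

    def insert(self, tokens, kv):
        self._store[tuple(tokens)] = kv


def prefill(cache, tokens):
    """Compute KV only for the uncached suffix; reuse the cached prefix."""
    matched, kv = cache.longest_prefix(tokens)
    # Stand-in for real attention: "computing" KV for token t yields (t, 2 * t).
    new_kv = kv + [(t, 2 * t) for t in tokens[matched:]]
    cache.insert(tokens, new_kv)
    return new_kv, len(tokens) - matched  # number of tokens actually computed


cache = PrefixKVCache()
_, cold = prefill(cache, [1, 2, 3, 4])        # cold start: computes all 4
_, warm = prefill(cache, [1, 2, 3, 4, 5, 6])  # shared prefix: computes only 2
```

The payoff is the same as in real serving stacks: requests that share a prompt prefix (system prompts, few-shot examples) skip recomputing attention state for the shared tokens.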

Requirements

  • 3+ years of experience writing high-performance, production-quality code
  • Strong proficiency with Python and deep learning frameworks, particularly PyTorch
  • Demonstrated experience with LLM inference optimization techniques
  • Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred
  • Familiarity with Docker and Kubernetes for containerized deployments
  • Experience with CUDA programming and GPU optimization
  • Strong understanding of distributed systems and scalability challenges
  • Proven track record of optimizing AI models for production environments
Nice to Have

  • Familiarity with TensorRT and TensorRT-LLM
  • Knowledge of vision models and multimodal AI systems
  • Experience implementing techniques like quantization and speculative decoding
  • Contributions to open source machine learning projects
  • Experience with large-scale distributed computing
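Speculative decoding, mentioned above, follows a draft-and-verify loop: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting the longest agreeing run plus one correction. A toy sketch under heavy simplification (both "models" here are deterministic next-token functions over integers, and verification runs sequentially rather than in one batched forward pass):

```python
# Toy sketch of the speculative-decoding draft-and-verify loop.
# Real implementations compare draft and target probability distributions;
# deterministic integer functions stand in for the models here.

def draft_model(token):
    # Cheap draft: guesses the next token is token + 1.
    return token + 1

def target_model(token):
    # Expensive target: agrees with the draft except it maps 3 -> 10.
    return 10 if token == 3 else token + 1

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target model."""
    draft, t = [], prefix[-1]
    for _ in range(k):
        t = draft_model(t)
        draft.append(t)
    accepted, prev = [], prefix[-1]
    for d in draft:
        expected = target_model(prev)
        if d == expected:
            accepted.append(d)   # draft token verified
            prev = d
        else:
            accepted.append(expected)  # target's correction ends the run
            break
    return prefix + accepted

out = speculative_step([1])  # draft proposes 2, 3, 4, 5; target corrects 4 -> 10
```

When the draft agrees often, each target-model pass yields several tokens instead of one, which is where the latency win comes from.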
Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 - $250,000, plus competitive equity and benefits including:

  • Full healthcare coverage
  • Quarterly offsites
  • Flexible PTO
Equal Opportunity

Inference.net is an equal opportunity employer. We welcome applicants from all backgrounds and don't discriminate based on race, color, religion, gender, sexual orientation, national origin, genetics, disability, age, or veteran status.

If you're passionate about building the next generation of high-performance systems that push the boundaries of what's possible with large language models, we want to hear from you!

Seniority level

Not Applicable

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Software Development


Apply for this Job

Please use the APPLY HERE link below to view additional details and application instructions.

Apply Here
