Sr. Machine Learning Engineer, Amazon General Intelligence (AGI)

  2025-11-27     Amazon     San Francisco, CA
Description:

Our Machine Learning training infrastructure (ML Infra) team is responsible for designing, implementing, and optimizing large-scale computing infrastructure that powers our cutting-edge AI and machine learning initiatives. We leverage advanced hardware, innovative software architectures, and distributed computing techniques to enable breakthrough research and product development across the company.

We are seeking a Senior Machine Learning Engineer to join our team and lead the development of our next-generation ML training infrastructure. This is a high-impact, high-visibility role that will shape the future of our machine learning capabilities and contribute to the advancement of AI technology across the industry.

Key Responsibilities

  • Lead the definition, design, architecture quality, implementation, and delivery of the most advanced, complex and cross-cutting challenges spanning our ML infrastructure.
  • Align teams in ML Infrastructure and related organizations to a coherent technical vision and deliver systems that fit well together.
  • Influence multiple teams, increasing their productivity and effectiveness. Hold peers and teams to a high bar for performance and efficiency.
  • Guide difficult trade-off decisions and drive awareness about the impact and consequences of technical decisions on AI research and product development.
  • Demonstrate significant innovation, creativity, and judgment when solving challenging AI/ML infrastructure problems.
  • Identify future skills needed across the organization and advocate for skill development or acquisition to senior leaders. Scout top talent and recruit them to the company.
  • Actively mentor senior and principal engineers, and scale yourself by developing and institutionalizing best practices in AI/ML infrastructure and distributed computing across the organization.

A day in the life

  • 8+ years of professional software development experience in distributed systems with emphasis on ML infrastructure.
  • 8+ years of current programming experience building ML infrastructure using languages such as Python, C++, or Rust.
  • Hands‑on experience with parallel computing platforms such as CUDA, OpenMP, etc.
  • Deep understanding of AI frameworks such as PyTorch, TensorFlow, and JAX, and of the demands they place on underlying compute infrastructure, memory bandwidth, network interconnect, and storage as scale increases.
  • Knowledge of emerging AI hardware accelerators and architectures.
  • Experience with containerization and orchestration technologies (Docker, Kubernetes).
  • Experience with cloud computing platforms (AWS, Azure, GCP) and their offerings.

Basic Qualifications

  • 5+ years of non‑internship professional software development experience.
  • 5+ years of programming with at least one software programming language.
  • 5+ years of leading design or architecture of new and existing systems.
  • Experience as a mentor or tech lead, or leading an engineering team.

Preferred Qualifications

  • 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
  • Bachelor's degree in computer science or equivalent.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, please visit for more information.
