A stealth-stage AI infrastructure company is building a self-healing system for software that automates defect resolution and development. The platform is used by engineering and support teams to:
Autonomously debug problems in production software
Fix issues directly in the codebase
Prevent recurring issues through intelligent root-cause automation
The company is backed by top-tier investors such as Foundation Capital, WndrCo, and Green Bay Ventures, as well as prominent operators including Matei Zaharia, Drew Houston, Dylan Field, Guillermo Rauch, and others.
We believe that as software development accelerates, the burden of maintaining quality and reliability shifts heavily onto engineering and support teams. This challenge creates a rare opportunity to reimagine how software is supported and sustained, with AI-powered systems that respond autonomously.
About the Role
We're looking for an experienced backend/infrastructure engineer who thrives at the intersection of systems and AI - and who loves turning research prototypes into rock-solid production services. You'll design and scale the core backend that powers our AI inference stack - from ingestion pipelines and feature stores to GPU orchestration and vector search.
If you care deeply about performance, correctness, observability, and fast iteration, you'll fit right in.
What You'll Do
Own mission-critical services end-to-end - from architecture and design reviews to deployment, observability, and service-level objectives.
Scale LLM-driven systems: build RAG pipelines, vector indexes, and evaluation frameworks handling billions of events per day.
Design data-heavy backends: streaming ETL, columnar storage, time-series analytics - all fueling the self-healing loop.
Optimize for cost and latency across compute types (CPUs, GPUs, serverless); profile hot paths and squeeze out milliseconds.
Drive reliability: implement automated testing, chaos engineering, and progressive rollout strategies for new models.
Work cross-functionally with ML researchers, product engineers, and real customers to build infrastructure that actually matters.
You Might Thrive in This Role If You:
Have 2+ years of experience building scalable backend or infra systems in production environments
Bring a builder mindset - you like owning projects end-to-end and thinking deeply about data, scale, and maintainability
Have transitioned ML or data-heavy prototypes to production, balancing speed and robustness
Are comfortable with data engineering workflows: parsing, transforming, indexing, and querying structured or unstructured data
Have some exposure to search infrastructure or LLM-backed systems (e.g., document retrieval, RAG, semantic search)
Bonus Points
Experience with vector databases (e.g., pgvector, Pinecone, Weaviate) or inverted-index search (e.g., Elasticsearch, Lucene)
Hands-on with GPU orchestration (Kubernetes, Ray, KServe) or model-parallel inference tuning
Familiarity with Go / Rust (primary stack), with some TypeScript for light full-stack tasks
Deep knowledge of observability tooling (OpenTelemetry, Grafana, Datadog) and profiling distributed systems
Contributions to open-source ML or systems infrastructure projects