Engineering Manager, Evaluations & Observability
Nearly every company in the world runs on custom software for critical operations like tracking performance metrics, handling customer support workflows, building admin dashboards, and countless other processes you might not have even thought of. But most companies don't have adequate resources to properly invest in these tools, leading to a lot of old and clunky internal software or, even worse, users still stuck in manual and spreadsheet flows.
At Retool, we're building the first enterprise AppGen platform: software that transforms natural language into production-ready code, integrates directly with business data, and meets the highest standards of security and governance. AI is redefining what it means to build software, and who gets to build it. The definition of "developer" now includes analysts, operators, and domain experts creating solutions directly. As the pool of builders widens, so does the complexity of what they need to build. The opportunity is enormous, but so is the challenge of enabling this larger community to build production-grade software safely. That means AI that understands real business data, enforces enterprise policies automatically, and empowers teams to create once and reuse everywhere with shared, trusted components.
Over 100 million hours of work have been automated by developers and domain experts using our platform, freeing them to focus on creative problem-solving and strategic initiatives that drive real business value. The people closest to knowing what needs to be built can now safely create custom solutions within enterprise guardrails. And that's a mission worth striving for.
Let's build the future together!
Our engineering leaders are at the forefront of Retool's product development, bridging the gap between engineering excellence and customer impact. We look for leaders who not only bring strong technical expertise but also the strategic vision to shape Retool's product direction, balancing day-to-day execution with long-term thinking.
In this role, you'll lead Retool's Evaluations & Observability platform. You'll own setting the bar for what "good" looks like across our newly launched Assist experience, making sure it works, and works consistently, at scale. You'll build the systems, tools, and culture that let us measure, understand, and improve quality in real time, driving relentless iteration across everything we ship.
You'll guide engineers focused on:
- Evaluation platforms: building the frameworks that let us test and compare performance across LLM providers and model versions.
- Quality systems: defining and enforcing rubrics, metrics, and evaluation loops that answer the hardest question in AI: "Is this actually good?"
- Data curation: managing the datasets that power and test our AI models, sourced from real-world usage to keep our systems grounded and relevant.
- Search & retrieval quality: owning the retrieval layer that underpins both AI and non-AI experiences, ensuring results are relevant, accurate, and fast.
- Reusable AI quality infrastructure: creating the building blocks (evaluation tools, pipelines, and feedback systems) that other teams can leverage to maintain quality across Retool's AI surface area.
- Culture of continuous improvement: embedding a data-driven approach to AI quality, where experimentation and measurement are the default as we scale our capabilities.
In this role, you will:
- Communicate and collaborate effectively with Product and other Engineering counterparts.
- Manage a team of engineers: support the team by identifying growth opportunities, providing continuous feedback, and managing performance as appropriate.
- Understand the needs of our Assist roadmap, helping define rubrics and automated systems that allow engineers to iterate quickly on product features with confidence.
- Establish and define your team's strategy to ensure execution maximizes business impact.
- Introduce scalable, repeatable processes that help engineering and product teams deliver a successful product.
- Partner with recruiting on building out a diverse team of exceptionally motivated engineers.
The skillset you'll bring:
- 3+ years of experience successfully leading and managing teams.
- Familiarity with AI evaluation & observability systems (we use Braintrust, but exposure to evaluation and LLM observability tooling in general is a plus).
- Deep technical curiosity. You'll be writing code, engaging deeply on design and architecture, and tinkering to uncover what's really happening under the hood.
- A strong ability to champion an engineering team through macro process changes.
- A history of orienting teams and setting strategy toward engineering goals, with a bias toward impact.
- A propensity to operate as a business owner, caring deeply about our customers, product, and team.
- Thoughtfulness around engineering culture, process, and identity.
- Proficiency in navigating ambiguity, managing stakeholders, communicating in a structured manner, and driving accountability and excellence.
- A collaborative skillset to partner effectively with product, design, and go-to-market teams.
San Francisco
$188,400 - $251,900 USD