Job Details

Member of Technical Staff

2025-11-21 | Cerebro | San Francisco, CA

Description:

Supporting the USA's leading startups with world-class AI & robotics talent | Co-Founder of Mentors in Machine Learning | Recruitment like a 5-star hotel

Join the Frontier: Research Engineer, AI Benchmarking

Are you passionate about shaping how the world measures and trusts AI? We're seeking exceptional AI researchers and engineers to architect the next generation of LLM benchmarks, work that will shape how foundation models evolve and are adopted globally.

Your work will define the standards by which LLMs are judged, from everyday applications to breakthroughs in finance, healthcare, and beyond. You'll design, build, and analyze cutting‑edge evaluation pipelines in collaboration with leading model labs and enterprises. If you thrive at the intersection of deep research and real‑world impact, this is your stage.

What You'll Do

Invent and build new benchmarks that test the boundaries of LLMs in real‑world scenarios
Conduct rigorous research to ensure benchmarks are robust, valid, and actionable
Collaborate with AI labs and enterprise partners to identify emerging evaluation needs
Analyze and interpret model performance, communicating insights to diverse audiences
Publish and present research findings in top venues, contributing to the evaluation community
Work closely with infrastructure engineers to scale your benchmark designs
Stay ahead of the curve on LLM capabilities and evaluation methodologies

Your Background

Advanced research experience: MS/PhD in CS, NLP, ML, or a related field (exceptional undergraduates will be considered)
Publication record: Papers at venues such as NeurIPS, ICML, ACL, or EMNLP, especially on NLP, ML evaluation, or benchmarking
Python proficiency for prototyping and experimentation
Excellent communicator, able to synthesize complex ideas for all audiences
Collaborative spirit: Experience working in research teams, open to feedback
Portfolio: Evidence of impactful research

Location: In‑person in San Francisco. Relocation/transportation support provided.

Bonus Points

Experience with LLM evaluation, benchmarking, or foundation models
Collaboration with industry or applied research partners
Background in HCI, psychology, or domain‑specific evaluation
Startup or early‑stage lab experience
Contributions to open‑source evaluation tools/datasets

What's in It for You?

Competitive salary & meaningful equity
Relocation and transit support
Unlimited PTO
Opportunities to publish, present, and shape the field

Who We Are

Our founding team brings together experience from top research institutions and industry giants. The platform is rooted in advanced NLP evaluation research and is backed by premier investors. Our collective work is highly cited, and we're committed to setting the gold standard for AI benchmarking. Tech stack: React (TSX) frontend, Django backend, AWS infrastructure.

What Matters Most

Ability over pedigree: Raw intelligence and research ability trump pedigree. We care about what you can build and discover.
Ownership: We move fast and expect initiative. You'll have autonomy and a chance to make a visible impact.
Intensity: The LLM landscape evolves at breakneck speed. We need researchers who thrive in a dynamic, high‑execution environment.
Solution focus: Every evaluation challenge is an opportunity to innovate.

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Information Technology

Industries

Technology, Information & Media and Research Services
