We are looking for exceptional researchers and research engineers to design and build the next generation of AI benchmarks. You will create high-impact, challenging evaluations that push the boundaries of what we can measure in foundation models. This role is perfect for someone with deep research expertise who wants to see their work directly influence how the world evaluates AI systems.
You will lead the design and development of novel benchmarks that assess the real-world capabilities of LLMs. Our benchmarks shape how foundation models are developed and how generative AI applications are built. We work with all the major foundation model labs, as well as some of the largest financial institutions and hospital systems in the world. Our work has been featured by the Wall Street Journal, the Washington Post, and Bloomberg.
We are building the standard for evaluating the ability of LLMs to perform real-world tasks. You will be at the forefront of defining what that standard looks like.
Founding team: The core methodology behind this platform comes from NLP evaluation research we did at Stanford. We raised a $5M seed round from some of the top institutional and angel investors in Silicon Valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir, and HRT. Collectively, our published work has over 300 citations.
Tech stack: Our frontend is built in React with TSX. We use Django as our backend framework. All of our infrastructure runs on AWS.
Know someone who would be a good fit? Connect them with ...@vals.ai. If we hire them and they stay on for 90 days, you'll get a $10,000 referral bonus and Vals AI merch!