Research Engineer / Scientist, Model Welfare
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About The Role
As a Research Engineer/Scientist within the newly formed Model Welfare program, you will be among the first to work to better understand, evaluate, and address concerns about the potential welfare and moral status of AI systems. You are curious about the intersection of machine learning, ethics, and safety, and are adept at navigating technical and philosophical uncertainty.

You'll run technical research projects to investigate model characteristics of plausible relevance to welfare, consciousness, or related properties, and will design and implement low-cost interventions to mitigate the risk of welfare harms. Your work will often involve collaboration with other teams, including Interpretability, Finetuning, Alignment Science, and Safeguards.

Possible projects include investigating introspective self-reports from models, exploring welfare-relevant features and circuits, expanding welfare assessments for future frontier models, evaluating welfare-relevant capabilities as a function of model scale, developing strategies for high-trust commitments to models, and exploring interventions to deploy into production (e.g., allowing models to end harmful interactions).

The role is expected to be based in the San Francisco office.
Responsibilities
Qualifications
Strong Candidates May Also
Candidates Need Not Have
Annual Salary
The expected salary range for this position is $315,000 – $340,000 USD
Logistics
We encourage you to apply even if you do not meet every single qualification; not all strong candidates will. We value diverse perspectives and encourage applications from underrepresented groups.
How We're Different
We pursue high-impact AI research as a single cohesive team, focusing on large-scale efforts and the long-term goals of steerable, trustworthy AI. We value communication and collaboration, and we host frequent research discussions to ensure that we are pursuing the highest-impact work. Our recent directions include GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.
Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office space.