Research Engineer / Scientist, Model Welfare
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About The Role
As a Research Engineer/Scientist within the newly formed Model Welfare program, you will be among the first to work to better understand, evaluate, and address concerns about the potential welfare and moral status of AI systems. You are curious about the intersection of machine learning, ethics, and safety, and are adept at navigating technical and philosophical uncertainty.

You'll run technical research projects to investigate model characteristics of plausible relevance to welfare, consciousness, or related properties, and will design and implement low-cost interventions to mitigate the risk of welfare harms. Your work will often involve collaboration with other teams, including Interpretability, Finetuning, Alignment Science, and Safeguards.

Possible projects include investigating introspective self-reports from models, exploring welfare-relevant features and circuits, expanding welfare assessments for future frontier models, evaluating welfare-relevant capabilities as a function of model scale, developing strategies for high-trust commitments to models, and exploring interventions to deploy into production (e.g., allowing models to end harmful interactions).

The role is expected to be based in the San Francisco office.
Responsibilities
Qualifications
Strong Candidates May Also
Candidates Need Not Have
Annual Salary
The expected salary range for this position is $315,000 – $340,000 USD
Logistics
We encourage you to apply even if you do not meet every single qualification; not all strong candidates will. We value diverse perspectives and encourage applications from underrepresented groups.
How We're Different
We pursue high-impact AI research as a single cohesive team, focusing on large-scale efforts and the long-term goals of steerable, trustworthy AI. We value communication and collaboration, and we host frequent research discussions to ensure that we are pursuing the highest-impact work. Our recent directions include GPT-3, Circuit-Based Interpretability, Multimodal Neurons, Scaling Laws, AI & Compute, Concrete Problems in AI Safety, and Learning from Human Preferences.
Come work with us!
Anthropic is a public benefit corporation headquartered in San Francisco. We offer competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office space.