Job Details

Research Scientist, Interpretability

2025-12-01 Anthropic San Francisco, CA

Description:

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?" The Interpretability team at Anthropic is working to reverse‑engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. We're looking for researchers and engineers to join our efforts. We focus on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as doing "biology" or "neuroscience" of neural networks using microscopes we build, or treating neural networks as binary computer programs we're trying to reverse‑engineer.

We aim to create a solid foundation for mechanistically understanding neural networks and making them safe (see our vision post). In the short term, we have focused on resolving the issue of "superposition" (see Toy Models of Superposition; Superposition, Memorization, and Double Descent; and our May 2023 update), which causes the computational units of the models, like neurons and attention heads, to be individually uninterpretable, and on finding ways to decompose models into more interpretable components. Our subsequent work, which found millions of features in Sonnet, one of our production language models, represents progress in this direction. In our most recent work, we develop methods that allow us to build circuits from features, use these circuits to understand the mechanisms behind a model's computation, and study specific examples of multi‑hop reasoning, planning, and chain‑of‑thought faithfulness on Haiku 3.5, one of our production models. This is a stepping stone towards our overall goal of mechanistically understanding neural networks. We often collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts, to use our work to make Anthropic's models safer. We also have an Interpretability Architectures project that involves collaborating with Pretraining.

Responsibilities

Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights
Design and run robust experiments, both quickly in toy scenarios and at scale in large models
Create and analyze new interpretability features and circuits to better understand how models work
Build infrastructure for running experiments and visualizing results
Work with colleagues to communicate results internally and publicly

You may be a good fit if you

Have a strong track record of scientific research (in any field), and have done some work on interpretability
Enjoy team science – working collaboratively to make big discoveries
Are comfortable with messy experimental science. We're inventing the field as we work, and the first textbook is years away
View research and engineering as two sides of the same coin. Every team member writes code, designs and runs experiments, and interprets results
Can clearly articulate and discuss the motivations behind your work, and teach us about what you've learned. You like writing up and communicating your results, even when they're null
Are familiar with Python (required for this role)

Role Specific Location Policy

This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case‑by‑case basis.

Compensation

$315,000 - $560,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location‑based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

EEO Statement

As set forth in Anthropic's Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Voluntary Self‑Identification of Disability

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

Disabilities include, but are not limited to:

Alcohol or other substance use disorder (not currently using drugs illegally)
Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
Blind or low vision
Cancer (past or present)
Cardiovascular or heart disease
Celiac disease
Cerebral palsy
Deaf or serious difficulty hearing
Diabetes
Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
Epilepsy or other seizure disorder
Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
Intellectual or developmental disability
Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
Missing limbs or partially missing limbs
Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
Nervous system condition, for example, migraine headaches, Parkinson's disease, multiple sclerosis (MS)
Neurodivergence, for example, attention‑deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
Partial or complete paralysis (any cause)
Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
Short stature (dwarfism)
Traumatic brain injury

Public burden statement: According to the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.
