Job Details

Research Scientist, Interpretability

  2025-12-01     Anthropic     San Francisco,CA  
Description:

Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.

About the role

When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?" The Interpretability team at Anthropic is working to reverse‑engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. We're looking for researchers and engineers to join our efforts. We focus on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as doing "biology" or "neuroscience" of neural networks using microscopes we build, or treating neural networks as binary computer programs we're trying to reverse‑engineer.

We aim to create a solid foundation for mechanistically understanding neural networks and making them safe (see our vision post). In the short term, we have focused on resolving the issue of "superposition" (see Toy Models of Superposition, Superposition, Memorization, and Double Descent, and our May 2023 update), which causes the computational units of the models, like neurons and attention heads, to be individually uninterpretable, and on finding ways to decompose models into more interpretable components. Our subsequent work found millions of features in Sonnet, one of our production language models, represents progress in this direction. In our most recent work, we develop methods that allow us to build circuits using features and use these circuits to understand the mechanisms associated with a model's computation and study specific examples of multi‑hop reasoning, planning, and chain‑of‑thought faithfulness on Haiku 3.5, one of our production models. This is a stepping stone towards our overall goal of mechanistically understanding neural networks. We often collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts to use our work to make Anthropic's models safer. We also have an Interpretability Architectures project that involves collaborating with Pretraining.

Responsibilities

  • Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights
  • Design and run robust experiments, both quickly in toy scenarios and at scale in large models
  • Create and analyze new interpretability features and circuits to better understand how models work
  • Build infrastructure for running experiments and visualizing results
  • Work with colleagues to communicate results internally and publicly

You may be a good fit if you

  • Have a strong track record of scientific research (in any field), and have done some work on interpretability
  • Enjoy team science – working collaboratively to make big discoveries
  • Are comfortable with messy experimental science. We're inventing the field as we work, and the first textbook is years away
  • You view research and engineering as two sides of the same coin. Every team member writes code, designs and runs experiments, and interprets results
  • You can clearly articulate and discuss the motivations behind your work, and teach us about what you've learned. You like writing up and communicating your results, even when they're null
  • Familiarity with Python is required for this role

Role Specific Location Policy

This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case‑by‑case basis.

Compensation

$315,000 - $560,000 USD

Logistics

Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.

Location‑based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.

Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.

EEO Statement

As set forth in Anthropic's Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

Voluntary Self‑Identification of Disability

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years.

  • Alcohol or other substance use disorder (not currently using drugs illegally)
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
  • Blind or low vision
  • Cancer (past or present)
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or serious difficulty hearing
  • Diabetes
  • Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
  • Epilepsy or other seizure disorder
  • Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
  • Intellectual or developmental disability
  • Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
  • Missing limbs or partially missing limbs
  • Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
  • Nervous system condition, for example, migraine headaches, Parkinson's disease, multiple sclerosis (MS)
  • Neurodivergence, for example, attention‑deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
  • Partial or complete paralysis (any cause)
  • Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
  • Short stature (dwarfism)
  • Traumatic brain injury

Public burden statement: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.

#J-18808-Ljbffr


Apply for this Job

Please use the APPLY HERE link below to view additional details and application instructions.

Apply Here

Back to Search