Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
When you see what modern language models are capable of, do you wonder, "How do these things work? How can we trust them?" The Interpretability team at Anthropic is working to reverse‑engineer how trained models work because we believe that a mechanistic understanding is the most robust way to make advanced systems safe. We're looking for researchers and engineers to join our efforts. We focus on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. Some useful analogies might be to think of us as doing "biology" or "neuroscience" of neural networks using microscopes we build, or treating neural networks as binary computer programs we're trying to reverse‑engineer.
We aim to create a solid foundation for mechanistically understanding neural networks and making them safe (see our vision post). In the short term, we have focused on resolving the issue of "superposition" (see Toy Models of Superposition; Superposition, Memorization, and Double Descent; and our May 2023 update), which causes the computational units of models, like neurons and attention heads, to be individually uninterpretable, and on finding ways to decompose models into more interpretable components. Our subsequent work, which found millions of features in Sonnet, one of our production language models, represents progress in this direction. In our most recent work, we develop methods for building circuits out of features and use those circuits to understand the mechanisms behind a model's computation, studying specific examples of multi‑hop reasoning, planning, and chain‑of‑thought faithfulness in Haiku 3.5, one of our production models. This is a stepping stone towards our overall goal of mechanistically understanding neural networks. We often collaborate with teams across Anthropic, such as Alignment Science and Societal Impacts, to use our work to make Anthropic's models safer. We also run an Interpretability Architectures project in collaboration with the Pretraining team.
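For candidates newer to this area, here is a minimal, purely illustrative sketch (in PyTorch; the dimensions, sparsity level, and training loop are assumptions made for this example, not our research code) of the toy setting studied in Toy Models of Superposition: a model asked to reconstruct more sparse features than it has hidden dimensions learns to store them in superposition, which is why individual neurons are hard to interpret.

    import torch

    n_features, d_hidden = 20, 5        # more features than hidden dimensions
    sparsity = 0.05                     # each feature is active ~5% of the time

    W = torch.nn.Parameter(0.1 * torch.randn(n_features, d_hidden))
    b = torch.nn.Parameter(torch.zeros(n_features))
    opt = torch.optim.Adam([W, b], lr=1e-3)

    for step in range(10_000):
        # Synthetic data: sparse, uniformly distributed feature activations.
        x = torch.rand(1024, n_features) * (torch.rand(1024, n_features) < sparsity)
        h = x @ W                        # compress into d_hidden dimensions
        x_hat = torch.relu(h @ W.T + b)  # tied-weight reconstruction
        loss = ((x - x_hat) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

    # After training, W @ W.T is far from diagonal: features share hidden
    # dimensions (superposition), so no single neuron cleanly represents one.
    print(torch.round((W @ W.T).detach(), decimals=2))

Sparse dictionary-learning methods, such as the sparse autoencoders used in our features work, aim to undo exactly this kind of entanglement and recover interpretable features from superposed activations.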
This role is based in the San Francisco office; however, we are open to considering exceptional candidates for remote work on a case‑by‑case basis.
Annual Salary: $315,000–$560,000 USD
Education requirements: We require at least a Bachelor's degree in a related field or equivalent experience.
Location‑based hybrid policy: Currently, we expect all staff to be in one of our offices at least 25% of the time. However, some roles may require more time in our offices.
Visa sponsorship: We do sponsor visas! However, we aren't able to successfully sponsor visas for every role and every candidate. If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.
As set forth in Anthropic's Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. Our goal is for at least 7% of our workforce to be people with disabilities, and the law requires us to measure our progress towards that goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. Because people can become disabled, we must ask this question at least every five years.
Public burden statement: According to the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.