Base pay range: $230,000.00 – $360,000.00 per year.
Company Overview
At TwelveLabs, we pioneer frontier multimodal foundation models that can see, hear and understand the world as humans do. Our models redefine the standards in video‑language modeling, enabling developers to build programs with state‑of‑the‑art semantic search, summarization and analysis capabilities. We have raised $107 million in Seed and Series A funding from leading VC and corporate partners including NVIDIA, NEA, Radical Ventures, Index Ventures, Snowflake and Databricks. Our advisory team features AI visionaries such as Fei‑Fei Li, Silvio Savarese and Alexandr Wang. We are headquartered in San Francisco with a strong presence in Seoul, underscoring our commitment to global innovation.
About the Science Team
The Science team leads multimodal AI research, tackling critical challenges in video understanding. Core research areas include video embedding and search, multimodal language models that reason over video content, and intelligent agents that interact with and analyze video data. We integrate research outcomes directly into products and platforms, working closely with Engineering and Product teams in a collaborative culture.
About the Role
As a Senior Research Scientist (Finetuning), you will drive core technology research and shape its direction. Your responsibilities include pioneering work in video understanding, multimodal learning and AI agents; identifying critical research problems; designing innovative solutions and running experiments; developing data strategies and evaluation methodologies; leading finetuning efforts for video embedding and video language models; collaborating with MLE and Solutions Engineering to productionize finetuning; and communicating findings to inform the broader research roadmap.
Qualifications: Mid‑Senior level
Employment Type: Full‑time
Job Function: Other
Industries: Software Development
Referrals: Referrals increase your chances of interviewing by 2x.