Overview
ML/AI Research Engineer, Agentic AI Lab (Founding Team) at Fabrion
Location: San Francisco Bay Area
Type: Full-Time
Compensation: Competitive salary + meaningful equity (founding tier)
Backed by 8VC, we are building a world-class team to tackle one of the industry's most critical infrastructure problems.
About the Role
We're designing the future of enterprise AI infrastructure — grounded in agents, retrieval-augmented generation (RAG), knowledge graphs, and multi-tenant governance. We're looking for an ML/AI Research Engineer to join our AI Lab and lead the design, training, evaluation, and optimization of agent-native AI models. You'll work at the intersection of LLMs, vector search, graph reasoning, and reinforcement learning — building the intelligence layer that sits on top of our enterprise data fabric.
This isn't a prompt engineer role. It's full-cycle ML: from data curation and fine-tuning to evaluation, interpretability, and deployment — with cost-awareness, alignment, and agent coordination all in scope.
Core Responsibilities
- Fine-tune and evaluate open-source LLMs (e.g. LLaMA 3, Mistral, Falcon, Mixtral) for enterprise use cases with both structured and unstructured data
- Build and optimize RAG pipelines using LangChain, LangGraph, LlamaIndex, or Dust — integrated with our vector DBs and internal knowledge graph
- Train agent architectures (ReAct, AutoGPT, BabyAGI, OpenAgents) using enterprise task data
- Develop embedding-based memory and retrieval chains with token-efficient chunking strategies
- Create reinforcement learning pipelines to optimize agent behaviors (e.g. RLHF, DPO, PPO)
- Establish scalable evaluation harnesses for LLM and agent performance, including synthetic evals, trace capture, and explainability tools
- Contribute to model observability, drift detection, error classification, and alignment
- Optimize inference latency and GPU resource utilization across cloud and on-prem environments
Desired Experience
Model Training
- Deep experience fine-tuning open-source LLMs using HuggingFace Transformers, DeepSpeed, vLLM, FSDP, LoRA/QLoRA
- Worked with both base and instruction-tuned models; familiar with SFT, RLHF, DPO pipelines
- Comfortable building and maintaining custom training datasets, filters, and eval splits
- Understand tradeoffs in batch size, token window, optimizer, precision (FP16, bfloat16), and quantization
RAG + Knowledge Graphs
- Experience building enterprise-grade RAG pipelines integrated with real-time or contextual data
- Familiar with LangChain, LangGraph, LlamaIndex, and open-source vector DBs (Weaviate, Qdrant, FAISS)
- Experience grounding models with structured data (SQL, graph, metadata) + unstructured sources
- Bonus: Worked with Neo4j, PuppyGraph, RDF, OWL, or other semantic modeling systems
Agent Intelligence
- Experience training or customizing agent frameworks with multi-step reasoning and memory
- Understand common agent loop patterns (e.g. Plan→Act→Reflect), memory recall, and tools
- Familiar with self-correction, multi-agent communication, and agent ops logging
Optimization
- Strong background in token cost optimization, chunking strategies, reranking (e.g. Cohere, Jina), compression, and retrieval latency tuning
- Experience running models under quantized (int4/int8) or multi-GPU settings with inference tuning (vLLM, TGI)
Preferred Tech Stack
- LLM Training & Inference: HuggingFace Transformers, DeepSpeed, vLLM, FlashAttention, FSDP, LoRA
- Agent Orchestration: LangChain, LangGraph, ReAct, OpenAgents, LlamaIndex
- Vector DBs: Weaviate, Qdrant, FAISS, Pinecone, Chroma
- Graph Knowledge Systems: Neo4j, PuppyGraph, RDF, Gremlin, JSON-LD
- Storage & Access: Iceberg, DuckDB, Postgres, Parquet, Delta Lake
- Evaluation: OpenLLM Evals, TruLens, Ragas, LangSmith, Weights & Biases
- Compute: Ray, Kubernetes, TGI, SageMaker, Lambda Labs, Modal
- Languages: Python (core), optionally Rust (for inference layers) or JS (for UX experimentation)
Soft Skills & Mindset
- Startup DNA: resourceful, fast-moving, and capable of working in ambiguity
- Deep curiosity about agent-based architectures and real-world enterprise complexity
- Comfortable owning model performance end-to-end: from dataset to deployment
- Strong instincts around explainability, safety, and continuous improvement
- Enjoy pair-designing with product and UX to shape capabilities, not just APIs
Why This Role Matters
This role is foundational to our thesis: that agents + enterprise data + knowledge modeling can create intelligent infrastructure for real-world, multi-billion-dollar workflows. Your work won't be buried in research reports; it will be productionized and put to work by hundreds of users across hundreds of thousands of decisions. If this is your dream role, we would love to hear from you.
Job function: Engineering and Information Technology
Industries: Technology, Information and Internet