QAT Global is seeking a highly skilled and passionate AI/ML professional to build a scalable AI system.
You will work closely with a cross-functional team to design, deploy, and optimize an advanced AI architecture for our client, based in California.
• Build and optimize multi-step and multi-agent workflows for production-grade Agent and RAG frameworks for this California-based client partner.
• Develop scalable AI/ML solutions using Python, GCP, and LLM ecosystems.
• Implement RAG pipelines and document retrieval strategies, including preprocessing, chunking, enrichment, embedding, and reranking workflows.
• Design prompt templates, system instructions, and guardrails.
• Integrate agents with tools, APIs, and internal services.
• Optimize latency, accuracy, and token usage.
• Create automated LLM evaluation and regression tests.
• Deploy applications on cloud-native environments.
• Work with vector and graph databases to enable high-performance retrieval.
• Collaborate with data, cloud, and engineering teams to deliver end-to-end solutions.
• Conduct performance tuning, architecture optimization, and continuous model improvements.
• 3+ years of Python development with strong AI/ML engineering experience.
• Experience with monitoring, working knowledge of model versioning, and some exposure to drift detection.
• Experience with one of the following: LangGraph, LlamaIndex, or DSPy.
• Hands-on expertise with LLM APIs (Gemini, Bedrock, Vertex AI, Claude).
• Solid understanding of prompt engineering and hallucination mitigation.
• Familiarity with cloud AI services.
• Understanding of the fundamentals of end-to-end RAG architectures.
• Understanding of vector and graph databases and retrieval systems such as Pinecone and Weaviate.
• Exposure to LangSmith and custom evaluation pipelines.
• Experience processing high-volume, multimodal documents and operationalizing data pipelines.
• Strong problem-solving, architectural thinking, and production deployment experience.