Machine Learning Engineer

2025-11-14 • AlphaX • San Francisco, CA
Description:

Overview

Machine Learning Engineer — Post-Training, Evaluation & Continuous Improvement

Location: Bay Area preferred (remote-first; ~10% in-office for workshops/on-sites)

Employment: Full-time (new grads welcome) • Internship/Co-op options available

About AlphaX

AlphaX builds financial reasoning models and agent workflows for professional investment research. Our stack spans RLHF/DPO post-training, reasoning workflow generation, agent tooling & model routing, and a financial data lake (prices, filings, transcripts, news).

The Role

Own the complete post-training life cycle for AlphaX's financial reasoning model—from data curation through human-in-the-loop feedback, evaluation, regression gating, and continuous improvement. You'll stand up and maintain the training/eval pipelines that turn research ideas into measurable production gains across our analyst-style task suite (e.g., earnings analysis, risk scoring, forecast memos).

What You'll Do

  • Post-training & alignment: run and refine SFT and preference-based training (RLHF, DPO, KTO, or similar) on finance-specific datasets.
  • Train and version reward models and rubric-based scorers for reasoning quality, factuality, and safety.
  • Build ingestion & cleaning for filings, transcripts, market data, and analyst workflows; dedup, redact, and split with leakage controls.
  • Operate human-in-the-loop feedback loops with expert, student, and crowd annotators, backed by clear rubrics and QA.
  • Design a multi-layer eval harness: unit tests for tools/prompts, scenario suites for research tasks, red-team probes, latency/cost tracking.
  • Implement automated A/B and canary gating with statistically sound decision rules and regression alerts.
  • Instrument chain-of-thought-free scoring proxies, tool-use success rates, and multi-step task completion.
  • Tune prompt policies, tool-calling strategies, and model routing (OpenAI/Claude/Gemini, etc.) behind a consistent interface.
  • Ship pipelines on GPUs, schedule jobs, track experiments, and maintain reproducible artifacts and datasets.
  • Add observability for drift, outliers, and PII/financial-compliance checks.
  • Close the loop: mine failures from production, generate counter-examples, synthesize new training/eval data, and re-train with tight feedback cycles.

Qualifications

  • BS/MS (or rising senior) in CS/EE/Math or equivalent with hands-on experience shipping post-training pipelines.
  • Practical experience with one or more: RLHF/DPO/KTO, reward modeling, or structured preference data.
  • Experience building training and evaluation pipelines end-to-end (data → train → eval → release), including experiment tracking (e.g., MLflow/W&B) and artifact/version control (e.g., DVC, Git-LFS).
  • Comfort reading financial text and reasoning about factuality & compliance (you don't need to be an investor, just curious and precise).

Tech You'll Touch

Python, PyTorch, Ray, MLflow/W&B, Airflow/Prefect, GPUs, Postgres/BigQuery, vector DBs, Docker/K8s, and major model APIs (OpenAI/Claude/Gemini).

Work Setup

Remote-first with ~10% in-office (Bay Area) for collaboration sprints, model & agent jam sessions, and eval workshops.
