Remote | Expert Professors - Professional Domains - $70-$95/hour
We are sharing a specialised part-time consulting opportunity for current or retired professors in finance, accounting, law, and other professional services domains who have strong domain expertise, structured reasoning ability, and the ability to design and evaluate challenging real-world tasks for frontier AI systems.
This role supports an exciting collaboration with a leading AI lab focused on improving frontier models through high-quality benchmark task design, golden solution development, model evaluation, and analysis of reasoning and problem-solving gaps in coding and agentic workflows.
Selected professionals will design domain-specific benchmark tasks, prepare detailed specifications and golden solutions in an agentic development environment, evaluate cross-model performance, identify reasoning failures, analyze agent trajectories, and help improve overall model quality. This opportunity is especially well-suited to highly analytical academic experts who are comfortable translating professional domain knowledge into structured evaluation tasks that reflect real-world complexity and executable testing standards.
Key Responsibilities
Professionals in this role may contribute to:
Task Design & Development
Design challenging, real-world domain-specific problems that serve as the foundation for agentic tasks
Construct problems to target specific capability and reasoning failures in frontier AI models
Help ensure that tasks are robust, realistic, and suitable for rigorous evaluation workflows
Specification & Golden Solution Generation
Integrate problems into an agentic development environment using Python
Prepare detailed task instructions, overviews, and golden solutions
Contribute domain-specific consultation and feedback to support high-quality task development
Evaluation & Model Analysis
Evaluate model performance across designed tasks
Identify tasks where the target model fails to pass all tests, particularly where failures reflect logical reasoning gaps
Analyze agent trajectories to identify recurring patterns of core capability failure and support model improvement
Ideal Profile
Strong candidates may have:
Current or retired professor experience in finance, accounting, law, or other professional services domains
A degree in finance, accounting, law, or a closely related field
Availability to commit reliably to at least 30 hours per week on weekdays
Ability to work independently and manage time effectively
Strong verbal and written communication, problem solving, and interpersonal skills
Preferred Qualifications
Past experience in AI training, model evaluation, or data annotation
Ability to translate domain expertise into structured benchmark and evaluation tasks
Comfort working with Python in an agentic development environment
Strong consistency and precision in evaluating reasoning and problem-solving workflows
Why This Opportunity
Contribute specialised academic and professional domain expertise to a cutting-edge AI collaboration
Help shape the next generation of frontier AI tools through benchmark design and reasoning evaluation
Work on high-impact tasks with strong real-world and research relevance
Structured remote work with competitive hourly compensation
Contract Details
W2 employment position with Cincinnatus LLC
Contingent remote role
Hourly compensation of $70-$95 per hour
Open to candidates located in the United States
Expected commitment of at least 30 hours per week on weekdays, with at least 6 hours per weekday
Opportunity to be placed at a leading AI lab as part of its extended workforce
Role-based position with structured collaboration and integration into standard enterprise workflows
Employment, onboarding, payroll, benefits, and compliance are administered by Cincinnatus LLC
Start date: Immediate
About the Platform
This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects.
Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific content validation.
By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: