Be one of the founding engineers at Nen, shaping the AI layer that powers automation across enterprise desktop environments at scale.

The role
- Build and extend a multi-model agent loop across leading AI providers
- Benchmark models on cost, latency, and reliability, and own the framework for doing so continuously
- Improve agent reliability through better perception, grounding, and structured context
- Instrument traces and build the data foundation for future fine-tuning
- Shape the Python workflow SDK so improvements are transparent to users

Requirements
- Hands-on experience building with LLMs in production
- Strong Python; comfortable working across the SDK, API, and model integration layers
- Experience evaluating and benchmarking models with structured evals, not just vibes
- Familiarity with agent architectures, tool use, and multi-step reasoning loops
- Curiosity about the computer-use model landscape and how quickly it is evolving