Job Details

Member of Technical Staff - Kernels

  2026-04-15     Acceler8 Talent     San Francisco, CA
Description:

Member of Technical Staff - Kernels/GPU Performance




We are building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today's homogeneous, vertically integrated infrastructure. We are addressing this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and orchestrates each component to hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems across multi-vendor and multi-generation hardware, including the latest emerging accelerators. These systems unlock step-function improvements in performance and cost efficiency at scale.

On top of this foundation, we are building a production-grade neocloud for agentic workloads. Customers use our platform to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization.

We are working with foundation labs, hyperscalers, and AI native companies to power real production workloads built to scale to gigawatt-class AI datacenters.

We are seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators and execution hardware to extract maximum performance from AI workloads across diverse and rapidly evolving platforms. You will analyze low-level execution behavior, design and optimize kernels, and ensure performance is reliable across both established and emerging hardware.

This role is ideal for engineers who enjoy deep performance work, reasoning about hardware tradeoffs, and turning theoretical peak performance into real-world results.

Responsibilities

  • Design, implement, and optimize GPU and accelerator kernels for AI workloads
  • Analyze and tune performance across the GPU execution stack, including memory access patterns, synchronization, and instruction scheduling
  • Work with compilers and runtimes to ensure kernels integrate cleanly and perform well in end-to-end systems
  • Bring up and optimize execution on new or emerging accelerators
  • Profile, benchmark, and debug performance issues across kernels, runtimes, and hardware
  • Ensure performance optimizations are robust, correct, and production-ready at scale

Qualifications

  • Strong software engineering fundamentals
  • Experience working on performance-critical systems close to hardware
  • Comfort reasoning about low-level execution behavior, memory hierarchies, and performance tradeoffs

Preferred Qualifications

  • Experience with CUDA, Triton, CUTLASS, or other accelerator programming models
  • Deep understanding of GPU execution models (warps/wavefronts, blocks, grids)
  • Experience optimizing memory access patterns (coalescing, shared memory, cache behavior)
  • Familiarity with occupancy, latency hiding, and instruction-level parallelism
  • Experience using profiling and performance analysis tools
  • Familiarity with multi-GPU or distributed execution is a plus


Apply for this Job

Please use the APPLY HERE link below to view additional details and application instructions.

Apply Here
