Senior Kubernetes Engineer

Dallas, TX

Posted: 03/27/2026
Employment Type: Direct Hire
Job Number: 28027
Pay Rate: $ - $250000

Job Description


Location: Dallas, TX (Hybrid) - relocation available

Base + Bonus + 100% company-paid benefits

Overview

This organization is backed by dedicated leadership and investment and operates at the bleeding edge of technology. Its mission is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure that removes friction from scientific research, simulation, analysis, and decision-making, accelerating discovery and innovation.

We are seeking a highly skilled Senior Kubernetes Engineer to join our office in Dallas. In this role, you will design, implement, and optimize GPU-accelerated container platforms at scale, enabling high-performance workloads (AI/ML, HPC, LLM training) across hybrid or on-premises environments. You will bring deep expertise in both the NVIDIA and Kubernetes ecosystems, including GPU scheduling, device plugins, and custom operators.

Key Responsibilities

- Architect and operate Kubernetes clusters optimized for GPU workloads, leveraging NVIDIA GPU Operator, Network Operator, and DCGM.
- Develop, deploy, and maintain custom Kubernetes operators and controllers to automate infrastructure services.
- Integrate NVIDIA device plugins, Multi-Instance GPU (MIG), and GPU sharing features into the scheduling layer.
- Optimize GPU utilization and job placement through scheduler extensions and batch schedulers, such as kube-scheduler plugins, Volcano, and Slurm.
- Collaborate with HPC, ML, and DevOps teams to ensure multi-tenant, high-throughput cluster performance.
- Drive observability and telemetry integrations using Prometheus, Grafana, DCGM Exporter, and OpenTelemetry.
- Implement secure multi-user and multi-namespace GPU isolation, using RBAC and policy enforcement tools such as OPA or Gatekeeper.
- Maintain CI/CD pipelines for Kubernetes infrastructure using GitOps tooling (ArgoCD and FluxCD).
- Contribute to infrastructure-as-code, using Terraform, Helm, and Kustomize.
- Participate in performance tuning, incident response, and production readiness reviews.

Required Experience

- Extensive experience running Kubernetes in production-grade environments and integrating NVIDIA components with Kubernetes, including GPU Operator, the device plugin, NVML, MIG, and DCGM.
- Proficiency in Go or Python for operator development and Kubernetes controller logic.
- Deep understanding of Kubernetes internals, including CRDs, RBAC, custom controllers, and scheduler extensions.
- Experience with GPU-intensive workloads, such as LLMs, training pipelines, and scientific computing.
- Hands-on experience with Helm, Kustomize, and GitOps workflows.
- Familiarity with CNI plugins, especially NVIDIA CNI and Multus.
- Experience with monitoring GPU metrics and cluster health using Prometheus and DCGM Exporter.