Sr. Cloud Operations Engineer
McKinney, TX US
Job Description
Type: Full Time (onsite)
Location: McKinney, TX
JOB SUMMARY
Lead the design, implementation, and optimization of enterprise-scale OpenStack infrastructure, ensuring reliability, security, and cost-efficiency across hybrid cloud and colocation environments. Your expertise will bridge physical hardware procurement, OpenStack deployment, and data center operations to support our virtualization ecosystem.
Responsibilities:
Infrastructure Design and Deployment
●Deliver resilient OpenStack environments with 99.999% uptime, optimized for hybrid cloud workloads.
●Architect and manage OpenStack clusters across multiple geo-distributed data center environments
●Identify, qualify, procure, and manage physical hardware (compute, storage, networking) aligned with OpenStack and Ceph storage requirements.
Data Center Operations and Capacity Planning
●Collaborate with colocation vendors to ensure power/cooling capacity meets workload demands, using DCIM tools for rack-level planning (power, BTU/hr, U space)
●Conduct quarterly capacity audits to identify resource gaps, depletion risks, and upgrade timelines
Networking and Security Governance
●Configure routing, SD-WAN integrations, and AWS Direct Connect/VPC peering/mesh networking for hybrid cloud connectivity
●Implement security standards via OpenStack and Linux capabilities (Keystone, Neutron, encrypted Ceph storage, etc.).
●Perform regular security audits and remediate security findings
Automation and Observability
●Develop Ansible playbooks, Terraform modules, and MAAS configurations for OpenStack component provisioning
●Build predictive analytics pipelines using metrics, logs, and traces
●Design self-healing workflows for compute and storage failures
Competencies:
●Proficiency in KVM/QEMU, Ceph, and OpenStack services (Nova, Neutron, Keystone)
●Proficiency in infrastructure as code and configuration management tools (Terraform, OpenTofu, Pulumi, Ansible, Salt).
●Proficiency in Linux performance analysis (eBPF).
●Advanced, at-scale automation of bare-metal provisioning
●Hands-on experience with DCIM tools
●Hands-on experience with modern observability stacks (APM, OpenTelemetry, Grafana, Prometheus, Loki).
●Experience in evaluating, qualifying, and managing networking hardware and capacity planning.
●Expertise designing and managing network infrastructure across multiple data centers and public cloud environments.
●Familiarity with configuring and troubleshooting network hardware and software, as well as analyzing packet captures.
●In-depth understanding of networking layers, protocols, services, and security practices.
●Proven track record negotiating hardware procurement and colocation SLAs
●Expertise with container management (Kubernetes, ECS, Docker, Helm)
●Experience with version control systems and providers (Git, Mercurial, GitHub, SourceHut)
●Experience with CI/CD systems (GitHub Actions, CircleCI, Argo)
●Experience with ticket management systems (Jira, Shortcut, Azure DevOps)
