L2 Datacenter Support Engineer

Full-time

Mirantis



Mirantis helps organizations ship code faster on public and private clouds. The company provides a public cloud experience on any infrastructure from the data center to the edge. With Lens and the Mirantis Cloud Native Platform, Mirantis empowers a new breed of Kubernetes developers by removing infrastructure and operations complexity and providing one cohesive cloud experience for complete app and devops portability, a single pane of glass, and automated full-stack lifecycle management with continuous updates.

Mirantis serves many of the world’s leading enterprises, including Adobe, DocuSign, Liberty Mutual, PayPal, Reliance Jio, Societe Generale, Splunk, and Volkswagen.

Job Description



We are looking for an experienced L2 Engineer to operate and support high-performance AI infrastructure platforms, including NVIDIA GPU clusters, InfiniBand fabrics, and Kubernetes-based IaaS environments.

This role focuses on deep infrastructure expertise, ensuring performance, scalability, and reliability of the platform layer that powers AI workloads — without being responsible for the workloads themselves.

You will play a key role in bare metal lifecycle management, advanced InfiniBand troubleshooting, and platform stability, working closely with engineering teams to operate cutting-edge infrastructure at scale.

Key responsibilities:

  • Troubleshoot and maintain InfiniBand fabrics, including performance tuning, link issues, and topology validation.
  • Act as the escalation point for L1 support on complex infrastructure and hardware issues.
  • Own and maintain accurate infrastructure modeling, IPAM, and source-of-truth data in NetBox.
  • Own InfiniBand fabric management and advanced troubleshooting, utilizing Verity for configuration, monitoring, and optimization of high-performance interconnects. 
  • Diagnose and resolve issues across GPU servers, networking, storage, and Kubernetes platforms.
  • Perform deep hardware and system-level diagnostics (GPUs, PCIe, NICs, firmware, etc.).
  • Support Kubernetes platform stability (node health, networking, scheduling issues).
  • Contribute to automation of provisioning and operational workflows.
  • Lead incident response, root cause analysis (RCA), and post-incident improvements.
  • Collaborate with vendors and internal engineering teams on complex issues.
  • Support infrastructure upgrades, firmware management, and capacity expansion.

Qualifications



 

Required Skills & Experience:

  • 3–6+ years of experience in infrastructure operations, datacenter engineering, or cloud platforms.
  • Strong Linux systems expertise.
  • Hands-on experience with bare metal provisioning systems and lifecycle management.
  • Strong experience with InfiniBand networking (troubleshooting, performance, fabric management using UFM).
  • Experience with IPAM/DCIM tools such as NetBox, and with Ethernet network configuration and validation using Verity.
  • Solid understanding of datacenter networking, storage, and hardware architecture.
  • Working knowledge of Kubernetes in production environments.
  • Strong troubleshooting skills across hardware and distributed systems.

Preferred qualifications:

  • Experience with NVIDIA GPU platforms and accelerated computing infrastructure.
  • Familiarity with automation tools (Terraform, Ansible, etc.).
  • Exposure to OpenStack (optional).
  • Experience with observability stacks (Prometheus, Grafana, ELK).

Success in this role:

  • Rapid resolution of complex infrastructure and networking issues.
  • High reliability and performance of InfiniBand and GPU infrastructure.
  • Scalable and efficient bare metal provisioning processes.
  • Strong contribution to automation and operational excellence.
  • Trusted escalation point and technical leader within the team.

Additional Information



We offer:

  • Work with an established Silicon Valley leader in the cloud infrastructure industry;
  • Work with exceptionally passionate, talented and engaging colleagues, helping Fortune 500 and Global 2000 customers implement next-generation cloud technologies;
  • Be a part of cutting-edge, open-source innovation;
  • Thrive in the high-energy environment of a young company where openness, collaboration, risk-taking, and continuous growth are valued;
  • Professional development and training;
  • Attend conferences and working groups;
  • Company outings, happy hours, hackathons, and tech talks;
  • Receive a competitive compensation package with a strong benefits plan.

We are a Leader for Container Management in G2 (#2 after AWS)!

Job posted 3 days ago