Platform Engineer
Job posting

Place of work:
Budapest, District XII – Hillside office building
Working hours:
Full-time
Employment type:
Employee


Reference number: PLATFORM-20260415

In our AI Lab, we merge the stability of a bank with the dynamism of a startup. Our mission is to build groundbreaking AI products from scratch. We're looking for a Senior Platform Engineer to architect and build the high-availability, scalable platform that will power our entire AI operation.

Our platform will be built on a multi-region Azure foundation (AKS + Cosmos DB + Event Hubs). We are just starting to build our Platform team, and you will be a founding member. You won't just operate the platform; you will build it from the ground up, from the Terraform code for our AKS clusters to the CI/CD pipelines for our models. This is a hands-on role focused on engineering and automation. We follow SRE best practices, with the goal of a platform that achieves 99.9%+ availability.

What You'll Do:

  • Build the Platform from Scratch:

      • Define new AKS clusters, networking (VNets), and IAM guardrails as code using Terraform and Helm charts.

      • Create "golden" Docker images, GitOps pipelines (Argo CD/Flux), automatic node provisioning, and scaling policies for both CPU and GPU workloads.

      • Design and implement the core MLOps infrastructure, including artifact repositories, model registries, and feature stores.

  • Automate for Reliability:

      • Implement and fine-tune our observability stack: Azure Monitor metrics, Prometheus, and Grafana dashboards.

      • Build automated recovery mechanisms and chaos engineering tests to proactively find and fix weaknesses in the system.

  • Champion Platform Best Practices:

      • Work with development teams to ensure they build reliable, observable, and secure applications from day one.

      • Create runbooks and documentation to prepare for future incident management.

Key Responsibilities:

  • IaC Development and Maintenance: Develop and maintain our Terraform codebase, managing infrastructure state and plan/apply workflows with Terraform Cloud or Atlantis.

  • Kubernetes Operations: Handle version upgrades, manage node pools (including GPU nodes), and define network policies.

  • Data Environment Reliability: Ensure the reliability of our data stores (e.g., Cosmos DB geo-replication, Event Hubs consumer group management).

  • Security Hardening: Implement security best practices, including CVE scanning for Docker images and regular patching of AKS node images.

  • Observability Pipeline: Manage log processing, alerting rules, and capacity forecasting to stay ahead of problems.

  • Support AI Engineers: Provide a self-service platform and tooling that enables AI Engineers to train, deploy, and monitor their models with minimal friction.

What You'll Bring:

  • 5+ years of experience in a DevOps, SRE, or Platform Engineering role.

  • Deep, hands-on experience with at least one major cloud provider (Azure is a strong plus).

  • Proven experience with containerization (Docker) and orchestration (Kubernetes) in a production environment.

  • Expertise in Infrastructure as Code (Terraform is a must).

  • Strong programming skills in a scripting language (Python is a strong plus).

  • Experience building and maintaining production-grade CI/CD systems.

  • A proactive mindset focused on preventing incidents rather than just reacting to them.

What We Offer:

  • A Greenfield Opportunity: You will build a state-of-the-art AI platform from the ground up, using the best tools for the job.

  • A Modern Toolkit: Work with GitHub, Kubernetes, Managed Grafana, Terraform, and the latest Azure AI services.

  • Real Impact: Your work is the foundation upon which our entire AI strategy is built. You are a critical enabler for the entire team.

  • Focus on Engineering, Not Firefighting: In the initial phase, your role is 100% focused on building and automating, not on reactive, on-call firefighting.

  • A Laid-back, Senior Team: We have one daily stand-up, then we focus on deep work.

  • Competitive Salary.

  • Home-office friendly, with a cool HQ in Budapest.

This is NOT the job for you if:

  • You are looking for a role that is primarily about maintaining existing systems. We are building from scratch.

  • You enjoy manual configuration and doing the same task twice.

  • You are not passionate about building secure, reliable, and highly automated systems.

Key information about the position

  • Area: AI Lab
  • Position level: Specialist
  • Required experience: 3–5 years
  • Language skills: English
  • Work schedule: Standard