CloudOps (SME)
Our enterprise clients are moving from fragmented data foundations to AI-first data platforms capable of supporting large-scale, business-critical AI systems.
AI performance is directly constrained by data quality, availability, governance, and latency.
This role exists to build and operate the data backbone that enables reliable, scalable, and compliant AI at enterprise scale.
...
____________________________________________________________
Mission
You will operate as a Subject Matter Expert in complex enterprise environments, designing and delivering AI-ready data platforms where reliability, scalability, lineage, and governance are non-negotiable.
Working across consultative client engagements, outsourced delivery, or product-based models, you will own the end-to-end lifecycle of data pipelines, from ingestion to serving, and serve as a technical authority on data engineering for AI systems.
Key Responsibilities
AI Infrastructure (Core)
Design and operate compute platforms for AI workloads (CPU, GPU, accelerators)
Manage hybrid and cloud-based AI infrastructure
Ensure high availability, resilience, and performance of AI platforms
Plan and manage capacity for training and inference workloads
Cloud, Data Center & Platform Operations
Operate containerized and virtualized environments supporting AI systems
Manage storage and networking optimized for data-intensive workloads
Implement observability, monitoring, and incident response for AI platforms
Ensure operational readiness and 24/7 reliability where required
Cost, Performance & Scalability
Optimize infrastructure for cost, throughput, and latency
Implement FinOps practices for AI compute and storage
Balance performance requirements with budget and sustainability constraints
Support scaling strategies from POC to enterprise-wide deployment
Technical Scope
Infrastructure Stack
Cloud platforms (public, private, hybrid)
On-prem data centers and hybrid extensions
GPU scheduling and accelerator management
High-performance storage and networking
Platform & Operations
Container orchestration (Kubernetes or equivalent)
Infrastructure as Code
Monitoring, logging, and alerting systems
Backup, disaster recovery, and resilience patterns
Production Awareness
SLA-driven operations
Infrastructure design aligned with security and compliance requirements
Close collaboration with AI/ML, Data, DevOps, and Security teams
Profile
Experience
Strong background in infrastructure, cloud, or platform operations
Experience operating high-performance or data-intensive systems
Exposure to AI or ML workloads in production environments
Mindset
Operations-driven engineering mindset
Strong sense of ownership and accountability
Comfortable operating under reliability and performance constraints
Continuous improvement approach to scalability and cost efficiency
___________________________________________________________
This is not a traditional infrastructure role.
This role is designed to power enterprise AI systems at scale, acting as the compute and platform backbone for advanced AI initiatives.
You will position yourself as an AI Infrastructure SME, enabling AI platforms to scale by orders of magnitude and contributing to the build-out of one of the most robust AI compute ecosystems in Europe.