MLOps & Model Lifecycle
• Build automated CI/CD pipelines for ML (MLflow, Kubeflow, SageMaker, Vertex AI).
• Set up feature stores, model registries, and canary rollout processes.
• Create monitoring & alerting for drift, bias, and performance (Prometheus, Evidently, Arize).
Leadership & Delivery
• Recruit, coach, and promote a high-performing team of data engineers, ML engineers, and DevOps specialists.
• Drive quarterly OKRs and roadmaps, and run architecture review boards.
• Manage budgets, vendor contracts, and cloud cost optimization.
Security, Compliance, & Governance
• Enforce IAM policies, data encryption, and least-privilege access practices.
• Ensure adherence to GDPR, PDPA, HIPAA, and other applicable regulations.
• Champion reproducibility and auditability across data and ML assets.
Innovation & Thought Leadership
• Evaluate emerging paradigms such as data mesh, vector databases, LLMOps, and GenAI, and assess their business fit.
• Publish best-practice playbooks and present at internal tech forums or external meetups.
Required Qualifications
- 8+ years of combined experience in data engineering, software engineering, or ML infrastructure, including 3+ years leading teams.
- Deep proficiency with Python/Scala/SQL and modern data processing frameworks (Spark, Flink).
- Hands-on experience with Docker, Kubernetes, Terraform, and CI/CD tooling (GitHub Actions, Jenkins).
- Proven record of shipping and operating ML models in production at scale.
- Solid grasp of distributed-system design, data modeling, and microservice architectures.
- Excellent stakeholder management and communication skills.
Preferred / “Bonus Points”
- Experience with GenAI or LLM pipelines and vector similarity search (FAISS, Pinecone, Weaviate).
- Multi-cloud (AWS, GCP, Azure) certification or FinOps expertise.
- Contributions to open-source data or MLOps projects.
- Familiarity with privacy-preserving ML (federated learning, differential privacy).
Success Metrics (First 12 Months)
- Reduce model deployment lead time (commit → production) to under 24 hours.
- Achieve ≥ 99.9% uptime for core data pipelines.
- Launch a unified feature store serving at least 3 flagship ML products.
- Hire and onboard 4+ engineers, each fully ramped up in under 90 days.