Building MLOps Infrastructure: MLFlow, Airflow, and Ray on Kubernetes
Production MLOps stack on AWS EKS—MLFlow experiment tracking, Airflow orchestration, Ray distributed training, and Evidently AI drift monitoring.
Krishna C · August 15, 2022 · 3 min read · Updated March 20, 2023
I built MLOps infrastructure on AWS EKS that data scientists can use directly without waiting on platform engineers. The stack: Airflow for orchestration, MLFlow for experiment tracking, Ray for distributed training, and Evidently AI for drift monitoring—all packaged in Helm charts for consistent environments.
Automated Workflows with Airflow
Data scientists write Airflow DAGs defining their ML pipelines. The KubernetesExecutor spawns an isolated pod for each task.
```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

with DAG(
    "customer_churn_prediction",
    schedule_interval="@daily",
    start_date=datetime(2022, 1, 1),
    catchup=False,
) as dag:

    load_data = KubernetesPodOperator(
        task_id="load_data",
        image="ml-pipeline:latest",
        cmds=["python", "load_data.py"],
        namespace="ml-workflows",
        resources={"request_memory": "4Gi", "request_cpu": "2"},
    )

    train_model = KubernetesPodOperator(
        task_id="train_model",
        image="ml-pipeline:latest",
        cmds=["python", "train_model.py"],
        namespace="ml-workflows",
        resources={"request_memory": "16Gi", "request_cpu": "8"},
    )

    evaluate_drift = KubernetesPodOperator(
        task_id="evaluate_drift",
        image="evidently-monitor:latest",
        cmds=["python", "check_drift.py"],
        namespace="ml-workflows",
    )

    load_data >> train_model >> evaluate_drift
```
Airflow provides a visual DAG interface, automatic retries, cron- or event-driven scheduling, and per-task resource isolation. MLFlow tracking happens inside the task code itself: parameters, metrics, and artifacts are logged automatically.
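For illustration, here is a hedged sketch of what a task script like train_model.py might contain. The tracking URI matches the in-cluster server used later in this post, but the parameter names and the toy compute_metrics helper are assumptions, not code from the actual pipeline:

```python
# Hypothetical sketch of train_model.py, the script the train_model task runs.
# Parameter names and compute_metrics are illustrative stand-ins.

def compute_metrics(y_true, y_pred):
    """Toy accuracy/precision/recall over binary labels (stand-in for sklearn)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def main():
    # mlflow is imported lazily here; it ships in the ml-pipeline image
    import mlflow

    mlflow.set_tracking_uri("http://mlflow-server:5000")
    with mlflow.start_run(run_name="customer_churn_train"):
        params = {"max_depth": 6, "learning_rate": 0.1}  # hypothetical values
        mlflow.log_params(params)
        # ... train the model, then log metrics and artifacts:
        metrics = compute_metrics([1, 0, 1, 1], [1, 0, 0, 1])
        mlflow.log_metrics(metrics)
        mlflow.log_artifact("feature_importance.png")  # plot written earlier
```

The Airflow task invokes this with `python train_model.py`; everything logged inside the run shows up in the MLFlow UI automatically.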
Experiment Tracking with MLFlow
MLFlow tracking server runs in the cluster, backed by S3 for artifacts and PostgreSQL for metadata. Open source, no vendor lock-in.
Every experiment run logs:
- Parameters: Hyperparameters, feature configs, algorithm choices
- Metrics: Accuracy, precision, recall, loss curves
- Artifacts: Model files, plots, feature importance charts
- Environment: Python version, dependencies, system config
Data scientists compare experiments through the MLFlow UI. Promote good models to the registry with a single click. S3 storage means no disk space worries for multi-GB model artifacts.
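Promotion can also be scripted rather than clicked. A minimal sketch, assuming the in-cluster tracking URI used elsewhere in this post and a hypothetical model name; `mlflow.register_model` and `transition_model_version_stage` are the standard MLflow registry calls:

```python
def model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI format MLflow uses to reference a model logged under a run."""
    return f"runs:/{run_id}/{artifact_path}"

def promote_model(run_id: str, model_name: str = "customer-churn"):
    """Register a run's model and move it to Staging (names are placeholders)."""
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_tracking_uri("http://mlflow-server:5000")
    version = mlflow.register_model(model_uri(run_id), model_name)
    MlflowClient().transition_model_version_stage(
        name=model_name, version=version.version, stage="Staging"
    )
```

Scripted promotion fits the GitOps deployment flow mentioned later: a CI job can register and stage the winning run instead of a human clicking through the UI.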
Drift Monitoring with Evidently AI
Models degrade as data distributions shift. Evidently AI provides continuous monitoring integrated with MLFlow.
```python
import mlflow
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

def monitor_model_drift(model_name, production_data, reference_data):
    report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
    report.run(reference_data=reference_data, current_data=production_data)

    # save_html writes to disk and returns None, so save first, then log the file
    report.save_html("drift_report.html")
    mlflow.log_artifact("drift_report.html")

    # result keys follow the report's as_dict() layout and vary by Evidently version
    results = report.as_dict()
    mlflow.log_metrics({
        "data_drift_score": results["metrics"][0]["result"]["drift_score"],
        "target_drift_detected": float(results["metrics"][1]["result"]["drift_detected"]),
    })
```
An Airflow DAG runs Evidently reports nightly. Statistical tests (Kolmogorov-Smirnov, PSI) identify feature drift, and Slack notifications fire when thresholds are exceeded. Severe drift triggers retraining DAGs, with human approval required before production deployment.
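The threshold logic itself is simple. Here is a sketch of a hypothetical should_trigger_retraining helper that a script like check_drift.py could call before kicking off the retraining DAG (the 0.3 cutoff is an assumed value, tuned per model):

```python
def should_trigger_retraining(drift_share: float, threshold: float = 0.3) -> bool:
    """Decide whether drift is severe enough to trigger the retraining DAG.

    drift_share: fraction of features flagged as drifted by Evidently's
    statistical tests (KS for numeric features, PSI for categorical ones).
    threshold: hypothetical cutoff; in practice tuned per model.
    """
    return drift_share >= threshold
```

When this returns True, the monitoring DAG can use Airflow's TriggerDagRunOperator to start the retraining pipeline, which then waits on a manual-approval step before anything reaches production.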
Distributed Training with Ray
Large models require distributed training. Ray Cluster on Kubernetes handles coordination and scaling.
```python
import mlflow
import ray
from ray.train import ScalingConfig
from ray.train.xgboost import XGBoostTrainer

def train_distributed_model(train_path):
    mlflow.set_tracking_uri("http://mlflow-server:5000")
    params = {"objective": "binary:logistic", "max_depth": 6}

    trainer = XGBoostTrainer(
        scaling_config=ScalingConfig(
            num_workers=4,
            use_gpu=True,
            resources_per_worker={"CPU": 4, "GPU": 1},
        ),
        label_column="target",
        params=params,
        # the trainer needs a training dataset; read it as a Ray Dataset
        datasets={"train": ray.data.read_parquet(train_path)},
    )

    with mlflow.start_run():
        result = trainer.fit()
        mlflow.log_params(params)
        mlflow.log_metrics(result.metrics)
```
Data scientists spin up their own Ray clusters via a Jenkins job that deploys the cluster and provides a dashboard endpoint. Workers scale up during training, down when idle. Works with XGBoost, LightGBM, PyTorch, TensorFlow—same API regardless of framework.
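Once a cluster is up, attaching to it from a notebook is one call. A sketch assuming Ray's client server on its default port 10001; the head-service name is a placeholder for whatever the Jenkins job reports back:

```python
def ray_client_address(head_service: str, port: int = 10001) -> str:
    """Build the ray:// address for Ray's client server (default port 10001)."""
    return f"ray://{head_service}:{port}"

def attach(head_service: str):
    """Connect a local session to a remote Ray cluster (service name assumed)."""
    import ray
    ray.init(ray_client_address(head_service))  # e.g. "ray://churn-cluster-head:10001"
```

After `attach(...)`, the same XGBoostTrainer code above runs against the remote workers instead of the local machine.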
What We Achieved
The platform removed the bottleneck between data scientists and infrastructure. Teams run experiments without tickets, track everything in MLFlow, and deploy models through GitOps. Drift monitoring catches model degradation before it affects production.
Most importantly: data scientists focus on models, not infrastructure.
Why Kubernetes
Kubernetes made this possible. Every component—Airflow, MLFlow, Ray, Evidently—runs as containers with consistent deployment patterns. Benefits we saw:
- Resource efficiency: Autoscaling spins up nodes for training jobs, terminates them when idle. No paying for unused compute.
- Isolation: Each experiment runs in its own pod. No dependency conflicts, no "works on my machine."
- Self-service: Jenkins jobs and Helm charts let data scientists provision their own Ray clusters and environments.
- Portability: Same Helm charts deploy to dev laptops, staging, and production. No environment drift.
The initial Kubernetes learning curve paid off quickly. Once the platform existed, adding new tools was just another Helm chart.