Snippet: MLOps & Model Deployment

Domain Context

Taking models from experimentation to production with reliability, monitoring, and automation. The model is only as good as the system around it.

CI/CD for ML

  • Model training pipelines must be reproducible from a single command or config
  • Every model artifact must be traceable: commit hash, data version, config, training logs
  • Automated tests for data pipelines: schema validation, distribution drift, missing values
  • Automated tests for model quality: eval metrics must meet minimum thresholds, acting as a gate before packaging and deployment
  • Separate pipelines: data prep → training → evaluation → packaging → deployment
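
The data and model-quality tests above can be wired into one CI step that fails the pipeline with a non-zero exit code. This is a minimal sketch, assuming a data batch arrives as a list of dicts. `EXPECTED_SCHEMA`, `MAX_MISSING_FRACTION`, `METRIC_THRESHOLDS` and `ci_gate` are hypothetical names and values, not part of any particular CI or validation tool.

```python
from __future__ import annotations

from typing import Any

# Hypothetical gate configuration; real values come from your own data contract.
EXPECTED_SCHEMA = {"age": int, "income": float, "country": str}
MAX_MISSING_FRACTION = 0.01  # allowed share of missing values per column
METRIC_THRESHOLDS = {"accuracy": 0.90, "recall": 0.80}  # minimum eval metrics


def validate_batch(rows: list[dict[str, Any]]) -> list[str]:
    """Schema and missing-value checks; an empty list means the batch passed."""
    problems: list[str] = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        values = [row.get(column) for row in rows]
        missing = sum(v is None for v in values)
        if rows and missing / len(rows) > MAX_MISSING_FRACTION:
            problems.append(f"{column}: {missing}/{len(rows)} values missing")
        wrong_type = sum(v is not None and not isinstance(v, expected_type) for v in values)
        if wrong_type:
            problems.append(f"{column}: {wrong_type} values are not {expected_type.__name__}")
    return problems


def passes_quality_gate(metrics: dict[str, float]) -> bool:
    """True only if every required metric is present and meets its minimum."""
    return all(metrics.get(name, float("-inf")) >= floor
               for name, floor in METRIC_THRESHOLDS.items())


def ci_gate(rows: list[dict[str, Any]], metrics: dict[str, float]) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 fails it."""
    problems = validate_batch(rows)
    if not passes_quality_gate(metrics):
        problems.append(f"metrics {metrics} below thresholds {METRIC_THRESHOLDS}")
    for p in problems:
        print(f"GATE FAIL: {p}")
    return 1 if problems else 0
```

In a CI job this would be called as `sys.exit(ci_gate(batch, eval_metrics))`; a missing metric counts as a failure, so a broken evaluation step cannot silently pass the gate.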

Model Registry

  • Every production model must be registered with: version, metrics, training config, data hash
  • Promote models through stages: dev → staging → production — never skip staging
  • Keep at least 2 previous production model versions for instant rollback
  • Model metadata must include: input/output schema, expected latency, resource requirements
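
The stage rules above can be enforced in code rather than by convention. This is a toy in-memory sketch; `ModelVersion`, `ModelRegistry` and the `archived` stage are hypothetical, and a real registry would persist this state in a database or a dedicated registry service. Promotion advances exactly one stage, so staging cannot be skipped, and superseded production versions are archived rather than deleted so they remain rollback targets.

```python
from __future__ import annotations

from dataclasses import dataclass, field

STAGES = ("dev", "staging", "production")
ROLLBACK_DEPTH = 2  # previous production versions kept ready for rollback


@dataclass
class ModelVersion:
    version: str
    metrics: dict[str, float]
    training_config: dict[str, object]
    data_hash: str
    # Hypothetical free-form metadata: input/output schema, expected latency, resources.
    metadata: dict[str, object] = field(default_factory=dict)
    stage: str = "dev"


@dataclass
class ModelRegistry:
    """Toy in-memory registry; all state is lost when the process exits."""
    versions: dict[str, ModelVersion] = field(default_factory=dict)
    production_history: list[str] = field(default_factory=list)  # oldest first

    def register(self, mv: ModelVersion) -> None:
        if mv.version in self.versions:
            raise ValueError(f"version {mv.version} is already registered")
        mv.stage = "dev"  # every new version starts at the first stage
        self.versions[mv.version] = mv

    def promote(self, version: str) -> str:
        """Advance exactly one stage, so going from dev straight to production is impossible."""
        mv = self.versions[version]
        if mv.stage not in STAGES[:-1]:
            raise ValueError(f"cannot promote {version} from stage {mv.stage!r}")
        mv.stage = STAGES[STAGES.index(mv.stage) + 1]
        if mv.stage == "production":
            previous = self.current_production()
            if previous is not None:
                self.versions[previous].stage = "archived"  # kept, not deleted
            self.production_history.append(version)
        return mv.stage

    def current_production(self) -> str | None:
        return self.production_history[-1] if self.production_history else None

    def rollback_candidates(self) -> list[str]:
        """Previous production versions, newest first, capped at ROLLBACK_DEPTH."""
        return list(reversed(self.production_history[:-1]))[:ROLLBACK_DEPTH]
```

After three versions have each gone dev → staging → production, `rollback_candidates()` returns the two most recent predecessors, newest first.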

Serving & Inference

  • Define the SLA up front: p99 latency, throughput, availability target
  • Health checks must verify that the model is loaded and producing valid outputs, not just that the endpoint returns HTTP 200
  • Implement graceful degradation: fallback model or cached responses when primary fails
  • Batch inference: prefer offline batch processing for non-real-time use cases (cheaper, simpler)
  • A/B testing infrastructure: route traffic by percentage, log predictions for both models
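
Three of the serving rules above (output-validating health checks, graceful degradation, percentage-based A/B routing) fit in a few small functions. This is a sketch, assuming a model is any callable from a feature vector to a probability; `CANARY_INPUT`, `is_healthy`, `predict_with_fallback` and `ab_arm` are hypothetical names, and the validity check must be adapted to your real output schema.

```python
from __future__ import annotations

import hashlib
import math
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]

CANARY_INPUT = [0.5, 1.0, -2.0]  # hypothetical known-good input for the health check


def is_healthy(model: Predictor | None) -> bool:
    """Healthy means the model is loaded AND returns a valid output for a canary input."""
    if model is None:
        return False  # the process may be up, but no model is loaded
    try:
        score = model(CANARY_INPUT)
    except Exception:
        return False
    # Assumes the model outputs a probability; adapt this check to your output schema.
    return isinstance(score, float) and math.isfinite(score) and 0.0 <= score <= 1.0


def predict_with_fallback(primary: Predictor, fallback: Predictor,
                          x: Sequence[float]) -> tuple[float, str]:
    """Serve from the primary model; degrade to the fallback on any exception."""
    try:
        return primary(x), "primary"
    except Exception:
        return fallback(x), "fallback"


def ab_arm(unit_id: str, challenger_percent: int) -> str:
    """Deterministic percentage split: the same user or request ID always gets the same arm."""
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "challenger" if bucket < challenger_percent else "champion"
```

Hashing the ID instead of drawing a random number keeps a user on the same arm across requests, which makes the logged predictions for the two models comparable; the returned arm name and the source label from `predict_with_fallback` are what you would log alongside each prediction.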

Monitoring in Production

  • Data drift detection is mandatory — monitor input feature distributions daily
  • Model performance monitoring: track prediction distribution shift, not just system metrics
  • Alert on: prediction latency spike, error rate increase, confidence score distribution shift
  • Log all predictions with timestamps — enables retroactive analysis when issues are discovered
  • Dashboard must show: request volume, latency percentiles, error rate, model version, drift score
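
One common way to produce the per-feature drift score for a daily check is the Population Stability Index (PSI), sketched below; it is one technique among several (KS tests and KL or Jensen-Shannon divergence are others). This sketch uses equal-width bins over the training reference range, and the 0.25 alert threshold is a widely used rule of thumb, not a universal constant. `psi`, `drifted_features` and `DRIFT_ALERT_THRESHOLD` are hypothetical names.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence

DRIFT_ALERT_THRESHOLD = 0.25  # common rule of thumb for "significant" shift; tune per feature


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (cur - ref) * ln(cur / ref).

    Equal-width bins span the reference range; out-of-range values are clipped
    into the first or last bin. eps keeps empty bins from producing log(0).
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # constant feature: avoid zero-width bins

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drifted_features(reference: Mapping[str, Sequence[float]],
                     today: Mapping[str, Sequence[float]]) -> dict[str, float]:
    """Features whose PSI against the training reference exceeds the alert threshold."""
    scores = {name: psi(reference[name], today[name]) for name in reference if name in today}
    return {name: s for name, s in scores.items() if s > DRIFT_ALERT_THRESHOLD}
```

Running `drifted_features` once a day against a stored training sample gives both the drift score for the dashboard and the list of features to alert on; quantile-based bins are a common alternative when features are heavily skewed.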

Infrastructure

  • Containerize everything: model serving, data pipelines, evaluation jobs
  • GPU resource management: right-size instances, use spot/preemptible for training
  • Model artifacts stored in versioned object storage (S3/GCS), not local filesystem
  • Secrets and credentials: use a vault/secret manager; never bake env files into container images
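
Two of the infrastructure rules above can be made concrete with the standard library alone: artifact keys that embed a content hash (so a stored version is immutable and traceable), and secrets read at runtime from a mounted file instead of an env file in the image. This is a sketch; the key layout `models/<name>/<version>/<hash>/model.bin` and both function names are hypothetical. `/run/secrets` is where Docker mounts secrets by default, while Kubernetes lets you choose the mount path of a Secret volume. The actual upload is left to your storage client.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def artifact_key(model_name: str, version: str, artifact_bytes: bytes) -> str:
    """Object-storage key that embeds a SHA-256 prefix of the artifact contents."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return f"models/{model_name}/{version}/{digest[:16]}/model.bin"


def read_secret(name: str, secrets_dir: str | Path = "/run/secrets") -> str:
    """Read a secret mounted as a file at runtime.

    Fails loudly instead of falling back to credentials baked into the image.
    """
    path = Path(secrets_dir) / name
    if not path.is_file():
        raise FileNotFoundError(f"secret {name!r} not mounted at {path}")
    return path.read_text().strip()
```

Because the hash is part of the key, re-uploading different bytes under the same version produces a different key rather than silently overwriting the artifact a production model was registered with.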

Common Pitfalls

  • Training-serving skew: feature engineering differs between training and inference
  • Silent model degradation: model still returns predictions but quality drops over weeks
  • Missing monitoring: team discovers issues from user complaints, not alerts
  • Over-engineering: not every model needs Kubernetes — start simple, scale when needed
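
A common guard against training-serving skew is a single feature function imported by both the training pipeline and the serving endpoint, plus a version tag stored with the model. This is a sketch: `build_features`, `FEATURE_VERSION` and the specific transforms (log of income, clipped age, a country flag, missing values imputed as 0) are hypothetical choices for illustration.

```python
from __future__ import annotations

import math
from typing import Any

# Hypothetical tag saved with the model at training time and checked at serving time.
FEATURE_VERSION = "v1"


def build_features(raw: dict[str, Any]) -> list[float]:
    """The ONLY place raw records become feature vectors, for training and serving alike."""
    income = float(raw.get("income") or 0.0)  # missing values imputed as 0 (hypothetical choice)
    age = float(raw.get("age") or 0.0)
    return [
        math.log1p(max(income, 0.0)),  # same log transform in both paths
        min(age, 100.0) / 100.0,       # same clipping and scaling in both paths
        1.0 if raw.get("country") == "DE" else 0.0,
    ]


def training_matrix(records: list[dict[str, Any]]) -> list[list[float]]:
    return [build_features(r) for r in records]


def serve_request(payload: dict[str, Any], model_feature_version: str) -> list[float]:
    """Refuse to serve if the model was trained with a different feature version."""
    if model_feature_version != FEATURE_VERSION:
        raise RuntimeError("feature code version does not match the model's training version")
    return build_features(payload)
```

The version check turns a silent skew into a loud deployment error: if the feature code changes, the model has to be retrained and re-registered with the new tag before it can serve.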