Snippet: Streaming ML & Online Learning
Domain Context
Real-time feature engineering, online model updates, and streaming inference pipelines. Latency and correctness under time pressure are the primary constraints.
Streaming Architecture
- Clearly separate: event ingestion → feature computation → model serving → output action
- Each stage must be independently scalable and monitorable
- Use message queues (Kafka, Pulsar) between stages — never direct function calls in production
- Exactly-once processing: know whether your framework gives at-most-once, at-least-once, or exactly-once guarantees end to end (sources and sinks included), and document them
- Define and enforce SLA per stage: max latency, max backlog, error rate threshold
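The per-stage SLA in the last bullet can be made concrete as data plus a check. The following is a minimal sketch, not a prescribed implementation: the class names `StageSLA` and `StageStats`, their fields, and the function `sla_violations` are all hypothetical, and the stats snapshot is assumed to come from whatever monitoring the stage already exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSLA:
    """Hypothetical per-stage limits; field names are illustrative."""
    max_latency_ms: float
    max_backlog: int
    max_error_rate: float


@dataclass(frozen=True)
class StageStats:
    """A snapshot of what a stage's monitoring is assumed to report."""
    p99_latency_ms: float
    backlog: int
    errors: int
    processed: int


def sla_violations(sla, stats):
    """Return the name of every limit that the snapshot breaches."""
    violations = []
    if stats.p99_latency_ms > sla.max_latency_ms:
        violations.append("latency")
    if stats.backlog > sla.max_backlog:
        violations.append("backlog")
    # Guard against division by zero when the stage has processed nothing yet.
    error_rate = stats.errors / stats.processed if stats.processed else 0.0
    if error_rate > sla.max_error_rate:
        violations.append("error_rate")
    return violations
```

Keeping one `StageSLA` per stage (ingestion, feature computation, serving, output) lets each stage be alerted on independently, which matches the requirement that stages be independently monitorable.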
Real-Time Feature Engineering
- Point-in-time correctness is non-negotiable — a feature must never use data from the future
- Windowed aggregations: clearly define window size, slide interval, and late-arrival policy
- Feature consistency: the same feature definition must be used in training (offline) and serving (online)
- Feature store integration: compute features once, serve everywhere — avoid training/serving skew
- Late data handling: define a watermark policy — how long to wait for late events before closing the window
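To illustrate the window and watermark bullets, here is a small sketch of an event-time tumbling-window sum with a fixed allowed lateness. It is written in plain Python under simplifying assumptions: a single key, a watermark that simply trails the largest event time seen, and a "drop and count" policy for events that arrive after their window has closed. The class name `TumblingWindowSum` and its parameters are hypothetical; real stream processors offer richer policies.

```python
from collections import defaultdict


class TumblingWindowSum:
    """Event-time tumbling-window sum with a fixed allowed lateness.

    The watermark trails the largest event time seen by `allowed_lateness`.
    A window is emitted once the watermark reaches its end; events for an
    already-closed window are counted in `dropped_late`, not merged.
    """

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.open_windows = defaultdict(float)  # window start -> running sum
        self.max_event_time = float("-inf")
        self.dropped_late = 0

    def _watermark(self):
        return self.max_event_time - self.allowed_lateness

    def add(self, event_time, value):
        """Ingest one event; return the (window_start, sum) pairs now closed."""
        start = (event_time // self.window_size) * self.window_size
        if start + self.window_size <= self._watermark():
            self.dropped_late += 1  # too late: its window has already closed
            return []
        self.open_windows[start] += value
        self.max_event_time = max(self.max_event_time, event_time)
        watermark = self._watermark()
        closed = sorted(
            s for s in self.open_windows if s + self.window_size <= watermark
        )
        return [(s, self.open_windows.pop(s)) for s in closed]
```

With `window_size=10` and `allowed_lateness=5`, an event at time 8 that arrives after an event at time 12 is still accepted into window 0, while an event at time 9 that arrives after time 16 has been seen is dropped and counted. Counting drops makes the late-arrival policy observable instead of silent.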
Online Learning
- Model update frequency: tune based on the concept drift rate, not an arbitrary schedule
- Learning rate for online updates: lower than in batch training (as a starting heuristic, 1/10th to 1/100th of the batch rate; validate on your own data)
- Catastrophic forgetting: monitor performance on historical data after each update
- Replay buffer: maintain a sample of historical data to mix with new data during updates
- Rollback capability: always maintain the previous model version for instant revert
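The replay buffer, forgetting check, and rollback bullets can be combined into one guarded update step. The sketch below is illustrative only: `OnlineLinearModel` is a toy one-feature SGD regressor, `ReplayBuffer` uses reservoir sampling as one possible way to keep a historical sample, and `guarded_update` with its `tolerance` and `replay_k` parameters is a hypothetical helper, not an established API.

```python
import copy
import random


class OnlineLinearModel:
    """Toy one-feature linear regressor trained by SGD (illustrative only)."""

    def __init__(self, lr=0.01):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err


class ReplayBuffer:
    """Fixed-size uniform sample of the stream via reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity, self.items, self.seen = capacity, [], 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))


def mse(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def guarded_update(model, new_batch, buffer, holdout, replay_k=4, tolerance=1.10):
    """Update on new plus replayed data; revert if holdout error regresses.

    Returns (model_to_serve, accepted).
    """
    previous = copy.deepcopy(model)  # kept so rollback is instant
    baseline = mse(previous, holdout)
    for x, y in list(new_batch) + buffer.sample(replay_k):
        model.update(x, y)
    if mse(model, holdout) > baseline * tolerance:
        return previous, False  # forgetting guard tripped: serve the old model
    for item in new_batch:  # in this sketch only accepted batches are replayed
        buffer.add(item)
    return model, True
```

Whether a rejected batch should still enter the replay buffer is a design choice; this sketch keeps it out, which is conservative when the batch may be corrupt but would slow adaptation if the drift is genuine.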
Concept Drift Detection
- Monitor input feature distributions in real-time — use PSI (Population Stability Index) or KS test
- Prediction distribution monitoring: alert when output distribution shifts beyond threshold
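- Supervised drift: when labels arrive (even delayed), compare live accuracy against the held-out accuracy measured at training time, since accuracy on the training set itself is optimistically biased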
- Drift response policy: retrain on fresh data, expand training window, or trigger human review
- Log drift metrics alongside model predictions for retrospective analysis
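As a concrete reference for the PSI bullet, the sketch below computes PSI = Σ (actual − expected) × ln(actual / expected) over histogram bins. The binning choices are assumptions made for illustration: equal-width bins over the reference range, out-of-range live values clamped into the edge bins, and a small `eps` floor to avoid taking the log of zero. Quantile bins are an equally common choice.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bin edges are equal-width over the reference range; live values outside
    that range fall into the first or last bin. `eps` avoids log(0).
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but these cutoffs are conventions rather than guarantees and are worth calibrating per feature.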
Inference Pipeline
- Warm-up: pre-load models and caches before routing traffic — cold-start latency is unacceptable
- Timeout policy: if inference exceeds SLA, return fallback prediction (cached, rule-based, or default)
- Missing-feature handling: define a default value for each feature and log every substitution; never fail silently
- Micro-batching: accumulate requests over a short window (10-50 ms) for GPU efficiency
- Shadow mode: run new model alongside production model, log both predictions, compare before switching
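The timeout and missing-feature bullets might look like the following in a thread-based Python service. This is a sketch under stated assumptions: `FEATURE_DEFAULTS` and its feature names are made up, `model_fn` stands for any callable that takes a feature dict, and the deadline is enforced with `concurrent.futures`, which can stop waiting for a result but cannot interrupt a thread that is already running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

FEATURE_DEFAULTS = {"clicks_1h": 0.0, "avg_basket": 25.0}  # hypothetical features
_executor = ThreadPoolExecutor(max_workers=4)


def fill_missing(features):
    """Return (complete_features, names_of_defaulted_features)."""
    missing = [k for k in FEATURE_DEFAULTS if features.get(k) is None]
    present = {k: v for k, v in features.items() if v is not None}
    return {**FEATURE_DEFAULTS, **present}, missing


def predict_with_fallback(model_fn, features, timeout_s, fallback):
    """Run model_fn under a deadline; return (prediction, source, missing)."""
    filled, missing = fill_missing(features)
    future = _executor.submit(model_fn, filled)
    try:
        return future.result(timeout=timeout_s), "model", missing
    except FutureTimeout:
        future.cancel()  # best effort: an already running call is not interrupted
        return fallback, "fallback", missing


def slow_model(features):
    """Stand-in for a model call that misses its deadline."""
    time.sleep(0.2)
    return 1.0
```

Returning the prediction source and the list of defaulted features alongside the value means the caller can log both, so fallbacks and substitutions show up in monitoring rather than passing unnoticed.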
Evaluation for Streaming Systems
- Prequential evaluation: evaluate on each sample BEFORE using it for training (test-then-train)
- Sliding window metrics: report accuracy over a recent window (e.g., last 1 hour, last 1 day)
- Throughput: events processed per second — must exceed peak ingestion rate
- End-to-end latency: from event creation to prediction output — measure p50, p95, p99
- Backpressure metrics: monitor queue depth to detect when processing can't keep up with input
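The test-then-train loop and the sliding-window metric fit in a few lines. In this sketch `MajorityClassifier` is a deliberately trivial stand-in learner, and the `predict`/`learn` method names are an assumed interface; the window here is a count of recent samples rather than a time span.

```python
from collections import deque


class MajorityClassifier:
    """Trivial online learner: predicts the most frequent label seen so far."""

    def __init__(self, default=0):
        self.counts, self.default = {}, default

    def predict(self, x):
        if not self.counts:
            return self.default
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(model, stream, window=100):
    """Test-then-train over `stream`, yielding sliding-window accuracy."""
    recent = deque(maxlen=window)  # old outcomes fall out automatically
    for x, y in stream:
        recent.append(model.predict(x) == y)  # 1. test on the unseen sample
        model.learn(x, y)                     # 2. only then train on it
        yield sum(recent) / len(recent)
```

For example, `list(prequential_accuracy(MajorityClassifier(), [(None, 1)] * 4, window=2))` gives `[0.0, 0.5, 1.0, 1.0]`: the first prediction is the untrained default, and the window forgets that early miss after two more samples.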
Common Pitfalls
- Training/serving skew: feature computation differs between offline pipeline and streaming pipeline
- Clock skew between systems, which corrupts time-based features
- Unbounded state: streaming aggregations that grow indefinitely → OOM
- Testing only with steady-state load — also test burst traffic, recovery after failures, and data gaps
- Assuming events arrive in order — they don't in distributed systems; handle out-of-order explicitly
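One way to avoid the unbounded-state pitfall is to cap keyed state by both age and size. The sketch below is a simplified illustration: `BoundedKeyedCounter`, `ttl_s`, and `max_keys` are hypothetical names, expiry is driven by the largest event time seen (so out-of-order events cannot move the clock backwards), and expired keys are found with a full scan, which a production system would replace with timers or the framework's own state TTL.

```python
from collections import OrderedDict


class BoundedKeyedCounter:
    """Per-key event counter whose memory is capped by a TTL and a key limit."""

    def __init__(self, ttl_s, max_keys):
        self.ttl_s, self.max_keys = ttl_s, max_keys
        self.state = OrderedDict()  # key -> (count, last_event_time)
        self.max_event_time = float("-inf")

    def add(self, key, event_time):
        count, last_seen = self.state.pop(key, (0, event_time))
        # Re-inserting moves the key to the "most recently updated" end;
        # max() keeps last_seen from going backwards on out-of-order events.
        self.state[key] = (count + 1, max(last_seen, event_time))
        self.max_event_time = max(self.max_event_time, event_time)
        self._evict()

    def count(self, key):
        return self.state.get(key, (0, None))[0]

    def _evict(self):
        cutoff = self.max_event_time - self.ttl_s
        expired = [k for k, (_, seen) in self.state.items() if seen < cutoff]
        for k in expired:
            del self.state[k]
        while len(self.state) > self.max_keys:
            self.state.popitem(last=False)  # drop the least recently updated key
```

The size cap is the hard memory bound; the TTL keeps the state semantically fresh. Evicted keys restart from zero, so downstream consumers need to tolerate that reset.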