Late last year, a conversation started about improving our DSP Fallback performance by introducing a CTR prediction model.
Fallback kicks in when the main DSP decides it has no ad to serve. Its purpose is to raise the fill rate, the ratio of served impressions to available ad slots.
I was a backend engineer. I had no background in AI.
The expectation was that wiring the surrounding systems would take more work than the model itself, so the project landed on my plate.
This post is a record of the technical decisions I made, the reasoning behind them, and what I learned as an AI non-expert building ML infrastructure.
Model Choice: Logistic Regression
I chose Logistic Regression (LR).

Since the goal was improving ad CTR, we just needed to learn whether a given impression would be clicked: a binary classification problem.

Internally, both LR and LightGBM were recommended, the two models most commonly used in ad platforms. But this was an initial version, and I didn’t want to take on complex tuning and operational burden from day one.

So I picked the simpler option: LR.
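To show how small this choice keeps things, here is a minimal sklearn sketch of the setup. The feature names and data are made up for illustration; the real feature set isn’t covered in this post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical impression features: (ad slot id, hour bucket).
X = np.array([["slot_a", "09"], ["slot_a", "21"],
              ["slot_b", "09"], ["slot_b", "21"]] * 25)
y = np.array([1, 1, 0, 0] * 25)  # in this toy data, slot_a always gets the click

# One-hot encode the categorical features, then fit a plain LR classifier.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
model.fit(X, y)

# predict_proba gives P(click), which is the CTR estimate Fallback needs.
p_click = model.predict_proba(np.array([["slot_a", "09"]]))[0, 1]
print(f"P(click | slot_a, 09h) = {p_click:.3f}")
```

The whole model is a pipeline of an encoder and a linear classifier, which is what made the "simpler option" argument convincing.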
Language and Framework Choice
I went with Python and sklearn, for both the training batch and the inference server.
I initially considered ONNX + Go. Our backend was evaluating a migration from Node to Go for performance reasons, and a new project felt like a place where I could start with Go. For inference, pushing the model through ONNX would give me framework independence and better performance.
But the internal ML operating environment was Python-centric. The reference examples, the shareable code, the deployment patterns — all in Python. When you need advice and reviews, the same language felt like the right call. I set aside the performance angle and chose continuity of operations.
The framework choice followed similar logic. I knew ONNX offers better inference performance than sklearn, but for a lightweight model like LR, that gain wouldn’t move the needle. sklearn felt sufficient for training and saving the model, and forcing a heavy pipeline onto a light model seemed like overengineering.
ML Lifecycle Architecture: Split Into Three Components
I divided the ML Lifecycle into three components.
- Training batch: Periodically trains the LR model and pushes the trained model to the model store.
- Model store: Built on MLflow. Keeps versioned copies of models written by the training batch.
- Inference server: Loads the latest model from the store and serves real-time predictions.
```mermaid
flowchart LR
    A["Training batch"] -->|"① push model<br/>② move champion alias"| B["Model store<br/>(MLflow)"]
    A -->|"③ call Argo Rollouts API"| C["Inference server"]
    B -.->|"④ load champion on pod startup"| C
```
The flow is simple: training batch → model store → inference server. The three components connect only through model files, and the training schedule runs independently from inference.
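The "connect only through model files" contract can be sketched in a few lines. This is a local-disk stand-in for illustration; in the real setup the file sits behind MLflow, and the path and file name here are made up.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.linear_model import LogisticRegression

# Stand-in for the model store; in our setup this role is played by MLflow.
store = Path(tempfile.mkdtemp())

# --- Training batch side: train, then push the model file to the store.
model = LogisticRegression().fit([[0.0], [1.0], [0.0], [1.0]], [0, 1, 0, 1])
joblib.dump(model, store / "ctr_model_v1.joblib")

# --- Inference server side: knows nothing about training, only the store.
served = joblib.load(store / "ctr_model_v1.joblib")
print(served.predict_proba([[1.0]])[0, 1])
```

Because the only shared surface is the serialized model, the training schedule and the inference deployment can evolve independently.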
Inside the Training Batch: The Promotion Gate
The training batch wasn’t just “train → save.” Once training finished, the model had to pass through a Promotion Gate — a quality check — before the champion alias would move.
```mermaid
flowchart LR
    A["Data loading"] --> B["Preprocessing"] --> C["Training"] --> D["Evaluation"] --> E{"Promotion Gate"}
    E -->|"PASS"| F["Update champion alias<br/>+ trigger rollout"]
    E -->|"FAIL"| G["Keep current champion"]
```
The criteria were simple. If the trained model’s evaluation metrics crossed the predefined thresholds, it passed; otherwise, it failed. On pass, the champion alias moved to the new version and a rollout was triggered. On failure, the new model was only logged to the registry while the current champion kept serving traffic.
This meant a degraded model couldn’t accidentally reach production — without any code changes.
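The gate itself can be as small as a threshold check. A minimal sketch, where the metric names and threshold values are illustrative rather than the real production numbers:

```python
# Thresholds a new model must clear before the champion alias moves.
# These particular metrics and numbers are made up for illustration.
THRESHOLDS = {"auc": 0.70, "log_loss": 0.50}

def promotion_gate(metrics: dict) -> bool:
    """Return True (PASS) only if every metric clears its threshold.

    Higher is better for auc; lower is better for log_loss.
    """
    return (metrics["auc"] >= THRESHOLDS["auc"]
            and metrics["log_loss"] <= THRESHOLDS["log_loss"])

if promotion_gate({"auc": 0.74, "log_loss": 0.43}):
    print("PASS: move champion alias, trigger rollout")
else:
    print("FAIL: keep current champion, log the model only")
```

The value of the gate isn’t the check itself but where it sits: it is the only path by which the champion alias can move.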
Deployment: Argo Rollouts
For getting new models into the inference server, I used Argo Rollouts — a natural choice since we were already running on k8s.
```mermaid
sequenceDiagram
    participant T as Training batch
    participant R as MLflow
    participant I as Inference server pod
    T->>R: ① Register new model
    T->>R: ② Move champion alias to new version
    T->>I: ③ Call Argo Rollouts API
    Note over I: ④ Rollout replaces pods one by one
    I->>R: ⑤ New pod loads the champion model
    R-->>I: Model + metadata
    Note over I: ⑥ Service resumes with the new model
```
MLflow’s alias feature lets you tag a model version with a name like “champion” to mark the current production model. When the training batch passes the Promotion Gate, it moves the champion alias to the new version and then calls the Argo Rollouts API to trigger a deployment. The rollout replaces inference server pods one at a time, and each new pod loads whichever model carries the champion alias at startup before entering service.
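Mechanically, the alias is just a pointer that gets repointed atomically. Here is a toy stand-in that mimics that behavior with a plain dict, so the semantics are visible without a tracking server; against real MLflow the corresponding calls are `MlflowClient.set_registered_model_alias` on the batch side and loading a `models:/<name>@champion` URI on the pod side.

```python
# Toy stand-in for a model registry: numbered versions plus named aliases.
registry = {
    "versions": {1: "model-v1-artifact", 2: "model-v2-artifact"},
    "aliases": {"champion": 1},
}

def set_alias(name: str, version: int) -> None:
    """Repoint an alias; mimics MlflowClient.set_registered_model_alias."""
    registry["aliases"][name] = version

def load_by_alias(name: str) -> str:
    """What a new pod does on startup: resolve the alias, then load."""
    return registry["versions"][registry["aliases"][name]]

set_alias("champion", 2)          # training batch, after the gate passes
print(load_by_alias("champion"))  # a newly started pod picks up v2
```

Because pods resolve the alias only at startup, moving the alias alone changes nothing for running pods; that is why the rollout trigger in step ③ is needed at all.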
Looking Back
The LR + sklearn + MLflow combination was simple, and it ran light and fast.
What I regret most was choosing Python + sklearn. The inference server is currently a FastAPI-based Python service running on k8s. Each pod runs on a single core and loads its own copy of the model. As features grew, inference cost climbed and the number of pods grew with it. If we had gone with ONNX + Go and used multiple cores inside a single process, the same load could probably have been handled with fewer pods. At the time, I judged that continuity of operations was the right call — but the cost of that decision showed up in the operational phase.
Starting out, my biggest worry was “can I do this without an AI background?” By the end, I found that what I needed was a bit different. It wasn’t ML algorithms or infrastructure expertise — what mattered more was how precisely I understood the domain, and being able to judge which features to combine and how. Reading data and spotting patterns — that analysis skill — turned out to be just as important.