TradeRev Is a Machine Learning Vehicle Appraisal / Auctioning System

with Nick Janetakis and Amit Jain

May 11, 2020 (43m 5s)

Show Notes

In this episode of Running in Production, Amit Jain goes over building an auctioning system that uses machine / deep learning and is powered by Flask and Python. It’s all hosted on AWS and has been up and running since mid-2011.

Amit goes over a few machine learning libraries, refactoring a 100k+ line monolith into microservices without any automated tests, the importance of machine learning accuracy, using a bunch of AWS services to deploy a large site, treating your infrastructure as code and more.

Topics Include

3:58 – Amit led a team of ~10 R&D engineers responsible for Data Science / ML
4:33 – Roughly 1,000 cars a day are being traded with 8-10k auctions / bids per day
5:15 – Motivation for using Flask and Python
6:55 – Scikit-Learn and TensorFlow for machine / deep learning
7:39 – Did things start off with multiple microservices or was it a monolith early on?
9:41 – There are about 80,000 to 120,000 lines of code across 200-300+ Python files
10:14 – The huge refactor to microservices was done without automated tests initially
11:11 – After the refactor, there’s now 86% test coverage, which is enough to be confident
12:24 – Flask-Restplus is the main library used to build their RESTful APIs
12:43 – Other notable libraries were gunicorn and boto3 (AWS SDK for Python)
13:05 – Locust is an open source load / performance testing tool
13:40 – With machine learning, speed is important but accuracy is even more important
15:30 – gunicorn is very compact, performant and easy to configure
16:28 – Most caches were in memory and they used Amazon DynamoDB
17:09 – The primary database is MySQL running on Amazon RDS
18:04 – SQLAlchemy is used on the Python side as an ORM
19:29 – Docker is sort of being used in development
21:02 – The platform runs on AWS with Lambda, API Gateway and AWS Fargate with ECS
22:24 – What is AWS Fargate and what does it allow you to do?
23:48 – Scaling with Fargate while using auto scaling policies and configuration
26:28 – Taking advantage of the cloud and setting up load balancing with configuration
28:04 – How do you deal with secrets when using Fargate / ECS?
30:02 – What about logging and metrics? Are you exclusively using all of AWS’ services?
31:12 – What about error reporting, such as getting notified if an error happens?
31:34 – The deploy process from development to production (includes CI / CD with Jenkins)
33:26 – A walkthrough of how the different AWS services come together
36:54 – Terraform is being used to manage the infrastructure as code (valuable tool)
40:04 – Database backups were performed by the DevOps team
40:41 – Best tips? Start slow and expect failures, also don’t chase perfection
42:14 – You can find Amit on Twitter at @ml_amit and on LinkedIn
