
Collecting and Processing Genomic Data to Help Cure Rare Diseases
with Nick Janetakis and Dan Kolbman
Running in Production · Nick Janetakis
September 21, 2020 · 54m 59s
Show Notes
In this episode of Running in Production, Dan Kolbman goes over using Django to build an internal tool that helps make sense of ~5 petabytes of genomic data, which is then made available to clinicians. It runs across many different AWS resources using ECS Fargate.
Dan walks us through what their app does, dealing with loads of data, using GraphQL, getting away from using Serverless and going mostly all-in with AWS. Their apps are open source too. The ones we’ll be talking about are on GitHub here and here.
Topics Include
- 3:55 – Motivation for using Django and Python
- 6:11 – Using GraphQL and having a few separate apps (microservice-ish)
- 11:16 – Querying ~5 petabytes of genomic data stored on S3
- 17:21 – Using both Graphene (GraphQL) and Django REST Framework
- 22:44 – Docker is being used in dev (Docker Compose) and in production (ECS Fargate)
- 25:20 – PostgreSQL and Redis are being used too with lots of background tasks
- 27:29 – Breaking down which AWS resources they use, along with using Terraform
- 37:02 – Netlify is being used for deploy previews and CloudFront for production
- 39:34 – Breaking down the workflow for deploying something from dev to prod
- 46:55 – Planning for disasters and handling backing up data
- 51:02 – Automated metrics around CPU and memory, along with alerting
- 52:29 – Best tips? Use tools that a lot of people have thought long and hard about
- 54:21 – You can find Dan on GitHub and his personal website