Collecting and Processing Genomic Data to Help Cure Rare Diseases

with Nick Janetakis and Dan Kolbman

September 21, 2020 (54m 59s)

Audio is streamed directly from the publisher (runninginproduction.com) as published in their RSS feed.

Show Notes

In this episode of Running in Production, Dan Kolbman explains how they used Django to build an internal tool that helps make sense of ~5 petabytes of genomic data, which is then made available to clinicians. It runs on many different AWS resources, with the app itself deployed on ECS Fargate.

Dan walks us through what their app does, dealing with loads of data, using GraphQL, moving away from Serverless and going mostly all-in on AWS. Their apps are open source too. The ones we’ll be talking about are on GitHub here and here.

Topics Include

3:55 – Motivation for using Django and Python
6:11 – Using GraphQL and having a few separate apps (microservice-ish)
11:16 – Querying ~5 petabytes of genomic data stored on S3
17:21 – Using both Graphene (GraphQL) and Django REST Framework
22:44 – Docker is being used in dev (Docker Compose) and in production (ECS Fargate)
25:20 – PostgreSQL and Redis are being used too with lots of background tasks
27:29 – Breaking down which AWS resources they use, along with using Terraform
37:02 – Netlify is being used for deploy previews and CloudFront for production
39:34 – Breaking down the workflow for deploying something from dev to prod
46:55 – Planning for disasters and backing up data
51:02 – Automated metrics around CPU and memory, along with alerting
52:29 – Best tips? Use tools that a lot of people have thought long and hard about
54:21 – You can find Dan on GitHub and his personal website