
177: Vector Databases

November 4, 2024 (1h 28m)



Show Notes

Intro topic: Buying a Car

News/Links:

Cognitive Load is what Matters
- https://github.com/zakirullin/cognitive-load
Diffusion models are Real-Time Game Engines
- https://gamengen.github.io/
Your Company Needs Junior Devs
- https://softwaredoug.com/blog/2024/09/07/your-team-needs-juniors
Seamless Streaming / Fish Speech / LLaMA Omni
- Seamless: https://huggingface.co/facebook/seamless-streaming
- Fish: https://github.com/fishaudio/fish-speech
- LLaMA Omni: https://github.com/ictnlp/LLaMA-Omni

Book of the Show

Patrick:
- Thought Emporium YouTube
  - https://youtu.be/8X1_HEJk2Hw?si=T8EaHul-QMahyUvQ
Jason:
- Novel Minds
  - https://www.novelminds.ai/

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Patrick:
- Escape Simulator
  - https://pinestudio.com/games/escape-simulator/
Jason:
- Cursor IDE
  - https://www.cursor.com/

Topic: Vector Databases (~54 min)

How computers represent data traditionally
- ASCII values
- RGB values
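
A minimal sketch of the two traditional representations named above. The sample string and color are illustrative choices, not taken from the episode.

```python
# Text as ASCII: each character maps to a small integer code.
text = "Hi!"
ascii_codes = [ord(ch) for ch in text]  # [72, 105, 33]

# A pixel as RGB: three channel intensities, each in the range 0-255.
orange = (255, 165, 0)
hex_color = "#{:02x}{:02x}{:02x}".format(*orange)  # "#ffa500"

print(ascii_codes, hex_color)
```

These numbers identify a symbol or a color exactly, but nearby values do not imply similar meaning, which is the gap embeddings aim to fill.
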
How traditional compression works
- Huffman encoding (tree structure)
- Lossy example: Fourier Transform & store coefficients
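
As a rough illustration of the Huffman bullet above, here is a small sketch that builds a code table from symbol frequencies using a heap. The function names (`huffman_codes`, `encode`, `decode`) are hypothetical helpers for this example, and the tree is kept implicit as per-symbol bit strings rather than explicit nodes.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {symbol: bitstring} table built from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Only one distinct symbol: give it a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    # Codes are prefix-free, so the first match is always the right one.
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


codes = huffman_codes("abracadabra")
bits = encode("abracadabra", codes)
print(len(bits), "bits instead of", 8 * len("abracadabra"))
```

Frequent symbols get shorter codes, and decoding recovers the input exactly, which is what makes this lossless in contrast to the Fourier example.
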
How embeddings are computed
- Pairwise (contrastive) methods
- Forward models (self-supervised)
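
The notes do not say which pairwise objective was discussed. As one possibility, the sketch below uses a classic margin-based contrastive loss: similar pairs are penalized for being far apart, dissimilar pairs for being closer than a margin. The function names and the `margin` default are assumptions for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, is_similar, margin=1.0):
    """Margin-based pairwise loss on two embedding vectors."""
    d = euclidean(a, b)
    if is_similar:
        # Similar pairs: any distance is penalized.
        return d ** 2
    # Dissimilar pairs: penalized only when closer than the margin.
    return max(0.0, margin - d) ** 2


# A similar pair that is far apart gets a large loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=True))
# A dissimilar pair that is already far apart gets zero loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=False))
```

In training, a loss like this would be minimized over many labeled pairs so that the model producing the vectors learns to place related items near each other.
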
Similarity metrics
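
The notes do not list specific metrics. Three that are commonly offered by vector search systems are dot product, cosine similarity, and Euclidean distance, sketched here in plain Python with made-up vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Dot product of the two vectors after scaling each to unit length.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = [1.0, 2.0], [2.0, 4.0]
print(dot(a, b))                 # grows with vector length
print(cosine_similarity(a, b))   # direction only; close to 1.0 here
print(euclidean_distance(a, b))  # straight-line distance
```

Cosine similarity ignores magnitude, so `a` and `b` above count as nearly identical even though they are far apart by Euclidean distance.
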
Approximate Nearest Neighbors (ANN)
Sub-Linear ANN
- Clustering
- Space Partitioning (e.g. K-D Trees)
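
A rough sketch of the clustering approach (K-D trees are not shown): group vectors around k-means centroids, then search only the buckets whose centroids are closest to the query. `ClusterIndex`, `nprobe`, and the other names are hypothetical, and the k-means here is deliberately simplistic.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=10, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


class ClusterIndex:
    def __init__(self, points, k):
        self.centroids = kmeans(points, k)
        self.buckets = [[] for _ in range(k)]
        for p in points:
            self.buckets[self._nearest_centroids(p, 1)[0]].append(p)

    def _nearest_centroids(self, q, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: dist(q, self.centroids[c]))
        return order[:n]

    def search(self, q, top_k=1, nprobe=1):
        # Only scan the nprobe buckets closest to the query.
        candidates = [p for c in self._nearest_centroids(q, nprobe)
                      for p in self.buckets[c]]
        return sorted(candidates, key=lambda p: dist(q, p))[:top_k]


def brute_force(points, q, top_k=1):
    # Exact linear scan, for comparison.
    return sorted(points, key=lambda p: dist(q, p))[:top_k]


rng = random.Random(1)
points = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(200)]
index = ClusterIndex(points, k=8)
print(index.search([0.5, 0.5], top_k=3, nprobe=2))
```

A small `nprobe` scans fewer vectors but can miss a true neighbor that sits just across a cluster boundary. Probing every bucket gives the exact answer at linear cost.
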
What a vector database does
- Perform nearest-neighbors with many different similarity metrics
- Store the vectors and the data structures to support sub-linear ANN
- Handle updates, deletes, rebalancing/reclustering, backups/restores
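
To make those responsibilities concrete, here is a toy in-memory store with a selectable metric, upserts, deletes, and queries. `TinyVectorStore` and its methods are invented for this sketch and do not mirror any real product's API. It uses a linear scan, so it omits the sub-linear index, rebalancing, and backup/restore that a real vector database would provide.

```python
import math


def cosine_distance(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


METRICS = {"cosine": cosine_distance, "l2": l2_distance}


class TinyVectorStore:
    def __init__(self, metric="cosine"):
        self.distance = METRICS[metric]
        self.rows = {}  # key -> (vector, payload)

    def upsert(self, key, vector, payload=None):
        # Insert a new row or overwrite an existing one.
        self.rows[key] = (list(vector), payload)

    def delete(self, key):
        self.rows.pop(key, None)

    def query(self, vector, top_k=3):
        scored = [(self.distance(vector, vec), key, payload)
                  for key, (vec, payload) in self.rows.items()]
        scored.sort(key=lambda item: item[0])
        return [(key, payload) for _, key, payload in scored[:top_k]]


store = TinyVectorStore(metric="cosine")
store.upsert("cat", [1.0, 0.1], {"kind": "animal"})  # made-up embeddings
store.upsert("dog", [0.9, 0.2], {"kind": "animal"})
store.upsert("car", [0.0, 1.0], {"kind": "vehicle"})
print(store.query([1.0, 0.0], top_k=2))
```
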
Examples
- pgvector: a vector similarity search extension for PostgreSQL
- Weaviate, Pinecone
- Milvus


