Data Science at Home

310 episodes — Page 4 of 7

Ep 157: A simple trick for very unbalanced data (Ep. 157)

Data from the real world are never perfectly balanced. In this episode I explain a simple yet effective trick to train models with very unbalanced data. Enjoy the show! Sponsors: Get one of the best VPNs at a massive discount with coupon code DATASCIENCE. It gives you an 83% discount, which unlocks the best price in the market, plus 3 extra months for free. Here is the link: https://surfshark.deals/DATASCIENCE References: Leo Breiman, Random Forests (2001); C. Chen, A. Liaw, L. Breiman, Using Random Forest to Learn Imbalanced Data (2004)
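The show notes do not spell out the trick itself. As a hedged illustration only, the second reference describes a balanced random forest, where each tree is grown on a bootstrap sample that draws the same number of cases from every class. A minimal sketch of that sampling step, using made-up toy labels and a hypothetical helper name `balanced_bootstrap`:

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return indices for one bootstrap sample with equal class counts.

    Every class is sampled with replacement, and each class contributes as
    many draws as the smallest (minority) class has examples.
    """
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=n_min))
    return sample


# Hypothetical toy data: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in sample)
print(counts)
```

Each tree in the ensemble would get its own call to `balanced_bootstrap`, so the majority class is still covered across the forest even though every single tree sees a balanced sample.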

Jun 22, 2021 · 22 min

Ep 156: Time to take your data back with Tapmydata (Ep. 156)

In this episode I am with Gilbert Hill, head of strategy at Tapmydata (https://tapmydata.com/). We speak about personal data, blockchain, and the ability to control and monetize your data with another simple yet effective app in the ecosystem. References: https://tapmydata.com/ https://medium.com/@tholder/we-dont-want-your-data-pushing-boundaries-in-data-collection-and-end-to-end-encryption-for-apps-ebd1d5f79df5

Jun 15, 2021 · 41 min

Ep 155: True Machine Intelligence just like the human brain (Ep. 155)

In this episode I have a really interesting conversation with Karan Grewal, member of the research staff at Numenta where he investigates how biological principles of intelligence can be translated into silicon. We speak about the thousand brains theory and why neural networks forget. References Main paper on the Thousand Brains Theory: https://www.frontiersin.org/articles/10.3389/fncir.2018.00121/full Blog post on Thousand Brains Theory: https://numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/ GLOM paper by Geoff Hinton: https://arxiv.org/pdf/2102.12627.pdf Why neural networks forget? https://numenta.com/blog/2021/02/04/why-neural-networks-forget-and-lessons-from-the-brain

Jun 4, 2021 · 33 min

Ep 154: Delivering unstoppable data with Streamr (Ep. 154)

Delivering unstoppable data to unstoppable apps is now possible with the Streamr Network. Streamr is a layer zero protocol for real-time data which powers the decentralized Streamr pub/sub network. The technology works in tandem with companion blockchains - currently Ethereum and xDai chain - which are used for identity, security and payments. On top is the application layer, including the Data Union framework, Marketplace and Core, and all third party applications. In this episode I have a very interesting conversation with Streamr founder and CEO Henri Pihkala. References: Streamr project website: https://streamr.network/ More about the Streamr Network: https://streamr.network/discover/network More about Data Unions: https://streamr.network/discover/data-unions More about the Data Marketplace: https://streamr.network/discover/marketplace Developer docs: https://streamr.network/docs Streamr Github: https://github.com/streamr-dev Streamr Discord: https://discord.gg/gZAm8P7hK8 Streamr Twitter: https://twitter.com/streamr Streamr YouTube: https://www.youtube.com/channel/UCGWEA61RueG-9DV53s-ZyJQ Streamr Reddit: https://reddit.com/r/streamr Scalability & latency research blog: https://blog.streamr.network/streamr-network-performance-and-scalability-whitepaper/ Swash, a Data Union built on Streamr: https://swashapp.io/

May 26, 2021 · 43 min

Ep 153: MLOps: the good, the bad and the ugly (Ep. 153)

Our Sponsor: Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.

May 24, 2021 · 24 min

Ep 152: MLOps: what is and why it is important Part 2 (Ep. 152)

Our Sponsor: Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.

May 19, 2021 · 30 min

Ep 151: MLOps: what is and why it is important (Ep. 151)

If you think that knowing Tensorflow and Scikit-learn is enough, think again. MLOps is one of those trendy terms today. What is MLOps and why is it important? In this episode I speak about the undeniable evolution of the data scientist in the last 5-10 years. Sponsors If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery. Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

May 11, 2021 · 33 min

Ep 150: Can I get paid for my data? With Mike Audi from Mytiki (Ep. 150)

Your data is worth thousands a year. Why aren’t you getting your fair share? There is a company that has a mission: they want you to take back control and get paid for your data. In this episode I speak about knowledge graphs, data confidentiality and privacy with Mike Audi, CEO of MyTiki. You can reach them on their website https://mytiki.com/ Discord official channel https://discord.com/invite/evjYQq48Be Telegram https://t.me/mytikiapp Signal https://signal.group/#CjQKIA66Eq2VHecpcCd-cu-dziozMRSH3EuQdcZJNyMOYNi5EhC0coWtjWzKQ1dDKEjMqhkP

Apr 28, 2021 · 39 min

Ep 149: Building high-growth data businesses with Lillian Pierson (Ep. 149)

In this episode I have an amazing conversation with Lillian Pierson from data-mania.com This is an action-packed episode on how data professionals can quickly convert their data expertise into high-growth data businesses, all by selecting optimal business models, revenue models, and pricing structures. If you want to know more or get in touch with Lillian, follow the links below: Weekly Free Trainings: We currently publish 1 free training per week on YouTube! https://www.youtube.com/channel/UCK4MGP0A6lBjnQWAmcWBcKQ Becoming World-Class Data Leaders and Data Entrepreneurs Facebook Group: https://www.facebook.com/groups/data.leaders.and.entrepreneurs LinkedIn: https://www.linkedin.com/in/lillianpierson/ The Data Entrepreneur’s Toolkit: A recommendation set for 32 free (or low-cost) tools & processes that'll actually grow your data business (even if you still haven’t put up that website yet!). https://www.data-mania.com/data-entrepreneur-toolkit/

Apr 19, 2021 · 25 min

Ep 148: Learning and training in AI times (Ep. 148)

Is there a gap between life sciences and data science? What's the situation when it comes to interdisciplinary research? In this episode I am with Laura Harris, Director of Training for the Institute of Cyber-Enabled Research (ICER) at Michigan State University (MSU), and we try to answer some of those questions. You can contact Laura at [email protected] or on LinkedIn

Apr 13, 2021 · 31 min

Ep 147: You are the product [RB] (Ep. 147)

In this episode I am with George Hosu from Cerebralab and we speak about how dangerous it is not to pay for the services you use, and as a consequence how dangerous it is letting an algorithm decide what you like or not. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery. Links https://cerebralab.com https://www.eugenewei.com/blog/2019/2/19/status-as-a-service

Apr 11, 2021 · 45 min

Ep 146: Polars: the fastest dataframe crate in Rust - with Ritchie Vink (Ep. 146)

In this episode I speak with Ritchie Vink, the author of Polars, a crate that is the fastest dataframe library at the time of recording :) If you want to participate in an amazing Rust open source project, this is your chance to contribute to the official repository in the references. References: https://github.com/ritchie46/polars

Apr 8, 2021 · 32 min

Ep 145: Apache Arrow, Ballista and Big Data in Rust with Andy Grove (Ep. 145)

Do you want to know the latest in big data analytics frameworks? Have you ever heard of Apache Arrow? Rust? Ballista? In this episode I speak with Andy Grove, one of the main authors of Apache Arrow and the Ballista compute engine. Andy explains some of the challenges he faced while designing the Arrow and Ballista memory models, and he describes some amazing solutions. Our Sponsors: This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest: it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery. References: https://arrow.apache.org/ https://ballistacompute.org/ https://github.com/ballista-compute/ballista

Mar 26, 2021 · 30 min

Ep 144: Pandas vs Rust (Ep. 144)

Pandas is the de-facto standard for data loading and manipulation. Python is the de-facto programming language for such operations. Rust is the underdog. Or is it? In this episode I am showing you why that is no longer the case. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. Useful Links https://github.com/haixuanTao/Data-Manipulation-Rust-Pandas https://github.com/ritchie46/polars https://github.com/rust-ndarray/ndarray

Mar 19, 2021 · 31 min

Ep 143: Concurrent is not parallel - Part 2 (Ep. 143)

In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems. Rock-star data scientists must know how concurrency works and when to use it IMHO. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. Useful Links http://web.mit.edu/6.005/www/fa14/classes/17-concurrency/ https://doc.rust-lang.org/book/ch16-00-concurrency.html https://urban-institute.medium.com/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba
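To make the distinction concrete, here is a minimal sketch (not taken from the episode) of concurrency without parallelism. Two hypothetical workers share a single thread under Python's `asyncio` event loop: their steps interleave, but at no point do two steps run at the same instant. The worker names and step counts are made up for illustration.

```python
import asyncio


async def worker(name, steps, log):
    for step in range(steps):
        log.append(f"{name}{step}")
        # Yield control so the event loop can run the other task.
        await asyncio.sleep(0)


async def main():
    log = []
    # Both workers make progress concurrently on one thread.
    await asyncio.gather(worker("a", 3, log), worker("b", 3, log))
    return log


log = asyncio.run(main())
print(log)
```

Running the same two workers truly in parallel would need more than one core doing work at once, for example separate processes, which is what the multiprocessing article in the links covers.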

Mar 13, 2021 · 15 min

Ep 142: Concurrent is not parallel - Part 1 (Ep. 142)

In plain English, concurrent and parallel are synonyms. Not for a CPU. And definitely not for programmers. In this episode I summarize the ways to parallelize on different architectures and operating systems. Rock-star data scientists must know how concurrency works and when to use it IMHO. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

Mar 10, 2021 · 32 min

Ep 141: Backend technologies for machine learning in production (Ep. 141)

This is one of the most dynamic and fascinating topics: API technologies for machine learning. It's always fun to build ML models. But how about serving them in the real world? In this episode I speak about three must-know technologies to place your model behind an API. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery.

Mar 2, 2021 · 25 min

Ep 140: You are the product (Ep. 140)

In this episode I am with George Hosu from Cerebralab and we speak about how dangerous it is not to pay for the services you use, and as a consequence how dangerous it is letting an algorithm decide what you like or not. Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience If building software is your passion, you’ll love ThoughtWorks Technology Podcast. It’s a podcast for techies by techies. Their team of experienced technologists take a deep dive into a tech topic that’s piqued their interest — it could be how machine learning is being used in astrophysics or maybe how to succeed at continuous delivery. Links https://cerebralab.com https://www.eugenewei.com/blog/2019/2/19/status-as-a-service

Feb 22, 2021 · 45 min

Ep 139: How to reinvent banking and finance with data and technology (Ep. 139)

The financial system is changing. It is becoming more efficient and integrated with many more services making our life more... digital. Is the old banking system doomed to fail? Or will it just be disrupted by the smaller players of the fintech industry? In this episode we answer some of these fundamental questions with Alessandro E. Hatami from Pacemakers Subscribe to the Newsletter and come chat with us on the official Discord channel Our Sponsors This episode is supported by Chapman’s Schmid College of Science and Technology, where master’s and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

Feb 15, 2021 · 36 min

Ep 138: What's up with WhatsApp? (Ep. 138)

Have you clicked the button? Accepted the new terms? It's time we have a talk.

Feb 7, 2021 · 30 min

Ep 137: Is Rust flexible enough for a flexible data model? (Ep. 137)

In this podcast I get inspired by Paul Done's presentation about The Six Principles for Building Robust Yet Flexible Shared Data Applications, and show how powerful of a language Rust is while still maintaining the flexibility of less strict languages. Our Sponsor This episode is supported by Chapman’s Schmid College of Science and Technology, where master's and PhD students join in cutting-edge research as they prepare to take the next big leap in their professional journey. To learn more about the innovative tools and collaborative approach that distinguish the Chapman program in Computational and Data Sciences, visit chapman.edu/datascience

Feb 1, 2021 · 28 min

Ep 136: Is Apple M1 good for machine learning? (Ep. 136)

In this episode I explain the basics of computer architecture and introduce some features of the Apple M1. Is it good for Machine Learning tasks? References: Computer architectures book https://www.amazon.com/Computer-Architecture-Quantitative-John-Hennessy/dp/012383872X Performance https://nod.ai/comparing-apple-m1-with-amx2-m1-with-neon/

Jan 25, 2021 · 28 min

Ep 135: Rust and deep learning with Daniel McKenna (Ep. 135)

In this episode I speak with Daniel McKenna about Rust, machine learning and artificial intelligence. You can find Daniel at http://github.com/xd009642 and https://twitter.com/xd009642 Don't forget to come join me in our Discord channel speaking about all things data science. Subscribe to the official Newsletter and never miss an episode.

Jan 18, 2021 · 22 min

Ep 134: Scaling machine learning with clusters and GPUs (Ep. 134)

Let's finish this year with an amazing episode about scaling ML with clusters and GPUs. Kind of as a continuation of Episode 112 I have a terrific conversation with Aaron Richter from Saturn Cloud about, well, making ML faster and scaling it to massive infrastructure. Aaron can be reached on his website https://rikturr.com and Twitter @rikturr Our Sponsor Saturn Cloud is a data science and machine learning platform for scalable Python analytics. Users can jump into cloud-based Jupyter and Dask to scale Python for big data using the libraries they know and love, while leveraging Docker and Kubernetes so that work is reproducible, shareable, and ready for production. Try Saturn Cloud for free at https://saturncloud.io Twitter: @saturn_cloud

Dec 31, 2020 · 30 min

Ep 133: What is data ethics? (Ep. 133)

What is data ethics? In this episode I have an interesting chat with Denny Wong from FaqBot and Muna. Our Sponsor Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. References Denny's Twitter profile The data ethics awareness workshop for AI practitioners

Dec 19, 2020 · 25 min

Ep 132: A Standard for the Python Array API (Ep. 132)

Our Links Come join me in our Discord channel speaking about all things data science. Subscribe to the official Newsletter and never miss an episode Follow me on Twitch during my live coding sessions usually in Rust and Python Our Sponsors ProtonMail offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonMail and ProtonVPN’s apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world’s strongest privacy laws Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. References https://data-apis.org/blog/announcing_the_consortium https://data-apis.github.io/array-api/latest/ https://github.com/data-apis/python-record-api

Dec 8, 2020 · 33 min

Ep 131: What happens to data transfer after Schrems II? (Ep. 131)

In this episode Adam Leon Smith, CTO of DragonFly and expert in data regulations, explains some of the consequences of Schrems II for data transfers from the EU to the US. For very interesting references and a practical example, subscribe to our Newsletter.

Dec 4, 2020 · 31 min

Ep 130: Test-First Machine Learning [RB] (Ep. 130)

Our Links Come join me in our Discord channel speaking about all things data science. Subscribe to the official Newsletter and never miss an episode Follow me on Twitch during my live coding sessions usually in Rust and Python Our Sponsors ProtonMail offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonMail and ProtonVPN’s apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world’s strongest privacy laws Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

Dec 1, 2020 · 20 min

Ep 129: Similarity in Machine Learning (Ep. 129)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python Subscribe to the official Newsletter and never miss an episode Our Sponsors ProtonMail offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonMail and ProtonVPN's apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world’s strongest privacy laws Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

Nov 24, 2020 · 30 min

Ep 128: Distill data and train faster, better, cheaper (Ep. 128)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python Our Sponsors Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. References Dataset distillation (official paper) GitHub repo

Nov 17, 2020 · 23 min

Ep 127: Machine Learning in Rust: Amadeus with Alec Mocatta [RB] (Ep. 127)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python Our Sponsors ProtonVPN offers a simple and trusted solution to protect your internet connection and access blocked or restricted websites. All of ProtonVPN’s apps are open source and have been inspected by cybersecurity experts, and Proton is based in Switzerland, home to some of the world's strongest privacy laws Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business.

Nov 11, 2020 · 24 min

Ep 126: Top-3 ways to put machine learning models into production (Ep. 126)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions, usually in Rust and Python. Our Sponsors: physicspodcast.com is not just a physics podcast: interviews with scientists, scholars and authors, and reflections on the history and future of science and technology, are all in the wheelhouse. Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business.

Nov 7, 2020 · 20 min

Ep 125: Remove noise from data with deep learning (Ep. 125)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions, usually in Rust and Python. Our Sponsors: ProtonMail is a secure and private email provider that protects your messages with end-to-end encryption and zero-access encryption so that, besides you, no one can access them. Amethix uses advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domains like finance, healthcare, pharmaceuticals, logistics, and energy. Amethix provides solutions to collect and secure data with higher transparency and disintermediation, and builds the statistical models that will support your business. References: DeepInterpolation

Nov 3, 2020 · 23 min

Ep 124: What is contrastive learning and why it is so powerful? (Ep. 124)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python Our Sponsors The Monday Apps Challenge is bringing developers around the world together to compete in order to build apps that can improve the way teams work together on monday.com Amethix use advanced Artificial Intelligence and Machine Learning to build data platforms and predictive engines in domain like finance, healthcare, pharmaceuticals, logistics, energy. Amethix provide solutions to collect and secure data with higher transparency and disintermediation, and build the statistical models that will support your business. References A Simple Framework for Contrastive Learning of Visual Representations

Oct 30, 2020 · 26 min

Ep 123: Neural search (Ep. 123)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python This episode is supported by Monday.com The Monday Apps Challenge is bringing developers around the world together to compete in order to build apps that can improve the way teams work together on monday.com.

Oct 23, 2020 · 19 min

Ep 122: Let's talk about federated learning (Ep. 122)

Let's talk about federated learning. Why is it important? Why are large organizations not ready yet? Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions, usually in Rust and Python. This episode is supported by Monday.com. The Monday Apps Challenge is bringing developers around the world together to compete in order to build apps that can improve the way teams work together on monday.com.

Oct 18, 2020 · 30 min

Ep 121: How to test machine learning in production (Ep. 121)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions, usually in Rust and Python. This episode is supported by Monday.com. Monday.com brings teams together so you can plan, manage and track everything your team is working on in one centralized place. The Monday Apps Challenge is bringing developers around the world together to compete in order to build apps that can improve the way teams work together on monday.com.

Oct 11, 2020 · 28 min

Ep 120: Why synthetic data cannot boost machine learning (Ep. 120)

Come join me in our Discord channel speaking about all things data science. Follow me on Twitch during my live coding sessions usually in Rust and Python This episode is supported by Women in Tech by Manning Conferences

Sep 26, 2020 · 23 min

Ep 119: Machine learning in production: best practices [LIVE from twitch.tv] (Ep. 119)

Hey there! Having the best time of my life ;) This is the first episode I record while I am live on my new Twitch channel :) So much fun! Feel free to follow me for the next live streaming. You can also see me coding machine learning stuff in Rust :)) Don't forget to jump on the usual Discord and have a chat I'll see you there!

Sep 16, 2020 · 37 min

Ep 118: Testing in machine learning: checking deep learning models (Ep. 118)

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning. We cover testing with deep learning (neuron coverage, threshold coverage, sign change coverage, layer coverage, etc.), combinatorial testing and their practical aspects. On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high performance web services or video games, WebAssembly and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.
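As a rough illustration of the first metric mentioned above, neuron coverage is commonly defined as the fraction of neurons that fire above a threshold for at least one input in the test set. The sketch below is an assumption-laden toy, not the method discussed in the episode: it uses a single hypothetical dense ReLU layer with made-up weights, and the helper names are invented for this example.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def layer_activations(weights, biases, inputs):
    """Activations of one dense ReLU layer for a single input vector."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def neuron_coverage(weights, biases, test_inputs, threshold=0.0):
    """Fraction of neurons exceeding `threshold` on at least one test input."""
    covered = [False] * len(weights)
    for inputs in test_inputs:
        for j, a in enumerate(layer_activations(weights, biases, inputs)):
            if a > threshold:
                covered[j] = True
    return sum(covered) / len(covered)


# Hypothetical layer with 2 inputs and 4 neurons (weights are made up).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
biases = [0.0, 0.0, 0.0, -10.0]
tests = [[1.0, 0.0], [0.0, 2.0]]

coverage = neuron_coverage(weights, biases, tests)
print(coverage)
```

A low score suggests the test set never exercises part of the network; adding an input such as `[-1.0, -1.0]` activates the third neuron and raises the coverage of this toy layer.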

Sep 4, 2020 · 18 min

Ep 117: Testing in machine learning: generating tests and data (Ep. 117)

In this episode I speak with Adam Leon Smith, CTO at DragonFly and expert in testing strategies for software and machine learning. On September 15th there will be a live@Manning Rust conference. In one Rust-full day you will attend many talks about what's special about Rust, building high performance web services or video games, WebAssembly and much more. If you want to meet the tribe, tune in on September 15th to the live@Manning Rust conference.

Aug 29, 2020 · 20 min

Ep 116: Why you care about homomorphic encryption (Ep. 116)

After deep learning, a new entry is about ready to go on stage. The usual journalists are warming up their keyboards for blogs, news feeds, tweets: in one word, hype. This time it's all about privacy and data confidentiality. The new words: homomorphic encryption. Join and chat with us on the official Discord channel. Sponsors: This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence. References: Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector; IBM Fully Homomorphic Encryption Toolkit for Linux
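To give a feel for what "computing on encrypted data" means, here is a toy sketch of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is an illustration chosen for brevity, not the fully homomorphic scheme of the IBM toolkit in the references, and the tiny primes (17 and 19) are an assumption that makes it completely insecure. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt m (0 <= m < n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


n, priv = keygen(17, 19)
rng = random.Random(42)
c1 = encrypt(n, 20, rng)
c2 = encrypt(n, 22, rng)

# Multiply the ciphertexts: the result decrypts to the sum of the plaintexts.
total = decrypt(n, priv, (c1 * c2) % (n * n))
print(total)  # 42
```

The party doing the multiplication never sees 20 or 22, only the ciphertexts, which is the property that makes such schemes interesting for privacy-preserving pipelines.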

Aug 12, 2020 18 min

Ep 112 Test-First machine learning (Ep. 115)

In this episode I speak about a testing methodology for machine learning models that are meant to be integrated into production environments. Don't forget to come chat with us in our Discord channel. Enjoy the show! This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Aug 3, 2020 19 min

Ep 111 GPT-3 cannot code (and never will) (Ep. 114)

The hype around GPT-3 is alarming and gives us an awful picture of people misunderstanding artificial intelligence. In response to some comments that claim GPT-3 will take developers' jobs, in this episode I express some personal opinions about the state of AI in generating source code (and GPT-3 in particular). If you have comments about this episode or just want to chat, come join us on the official Discord channel. This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. They are a consulting firm focused on data science, machine learning, and artificial intelligence.

Jul 26, 2020 19 min

Ep 110 Make Stochastic Gradient Descent Fast Again (Ep. 113)

There is definitely room for improvement in the stochastic gradient descent family of algorithms. In this episode I explain a relatively simple method that has been shown to improve on the Adam optimizer. But watch out! This approach does not generalize well. Join our Discord channel and chat with us. References: More descent, less gradient; Taylor Series

Jul 22, 2020 20 min

Ep 109 What data transformation library should I use? Pandas vs Dask vs Ray vs Modin vs Rapids (Ep. 112)

In this episode I speak about data transformation frameworks available for the data scientist who writes Python code. The usual suspect is clearly Pandas, as the most widely used library and de facto standard. However, when data volumes increase and distributed algorithms are in place (according to a map-reduce paradigm of computation), Pandas no longer performs as expected. Other frameworks play a role in such contexts. In this episode I explain the frameworks that are the best equivalents to Pandas in big data contexts. Don't forget to join our Discord channel and comment on previous episodes or propose new ones. This episode is supported by Amethix Technologies. Amethix works to create and maximize the impact of the world’s leading corporations, startups, and nonprofits, so they can create a better future for everyone they serve. Amethix is a consulting firm focused on data science, machine learning, and artificial intelligence. References: Pandas, a fast, powerful, flexible and easy to use open source data analysis and manipulation tool - https://pandas.pydata.org/ ; Modin, scale your pandas workflows by changing one line of code - https://github.com/modin-project/modin ; Dask, advanced parallelism for analytics - https://dask.org/ ; Ray, a fast and simple framework for building and running distributed applications - https://github.com/ray-project/ray ; RAPIDS, GPU data science - https://rapids.ai/

Jul 19, 2020 21 min

Ep 108 [RB] It’s cold outside. Let’s speak about AI winter (Ep. 111)

In this episode I speak with Filip Piekniewski about some of the most noteworthy findings in AI and machine learning in 2019. As a matter of fact, the entire field of AI has been inflated by hype and claims that are hard to believe. A lot of the promises made a few years ago have proven quite hard to achieve, if not impossible. Let's stay grounded and realistic about the potential of this amazing field of research, so as not to bring disillusionment in the near future. Join us on our Discord channel to discuss your favorite episode and propose new ones. This episode is brought to you by ProtonMail. Click on the link in the description or go to protonmail.com/datascience and get 20% off their annual subscription.

Jul 3, 2020 36 min

Ep 107 Rust and machine learning #4: practical tools (Ep. 110)

In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly. To make a comparison with the Python ecosystem I will cover frameworks for linear algebra (NumPy), dataframes (pandas), off-the-shelf machine learning (scikit-learn), deep learning (TensorFlow) and reinforcement learning (OpenAI). Rust is the language of the future. Happy coding! References: BLAS linear algebra - https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms ; Rust dataframe - https://github.com/nevi-me/rust-dataframe ; Rustlearn - https://github.com/maciejkula/rustlearn ; Rusty machine - https://github.com/AtheMathmo/rusty-machine ; Tensorflow bindings - https://lib.rs/crates/tensorflow ; Juice (machine learning for hackers) - https://lib.rs/crates/juice ; Rust reinforcement learning - https://lib.rs/crates/rsrl

Jun 29, 2020 24 min

Ep 106 Rust and machine learning #3 with Alec Mocatta (Ep. 109)

In the 3rd episode of Rust and machine learning I speak with Alec Mocatta. Alec is a professional programmer with 20+ years of experience who has been spending time at the intersection of distributed systems and data analytics. He's the founder of two startups in the distributed systems space and the author of Amadeus, an open-source framework that encourages you to write clean and reusable code that works regardless of data scale, locally or distributed across a cluster. Only for June 24th: LDN *Virtual* Talks June 2020 with Bippit (Alec speaking about Amadeus)

Jun 22, 2020 23 min

Ep 105 Rust and machine learning #2 with Luca Palmieri (Ep. 108)

In the second episode of Rust and machine learning I am speaking with Luca Palmieri, who has spent a large part of his career at the intersection of machine learning and data engineering. In addition, Luca has contributed to several projects closer to the machine learning community using the Rust programming language. Linfa is an ambitious project that definitely deserves the attention of the data science community (and it's written in Rust, with Python bindings! How cool is that?!). References: Series Announcement - Zero to Production in Rust - https://www.lpalmieri.com/posts/2020-05-10-announcement-zero-to-production-in-rust/ ; Zero To Production #0: Foreword - https://www.lpalmieri.com/posts/2020-05-24-zero-to-production-0-foreword/ ; Taking ML to production with Rust: a 25x speedup - https://www.lpalmieri.com/posts/2019-12-01-taking-ml-to-production-with-rust-a-25x-speedup/

Jun 19, 2020 27 min