Data Versioning for Machine Learning Practitioners

In this talk, we will discuss the importance of data versioning in machine learning and explore methods and tools for implementing it across different data sources. We will cover the following topics:

What is data versioning, and why is it important?
The difference between model versioning and data versioning
The challenges of keeping track of different versions of data
The benefits of data versioning for machine learning projects
Best practices for implementing data versioning in your workflow
An overview of the top data versioning tools on the market, with the strengths and weaknesses of each, so you can choose the one that best fits your needs

The talk will be technical but suitable for a broad range of software developers. We will provide code examples and advanced ideas, but we will not assume that every audience member is an expert in every aspect of AI.

Yonatan is a developer advocate at XetHub. A pioneer in serverless machine learning, he was a founding engineer at BuiltOn, Cybear, and Vaex. He has spent the last ten years helping startups implement machine learning and design smart solutions. At XetHub, he builds cool stuff and gives technical talks. In his free time, you can find him playing volleyball, basketball, and guitar, doing improv, and teaching salsa.