Towards high-quality (maybe synthetic) datasets

Author: Practical AI LLC October 9, 2024 Duration: 57:02

As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.


Sponsors:

  • Fly.io: The home of Changelog.com. Deploy your apps close to your users: global Anycast load-balancing, zero-configuration private networking, hardware isolation, and instant WireGuard VPN connections. Push-button deployments that scale to thousands of instances. Check out the speedrun to get started in minutes.
  • WorkOS: A platform that gives developers a set of building blocks for quickly adding enterprise-ready features to their application. Add Single Sign-On (Okta, Azure, Google, Microsoft OAuth), sync users from any SCIM directory, HRIS integration, audit trails (SIEM), free magic link sign-in. WorkOS is designed for developers and offers a single, elegant interface that abstracts dozens of enterprise integrations. Learn more and get started at WorkOS.com
  • Eight Sleep: Take your sleep and recovery to the next level. Go to eightsleep.com/PRACTICALAI and use the code PRACTICALAI to get $350 off your very own Pod 4 Ultra. You can try it for free for 30 days - but we’re confident you will not want to return it. Once you experience AI-optimized sleep, you’ll wonder how you ever slept without it. Currently shipping to: United States, Canada, United Kingdom, Europe, and Australia.

There's a lot of noise out there about artificial intelligence, but cutting through the hype to find what's genuinely useful can be a challenge. That's the space where Practical AI operates. Hosted by the team at Practical AI LLC, this technology podcast moves beyond abstract theory to explore how AI, machine learning, and large language models are actually being applied right now. Each episode features unscripted conversations with a diverse mix of experts, developers, business leaders, and curious minds. You'll hear tangible discussions about implementing machine learning systems, the realities of MLOps, the evolution of neural networks, and the practical implications of breakthroughs in deep learning and GANs. The dialogue is grounded in real-world scenarios, focusing on how these technologies solve problems, drive productivity, and create value in accessible ways. Whether you're a professional building models, a business person integrating AI tools, or an enthusiast eager to understand the landscape, this podcast offers a clear, conversational entry point. It’s about making sense of a complex field through the lens of practical application, demystifying the concepts that are shaping our world without losing sight of how they work on the ground.
Language: en-us Episodes: 100

Practical AI
Podcast Episodes
How is AI shaping democracy?

Duration: 48:23
As AI increasingly shapes geopolitics, elections, and civic life, its impact on democracy is becoming impossible to ignore. In this episode, Daniel and Chris are joined by security expert Bruce Schneier to explore how AI…
Controlling AI Models from the Inside

Duration: 43:55
As generative AI moves into production, traditional guardrails and input/output filters can prove too slow, too expensive, and/or too limited. In this episode, Alizishaan Khatri of Wrynx joins Daniel and Chris to explore…
2025 was the year of agents, what's coming in 2026?

Duration: 51:15
In this start-of-year FC episode, Chris and Daniel break down what really mattered in AI in 2025, and what to expect in 2026. They explore the rise of AI agents, the practical reality of multimodal AI, and how reasoning…
Beyond chatbots: Agents that tackle your SOPs

Duration: 45:53
As AI reshapes the workplace, employees and leaders face questions about meaningful work, automation, and human impact. In this episode, Jason Beutler, CEO of RoboSource, shares how companies can rethink workflows, integ…
The AI engineer skills gap

Duration: 45:33
Chris and Daniel talk with returning guest Ramin Mohammadi about how those seeking to get into AI Engineer / Data Science jobs are expected to come in as mid-level engineers (not entry level). They explore this growing g…
Technical advances in document understanding

Duration: 49:18
Chris and Daniel unpack how AI-driven document processing has rapidly evolved well beyond traditional OCR with many technical advances that fly under the radar. They explore the progression from document structure models…
Chris on AI, autonomous swarming, home automation and Rust!

Duration: 1:37:09
This episode is a special crossover between the Practical AI podcast and The Changelog podcast. Chris was recently invited by longtime friends Jerod Santo and Adam Stacoviak, cohosts of The Changelog, to join them on the…
Beyond note-taking with Fireflies

Duration: 48:59
Fireflies CEO Krish Ramineni shares how the company is transforming AI-powered note-taking into a deeper layer of knowledge automation. He breaks down the technology behind real-time functionality like Live Assist, the…
Autonomous Vehicle Research at Waymo

Duration: 52:08
Waymo’s VP of Research, Drago Anguelov, joins Practical AI to explore how advances in autonomy, vision models, and large-scale testing are shaping the future of driverless technology. The conversation dives into the dual…
Are we in an AI bubble?

Duration: 49:41
Dan and Chris unpack whether today’s surge in AI deployment across enterprise workflows, manufacturing, healthcare, and scientific research signals a lasting transformation or an overhyped bubble. Drawing parallels to th…