EVA - A Framework for Evaluating Voice Agents by ServiceNow

Author: ServiceNow Community Podcasts
Date: April 29, 2026
Duration: 29:37

Voice AI agent evaluation — why it's fundamentally harder than text, how cascade failures derail conversations invisibly, and ServiceNow's open-source framework to establish industry evaluation standards. Featuring real audio examples showing authentication failures, leaked reasoning, and latency problems.

WHAT WE COVER

TARA BOGAVELLI — Research Engineer, ServiceNow
Leading the open-source voice agent evaluation framework. Explains why existing benchmarks don't measure what matters and what ServiceNow is releasing to establish industry standards.

KATRINA STANKIEWICZ — Staff Machine Learning Engineer, ServiceNow
Cascade model architecture expert. Breaks down STT → LLM → TTS failure modes, named entity transcription challenges, and real audio example analysis.

GABRIELLE GAUTHIER MELANÇON — Staff Applied Research Scientist, ServiceNow
Multi-language evaluation specialist. Reveals why Large Audio Language Models lag behind, the native speaker requirement, and bot-to-bot simulation methodology.

CHAPTERS
0:00 Introduction — The evaluation gap
1:11 ServiceNow's Open-Source Framework Announcement — Tara Bogavelli
2:43 Meet the Researchers
3:43 Voice-Specific Challenges — Tara Bogavelli
5:03 Cascade Architecture: STT → LLM → TTS — Katrina Stankiewicz
7:57 The Named Entity Problem — Katrina Stankiewicz
10:06 Evaluation Metrics: Accuracy vs Experience — Gabrielle Gauthier Melançon
11:23 Bot-to-Bot Testing at Scale — Gabrielle Gauthier Melançon
14:30 The LALM Gap: Why Audio AI Judges Struggle — Tara Bogavelli
16:57 Real Audio Example: Flight Rebooking Gone Wrong
21:58 Breaking Down the Failures — Katrina Stankiewicz
28:30 Wrap-Up & Resources

KEY INSIGHTS

The Cascade Failure Problem: STT → LLM → TTS errors propagate invisibly.
Named Entity Transcription: The #1 enterprise blocker, because mis-transcribed names, confirmation codes, and emails break authentication.
Accuracy vs Experience: Perfect task completion means nothing if users hang up due to poor experience.
LALM Gap: Large Audio Language Models lag behind text LLMs, so human evaluators remain essential.
Latency Kills Conversations: Five-second pauses make users think the call dropped, breaking the experience even when tasks complete.
Open-Source Framework: ServiceNow is releasing evaluation tools, metrics, and bot-to-bot simulation methodology for the industry.

LEARN MORE

Website: https://servicenow.github.io/eva/
GitHub: https://github.com/servicenow/eva
Blog Post: https://huggingface.co/blog/ServiceNow-AI/eva
Dataset: https://huggingface.co/datasets/ServiceNow-AI/eva

ABOUT

Hosted by Bobby Brill. The ServiceNow Insights podcast explores AI research, real-world applications, and the people building the future of work.

#VoiceAI #AIEvaluation #ServiceNow #MachineLearning #OpenSource #ConversationalAI #STT #TTS #LLM #VoiceAgents #AIResearch #Podcast

See omnystudio.com/listener for privacy information.

ServiceNow Podcasts

Dive into the dynamic world of ServiceNow through the voices of those who know it best. ServiceNow Podcasts, curated by ServiceNow Community Podcasts, isn't a single show but a rich library of conversations. Each series within this collection is hosted by different practitioners and experts, bringing you grounded discussions from a wide array of focus areas. Whether you're looking for the latest platform updates, deep dives into implementation strategies, or insights on IT service management and beyond, this podcast has a channel for you. You'll hear authentic talk about real-world applications, emerging trends, and the practical knowledge that professionals are using right now. It's like having a direct line to the community's collective experience. Tuning in means accessing a broad variety of topics designed to inform and educate, all framed within the context of what's happening at ServiceNow and in the ecosystems around it. For anyone working with or interested in the platform, this ongoing series of dialogues is an essential resource for staying current and connected. Find your niche, learn something new, and get the perspective only insiders can provide through these regular episodes.

Author: ServiceNow Community Podcasts
Language: en-us
Episodes: 100

Podcast Episodes

It's Friday, Juan and Tim rant with Data Day Texas takeaways

31.01.2026

Duration: 34:05

Juan and Tim got together for a beer to rant about what's on their minds and the latest in the data world, and to share Data Day Texas takeaways.

All things Data Lineage with Dr. Irina Steenbeek

29.01.2026

Duration: 58:14

Dr. Irina Steenbeek is a data practitioner, and author of multiple books including Data Lineage from a Business Perspective. In this episode, Juan and Tim discuss with Irina the practical realities of implementing data l…

TAKEAWAY - All things Data Lineage with Dr. Irina Steenbeek

29.01.2026

Duration: 3:34

This is the takeaway episode with Dr. Irina Steenbeek, a data practitioner, and author of multiple books including Data Lineage from a Business Perspective. In this episode, Juan and Tim discuss with Irina the practical…

Risk at Scale: Governing AI, Sustainability, and Responsible Technology at National Grid

26.01.2026

Duration: 21:28

A conversation with Jody Elliott, Head of IT Risk and Sustainability at National Grid, on embedding responsible AI and sustainability at scale. In this episode of the ServiceNow Executive Circle Podcast, Jody Elliott, He…

People, Skills, and AI: Transforming Talent for the Future of Financial Services

26.01.2026

Duration: 24:07

AI, Skills, and the Future of Work at Aviva. In this episode of the ServiceNow Executive Circle podcast, Kat Finch is joined by Dan Godfrey, Group People Transformation and Talent Director at Aviva, to explore how people…

Ontologies Aren’t New with Oscar Corcho

22.01.2026

Duration: 43:12

Ontologies are suddenly everywhere but they didn’t come out of nowhere. In this episode, Juan and Tim chat with Professor Oscar Corcho, who has been doing ontology engineering since the early 2000s (check out his book On…

TAKEAWAY - Ontologies Aren’t New with Oscar Corcho

22.01.2026

Duration: 3:37

This is the takeaway episode of our conversation with Professor Oscar Corcho, who has been doing ontology engineering since the early 2000s. We demystify what ontologies actually are, where they came from, and why t…

It's Friday! Juan and Tim rant about data and context graphs

17.01.2026

Duration: 20:23

Juan and Tim rant about Context Graphs and categorizations of how companies work with data.

2026 Trends in the data world with Tony Baer and Matt Housley

15.01.2026

Duration: 49:46

Tim and Juan chat with Tony Baer and Matt Housley, hosts of the “It’s About Data” podcast, about the trends they are seeing at the start of 2026. We talked about AI magical thinking, Agentic architectures, Grap…

TAKEAWAY - 2026 Trends in the data world with Tony Baer and Matt Housley

15.01.2026

Duration: 4:33

This is the takeaway episode from Tim and Juan's chat with Tony Baer and Matt Housley, hosts of the “It’s About Data” podcast, about the trends they are seeing at the start of 2026. We talked about…