EVA - A Framework for Evaluating Voice Agents by ServiceNow

Author: ServiceNow Community
Date: April 29, 2026
Duration: 29:37

Voice AI agent evaluation — why it's fundamentally harder than text, how cascade failures derail conversations invisibly, and ServiceNow's open-source framework to establish industry evaluation standards. Featuring real audio examples showing authentication failures, leaked reasoning, and latency problems.

WHAT WE COVER 

TARA BOGAVELLI — Research Engineer, ServiceNow
Leading the open-source voice agent evaluation framework. Explains why existing benchmarks don't measure what matters and what ServiceNow is releasing to establish industry standards.

KATRINA STANKIEWICZ — Staff Machine Learning Engineer, ServiceNow
Cascade model architecture expert. Breaks down STT → LLM → TTS failure modes, named entity transcription challenges, and real audio example analysis.

GABRIELLE GAUTHIER MELANÇON — Staff Applied Research Scientist, ServiceNow
Multi-language evaluation specialist. Reveals why Large Audio Language Models lag behind, the native speaker requirement, and bot-to-bot simulation methodology. 

CHAPTERS
0:00 Introduction — The evaluation gap
1:11 ServiceNow's Open-Source Framework Announcement — Tara Bogavelli
2:43 Meet the Researchers
3:43 Voice-Specific Challenges — Tara Bogavelli
5:03 Cascade Architecture: STT → LLM → TTS — Katrina Stankiewicz
7:57 The Named Entity Problem — Katrina Stankiewicz
10:06 Evaluation Metrics: Accuracy vs Experience — Gabrielle Gauthier Melançon
11:23 Bot-to-Bot Testing at Scale — Gabrielle Gauthier Melançon 
14:30 The LALM Gap: Why Audio AI Judges Struggle — Tara Bogavelli
16:57 Real Audio Example: Flight Rebooking Gone Wrong
21:58 Breaking Down the Failures — Katrina Stankiewicz
28:30 Wrap-Up & Resources

KEY INSIGHTS

The Cascade Failure Problem: STT → LLM → TTS errors propagate invisibly.
Named Entity Transcription: The #1 enterprise blocker. Names, confirmation codes, and emails break authentication.
Accuracy vs Experience: Perfect task completion means nothing if users hang up due to poor experience.
LALM Gap: Large Audio Language Models lag behind text LLMs, so human evaluators remain essential.
Latency Kills Conversations: Five-second pauses make users think the call dropped, breaking the experience even when tasks complete.
Open-Source Framework: ServiceNow is releasing evaluation tools, metrics, and bot-to-bot simulation methodology for the industry.
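
The cascade and latency points above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: every name in it (KNOWN_CODES, stt, llm, run_turn, feels_dropped, the per-stage latencies) is hypothetical and is not part of EVA or any ServiceNow API. It shows how one misheard character at the STT stage turns into an authentication failure downstream, and how per-stage delays add up to the pause a caller experiences. The TTS stage is represented only by its latency.

```python
from dataclasses import dataclass

# Hypothetical toy cascade; none of these names come from the EVA framework.
KNOWN_CODES = {"QX7R2B"}  # confirmation codes the toy backend accepts


@dataclass
class Turn:
    heard: str        # transcript produced by the simulated STT stage
    reply: str        # decision made by the simulated LLM stage
    latency_s: float  # total time the caller waited for a response


def stt(spoken: str, errors: dict[str, str]) -> str:
    """Simulate transcription; `errors` maps a spoken token to a misheard one."""
    return " ".join(errors.get(tok, tok) for tok in spoken.split())


def llm(transcript: str) -> str:
    """Toy policy: treat the last token of the transcript as the code."""
    code = transcript.split()[-1]
    return "verified" if code in KNOWN_CODES else "cannot find booking"


def run_turn(spoken: str, errors: dict[str, str],
             stage_latency_s: tuple[float, float, float] = (0.4, 1.2, 0.5)) -> Turn:
    """Run STT then LLM; the latency is the sum of the STT, LLM and TTS delays."""
    heard = stt(spoken, errors)
    reply = llm(heard)
    return Turn(heard, reply, sum(stage_latency_s))


def feels_dropped(turn: Turn, threshold_s: float = 5.0) -> bool:
    """Flag pauses long enough that a caller may think the call dropped."""
    return turn.latency_s >= threshold_s


clean = run_turn("my code is QX7R2B", errors={})
garbled = run_turn("my code is QX7R2B", errors={"QX7R2B": "QX7R2P"})
slow = run_turn("my code is QX7R2B", errors={}, stage_latency_s=(0.6, 4.0, 0.7))
```

In this sketch the clean turn is verified, the garbled turn fails authentication even though the LLM stage behaved exactly as designed, and the slow turn completes the task but crosses the five-second threshold mentioned above.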

LEARN MORE

Website: https://servicenow.github.io/eva/
GitHub: https://github.com/servicenow/eva
Blog Post: https://huggingface.co/blog/ServiceNow-AI/eva
Dataset: https://huggingface.co/datasets/ServiceNow-AI/eva

ABOUT

Hosted by Bobby Brill. The ServiceNow Insights podcast explores AI research, real-world applications, and the people building the future of work.

#VoiceAI #AIEvaluation #ServiceNow #MachineLearning #OpenSource #ConversationalAI #STT #TTS #LLM #VoiceAgents #AIResearch #Podcast


Ever wondered what happens behind the scenes of the platform that powers so much of today's enterprise workflow? ServiceNow Insights pulls back the curtain, offering a direct line to the minds building and shaping the technology. This isn't a series of polished press releases; it's a collection of genuine conversations with the engineers, product managers, and innovators from the ServiceNow Community who are doing the actual work. You'll hear the unscripted stories behind the latest features: not just what they do, but the real-world problems they aim to solve and the interesting challenges encountered along the way. Each episode delves into the practical implications of new updates and products, giving you a clearer sense of how these tools evolve and where they might be headed next. Whether you're deeply embedded in the ecosystem as an admin or platform owner, or simply curious about how complex digital services are orchestrated, this podcast provides context and clarity you can't find elsewhere. Tune in for a grounded, technical, and genuinely insightful look at the forces driving innovation in this space, straight from the source.
Language: en-us
Episodes: 100

ServiceNow Insights 
Podcast Episodes
AI Control Tower - Governing AI at Scale with ServiceNow

Duration: 18:33
AI governance at scale — what it means, how to do it, and what regulations you need to know now. Host Bobby Brill brings together five ServiceNow experts across two conversations for a complete 20-minute briefing on gove…
AGENTIC AI - The Future of Work and the Agents Building It

Duration: 23:05
What is Agentic AI — and what can it actually do for your business? In this episode of the podcast, host Bobby Brill brings together three conversations with the people building Agentic AI at ServiceNow into one 20-minut…
The Human in the Loop | Ethical AI with Di Le

Duration: 29:03
The Human in the Loop | Ethical AI with Di Le. ServiceNow Insights Podcast, hosted by Bobby Brill. What does it actually mean to build AI responsibly? Not the buzzword version. The real version. In our latest episode, I…
Exploring Leadership and AI with Anand Tharanathan

Duration: 39:45
Exploring Leadership and AI with Anand Tharanathan In this engaging episode of our podcast, we delve into the world of leadership and Artificial Intelligence with Anand Tharanathan, GVP of Product Research and Insights a…
AI-Driven Innovation with Shruti Shrivastava and Averria Martin

Duration: 26:48
In this episode of the podcast, host Bobby Brill takes a break from hosting duties and introduces Shruti Shrivastava, Director of UX Research at ServiceNow in Bangalore, India. Shruti takes the reins for this episode an…
Narrative Analytics with Agent Ada

Duration: 14:45
Sitting down with Amrutha Ramesh, visiting researcher at ServiceNow and one of the minds behind Agent Ada, a data-focused AI agent, in the latest episode of the podcast. We talk about the gap in enterprise workflows: you upl…
AI Regulations Explained: EU AI Act, Colorado Law, and NIST Framework

Duration: 19:02
Join host Bobby Brill as he sits down with ServiceNow's AI legal and governance experts to break down the complex world of AI regulations. Andrea LaFontain (Director of AI Legal), Ken Miller (Senior Director of Product L…