EVA - A Framework for Evaluating Voice Agents by ServiceNow

Author: ServiceNow Community
April 29, 2026
Duration: 29:37

Technology

Voice AI agent evaluation — why it's fundamentally harder than text, how cascade failures derail conversations invisibly, and ServiceNow's open-source framework to establish industry evaluation standards. Featuring real audio examples showing authentication failures, leaked reasoning, and latency problems.

WHAT WE COVER

TARA BOGAVELLI — Research Engineer, ServiceNow
Leading the open-source voice agent evaluation framework. Explains why existing benchmarks don't measure what matters and what ServiceNow is releasing to establish industry standards.

KATRINA STANKIEWICZ — Staff Machine Learning Engineer, ServiceNow
Cascade model architecture expert. Breaks down STT → LLM → TTS failure modes, named entity transcription challenges, and real audio example analysis.

GABRIELLE GAUTHIER MELANÇON — Staff Applied Research Scientist, ServiceNow
Multi-language evaluation specialist. Reveals why Large Audio Language Models lag behind, the native speaker requirement, and bot-to-bot simulation methodology.

CHAPTERS
0:00 Introduction — The evaluation gap
1:11 ServiceNow's Open-Source Framework Announcement — Tara Bogavelli
2:43 Meet the Researchers
3:43 Voice-Specific Challenges — Tara Bogavelli
5:03 Cascade Architecture: STT → LLM → TTS — Katrina Stankiewicz
7:57 The Named Entity Problem — Katrina Stankiewicz
10:06 Evaluation Metrics: Accuracy vs Experience — Gabrielle Gauthier Melançon
11:23 Bot-to-Bot Testing at Scale — Gabrielle Gauthier Melançon
14:30 The LALM Gap: Why Audio AI Judges Struggle — Tara Bogavelli
16:57 Real Audio Example: Flight Rebooking Gone Wrong
21:58 Breaking Down the Failures — Katrina Stankiewicz
28:30 Wrap-Up & Resources

KEY INSIGHTS

The Cascade Failure Problem: STT → LLM → TTS errors propagate invisibly
Named Entity Transcription: The #1 enterprise blocker; names, confirmation codes, and emails break authentication
Accuracy vs Experience: Perfect task completion means nothing if users hang up due to poor experience
LALM Gap: Large Audio Language Models lag behind text LLMs; human evaluators remain essential
Latency Kills Conversations: Five-second pauses make users think the call dropped, breaking the experience even when tasks complete
Open-Source Framework: ServiceNow is releasing evaluation tools, metrics, and bot-to-bot simulation methodology for the industry
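To make the cascade and latency points concrete, here is a minimal Python sketch of how a single transcription slip on a confirmation code can fail authentication downstream, and why accuracy and experience are worth scoring separately. Everything in it is hypothetical: the names Turn, authenticate, and evaluate, the sample confirmation code, and the exact-match rule are illustrative assumptions and are not taken from the EVA framework. The 5-second threshold simply mirrors the pause length mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One caller turn in a cascade (STT -> LLM -> TTS) voice agent."""
    spoken: str        # what the caller actually said
    transcript: str    # what the STT stage produced
    latency_s: float   # seconds until the agent started replying


def authenticate(transcript: str, code_on_file: str) -> bool:
    """Stand-in for the LLM/tool stage: passes only if the code survives STT intact."""
    normalized = transcript.upper().replace(" ", "")
    return code_on_file in normalized


def evaluate(turn: Turn, code_on_file: str, max_latency_s: float = 5.0) -> dict:
    """Score task accuracy and conversational experience as separate signals."""
    return {
        "stt_exact": turn.spoken == turn.transcript,
        "authenticated": authenticate(turn.transcript, code_on_file),
        "latency_ok": turn.latency_s < max_latency_s,
    }


# A clean turn versus one where STT mishears a single character of the code.
clean = Turn("my code is X7B9Q2", "my code is X7B9Q2", latency_s=1.2)
garbled = Turn("my code is X7B9Q2", "my code is X7 be 9Q2", latency_s=6.0)

clean_scores = evaluate(clean, "X7B9Q2")
garbled_scores = evaluate(garbled, "X7B9Q2")
```

In the garbled turn the LLM stage never sees the error as an error; it just receives a code that does not match, which is how an upstream STT mistake turns into a downstream authentication failure.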

LEARN MORE

Website: https://servicenow.github.io/eva/
GitHub: https://github.com/servicenow/eva
Blog Post: https://huggingface.co/blog/ServiceNow-AI/eva
Dataset: https://huggingface.co/datasets/ServiceNow-AI/eva

ABOUT

Hosted by Bobby Brill. The ServiceNow Insights podcast explores AI research, real-world applications, and the people building the future of work.

#VoiceAI #AIEvaluation #ServiceNow #MachineLearning #OpenSource #ConversationalAI #STT #TTS #LLM #VoiceAgents #AIResearch #Podcast


ServiceNow Insights

Ever wondered what happens behind the scenes of the platform that powers so much of today's enterprise workflow? ServiceNow Insights pulls back the curtain, offering a direct line to the minds building and shaping the technology. This isn't a series of polished press releases; it's a collection of genuine conversations with the engineers, product managers, and innovators from the ServiceNow Community who are doing the actual work. You'll hear the unscripted stories behind the latest features: not just what they do, but the real-world problems they aim to solve and the interesting challenges encountered along the way. Each episode delves into the practical implications of new updates and products, giving you a clearer sense of how these tools evolve and where they might be headed next. Whether you're deeply embedded in the ecosystem as an admin or platform owner, or simply curious about how complex digital services are orchestrated, this podcast provides context and clarity you can't find elsewhere. Tune in for a grounded, technical, and genuinely insightful look at the forces driving innovation in this space, straight from the source.

Author: ServiceNow Community
Language: en-us
Episodes: 100


Podcast Episodes


Being AI Native at ServiceNow

28.05.2026

Duration: 24:08

What does it actually mean to be AI native? Not the buzzword — the real thing. Host Bobby Brill brings together seven ServiceNow experts across six conversations for a complete picture of what AI native thinking, buildin…


Day One Ready: What New Engineers Need to Know About AI — Engineering Now Unlocked

13.05.2026

Duration: 29:33

Day One Ready: What New Engineers Actually Need to Know About AI | Engineering Now Unlocked Starting your first engineering role — or coming back for a return offer — and wondering what AI actually changes about the job?…


AI Control Tower - Governing AI at Scale with ServiceNow

15.04.2026

Duration: 18:33

AI governance at scale — what it means, how to do it, and what regulations you need to know now. Host Bobby Brill brings together five ServiceNow experts across two conversations for a complete 20-minute briefing on gove…


AGENTIC AI - The Future of Work and the Agents Building It

02.04.2026

Duration: 23:05

What is Agentic AI — and what can it actually do for your business? In this episode of the podcast, host Bobby Brill brings together three conversations with the people building Agentic AI at ServiceNow into one 20-minut…


The Hidden Cost of How Your Business Actually Works | Process Mining with Dan Grady

18.03.2026

Duration: 23:44

Your dashboards show you what's happening. Process mining shows you why — and what to do about it. In this episode, Bobby Brill sits down with Dan Grady to demystify process mining: what it is, how it works within the Se…


The Human in the Loop | Ethical AI with Di Le

04.03.2026

Duration: 29:03

The Human in the Loop | Ethical AI with Di Le. ServiceNow Insights Podcast, hosted by Bobby Brill. What does it actually mean to build AI responsibly? Not the buzzword version. The real version. In our latest episode, I…


UTG Unlocked: AI Careers, Partnering with AI, and Understanding Global Cloud Services at ServiceNow

19.02.2026

Duration: 43:39

AI is transforming how businesses operate and how early-career talent grows. In this episode, UTG Unlocked, Mark Stockford (GVP, Global Cloud Operations) and Alyssa Gerhart (former intern, now full-time employee) share h…


Exploring Leadership and AI with Anand Tharanathan

24.12.2025

Duration: 39:45

Exploring Leadership and AI with Anand Tharanathan In this engaging episode of our podcast, we delve into the world of leadership and Artificial Intelligence with Anand Tharanathan, GVP of Product Research and Insights a…


AI-Driven Innovation with Shruti Shrivastava and Averria Martin

03.12.2025

Duration: 26:48

In this episode of the podcast, host Bobby Brill takes a break from hosting duties and introduces Shruti Shrivastava, Director of UX Research at ServiceNow in Bangalore, India. Shruti takes the reins for this episode an…


Narrative Analytics with Agent Ada

20.11.2025

Duration: 14:45

Sitting down with Amrutha Ramesh, visiting researcher at ServiceNow and one of the minds behind Agent Ada, a data-focused AI agent, in the latest episode of the podcast. We talk about the gap in enterprise workflows: you upl…