AI Insights - Ep.3: Rethinking AI Performance Metrics

March 26, 2026 · 27m 26s

Show Notes

In the latest episode of the Cisco AI Insights podcast, hosts Rafael Herrera and Sonia Marques are joined by Dr. Catarina Carvalho, a Cisco leader in machine learning engineering. Together, they unpack the academic paper "Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following," written by researchers from the University of Maryland and the University of Waterloo. As the industry moves toward more reliable multimodal models, traditional pass-or-fail evaluation is no longer sufficient. The paper introduces a hierarchical framework that uses "LLM-as-a-judge" to evaluate outputs across five distinct criteria: visual grounding, logical coherence, factuality, reflection, and conciseness. Dr. Carvalho guides the discussion through the nuances of this "judge of judges" approach and explores why human alignment remains the gold standard even as evaluation becomes automated. A special thank you to the teams at the University of Waterloo and the University of Maryland, College Park, for developing this month's paper. If you would like to read the paper yourself, it is available at https://arxiv.org/pdf/2511.21662.
