07-10, 11:25–11:55 (US/Pacific), Ballroom
AI assistants are evolving from simple Q&A bots to intelligent, multimodal, multilingual, and agentic systems capable of reasoning, retrieving, and autonomously acting. In this talk, we’ll showcase how to build a voice-enabled, multilingual, multimodal RAG (Retrieval-Augmented Generation) assistant using Gradio, OpenAI’s Whisper, LangChain, LangGraph, and FAISS. Our assistant will not only process voice and text inputs in multiple languages but also intelligently retrieve information from structured and unstructured data. We’ll demonstrate this with a flight search use case—leveraging a flight database for retrieval and, when necessary, autonomously searching external sources using LangGraph. You will gain practical insights into building scalable, adaptive AI assistants that move beyond static chatbots to autonomous agents that interact dynamically with users and the web.
This session will be a deep dive into building a next-gen AI assistant that goes beyond static RAG implementations by integrating voice, multiple languages, and agentic capabilities. We’ll walk through key concepts, architectures, and code implementations, demonstrating live how to build a fully interactive chatbot that can handle multimodal inputs (voice & text), respond in multiple languages, and autonomously retrieve information beyond its dataset.
Outline (30 minutes)
Introduction & Problem Statement (5 min)
What are the limitations of traditional chatbots?
Why multimodal, multilingual, and agentic capabilities matter
Overview of our AI assistant demo
Tech Stack & Architecture Overview (5 min)
Using Gradio for the UI (voice + text inputs, multilingual responses)
Leveraging Whisper for voice input processing and TTS for voice responses (see the front-end sketch after this list)
Implementing RAG with a vector database for retrieval
Introducing LangGraph for agentic workflows
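To make the stack concrete before the hands-on portion, here is a minimal sketch of the Gradio + Whisper front end. It assumes the open-source `openai-whisper` package; `answer_query` is a hypothetical stub standing in for the RAG pipeline built in the next section, and the TTS output path is omitted for brevity.

```python
# Minimal front-end sketch: one handler accepts either a voice recording or
# typed text; voice is transcribed with Whisper before answering.
import gradio as gr
import whisper

stt_model = whisper.load_model("base")  # small multilingual speech-to-text model

def answer_query(text: str) -> str:
    # Hypothetical placeholder: the retrieval + agent pipeline plugs in here.
    return f"(echo) {text}"

def handle_input(audio_path, text):
    # Prefer the voice recording when present; Whisper auto-detects the language.
    if audio_path:
        text = stt_model.transcribe(audio_path)["text"]
    return answer_query(text)

demo = gr.Interface(
    fn=handle_input,
    inputs=[
        gr.Audio(type="filepath", label="Speak your question"),
        gr.Textbox(label="...or type it"),
    ],
    outputs=gr.Textbox(label="Assistant response"),
)

if __name__ == "__main__":
    demo.launch()
```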
Hands-On: Building the Core RAG Assistant (8 min)
Implementing retrieval over structured and unstructured data
Connecting to a flight database for real-time search
Handling multilingual queries with embeddings & tokenization (see the retrieval sketch after this list)
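Here is a minimal sketch of the retrieval core, assuming LangChain's community FAISS wrapper and a multilingual sentence-transformers model; the flight rows and model name are illustrative choices, not the session's actual dataset. Because the embedding model is multilingual, a query in one language can match records stored in another without a translation step.

```python
# Minimal retrieval sketch: flatten structured flight rows into text,
# embed them once, and search in any supported language.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Illustrative flight records (stand-ins for the session's flight database).
flights = [
    "Flight AA100: New York (JFK) -> London (LHR), departs 08:30, $640",
    "Flight LH455: Frankfurt (FRA) -> Los Angeles (LAX), departs 10:10, $820",
    "Flight JL005: Tokyo (HND) -> New York (JFK), departs 11:05, $990",
]

# A multilingual model maps queries in different languages into one vector space.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
store = FAISS.from_texts(flights, embeddings)

# A German query still retrieves the matching English record.
for doc in store.similarity_search("Flug von Frankfurt nach Los Angeles", k=1):
    print(doc.page_content)
```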
Adding Agentic Capabilities with LangGraph (8 min)
Enabling the assistant to autonomously search flights online when they aren't found in the database (see the LangGraph sketch after this list)
Creating dynamic workflows for retrieval + API calling
Showcasing the agent’s ability to take actions beyond static RAG responses
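The fallback logic expresses naturally as a small LangGraph state machine: retrieve locally first, then route to a web-search node only when nothing comes back. In this sketch, `retrieve_from_db` and `search_flights_online` are hypothetical stubs for the FAISS lookup above and for whatever web-search tool the agent is given.

```python
# Minimal agentic sketch: conditional routing from local retrieval to web search.
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    results: list[str]

def retrieve_from_db(state: AgentState) -> AgentState:
    # Hypothetical stub for the FAISS lookup, e.g. store.similarity_search(...).
    hits: list[str] = []
    return {"query": state["query"], "results": hits}

def search_flights_online(state: AgentState) -> AgentState:
    # Hypothetical stub: any web-search or flight API tool could sit here.
    return {"query": state["query"], "results": [f"web result for {state['query']}"]}

def needs_web_search(state: AgentState) -> str:
    # Route to the web only when local retrieval came up empty.
    return "web" if not state["results"] else "done"

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve_from_db)
graph.add_node("web", search_flights_online)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", needs_web_search, {"web": "web", "done": END})
graph.add_edge("web", END)

app = graph.compile()
print(app.invoke({"query": "flights from Buenos Aires to Madrid", "results": []}))
```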
Live Demo: A Fully Functional AI Assistant (3 min)
Interactive multilingual, multimodal Q&A session
Testing voice queries, text-based queries, and real-time flight searches
Takeaways & Future Improvements (1 min)
How to extend the system for other industries & use cases
The next steps for building even more autonomous AI systems
Axel Sirota is an experienced AI leader, engineer, educator, and consultant for global technology organizations.
For close to 15 years, Axel has been at the forefront of AI. He founded and grew a successful training consultancy, teaching AI, GenAI, and Python to Fortune 500 companies such as Intuit, Salesforce, Barclays, Netflix, Apple, and Yahoo.
Axel has been a keynote speaker throughout South America and is known as one of the leading voices on AI safety and AI technologies. He is eager to work with and support leaders around the world as they make thoughtful decisions about AI policy and its impact on society.
Axel’s passion for education is evident in his 50+ published online courses across Pluralsight, O’Reilly Media, and LinkedIn Learning.
Axel received his Master's degree in Mathematical Sciences (Probability and Statistics) from the Universidad de Buenos Aires.