07-10, 11:25–11:55 (US/Pacific), Ballroom
AI assistants are evolving from simple Q&A bots to intelligent, multimodal, multilingual, and agentic systems capable of reasoning, retrieving, and autonomously acting. In this talk, we’ll showcase how to build a voice-enabled, multilingual, multimodal RAG (Retrieval-Augmented Generation) assistant using Gradio, OpenAI’s Whisper, LangChain, LangGraph, and FAISS. Our assistant will not only process voice and text inputs in multiple languages but also intelligently retrieve information from structured and unstructured data. We’ll demonstrate this with a flight search use case—leveraging a flight database for retrieval and, when necessary, autonomously searching external sources using LangGraph. You will gain practical insights into building scalable, adaptive AI assistants that move beyond static chatbots to autonomous agents that interact dynamically with users and the web.
This session will be a deep dive into building a next-gen AI assistant that goes beyond static RAG implementations by integrating voice, multiple languages, and agentic capabilities. We’ll walk through key concepts, architectures, and code implementations, demonstrating live how to build a fully interactive chatbot that can handle multimodal inputs (voice & text), respond in multiple languages, and autonomously retrieve information beyond its dataset.
Outline (30 minutes)
Introduction & Problem Statement (5 min)
What are the limitations of traditional chatbots?
Why multimodal, multilingual, and agentic capabilities matter
Overview of our AI assistant demo
Tech Stack & Architecture Overview (5 min)
Using Gradio for the UI (voice + text inputs, multilingual responses)
Leveraging Whisper for voice input processing and TTS for voice responses (see the front-end sketch after this list)
Implementing RAG with a vector database for retrieval
Introducing LangGraph for agentic workflows
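To make the stack concrete before the hands-on portion, here is a minimal sketch of the Gradio + Whisper front end. It assumes the open-source `openai-whisper` package; `answer_query` is a hypothetical stub standing in for the RAG pipeline built in the next section, and the TTS output path is omitted for brevity.

```python
# Minimal front-end sketch: one handler accepts either a voice recording or
# typed text; voice is transcribed with Whisper before answering.
import gradio as gr
import whisper

stt_model = whisper.load_model("base")  # small multilingual speech-to-text model

def answer_query(text: str) -> str:
    # Hypothetical placeholder: the retrieval + agent pipeline plugs in here.
    return f"(echo) {text}"

def handle_input(audio_path, text):
    # Prefer the voice recording when present; Whisper auto-detects the language.
    if audio_path:
        text = stt_model.transcribe(audio_path)["text"]
    return answer_query(text)

demo = gr.Interface(
    fn=handle_input,
    inputs=[
        gr.Audio(type="filepath", label="Speak your question"),
        gr.Textbox(label="...or type it"),
    ],
    outputs=gr.Textbox(label="Assistant response"),
)

if __name__ == "__main__":
    demo.launch()
```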
Hands-On: Building the Core RAG Assistant (8 min)
Implementing retrieval over structured and unstructured data
Connecting to a flight database for real-time search
Handling multilingual queries with embeddings & tokenization (see the retrieval sketch after this list)
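Here is a minimal sketch of the retrieval core, assuming LangChain's community FAISS wrapper and a multilingual sentence-transformers model; the flight rows and model name are illustrative choices, not the session's actual dataset. Because the embedding model is multilingual, a query in one language can match records stored in another without a translation step.

```python
# Minimal retrieval sketch: flatten structured flight rows into text,
# embed them once, and search in any supported language.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Illustrative flight records (stand-ins for the session's flight database).
flights = [
    "Flight AA100: New York (JFK) -> London (LHR), departs 08:30, $640",
    "Flight LH455: Frankfurt (FRA) -> Los Angeles (LAX), departs 10:10, $820",
    "Flight JL005: Tokyo (HND) -> New York (JFK), departs 11:05, $990",
]

# A multilingual model maps queries in different languages into one vector space.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
store = FAISS.from_texts(flights, embeddings)

# A German query still retrieves the matching English record.
for doc in store.similarity_search("Flug von Frankfurt nach Los Angeles", k=1):
    print(doc.page_content)
```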
Adding Agentic Capabilities with LangGraph (8 min)
Enabling the assistant to autonomously search flights online when they aren't found in the database (see the LangGraph sketch after this list)
Creating dynamic workflows for retrieval + API calling
Showcasing the agent’s ability to take actions beyond static RAG responses
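The fallback logic expresses naturally as a small LangGraph state machine: retrieve locally first, then route to a web-search node only when nothing comes back. In this sketch, `retrieve_from_db` and `search_flights_online` are hypothetical stubs for the FAISS lookup above and for whatever web-search tool the agent is given.

```python
# Minimal agentic sketch: conditional routing from local retrieval to web search.
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    query: str
    results: list[str]

def retrieve_from_db(state: AgentState) -> AgentState:
    # Hypothetical stub for the FAISS lookup, e.g. store.similarity_search(...).
    hits: list[str] = []
    return {"query": state["query"], "results": hits}

def search_flights_online(state: AgentState) -> AgentState:
    # Hypothetical stub: any web-search or flight API tool could sit here.
    return {"query": state["query"], "results": [f"web result for {state['query']}"]}

def needs_web_search(state: AgentState) -> str:
    # Route to the web only when local retrieval came up empty.
    return "web" if not state["results"] else "done"

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve_from_db)
graph.add_node("web", search_flights_online)
graph.set_entry_point("retrieve")
graph.add_conditional_edges("retrieve", needs_web_search, {"web": "web", "done": END})
graph.add_edge("web", END)

app = graph.compile()
print(app.invoke({"query": "flights from Buenos Aires to Madrid", "results": []}))
```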
Live Demo: A Fully Functional AI Assistant (3 min)
Interactive multilingual, multimodal Q&A session
Testing voice queries, text-based queries, and real-time flight searches
Takeaways & Future Improvements (1 min)
How to extend the system for other industries & use cases
The next steps for building even more autonomous AI systems
Axel Sirota is an experienced AI leader, engineer, educator, and consultant for global technology organizations.
For close to 15 years, Axel has been at the forefront of AI. He founded and grew a successful training consultancy, teaching AI, GenAI, and Python to Fortune 500 companies such as Intuit, Salesforce, Barclays, Netflix, Apple, and Yahoo.
Axel has been a keynote speaker throughout South America and is known as one of the leading voices on AI safety and AI technologies. He is eager to work with and support leaders around the world as they make thoughtful decisions about AI policy and its impact on society.
Axel’s passion for education is evident in his 50+ published online courses across Pluralsight, O’Reilly Media, and LinkedIn Learning.
Axel received his Master's degree in Mathematical Sciences (Probability and Statistics) from the Universidad de Buenos Aires.