SciPy 2024

From RAGs to riches: Build an AI document inquiry web-app
07-09, 08:00–12:00 (US/Pacific), Ballroom B/C

As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document inquiry systems have emerged as a high-value practical use case. Retrieval-Augmented Generation (RAG) is a technique that supplies LLMs with relevant context and external information (retrieved from vector storage), thus making them more powerful and accurate.
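
To make the retrieval step concrete, here is a minimal sketch in plain Python. It is not Ragna's API and not the tutorial's material: the helper names (`embed`, `retrieve`, `build_prompt`) are hypothetical, and a bag-of-words count vector stands in for a real embedding model and vector database. It only illustrates the idea of ranking document chunks against a question and placing the best matches in the prompt sent to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': word counts (assumption: stands in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (the 'retrieval' in RAG)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Augment the question with retrieved context before it is sent to an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical document chunks, for illustration only.
chunks = [
    "Ragna is an open source RAG orchestration framework.",
    "Panel is a Python framework for building web applications.",
    "Dask parallelizes Python workloads.",
]
prompt = build_prompt("What is Ragna?", chunks, k=1)
print(prompt)
```

In a real system, the count vectors would be replaced by learned embeddings stored in a vector database, and the assembled prompt would be passed to an LLM to generate the answer.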

In this hands-on tutorial, we’ll dive into RAG by creating a personal chat app that accurately answers questions about your selected documents. We’ll use a new OSS project called Ragna that provides a friendly Python and REST API, designed for this particular use case. We’ll test the effectiveness of different LLMs and vector databases, including an offline LLM (i.e., a local LLM) running on GPUs on the cloud machines provided to you. We’ll then develop a web application that leverages the REST API, built with Panel, a powerful OSS Python application development framework.

The ability to ask natural language questions and get relevant and accurate answers from a large corpus of documents can fundamentally transform organizations and make institutional knowledge accessible. Foundational LLMs like OpenAI’s GPT-4 provide powerful capabilities, but using them directly to answer questions about a collection of documents presents accuracy-related limitations. RAG is the leading approach to enhancing the capabilities and usability of LLMs.

In this tutorial, we will learn to use RAG to build document-inquiry chat systems using different commercial and locally running LLMs. The topics we’ll cover include:

  • Introduction to RAG, how it works and interacts with LLMs, and Ragna, a framework for RAG orchestration
  • Creating and optimizing a basic chat function that uses popular LLMs (like GPT) to answer questions about your documents, using a Python API in Jupyter Notebooks
  • Running a local LLM on GPUs on the provided platform, and comparing its performance to commercial LLMs
  • Introduction to Panel, by creating a basic chat UI for Ragna using Panel’s ChatBox widget
  • Building and deploying a Panel-based web-app, which extends the basic chat UI and includes more application components

By the end of this tutorial, you will have an understanding of the fundamental components that form a RAG model, and practical knowledge of open source tools that can help you or your organization explore and build your own applications. This tutorial is designed to enable enthusiasts in our community to explore an interesting topic using some beginner-friendly Python libraries.


Participants will need a beginner-intermediate level understanding of Python to follow along. The tutorial material will be in the form of Jupyter Notebooks, so a basic understanding of the notebook interface is nice to have. If participants want to run the tutorial materials locally (which is not necessary because they will have access to a cloud platform), a fundamental understanding of the command line interface, git-based version control, and packaging tools like pip and conda will be helpful.

Pavithra Eswaramoorthy is a Developer Advocate at Quansight, where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she contributes to the Bokeh visualization library, as well as the Nebari (adjacent to the Jupyter community), conda-store (part of the conda ecosystem), and Ragna (a RAG orchestration framework) projects. Pavithra has been involved in the open source community for over 5 years, notably as a maintainer of the Dask library and an administrator for Wikimedia’s OSS programs. In her spare time, she enjoys a good book and hot coffee. :)

Dharhas Pothina is the CTO at Quansight, where he helps clients wrangle their data using the PyData stack. His background includes expertise in computational modeling, big data/high performance computing, visualization, and geospatial analysis. He has been part of the HoloViz (hvPlot) and Dask communities for over 10 years and has given many talks and workshops on distributed computing and big data visualization. He actively leads large-scale data science projects at Quansight.

I'm an atmospheric scientist, Python developer, and open source contributor working on the HoloViz ecosystem.

I am the lead developer of the Panel chat components to easily build an interface for interacting with Large Language Models (LLMs). I have shared applicable examples of integrating Panel chat components with LangChain, OpenAI, Mistral, LlamaCpp on

Connect with me on