SciPy 2024

Dharhas Pothina

Dharhas Pothina is the CTO at Quansight, where he helps clients wrangle their data using the PyData stack. His background includes computational modeling, big data and high-performance computing, visualization, and geospatial analysis. He has been part of the HoloViz (hvPlot) and Dask communities for over 10 years, has given many talks and workshops on distributed computing and big data visualization, and actively leads large-scale data science projects at Quansight.

Sessions

07-09
08:00
240min
From RAGs to riches: Build an AI document inquiry web-app
Pavithra Eswaramoorthy, Dharhas Pothina, Andrew Huang

As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document inquiry systems have emerged as a high-value practical use case. Retrieval-Augmented Generation (RAG) is a technique that supplies LLMs with relevant context and external information (retrieved from vector storage), making their answers more accurate and better grounded in the source documents.
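As a rough illustration of the retrieval step described above, here is a minimal sketch in plain Python. It is not Ragna's API or the tutorial's code: the bag-of-words `embed` function stands in for a real embedding model, the `chunks` list stands in for vector storage, and all function names are hypothetical. The resulting prompt is what would be handed to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=1):
    """Prepend the retrieved context to the question before it goes to an LLM."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks, standing in for entries in vector storage.
chunks = [
    "Zarr stores chunked, compressed N-dimensional arrays.",
    "Panel is a Python framework for building web apps and dashboards.",
    "Dask parallelizes Python computations across many cores or machines.",
]

prompt = build_prompt("Which framework builds web apps?", chunks)
print(prompt)
```

A real system would swap the toy pieces for a learned embedding model and a vector database, but the flow stays the same: embed the question, fetch the closest chunks, and include them in the prompt.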

In this hands-on tutorial, we’ll dive into RAG by creating a personal chat app that accurately answers questions about your selected documents. We’ll use a new OSS project called Ragna, which provides a friendly Python and REST API designed for this particular use case. We’ll test the effectiveness of different LLMs and vector databases, including an offline (i.e., local) LLM running on GPUs on the cloud machines provided to you. We'll then develop a web application that leverages the REST API, built with Panel, a powerful OSS Python application development framework.

Tutorials
Ballroom B/C
07-09
13:30
240min
Data of an Unusual Size (2024 edition): A practical guide to analysis and interactive visualization of massive datasets
Pavithra Eswaramoorthy, Dharhas Pothina

While most scientists aren't at the scale of black hole imaging research teams that analyze petabytes of data every day, you can easily fall into a situation where your laptop doesn't have quite enough power to do the analytics you need.

In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets through real-world examples, running on powerful machines on a public cloud provided by the presenters. We start with how the data is stored and read, then move on to how it is processed and visualized.

Tutorials
Ballroom A
0min
A cloud-based interactive and scalable approach for estimating forest disturbance using SAR imagery
Marcelo Villa, Dharhas Pothina, Josef Kellndorfer

Using Synthetic Aperture Radar (SAR) imagery, packages from the scientific Python community, and Nebari, a cloud-based open-source data science platform, we present a scalable algorithm to generate the forest disturbance products for the upcoming NASA/ISRO Synthetic Aperture Radar (NISAR) mission. By analyzing backscatter time series, we identify changes in forest cover over time. We leverage Dask and Kubernetes for distributed computation, the Zarr format for efficient storage, reading, and transfer of chunked data, Xarray for multi-dimensional data analysis, and HoloViz tools for visualization. We showcase a modern approach to Earth Observation, emphasizing the ease of collaboration and scalability without requiring deep expertise in cloud technologies.

Earth, Ocean, Geo, and Atmospheric Science