SciPy 2024

Don Setiawan

Don Setiawan is a Senior Research Software Engineer at the University of Washington eScience Institute's Scientific Software Engineering Center (SSEC). He has expertise in Python programming, web development, geospatial data analytics, and cloud-based data engineering. He is interested in building scalable, open software that facilitates scientific discovery across fields and promotes software best practices. He has contributed to open-source software projects with the Ocean Observatories Initiative (OOI), the U.S. Integrated Ocean Observing System (IOOS), the National Oceanic and Atmospheric Administration (NOAA), and the National Aeronautics and Space Administration (NASA).


Sessions

07-08
13:30
240 min
Xarray: Friendly, Interactive, and Scalable Scientific Data Analysis
Scott Henderson, Don Setiawan, Tom Nicholas, Wietze Suijker, Jessica Scheick, Max Jones, Luis Lopez, Negin Sobhani

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable analysis of large, complex datasets with many related variables. Xarray combines the convenience of Pandas-style labeled data structures with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This hands-on tutorial focuses on intermediate and advanced workflows using complex real-world data. We encourage participants to bring their own datasets, as we will dedicate ample time to applying tutorial concepts to datasets of interest!

Tutorials
Ballroom B/C
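The Xarray tutorial abstract above describes labeled, NumPy-like multi-dimensional arrays. As a minimal sketch of what that means in practice (not material from the tutorial itself), the snippet below builds a small `DataArray` from made-up temperature values and then selects and reduces by dimension and coordinate names rather than by positional indices. It assumes `numpy`, `pandas`, and `xarray` are installed; the data, station labels, and variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 2 time steps x 3 stations of temperature readings
temps = np.array([[10.0, 12.0, 14.0],
                  [11.0, 13.0, 15.0]])

# Attach named dimensions and coordinate labels to the raw NumPy array
da = xr.DataArray(
    temps,
    dims=("time", "station"),
    coords={
        "time": pd.date_range("2024-07-08", periods=2),
        "station": ["A", "B", "C"],
    },
    name="temperature",
)

# Label-based selection: pick station "B" by its coordinate value
station_b = da.sel(station="B")

# Reduce over a named dimension instead of remembering an axis number
station_mean = da.mean(dim="time")
```

Because operations refer to `"time"` and `"station"` by name, the same code keeps working if the underlying array's axis order changes.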
07-09
13:30
240 min
Generative AI Copilot for Scientific Software – a RAG-Based Approach using OLMo
Vani Mandava, Cordero Core, Don Setiawan, Niki Burggraf, Anant Mittal, Anshul Tambay, Madhav Kashyap, Anuj Sinha, Ishika Khandelwal

Generative AI systems built upon large language models (LLMs) have shown great promise as tools that let people access information through natural conversation. Scientists can use the breakthroughs these systems enable to create advanced tools that help accelerate their research. This tutorial will cover: (1) the basics of language models, (2) setting up an environment for using open-source LLMs without the expensive compute resources needed for training or fine-tuning, (3) using a technique such as Retrieval-Augmented Generation (RAG) to improve LLM output, and (4) building a “production-ready” app that demonstrates how researchers could turn disparate knowledge bases into special-purpose AI-powered tools. This tutorial is aimed at scientists and research engineers who want to use LLMs in their work.

Tutorials
Ballroom D
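The RAG step mentioned in the tutorial abstract above can be illustrated with a toy sketch: score a small set of documents against a question, keep the best matches, and paste them into the prompt that would be sent to the model. This is not the tutorial's code and it does not call OLMo or any other LLM. The bag-of-words cosine similarity is a stand-in for the embedding-based vector search a real RAG system would typically use, and the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and example documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric words
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    # "Augment" the question with retrieved context before generation
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical knowledge base
docs = [
    "Xarray adds labeled dimensions to NumPy arrays.",
    "Echosounders record water column sonar backscatter.",
    "RAG retrieves relevant passages before generation.",
]
prompt = build_prompt("What do echosounders record?", docs, k=1)
```

In a full pipeline, `prompt` would then be passed to the language model for generation; grounding the answer in retrieved text is what lets a general-purpose model answer questions about a specific knowledge base without fine-tuning.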
07-11
14:20
30 min
Echostack: A flexible and scalable open-source software suite for echosounder data processing
Wu-Jung Lee, Dingrui Lei, Brandyn Lucca, Caesar Tuguinay, Valentina Staneva, Don Setiawan, Soham Kishor Butala

Water column sonar data collected by echosounders are essential for fisheries and marine ecosystem research, enabling the detection, classification, and quantification of fish and zooplankton from many different ocean observing platforms. However, broad use of these data has been hindered by the lack of modular software tools that allow flexible composition of data processing workflows incorporating powerful analytical tools from the scientific Python ecosystem. We address this gap by developing Echostack, a suite of open-source Python packages that leverage existing distributed computing and cloud-interfacing libraries to support intuitive and scalable data access, processing, and interpretation. These tools can be used individually or orchestrated together, as we demonstrate in example use cases from a fisheries acoustic-trawl survey.

Earth, Ocean, Geo, and Atmospheric Science
Room 315