Valentina Staneva SciPy 2024

Valentina Staneva

Valentina Staneva is a Senior Data Scientist and Data Science Fellow at the eScience Institute, Paul G. Allen School of Computer Science & Engineering, University of Washington. In this role she collaborates with researchers from a wide range of domains on extracting information from large data sets of various modalities, such as time series, images, videos, audio, and text. She is involved in data science education for audiences with a broad range of experience levels, and regularly teaches workshops on introductory and advanced topics. She supports open science and reproducible research, and strives to help others adopt better data science workflows.

Sessions

07-08

13:30

240min

Github Actions for Scientific Data Workflows

Valentina Staneva, Quinn Brencher

In this tutorial we will introduce GitHub Actions to scientists as a tool for lightweight automation of scientific data workflows. We will demonstrate that GitHub Actions is not just a tool for software testing, but can be used in various ways to improve the reproducibility and impact of scientific analysis. Through a sequence of examples, we will demonstrate some of GitHub Actions' applications to scientific workflows, such as scheduled deployment of algorithms to sensor streams, updating visualizations based on new data, processing large datasets, model versioning, and performance benchmarking. GitHub Actions can particularly empower Python scientific programmers who do not want to build fully fledged applications or set up complex computational infrastructure, but would like to increase the impact of their research. The goal is for participants to leave with ideas for how to integrate GitHub Actions into their own work.

Echostack: A flexible and scalable open-source software suite for echosounder data processing

Don Setiawan, Caesar Tuguinay, Soham Kishor Butala, Brandyn Lucca, Valentina Staneva, Wu-Jung Lee, Dingrui Lei

Water column sonar data collected by echosounders are essential for fisheries and marine ecosystem research, enabling the detection, classification, and quantification of fish and zooplankton from many different ocean observing platforms. However, the broad usage of these data has been hindered by the lack of modular software tools that allow flexible composition of data processing workflows that incorporate powerful analytical tools in the scientific Python ecosystem. We address this gap by developing Echostack, a suite of open-source Python software packages that leverage existing distributed computing and cloud-interfacing libraries to support intuitive and scalable data access, processing, and interpretation. These tools can be used individually or orchestrated together, which we demonstrate in example use cases for a fisheries acoustic-trawl survey.

Earth, Ocean, Geo, and Atmospheric Science

Room 315
