SciPy 2024

Soham Kishor Butala

I'm Soham, a data science graduate from the University of Washington. With four years of diverse experience at Deloitte and AWS, I've delved into software engineering, data engineering, and application security. I'm deeply passionate about data engineering and always eager to embrace new technologies. Beyond the screen and code, I find solace in the great outdoors; hiking is not just an activity for me but a way to rejuvenate my spirit. And when it comes to mental exercises, who can resist the allure of a thrilling game of chess? Looking forward to connecting and exploring the vast horizons of technology and beyond.


Sessions

07-11
14:20
30min
Echostack: A flexible and scalable open-source software suite for echosounder data processing
Don Setiawan, Caesar Tuguinay, Soham Kishor Butala, Brandyn Lucca, Valentina Staneva, Wu-Jung Lee, Dingrui Lei

Water column sonar data collected by echosounders are essential for fisheries and marine ecosystem research, enabling the detection, classification, and quantification of fish and zooplankton from many different ocean observing platforms. However, the broad usage of these data has been hindered by the lack of modular software tools for flexibly composing data processing workflows that incorporate powerful analytical tools from the scientific Python ecosystem. We address this gap by developing Echostack, a suite of open-source Python software packages that leverage existing distributed computing and cloud-interfacing libraries to support intuitive and scalable data access, processing, and interpretation. These tools can be used individually or orchestrated together, which we demonstrate in example use cases for a fisheries acoustic-trawl survey.

Earth, Ocean, Geo, and Atmospheric Science
Room 315
0min
Prefect Workflows for Scaling Scientific Data Pipelines
Valentina Staneva, Soham Kishor Butala, Don Setiawan, Wu-Jung Lee

With the influx of large data from multiple instruments and experiments, scientists are wrangling complex data pipelines that are context-dependent and non-reproducible. In this talk, we will share our experience leveraging the Prefect orchestration framework to allow scientists and data managers without cyberinfrastructure experience to execute complex data workflows on a variety of local and cloud platforms by editing existing recipes. We hope this will serve as a guide to others embarking on streamlining workflows through Prefect or simply wanting to see how modern orchestration tools can be applied in the scientific context.

General