07-08, 08:00–12:00 (US/Pacific), Room 316
This tutorial walks participants — Earth scientists with some prior Python experience — through analyses of two particular climate risk scenarios: floods & wildfires. The goal is to obtain hands-on experience with common reproducible Jupyter/Python workflows based on data products from the NASA Earthdata Cloud. The case studies highlight the interplay of distributed data with scalable numerical strategies — "data-proximate computing" — implemented using scientific Python libraries like NumPy, Pandas, & Xarray. This tutorial — co-developed by 2i2c and MetaDocencia — constitutes part of NASA's Transform to Open Science (TOPS) initiative to reinforce principles of Open Science & reproducibility.
Predicting and managing the environmental risks of climate-related disasters (e.g., wildfires, drought, and floods) is challenging and critical worldwide. Part of the difficulty is that historical norms (e.g., from the last century) for the frequency of such extreme climate events are no longer sufficient to infer the frequency of future disasters. These natural risks are intrinsically linked to the dynamic distributions, varying both temporally and spatially, of surface water, precipitation, vegetation, and land use. These distributions can be modelled for forecasting and analysis (enabling quantification of these environmental risks) using hundreds of petabytes of relevant Earth science data available through the NASA Earthdata Cloud. With the dramatic growth in the availability of such data, today's Earth scientists benefit from a strong understanding of open science practices and of cloud-based, data-intensive computing that enable reproducible analysis and assessment of changing risk profiles.
This tutorial comprises a walk-through of two such environmental risk scenarios, floods & wildfires, constructing in each case a reproducible analysis using cloud-based infrastructure and data. That is, in each case participants will have an opportunity to identify, extract, analyze, and visualize data available through the NASA Earthdata Cloud and to generate a report from it. Computationally, the scenarios rely on constructing quantitative estimates of changes in hydrological water mass balance over various defined regions of interest. The goal is for participants to build enough familiarity with generic cloud-based Jupyter/Python workflows and with remote-sensing data that they can adapt and remix the examples for other region-specific contexts. Such environmental risks are common worldwide but are simultaneously difficult to analyze and to assess in regionally appropriate ways suitable to a particular user's open science objectives.
The concrete examples presented showcase how to build expertise using relevant tools and data and how open science can be done. The intention is to demonstrate best practices in "data-proximate computing"; the examples involve computing climatologies and other statistics from long time series using numerical techniques that scale well with cloud-distributed data. To do so, participants will practise computing with scientific Python libraries (e.g., Rasterio and Xarray) while focusing on data processing and visualization in a reproducible and transparent way.
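As a rough illustration of what "computing climatologies" from a long time series can look like with Xarray, the following minimal sketch builds a monthly climatology and the corresponding anomalies. It is not taken from the tutorial materials: the data are synthetic, and the grid, the variable name `synthetic_variable`, and the seasonal-plus-trend signal are all assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a remote-sensing product (an assumption for this
# sketch): 10 years of monthly values on a tiny 2 x 3 latitude/longitude grid.
time = pd.date_range("2010-01-01", periods=120, freq="MS")
months = time.month.to_numpy()
seasonal = 10.0 * np.sin(2 * np.pi * (months - 1) / 12)  # repeats every year
trend = 0.01 * np.arange(time.size)                      # slow linear drift
values = (seasonal + trend)[:, None, None] * np.ones((1, 2, 3))

da = xr.DataArray(
    values,
    coords={"time": time, "lat": [10.0, 11.0], "lon": [20.0, 21.0, 22.0]},
    dims=("time", "lat", "lon"),
    name="synthetic_variable",  # hypothetical name
)

# Monthly climatology: the mean over all Januaries, all Februaries, and so on.
climatology = da.groupby("time.month").mean("time")

# Anomalies: each time step minus the climatological value for its month.
anomalies = da.groupby("time.month") - climatology

print(climatology.sizes)
print(anomalies.sizes)
```

The same `groupby` expressions apply unchanged when the array is backed by Dask chunks, in which case Xarray evaluates them lazily; that is one way such computations can scale to data too large to hold in memory.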
This tutorial — co-developed by MetaDocencia & 2i2c — is part of NASA's Open Science and Transform to Open Science (TOPS) initiatives. An important goal is to reinforce principles of reproducibility and open science-based workflows (as exemplified in TOPS OpenCore, the introductory suite of open science curricula including Open Science 101).
This tutorial is designed with the expectation that participants have at least some familiarity with raster data & similar common geospatial data conventions. Ideally, they are comfortable using a shell or a command-line interface to interact with data & programs, using Jupyter notebooks, and writing short snippets of Python code. There will be a brief overview of Xarray, hvPlot, & GeoViews; prior exposure to those Python tools is useful but not mandatory.
Installation Instructions: Attendees will find installation instructions in this repository: https://github.com/ScienceCore/scipy-2024-climaterisk
Dhavide Aruliah has been teaching & mentoring both in academia and in industry for three decades. His career has grown around bringing learners from where they are to where they need to be mathematically & computationally. He was a university professor (Applied Mathematics & Computer Science) at Ontario Tech University before moving to industry where he oversaw training programs supporting the PyData stack at Anaconda Inc. and later at Quansight LLC. He has taught over 40 undergraduate- & graduate-level courses at five Canadian universities as well as numerous Software Carpentry & PyData tutorial workshops. Video examples of his teaching include:
Karthik Venkataramani is a postdoctoral scholar working in the Civil and Environmental Engineering department and the eScience Institute at the University of Washington, Seattle. Dr. Venkataramani's research focuses on developing machine learning tools and models for geospatial applications, and he is currently working on refining Digital Elevation Models (DEMs) using deep learning approaches. Prior to this, Dr. Venkataramani worked as a Postdoctoral Researcher at the NASA Jet Propulsion Laboratory on the Observational Products for End-Users from Remote Sensing Analysis project, which generates a near-global suite of analysis-ready data products from synthetic aperture radar (SAR) and optical data. Dr. Venkataramani received his MS and PhD in Electrical and Computer Engineering from Virginia Tech.
GitHub: https://github.com/kvenkman
LinkedIn: https://www.linkedin.com/in/karthikvenkataramani/
Since 2021, Patricia Loto has collaborated on various projects as a member of the MetaDocencia accessibility team, including the Science Core Bilingual Development project and the Mapping of Communities, Organizations, and Open Science Resources in Latin America. She holds a Bachelor's degree in Information Systems and a Diploma in Data Science, Machine Learning, & its Applications from the FAMAF of the National University of Cordoba. She has taught computational tools and data analysis at varying levels (i.e., for researchers, students, and even people with no formal programming background) at numerous workshops, conferences, and at the Department of Statistical Calculus and Biometry at the Faculty of Agrarian Sciences of the National University of the Northeast. She is also certified by The Carpentries to teach programming and by RStudio as a Tidyverse Instructor. She enjoys learning in community and is an active member of communities such as R-Ladies, The Carpentries, LatinR, OLS, and The Turing Way, where she contributes and learns from others.
• Machine Learning with Tidymodels in LatinR: https://www.youtube.com/watch?v=1ATHGwDPXQs
• First Steps in R: https://www.youtube.com/watch?v=plE4owAKYNA
• LinkedIn: https://www.linkedin.com/in/patricia-loto/
• Website: https://patricia-loto.netlify.app/
• GitHub: https://github.com/PatriLoto