{"schedule": {"version": "0.42", "base_url": "https://cfp.scipy.org/2023/schedule/", "conference": {"acronym": "2023", "title": "SciPy 2023", "start": "2023-07-10", "end": "2023-11-10", "daysCount": 124, "timeslot_duration": "00:05", "time_zone_name": "America/Chicago", "rooms": [{"name": "Classroom 106", "guid": null, "description": null, "capacity": 65}, {"name": "Classroom 101", "guid": null, "description": null, "capacity": 60}, {"name": "Classroom 202", "guid": null, "description": null, "capacity": 65}, {"name": "Classroom 203", "guid": null, "description": null, "capacity": 50}, {"name": "Enthought - 200 W Cesar Chavez St", "guid": null, "description": null, "capacity": null}, {"name": "Amphitheater 204", "guid": null, "description": null, "capacity": null}, {"name": "Grand Salon C", "guid": null, "description": null, "capacity": null}, {"name": "Zlotnik Ballroom", "guid": null, "description": null, "capacity": null}, {"name": "Classroom 103", "guid": null, "description": null, "capacity": 50}, {"name": "Classroom 104", "guid": null, "description": null, "capacity": 50}, {"name": "Classroom 105", "guid": null, "description": null, "capacity": 50}], "days": [{"index": 1, "date": "2023-07-10", "day_start": "2023-07-10T04:00:00-05:00", "day_end": "2023-07-11T03:59:00-05:00", "rooms": {"Classroom 106": [{"id": 100, "guid": "469ae23c-2ab8-54fa-be80-2c7eb043ec9c", "logo": "/media/2023/submissions/DDJTZL/scipy-full-stack-ML_BYOf5z5.png", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 106", "slug": "2023-100-full-stack-machine-learning-for-data-scientists", "url": "https://cfp.scipy.org/2023/talk/DDJTZL/", "title": "Full-stack Machine Learning for Data Scientists", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "One of the key questions in modern data science and machine learning, for businesses and practitioners alike, is how do you move machine learning projects from prototype 
and experiment to production as a repeatable process. In this workshop, we present an introduction to the landscape of production-grade tools, techniques, and workflows that bridge the gap between laptop data science and production ML workflows.", "description": "One of the key questions in modern data science and machine learning, for businesses and practitioners alike, is how do you move machine learning projects from prototype and experiment to production as a repeatable process. In this workshop, we present an introduction to the landscape of production-grade tools, techniques, and workflows that bridge the gap between laptop data science and production ML workflows.\r\n\r\nWe\u2019ll present a high-level overview of the 8 layers of the ML stack: data, compute, versioning, orchestration, software architecture, model operations, feature engineering, and model development. We\u2019ll present a schematic as to which layers data scientists need to be thinking about and working with, and then introduce attendees to the tooling and workflow landscape. In doing so, we\u2019ll present a widely applicable stack that provides the best possible user experience for data scientists, allowing them to focus on parts they like (modeling using their favorite off-the-shelf libraries) while providing robust built-in solutions for the foundational infrastructure.", "recording_license": "", "do_not_record": false, "persons": [{"id": 123, "code": "HMACYR", "public_name": "Hugo Bowne-Anderson", "biography": "Hugo Bowne-Anderson is Head of Developer Relations at Outerbounds, a company committed to building infrastructure that provides a solid foundation for machine learning applications of all shapes and sizes. He is also host of the industry podcast Vanishing Gradients. 
Hugo is a data scientist, educator, evangelist, content marketer, and data strategy consultant, with extensive experience at Coiled, a company that makes it simple for organizations to scale their data science seamlessly, and DataCamp, the online education platform for all things data. He also has experience teaching basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, conferences such as SciPy, PyCon, and ODSC and with organizations such as Data Carpentry. He has developed over 30 courses on the DataCamp platform, impacting over 2 million learners worldwide through his own courses. He also created the weekly data industry podcast DataFramed, which he hosted and produced for 2 years. He is committed to spreading data skills, access to data science tooling, and open source software, both for individuals and the enterprise.", "answers": []}, {"id": 124, "code": "AHDRU7", "public_name": "Savin Goyal", "biography": "Savin is the co-founder and CTO of Outerbounds - where his team is building the modern ML stack to accelerate the impact of data science. Previously, he was at Netflix, where he built and open-sourced Metaflow, a full stack framework for data science.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 43, "guid": "85927222-c17f-57e5-a1e6-f0a139d7869a", "logo": "", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 106", "slug": "2023-43-modern-deep-learning-with-pytorch", "url": "https://cfp.scipy.org/2023/talk/8BZN3E/", "title": "Modern Deep Learning with PyTorch", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "We will kick off this tutorial with an introduction to deep learning and highlight its primary strengths and use cases compared to traditional machine learning. In recent years, PyTorch has emerged as the most widely used deep learning library for research. 
However, a lot has changed regarding how we train neural networks these days. After getting a firm grasp of the PyTorch API, you will learn how to train deep neural networks using various multi-GPU training paradigms. We will also fine-tune large language models (transformers) and deploy them to the cloud.", "description": "This tutorial will be aimed at Python programmers new to PyTorch and deep learning. However, even more experienced deep learning practitioners and PyTorch users may be exposed to new concepts and ideas when exploring other open source libraries to extend PyTorch.\r\n\r\nThroughout this 4-hour tutorial session, attendees will learn how to use PyTorch to train neural networks for image and text classification. We will discuss the individual strengths and weaknesses of deep learning and contrast it with traditional machine learning via libraries such as scikit-learn. \r\n\r\nWe will discuss the PyTorch library in detail, exploring it as a tensor library, automatic differentiation library, and library for implementing deep neural networks.\r\n\r\nAfter getting a firm grasp of the PyTorch API, we will introduce additional open source libraries to familiarize attendees with the modern open source stack for deep learning. For instance, we will organize our model training loops using the Lightning Trainer, which will help us reduce boilerplate code and get additional benefits such as model checkpointing, logging, and convenient mixed precision training.\r\n\r\nThen, we will explore multi-GPU training strategies from DeepSpeed library to accelerate model training if multiple GPUs are available. Note that all model code in this tutorial can be run on a laptop computer, but attendees will also be introduced to free GPU options for this tutorial via Google Colab and Lightning to get the full benefits of this multi-GPU training section.\r\n\r\nLarge language transformers have largely replaced recurrent neural networks for text classification and generation. 
So, in this tutorial, attendees will learn how to adopt and fine-tune large language models from the HuggingFace transformer library. \r\n\r\nLastly, as a bonus, we will also build a deep learning model demo using Gradio and deploy it to the cloud using Lightning.", "recording_license": "", "do_not_record": false, "persons": [{"id": 56, "code": "GVDHSU", "public_name": "Sebastian Raschka", "biography": "Sebastian Raschka is a machine learning and AI researcher with a strong passion for education. As Lead AI Educator at Lightning AI, he is excited about making AI and deep learning more accessible and teaching people how to utilize these technologies at scale.\r\nSebastian was previously an Assistant Professor of Statistics at the University of Wisconsin-Madison. However, in 2023, he resigned from his position to devote himself fully to the Lightning AI startup company, which he had joined in 2022. While working at UW-Madison, Sebastian focused on researching deep learning and machine learning. To learn more about his research, you can visit his website at https://sebastianraschka.com/publications.\r\n\r\nMoreover, Sebastian loves open-source software and has been a passionate contributor for over a decade. Next to coding, he also loves writing and authored the bestselling Python Machine Learning book and Machine Learning with PyTorch and Scikit-learn.\r\n\r\nIf you like to find out more about Sebastian and what he is currently up to, please visit his personal website at https://sebastianraschka.com. 
You can also find Sebastian on Twitter (@rasbt), Mastodon (@mastodon.social@SebRaschka), and LinkedIn (https://www.linkedin.com/in/sebastianraschka/)", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 101": [{"id": 201, "guid": "21d95432-3463-5dc4-97bf-5b26e6cdf89e", "logo": "/media/2023/submissions/CJUYJM/tutorial_image_yUX7wUY.png", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 101", "slug": "2023-201-mosaic-magic-with-matplotlib", "url": "https://cfp.scipy.org/2023/talk/CJUYJM/", "title": "Mosaic Magic with Matplotlib", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Communicating scientific data often relies on making comparisons between multiple datasets.\r\nJoin the Matplotlib team to learn about creating multi-axis figures to display such data side-by-side.\r\nThis intermediate level tutorial will cover a variety of tools for making multi-axis figures.\r\nOf particular focus will be the [subplot_mosaic](https://matplotlib.org/stable/gallery/subplots_axes_and_figures/mosaic.html) and the layout engines: tight, constrained, and compressed.\r\nThis tutorial will emphasize the use of Matplotlib's Object Oriented (OO) API and why that is generally recommended over the pyplot (plt) API.", "description": "This tutorial is designed for users of Matplotlib who want to learn more about how to lay out complicated figures.\r\nBring a figure you like that you want to replicate the layout of or one that you'd like to improve.\r\n\r\n\r\n- Introduction (10 mins)\r\n- Parts of a figure. What makes up a figure (20 mins)\r\n  - (Build up to: https://matplotlib.org/stable/gallery/showcase/anatomy.html)\r\n- Creating a figure with a single axes (10 mins)\r\n- Object oriented model of interacting with axes (20 mins)\r\n    - e.g. 
Prefer `ax.plot` over `plt.plot`\r\n- Multi axes figures (~1.5 hr):\r\n    - `subplots` (10 mins)\r\n    - `subplot_mosaic` (30 mins)\r\n    - `grid_spec` (20 mins)\r\n    - `subplot2grid` (5 mins)\r\n    - `add_axes` (5 mins)\r\n    - `add_subplot` (5 mins)\r\n    - Inset and zoomed axes (5 mins)\r\n- Layout engines (30 mins)\r\n    - Introduction (10 mins)\r\n    - Constrained Layout (10 mins)\r\n    - Compressed Layout (5 mins)\r\n    - Tight Layout (5 mins)\r\n- Labeling figures (20 mins)\r\n    - Axis/figure labels (10 mins)\r\n    - Legends (5 mins)\r\n    - Colorbars (5 mins)\r\n- Subfigures (10 mins)\r\n- Conclusions/questions (20 mins)\r\n\r\n\r\nDetailed setup instructions will be provided prior to the event.", "recording_license": "", "do_not_record": false, "persons": [{"id": 233, "code": "VZCYWU", "public_name": "Kyle Sunden", "biography": "Kyle is a Research Software Engineer working for Matplotlib with a focus on the data pipeline.\r\nKyle has a PhD in Chemistry from the University of Wisconsin where he made software to control laser spectroscopy instrumentation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 96, "guid": "ad92273d-f765-52b7-8fb9-b26d2132a766", "logo": "/media/2023/submissions/LZPDBD/pyvista_banner_HZ7nr2j.png", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 101", "slug": "2023-96-3d-visualization-with-pyvista", "url": "https://cfp.scipy.org/2023/talk/LZPDBD/", "title": "3D Visualization with PyVista", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "[PyVista](https://github.com/pyvista/pyvista) is a general purpose 3D visualization library used for over 1400+ open source projects for the visualization of everything from [computer aided engineering and geophysics to volcanoes and digital artwork](https://dev.pyvista.org/getting-started/external_examples.html).\r\n\r\nPyVista exposes a Pythonic API to the [Visualization 
Toolkit (VTK)](http://www.vtk.org) to provide tooling that is immediately usable without any prior knowledge of VTK and is being built as the 3D equivalent of Matplotlib, with plugins to Jupyter to enable visualization of 3D data using both server- and client-side rendering.", "description": "Our tutorial will demonstrate PyVista's latest capabilities and bring a wide range of users to the forefront of 3D visualization in Python.\r\n\r\n- Use PyVista to create 3D visualizations from a variety of datasets in common formats.\r\n- Get an overview of the classes and data structures of PyVista with real-world examples.\r\n- Become familiar with the various filters and features of PyVista.\r\n- Know which Python libraries are used and can be used by PyVista (meshio, trimesh, etc.).\r\n\r\nWe see this tutorial catering to anyone who wants to visualize data in any domain, from basic Python users to advanced power users.", "recording_license": "", "do_not_record": false, "persons": [{"id": 117, "code": "NEC33M", "public_name": "Bane Sullivan", "biography": "[Bane Sullivan](https://banesullivan.com), co-creator of [PyVista](https://github.com/pyvista/), is a Research Software Engineer working at the intersection of geoscience, visualization, and data science.\r\n\r\nBane is a geophysicist/hydrologist by training and has been working to grow PyVista's adoption within the subsurface geoscience communities.", "answers": []}, {"id": 118, "code": "WY7NA9", "public_name": "Tetsuo Koyama", "biography": "Hi! My name is Tetsuo Koyama. I'm a CAE software engineer in Japan. I'm interested in scientific computing and visualization with computer graphics. 
I am a committer to GetFEM and a member of the [PyVista](https://github.com/orgs/pyvista/people) developer team.", "answers": []}, {"id": 391, "code": "3RXHG8", "public_name": "Alexander Kaszynski", "biography": "Alex Kaszynski, co-creator of [PyVista](https://docs.pyvista.org/version/stable/) and creator of the [PyAnsys](https://docs.pyansys.com/version/dev/) project.\r\n\r\nAdvocate for all things open source and has contributed to the creation of Ansys\u2019s open source projects at Ansys and [PyMAPDL](https://github.com/ansys/pymapdl). Enjoys presenting and demoing Python, especially 3D visualization but also its application to CAE and automation.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 105": [{"id": 192, "guid": "fe847e13-2b39-5ccd-b7af-e2bd1ec5db50", "logo": "", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 105", "slug": "2023-192-building-better-data-structures-apis-and-configuration-systems-for-scientific-software-using-pydantic", "url": "https://cfp.scipy.org/2023/talk/VBZ9PN/", "title": "Building better data structures, APIs and configuration systems for scientific software using Pydantic", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "This tutorial is an introduction to Pydantic, a library for data validation and settings management using Python type annotations. Using a semi-realistic ML and / or scientific software pipeline scenario we demonstrate how Pydantic can be used to support type validations for scientific data structures, APIs and configuration systems. We show how the use of Pydantic in scientific and ML software leads to a more pleasant user experience as well as more robust and easier to maintain code. 
A minimum knowledge of Python type annotations, class definitions and data structures will be helpful\r\nfor beginners but not required.", "description": "One of the most controversial design choices of Python is the use of dynamic types. Dynamic types of variables can often lead to confusion for beginners, and even for experts they are a common source of hard-to-find bugs. For this reason the concept of type annotations was later introduced into the language to allow for static code analysis and more detailed source code documentation. Pydantic is a Python library that makes use of these type annotations to parse and validate types for class based data structures. In recent years Pydantic has gained tremendous popularity among web developers and is now the most widely used data validation library for Python. In this tutorial we show how the use of Pydantic can help to build better data structures, APIs and configuration systems for scientific Python packages as well. In many cases the validated types lead to a more pleasant user experience as well as more robust and easier to maintain code.\r\n\r\nIn the first block we introduce the basics of the library such as the concept of Pydantic models, type annotations and atomic types such as int, float, str etc. We show how types are parsed and how models can be configured to forbid extra attributes. At the end of the block participants will try to implement the first Pydantic model and explore basic configuration settings.\r\n\r\nWe then proceed with the introduction of more complex types, such as typed dicts, Enums and date time objects. We will also cover custom types, which can be used to build nested models. Then we introduce the basics of type validation for multiple scenarios, such as pre and post init and root validation. At the end of this block we will cover the topic of dynamic model creation. 
In the following hands-on session participants will implement a more complex Pydantic model representing the response from a weather data API at multiple levels of difficulty. \r\n\r\nThe subsequent block will be dedicated to serialization and deserialization of Pydantic models. We will first motivate the need and then introduce the JSON and YAML data formats. We will show how to support custom types for JSON serialization and give an overview of configuration options related to serialization. We will conclude with remarks on performance when serializing a large number of models. In the corresponding hands-on exercise participants will use the weather data structure and build a small configurable data processing pipeline which visually compares the weather forecast data from different models. \r\n\r\nFinally we will give a summary and key takeaways of the tutorial and recommend additional resources for learning Pydantic.", "recording_license": "", "do_not_record": false, "persons": [{"id": 202, "code": "DBFWBP", "public_name": "Axel Donath", "biography": "I'm a Postdoc researcher at the Center for Astrophysics. My research interests include the Galactic X-Ray and Gamma-Ray source populations as well as statistical methods for analysis of low counts data in general. I'm also interested in methods to combine data from multiple instruments. I'm the lead developer of the open source software package Gammapy, sub-package maintainer of Astropy and a member of the CHASC astro-statistics collaboration. I'm also editor for the Astronomy and Astrophysics track of the Journal of Open Source Software JOSS.", "answers": []}, {"id": 404, "code": "TR8USM", "public_name": "Nick Langellier", "biography": "I am a senior machine learning engineer at VideaHealth, Inc. where I am currently developing AI models for automatic detection of dental diseases. 
My background is in astrophysics and I have 4 years experience as a teaching assistant at the University of Illinois at Urbana-Champaign and Harvard University. The coursework ranged from introductory to mid-level physics in both theoretical and laboratory settings. I have contributed to several conferences, most notably an invited talk at an exoplanet conference in G\u00f6ttingen, Germany. There I presented my Ph. D work on improving exoplanet analysis pipelines through the use of machine learning.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 195, "guid": "93d23046-1fdb-5859-a6ad-fed209b86d3b", "logo": "/media/2023/submissions/RKV3PZ/Screenshot_2023-03-01_104725_3q3B65N.png", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 105", "slug": "2023-195-meet-your-coding-best-friend-vs-code-a-hands-on-tutorial-on-how-to-get-the-most-out-of-the-world-s-most-popular-python-editor", "url": "https://cfp.scipy.org/2023/talk/RKV3PZ/", "title": "Meet your coding best friend: VS Code\ud83d\udc96 - A hands-on tutorial on how to get the most out of the world\u2019s most popular Python editor", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Visual Studio Code (VS Code) is a free code editor that runs on Windows, Linux, macOS and in your browser. This tutorial aims at Python programmers of all levels who are already using VS Code or are interested in doing so, and will take them from zero (installing VS Code) to a production setup for Python development. We will cover starter topics, such as customizing the UI and extensions, using code autocomplete, code navigation, debugging, and Jupyter Notebooks. 
We will also go into advanced use cases, such as remote development, pair programming via Live Share, Dev containers, GitHub Codespaces & more.", "description": "After this tutorial you will walk away with a fully equipped VS Code editor, ready to work on your next project or contribute to your favorite Scientific Python library. We will also cover tips and tricks for data science and visualization and some advanced features you may not have heard of yet.\r\n\r\n\r\nWe will cover the following topics:\r\n\r\n**The basics: VS Code editor and Python extension overview**. We will show you how to set up your editor, where to find the most useful menus and settings, and how to set up your workspace to start developing. We\u2019ll explain how to find and install our favorite extensions for Python, how to use VS Code with Git.\r\n\r\n**Scientific Python development tips and tricks**. In the second hour, we will cover how to navigate and test your Python code like a pro. We will also cover some data science tools that will help you run your favorite data analysis projects directly in VS Code, as well as some GitHub features to test and document your code.\r\n\r\n**Advanced development Part I: Work where you want to**. The third hour of the tutorial explains how to use the remote development extensions pack to hook up VS Code to a remote resource, like a powerful VM in the cloud, a local Linux instance, a Docker instance or GitHub Codespaces.\r\n\r\n**Advanced development Part II: Tools that make you look like you know magic**. It\u2019s time to have some fun and try out cool features for remote collaboration and code generation that will make you feel like the future is here.\r\n\r\n**Wrap-up & Epilogue**. 
We\u2019ll recap what we\u2019ve learned in this tutorial and share reading and learning materials to help you on your VS Code journey.", "recording_license": "", "do_not_record": false, "persons": [{"id": 169, "code": "UFJEM8", "public_name": "Guen Prawiroatmodjo", "biography": "Guen Prawiroatmodjo is a physicist and software engineer at Microsoft Quantum. She studied Applied Physics at Delft Technical University and obtained a PhD in condensed matter physics experiment from the Niels Bohr Institute at the University of Copenhagen in Denmark. Her expertise is in quantum device characterization and control at cryogenic temperatures for developing quantum computing elements, and has broad experience with software engineering, data engineering and data science in the context of experimental data acquisition and analysis. A large part of her role is educating her fellow physicists and engineers to level up their Python development skills by adopting software best practices in their work. Guen has given introductory talks and workshops on Quantum Computing at various conferences, hackathons and events. Guen is a co-organizer of the SciPy conference and serves on the Program committee.", "answers": []}, {"id": 228, "code": "EJ97FS", "public_name": "Sarah Kaiser", "biography": "Sarah has spent most of her career developing technology in the lab, from virtual reality hardware to satellites. She got her PhD in Physics by starting plasma fires with lasers, Python, and Jupyter Notebooks. She has also written tech books for folks of all ages, including ABCs of Engineering and Learn Quantum Computing with Python and Q#.  As a Cloud Developer Advocate for Python at Microsoft and a Python Software Foundation Fellow, she finds all kinds of new ways to build and break OSS tools for data science and machine learning. 
When not at her split ergo keyboard, she loves boating in the Seattle area, laser cutting everything, and playing with her German Shepherd, Chewie.", "answers": []}, {"id": 164, "code": "CJCTZB", "public_name": "Leopold Talirz", "biography": "Leopold studied physics, and then spent a decade working as a computational materials scientist, solving nature\u2019s riddles through atomistic simulations and writing software to make materials science more open, reproducible, and accessible. In 2021 he joined Microsoft Quantum to supercharge atomistic simulations via the cloud and, eventually, quantum computing. Leopold is a core contributor to the Python-based open-source [AiiDA workflow manager](https://github.com/aiidateam/aiida-core) as well as the [Materials Cloud platform](https://www.materialscloud.org/home) for seamless sharing of resources in computational materials science. He serves on the NumFOCUS committee for evaluating affiliated project applications and is co-chairing the chemistry & materials session at SciPy this year.\r\n\r\nBesides talks at scientific conferences, Leopold organized AiiDA tutorials in Switzerland, the Netherlands, Norway, and China ([sample video](https://www.youtube.com/watch?v=bjTUnHXZ6oY)), including live [hands-on lectures on how to code AiiDA plugins in Python](https://www.youtube.com/watch?v=760O2lDB-TM).", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 202": [{"id": 142, "guid": "bc1f685e-52e2-53fb-ba1d-ee852d3f8576", "logo": "/media/2023/submissions/NEUUKG/napari-cells_3afAN5z.png", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 202", "slug": "2023-142-image-analysis-and-visualization-in-python-with-scikit-image-napari-and-friends", "url": "https://cfp.scipy.org/2023/talk/NEUUKG/", "title": "image analysis and visualization in Python with scikit-image, napari, and friends", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", 
"abstract": "Between telescopes and satellite cameras and MRI machines and microscopes, scientists are producing more images than they can realistically look at. They need specialized viewers for multi-dimensional images, and automated tools to help process those images into knowledge. In this tutorial, we will cover the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions. At every step, we will visualize and understand our work using matplotlib and napari.", "description": "Between telescopes and satellite cameras and MRI machines and microscopes, scientists are producing more images than they can realistically look at. They need specialized viewers for multi-dimensional images, and automated tools to help process those images into knowledge.\r\n\r\nThis tutorial is aimed at folks who have some experience in scientific computing with Python, but are new to image analysis. To get the most out of it, they should have done some work with NumPy arrays \u2014 no need to be an expert! \u2014 but they don't need to know [an image from a pipe](https://en.wikipedia.org/wiki/The_Treachery_of_Images). We will cover the fundamentals of working with images in scientific Python. The tutorial will be split into four parts, of about 45 minutes each, plus breaks:\r\n\r\n- **Images are just NumPy arrays.** In this section we will cover the basics: how to think of images not as things we can see but numbers we can analyze.\r\n- **Changing the structure of images with image filtering.** In this section we will define *filtering*, a fundamental operation on signals (1D), images (2D), and higher-dimensional images (3D+). We will use filtering to find various structures in images, such as *blobs* and *edges*. 
Putting NumPy, SciPy, scikit-image, and scikit-learn together, we'll show how these fundamental filters are related to modern convolutional neural networks.\r\n- **Finding regions in images and measuring their properties.** In this section we will define image segmentation \u2014 splitting up images into regions. We will show how segmentation is commonly represented in the scientific Python ecosystem, some basic and advanced methods to do it, and use it to take measurements of segmented objects in our images. We will use scikit-image for some basics, and to make object measurements, but we'll also demonstrate how to use a modern, neural-network-based library to find our imaged objects quickly and get on with our science: measuring the things we've imaged.\r\n- **Q&A/Quick tour of advanced features.** This section will be more freestyle and will depend on the audience. We may do a guided tour of other advanced image analysis topics, answer lingering questions about the previous sections, or walk around the room and help people apply what they've learned to their own data of interest.\r\n\r\nAttendees will leave understanding how to work with images in Python, knowing some of the main libraries that can help them do that, and knowing where to get more help if they need it.", "recording_license": "", "do_not_record": false, "persons": [{"id": 166, "code": "QUTG3K", "public_name": "Juan Nunez-Iglesias", "biography": "I'm a research scientist helping other scientists get insights from their image data using Python. I've been using Python since 2008, and the main scientific Python ecosystem (NumPy, SciPy, & co) since 2010. In 2012, on a whim, I went to my first SciPy (US) conference, and it changed my life! I realised that \"open source\" didn't mean just posting the code online. It meant actively collaborating on code with other scientists, across vast distances and at different times. 
Before you could say \"import numpy as np\", I had joined the scikit-image team, written a paper about it, written a whole book on SciPy (!), started new collaborative, open source libraries, and just generally been all-in on Scientific Python. I've been coming back to SciPy as often as I can to pay it forward for new folks in our community! \ud83d\ude0a", "answers": []}, {"id": 417, "code": "GHV9KN", "public_name": "Lars Gr\u00fcter", "biography": null, "answers": []}, {"id": 418, "code": "RRVUEQ", "public_name": "Kira Evans", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 23, "guid": "e7544890-022c-578d-a12f-800e7f81c0d2", "logo": "/media/2023/submissions/UJBWPQ/numpy_logo_p3rGRbD.svg", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 202", "slug": "2023-23-introduction-to-numerical-computing-with-numpy", "url": "https://cfp.scipy.org/2023/talk/UJBWPQ/", "title": "Introduction to Numerical Computing With NumPy", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "NumPy provides Python with a powerful array processing library and an elegant syntax that is well suited to expressing computational algorithms clearly and efficiently. We'll introduce basic array syntax and array indexing, review some of the available mathematical functions in NumPy, and discuss how to write your own routines.", "description": "NumPy provides Python with a powerful array processing library and an elegant syntax that is well suited to expressing computational algorithms clearly and efficiently. We'll introduce basic array syntax and array indexing, review some of the available mathematical functions in NumPy, and discuss how to write your own routines\r\n\r\nThe tutorial is intended for people new to the scientific Python ecosystem. 
Previous experience in Python or another programming language is useful but not required.", "recording_license": "", "do_not_record": false, "persons": [{"id": 35, "code": "DDKG73", "public_name": "Sandhya Govindaraju", "biography": "Sandhya is a Scientific Software Developer & Python Trainer at Enthought. Earlier, she supported CAD tools for microprocessor design at Sun Microsystems and Oracle. She holds an M.S. in Electrical and Computer Engineering from the University of Texas at Austin.\r\n\r\nSandhya enjoys learning new things and is passionate about sharing her knowledge and experience with others. Outside of work, she spends time with family and volunteers.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 203": [{"id": 199, "guid": "c1f6cab5-4099-52f9-9c6f-ac60fbc8d21d", "logo": "", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 203", "slug": "2023-199-ppml-machine-learning-on-data-you-cannot-see", "url": "https://cfp.scipy.org/2023/talk/B9CHA7/", "title": "PPML: Machine Learning on data you cannot see", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "A privacy guarantee is **the** most crucial requirement when it comes to analysing sensitive data. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, Machine Learning models could also be exploited to _leak_ sensitive data when _attacked_ if no counter-measure is applied. *Privacy-preserving machine learning* (PPML) methods hold the promise to overcome all these issues, allowing machine learning models to be trained with full privacy guarantees. In this tutorial we will explore several methods for privacy-preserving data analysis, and how these techniques can be used to safely train ML models _without_ actually seeing the data.", "description": "Privacy guarantees are **the** most crucial requirement when it comes to analysing sensitive data. 
These requirements can sometimes be so stringent that they become a real barrier for the entire pipeline. The reasons for this are manifold, and include the fact that data often cannot be _shared_ or moved from the silos where they reside, let alone analysed in their _raw_ form. As a result, _data anonymisation techniques_ are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the _memorisation_ effect of Deep learning models could be maliciously exploited to _attack_ the models and _reconstruct_ sensitive information about samples used in training, even if this information was not originally provided. \r\n\r\n*Privacy-preserving machine learning* (PPML) methods hold the promise to overcome all those issues, allowing machine learning models to be trained with full privacy guarantees.\r\n\r\nThis workshop will be organised in **three** main parts. In the first part, we will introduce the main concepts of **differential privacy**: what it is, and how this method differs from more classical _anonymisation_ techniques (e.g. `k-anonymity`). In the second part, we will focus on Machine learning experiments. We will start by demonstrating how DL models could be exploited (i.e. _inference attack_) to reconstruct original data solely by analysing the models' predictions; and then we will explore how **differential privacy** can help us protect the privacy of our model, with _minimum disruption_ to the original pipeline. 
Finally, we will conclude the tutorial considering more complex ML scenarios to train Deep learning networks on encrypted data, with specialised _distributed federated_ _learning_ strategies.", "recording_license": "", "do_not_record": false, "persons": [{"id": 139, "code": "FRA7AF", "public_name": "Valerio Maggio", "biography": "Valerio Maggio is a Researcher, a Data scientist Advocate at Anaconda, and a casual \"Magic: The Gathering\" wizard. He is well versed in open science and research software, supporting the adoption of best software development practice (e.g. [Code Review](https://www.software.ac.uk/blog/2022-03-18-treat-your-research-code-code-review)) in Data Science.  Valerio is also an open-source contributor, and an active member of the Python community. Over the last twelve years he has contributed and volunteered to the organization of many international conferences and community meetups like PyCon Italy, PyData, EuroPython, and EuroSciPy. All his talks, workshop materials and random ramblings are publicly available on his[ Speaker Deck](https://speakerdeck.com/leriomaggio) and[ GitHub](https://github.com/leriomaggio) profiles.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 1, "guid": "8b7e8e24-7ad2-58af-a436-5a2f973162e1", "logo": "", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 203", "slug": "2023-1-introduction-to-causal-inference", "url": "https://cfp.scipy.org/2023/talk/CQRYUC/", "title": "Introduction to Causal Inference", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "This tutorial session is intended to give attendees a gentle introduction to applying causal thinking and causal inference to data using python. Causal data analysis is very common in many academic domains (e.g. 
in social psychology, epidemiology, macroeconomics, etc) as well as in industry (all of the largest Silicon Valley tech companies employ teams of scientists who answer business questions purely with causal inference methods). The tutorial will involve a combination of presentations with open Q&A and hands-on exercises contained in Google Colab notebooks.", "description": "The tutorial will involve a combination of presentations with open Q&A and hands-on exercises contained in Google Colab notebooks. This session will cover the difference between correlation and causation, the pitfalls of conducting an analysis using observational data, how causal inference can help get around these pitfalls, and examples of common, modern modeling approaches used to conduct causal inference (propensity score matching, estimating causal curves, g-computation, and double ML). After the tutorial, the attendees should have a good foundational understanding of causality and the ability to confidently explore the topic on their own. Causal inference can be a very theory-heavy topic, making it impenetrable to novices. In this tutorial, we'll aim to take a more practical perspective on causal inference, while still occasionally touching on the theory.\r\n\r\nTutorial participants are not expected to be familiar with causal inference before attending, but we hope they have an earnest curiosity to learn about it! To get the most out of the session, the participants ought to have experience working with the common python data stack: matplotlib, numpy, pandas, and scikit-learn. Attendees should have some experience conducting classic machine learning modeling using the scikit-learn API, although having advanced machine learning expertise is absolutely not a prerequisite. A very basic understanding of statistics would be helpful (e.g. 
understanding what a mean is, what confidence intervals represent).", "recording_license": "", "do_not_record": false, "persons": [{"id": 6, "code": "PVWVHW", "public_name": "Roni Kobrosly", "biography": "I am a former epidemiology researcher who has spent approximately a decade employing causal modeling and inference. The bulk of my academic career was spent conducting data analyses to estimate the population-level effects of harmful environmental exposures, when traditional randomized experiments were infeasible or unethical. During this time, I taught a couple of undergraduate epidemiology courses, one of which involved a sizable introduction to causal thinking. I've also given many one-off departmental presentations and spoken at a few epidemiology conferences, on causal inference in both cases.\r\n\r\nSince leaving the academic world, I've been loving my second life in the tech industry as a data scientist, ML engineer, and more recently as the Head of Data Science at a medium-sized health tech company based in Washington DC. I love mentoring junior data folks and explaining the magic of data analysis and modeling to non-technical audiences.\r\n\r\nI also am a member of the open-source community, being the author and maintainer of the `causal-curve` python package. This package provides a set of tools for estimating the causal impact of continuous/non-binary treatments (e.g. 
estimating the causal impact of a neighborhood's income inequality on local crime, or understanding the causal effect of increasing a product's price on conversion rates).", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 103": [{"id": 112, "guid": "37a89625-ae8d-5337-94c7-6ebfd8ec28fd", "logo": "", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 103", "slug": "2023-112-introduction-to-python-and-programming", "url": "https://cfp.scipy.org/2023/talk/CDRJYE/", "title": "Introduction to Python and Programming", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Enjoy a gentle introduction to Python for folks who are completely new to it and may not have much experience programming. Learn how to write Python while practicing loops, if\u2019s, functions, and usage of Python\u2019s built-in features in a series of fun, interactive exercises inside Jupyter Notebooks. By the end you\u2019ll be ready to write your own basic Python -- but most importantly, I want you to learn the form and vocabulary of Python so that you can understand Python documentation, interpret code written by others, and get the most out of other SciPy tutorials.", "description": "To make the most of SciPy it helps to have some basic familiarity with the Python language itself. This beginner level tutorial is designed for folks who are brand-new to Python and may not even have much programming experience. I\u2019ll help you get a working Python installation in which you can launch Jupyter Notebooks, a common tool used in scientific research with Python and in SciPy tutorials.\r\n\r\nAttendees will learn to work with Python variables, the object interface, loops, conditional statements, function definitions, and the use of basic Python data structures through hands-on exercises inside of Jupyter. 
Students will use the ipythonblocks library to manipulate an image-like grid of colors for immediate, interactive feedback that makes it easy to tell whether code had the intended effect.\r\n\r\nMy goal is for you to leave the tutorial with a basic familiarity with Python (and a working Python installation) that helps you focus on the scientific libraries you\u2019ll learn about in the other tutorials and throughout SciPy. Familiarity with the usage and features of Jupyter will also help you dive headfirst into other tutorials.", "recording_license": "", "do_not_record": false, "persons": [{"id": 133, "code": "R7PFJV", "public_name": "Matt Davis", "biography": "Matt has been using Python to work with data in science and at startups since 2008, after getting degrees in Astronomy and Aerospace Engineering. He maintains some moderately popular open-source Python libraries, including SnakeViz and Palettable. Today Matt is the lead software engineer at Populus, a startup helping city governments manage various aspects of transportation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 190, "guid": "4f439f9d-0973-5c70-8be5-352502815c66", "logo": "/media/2023/submissions/7BRY3J/e2e_air_2YbrESe.png", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 103", "slug": "2023-190-scalable-machine-learning-workloads-with-ray-ai-runtime", "url": "https://cfp.scipy.org/2023/talk/7BRY3J/", "title": "Scalable machine learning workloads with Ray AI Runtime", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Machine learning (ML) pipelines involve a variety of computationally intensive stages. As state-of-the-art models and systems demand more compute, there is an urgent need for adaptable tools to scale ML workloads. 
This idea drove the creation of Ray\u2014an open source, distributed ML compute framework that not only powers systems like ChatGPT but also pushes theoretical computing benchmarks. Ray AIR is especially useful for parallelizing ML workloads such as pre-processing images, model training and finetuning, and batch inference. In this tutorial, participants will learn about AIR\u2019s composable APIs through hands-on coding exercises.", "description": "State-of-the-art machine learning (ML) models require an exponentially increasing amount of compute, making it necessary to utilize the full capacity of your laptop or workstation and beyond to cloud cluster.  However, scaling introduces challenges with orchestration, integration, and maintenance. What's more, ML systems change quickly. If you rely on piecemeal solutions to parallelize individual stages of pre-processing, training, inference, and tuning, then stitching these evolving systems together requires a lot of overhead.\r\n\r\nThis context drove the development of [Ray](https://github.com/ray-project/ray): a solution to enable researchers and developers to scale Python code to the full capacity of your laptop or cluster without worrying about implementing complex distributed computing logic.\r\n\r\nThis hands-on tutorial introduces Ray AI Runtime (AIR), an open source, Python-based set of libraries that equip researchers and developers with a toolkit for parallelizing ML workloads. 
We will use a popular computer vision (CV) use case, image segmentation, to guide participants through common ML workloads, including data pre-processing, model training and fine-tuning, and parallel batch inference.\r\n\r\n#### Resources\r\n\r\n-   GitHub repository with relevant resources including notebooks, setup instructions, reference implementations to coding exercises, and a README for an overview.\r\n\r\n-   Participants will be able to use a pre-configured compute cluster for the duration of the tutorial.\r\n\r\n#### Audience\r\n\r\n-   Intermediate-level Python and ML researchers and developers.\r\n\r\n-   Those interested in scaling ML workloads up to full laptop capacity to a cluster.\r\n\r\n#### Prerequisites\r\n\r\n-   Familiarity with basic ML concepts and workflows.\r\n\r\n-   No prior experience with Ray or distributed computing.\r\n\r\n-   (Optional) [Overview of Ray](https://github.com/ray-project/ray-educational-materials/blob/main/Introductory_modules/Overview_of_Ray.ipynb) notebook as background material.\r\n\r\n#### Key Takeaways\r\n\r\n-   Understand common challenges and trade-offs when scaling CV pipelines from laptop to cluster.\r\n\r\n-   Hands-on skill in using Ray AIR to scale CV workloads, including model training, fine-tuning, inference.\r\n\r\n#### Outline\r\n\r\nChallenges with scaling ML systems (10 min)\r\n\r\n-   Why are distributed systems so important to ML in general and CV pipelines in particular? How does Ray provide the common ML compute scale from laptop to cluster?\r\n\r\nHands-on lab 1: Composing CV pipelines (60 min)\r\n\r\n-   Examples introducing Ray Data, Train and Tune libraries. 
Participants will practice composing components to scale an end-to-end ML workload.\r\n\r\n-   Ray Data - Ingest, shard and preprocess the data.\r\n\r\n-   Ray Train - Train a model on the preprocessed training set.\r\n\r\n-   Ray Tune - Run hyperparameter tuning experiment.\r\n\r\n-   BatchPredictor - Perform batch inference on the test set.\r\n\r\n(10 minute break)\r\n\r\nHands-on lab 2: Model training and fine-tuning (60 min)\r\n\r\n-   Learn about approaches to scaling model training.\r\n\r\n-   Code: Implement transformer model fine-tuning with Ray Train and evaluate performance.\r\n\r\n(10 minute break)\r\n\r\nHands-on lab 3: Batch inference (60 min)\r\n\r\n-   Learn about and evaluate several distributed batch inference design patterns.\r\n\r\n-   Implement distributed batch inference through hands-on coding exercises.\r\n\r\n-   Code: Run batch inference using vision transformer and evaluate performance.\r\n\r\nNext steps (10 min)\r\n\r\n-   How to get involved with Ray and access further resources.", "recording_license": "", "do_not_record": false, "persons": [{"id": 188, "code": "U73JSP", "public_name": "Emmy Li", "biography": "Emmy is a technical trainer at Anyscale Inc. She holds a B.Sc in Physics from Stanford University where she contributed toward computational astrophysics research at the Stanford Linear Accelerator Laboratory and NASA\u2019s Jet Propulsion Laboratory. Emmy is passionate about creating high quality educational materials and sharing them with the broader Ray community.", "answers": []}, {"id": 450, "code": "VHQUSA", "public_name": "Adam Breindel", "biography": "Adam Breindel is a member of the Anyscale training team and he consults and teaches on large-scale data engineering and AI/machine learning. He has served as technical reviewer for numerous O'Reilly titles covering Ray, Apache Spark, and other topics. 
Adam's 20 years of engineering experience include numerous startups and large enterprises with projects ranging from AI/ML systems and cluster management to web, mobile, and IoT apps. He holds a BA (Mathematics) from University of Chicago and a MA (Classics) from Brown University. Adam's interests include hiking, literature, and complex adaptive systems.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 104": [{"id": 172, "guid": "519879b0-a301-50a6-94fb-1cbfcc04838b", "logo": "/media/2023/submissions/ZCUDYT/rocket_EeNrpQX.png", "date": "2023-07-10T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 104", "slug": "2023-172-controlling-self-landing-rockets-using-cvxpy", "url": "https://cfp.scipy.org/2023/talk/ZCUDYT/", "title": "Controlling Self-Landing Rockets Using CVXPY", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "In this tutorial, attendees will learn hands-on how to optimize the trajectory of a self-landing rocket in a real-time simulated setting using CVXPY, a Python-embedded modeling language for convex optimization. We integrate the optimization with the Kerbal Space Program, to showcase a complete landing mission without human intervention, ideally in one piece. CVXPY allows solving complex problems declaratively, letting convex optimization find an optimal way of meeting target conditions with respect to an objective function. After solving the initial problem, attendees will use a selection of advanced CVXPY features while making the example gradually more realistic.", "description": "After giving an introduction to CVXPY at SciPy 2022, we want to follow up and provide an in-depth, worked example about one of the most inquired applications of convex optimization: controlling a self-landing rocket. 
Indeed, this is also one of the most complex problems to solve, and practical usefulness has only recently been achieved.\r\nNevertheless, CVXPY makes it possible to elegantly solve a simplified yet at its core realistic version of the problem. The application serves as a common thread that attendees can work along while being introduced to convex optimization and CVXPY in particular, as well as some of the more advanced features of the library.\r\n\r\nThe tutorial will start by introducing the problem of controlling a self-landing rocket and why it is important. We will then provide an overview of convex optimization and how it can be used to solve this problem. Next, we will dive into the details of CVXPY, starting from a simple hello-world example and gradually moving towards expressing the full problem. Stating the problem should look familiar to anyone who has worked with NumPy before, and only requires high-school level physics knowledge to understand. \r\n\r\nWe have integrated our problem with the Kerbal Space Program, which fits the theme of our tutorial nicely. It allows us to make our problem gradually more realistic by incorporating conditions such as drag, fuel usage, and wind. We will run the scripts written by the attendees to see if they manage to land a rocket safely.\r\n\r\nAs we solve the problem, we will showcase some of the more advanced features of CVXPY, including DPP and CVXPYgen, which can give a significant speedup in practice. \r\n\r\nBy the end of the tutorial, attendees will have a thorough understanding of how to use CVXPY to solve complex optimization problems, and how to apply it to real-world problems such as controlling a self-landing rocket. 
No prior knowledge of convex optimization is assumed, making this tutorial accessible to beginners in the field.", "recording_license": "", "do_not_record": false, "persons": [{"id": 203, "code": "WCFJQM", "public_name": "Philipp Schiele", "biography": "Main instructor Philipp Schiele\r\nPhilipp Schiele's educational background is in finance and economics and he is currently pursuing a PhD in financial econometrics at the Ludwig Maximilian University of Munich, where he taught various courses in statistics. He is a CVXPY maintainer and has presented a tutorial at SciPy 2022. Generally, he is enthusiastic about finance, optimization, and technology, especially open-source projects.", "answers": []}, {"id": 220, "code": "GUCNBJ", "public_name": "Steven Diamond", "biography": "Steven Diamond works on large scale battery optimization at Gridmatic. Steven received a PhD in Computer Science from Stanford University, where he studied optimization under Prof. Stephen Boyd. He is the original developer and BDFL of CVXPY.", "answers": []}, {"id": 231, "code": "Q7W8MW", "public_name": "Eric Sager Luxenberg", "biography": "Eric Luxenberg is a PhD candidate in the Electrical Engineering department at Stanford University, advised by Stephen Boyd. His research interests include robust optimization and mathematical finance. He is a contributor to CVXPY, and has developed an open-source package for saddle optimization called DSP. 
He has also served as the primary instructor of Stanford\u2019s convex optimization course.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 203, "guid": "1afd12c4-6e34-5f09-bd1e-ed4756e4449d", "logo": "", "date": "2023-07-10T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 104", "slug": "2023-203-how-the-little-jupyter-notebook-became-a-web-app-managing-increasing-complexity-with-nbdev", "url": "https://cfp.scipy.org/2023/talk/NFWZXD/", "title": "How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Already familiar with ipywidgets, but ready to take your skills to the next level?  In this tutorial we walk through what it takes to transform an exploratory Jupyter Notebook into a mature web application. Web apps can be a valuable product of collaboration between researchers and software developers, and the packages used in this tutorial were selected to support this relationship, starting with using JupyterLab as an integrated development environment. Attendees will learn how to design and document a scientific web application that accommodates increasing complexity, but is also inheritable by the researchers who maintain them in the long run.", "description": "Our tutorial should appeal to scientists and software developers alike. We hope to convince you that web applications are excellent tools for improving the accessibility of scientific data and software and provide you with the know-how to develop one that accommodates growth and collaboration. The structure of the tutorial is based on a true story about a little Jupyter Notebook\u2026\r\n\r\nOne day, a research scientist created the Notebook to do some exploratory development. After a while, that Notebook grew into a reusable workflow for creating a helpful visualization the scientist often ran for different parameters. 
The scientist eventually recognized that their workflow might be worth sharing, so they started working with a software developer to help the little Notebook grow into a web application. At first, the developer replaced hardcoded inputs with interactive ipywidgets, and used Voila to hide the code cells from users. That was the day the little Notebook became a dashboard, but its journey didn\u2019t stop there. Over time, the researcher had new ideas about features they wanted to add, so the developer transformed the dashboard into a tab-based web application that could accommodate more steps with rich instructions.\r\n\r\nBut there was a problem. The Notebook started experiencing growing pains. It contained more code than was comfortable. The developer made the notebook feel better by offloading some of the code into python modules. This worked well for logic, but as the application grew more complex, it was important to develop nested widget components in the visual Notebook environment. At first, the developer coded views in extra notebooks, and then copy-pasted the code into the python module, but this became laborious and confusing. One day, the developer started to write a tool that would export code cells from the notebooks into python modules. That way, the developer could code entirely in notebooks, and they could leave in all the markdown and code cells that documented what they were thinking as they designed the tool. That was the day that the little Notebook became a literate Notebook family. \r\n\r\nNot long after, the developer was listening to the Talk Python to Me podcast, and heard someone mention a tool called nbdev. The tool was just like the one the developer had made, except it had many more useful features, like notebook-friendly git commits and merges. Eureka! Finally, the developer could accommodate increasing complexity with simple tools. 
When the developer gave the Notebook family back, the researchers were able to maintain it themselves, without having to download scary IDEs, extensions, or environments. And they all lived happily ever after.", "recording_license": "", "do_not_record": false, "persons": [{"id": 234, "code": "H7BGNT", "public_name": "Nicole Brewer", "biography": "Nicole is a PhD student in History and Philosophy of Science at Arizona State University, where she studies the intersection of science and software from many disciplinary perspectives. As a current [Better Scientific Software Fellow](https://bssw.io/fellows/nicole-brewer) and a former research software engineer, she is passionate about using computational notebooks and literate programming to make scientific software more accessible and reproducible. Check out [Long Tales of Science](https://www.nicole-brewer.com/long-tales-of-science/) - her interview podcast about women in high-performance computing.", "answers": []}, {"id": 236, "code": "YSLRNR", "public_name": "Ludovico Bianchi", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 2, "date": "2023-07-11", "day_start": "2023-07-11T04:00:00-05:00", "day_end": "2023-07-12T03:59:00-05:00", "rooms": {"Classroom 106": [{"id": 31, "guid": "8d1a3011-1fe9-5ee6-a63d-f512dfb6d1c8", "logo": "/media/2023/submissions/8TAA7K/ms-idiomaticpd-course_Ln0MMD9.png", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 106", "slug": "2023-31-idiomatic-pandas", "url": "https://cfp.scipy.org/2023/talk/8TAA7K/", "title": "Idiomatic Pandas", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Pandas can be tricky, and there is a lot of bad advice floating around. 
This tutorial will cut through some of the biggest issues I've seen with Pandas code after working with the library for a while and writing three books on it.\r\n\r\nWe will discuss:\r\n\r\n* Proper types\r\n* Chaining\r\n* Aggregation\r\n* Debugging", "description": "Are you confused or frustrated with Pandas? Or maybe your own Pandas code when you come back to it later, you find it confusing or difficult to work with.\r\n\r\nI've taught Pandas to thousands in Corporate settings, Universities, and Virtually. I've also seen the bad code that my students write and have strong opinions on how to correct it.\r\n\r\nThis workshop assumes you know some Pandas and want to apply idiomatic constructs to existing code. There will be some lecture and then breakout time to apply the constructs on your own:\r\n\r\nWe will cover\r\n\r\n* Types\r\n* Chaining\r\n* Mutation\r\n* Aggregation\r\n* Debugging", "recording_license": "", "do_not_record": false, "persons": [{"id": 45, "code": "URCPG3", "public_name": "Matt Harrison", "biography": "Matt is a corporate trainer, author, and consultant on Python and Data Science. He has a CS degree from Stanford University. He is a best-selling author on Python and Data subjects. His books: Effective Pandas, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm have all been best-selling books on Amazon. He just published Machine Learning Pocket Reference and Pandas Cookbook (Second Edition). He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), Universities (Stanford, University of Utah, BYU), as well as small companies. 
He has been using Python since 2000 and has taught thousands through live training both online and in person.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 214, "guid": "21e78665-7e70-51c2-9c25-6e88d6e8d2f7", "logo": "", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 106", "slug": "2023-214-advanced-dask-tutorial", "url": "https://cfp.scipy.org/2023/talk/MQQJKG/", "title": "Advanced Dask Tutorial", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Dask is a Python library for scaling and parallelizing Python code. It provides familiar, high-level interfaces to extend the SciPy ecosystem to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms. In this tutorial, we\u2019ll cover advanced features of Dask like applying custom operations to Dask DataFrames and arrays, debugging computations, diagnosing performance issues, and more. Attendees should walk away with a deeper understanding of Dask\u2019s internals, an introduction to more advanced features, and ideas of how they can apply these features effectively to their own workloads.", "description": "Dask is a popular Python library for scaling and parallelizing Python code on a single machine or across a cluster. It provides familiar, high-level interfaces to extend the SciPy ecosystem (e.g. NumPy, pandas, scikit-learn) to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms and workflows. In this tutorial, we\u2019ll cover advanced features of Dask like applying custom operations to Dask DataFrames and arrays, inspecting the internal state of clusters, debugging distributed computations, diagnosing performance issues, and more. 
Attendees should walk away with a deeper understanding of Dask\u2019s internals, an introduction to more advanced features, and ideas of how they can apply these features effectively to their own data-intensive workloads. Basic Dask experience is required, though knowledge of Dask\u2019s internals is not. This hands-on tutorial is intended for existing or aspiring Dask users looking to gain a deeper understanding of more intermediate and advanced topics.", "recording_license": "", "do_not_record": false, "persons": [{"id": 189, "code": "UZXL8M", "public_name": "James Bourbeau", "biography": "James Bourbeau is a core maintainer of Dask, experienced educator, and has presented on Dask at various conferences and meetups such as SciPy, PyCon, and PyData Global. His most recent presentation was an introductory Dask tutorial at SciPy 2020, a recording of which can be found at https://www.youtube.com/watch?v=EybGGLbLipI&t.", "answers": []}, {"id": 172, "code": "NFZXUL", "public_name": "Naty Clementi", "biography": "Naty is an Open Source Software Engineer at Coiled, Dask contributor, and an experienced educator. She has taught multiple Dask tutorials at conferences like Scipy, PyData,  Women Who Code meetups, as well as periodic live tutorials. Her most recent presentations are Dask Tutorial Scipy 2022 (https://youtu.be/J0NcbvkYPoE) and PyData NYC 2022.  
In her free time, she likes playing ultimate frisbee, going fly-fishing, and playing video games.", "answers": []}, {"id": 255, "code": "BEVYYV", "public_name": "Julia Signell", "biography": null, "answers": []}, {"id": 408, "code": "AGLPRX", "public_name": "Charles Blackmon-Luca", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 101": [{"id": 53, "guid": "962cd683-e587-518c-855b-c3a1902b7635", "logo": "", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 101", "slug": "2023-53-thinking-in-arrays", "url": "https://cfp.scipy.org/2023/talk/XBUC8S/", "title": "Thinking in arrays", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Despite its reputation for being slow, Python is the leading language of scientific computing, which generally needs large-scale (fast) computations. This is because most scientific problems can be split into \"metadata bookkeeping\" and \"number crunching,\" where the latter is performed by array-oriented (vectorized) calls into precompiled routines.\r\n\r\nThis tutorial is an introduction to array-oriented programming. We'll focus on techniques that are equally useful in NumPy, Pandas, xarray, CuPy, Awkward Array, and other libraries, and we'll work in groups on three class projects: Conway's Game of Life, evaluating decision trees, and computations on ragged arrays.", "description": "Array-oriented programming is a paradigm in its own right, challenging us to think about problems in a different way. From APL in 1966 to NumPy today, most users of array-oriented programming are scientists, analyzing or simulating data. This tutorial focuses on the thought process: all of the problems are to be solved in an imperative way (for loops) and an array-oriented way. 
Matplotlib will be used for plotting, but all plotting commands will be given (not prerequisites).\r\n\r\nWe'll alternate between short lectures and small group projects (3\u20124 people each), in which tutors will be available for help, followed by a guided tour through solutions, alternatives, and trade-offs.\r\n\r\nHere is a general outline:\r\n\r\n**0:00\u20120:20 (20 min):** Array-oriented programming as a paradigm: APL, SPEAKEASY, IDL, MATLAB, S, R, NumPy. Overview of basic and advanced slicing, broadcasting, and dimensional reduction. Powerful concept: element indexing is function application and advanced slicing is function composition.\r\n\r\n**0:20\u20120:40 (20 min):** Project 1: Conway's Game of Life. Calculating number of neighbors and updating the board \"all at once.\"\r\n\r\n**0:40\u20120:55 (15 min):** Break\r\n\r\n**0:55\u20121:15 (20 min):** Guided discussion of solutions to Project 1.\r\n\r\n**1:15\u20121:35 (20 min):** Array-oriented programming and the \"iteration until converged\" problem. How to update arrays in which some elements have converged and others haven't.\r\n\r\n**1:35\u20121:55 (20 min):** Project 2: evaluating a decision tree, by walking over each node individually (as in a computer science class) and by million-ball Plinko! (how Scikit-Learn actually does it).\r\n\r\n**1:55\u20122:10 (15 min):** Break\r\n\r\n**2:10\u20122:30 (20 min):** Solutions to Project 2.\r\n\r\n**2:30\u20122:45 (15 min):** Demo: Mandelbrot (fractal) picture, computed 11 different ways: Python, NumPy, C++ (pybind11), Cython, Numba imperative, Numba vectorized, CuPy, CuPy with custom CUDA, Numba-CUDA, JAX-CPU, and JAX-GPU. Discussion of performance and trade-offs.\r\n\r\n**2:45\u20123:05 (20 min):** Non-rectilinear (ragged) arrays and arrays of arbitrary data structures: Apache Arrow and Awkward Array.\r\n\r\n**3:05\u20123:25 (20 min):** Project 3: a big, ragged dataset: computing lengths of taxi trips from polylines with varying numbers of edges. 
Since this is a big dataset, we'll also look at ways to scale it up with Dask.\r\n\r\n**3:25\u20123:40 (15 min):** Break\r\n\r\n**3:40\u20124:00 (20 min):** Solutions to Project 3.", "recording_license": "", "do_not_record": false, "persons": [{"id": 66, "code": "BGX8FE", "public_name": "Jim Pivarski", "biography": "Jim was trained as a particle physicist with a Ph.D. from Cornell and helped commission the CMS experiment at the Large Hadron Collider (LHC). Then he worked as a data scientist for Open Data Group for 5 years before joining Princeton as a computational physicist in 2016. Now he develops software tools for data analysis in Python, leading the development of Awkward Array, and helps users with a wide range of data analysis problems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 5, "guid": "616aadb8-debd-5d07-a916-734b9bab242d", "logo": "", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 101", "slug": "2023-5-sympy-introductory-tutorial", "url": "https://cfp.scipy.org/2023/talk/LJQPVT/", "title": "SymPy Introductory Tutorial", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "SymPy is a Python library for symbolic mathematics. This tutorial will introduce SymPy to a beginner audience. It will cover an introduction to symbolic computing, basic operations, simplification, calculus, matrices, advanced expression manipulation, code generation, and selected advanced topics. The tutorial does not have any prerequisites beyond knowledge of Python and basic freshman level mathematics. It will be presented with Jupyter notebooks with regular exercises for the attendees. After attending this tutorial, attendees will be able to start using SymPy to solve their own problems.", "description": "SymPy is a pure Python library for symbolic mathematics. 
It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.\r\n\r\nSymPy can be used in a wide array of applications. This includes basic usage as an interactive calculator, symbolically modeling problems in physics and engineering, generating fast numeric code, and use in a Python library representing custom symbolic objects. Anyone interested in learning how to get started using SymPy for any such applications should attend this tutorial.\r\n\r\nThis tutorial is a beginner level tutorial and only requires knowledge of how to use Python. Knowledge of mathematics up to basic calculus is recommended. More advanced mathematical topics will be explained as part of the tutorial. Knowledge of other Python libraries such as NumPy is NOT required. There will be a short section near the end on how to interface SymPy with other libraries such as NumPy, but the majority of the tutorial does not make use of any additional libraries.\r\n\r\nThis tutorial will cover the basics of how to use SymPy, and will also touch on some advanced topics. We will start by discussing the basics of how to build mathematical expressions with SymPy and manipulate them. We will look at how to avoid some of the more common pitfalls and gotchas when using SymPy. We will then move on to the most common functions in SymPy such as simplification functions, solvers, functions for doing operations from calculus such as differentiation and integration, and matrices. Finally, as time permits, we will look into more advanced topics, such as code generation, extending SymPy, interfacing with other libraries such as NumPy, and additional SymPy submodules.\r\n\r\nAfter attending this tutorial, attendees will be able to start using SymPy to solve their own problems. 
They will also be armed with the knowledge of how to discover additional, more specific functionality in SymPy that may be required for their particular use-case.\r\n\r\nWe will expect tutorial attendees to have the tutorial materials installed on their computers prior to the tutorial. This way we will not waste time in the beginning getting things installed. The tutorial will also be available online using either Binder or JupyterLite for those that do not wish to install things locally.", "recording_license": "", "do_not_record": false, "persons": [{"id": 13, "code": "C93A7K", "public_name": "Aaron Meurer", "biography": "Aaron Meurer is a software engineer at Quansight, where he works on important projects affecting the scientific Python ecosystem including the array API standard, NumPy, and PyTorch. He is also a core maintainer of the SymPy symbolic mathematics library.", "answers": []}, {"id": 17, "code": "DCR3FZ", "public_name": "Anutosh Bhat", "biography": "I am Anutosh Bhat, a fourth-year undergraduate student at IIT Madras. I'm pursuing an interdisciplinary dual degree (B.Tech + M.Tech) in Biological Engineering and Data Science. I am an open source and software development enthusiast and have contributed to influential libraries such as SymPy, SageMath, NetworkX, and Kyverno, among others. 
My main interests revolve around domains like symbolic and numerical computation and algorithms, as well as cloud native computing.", "answers": []}, {"id": 15, "code": "NCEED7", "public_name": "Sangyub Lee", "biography": "I am a contributor of SymPy, and I have been using SymPy to develop math education solutions in Mathpresso Inc and TigerMilk.Education.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 105": [{"id": 208, "guid": "59729994-82ec-5120-bced-2eaa872c1eb2", "logo": "", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 105", "slug": "2023-208-hvplot-and-panel-visualize-all-your-data-easily-from-notebooks-to-dashboards", "url": "https://cfp.scipy.org/2023/talk/VKXXNH/", "title": "hvPlot and Panel: Visualize all your data easily, from notebooks to dashboards", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "This tutorial will show you how to use the Pandas or Xarray APIs you already know to interactively explore and visualize your data even if it is big, streaming, or multidimensional. Then just replace your expression arguments with widgets to get a web app that you can share as HTML+WASM or backed by a live Python server.  These tools let you focus on your data rather than the API, and let you build linked, interactive drill-down exploratory apps without having to run a web-technology software development project, which you can then share without becoming an operations specialist.", "description": "Python offers many powerful visualization tools, each with their own strengths and advantages, but few people have the time and interest to learn all the different APIs required to use these different tools. 
Luckily, a de-facto standard API for data plotting has emerged in the Pandas .plot() API, which is now supported by many different plotting packages.\r\n\r\nIn this tutorial, you will learn how to use hvPlot, a high-level interactive plotting library that exposes the power of Bokeh, Matplotlib, Plotly, Datashader, and Cartopy using the same .plot API you may already know from using Pandas or Xarray's plotting interface. We'll also show you how to turn nearly any expression you can write with that API into a web app with plots and tables by simply substituting widgets for any parameters you want users to be able to select. Thanks to the HoloViz tools on which hvPlot is built, the resulting apps can easily handle big data (up to billions of rows on an ordinary laptop), remote data (either in Jupyter or in standalone apps), streaming data (using streaming dataframe libraries), geographical data (building on the geoscience software stack), and multidimensional data (using Xarray).\r\n\r\nhvPlot's high-level interface should be sufficient for nearly all of the common data-exploration and data-analysis tasks you want to do with Pandas or Xarray, but in keeping with the HoloViz philosophy of \"shortcuts rather than dead ends\", we'll also show you how and when to drop down to lower-level APIs when you need to, such as when building more complex apps using Panel, doing complex graphical data calculations using Datashader, or integrating plotting and interactivity into your own libraries using Param and HoloViews. \r\n\r\nWith the techniques you learn in the hands-on exercises in this tutorial, you'll get the tools and know-how to effectively explore, analyze and visualize simple or complex, small or large, and static or dynamic data easily, concisely, and reproducibly. 
The resulting visualizations and apps can be shared as static images, simple HTML documents with limited interactivity, HTML+WASM documents with full Python-backed interactivity, or as Python apps deployed on a remote server. We expect participants to have previously used some sort of plotting tool and to be comfortable with Python and at least one array-based library (Numpy, Pandas, Xarray, CuPy, cuDF, Dask, etc.).", "recording_license": "", "do_not_record": false, "persons": [{"id": 186, "code": "RKAYQQ", "public_name": "James A. Bednar", "biography": "Jim Bednar is the Director of Custom Services at Anaconda, Inc. Dr. Bednar holds a Ph.D. in Computer Science from the University of Texas, along with degrees in Electrical Engineering and Philosophy. He has published more than 50 papers and books about the visual system, software development, and reproducible science. Dr. Bednar manages the HoloViz project, a collection of open-source Python tools that includes Panel, hvPlot, Datashader, HoloViews, GeoViews, Param, Lumen, and Colorcet. Dr. Bednar was a Lecturer and Reader in Computational Neuroscience at the University of Edinburgh from 2004-2015, and previously worked in hardware engineering and data acquisition at National Instruments.", "answers": []}, {"id": 9, "code": "7JRHGL", "public_name": "Sophia Yang", "biography": "Sophia Yang is a Senior Data Scientist and a Developer Advocate at Anaconda. She is passionate about the data science community and the Python open-source community. She is the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She serves on the Steering Committee and the Code of Conduct Committee of the Python open-source visualization system HoloViz. She also volunteers at NumFOCUS, PyData, and SciPy conferences. She holds an M.S. in Computer Science, an M.S. in Statistics, and a Ph.D. 
in Educational Psychology from The University of Texas at Austin.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 101, "guid": "21179fe4-4e43-5660-981a-f7e72d267496", "logo": "/media/2023/submissions/C9QZXU/bokeh-tutorial-session-image_7znieSO.png", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 105", "slug": "2023-101-interactive-data-visualization-with-bokeh", "url": "https://cfp.scipy.org/2023/talk/C9QZXU/", "title": "Interactive data visualization with Bokeh", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Bokeh is a library for interactive data visualization. You can use it with Jupyter Notebooks or create standalone web applications, all using Python. This tutorial is a complete guide to Bokeh, where we start with a basic line plot and step-by-step make our way to creating a dashboard with several interacting components. This tutorial will be helpful for scientists who are looking to level-up their analysis and presentations, and tool developers interested in adding custom plotting functionality or dashboards.", "description": "Bokeh is a Python library for creating interactive data visualizations. Bokeh allows you to create plots that can be displayed in a web browser, without needing to write HTML and JavaScript. In development for over 10 years, Bokeh has become a core tool for Python data science workflows, used for both exploratory analysis and in presentations. It is actively used in scientific domains including bioscience, geoscience, and astrophysics. Moreover, other useful libraries in the PyData ecosystem, like Dask, ArviZ, and the HoloViz tools, build custom applications and workflows with Bokeh.\r\n\r\nIn this tutorial, you\u2019ll learn everything you need to know to create beautiful and powerful interactive plots from scratch. 
We\u2019ll start by introducing core Bokeh concepts, creating simple static plots like line and bar charts, and customizing them. We\u2019ll then gradually introduce layers of interactivity, create specialized plots like geographic maps, and discuss new features like contour plots. By the end, you will be able to create a complete interactive dashboard using Bokeh.\r\n\r\nThis tutorial is presented by Bokeh core team members and is fully hands-on with several examples and exercises in every section. We hope to enable more people, especially scientists and tool developers, to create pretty yet powerful visualizations.", "recording_license": "", "do_not_record": false, "persons": [{"id": 14, "code": "QGMGFB", "public_name": "Pavithra Eswaramoorthy", "biography": "Pavithra is a Developer Advocate at Quansight, where she works to support the PyData community. She also contributes to the Bokeh and Dask projects; and has helped administrate Wikimedia\u2019s outreach programs in the past. In her spare time, she enjoys a good book and hot coffee. :)", "answers": []}, {"id": 125, "code": "JC9KK8", "public_name": "Ian Thomas", "biography": "Ian Thomas is a Senior Software Engineer at Anaconda. Originally an ocean modeller, Ian has many years' experience analysing and visualising data. Ian is an Open Source contributor and core maintainer of a number of libraries, most notably Bokeh, Datashader and fsspec. Ian is British and drinks a lot of tea.", "answers": []}, {"id": 170, "code": "BBLFMK", "public_name": "Bryan Van de Ven", "biography": null, "answers": []}, {"id": 281, "code": "HMRG3H", "public_name": "Timo Metzger", "biography": "Timo is a technical writer and project manager at makepath. 
He started contributing to Bokeh in 2020 and loves to help others succeed in the world of Open Source.", "answers": []}, {"id": 416, "code": "8RCL3H", "public_name": "Victoria Adesoba", "biography": "Victoria is the Director of Operations at makepath and has enjoyed contributing to Bokeh over the last year. She enjoys traveling and working on the go as well as mentoring youth in her community.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 202": [{"id": 137, "guid": "1a9e950c-dc31-5807-91d6-59bbe610da28", "logo": "", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 202", "slug": "2023-137-explore-generative-models-in-ai-with-keras", "url": "https://cfp.scipy.org/2023/talk/7NLG3F/", "title": "Explore generative models in AI with Keras", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "This tutorial introduces Keras, a powerful deep learning library and demonstrates how to enable generative models using Keras. The first part delves into the Keras training pipeline and extended modules. The second part explores image generative models using stable diffusion, with live coding examples to generate novel images and teach the model new concepts. Finally, you'll explore language generative models, including GPT and BART, with a live coding example that demonstrates how to enable these models. By the end of this tutorial, you'll have a solid understanding of how to harness Keras to create powerful AI applications.", "description": "In this tutorial, we will explore the powerful Keras library and the world of generative models in AI. We will begin with a brief introduction to Keras, its history, and its value in creating neural networks. We will then dive into the Keras training pipeline, exploring sequential, functional, and custom models, optimizers, loss and metrics, and the training API. 
We will also cover Keras extended modules for NLP, CV, and GNN, and walk through an end-to-end example to create and optimize a model.\r\n\r\nIn the second part of the tutorial, we will focus specifically on Stable Diffusion, an image generative model, and its architecture. We will explain Stable Diffusion, demonstrate a latent space walkthrough, and generate images using a Colab example. Additionally, we will cover image inpainting and teaching Stable Diffusion new concepts, a technique called textual inversion.\r\n\r\nFinally, we will explore how generative models work in NLP, specifically focusing on GPT structure and GPT-2, BART, and the mobile playbook. We will demonstrate XLA compilation and show how general support for text generation using one API can be achieved. By the end of this tutorial, attendees will have a solid understanding of Keras and generative models and how they can be used to create powerful AI applications.", "recording_license": "", "do_not_record": false, "persons": [{"id": 159, "code": "89DTKG", "public_name": "Divyashree Shivakumar Sreepathihalli", "biography": "Divya is a talented machine learning software engineer who is currently a part of the Keras team at Google. In this role, she specializes in developing Keras core modeling APIs and KerasCV to improve the functionality of the software.\r\n\r\nDivya has an impressive track record of delivering successful conference talks, including the Southern Data Science Conference and the Women in ML Symposium. Prior to joining Google, Divya worked as a Deep Learning Scientist for Zazu Sensor, a startup group in Intel's Emerging Growth Incubation (EGI) group. 
Her work there focused on computer vision and deep learning algorithm development for object detection and tracking, resulting in significant advancements for the startup.\r\n\r\nBefore her time at Zazu Sensor, Divya worked as a Platform Architect at Intel's Client Computing Group, where she was responsible for developing proofs of concept for innovative solutions in anonymized computer vision applications. Her efforts resulted in several successful patents being filed, bringing substantial value to the organization.\r\n\r\nDivya completed her Masters in Computer Engineering, with a focus on artificial intelligence, at Texas A&M University in 2017.", "answers": []}, {"id": 200, "code": "KCRKNV", "public_name": "Chen Qian", "biography": "Chen Qian is a software engineer at Google. He is a maintainer of Keras and TensorFlow. In 2021, Chen co-founded the project KerasNLP with other Keras maintainers, and has since been working on building APIs for NLP developers. He is enthusiastic about languages and finds everything about them charming, from learning new languages to linguistics and NLP.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 64, "guid": "2f90cecc-fa10-597f-985e-b3dd72b74ca9", "logo": "/media/2023/submissions/F3HAUQ/exact_vs_approximate_cdjfyvk.png", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 202", "slug": "2023-64-resampling-and-monte-carlo-methods-in-scipy-stats", "url": "https://cfp.scipy.org/2023/talk/F3HAUQ/", "title": "Resampling and Monte Carlo Methods in SciPy.stats", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Resampling and Monte Carlo statistical techniques are surprisingly intuitive, and they are often more flexible and accurate than their better-known analytical counterparts. 
In this tutorial, participants will develop their intuitive understanding of frequentist statistics and apply it using three functions in `scipy.stats` - `monte_carlo_test`, `permutation_test`, and `bootstrap` - to dramatically expand the statistical analyses they can perform with the SciPy Library.", "description": "Scientists and engineers often seek to answer questions of the following forms.\r\n\r\n1. Is my sample drawn from this hypothesized distribution?\r\n2. Are my samples drawn from the *same* distribution?\r\n3. Based on these samples, what can I infer about the populations from which they were drawn?\r\n\r\nCommon statistical procedures used to answer questions of these forms include:\r\n\r\n1. the one-sample t-test (\"Is my sample drawn from a distribution with population mean `m`?\"),\r\n2. the two-sample t-test (\"Are my two samples drawn from distributions with the same population mean?\"), and\r\n3. the confidence interval of the mean (\"Given my sample, what can I say about the true value of the population mean?\").\r\n\r\nSuch procedures are developed under technical assumptions (e.g., the samples were drawn from normally-distributed populations) that make the mathematics tractable, yet in practice, these assumptions can never be met exactly. Fortunately for science, the conclusions drawn from the procedures above are relatively insensitive to deviations from these assumptions\u2026 except when they\u2019re not!\r\n\r\nOne solution is to abandon frequentist statistics in favor of another paradigm (Bayesian), but the approach suggested by this tutorial is to remove the assumptions, reduce reliance on the analytical approximations, and instead use computers to approximate (or even exactly calculate) responses to the original questions. This idea will lead us to three techniques: \r\n\r\n1. Monte Carlo tests (`scipy.stats.monte_carlo_test`)\r\n2. Permutation tests (`scipy.stats.permutation_test`)\r\n3. 
The Bootstrap (`scipy.stats.bootstrap`)\r\n\r\nFor many of the same reasons that arithmetic (sums and differences) seems simpler than calculus (integrals and derivatives), these techniques are relatively easy to grasp. Likewise, just as computational methods for integration, equation solving, and optimization can solve a wider variety of problems than analytical approaches, these computational statistical techniques are comparatively flexible and easy to apply.\r\n\r\nDuring this tutorial, participants will write their own code to execute fundamental resampling and Monte Carlo algorithms and compare the results of their code against the equivalent functions in SciPy. They will apply their new understanding of SciPy's `monte_carlo_test`, `permutation_test`, and `bootstrap` functions to reproduce and extend the capabilities of SciPy's other statistics functions (e.g. to small samples, to discrete distributions). Through this tutorial, participants will improve their ability to apply existing statistical procedures to a given situation and gain the ability to *create* customized statistical procedures for demanding applications.", "recording_license": "", "do_not_record": false, "persons": [{"id": 50, "code": "EQBG7G", "public_name": "Matt Haberland", "biography": "Matt Haberland (@mdhaber) is an Assistant Professor in the BioResource and Agricultural Engineering Department at Cal Poly. He earned his Ph.D. in Mechanical Engineering at MIT in 2014 for his thesis \"Extracting Principles from Biology for Application to Running Robots\", and previously created the Contact Sensor / Stabilizer for the rock drill of the Mars rover Curiosity. Matt has been attending the SciPy conference since 2019 as maintainer of the SciPy library.", "answers": []}, {"id": 51, "code": "FCEPB7", "public_name": "Albert Steppi", "biography": "Albert Steppi (@steppi) is a Senior Software Engineer at Quansight Labs. He earned a PhD in Statistics from Florida State University in 2018. 
Albert has been a maintainer of the SciPy library since 2021.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 203": [{"id": 120, "guid": "7a447213-9322-50a2-a56b-64f79c4f1c84", "logo": "/media/2023/submissions/ALSYBR/tutorial_banner_nvqATRn.png", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 203", "slug": "2023-120-data-of-an-unusual-size-a-practical-guide-to-analysis-and-interactive-visualization-of-massive-datasets", "url": "https://cfp.scipy.org/2023/talk/ALSYBR/", "title": "Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "While most scientists aren't at the scale of black hole imaging research teams that analyze Petabytes of data every day, you can easily fall into a situation where your laptop doesn't have quite enough power to do the analytics you need.\r\n\r\nIn this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets with real-world examples on actual powerful machines on a public cloud provided by the presenters \u2013 starting from how the data is stored and read, to how it is processed and visualized.", "description": "\"Big data\" refers to any data that is too large to handle comfortably with your current tools and infrastructure. 
As the leading language for data science, Python has many mature options that allow you to work with datasets that are orders of magnitude larger than what can fit into a typical laptop's memory.\r\n\r\nThis tutorial will help you understand how large-scale analysis differs from local workflows, the unique challenges associated with scale, and some best practices to work productively with your data.\r\n\r\nBy the end, you will be able to answer:\r\n\r\n- What makes some data formats more efficient at scale?\r\n- Why, how, and when (and when not) to leverage parallel and distributed computation (primarily with Dask) for your work?\r\n- How to manage cloud storage, resources, and costs effectively?\r\n- How interactive visualization can make large and complex data more understandable (primarily with hvPlot)?\r\n- How to comfortably collaborate on data science projects with your entire team?\r\n\r\nThe tutorial focuses on the reasoning, intuition, and best practices around big data workflows, while covering the practical details of Python libraries like Dask and hvPlot that are great at handling large data. It includes plenty of exercises to help you build a foundational understanding within three hours.", "recording_license": "", "do_not_record": false, "persons": [{"id": 14, "code": "QGMGFB", "public_name": "Pavithra Eswaramoorthy", "biography": "Pavithra is a Developer Advocate at Quansight, where she works to support the PyData community. She also contributes to the Bokeh and Dask projects; and has helped administrate Wikimedia\u2019s outreach programs in the past. In her spare time, she enjoys a good book and hot coffee. 
:)", "answers": []}, {"id": 180, "code": "EKHUEY", "public_name": "Dharhas Pothina", "biography": null, "answers": []}, {"id": 457, "code": "7AL7DJ", "public_name": "Christopher Ostrouchov", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 82, "guid": "26ab0cdc-27ea-58eb-b78f-4e1fd663ae94", "logo": "/media/2023/submissions/QXAYRM/dataset-diagram-logo_PHL7b4e.png", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 203", "slug": "2023-82-xarray-friendly-interactive-and-scalable-scientific-data-analysis", "url": "https://cfp.scipy.org/2023/talk/QXAYRM/", "title": "Xarray: Friendly, Interactive, and Scalable Scientific Data Analysis", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets with many related variables. Xarray combines the convenience of labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This tutorial will introduce data scientists already familiar with Xarray to more intermediate and advanced topics, such as applying functions in SciPy/NumPy with no Xarray equivalent, advanced indexing concepts, and wrapping other array types in the scientific Python ecosystem.", "description": "Xarray is an open-source Python project that makes working with complex, multi-dimensional arrays elegant, intuitive, and efficient. Real-world datasets are often a collection of many related variables on a common grid rather than raw numbers. Such datasets are common in the disciplines of earth science, astronomy, biology, and finance. 
These datasets are more than just arrays of values: they have labels which describe how array values map to locations in dimensions such as space and time, and metadata that describes how the data was collected and processed.\r\n\r\nXarray embraces this complexity and enables users to use dataset metadata such as dimension names and coordinate labels to easily analyze, manipulate, and visualize their datasets. For example, the Pandas-inspired Xarray label-based syntax `temperature.sel(place=\"Boston\")` is more intuitive and less error-prone compared to NumPy syntax: `temperature[0]`.\r\n\r\nThis hands-on tutorial will introduce data scientists already familiar with Xarray to more advanced concepts. All material will be presented via Jupyter Notebooks, with participants actively coding and performing exercises to solidify understanding of key concepts. The tutorial intersperses teaching intermediate to advanced Xarray concepts with increasingly complex real-world data analysis tasks.\r\n\r\nThe participant learning goals for the tutorial are to:\r\n\r\n1. Effectively use Xarray\u2019s powerful multidimensional indexing operations\r\n2. Become familiar with important parts of Xarray\u2019s computational API\r\n3. Understand how to extend Xarray\u2019s built-in capabilities with custom computation functions\r\n4. Understand how Xarray fits in with other array types in the scientific Python ecosystem\r\n\r\nThe structure of our tutorial is based on our extensive experience teaching Xarray over the past few years, including numerous similar tutorials at international conferences like SciPy, as well as in formal classes taught at the National Center for Atmospheric Research and the University of Washington. \r\n\r\nThe tutorial will be presented using [Nebari](scipy.quansight.dev), which will facilitate interactive computation and a consistent computational environment without requiring participants to install any software. 
Tutorial material will be available online ([link](https://tutorial.xarray.dev/workshops/scipy2023/README.html)) and we will ensure that proper environment files are available for participants that prefer running the tutorial locally. Participants are expected to have some familiarity with Jupyter notebooks, NumPy, Pandas, and Xarray. No specific domain knowledge (e.g. geoscience) is required to effectively participate in this tutorial.  \r\n\r\nIf you are new to Xarray then please go through last year\u2019s tutorial ([link](https://tutorial.xarray.dev/workshops/scipy2022/README.html#scipy-2022)) prior to attending, as our tutorial will assume attendees have a working understanding of these basic concepts.", "recording_license": "", "do_not_record": false, "persons": [{"id": 31, "code": "RKNM7F", "public_name": "Deepak Cherian", "biography": null, "answers": []}, {"id": 194, "code": "LSA9VQ", "public_name": "Negin Sobhani", "biography": "Negin Sobhani is a High Performance Computing consultant and computational atmospheric scientist working at the National Center for Atmospheric Research (NCAR). She has several years of experience developing and supporting open-source tools and infrastructure to improve the performance and accessibility of Earth System models and bridge the gap between data science, atmospheric science, and software engineering. She is interested in applying and adopting cutting-edge data science and computational technologies to improve our understanding of the environment.", "answers": []}, {"id": 240, "code": "W73W8Q", "public_name": "Scott Henderson", "biography": "Scott is a research scientist in the University of Washington (UW) Department of Earth and Space Sciences and a data science fellow at the eScience Institute. 
He works on numerous NASA-funded efforts to develop open Cloud computing solutions for data intensive research.", "answers": []}, {"id": 443, "code": "RJTTPM", "public_name": "Anderson Banihirwe", "biography": null, "answers": []}, {"id": 445, "code": "WW7QLE", "public_name": "Don Setiawan", "biography": "Don Setiawan is a Senior Research Software Engineer at the University of Washington, eScience Institute, Scientific Software Engineering Center (SSEC). He has expertise in Python programming, web development, geospatial data analytics, and cloud-based data engineering. He is interested in building scalable, open software to facilitate scientific discovery across fields and enforce software best practices. He has been a power user of the Xarray ecosystem for several years across various projects with Ocean Observatory Initiative (OOI), U.S. Integrated Ocean Observing System (IOOS), National Oceanic and Atmospheric Administration (NOAA), and National Aeronautics and Space Administration (NASA). 
He is very excited to share his knowledge and help facilitate the Xarray tutorial as this is his first time at SciPy!", "answers": []}, {"id": 127, "code": "RZYJPC", "public_name": "Thomas Nicholas", "biography": "Tom is a Research Software Engineer working in Ryan Abernathey's Ocean Transport Group at Lamont Doherty Earth Observatory, Columbia University.\r\n\r\nHe first started using the open-source scientific Python stack during his PhD, when he was studying plasma turbulence in nuclear fusion reactors.\r\n\r\nHe is a member of the xarray core development team, and also works on xGCM, pint-xarray, and xarray-datatree.", "answers": []}, {"id": 453, "code": "SR9AXZ", "public_name": "Jessica Scheick", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 103": [{"id": 9, "guid": "a79039d1-08d8-52cb-8277-21f90a2cbb93", "logo": "", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 103", "slug": "2023-9-power-up-your-work-with-compiling-and-profiling", "url": "https://cfp.scipy.org/2023/talk/NDYWUR/", "title": "Power up your work with compiling and profiling", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "In this workshop, we will introduce Numba, a JIT compiler that is designed to speed up numerical calculations. Most people find it something of a mystery: it sounds like magic, but how does it work? Under what conditions does it work? Because of this, new users find it hard to get started, and getting the hang of it involves a steep learning curve. This workshop will provide all the knowledge that you need to make Numba work for you.", "description": "Have you ever heard of Numba? It is (mainly) a JIT (Just-In-Time) compiler to make your math-heavy Python code run faster under certain conditions. Most people find it something of a mystery: it sounds like magic, but how does it work? Under what conditions does it work? 
Because of this, new users find it hard to get started, and getting the hang of it involves a steep learning curve.\r\n\r\nThis workshop requires no prior experience. However, it will be most beneficial to those who work with numerical data, like data scientists and researchers. We also assume participants have no knowledge of how compilers work and little understanding of how CPython works. Through exercises, we will explore the situations in which Numba works, when it does not, and why. We will also look at some cases where we can make Numba work by changing a few things in your code. Hopefully, by finishing the workshop, you will have a better understanding of how Numba works before you even start using it. This knowledge can save you some time on trial and error, making your experience of using it better. \r\n\r\n**What Attendees will Learn**\r\n\r\nBy the end of the workshop, you will have some understanding of what Numba is and how it speeds up your Python code. You will also have a better idea of the limitations of Numba and when it does not help. You may also know how to change your code so that it benefits from Numba's speedups. You will also learn some troubleshooting skills and where to look for help if you get stuck in the future.", "recording_license": "", "do_not_record": false, "persons": [{"id": 21, "code": "M7LKKT", "public_name": "Cheuk Ting Ho", "biography": "Before working in Developer Relations, Cheuk was a Data Scientist at various companies, roles that demand strong numerical and programming skills, especially in Python. To follow her passion for the tech community, Cheuk is now working with the open-source community. Cheuk also contributes to multiple Open Source libraries like Hypothesis, Django and Pandas.\r\n\r\nBesides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also been a speaker at Universities and various conferences. 
Besides speaking at conferences, Cheuk also organizes events for developers. Conferences that Cheuk has organized include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organizes workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation fellow.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 20, "guid": "277ba6f2-0204-5364-8e5b-dac05af48821", "logo": "/media/2023/submissions/PTB7DU/Screenshot_2023-01-27_at_3.17.49_PM_3Wyxlz7.png", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 103", "slug": "2023-20-python-for-answering-geospatial-questions-exploring-social-inequity-in-our-communities", "url": "https://cfp.scipy.org/2023/talk/PTB7DU/", "title": "Python for answering geospatial questions: exploring social inequity in our communities", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "We love Python but maybe not enough to commit to an entire coding language. What if we could understand the fundamentals and begin working with real-time data in a single session? 
Actionable Python scripts and an understanding of the frameworks might be enough to serve as a springboard for larger exploration projects.", "description": "Recent advances in geospatial analysis and the availability of digital maps have revealed the importance of urban form and built infrastructure as fundamental to understanding the vulnerabilities and vitality of global and local cities.\r\n\r\nLearning to write Python scripts (assuming little or no prior experience), we will discover morphometrics and what they reveal about the urban form of cities.", "recording_license": "", "do_not_record": true, "persons": [{"id": 34, "code": "EYAD3D", "public_name": "bonny p mcclain", "biography": "Dr Bonny McClain is a geospatial analyst & self-described human geographer and social anthropologist. Dr McClain applies advanced data analytics, including data engineering and geo-enrichment, to poverty, race, and gender discussions. Her research targets judgments about structural determinants, racial equity, and elements of intersectionality to illuminate the confluence of metrics contributing to poverty. Moving beyond ZIP codes to explore apportioned socioeconomic data based on underlying population data leads to discovering novel variables based on location to build more context to complex data questions. 
\r\n\r\nRecent Talks:\r\nData Day Texas, Geospatial Keynote 2023\r\nOpen Source Solutions for Environmental Racism|Open Source Science Data Repositories Workshop| NASA, Langley Research Center|September 2022  \r\nKeynote Speaker | GIS DAY | Los Angeles County 2023\r\nGIS Keynote Data Day Texas 2023\r\nFormulating geospatial data questions to answer big problems | GeoPython 2022\r\nSciPy 2022-2023 Diversity Committee Chair 2022\r\nNC HIMSS Annual Conference: Closing Keynote: Location Intelligence: How Does Our Infrastructure Influence Change in Our Built Healthcare Environment?\r\nSciPy Diversity Luncheon Keynote \u2013 July 2020--bias in algorithms", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 104": [{"id": 81, "guid": "61d018b6-f1cc-5fc6-91a1-fb153d7952ba", "logo": "", "date": "2023-07-11T08:00:00-05:00", "start": "08:00", "duration": "04:00", "room": "Classroom 104", "slug": "2023-81-an-introduction-to-cloud-based-geospatial-analysis-with-earth-engine-and-geemap", "url": "https://cfp.scipy.org/2023/talk/GQ7PG3/", "title": "An Introduction to Cloud-Based Geospatial Analysis with Earth Engine and Geemap", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "This tutorial is an introduction to cloud-based geospatial analysis with Earth Engine and the geemap Python package. We will cover the basics of Earth Engine data types and how to visualize, analyze, and export Earth Engine data in a Jupyter environment using geemap. We will also demonstrate how to develop and deploy interactive Earth Engine web apps. Throughout the session, practical examples and hands-on exercises will be provided to enhance learning. The attendees should have a basic understanding of Python and Jupyter Notebooks. 
Familiarity with Earth science and geospatial datasets is not required, but will be useful.", "description": "The Earth is constantly changing, which creates significant challenges for the environment and human society. To tackle these challenges on a global scale, the Earth science community relies heavily on geospatial datasets that are collected through various means, such as satellite, aerial, and mobile sensors. However, the explosive growth of geospatial datasets over the past few decades has overwhelmed the Earth science community's capacity for storage, analysis, and visualization. Fortunately, the advent of cloud-computing platforms (e.g., Google Earth Engine) has made it possible to access, manipulate, and analyze large volumes of geospatial data on-the-fly. In recent years, Earth Engine has become increasingly popular in the geospatial community and has enabled numerous Earth science applications at local, regional, and global scales.\r\n\r\nThe geemap Python package is built upon the Earth Engine Python API and open-source mapping libraries. It allows Earth Engine users to interactively manipulate, analyze, and visualize geospatial big data in a Jupyter environment. Since its creation in April 2020, geemap has received over [2,500 GitHub stars](https://github.com/giswqs/geemap/stargazers) and is being used by over [800 projects](https://github.com/giswqs/geemap/network/dependents) on GitHub. More than [130 Jupyter notebook examples](https://geemap.org/tutorials/)  and an [open-access book](https://book.geemap.org/) are available for learning geemap. \r\n\r\nThis tutorial consists of seven 30-minute sessions and three 10-minute breaks. During each hands-on session, the attendees will walk through Jupyter notebook examples on Google Colab with the instructors. At the end of each session, they will complete a hands-on exercise to apply the knowledge they have learned. 
The topics that will be covered in this tutorial include: (1) Introduction to Earth Engine and geemap; (2) Using Earth Engine data; (3) Visualizing Earth Engine data; (4) Analyzing Earth Engine data; (5) Exporting Earth Engine data; (6) Creating satellite timelapse animations; and (7) Developing and deploying interactive Earth Engine web apps. \r\n\r\nThis tutorial is intended for scientific programmers, data scientists, geospatial analysts, and concerned citizens of Earth. Attendees should have a basic understanding of Python and the Jupyter ecosystem. Familiarity with Earth science and geospatial datasets is not necessary, but it will be helpful. For more information about Earth Engine and geemap, visit https://earthengine.google.com and https://geemap.org.", "recording_license": "", "do_not_record": false, "persons": [{"id": 78, "code": "ZFYMW8", "public_name": "Qiusheng Wu", "biography": "Qiusheng Wu is an Associate Professor in the Department of Geography & Sustainability at the University of Tennessee, Knoxville. He is also an Amazon Visiting Academic and a Google Developer Expert (GDE) for Earth Engine. His research focuses on Geographic Information Science, remote sensing, and open-source software development. Dr. Wu is an advocate of open science and reproducible research. He has developed several open-source packages that have been widely used by the geospatial community, such as [geemap](https://geemap.org) and [leafmap](https://leafmap.org). For more information about his research, visit https://wetlands.io.", "answers": []}, {"id": 103, "code": "UX3DZA", "public_name": "Steve Greenberg", "biography": "Steve is passionate about using machine learning and remote sensing technology to tackle the climate and sustainability crises. He leads the Developer Relations team for [Google Earth Engine](https://earthengine.google.com/). Earth Engine is a  geospatial analysis platform advancing planetary sustainability and resilience to climate change. 
His team helps remote sensing professionals, data scientists and machine learning engineers analyze petabytes of satellite imagery to understand and protect the earth. Earth Engine is provided [free-of-charge for noncommercial and research purposes](https://earthengine.google.com/noncommercial/).\r\n\r\nFrom 2016 through 2021, Steve led Developer Relations for BigQuery, Vertex AI and other Machine Learning and Data Analytics products in Google Cloud Platform, where he focused on improving the experience for users of scikit-learn, XGBoost and TensorFlow.\r\n\r\nSteve also co-leads Google's largest grassroots sustainability group - organizing Googlers to incubate new climate initiatives. Three of the climate areas he's worked on - wind energy prediction, real-time precipitation modeling and sustainable building design - have graduated into full-time projects at Google. Prior to joining Google in 2016, Steve led engineering at a Seattle startup helping governments be more accountable to their citizens with public data. Before 2012, Steve was a Program Manager working on various data efforts in Microsoft's Office team.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 48, "guid": "9193c945-6ed9-5101-a721-115624a2e33d", "logo": "", "date": "2023-07-11T13:30:00-05:00", "start": "13:30", "duration": "04:00", "room": "Classroom 104", "slug": "2023-48-a-hands-on-introduction-to-production-grade-data-science-orchestration-with-flyte", "url": "https://cfp.scipy.org/2023/talk/YHEYVY/", "title": "A Hands-on Introduction to Production-grade Data Science Orchestration with Flyte", "subtitle": "", "track": "Tutorials", "type": "Tutorial", "language": "en", "abstract": "One of the biggest challenges for data scientists and machine learning engineers alike is the friction caused by the iteration cycle between prototyping and production. It\u2019s not enough to deploy a working model to a serving app. 
The iterative process itself needs to be a tight feedback loop between experimentation, data and model refinement, deploying to production, and dealing with data drift. In this tutorial, attendees will learn how to unify the common tools in the Python Data/ML scientific stack into a single orchestration plane using Flyte so that you can reduce the friction between prototyping and production.", "description": "# Background\r\nThis tutorial interleaves lecture-style content and coding exercises to give data scientists, machine learning engineers, and data engineers hands-on experience with Flyte. Flyte is an open source workflow orchestrator that has a Python SDK for writing and scheduling execution graphs in a type-safe, reproducible manner. The topics and concepts covered in this tutorial are transferable to other similar orchestration tools, or would be useful for anyone who wants to build their own orchestrator. We will anchor the tutorial to five challenges of model development and deployment: scalability, data quality, reproducibility, recoverability, and auditability. Using Flyte, we\u2019ll see how to address these challenges and abstract them out to give you a broader understanding of how to overcome them.\r\n\r\n# Main Content\r\nFirst I\u2019ll define and describe what these five challenges mean in the context of model development. Then I\u2019ll dive into the ways in which Flyte provides solutions to them, taking you through the reasoning behind Flyte\u2019s data-centric and ML-aware design. 
We'll cover:\r\n\r\n- **Flyte tasks and workflows**: the building blocks for expressing execution graphs.\r\n- **Dynamic workflows**: for defining execution graphs at runtime.\r\n- **Map tasks**: Scale embarrassingly parallel workflows.\r\n- **Plugins**: Extend Flyte's core functionality.\r\n- **Type System**: See the benefits of static type safety.\r\n- **DataFrame Types**: Validate dataframe-like objects at runtime.\r\n- **Reproducibility**: Containerize and harden your execution graph.\r\n- **Caching**: Don't waste precious compute resources re-running nodes.\r\n- **Recovering Executions**: Build fault-tolerant pipelines.\r\n- **Checkpointing**: Checkpoint progress within a node.\r\n- **Flyte Decks**: Create rich static reports associated with your tasks.\r\n\r\nAttendees will learn how Flyte distributes and scales computation, enforces static and runtime type safety, leverages Docker to provide strong reproducibility guarantees, implements caching and checkpointing to recover from failed model training runs, and ships with built-in data lineage tracking for full data pipeline auditability.\r\n\r\n# Wrap-up\r\nThe end of the tutorial will provide a summary of all the main learnings, point to resources to learn more, and a discussion for attendees to address their questions.\r\n\r\n# Resources\r\n- [Flyte Repo](https://github.com/flyteorg/flyte)\r\n- [Flyte Docs](https://docs.flyte.org/en/latest/)\r\n- [Scipy 2022 Flyte talk](https://www.youtube.com/watch?v=EykWaiHHDNg)\r\n- [Scipy 2020 Pandera talk](https://www.youtube.com/watch?v=PxTLD-ueNd4)", "recording_license": "", "do_not_record": false, "persons": [{"id": 55, "code": "DZT3RK", "public_name": "Niels Bantilan", "biography": "Niels is the Chief Machine Learning Engineer at Union.ai, and core maintainer of Flyte, an open source workflow orchestration tool, author of UnionML, an MLOps framework for machine learning microservices, and creator of Pandera, a statistical typing and data testing tool for scientific 
data containers. His mission is to help data science and machine learning practitioners be more productive.\r\n\r\nHe has a Masters in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Enthought - 200 W Cesar Chavez St": [{"id": 301, "guid": "11e3a471-4888-55c8-b151-6769cc8da13a", "logo": "", "date": "2023-07-11T18:30:00-05:00", "start": "18:30", "duration": "02:00", "room": "Enthought - 200 W Cesar Chavez St", "slug": "2023-301-scipy-welcome-reception", "url": "https://cfp.scipy.org/2023/talk/Y8PPAA/", "title": "SciPy Welcome Reception", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "SciPy Welcome Reception hosted by Enthought. Tuesday, July 11, 6:30-8:30 at Enthought HQ, 200 W Cesar Chavez, Austin. Meet fellow attendees! Food and drinks served! 
\r\n\r\n[Walk](https://www.google.com/maps/dir/AT%26T+Hotel+and+Conference+Center,+University+Avenue,+Austin,+TX/Enthought,+200+W+Cesar+Chavez+St+Suite+202,+Austin,+TX+78701/@30.272726,-97.7524166,15z/data=!3m2!4b1!5s0x8644b508a6554d83:0x7edf0a3a6fece735!4m18!4m17!1m5!1m1!1s0x8644b59de7f3c8cf:0x7ef52b1ad3321879!2m2!1d-97.7404423!2d30.2816145!1m5!1m1!1s0x8644b509cdd787e9:0x108b9372002d7f55!2m2!1d-97.7463985!2d30.2642596!2m3!6e1!7e2!8j1689100200!3e2?entry=ttu), get a ride, or [take the bus](https://www.google.com/maps/dir/AT%26T+Hotel+and+Conference+Center,+University+Avenue,+Austin,+TX/Enthought,+200+W+Cesar+Chavez+St+Suite+202,+Austin,+TX+78701/@30.2737123,-97.7521933,15z/data=!3m1!5s0x8644b508a6554d83:0x7edf0a3a6fece735!4m19!4m18!1m5!1m1!1s0x8644b59de7f3c8cf:0x7ef52b1ad3321879!2m2!1d-97.7404423!2d30.2816145!1m5!1m1!1s0x8644b509cdd787e9:0x108b9372002d7f55!2m2!1d-97.7463985!2d30.2642596!2m3!6e1!7e2!8j1689100200!3e3!5i3?entry=ttu&utm_medium=s2email&shorturl=1) with [CapMetro](https://www.capmetro.org/app)!", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 504, "code": "8H7BLW", "public_name": "200 W Cesar Chavez", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 3, "date": "2023-07-12", "day_start": "2023-07-12T04:00:00-05:00", "day_end": "2023-07-13T03:59:00-05:00", "rooms": {"Zlotnik Ballroom": [{"id": 279, "guid": "e00029f9-141f-563e-8234-90baee048682", "logo": "", "date": "2023-07-12T09:15:00-05:00", "start": "09:15", "duration": "00:45", "room": "Zlotnik Ballroom", "slug": "2023-279-keynote-open-source-contributors-in-space-and-time", "url": "https://cfp.scipy.org/2023/talk/X7YH7A/", "title": "Keynote - Open Source Contributors in Space and Time", "subtitle": "", "track": "Keynote", "type": "Talk", "language": "en", "abstract": "Michael Droettboom is a Principal Software Engineering Manager at Microsoft where he leads the CPython Performance Engineering Team. 
That team contributes directly to the upstream CPython project, and recently helped make Python 3.11 up to 60% faster than 3.10.\r\n\r\nMichael has been contributing to open source for over 25 years: he is the former lead maintainer of matplotlib, a major contributor to astropy, and he is the original author of Pyodide and airspeed velocity. His work has supported such diverse applications as the Hubble and James Webb Space Telescopes, the Firefox web browser, infrared retinal imaging, and optical sheet music recognition.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 409, "code": "H3ZNFW", "public_name": "Michael Droettboom", "biography": "Principal Software Engineering Manager at Microsoft", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 177, "guid": "3b27acb8-551d-51d1-9048-93b0c740630e", "logo": "/media/2023/submissions/G9P3AG/b2nd-2_hLm7ZVA.png", "date": "2023-07-12T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-177-fast-exploration-of-the-milky-way-or-any-other-n-dimensional-dataset-", "url": "https://cfp.scipy.org/2023/talk/G9P3AG/", "title": "Fast Exploration of the Milky Way (or any other n-dimensional dataset)", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "N-dimensional datasets are common in many scientific fields, and quickly accessing subsets of these datasets is critical for an efficient exploration experience. Blosc2 is a compression and format library that recently added support for multidimensional datasets. Compression is crucial in effectively dealing with sparse datasets as the zeroed parts can be almost entirely suppressed, while the non-zero parts can still be stored in smaller sizes than their uncompressed counterparts. 
Moreover, the new double data partition in Blosc2 reduces the need for decompressing unnecessary data, which allows for top-class slicing speed.", "description": "Blosc is a high-performance compressor optimized for binary data, such as floating-point numbers, integers, and booleans, although it can also handle string data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed direct memory fetch approach, which uses a memcpy() OS call. Blosc is widely used in popular storage libraries like HDF5 (via h5py or PyTables) or Zarr, and is probably producing many petabytes of compressed data every day around the world.\r\n\r\nC-Blosc2 (https://github.com/Blosc/c-blosc2) is the latest major version of C-Blosc. It comes with Python-Blosc2 (https://github.com/Blosc/python-blosc2), a lightweight Python wrapper that exposes many of its new features. Some of the most interesting features are:\r\n\r\n- 64-bit containers: There is no practical limit on dataset size.\r\n- Frames: Data can be serialized either on-disk or in-memory.\r\n- Meta-layers: Meta-data can be added in different layers inside frames.\r\n- Blosc2 NDim: N-dimensional datasets can be created, read, and sliced efficiently.\r\n- Double partitioning: Data can be split into fine-grained cubes for faster reads of n-dimensional slices.\r\n- Parallel reads: When several blocks of a chunk need to be read, this is done in parallel.\r\n- Support for special values: Large sequences of repeated values can be represented efficiently.\r\n\r\nBy leveraging these features, Blosc2 provides a powerful yet flexible tool for data handling.  
For example, when Blosc2 cooperates with libraries like PyTables/HDF5, it allows querying [100-trillion-row tables in human time frames](https://www.blosc.org/posts/100-trillion-baby/).\r\n\r\nFurthermore, being able to compress multidimensional data is of great help in handling large multidimensional datasets because 1) it reduces the storage resources needed and 2) it reduces the bandwidth necessary to bring data from storage (disk, memory) to the CPU, allowing data to be processed more effectively in general.  Additionally, compression can represent a wide variety of sparse data without requiring a specific format. Instead, compression works to minimize the number of zeros stored and keep storage requirements to a minimum.\r\n\r\nWe will address common misconceptions about compressing data, such as: 1) decompressing data takes CPU time, which may slow down computations, and 2) when retrieving a subset of data, all affected partitions must be decompressed, adding overhead. To debunk these myths, we offer the following facts: 1) decompressing data within CPU caches often saves transmission cycles, and 2) Blosc2 features a novel double partitioning schema that minimizes decompression overhead.\r\n\r\nWe will leverage Python-Blosc2 to:\r\n\r\n- Describe the main features of Blosc2\r\n- Provide useful advice on the best codecs and filters for different types of datasets\r\n- Explain how to partition multidimensional datasets for efficient slicing\r\n- Compare efficiency and resource savings with other packages, such as h5py, PyTables, and Zarr\r\n\r\nFinally, we will demonstrate an example of exploring the Milky Way's 3D dataset effectively, using data from the Gaia mission.", "recording_license": "", "do_not_record": false, "persons": [{"id": 165, "code": "9RWXLD", "public_name": "Francesc Alted", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 236, "guid": "7efd3182-44f7-5a85-b431-f0927c76fe67", "logo": "", "date": 
"2023-07-12T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-236-a-computer-vision-ml-approach-to-classifying-clouds-and-aerosols-from-satellite-observations", "url": "https://cfp.scipy.org/2023/talk/LAKM79/", "title": "A Computer Vision (ML) Approach to Classifying Clouds and Aerosols from Satellite Observations", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "The NASA Atmosphere SIPS, located at the University of Wisconsin, is responsible for producing operational cloud and aerosol scientific products from satellite observations. With decades of satellite observations, new scientific algorithms are employing Machine Learning (ML) methods to improve processing efficiencies and scientific analyses. In preparation for future developments, we are working with NASA Atmospheric Science Teams to understand ML requirements and assist in developing new tools that will benefit both the Science Teams and the broader Open-Source Science community. This talk will step through a ML methodology being used to identify cloud types and severe aerosols.", "description": "The purpose of this talk is to share how to make most efficient use of the existing machine learning (ML) software, such as tensorflow, to implement scientific ML methods. We will first describe the science objectives we are trying to achieve, elaborate on lessons learned, and finally introduce future challenges.\r\n\r\nOur primary science objective is to identify different cloud types and aerosols from satellite imagery, where the cloud types are indicative of different meteorological conditions. The science objective during the talk will be catered towards the broader scientific community while expecting little to no background in atmospheric science or ML. 
All of this will be accomplished by presenting visualizations of satellite imagery throughout to relate the data to the audience.\r\n\r\nSubsequently we will introduce the ML techniques we have been using. We employ a pretrained VGG16, a convolutional neural network (CNN), which we fine-tune to identify cloud types and aerosols from satellite imagery. There will be accompanying animations illustrating this process and how the inference is combined into the softmax layer providing the result.\r\n\r\nThe specific lesson learned in using ML software is to consider which part of the code executes in CPU or GPU space. Initially we noticed GPU usage was not consistently 100% during inference. To demonstrate potential, we dumped the data to a 200GB file and streamed that directly to the GPU. This test proved what was possible and allowed us to rewrite our generator using keras Sequence, where the init and getitem called tf.device, locking the data I/O and preprocessing to the CPU and leaving the GPU solely for inference. This approach yielded a 2x performance increase.\r\n\r\nSince our goal is to add value to existing NASA algorithm methodologies via ML, this goal requires us to have labeled data. We experimented with existing labeling packages but in the end decided to incorporate the labeling task into existing software the community uses. Thankfully, as part of NASA\u2019s participation in Open-Source Science, one of the primary tools used by the Atmospheric Science community, NASA Worldview, is already open sourced. This allowed us to install their docker images and extend this tool, bringing the labeling task directly to the Scientists.\r\n\r\nAdditionally, I will talk about the importance of visualizing the data through the entire process of training a CNN. For example, I have a video file flipping through thousands of images from our training set. 
And I will use this to emphasize the importance of looking at data throughout the process and the importance of being able to share information. Open-Source Science is great but being able to convey information about how ML works is just as important.", "recording_license": "", "do_not_record": false, "persons": [{"id": 260, "code": "WLSZ3M", "public_name": "Steve Dutcher", "biography": "Steve Dutcher is a Software Developer/Engineer at the University of Wisconsin, Space Science & Engineering Center with 20 years of experience applying his computer science degree to Atmospheric Science. He currently works as part of the NASA Atmosphere SIPS responsible for producing operational cloud and aerosol products from polar orbiting satellites. Additional projects include machine learning applications, producing a low latency fire product from direct broadcast, and supporting instrument field campaigns.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 45, "guid": "d8e9f80e-736b-5f4d-a4ba-2a2ad4b863c0", "logo": "", "date": "2023-07-12T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-45-pandera-beyond-pandas-data-validation", "url": "https://cfp.scipy.org/2023/talk/T3XSHX/", "title": "Pandera: Beyond Pandas Data Validation", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Data quality remains a core concern for practitioners of machine learning, data science, and data engineering, and in recent years specialized packages have emerged to validate and monitor data and models. However, as the open source community iterates on data frameworks \u2013 notably, highly performant entrants such as Polars \u2013 data quality libraries need to catch up to support them. 
In this talk, you will learn about Pandera and its journey from being a pandas-only validator to a generic tool for testing arbitrary data containers so that it can provide a standardized way of creating data validation tools.", "description": "# Motivation\r\nData quality remains a core concern for practitioners in machine learning, data science, and data engineering, and many specialized packages have emerged to fulfill the need of validating and monitoring data and models. However, as the open source community creates new data manipulation frameworks \u2013 notably, new highly performant entrants such as [Polars](https://www.pola.rs/) \u2013 existing data quality frameworks need to catch up to support them, and in some cases, the community creates new data validation libraries for these new data frameworks.\r\n\r\n# Origins\r\n[Pandera](https://github.com/unionai-oss/pandera) started as a small project in 2018 with the goal of providing a lightweight, flexible, and expressive API to validate Pandas DataFrames. This part of the talk provides a short primer on data validation and property-based testing with Pandera, providing insights into how its design facilitates code-first schema authoring and maintenance, which in turn gives rise to safer and more robust data pipelines.\r\n\r\nThis primer will contain content similar to the \"Introduction to Pandera\" notebook in the pandera documentation: https://pandera.readthedocs.io/en/stable/try_pandera.html\r\n\r\n# Evolution\r\nAfter gaining traction over the years, the author and community of contributors began to expand Pandera\u2019s scope to support pandas-compliant data frameworks such as GeoPandas, Dask, Modin, and Pyspark Pandas (formerly Koalas). As requests for other libraries increased in frequency, it became clear that Pandera in its existing state was not well-suited for extension beyond Pandas objects. 
This part of the talk focuses on some of the key design failures that made it difficult to extend to other data frameworks.\r\n\r\nRewrites are Fun! (not): Imagine doing a complete internal rewrite of a library while bug reports, feature requests, and pull requests are coming in from contributors: does it sound fun? In the author\u2019s experience, it\u2019s like juggling three balls while playing drums with your feet as someone throws water balloons in your face. This part of the talk outlines the challenges, lessons learned, and things the author would have done differently to anticipate issues related to the separation of concerns, modularity, and extensibility.\r\n\r\n# Conclusion\r\nThis talk is about how Pandera has evolved to provide a standard schema interface for easily extending and supporting validation backends for arbitrary statistical data containers. Attendees will learn not only about data testing principles such as run-time validation and property-based testing, they will also learn about the challenges of maintaining and evolving an open source project that many people rely on as a critical piece of their data infrastructure. The high-level goal for this talk is to highlight lessons learned from Pandera\u2019s particular journey from supporting only Pandas as a backend to supporting a whole suite of data objects.", "recording_license": "", "do_not_record": false, "persons": [{"id": 55, "code": "DZT3RK", "public_name": "Niels Bantilan", "biography": "Niels is the Chief Machine Learning Engineer at Union.ai, and core maintainer of Flyte, an open source workflow orchestration tool, author of UnionML, an MLOps framework for machine learning microservices, and creator of Pandera, a statistical typing and data testing tool for scientific data containers. 
His mission is to help data science and machine learning practitioners be more productive.\r\n\r\nHe has a Masters in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 173, "guid": "4e098ea6-f3d6-5a27-9235-5e093c63b77e", "logo": "/media/2023/submissions/NPG3NS/dsp_GKs6McB.png", "date": "2023-07-12T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-173-disciplined-saddle-programming", "url": "https://cfp.scipy.org/2023/talk/NPG3NS/", "title": "Disciplined Saddle Programming", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Our recent work implements a domain-specific language called Disciplined Saddle Programming (DSP) in Python. It is available at https://github.com/cvxgrp/dsp. DSP allows specifying convex-concave saddle, or minimax problems, a class of convex optimization problems commonly used in game theory, machine learning, and finance. One application for DSP is to naturally describe and solve robust optimization problems. We show numerous examples of these problems, including robust regressions and economic applications. However, this only represents a fraction of problems solvable with DSP, and we want to engage with the SciPy community to hear about further potential applications.", "description": "Convex-concave saddle problems are a class of optimization problems that generalize convex optimization and have a wide range of practical applications, including game theory, finance, and machine learning. A technical trick to convert to a single convex problem is called dualization. 
However, carrying out this conversion by hand can be tedious and error-prone.\r\n\r\nIn this context, we introduced Disciplined Saddle Programming (DSP) in a recent paper, and the accompanying Python package is implemented as an extension of CVXPY. It is available at https://github.com/cvxgrp/dsp. DSP is a domain-specific language (DSL) for specifying saddle problems for which the dualizing trick can be automated. DSP is based on the conic-representable saddle programs developed by Juditsky and Nemirovski, who showed how to carry out the required dualization automatically using conic duality. The DSP language and methods extend Nesterov and Nemirovski's earlier development of conic representable convex problems and can be seen as extending disciplined convex programming (DCP) to saddle problems.\r\n\r\nThere are numerous benefits of using DSP. The language makes it easier for practitioners to specify and solve saddle problems, and it can handle a wide range of optimization problems, including many robust optimization problems, which have recently gained wider attention. Indeed, some argue that most optimization problems should be solved as robust problems instead, as inputs are rarely obtained with absolute certainty. Further, hearing about even more applications from the SciPy community is an intended side effect to make the package easier to integrate for practitioners.\r\n\r\nJust as DCP, and by extension CVXPY, made it easy for users to formulate and solve complex convex problems, DSP allows users to easily formulate and solve saddle problems. The method is implemented in an open-source Python package, also called DSP. This package provides a way to automate the dualization of saddle problems and provides a simple interface for users to formulate and solve complex problems in a structured and disciplined way.\r\n\r\nIn summary, disciplined saddle programming (DSP) is a new approach that can simplify solving saddle problems in convex optimization. 
It automates the dualization of saddle problems and provides a simple interface for users to specify and solve complex problems in a structured and disciplined way. DSP is designed to be easy to learn and use, and is compatible with the existing CVXPY framework. DSP has the potential to make saddle problems much easier to solve, which could have a significant impact on a wide range of fields that rely on optimization.", "recording_license": "", "do_not_record": false, "persons": [{"id": 203, "code": "WCFJQM", "public_name": "Philipp Schiele", "biography": "Main instructor Philipp Schiele\r\nPhilipp Schiele's educational background is in finance and economics and he is currently pursuing a PhD in financial econometrics at the Ludwig Maximilian University of Munich, where he taught various courses in statistics. He is a CVXPY maintainer and has presented a tutorial at SciPy 2022. Generally, he is enthusiastic about finance, optimization, and technology, especially open-source projects.", "answers": []}, {"id": 231, "code": "Q7W8MW", "public_name": "Eric Sager Luxenberg", "biography": "Eric Luxenberg is a PhD candidate in the Electrical Engineering department at Stanford University, advised by Stephen Boyd. His research interests include robust optimization and mathematical finance. He is a contributor to CVXPY, and has developed an open-source package for saddle optimization called DSP. 
He has also served as the primary instructor of Stanford\u2019s convex optimization course.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 73, "guid": "ee81d3f0-381d-56ce-bed0-2f2a2ece5740", "logo": "", "date": "2023-07-12T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-73-emukit-python-toolkit-for-uncertainty-quantification", "url": "https://cfp.scipy.org/2023/talk/RJPMGC/", "title": "Emukit: Python toolkit for uncertainty quantification", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Emukit is an open-source package for uncertainty quantification in Python. It provides various Bayesian methods, such as optimization, experimental design and quadrature, in a flexible unified way that leverages their commonalities. In the talk we will explain how and why Emukit was built, what are its strengths and weaknesses, how it is used today and in what scenarios one might find it useful.", "description": "## Description\r\nEmukit is a highly adaptable Python toolkit for enriching decision making under uncertainty. This is particularly pertinent to model complex systems where data is scarce or difficult to acquire. 
In these scenarios, propagating well-calibrated uncertainty estimates within a design loop or computational pipeline ensures that constrained resources are used effectively.\r\n\r\nThe main features currently available in Emukit are:\r\n* Bayesian optimisation: optimise physical experiments and tune parameters of machine learning algorithms;\r\n* Experimental design/Active learning: design the most informative experiments and perform active learning with machine learning models;\r\n* Sensitivity analysis: analyse the influence of inputs on the outputs of a given system;\r\n* Bayesian quadrature: efficiently compute the integrals of functions that are expensive to evaluate;\r\n* Multi-fidelity emulation: build surrogate models when data is obtained from multiple information sources that have different fidelity and/or cost.\r\n\r\nThe package was released in 2019, and since then gained popularity among the research communities of Bayesian optimization, Bayesian quadrature, and multi-fidelity modelling. The aim of this talk is to present Emukit to a wider audience of Python developers. It may be of interest to machine learning practitioners in need of hyper-parameter optimization methods, scientists running complex simulations and looking for emulation and UQ techniques, and everyone interested in approaches for decision making under uncertainty. Hearing about our development experience and lessons learned may also be useful to those who look to develop scientific packages in Python.\r\n\r\nThe first part of the talk will focus on technical details of the package. We will start with a brief introduction into Emukit and the methods it provides. Emukit is a replacement for GPyOpt and the reasons that prompted its development will be discussed. We will go over the key software design principles of Emukit, and see how they lead to a flexible and adaptable toolkit, but also how they may hinder the computational efficiency. 
Other popular frameworks for Bayesian optimization, Trieste and BoTorch, will be used to highlight strengths and weaknesses of Emukit.\r\n\r\nThe second part will focus on usage and adoption. We will talk about the target audience of the toolkit, existing uses for teaching and research, and discuss why anyone who is not an expert in Bayesian active learning methods would want to use Emukit.\r\n\r\n## Additional materials\r\nEmukit is available on GitHub: https://github.com/EmuKit/emukit. There is also a website about the package: https://emukit.github.io/.\r\n\r\nEmukit was first presented at the NeurIPS workshop on ML and the Physical Sciences, 2019. Corresponding paper on arXiv: https://arxiv.org/abs/2110.13293.\r\n\r\nEmukit is used for teaching the ML and the Physical World course at the University of Cambridge. The course website can be found at https://mlatcl.github.io/mlphysical/.\r\n\r\nEmukit was also adopted for the Gaussian Process Summer School 2022: https://gpss.cc/gpss22/.\r\n\r\nSome of the previous talks given by the speaker can be found on his website: https://paleyes.info/#talks.", "recording_license": "", "do_not_record": false, "persons": [{"id": 89, "code": "U8RYWG", "public_name": "Andrei Paleyes", "biography": "Andrei is currently pursuing a PhD at the University of Cambridge. His research interests are somewhere between machine learning and software systems, leaning towards the latter. He also has a keen interest in Bayesian optimization and is actively participating in several open source projects. 
Before jumping into the world of academia he has spent more than a decade as a software engineer, developing everything from small webapps to data center network software.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 157, "guid": "88d3534d-477e-5e74-bf9c-4deb70121439", "logo": "", "date": "2023-07-12T16:05:00-05:00", "start": "16:05", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-157-bayesian-statistics-with-python-no-resampling-necessary", "url": "https://cfp.scipy.org/2023/talk/TVRHYS/", "title": "Bayesian Statistics with Python, No Resampling Necessary", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "TensorFlow Probability is a powerful library for statistical analysis in Python.  Using TensorFlow Probability\u2019s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates and a quantified degree of belief in the results. Resampling methods like Markov Chain Monte Carlo can also be used to perform Bayesian analysis. As an alternative, we show how to use numerical optimization to estimate model parameters, and then show how numerical differentiation can be used to get a quantified degree of belief. How to perform simulation in Python to corroborate our results is also demonstrated.", "description": "This talk is a concise update of a talk delivered previously for PyStan, the Python Interface for STAN, which is software for Bayesian inference.  
Now we will focus on the TensorFlow Probability library.\r\nHere are links for the previous talk:\r\nhttps://github.com/c22hatal/bayes_confidence/tree/main/meetup11aug21/meetup11aug21\r\nhttps://www.youtube.com/watch?v=-7l5QTq5Hz0&list=PLhbPZ4oC18muuVdH3pjpjGmHkJqxCldYR&index=11&t=1073s\r\n\r\nWe first briefly review the Bayesian concepts of prior and posterior and elaborate on how the posterior distribution of the parameters can be approximated by a normal distribution with large sample sizes.  This is the key theoretical point of the talk and is discussed in section 4.1 of Bayesian Data Analysis [1].   Through the talk, we will corroborate the proof by using resampling methods.  We show that the normal approximation and resampling methods are equivalent with large data using TensorFlow Probability.  After the talk, users can confidently use TensorFlow Probability and SciPy/NumPy to perform Bayesian analysis without resampling if their samples are sufficiently large.   \r\n\r\nAfter the theoretical discussion, we get into how the posterior distribution can be modeled using TensorFlow Probability\u2019s distribution classes.  I will show how you can sample from the distributions and calculate the posterior log probability density.  \r\nWe will focus on a linear regression setting where the Normal and \ud835\udf122 distributions will be used as priors for the slope and intercept parameters.  \r\nhttps://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Chi2\r\nhttps://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Normal\r\nhttps://www.tensorflow.org/probability/api_docs/python/tfp/distributions/JointDistributionNamed\r\n\r\nThen I show how the posterior modes can be estimated using TensorFlow or SciPy optimization.  The Broyden\u2013Fletcher\u2013Goldfarb\u2013Shanno algorithm (BFGS) will be used.  This method doesn\u2019t calculate the full Hessian, the second and cross derivatives of the log posterior function.  
\r\nhttps://www.tensorflow.org/probability/api_docs/python/tfp/optimizer/lbfgs_minimize\r\nhttps://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html\r\n\r\nThe inverse Hessian gives us the posterior variance under our approximation. So I show how you can take numeric derivatives in NumPy to obtain it.  Derivatives are taken according to the method in Numerical Recipes [2].  Vectorized computations will be used where possible.\r\n\r\nThen I finally use resampling, particularly Markov Chain Monte Carlo sampling, to show how well the approximation to the posterior distribution works.  This is accomplished using TensorFlow Probability functions.  I provide a framework for simulation in Python that is used to demonstrate these results as well.  \r\nhttps://www.tensorflow.org/probability/api_docs/python/tfp/mcmc\r\n\r\nReferences\r\n\r\n1. Bayesian Data Analysis (3rd. ed.). A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari and D. B. Rubin, 2013 Boca Raton, Chapman and Hall\u2013CRC\r\n2. Numerical Recipes: The Art of Scientific Computing (3rd. ed.). W. H. Press, S. A. Teukolsky, W. T. Vetterling, and Brian P. Flannery. 2007. Cambridge University Press, USA.", "recording_license": "", "do_not_record": false, "persons": [{"id": 184, "code": "HLS9JM", "public_name": "Charles D Lindsey", "biography": "Charles Lindsey is a Principal Data Scientist at Revionics.  Charles earned a PhD in Statistics from Texas A&M in 2010, where he researched dimension reduction and classification.  Charles then worked at StataCorp LLC. At StataCorp, Charles was the lead developer of the Extended Regression Model (ERM) commands, which allow causal inference on observational data with common complications like unobserved confounding variables and sample selection.  
At Revionics, Charles works on price optimization and sales forecasting using Bayesian methods and other machine learning techniques.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 295, "guid": "37f84e2d-9815-5131-84ee-903fd750b8c0", "logo": "", "date": "2023-07-12T17:00:00-05:00", "start": "17:00", "duration": "01:00", "room": "Zlotnik Ballroom", "slug": "2023-295-lightning-talks", "url": "https://cfp.scipy.org/2023/talk/NUT798/", "title": "Lightning Talks", "subtitle": "", "track": "Lightning Talks", "type": "Talk", "language": "en", "abstract": "Lightning talks are 5-minute talks on any topic of interest for the SciPy community. We encourage spontaneous and prepared talks from everyone, but we can\u2019t guarantee spots. Sign ups are at the NumFOCUS booth during the conference.", "description": "", "recording_license": "", "do_not_record": false, "persons": [], "links": [], "attachments": [], "answers": []}, {"id": 300, "guid": "1e1ca872-86f0-50b6-9c44-3e9433517447", "logo": "", "date": "2023-07-12T18:00:00-05:00", "start": "18:00", "duration": "01:00", "room": "Zlotnik Ballroom", "slug": "2023-300-poster-session-and-job-fair", "url": "https://cfp.scipy.org/2023/talk/VUD8HM/", "title": "Poster Session and Job Fair", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "The Poster session will be in the Zlotnik Ballroom from 6:00-7:00pm. \r\n\r\nThe Job Fair will be held concurrently in the Zlotnik foyer with participating sponsors. Sponsor companies will be available to discuss current job opportunities.", "description": "POSTERS: Title - Authors (Track)\r\n\r\n1. RECOIL - Ronchi Evaluator and Classifier of Imperfect Lenses (RECOIL) - Allen S. Harvey Jr., Clare Egan (Astronomy and Physics)\r\n2. Planetary Defense Using Python: Measuring Deflection of the Didymos Binary Asteroid System by the NASA DART Mission - Arushi Nath (Astronomy and Physics)\r\n3. 
pyro: a python hydrodynamics code for teaching and prototyping - Michael Zingale (Astronomy and Physics)\r\n4. Accessing astronomical data with Python - Brigitta Sip\u0151cz (Astronomy and Physics)\r\n5. Spatial and Single-Cell Analysis of MERFISH Data using the Python Library Cormerant - Nicolas Fernandez (Bioinformatics, Computational Biology & Neuroscience)\r\n6. Cross-language Data Grammar for Single-cell Feature Engineering - Dave Bunten (Bioinformatics, Computational Biology & Neuroscience)\r\n7. Biomolecular crystallographic computing with Jupyter - Blaine Mooers (Bioinformatics, Computational Biology & Neuroscience)\r\n8. MDAKits: A Framework for FAIR-Compliant Molecular Simulation Analysis - Ian Kenney (Bioinformatics, Computational Biology & Neuroscience)\r\n9. Obtain quantitative insights through image registration in python - Matt McCormick, Konstantinos Ntatsis (Bioinformatics, Computational Biology & Neuroscience)\r\n10. EEG-to-fMRI: Neuroimaging Cross Modal Synthesis in Python - David Calhas (Bioinformatics, Computational Biology & Neuroscience)\r\n11. Matchmaker: A Toolkit for Collocating and Combining Satellite-Based Earth Observations - Greg Quinn (Earth, Ocean, Geo, and Atmospheric)\r\n12. Building geospatial workflows for Impact using Leafmap, SageMaker Studio Lab, and Open Data on AWS - Qiusheng Wu, Mike Jeffe (Earth, Ocean, Geo, and Atmospheric)\r\n13. Operational Open Science and Software for the Planet's Largest Climate Observatory - Zachary Sherman (Earth, Ocean, Geo, and Atmospheric)\r\n14. Moving the Earth with thermodynamics and python - Cian Wilson (Earth, Ocean, Geo, and Atmospheric)\r\n15. Bringing automated data analysis and machine learning pipelines directly to end users using Unidata tools - Thomas Martin, Hailey Johnson, Drew Camron (Earth, Ocean, Geo, and Atmospheric)\r\n16. Yori: A New, Highly Customizable Tool for Level-3 Data Production - Paolo Veglio (Earth, Ocean, Geo, and Atmospheric)\r\n17. 
Intuitive Statistics in SciPy - Matt Haberland, Albert Steppi (General Track)\r\n18. Using MyST Markdown in JupyterLab - Rowan Cockett (General Track)\r\n19. PyVista: A Python Library for Interactive 3D Data Visualization and Analysis - Tetsuo Koyama (General Track)\r\n20. SOSA: The Scalable Open-Source Analysis Stack - James A. Bednar, Martin Durant (General Track)\r\n21. Sensitivity Analysis in Python: `scipy.stats.sobol_indices` - Pamphile Roy (General Track)\r\n22. Improving the SciPy-CuPy compatibility for interpolation and signal processing - Edgar Andr\u00e9s Margffoy Tuay (General Track)\r\n23. aPhyloGeo-Covid: A Web Interface for Phylogeographic Analysis of SARS-CoV-2 Variation using Neo4j and Snakemake - Nadia Tahiri, Wanlin Li (Machine Learning, Data Science, and Ethics in AI)\r\n24. Quantifying Uncertainty in Time Series Forecasting with Conformal Prediction - Fede Garza Ramirez (Machine Learning, Data Science, and Ethics in AI)\r\n25. Anti-Patterns: How not to do things in Python - Gajendra Deshpande (Machine Learning, Data Science, and Ethics in AI)\r\n26. pomegranate v1.0.0: now with PyTorch - Jacob Schreiber (Machine Learning, Data Science, and Ethics in AI)\r\n27. Data engineering and analytics for photolithography manufacturing process at DuPont \u2013 a practical approach from lab to fab - Avishek Panigrahi, Sumanth S, Abhishek Shrivastava, stefan caporale (Machine Learning, Data Science, and Ethics in AI)\r\n28. Stochastic Unitary Constraints - Victoria Schneider, Sara Logsdon, Delaney Ott (Machine Learning, Data Science, and Ethics in AI)\r\n29. Hamilton: Scalable, Portable, and Self-Documenting Dataflows in Python - Elijah ben izzy, Stefan Krawczyk (Machine Learning, Data Science, and Ethics in AI)\r\n30. Teaching machine learning in professional education - Nadia Udler (Machine Learning, Data Science, and Ethics in AI)\r\n31. 
Magic Data Abstractions (for  Magic\u2122 data) - Valerio Maggio (Machine Learning, Data Science, and Ethics in AI)\r\n32. Self-Supervised Cilia Segmentation - Meekail Zain, Shannon Quinn (Machine Learning, Data Science, and Ethics in AI)\r\n33. Data-centric ML pipeline for resolving data drift and optimizing data preprocessing - Hongsup Shin (Machine Learning, Data Science, and Ethics in AI)\r\n34. \"Clockwork\" detection in categorical telemetry data - Benoit Hamelin (Machine Learning, Data Science, and Ethics in AI)\r\n35. Intro to Quantum Computing for Drug Design - Maurice Benson (Machine Learning, Data Science, and Ethics in AI)\r\n36. PyQtGraph - High Performance Visualization for All Platforms - Nathan Jessurun (Machine Learning, Data Science, and Ethics in AI)\r\n37. Accelerating Drug Discovery on the Cloud with Open Source Python - Nathan Knapp (Materials and Chemistry)\r\n38. Modeling Multiphase Multicomponent Precipitate Growth with Phase-Field and Python - Trevor Keller (they/them) (Materials and Chemistry)\r\n39. Materials Project: building an open-source, data-driven platform for materials science - Ruoxi Yang (Materials and Chemistry)\r\n40. Rozha: Supporting and Simplifying Multilingual Natural Language Processing - Ian Goodale (Social Science and the Digital Humanities)\r\n41. Spatial Microsimulation & Activity Allocation in Python: An Update on the Likeness Toolkit - James Gaboardi, Joe Tuccillo (Social Science and the Digital Humanities)\r\n42. Python meta packages - Jorge Martinez, Roberto Pastor (Tending Your Open Source Garden: Maintenance and Community)\r\n43. quartodoc: a tool for quick and easy package documentation - Michael Chow (Tending Your Open Source Garden: Maintenance and Community)\r\n44. TUG-RSE: Pulling Students into Research Software Engineering - Aman Goel (Tending Your Open Source Garden: Maintenance and Community)\r\n45. 
CI/CD pipelines for scientists - Jorge Martinez (Tending Your Open Source Garden: Maintenance and Community)\r\n46. First steps toward supercharging remote development with Spyder - Carlos Cordoba (Tending Your Open Source Garden: Maintenance and Community)\r\n47. Chalk'it : dataflow and drag-and-drop Python dashboarding - Mongi Ben Gaid (Tending Your Open Source Garden: Maintenance and Community)\r\n48. Accessible documentation for everyone - Jorge Martinez, Revathy Venugopal (Tending Your Open Source Garden: Maintenance and Community)\r\n49. Patterns and Anti-Patterns when Measuring Diversity in Open Source - Amanda Casari (Tending Your Open Source Garden: Maintenance and Community)", "recording_license": "", "do_not_record": false, "persons": [], "links": [], "attachments": [], "answers": []}, {"id": 302, "guid": "16380343-ff49-5caa-9469-d16479bd6208", "logo": "", "date": "2023-07-12T19:00:00-05:00", "start": "19:00", "duration": "02:00", "room": "Zlotnik Ballroom", "slug": "2023-302-scipy-attendee-social-event-hosted-by-open-source-science-ossci-", "url": "https://cfp.scipy.org/2023/talk/3NH3L8/", "title": "SciPy Attendee Social Event hosted by Open Source Science (OSSci)", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "At Scholz Garten, 1607 San Jacinto Blvd. Join your fellow community members from 7:00-9:00. Walking distance from AT&T Center. 
Venue, food, and drinks sponsored by OSSci.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 503, "code": "JZMZAF", "public_name": "Scholz Garten, 1607 San Jacinto Blvd", "biography": "https://www.scholzgarten.com/", "answers": []}], "links": [], "attachments": [], "answers": []}], "Amphitheater 204": [{"id": 18, "guid": "13530c00-535c-5928-8bed-51dab5879e3b", "logo": "", "date": "2023-07-12T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-18-out-performing-numpy-is-hard-when-and-how-to-try-with-your-own-c-extensions", "url": "https://cfp.scipy.org/2023/talk/DF8PVV/", "title": "Out-Performing NumPy is Hard: When and How to Try with Your Own C-Extensions", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "While the NumPy C API lets developers write C that builds or evaluates arrays, just writing C is often not enough to outperform NumPy. NumPy's usage of Single Instruction Multiple Data routines, as well as multi-source compiling, provide optimizations that are impossible to beat with simple C. This presentation offers principles to help determine if an array-processing routine, implemented as a C-extension, might outperform NumPy called from Python. A C-extension implementing a narrow use case of the ``np.nonzero()`` routine will be studied as an example.", "description": "While it is well known that C-extensions can improve the performance of Python programs, writing C-extensions that improve the performance of NumPy array operations is different. Many NumPy functions employ highly optimized C routines, some of which take advantage of low-level processor optimizations. In most cases, just writing Python that calls NumPy is faster than a custom C extension. 
However, for routines that are sufficiently narrow in scope, there are opportunities for optimization.\r\n\r\nThis presentation offers principles to help determine if a routine, implemented as a C-extension, might outperform related NumPy routines called from Python. Along the way, Python project setup, and the basics of the NumPy C API, will be introduced.\r\n\r\nA narrow use-case of the ``np.nonzero()`` function will be implemented in C as an example: rather than returning all indices of all non-zero values for all dtypes and dimensionalities (as ``np.nonzero()`` does), this new function, ``first_true_1d()``, will return only the index of the first-encountered non-zero value for one-dimensional Boolean arrays. The performance of this far simpler routine, and why it sometimes cannot out-perform ``np.nonzero()``, will be examined.", "recording_license": "", "do_not_record": false, "persons": [{"id": 30, "code": "Z7STMX", "public_name": "Christopher Ariza", "biography": "Christopher Ariza is Partner and Chief Technology Officer at Research Affiliates, a global leader in investment strategies and research. He is the creator and lead developer of StaticFrame, an alternative DataFrame library built on an immutable data model. 
Having worked in Python for over 20 years, he has developed tools in a variety of domains, including algorithmic music composition and computer-aided musicology, and has spoken at numerous conferences, including PyCon USA, PyData Global, PyData Los Angeles, and numerous other venues.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 29, "guid": "786f93f9-478d-585c-b84e-c38d2d6206a8", "logo": "", "date": "2023-07-12T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-29-can-there-be-too-much-parallelism-", "url": "https://cfp.scipy.org/2023/talk/VUFGS8/", "title": "Can There Be Too Much Parallelism?", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "Numerical Python libraries can run computations on many CPU cores with various parallel interfaces. When we simultaneously use multiple levels of parallelism, it may result in oversubscription and degraded performance. This talk explores the programming interfaces used to control parallelism exposed by libraries such as NumPy, SciPy, and scikit-learn. We will learn about parallel primitives used in these libraries, such as OpenMP and Python's multiprocessing module. We will see how to control parallelism in these libraries to avoid oversubscription. Finally, we will look at the overall landscape for configuring parallelism and highlight paths for improving the user experience.", "description": "Numerical Python libraries such as NumPy, SciPy, and PyTorch can run computations on multiple CPU cores. These libraries expose a wide range of programming interfaces to control parallelism. These interfaces include environment variables, library-specific APIs, and context managers such as threadpoolctl. While reviewing the interfaces for controlling parallelism, we will learn about the many parallel primitives used in these libraries. 
We will cover lower-level primitives such as pthreads or OpenMP and higher-level primitives such as Python's threading and multiprocessing modules. Libraries that require lower-level parallel primitives need to go through a compilation step with languages and tools such as Numba, Cython, C++, or Rust. When we use multiple forms of parallelism, controlling how many cores your program uses is essential to prevent oversubscription. We will learn how libraries such as Dask, Ray, and scikit-learn mix their parallelism with user-provided parallel routines. Finally, we will zoom out to see the overall landscape for controlling parallelism and highlight possible paths to improve the user and developer experience. This is an intermediate talk for software and machine learning engineers who want to understand and configure parallelism in the PyData stack.", "recording_license": "", "do_not_record": false, "persons": [{"id": 42, "code": "VQNJX7", "public_name": "Thomas J. Fan", "biography": "Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. 
Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 249, "guid": "d913e189-19a3-597c-8483-a041bb8f0067", "logo": "/media/2023/submissions/3NBFHV/Scientific-Python-min_NTmaUXt.png", "date": "2023-07-12T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-249-scientific-python-from-init-to-call-", "url": "https://cfp.scipy.org/2023/talk/3NBFHV/", "title": "Scientific Python: from `__init__` to `__call__`", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "The Scientific Python project aims to better coordinate the ecosystem and grow the community. Come hear about our recent progress and our plans for the coming year!", "description": "The Scientific Python project's vision is to help pave the way toward a vibrant, unified, and collaborative scientific Python community.\r\nIt focuses its efforts along two primary axes: _(i)_ to create a joint community around scientific Python projects\r\nand _(ii)_ to support maintainers by building cross-cutting technical infrastructure and tools.\r\n\r\nLast year we launched the project with new websites, a Hugo web theme, a social media campaign, and a collaborative coordination process similar to PEPs called SPECs.\r\nThis year, we are fortunate to have received [funding from CZI](https://scientific-python.org/grants/community_and_communications_infrastructure/) for the continued development, maintenance, and support of web and documentation themes, as well as other community infrastructure, in collaboration with Quansight.\r\nWith the community and communication infrastructure having support for the next few years, we are able to focus more on technical topics and the SPECs.\r\n\r\nAs a first project, we are [funded to work on improving sparse 
*array*](https://scientific-python.org/grants/sparse_arrays) (vs matrix) semantics in SciPy with the goal of removing sparse *matrices* and, eventually, also NumPy *matrices* from several ecosystem libraries. In line with our philosophy of continually working with the community and incorporating their feedback, we hosted the first of several [Sparse Summits](https://scientific-python.org/summits/sparse/)\u2014virtual meetings to identify sparse array needs in ecosystem libraries.\r\nThis project spans multiple core projects, including numpy, scipy, scikit-image, networkx, scikit-learn, and many of the packages built on top of these libraries.\r\n\r\nIn addition to the sparse summit, we have hosted a [domain stack summit](https://scientific-python.org/summits/domain-stacks/), to discuss domain-specific umbrella projects that host several others, as well as the first [annual developer summit](https://scientific-python.org/summits/developer/).\r\nThis in-person workshop brought together over 30 community members for a week-long, collaborative sprint, and tackled topics including build & testing systems, continuous integration infrastructure, release management tools, and community management.\r\n\r\nFinally, we will update the community on our progress on the [decadal plan](https://scientific-python.org/grants/planning_next_decade/).\r\n\r\nOur work thus far has already culminated in joint efforts to develop tools and shared infrastructure that will positively impact the whole ecosystem.\r\nAnd, while there is still a long road ahead, we are excited to continue preparing the ecosystem for the next decade of scientific computing in Python.", "recording_license": "", "do_not_record": false, "persons": [{"id": 271, "code": "SE7SNC", "public_name": "Juanita Gomez", "biography": "Juanita Gomez is a passionate programmer, mathematician and open source advocate; former developer of Spyder IDE at Quansight. 
She has a BS in Pure Mathematics from Pontificia Universidad Javeriana in Colombia and is currently pursuing a Ph.D. in Computer Science at UC Santa Cruz. She is a community manager for the Scientific Python project, a community effort to better coordinate and support scientific Python libraries.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 61, "guid": "4236b283-f464-5b8d-87a7-f588de236651", "logo": "", "date": "2023-07-12T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-61-beyond-bits-qubits-effective-open-source-community-management-in-quantum-computing", "url": "https://cfp.scipy.org/2023/talk/KXWZJY/", "title": "Beyond Bits & Qubits: Effective Open Source Community Management in Quantum Computing", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "Qiskit is an open-source SDK for quantum computers, enabling developers to work with these powerful machines using a familiar Python interface. First released in 2017, Qiskit has become the most popular package for quantum computing (Unitary Fund, 2022), with a thriving open-source community. As Qiskit has grown and changed, so has our approach to nurturing our community. This talk will share important lessons we\u2019ve learnt over the years, including practical tips you can apply to your own projects. Whether you\u2019re just starting in open-source or already manage an established community, this talk is for you!", "description": "basic outline of proposed talk:\r\n\r\n### 1. Context\r\nThis section will provide a brief introduction to Qiskit (https://qiskit.org) as an open-source package and some of the challenges we\u2019ve faced in maintaining and growing our community.  \r\n\r\n### 2. 
The Academic Element\r\nOne of the unique aspects of maintaining an open-source project in a scientific field is the closer relationship to academia compared to other open-source software. This can pose unique challenges, as researchers often have different goals, mindsets and working culture when it comes to publishing code, which doesn\u2019t always work well with traditional open-source ways of working. We continually face these conflicts in Qiskit, so in this section we will talk through some of the effective ways we\u2019ve found to address these differences through education and the development of the Qiskit Ecosystem (https://qiskit.org/ecosystem).  \r\n\r\n### 3. Clearly Defined Spaces\r\nDefining the mechanisms for *how* different members of the community interact is a subtle yet crucial aspect of community management that requires careful planning. Whether it\u2019s clearly defined issue templates, organised discussion forums, or actual events, having clearly defined spaces can help contributors and maintainers work together more effectively. So this section will demonstrate specific strategies we\u2019ve used in Qiskit and the underlying principles that make them effective.\r\n\r\n### 4. Be a Kind Human\r\nThis section will focus on the incredibly important aspect of fostering a welcoming culture within your open-source community. We will touch on the importance of a code of conduct, contributing guidelines, issue tagging, using empathetic and accessible language, and other general tips for making the whole contribution experience inclusive.  \r\n\r\n### 5. Metrics and Automation\r\nThis section will focus on how to use automations to streamline your contributor experience and collect valuable data along the way. From bots to actions to built-in GitHub features there are a ton of options to choose from, so we\u2019ll highlight the ones we\u2019ve found the most useful and the important insights we\u2019ve gained as a result.  \r\n\r\n### 6. 
Development meets DevRel\r\nEffective community management requires significant time investment, which can take a toll on project maintainers. This section will make the case for working closely with Developer Relations experts (perhaps even hiring one if you haven\u2019t already!) to offload some of that burden. Developer Advocates are highly specialised in communication for a developer audience, and can become valuable assets when brought into an open-source team.  \r\n\r\n### 7. The Community Management Graveyard\r\nTo wrap things up, this section will cover ideas that we have tried and failed during our community management journey in Qiskit. Things that started out with the best intentions that just didn\u2019t work out and what we learned from the process. The tone of this section will demonstrate how experimenting is an important part of the process of finding a community management setup that works for you, and that trying and failing in public is what open source is all about.", "recording_license": "", "do_not_record": false, "persons": [{"id": 77, "code": "YA9JNA", "public_name": "Abby Mitchell", "biography": "Abby joined IBM in 2019 as a full stack web developer before moving to IBM Quantum as a Developer Advocate in March 2021. She is currently working on Qiskit, an Open Source SDK for Quantum Computers. 
As a primarily self taught developer and quantum enthusiast she is passionate about encouraging people from any background to pursue their interest in technology.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 163, "guid": "e19bb1f5-d0be-5dc3-8ea4-db1ac51e7b41", "logo": "", "date": "2023-07-12T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-163-thar-be-dragons-ethical-legal-and-policy-challenges-when-measuring-open-source", "url": "https://cfp.scipy.org/2023/talk/QUNAY9/", "title": "Thar Be Dragons - Ethical, Legal, and Policy Challenges when Measuring Open Source", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "Open source researchers are increasingly challenged while navigating the data which open source communities inherently create when working in the open. While mining software repositories for insights into open source practices isn't new, moving beyond code analysis into ecosystems-level research does not have a clear path. This talk will outline the current ethical, legal, and policy challenges community leaders, as well as researchers in academia and industry face and the ambiguous areas decision makers should be aware of.", "description": "Challenges to outline can include:\r\n\r\n__Ethical__\r\n- Academia - quantitative + qualitative open source data is not (usually) subject to IRB\r\n- Does anti-aliasing across datasets potentially create opportunities for harm for members of open source communities?\r\n\r\n__Legal__\r\n- When does information become a dataset?\r\n- Can I use this data? 
Which license for what?\r\n\r\n__Policy__\r\n- Can umbrella foundations \"opt-in\" communities and projects into ecosystem scale research?\r\n- How can communities and projects create clear boundaries about how and where they want the \"data exhaust\" they release to be used?", "recording_license": "", "do_not_record": false, "persons": [{"id": 193, "code": "THSSLK", "public_name": "amanda casari", "biography": "amanda casari is a developer relations engineer in the Open Source Programs Office at Google, where she is co-leading research and engineering to better understand risk and resilience in open source ecosystems. She was named an External Faculty member of the Vermont Complex Systems Center in 2021. amanda is persistently fascinated by the difference between the systems we aim to create and the ones that emerge, and pie.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 180, "guid": "10791be1-e3d1-5463-b69e-9af135883b65", "logo": "", "date": "2023-07-12T15:25:00-05:00", "start": "15:25", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-180-in-process-analytical-data-management-with-duckdb", "url": "https://cfp.scipy.org/2023/talk/S8NSHT/", "title": "In-Process Analytical Data Management with DuckDB", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "DuckDB is a novel analytical data management system. DuckDB supports complex queries, has no external dependencies, and is deeply integrated into the Python ecosystem. Because DuckDB runs in the same process, no serialization or socket communication has to occur, making data transfer virtually instantaneous. For example, DuckDB can directly query Pandas data frames faster than Pandas itself. 
In our talk, we will describe the user values of DuckDB, and how it can be used to improve their day-to-day lives through automatic parallelization, efficient operators and out-of-core operations.", "description": "Data management systems and data analysts have a troubled relationship: Common systems such as Postgres or Spark are unwieldy, hard to set up and maintain, hard to transfer data in and out, and hard to integrate into complex end-to-end workflows. As a response, analysts have developed their own ecosystem of data wrangling tools such as Pandas or Polars. These tools are much more natural for analysts to use, but are limited in the amount of data they can process or the amount of automatic optimization that is supported. \r\n\r\nDuckDB is a new analytical data management system that is built for an in-process use case. DuckDB speaks SQL, has no external dependencies, and is deeply integrated into the Python ecosystem. DuckDB is Free and Open Source software under the MIT license. DuckDB uses state-of-the-art query processing techniques with vectorized execution, lightweight compression, and morsel-driven automatic parallelism. DuckDB is out-of-core capable, meaning that it can read and process datasets that are bigger than main memory. This allows for analysis of far larger datasets and in many cases removes the need to run separate infrastructure. \r\n\r\nThe \u201cduckdb\u201d Python package is not a client to the DuckDB system; it provides the entire database engine. DuckDB runs without any external server directly inside the Python process. Once there, DuckDB can run complex SQL queries on data frames in Pandas, Polars or PyArrow formats out of the box. DuckDB can also directly ingest files in Parquet, CSV or JSON formats. Because DuckDB runs in the same process, data transfers are virtually instantaneous. 
Conversely, DuckDB\u2019s query results can be transferred back into data frames very cheaply, allowing direct integration with complex downstream libraries such as PyTorch or TensorFlow. \r\n\r\nDuckDB enjoys fast-growing popularity, the Python package alone is currently downloaded around one million times a month. DuckDB has recently become the default backend of the Ibis project that offers a consistent interface in Python over a variety of data backends. \r\n\r\nThis talk is aimed at two main groups, data analysts and data engineers. For the analysts, we will explain the user values of DuckDB, and how it can be used to improve their day-to-day lives. For data engineers, we will describe DuckDB\u2019s capabilities to become part of large automated data pipelines. The presenters for the proposed talk, Hannes M\u00fchleisen and Mark Raasveldt are the original creators of DuckDB, they are still leading the project and are deeply familiar with its Python integration.\r\n\r\n- DuckDB Python API Overview: https://duckdb.org/docs/api/python/overview\r\n- DuckDB PyPI Download Statistics: https://pypistats.org/packages/duckdb\r\n- DuckDB Ibis Backend: https://ibis-project.org/backends/DuckDB/\r\n- Peer-reviewed paper about the concept behind DuckDB by the presenters\r\nhttps://www.cidrdb.org/cidr2020/papers/p23-raasveldt-cidr20.pdf\r\n- Talk about DuckDB at FOSDEM 2020 by Hannes: https://archive.fosdem.org/2020/schedule/event/duckdb/\r\n- Talk about DuckDB at CMU by Mark:\r\nhttps://www.youtube.com/watch?v=PFUZlNQIndo", "recording_license": "", "do_not_record": false, "persons": [{"id": 209, "code": "Z3Q9EC", "public_name": "Hannes M\u00fchleisen", "biography": "Prof. Dr. Hannes M\u00fchleisen is a creator of the DuckDB database management system and Co-founder and CEO of DuckDB Labs, a consulting company providing services around DuckDB. 
He is also a senior researcher of the Database Architectures group at the Centrum Wiskunde & Informatica (CWI), the Dutch national research lab for Mathematics and Computer Science in Amsterdam. Hannes is also Professor of Data Engineering at Radboud Universiteit Nijmegen. His main interest is analytical data management systems.", "answers": []}, {"id": 210, "code": "S8W3MD", "public_name": "Mark Raasveldt", "biography": null, "answers": []}, {"id": 493, "code": "X7HEAV", "public_name": "Alex Monahan", "biography": "Hello, I'm Alex! I am a forward deployed software engineer at MotherDuck and I write blogs and docs for the DuckDB Foundation. My background is Industrial and Systems Engineering from Virginia Tech, but I've decided I prefer working in data! I recently joined MotherDuck after 9 years at Intel. I started at Intel as an industrial engineer, later became a technical analyst, and then jumped into a data scientist role. Back in 2020 I discovered DuckDB while building an internal self service analytics platform. It was such a perfect fit that we quickly integrated it and I began using it in multiple projects. I also became one of DuckDB's biggest Twitter fans! I have been diving deeper into duck-themed databases ever since.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 218, "guid": "82d5b789-b8fd-5f37-ab7b-b002ea401735", "logo": "", "date": "2023-07-12T16:05:00-05:00", "start": "16:05", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-218-graphblas-for-sparse-data-and-graphs", "url": "https://cfp.scipy.org/2023/talk/YESNZB/", "title": "GraphBLAS for Sparse Data and Graphs", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "GraphBLAS solves graph problems using sparse linear algebra. We are using it to build [`graphblas-algorithms`](https://github.com/python-graphblas/graphblas-algorithms), a fast backend to NetworkX. 
[`python-graphblas`](https://github.com/python-graphblas/python-graphblas/) is faster and more capable than `scipy.sparse` for both graph algorithms and sparse operations. If you have sparse data or graph workloads that you want to scale and make faster, then this is for you. Come learn what makes GraphBLAS special--and fast!--and how to use it effectively.", "description": "Sparse data and graph problems appear in virtually all science and engineering disciplines.  Nevertheless, adoption of sparse and graph techniques has been slow (so opportunities to exploit sparsity are plentiful)--perhaps because it's not always obvious when to apply them, or existing libraries are too slow or difficult to use.  GraphBLAS can help. By expressing graph algorithms in the language of linear algebra, it can handle larger data in parallel and is versatile enough to express custom analyses and integrate into larger workflows.\r\n\r\nIn this talk, we will cover:\r\n\r\n- How to recognize sparse or graph problems and when to use GraphBLAS\r\n- Representing a graph as a sparse matrix\r\n- The equivalence between graph problems and sparse problems\r\n- How GraphBLAS extends linear algebra with masking and arbitrary semirings to be more capable and work-efficient than `scipy.sparse`\r\n- The underlying sparse data structures and how to efficiently convert to and from them\r\n- Examples of graph algorithms written in [`python-graphblas`](https://github.com/python-graphblas/python-graphblas/)\r\n- Using GraphBLAS as a backend to NetworkX via dispatching to [`graphblas-algorithms`](https://github.com/python-graphblas/graphblas-algorithms)\r\n- Benchmarks comparing GraphBLAS, NetworkX, and `scipy.sparse`\r\n\r\nCome learn what makes GraphBLAS special--and fast!--and how to use it effectively.", "recording_license": "", "do_not_record": false, "persons": [{"id": 140, "code": "VTSPJM", "public_name": "Erik Welch", "biography": null, "answers": []}, {"id": 142, "code": "9VEPFU", 
"public_name": "Jim Kitchen", "biography": "I am a Senior Software Engineer at Anaconda, focused on graph analytics and sparse data. I am a member of the GraphBLAS C API committee and an author of the [python-graphblas](https://python-graphblas.readthedocs.io/en/latest/) library.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Grand Salon C": [{"id": 103, "guid": "067ba481-6669-5829-9691-ab583f7f1282", "logo": "/media/2023/submissions/333PY7/vak-logo-primary_ewQYbmJ.png", "date": "2023-07-12T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-103-vak-a-neural-network-framework-for-researchers-studying-animal-acoustic-communication", "url": "https://cfp.scipy.org/2023/talk/333PY7/", "title": "vak: a neural network framework for researchers studying animal acoustic communication", "subtitle": "", "track": "Bioinformatics, Computational Biology & Neuroscience", "type": "Talk", "language": "en", "abstract": "Research on animal acoustic communication is being revolutionized by deep learning. In this talk we present vak, a framework that allows researchers in this area to easily benchmark deep neural network models and apply them to their own data. We'll demonstrate how research groups are using vak through examples with TweetyNet, a model that automates annotation of birdsong by segmenting spectrograms. Then we'll show how adopting Lightning as a backend in version 1.0 has allowed us to incorporate more models and features, building on the foundation we put in place with help from the scientific Python stack.", "description": "Are humans unique among animals? We speak languages, but is speech somehow like other animal behaviors, such as birdsong? Questions like these are answered by studying how animals communicate with sound. 
This research requires cutting edge computational methods and big team science across a wide range of disciplines, including ecology, ethology, bioacoustics, psychology, neuroscience, linguistics, and genomics. As in many other domains, this research is being revolutionized by deep learning algorithms. Deep neural network models enable answering questions that were previously impossible to address, in part because these models automate analysis of very large datasets. Within the study of animal acoustic communication, multiple models have been proposed for similar tasks, often implemented as research code with different libraries, such as Keras and Pytorch. This situation has created a real need for a framework that allows researchers to easily benchmark models and apply trained models to their own data. To address this need, we developed vak [1], a neural network framework designed for this research community, built with core libraries of the scientific Python stack such as numpy, scipy, pandas and dask. In this talk, we will show how vak makes it easy for researchers to work with neural network models through a simple command-line interface and TOML configuration files. As an example, we will demonstrate how we used vak to benchmark a neural network model, TweetyNet [2], that automates annotation of birdsong by segmenting spectrograms. Using vak allowed us to tune hyperparameters and determine the minimal amount of expensive human-annotated data we needed for accurate model performance. We will show how TweetyNet and vak made it possible to relate the complex syntax of canary song to the hidden states of neural activity in the canary brain, and how these tools are being used by other researchers in neuroscience and bioacoustics. Then we will demonstrate how in version 1.0 of vak we have significantly extended its generality, in large part by adopting the Lightning library as a backend. 
We will show how we are using version 1.0 of vak to reduce the segment error rate of TweetyNet, minimizing the need to clean up predictions with post-processing. In addition we'll walk through how we're using vak to compare performance of TweetyNet with other neural network architectures proposed for similar tasks. Finally we will show work in progress incorporating other families of neural network models into vak, generative and unsupervised learning algorithms for dimensionality reduction and similarity measurements. Both authors are experienced public speakers [3], and the combination of cutting edge neural network models in Python with studies of birds, their song, and the vocalizations of other charismatic animals are sure to make for an entertaining and informative talk.\r\n\r\n[1] https://github.com/vocalpy/vak\r\n[2] https://elifesciences.org/articles/63853, https://github.com/yardencsGitHub/tweetynet \r\n[3] https://nicholdav.info/talks/, https://yardencsgithub.github.io/talks/", "recording_license": "", "do_not_record": false, "persons": [{"id": 62, "code": "BDDEXE", "public_name": "David Nicholson", "biography": "Engineer with Embedded Intelligence, a research and development group in the DC area. Developer, maintainer of https://github.com/vocalpy. More at https://nicholdav.info/", "answers": []}, {"id": 399, "code": "DALEAX", "public_name": "Yarden Cohen", "biography": "Researcher of living and artificial neural systems, behavior, memory, and computation. 
Assistant professor at the Weizmann Institute of Science, Israel.\r\nhttps://www.weizmann.ac.il/brain-sciences/labs/cohen/", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 41, "guid": "d5683c61-5645-54c4-8556-9cf9df7e4a75", "logo": "", "date": "2023-07-12T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-41-tfmodisco-lite-an-attribution-based-motif-discovery-algorithm", "url": "https://cfp.scipy.org/2023/talk/AQ3Z3U/", "title": "tfmodisco-lite: an attribution-based motif discovery algorithm", "subtitle": "", "track": "Bioinformatics, Computational Biology & Neuroscience", "type": "Talk", "language": "en", "abstract": "An important problem in genomics is identifying the proteins that bind to DNA. Although many methods attempt to learn DNA motifs underlying protein binding as position-weight matrices (PWMs), these PWMs cannot faithfully represent real biology. For instance, a static PWM cannot describe a zinc-finger protein whose fingers can optionally include one-nucleotide spacing. TF-MoDISco is a framework for extracting motifs using attribution scores from a machine-learning model. The learned motifs and syntax overcome many of the limitations presented by PWM. I will describe the TF-MoDISco algorithm and showcase its efficient re-implementation, tfmodisco-lite.", "description": "Understanding the binding of proteins to the genome is crucial for deciphering gene expression programs across cell types. Yet, identifying where and when these proteins bind along the genome is complicated. Most proteins bind to a specific sequence of nucleotides, known as a \"motif.\" But not all proteins are this simple: zinc-finger proteins are comprised of many \"fingers\" that each bind to short 3-4 basepair motifs. While these short motifs are always found in the same order, variable spacing can be found between these short motifs, and not all are always necessary for binding. 
Other proteins require the presence of a co-factor to bind to their motifs. Faithfully describing the sequence determinants of protein binding, sometimes called the cis-regulatory logic, for all proteins is a challenging task.\r\n\r\nIncreasingly, people have been using machine learning to understand biology by training neural networks to take in nucleotide sequence and predict a readout of interest, e.g. ATAC-seq, ChIP-seq, CAGE, etc. One can then run a feature attribution algorithm, such as ISM or DeepLIFT, to highlight the nucleotides that drive the predicted readouts. However, summarizing these attributions into repeated patterns has thus far been a missing component of the analysis pipeline. \r\n\r\nTF-MoDISco is a framework for using attribution-weighted sequence to discover motifs. The approach differs from classic motif finding algorithms in both the input and the output. Rather than operating solely on nucleotide sequence, TF-MoDISco also takes in the attributions from a machine learning model using any attribution algorithm. These attributions highlight the nucleotides involved in accurate predictions and so distinguish between driver motifs and passenger motifs. At the end of the procedure, TF-MoDISco returns clusters of \"seqlets,\" or found motif hits. These patterns, aligned to each other to account for spacing, represent the true heterogeneity of protein binding to the genome. By returning clusters of seqlets, TF-MoDISco overcomes many of the problems of position-weight matrices (PWMs), such as the inability to account for variable spacing and the linear assumption across nucleotides.\r\n\r\nThis talk will describe the TF-MoDISco procedure at a high level (first 15 minutes) and give a tutorial on how to use the code for discovery in practice (second 15 minutes). Examples will come from models used to predict chromatin accessibility via ATAC-seq as well as protein binding via ChIP-seq readouts. 
Specifically, the tutorial will cover tfmodisco-lite, a rewrite of the original algorithm that scales significantly better, runs faster, and requires less code. By the end of the talk, one should feel comfortable applying the method to their own data and interpreting the reports that are generated.", "recording_license": "", "do_not_record": false, "persons": [{"id": 53, "code": "BXXPAY", "public_name": "Jacob Schreiber", "biography": "Jacob Schreiber is a post-doctoral researcher at Stanford University, where he studies human genomics using modern machine-learning tools. In his \"free time,\" he contributes to the Python data science ecosystem in the form of pomegranate, a package for probabilistic modeling, and apricot, a package for submodular optimization for summarizing large data. In the past, he was a core developer for scikit-learn.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 169, "guid": "44b261b1-f943-580e-baa9-5580b6d70c56", "logo": "", "date": "2023-07-12T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-169-gammapy-a-python-package-for-gamma-ray-astronomy-version-v1-0", "url": "https://cfp.scipy.org/2023/talk/RSE89M/", "title": "Gammapy: a Python Package for Gamma-Ray Astronomy Version v1.0", "subtitle": "", "track": "Astronomy and Physics", "type": "Talk", "language": "en", "abstract": "In this contribution we will present the first stable version v1.0 of Gammapy, an openly developed Python package for gamma-ray astronomy. Gammapy provides methods for the analysis of astronomical gamma-ray data, such as measurement of spectra, images and light curves. By relying on standardized data formats and a joint likelihood framework, it allows astronomers to combine data from multiple instruments and constrain underlying astrophysical emission processes across large parts of the electromagnetic spectrum. 
Finally we will share lessons learned during the journey towards version v1.0 for an openly developed scientific Python package.", "description": "By observing the very high energy (VHE) range of the electromagnetic spectrum we can gain valuable insight into the extreme universe, including remnants of supernova explosions and surroundings of black holes. In the past VHE gamma-ray astronomy was typically conducted by small, closed collaborations as a subfield of particle physics. However, with hundreds of sources now identified, VHE gamma-ray astronomy has emerged as a new branch of Astronomy. This field provides us with the high energy context for understanding the physical processes occurring throughout the universe, across the entire electromagnetic spectrum.\r\n\r\n The next generation of ground-based gamma-ray instruments, particularly the Cherenkov Telescope Array Observatory (CTAO), is set to revolutionize gamma-ray astronomy. With an anticipated sensitivity ten times greater than current telescopes, it has the potential to attract a community of thousands of gamma-ray astronomers. Furthermore, it will operate as an open observatory, making both data and analysis tools readily available to the public.\r\n\r\nIn this contribution we introduce the inaugural stable version (v1.0) of Gammapy, a Python package for gamma-ray astronomy and the primary library for the future CTAO science tools. Leveraging the scientific Python ecosystem, including NumPy, SciPy, and Astropy, Gammapy offers a comprehensive set of standard data analysis tools, making it an indispensable resource for (not only) gamma-ray astronomers. By utilizing common open data formats, Gammapy also enables existing instruments such as VERITAS, H.E.S.S., or HAWC to export and archive their data, preserving it for future analysis using improved methods. 
Additionally, it facilitates the combination of data from multiple instruments, resulting in more sensitive analyses with greater statistics and a larger energy range.\r\n\r\nGammapy tackles the varied structure of gamma-ray data and science analysis cases by implementing a uniform API for N-dimensional sky maps. This API is independent of the underlying pixelization scheme and supports local WCS, allsky HEALPix, and region-based projections. These data structures prove useful for a broad range of applications as well as astronomers observing in other wavelengths.\r\n\r\n Building on these core data structures, Gammapy features a maximum likelihood fitting framework that enables simultaneous modeling of gamma-ray emission in four dimensions: space, energy, and time. By providing a general likelihood interface, Gammapy enables science users to integrate gamma-ray data with astronomical data from other wavelengths, as well as with neutrino data. Thanks to its straightforward Python API, Gammapy can be paired with other Python-based broadband emission modeling packages, allowing for direct measurement of parameters pertaining to common underlying astrophysical processes. This feature is crucial to realizing the full potential of future multi-wavelength and multi-messenger Astronomy.\r\n\r\n Lastly, we will discuss the valuable lessons learned during our journey to achieve v1.0 quality for an openly developed package. This will involve addressing concerns regarding maintainability, selection of dependencies, handling high dimensional data structures and API design. We believe that sharing our experience will be helpful to other scientific Python projects in the future.", "recording_license": "", "do_not_record": false, "persons": [{"id": 202, "code": "DBFWBP", "public_name": "Axel Donath", "biography": "I'm a Postdoc researcher at the Center for Astrophysics. 
My research interests include the Galactic X-Ray and Gamma-Ray source populations as well as statistical methods for analysis of low counts data in general. I'm also interested in methods to combine data from multiple instruments. I'm the lead developer of the open source software package Gammapy, sub-package maintainer of Astropy and a member of the CHASC astro-statistics collaboration. I'm also editor for the Astronomy and Astrophysics track of the Journal of Open Source Software JOSS.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 84, "guid": "a97d3b82-be48-569f-bcf9-3878fe935c07", "logo": "", "date": "2023-07-12T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-84-libyt-a-tool-for-parallel-in-situ-analysis-with-yt", "url": "https://cfp.scipy.org/2023/talk/JTXC9W/", "title": "libyt: a Tool for Parallel In Situ Analysis with yt", "subtitle": "", "track": "Astronomy and Physics", "type": "Talk", "language": "en", "abstract": "In the era of exascale computing, storage and analysis of large scale data have become more important and difficult. We present libyt, an open source C++ library, that allows researchers to analyze and visualize data using yt or other Python packages in parallel during simulation runtime. We describe the methods for reading adaptive mesh refinement data structure, handling data transition between Python and simulation with minimal memory overhead, and conducting analysis with no additional time penalty using Python C API and NumPy C API. 
We demonstrate how it solves the problem in astrophysical simulations and increases disk usage efficiency.", "description": "## Motivation and Aims\r\nIn the era of exascale computing, storage and analysis of large scale data have become more important and difficult.\r\nWe present libyt, an open source C++ library, that allows researchers to analyze and visualize data using yt or any other Python package in parallel during simulation runtime.\r\n\r\n## Methods\r\n### Connecting Python and Simulation\r\nWe use the Python C API and the NumPy C API to connect variables and arrays in the simulation to Python. This includes creating a NumPy array by wrapping an existing C array without additional memory, allocating new arrays and assigning values, and building Python objects and modules that contain simulation information. We also create Python C-extension methods for Python to request data from simulations.\r\n\r\n### Executing Python Code and Handling Errors\r\nlibyt runs in situ analysis using the Python interpreter. This is like running a Python prompt inside the ongoing simulation with data loaded.\r\nlibyt checks the input Python syntax by compiling it to a code object. If an error occurs, it parses the error to see whether it is caused by incomplete input or is a real error. \r\n\r\n### In Situ Analysis Under Parallel Computing\r\nEach MPI process contains one simulation code and one Python instance. All Python instances will work together to conduct in situ analysis in parallel using mpi4py (Python bindings for MPI).\r\nyt (a Python package for analyzing and visualizing volumetric data) supports MPI parallelism. libyt borrows this feature and handles data transition between different MPI processes and between simulation and Python. 
Since the data is distributed across different processes, and we cannot predict how Python decomposes the jobs and asks for data, we use one-sided MPI to handle data exchange between nodes.\r\n\r\n## Applications\r\n\r\n### Analyzing Fuzzy Dark Matter Vortices Simulation using GAMER + libyt\r\nWe use GAMER, a simulation for astrophysics, to simulate the evolution of vortices formed from density voids in a Fuzzy Dark Matter halo.\r\nEach snapshot takes 116 GB, and a total of 321 snapshots are required to capture them (37 TB disk space). We solve this by using yt in libyt to extract our region of interest, which now consumes only 8 GB in each step. The data size is about 15 times smaller.\r\n\r\n- Animation: https://youtu.be/tUjJYGbWgUc\r\n\r\n### Analyzing Core-Collapse Supernova Simulation using GAMER + libyt\r\nWe use GAMER to simulate core-collapse supernova explosions. We use libyt to call yt and draw slice plots of the entropy distribution.\r\nSince entropy is not one of the variables in the simulation's iterative process, these entropy data will only be generated through simulation when they are needed by yt. libyt tries to minimize memory usage.\r\n\r\n- Animation: https://youtu.be/6iwHzN-FsHw\r\n\r\n## Discussion and Conclusion\r\n- libyt provides a promising solution that binds simulation to Python with minimal memory overhead and no additional time penalty. It makes analyzing large scale simulations feasible.\r\n- libyt focuses on using yt as its core analytic method, even though it can call arbitrary Python modules. We will extend it to more data structures in the future.", "recording_license": "", "do_not_record": false, "persons": [{"id": 87, "code": "LDR8D3", "public_name": "Shin-Rong Tsai", "biography": "Shin-Rong Tsai is a research scientist at the University of Illinois Urbana-Champaign School of Information Sciences. 
She has worked on developing astrophysics simulations, processing and visualizing extensive data, and improving application performance when scaling up in high-performance computing clusters. Her work now focuses on creating an in situ analysis tool that enables ongoing simulations to use Python to analyze data. She also develops tools for analyzing and visualizing volumetric data.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 79, "guid": "b5151ce6-c3b0-5e43-bfad-e5fedf91a67f", "logo": "", "date": "2023-07-12T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-79-seeing-the-sun-through-the-clouds-accelerating-the-sunpy-data-analysis-ecosystem-with-dask", "url": "https://cfp.scipy.org/2023/talk/DZBF7K/", "title": "Seeing the Sun through the Clouds: Accelerating the SunPy Data Analysis Ecosystem with Dask", "subtitle": "", "track": "Astronomy and Physics", "type": "Talk", "language": "en", "abstract": "Over the last decade, the SunPy ecosystem, a Python solar data analysis environment, has evolved organically to serve the needs of scientists analyzing solar physics data, mostly on desktop and laptop computers. However, modern solar observatories are producing data volumes in the tens of petabytes, necessitating the need for parallelized and out-of-core computation. HelioCloud is a cloud computing environment tailored for heliophysics research and colocated with many terabytes of solar physics data. In this talk, we will show how the SunPy ecosystem, combined with Dask on HelioCloud, can be used to efficiently process high-resolution solar data.", "description": "The SunPy ecosystem is a set of community-developed, free and open-source Python packages for solar data analysis. 
The ecosystem consists of the core sunpy package, which provides general capabilities such as data download, data structures, and coordinate transformations, as well as a growing set of affiliated packages which provide more application-specific functionality such as image processing techniques. The entire SunPy ecosystem depends heavily on the broader scientific Python ecosystem, including numpy, scipy, and scikit-image and especially the astropy package, a community Python package for astronomy.\r\n\r\nOver the last decade, the SunPy ecosystem has evolved organically to serve the needs of scientists analyzing solar physics data. Analysis of observational solar data has traditionally been carried out on desktop or laptop computers or small compute clusters (see Bobra et al., 2020). This limitation is partly due to the longstanding historical reliance on the proprietary Interactive Data Language (IDL) by the solar physics community which has limited scalability due in part to licensing restrictions. However, modern space- and ground-based solar observatories are producing data volumes in the tens of petabytes, necessitating the need for parallelized and out-of-core computation. The surge in popularity of Python within the broader astronomy community as well as the growing availability of computing resources has led to many solar researchers using Python in cloud environments. All of these factors have propelled the development of HelioCloud. Inspired by similar science platforms for other disciplines like Pangeo, HelioCloud is a NASA-funded, AWS-backed cloud computing environment tailored for heliophysics research. HelioCloud provides both a dashboard for creating custom virtual machines as well as a JupyterLab interface. Using the latter allows for interactive, scalable computation enabled by Dask across many compute nodes. 
Most importantly, HelioCloud is colocated with nearly 1 petabyte of solar physics data such that researchers can perform their analysis without the added latency of needing to download the data.\r\n\r\nIn this talk, we will demonstrate how the SunPy ecosystem, combined with Dask on HelioCloud, can be used to efficiently process high-resolution solar data. First, we will provide a brief description of the SunPy project with particular emphasis on the ndcube and sunkit-image affiliated packages. Next, we will provide a brief description of the JupyterLab interface of the HelioCloud platform. Finally, we will demonstrate a typical scientific workflow on HelioCloud by efficiently analyzing many hours' worth of solar active region evolution using sunpy, ndcube, sunkit-image, and Dask to scale out our computation over many workers. Additionally, we will discuss existing incompatibilities between Dask and the astropy ecosystem and how collaboration with the broader scientific Python community could resolve such frictions.", "recording_license": "", "do_not_record": false, "persons": [{"id": 95, "code": "VB8TGX", "public_name": "Will Barnes", "biography": "Research Term Faculty, American University, Washington, D.C., USA\r\nResearch Scientist, NASA Goddard Space Flight Center, Greenbelt, MD, USA", "answers": []}, {"id": 96, "code": "GDMPDR", "public_name": "Nabil Freij", "biography": "Nabil Freij is working as Research Software Engineer for Bay Area Environmental Research Institute supporting several NASA missions at Lockheed Martin Solar and Astrophysics Laboratory.\r\n\r\nBefore this, he was a Software Engineer at The Institute for Environmental Analytics based at the University of Reading focused on providing customized weather and climate data to growers and farmers.\r\n\r\nBefore his pivot to Software Engineering, he was a research scientist at Universitat de les Illes Balears working on coronal heating and MHD waves.", "answers": []}, {"id": 104, "code": "MPBZ7T", 
"public_name": "Jack Ireland", "biography": "I have worked in the field of solar physics since 1995. I am a co-founder of the SunPy and Helioviewer Projects. I am currently working as the Project Scientist for NASA's Solar Data Analysis Center, and US Project Scientist for the Solar and Heliospheric Observatory.", "answers": []}, {"id": 141, "code": "C8YJSB", "public_name": "Stuart Mumford", "biography": "Stuart writes open source software for solar and astro physics. Is the the lead-developer of SunPy, contributes to Astropy, and spends most of his time working with the DKIST data center on data products and Python software for users of DKIST data.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 140, "guid": "0706778d-3f79-5820-bd62-d5bee5d972d8", "logo": "", "date": "2023-07-12T15:25:00-05:00", "start": "15:25", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-140-open-force-field-next-generation-force-fields-with-open-data-open-software-and-open-science", "url": "https://cfp.scipy.org/2023/talk/8MMNUD/", "title": "Open Force Field: next-generation force fields with open data, open software, and open science", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "The Open Force Field (OpenFF) initiative was formed to build a new generation of force fields for molecular dynamics (MD) simulations using modern data-driven techniques. Openness is one of our fundamental founding principles, and everything we produce is released openly and accessibly so that the community can validate, modify, or extend our work. Here we introduce some flagship packages in our ecosystem and the advances they have enabled in force field science and MD workflows. 
These include fitting custom functional forms, exploring the addition of off-site charges, and using neural networks to assign charges to protein-ligand systems.", "description": "**Background**\r\n\r\nMolecular dynamics (MD) simulations are now critical components in pharmaceutical and biomolecular research. A potential energy function called a \u2018force field\u2019 is used to solve the differential equations that describe the particle motion. A vast number of different force fields have now been released, each fit to experimental or quantum chemistry data to reproduce specific properties in a limited region of chemical space. However, the core of most of these date from work published decades ago, and new force field development has primarily taken the form of incremental improvements guided by human chemical intuition rather than systematic, reproducible methods.\r\n\r\n**Outline**\r\n\r\nThe [Open Force Field (OpenFF) initiative](https://openforcefield.org/) was formed to produce open and extensible infrastructure to build a new generation of MD force fields. We have now developed many software packages for constructing, applying, and benchmarking force fields. We have also generated several high-quality quantum chemistry datasets. Everything is available freely on [GitHub](https://github.com/openforcefield/), [Zenodo](https://zenodo.org/communities/openforcefield/), and the [MolSSI QCArchive server](https://qcarchive.molssi.org/). This work has been successfully used to investigate potential improvements to force fields, as well as simplify many previously difficult aspects of preparing MD systems.\r\n\r\nHere we will introduce the [OpenFF-Toolkit](https://github.com/openforcefield/openff-toolkit) and [OpenFF-Interchange](https://github.com/openforcefield/openff-interchange) packages. We can use them to quickly assign force field parameters to arbitrary systems of small molecules, and then write these systems out in common MD formats for simulation. 
We also introduce the [OpenFF-Bespokefit](https://github.com/openforcefield/openff-bespokefit) package for fitting custom torsion parameters, as well as the [OpenFF-QCSubmit](https://github.com/openforcefield/openff-qcsubmit) package for interacting with QCArchive. We show how to use the datasets we have released on QCArchive.\r\n\r\nWe will finally show some of the advancements enabled by our work. The [OpenFF-Evaluator](https://github.com/openforcefield/openff-evaluator) package was instrumental in investigating the effect of using a custom potential for van der Waals\u2019 parameters. We used [OpenFF-Recharge](https://github.com/openforcefield/openff-recharge) to explore adding off-site charges with virtual sites. Finally, we describe the development of a neural network for quickly assigning conformer-independent partial charges \u2013 this also employed OpenFF-Recharge, as well as [OpenFF-NAGL](https://github.com/openforcefield/openff-nagl).\r\n\r\nWe hope these examples give a brief overview of how OpenFF can help both common everyday MD tasks as well as larger scientific investigations.\r\n\r\n**Previous talks**\r\n\r\nI've previously given [keynote talks](https://www.youtube.com/watch?v=Jw1iVjHkRPM) at the Open Force Field annual meetings and presented at open science meetings convened by the [NIH](https://datascience.nih.gov/news/nih-odss-to-host-sessions-at-ismb-annual-conference), the NSF, and groups in the [scientific computing](https://www.youtube.com/watch?v=hS87inZupdQ) and molecular simulation communities.", "recording_license": "", "do_not_record": false, "persons": [{"id": 423, "code": "KKF8RZ", "public_name": "Jeff Wagner", "biography": "Jeff Wagner is the Technical Lead at Open Force Field, a pre-competitive effort supported by a mix of industry partners and government funding. OpenFF aims to build extensible tools and datasets to advance the state-of-the-art in molecular modeling. 
He received his PhD in chemistry from UCSD in 2018 and is broadly interested in understanding how sustainable organizations can exist at the interface of academia, industry, and open source.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 147, "guid": "1b5c3b74-839d-5a36-ae9b-89e126fb8b94", "logo": "", "date": "2023-07-12T16:05:00-05:00", "start": "16:05", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-147-designing-user-friendly-apis-for-the-nist-interatomic-potentials-repository", "url": "https://cfp.scipy.org/2023/talk/JH9JMV/", "title": "Designing user-friendly APIs for the NIST Interatomic Potentials Repository", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "The NIST Interatomic Potentials Repository project has developed Python APIs to support user interactions with the repository data hosted at https://potentials.nist.gov. The associated code is layered, starting with generic methods for JSON/XML-based data and databases, and building up to user-friendly interfaces specific to the repository. This design allows for basic users to easily explore the data and expert users to perform more complicated operations or create custom APIs for other databases. The repository APIs help users find and compare interatomic models, set up simulations, perform high throughput calculations, and access the high throughput results.", "description": "This presentation outlines the Python APIs developed for the public database of the NIST Interatomic Potentials Repository. The entire framework consists of six different Python packages designed for data interaction and generation: DataModelDict, cdcs, yabadaba, potentials, atomman, and iprPy.  These packages have an import hierarchy with each subsequent package incorporating or inheriting the previous.\r\nAll project data is represented with JSON/XML equivalent data models. 
Having data that can be equivalently represented in JSON and XML takes advantage of the benefits of both formats while placing only minor limits on schema designs.  The \u201cDataModelDict\u201d Python class extends the basic dict to allow for easy transformations between Python, JSON and XML, and includes additional methods for exploring and manipulating individual records.\r\nAll public potentials data are hosted in a CDCS database accessible at https://potentials.nist.gov.  CDCS databases store XML formatted records, they support multiple schemas, and provide both a web-based interface and a REST API for interacting with the data. The \u201ccdcs\u201d package defines Python methods for common database interactions that wrap around the REST API calls.  It also provides options to build custom REST calls to the database for features not yet directly supported.\r\nThe JSON/XML equivalent data models mean that all records can also be stored in JSON-based Mongo databases or as local collections of JSON or XML files.  The \u201cyabadaba\u201d Python package provides an intermediate abstraction layer allowing users to interact with data stored in all three database infrastructures using common methods. It also provides a framework for interpreting and building data records associated with different schemas.  These features make it possible for end users to explore and generate data while remaining agnostic to the infrastructure used to store the data.\r\nWhile the \u201ccdcs\u201d and \u201cyabadaba\u201d packages provide APIs for interacting with an arbitrary CDCS database, the \u201cpotentials\u201d package provides APIs specifically focused on interatomic potentials content in potentials.nist.gov.  Utilizing the yabadaba features, any user can create their own copy of all interatomic potentials listings and then search and explore from either location. Searches can be performed both using simple Python methods or using Jupyter widget-based GUIs. 
The potentials package also forms the basis for adding new listings to the repository and for generating the traditional static repository website at https://www.ctcms.nist.gov/potentials/.\r\nThe \u201catomman\u201d package focuses on setting up and analyzing atomic configurations and LAMMPS simulations.  On the data side, it extends the \u201cpotentials\u201d package functionality to interpreting schemas of atomic configurations. Finally, the \u201ciprPy\u201d package is centered around providing a collection of standard atomistic property calculation methods for characterizing interatomic potentials. The iprPy calculations can be performed individually or in high throughput, and can be executed from the command line, from within Python, or using transparent-box demonstration Jupyter Notebooks.", "recording_license": "", "do_not_record": false, "persons": [{"id": 173, "code": "PZFXRM", "public_name": "Lucas Hale", "biography": "Dr. Lucas Hale is a materials research scientist at NIST where he is the content manager for the Interatomic Potentials Repository project.  
In support of the project, he has developed numerous Python packages for interacting with the repository data, designing atomistic simulations for investigating bulk crystal and crystalline defects, and developing and performing high throughput calculations.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 4, "date": "2023-07-13", "day_start": "2023-07-13T04:00:00-05:00", "day_end": "2023-07-14T03:59:00-05:00", "rooms": {"Zlotnik Ballroom": [{"id": 280, "guid": "536ee17d-cf2b-5607-bef5-7767b5c1955d", "logo": "", "date": "2023-07-13T09:15:00-05:00", "start": "09:15", "duration": "00:45", "room": "Zlotnik Ballroom", "slug": "2023-280-keynote-how-open-source-tools-power-the-efforts-of-biological-data-analysis-and-drug-discovery", "url": "https://cfp.scipy.org/2023/talk/DQQBWR/", "title": "Keynote - How Open Source Tools Power the Efforts of Biological Data Analysis and Drug Discovery", "subtitle": "", "track": "Keynote", "type": "Talk", "language": "en", "abstract": "Angela Pisco is the head of computational biology at insitro. She is passionate about extracting meaningful information from biomedical datasets and using that to improve disease understanding and drug development. She studied Biomedical Engineering for her BSc and MSc and has a PhD in Systems Biology. Her PhD work became the foundation of a new direction of thinking on why cancer develops resistance to chemotherapy, which is the major reason for treatment failure. In her postdoctoral work, she investigated the mechanisms of cellular differentiation in the skin. She developed a 3D computational model that recapitulated the observed changes in the mouse skin connective tissue and dermis during development. The combination of the mathematical analysis with experimental data led to a new understanding of how distinct fibroblast subpopulations become activated, proliferate, and deposit matrix proteins during wound healing. 
Before moving to insitro, she led the Data Science platform at CZ Biohub. There she made significant contributions to the whole-organism cell atlas projects including the first whole mouse cell atlas, the first aging cell atlas, and Tabula Sapiens, one of the first Human Cell Atlas drafts (The Tabula Sapiens Consortium, Science 2022). She is also a founder and core member of Open Problems in Single Cell (openproblems.bio), a community effort to improve multimodal data analysis by both generating gold standard datasets and benchmarking metrics and infrastructure.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 410, "code": "9KLYQ3", "public_name": "Angela Pisco", "biography": "Head of computational biology at insitro", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 189, "guid": "aa248c7d-a5dc-5751-a8c5-6a205b2eee39", "logo": "", "date": "2023-07-13T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-189-subpoenas-less-scary", "url": "https://cfp.scipy.org/2023/talk/TKQFWU/", "title": "Subpoenas Less Scary", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Your users have entrusted their data to you. But what happens when a law enforcement government agency demands you share the data with them? We will demystify the process of receiving and responding to law enforcement\u2019s demands for data. We demonstrate how designing around privacy can limit what needs to be shared. To make subpoenas less scary, we break them down as a technical process, and share the protections we implemented at Mozilla. 
If you want to understand the real-world impact of your approaches to privacy, this talk is for you.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 222, "code": "XAWYGZ", "public_name": "Rebecca BurWei", "biography": "Staff Data Scientist at Mozilla. Prior lives include building a data department from scratch and getting a PhD in non-commutative algebra. Come talk to me about industry vs academia, careers in data science, under-rated places in Chicago, and who should have won the NBA playoffs.", "answers": []}, {"id": 458, "code": "KRCYUA", "public_name": "David Zeber", "biography": "David Zeber is a Staff Data Scientist at Mozilla who enjoys prototyping innovative approaches to improving the user search experience. While at Mozilla, he also led research into online tracking and privacy-preserving technologies for working with user data. He holds a PhD in applied probability from Cornell University.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 282, "guid": "286ad88b-9a7e-5723-b70f-38522ccdbfb4", "logo": "", "date": "2023-07-13T12:15:00-05:00", "start": "12:15", "duration": "00:45", "room": "Zlotnik Ballroom", "slug": "2023-282-diversity-luncheon-keynote-how-can-we-protect-vulnerable-groups-while-measuring-representation-in-our-communities-", "url": "https://cfp.scipy.org/2023/talk/H9FDBV/", "title": "Diversity Luncheon Keynote: How can we protect vulnerable groups while measuring representation in our communities?", "subtitle": "", "track": "Keynote", "type": "Talk", "language": "en", "abstract": "Diversity, equity and inclusion initiatives often start with measurement - what do our communities look like today and how can we track progress against our goals? However, data collected through APIs, web scraping, surveys, interviews, inference etc. 
have the potential to expose more details about an individual than they were expecting, especially when aggregated across platforms and shared in public forums. This talk will discuss tactics, opportunities and challenges when collecting sensitive data in and around open source communities, while aligning with policies and regulations, respecting the right to anonymity and ensuring the safety of all members of the community.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 464, "code": "AXRSNH", "public_name": "Sophia Vargas", "biography": "Sophia Vargas is a Program Manager in the research and education team within Google\u2019s Open Source Programs Office. In this role she leads efforts that span project health, contributor experience, and open source economics. She is also on the Governing Board and an active contributor to the CHAOSS community. Prior to Google, Sophia was an analyst at Forrester Research, covering data center infrastructure and cloud strategy.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 228, "guid": "62a3a168-54f6-51a5-8f0d-55819f2a631b", "logo": "", "date": "2023-07-13T14:20:00-05:00", "start": "14:20", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-228-using-python-to-accelerate-sustainable-aviation-fuel-research-and-development", "url": "https://cfp.scipy.org/2023/talk/XSQKSA/", "title": "Using Python to accelerate sustainable aviation fuel research and development", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Aviation comprises 2-3% of global CO2 emissions. Transitioning to cleaner, more sustainable aviation fuels can reduce its environmental impacts. To help accelerate sustainable aviation fuel development, we trained machine learning models to predict fundamental properties of biofuel blends using Fourier transform infrared (FTIR) spectra. 
We leveraged TPOT and standard libraries like NumPy, pandas, and scikit-learn to develop the models. This presentation will discuss how we overcame challenges with decomposing FTIR spectra data and using machine learning on small datasets (<100 samples). We will also discuss integration of the models into our open-source webtool to support biofuel research.", "description": "Aviation comprises 2-3% of global carbon dioxide emissions and 9-12% of U.S. transportation greenhouse gas emissions. Sustainable aviation fuels have the potential for reducing emissions and environmental impacts; however, due to high costs and high-volume requirements, experimental property testing of bio-based jet fuels is usually conducted years after initial bench-scale experiments are completed. Neglecting to conduct property testing early in the development cycle can lead to wasted investments spent on production of biofuels that do not meet performance expectations.\r\n\r\nMachine learning has already proven to be a valuable tool for predicting sustainable aviation fuel properties and accelerating research. In 2020, we presented our approach at SciPy (https://www.youtube.com/watch?v=ENOf0IZDla8) to predict high-throughput aviation fuel properties of over 10,000 molecules with molecular descriptors. The correlation analysis and tree-based methods for feature ranking were later published in Fuel (https://doi.org/10.1016/j.fuel.2022.123836). Using the property prediction models, we created the first Python-based, comprehensive, open-source webtool that enables scientists and companies to explore viable bio-based molecules without spending time and money testing in the lab (https://feedstock-to-function.lbl.gov).\r\n\r\nBecause aviation fuels are made of blends of molecules and compounds, our current research focuses on expanding the webtool to predict properties of fuel blends using Fourier transform infrared (FTIR) spectra and experimental property data. 
Specifically, we use binning and smoothing techniques to reduce experimental noise in more than 6700 FTIR spectra features and use non-negative matrix factorization (NMF) for feature selection to develop models that predict fundamental properties of biofuel blends (e.g., boiling point, flash point, melting point, specific gravity, and kinematic viscosity). The predictive models are also integrated into the webtool to help sustainable aviation fuel research. \r\n\r\nOur workflow includes using libraries such as Numpy, pandas, scikit-learn to reduce FTIR spectra data into interpretable components to predict properties, and the Tree-based Pipeline Optimization Tool (TPOT) to develop property prediction models with reduced FTIR spectra as features. Specifically, we will discuss methods for coalescing experimental spectra data from different sources, and will present methods for reducing the influence of experimental noise on model performance. We will also discuss using NMF as a dimensionality reduction technique that correctly groups FTIR spectra wavelengths together and results in meaningful features. Additionally, we will address common pitfalls such as defining an applicability domain, and recognizing and limiting the possibility of overfitting.\r\n\r\nBy sharing our experience and lessons learned, we aim to help the community overcome similar challenges when developing models for advancing science, while also demonstrating how a Python-based, open-source webtool can facilitate faster, less expensive bioprocess optimization and scale-up of sustainable aviation fuels.", "recording_license": "", "do_not_record": false, "persons": [{"id": 252, "code": "NHUBRL", "public_name": "Ana Comesana", "biography": "Ana Comesana is a Scientific Engineering Associate at Lawrence Berkeley National Laboratory. 
She is a data scientist who conducts applied machine learning research to support projects in a variety of areas, including water treatment, energy management, and bio-jet fuel research. Ana received her B.S. in Mathematics from UC Berkeley.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 178, "guid": "487093a7-698e-5220-aeeb-c62130e8644a", "logo": "", "date": "2023-07-13T15:00:00-05:00", "start": "15:00", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-178-contributor-experience-why-it-matters", "url": "https://cfp.scipy.org/2023/talk/MEGK33/", "title": "Contributor experience - Why it matters", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "Behind every successful open source project is a strong contributor community. What makes these communities strong? What can you do in your OSS project to nurture a thriving contributor community? In this presentation, we will share insights from the work of the Contributor Experience Lead team (NumPy, SciPy, Matplotlib, and pandas) and discuss why designing and providing a positive contributor experience is vital to the sustainability of each individual project and the SciPy ecosystem overall.", "description": "Behind every successful open source project is a strong contributor community. Engaging and supporting contributors requires specialized knowledge, experience, and time commitment from project leaders. However, a chronic lack of resources and time often keeps them from focusing on this work. Recognizing these challenges, in late 2021, we created a team of Contributor Experience Leads to support contributors to the four foundational libraries in the Scientific Python ecosystem: NumPy, SciPy, Matplotlib, and pandas.\r\nIn this presentation, we will share insights from the work of our team and discuss why it is vital for project maintenance and sustainability. 
We will examine what we have identified as primary goals and priorities for a Contributor Experience team in each project, taking into account project size, structure, and governance model. We will also discuss how this work could be applied to other projects in the SciPy ecosystem.\r\nFinally, we will talk about the Contributor Experience Project (https://contributor-experience.org), a community of practice and an open-source community-led project dedicated to developing best practices for onboarding and supporting contributors to open source.", "recording_license": "", "do_not_record": false, "persons": [{"id": 75, "code": "D9RFD3", "public_name": "Melissa Weber Mendon\u00e7a", "biography": "Melissa is an applied mathematician and former university professor who fell in love with open source communities. She has been involved with the Python and PyData communities for some time, with a focus on outreach, education and DEI. She works at Quansight as a Senior Developer Experience Engineer, is a maintainer for NumPy and SciPy, and believes in the power of contributions beyond code.", "answers": []}, {"id": 211, "code": "VL38N7", "public_name": "Inessa Pawson", "biography": "Inessa is building bridges between people, open science, and open source software, advocating for diversification of contribution pathways to open source and supporting its social infrastructure. Passionate about the transformative power of collaboration out in the open, she has been organizing the Maintainers Summit at PyCon US since 2020 to foster best practices on how to maintain and develop sustainable open source projects and thriving communities. 
In her current role as NumPy Contributor Experience Lead, Inessa\u2019s primary focus is on onboarding and supporting contributors, addressing gaps in the project governance, and developing programs to diversify pathways of contribution to the project.", "answers": []}, {"id": 363, "code": "9TWMGQ", "public_name": "Noa Tamir", "biography": "Noa has been involved with the R and PyData communities for some time, with a focus on community building and DEI. They are a member of the NumFOCUS Board of Directors and DISC committee, PyLadies Organizer, and chaired the PyData Berlin 2022 conference. In addition, they are a Lead Data Science Coach at neue fische, contributing to pandas, and are currently developing the Contributor Experience Community and Handbook with Inessa Pawson and Melissa Mendon\u00e7a.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 196, "guid": "f297a71f-3010-5da7-8bfb-0f4544528f34", "logo": "", "date": "2023-07-13T15:50:00-05:00", "start": "15:50", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-196-zarr-community-specification-of-large-cloud-optimised-n-dimensional-typed-array-storage", "url": "https://cfp.scipy.org/2023/talk/T3NSL8/", "title": "Zarr: Community specification of large, cloud-optimised, N-dimensional, typed array storage", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow well-defined interfaces to make tools work seamlessly together (Cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. Zarr provides a missing feature, namely the scalable, persistent storage for annotated hierarchies of tensors. 
Defined through a community process, the Zarr specification enables the storage of large out-of-memory datasets locally and in the cloud. Implementations exist in C++, C, Java, Javascript, Julia, and Python, enabling.", "description": "Zarr is a data format for storing chunked, compressed N-dimensional arrays and is sponsored by [NumFOCUS](https://numfocus.org/project/zarr) under their umbrella.\r\n\r\nIn this presentation, we will discuss the evolution of Zarr, first introduced at [SciPy 2019](https://youtu.be/qyJXBlrdzBs); the development of the [Zarr Enhancement Process (ZEP)](https://zarr.dev/zeps/) and its use to define the next major version of the [Zarr Specification (V3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html); as well as uptake of the format across the research landscape.\r\n\r\n### Outline:\r\n\r\nFirst, we\u2019ll be talking about:\r\n\r\n### Introduction and Working of Zarr (10 mins.)\r\n\r\n- What is Zarr, and how it works?\r\n    - The inner workings of Zarr using illustrated graphics\r\n    - When and Why should you use Zarr?\r\n    - Extensive pluggable compressors (via [numcodecs](https://github.com/zarr-developers/numcodecs/)) and file-storage systems\r\n- What is the [Zarr Specification](https://zarr.readthedocs.io/en/stable/spec/v2.html)?\r\n    - A summary of the technical specification of Zarr\r\n    - Adoption of the Zarr specification in various programming languages like Python, C, C++, Java, and Javascript and how all of us form a wonderful community together\r\n- Development of Zarr since it was first presented in SciPy 2019 by Alistair Miles\r\n    - Highlighting some important technical and community milestones since 2019\r\n    - Securing grants from [CZI](https://chanzuckerberg.com/eoss/proposals/zarr-a-common-backbone-for-the-scalable-storage-of-annotated-tensor-data/) and getting sponsored by NumFOCUS\r\n\r\nAfter this:\r\n\r\n### Usage of Zarr across several domains (5 mins.)\r\n\r\n- Interoperability with 
Dask, Xarray and Numpy\r\n- Adoption of Zarr by various communities like Geospatial, Bio-imaging, Genomics, Data Science/Engineering etc.\r\n- Development of convention processes like [GeoZarr](https://github.com/zarr-developers/geozarr-spec) and [OME-Zarr](https://github.com/ome/ome-zarr-py)\r\n\r\nThen we\u2019ll discuss:\r\n\r\n### [ZEP Process](https://zarr.dev/zeps/) (10 mins.)\r\n\r\n- Need and origin of a community feedback process for the evolution of Zarr specification\r\n- How it works?\r\n- Transformation from steering council governed to community-owned specification\r\n- Learnings when migrating from [Spec V2](https://zarr.readthedocs.io/en/stable/spec/v2.html) \u2192 [Spec V3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)\r\n\r\nAnd finally:\r\n\r\n### Conclusion (5 mins.)\r\n    \r\n- Key takeaways\r\n- How can you get involved?\r\n- QnA\r\n\r\nThis talk aims to address an audience who works with large amounts of data and is looking for a format which is transparent, open-source, reliable, cloud-optimised, and friendly to the environment. Also, we\u2019d like to invite anyone interested in the lessons we learnt by maintaining the project throughout the years.\r\n\r\nThe tone of the talk is set to be informative, story-telling and fun.\r\n\r\n### After this talk, you\u2019d:\r\n\r\n- understand the basics of Zarr and its specification,\r\n- know why you should have a process for your project,\r\n- have essential takeaways regarding when an OSS project transitions from a young to a mature stage\r\n- as well as the pros and cons of a steering council vs a community-owned open-source project", "recording_license": "", "do_not_record": false, "persons": [{"id": 24, "code": "3B3PXC", "public_name": "Sanket Verma", "biography": "Sanket is a data scientist based out of New Delhi, India. He likes to build data science tools and products and has worked with startups, government and organisations. 
He loves building community and bringing everyone together and is Chair of PyData Delhi and PyData Global. Currently, he's taking care of the community and OSS at Zarr as their Community Manager.\r\nWhen he\u2019s not working, he likes to play the violin and computer games and sometimes thinks of saving the world!", "answers": []}, {"id": 258, "code": "7VPJ93", "public_name": "John Kirkham", "biography": "Got my B.S. & M.S. in Physics. After graduating went to work at Howard Hughes Medical Institute for 5 years working on image processing problems particularly in neuroscience. Got more involved in open source during that work with particular interest in packaging, storage, and distributed array processing. Then joined the NVIDIA RAPIDS team where there has been good overlap with these past interests as well as new ones.", "answers": []}, {"id": 286, "code": "ZGQFJM", "public_name": "Josh Moore", "biography": "Josh is a research software engineer focusing on the standardization and storage of bioimaging data. Typically, that means finding ways of storing large binary with well-defined metadata in order to make them shareable. 
To that end, he is a maintainer of the Open Microscopy Environment (OME) as well as Zarr projects.\r\n\r\nYou can find out more under https://joshmoore.github.io", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 245, "guid": "421d0ea9-54f3-57e5-a80c-ee90a01ecfd8", "logo": "", "date": "2023-07-13T16:30:00-05:00", "start": "16:30", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-245-building-metpy-for-the-long-term-working-to-keep-an-open-source-project-sustainable", "url": "https://cfp.scipy.org/2023/talk/UT3CUZ/", "title": "Building MetPy for the Long Term: Working to Keep an Open Source Project Sustainable", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "MetPy is an open-source Python package for meteorological and atmospheric science applications, leveraging many other pieces of the scientific Python stack (e.g. numpy, matplotlib, scipy, etc.). With a focus on sustainability, MetPy extensively leverages GitHub Actions to try to automate as much of the software development process as possible. Sustainability also extends to the growth of the community of developers, and we have been working to try to make that sustainable as well. Here we talk about our experiences and share our successes and lessons learned in trying to build a sustainable project.", "description": "MetPy is an open-source Python package for meteorological and atmospheric science applications, leveraging many other pieces of the scientific Python stack (e.g. numpy, matplotlib, scipy, etc.). Its goal is to provide tested, reusable components suitable to a wide array of tasks, including scripted data visualization and analysis. The guiding principle is to make MetPy easy to use with any dataset that can be read into Python. 
MetPy\u2019s general functionality breaks down into: reading data, meteorological calculations, interpolation, and meteorology-specific plotting. MetPy also has significant integration with XArray, as well as extended support for interpreting netCDF Climate and Forecasting Convention metadata.\r\n\r\nAs a scientific software project that has actively solicited users across the research and education spaces, MetPy has placed a heavy emphasis on the sustainability of the project. Too often core scientific libraries fall into disarray, with a heavy toll on the reproducibility of scientific results. Even given our strong institutional support, our goal with the MetPy project is to build the project with an eye to these potential problems and keep the project sustainable as much as possible.\r\n\r\nOne axis of sustainability for us lies on the side of technology and project infrastructure, which has been highly automated. This starts with our unit tests and test coverage, run automatically on GitHub, using its Actions service. These tests are run across a variety of OS, python version, and package manager combinations, as well as covering a wide array of sets of dependencies. This gives us great coverage of potential breakages. This also extends to automated documentation builds and publication, link checking, code quality checks, and, most importantly, making releases. This combination of processes, built heavily on the Github Actions service minimizes the need for humans in the loop of standard software development steps, allowing us to maximize the use of development time elsewhere.\r\n\r\nTechnological automation is important for sustainability, but it\u2019s only one part of the equation; in order to have a truly sustainable open source project, you must also be solving the issue of people. MetPy follows open development practices to drive community participation as much as possible. 
We use Github issues, pull requests, discussions, and projects extensively to allow input from any interested user. We also hold regular, open developer calls to keep the project moving forward; we have also started holding community calls to try to give the community more of a voice and input into the direction of the project; these are also done with a goal to encourage more of the community to become involved directly with MetPy development.\r\n\r\nThis talk will share our lessons learned, both with technology and people, to help other projects who want to try to improve their overall sustainability.", "recording_license": "", "do_not_record": false, "persons": [{"id": 270, "code": "R8XLUD", "public_name": "Ryan May", "biography": "Ryan May is a software engineer and deputy director for the Unidata program, part of the University Corporation for Atmospheric Research (UCAR) Community Programs, working on Python software and training for the atmospheric science community. Ryan began his meteorology career pursuing a B.S. in Meteorology at the University of Oklahoma in 1999.  In 2014, Ryan started at Unidata, exchanging working on radar meteorology for working on open source tools for meteorology in Python. Currently, he is the Python team lead at Unidata and a core developer of the MetPy and Siphon Python packages, as well as a member of the steering committee for matplotlib and the core team for Conda Forge.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 296, "guid": "b8769684-138d-548e-ab50-12917d062675", "logo": "", "date": "2023-07-13T17:20:00-05:00", "start": "17:20", "duration": "01:00", "room": "Zlotnik Ballroom", "slug": "2023-296-lightning-talks", "url": "https://cfp.scipy.org/2023/talk/HWW7S7/", "title": "Lightning Talks", "subtitle": "", "track": "Lightning Talks", "type": "Talk", "language": "en", "abstract": "Lightning talks are 5-minute talks on any topic of interest for the SciPy community. 
We encourage spontaneous and prepared talks from everyone, but we can\u2019t guarantee spots. Sign ups are at the NumFOCUS booth during the conference.", "description": "", "recording_license": "", "do_not_record": false, "persons": [], "links": [], "attachments": [], "answers": []}], "Amphitheater 204": [{"id": 185, "guid": "71d22788-c2c1-53bb-83c9-30efe86ec4d6", "logo": "", "date": "2023-07-13T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-185-using-numba-for-gpu-acceleration-of-neutron-beamline-digital-twins", "url": "https://cfp.scipy.org/2023/talk/VVVQRU/", "title": "Using Numba for GPU acceleration of Neutron Beamline Digital Twins", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "This talk will discuss how Numba was used to accelerate MCViNE, a software environment for building and running digital twins of neutron experiments via Monte Carlo ray tracing. Numba is an open-source JIT compiler for Python using LLVM to generate efficient machine code for CPUs and GPUs with NVIDIA CUDA. Python and Numba were used to create a GPU accelerated version of MCViNE utilizing an extensible object-oriented design that has achieved a speedup of up to 1000x over the CPU. The performance gain with Numba enables more sophisticated data analysis and impacts neutron scattering science and instrument design.", "description": "Motivation\r\n\r\nMCViNE is a software package for creating digital twins of neutron scattering experiments using a Monte Carlo ray-tracing approach. These simulations are useful in performing advanced neutron data analysis as well as in designing novel neutron instruments and sample environments. Specifically, it has been used in the initial designs for instruments in the Second Target Station at the Spallation Neutron Source at Oak Ridge National Laboratory. 
Currently, MCViNE only runs on CPUs which is a bottleneck in large simulations with tens of billions of neutrons, and in modelling complex multiple scattering, with some simulations taking months to complete. Due to the massively parallel nature of Monte Carlo methods, bringing GPU acceleration to these simulations would offer superior performance and scalability. MCViNE is originally implemented in C++ and parallelized using MPI, and it has bindings to Python for user interaction; however, extensibility for the user can be very difficult. \r\n\r\nMethods\r\n\r\nTo improve performance and to ease user contributions, Python and Numba were chosen to create a new package providing GPU acceleration of MCViNE. Numba is an open-source JIT (just-in-time) compiler for Python using LLVM to generate efficient machine code and supports GPUs using NVIDIA CUDA. Numba is designed for scientific computing and can support NumPy arrays and functions. Currently, we are only using Numba for its GPU capabilities.\r\n\r\nUsing Python and Numba for this application allowed several advantages such as utilizing an extensible object-oriented approach and polymorphism. Each MCViNE instrument is composed of several components, such as a neutron source, a guide, and a monitor. During the simulation, neutrons can travel through each component in the instrument. Each component has a method (\u201cpropagate\u201d) defined for propagating the neutron through it. Additionally, sample environments are created using constructive solid geometry (CSG) with each primitive shape defined as a CUDA kernel. To each constructed shape, many CUDA kernels are available, each modeling a different type of scattering physics. Due to different component/scattering-kernel types and geometric shapes, using an object-oriented design was beneficial. 
Furthermore, this structure allowed for custom on-the-fly CUDA kernel generation for complex instrument/sample geometries and physics.\r\n\r\nResults and Conclusions\r\n\r\nPython and Numba were used to create a GPU accelerated version of MCViNE, which has so far achieved speedups of up to 1000x over the original CPU implementation. This performance gain enables more sophisticated data analysis for neutron scattering and impacts neutron scattering science and instrument design. \r\nUsing Python has helped increase the usability, extensibility, and maintainability of the codebase. Additionally, coupling Python with Numba allowed complex combinations of CUDA kernels to be generated at runtime, which would have been significantly harder to implement in other languages. The techniques used in this project could also be applied to other scientific computing applications.\r\n\r\nResources:\r\n\r\nhttps://github.com/mcvine/acc\r\nhttps://mcvine.ornl.gov/ \r\nhttps://github.com/mcvine/mcvine", "recording_license": "", "do_not_record": false, "persons": [{"id": 217, "code": "EGMJRQ", "public_name": "Coleman Kendrick", "biography": "Research Software Engineer in the Application Engineering group at Oak Ridge National Laboratory.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 24, "guid": "26404d92-1a71-52e3-ba84-5b689d4aad7a", "logo": "/media/2023/submissions/AXSVZ3/Teaser-Final_mdukBeW.png", "date": "2023-07-13T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-24-interactive-exploration-of-large-scale-datasets-with-jupyter-scatter", "url": "https://cfp.scipy.org/2023/talk/AXSVZ3/", "title": "Interactive Exploration of Large-Scale Datasets with Jupyter-Scatter", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "Jupyter-scatter is a scalable, interactive, and interlinked scatter plot widget for exploring datasets with up to several million data points. 
It focuses on data-driven visual encodings and offers two-way pan+zoom and lasso interactions. Beyond a single instance, jupyter-scatter can compose multiple scatter plots and synchronize their views and selections. Moreover, points can be connected by spline-interpolated lines. Thanks to the underlying WebGL rendering engine, spatial and color changes are smoothly transitioned. Finally, the API integrates seamlessly with Pandas DataFrames and offers functional methods that group properties by type to ease accessibility and readability.", "description": "Visualizing datasets as a 2D scatter plot is one of the most popular data visualization methods for understanding the distributions, identifying trends, and discovering correlations. The method is used in any scientific domain. For instance, in biology, machine learning, or digital humanities, high-dimensional datasets are often summarized with dimensionality-reduction methods like PCA, t-SNE, or UMAP, and the results are typically visualized as 2D scatter plots to discover clusters.\r\n\r\nUnfortunately, many visualization tools are unable to scale or compromise user experience with datasets that grow in size, dimensionality, and quantity. For instance, while datashader can render datasets of almost any size, it offers limited interactions. On the other hand, Plotly provides interactivity but does not extend nearly as well to millions of points. Ideally, we want to be able to render and interactively explore one or more datasets with millions of data points.\r\n\r\nJupyter-scatter (https://github.com/flekschas/jupyter-scatter) is a purpose-built widget for Jupyter Notebook, Lab, and Google Colab that supports interactive, interlinked, and scalable exploration of multiple large-scale datasets as scatter plots. It focuses on data-driven visual encodings, offers pan+zoom interactions, and two-way lasso selection. 
Beyond a single instance, jupyter-scatter can compose multiple scatter plots and synchronize their views and selections. Moreover, points can be connected by spline-interpolated lines. Thanks to the underlying WebGL rendering engine (https://github.com/flekschas/regl-scatterplot), changes in the spatial or color encoding of the points are smoothly transitioned. Finally, the widget API is inspired by seaborn and integrates seamlessly with Pandas DataFrames. As the number of arguments can get overwhelming when many properties are customized, jupyter-scatter provides a functional API that groups properties by type and exposes them via meaningfully-named methods. This functional API additionally allows users to programmatically modify active widgets from Python. To further ease the usability, jupyter-scatter infers sensible default color encodings from the data and dynamically adjusts the point opacity based on the point density in the current field of view.\r\n\r\nUsing examples from single-cell biology and machine learning we demonstrate how jupyter-scatter works, how it enables more efficient exploration of large-scale datasets, and how it can be integrated with other ipywidgets to build bespoke applications.", "recording_license": "", "do_not_record": false, "persons": [{"id": 36, "code": "EH7JQT", "public_name": "Fritz Lekschas", "biography": "[Fritz Lekschas](https://lekschas.de) is a computer scientist researching scalable visual exploration of biomedical data. As the Head of Visualization Research at [Ozette Technologies](https://ozette.com), he is leading the development of web-based data visualization and exploration tools for analyzing high-dimensional single-cell data. Fritz earned his PhD in computer science from Harvard University, where he was advised by Hanspeter Pfister and Nils Gehlenborg. 
He has published more than [twenty peer-reviewed papers](https://scholar.google.com/citations?user=v1_FiEgAAAAJ) and his work has been recognized with several awards.\r\n\r\nIn his free time, Fritz likes to work on [open-source tools for visual data exploration](https://github.com/flekschas).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 129, "guid": "161c2991-798a-57f1-b1f4-1d80b5f5125b", "logo": "", "date": "2023-07-13T14:20:00-05:00", "start": "14:20", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-129-accessibility-best-practices-for-authoring-jupyter-notebooks", "url": "https://cfp.scipy.org/2023/talk/VGAUQN/", "title": "Accessibility best practices for authoring Jupyter notebooks", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "So you\u2019ve written the perfect notebook, but do you know who can read it? As a notebook author you have great stories, code, and visualizations filling your work, but how often do you consider accessibility? Jupyter notebooks seem like they are for everyone, but how a notebook gets written can greatly impact how usable it is for people with disabilities. We\u2019ve curated authoring-focused best practices for notebook content to help your notebooks be more inclusive and reach a wider audience.", "description": "Accessibility practices are for everyone, but this may be especially important to notebook authors in academic and public settings where it is often legally required. Using [staple accessibility frameworks](https://www.w3.org/WAI/WCAG21/Understanding/intro#understanding-the-four-principles-of-accessibility), this talk will dive into what it means to make your notebook\u2019s content accessible and provide actionable guidance on how you as an author can improve your notebooks. 
These skills can be applied regardless of preferred notebook interface, author skill set, or prior accessibility knowledge.\r\n\r\nThis talk is best for an audience that is familiar with Jupyter notebooks. Prior accessibility knowledge or any other Jupyter knowledge is not necessary. The content is likely to be most engaging for an audience who regularly authors notebooks.\r\n\r\nThe structure of the talk will be as follows:\r\n1. Background and introduction to accessibility (7 minutes)\r\n    1.1 Why this talk? (Hint: community members have requested it)\r\n    1.2 Defining accessibility and scoping: what we will and won\u2019t cover in the talk\r\n    1.3 Common terms (disability, WCAG, assistive technology)\r\n2. Breaking down the notebook with WCAG (13 minutes)\r\n    2.1 Perceivable elements (Labels, colors, alternative forms of content)\r\n    2.2 Operable elements (Labeling for interactive areas, keyboard controls)\r\n    2.3 Understandable writing and structure (Markdown headings, summaries, plain language)\r\n3. Adopting a notebook accessibility checklist (2 minutes)\r\n4. What you can do next (2 minutes)\r\n5. Questions (6 minutes)\r\n\r\nAt the end of this talk, attendees will\r\n* Have an awareness of foundational accessibility principles and how they can appear in Jupyter notebooks.\r\n* Be able to identify common accessibility pitfalls (e.g., misused Markdown or incomplete visualizations) in Jupyter notebooks and what to do instead.\r\n* Have a checklist for easy reference of accessibility best practices when writing their own notebooks or editing existing ones.", "recording_license": "", "do_not_record": false, "persons": [{"id": 150, "code": "87Y88C", "public_name": "Stephannie Jimenez Gacha", "biography": "I've been working in open source since 2019 as part of multiple projects involving scientific computing and IDE development. Over the last two years, much of my work has focused on improving the UI/UX of multiple applications. 
I've given multiple talks about different topics, the two most recent are available in the following links:\r\n\r\n- PyData/Pycon Berlin 2022: https://www.youtube.com/watch?v=__EkpdeVGY4\r\n- Scipy Latam 2021: https://youtu.be/ZNVp1E0QADU?t=11847", "answers": []}, {"id": 152, "code": "VCXULG", "public_name": "Isabela Presedo-Floyd", "biography": "Isabela Presedo-Floyd (she/her) is a question-asker and UX/UI and Accessibility Designer at Quansight Labs. She is an enthusiasm enthusiast who works on tools that support open, reproducible science.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 136, "guid": "1e82fe86-b6dc-523f-ade8-5dec43d31ee7", "logo": "", "date": "2023-07-13T15:00:00-05:00", "start": "15:00", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-136-scientific-and-technical-publishing-with-python-and-quarto", "url": "https://cfp.scipy.org/2023/talk/7ZGCQM/", "title": "Scientific and technical publishing with Python and Quarto", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "In research and data science, effective communication requires weaving together narrative text and code to produce elegantly formatted output. By embedding executable Python code blocks inside markdown, the open-source publishing platform, Quarto, works with Jupyter and VS Code to enable you to create these fully reproducible documents and reports with the format and styling you need. In this talk I\u2019ll share how to get started and a few of my favorite things in Quarto including creating a manuscript, presentation and website in HTML, PDF and Word from a single source file, and creating lessons, reports, and Confluence documents.", "description": "Research and data science isn\u2019t just experiments and code, it\u2019s also communicating about our results, creating reports, sharing analyses, and teaching. 
To communicate effectively, we need to weave together narrative text and code to produce elegantly formatted, interactive output. Not only does it need to look great, but it needs to be reproducible, accessible, easily editable, diffable, version controlled and output in a variety of formats, such as PDF, HTML and MS Word. Jupyter has already made so much of this possible! The open-source publishing platform, Quarto, works with Jupyter and VS Code, so that we can easily use the output format and the styling that\u2019s needed for any situation. You can author documents as plain text markdown or Jupyter notebooks with scientific markdown, including equations (LaTeX support!), citations, cross references, figure panels, callouts, advanced layouts, and more.\r\n\r\nQuarto (https://quarto.org/) is a markdown format that adds executable Python code blocks and builds on top of Pandoc to produce a variety of output documents. This allows you to create fully reproducible documents and reports\u2014the Python code required to produce your output is part of the document itself, and is automatically re-run whenever the document is rendered. This means you can create documents as plain text markdown or Jupyter notebooks that can be easily rendered into presentations, websites and manuscripts in a variety of journal formats. You can also engage readers by adding interactive data exploration to your documents using Jupyter Widgets, htmlwidgets for R, Observable JS, and Shiny.\r\n\r\nIn this talk, I\u2019ll discuss how to author these dynamic, computational documents with Quarto and Python, showing how to get started and highlighting a few of my favorite things. I\u2019ll walk through how to use a single source document to target multiple formats - transforming a simple document into a presentation, a scientific manuscript, a website, a blog, and a book in a variety of formats including HTML, PDF and MS Word.  
I\u2019ll share workflows for creating and automating reports, an approach to creating online lessons, and finally how to publish Jupyter notebooks within existing content management systems like Hugo, Docusaurus, and Confluence, so that you can get started creating whatever content you need.", "recording_license": "", "do_not_record": false, "persons": [{"id": 158, "code": "RXHTLU", "public_name": "Tracy Teal", "biography": "Tracy Teal is the Open Source Program Director at Posit. Previously, she was a co-founder of Data Carpentry and the Executive Director of The Carpentries. She developed open source bioinformatics software as an assistant professor at Michigan State University and holds a PhD in computation and neural systems from California Institute of Technology. Tracy is involved in the open source software and reproducible research communities, including serving on advisory committees for NumFOCUS, pyOpenSci, EarthLab and carbonplan, and has been working with open source communities, developing curriculum, and teaching people how to work with data and code as a developer, instructor and project leader throughout her career.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 50, "guid": "9ba4502c-777d-5322-95c1-a07934243c45", "logo": "/media/2023/submissions/RVLFPB/longtail_38_0_I4vvxzb.png", "date": "2023-07-13T15:50:00-05:00", "start": "15:50", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-50-taming-black-swans-long-tailed-distributions-in-the-natural-and-engineered-world", "url": "https://cfp.scipy.org/2023/talk/RVLFPB/", "title": "Taming Black Swans: Long-tailed distributions in the natural and engineered world", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "Long-tailed distributions are common in natural and engineered systems; as a result, we encounter extreme values more often than we would expect from a short-tailed distribution. 
If we are not prepared for these \"black swans\", they can be disastrous.\r\n\r\nBut we have statistical tools for identifying long-tailed distributions, estimating their parameters, and making better predictions about rare events.\r\n\r\nIn this talk, I present evidence of long-tailed distributions in a variety of datasets -- including earthquakes, asteroids, and stock market crashes -- discuss statistical methods for dealing with them, and show implementations using scientific Python libraries.", "description": "You would think we'd be better prepared for disaster. But events like\r\nHurricane Katrina in 2005, which caused catastrophic flooding in New\r\nOrleans, and Hurricane Maria in 2017, which caused damage in Puerto Rico\r\nthat has still not been repaired, show that large-scale disaster\r\nresponse is often inadequate. Even wealthy countries -- with large\r\ngovernment agencies that respond to emergencies and well-funded\r\norganizations that provide disaster relief -- have been caught\r\nunprepared time and again.\r\n\r\nThere are many reasons for these failures, but one of them is that rare,\r\nlarge events are fundamentally hard to comprehend. Because they are\r\nrare, it is hard to get the data we need to estimate their likelihood precisely.\r\nAnd because they are large, they challenge our ability to imagine\r\nquantities that are orders of magnitude bigger than what we experience\r\nin ordinary life.\r\n\r\nIn terms introduced by Nassim Taleb, a \"black swan\" is a large, impactful event that was\r\nconsidered extremely unlikely before it happened, based on a model of\r\nprior events. 
If the distribution of event sizes is actually long-tailed\r\nand the model is Gaussian, black swans will happen with some regularity.\r\nHowever, black swans can be \"tamed\" by using appropriate models, including lognormal, Student t, and Pareto distributions.\r\n\r\nIn this talk, I introduce these distributions and show how they can be used to model measurements from natural and engineered systems -- including earthquakes, craters on the moon, solar flares, file sizes, and stock market crashes. We will use distributions and optimization tools from SciPy to estimate parameters and generate predictions, and Matplotlib to visualize the results.", "recording_license": "", "do_not_record": false, "persons": [{"id": 25, "code": "FXAZKL", "public_name": "Allen Downey", "biography": "Allen Downey is a curriculum designer at the online learning company Brilliant and professor emeritus at Olin College. He is the author of several books related to computer science and data science, including Think Python, Think Stats, and Think Bayes. His blog, Probably Overthinking It, features articles about Bayesian statistics. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 174, "guid": "2b0ff282-9aa3-5e9a-8a18-70c09a2165c2", "logo": "/media/2023/submissions/X8KZ3E/napari-window_5oz2dQ9.png", "date": "2023-07-13T16:30:00-05:00", "start": "16:30", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-174-view-annotate-and-analyze-multi-dimensional-images-in-python-with-napari", "url": "https://cfp.scipy.org/2023/talk/X8KZ3E/", "title": "View, annotate, and analyze multi-dimensional images in Python with napari", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "napari is an n-dimensional image viewer for Python. 
If you\u2019ve ever tried `plt.imshow(arr)` and made Matplotlib unhappy because `arr` has more than two dimensions, then napari might be for you! napari will gladly *display higher-dimensional arrays* by providing sliders to explore additional dimensions. But napari can also: *overlay* derived data, such as points, segmentations, polygons, surfaces, and more; and *annotate* and *edit* these data, using standard data structures like NumPy or Zarr arrays, allowing you to *seamlessly weave* exploration, computation, and annotation in image analysis.", "description": "napari is an n-dimensional image viewer for Python. If you\u2019ve ever tried `plt.imshow(arr)` and made Matplotlib unhappy because `arr` has more than two dimensions, then napari might be for you!\r\n\r\nThe napari canvas can be 2D or 3D. When you give napari an array with more dimensions than the canvas, it will automatically create sliders for those additional dimensions, allowing you to rapidly explore your full data, rather than a few sampled slices.\r\n\r\nImage analysis and visualization involves more than images though: feature detection algorithms result in *points*, segmentation results in *label images*, annotation results in *shapes* such as rectangles or polygons, and more. Napari provides *layers* that can be displayed on top of each other or side by side, allowing users of Scientific Python to gain a rapid understanding of the algorithms they\u2019re using \u2014 where they work well and where they might go wrong.\r\n\r\nSometimes, image analysis algorithms get you *this* far, but not quite far enough. In such cases, it\u2019s useful to manually curate their output, then  continue with downstream steps of an analysis. Napari provides editing tools for its layer types, allowing one for example to add missing points to the output of a peak detection algorithm, remove incorrect ones, paint over incorrect parts of a segmentation, or draw polygons around missed objects of interest. 
The resulting data points are saved in standard Scientific Python data structures, such as NumPy or Zarr arrays.\r\n\r\nThis design makes it easy to seamlessly weave together image exploration, image computation, processing, and analysis, and data annotation, curation, and editing.\r\n\r\nNapari provides a *plugin interface*, allowing developers to extend napari\u2019s capabilities, providing users with novel ways to interact with their data. Because napari provides both a library accessible within Python, IPython, and Jupyter, *and* a standalone executable script, we have even found that napari plugins can be an effective way to help collaborators run Python image analysis workflows without needing to launch Python.\r\n\r\nIn this talk, I\u2019ll introduce napari\u2019s history, demonstrate all the features described above, and discuss current limitations and where we\u2019re going.", "recording_license": "", "do_not_record": false, "persons": [{"id": 166, "code": "QUTG3K", "public_name": "Juan Nunez-Iglesias", "biography": "I'm a research scientist helping other scientists get insights from their image data using Python. I've been using Python since 2008, and the main scientific Python ecosystem (NumPy, SciPy, & co) since 2010. In 2012, on a whim, I went to my first SciPy (US) conference, and it changed my life! I realised that \"open source\" didn't mean just posting the code online. It meant actively collaborating on code with other scientists, across vast distances and at different times. Before you could say \"import numpy as np\", I had joined the scikit-image team, written a paper about it, written a whole book on SciPy (!), started new collaborative, open source libraries, and just generally been all-in on Scientific Python. I've been coming back to SciPy as often as I can to pay it forward for new folks in our community! 
\ud83d\ude0a", "answers": []}], "links": [], "attachments": [], "answers": []}], "Grand Salon C": [{"id": 80, "guid": "161e2e04-c05e-5b03-a529-490376cf2a4c", "logo": "", "date": "2023-07-13T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-80-interactive-analysis-of-satellite-imagery-with-earth-engine-and-geemap", "url": "https://cfp.scipy.org/2023/talk/MFQQRJ/", "title": "Interactive Analysis of Satellite Imagery with Earth Engine and Geemap", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "Google Earth Engine is a cloud-computing platform with a multi-petabyte catalog of satellite imagery and geospatial datasets. Built upon the Earth Engine Python API and open-source mapping libraries, geemap enables Earth Engine users to interactively manipulate, analyze, and visualize geospatial big data in a Jupyter environment. This presentation introduces Earth Engine and highlights the key features of geemap for interactive mapping and geospatial analysis with Earth Engine. Attendees can utilize geemap to create satellite timelapse animations for any location on Earth within 60 seconds. Additional resources will be provided to the attendees to learn more about geemap.", "description": "The Earth is constantly changing, which creates significant challenges for the environment and human society. To tackle these challenges on a global scale, the Earth science community relies heavily on geospatial datasets that are collected through various means, such as satellite, aerial, and mobile sensors. However, the explosive growth of geospatial datasets over the past few decades has overwhelmed the Earth science community's capacity for storage, analysis, and visualization. Fortunately, the advent of cloud-computing platforms, such as Google Earth Engine, has made it possible to access, manipulate, and analyze large volumes of geospatial data on-the-fly. 
In recent years, Earth Engine has become increasingly popular in the geospatial community and has enabled numerous Earth science applications at local, regional, and global scales.\r\n\r\nThe geemap Python package is built upon the Earth Engine Python API and open-source mapping libraries. It allows Earth Engine users to interactively manipulate, analyze, and visualize geospatial big data in a Jupyter environment. Since its creation in April 2020, geemap has received over [2,500 GitHub stars](https://github.com/giswqs/geemap/stargazers) and is being used by over [800 projects](https://github.com/giswqs/geemap/network/dependents) on GitHub. More than [130 Jupyter notebook examples](https://geemap.org/tutorials/)  and an [open-access book](https://book.geemap.org/) are available for learning geemap. \r\n\r\nThis presentation introduces Earth Engine and highlights the key features of geemap for interactive mapping and geospatial analysis with Earth Engine, such as\r\n- Searching and loading datasets from the Earth Engine Data Catalog\r\n- Visualizing raster and vector datasets interactively\r\n- Using Cloud Optimized GeoTIFFs (COG) and SpatioTemporal Asset Catalogs (STAC)\r\n- Visualizing the Dynamic World global land cover datasets\r\n- Creating satellite timelapse animations\r\n\r\nThis presentation is intended for scientific programmers, data scientists, geospatial analysts, and concerned citizens of Earth. Attendees should have a basic understanding of Python and Jupyter Notebook. Familiarity with Earth science and geospatial datasets is not necessary, but it will be helpful. For more information about Earth Engine and geemap, visit https://earthengine.google.com and https://geemap.org.", "recording_license": "", "do_not_record": false, "persons": [{"id": 78, "code": "ZFYMW8", "public_name": "Qiusheng Wu", "biography": "Qiusheng Wu is an Associate Professor in the Department of Geography & Sustainability at the University of Tennessee, Knoxville. 
He is also an Amazon Visiting Academic and a Google Developer Expert (GDE) for Earth Engine. His research focuses on Geographic Information Science, remote sensing, and open-source software development. Dr. Wu is an advocate of open science and reproducible research. He has developed several open-source packages that have been widely used by the geospatial community, such as [geemap](https://geemap.org) and [leafmap](https://leafmap.org). For more information about his research, visit https://wetlands.io.", "answers": []}, {"id": 103, "code": "UX3DZA", "public_name": "Steve Greenberg", "biography": "Steve is passionate about using machine learning and remote sensing technology to tackle the climate and sustainability crises. He leads the Developer Relations team for [Google Earth Engine](https://earthengine.google.com/). Earth Engine is a  geospatial analysis platform advancing planetary sustainability and resilience to climate change. His team helps remote sensing professionals, data scientists and machine learning engineers analyze petabytes of satellite imagery to understand and protect the earth. Earth Engine is provided [free-of-charge for noncommercial and research purposes](https://earthengine.google.com/noncommercial/).\r\n\r\nFrom 2016 through 2021, Steve led Developer Relations for BigQuery, Vertex AI and other Machine Learning and Data Analytics products in Google Cloud Platform, where he focused on improving the experience for users of scikit-learn, XGBoost and TensorFlow.\r\n\r\nSteve also co-leads Google's largest grassroots sustainability group - organizing Googlers to incubate new climate initiatives. Three of the climate areas he's worked on - wind energy prediction, real-time precipitation modeling and sustainable building design - have graduated into full-time projects at Google. Prior to joining Google in 2016, Steve led engineering at a Seattle startup helping governments be more accountable to their citizens with public data. 
Before 2012, Steve was a Program Manager working on various data efforts in Microsoft's Office team.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 193, "guid": "3aa55d15-3362-508e-8c5a-2edc7d926a89", "logo": "/media/2023/submissions/BAD9ZQ/scipy_2023_thumbnail_nZPIWB6.png", "date": "2023-07-13T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-193-accelerating-the-use-of-public-geophysical-data-for-recharging-california-s-groundwater", "url": "https://cfp.scipy.org/2023/talk/BAD9ZQ/", "title": "Accelerating the Use of Public Geophysical Data for Recharging California\u2019s Groundwater", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "Recharging ground aquifers is an urgent task for improving groundwater sustainability in California. Geophysical data can provide a capability to image the subsurface where the major data gap lies. However, neither the data nor the analytic tools required to derive subsurface information are readily accessible. We present an interactive web application that utilizes a public database, GIS capabilities and directly integrates Jupyter Notebooks and Python packages from researchers to guide recharge site location. Our demonstration showcases how this technology can contribute to improving groundwater recharge in California and how integrating the research knowledge directly into a web application can increase the impact.", "description": "California's Central Valley is one of the world's most productive farming regions, but the region faces a serious threat to groundwater sustainability due to population growth and climate change. Recharging ground aquifers is essential to address this challenge; however, a major data gap exists in the subsurface. 
Geophysical data can provide crucial information about the subsurface, but neither the data nor the analytic tools required to derive subsurface information is readily accessible to those working on the recharge problem.\r\nIn this talk, we will present our development of a web-application and companion public database for accelerating groundwater recharge in California, which is a part of the Sustainability Accelerator Project funded by Stanford Doerr School of Sustainability. Our application uses electrical resistivity data obtained from electromagnetic geophysical surveys, as well as ancillary data from driller's logs (containing information about sediment/rock) and water level/quality measurements, to create 2D maps of recharge metrics. These maps guide the location of recharge sites, and the public resistivity and ancillary data are compiled into an online database using Redivis and displayed in a custom web-application. The application provides project partners the ability to utilize research codes without requiring knowledge of Python, and is flexible to allow updates by researchers to support rapid changes and feedback from partners to meet their specific needs for a recharge site location.\r\nThe development of the web-application was a collaborative effort between academic researchers and software engineers at Curvenote. The application enables direct use of research code by front-facing practitioners tackling the recharge problem in California. We utilized open source Python packages, to create Jupyter Notebooks that can execute each stage of the workflow.", "recording_license": "", "do_not_record": false, "persons": [{"id": 224, "code": "7V9CTG", "public_name": "SEOGI KANG", "biography": "Dr. Kang completed his PhD in Geophysics at University of British Columbia, Canada, in 2018. His thesis work focused on electromagnetic imaging and its application to mining problems. Currently, he is a Postdoctoral Researcher in the Geophysics Department at Stanford. 
His research focus is on maximizing the value of sensor data for advancing groundwater science and management. He is a co-creator of an open-source geophysical software, SimPEG.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 76, "guid": "43885dc9-4a98-565c-be62-84982615e176", "logo": "", "date": "2023-07-13T14:20:00-05:00", "start": "14:20", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-76-uxarray-a-python-library-for-unstructured-climate-and-weather-data", "url": "https://cfp.scipy.org/2023/talk/XMBALS/", "title": "UXarray, a python library for unstructured climate and weather data", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "UXarray aims to provide xarray-styled functionality for unstructured grid datasets. UXarray offers support for loading and representing unstructured grids by utilizing existing Xarray functionality paired with new routines that are specifically written for operating on unstructured grids. In this talk, we will present the current capabilities of the library: reading and writing of unstructured grids, reading of datasets along with basic grid operations and the need to speed up computations, integration operations along with details on speedups obtained by using Numba and python indexing. We will also demonstrate the use of this library for visualization of unstructured grids.", "description": "After less than a year of development, UXarray has already become a popular Python repository with an active community engagement, boasting more than 10 forks and 77 stars on GitHub.\r\n\r\nThe UXarray project aims to bridge the gap between traditional operations on structured grids and modern standards for unstructured grids, such as the UGRID specification. 
Global climate models have traditionally used rectangular latitude-longitude grids for their data layout, but these grids lead to computational challenges at high resolutions due to the convergence of lines of longitude at the poles. Therefore, modeling centers worldwide have adopted unstructured grids that allow for quasi-uniform distribution of data over the sphere. However, analyzing data on these grids is far more difficult than on latitude-longitude grids, often requiring groups to apply lossy regridding to their data so that traditional tools can be applied. To partly address this problem, groups worldwide have moved towards the adoption of standards for unstructured grid data, such as the UGRID specification developed under the Climate-Forecast (CF) conventions.\r\n\r\nMost climate models output data in the NetCDF format, and the CF conventions are an important standard for organizing the metadata of these files and includes details on how to describe a rectangular latitude longitude grid. The UGRID specification describes how a NetCDF file can represent an unstructured grid, but it has potential issues. Currently, the UGRID specification is under consideration to be included in the netCDF CF conventions.\r\n\r\nOur new Python library, UXarray, supports operations directly on unstructured grid data, reducing the need for creating regular-grid copies of unstructured grid output and simplifying the workflow. Unstructured grids can be provided in files following various conventions, such as UGRID, SCRIP, EXODUS, etc. These conventions have different definitions and representations of the attributes and variables used to describe the unstructured grid topology. Moreover, the UGRID convention does not enforce standard variable namings for most of the attributes and variables, except for a few required ones. 
UXarray unifies all of these conventions at the data loading step by representing grids internally in the UGRID convention, regardless of the original grid file type. Furthermore, it uses a set of standardized names for topology attributes and variables, while still providing the user with the original attribute names and variables from the grid definition file. All of these features lay the foundation for the development of quick and efficient algorithms for climate scientists around the world.\r\n\r\nOur design for UXarray aims to maintain Xarray interoperability, which allows us to utilize various Xarray-compatible packages. UXarray uses Numba for loop optimizations and faster computation. Additionally, we provide examples and performance metrics showcasing interoperable read/write operations, grid and corresponding data reading, efficiency and optimization built into UXarray, and visualization.\r\n\r\nOverall, UXarray aims to simplify the workflow for climate and weather scientists working with unstructured grids and allow them to efficiently analyze and visualize their data.", "recording_license": "", "do_not_record": true, "persons": [{"id": 61, "code": "YD8CDL", "public_name": "Rajeev Jain", "biography": "Rajeev Jain is a Principal Research Software Specialist at the Argonne National Laboratory, located in the suburbs of Chicago, with a focus on managing multi-disciplinary simulation, scalability and computation for applications-oriented problems.\r\n\r\nHe is a quick learner who loves to solve complex problems and readily adapts to new challenges. 
His work encompasses a range of scientific domains, from simulating physical phenomena to developing deep-learning-enabled precision medicine for cancer and providing data analysis tools for the geoscience community.\r\n\r\nTo learn more about Rajeev Jain's work and research, you can visit his profile page on the Argonne website: https://www.anl.gov/profile/rajeev-jain\r\n\r\nLinkedIn: https://linkedin.com/in/rajeeja", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 158, "guid": "a23019cc-549f-59ea-af50-b6af4a9a58b5", "logo": "", "date": "2023-07-13T15:00:00-05:00", "start": "15:00", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-158-introducing-ytxarray", "url": "https://cfp.scipy.org/2023/talk/QYHD3G/", "title": "Introducing yt_xarray", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "*yt_xarray* is a new package in the scientific python ecosystem for linking *yt* and *xarray*. *yt*, primarily used in computational astrophysics, has gradually broadened support for scientific domains, including geoscience disciplines. Most geoscience data, however, still requires manual steps to load into *yt*. *yt_xarray*, a new *xarray* extension, aims to streamline communication of data from *xarray* to *yt*, providing a potentially useful tool to the many geoscience researchers already using *xarray* while allowing *yt* to leverage the distributed backends already supported by *xarray*. In this presentation, we will provide an overview of the usage and design of *yt_xarray*.", "description": "A number of recent efforts within the [*yt*](https://yt-project.org/) community have broadened the scope of scientific domains supported by *yt*. Some of these efforts included improving generic functionality while others focused on adding functionality required for specific domains outside the astrophysics scientific community. 
For geoscience data in particular, the addition of a geographic coordinate handler and an interface to [*cartopy*](https://scitools.org.uk/cartopy/docs/latest/) for producing maps within the *yt* plotting framework enabled analysis of geographic datasets. Getting the data into *yt*, however, was not as streamlined as it could be; with the exception of some new custom data ingestors (termed \"frontends\" in yt) for specific geoscience data products, most geoscience data still required manual loading of arrays with generic *yt* loaders. In addition to extra steps for the user, this limitation also required that the data fit entirely within memory. [*yt_xarray*](https://yt-xarray.readthedocs.io/en/latest/) fills this gap in data regularization required for loading geodata in *yt* by leveraging [*xarray*](https://docs.xarray.dev/en/stable/) for reading of data on demand as *yt* needs it. \r\n\r\nRather than a traditional *yt* frontend, *yt_xarray* v0.1 introduced an *xarray* `accessor` object that streamlines the creation of *yt* datasets from subsets of fields, simplifying the process of using *yt* with most regularly gridded datasets that *xarray* can load. While the initial release focuses on simply returning a *yt* dataset object for use with any *yt* function, future releases will further simplify access to *yt* functions from *xarray* by providing *yt* function wrappers from within *yt_xarray*.\r\n\r\nWhile *yt* and *xarray* have some similarity in that they both load and manipulate coordinate-referenced arrays, *yt* is designed primarily for volumetric data while *xarray* supports sets of labeled arrays more generally. This difference informed a number of important design choices in *yt_xarray*, in particular with regard to how chunked arrays are handled. For gridded datasets in *yt*, a physical domain can be subdivided into multiple grid objects so that a single *yt* \"chunk\" maps to a subdomain of the whole grid. 
During processing, subdomains are processed sequentially so that data is loaded as needed. In *xarray*, chunks are defined as contiguous index ranges within arrays, with the actual data potentially residing in on-disk files or existing as delayed computations. *yt_xarray* merges these two chunking systems by building *yt* grids that map spatial subdomains to index ranges of *xarray* fields. This allows a 1:1 mapping of *Dask*-*xarray* chunks to yt grid objects but also allows multiple *Dask*-*xarray* chunks to be contained within a yt grid object. \r\n\r\nIn this presentation, we will provide an overview for using *yt_xarray* for loading and analyzing regularly gridded 2D and 3D *xarray* datasets. In addition to the general usage and development plans, we will describe the design of *yt_xarray* with a focus on leveraging the performance benefits of distributed arrays loaded via *xarray*.", "recording_license": "", "do_not_record": false, "persons": [{"id": 187, "code": "QCYV8M", "public_name": "Chris Havlin", "biography": "Chris Havlin is a Research Scientist in the School of Information Sciences at the University of Illinois. His work focuses on open source scientific software development and computational geodynamics.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 194, "guid": "22ba796f-af18-5651-aff7-dc4287ad8120", "logo": "", "date": "2023-07-13T15:50:00-05:00", "start": "15:50", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-194-tidy-geospatial-cubes", "url": "https://cfp.scipy.org/2023/talk/LCWBBP/", "title": "Tidy Geospatial Cubes", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "The open-source project, Xarray, combines labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. Xarray has strong user bases in the physical sciences and geospatial community. 
However, new users commonly struggle to fit their dataset into the Xarray model and with conceptualizing and constructing an Xarray object that makes subsequent analysis steps easy (\u201cdataset wrangling\u201d). We take inspiration from the \u201ctidy data\u201d concept for dataframes \u2014 \u201cdatasets structured to facilitate analysis\u201d (Wickham, 2014) \u2014 and attempt a definition of tidy data for labeled array objects provided by Xarray.", "description": "The open-source project, Xarray, combines the convenience of labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays (\"cubes\") to provide an intuitive and scalable interface for scientific analysis. Xarray is now widely used across many areas of scientific research, with a particularly strong user base in the physical sciences. New users commonly struggle to fit their dataset into the Xarray data model and, in particular, struggle with conceptualizing and constructing an Xarray object that makes subsequent analysis steps easy (\u201cdataset wrangling\u201d). We take inspiration from the \u201ctidy data\u201d concept for dataframes \u2014 \u201cdatasets structured to facilitate analysis\u201d (Wickham, 2014) \u2014 and attempt a definition of tidy data for labeled array objects provided by Xarray.\r\n\r\nA \u2018tidy dataset\u2019 framework will help streamline processing workflows across the physical sciences and provide a set of norms and principles to guide the use and construction of large and complex datasets encountered in these fields. The utility of this exercise is twofold: helping dataset producers construct more useful Analysis-Ready datasets; and developing a set of guidelines that can help users wrangle their datasets into a form that enables convenient analysis with Xarray. 
In addition, a commonly-defined concept for \u2018tidy\u2019 geospatial array data might enable development of \u2018tidy\u2019 tools that consume and produce tidy datasets (Wickham, 2014).\r\n\r\nThis presentation will examine three datasets and the processes of \u2018tidying\u2019 them. We will demonstrate various ways that a dataset may be \u2018untidy\u2019 \u2014 not conducive to analysis \u2014 and present a useful set of rules to define \u2018tidy geospatial cubes.\u2019 The examples we will discuss are: 1) Harmonized Landsat Sentinel-2 (HLS), a dataset of multispectral reflectance measurements, 2) Aquarius, a dataset of remotely sensed sea surface salinity measurements; and 3) ITS_LIVE, a multi-sensor dataset of ice velocity measurements for glaciers and ice sheets based on satellite image pairs. Our presentation will walk through common analytical workflows with these remote sensing datasets and highlight the organizational choices a user must make along the way (related to metadata, variables, coordinates, and dimensions) to efficiently arrive at a computational result with Xarray.\r\n\r\nDefining a common framework for labeled array objects will ease the learning curve for new users and minimize the time spent on data-wrangling steps. At present, the examples are satellite remote sensing datasets, and we recognize that there might be elements of the \u2018tidy Xarray\u2019 definition that are specific to this subdomain. We hope to spark a discussion that will help generalize the presented principles.", "recording_license": "", "do_not_record": false, "persons": [{"id": 225, "code": "CFCJHA", "public_name": "Emma Marshall", "biography": "I am a graduate student at the University of Utah in the Geography Department. My research uses remote sensing data and other tools to study recent variability of alpine glaciers in High Mountain Asia. 
I am excited to return for my second SciPy after attending for the first time in 2022!", "answers": []}, {"id": 31, "code": "RKNM7F", "public_name": "Deepak Cherian", "biography": null, "answers": []}, {"id": 240, "code": "W73W8Q", "public_name": "Scott Henderson", "biography": "Scott is a research scientist in the University of Washington (UW) Department of Earth and Space Sciences and a data science fellow at the eScience Institute. He works on numerous NASA-funded efforts to develop open cloud computing solutions for data-intensive research.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 223, "guid": "c7afae56-7fee-5cd6-95bf-98034a2646de", "logo": "", "date": "2023-07-13T16:30:00-05:00", "start": "16:30", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-223-climate-model-evaluation-workflow-built-on-jupyter-notebooks", "url": "https://cfp.scipy.org/2023/talk/EPPR7R/", "title": "Climate Model Evaluation Workflow Built on Jupyter Notebooks", "subtitle": "", "track": "Earth, Ocean, Geo, and Atmospheric", "type": "Talk", "language": "en", "abstract": "This project introduces an extensible workflow used to evaluate climate model output using collections of Jupyter notebooks. The workflow supports parametrizing and batch-executing notebooks using Papermill, in conjunction with developing notebooks interactively. Additional features include integration with Dask and caching intermediate data products generated by notebooks. The final product of the workflow can automatically be built into a Jupyter book for easy presentation and shareability. 
While it was initially developed for climate modeling, the flexible and extensible nature of this framework makes it adaptable to any kind of data analysis work, and the presentation will highlight this capability.", "description": "Motivation\r\n\r\nWithin the field of climate modeling, there is a need to run collections of scripts generating plots of common diagnostic metrics of climate model output, for example as models are run with different configurations during development. These scripts often involve manual configuration, and the output is not necessarily well-organized for interpreting and sharing. Jupyter notebooks help address this problem, creating more readable workflows that can be annotated and edited interactively, then easily presented to others as a Jupyter book. However, Jupyter notebooks are not by default parameterizable or runnable in batches. This project addresses this gap by utilizing Papermill to create a package that can run collections of Jupyter notebooks with configurable parameters, cache generated data products, and publish results as a Jupyter book, while continuing to support the interactive development work that Jupyter notebooks enable. This framework is not limited to use within climate modeling; the infrastructure is useful to any data science project that would benefit from a batch-executable, parameterizable, and shareable Jupyter notebook-based workflow. \r\n\r\nMethods\r\n\r\nThis project uses a number of existing open-source Python tools, building on the Jupyter ecosystem using Papermill as well as Jinja templating, supporting Dask functionality, and publishing a Jupyter book. It brings these tools together to create a powerful workflow that combines their functionality. 
The project infrastructure will be published as a Python package and on GitHub, and examples showcasing its functionality will be made available.\r\n\r\nResults\r\n\r\nCurrently (as of 3/1/23), the project is in the development stage, with several working demos. By the time of the conference, a more complete version will be public on GitHub with documentation and installable as a Python package, along with examples that can be downloaded and built on.\r\n\r\nConclusion\r\n\r\nWe have developed a framework for data analysis using collections of parameterizable Jupyter notebooks, along with infrastructure to support Dask, caching of data products, building a Jupyter book and other features. This is a powerful application of the Jupyter ecosystem and can be applied to a wide range of fields outside of the climate model evaluation use case it was initially developed for.", "recording_license": "", "do_not_record": false, "persons": [{"id": 248, "code": "YGBXTD", "public_name": "Elena Romashkova", "biography": "I'm an Associate Scientist 1 in the Oceanography Section of NCAR's Climate and Global Dynamics Lab.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 105": [{"id": 285, "guid": "886ab91e-f38b-55e7-9fb3-caae241b78fa", "logo": "", "date": "2023-07-13T13:15:00-05:00", "start": "13:15", "duration": "00:55", "room": "Classroom 105", "slug": "2023-285--bof-room-105-scientific-python-ecosystem-coordination", "url": "https://cfp.scipy.org/2023/talk/3HXLZV/", "title": "[BoF Room 105] Scientific Python Ecosystem Coordination", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Scientific Python Ecosystem Coordination (SPEC) documents (https://scientific-python.org/specs/) provide operational guidelines for projects in the scientific Python ecosystem. 
SPECs are similar to project-specific guidelines (like PEPs, NEPs, SLEPs, and SKIPs), but are opt-in, have a broader scope, and target all (or most) projects in the scientific Python ecosystem. Come hear more about what we are working on and planning. Better yet, come share your ideas for improving the ecosystem!", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 271, "code": "SE7SNC", "public_name": "Juanita Gomez", "biography": "Juanita Gomez is a passionate programmer, mathematician and open source advocate; former developer of Spyder IDE at Quansight. She has a BS in Pure Mathematics from Pontificia Universidad Javeriana in Colombia and is currently pursuing a Ph.D. in Computer Science at UC Santa Cruz. She is a community manager for the Scientific Python project, a community effort to better coordinate and support scientific Python libraries.", "answers": []}, {"id": 277, "code": "ZUXENY", "public_name": "Jarrod Millman", "biography": null, "answers": []}, {"id": 275, "code": "DAWPXW", "public_name": "St\u00e9fan van der Walt", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 288, "guid": "ac20c04b-8609-52ca-a85b-407c03469d2b", "logo": "", "date": "2023-07-13T18:30:00-05:00", "start": "18:30", "duration": "00:55", "room": "Classroom 105", "slug": "2023-288--bof-room-105-scientific-python-packaging-summit", "url": "https://cfp.scipy.org/2023/talk/H3GNAT/", "title": "[BoF Room 105] Scientific Python Packaging Summit", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "\"Python packaging is a rapidly changing landscape, plagued by many hurdles and challenges for users. 
The scientific Python community faces some of the greatest difficulties of anyone here, given the high reliance on external binaries and compiled code, the diversity of packaging ecosystems (PyPI, Conda, others), and the fact that many if not most users are not professional software engineers, like in other ecosystems. This is made all the more critical by the importance of reproducible research, and its sensitivity to even small dependency changes.\r\n\r\nWe'd like to build on the recent momentum behind evolving the packaging landscape to better serve these needs and building bridges between key players in the core Python and scientific spaces, with an intense, engaging and open discussion. This will bring together the key community stakeholders and everyday package authors to sync up on best practices, strengthen collaboration, and help come to consensus that would take months or even years if not for in-person discussion, as well as provide a jumping-off point for followup conversations and future action items.\"", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 473, "code": "8RUQ3H", "public_name": "C.A.M. Gerlach", "biography": null, "answers": []}, {"id": 481, "code": "LEUJSP", "public_name": "Henry Schreiner III", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 103": [{"id": 283, "guid": "f54c909c-ccee-5cda-bfbb-1f6d94237bdb", "logo": "", "date": "2023-07-13T13:15:00-05:00", "start": "13:15", "duration": "00:55", "room": "Classroom 103", "slug": "2023-283--bof-room-103-pyarrow-in-pandas-and-dask", "url": "https://cfp.scipy.org/2023/talk/SZP3LA/", "title": "[BoF Room 103] PyArrow in pandas and Dask", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "DataFrame libraries in general, pandas and Dask specifically, are moving towards a better integration with PyArrow. 
This has many benefits, like improved performance and a reduced memory footprint. We want to connect with users to discuss how PyArrow can improve DataFrame libraries and what they expect out of PyArrow support. This can include things like improved performance, more consistent behavior or better interoperability with other libraries.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 470, "code": "UBSSDL", "public_name": "Patrick Hoefler", "biography": null, "answers": []}, {"id": 479, "code": "Y89CZX", "public_name": "James Bourbeau", "biography": null, "answers": []}, {"id": 45, "code": "URCPG3", "public_name": "Matt Harrison", "biography": "Matt is a corporate trainer, author, and consultant on Python and Data Science. He has a CS degree from Stanford University. He is a best-selling author on Python and Data subjects. His books: Effective Pandas, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm have all been best-selling books on Amazon. He just published Machine Learning Pocket Reference and Pandas Cookbook (Second Edition). He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), Universities (Stanford, University of Utah, BYU), as well as small companies. 
He has been using Python since 2000 and has taught thousands through live training both online and in person.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 286, "guid": "572ca456-ca48-5bdc-b501-bbfbca7f0756", "logo": "", "date": "2023-07-13T18:30:00-05:00", "start": "18:30", "duration": "00:55", "room": "Classroom 103", "slug": "2023-286--bof-room-103-python-visualization-and-app-tools", "url": "https://cfp.scipy.org/2023/talk/GEDFS7/", "title": "[BoF Room 103] Python Visualization and App Tools", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Each new SciPy brings even more tools for data visualization and for building data-rich scientific applications and dashboards. This BoF brings together maintainers of Python tools for data visualization and building apps to help make sense of this complex landscape for users and to highlight new developments, trends, and opportunities. Join us and stay ahead of the curve!", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 186, "code": "RKAYQQ", "public_name": "James A. Bednar", "biography": "Jim Bednar is the Director of Custom Services at Anaconda, Inc. Dr. Bednar holds a Ph.D. in Computer Science from the University of Texas, along with degrees in Electrical Engineering and Philosophy. He has published more than 50 papers and books about the visual system, software development, and reproducible science. Dr. Bednar manages the HoloViz project, a collection of open-source Python tools that includes Panel, hvPlot, Datashader, HoloViews, GeoViews, Param, Lumen, and Colorcet. Dr. 
Bednar was a Lecturer and Reader in Computational Neuroscience at the University of Edinburgh from 2004-2015, and previously worked in hardware engineering and data acquisition at National Instruments.", "answers": []}, {"id": 482, "code": "J9WXRF", "public_name": "Elliott Sales de Andrade", "biography": null, "answers": []}, {"id": 117, "code": "NEC33M", "public_name": "Bane Sullivan", "biography": "[Bane Sullivan](https://banesullivan.com), co-creator of [PyVista](https://github.com/pyvista/), is a Research Software Engineer working at the intersection of geoscience, visualization, and data science.\r\n\r\nBane is a geophysicist/hydrologist by training and has been working to grow PyVista's adoption within the subsurface geoscience communities.", "answers": []}, {"id": 9, "code": "7JRHGL", "public_name": "Sophia Yang", "biography": "Sophia Yang is a Senior Data Scientist and a Developer Advocate at Anaconda. She is passionate about the data science community and the Python open-source community. She is the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She serves on the Steering Committee and the Code of Conduct Committee of the Python open-source visualization system HoloViz. She also volunteers at NumFOCUS, PyData, and SciPy conferences. She holds an M.S. in Computer Science, an M.S. in Statistics, and a Ph.D. 
in Educational Psychology from The University of Texas at Austin.", "answers": []}, {"id": 471, "code": "VJMN7Y", "public_name": "Juan Nunez-Iglesias", "biography": null, "answers": []}, {"id": 262, "code": "FK9UWM", "public_name": "Kushal Kolar", "biography": "https://github.com/kushalkolar", "answers": []}, {"id": 532, "code": "3H8V9J", "public_name": "Jon Mease", "biography": null, "answers": []}, {"id": 538, "code": "9QUXNU", "public_name": "Nathan Jessurun", "biography": null, "answers": []}, {"id": 548, "code": "SDZEED", "public_name": "Hadley Wickham", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 104": [{"id": 284, "guid": "938ebcc1-31aa-547f-ae80-2b3ae9e3566b", "logo": "", "date": "2023-07-13T13:15:00-05:00", "start": "13:15", "duration": "00:55", "room": "Classroom 104", "slug": "2023-284--bof-room-104-where-on-earth-is-my-pixel-", "url": "https://cfp.scipy.org/2023/talk/A9EGX9/", "title": "[BoF Room 104] Where on Earth is my Pixel?", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Imaging communities across different fields (microscopy, remote sensing, medical imaging, materials science) are currently all moving to develop cloud- and chunking friendly imaging formats based around Zarr. This includes OME-NGFF and GeoZarr. Although pretty much everyone has agreed on Zarr as the container for the image data, there is ongoing discussion about how best to store metadata about the images. In this BoF we'll discuss ways to encode *where* each pixel in the image is located in space (and time!) (and frequency!), and whether it's possible to harmonize this encoding across the different formats and standards. 
A relevant issue is https://github.com/ome/ngff/issues/174.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 471, "code": "VJMN7Y", "public_name": "Juan Nunez-Iglesias", "biography": null, "answers": []}, {"id": 480, "code": "EYSAPF", "public_name": "Josh Moore", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 287, "guid": "a0f0db23-015a-50fc-abca-6882545e3874", "logo": "", "date": "2023-07-13T18:30:00-05:00", "start": "18:30", "duration": "00:55", "room": "Classroom 104", "slug": "2023-287--bof-room-104-funding-open-source-software", "url": "https://cfp.scipy.org/2023/talk/7DDDWU/", "title": "[BoF Room 104] Funding Open Source Software", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Scientific open source software has often advanced by volunteer efforts with little financial support. In recent years, there has been an increase in different groups funding open source software. How has this changed the open source community? Where would future funding have the largest impact in the open source landscape? What new thing would you build that would make the lives of developers, researchers, and users easier? How much support is needed and what are the best ways to provide that support? What large scale project doesn\u2019t exist that *needs* to exist? How do you balance funded and volunteer efforts? 
Join this lively discussion to help identify key focus areas for open source funding and resources.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 472, "code": "ETMR7R", "public_name": "Demitri Muna", "biography": null, "answers": []}, {"id": 484, "code": "EUKXBP", "public_name": "Paige Martin", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 5, "date": "2023-07-14", "day_start": "2023-07-14T04:00:00-05:00", "day_end": "2023-07-15T03:59:00-05:00", "rooms": {"Zlotnik Ballroom": [{"id": 281, "guid": "d4ee0251-3a7d-5e02-87ea-68e8bcd938c7", "logo": "", "date": "2023-07-14T09:15:00-05:00", "start": "09:15", "duration": "00:45", "room": "Zlotnik Ballroom", "slug": "2023-281-keynote-responsible-ai-in-practice-how-far-we-ve-come-and-where-we-re-going", "url": "https://cfp.scipy.org/2023/talk/HPFVLT/", "title": "Keynote - Responsible AI in Practice: How far we've come and where we're going", "subtitle": "", "track": "Keynote", "type": "Talk", "language": "en", "abstract": "Dr. Rumman Chowdhury is a trailblazer in the field of applied algorithmic ethics, creating cutting-edge socio-technical solutions for ethical, explainable and transparent AI. She currently runs Parity Consulting, Parity Responsible Innovation Fund, and is a Responsible AI Fellow at the Berkman Klein Center for Internet & Society at Harvard University. She is also a Research Affiliate at the Minderoo Center for Democracy and Technology at Cambridge University and a visiting researcher at the NYU Tandon School of Engineering. Previously, she was the director of the ML Ethics, Transparency, and Accountability team at Twitter identifying and mitigating algorithmic harms on the platform. Before that she was CEO and founder of Parity, an enterprise algorithmic audit platform company. She formerly served as Global Lead for Responsible AI at Accenture Applied Intelligence. 
In her work as Accenture\u2019s Responsible AI lead, she led the design of the Fairness Tool, a first-in-industry algorithmic tool to identify and mitigate bias in AI systems. Dr. Chowdhury has been featured in international media, including the Wall Street Journal, Financial Times, Harvard Business Review, NPR, MIT Sloan Magazine among others. She was named one of BBC\u2019s 100 Women, recognized as one of the Bay Area\u2019s top 40 under 40, and honored to be inducted to the British Royal Society of the Arts (RSA).", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 411, "code": "AATXR9", "public_name": "Dr. Rumman Chowdhury", "biography": "Dr. Rumman Chowdhury currently runs Parity Consulting, Parity Responsible Innovation Fund, and is a Responsible AI Fellow at the Berkman Klein Center for Internet & Society at Harvard University. She is also a Research Affiliate at the Minderoo Center for Democracy and Technology at Cambridge University and a visiting researcher at the NYU Tandon School of Engineering.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 33, "guid": "69f107a2-ac40-527f-86f6-670f42fed1cf", "logo": "", "date": "2023-07-14T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-33-modern-compute-stack-for-scaling-large-ai-ml-workloads", "url": "https://cfp.scipy.org/2023/talk/ALEQSL/", "title": "Modern compute stack for scaling large AI/ML workloads", "subtitle": "", "track": "Machine Learning, Data Science, and Ethics in AI", "type": "Talk", "language": "en", "abstract": "Existing production machine learning systems often suffer from various problems that make them hard to use. 
For example, data scientists and ML practitioners often spend most of their time stitching and managing bespoke distributed systems to build end-to-end ML applications and push models to production.\r\n\r\nTo address this, the Ray community has built Ray AI Runtime (Ray AIR), an open-source toolkit for building large-scale end-to-end ML applications.", "description": "Existing production machine learning systems often suffer from various problems that make them hard to use. For example, data scientists and ML practitioners often spend most of their time stitching and managing bespoke distributed systems to build end-to-end ML applications and push models to production.\r\n\r\nTo address this, the Ray community has built Ray AI Runtime (Ray AIR), an open-source toolkit for building large-scale end-to-end ML applications.\r\n\r\nRay is a distributed compute framework, powering large scale machine learning models such as OpenAI's ChatGPT. By leveraging Ray\u2019s distributed compute strata and library ecosystem, the Ray AI Runtime brings scalability and programmability to ML platforms. 
\r\n\r\nThe main focus of the Ray AI Runtime is to provide the compute layer for Python-based AI/ML workloads and is designed to interoperate with popular ML frameworks and other systems for storage and metadata needs.\r\n\r\nIn this session, we\u2019ll explore and discuss the following:\r\nWhy and what is Ray \r\nHow AIR, built atop Ray, allows you to program and scale your machine learning workloads easily \r\nAIR\u2019s interoperability and easy integration points with other systems for storage and metadata needs\r\nAIR\u2019s cutting-edge features for accelerating the machine learning lifecycle such as data preprocessing, last-mile data ingestion, tuning and training, and serving at scale\r\n\r\nKey takeaways for attendees are:\r\n\r\n* Ray as a general purpose framework for distributed computing \r\n* Understand how Ray AI Runtime can be used to implement scalable, programmable machine learning workflows.\r\n* Learn how to pass and share data across distributed trainers and Ray native libraries: Tune, Serve, Train, RLlib, etc.\r\n* How to scale python-based workloads across supported public clouds", "recording_license": "", "do_not_record": false, "persons": [{"id": 19, "code": "YQ79UZ", "public_name": "Jules S. Damji", "biography": "Jules S. Damji is a lead developer advocate at Anyscale Inc, an MLflow contributor, and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. 
He holds a B.Sc and M.Sc in computer science (from Oregon State University and Cal State, Chico respectively), and an MA in political advocacy and communication (from Johns Hopkins University).", "answers": []}, {"id": 424, "code": "FSNSLD", "public_name": "Amog Kamsetty", "biography": "Amog is a Senior Software Engineer at Anyscale where he works on the Ray open source project building solutions for distributed machine learning workloads including distributed model training and offline inference.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 216, "guid": "1bc0239f-e232-52b7-b229-d2e51189dfaa", "logo": "/media/2023/submissions/Q9KTXS/Screen_Shot_2023-03-01_at_5.56.53_PM_NwW79q4.png", "date": "2023-07-14T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-216-ultra-fast-visualization-of-large-datasets-using-modern-graphics-apis-in-jupyter-notebooks", "url": "https://cfp.scipy.org/2023/talk/Q9KTXS/", "title": "Ultra fast visualization of large datasets using modern graphics APIs in jupyter notebooks", "subtitle": "", "track": "Bioinformatics, Computational Biology & Neuroscience", "type": "Talk", "language": "en", "abstract": "Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present *fastplotlib*, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. *Fastplotlib* is built upon *pygfx* which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as *Vulkan* for fast rendering of objects. *Fastplotlib* is non-blocking, allowing for interactivity with data after plot generation. 
Ultimately, *fastplotlib* is a general purpose scientific plotting library that is useful for the fast and live visualization and analysis of complex datasets.", "description": "Over the past decade, advanced analyses pipelines have been developed for large neuronal datasets [1][2]. However, fast visualization and live interactivity during data collection is largely unsupported. While current tools within the Python plotting ecosystem (ex. *pyqtgraph, VisPy, napari*) allow for interactive data visualization, they either fail to leverage modern GPUs efficiently, lack intuitive APIs for rapid prototyping, or require users to write their own shaders. Additionally, other popular plotting libraries, such as *bokeh* and *matplotlib*, are not geared towards fast interactive visualization with millions of objects. Given these challenges with current visualization tools, the need for a modern GPU-driven interactive plotting library exists. In this presentation, we will go through the technical details, as well as a brief demo on how *fastplotlib* makes fast interactive visualization of complex neuronal datasets possible. We will also demonstrate the broader applicability of *fastplotlib* as a fast, general-purpose plotting library. \r\n\r\n*Fastplotlib* is built on top of *pygfx* which is a cutting edge Python rendering engine that utilizes *Vulkan*, which can efficiently leverage modern GPU and CPU hardware. *Vulkan*, released in 2016, is the successor to *OpenGL* and features a low overhead with respect to the amount of code per-draw-per-object allowing for speed even when rendering millions of objects. *Pygfx* is also non-blocking, which allows for interactivity and modification of already drawn objects. *Fastplotlib* utilizes the *pygfx* rendering library for fast visualization with an expressive API for scientific visualization. 
The benefits of *fastplotlib* are that it reduces boilerplate code which allows users to focus on their data without having to manage the underlying rendering process. Additionally, *fastplotlib* allows for animations as well as high-level interactivity among plots, which can be combined with lazy loading of very large neuronal imaging movies that are hundreds of gigabytes or terabytes in size. Furthermore, *fastplotlib* can be used in jupyter notebooks, allowing it to be used on cloud computing and other remote infrastructures. In total, these unique features and the underlying architecture create a plotting library that is fast, easy to use, and multifaceted.\r\n\r\nInitially, *fastplotlib* was developed for use in the neuroscience community to aid in the analysis of large neuronal datasets. However, the long term goal of this project is to provide an open source software that serves as a general-purpose scientific plotting library. As we are currently in the early stages of development, we are looking for community involvement and to connect with other developers to further progress our software package.\r\n\r\nhttps://github.com/kushalkolar/fastplotlib", "recording_license": "", "do_not_record": false, "persons": [{"id": 148, "code": "VPQY8Y", "public_name": "Caitlin Lewis", "biography": "I am a current undergraduate student at the University of North Carolina at Chapel Hill studying Computer Science and Statistics. 
I currently work for the Hantman Lab in the UNC Neuroscience Center helping to develop tools to aid in the analysis and visualization of large calcium imaging datasets.", "answers": []}, {"id": 262, "code": "FK9UWM", "public_name": "Kushal Kolar", "biography": "https://github.com/kushalkolar", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 234, "guid": "52edb39f-bebd-54c5-b021-e03160d7313f", "logo": "/media/2023/submissions/SXJFBQ/datajoint_talk_gq7K6nn.png", "date": "2023-07-14T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-234-datajoint-bringing-databases-back-into-data-science", "url": "https://cfp.scipy.org/2023/talk/SXJFBQ/", "title": "DataJoint: Bringing databases back into data science", "subtitle": "", "track": "Bioinformatics, Computational Biology & Neuroscience", "type": "Talk", "language": "en", "abstract": "Relational databases manage structured data and facilitate queries in collaborative repositories, but using SQL from a scientific programming language is awkward. DataJoint is an open-source framework for managing scientific data supporting data definition, diagramming, and queries. DataJoint makes computation a native part of its data model, bridging the gap between databases and numerical analysis in automated workflows. We will showcase the elegance of the relational data model and its versatility through neuroscience research examples. We will also introduce the DataJoint SciViz library, enabling scientists to build web apps for data visualization and unlocking further potential for data-driven discovery.", "description": "Research teams work on complex scientific data with many contributors. They execute quickly evolving and complex computational pipelines around such data. This requires a systematic approach to structuring data with clarity and transparency, linking it with distributed computation. 
Relational databases solve many of these problems; they support data integrity and facilitate queries in large, collaborative repositories. However, working with relational databases through SQL from Python can be awkward. As a result, many data scientists have dismissed relational databases and missed out on their great capabilities. Enter DataJoint, an open-source framework designed explicitly for managing scientific data.\r\n\r\nDataJoint uses a relational database system as its backend but utilizes Python programming constructs to define and query the database, similar to object-relational mappers commonly used in web development. It is specifically designed from the ground up for supporting complex data and distributed computations, making it an ideal tool for data scientists.\r\n\r\nOne of the most significant advantages of DataJoint is that it allows you to design complex databases directly from a Jupyter notebook. It provides its own sublanguage for defining database schemas to capture relationships between data elements, including beautiful diagrams for convenient navigation. DataJoint also provides a convenient query language that reduces the complexity of SQL select statements into an algebra of five operators. Data operations are well integrated with other data science tools such as numpy and pandas.\r\n\r\nMost importantly, DataJoint makes computations a first-class citizen in its data model. Computational dependencies are encoded as part of the database design, so the database schema serves to specify the computational data pipeline and workflow.\r\n\r\nDataJoint has been in continuous development and use for about 14 years and is currently used in approximately a hundred research labs. 
A rich collection of standardized workflows, DataJoint Elements, has been in development by the research community.\r\n\r\nIn this talk, we will introduce the basic principles of scientific databases, including how to create a database, how to visualize its structure, how to enter and delete data, and how to define and execute computational dependencies. We will also showcase examples from past and current neuroscience projects. For large-scale computations, DataJoint can be combined with job orchestration tools for scalable computing.\r\n\r\nFurthermore, we will introduce the new DataJoint SciViz library that provides a low-code approach for creating websites for data visualization to show off your work. DataJoint has become a part of the data science tool stack for working with scientific databases, providing the full rigor of relational databases for maintaining data integrity and consistency, especially in dynamic collaborative projects.\r\n\r\nFinally, we will share some glimpses of our future developments and invite diverse teams to contribute and collaborate, making DataJoint an even more powerful tool for managing scientific data. With DataJoint, scientists can bring relational databases into the modern era of data science and streamline their data management and computational workflows.", "recording_license": "", "do_not_record": false, "persons": [{"id": 430, "code": "8N8WYR", "public_name": "Dimitri Yatsenko", "biography": "Dimitri Yatsenko has a PhD in Neuroscience (Baylor College of Medicine) and Masters in Computational Engineering and Science (University of Utah). As CEO at DataJoint, he leads a team of scientists and engineers to develop tools for analyzing and managing neuroscience data for advanced collaborative projects. 
He serves as Principal Investigator on NIH grants to develop open-source software and a cloud platform supporting standardized data pipelines for common types of neuroscience experiments.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 241, "guid": "e008c62f-224f-5660-9b74-cf638a331a8a", "logo": "", "date": "2023-07-14T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Zlotnik Ballroom", "slug": "2023-241-an-api-for-efficient-and-low-latency-access-to-the-largest-standardized-single-cell-data-repository-by-cz-cellxgene-discover-", "url": "https://cfp.scipy.org/2023/talk/CQNJ9Z/", "title": "An API for efficient and low-latency access to the largest standardized single-cell data repository by CZ CELLxGENE Discover.", "subtitle": "", "track": "Bioinformatics, Computational Biology & Neuroscience", "type": "Talk", "language": "en", "abstract": "CZ CELLxGENE Discover has released all of its human and mouse single-cell data through a new API that allows for efficient and low-latency querying. The data is fully standardized and hosted publicly, and it consists of a count matrix of 50 million cells (observations) by >60 thousand genes (features), accompanied by cell and gene metadata. While these data are built from more than 700 datasets, the API enables convenient cell- and gene-based filtering to obtain any slice of interest in a matter of seconds. All data can be quickly transformed to numpy, pandas, anndata or Seurat objects.", "description": "As a part of the CZ CELLxGENE Discover suite (cellxgene.cziscience.com) we have deployed Python and R APIs to query the largest aggregation of single-cell data, covering 50 million cells across >60 thousand genes from the major human and mouse tissues. \r\n\r\nThe data comprises more than 700 individual datasets represented as a single gene expression count matrix along with metadata data frames, where all cells have harmonized annotations across 11 variables (e.g. 
cell type, tissue, sequencing technology, donor id, etc.) and all gene IDs and labels have been standardized on GENCODE references (https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md). The APIs are able to perform efficient cell-based queries across all cells regardless of the dataset of origin. \r\n\r\nThe concatenated data presents a unique opportunity to apply machine learning on single-cell gene expression at an unprecedented scale for biological discoveries. More importantly, the data and APIs are built around a recently developed technology, TileDB-SOMA, which allows for cloud-optimized storage and access, low-latency access for larger-than-memory slices of data, querying and filtering under lazy evaluation, and conversion to pandas, pyarrow, anndata and Seurat. \r\n\r\nThe APIs are free to use (https://pypi.org/project/cell-census/) and the data is hosted publicly online, which allows users to fetch slices of data in fewer than 10 lines of code and in under 2 minutes. Our main objective is to accelerate biological discoveries by providing ready-to-use standardized gene expression data from 50 million human and mouse cells in an interoperable manner. We are eager to provide the support necessary to enable researchers to effectively use the data and APIs.", "recording_license": "", "do_not_record": false, "persons": [{"id": 223, "code": "3NVEEG", "public_name": "Pablo Garcia-Nieto", "biography": "Computational biologist at CZI focusing on providing access to all single-cell data hosted on CZ CELLxGENE (https://cellxgene.cziscience.com/). Ph.D. 
in cellular and molecular biology from Stanford University, and a BSc in genomics from the Autonomous National University of Mexico", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 297, "guid": "1aa9d321-9953-5583-b8c9-f2b356876ef9", "logo": "", "date": "2023-07-14T15:30:00-05:00", "start": "15:30", "duration": "01:00", "room": "Zlotnik Ballroom", "slug": "2023-297-lightning-talks", "url": "https://cfp.scipy.org/2023/talk/RTA7JG/", "title": "Lightning Talks", "subtitle": "", "track": "Lightning Talks", "type": "Talk", "language": "en", "abstract": "Lightning talks are 5-minute talks on any topic of interest for the SciPy community. We encourage spontaneous and prepared talks from everyone, but we can\u2019t guarantee spots. Sign ups are at the NumFOCUS booth during the conference.", "description": "", "recording_license": "", "do_not_record": false, "persons": [], "links": [], "attachments": [], "answers": []}], "Amphitheater 204": [{"id": 232, "guid": "7111231d-e0d4-533a-b430-5fe4f67de9eb", "logo": "", "date": "2023-07-14T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-232-new-cuda-toolkit-packages-for-conda", "url": "https://cfp.scipy.org/2023/talk/DQR9NU/", "title": "New CUDA Toolkit packages for Conda", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "In this talk, we will examine the new CUDA package layout for Conda (as included in conda-forge). We will show how CUDA components have been broken out and share how this affects development and package building. We will walk through changes in the conda-forge infrastructure made to incorporate these new packages, and examine recipes using the new packages and what was needed to update them. 
Additionally, we will provide guidance on how to use these new packages in recipes or in library development.", "description": "Based on feedback from package maintainers and end users, we\u2019ve extended and restructured the CUDA Toolkit packages in conda-forge. We\u2019ve added new packages for CUDA components that were requested. We\u2019ve also split the CUDA Toolkit packages more finely by CUDA component to provide package maintainers and end users a light-weight, precise method for including and stating CUDA dependencies.\r\n\r\nIn addition to the CUDA redistributable libraries already available, we have included compilers, debuggers, profilers, etc., thus providing users of the conda-forge channel with a full development suite that they can use in their own projects. These additions also greatly simplify the build infrastructure in conda-forge. Finally, more libraries are included, which will allow package maintainers to enable additional features in recipe builds.\r\n\r\nSimilarly, packages have become more granular. Each component of the CUDA toolkit is separated out. Components are further split into packages used at build time and at run time. Maintainers of packages can now select which components they depend on for a build and only depend on the needed shared library at runtime. In terms of the package ecosystem, this makes CUDA component usage legible in downstream recipes and packages, which can make updates more targeted and easier to manage. For end users, all of this means quicker downloads, more compact installs, and a smoother upgrade path.\r\n\r\nTo aid package maintainers and users in leveraging this new functionality, we will share the overall package structure and how this is integrated into conda-forge. We will also share examples from recipes on how these CUDA packages can be used. 
Similarly we will show how these packages can be integrated into development workflows.", "recording_license": "", "do_not_record": false, "persons": [{"id": 258, "code": "7VPJ93", "public_name": "John Kirkham", "biography": "Got my B.S. & M.S. in Physics. After graduating went to work at Howard Hughes Medical Institute for 5 years working on image processing problems particularly in neuroscience. Got more involved in open source during that work with particular interest in packaging, storage, and distributed array processing. Then joined the NVIDIA RAPIDS team where there has been good overlap with these past interests as well as new ones.", "answers": []}, {"id": 407, "code": "FXTKPC", "public_name": "Thomson Comer", "biography": "Thomson Comer has been writing GPU-accelerated libraries at NVIDIA since 2018. He contributes to RAPIDS cuDF, cuSpatial, and node-rapids, and collaborates with customers and curious developers about best practices for GPU acceleration. He earned an M.S. in computer science in 2009 with a concentration in machine learning, computer vision, and graphics. Before NVIDIA, Thomson worked for a decade at the startup accelerator and consulting firm Cardinal Peak.", "answers": []}, {"id": 491, "code": "QEAKZ7", "public_name": "Rick Ratzel", "biography": "Rick Ratzel is a technical lead for RAPIDS cuGraph - a library of GPU-accelerated graph algorithms. Rick joined NVIDIA in January 2019, bringing several years of experience as a technical lead for teams in industries that include test and measurement, electronic design automation, and scientific computing. 
Rick\u2019s focus for cuGraph, and throughout his career, has been on software architecture and API usability.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 4, "guid": "1d8a3a06-cb97-5c51-83e6-9b5d074cc566", "logo": "/media/2023/submissions/T7DTX8/Slides.001_BaSD3Sp.png", "date": "2023-07-14T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-4-python-array-api-standard-toward-array-interoperability-in-the-scientific-python-ecosystem", "url": "https://cfp.scipy.org/2023/talk/T7DTX8/", "title": "Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem", "subtitle": "", "track": "General Track", "type": "Talk", "language": "en", "abstract": "The array API standard (https://data-apis.org/array-api/) is a common specification for Python array libraries, such as NumPy, PyTorch, CuPy, Dask, and JAX. \r\n\r\nThis standard will make it straightforward for array-consuming libraries, like scikit-learn and SciPy, to write code that uniformly supports all of these libraries. 
This will allow, for instance, running the same code on the CPU and GPU.\r\n\r\nThis talk will cover the scope of the array API standard, supporting tooling which includes a library-independent test suite and compatibility layer, what work has been completed so far, and the plans going forward.", "description": "This talk will have the following outline:\r\n\r\n* A motivating example, adding array API standard usage to a real-world scientific data analysis script so it runs with CuPy and PyTorch in addition to NumPy.\r\n* History of the Data APIs Consortium and array API specification.\r\n* The scope and general design principles of the specification.\r\n* Current status of implementations:\r\n    * Two versions of the standard have been released, 2021.12 and 2022.12.\r\n    * The standard includes all important core array functionality and extensions for linear algebra and Fast Fourier Transforms.\r\n    * NumPy and CuPy have complete reference implementations in submodules (numpy.array_api). \r\n    * NumPy, CuPy, and PyTorch have near full compliance and have plans to approach full compliance\r\n    * array-api-compat is a wrapper library designed to be vendored by consuming libraries like scikit-learn that makes NumPy, CuPy, and PyTorch use a uniform API.\r\n    * The array-api-tests package is a rigorous and complete test suite for testing against the array API and can be used to determine where an array API library follows the specification and where it doesn\u2019t.\r\n* Future work\r\n    * Add full compliance to NumPy, as part of NumPy 2.0.\r\n    * Focus on improving adoption by consuming libraries, such as SciPy and scikit-learn.\r\n    * Reporting website that lists array API compliance by library. \r\n    * Work is being done to create a similar standard for dataframe libraries. 
This work has already produced a common dataframe interchange API.", "recording_license": "", "do_not_record": false, "persons": [{"id": 13, "code": "C93A7K", "public_name": "Aaron Meurer", "biography": "Aaron Meurer is a software engineer at Quansight, where he works on important projects affecting the scientific Python ecosystem including the array API standard, NumPy, and PyTorch. He is also a core maintainer of the SymPy symbolic mathematics library.", "answers": []}, {"id": 419, "code": "3UJMLP", "public_name": "Tyler Reddy", "biography": "Staff Scientist at LANL", "answers": []}, {"id": 420, "code": "HATB9G", "public_name": "Stephan Hoyer", "biography": "stephanhoyer.com", "answers": []}, {"id": 421, "code": "XLYN3Z", "public_name": "Leo Fang", "biography": null, "answers": []}, {"id": 150, "code": "87Y88C", "public_name": "Stephannie Jimenez Gacha", "biography": "I've been working in open source since 2019 as part of multiple projects involving scientific computing and IDE development. The last two years a lot of my work has been focused on providing a better UI/UX of multiple applications. I've given multiple talks about different topics, the two most recent are available in the following links:\r\n\r\n- PyData/Pycon Berlin 2022: https://www.youtube.com/watch?v=__EkpdeVGY4\r\n- Scipy Latam 2021: https://youtu.be/ZNVp1E0QADU?t=11847", "answers": []}, {"id": 425, "code": "EESUZB", "public_name": "Matthew Barber", "biography": "Software engineer @ Quansight working on Data APIs. Hypothesis maintainer.", "answers": []}, {"id": 427, "code": "7A9QJW", "public_name": "Ralf Gommers", "biography": "Ralf has been deeply involved in the SciPy and PyData communities for over a decade. He is a maintainer of NumPy, SciPy and data-apis.org, and has contributed widely throughout the SciPy ecosystem. 
Ralf is currently the SciPy Steering Council Chair, and he served on the NumFOCUS Board of Directors from 2012-2018.\r\n\r\nRalf co-directs Quansight Labs, which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around data science and scientific computing projects. Previously Ralf has worked in industrial R&D, on topics as diverse as MRI, lithography and forestry.", "answers": []}, {"id": 431, "code": "MUBWXG", "public_name": "Athan Reines", "biography": null, "answers": []}, {"id": 432, "code": "TJRFQP", "public_name": "Mario", "biography": null, "answers": []}, {"id": 42, "code": "VQNJX7", "public_name": "Thomas J. Fan", "biography": "Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.", "answers": []}, {"id": 437, "code": "TW8UF3", "public_name": "Andreas Mueller", "biography": "Andreas M\u00fcller is a Principal Research SDE at Microsoft, where he works on the interface of the Data Science ecosystem and cloud infrastructure.\r\nHe previously held positions as Associate Research Scientist at the Columbia Data Science Institute and as a Research Engineer at the NYU Center for Data Science.\r\nHe is one of the core developers of the scikit-learn machine learning library, a member of the scikit-learn technical committee, and the author of the book \"Introduction to machine learning with Python\". 
\r\nHis work focuses on practical aspects of machine learning and the development of user-centric machine learning software.", "answers": []}, {"id": 333, "code": "BKKCSF", "public_name": "Saul shanabrook", "biography": "Working on e-graphs in Python currently. Interested in cross library collaboration in the Python data science ecosystem.", "answers": []}, {"id": 446, "code": "EYSVJF", "public_name": "Alexandre Passos", "biography": "Currently working at openai, previously at google.", "answers": []}, {"id": 447, "code": "R3JYQZ", "public_name": "Travis E Oliphant", "biography": "Travis is a long-time participant in the SciPy ecosystem.", "answers": []}, {"id": 258, "code": "7VPJ93", "public_name": "John Kirkham", "biography": "Got my B.S. & M.S. in Physics. After graduating went to work at Howard Hughes Medical Institute for 5 years working on image processing problems particularly in neuroscience. Got more involved in open source during that work with particular interest in packaging, storage, and distributed array processing. Then joined the NVIDIA RAPIDS team where there has been good overlap with these past interests as well as new ones.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 155, "guid": "9e89b0fb-c7e8-5cd5-81a8-ac9090bc521f", "logo": "", "date": "2023-07-14T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-155-what-happens-when-the-main-maintainer-of-a-project-takes-a-step-down-", "url": "https://cfp.scipy.org/2023/talk/A7EZZV/", "title": "What happens when the main maintainer of a project takes a step down?", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "Once a maintainer decides to step down from a project, the community needs to quickly adapt to this decision. This situation can be devastating for small projects and lead to their extinction. 
This talk demonstrates, based on the case of poliastro, that the community is a key factor in a software project's survival, no matter who is leading it.", "description": "Free and open source software is made by the community, for the community, of the community. The community is made up of amazing people who are human beings. The Python community would not be what it is without people.\r\n\r\nSome of these people are maintainers of projects. They devote a significant amount of their time to guaranteeing the health of a project, reviewing new contributions, answering questions... The community recognizes their effort and usually revolves around them.\r\n\r\nHowever, what happens when a maintainer steps down from a project? How does the community react to this situation? What about tiny projects?\r\n\r\nThis talk presents some key concepts for building a healthy community around a project to guarantee its survival over time. These key concepts include not only good coding practices like documentation but also the creation of community meetings for everyone, promotion of software, financial support, and tons of passion, among others.\r\n\r\nAs an example, the case of \"poliastro\" is used.", "recording_license": "", "do_not_record": false, "persons": [{"id": 175, "code": "NQ7ERE", "public_name": "Jorge Martinez", "biography": "`Aerospace engineer` and `software developer` interested in `computational astrodynamics`. 
In his free time, Jorge maintains various `open source scientific projects` including `poliastro` and the whole `PyAnsys` ecosystem.\r\n\r\n**Github account:** [jorgepiloto](https://github.com/jorgepiloto)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 209, "guid": "1fde533e-1207-53fa-bf59-a95f4b801eb3", "logo": "", "date": "2023-07-14T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-209-better-open-source-homes-and-gardens-with-project-pythia", "url": "https://cfp.scipy.org/2023/talk/EDZ9YB/", "title": "Better (Open Source) Homes and Gardens with Project Pythia", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "As scientists continue to embrace the Jupyter ecosystem for constructing computational narratives of their science through code, data, and rich text, they may encounter technical and community barriers to maintaining and sharing their science with new and existing audiences. We demonstrate the value of open-source science community building and getting there through reliance on the open-source Jupyter ecosystem, pre-packaged GitHub and BinderHub-based infrastructure, and documentation for creating, sharing, testing, and maintaining Pythia Cookbooks for their computational narratives.", "description": "A \u201ccommunity garden\u201d metaphor is particularly apt for a free- and  open-source software project and community. Enthusiasm, creativity, and openness work both for the SciPy conference and Albany NY\u2019s Tulip Festival. But a \u201cgarden\u201d, be it botanical or cyber, requires nurturing. With regard to free- and open-source software, there are bounteous examples. Pull requests (PRs) are sown and merged; Issues are resolved, and bugs are removed. Yet we also see signs of formerly fruitful repositories that have been left to languish. 
Issues proliferate like weeds; bugs roam freely, and eventually the repos\u2019 stars fade away. It is incumbent on the SciPy community to ensure that the projects we are invested in take the more fruitful path. \r\nOne such open source \u201cgreenspace\u201d is Project Pythia (hereafter Pythia). Now in its 3rd year, Pythia extends Pangeo by providing an educational and training hub for the geoscientific Python community. It has three key components:\r\n1. Foundations: The core geoscientific Python stack (JupyterBook)\r\n2. Cookbooks: Advanced and domain-specific workflows (JupyterBooks)\r\n3. Resource Gallery of externally-hosted geoscientific Python resources\r\n\r\nHere we discuss Pythia\u2019s infrastructure, which sustains the above components in a year-round \u201ccommunity garden\u201d.\r\n\r\nPythia\u2019s content is built upon an open stack of infrastructure for reproducibility and collaboration that provides for the care and nurturing of the community it serves. We have built a cloud-based publishing system upon Jupyter Book that automates notebook execution in a reproducible, curated environment. Users can interact with notebooks via Binder links, launching directly into an identical environment. The platform provides automated code- and link-checking, ensuring a rapid healing cycle. Collaboration is achieved through PRs that trigger the same execution infrastructure and a rich preview.\r\n\r\nOur infrastructure relies on GitHub, which encourages open development via PRs. Pythia uses this process extensively for building and maintaining its \u201cgarden\u201d, for the core team and community contributions. GitHub\u2019s focus on collaboration provides users a sense of ownership of whatever \u201cgarden\u201d they choose to visit, and provides a path for others to visit and contribute.\r\n\r\nGitHub\u2019s Actions power Pythia\u2019s automation of key steps in the notebook execution/publishing process. 
We periodically re-run the publication workflow as health checks for on-going maintenance of the materials, as well as for new \u201cplantings\u201d via PRs. Pythia\u2019s web portal displays the updated content, which users can download to try out and build on in their own \u201cbackyard gardens\u201d\u2013computing environments.\r\n\r\nA garden may need more powerful tools. While GitHub Actions may often suffice, real-world scientific workflows have compute and data requirements that exceed GitHub\u2019s free resources. Pythia\u2019s notebooks can also be executed on our dedicated cloud using BinderHub, which provides a way to execute notebooks within custom environments. Pythia\u2019s workflows are able to validate and deploy results directly from execution on its BinderHub. The same BinderHub instance powers interactive user sessions, guaranteeing that users execute code in the same environment in which the rendered web pages were built.", "recording_license": "", "do_not_record": false, "persons": [{"id": 229, "code": "QQB7RF", "public_name": "Kevin Tyle", "biography": "Mr. Tyle is the Manager of Departmental Computing for the Department of Atmospheric and Environmental Sciences at the University at Albany, at which he received his M.S. in Atmospheric Science in 1995. He also has a B.A. in Psychology, with emphases on Neuroscience and Cognitive Science, from the University of Rochester. His main interest is promoting the use of free- and open-source software packages, mostly using Python, for the analysis, visualization and sharing of geoscientific datasets.", "answers": []}, {"id": 232, "code": "KH9CEJ", "public_name": "Drew Camron", "biography": "Scientific Python dev and educator @ UCAR/Unidata. 
MetPy, Siphon, Project Pythia.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 121, "guid": "096e9c17-563b-5a85-a981-9070c17af002", "logo": "/media/2023/submissions/9JTLCF/community-talk-banner_Wf0ag8L.png", "date": "2023-07-14T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Amphitheater 204", "slug": "2023-121-community-first-open-source-an-action-plan-", "url": "https://cfp.scipy.org/2023/talk/9JTLCF/", "title": "Community-first open source: An action plan!", "subtitle": "", "track": "Tending Your Open Source Garden: Maintenance and Community", "type": "Talk", "language": "en", "abstract": "Communities are at the heart of open source software and are fundamental to our projects\u2019 long-term success. The Python ecosystem has several mature projects, that have spent years working on community initiatives. Newer projects can learn from their experiences and build stronger foundations to foster healthy communities.\r\n\r\nIn this talk, we share a set of practices for community-first projects, including repository management, contributor pathways, and governance principles. We\u2019ll also share real examples from our own journey transitioning a company-backed OSS project, Nebari (https://nebari.dev/), to be more community-oriented.", "description": "Open source communities come in a lot of different flavors and have many different ways of operating. However, there is a common thread of promoting kindness in communication, improving the contributor and user experience, and working to make the project more inclusive, accessible, and sustainable.\r\n\r\nWe, the presenters, recently worked to transition a company-backed open source project, Nebari (https://nebari.dev/), to be more community-oriented in its development, maintenance, and governance. We focused on creating a community-first foundation that builds on years of learnings from other leading communities, including Jupyter, NumPy, Gatsby JS, and more. 
In this talk, we want to share our journey and the things we learned along the way.\r\n\r\nWe aim to provide a step-by-step guide for open source projects looking to adopt more community-driven practices. We will discuss everything from repository management, and contributor and maintainer pathways, to documentation and governance principles. This talk will be most helpful for projects in their formative stages and projects transitioning from company-backed models, however we feel everyone can learn something new to implement in their communities.", "recording_license": "", "do_not_record": false, "persons": [{"id": 14, "code": "QGMGFB", "public_name": "Pavithra Eswaramoorthy", "biography": "Pavithra is a Developer Advocate at Quansight, where she works to support the PyData community. She also contributes to the Bokeh and Dask projects; and has helped administrate Wikimedia\u2019s outreach programs in the past. In her spare time, she enjoys a good book and hot coffee. :)", "answers": []}, {"id": 180, "code": "EKHUEY", "public_name": "Dharhas Pothina", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Grand Salon C": [{"id": 164, "guid": "d60bd347-0164-58db-8893-cedc76e4040f", "logo": "", "date": "2023-07-14T10:45:00-05:00", "start": "10:45", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-164-small-town-police-accountability-a-data-science-toolkit", "url": "https://cfp.scipy.org/2023/talk/AXPZZG/", "title": "Small Town Police Accountability: A Data Science Toolkit", "subtitle": "", "track": "Social Science and the Digital Humanities", "type": "Talk", "language": "en", "abstract": "In this talk we will share a Python library to obtain and analyze policing data, that was developed in conjunction with community activists, data scientists, social scientists and the Small Town Police Accountability (SToPA) Research Lab.  
We will showcase components of the SToPA library which use Python tools such as web drivers, optical character recognition, geospatial mapping, machine learning and statistical sampling to better understand the policing landscape.  The goal of this work is to present an easily replicable framework for analyzing police and community interactions with accessible on-ramps for activists, developers and researchers.", "description": "Recent years have highlighted the urgent need for transparency and accountability within police departments across the United States. Typically, large cities have access to policing data and the resources to analyze and interrogate such data to hold authority accountable. Small towns face the same injustices at the hands of police, but these issues receive comparatively little attention, in part due to a lack of resources and tools to investigate the data. Additional challenges arise in the clarity and consistency of the data that may be available. Consequently, the public are generally unable to take data-informed action toward social justice in these regions. The overarching goal of the Small Town Police Accountability (SToPA) Research Lab is to create an adaptable tool that enables small-town residents to analyze police actions to increase transparency and accountability. \r\n\r\nThis talk will introduce the interdisciplinary work of the research group to (1) obtain data through digital portals and records requests, (2) create a flexible, scaffolded software toolkit for organizing and analyzing police data for users with various levels of technical expertise and (3) use data-driven modeling tools to uncover potential patterns and anomalies in select small town data, serving as a template for investigations elsewhere. 
The SToPA toolkit consists of a range of components including instructions for data gathering; adaptable tools for reading, cleaning, and organizing data; and machine learning applications to analyze and understand patterns in policing. \r\n\r\nUsing case studies of a handful of small towns, the SToPA toolkit provides a broadly applicable methodology for reading and parsing police data.  Where data is available online in a somewhat structured format, the SToPA library offers tools for web crawling and scraping. In other cases, data is only available as a printed physical copy, necessitating digitization, text identification using tools such as PyTesseract, word-level data cleaning, and testing for accuracy.  This pipeline includes the use of user-defined, non-standard language dictionaries (such as a list of town-specific locations), geometric methods for word location detection, regular expressions, and fuzzy string matching.     \r\n\r\nAfter data is collected, cleaned, and structured, a second thrust of the SToPA lab is to analyze police interactions with machine learning and statistical tools. A diverse set of policing data, including dates, locations, names, and free text narratives, yields rich opportunity for exploratory analysis and modeling. Explorable maps were created with various mapping and plotting libraries, revealing location-based patterns. Town-specific data from the US Census allows for demographic comparisons between how citizens are distributed vs. how they are policed.  This analysis is further refined using statistical sampling and inference tools such as scikit-learn and PyEI. 
Narrative text data, unstructured language across thousands of reports, was also analyzed with natural language processing techniques such as topic modeling.\r\n\r\n This talk aims to be accessible to a diverse audience and to empower and inspire others to contribute to the growing SToPA repository:  https://qsideinstitute.github.io/SToPA/", "recording_license": "", "do_not_record": false, "persons": [{"id": 195, "code": "U7LFSW", "public_name": "Ariana Mendible", "biography": "Ariana Mendible is an assistant professor at Seattle University, where she teaches and uses data science to approach social justice research problems.", "answers": []}, {"id": 201, "code": "GLD3MM", "public_name": "Anna Haensch", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 198, "guid": "6a1a3a31-4040-5b75-ae1b-de90a80b6a33", "logo": "", "date": "2023-07-14T11:25:00-05:00", "start": "11:25", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-198-using-linear-tracking-data-to-estimate-backcountry-recreation-popularity", "url": "https://cfp.scipy.org/2023/talk/LMMPRP/", "title": "Using Linear Tracking Data to Estimate Backcountry Recreation Popularity", "subtitle": "", "track": "Social Science and the Digital Humanities", "type": "Talk", "language": "en", "abstract": "Geolocated data from smartphone apps are well-established resources for research. While most of that data come as points (e.g., geotagged photos), there are a growing number of apps that collect linear data from users' activities (e.g., running, hiking, off-road driving). Using established ecological methods, shallow-machine learning packages, and multiprocessing we demonstrate a novel approach using mobile app data to estimate back-country recreation popularity at multiple scales. 
The topics covered include normalizing and thinning coordinate data, merging linear data from multiple sources, and accounting for spatial bias while preserving the integrity of the original data.", "description": "Official sources (typically governments) provide the cleanest and most trustworthy data. Decades of established standards and years of archived records provide a framework for reliable data collection making it a strong foundation for research. The drawback to official centralized sources is that they often focus on the macro level, and because of this, the data tends to be lower resolution, leaving broad areas of obscurity at a micro level.\r\n\r\nWith a need to geospatially estimate and represent backcountry recreation habits on a statewide level down to a square-mile grid, our team needed high-resolution datasets.\r\n\r\nSocial media data is high-resolution, dense and valuable, which often leads companies to limit access to their data. App downloads and active user counts fluctuate with the market and long-term utility of an app's data is not guaranteed to last. Despite these limitations, social media offers significantly higher resolution data than official sources. We will discuss the methods we developed to overcome the unique challenges of processing and standardizing social media data so it complements and informs official datasets.\r\n\r\nWe will cover acquiring data from multiple apps using modular methods that can be applied to new apps as older ones become obsolete. It is rare for apps to offer identical metrics, so we developed a flexible approach that can translate different metrics into a standardized form. It is also important that a model addresses the inherent unknowns that lie beyond the app's userbase.\r\n\r\nOur specific use case uses linear geolocation data gathered from mobile tracking apps. Data comes in the form of GeoJSON coordinates, Google Earth polylines, and shapefiles. 
We will discuss the specific packages used to read each and store them in a common format. \r\n\r\nLinear data brings with it unique challenges relative to point and polygon data. We will describe how we used Python to redistribute points along line segments while maintaining a minimum distance between segment vertices as a first step towards standardizing the linear data. We then needed to \"thin\" the data to minimize spatial bias caused by overlapping line segments and circuitous routes; this will include our reasoning for not averaging or interpolating new data points between each dataset, and how instead we used a method that preserved the integrity of original geolocation data.\r\n\r\nWe will explain the ways multiprocessing and nonlinear data structures were used to process large numbers of vertices when we aggregated all datasets together and ran the thinning algorithm on the combined points; the goal of which was to create an overall presence dataset that represents recreation across the state with minimal spatial bias.\r\n\r\nWe will also review the resulting data structure: coordinate pairs, aggregated metrics and IDs that point back to rows from the original datasets gathered from each app.\r\n\r\nThe presentation will finish with a summary and conclusion on how the resulting presence data was processed using the MaxEnt ecological model to inform and supplement official data sources and provide the state of Arizona with a clearer picture of recreation.", "recording_license": "", "do_not_record": false, "persons": [{"id": 230, "code": "YCYD3P", "public_name": "Vincent Sutherland", "biography": "Vince is a Data science grad student with a background in biology. He lives in a small mountain town where he works on a research team involved in estimating and quantifying the risks abandoned hard-rock mines pose to the population of Arizona.", "answers": []}, {"id": 235, "code": "RAJ8JE", "public_name": "David C. 
Folch", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 161, "guid": "f5e954ee-b1d7-56e4-a8d9-b211350399c8", "logo": "/media/2023/submissions/BDV3EE/pthviz_p1gFqaY.png", "date": "2023-07-14T13:15:00-05:00", "start": "13:15", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-161-allegro-and-flare-fast-and-accurate-machine-learning-potentials-for-extreme-scale-simulations", "url": "https://cfp.scipy.org/2023/talk/BDV3EE/", "title": "Allegro and FLARE: Fast and accurate machine learning potentials for extreme-scale simulations", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "Allegro and FLARE are two very different packages for constructing machine learning potentials that are fast, accurate, and suitable for extreme-scale molecular dynamics simulations. Allegro uses PyTorch for efficient equivariant potentials with state-of-the-art accuracy, while FLARE is a sparse Gaussian process potential with an optimized C++ training backend leveraging Kokkos, OpenMP, and MPI for state-of-the-art performance, and a user-friendly Python frontend. We will compare and contrast the two methods, discuss lessons learned, and show spectacular scientific applications.", "description": "Molecular dynamics is a common method for studying molecules and materials at the atomistic level, in which the dynamics of atoms are simulated directly using Newton\u2019s equations of motion. This requires a model for the forces between the atoms, often referred to as a potential. Traditionally, there have been two approaches to computing the interatomic forces. First, there are empirical potentials, which are based on simple, physically motivated functional forms with a few parameters that are fit to match experimental measurements of material properties. These models are fast, but they have limited accuracy and are hard to transfer between applications. 
The alternative is quantum mechanical methods, which are highly accurate. In return, they are computationally expensive and have limited scalability.\r\n\r\nIn recent years, machine learning potentials (MLPs) have emerged as a compromise in terms of accuracy and computational efficiency. The idea is to generate a small amount of training data with a quantum mechanical method. The MLP learns to reproduce its forces and energies and can be used for large and long-timescale molecular dynamics simulations with an accuracy approaching that of the quantum mechanical method.\r\n\r\nAllegro and FLARE are two drastically different MLPs. FLARE approximates the energy of an atom as a sparse Gaussian process (SGP) as a function of the atom\u2019s local environment. The environment is encoded in a rotationally invariant vector with high descriptive power. By using an invariant descriptor, FLARE correctly respects the symmetry of the problem. Allegro, on the other hand, exploits the symmetry of the problem by using an equivariant neural network, i.e., a neural network where tensor product layers force the features to systematically transform with the input. While more computationally demanding, the added symmetry information allows Allegro and other equivariant models to be significantly more accurate and data-efficient than traditional models.\r\n\r\nFor extreme-scale simulations, scalability and performance are of utmost importance. Through its model design of avoiding message passing, Allegro is the only scalable equivariant neural network potential, with excellent performance demonstrated up to 100 million atoms. FLARE, being a simpler model, takes this to the extreme and has achieved record scalability and performance, simulating 0.5 trillion atoms on 27,336 NVIDIA V100 GPUs.\r\n\r\nOn the implementation side, Allegro and FLARE are also very different. 
Allegro is implemented in Python with PyTorch, which allows for a high-level implementation with excellent GPU performance through the JIT compiler. FLARE has a low-level training backend written in C++ with OpenMP, MPI, and Kokkos. The C++ code is conveniently wrapped for Python use with pybind11.\r\n\r\nIn this talk, we will compare and contrast these two methods, discuss lessons learned, and show spectacular scientific applications.\r\n\r\nLinks:\r\nAllegro repository: https://github.com/mir-group/allegro\r\nAllegro paper: https://www.nature.com/articles/s41467-023-36329-y\r\nFLARE repository: https://github.com/mir-group/flare\r\nFLARE LAMMPS active learning tutorial: https://bit.ly/flarelmpotf\r\nPreprint on FLARE scalability: https://arxiv.org/abs/2204.12573", "recording_license": "", "do_not_record": false, "persons": [{"id": 176, "code": "TEKNGX", "public_name": "Anders  Johansson", "biography": "I am a PhD student in Applied Physics in the group of Boris Kozinsky at Harvard SEAS. 
My focus is on machine learning interatomic potentials for molecular dynamics simulations, in particular on how to make them fast on modern hardware architecture and large supercomputers.\r\n\r\nGitHub: @anjohan", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 197, "guid": "4c1c04f3-116e-5019-b6ae-76de77949e1c", "logo": "", "date": "2023-07-14T13:55:00-05:00", "start": "13:55", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-197-a-graph-neural-network-based-model-for-rapid-prediction-of-thermal-transport-in-metal-organic-frameworks", "url": "https://cfp.scipy.org/2023/talk/EYHNUV/", "title": "A Graph-Neural Network-Based model for rapid prediction of Thermal Transport in Metal-Organic Frameworks", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "Metal-Organic Frameworks (MOFs) have vast potential for gas adsorption, but their practical use hinges on their ability to dissipate thermal energy generated during adsorption. Here, we performed the first high-throughput screening of thermal conductivity in over 10,000 MOFs using molecular dynamics simulations. Next, we developed a graph neural network (GNN) based model to swiftly predict the diagonal components of the thermal conductivity tensor for accelerated materials discovery. Attendees will gain insights into how GNNs can be trained to predict material tensor properties, benefiting both the materials science and machine learning communities.", "description": "Metal-organic frameworks (MOFs) are a promising class of porous materials that have potential applications in various areas, including gas storage and separations. However, effective thermal energy management in MOFs is critical to enhancing their performance in these applications. 
Unfortunately, there is still a lack of understanding regarding the structure-property relationships that govern thermal transport in MOFs.\r\n\r\nIn order to provide a data-driven perspective on these relationships, a large-scale computational screening study was conducted to investigate the thermal conductivity of MOFs. This study utilized classical molecular dynamics simulations to calculate the thermal conductivities of 10,194 hypothetical MOFs generated using the Topology-Based Crystal Constructor (ToBaCCo) code developed in Python. These MOFs comprised 1,015 different topologies, along with 40 types of organic edge building blocks and 38 inorganic and organic nodular building blocks.\r\n\r\nThe study discovered that high thermal conductivity in MOFs is favored by high densities, small pores (<10 \u00c5), and four-connected metal nodes. Moreover, it identified 36 MOFs with ultra-low thermal conductivity (<0.02 W/mK) primarily due to their extremely large pores (~65 \u00c5). Additionally, the study uncovered six hypothetical MOFs with exceptionally high thermal conductivity (>10 W/mK).\r\n\r\nTo handle a large number of MOFs screened, an algorithm was developed to adaptively determine the appropriate plateaued interval of the thermal conductivity vs. correlation time curve based on a set of criteria. The search strategy utilized for finding the optimal plateaued interval involved iteratively performing linear fitting to data segments of 2 ps in length at 1 ps increments if the data was between 0 and 10 ps, and segments of 10 ps length at 5 ps increments if the data was beyond 10 ps. 
The normalized slopes and normalized average oscillation amplitudes were then calculated with respect to the average thermal conductivity for each of those data segments.\r\n\r\nUsing the 10,194 MOF-thermal conductivity data, a range of state-of-the-art graph neural network-based models, including CGCNN, iCGCNN, MEGNet, DimeNet++, ALIGNN, and others, were trained for the rapid prediction of thermal conductivity in MOFs. Finally, the model that demonstrated the best performance on the test data was applied to screen the Computation-Ready, Experimental (CoRE) MOF database, resulting in the identification of experimentally viable MOF structures with potentially exceptional thermal transport properties.\r\n\r\nThis talk will discuss the ToBaCCo hypothetical MOF crystal generation algorithm, various state-of-the-art GNN architectures, and their implementation in PyTorch. This presentation will be of interest to the wider material science community, particularly those with a passion for deep learning models. The findings of this study have the potential to enhance our understanding of thermal transport in MOFs, paving the way for the development of more efficient MOFs for gas storage and separation applications.", "recording_license": "", "do_not_record": false, "persons": [{"id": 227, "code": "DPKLBS", "public_name": "Meiirbek Islamov", "biography": "Currently, I am pursuing a Ph.D. degree in Chemical Engineering at the University of Pittsburgh with an expected completion date of January 2024. My current research at Pitt focuses on understanding nanoscale thermal transport physics in Metal-Organic Frameworks (MOFs), a class of porous materials, which have been heralded as revolutionary materials for gas adsorption applications. 
In my research, I use high-performance computing, deep learning, and computational materials science/chemistry techniques.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 56, "guid": "f4f054fd-39a0-5b1b-9f91-1d7a54069413", "logo": "", "date": "2023-07-14T14:35:00-05:00", "start": "14:35", "duration": "00:30", "room": "Grand Salon C", "slug": "2023-56-from-espaloma-to-sake-to-brew-distill-and-mix-force-fields-with-balanced-briskness-smoothness-and-intricacy-", "url": "https://cfp.scipy.org/2023/talk/F9P3F3/", "title": "From Espaloma to SAKE: To brew, distill, and mix force fields with balanced briskness, smoothness, and intricacy.", "subtitle": "", "track": "Materials and Chemistry", "type": "Talk", "language": "en", "abstract": "Force fields (FF)\u2014the (parametrized) mapping from geometry to energy, are a crucial component of molecular dynamics (MD) simulations, whose associated Boltzmann-like target probability densities are sampled to estimate ensemble observables, to harvest quantitative insights of the system. State-of-the-art force fields are either fast (molecular mechanics, MM-based) or accurate (quantum mechanics, QM-based), but seldom both. Here, leveraging graph-based machine learning and incorporating inductive biases crucial to chemical modeling, we approach the balance between accuracy and speed from two angles---to make MM more accurate and to make machine learning force fields faster.", "description": "A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently enough and meaningfully enough to get quantitative insights, is among the most ardent dreams of biophysicists. Machine learning force fields have been designed to bring us one step closer to this dream, by fitting simpler functional forms to QM data and extrapolating to chemically and geometrically diverse regions. 
Nonetheless, current state-of-the-art architectures, though approaching or surpassing the quantum chemical accuracy, are by magnitudes slower than MM and manifest various pathologies when it comes to interpretability, generalizability, and stability.\r\n\r\nIn this talk, we introduce our efforts to approach the lotusland from two angles: by making MM force fields more accurate (using a GNN to replace the atom typing schemes, Espaloma) and making state-of-the-art machine learning force fields faster (maintaining local universal approximative power without employing spherical harmonics, SAKE). Along the way, we show a plethora of useful gadgets, including the first unified force field for joint protein--ligand parametrization, an AM1-BCC surrogate charge model thousands-fold faster with error smaller than discrepancies among backends, and a way to forecast the fate of dynamic systems before the simulation even starts.\r\n\r\nWith these, we identify the opportunities and challenges of machine learning force fields design: What interpretable, stable, simple yet expressive function forms to use? How do we bake domain knowledge in, e.g., forces vanish when particles are far and explode when close? Can we detach sophisticated neural networks during inference? Can force fields be uncertainty-aware? 
And finally how do we stir these ingredients well to achieve the delicious balance between stability and speed and accuracy?", "recording_license": "", "do_not_record": false, "persons": [{"id": 69, "code": "ZKVNB8", "public_name": "Yuanqing Wang", "biography": "Simons Center Fellow, NYU", "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 105": [{"id": 291, "guid": "20d4c0e8-d5a7-5629-b4b7-423bfcfabae4", "logo": "", "date": "2023-07-14T16:40:00-05:00", "start": "16:40", "duration": "00:55", "room": "Classroom 105", "slug": "2023-291--bof-room-105-open-source-project-code-of-conduct-management-and-dei-support", "url": "https://cfp.scipy.org/2023/talk/LTDRGY/", "title": "[BoF Room 105] Open Source Project Code of Conduct Management and DEI Support", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "NumFOCUS will facilitate a discussion around open source projects managing a robust Code of Conduct as well as ongoing DEI support", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 476, "code": "YZSY3U", "public_name": "Leah Silen", "biography": "Leah is the Executive Director of NumFOCUS", "answers": []}, {"id": 363, "code": "9TWMGQ", "public_name": "Noa Tamir", "biography": "Noa have been involved with the R and PyData communities for some time, with a focus on community building and DEI. They are a  member of the NumFOCUS Board of Directors and DISC committee, PyLadies Organizer, and chaired the PyData Berlin 2022 conference. 
In addition, they are a Lead Data Science Coach at neue fische, contributing to pandas, and are currently developing the Contributor Experience Community and Handbook with Inessa Pawson and Melissa Mendon\u00e7a.", "answers": []}, {"id": 211, "code": "VL38N7", "public_name": "Inessa Pawson", "biography": "Inessa is building bridges between people, open science, and open source software, advocating for diversification of contribution pathways to open source and supporting its social infrastructure. Passionate about the transformative power of collaboration out in the open, she has been organizing the Maintainers Summit at PyCon US since 2020 to foster best practices on how to maintain and develop sustainable open source projects and thriving communities. In her current role as NumPy Contributor Experience Lead, Inessa\u2019s primary focus is on onboarding and supporting contributors, addressing gaps in the project governance, and developing programs to diversify pathways of contribution to the project.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 294, "guid": "9290e580-9cea-592b-842c-56ec3f7dd318", "logo": "", "date": "2023-07-14T17:45:00-05:00", "start": "17:45", "duration": "00:55", "room": "Classroom 105", "slug": "2023-294--bof-room-105-scipy-2024", "url": "https://cfp.scipy.org/2023/talk/9H9KFM/", "title": "[BoF Room 105] SciPy 2024", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Feedback on SciPy 2023 and ideas for SciPy 2024", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 478, "code": "CVELZ7", "public_name": "SciPy 2023 Committee", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 103": [{"id": 289, "guid": "d3d34817-ff8e-5abf-9386-3c1d5f007919", "logo": "", "date": "2023-07-14T16:40:00-05:00", "start": "16:40", "duration": "00:55", "room": "Classroom 103", "slug": 
"2023-289--bof-room-103-scipy-2023-sprint-prep-bof", "url": "https://cfp.scipy.org/2023/talk/JXWQPG/", "title": "[BoF Room 103] SciPy 2023 Sprint Prep BoF", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Come join the BoF to do a practice run on contributing to a GitHub project. We will walk through how to open a Pull Request for a bugfix, using the workflow most libraries participating at the weekend sprints use (hosted by the sprint chairs)", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 272, "code": "CQEZY8", "public_name": "Brigitta Sip\u0151cz", "biography": "I am an astronomer turned Research Software Engineer. I work at Caltech/IPAC to build and improve tools, e.g. Python libraries and Science Platforms to provide ways to access data in the NASA/IPAC Infrared Science Archive. Prior to joining IPAC, I was DiRAC Fellow in the data engineering team at the Institute for Data Intensive Research in Astrophysics and Cosmology in Seattle. I am a developer and maintainer of several open-source astronomy libraries and their infrastructure (e.g. astroML, astroquery, astropy) and I very much enjoy contributing to upstream projects as well in the wider Scientific Python ecosystem. I have a keen interest in finding ways to make tools more sustainable. I am a fellow of the Software Sustainability Institute.", "answers": []}, {"id": 57, "code": "QUKSS3", "public_name": "Gil Forsyth", "biography": "Gil Forsyth is a software engineer at Voltron Data. He followed the common career path of Japanese language specialist -> administrative assistant -> mechanical engineer -> computational fluid dynamicist -> data scientist -> software engineer -> machine learning engineer -> software engineer. Gil contributes to several projects in the PyData ecosystem and is a core maintainer of xonsh and Ibis. 
He served as the program chair for the Scientific Computing with Python (SciPy) conference from 2017 to 2020.", "answers": []}, {"id": 366, "code": "W9B3CQ", "public_name": "Madicken", "biography": null, "answers": []}, {"id": 133, "code": "R7PFJV", "public_name": "Matt Davis", "biography": "Matt has been using Python to work with data in science and at startups since 2008, after getting degrees in Astronomy and Aerospace Engineering. He maintains some moderately popular open-source Python libraries, including SnakeViz and Palettable. Today Matt is the lead software engineer at Populus, a startup helping city governments manage various aspects of transportation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 292, "guid": "feeae6ae-5474-5db3-b421-fa925884097e", "logo": "", "date": "2023-07-14T17:45:00-05:00", "start": "17:45", "duration": "00:55", "room": "Classroom 103", "slug": "2023-292--bof-room-103-cpython-performance", "url": "https://cfp.scipy.org/2023/talk/XNVLQA/", "title": "[BoF Room 103] CPython performance", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "Discuss the effects of recent and potential performance improvements on the scientific Python packages. 
The goal is to discuss the cost/benefit tradeoffs of adapting existing libraries to take advantage of potential improvements, especially per-interpreter GIL and nogil, but also type specializations in the interpreter.", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 477, "code": "SBJ73R", "public_name": "Michael Droettboom", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "Classroom 104": [{"id": 290, "guid": "d45c0f8b-757a-5953-a3fc-71e01ce4a577", "logo": "", "date": "2023-07-14T16:40:00-05:00", "start": "16:40", "duration": "00:55", "room": "Classroom 104", "slug": "2023-290--bof-room-104-future-of-python-programming-language-in-the-artificial-intelligence-era", "url": "https://cfp.scipy.org/2023/talk/VA7ENC/", "title": "[BoF Room 104] Future of Python Programming Language in the Artificial Intelligence Era", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "The panel aims to shed light on the role of code assistants like Copilot and tools like ChatGPT, and on how they revolutionize coding careers. It will also provide insights that help young and budding programmers prepare themselves for future careers, and try to answer some hypothetical questions: Can AI replace human programmers? Can it add or suggest new features to the language itself? What problems may people face while developing enterprise-grade applications with AI?", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 26, "code": "AHYXHQ", "public_name": "Gajendra Deshpande", "biography": "I am Gajendra Deshpande and I have been using Python since 2013 for academic research and development activities. I develop prototypes and applications in Natural Language Processing, Machine Learning, Cyber Security, and Web applications using Python and its ecosystem. 
I am working as a faculty of Computer Science and run a start-up in cyber security. I am an active member of the PyCon India community and served as program committee lead for PyCon India 2021. I have presented approximately 80 talks, 20 Workshops, and 15 posters across the globe at prestigious conferences like PyData Global, PyCon APAC, PyCon AU, EuroPython, DjangoCon US and Europe, SciPy India, SciPy USA, PyCon USA, JuliaCon, FOSDEM, and several other Python and FOSS conferences. I have helped Python and FOSS Conferences by reviewing the talk and tutorial proposals, mentoring first-time speakers, participating in the discussions, and organizing the events.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"id": 293, "guid": "ffe09ed1-1761-5c2c-9dc2-23f6e76ba3cd", "logo": "", "date": "2023-07-14T17:45:00-05:00", "start": "17:45", "duration": "00:55", "room": "Classroom 104", "slug": "2023-293--bof-room-104-beyond-notebooks-from-reproducible-to-reusable-research", "url": "https://cfp.scipy.org/2023/talk/LGZUNG/", "title": "[BoF Room 104] Beyond Notebooks: From reproducible to reusable research", "subtitle": "", "track": "Birds of a Feather (BoF)", "type": "Talk", "language": "en", "abstract": "\"Notebooks can be a powerful tool for the purposes for which they were designed\u2014learning, experimenting, and sharing results. However, users face many challenges when trying to achieve true reproducibility with notebooks alone, including lack of dependency management, pitfalls of non-linear interactive execution, and requiring bespoke tooling to open and execute. Furthermore, there is a growing need to go beyond reproducibility of individual results\u2014siloed into an opaque format possessing limited interoperability with the rest of the Python ecosystem\u2014toward reusability of research methods that can be shared, built upon, and deployed by users across the world. 
\r\n\r\nTherefore, we invite the community to share their tools and workflows to go beyond reproducibility and toward truly reusable science, built on the shoulders of giants. Furthermore, we hope to explore how we can encourage users and the community to move beyond the notebook monoculture and toward holistic, open, modular and interoperable approaches to conducting research and developing scientific code.\"", "description": "", "recording_license": "", "do_not_record": false, "persons": [{"id": 473, "code": "8RUQ3H", "public_name": "C.A.M. Gerlach", "biography": null, "answers": []}, {"id": 271, "code": "SE7SNC", "public_name": "Juanita Gomez", "biography": "Juanita Gomez is a passionate programmer, mathematician and open source advocate; former developer of Spyder IDE at Quansight. She has a BS in Pure Mathematics from Pontificia Universidad Javeriana in Colombia and is currently pursuing a Ph.D. in Computer Science at UC Santa Cruz. She is a community manager for the Scientific Python project, a community effort to better coordinate and support scientific Python libraries.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 6, "date": "2023-07-15", "day_start": "2023-07-15T04:00:00-05:00", "day_end": "2023-07-16T03:59:00-05:00", "rooms": {"Amphitheater 204": [{"id": 298, "guid": "775f1730-df07-5040-9195-2c6ab9a1e999", "logo": "", "date": "2023-07-15T09:00:00-05:00", "start": "09:00", "duration": "01:00", "room": "Amphitheater 204", "slug": "2023-298-open-source-sprints-kickoff-in-room-204-", "url": "https://cfp.scipy.org/2023/talk/NMNJYF/", "title": "Open Source Sprints [Kickoff in Room 204]", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Everyone will meet in Room 204 and organize before breaking out for the remainder of the day. 
\r\n\r\nEvery year, our community dedicates the last 2 days of the SciPy conference to Sprints, where we work together on open-source projects to push our ecosystem forward.\r\n\r\nSprints are an informal part of the conference, where all are welcome to exchange ideas, hack on exciting projects, and create lasting connections.  All programming levels are welcome at the sprints.\r\n\r\nJoin us for the preparatory Sprint BoF as well on Friday at 4:40 in Room 103 - https://cfp.scipy.org/2023/talk/JXWQPG/\r\n\r\nInterested in leading a sprint at SciPy 2023? Sign up at https://www.scipy2023.scipy.org/sprints", "description": "Sprints FAQs\r\nWhat will you do as an attendee?\r\n\r\nThere are a variety of ways to contribute during the sprints session, including testing code, fixing bugs, adding new features, and improving documentation. You could also contribute to an entirely brand new project that our ecosystem is missing. One of the best parts about the sprints is that you might also have the opportunity to work with authors and core contributors of your favorite open source packages, as well as the opportunity to work alongside other developers who are just as excited as you are to make the SciPy community even better. \r\n\r\nWhat are the benefits of attending a sprint?\r\n\r\nMake open source Python better! Code alongside package authors/contributors, while learning from them. Become a power user of a core package by gaining a deeper understanding of its inner workings. Improve your GitHub profile. Get to know other SciPy community members at the Sprints dinner.\r\n\r\nCan I participate?\r\n\r\nYes! Sprints are free and open to everyone, no matter your level of programming experience.  Sprints are a great way to add your contribution to your favorite Python libraries and packages. 
Thanks to the generosity of our sponsors, sprints are free of charge for all participants, including the Sprints dinner on Saturday evening.\r\n\r\nIf you aren't sure about how you can contribute to a project, it's not a problem. We'll get you up to speed at the How to Contribute to Open Source BoF on Friday and we have helpers at the beginner friendly sprints.", "recording_license": "", "do_not_record": false, "persons": [{"id": 272, "code": "CQEZY8", "public_name": "Brigitta Sip\u0151cz", "biography": "I am an astronomer turned Research Software Engineer. I work at Caltech/IPAC to build and improve tools, e.g. Python libraries and Science Platforms to provide ways to access data in the NASA/IPAC Infrared Science Archive. Prior to joining IPAC, I was DiRAC Fellow in the data engineering team at the Institute for Data Intensive Research in Astrophysics and Cosmology in Seattle. I am a developer and maintainer of several open-source astronomy libraries and their infrastructure (e.g. astroML, astroquery, astropy) and I very much enjoy contributing to upstream projects as well in the wider Scientific Python ecosystem. I have a keen interest in finding ways to make tools more sustainable. 
I am a fellow of the Software Sustainability Institute.", "answers": []}, {"id": 474, "code": "J9RENW", "public_name": "Tania Allard", "biography": null, "answers": []}, {"id": 475, "code": "F8FS8L", "public_name": "Alan Braz", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 7, "date": "2023-07-16", "day_start": "2023-07-16T04:00:00-05:00", "day_end": "2023-07-17T03:59:00-05:00", "rooms": {"Amphitheater 204": [{"id": 299, "guid": "807aa7d6-6efa-59d3-8dc0-df79ec04d426", "logo": "", "date": "2023-07-16T09:00:00-05:00", "start": "09:00", "duration": "01:00", "room": "Amphitheater 204", "slug": "2023-299-open-source-sprints-kickoff-in-room-204-", "url": "https://cfp.scipy.org/2023/talk/WTNHTR/", "title": "Open Source Sprints [Kickoff in Room 204]", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Everyone will meet in Room 204 and organize before breaking out for the remainder of the day. \r\n\r\nEvery year, our community dedicates the last 2 days of the SciPy conference to Sprints, where we work together on open-source projects to push our ecosystem forward.\r\n\r\nSprints are an informal part of the conference, where all are welcome to exchange ideas, hack on exciting projects, and create lasting connections.  All programming levels are welcome at the sprints.\r\n\r\nJoin us for the preparatory Sprint BoF as well on Friday at 4:40 in Room 103 - https://cfp.scipy.org/2023/talk/JXWQPG/\r\n\r\nInterested in leading a sprint at SciPy 2023? Sign up at https://www.scipy2023.scipy.org/sprints", "description": "Sprints FAQs\r\nWhat will you do as an attendee?\r\n\r\nThere are a variety of ways to contribute during the sprints session, including testing code, fixing bugs, adding new features, and improving documentation. You could also contribute to an entirely brand new project that our ecosystem is missing. 
One of the best parts about the sprints is that you might also have the opportunity to work with authors and core contributors of your favorite open source packages, as well as the opportunity to work alongside other developers who are just as excited as you are to make the SciPy community even better. \r\n\r\nWhat are the benefits of attending a sprint?\r\n\r\nMake open source Python better! Code alongside package authors/contributors, while learning from them. Become a power user of a core package by gaining a deeper understanding of its inner workings. Improve your GitHub profile. Get to know other SciPy community members at the Sprints dinner.\r\n\r\nCan I participate?\r\n\r\nYes! Sprints are free and open to everyone, no matter your level of programming experience.  Sprints are a great way to add your contribution to your favorite Python libraries and packages. Thanks to the generosity of our sponsors, sprints are free of charge for all participants, including the Sprints dinner on Saturday evening.\r\n\r\nIf you aren't sure about how you can contribute to a project, it's not a problem. We'll get you up to speed at the How to Contribute to Open Source BoF on Friday and we have helpers at the beginner friendly sprints.", "recording_license": "", "do_not_record": false, "persons": [{"id": 272, "code": "CQEZY8", "public_name": "Brigitta Sip\u0151cz", "biography": "I am an astronomer turned Research Software Engineer. I work at Caltech/IPAC to build and improve tools, e.g. Python libraries and Science Platforms to provide ways to access data in the NASA/IPAC Infrared Science Archive. Prior to joining IPAC, I was DiRAC Fellow in the data engineering team at the Institute for Data Intensive Research in Astrophysics and Cosmology in Seattle. I am a developer and maintainer of several open-source astronomy libraries and their infrastructure (e.g. 
astroML, astroquery, astropy) and I very much enjoy contributing to upstream projects as well in the wider Scientific Python ecosystem. I have a keen interest in finding ways to make tools more sustainable. I am a fellow of the Software Sustainability Institute.", "answers": []}, {"id": 474, "code": "J9RENW", "public_name": "Tania Allard", "biography": null, "answers": []}, {"id": 475, "code": "F8FS8L", "public_name": "Alan Braz", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 8, "date": "2023-07-17", "day_start": "2023-07-17T04:00:00-05:00", "day_end": "2023-07-18T03:59:00-05:00", "rooms": {}}, {"index": 9, "date": "2023-07-18", "day_start": "2023-07-18T04:00:00-05:00", "day_end": "2023-07-19T03:59:00-05:00", "rooms": {}}, {"index": 10, "date": "2023-07-19", "day_start": "2023-07-19T04:00:00-05:00", "day_end": "2023-07-20T03:59:00-05:00", "rooms": {}}, {"index": 11, "date": "2023-07-20", "day_start": "2023-07-20T04:00:00-05:00", "day_end": "2023-07-21T03:59:00-05:00", "rooms": {}}, {"index": 12, "date": "2023-07-21", "day_start": "2023-07-21T04:00:00-05:00", "day_end": "2023-07-22T03:59:00-05:00", "rooms": {}}, {"index": 13, "date": "2023-07-22", "day_start": "2023-07-22T04:00:00-05:00", "day_end": "2023-07-23T03:59:00-05:00", "rooms": {}}, {"index": 14, "date": "2023-07-23", "day_start": "2023-07-23T04:00:00-05:00", "day_end": "2023-07-24T03:59:00-05:00", "rooms": {}}, {"index": 15, "date": "2023-07-24", "day_start": "2023-07-24T04:00:00-05:00", "day_end": "2023-07-25T03:59:00-05:00", "rooms": {}}, {"index": 16, "date": "2023-07-25", "day_start": "2023-07-25T04:00:00-05:00", "day_end": "2023-07-26T03:59:00-05:00", "rooms": {}}, {"index": 17, "date": "2023-07-26", "day_start": "2023-07-26T04:00:00-05:00", "day_end": "2023-07-27T03:59:00-05:00", "rooms": {}}, {"index": 18, "date": "2023-07-27", "day_start": "2023-07-27T04:00:00-05:00", "day_end": "2023-07-28T03:59:00-05:00", "rooms": {}}, {"index": 19, "date": 
"2023-07-28", "day_start": "2023-07-28T04:00:00-05:00", "day_end": "2023-07-29T03:59:00-05:00", "rooms": {}}, {"index": 20, "date": "2023-07-29", "day_start": "2023-07-29T04:00:00-05:00", "day_end": "2023-07-30T03:59:00-05:00", "rooms": {}}, {"index": 21, "date": "2023-07-30", "day_start": "2023-07-30T04:00:00-05:00", "day_end": "2023-07-31T03:59:00-05:00", "rooms": {}}, {"index": 22, "date": "2023-07-31", "day_start": "2023-07-31T04:00:00-05:00", "day_end": "2023-08-01T03:59:00-05:00", "rooms": {}}, {"index": 23, "date": "2023-08-01", "day_start": "2023-08-01T04:00:00-05:00", "day_end": "2023-08-02T03:59:00-05:00", "rooms": {}}, {"index": 24, "date": "2023-08-02", "day_start": "2023-08-02T04:00:00-05:00", "day_end": "2023-08-03T03:59:00-05:00", "rooms": {}}, {"index": 25, "date": "2023-08-03", "day_start": "2023-08-03T04:00:00-05:00", "day_end": "2023-08-04T03:59:00-05:00", "rooms": {}}, {"index": 26, "date": "2023-08-04", "day_start": "2023-08-04T04:00:00-05:00", "day_end": "2023-08-05T03:59:00-05:00", "rooms": {}}, {"index": 27, "date": "2023-08-05", "day_start": "2023-08-05T04:00:00-05:00", "day_end": "2023-08-06T03:59:00-05:00", "rooms": {}}, {"index": 28, "date": "2023-08-06", "day_start": "2023-08-06T04:00:00-05:00", "day_end": "2023-08-07T03:59:00-05:00", "rooms": {}}, {"index": 29, "date": "2023-08-07", "day_start": "2023-08-07T04:00:00-05:00", "day_end": "2023-08-08T03:59:00-05:00", "rooms": {}}, {"index": 30, "date": "2023-08-08", "day_start": "2023-08-08T04:00:00-05:00", "day_end": "2023-08-09T03:59:00-05:00", "rooms": {}}, {"index": 31, "date": "2023-08-09", "day_start": "2023-08-09T04:00:00-05:00", "day_end": "2023-08-10T03:59:00-05:00", "rooms": {}}, {"index": 32, "date": "2023-08-10", "day_start": "2023-08-10T04:00:00-05:00", "day_end": "2023-08-11T03:59:00-05:00", "rooms": {}}, {"index": 33, "date": "2023-08-11", "day_start": "2023-08-11T04:00:00-05:00", "day_end": "2023-08-12T03:59:00-05:00", "rooms": {}}, {"index": 34, "date": "2023-08-12", 
"day_start": "2023-08-12T04:00:00-05:00", "day_end": "2023-08-13T03:59:00-05:00", "rooms": {}}, {"index": 35, "date": "2023-08-13", "day_start": "2023-08-13T04:00:00-05:00", "day_end": "2023-08-14T03:59:00-05:00", "rooms": {}}, {"index": 36, "date": "2023-08-14", "day_start": "2023-08-14T04:00:00-05:00", "day_end": "2023-08-15T03:59:00-05:00", "rooms": {}}, {"index": 37, "date": "2023-08-15", "day_start": "2023-08-15T04:00:00-05:00", "day_end": "2023-08-16T03:59:00-05:00", "rooms": {}}, {"index": 38, "date": "2023-08-16", "day_start": "2023-08-16T04:00:00-05:00", "day_end": "2023-08-17T03:59:00-05:00", "rooms": {}}, {"index": 39, "date": "2023-08-17", "day_start": "2023-08-17T04:00:00-05:00", "day_end": "2023-08-18T03:59:00-05:00", "rooms": {}}, {"index": 40, "date": "2023-08-18", "day_start": "2023-08-18T04:00:00-05:00", "day_end": "2023-08-19T03:59:00-05:00", "rooms": {}}, {"index": 41, "date": "2023-08-19", "day_start": "2023-08-19T04:00:00-05:00", "day_end": "2023-08-20T03:59:00-05:00", "rooms": {}}, {"index": 42, "date": "2023-08-20", "day_start": "2023-08-20T04:00:00-05:00", "day_end": "2023-08-21T03:59:00-05:00", "rooms": {}}, {"index": 43, "date": "2023-08-21", "day_start": "2023-08-21T04:00:00-05:00", "day_end": "2023-08-22T03:59:00-05:00", "rooms": {}}, {"index": 44, "date": "2023-08-22", "day_start": "2023-08-22T04:00:00-05:00", "day_end": "2023-08-23T03:59:00-05:00", "rooms": {}}, {"index": 45, "date": "2023-08-23", "day_start": "2023-08-23T04:00:00-05:00", "day_end": "2023-08-24T03:59:00-05:00", "rooms": {}}, {"index": 46, "date": "2023-08-24", "day_start": "2023-08-24T04:00:00-05:00", "day_end": "2023-08-25T03:59:00-05:00", "rooms": {}}, {"index": 47, "date": "2023-08-25", "day_start": "2023-08-25T04:00:00-05:00", "day_end": "2023-08-26T03:59:00-05:00", "rooms": {}}, {"index": 48, "date": "2023-08-26", "day_start": "2023-08-26T04:00:00-05:00", "day_end": "2023-08-27T03:59:00-05:00", "rooms": {}}, {"index": 49, "date": "2023-08-27", "day_start": 
"2023-08-27T04:00:00-05:00", "day_end": "2023-08-28T03:59:00-05:00", "rooms": {}}, {"index": 50, "date": "2023-08-28", "day_start": "2023-08-28T04:00:00-05:00", "day_end": "2023-08-29T03:59:00-05:00", "rooms": {}}, {"index": 51, "date": "2023-08-29", "day_start": "2023-08-29T04:00:00-05:00", "day_end": "2023-08-30T03:59:00-05:00", "rooms": {}}, {"index": 52, "date": "2023-08-30", "day_start": "2023-08-30T04:00:00-05:00", "day_end": "2023-08-31T03:59:00-05:00", "rooms": {}}, {"index": 53, "date": "2023-08-31", "day_start": "2023-08-31T04:00:00-05:00", "day_end": "2023-09-01T03:59:00-05:00", "rooms": {}}, {"index": 54, "date": "2023-09-01", "day_start": "2023-09-01T04:00:00-05:00", "day_end": "2023-09-02T03:59:00-05:00", "rooms": {}}, {"index": 55, "date": "2023-09-02", "day_start": "2023-09-02T04:00:00-05:00", "day_end": "2023-09-03T03:59:00-05:00", "rooms": {}}, {"index": 56, "date": "2023-09-03", "day_start": "2023-09-03T04:00:00-05:00", "day_end": "2023-09-04T03:59:00-05:00", "rooms": {}}, {"index": 57, "date": "2023-09-04", "day_start": "2023-09-04T04:00:00-05:00", "day_end": "2023-09-05T03:59:00-05:00", "rooms": {}}, {"index": 58, "date": "2023-09-05", "day_start": "2023-09-05T04:00:00-05:00", "day_end": "2023-09-06T03:59:00-05:00", "rooms": {}}, {"index": 59, "date": "2023-09-06", "day_start": "2023-09-06T04:00:00-05:00", "day_end": "2023-09-07T03:59:00-05:00", "rooms": {}}, {"index": 60, "date": "2023-09-07", "day_start": "2023-09-07T04:00:00-05:00", "day_end": "2023-09-08T03:59:00-05:00", "rooms": {}}, {"index": 61, "date": "2023-09-08", "day_start": "2023-09-08T04:00:00-05:00", "day_end": "2023-09-09T03:59:00-05:00", "rooms": {}}, {"index": 62, "date": "2023-09-09", "day_start": "2023-09-09T04:00:00-05:00", "day_end": "2023-09-10T03:59:00-05:00", "rooms": {}}, {"index": 63, "date": "2023-09-10", "day_start": "2023-09-10T04:00:00-05:00", "day_end": "2023-09-11T03:59:00-05:00", "rooms": {}}, {"index": 64, "date": "2023-09-11", "day_start": 
"2023-09-11T04:00:00-05:00", "day_end": "2023-09-12T03:59:00-05:00", "rooms": {}}, {"index": 65, "date": "2023-09-12", "day_start": "2023-09-12T04:00:00-05:00", "day_end": "2023-09-13T03:59:00-05:00", "rooms": {}}, {"index": 66, "date": "2023-09-13", "day_start": "2023-09-13T04:00:00-05:00", "day_end": "2023-09-14T03:59:00-05:00", "rooms": {}}, {"index": 67, "date": "2023-09-14", "day_start": "2023-09-14T04:00:00-05:00", "day_end": "2023-09-15T03:59:00-05:00", "rooms": {}}, {"index": 68, "date": "2023-09-15", "day_start": "2023-09-15T04:00:00-05:00", "day_end": "2023-09-16T03:59:00-05:00", "rooms": {}}, {"index": 69, "date": "2023-09-16", "day_start": "2023-09-16T04:00:00-05:00", "day_end": "2023-09-17T03:59:00-05:00", "rooms": {}}, {"index": 70, "date": "2023-09-17", "day_start": "2023-09-17T04:00:00-05:00", "day_end": "2023-09-18T03:59:00-05:00", "rooms": {}}, {"index": 71, "date": "2023-09-18", "day_start": "2023-09-18T04:00:00-05:00", "day_end": "2023-09-19T03:59:00-05:00", "rooms": {}}, {"index": 72, "date": "2023-09-19", "day_start": "2023-09-19T04:00:00-05:00", "day_end": "2023-09-20T03:59:00-05:00", "rooms": {}}, {"index": 73, "date": "2023-09-20", "day_start": "2023-09-20T04:00:00-05:00", "day_end": "2023-09-21T03:59:00-05:00", "rooms": {}}, {"index": 74, "date": "2023-09-21", "day_start": "2023-09-21T04:00:00-05:00", "day_end": "2023-09-22T03:59:00-05:00", "rooms": {}}, {"index": 75, "date": "2023-09-22", "day_start": "2023-09-22T04:00:00-05:00", "day_end": "2023-09-23T03:59:00-05:00", "rooms": {}}, {"index": 76, "date": "2023-09-23", "day_start": "2023-09-23T04:00:00-05:00", "day_end": "2023-09-24T03:59:00-05:00", "rooms": {}}, {"index": 77, "date": "2023-09-24", "day_start": "2023-09-24T04:00:00-05:00", "day_end": "2023-09-25T03:59:00-05:00", "rooms": {}}, {"index": 78, "date": "2023-09-25", "day_start": "2023-09-25T04:00:00-05:00", "day_end": "2023-09-26T03:59:00-05:00", "rooms": {}}, {"index": 79, "date": "2023-09-26", "day_start": 
"2023-09-26T04:00:00-05:00", "day_end": "2023-09-27T03:59:00-05:00", "rooms": {}}, {"index": 80, "date": "2023-09-27", "day_start": "2023-09-27T04:00:00-05:00", "day_end": "2023-09-28T03:59:00-05:00", "rooms": {}}, {"index": 81, "date": "2023-09-28", "day_start": "2023-09-28T04:00:00-05:00", "day_end": "2023-09-29T03:59:00-05:00", "rooms": {}}, {"index": 82, "date": "2023-09-29", "day_start": "2023-09-29T04:00:00-05:00", "day_end": "2023-09-30T03:59:00-05:00", "rooms": {}}, {"index": 83, "date": "2023-09-30", "day_start": "2023-09-30T04:00:00-05:00", "day_end": "2023-10-01T03:59:00-05:00", "rooms": {}}, {"index": 84, "date": "2023-10-01", "day_start": "2023-10-01T04:00:00-05:00", "day_end": "2023-10-02T03:59:00-05:00", "rooms": {}}, {"index": 85, "date": "2023-10-02", "day_start": "2023-10-02T04:00:00-05:00", "day_end": "2023-10-03T03:59:00-05:00", "rooms": {}}, {"index": 86, "date": "2023-10-03", "day_start": "2023-10-03T04:00:00-05:00", "day_end": "2023-10-04T03:59:00-05:00", "rooms": {}}, {"index": 87, "date": "2023-10-04", "day_start": "2023-10-04T04:00:00-05:00", "day_end": "2023-10-05T03:59:00-05:00", "rooms": {}}, {"index": 88, "date": "2023-10-05", "day_start": "2023-10-05T04:00:00-05:00", "day_end": "2023-10-06T03:59:00-05:00", "rooms": {}}, {"index": 89, "date": "2023-10-06", "day_start": "2023-10-06T04:00:00-05:00", "day_end": "2023-10-07T03:59:00-05:00", "rooms": {}}, {"index": 90, "date": "2023-10-07", "day_start": "2023-10-07T04:00:00-05:00", "day_end": "2023-10-08T03:59:00-05:00", "rooms": {}}, {"index": 91, "date": "2023-10-08", "day_start": "2023-10-08T04:00:00-05:00", "day_end": "2023-10-09T03:59:00-05:00", "rooms": {}}, {"index": 92, "date": "2023-10-09", "day_start": "2023-10-09T04:00:00-05:00", "day_end": "2023-10-10T03:59:00-05:00", "rooms": {}}, {"index": 93, "date": "2023-10-10", "day_start": "2023-10-10T04:00:00-05:00", "day_end": "2023-10-11T03:59:00-05:00", "rooms": {}}, {"index": 94, "date": "2023-10-11", "day_start": 
"2023-10-11T04:00:00-05:00", "day_end": "2023-10-12T03:59:00-05:00", "rooms": {}}, {"index": 95, "date": "2023-10-12", "day_start": "2023-10-12T04:00:00-05:00", "day_end": "2023-10-13T03:59:00-05:00", "rooms": {}}, {"index": 96, "date": "2023-10-13", "day_start": "2023-10-13T04:00:00-05:00", "day_end": "2023-10-14T03:59:00-05:00", "rooms": {}}, {"index": 97, "date": "2023-10-14", "day_start": "2023-10-14T04:00:00-05:00", "day_end": "2023-10-15T03:59:00-05:00", "rooms": {}}, {"index": 98, "date": "2023-10-15", "day_start": "2023-10-15T04:00:00-05:00", "day_end": "2023-10-16T03:59:00-05:00", "rooms": {}}, {"index": 99, "date": "2023-10-16", "day_start": "2023-10-16T04:00:00-05:00", "day_end": "2023-10-17T03:59:00-05:00", "rooms": {}}, {"index": 100, "date": "2023-10-17", "day_start": "2023-10-17T04:00:00-05:00", "day_end": "2023-10-18T03:59:00-05:00", "rooms": {}}, {"index": 101, "date": "2023-10-18", "day_start": "2023-10-18T04:00:00-05:00", "day_end": "2023-10-19T03:59:00-05:00", "rooms": {}}, {"index": 102, "date": "2023-10-19", "day_start": "2023-10-19T04:00:00-05:00", "day_end": "2023-10-20T03:59:00-05:00", "rooms": {}}, {"index": 103, "date": "2023-10-20", "day_start": "2023-10-20T04:00:00-05:00", "day_end": "2023-10-21T03:59:00-05:00", "rooms": {}}, {"index": 104, "date": "2023-10-21", "day_start": "2023-10-21T04:00:00-05:00", "day_end": "2023-10-22T03:59:00-05:00", "rooms": {}}, {"index": 105, "date": "2023-10-22", "day_start": "2023-10-22T04:00:00-05:00", "day_end": "2023-10-23T03:59:00-05:00", "rooms": {}}, {"index": 106, "date": "2023-10-23", "day_start": "2023-10-23T04:00:00-05:00", "day_end": "2023-10-24T03:59:00-05:00", "rooms": {}}, {"index": 107, "date": "2023-10-24", "day_start": "2023-10-24T04:00:00-05:00", "day_end": "2023-10-25T03:59:00-05:00", "rooms": {}}, {"index": 108, "date": "2023-10-25", "day_start": "2023-10-25T04:00:00-05:00", "day_end": "2023-10-26T03:59:00-05:00", "rooms": {}}, {"index": 109, "date": "2023-10-26", "day_start": 
"2023-10-26T04:00:00-05:00", "day_end": "2023-10-27T03:59:00-05:00", "rooms": {}}, {"index": 110, "date": "2023-10-27", "day_start": "2023-10-27T04:00:00-05:00", "day_end": "2023-10-28T03:59:00-05:00", "rooms": {}}, {"index": 111, "date": "2023-10-28", "day_start": "2023-10-28T04:00:00-05:00", "day_end": "2023-10-29T03:59:00-05:00", "rooms": {}}, {"index": 112, "date": "2023-10-29", "day_start": "2023-10-29T04:00:00-05:00", "day_end": "2023-10-30T03:59:00-05:00", "rooms": {}}, {"index": 113, "date": "2023-10-30", "day_start": "2023-10-30T04:00:00-05:00", "day_end": "2023-10-31T03:59:00-05:00", "rooms": {}}, {"index": 114, "date": "2023-10-31", "day_start": "2023-10-31T04:00:00-05:00", "day_end": "2023-11-01T03:59:00-05:00", "rooms": {}}, {"index": 115, "date": "2023-11-01", "day_start": "2023-11-01T04:00:00-05:00", "day_end": "2023-11-02T03:59:00-05:00", "rooms": {}}, {"index": 116, "date": "2023-11-02", "day_start": "2023-11-02T04:00:00-05:00", "day_end": "2023-11-03T03:59:00-05:00", "rooms": {}}, {"index": 117, "date": "2023-11-03", "day_start": "2023-11-03T04:00:00-05:00", "day_end": "2023-11-04T03:59:00-05:00", "rooms": {}}, {"index": 118, "date": "2023-11-04", "day_start": "2023-11-04T04:00:00-05:00", "day_end": "2023-11-05T02:59:00-06:00", "rooms": {}}, {"index": 119, "date": "2023-11-05", "day_start": "2023-11-05T03:00:00-06:00", "day_end": "2023-11-06T02:59:00-06:00", "rooms": {}}, {"index": 120, "date": "2023-11-06", "day_start": "2023-11-06T03:00:00-06:00", "day_end": "2023-11-07T02:59:00-06:00", "rooms": {}}, {"index": 121, "date": "2023-11-07", "day_start": "2023-11-07T03:00:00-06:00", "day_end": "2023-11-08T02:59:00-06:00", "rooms": {}}, {"index": 122, "date": "2023-11-08", "day_start": "2023-11-08T03:00:00-06:00", "day_end": "2023-11-09T02:59:00-06:00", "rooms": {}}, {"index": 123, "date": "2023-11-09", "day_start": "2023-11-09T03:00:00-06:00", "day_end": "2023-11-10T02:59:00-06:00", "rooms": {}}, {"index": 124, "date": "2023-11-10", "day_start": 
"2023-11-10T03:00:00-06:00", "day_end": "2023-11-11T02:59:00-06:00", "rooms": {}}]}}}