SciPy 2024

Defining Valuable Data for Mapping Open Source Science
07-12, 17:45–18:40 (US/Pacific), Room 318

This BoF session will explore the data required to build a robust and comprehensive map of the open source science landscape. The discussion will center on the types of data needed, the challenges in collecting and curating them, and the insights and benefits such a map can provide. Participants will consider how to gather and use data effectively to illuminate the dynamics, challenges, and opportunities within the open source and open science ecosystems.

Key Discussion Points:

  • Identifying necessary data types. For example, for people: pull requests, hours committed, number of upvotes on issues; for projects: number of direct dependencies, number of citations, etc.
  • Challenges in data collection and curation.
  • Methodologies for ensuring data accuracy and comprehensiveness.
  • Insights gained from mapping large datasets.
  • The impact of a comprehensive map on the open source and open science communities.
  • Future directions and potential improvements for data collection and mapping.

A brief demonstration of The Map of Open Source Science, available at https://opensource.science, will set the context for this BoF.