SciPy 2025

Shaurya Agarwal

Shaurya Agarwal has been tinkering with Data, Cloud technologies, Machine Learning, Data Science and now GenAI for over 21 years. Now a Director with PwC India, Shaurya brings his expertise and experience to solve critical problems across a very wide set of domains.


Sessions

07-07
13:30
240min
The-Silmaril: Practice #ontology engineering with Python (and other languages).
Shaurya Agarwal

Ontologies provide a powerful way to structure knowledge, enable reasoning, and support more meaningful queries compared to traditional data models. Recently, interest in ontologies has resurged, driven by advancements in language models, reasoning capabilities, and the growing adoption of platforms like Palantir Foundry.

In this hands-on tutorial, participants will explore ontology development across multiple domains using a variety of tools, most of them Python-based, such as rdflib, Owlready2, SWI-Prolog, PySpark, Pandas, NetworkX, and SciPy. They will learn how ontologies facilitate semantic reasoning, improve data interoperability, and enhance query capabilities.
Additionally, attendees will build a rudimentary reasoning engine to better understand inference mechanisms.
The tutorial emphasizes practical applications and comparisons with conventional data representations, making it ideal for researchers, data engineers, and developers interested in knowledge representation and reasoning.
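
To give a flavour of what a rudimentary reasoning engine can look like, here is a minimal sketch. It is not the tutorial's own code. It assumes facts are stored as (subject, predicate, object) triples and applies two RDFS-style rules by forward chaining until nothing new can be derived. The vocabulary ("subClassOf", "type") and the sample facts are hypothetical and chosen only for illustration.

```python
# Facts are (subject, predicate, object) triples; the vocabulary is hypothetical.
Triple = tuple[str, str, str]


def infer(facts: set[Triple]) -> set[Triple]:
    """Forward-chain two RDFS-style rules until no new triples appear."""
    closure = set(facts)
    while True:
        new = set()
        for (a, p, b) in closure:
            if p != "subClassOf":
                continue
            for (c, q, d) in closure:
                # Rule 1: subClassOf is transitive.
                if q == "subClassOf" and c == b:
                    new.add((a, "subClassOf", d))
                # Rule 2: an instance of a class is an instance of its superclass.
                if q == "type" and d == a:
                    new.add((c, "type", b))
        if new <= closure:  # fixed point reached: nothing new was derived
            return closure
        closure |= new


facts = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}
inferred = infer(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

The loop is deliberately naive (it rescans every pair of triples on each pass), which keeps the inference mechanism visible. Libraries such as Owlready2 delegate this work to far more capable reasoners.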

Tutorials
Room 317
0min
Fun with FinOps - Playing with Cloud Cost and Usage Data in Python
Shaurya Agarwal

Cloud cost management (FinOps) is an essential practice for organizations optimizing their cloud spend. This talk will explore how to analyze and visualize cloud usage and cost data using Python. We will demystify Cost and Usage Reports (CUR), perform exploratory data analysis (EDA) on sample datasets, and identify correlations between services. Finally, we will introduce time series forecasting techniques to predict future cloud expenses. Attendees will leave with practical Python tools and workflows for managing cloud financials effectively.

General
0min
Common Patterns of Complexity and Opportunities for Optimization in Data Engineering (PySpark and friends)
Shaurya Agarwal

Distributed data processing is key for scale. PySpark is powerful for big data, but common inefficiencies slow it down.
This talk highlights patterns that introduce complexity and shows how to optimize them.
Using the MovieLens dataset, we'll demonstrate how typical mistakes impact performance and how to fix them.
You'll leave with a better understanding of PySpark execution and practical strategies to speed up your workflows.
This talk will use code and examples from a larger data engineering workshop that received a lot of acclaim at forums such as PyData Global and PyCon India.
All files, code, exercises and notes will be freely available on GitHub: https://github.com/shauryashaurya/learn-data-munging/tree/main/03-Spark

General