07-09, 14:35–15:05 (US/Pacific), Room 315
LLMs are powerful, flexible, easy to use... and often wrong. This is a dangerous combination, especially for data analysis and scientific research, where correctness and reproducibility are core requirements. Fortunately, it turns out that by carefully applying LLMs to narrower use cases, we can turn them into surprisingly reliable assistants that accelerate and enhance, rather than undermine, scientific work.
This is not just theory—I’ll showcase working examples of seamlessly integrating LLMs into analytic workflows, helping data scientists build interactive, intelligent applications without needing to be web developers. You’ll see firsthand how keeping LLMs focused lets us leverage their "intelligence" in a way that’s practical, rigorous, and reproducible.
This talk is for Python data scientists, researchers, and developers who want to integrate AI into their work in a practical, responsible way, or who are skeptical that it's even possible.
In data analysis, correctness and reproducibility are essential, yet general-purpose AI tools lack the structure and determinism needed to ensure reliable results. Instead of treating LLMs as open-ended assistants, we should focus on applying them to well-defined tasks with clear guardrails. When used this way, they can be not just useful, but (relatively) safe and highly effective.
This talk will explore how to integrate LLMs into scientific workflows in a controlled and purposeful way. Instead of relying on generic AI assistants, we can build focused tools that guide and enhance research without introducing unnecessary complexity. I’ll discuss design principles for creating AI solutions that combine the creativity of LLMs, the reliability of deterministic software, and the safety of human oversight.
Live demos will show how LLMs can be embedded in interactive applications, assisting with real-world data workflows while maintaining transparency and control. These applications produce analyses that can be not only verified and trusted, but even reused and extended.
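As a rough illustration of the pattern described above (a hypothetical sketch, not code from the talk's demos), the snippet below confines an LLM to a single narrow task: translating a question into a read-only SQL query that deterministic code validates and runs, and that a human can inspect. The table, model name, and helper functions (`ask_for_sql`, `run_query`) are assumptions made for this example.

```python
# Hypothetical sketch: the LLM only drafts a SQL query; deterministic code
# validates and executes it, and the query itself stays visible to the user.
import re
import sqlite3

import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Assumed example table; in a real app this would come from the loaded data.
SCHEMA = "penguins(species TEXT, island TEXT, bill_length_mm REAL, body_mass_g REAL)"

def ask_for_sql(question: str) -> str:
    """Ask the model for a single SELECT statement answering `question`."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": f"Given the table {SCHEMA}, reply with one SQLite "
                        "SELECT statement and nothing else."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()

def run_query(sql: str) -> pd.DataFrame:
    """Guardrail: refuse anything that isn't a single SELECT, then run it
    against a read-only connection so the model can never modify data."""
    if not re.fullmatch(r"(?is)\s*select\b[^;]*;?\s*", sql):
        raise ValueError(f"Refusing non-SELECT SQL: {sql!r}")
    with sqlite3.connect("file:penguins.db?mode=ro", uri=True) as conn:
        return pd.read_sql_query(sql, conn)

sql = ask_for_sql("Average body mass by species, heaviest first")
print(sql)             # show the query so a human can inspect and rerun it
print(run_query(sql))  # the analysis itself is ordinary, reproducible code
```

The point is the division of labor: the model only proposes the query, while executing it, checking it, and deciding whether to trust it remain deterministic and visible.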
My hope is that attendees leave the talk inspired to build their own thoughtful LLM solutions that accelerate their research without sacrificing rigor.
Links to packages used for demos
Sample of previous talks
Joe Cheng is CTO and first employee at Posit (formerly known as RStudio) and the creator of Shiny, a reactive web framework for creating data and AI applications using Python or R. He has been writing and maintaining open source software at the intersection of data analysis and the web for over 15 years.