Building an AI Agent for Natural Language to SQL Query Execution on Live Databases SciPy 2025

2025-07-08 08:00–12:00, Ballroom A

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this 4-hour session, attendees will have created a powerful and extensible system they can adapt to their own data sources.

Overview

Natural-language interfaces open database insights to non-technical users. This tutorial walks through a practical implementation of such a system, with an emphasis on reliability.

Participants will build an AI agent system that can:

- Route intelligently between SQL generation and ReAct chat agent workflows
- Ingest and understand database schemas with domain knowledge
- Retrieve relevant context and similar query examples using RAG with vector similarity
- Generate accurate SQL with validation and safety guardrails
- Execute queries safely with human-in-the-loop approval
- Present results in an understandable format
- Track costs and monitor performance using LangSmith
- Manage session-based memory and conversation context
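
To make the validation and human-in-the-loop steps above more concrete, here is a minimal sketch in plain Python. It is not the tutorial's implementation: the keyword list, the function names `validate_sql` and `execute_with_approval`, and the `run_query` callback are all assumptions made for illustration, and a keyword check like this one can reject legitimate queries (for example, a string literal containing a semicolon).

```python
import re

# Illustrative deny-list of write/DDL keywords (an assumption, not exhaustive).
FORBIDDEN = {"insert", "update", "delete", "drop", "alter",
             "truncate", "create", "grant", "revoke"}


def validate_sql(sql: str) -> tuple[bool, str]:
    """Return (is_safe, reason) for a candidate read-only query."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped:
        return False, "empty query"
    # After dropping a trailing semicolon, any remaining one implies
    # a second statement (or a literal containing ';', a known false positive).
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    tokens = re.findall(r"[a-z_]+", stripped.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False, "only SELECT queries are allowed"
    bad = FORBIDDEN.intersection(tokens)
    if bad:
        return False, f"forbidden keyword: {sorted(bad)[0]}"
    return True, "ok"


def execute_with_approval(sql, run_query, ask=input):
    """Validate, ask a human to confirm, then run the query.

    run_query is a hypothetical callback that executes SQL and returns rows;
    ask is injectable so the confirmation step can be tested without a terminal.
    """
    ok, reason = validate_sql(sql)
    if not ok:
        return {"status": "rejected", "reason": reason}
    answer = ask(f"Run this query?\n{sql}\n[y/N] ")
    if answer.strip().lower() != "y":
        return {"status": "cancelled", "reason": "user declined"}
    return {"status": "executed", "rows": run_query(sql)}
```

Validation runs before the confirmation prompt so that a human is only ever asked to approve queries that already passed the automated checks.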

We'll use the Kaggle dataset "Brazilian E-Commerce dataset by Olist" as our working example, demonstrating how to handle multiple tables across two schemas with complex relationships. This dataset will be hosted on an EC2 AWS instance for live interaction during the tutorial.
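
As a rough illustration of what schema ingestion can look like, the sketch below turns table, column, and foreign-key metadata into a compact text description that could be placed in an LLM prompt. To stay self-contained it uses an in-memory SQLite database from the standard library as a stand-in for the PostgreSQL instance; the two tables and their columns are simplified, hypothetical examples rather than the actual Olist schema, and `describe_schema` is a name invented here.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Build a plain-text summary of tables, columns, and foreign keys."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_text = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"{table}({col_text})")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            lines.append(f"  {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
    return "\n".join(lines)


# Hypothetical two-table schema used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, city TEXT);
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    customer_id TEXT REFERENCES customers(customer_id),
    status TEXT
);
""")
schema_text = describe_schema(conn)
print(schema_text)
```

Listing foreign keys next to each table gives the model an explicit hint about how tables join, which matters more as the number of tables grows.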

This tutorial addresses real-world database complexity with production-grade considerations. Participants will start from a repository with backbone code and implement the key components during the session. By the end, attendees will have a working system they can adapt to their own datasets.

Tools and Frameworks

This tutorial will leverage modern tools and frameworks for efficient development:

AI and Agent Frameworks:
- LangGraph for agent orchestration and workflow management
- LangChain for agent components and LLM interactions
- LangSmith for comprehensive cost tracking and monitoring
- OpenAI models with examples of alternatives

Database and Vector Store:
- SQLAlchemy for database interactions and schema retrieval
- PostgreSQL as the database engine for the live dataset
- PGVector for similarity-based query retrieval
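
The idea behind similarity-based query retrieval can be sketched without a database: embed the user's question, score it against stored question/SQL pairs, and return the closest matches as few-shot examples. The sketch below is an assumption-laden toy, not the tutorial's code: `embed` is a bag-of-words stand-in for a real embedding model, the in-memory `EXAMPLES` list stands in for a PGVector table, and the SQL strings reference hypothetical tables.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical stored (question, SQL) pairs; table and column names are made up.
EXAMPLES = [
    ("how many orders were delivered",
     "SELECT COUNT(*) FROM orders WHERE status = 'delivered'"),
    ("what is the average review score",
     "SELECT AVG(score) FROM reviews"),
    ("list the top sellers by revenue",
     "SELECT seller_id, SUM(price) FROM items GROUP BY seller_id ORDER BY 2 DESC"),
]


def top_k(question: str, k: int = 2):
    """Return the k stored examples most similar to the question."""
    q = embed(question)
    ranked = sorted(EXAMPLES, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


best_question, best_sql = top_k("how many orders were cancelled", k=1)[0]
print(best_question)
print(best_sql)
```

In a vector store the ranking step is pushed into the database as a nearest-neighbor query, but the contract is the same: a question goes in, and the most similar known-good queries come out.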

Development:
- YAML for configuration management
- pyproject.toml for standardized project configuration
- UV for reliable package management and Ruff for code formatting and linting

Prerequisites:

- Python programming experience
- Basic understanding of API interactions
- Basic familiarity with SQL and database management
- Laptop with Git, UV, and your preferred Python IDE installed (I recommend VS Code)

No prior experience with LLMs, RAG, or advanced NLP is required.

Installation Instructions:

Setup instructions are available at the link below; workshop materials are linked at the end of the initial setup: https://github.com/cmcouto-silva/nl2sql-agent/blob/main/docs/prerequisites.md
