SciPy 2024

Pretraining and Finetuning LLMs from the Ground Up
07-08, 13:30–17:30 (US/Pacific), Ballroom A

This tutorial is aimed at coders interested in understanding the building blocks of large language models (LLMs), how LLMs work, and how to code them from the ground up in PyTorch. We will kick off this tutorial with an introduction to LLMs, recent milestones, and their use cases. Then, we will code a small GPT-like LLM ourselves, including its data input pipeline, core architecture components, and pretraining loop. After understanding how everything fits together and how to pretrain an LLM, we will learn how to load pretrained weights and finetune LLMs using open-source libraries.


This tutorial will be aimed at Python programmers interested in learning about LLMs by coding them in PyTorch. The focus is on coding the building blocks from the ground up in PyTorch rather than using LLM code libraries. Some experience with PyTorch (or array libraries like NumPy) is beneficial but not strictly required.

We will start this 4-hour session with a big-picture introduction to LLMs to understand what LLMs are and how they are structured, trained, and used. Then, we will embark on the journey of implementing a small but fully functional GPT-like LLM for educational purposes.

In this code implementation, we will look at and implement the input text data processing pipeline, code the main architecture of the LLM, and write the pretraining code. Note that pretraining an LLM on expensive GPU hardware can take weeks to months, which is outside the scope of this workshop. However, once you learn how the pretraining works, we will load openly available model weights into our architecture so that our model becomes fully usable in an instant.
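To give a flavor of the data input pipeline portion, here is a minimal sketch (not the tutorial's exact code) of how text, once tokenized into integer token IDs, can be sliced into input/target pairs for next-token prediction in PyTorch; the class name GPTDataset and parameters such as context_length and stride are illustrative assumptions:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class GPTDataset(Dataset):
        # Slices a long sequence of token IDs into fixed-length chunks; the target
        # for each chunk is the same chunk shifted by one token (next-token prediction).
        def __init__(self, token_ids, context_length=256, stride=256):
            self.inputs, self.targets = [], []
            for i in range(0, len(token_ids) - context_length, stride):
                self.inputs.append(torch.tensor(token_ids[i:i + context_length]))
                self.targets.append(torch.tensor(token_ids[i + 1:i + context_length + 1]))

        def __len__(self):
            return len(self.inputs)

        def __getitem__(self, idx):
            return self.inputs[idx], self.targets[idx]

    # Dummy token IDs stand in for tokenized text (a real pipeline would run a tokenizer first)
    token_ids = list(range(10_000))
    loader = DataLoader(GPTDataset(token_ids), batch_size=8, shuffle=True)
    inputs, targets = next(iter(loader))
    print(inputs.shape, targets.shape)  # torch.Size([8, 256]) torch.Size([8, 256])

During pretraining, the model's predicted logits at each position are then compared against these shifted targets with a cross-entropy loss.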

Next, after we have coded our own LLM, we will use the open-source Lit-GPT library, which provides readable yet efficient and optimized code for training and finetuning LLMs in practice. Lastly, we will learn how the popular Low-Rank Adaptation (LoRA) method for efficient LLM finetuning works and how we can finetune LLMs using Lit-GPT. (Lit-GPT was one of the most widely used libraries in the 2023 NeurIPS LLM Efficiency Challenge and was also recently used by researchers to develop TinyLlama.)
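As a preview of the LoRA idea, the sketch below (an illustrative implementation, not Lit-GPT's code) wraps a frozen pretrained linear layer with a trainable low-rank update, so that only the two small matrices A and B are updated during finetuning; the class name LoRALinear and the rank and alpha defaults are assumptions for illustration:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen pretrained linear layer with a trainable low-rank update:
        # output = W x + (alpha / rank) * (x A) B, where A and B are small matrices.
        def __init__(self, linear: nn.Linear, rank=8, alpha=16.0):
            super().__init__()
            self.linear = linear
            for param in self.linear.parameters():
                param.requires_grad = False  # the pretrained weights stay frozen
            self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
            self.B = nn.Parameter(torch.zeros(rank, linear.out_features))  # zero init: no change at start
            self.scaling = alpha / rank

        def forward(self, x):
            return self.linear(x) + self.scaling * (x @ self.A @ self.B)

    # Example: wrap a 768x768 projection layer and run a forward pass
    layer = LoRALinear(nn.Linear(768, 768))
    x = torch.randn(2, 768)
    print(layer(x).shape)  # torch.Size([2, 768])

Because the rank is much smaller than the layer's dimensions, this adds only a tiny number of trainable parameters compared to updating the full weight matrix, which is what makes LoRA finetuning so efficient.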

The overarching goal of this workshop is to equip participants with a thorough understanding of LLMs by engaging them in the development process. By the end of the session, attendees will not only have gained conceptual knowledge but also practical experience in building and finetuning an LLM.


Prerequisites

Some experience with PyTorch (or array libraries like NumPy) is beneficial but not strictly required.

Installation Instructions

Please see the instructions at https://github.com/rasbt/LLM-workshop-2024/tree/main/setup to set up your computer to run the code locally. In addition, a ready-to-go cloud environment, complete with all code examples and dependencies installed, will be shared during the workshop. This will enable participants to run all code, particularly in the pretraining and finetuning sections, on a GPU.

My name is Sebastian Raschka, and I am a machine learning and AI researcher. In addition to my research, I have a strong passion for education and am best known for my bestselling books on machine learning using open-source software.

After my PhD, I joined the University of Wisconsin-Madison as a professor in the Department of Statistics, where I focused on deep learning and machine learning research until 2023.

Taking a yearlong break from academia, I joined Lightning AI in 2022, where I am now a Staff Research Engineer focusing on the intersection of AI research, software development, and large language models (LLMs).

If you are interested in learning more about me or my projects, please visit my website at https://sebastianraschka.com