Matt Harrison
Matt is a corporate trainer, author, and consultant on Python and Data Science. He has a CS degree from Stanford University. He is a best-selling author on Python and Data subjects. His books: Effective Pandas, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm have all been best-selling books on Amazon. He just published Machine Learning Pocket Reference and Pandas Cookbook (Second Edition). He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), Universities (Stanford, University of Utah, BYU), as well as small companies. He has been using Python since 2000 and has taught thousands through live training both online and in person.
Sessions
Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I've seen with Pandas code after working with the library for a while and writing three books on it.
We will discuss:
- Proper types
- Chaining
- Aggregation
- Debugging
DataFrame libraries in general, pandas and Dask specifically, are moving towards a better integration with PyArrow. This has many benefits, like improved performance and a reduced memory footprint. We want to connect with users to discuss how PyArrow can improve DataFrame libraries and what they expect out of PyArrow support. This can include things like improved performance, more consistent behavior or better interoperability with other libraries.