
ETL Using Python and Pandas: Part 2

In 2017, I whipped up a quick post to summarize some reusable Extract, Transform, Load (ETL) functions using Python and Pandas. To my pleasant surprise, it still receives a steady stream of visitor traffic. That alone suggests there is enough interest in the topic.

In this follow-on post, I am going to push the complexity a little further. The key differences are:

  • Reading data from multi-sheet Excel files, instead of CSV files
  • Checking for blank data
  • Removing unwanted columns
  • Visualizing data with basic bar graphs and pivot tables
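
To give a flavor of these four steps before diving in, here is a minimal sketch. It is not the code from this post: the sheet names, column names, and values are made up for illustration, and the dictionary of DataFrames stands in for what `pd.read_excel(path, sheet_name=None)` returns for a multi-sheet workbook.

```python
import pandas as pd

# Stand-in for pd.read_excel("some_file.xlsx", sheet_name=None), which returns
# a dict mapping each sheet name to a DataFrame. All data here is hypothetical.
sheets = {
    "2017": pd.DataFrame({
        "region": ["East", "West", "East"],
        "sales": [100.0, None, 50.0],
        "notes": ["a", "b", "c"],
    }),
    "2018": pd.DataFrame({
        "region": ["East", "West"],
        "sales": [70.0, 30.0],
        "notes": ["d", "e"],
    }),
}

# Stack the sheets into one DataFrame, keeping the sheet name as a column.
combined = (
    pd.concat(sheets, names=["sheet"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)

# Check for blank data: count missing values in each column.
blank_counts = combined.isna().sum()

# Remove an unwanted column.
trimmed = combined.drop(columns=["notes"])

# Summarize with a pivot table: total sales per region, one column per sheet.
summary = trimmed.pivot_table(
    index="region", columns="sheet", values="sales", aggfunc="sum"
)

print(blank_counts)
print(summary)
```

With matplotlib installed, `summary.plot(kind="bar")` would turn the same pivot table into a basic bar graph.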

A Jupyter (IPython) notebook version is also available.

The sample data files are published on GitHub: