From the course: ETL in Python and SQL
Understanding your data
- [Instructor] Let's discuss how to explore data using Pandas. Understanding the data is a big part of data warehousing and ETL creation, and there are a few reasons why we need to explore and understand our data before transformation and warehousing. First, before warehousing our data, we must be familiar with its structure, format, and data types, and with the relationships between tables and columns. This is very important for the transformation stage of the ETL. Second, we need to consider data quality. Are there duplicates, inconsistencies, missing values, or null records? We need to understand this so we can address any data quality issues. Third, we need to figure out whether transformations are required and, if so, which ones. We need to take into consideration whether there is existing business logic that should be applied to the data. We also need to figure out how to structure the ETL process, as well as which columns, if any, need to be standardized. For example, converting time columns from string to…
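As a rough illustration of the checks described here, the sketch below shows one way a first pass over a table might look in Pandas. The `orders` table, its column names, and its values are made up for this example and are not taken from the course files. It only inspects the data (inferred types, duplicate rows, missing values) and does not apply any transformation.

```python
import pandas as pd

# Hypothetical sample data; the table and column names are assumptions
# made for illustration, not the course's exercise files.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "customer": ["Ann", "Bob", "Bob", None, "Dee"],
        "amount": [10.0, 25.5, 25.5, None, 7.25],
        "order_time": [
            "2024-01-05 09:30",
            "2024-01-06 14:00",
            "2024-01-06 14:00",
            "2024-01-07 08:15",
            "2024-01-08 17:45",
        ],
    }
)

# Structure: the data type Pandas inferred for each column.
# order_time is held as text here, so it is a candidate for standardization.
print(orders.dtypes)

# Data quality: count rows that exactly repeat an earlier row.
duplicate_rows = int(orders.duplicated().sum())
print(f"duplicate rows: {duplicate_rows}")

# Data quality: count missing values in each column.
missing_per_column = orders.isna().sum()
print(missing_per_column)
```

Counts like these are only a starting point. Whether a repeated row is a true duplicate, or whether a missing amount should be filled or dropped, depends on the business logic for the data in question.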