Broadly speaking, data science is the use of advanced statistical and machine learning techniques to solve problems with data. Yet it is tempting to dive straight into applying some fancy machine learning algorithm (and voilà, you have a prediction) without first understanding the data.
This is exactly where Exploratory Data Analysis (EDA), as defined by Jaideep Khare, comes in. Unfortunately, it is a commonly undervalued step in the data science process.
EDA is important for at least 3 reasons, as stated below:
There you have it. Now that we understand the “WHAT” and “WHY” of EDA, let’s examine a dataset together and go through the “HOW”, which will eventually lead us to some interesting patterns, as we’ll see in the next section.
We’ll focus on the overall EDA workflow, the visualizations, and their results. For technical reference, please refer to my notebook on Kaggle anytime you want a more detailed understanding of the code.
To give a brief overview, this post is divided into 5 sections, as follows:
Let’s get started and have fun!