Exploratory Data Analysis

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is a technique for exploring a dataset and understanding its main characteristics: its size, the data types it contains, whether it has already been preprocessed, which labels and features are present, how its values divide into quartiles, and so on. It also helps uncover hidden patterns, such as relationships between variables. EDA relies on statistical and data visualization methods to do this.
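As a rough illustration of that first look at a dataset, the sketch below reports the size, the column types, and the number of missing values. The rows and column names are hypothetical, invented for this example, and only the Python standard library is used.

```python
# Hypothetical toy dataset: each dict is one row (a patient record).
rows = [
    {"age": 52, "bmi": 27.4, "has_disease": "yes"},
    {"age": 41, "bmi": None, "has_disease": "no"},
    {"age": 67, "bmi": 31.0, "has_disease": "yes"},
    {"age": 35, "bmi": 22.8, "has_disease": "no"},
]

columns = list(rows[0].keys())
n_rows, n_cols = len(rows), len(columns)

# Python type names of the non-missing values in each column.
column_types = {
    col: sorted({type(r[col]).__name__ for r in rows if r[col] is not None})
    for col in columns
}

# Number of missing (None) entries per column.
missing_counts = {col: sum(r[col] is None for r in rows) for col in columns}

print("size:", (n_rows, n_cols))
print("types:", column_types)
print("missing:", missing_counts)
```

On a real dataset the same questions are usually answered with a data analysis library, but the idea is the same: check the shape, the types, and the gaps before anything else.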

Why EDA?

EDA is used:
1) to understand the data before drawing conclusions.
2) to clean the data so that the dataset is free from redundancies and missing values.
3) to learn the distribution of the data and to handle outliers.
4) to find relationships between variables.
5) to decide which machine learning algorithm suits the dataset.
Together, these steps prepare the dataset for modeling.
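The cleaning and outlier steps above can be sketched as follows. The measurements are hypothetical, and the 1.5 * IQR rule is one common convention for flagging outliers, assumed here for illustration. It is not the only option.

```python
import statistics

# Hypothetical measurements: None marks a missing value, 250 is an obvious outlier.
raw = [12, 15, None, 14, 15, 13, 250, 16, None, 14]

# Step 1: drop missing values.
values = [v for v in raw if v is not None]

# Step 2: compute the quartiles and the interquartile range (IQR).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Step 3: flag anything further than 1.5 * IQR outside the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]
cleaned = [v for v in values if lower <= v <= upper]

print("outliers:", outliers)
print("cleaned:", cleaned)
```

Whether an outlier should be removed, capped, or kept depends on the data. Flagging it is only the first step.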

Types of EDA:

1) Univariate analysis: analysis of a single variable or feature, such as the count of each distinct value. Example: the number of heart patients. Histograms, density plots, and box plots can be used for this analysis.
2) Bivariate analysis: analysis of two features, such as finding the correlation between two variables. Scatter plots, heat maps, etc. are used here.
3) Multivariate analysis: analysis of more than two features. Example: diabetes prediction based on insulin, age, BMI, etc. Pair plots or similar plots can be used in this case.
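For the bivariate case, the correlation between two variables can be computed by hand. The sketch below implements the Pearson correlation coefficient. The helper name `pearson` and the BMI and insulin values are hypothetical, made up for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for five patients.
bmi = [21.0, 24.5, 27.0, 30.5, 33.0]
insulin = [80, 95, 110, 130, 150]

r = pearson(bmi, insulin)
print(round(r, 3))
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little linear relationship.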

Data visualization:

Data visualization is a technique for representing data in graphical form. Visual elements like charts, graphs, and maps are used to find trends, patterns, and outliers in the data. Since people generally take in images much faster than text, data visualization makes the data easier to understand. Dashboarding is a key part of visualization.

Demo for Exploratory Data Analysis:


Some other data visualization graphs:

Scatter plot: As the name suggests, it consists of scattered dots. Each dot represents an individual data point. It is used to show the relationship between two variables.
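import matplotlib.pyplot as plt

# Each pair (x[i], y[i]) is one data point.
x = [89, 43, 36, 36, 95, 10, 66, 34, 38, 20, 5, 7, 8, 7, 2, 17, 2, 11, 12, 9]
y = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 99, 86, 27, 88, 100, 86, 10, 87, 47, 85]

plt.scatter(x, y, c="red")  # draw every point as a red dot
plt.show()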

  Output:

Boxplot: shows a summary of the data, including the quartile division and the median. In the plot below, the orange line is the median.

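import matplotlib.pyplot as plt

data = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10, 99, 86, 27, 88, 100, 86, 10, 87, 47, 85]
plt.boxplot(data)  # the box spans the first to the third quartile
plt.show()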

Output:
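The values a box plot summarizes can also be computed directly. The sketch below derives the minimum, quartiles, median, and maximum of the same data with the standard library. The choice of method="inclusive" is an assumption made for this example, and other quartile conventions give slightly different numbers.

```python
import statistics

data = [21, 46, 3, 35, 67, 95, 53, 72, 58, 10,
        99, 86, 27, 88, 100, 86, 10, 87, 47, 85]

# Quartiles: q1 and q3 are the edges of the box, q2 is the median line.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
median = statistics.median(data)

print("min:", min(data))      # 3
print("Q1:", q1)              # 33.0
print("median:", median)      # 62.5
print("Q3:", q3)              # 86.25
print("max:", max(data))      # 100
```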

Pie chart: shows the proportion or percentage of each item in the data. Adding all the slices together gives the total.

import matplotlib.pyplot as plt

x = ['car', 'bus', 'truck']   # slice labels
percentage = [35, 45, 20]     # share of each label; sums to 100
plt.pie(percentage, labels=x)
plt.show()

Output:
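Percentages like the ones above are usually derived from raw counts. The sketch below shows that step. The vehicle counts are hypothetical, chosen so that they reproduce the 35/45/20 split used in the pie chart.

```python
# Hypothetical raw counts of vehicles observed, keyed by category.
counts = {"car": 70, "bus": 90, "truck": 40}

total = sum(counts.values())

# Share of each category as a percentage of the total.
percentages = {name: 100 * n / total for name, n in counts.items()}

print(percentages)                 # {'car': 35.0, 'bus': 45.0, 'truck': 20.0}
print(sum(percentages.values()))   # 100.0
```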





