Life of Mechon is an information resource site for Mechons and Geeks. Here we focus on Machine Learning, Artificial Intelligence, 3D printing, and tips and tricks related to programming and front-end CSS.
Day 5 Indexing, Manipulation and Visualization Part -3
We are now comfortable with the merge and sort functions, so let's look at data aggregation.
For today's exercise we'll use a new dataset; download it from here. [source: Analytics Vidhya]
Pandas offers several functions for aggregating data.
Some of them are
- groupby
- crosstab
- pivot_table
import pandas as pd
import numpy as np
# read the dataset
data_BM = pd.read_csv('bigmart_data.csv')
# drop the null values
data_BM = data_BM.dropna(how="any")
# reset index after dropping
data_BM = data_BM.reset_index(drop=True)
# view the top results
data_BM.head()
Let's find the mean price of each item type, using the 'Item_Type' column.
Call groupby and pass 'Item_Type' as the parameter.
# group price based on item type
price_by_item = data_BM.groupby('Item_Type')
# display first few rows
price_by_item.first()
Now that the data is grouped by the Item_Type column, the next step is to calculate the mean with mean().
# mean price by item
price_by_item.Item_MRP.mean()
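To see what groupby followed by mean() actually computes, here is a minimal sketch on a tiny hand-made table. The rows are made up for illustration (they are not from the BigMart file); only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the BigMart dataset
toy = pd.DataFrame({
    "Item_Type": ["Dairy", "Dairy", "Snacks", "Snacks", "Snacks"],
    "Item_MRP": [100.0, 200.0, 30.0, 60.0, 90.0],
})

# Split the rows by Item_Type, then average Item_MRP inside each group
mean_price = toy.groupby("Item_Type")["Item_MRP"].mean()
print(mean_price)
```

The result is a Series indexed by item type, with one mean per group: 150.0 for Dairy and 60.0 for Snacks.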
As with other functions, we can group on multiple columns by passing a list of column names in square brackets.
# group on multiple columns
multiple_groups = data_BM[:10].groupby(['Item_Type', 'Item_Fat_Content'])
multiple_groups.first()
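Grouping on two columns can be sketched the same way. The values below are again invented for illustration; the point is only that each distinct pair of keys becomes its own group.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the BigMart dataset
toy = pd.DataFrame({
    "Item_Type": ["Dairy", "Dairy", "Dairy", "Snacks"],
    "Item_Fat_Content": ["Low Fat", "Low Fat", "Regular", "Regular"],
    "Item_MRP": [100.0, 200.0, 50.0, 80.0],
})

# Each distinct (Item_Type, Item_Fat_Content) pair becomes one group
grouped = toy.groupby(["Item_Type", "Item_Fat_Content"])
mean_by_pair = grouped["Item_MRP"].mean()
print(mean_by_pair)
```

The result carries a two-level index, so a single value is looked up with a tuple such as mean_by_pair[("Dairy", "Low Fat")].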
If you want to know more about groupby(), nothing stops you from reading the documentation.
crosstab sits between visualization and aggregation. It does not draw graphs or charts, but it tabulates how often each combination of two factors occurs, which makes the key differences between them easy to read.
# generate crosstab of Outlet_Size and Outlet_Location_Type
pd.crosstab(data_BM["Outlet_Size"], data_BM["Outlet_Location_Type"], margins=True)
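Here is a small sketch of how crosstab counts combinations, using five invented rows rather than the real dataset. With margins=True, pandas adds a row and a column labelled "All" holding the totals.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the BigMart dataset
toy = pd.DataFrame({
    "Outlet_Size": ["Small", "Small", "Medium", "High", "Medium"],
    "Outlet_Location_Type": ["Tier 1", "Tier 2", "Tier 1", "Tier 3", "Tier 3"],
})

# Count how many rows fall into each (size, location) combination
counts = pd.crosstab(toy["Outlet_Size"], toy["Outlet_Location_Type"], margins=True)
print(counts)
```

Combinations that never occur, such as a "High" outlet in "Tier 1" here, show up as 0 rather than being left out.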
Interesting. Now let's find out how average sales differ from year to year, using pivot tables.
Pandas pivot tables are even richer than Excel's,
and you can build one with a single line of code.
pivot_table has defaults for everything except the data and index you pass in; for example, it aggregates with the mean unless you supply a different aggfunc.
# create pivot table (mean sales per establishment year)
pd.pivot_table(data_BM, index=['Outlet_Establishment_Year'], values="Item_Outlet_Sales")
# create pivot table with a multi-level index
pd.pivot_table(data_BM, index=['Outlet_Establishment_Year', 'Outlet_Location_Type', 'Outlet_Size'], values="Item_Outlet_Sales")
# several aggregations at once; string names avoid the FutureWarning newer pandas raises for np.mean, min, etc.
pd.pivot_table(data_BM, index=['Outlet_Establishment_Year', 'Outlet_Location_Type', 'Outlet_Size'], values="Item_Outlet_Sales", aggfunc=['mean', 'median', 'min', 'max', 'std'])
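The default aggregation and the effect of passing a list to aggfunc are easier to see on a tiny table. This is a sketch on invented numbers, not output from the BigMart file.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the BigMart dataset
toy = pd.DataFrame({
    "Outlet_Establishment_Year": [1987, 1987, 1999, 1999, 1999],
    "Item_Outlet_Sales": [100.0, 300.0, 50.0, 150.0, 250.0],
})

# With no aggfunc given, pivot_table averages the values per index entry
default_pivot = pd.pivot_table(
    toy, index="Outlet_Establishment_Year", values="Item_Outlet_Sales"
)

# A list of aggregations yields one column per (aggregation, value) pair
multi_pivot = pd.pivot_table(
    toy,
    index="Outlet_Establishment_Year",
    values="Item_Outlet_Sales",
    aggfunc=["mean", "max"],
)

print(default_pivot)
print(multi_pivot)
```

In the second table the columns form two levels, so one cell is addressed as multi_pivot.loc[1999, ("max", "Item_Outlet_Sales")].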
That's it: we have now finished the manipulation and indexing sections.
Let's get into the 🐼 🐼 best and most used feature: visualization. Sorry, pandas: matplotlib does not depend on pandas and can plot any dataset, whichever library reads it.
We'll use two new libraries, matplotlib and seaborn.
I hope you have already installed matplotlib.
Seaborn is not difficult to install either.
Matplotlib
Matplotlib is chosen for its widespread use and high flexibility. Making plots that are easy to interpret is one of the most important needs in data analysis.
We'll look at the bars, charts, and plots available in matplotlib.
We will create these charts by the end of today's post:
- Line chart
- Bar chart
- Histogram
- Box Plot
- Violin Plot
- Scatter Plot
- Bubble Plot
Import matplotlib's pyplot module under a short alias.
import matplotlib.pyplot as plt
# The line below renders plots inside the notebook instead of opening them in a separate window.
%matplotlib inline
Here are the first few lines of code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Create two lists
height = [150, 160, 165, 185]
weight = [70, 80, 90, 100]
# Draw the plot
plt.plot(height, weight)
We passed two lists to plot(): the first is plotted along the x axis and the second along the y axis.
Isn't that interesting? We can make it more informative by adding a title, labels, and a legend.
plt.title("Relationship between height and weight")
# Label for x axis
plt.xlabel("Height")
# label for y axis
plt.ylabel("Weight")
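Putting the pieces together, here is one way the full plot with a title, axis labels, and a legend might look as a standalone script. It uses matplotlib's Axes methods instead of the plt.* calls above, which is a stylistic choice of this sketch; the legend text is made up, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; drop this line in a notebook
import matplotlib.pyplot as plt

# Same two lists as in the post
height = [150, 160, 165, 185]
weight = [70, 80, 90, 100]

fig, ax = plt.subplots()

# The label is what the legend displays for this line (hypothetical wording)
ax.plot(height, weight, label="weight by height")

ax.set_title("Relationship between height and weight")
ax.set_xlabel("Height")  # x axis: first argument of plot()
ax.set_ylabel("Weight")  # y axis: second argument of plot()
ax.legend()
```

In a notebook with %matplotlib inline the figure appears under the cell; in a script you would save it with fig.savefig and a file name of your choice.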
I'm out of time for now; the remaining charts will be added by 5 pm today.