8. 数据透视表

我们目前所用的累计操作都是按照一个维度进行,数据透视表可以看做是按照二维进行累计的操作功能。

# 以泰坦尼克号数据为例子进行展示

import numpy as np
import pandas as pd
import seaborn as sns

titanic = sns.load_dataset("titanic")
print(titanic.shape)
(891, 15)

8.1. 数据透视表的初步使用

# 先进行粗分类
a = titanic.groupby("sex")[['survived']].mean()

# 可以看出,女性获救比例大概是男性的4倍多
print(a)
        survived
sex             
female  0.742038
male    0.188908
# 尝试按sex,class分组,然后统计逃生人数,求mean后使用层级索引
# 可以清晰的展示出,逃生人数受sex,class 的影响

a = titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()
print(a)
class      First    Second     Third
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447

8.2. pivot_table

pivot_table实现的效果等同于上一届的管道命令,是一个简写。

# 尝试按sex,class分组,然后统计逃生人数,求mean后使用层级索引
a = titanic.pivot_table('survived', index='sex', columns='class')
print(a)
class      First    Second     Third
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447

8.3. 创建多级索引形状的DataFrame结果

# 按照sex,age,class统计,age分三个年龄段,0-18-80

# 使用cut函数对年龄进行分段
# cut适用于明确知道分界值的时候进行分类,具体落在每个分界内的数据数量不一定
age = pd.cut(titanic['age'], [0,18,80])

a = titanic.pivot_table('survived', ['sex', age], 'class')
print(a)
class               First    Second     Third
sex    age                                   
female (0, 18]   0.909091  1.000000  0.511628
       (18, 80]  0.972973  0.900000  0.423729
male   (0, 18]   0.800000  0.600000  0.215686
       (18, 80]  0.375000  0.071429  0.133663
# 对列也可以使用类似策略
# 使用qcut对票价进行划分成两部分,每一部分人数相等
# qcut把数据分成两部分,一般是等分
fare = pd.qcut(titanic['fare'], 2)

a = titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
print(a)
fare            (-0.001, 14.454]                     (14.454, 512.329]  \
class                      First    Second     Third             First   
sex    age                                                               
female (0, 18]               NaN  1.000000  0.714286          0.909091   
       (18, 80]              NaN  0.880000  0.444444          0.972973   
male   (0, 18]               NaN  0.000000  0.260870          0.800000   
       (18, 80]              0.0  0.098039  0.125000          0.391304   

fare                                 
class              Second     Third  
sex    age                           
female (0, 18]   1.000000  0.318182  
       (18, 80]  0.914286  0.391304  
male   (0, 18]   0.818182  0.178571  
       (18, 80]  0.030303  0.192308