级联操作

级联操作

  • pd.concat,pd.append
    pandas使用pd.concat函数,与np.concatenate函数类似,只是多了一些参数:
    objs
    axis
    keys
    join=’outer’/‘inner’:表示的是级联方式,outer会将所有的项进行级联(忽略匹配和不匹配),而inner只会将匹配的项级联在一起,不匹配的不级联
    ignore_index=False
1
2
3
import numpy as np
import pandas as pd
from pandas import DataFrame
  • 匹配级联
1
2
3
4
df1 = DataFrame(np.random.randint(0,100,size=(5,3)),columns=['A','B','C'])
df2 = DataFrame(np.random.randint(0,100,size=(5,3)),columns=['A','D','C'])
print(df1)
print(df2)
    A   B   C
0  79  32  25
1  25  25  37
2  10  83  72
3   0  48   3
4  59  31  86
    A   D   C
0  35  54  59
1   5   7  76
2  76  82  89
3  17  93  46
4  99  56   8
1
print(pd.concat((df1,df2),axis=1))  # 横向级联
    A   B   C   A   D   C
0  79  32  25  35  54  59
1  25  25  37   5   7  76
2  10  83  72  76  82  89
3   0  48   3  17  93  46
4  59  31  86  99  56   8
  • 不匹配级联
    • 不匹配级联指的是级联的维度索引不一致。例如纵向级联时列索引不一致,横向级联时行索引不一致
    • 有2中连接方式:
      • 外连接:补NAN(默认模式),如果想要保留数据的完整性必须使用outer(外连接)
      • 内连接:只连接匹配的项
1
2
print(pd.concat((df1,df2),axis=0))  # 纵向级联,出现不匹配级联
print(pd.concat((df1,df2),axis=0,join='inner'))
    A     B   C     D
0  79  32.0  25   NaN
1  25  25.0  37   NaN
2  10  83.0  72   NaN
3   0  48.0   3   NaN
4  59  31.0  86   NaN
0  35   NaN  59  54.0
1   5   NaN  76   7.0
2  76   NaN  89  82.0
3  17   NaN  46  93.0
4  99   NaN   8  56.0
    A   C
0  79  25
1  25  37
2  10  72
3   0   3
4  59  86
0  35  59
1   5  76
2  76  89
3  17  46
4  99   8
1
2
3
df3 = DataFrame(np.random.randint(0,100,size=(5,2)),columns=['A','B'])
print(pd.concat((df1,df3),axis=1))
print(pd.concat((df1,df3),axis=0))
  • append函数的使用
    • 默认纵向级联外连接
    • 知道即可,一般不用
1
print(df1.append(df2))
    A     B   C     D
0  79  32.0  25   NaN
1  25  25.0  37   NaN
2  10  83.0  72   NaN
3   0  48.0   3   NaN
4  59  31.0  86   NaN
0  35   NaN  59  54.0
1   5   NaN  76   7.0
2  76   NaN  89  82.0
3  17   NaN  46  93.0
4  99   NaN   8  56.0

合并操作

  • merge与concat的区别在于,merge需要依据某一共同列来进行合并;merge是对数据进行合并,而concat是对表格进行级联
  • 使用pd.merge()合并时,会自动根据两者相同column名称的那一列,作为key来进行合并
  • 注意每一列元素的顺序不要求一致
1
2
3
4
5
6
7
# 一对一合并
df1 = DataFrame({'employee':['Bob','Jake','Lisa'],'group':['Accounting','Engineering','Engineering']})
df2 = DataFrame({'employee':['Lisa','Bob','Jake'],'hire_date':[2004,2008,2012]})
print(df1)
print(df2)
print(pd.merge(df1,df2,on='employee'))
print(pd.merge(df1,df2)) # on也可以不写,默认会将两表中共有的列作为合并条件,此处结果相同
  employee        group
0      Bob   Accounting
1     Jake  Engineering
2     Lisa  Engineering
  employee  hire_date
0     Lisa       2004
1      Bob       2008
2     Jake       2012
  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004
  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004
1
2
3
4
5
6
# 一对多合并
df3 = DataFrame({'employee':['Lisa','Jake'],'group':['Accounting','Engineering'],'hire_date':[2004,2016]})
df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],'supervisor':['Carly','Guido','Steve']})
print(df3)
print(df4)
print(pd.merge(df3,df4))
  employee        group  hire_date
0     Lisa   Accounting       2004
1     Jake  Engineering       2016
         group supervisor
0   Accounting      Carly
1  Engineering      Guido
2  Engineering      Steve
  employee        group  hire_date supervisor
0     Lisa   Accounting       2004      Carly
1     Jake  Engineering       2016      Guido
2     Jake  Engineering       2016      Steve
1
2
3
4
5
6
7
# 多对多合并
df5 = DataFrame({'group':['Engineering','Engineering','HR'],'supervisor':['Carly','Guido','Steve']})
print(df5)
print(pd.merge(df1,df5)) # 默认内连接
print(pd.merge(df1,df5,how='outer'))
print(pd.merge(df1,df5,how='left')) # 保留左连接
print(pd.merge(df1,df5,how='right')) # 保留右连接
         group supervisor
0  Engineering      Carly
1  Engineering      Guido
2           HR      Steve
  employee        group supervisor
0     Jake  Engineering      Carly
1     Jake  Engineering      Guido
2     Lisa  Engineering      Carly
3     Lisa  Engineering      Guido
  employee        group supervisor
0      Bob   Accounting        NaN
1     Jake  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Carly
4     Lisa  Engineering      Guido
5      NaN           HR      Steve
  employee        group supervisor
0      Bob   Accounting        NaN
1     Jake  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Carly
4     Lisa  Engineering      Guido
  employee        group supervisor
0     Jake  Engineering      Carly
1     Lisa  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Guido
4      NaN           HR      Steve

key的规范化

  • 当列冲突时,即有多个列名称相同时需要使用on=来指定哪一个列作为Key,配合suffixes指定冲突列名
1
2
3
4
5
6
df1 = DataFrame({'employee':['Jack','Summer','Steve'],'group':['Accounting','Finance','Marketing']})
df2 = DataFrame({'employee':['Jack','Bob','Jake'],'hire_date':[2003,2009,2012],'group':['Accounting','Sell','CEO']})
print(df1)
print(df2)
print(pd.merge(df1,df2)) # 不指定合并条件则相同的几项共同作为合并条件
print(pd.merge(df1,df2,on='group'))
  employee       group
0     Jack  Accounting
1   Summer     Finance
2    Steve   Marketing
  employee  hire_date       group
0     Jack       2003  Accounting
1      Bob       2009        Sell
2     Jake       2012         CEO
  employee       group  hire_date
0     Jack  Accounting       2003
  employee_x       group employee_y  hire_date
0       Jack  Accounting       Jack       2003
  • 当两张表没有可进行连接的列时,可使用left_on和right_on手动指定merge中左右两边的哪一列作为连接的列
1
2
3
4
5
6
df1 = DataFrame({'employee':['Bobs','Linda','Bill'],'group':['Accounting','Product','Marketing'],'hire_date':[1998,2017,2018]})
df5 = DataFrame({'name':['Lisa','Bobs','Bill'],'hire_dates':[1998,2016,2007]})
print(df1)
print(df5)
# print(pd.merge(df1,df5)) # 此处合并出错,因为没有共同的列,需要指定左右标准
print(pd.merge(df1,df5,left_on='employee',right_on='name'))
  employee       group  hire_date
0     Bobs  Accounting       1998
1    Linda     Product       2017
2     Bill   Marketing       2018
   name  hire_dates
0  Lisa        1998
1  Bobs        2016
2  Bill        2007
  employee       group  hire_date  name  hire_dates
0     Bobs  Accounting       1998  Bobs        2016
1     Bill   Marketing       2018  Bill        2007
Donate
  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.
  • Copyrights © 2019-2022 Woody
  • Visitors: | Views:

请我喝杯咖啡吧~

支付宝
微信