级联操作

2021-05-25

Word count: 1.5k | Reading time≈ 7 min

级联操作

pd.concat,pd.append
pandas使用pd.concat函数，与np.concatenate函数类似，只是多了一些参数：
objs
axis
keys
join=’outer’/‘inner’：表示的是级联方式，outer会将所有的项进行级联（忽略匹配和不匹配），而inner只会将匹配的项级联在一起，不匹配的不级联
ignore_index=False

1
2
3

import numpy as np
import pandas as pd
from pandas import DataFrame

匹配级联

df1 = DataFrame(np.random.randint(0,100,size=(5,3)),columns=['A','B','C'])
df2 = DataFrame(np.random.randint(0,100,size=(5,3)),columns=['A','D','C'])
print(df1)
print(df2)

    A   B   C
0  79  32  25
1  25  25  37
2  10  83  72
3   0  48   3
4  59  31  86
    A   D   C
0  35  54  59
1   5   7  76
2  76  82  89
3  17  93  46
4  99  56   8

1	print(pd.concat((df1,df2),axis=1)) # 横向级联

    A   B   C   A   D   C
0  79  32  25  35  54  59
1  25  25  37   5   7  76
2  10  83  72  76  82  89
3   0  48   3  17  93  46
4  59  31  86  99  56   8

不匹配级联
- 不匹配级联指的是级联的维度索引不一致。例如纵向级联时列索引不一致，横向级联时行索引不一致
- 有2中连接方式：
  - 外连接：补NAN（默认模式），如果想要保留数据的完整性必须使用outer(外连接)
  - 内连接：只连接匹配的项

1 2	print(pd.concat((df1,df2),axis=0)) # 纵向级联，出现不匹配级联 print(pd.concat((df1,df2),axis=0,join='inner'))

    A     B   C     D
0  79  32.0  25   NaN
1  25  25.0  37   NaN
2  10  83.0  72   NaN
3   0  48.0   3   NaN
4  59  31.0  86   NaN
0  35   NaN  59  54.0
1   5   NaN  76   7.0
2  76   NaN  89  82.0
3  17   NaN  46  93.0
4  99   NaN   8  56.0
    A   C
0  79  25
1  25  37
2  10  72
3   0   3
4  59  86
0  35  59
1   5  76
2  76  89
3  17  46
4  99   8

1
2
3

df3 = DataFrame(np.random.randint(0,100,size=(5,2)),columns=['A','B'])
print(pd.concat((df1,df3),axis=1))
print(pd.concat((df1,df3),axis=0))

append函数的使用
- 默认纵向级联外连接
- 知道即可，一般不用

1	print(df1.append(df2))

    A     B   C     D
0  79  32.0  25   NaN
1  25  25.0  37   NaN
2  10  83.0  72   NaN
3   0  48.0   3   NaN
4  59  31.0  86   NaN
0  35   NaN  59  54.0
1   5   NaN  76   7.0
2  76   NaN  89  82.0
3  17   NaN  46  93.0
4  99   NaN   8  56.0

合并操作

merge与concat的区别在于，merge需要依据某一共同列来进行合并；merge是对数据进行合并，而concat是对表格进行级联
使用pd.merge()合并时，会自动根据两者相同column名称的那一列，作为key来进行合并
注意每一列元素的顺序不要求一致

# 一对一合并
df1 = DataFrame({'employee':['Bob','Jake','Lisa'],'group':['Accounting','Engineering','Engineering']})
df2 = DataFrame({'employee':['Lisa','Bob','Jake'],'hire_date':[2004,2008,2012]})
print(df1)
print(df2)
print(pd.merge(df1,df2,on='employee'))
print(pd.merge(df1,df2))    # on也可以不写，默认会将两表中共有的列作为合并条件，此处结果相同

  employee        group
0      Bob   Accounting
1     Jake  Engineering
2     Lisa  Engineering
  employee  hire_date
0     Lisa       2004
1      Bob       2008
2     Jake       2012
  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004
  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004

# 一对多合并
df3 = DataFrame({'employee':['Lisa','Jake'],'group':['Accounting','Engineering'],'hire_date':[2004,2016]})
df4 = DataFrame({'group':['Accounting','Engineering','Engineering'],'supervisor':['Carly','Guido','Steve']})
print(df3)
print(df4)
print(pd.merge(df3,df4))

  employee        group  hire_date
0     Lisa   Accounting       2004
1     Jake  Engineering       2016
         group supervisor
0   Accounting      Carly
1  Engineering      Guido
2  Engineering      Steve
  employee        group  hire_date supervisor
0     Lisa   Accounting       2004      Carly
1     Jake  Engineering       2016      Guido
2     Jake  Engineering       2016      Steve

# 多对多合并
df5 = DataFrame({'group':['Engineering','Engineering','HR'],'supervisor':['Carly','Guido','Steve']})
print(df5)
print(pd.merge(df1,df5))    # 默认内连接
print(pd.merge(df1,df5,how='outer'))
print(pd.merge(df1,df5,how='left')) # 保留左连接
print(pd.merge(df1,df5,how='right'))    # 保留右连接

         group supervisor
0  Engineering      Carly
1  Engineering      Guido
2           HR      Steve
  employee        group supervisor
0     Jake  Engineering      Carly
1     Jake  Engineering      Guido
2     Lisa  Engineering      Carly
3     Lisa  Engineering      Guido
  employee        group supervisor
0      Bob   Accounting        NaN
1     Jake  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Carly
4     Lisa  Engineering      Guido
5      NaN           HR      Steve
  employee        group supervisor
0      Bob   Accounting        NaN
1     Jake  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Carly
4     Lisa  Engineering      Guido
  employee        group supervisor
0     Jake  Engineering      Carly
1     Lisa  Engineering      Carly
2     Jake  Engineering      Guido
3     Lisa  Engineering      Guido
4      NaN           HR      Steve

key的规范化

当列冲突时，即有多个列名称相同时需要使用on=来指定哪一个列作为Key,配合suffixes指定冲突列名

df1 = DataFrame({'employee':['Jack','Summer','Steve'],'group':['Accounting','Finance','Marketing']})
df2 = DataFrame({'employee':['Jack','Bob','Jake'],'hire_date':[2003,2009,2012],'group':['Accounting','Sell','CEO']})
print(df1)
print(df2)
print(pd.merge(df1,df2))    # 不指定合并条件则相同的几项共同作为合并条件
print(pd.merge(df1,df2,on='group'))

  employee       group
0     Jack  Accounting
1   Summer     Finance
2    Steve   Marketing
  employee  hire_date       group
0     Jack       2003  Accounting
1      Bob       2009        Sell
2     Jake       2012         CEO
  employee       group  hire_date
0     Jack  Accounting       2003
  employee_x       group employee_y  hire_date
0       Jack  Accounting       Jack       2003

当两张表没有可进行连接的列时，可使用left_on和right_on手动指定merge中左右两边的哪一列作为连接的列

df1 = DataFrame({'employee':['Bobs','Linda','Bill'],'group':['Accounting','Product','Marketing'],'hire_date':[1998,2017,2018]})
df5 = DataFrame({'name':['Lisa','Bobs','Bill'],'hire_dates':[1998,2016,2007]})
print(df1)
print(df5)
# print(pd.merge(df1,df5))  # 此处合并出错，因为没有共同的列，需要指定左右标准
print(pd.merge(df1,df5,left_on='employee',right_on='name'))

  employee       group  hire_date
0     Bobs  Accounting       1998
1    Linda     Product       2017
2     Bill   Marketing       2018
   name  hire_dates
0  Lisa        1998
1  Bobs        2016
2  Bill        2007
  employee       group  hire_date  name  hire_dates
0     Bobs  Accounting       1998  Bobs        2016
1     Bill   Marketing       2018  Bill        2007

Donate

Copyright： Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.