Python pandas DataFrame: Data Alignment and Handling Missing Data
Contents
1. Basic operations
Adding DataFrames
Filling missing values with fillna
Dropping missing values with dropna
2. Advanced operations
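Dropping missing values with dropna(how=..., axis=...)
Drop a row when all of its values are missing
Drop a row when any of its values is missing
Drop a column when all of its values are missing
Drop a column when any of its values is missing
Code implementation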
1. Basic operations
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]}, index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]}, index=['d', 'c', 'b', 'a'])

df_add = df1 + df2       # rows and columns are matched by label before adding
df_fill = df2.fillna(0)  # fill missing values with 0
df_drop = df2.dropna()   # default how='any': a row with even one missing value is dropped

print('df1=', df1, '\n')
print('df2=', df2, '\n')
print('df_add=', df_add, '\n')
print('df_fill=', df_fill, '\n')
print('df_drop=', df_drop, '\n')
Output:
df1=    one  two
a    1    5
b    2    6
c    3    7
d    4    8

df2=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  NaN    8

df_add=    one  two
a  NaN   13
b  5.0   13
c  5.0   13
d  5.0   13

df_fill=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  0.0    8

df_drop=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
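The NaN in row a of df_add comes from alignment: `+` pairs rows and columns by label, not by position, and 1 + NaN is NaN. The sketch below recreates df1 and df2 so it runs on its own. It checks this behavior by reordering df2 into df1's row order with `reindex` and adding the raw NumPy arrays, then contrasts that with a purely positional sum. It also shows `DataFrame.add(..., fill_value=0)`, which treats a value missing on only one side as 0. The names by_label, manual, positional and filled are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]},
                   index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]},
                   index=['d', 'c', 'b', 'a'])

# Label-based alignment: row 'a' of df1 is paired with row 'a' of df2.
by_label = df1 + df2

# The same pairing done by hand: reorder df2's rows to match df1, then add element-wise.
df2_aligned = df2.reindex(df1.index)
manual = df1.to_numpy() + df2_aligned.to_numpy()

# Ignoring labels pairs rows by position instead (a with d, b with c, ...).
positional = df1.to_numpy() + df2.to_numpy()

# fill_value=0: a value missing on only one side is treated as 0 before adding.
filled = df1.add(df2, fill_value=0)

print(by_label)
print(positional)
print(filled)
```

In positional, the NaN ends up in the last row (label d) because df2's last row is labelled a. Label alignment avoids exactly that mismatch. Note that fill_value only helps when one side is present; if both sides are missing, the result is still NaN.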
2. Advanced operations
# Advanced options for dropping missing values
import pandas as pd
import numpy as np

df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]}, index=['d', 'c', 'b', 'a'])
df3_drop = df3.dropna(how='all')           # drop a row only when all of its values are missing
df3_drop2 = df3.dropna(how='any')          # drop a row when any of its values is missing

df4 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': [5, 6, np.nan, np.nan],
                    'three': [np.nan, np.nan, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])
df4_drop = df4.dropna(how='any', axis=1)   # drop a column when any of its values is missing
df4_drop2 = df4.dropna(how='all', axis=1)  # drop a column only when all of its values are missing

print('df3=', df3, '\n')
print('df3_drop=', df3_drop, '\n')
print('df3_drop2=', df3_drop2, '\n')
print('df4=', df4, '\n')
print('df4_drop=', df4_drop, '\n')
print('df4_drop2=', df4_drop2, '\n')
Output:
df3=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN
a  NaN  NaN

df3_drop=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN

df3_drop2=    one  two
d  1.0  5.0
c  2.0  6.0

df4=    one  two  three
d    1  5.0    NaN
c    2  6.0    NaN
b    3  NaN    NaN
a    4  NaN    NaN

df4_drop=    one
d    1
c    2
b    3
a    4

df4_drop2=    one  two
d    1  5.0
c    2  6.0
b    3  NaN
a    4  NaN
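All four how/axis combinations follow one rule on the isna() mask. With how='all', a row is dropped (or a column, with axis=1) only if every entry in it is NaN. With how='any', it is dropped if at least one entry is NaN. The sketch below recreates df3 and df4 and expresses that rule with boolean masks built from `isna()`, `all()` and `any()`. It then checks each result against the matching dropna call using `pd.testing.assert_frame_equal`, which raises an error if the two differ. This is only an illustration of the behavior, not a claim about how dropna is implemented internally. The names rows_all, rows_any, cols_any and cols_all are hypothetical.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])
df4 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': [5, 6, np.nan, np.nan],
                    'three': [np.nan, np.nan, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])

# Rows: how='all' keeps a row unless every value in it is NaN.
rows_all = df3[~df3.isna().all(axis=1)]
# Rows: how='any' keeps a row only if it contains no NaN at all.
rows_any = df3[~df3.isna().any(axis=1)]

# Columns (axis=1): the same tests, evaluated down each column instead of across each row.
cols_any = df4.loc[:, ~df4.isna().any(axis=0)]
cols_all = df4.loc[:, ~df4.isna().all(axis=0)]

checks = {
    "rows, how='all'": (rows_all, df3.dropna(how='all')),
    "rows, how='any'": (rows_any, df3.dropna(how='any')),
    "columns, how='any'": (cols_any, df4.dropna(how='any', axis=1)),
    "columns, how='all'": (cols_all, df4.dropna(how='all', axis=1)),
}
for name, (mask_result, dropna_result) in checks.items():
    # Raises AssertionError if labels, dtypes or values differ (NaNs in the same place count as equal).
    pd.testing.assert_frame_equal(mask_result, dropna_result)
    print(name, 'matches dropna')
```

Thinking in terms of the mask makes the axis argument easier to read: axis=1 does not change the test, only the direction in which it is applied.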