pandas DataFrame in Python: Data Alignment and Handling Missing Data

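Contents
1. Basic operations
DataFrame addition
Filling missing values in a DataFrame: fillna
Dropping missing values from a DataFrame: dropna

2. Advanced operations
Dropping missing values: dropna(how='...')
Drop a row when all of its values are missing
Drop a row when any of its values is missing
Drop a column when all of its values are missing
Drop a column when any of its values is missing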

Code implementation
1. Basic operations

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]},
                   index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]},
                   index=['d', 'c', 'b', 'a'])

df_add = df1 + df2        # rows are matched by index label, not by position
df_fill = df2.fillna(0)   # replace every missing value with 0
df_drop = df2.dropna()    # drop every row that contains at least one missing value

print('df1=', df1, '\n')
print('df2=', df2, '\n')
print('df_add=', df_add, '\n')
print('df_fill=', df_fill, '\n')
print('df_drop=', df_drop, '\n')
Output:

df1=    one  two
a    1    5
b    2    6
c    3    7
d    4    8

df2=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  NaN    8

df_add=    one  two
a  NaN   13
b  5.0   13
c  5.0   13
d  5.0   13

df_fill=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
a  0.0    8

df_drop=    one  two
d  1.0    5
c  2.0    6
b  3.0    7
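The NaN in row 'a' of df_add comes from alignment: df2 has no value for 'one' at label 'a', so the sum is missing there. As a minimal sketch (the names plain_sum and filled_sum are illustrative, not from the original article), the add method with fill_value=0 can treat a value missing on one side as 0 before adding:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [5, 6, 7, 8]},
                   index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, 7, 8]},
                   index=['d', 'c', 'b', 'a'])

# The + operator matches rows by index label, so 'a' in df1 is added
# to 'a' in df2 even though the two frames list their rows in opposite order.
plain_sum = df1 + df2

# add() with fill_value=0 substitutes 0 for a value missing on one side only,
# so row 'a' keeps the value from df1 instead of becoming NaN.
filled_sum = df1.add(df2, fill_value=0)

print(plain_sum)
print(filled_sum)
```

If a value is missing in both frames at the same position, the result there is still missing; fill_value only covers the one-sided case.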

2. Advanced operations

# Advanced ways of dropping missing values
df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])
df3_drop = df3.dropna(how='all')    # drop a row only when all of its values are missing
df3_drop2 = df3.dropna(how='any')   # drop a row when any of its values is missing

df4 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': [5, 6, np.nan, np.nan],
                    'three': [np.nan, np.nan, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])
df4_drop = df4.dropna(how='any', axis=1)    # drop a column when any of its values is missing
df4_drop2 = df4.dropna(how='all', axis=1)   # drop a column only when all of its values are missing

print('df3=', df3, '\n')
print('df3_drop=', df3_drop, '\n')
print('df3_drop2=', df3_drop2, '\n')
print('df4=', df4, '\n')
print('df4_drop=', df4_drop, '\n')
print('df4_drop2=', df4_drop2, '\n')
Output:

df3=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN
a  NaN  NaN

df3_drop=    one  two
d  1.0  5.0
c  2.0  6.0
b  3.0  NaN

df3_drop2=    one  two
d  1.0  5.0
c  2.0  6.0

df4=    one  two  three
d    1  5.0    NaN
c    2  6.0    NaN
b    3  NaN    NaN
a    4  NaN    NaN

df4_drop=    one
d    1
c    2
b    3
a    4

df4_drop2=    one  two
d    1  5.0
c    2  6.0
b    3  NaN
a    4  NaN
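One way to see what how and axis select is to rebuild the same results with boolean masks. The sketch below is an illustration only, not how pandas implements dropna internally; the names rows_any, rows_all, cols_any and cols_all are made up for this example:

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'one': [1, 2, 3, np.nan], 'two': [5, 6, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])
df4 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': [5, 6, np.nan, np.nan],
                    'three': [np.nan, np.nan, np.nan, np.nan]},
                   index=['d', 'c', 'b', 'a'])

# how='any' on rows: keep only rows in which every value is present.
rows_any = df3[df3.notna().all(axis=1)]

# how='all' on rows: keep rows that have at least one value present.
rows_all = df3[df3.notna().any(axis=1)]

# axis=1 applies the same two tests to columns instead of rows.
cols_any = df4.loc[:, df4.notna().all(axis=0)]
cols_all = df4.loc[:, df4.notna().any(axis=0)]

# Compare each mask-based result with the corresponding dropna call.
print(rows_any.equals(df3.dropna(how='any')))
print(rows_all.equals(df3.dropna(how='all')))
print(cols_any.equals(df4.dropna(how='any', axis=1)))
print(cols_all.equals(df4.dropna(how='all', axis=1)))
```

In short, how='any' is the strict setting (one missing value is enough to drop) and how='all' is the lenient one (everything must be missing), while axis decides whether rows or columns are being tested.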