Index


In pandas, the Index object stores axis labels and other metadata. An Index is immutable: its elements cannot be modified in place.

In [73]: obj = pd.Series(range(3),index = ['a','b','c'])

In [74]: index = obj.index

In [75]: index
Out[75]: Index(['a', 'b', 'c'], dtype='object')

In [76]: index[1:]
Out[76]: Index(['b', 'c'], dtype='object')

In [77]: index[1] = 'f'  # TypeError

In [8]: index.size
Out[8]: 3

In [9]: index.shape
Out[9]: (3,)

In [10]: index.ndim
Out[10]: 1

In [11]: index.dtype
Out[11]: dtype('O')
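Because an Index cannot be changed in place, any "modification" has to produce a new object. The following is a minimal sketch of that idea, using the standard Index methods delete, insert and append; the variable names (replaced, appended, mutated) are made up for illustration.

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# In-place assignment is rejected because an Index is immutable.
try:
    index[1] = 'f'
    mutated = True
except TypeError:
    mutated = False

# Methods that "change" an Index return a new object instead.
replaced = index.delete(1).insert(1, 'f')   # labels: a, f, c
appended = index.append(pd.Index(['d']))    # labels: a, b, c, d

# The original index is left untouched.
print(mutated, list(index), list(replaced), list(appended))
```

To relabel a Series or DataFrame, assign a whole new set of labels to its index attribute rather than editing a single element.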

Because Index objects are immutable, they can be shared safely among multiple data structures:

In [78]: labels = pd.Index(np.arange(3))

In [79]: labels
Out[79]: Int64Index([0, 1, 2], dtype='int64')

In [80]: obj2 = pd.Series([2,3.5,0], index=labels)

In [81]: obj2
Out[81]:
0    2.0
1    3.5
2    0.0
dtype: float64

In [82]: obj2.index is labels
Out[82]: True
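As a small sketch of why sharing is safe, two Series can be built on the same labels and then combined, with the values lining up label by label. The names prices and counts are hypothetical. Whether two objects hold the very same Index instance is an implementation detail, so the sketch compares labels with equals instead of is.

```python
import numpy as np
import pandas as pd

labels = pd.Index(np.arange(3))

# Two Series built on the same labels (hypothetical example data).
prices = pd.Series([2, 3.5, 0], index=labels)
counts = pd.Series([10, 20, 30], index=labels)

# Neither Series can alter the labels the other one sees,
# so the two indexes are guaranteed to stay equal.
same_labels = prices.index.equals(counts.index)

# Arithmetic aligns on the index labels.
total = prices * counts
print(same_labels)
print(total)
```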

An Index is also essentially a container, so Python's in operator works on it:

In [84]: f2
Out[84]:
key    year     state  pop  debt
order
a      2000   beijing  1.5   NaN
b      2001   beijing  1.7   NaN
c      2002   beijing  3.6   1.0
d      2001  shanghai  2.4   2.0
e      2002  shanghai  2.9   NaN
f      2003  shanghai  3.2   3.0

In [86]: 'c' in f2.index
Out[86]: True

In [88]: 'pop' in f2.columns
Out[88]: True
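The f2 DataFrame used above is not constructed in this section. To try the membership tests yourself, a frame with the same contents can be rebuilt roughly as follows. This is a reconstruction from the displayed output, not necessarily the way f2 was originally created.

```python
import numpy as np
import pandas as pd

# Rebuild a frame matching the printed f2 (assumed construction).
f2 = pd.DataFrame(
    {
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'state': ['beijing'] * 3 + ['shanghai'] * 3,
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
        'debt': [np.nan, np.nan, 1.0, 2.0, np.nan, 3.0],
    },
    index=list('abcdef'),
)
f2.index.name = 'order'    # name shown above the row labels
f2.columns.name = 'key'    # name shown above the column labels

# Membership tests against the row and column indexes.
has_row = 'c' in f2.index
has_col = 'pop' in f2.columns
missing = 'z' in f2.index
print(has_row, has_col, missing)
```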

Most importantly, a pandas Index can contain duplicate labels:

In [89]: dup_labels = pd.Index(['foo','foo','bar','bar'])

In [90]: dup_labels
Out[90]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

So consider this: can a DataFrame have duplicate columns or a duplicate index?

Yes, it can! But try to avoid it whenever possible:

In [91]: f2.index = ['a']*6

In [92]: f2
Out[92]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [93]: f2.loc['a']
Out[93]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [94]: f2.columns = ['year']*4

In [95]: f2
Out[95]:
   year      year  year  year
a  2000   beijing   1.5   NaN
a  2001   beijing   1.7   NaN
a  2002   beijing   3.6   1.0
a  2001  shanghai   2.4   2.0
a  2002  shanghai   2.9   NaN
a  2003  shanghai   3.2   3.0

In [96]: f2.index.is_unique  # use this attribute to check whether the index has duplicate labels
Out[96]: False
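One reason to avoid duplicate labels is that a label lookup no longer has a predictable shape: a repeated label returns several rows, while a unique label returns a single value. The sketch below, on a small hypothetical Series s, shows this and one possible way to drop the repeats with Index.duplicated.

```python
import pandas as pd

# Hypothetical Series whose label 'a' appears twice.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

unique = s.index.is_unique        # False, because 'a' repeats
dup_mask = s.index.duplicated()   # True for repeats after the first occurrence

many = s.loc['a']                 # a Series with two rows
one = s.loc['b']                  # a single scalar value

# Keep only the first row for each label.
deduped = s[~s.index.duplicated(keep='first')]
print(unique, list(dup_mask), len(many), one, list(deduped.index))
```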

Index objects also support the set operations intersection, union, difference, and symmetric difference, much like Python's built-in set.
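A minimal sketch of these four operations, using two hypothetical indexes left and right. It calls the named methods rather than the &, |, and ^ operators, since recent pandas versions no longer treat those operators as set operations on an Index.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

both = left.intersection(right)                # labels in both: b, c
either = left.union(right)                     # labels in either: a, b, c, d
only_left = left.difference(right)             # labels only in left: a
exclusive = left.symmetric_difference(right)   # labels in exactly one: a, d

print(list(both), list(either), list(only_left), list(exclusive))
```

Each call returns a new Index and leaves left and right unchanged.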

