This article explains how to index and select data in a pandas DataFrame in Python.
1. What is an index
1.1 Understanding indexes
First, create a simple DataFrame.
    import pandas as pd

    myList = [['a', 10, 1.1],
              ['b', 20, 2.2],
              ['c', 30, 3.3],
              ['d', 40, 4.4]]
    df1 = pd.DataFrame(data=myList)
    print(df1)
    --------------------------------
    [out]:
       0   1    2
    0  a  10  1.1
    1  b  20  2.2
    2  c  30  3.3
    3  d  40  4.4
A DataFrame has two indexes:
- Row index (index): the column of labels on the far left
- Column index (columns): the row of labels along the top
By default, both are auto-incrementing integers starting from 0.
    # Print the row index
    print(df1.index)
    [out]:
    RangeIndex(start=0, stop=4, step=1)
    ---------------------------------------
    # Print the column index
    print(df1.columns)
    [out]:
    RangeIndex(start=0, stop=3, step=1)
    ---------------------------------------
    # Print all the values
    print(df1.values)
    [out]:
    [['a' 10 1.1]
     ['b' 20 2.2]
     ['c' 30 3.3]
     ['d' 40 4.4]]
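As a quick check of the default behaviour, the following minimal sketch builds a smaller frame (the variable names `rows`, `row_labels`, `col_labels` are illustrative, not from the article) and converts both indexes to plain lists.

```python
import pandas as pd

# A two-row frame built without index or columns arguments.
rows = [['a', 10, 1.1],
        ['b', 20, 2.2]]
df = pd.DataFrame(rows)

# Both indexes default to a RangeIndex that starts at 0.
row_labels = list(df.index)      # [0, 1]
col_labels = list(df.columns)    # [0, 1, 2]
shape = df.shape                 # (number of rows, number of columns)

print(row_labels, col_labels, shape)
```

The row index has one entry per row and the column index one entry per column, which is why their lengths match `df.shape`.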
1.2 Custom indexes
Use the index parameter to set the row index and the columns parameter to set the column index.
    df2 = pd.DataFrame(myList,
                       index=['one', 'two', 'three', 'four'],
                       columns=['char', 'int', 'float'])
    print(df2)
    --------------------------------
    [out]:
          char  int  float
    one      a   10    1.1
    two      b   20    2.2
    three    c   30    3.3
    four     d   40    4.4
Print the row and column indexes now:
    # Print the row index
    print(df2.index)
    [out]:
    Index(['one', 'two', 'three', 'four'], dtype='object')
    --------------------------------------------------------
    # Print the column index
    print(df2.columns)
    [out]:
    Index(['char', 'int', 'float'], dtype='object')
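One point the example above leaves implicit is that the labels have to match the shape of the data. The sketch below uses a smaller frame and illustrative names (`rows`, `mismatch_error`) to show that passing too few row labels raises a `ValueError`.

```python
import pandas as pd

rows = [['a', 10, 1.1],
        ['b', 20, 2.2]]

# Two row labels for two rows, three column labels for three columns.
df = pd.DataFrame(rows, index=['one', 'two'], columns=['char', 'int', 'float'])
index_labels = df.index.tolist()
column_labels = df.columns.tolist()

# Only one row label for two rows of data: pandas rejects the mismatch.
try:
    pd.DataFrame(rows, index=['one'], columns=['char', 'int', 'float'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(index_labels, column_labels, type(mismatch_error).__name__)
```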
2. Basic use of indexes
2.1 Column indexing
Select a single column:
    print(df2['char'])
    print(df2.char)
    # Both forms print the same output
    [out]:
    one      a
    two      b
    three    c
    four     d
    Name: char, dtype: object
Note that only a single string, 'char', is passed inside the square brackets here. A column selected this way is returned as a Series.
Select multiple columns:
    print(df2[['char', 'int']])
    [out]:
          char  int
    one      a   10
    two      b   20
    three    c   30
    four     d   40
Note that a list, ['char', 'int'], is passed inside the square brackets here, and the result is a DataFrame.
What if you want to select only one column but still get a DataFrame back?
    print(df2[['char']])
    [out]:
          char
    one      a
    two      b
    three    c
    four     d
    ---------------------------------------
    type(df2[['char']])
    [out]:pandas.core.frame.DataFrame
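The rule in this subsection (a string key gives a Series, a list key gives a DataFrame) can be confirmed in a few lines. This is a minimal sketch on a smaller frame; the names `single`, `as_frame`, and `several` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 10, 1.1], ['b', 20, 2.2]],
                  index=['one', 'two'],
                  columns=['char', 'int', 'float'])

single = df['char']             # string key: one column as a Series
as_frame = df[['char']]         # one-element list: a one-column DataFrame
several = df[['char', 'int']]   # longer list: a DataFrame with those columns

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
print(type(several).__name__)   # DataFrame
```

The Series has only a row index, while the one-column DataFrame still carries a column index, so `as_frame.columns` contains `'char'`.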
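Note that selecting a column with df2[0] raises an error, because [ ] looks the key up among the column labels. It only works when columns consists of integer labels, as in df1, where df1[0] succeeds.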
    print(df1[0])
    [out]:
    0    a
    1    b
    2    c
    3    d
    Name: 0, dtype: object
    -----------------------
    print(df2[0])
    [out]:
    KeyError: 0
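If a column label may or may not exist, the `KeyError` can be caught instead of letting it stop the program. The helper `get_column` below is hypothetical (it is not part of pandas or of the article) and is only a sketch of one way to do this.

```python
import pandas as pd

df = pd.DataFrame([['a', 10], ['b', 20]], columns=['char', 'int'])

def get_column(frame, key):
    """Return the column labelled key, or None when no such label exists."""
    try:
        return frame[key]
    except KeyError:
        return None

missing = get_column(df, 0)        # 0 is not a column label here
present = get_column(df, 'char')   # 'char' is a column label

print(missing is None, present.tolist())
```

Testing membership first with `key in frame.columns` is an equivalent alternative to catching the exception.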
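2.2 Row indexing
2.2.1 Using [ ]
Unlike column selection, rows are not selected by passing a single string inside [ ]. A slice written with a colon is required.
Select the rows labelled 'two' through 'three' (a label slice includes both endpoints):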
    print(df2['two':'three'])
    [out]:
          char  int  float
    two      b   20    2.2
    three    c   30    3.3
Select only the row labelled 'two':
    # The result here is a DataFrame
    print(df2['two':'two'])
    [out]:
        char  int  float
    two    b   20    2.2
Inside [ ], a slice can use row positions as well as row labels.
Select rows 1 through 3 (positions are numbered from 0):
    print(df2[1:4])
    [out]:
          char  int  float
    two      b   20    2.2
    three    c   30    3.3
    four     d   40    4.4
As the output shows, a positional slice excludes the row at the position given on the right-hand side of the colon.
Select row 1:
    print(df2[1:2])
    [out]:
        char  int  float
    two    b   20    2.2
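The two slicing styles treat the end of the slice differently: a label slice includes its end label, while a positional slice stops before its end position. The sketch below rebuilds the article's frame under the illustrative name `df` and compares the two.

```python
import pandas as pd

df = pd.DataFrame([['a', 10, 1.1],
                   ['b', 20, 2.2],
                   ['c', 30, 3.3],
                   ['d', 40, 4.4]],
                  index=['one', 'two', 'three', 'four'],
                  columns=['char', 'int', 'float'])

by_label = df['two':'three']   # end label 'three' is included
by_position = df[1:3]          # end position 3 is excluded

print(by_label.index.tolist())      # ['two', 'three']
print(by_position.index.tolist())   # ['two', 'three']
```

Both slices return the same two rows, so `by_label.equals(by_position)` is true.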
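2.2.2 Using .loc and .iloc
Both are indexers used with square brackets rather than called like functions. The difference is that .loc selects data by the labels in the row and column indexes, whereas .iloc selects by integer position, counting from 0.
Select rows:
Using .loc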
    print(df2.loc['one'])
    [out]:
    char       a
    int       10
    float    1.1
    Name: one, dtype: object
    -------------------------------------------
    print(df2.loc[['one', 'three']])
    [out]:
          char  int  float
    one      a   10    1.1
    three    c   30    3.3
Using .iloc
    print(df2.iloc[0])
    [out]:
    char       a
    int       10
    float    1.1
    Name: one, dtype: object
    -------------------------------------------
    print(df2.iloc[[0, 2]])
    [out]:
          char  int  float
    one      a   10    1.1
    three    c   30    3.3
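Since .loc and .iloc address the same data in two ways, a label lookup and the matching positional lookup return equal results. The sketch below rebuilds the article's frame under the illustrative name `df` and also passes a row and a column together to pick out a single value.

```python
import pandas as pd

df = pd.DataFrame([['a', 10, 1.1],
                   ['b', 20, 2.2],
                   ['c', 30, 3.3],
                   ['d', 40, 4.4]],
                  index=['one', 'two', 'three', 'four'],
                  columns=['char', 'int', 'float'])

# Same rows, addressed by label and by position.
rows_by_label = df.loc[['one', 'three']]
rows_by_position = df.iloc[[0, 2]]

# A row and a column together select a single value.
value_by_label = df.loc['two', 'int']    # row 'two', column 'int'
value_by_position = df.iloc[1, 1]        # row 1, column 1

print(rows_by_label.equals(rows_by_position))   # True
print(value_by_label, value_by_position)        # 20 20
```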
That concludes this introduction to indexing and selecting data in a pandas DataFrame.

