Python Training: Conquer Python Character Encoding in 5 Minutes (Part 3)

Updated: 2016-08-26 17:25. Source: 傳智播客 Python Training Academy

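4. Troubleshooting Q&A

Detecting the encoding

As noted earlier, you need to know the encoding before you can decode. So how do you work it out when all you have is a string of raw bytes?

The simplest way is chardet (it has to be installed first):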

python -m pip install chardet

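Using it is very simple:

#coding=utf8

from chardet import detect

# In Python 2 a plain str literal is a byte string, which is what detect() expects
print(detect('這是一串utf8的測試字符'))

# Result: {'confidence': 0.99, 'encoding': 'utf-8'}

Also, when you scrape a website, the response headers very likely tell you how to decode the content (for example, a charset parameter in the Content-Type header), so remember to check them.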
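Reading the charset out of a header can be sketched as follows. This is a minimal Python 3 sketch rather than the article's own code: `charset_from_content_type` and `decode_body` are hypothetical helper names, the header strings are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption.

```python
from email.message import Message


def charset_from_content_type(content_type, default=None):
    """Return the lowercased charset declared in a Content-Type value, or default."""
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset(default)


def decode_body(body, content_type, fallback='utf-8'):
    """Decode raw bytes with the declared charset (assumed fallback: UTF-8)."""
    charset = charset_from_content_type(content_type, fallback)
    # An unknown charset name would raise LookupError here
    return body.decode(charset, errors='replace')


declared = charset_from_content_type('text/html; charset=GBK')
missing = charset_from_content_type('text/html')
text = decode_body('中文'.encode('gbk'), 'text/html; charset=gbk')
```

When the header declares nothing, a detector such as chardet is still the fallback.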

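Encoding conversion

Quite often something odd has slipped into the byte string, so decoding still fails even though you picked the right encoding.

I know I covered this earlier, but some readers may have jumped straight to this Q&A section.

In that case, use the second argument of decode, which sets the error-handling mode:

#coding=utf8

# A stray \xff byte has slipped into the string; \xff can never appear in valid UTF-8
rubbishUtf8String = 'Utf-8字\xff符串'

# 'replace' substitutes U+FFFD for each undecodable byte
print(repr(rubbishUtf8String.decode('utf8', 'replace')))

# 'ignore' silently drops each undecodable byte
print(repr(rubbishUtf8String.decode('utf8', 'ignore')))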
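For comparison, here is a minimal Python 3 sketch of the same error-handling modes. In Python 3 only bytes objects have decode, so the sample is built explicitly as bytes; the stray `\xff` byte is an assumption chosen because it is never valid in UTF-8.

```python
# Valid UTF-8 on both sides of one invalid byte
raw = 'Utf-8字'.encode('utf-8') + b'\xff' + '符串'.encode('utf-8')

# Strict decoding (the default) fails on the bad byte
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'replace' puts U+FFFD where the bad byte was
replaced = raw.decode('utf-8', errors='replace')

# 'ignore' drops the bad byte entirely
ignored = raw.decode('utf-8', errors='ignore')

print(strict_ok)
print(ascii(replaced))
print(ascii(ignored))
```

Of the two, 'replace' leaves a visible marker where data was lost, while 'ignore' loses it silently.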

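Encoding on special platforms

Many people call Windows a minefield, and that holds even under Python 3.

The reason is that Chinese file names come out garbled when printed.

Here is a trick: however unusual the platform encoding is, the command line can at least read and create a folder without garbling, so the console's own encoding (sys.stdin.encoding) is a safe one to decode with.

import sys, os

# Python 2: directory names are byte strings; decode them with the console encoding
for folder in next(os.walk('.'))[1]:
    print(folder.decode(sys.stdin.encoding))

Input and output can be wrapped in the same way:

import sys

def sys_print(msg):
    # msg is a unicode object; encode it for the console before printing
    print(msg.encode(sys.stdin.encoding))

def sys_input(msg):
    # Encode the prompt for the console, then decode what the user typed
    return raw_input(msg.encode(sys.stdin.encoding)).decode(sys.stdin.encoding)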
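On Python 3, print already accepts text and raw_input no longer exists, so the wrappers above do not carry over directly. The same idea can be sketched as below. This is only a sketch under stated assumptions: `console_safe` is a hypothetical helper name, and falling back to UTF-8 when `sys.stdout.encoding` is unset is an assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot show replaced by '?'."""
    # Assumption: use UTF-8 when the stream reports no encoding (e.g. when piped)
    encoding = encoding or sys.stdout.encoding or 'utf-8'
    # Round-trip through the console encoding; unencodable characters become '?'
    return text.encode(encoding, errors='replace').decode(encoding)


plain = console_safe('folder', 'ascii')
masked = console_safe('測', 'ascii')
print(plain, masked)
```

A call such as `print(console_safe(name))` then cannot raise UnicodeEncodeError, at the cost of showing '?' for characters the console cannot display.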

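Writing to files

What if you have fetched some content, don't know how to decode it, but still want to write it to a file?

Just open the file in binary mode when writing:

#coding=utf8
import urllib

# 'wb' writes the bytes exactly as they are, so no encoding is needed
with open('Utf8.txt', 'wb') as f:
    f.write('Utf8測試')

# For example, after fetching a web page you can write it to a file and work on it further without knowing its encoding

content = urllib.urlopen('http://www.baidu.com').read()
with open('baidu.txt', 'wb') as f:
    f.write(content)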
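The point of binary mode is that the bytes reach the disk untouched. A minimal Python 3 sketch of that round trip follows; `save_raw` and `load_raw` are hypothetical helper names, and the payload is made-up sample data standing in for a fetched page.

```python
import os
import tempfile


def save_raw(path, payload):
    """Write bytes exactly as received, without decoding them."""
    with open(path, 'wb') as f:
        f.write(payload)


def load_raw(path):
    """Read the file back as bytes, again without decoding."""
    with open(path, 'rb') as f:
        return f.read()


# Bytes in some unknown encoding (deliberately not valid UTF-8)
payload = b'\xb2\xe2\xca\xd4 \xff\xfe'

path = os.path.join(tempfile.mkdtemp(), 'unknown.bin')
save_raw(path, payload)
restored = load_raw(path)
print(restored == payload)
```

Decoding can then be postponed until the encoding is known.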

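Bare Unicode characters

What if a Unicode character has been stored as six ASCII characters (a backslash, a u and four hex digits)? That can be decoded too.

#coding=utf8
# This is an ordinary Unicode string
s = u'測'
for i in s: print(i)
print(repr(s))

# This is "bare" Unicode: it is actually stored as six ASCII characters
s = repr(s)[2:-1]
for i in s: print(i)
print(repr(s))

# Converting it back is simple
s = s.decode('unicode-escape')
for i in s: print(i)
print(repr(s))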
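In Python 3, str has no decode method, so the escaped text first has to be turned into bytes. A minimal sketch of the round trip follows; `to_escaped` and `from_escaped` are hypothetical helper names, and the sketch assumes the escaped text is pure ASCII.

```python
def to_escaped(text):
    """Turn text into its ASCII backslash-escape form."""
    return text.encode('unicode_escape').decode('ascii')


def from_escaped(escaped):
    """Turn ASCII backslash escapes back into real characters."""
    # Assumption: escaped contains only ASCII; anything else raises UnicodeEncodeError
    return escaped.encode('ascii').decode('unicode_escape')


escaped = to_escaped('測')
restored = from_escaped(escaped)
print(escaped, len(escaped))
```

Here `escaped` holds six ASCII characters and `restored` is the single original character again.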

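OK, I hope this article helps you solve your Python encoding problems!

This article is copyrighted by 傳智播客 Python Training Academy. Reposting is welcome; please credit the author and source. Thank you!
Author: 傳智播客 Python Training Academy
First published: http://m.fskzgqt.cn/Python /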
