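Today an old classmate asked me about a few ways of using urllib for web scraping in Python, and we ran into an error along the way, so I'm writing it down here.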

Runtime environment

OS: Windows 10
IDE: JetBrains PyCharm 2018.2.4 x64
Browser: Google Chrome 67.0.3396.99 (official build, 64-bit) and Firefox Developer Edition 63.0b4 (64-bit)
Python: 3.6.2

Symptoms

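I tried the urllib scraping code my classmate sent me. Its goal was to send JSON data with a request. He described it as a GET request, but passing data to urllib.request.Request makes urllib send a POST, which is also what the http://httpbin.org/post test endpoint expects. When Python 3.6.2 sent the request, it raised an error.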

Error message: TypeError: can't concat str to bytes

The code that raised the error:

# send http
import urllib.request
The_id = "1"
# name = "233666888"
department = "233666888"
position = "233666888"
phone = "233666888"
email = "233666888"
data = {}
data['id'] = The_id
# data['name'] = name
data['department'] = department
data['position'] = position
data['phone'] = phone
data['email'] = email
my_headers = {'Content-Type': 'application/json'}

# url = 'http://172.19.237.1:8091/web/index.jsp'
url = 'http://httpbin.org/post'
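# data is still a plain dict here; urllib expects bytes, so urlopen() below fails
my_request = urllib.request.Request(url,data = data,headers = my_headers)
my_responese = urllib.request.urlopen(my_request)  # raises TypeError: can't concat str to bytes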
my_html = my_responese.read().decode('utf-8')
print(my_html)
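
Passing data to urllib.request.Request is what turns the request above into a POST; without it urllib sends a GET, and the parameters then have to go into the query string. The sketch below, which assumes the same httpbin.org test service plus its /get endpoint, only builds the two requests and sends nothing, so the method urllib picks can be checked offline:

```python
import urllib.parse
import urllib.request

data = {'id': '1', 'department': '233666888'}

# No data argument: urllib builds a GET, so the parameters go into the query string.
get_req = urllib.request.Request('http://httpbin.org/get?' + urllib.parse.urlencode(data))

# With a data argument (already encoded to bytes), the same class builds a POST.
post_req = urllib.request.Request('http://httpbin.org/post',
                                  data=urllib.parse.urlencode(data).encode('utf-8'))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```

Request.get_method() simply returns 'POST' whenever data is not None, so a GET "with a body" is not something this API produces by default.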

Solution

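The cause: the data argument of urllib.request.Request must be bytes (or a file object or an iterable of bytes), but the code passes a plain dict. urllib cannot compute a Content-Length for a dict, so it falls back to chunked transfer encoding; http.client then iterates over the dict, gets its str keys, and tries to concatenate each one onto the bytes that encode() produced for the chunk header. Because bytes and str cannot be added together, Python raises this TypeError.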
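
The mechanism can be reproduced without any network traffic. The function below is a simplified imitation of the chunked-encoding step, modelled on my reading of the Python 3.6 http.client source; frame_chunks is a hypothetical name, not a real library function:

```python
def frame_chunks(body):
    """Hypothetical, simplified imitation of chunked framing (not the real http.client code)."""
    if isinstance(body, (bytes, bytearray)):
        chunks = (body,)      # bytes-like bodies are sent as a single chunk
    else:
        chunks = iter(body)   # anything else is iterated; a dict yields its str keys
    # chunk header (bytes) + chunk + trailing CRLF (bytes): fails if chunk is a str
    return [('{:X}\r\n'.format(len(c))).encode('ascii') + c + b'\r\n' for c in chunks]


print(frame_chunks(b'id=1'))
try:
    frame_chunks({'id': '1'})
except TypeError as exc:
    print('TypeError:', exc)
```

With bytes the framing works; with a dict the first key, a str, is added to a bytes header and the same TypeError message appears.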

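The fix is to serialize the dict and encode it to bytes before passing it in, for example with urllib.parse.urlencode(data).encode(encoding='UTF8'). Note that urlencode produces form-encoded text (id=1&department=...), not JSON, so it does not really match the Content-Type: application/json header; if the server actually expects JSON, json.dumps(data).encode('utf-8') is the matching serialization.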
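
The difference between the two serializations is easy to see side by side. This sketch uses a trimmed version of the dict from the original code; both results are bytes, so either is acceptable as data, but only the second is JSON:

```python
import json
import urllib.parse

data = {'id': '1', 'department': '233666888'}

# Form encoding: key=value pairs joined by '&', then encoded to bytes
form_body = urllib.parse.urlencode(data).encode('utf-8')

# JSON encoding: a JSON object serialized to text, then encoded to bytes
json_body = json.dumps(data).encode('utf-8')

print(form_body)
print(json_body)
```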

Change the relevant part to:

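import urllib.parse  # urlencode() lives here; import it explicitly instead of relying on urllib.request
# url = 'http://172.19.237.1:8091/web/index.jsp'
url = 'http://httpbin.org/post'
# serialize the dict to form-encoded text, then encode it to bytes
params = urllib.parse.urlencode(data).encode(encoding='UTF8')
my_request = urllib.request.Request(url,data = params,headers = my_headers)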
my_responese = urllib.request.urlopen(my_request)
my_html = my_responese.read().decode('utf-8')
print(my_html)

That solves the problem!
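
If the server really does expect a JSON body, a variant that keeps the body consistent with the application/json header might look like the sketch below. build_json_request and post_json are hypothetical helper names introduced here for illustration; building the request needs no network, while post_json does:

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Hypothetical helper: serialize payload as JSON bytes and wrap it in a POST Request."""
    body = json.dumps(payload).encode('utf-8')    # Request data must be bytes
    headers = {'Content-Type': 'application/json'}
    return urllib.request.Request(url, data=body, headers=headers)


def post_json(url, payload, timeout=10):
    """Hypothetical helper: send the request and return the decoded response (needs network)."""
    request = build_json_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')
```

Calling print(post_json('http://httpbin.org/post', data)) with the dict from the original code should make httpbin echo the parsed payload back under its json field, assuming the service is reachable.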