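A02 - Basic Usage of the Requests Library

📄 Contents

- 1 Basic usage
  - 1.1 Introducing and installing the Requests module
  - 1.2 Basic usage
- 2 Saving responses
  - 2.1 Saving an image file with requests
  - 2.2 Saving an HTML file with requests
  - 2.3 Other attributes
    - 2.3.1 Difference between response.text and response.content
    - 2.3.2 Specifying the encoding with response.encoding
    - 2.3.3 Others
- 3 Requests with headers
  - 3.1 The headers parameter
  - 3.2 User-Agent pool
- 4 Requests with parameters
  - 4.1 How a browser sends a request
  - 4.2 Passing parameters in the URL
  - 4.3 Sending a request with parameters
- Navigation links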
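1 Basic usage

1.1 Introducing and installing the Requests module

Purpose: send HTTP requests and obtain the response data.

Requests is a third-party module, so it has to be installed separately.

On Windows, to install requests into the system-wide Python, run the following command (not recommended):

```shell
pip install requests
```

To install it into the Python environment of a PyCharm project (i.e. `.venv`):

1. In PyCharm, click the settings button in the top-right corner.
2. Search for "Python Interpreter" (it lives in a different place in the Windows and Mac versions of PyCharm, so using the search box is recommended).
3. Click the + button (i.e. Install).
4. Search for requests.
5. Choose a version, click Install Package, and wait for the installation to finish.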
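Whichever route is used, it can help to confirm that the package ended up in the interpreter that is actually running the script. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name introduced here for illustration and is not part of requests or PyCharm.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable shows which interpreter (system-wide or .venv) is in use.
    print("Interpreter:", sys.executable)
    print("requests installed:", is_installed("requests"))
```

If the second line prints False inside PyCharm, the package was probably installed into a different interpreter than the one selected for the project.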

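1.2 Basic usage

Basic form of a Requests call:

```python
import requests

url = "https://www.baidu.com"  # target URL
response = requests.get(url)   # send a GET request to the target URL
print(response.text)           # print the response body
```

⚠️ The response body may contain garbled characters (for example, when requesting Chinese-language websites). This happens because the requests module chooses a decoding on its own, and that choice does not always match the encoding the page really uses.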

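Parameters of requests.get:

```python
requests.get(url, params=None, **kwargs)
```

A concrete example:

```python
import requests

url = "https://www.baidu.com"  # target URL
response = requests.get(url)   # send a GET request to the target URL
print(response)
# Decoding the raw bytes explicitly is more reliable here than response.text.
# decode() accepts an encoding name such as 'utf-8' (also its default).
print(response.content.decode())
```

Output: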
```
<Response [200]>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>'); </script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
```
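2 Saving responses

2.1 Saving an image file with requests

```python
import requests

url = 'https://isocpp.org/assets/images/cpp_logo.png'  # the image URL
res = requests.get(url)  # fetch the response
# print(res.content)  # would print the image's raw bytes

with open('cpp_logo.png', 'wb') as f:
    f.write(res.content)
```

Output: nothing is printed, but a new file, cpp_logo.png, appears in the same directory.

In the mode string 'wb', w opens the file for writing (creating a new file or overwriting an existing file with the same name), and b opens it in binary mode rather than text mode.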
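The difference between binary and text mode can be tried without any network access. The sketch below is an illustration only: it uses the 8-byte PNG signature as stand-in data for `res.content`, and writes to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

# The first 8 bytes of every PNG file, used here as stand-in binary data.
png_signature = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.png")

    # 'wb': write mode + binary mode, so the bytes are stored unchanged.
    with open(path, "wb") as f:
        f.write(png_signature)

    # 'rb' reads the same bytes back.
    with open(path, "rb") as f:
        round_trip = f.read()

    # Text mode ('w') accepts only str, so writing bytes raises TypeError.
    try:
        with open(path, "w") as f:
            f.write(png_signature)
        text_mode_error = None
    except TypeError as exc:
        text_mode_error = exc

print(round_trip == png_signature)     # True
print(type(text_mode_error).__name__)  # TypeError
```

This is why image data from `res.content` has to be written with 'wb': the content is bytes, and text mode refuses it.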

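2.2 Saving an HTML file with requests

```python
import requests

url = "https://www.google.com"
response = requests.get(url)

# decode() assumes UTF-8; pass another encoding name if the page uses one.
with open("google.html", "w", encoding='utf-8') as f:
    f.write(response.content.decode())
```

Output: nothing is printed, but a new file, google.html, appears in the same directory.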
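2.3 Other attributes

2.3.1 Difference between response.text and response.content

- text: a str. The requests module decodes the body with an encoding it infers from the HTTP headers, which is an educated guess.
- content: bytes. The raw body, which can be decoded explicitly with decode().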
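The garbling caused by a wrong guess can be reproduced offline. In the sketch below, `raw_body` is a made-up stand-in for `response.content`; the wrong guess is ISO-8859-1, which is the fallback requests documents for text responses whose Content-Type header names no charset.

```python
# Stand-in for response.content: the UTF-8 bytes of a short HTML title.
raw_body = "<title>百度一下,你就知道</title>".encode("utf-8")

# A wrong guess: decoding UTF-8 bytes as ISO-8859-1 yields garbled text.
wrong_guess = raw_body.decode("iso-8859-1")

# An explicit decode with the right codec, like response.content.decode('utf-8').
correct = raw_body.decode("utf-8")

print(type(raw_body).__name__)  # bytes, like response.content
print(type(correct).__name__)   # str, like response.text
print(wrong_guess)              # garbled characters
print(correct)                  # <title>百度一下,你就知道</title>
```

No information is lost by the wrong guess: re-encoding `wrong_guess` as ISO-8859-1 and decoding as UTF-8 recovers the original text, which is why fixing the encoding is enough to fix the output.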
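2.3.2 Specifying the encoding with response.encoding

💡 Setting response.encoding tells requests which encoding response.text should use, so no manual decode() call is needed.

```python
import requests

url = "https://www.baidu.com"
response = requests.get(url)
response.encoding = 'utf-8'  # specify the encoding
print(response.text)
```

Output: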
Same HTML as printed in the example in section 1.2, without the `<Response [200]>` line and with the Chinese text decoded correctly.

2.3.3 Others

| Common attribute or method | Meaning |
|---|---|
| res.url | URL of the response; it sometimes differs from the requested URL (for example, after a redirect) |
| res.status_code | Status code of the response |
| res.request.headers | Request headers of the request that produced this response |
| res.headers | Response headers |
| res.apparent_encoding | Encoding guessed from the response body; for an image the body is binary, so there is no encoding to report (None) |
| res.request._cookies | Cookies sent with the request that produced this response; returns a CookieJar |
| res.cookies | Cookies of the response (set through Set-Cookie); returns a CookieJar |

```python
import requests

url = 'https://isocpp.org/assets/images/cpp_logo.png'  # the image URL
res = requests.get(url)  # fetch the response

print(res.url)                # URL of the response
print(res.status_code)        # status code of the response
print(res.request.headers)    # request headers behind this response
print(res.headers)            # response headers
print(res.apparent_encoding)  # an image is binary, so no encoding is reported
print(res.request._cookies)   # cookies sent with the request (CookieJar)
print(res.cookies)            # cookies set through Set-Cookie (CookieJar)
```

Output:

```
https://isocpp.org/assets/images/cpp_logo.png
200
{'User-Agent': 'python-requests/2.32.5', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
{'Date': 'Wed, 27 Aug 2025 07:26:39 GMT', 'Content-Type': 'image/png', 'Content-Length': '23613', 'Connection': 'keep-alive', 'CF-RAY': '9759d78a8e7885e9-HKG', 'Last-Modified': 'Tue, 28 Apr 2020 00:03:37 GMT', 'etag': '"5c3d-5a44e902fa840"', 'Accept-Ranges': 'bytes', 'Age': '3343', 'Cache-Control': 'max-age=14400', 'cf-cache-status': 'HIT', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v4?s=fiWeC7HVmRaKDBa1UrrxHalbrkSfxn%2Faod4MWceSRhLQjQQwKgQbNTcXTTVmG3fE4PDxVMjlb%2Ffjlgq%2F8YQ6yGpWwcGrfoQP9OkD6nnP8eP1bOjL6nnar0NzOaLt2RTI0xrGyVXEryXj"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'alt-svc': 'h3=":443"; ma=86400', 'server-timing': 'cfL4;desc="?proto=TCP&rtt=10699&min_rtt=9242&rtt_var=4507&sent=5&recv=6&lost=0&retrans=0&sent_bytes=2836&recv_bytes=786&delivery_rate=437567&cwnd=252&unsent_bytes=0&cid=14338ace0e1c6873&ts=61&x=0"'}
None
<RequestsCookieJar[]>
<RequestsCookieJar[]>
```

3 Requests with headers

3.1 The headers parameter

```python
requests.get(url, headers=headers)
```

The headers parameter takes the request headers as a dictionary: each header field name is a key, and the field's content is the value.

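```python
import requests

url = "https://www.baidu.com"

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}

# Send the request with the User-Agent header.
# headers takes a dict: header field names are keys, their contents are values.
response = requests.get(url, headers=headers)
# print(response.content.decode())
print(len(response.content.decode()))  # clearly longer than without the header (610581)
```

Output:

```
610581
```

3.2 User-Agent pool

If too many requests arrive with the same browser version string, the crawler may be identified. To keep crawling, build a pool of User-Agent strings and pick from it for each request. The aim is to avoid being blocked by anti-crawling measures.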

Method 1 for building a User-Agent pool:

```python
import random

UAlist = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Safari/605.1.15',
    'Mozilla/5.0 (Linux; Android 8.0.0; SM-G955U Build/R16NW) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Mobile Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
]

print(random.choice(UAlist))
```

Maintaining a User-Agent pool by hand like this becomes inconvenient as it grows, so the fake-useragent library can be used instead.
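Before moving on to the library, here is one way the hand-built pool could be wired into the headers dictionary from section 3.1. This is a sketch under assumptions: `UA_POOL` and `build_headers` are hypothetical names, and the pool is shortened to two of the strings listed above.

```python
import random

# A shortened hand-built pool; both strings come from the list above.
UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36",
]


def build_headers() -> dict:
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(UA_POOL)}


# Each call may pick a different User-Agent from the pool.
for _ in range(3):
    headers = build_headers()
    print(headers["User-Agent"])
    # The dict would then be passed along: requests.get(url, headers=headers)
```

Calling `build_headers()` once per request, rather than once per script, is what spreads the requests across the pool.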

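Method 2 for building a User-Agent pool (this may raise an exception):

```python
# Install the fake-useragent library first
from fake_useragent import UserAgent

print(UserAgent().random)
```

4 Requests with parameters

4.1 How a browser sends a request

1. Build the request
2. Look up the cache (this eases the load on the server and improves performance)
3. Prepare the IP address and port
4. Wait in the TCP queue
5. Establish the TCP connection
6. Send the HTTP request

The browser sends the server a request line, which includes:

- the request method
- the request URL
- the HTTP protocol version
- ...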
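To make the request line concrete, the sketch below assembles the text of a minimal HTTP/1.1 GET request by hand. It is a simplification for illustration: `build_get_request` and the `example-client/0.1` User-Agent are hypothetical, a real browser sends many more headers, and for an https URL the text would travel inside a TLS connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, user_agent: str = "example-client/0.1") -> str:
    """Assemble the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    # The request target is the path plus the query string, if any.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    request_line = f"GET {target} HTTP/1.1"  # method, URL, protocol version
    header_lines = [
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Connection: close",
    ]
    # Lines end with CRLF; an empty line marks the end of the headers.
    return "\r\n".join([request_line, *header_lines, "", ""])


print(build_get_request("https://www.baidu.com/s?wd=python"))
```

The first printed line, `GET /s?wd=python HTTP/1.1`, is the request line: method, request URL, and protocol version, in that order.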
4.2 Passing parameters in the URL

https://www.google.com/search?q=%E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91&sca_esv=51b55862be512269&biw=1440&bih=727&sxsrf=AE3TifPQYetWqvG_LUJCLGxdYTvuDkaISg%3A1756276549673&ei=RaeuaOPxKODU1e8PvZPDwQI&ved=0ahUKEwijqPv-r6qPAxVgavUHHb3JMCgQ4dUDCBA&uact=5&oq=%E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91&gs_lp=Egxnd3Mtd2l6LXNlcnAiDOe2reWfuueZvuenkTIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgAQyBRAAGIAEMgUQABiABDIFEAAYgARI7hNQAFjYEXAAeAGQAQSYAfsDoAHLIqoBCzEuNi41LjEuMy4xuAEDyAEA-AEBmAIKoAKSEMICBRAuGIAEwgIIEAAYgAQYogTCAgsQLhiABBjRAxjHAcICBRAAGO8FwgIIEAAYogQYiQWYAwCSBwkxLjQuMi4yLjGgB4U6sgcJMS40LjIuMi4xuAeSEMIHBzAuNC41LjHIByM&sclient=gws-wiz-serp

This is the URL of the first page of results when searching Google for 「維基百科」 (Wikipedia).

It shows clearly that when a string is submitted as part of a URL, it is automatically URL-encoded (percent-encoded): the plain text 「維基百科」 becomes %E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91. This is an encoding, not encryption, and it can be reversed by anyone.

```python
from urllib.parse import quote, unquote

# quote()    plain text -> percent-encoded text
# unquote()  percent-encoded text -> plain text
print(quote('維基百科'))
print(unquote('%E5%8F%83%E6%95%B8'))
```

Output:

```
%E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91
參數
```
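What quote() does to non-ASCII text can be reproduced by hand, which shows that nothing secret is involved. The sketch below is an illustration using only the standard library; the variable names are made up for this example.

```python
from urllib.parse import quote

text = "維基百科"

# Step 1: encode the string to UTF-8 bytes (3 bytes per character here).
utf8_bytes = text.encode("utf-8")

# Step 2: write each byte as '%' followed by two uppercase hex digits.
manual = "".join(f"%{byte:02X}" for byte in utf8_bytes)

print(utf8_bytes.hex(" ").upper())  # E7 B6 AD E5 9F BA E7 99 BE E7 A7 91
print(manual)                       # %E7%B6%AD%E5%9F%BA%E7%99%BE%E7%A7%91
print(manual == quote(text))        # True
```

The manual version matches quote() only because every byte in this string is non-ASCII; quote() leaves ASCII letters, digits, and a few other safe characters unescaped.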
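4.3 Sending a request with parameters

Example (searching Google for 「自由放任主義」, laissez-faire):

```python
import requests

url = 'https://www.google.com/search?q=%E8%87%AA%E7%94%B1%E6%94%BE%E4%BB%BB%E4%B8%BB%E7%BE%A9'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}

res = requests.get(url, headers=headers)
print(res.content.decode())
```

To control the search term from code, pass a dictionary of parameters through params. The steps are:

1. Build the request-parameter dictionary.
2. Pass the parameter dictionary when sending the request.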
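The effect of these two steps on the final URL can be previewed offline. The sketch below uses the standard library's urlencode() as an approximation of what happens to a params dictionary; it is not requests' own code, and the exact escaping requests applies may differ in details. `base_url` and `full_url` are names made up for this example.

```python
from urllib.parse import urlencode

base_url = "https://www.baidu.com/s"
kw = {"wd": "自由放任主義"}

# urlencode() percent-encodes each key and value and joins pairs with '&'.
query = urlencode(kw)
full_url = f"{base_url}?{query}"
print(full_url)

# Characters with a special meaning in a query string are escaped too,
# so a value such as 'C&C++' stays a single parameter.
print(urlencode({"wd": "C&C++"}))  # wd=C%26C%2B%2B
```

Passing a dictionary therefore keeps user input such as `&` or `#` from being mistaken for URL syntax, which is the main advantage over pasting the input into the URL string by hand.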

```python
import requests

url = 'https://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}

# Build the request-parameter dictionary; input() lets the keyword
# be chosen at run time.
name = input('Enter a keyword: ')
kw = {'wd': name}

res2 = requests.get(url, headers=headers, params=kw)
print(res2.content.decode())
```

Another way uses only input() and puts the keyword straight into the URL:

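```python
import requests

name = input('Enter a keyword: ')
# The keyword is pasted into the URL as-is, so characters such as '&' or '#'
# in the input would change the structure of the query string.
url = f'https://www.baidu.com/s?wd={name}'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}

res = requests.get(url, headers=headers)
print(res.content.decode())
```

Navigation links

| Destination | Link |
|---|---|
| Home | Back to home page |
| Python learning | Python learning |
| Previous | A01 - Basic Introduction to Web Crawlers |
| Next | A03 - Crawler Case Studies: NetEase Cloud & Baidu Tieba |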