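A03 - Web Scraping Examples: NetEase Cloud Music & Baidu Tieba

📄 Contents
1 NetEase Cloud Music
1.1 Fetching a single image
1.2 Fetching a single song
1.3 Fetching a single MV
2 Baidu Tieba
2.1 Single-page example
2.2 Paging through a tieba
2.3 Rewriting the paging scraper in object-oriented style
Navigation links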
1 NetEase Cloud Music
1.1 Fetching a single image
import requests

# Direct URL of the image resource
url = 'https://p5.music.126.net/obj/wonDlsKUwrLClGjCm8Kx/62159237855/ea0e/81c3/4a65/1cf4dd4db06cb3644a47d47618a2fffd.jpg?imageView&quality=89'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers)
# The URL points to a .jpg, so the raw bytes are saved under a matching extension
with open('163music_01.jpg', 'wb') as f:
    f.write(res.content)
1.2 Fetching a single song
import requests

# Request URL of the audio resource, copied from the browser's Network panel
url = 'https://m704.music.126.net/20250827205922/8cfa0108bd2de9f485c9be3f60fecb33/jdyyaac/obj/w5rDlsOJwrLDjj7CmsOj/44027348054/93ac/4fbe/b235/a8a7e64d956918a676e297d0b477007f.m4a?vuutv=4YaIbdIFkn0DFugBl5y0+IVlDUfnyGj3T11umsEnodywbMbgfpvykjqgAAaQidtLv0NJyzU34OPbWajMcbcO+5eULygcBc9za/P1NgP3ca0=&authSecret=00000198eb85aa3309650a32ce7c0006&cdntag=bWFyaz1vc193ZWIscXVhbGl0eV9zdGFuZGFyZA'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers)
# Write the raw audio bytes to disk
with open("163music_01.m4a", "wb") as f:
    f.write(res.content)
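Open the song's playback page (https://music.163.com/#/song?id=166282), right-click and choose Inspect, switch to the Network tab, and refresh the page. Then select the entry whose Type is Media (a single track usually has only one), click it, and open the Headers tab: the Request URL shown there is the URL of the audio resource.
The .m4a file downloaded from the example URL is saved in the same directory as the script, under the name 163music_01.m4a.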
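The script above writes whatever the server returns into the .m4a file, even if the copied Request URL is no longer valid (the example URL carries what look like time-based tokens, so this seems plausible over time). A minimal sketch of a sanity check before saving follows. The function name `is_media_response` is hypothetical, and the rule it applies (status 200 plus an audio, video, or generic binary Content-Type) is an assumption about how a failed request would differ from a successful one, not documented behaviour of the site.

```python
def is_media_response(status_code, content_type):
    """Return True when a response plausibly carries audio or video bytes.

    Assumption: an error or anti-scraping page comes back with a non-200
    status or a text-like Content-Type such as text/html.
    """
    if status_code != 200:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    main_type = content_type.split(";")[0].strip().lower()
    return (
        main_type.startswith(("audio/", "video/"))
        or main_type == "application/octet-stream"
    )
```

With the `res` object from the script above, the check could be called as `is_media_response(res.status_code, res.headers.get('Content-Type', ''))` before opening the output file.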
1.3 Fetching a single MV
import requests

# URL of the MV resource, copied from the browser's Network panel
url = 'https://vodkgeyttp8.vod.126.net/cloudmusic/9714/core/59c8/055dca5324bbc60714cfce72c4369c8f.mp4?wsSecret=048c493b890d5d6173c8e79c91b2a7c4&wsTime=1756298990'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers)
# Write the raw video bytes to disk
with open("163music_01.mp4", "wb") as f:
    f.write(res.content)
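Open the MV page (https://music.163.com/#/mv?id=10875220), right-click and choose Inspect, switch to the Network tab, and refresh the page. Look for the resource with the largest Size (usually the MV you want to download), then click it in the Name column to view its details. You can copy the URL and open it in a new tab to confirm that it is the right one.
The .mp4 file downloaded from the example URL is saved in the same directory as the script, under the name 163music_01.mp4.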
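An MV is usually much larger than an image or a single track, and `res.content` holds the whole response body in memory before anything is written. As a hedged alternative, the sketch below streams the body to disk in chunks using `stream=True` and `iter_content`, both part of requests. The helper names `save_chunks` and `download`, the 64 KiB chunk size, and the 30-second timeout are illustrative choices, not part of the original script.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                written += f.write(chunk)
    return written


def download(url, path, headers=None, chunk_size=64 * 1024):
    """Stream url to path without loading the whole body into memory."""
    # Imported here so save_chunks can be used on its own without requests.
    import requests

    with requests.get(url, headers=headers, stream=True, timeout=30) as res:
        res.raise_for_status()  # stop early on 4xx/5xx instead of saving an error page
        return save_chunks(res.iter_content(chunk_size=chunk_size), path)
```

With the `url` and `headers` from the script above, the call would be `download(url, "163music_01.mp4", headers=headers)`.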
2 Baidu Tieba
2.1 Single-page example
import os
import requests

# First page of the tieba; kw is the percent-encoded tieba name
url = 'https://tieba.baidu.com/f?ie=utf-8&kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
res = requests.get(url, headers=headers)
# Make sure the output folder exists before writing into it
os.makedirs("resourse", exist_ok=True)
with open("resourse/tieba_01.html", "wb") as f:
    f.write(res.content)
The fetched page is saved as tieba_01.html in the resourse folder, which sits in the same directory as A03.py.
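The kw value in the request URL above is the percent-encoded UTF-8 form of the tieba name. As a small illustration using only the standard library, the snippet below decodes it and encodes it again; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

encoded = "%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3"

# Decode the percent-encoded bytes as UTF-8 to recover the readable name.
name = unquote(encoded)

# Encoding the name again yields the same string that appears in the URL.
round_trip = quote(name)

print(name)
print(round_trip == encoded)
```

This is also why a plain string can be passed as `kw` through the `params` argument of `requests.get`: requests performs the encoding when it builds the URL.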
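2.2 Paging through a tieba
Below are the URLs for the first four pages of the Baidu Tieba search for 「维多利亚3」 (Victoria 3). The first two URLs both point to page 1: the first is what the search itself returns, and the second is what you get when you navigate back to the first page from page 2 or later.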
https://tieba.baidu.com/f?ie=utf-8&kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3
https://tieba.baidu.com/f?kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3&ie=utf-8&pn=0
https://tieba.baidu.com/f?kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3&ie=utf-8&pn=50
https://tieba.baidu.com/f?kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3&ie=utf-8&pn=100
https://tieba.baidu.com/f?kw=%E7%BB%B4%E5%A4%9A%E5%88%A9%E4%BA%9A3&ie=utf-8&pn=150
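Pattern:
Page 1: pn=0
Page 2: pn=50
Page 3: pn=100
Page 4: pn=150
From this pattern, pn starts at 0 and increases by 50 per page, so page n uses pn = (n - 1) × 50.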
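The rule can be sketched as a pair of small helpers that rebuild the URLs listed above. The names `page_to_pn` and `build_list_url` are hypothetical, and the parameter order (kw, ie, pn) simply mirrors the URLs observed in the browser.

```python
from urllib.parse import urlencode

BASE_URL = "https://tieba.baidu.com/f"
PAGE_SIZE = 50  # pn grows by 50 per page, as observed above


def page_to_pn(page):
    """Map a 1-based page number to the pn offset (page 1 -> 0, page 2 -> 50)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * PAGE_SIZE


def build_list_url(kw, page):
    """Build the list URL for a tieba name and a 1-based page number."""
    query = urlencode({"kw": kw, "ie": "utf-8", "pn": page_to_pn(page)})
    return f"{BASE_URL}?{query}"
```

For example, `build_list_url("维多利亚3", 3)` reproduces the third-page URL from the list above, with pn=100.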
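import os
import requests

url = 'https://tieba.baidu.com/f'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
}
word = input('Enter the tieba name: ')
page = int(input('Enter the number of pages to save: '))
# Make sure the output folder exists before writing into it
os.makedirs('resourse', exist_ok=True)
for i in range(page):
    # pn starts at 0 and grows by 50 per page
    params = {
        'kw': word,
        'pn': i * 50,
    }
    res = requests.get(url, headers=headers, params=params)
    # Zero-pad the page number so that page 10 becomes _10 rather than _010
    with open(f"resourse/tieba_page_{word}_{i + 1:02d}.html", mode='wb') as f:
        f.write(res.content)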
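The fetched pages are saved in the resourse folder, which sits in the same directory as A03.py; the number of files depends on how many pages were requested. Some pages occasionally fail to be scraped because of anti-scraping measures.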
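One common way to cope with occasional failures is to retry the request after a short pause. The sketch below is only an illustration under a stated assumption: that a failed fetch shows up as a non-200 status code. If the site instead returns a verification page with status 200, this check will not catch it. The name `fetch_with_retry` and its parameters are hypothetical.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a response with status 200.

    fetch is a zero-argument callable returning an object with a
    status_code attribute, for example a lambda wrapping requests.get.
    The pause grows linearly: delay, 2 * delay, and so on.
    """
    last_status = None
    for attempt in range(1, attempts + 1):
        res = fetch()
        if res.status_code == 200:
            return res
        last_status = res.status_code
        if attempt < attempts:
            sleep(delay * attempt)  # wait a little longer after each failure
    raise RuntimeError(
        f"gave up after {attempts} attempts (last status {last_status})"
    )
```

Inside the loop above, the plain `requests.get(...)` call could be replaced by `fetch_with_retry(lambda: requests.get(url, headers=headers, params=params, timeout=10))`.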
2.3 Rewriting the paging scraper in object-oriented style
import os
import requests


class Tieba:
    def __init__(self):
        self.url = 'https://tieba.baidu.com/f'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36'
        }

    # Send the request and return the page as text
    def send(self, params):
        res = requests.get(self.url, headers=self.headers, params=params)
        return res.text

    # Save one page; page is the zero-based index, so files start at 1.html
    def save(self, page, con):
        with open(f'resourse/{page + 1}.html', mode='w', encoding='utf-8') as f:
            f.write(con)

    # Ask for the tieba name and page count, then fetch and save each page
    def run(self):
        word = input('Enter the tieba name: ')
        pages = int(input('Enter the number of pages: '))
        # Make sure the output folder exists before writing into it
        os.makedirs('resourse', exist_ok=True)
        for page in range(pages):
            params = {
                'kw': word,
                'pn': page * 50,
            }
            data = self.send(params)
            self.save(page, data)


te = Tieba()
te.run()
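Navigation links

Destination | Link |
---|---|
Home | Back to home page |
Python Learning | Python Learning |
Previous | A02 - Basic usage of the Requests library |
Next | A04 - Sending POST requests with Requests |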