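Downloading Tencent Classroom replay videos with a Python crawler
After scraping the audio of Changjiang Rain Classroom (长江雨课堂) replays, I wanted to try scraping Tencent Classroom (腾讯课堂) replay videos as well, as a crawler-learning exercise.
Crawl analysis
Preface
Difficulties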
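(1) Supplying plskey and pskey (both are given in the cookie)
(2) Decrypting the video with edk
Notes
To download videos with this code, change the tid (term_id) value in the code.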
Get the course information
url = "https://ke.qq.com/cgi-proxy/agency/exp/get_replay_list_to_c?tid={}&need_recording=0&page_idx=0&page_size=0&need_all=1&role_type=2&bkn=658893395&r=0.4397".format(tid)
- Mainly retrieves the file ID (file_id) of each video in the course
Get the URL parameters for a file from its file_id
url = 'https://ke.qq.com/cgi-bin/qcloud/get_token?term_id={}&fileId={}&bkn=658893395&t=0.4467'.format(tid, file_id)
- Mainly retrieves four parameters for the video: exper, sign, t, us
Get the video file address
url = 'https://playvideo.qcloud.com/getplayinfo/v2/1258712167/{}?exper={}&sign={}&t={}&us={}'.format(file_id, video_param['result']['exper'],video_param['result']['sign'],video_param['result']['t'],video_param['result']['us'])
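- Mainly retrieves the address of the video's m3u8 file
- The same video has separate addresses for each quality level; the code selects the highest quality by default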
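The two URL templates above can also be assembled with `urllib.parse.urlencode`, which escapes parameter values automatically. This is only a sketch: the function names `build_token_url` and `build_playinfo_url` are hypothetical, the `bkn` and `t` defaults and the app ID are simply copied from the example URLs in this post, and the only thing assumed about the response is the `video_param['result']` layout already used above.

```python
from urllib.parse import urlencode

APP_ID = "1258712167"  # copied from the example getplayinfo URL in this post


def build_token_url(tid, file_id, bkn="658893395", t="0.4467"):
    """Build the get_token URL for one video (defaults copied from the post)."""
    query = urlencode({"term_id": tid, "fileId": file_id, "bkn": bkn, "t": t})
    return "https://ke.qq.com/cgi-bin/qcloud/get_token?" + query


def build_playinfo_url(file_id, video_param):
    """Build the getplayinfo URL from the four parameters in video_param['result']."""
    result = video_param["result"]
    query = urlencode({name: result[name] for name in ("exper", "sign", "t", "us")})
    return "https://playvideo.qcloud.com/getplayinfo/v2/{}/{}?{}".format(
        APP_ID, file_id, query
    )


# Made-up parameter values, for illustration only.
sample_param = {"result": {"exper": 0, "sign": "abc", "t": "5eb4caaa", "us": "123"}}
print(build_token_url(100, "42"))
print(build_playinfo_url("42", sample_param))
```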
Get the masterPlayList
https://1258712167.vod2.myqcloud.com/fb8e6c92vodtranscq1258712167/c76dde7c5285890800895871695/drm/voddrm.token.dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw==.master_playlist.m3u8?t=5eb4caaa&exper=0&us=8708789871727437569&sign=bb387e6ca1dfb28451dbb224d41f1bcf
dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw==
is a base64-encoded string (base64 is an encoding, not encryption); it mainly contains plskey and pskey.
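To see what such a token holds, it can be base64-decoded and split into fields. The sketch below assumes the decoded text is a semicolon-separated list of `name=value` pairs, which is what the beginning of the example token (`uin=...;vod_type=...`) suggests; the function name `decode_token` and the sample values are made up for illustration.

```python
import base64


def decode_token(token):
    """Decode a base64 token into a dict of name -> value.

    Assumes the decoded text looks like "name=value;name=value;...".
    Only the first "=" splits a pair, so values may contain "=" themselves.
    """
    text = base64.b64decode(token).decode("utf-8")
    fields = {}
    for part in text.split(";"):
        name, _, value = part.partition("=")
        fields[name] = value
    return fields


# A made-up token with the same shape as the real one.
sample_token = base64.b64encode(
    b"uin=1;term_id=2;plskey=p_lskey=aa;pskey=p_skey=bb"
).decode("ascii")
fields = decode_token(sample_token)
print(fields["plskey"], fields["pskey"])
```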
- The masterPlayList.m3u8 file contains the m3u8 addresses for each quality level.
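One way to pick the highest-quality entry from a master playlist is to compare the `BANDWIDTH` attribute of each `#EXT-X-STREAM-INF` line. This is a sketch that assumes the file follows the standard HLS master-playlist layout (a `#EXT-X-STREAM-INF` tag followed by the variant URI on the next line); the function name `pick_best_variant` and the sample playlist are hypothetical, and the post does not show the real file.

```python
import re


def pick_best_variant(master_text):
    """Return the URI of the variant with the largest BANDWIDTH, or None."""
    lines = [ln.strip() for ln in master_text.splitlines() if ln.strip()]
    best_bandwidth, best_uri = -1, None
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # The lookbehind avoids matching the AVERAGE-BANDWIDTH attribute.
        match = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", line)
        has_uri = i + 1 < len(lines) and not lines[i + 1].startswith("#")
        if match and has_uri and int(match.group(1)) > best_bandwidth:
            best_bandwidth, best_uri = int(match.group(1)), lines[i + 1]
    return best_uri


# Made-up master playlist with two quality levels.
sample_master = """#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=300000,RESOLUTION=480x270
v.f1.m3u8?t=1
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=100,BANDWIDTH=900000
v.f2.m3u8?t=1
"""
print(pick_best_variant(sample_master))
```

If the returned URI is relative, it can be resolved against the master playlist's own URL with `urllib.parse.urljoin`.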
Download the m3u8 file of the highest-quality video
https://1258712167.vod2.myqcloud.com/fb8e6c92vodtranscq1258712167/c76dde7c5285890800895871695/drm/voddrm.token.dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw==.v.f30741.m3u8?t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
The end of the file looks like this:
#EXTINF:9.999,
v.f30741.ts?start=596637520&end=597994143&type=mpegts&t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
#EXT-X-KEY:METHOD=AES-128,URI="https://ke.qq.com/cgi-bin/qcloud/get_dk?edk=CiA3PFgfG%2BIQ7set2C1%2FAWxyVYHDD6T%2FukE95OnjE8BwRhCO08TAChiaoOvUBCokOTMyNDg4YmItOWZjYS00MzFiLWJiYjItNjFmMDhjYjNlYmM3&fileId=5285890800895871695&keySource=VodBuildInKMS&token=dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw%3D%3D",IV=0x00000000000000000000000000000000
#EXTINF:9.999,
v.f30741.ts?start=597994144&end=599294735&type=mpegts&t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
#EXT-X-KEY:METHOD=AES-128,URI="https://ke.qq.com/cgi-bin/qcloud/get_dk?edk=CiA3PFgfG%2BIQ7set2C1%2FAWxyVYHDD6T%2FukE95OnjE8BwRhCO08TAChiaoOvUBCokOTMyNDg4YmItOWZjYS00MzFiLWJiYjItNjFmMDhjYjNlYmM3&fileId=5285890800895871695&keySource=VodBuildInKMS&token=dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw%3D%3D",IV=0x00000000000000000000000000000000
#EXTINF:9.999,
v.f30741.ts?start=599294736&end=600615071&type=mpegts&t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
#EXT-X-KEY:METHOD=AES-128,URI="https://ke.qq.com/cgi-bin/qcloud/get_dk?edk=CiA3PFgfG%2BIQ7set2C1%2FAWxyVYHDD6T%2FukE95OnjE8BwRhCO08TAChiaoOvUBCokOTMyNDg4YmItOWZjYS00MzFiLWJiYjItNjFmMDhjYjNlYmM3&fileId=5285890800895871695&keySource=VodBuildInKMS&token=dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw%3D%3D",IV=0x00000000000000000000000000000000
#EXTINF:2.286,
v.f30741.ts?start=600615072&end=600980175&type=mpegts&t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
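#EXT-X-ENDLIST
Here the code simply reads the second-to-last and fourth-to-last lines: the second-to-last line is the address of the last video segment, and the fourth-to-last line contains the address of the edk file.
Next, changing start=600615072 to start=0 in the last segment's address turns it into a request for the entire video.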
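The line-picking step described above might be sketched as follows. It assumes the playlist ends exactly as in the excerpt (`#EXT-X-KEY`, `#EXTINF`, segment, `#EXT-X-ENDLIST`); the function name `key_url_and_full_video_url`, the `example.com` URLs, and the sample playlist are hypothetical.

```python
import re
from urllib.parse import urljoin


def key_url_and_full_video_url(playlist_text, playlist_url):
    """Return (key URL, whole-video URL) from the tail of a media playlist."""
    lines = [ln.strip() for ln in playlist_text.splitlines() if ln.strip()]
    last_segment = lines[-2]  # second-to-last line: the last segment address
    key_line = lines[-4]      # fourth-to-last line: the #EXT-X-KEY tag
    key_url = re.search(r'URI="([^"]+)"', key_line).group(1)
    # Widen the byte range so it starts at the beginning of the file.
    whole_video = re.sub(r"\bstart=\d+", "start=0", last_segment, count=1)
    # Segment addresses are relative to the playlist's own URL.
    return key_url, urljoin(playlist_url, whole_video)


# Made-up playlist with the same layout as the excerpt.
sample_playlist = """#EXTM3U
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/get_dk?edk=abc",IV=0x00000000000000000000000000000000
#EXTINF:9.999,
v.f1.ts?start=0&end=19&type=mpegts
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/get_dk?edk=abc",IV=0x00000000000000000000000000000000
#EXTINF:2.286,
v.f1.ts?start=20&end=30&type=mpegts
#EXT-X-ENDLIST
"""
key_url, video_url = key_url_and_full_video_url(
    sample_playlist, "https://example.com/drm/voddrm.token.abc.v.f1.m3u8?t=1"
)
print(key_url)
print(video_url)
```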
Get the video encryption key (edk)
https://ke.qq.com/cgi-bin/qcloud/get_dk?edk=CiA3PFgfG%2BIQ7set2C1%2FAWxyVYHDD6T%2FukE95OnjE8BwRhCO08TAChiaoOvUBCokOTMyNDg4YmItOWZjYS00MzFiLWJiYjItNjFmMDhjYjNlYmM3&fileId=5285890800895871695&keySource=VodBuildInKMS&token=dWluPTMwMzI5NjQ1MTg7dm9kX3R5cGU9MDtjaWQ9MTAxMjM5NDt0ZXJtX2lkPTEwMDgzOTU1MTtwbHNrZXk9cF9sc2tleT0wMDA0MDAwMDFmMWYyMjFkZGI1NzlkN2EzNmM4NjhjOGNmZjZlMGQwYTM2NTliZGZlNWE1ZGYxMTc5MDljZDVmZTgyZGU2MTY4MWY2ODA0Y2UzZWE0MGVmO3Bza2V5PXBfc2tleT1SN0h6Yyp3ZTVpZHBvcjVNdGxVajFyc1dmU3pnYjVWSFk2N2dPR1RIY0hjXw%3D%3D
- Here the edk file is saved to a local folder
Download the encrypted video
https://1258712167.vod2.myqcloud.com/fb8e6c92vodtranscq1258712167/c76dde7c5285890800895871695/drm/v.f30741.ts?start=0&end=600980175&type=mpegts&t=5eb4cbd0&exper=0&us=3781125914949347017&sign=dd6e77288a570373aa881c3ffa06fc19
- Because the video is downloaded with requests.get, the download is slow; IDM or FDM can be used to download the video directly instead
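If the download stays in Python, writing the response to disk chunk by chunk at least avoids holding the whole video in memory; it is not claimed to be faster than IDM or FDM. The helper below is a sketch with a hypothetical name (`save_chunks`); it accepts any iterable of byte chunks, and the commented usage shows how it might be fed from `requests` with `stream=True`.

```python
import os
import tempfile


def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; return the bytes written."""
    written = 0
    with open(dest_path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written


# Hypothetical usage with requests (needs network access and a valid video_url):
#   with requests.get(video_url, stream=True, timeout=30) as resp:
#       resp.raise_for_status()
#       save_chunks(resp.iter_content(chunk_size=1 << 20), "video.ts")

# Offline demonstration with in-memory chunks.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ts")
demo_size = save_chunks([b"ab", b"", b"cde"], demo_path)
print(demo_size)
```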
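Decrypt the encrypted video with edk
import os
from Crypto.Cipher import AES  # provided by the pycryptodome package

# Read the AES key that was saved from the get_dk request.
with open(pathFolder + "get_dk", 'rb') as f:
    key = f.read()
# The playlist declares IV=0x00...00, i.e. 16 zero bytes, not the ASCII character '0'.
iv = bytes(16)
# Read the encrypted video that was downloaded in the previous step.
with open(filepath, 'rb') as f:
    data = f.read()
# Decrypt with AES-128 in CBC mode and write the result into the same folder.
with open(pathFolder + os.path.basename(video_url.split('?')[0]), 'wb') as ff:
    cipher = AES.new(key, AES.MODE_CBC, iv)
    plain = cipher.decrypt(data)
    ff.write(plain)
The above is the core decryption code, adapted mainly from online tutorials.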
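Full code
The cookie does not need to be supplied; you only need to change tid (term_id) to download a different course.
replay_info_list = replay_info_list[0:1]
# Controls which lessons are downloaded. This line sits in the middle of the code (search for it) and is commented out by default.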
import requests