Douban crawl tutorial
[TOC]
Introduction
Use requests to crawl Douban
Preparation
Create a virtualenv for python3
virtualenv -p /usr/local/bin/python3 venv
Note: Python 3 must be installed first.
Then activate it: source venv/bin/activate
Install requests
pip install requests
Tutorial
1. Get movies of 2016 on Douban
import requests
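A minimal sketch of how this step might look, under stated assumptions: the listing URL `https://movie.douban.com/tag/2016`, the `start` query parameter, the page size of 20, and the helper names `build_list_url` and `fetch_html` are all illustrative guesses rather than part of the original code. Check the real URL in a browser before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed listing URL for movies tagged 2016; adjust it to the page you actually want.
BASE_URL = "https://movie.douban.com/tag/2016"
PAGE_SIZE = 20  # assumed number of movies per listing page


def build_list_url(page: int) -> str:
    """Return the URL of the given zero-based listing page."""
    query = urlencode({"start": page * PAGE_SIZE})
    return f"{BASE_URL}?{query}"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    import requests  # imported here so build_list_url works without the dependency

    # A browser-like User-Agent; some sites reject the library default.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(build_list_url(0))` would then return the HTML of the first listing page, provided the assumed URL is still valid.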
2. Get movie scores and comments
Use Beautiful Soup to parse the HTML
Install Beautiful Soup with pip: pip install beautifulsoup4
Code to get the movie links and the next-page link
import requests
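As a rough illustration of extracting the movie links and the next-page link with Beautiful Soup, the sketch below parses a small hand-written sample instead of a live page. The markup, the `a.nbg` and `span.next` selectors, the placeholder subject IDs, and the function name `parse_listing` are assumptions; the real page structure may differ and should be checked in the browser's inspector.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for a listing page; the real page structure may differ.
SAMPLE_HTML = """
<div class="article">
  <a class="nbg" href="https://movie.douban.com/subject/1/">Movie A</a>
  <a class="nbg" href="https://movie.douban.com/subject/2/">Movie B</a>
  <div class="paginator">
    <span class="next"><a href="https://movie.douban.com/tag/2016?start=20">Next</a></span>
  </div>
</div>
"""


def parse_listing(html):
    """Return (movie_links, next_page_link); next_page_link is None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed selector: each movie link carries the class "nbg".
    movie_links = [a["href"] for a in soup.select("a.nbg[href]")]
    # Assumed selector: the next-page link sits inside <span class="next">.
    next_anchor = soup.select_one("span.next a[href]")
    next_link = next_anchor["href"] if next_anchor else None
    return movie_links, next_link


links, next_link = parse_listing(SAMPLE_HTML)
print(links)
print(next_link)
```

Returning `None` for the next-page link gives a crawl loop a simple stopping condition once the last page is reached.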
Code to get the movie comments link
import requests
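One simple way to obtain a comments link, shown here as a sketch, is to derive it from the movie URL instead of parsing the movie page. This swaps in a different technique from page parsing, and it assumes the comments page lives at `<movie URL>/comments`; both that path and the helper name `comments_url` are assumptions to verify against the site.

```python
from urllib.parse import urlsplit, urlunsplit


def comments_url(movie_url: str) -> str:
    """Return the assumed comments page URL for a movie page URL."""
    parts = urlsplit(movie_url)
    # Drop any trailing slash, then append the assumed "comments" segment.
    path = parts.path.rstrip("/") + "/comments"
    # Query string and fragment of the movie URL are discarded.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(comments_url("https://movie.douban.com/subject/1/"))
```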
Code to get the comments and scores of the first page
import requests
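The sketch below shows one way the score and text of each comment could be pulled out with Beautiful Soup. It runs on hand-written sample markup. The `comment-item` and `comment-text` class names, the idea that the score is encoded in a class such as `allstar20`, and the function name `parse_comments` are assumptions for illustration, so compare them with the actual comments page before use.

```python
import re

from bs4 import BeautifulSoup

# Sample markup standing in for a comments page; class names are assumed.
SAMPLE_HTML = """
<div class="comment-item">
  <span class="allstar20 rating"></span>
  <p class="comment-text">Pacing problems throughout.</p>
</div>
<div class="comment-item">
  <span class="allstar50 rating"></span>
  <p class="comment-text">Loved it.</p>
</div>
<div class="comment-item">
  <p class="comment-text">No rating given.</p>
</div>
"""


def parse_comments(html):
    """Return a list of (score, text) tuples; score is None when no rating is present."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for item in soup.select("div.comment-item"):
        text_tag = item.select_one("p.comment-text")
        if text_tag is None:
            continue  # skip items without comment text
        score = None
        rating = item.select_one("span.rating")
        if rating is not None:
            # Look for a class such as "allstar20" and keep the numeric part.
            for cls in rating.get("class", []):
                match = re.fullmatch(r"allstar(\d+)", cls)
                if match:
                    score = int(match.group(1))
        results.append((score, text_tag.get_text(strip=True)))
    return results


for score, text in parse_comments(SAMPLE_HTML):
    print(score, text)
```

Comments without a rating are kept with a score of `None` so that no comment text is silently dropped.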
Result demo
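20 The narrative style is similar to HP, with more political allusions, but the script, characters, editing, and pacing all have serious problems; I could not muster any interest while watching and felt uncomfortable in various ways. The final climax is deliberately made to look like a superhero movie. Depp has already signed on for the sequel, so the headmaster will presumably show up soon; it seems Warner is putting all its effort into developing the HP universe. If the Best Actor winner keeps acting like this, he will be the next Martin Freeman.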