crawlingPeopleRelatedWeiboOriginalPost(pid, person_name)

Description

Extracts original Weibo posts related to a person from Sina Weibo and inserts them into the database.

Parameters

Parameter    Necessity  Type    Description
pid          required   int     person ID
person_name  required   string  name of the person to search for

Output

None

Implementation

  1. masterStart()
  2. wapLogIn()
  3. weiBoWapSearch(person_name, pid)
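
The call order above can be sketched as follows. This is only an illustration: the bodies of masterStart, wapLogIn and weiBoWapSearch are not shown on this page, so the versions below are hypothetical stubs that just record that they were called, and the docstrings describe assumed (not confirmed) responsibilities.

```python
# Hypothetical stand-ins: the real masterStart, wapLogIn and weiBoWapSearch
# are not shown in this document, so these stubs only record the call order.
calls = []


def masterStart():
    """Assumed to prepare crawler state before any request is made."""
    calls.append("masterStart")


def wapLogIn():
    """Assumed to log in to the mobile (WAP) site."""
    calls.append("wapLogIn")


def weiBoWapSearch(person_name, pid):
    """Assumed to search for person_name and store results under pid."""
    calls.append(("weiBoWapSearch", person_name, pid))


def crawlingPeopleRelatedWeiboOriginalPost(pid, person_name):
    # The three steps run in the order listed under Implementation.
    masterStart()
    wapLogIn()
    weiBoWapSearch(person_name, pid)
    # The function has no output; results go to the database.
    return None


# Example call with made-up arguments.
result = crawlingPeopleRelatedWeiboOriginalPost(42, "Example Person")
```

Note that the public function takes (pid, person_name) while weiBoWapSearch takes (person_name, pid), so the arguments are swapped when they are passed on.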

Related Work

None

Issues About The Crawler

  1. The Sina Weibo API is not very effective: it requires authorization, and the crawler would not pass Sina's review and verification.
  2. The crawler logs in to the Sina account using a browser's cookies.
  3. The crawler uses weibo.cn instead of www.weibo.com, because the latter embeds its tweet data in JavaScript, which makes it difficult to extract.
  4. Multiple proxies are used to prevent Sina from blocking our IP.
  5. Multiple processes and accounts are used to speed up the crawler.
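
Points 2, 4 and 5 above could be combined roughly as in the sketch below, which pairs each account's browser cookie with a proxy and builds one HTTP opener per pair. It is not the crawler's actual code: the helper names assign_workers and build_account_opener, the cookie strings, and the proxy addresses are all placeholders invented for illustration, and no request is sent.

```python
import itertools
import urllib.request

BASE_URL = "https://weibo.cn/"  # mobile site named in point 3


def assign_workers(cookies, proxies):
    """Pair each account cookie with a proxy, cycling through the proxies."""
    proxy_cycle = itertools.cycle(proxies)
    return [(cookie, next(proxy_cycle)) for cookie in cookies]


def build_account_opener(cookie, proxy):
    """Build an opener that sends one account's cookie through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    # The Cookie header carries the login state copied from a browser.
    opener.addheaders = [("Cookie", cookie), ("User-Agent", "Mozilla/5.0")]
    return opener


# Placeholder values; real cookies and proxies are deployment-specific.
cookies = ["cookie-a", "cookie-b", "cookie-c"]
proxies = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]

pairs = assign_workers(cookies, proxies)
openers = [build_account_opener(cookie, proxy) for cookie, proxy in pairs]
```

Each (cookie, proxy) pair could then be handed to its own worker process, for example through multiprocessing.Pool, so that requests to BASE_URL are spread over several accounts and IP addresses.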
 
projs/clans/docs/crawlingpeoplerelatedweibooriginalpost.1390738474.txt.gz · Last modified: 2014/01/26 20:14 by yangjunfeng0317