crawlingCompanyRelatedWeiboOriginalPost(stock_id, company_name)


Extract original Weibo posts related to a company from Sina Weibo and insert them into the database.


Parameter     Necessity  Type    Description
stock_id      required   int     company ID
company_name  required   string  name of the company to crawl


Return value  Type    Description
status        string  running status of the crawler
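The page says the crawled posts are inserted into a database but does not describe the storage layer. A minimal sketch of that step, assuming SQLite and a hypothetical `weibo_post` table keyed by a Weibo post ID plus `stock_id` (neither the table, its columns, nor the helper `insert_posts` come from the original crawler), could look like this:

```python
import sqlite3

# Hypothetical schema: the original page does not describe the database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS weibo_post (
    post_id      TEXT NOT NULL,     -- Weibo's own post identifier (assumed available)
    stock_id     INTEGER NOT NULL,  -- company ID passed to the crawler
    company_name TEXT NOT NULL,
    post_text    TEXT NOT NULL,
    PRIMARY KEY (post_id, stock_id) -- a post mentioning two companies is kept once per company
)
"""


def insert_posts(conn, stock_id, company_name, posts):
    """Store (post_id, post_text) pairs for one company.

    Returns the number of rows actually inserted; posts already stored for
    this company are skipped, so re-running a crawl does not duplicate rows.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO weibo_post (post_id, stock_id, company_name, post_text) "
        "VALUES (?, ?, ?, ?)",
        [(post_id, stock_id, company_name, text) for post_id, text in posts],
    )
    conn.commit()
    return conn.total_changes - before
```

Using `INSERT OR IGNORE` with a composite primary key makes re-crawls idempotent: a post already stored for a company is skipped rather than duplicated. For example, `insert_posts(sqlite3.connect(":memory:"), 1, "Acme", [("a", "text")])` returns 1.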


  1. masterStart(). Create multiple processes to begin crawling data.
  2. wapLogIn(). Log in to a Sina account.
  3. weiBoWapSearch(company_name, stock_id). Use the company name and company ID to search for Weibo posts related to the company and insert them into the database.
    • extractTopic(company_name, company_id). Extract the Weibo text and insert it into the database.
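The three steps can be sketched as follows. This is an illustration, not the original code: `wap_log_in`, `wap_search` and `extract_topic` are hypothetical stand-ins for `wapLogIn()`, `weiBoWapSearch()` and `extractTopic()` that return canned data instead of contacting Sina, and the database insert is replaced by a count of extracted posts. The hypothetical `master_start` plays the role of `masterStart()` by distributing companies over a `multiprocessing.Pool`.

```python
import multiprocessing


def wap_log_in(account):
    """Hypothetical stand-in for wapLogIn(): return a session for one Sina account."""
    # A real version would attach the account's browser cookies to an HTTP session.
    return {"account": account}


def wap_search(session, company_name, stock_id):
    """Hypothetical stand-in for weiBoWapSearch(): return raw search results.

    Canned data is returned here; a real version would request the WAP search
    pages for company_name using the logged-in session.
    """
    return [
        {"id": f"{stock_id}-1", "text": f"news about {company_name}"},
        {"id": f"{stock_id}-2", "text": "unrelated post"},
    ]


def extract_topic(posts, company_name, company_id):
    """Hypothetical stand-in for extractTopic(): keep the text of matching posts."""
    return [p["text"] for p in posts if company_name in p["text"]]


def crawl_company(job):
    """Work done inside one worker: log in, search, then extract."""
    account, stock_id, company_name = job
    session = wap_log_in(account)
    posts = wap_search(session, company_name, stock_id)
    texts = extract_topic(posts, company_name, stock_id)
    # The real crawler inserts texts into the database; this sketch only counts them.
    return stock_id, len(texts)


def master_start(companies, accounts, processes=4, pool_factory=multiprocessing.Pool):
    """Distribute (stock_id, company_name) pairs over a pool of workers.

    Accounts are handed out round-robin so that parallel workers do not all
    crawl through the same Sina login.
    Returns {stock_id: number_of_extracted_posts}.
    """
    jobs = [
        (accounts[i % len(accounts)], stock_id, name)
        for i, (stock_id, name) in enumerate(companies)
    ]
    with pool_factory(processes) as pool:
        return dict(pool.map(crawl_company, jobs))
```

`pool_factory` is a hypothetical hook: passing `multiprocessing.pool.ThreadPool`, which offers the same `map` interface, runs the same code in threads, which is convenient for testing without starting processes. With the default process pool on platforms that spawn rather than fork, `master_start` should be called from under an `if __name__ == "__main__":` guard.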

Related Work


Issues About The Crawler

  1. The Sina Weibo API is not very effective: it requires authorization, and the crawler would not pass Sina's review and verification.
  2. Browser cookies are used to log in to the Sina account.
  3. The crawler uses the url instead of to crawl data, because the latter's tweet data is embedded in JavaScript and is difficult to extract.
  4. Multiple proxies are used to prevent Sina from blocking our IP address.
  5. Multiple processes and accounts are used to speed up the crawler.
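Issues 2, 4 and 5 can be combined by giving each Sina account its own browser cookie string and proxy, then rotating through them per request. The following is a minimal sketch using only `urllib.request` from the standard library; `make_openers` and `rotating_fetch` are hypothetical helpers, and the cookie strings and proxy URLs are placeholders the operator must supply, not values from the original crawler.

```python
import itertools
import urllib.request


def make_openers(cookies, proxies):
    """Build one urllib opener per Sina account.

    cookies: Cookie header values copied from browsers logged in to each account.
    proxies: non-empty list of proxy URLs such as "http://host:port"; reused
             round-robin when there are more accounts than proxies.
    """
    openers = []
    for cookie, proxy in zip(cookies, itertools.cycle(proxies)):
        # Route both http and https traffic for this account through its proxy.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        # Reuse the browser's logged-in session instead of going through the API.
        opener.addheaders.append(("Cookie", cookie))
        openers.append(opener)
    return openers


def rotating_fetch(openers, timeout=10):
    """Return fetch(url), which sends each request through the next account/proxy pair."""
    pool = itertools.cycle(openers)

    def fetch(url):
        with next(pool).open(url, timeout=timeout) as response:
            return response.read()

    return fetch
```

Spreading requests over several account/proxy pairs keeps the request rate per IP address and per login lower; each worker process can call `make_openers` with its own subset of accounts.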
projs/clans/docs/crawlingcompanyrelatedweibooriginalpost.txt · Last modified: 2014/02/04 18:32 by yangjunfeng0317