Version Change Log

Version 1.0: Using the Sina Weibo API to crawl the data we wanted.

(Not very effective: the API requires authorization, and the crawler would not pass Sina's review and verification.)

Version 2.0: Using the browser’s cookie to log in to Sina. Crawling www.weibo.com and parsing the HTML to get the user’s follow relationships.

Version 2.1: Get a user’s tweets by user ID, represented as uid.

(The user’s tweets are embedded in JavaScript, which BeautifulSoup cannot parse.)

Version 3.0: Using the browser’s cookie to log in to Sina. Crawling weibo.cn and parsing the HTML to get the user’s tweets.
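
As a rough illustration of the cookie-based approach, the sketch below builds a request that carries a browser cookie. It is not the crawler's actual code: the URL layout, the function names, and the cookie value format are all assumptions made for the example.

```python
import urllib.request

# Assumed base URL for the mobile site named in the log entry.
MOBILE_SITE = "https://weibo.cn"


def build_request(uid, cookie, page=1):
    """Build (but do not send) a request for one page of a user's tweets.

    The "/<uid>?page=<n>" layout is an assumption for illustration only.
    `cookie` is the raw Cookie header string copied from a logged-in browser.
    """
    url = f"{MOBILE_SITE}/{uid}?page={page}"
    headers = {
        "Cookie": cookie,             # reuses the browser's login session
        "User-Agent": "Mozilla/5.0",  # look like an ordinary browser
    }
    return urllib.request.Request(url, headers=headers)


def fetch_html(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The returned HTML would then be handed to an HTML parser such as BeautifulSoup to pull out the tweets.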

Version 3.1: Using the Weibo search function to find the people we wanted and get their tweets.

Version 3.2: Using alternative proxies so that the crawler can run 24 hours a day.
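
One possible way to switch between proxies is sketched below. The ProxyPool class and the make_opener helper are hypothetical names invented for this example, and the rotation policy (round-robin, dropping proxies that fail) is an assumption rather than the crawler's documented behaviour.

```python
import urllib.request


class ProxyPool:
    """Round-robin pool of proxy URLs (hypothetical helper)."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._index = 0

    def current(self):
        if not self._proxies:
            raise RuntimeError("no working proxies left")
        return self._proxies[self._index]

    def rotate(self):
        """Move on to the next proxy, e.g. after a timeout or a block."""
        self.current()  # raises if the pool is empty
        self._index = (self._index + 1) % len(self._proxies)
        return self.current()

    def discard_current(self):
        """Drop the current proxy; the following one takes its slot."""
        self.current()  # raises if the pool is empty
        self._proxies.pop(self._index)
        self._index = self._index % len(self._proxies) if self._proxies else 0


def make_opener(proxy):
    """Build a urllib opener that sends HTTP and HTTPS traffic via `proxy`."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)
```

A crawl loop would call make_opener(pool.current()), and on a connection error call pool.rotate() or pool.discard_current() before retrying.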

Version 3.3: When getting tweets, get all retweets of the current tweet at the same time.

Version 4.0: Refactoring the code and making it object-oriented.

Version 4.1: Add exception handling.

Version 4.2: To speed up the crawler, use multiple processes to crawl data.
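
A minimal sketch of crawling with multiple processes, using the standard multiprocessing module. The crawl_user function is only a placeholder for the real per-user crawl, and the record it returns is made up for the example.

```python
from multiprocessing import Pool


def crawl_user(uid):
    """Placeholder for the real per-user crawl (hypothetical)."""
    return {"uid": uid, "tweets": []}


def crawl_all(uids, processes=4):
    """Crawl every uid, spreading the work over `processes` workers."""
    if processes <= 1:
        # Serial fallback, handy for debugging.
        return [crawl_user(uid) for uid in uids]
    with Pool(processes=processes) as pool:
        # map() splits the uid list across workers and keeps input order.
        return pool.map(crawl_user, uids)
```

On platforms that start workers by spawning a fresh interpreter, crawl_all should be called from inside an `if __name__ == "__main__":` guard.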

Version 4.3: Using multiple Sina accounts (which means multiple cookies) so that the crawler can run continuously.
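
The idea of falling back from one account to another could look roughly like this. AuthExpired and fetch_with_accounts are hypothetical names, and the `fetch` callable stands in for whatever function actually downloads a page with a given cookie.

```python
class AuthExpired(Exception):
    """Raised by `fetch` when a cookie is rejected or rate-limited."""


def fetch_with_accounts(url, cookies, fetch):
    """Try each account's cookie in turn until one succeeds.

    `fetch(url, cookie)` is supplied by the caller (an assumption of this
    sketch) and must raise AuthExpired when the cookie no longer works.
    """
    last_error = None
    for cookie in cookies:
        try:
            return fetch(url, cookie)
        except AuthExpired as error:
            last_error = error  # remember the failure, try the next account
    raise RuntimeError("all accounts failed") from last_error
```

Only authentication failures trigger a switch here; other errors propagate so that they are not silently hidden.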

Version 4.4: Add exit handling.
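
The log does not say what the exit handling does. One common pattern, sketched here purely as an assumption, is to save the list of users still to be crawled when the process exits, so that a later run can resume. All names and the checkpoint file format are invented for the example.

```python
import atexit
import json
import os
import signal
import sys
import tempfile

# Hypothetical default location for the checkpoint file.
DEFAULT_CHECKPOINT = os.path.join(tempfile.gettempdir(), "crawler_checkpoint.json")


def save_checkpoint(pending_uids, path=DEFAULT_CHECKPOINT):
    """Write the pending uids to disk, replacing any older checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump({"pending": list(pending_uids)}, handle)
    os.replace(tmp_path, path)  # swap in the new file in one step


def load_checkpoint(path=DEFAULT_CHECKPOINT):
    """Return the pending uids from a previous run, or [] if none exist."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["pending"]


def install_exit_handlers(pending_uids, path=DEFAULT_CHECKPOINT):
    """Save `pending_uids` on normal exit, Ctrl+C, or SIGTERM."""
    atexit.register(save_checkpoint, pending_uids, path)

    def _on_signal(signum, frame):
        sys.exit(0)  # raises SystemExit, so the atexit hook above runs

    signal.signal(signal.SIGINT, _on_signal)
    signal.signal(signal.SIGTERM, _on_signal)
```

Because the list object itself is registered, whatever it contains at exit time is what gets saved.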

Version 5.0: Integrating the company search list and the people search list.

 
projs/clans/docs/social_media_crawling_junfeng/versions.txt · Last modified: 2014/01/19 18:00 by yangjunfeng0317