Document of CLANS

Example

Search (Alex)

Function Description

We use Lucene to build an index over our company and person data; the search function is implemented on top of that index.
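To illustrate the idea of indexing first and searching afterwards, here is a toy inverted index. It is only a sketch and does not use Lucene (which is a Java library); the function names and the sample records are made up for illustration, and whitespace tokenization is a simplification.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase token to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in text.lower().split():
            index[token].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data; the real company and person records are not shown here.
records = {
    "p1": "Alice Wong engineer",
    "c1": "Wong Trading Company",
}
index = build_index(records)
```

With this sample data, a query for "wong" matches both the person and the company record, which is the kind of result the two-column search page would need to split by type.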

Related Information

Lucene – http://lucene.apache.org/

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

Change Log

Data Schema (Jingyi)

Function Description

Data Schema

Entity Disambiguation (Xu)

Function Description

Named entity disambiguation has two main steps:


First, we use the training set to extract filter keywords for a given person.

Then, we use the extracted filter keywords to judge whether each post is related to that person in our database.
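The two steps could be sketched as follows. This is a minimal illustration, not the project's actual method: the scoring rule (how many more related than unrelated training posts contain a token), the parameters top_n and min_hits, and all function names are assumptions. Whitespace tokenization is also a simplification; Chinese posts would need a word segmenter.

```python
from collections import Counter


def tokenize(text):
    # Simplification: split on whitespace only.
    return text.lower().split()


def extract_filter_keywords(training_posts, top_n=5):
    """Step 1: training_posts is a list of (text, label) pairs for one person.

    label is True when the post is about that person. Keeps the top_n tokens
    that appear in more related posts than unrelated posts.
    """
    related, unrelated = Counter(), Counter()
    for text, label in training_posts:
        target = related if label else unrelated
        target.update(set(tokenize(text)))
    scores = {tok: related[tok] - unrelated[tok] for tok in related}
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return [tok for tok in ranked if scores[tok] > 0][:top_n]


def is_related(post, keywords, min_hits=1):
    """Step 2: a post is judged related if it contains enough keywords."""
    tokens = set(tokenize(post))
    return sum(1 for kw in keywords if kw in tokens) >= min_hits
```

A post that mentions only the person's name, which is shared with other people, scores no keyword hits and is filtered out; that is the case the keyword step is meant to handle.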

Related Information

Social Network Modeling by Search Engine (Xu)

Visualization (Qian, Ken, Sunny)

Additional Features

  • Loading sign
  • Drag button
  • Share on Weibo, Facebook, Twitter, and other platforms

Home Page

Description

Two main functions:

  • User sign in/out/up
  • Search function

More to improve

  • Handling of misspelled queries
  • Handling of queries like 'xiaosuining'
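One possible approach to the misspelling item is to suggest the closest known name when a query has no exact match. The sketch below uses difflib from the Python standard library; the function name suggest, the cutoff value, and the list of names are assumptions for illustration. It does not address the 'xiaosuining' case.

```python
import difflib


def suggest(query, known_names, cutoff=0.6):
    """Return the known name most similar to the query, or None."""
    matches = difflib.get_close_matches(
        query.lower(), known_names, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Hypothetical list of indexed names.
known_names = ["alibaba", "tencent", "baidu"]
```

The search page could then offer the suggestion to the user instead of returning an empty result.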

Search Result

Description

Search results are shown in two columns, one for persons and one for companies.

More to improve

  • Show news containing the keywords in the search results.
  • Show a description of each person/company in the search results.

Person Show

Features

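  • Basic information
  • Trend chart of relationship network weights
  • Photos
  • Life trajectory map (hometown, places of study, workplaces, etc., on Google Maps)
  • Timeline
  • Alumni relationship network graph
  • Work relationship network graph
  • Multi-select relationship network graph (node attributes: centrality value, community)
  • Shortest path (P2P and P2C)
  • Similarity chart (P2P and P2C)
  • Weibo posts (Top & Hot)
  • Media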

Company Show

Features

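  • Basic information
  • Photos
  • Current stock trend and price history
  • Profit and loss trend chart
  • Map of company location and branch distribution
  • Timeline
  • Internal company relationship network
  • External company relationship network
  • Multi-select network graph (node attributes: centrality value, community)
  • Shortest path (C2C and C2P)
  • Similarity comparison chart (C2C and C2P)
  • Weibo posts (Top & Hot)
  • Media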

Prediction (Zhi)

  • Utilanalz, a library for plotting and analyzing data.

Information Processing (Zhi, Hang)

Social Media Crawling (Junfeng)

Code File and Function Description

sqlite2cookie.py

Parses cookies from the browser's cookie SQLite database (Firefox and Chrome).

sqlite2cookie()
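As a rough illustration of what such a function might do for Firefox, the sketch below reads the moz_cookies table of a cookies.sqlite file. It is not the actual sqlite2cookie() code: the function names and parameters are hypothetical, and Chrome is not covered because it uses a different table layout and may store values encrypted.

```python
import sqlite3


def read_firefox_cookies(db_path, host_suffix):
    """Return {name: value} for cookies whose host ends with host_suffix."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name, value FROM moz_cookies WHERE host LIKE ?",
            ("%" + host_suffix,),
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


def cookie_header(cookies):
    """Format cookies as the value of an HTTP Cookie header."""
    return "; ".join("%s=%s" % (k, v) for k, v in sorted(cookies.items()))
```

The resulting header string could be attached to the crawler's requests so that they run under the logged-in browser session. Firefox may lock the database while it is running, so working on a copy of the file is safer.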

dbOperP.py

Class: dbOperator

Performs database operations (select, insert).
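A class with this role might look like the sketch below. It is a hypothetical stand-in, not the real dbOperator: the class name, the method signatures, and the use of SQLite as the backend are all assumptions, since the actual database is not described here.

```python
import sqlite3


class DbOperatorSketch(object):
    """Hypothetical stand-in for dbOperator, backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def insert(self, table, row):
        """Insert one row given as a {column: value} dict."""
        # Table and column names are trusted here; only values are bound.
        columns = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, marks)
        self.conn.execute(sql, list(row.values()))
        self.conn.commit()

    def select(self, sql, params=()):
        """Run a SELECT statement and return all rows as tuples."""
        return self.conn.execute(sql, params).fetchall()
```

Binding values through placeholders rather than string formatting keeps crawled text, which may contain quotes, from breaking the SQL statement.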

sinaMBCSearchBug.py

Class: WeiBoCBug

The crawler for companies.

sinaMBPSearchBug.py

Class: WeiBoPBug

The crawler for persons.

sinaBug.py

The crawler master: controls the crawlers and keeps them running continuously.

Linux run command:

nohup /local/fdm/python27/bin/python /local/fdm/weibobug/sinaBug.py &
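The "run continually" behaviour of a master script could be sketched as a simple loop like the one below. This is an assumption about the general shape, not the contents of sinaBug.py: the function name, the pause between rounds, and the error handling are all illustrative.

```python
import time


def run_forever(crawlers, pause_seconds=60, max_rounds=None, sleep=time.sleep):
    """Run every crawler in turn, round after round.

    A failing crawler is logged and skipped so one error does not stop
    the loop. max_rounds=None means run until the process is killed.
    Returns the number of failed crawler runs.
    """
    failures = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for crawl in crawlers:
            try:
                crawl()
            except Exception as exc:
                failures += 1
                print("crawler failed: %r" % (exc,))
        rounds += 1
        sleep(pause_seconds)
    return failures
```

Here crawlers would be a list of callables, for example one wrapping the company crawler and one wrapping the person crawler. The max_rounds and sleep parameters exist only so the loop can be exercised in a test without waiting.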

Social Network Analysis (Lily)

Data Source (Lily, Zhi -database)

database tables

prediction related

personal information related

 
projs/clans/docs/home.1390272101.txt.gz · Last modified: 2014/01/21 10:41 by cheungzeecn