Project Schedule

Regular Meeting on Oct. 30, 2013

Presentation (PPT) by Zhi
Tasks for next month:

@Jufeng and Alex
1. Please continue crawling Weibo messages related to the people and companies over the next two weeks. Start by crawling one day's news as a test; then we can move your program to our VM server. One machine should be enough in the beginning. If Weibo limits us to 1000 requests per hour, just let the VM server crawl 1000 times per hour. The VM server may not have a Python environment, so you may need to set it up. If you have any problems with the VM server, please contact our administrator for help. His email is hmlaw@cse.cuhk.edu.hk
2. I need your help with using Lucene for search, so that you can provide search results to the Visualization group. Let's keep the search function simple in the beginning and define just three functions.
1) General search: typing keywords returns the related people and companies.
2) Person search: typing a person's name returns that person's information; typing a company name returns information on the company's senior managers.
3) Company search: typing a company name returns that company's information; typing a person's name returns the companies related to that person.
We used Lucene a little last year. I will send you a demo of how we used it, including loading XML files into Lucene and assigning some initial scores. We more or less completed the first function, but we still need to tune some parameters to make the search results more relevant.
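
For the crawl limit in item 1 above, here is a minimal sketch of how the crawler could stay within 1000 requests per hour. It assumes the limit applies to any one-hour window; the class name HourlyRateLimiter and its parameters are made up for illustration and are not part of any Weibo library.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Allow at most max_calls within any sliding window of `period` seconds.

    Hypothetical helper for illustration. The clock and sleep functions
    are injectable so the limiter can be tested without real waiting.
    """

    def __init__(self, max_calls=1000, period=3600.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return the seconds to wait before the next call (0.0 if none)."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self.sleep(delay)
        self.calls.append(self.clock())
```

The crawler would call limiter.acquire() right before each request, so the same program can run unchanged on one machine and simply slows down once the hourly budget is used up.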

@Sunny, Ken, and Qian
1. We really need static web pages before the week after next. More specifically, we need the first page, the people page and the company page. Please work with me over the next two weeks. We can meet whenever you have time.
2. I wrote some basic APIs last year. They provide the top people for the first page, and the basic information, timeline and relations for a person; the same goes for a company. If you send a request to the URL, the API returns the results in JSON format, so you can parse the JSON and put the content into the template. I think it would be better to continue using the backend framework we wrote last year. It will save you time, and we can keep adding features to that platform.
I will tidy up the API, organize the code I wrote last year and add some comments to it. I will also show you how to use the SSH framework.
3. We may need more features, like the ones you presented in the meeting. Most of them we did not do last year, including inferring some results and calculating some data mining results. Please work together with Lily. As she gets more results from data mining, we need to figure out how to visualize them on our website.
4. Jufeng and Alex will handle the Lucene part. Please work with them to check whether the search functions and results work for you, and give them feedback.
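
To illustrate item 2 above (parse the JSON result and put the content into the template), here is a rough sketch in Python. The field names (name, company, timeline, year, event), the sample response and the template are all assumptions for illustration; the real API from last year may return a different structure.

```python
import html
import json
from string import Template

# Hypothetical response body; the real API's field names may differ.
SAMPLE_RESPONSE = json.dumps({
    "name": "Example Person",
    "company": "Example Co.",
    "timeline": [{"year": 2010, "event": "Joined Example Co."}],
})

# Hypothetical people-page fragment with placeholders for the API data.
PERSON_TEMPLATE = Template(
    "<h1>$name</h1>\n"
    "<p>Company: $company</p>\n"
    "<ul>\n$timeline_items\n</ul>"
)


def render_person(json_text):
    """Parse one person's JSON result and fill the page template."""
    data = json.loads(json_text)
    # Escape every value so text from the API cannot break the page markup.
    items = "\n".join(
        "<li>{}: {}</li>".format(
            html.escape(str(entry["year"])), html.escape(entry["event"])
        )
        for entry in data.get("timeline", [])
    )
    return PERSON_TEMPLATE.substitute(
        name=html.escape(data["name"]),
        company=html.escape(data["company"]),
        timeline_items=items,
    )


page = render_person(SAMPLE_RESPONSE)
```

The same pattern would apply to the company page with a second template.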

@Lily
1. As Prof. King said in the meeting, next we need to find more detailed results and dig deeper. For example, in the Qinghua college relation: which departments did they study in, what is the age distribution, and who connects Qinghua to other colleges? When you do the data mining part, please list all the functions you have implemented. Then please talk with the Visualization group about how to visualize these results.
2. Please help Zhi to calculate the graph properties he needs.

@Jingyi
1. Please continue working on the API. When Junfeng and Alex finish the crawling part, we need to add that information to our XML files. When Yong finishes the Baidu Baike part, we also need to add the new information, including some new attributes. When I finish the information extraction, we will need to update our XML files with some timeline records and education records.
2. I think we need logs to record the changes. If we make a wrong change, we can roll it back. Please look for related information on the web.
3. In the Semantic Web, RDF is usually used to store information. Paola told me about that. Could you look at some materials on it? There may be something useful for our project.
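
For the change log in item 2 above, one possible shape is sketched below. It assumes each record is a set of named fields and that the log is kept in memory; the class LoggedStore and its methods are hypothetical, and a real version would also need to write the log to disk next to the XML files.

```python
_MISSING = object()  # marks a field that did not exist before the change


class LoggedStore:
    """Keep records together with a log of every change, so changes can be undone.

    Hypothetical illustration; records are plain dicts keyed by record id.
    """

    def __init__(self):
        self.records = {}  # record id -> {field name: value}
        self.log = []      # applied changes, oldest first

    def set_field(self, record_id, field, value, source):
        """Change one field and log the old value, new value and source."""
        record = self.records.setdefault(record_id, {})
        old = record.get(field, _MISSING)
        self.log.append((record_id, field, old, value, source))
        record[field] = value

    def undo_last(self):
        """Revert the most recent change using the logged old value."""
        record_id, field, old, _new, _source = self.log.pop()
        record = self.records[record_id]
        if old is _MISSING:
            del record[field]  # the field was newly added, so remove it
        else:
            record[field] = old


store = LoggedStore()
store.set_field("person-1", "education", "Example University", source="baike")
store.set_field("person-1", "education", "Wrong University", source="weibo")
store.undo_last()  # roll back the wrong change
```

Logging the source of each change also makes it easier to see later whether a value came from the crawler, Baidu Baike or the information extraction.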

@Yang
Please make a demo based on that paper. We can test it on some people. Then we can discuss the results in our next meeting.

@Hang
Please continue learning about NLP. I will talk to you later.

@Zhi
Please continue to do the prediction part.

@Yong
Please help us to crawl Baidu Baike next month.
