WST540: Web Search and Text Analysis Spring 2013 Professor: Yufei Tao TA: Kyoungrok Jang |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Brief Description | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This course will cover modeling and algorithmic aspects of web search engines. Topics (subject to time permission) include tf-idf scoring, page ranks, inverted indexes, tries, suffix trees/arrays, q-grams, error-tolerant keyword search, and deep webs. On completion, students are expected to have acquired solid understanding on fundamental text retrieval and link analysis. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Announcements | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
News 1 (18 Mar): Exercise list 1 posted. News 2 (26 Mar): Starting from the lecture on 27 Mar, class attendance will be taken during the first break. News 3 (26 Mar): The first quiz will take place in the lecture of 3 Apr. The quiz will be 20 minutes long. Its scope covers Lectures 2 and 3. News 4 (1 Apr): Exercise list 2 posted. News 5 (9 Apr): Quiz 1 solutions posted. News 6 (2 May): Exercise list 3 posted. News 7 (2 May): Exercise list 4 posted. News 8 (2 May): The second quiz will take place in the lecture of 15 May. The quiz will be 20 minutes long. Its scope covers Lectures 4-10. News 9 (2 Jun): The make up class will start at 12pm, 7 June. The classroom is the same as usual, namely, Room 2222, N5. News 10 (5 Jun): Exercise list 5 posted. News 11 (5 Jun): The third quiz will take place in the lecture of 12 Jun. The quiz will be 20 minutes long. Its scope covers Lectures 13-15. News 12 (9 Jun): Exercise list 6 posted. News 13 (10 Jun): The FINAL EXAM will start at 1pm, 25 Jun. The duration is 2 hours. The venue is Room 2222, Building N5. The scope of the exam covers all lectures. |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Time and Venues | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Grading Scheme | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Attendance: 30% Quizzes: 30% Final exam: 40% |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Lecture Notes | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
No existing textbook contains all the materials in this course. The following is a good (and fun) reference book that covers about 30% of the contents: Michael W. Berry, Murray Browne. Understand Search Engines: Mathematical Modeling and Text Retrieval. 2nd edition, SIAM Book Series: Software, Environments, and Tools. ISBN: 0-89871-581-4. In addition, the following papers are nice reading related to this course: [LM05] Langville, A., and Meyer, C. A Survey of Eigenvector Methods of Web Information Retrieval. SIAM Review, 47(1): 135-161, 2005. [SZTJ12] Sheng, C., Zhang, N., Tao, Y., Jin, X. Optimal Algorithms for Crawling a Hidden Database in the Web. Proceedings of the VLDB Endowment (PVLDB), 5(11): 1112-1123, 2012. [TS] Tao, Y., Sheng, C. Fast Nearest Neighbor Search with Keywords. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE). [WF74] Wagner, R., and Fischer, M. The String-to-String Correction Problem. JACM, 21(1): 168-173, 1974. [ZM06] Zobel, J., and Moffat, A. Inverted files for text search engines. ACM Comput. Surv. 38(2), 2006. The instructor will make lecture notes available before each class. These notes are concise but complete description of the course contents. As usual, lecture attendance is vital to gaining thorough understanding.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Exercises, Quizzes and Exams | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The following exercises are designed to help students understand the key concepts and algorithms taught in the lectures. Students are encouraged to attempt the excercises by themselves and compare their answers to the solutions given by the instructor. Exercise List 1 (Solutions) Exercise List 2 (Solutions) Exercise List 3 (Solutions) Exercise List 4 (Solutions) Exercise List 5 (Solutions) Exercise List 6 (Solutions) Quiz 1 Solutions Quiz 2 Solutions Quiz 3 Solutions |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|