CSC5250 Information Retrieval and Search Engine
Fall 2012
| | Lecture I | Lecture II | Tutorial |
|---|---|---|---|
| Time | M2, 9:30 am - 10:30 am | T7-8, 2:30 pm - 4:15 pm | T9, 4:30 pm - 5:15 pm | 
| Venue | SHB 503 | LSB C3 | ERB 804 | 
The Golden Rule of CSC5250: No member of the CSC5250 community shall take unfair advantage of any other member of the CSC5250 community.
These are the class photos for CSC5250 in 2006. More photos are available <here>.
Course Description
- This course surveys the current research in information retrieval for the Internet and related topics.
- The course will focus on the theoretical development of information retrieval systems for multimedia content, as well as practical design and implementation issues associated with Internet search engines.
- Topics include probabilistic retrieval, relevance feedback, indexing of multimedia data, and applications in e-commerce.
Personnel
| | Lecturer | Tutor | Tutor |
|---|---|---|---|
| Name | Irwin King | Xiang Peng | Jacky Zhu | 
| Email | king AT cse.cuhk.edu.hk | xpeng AT cse.cuhk.edu.hk | jkzhu AT cse.cuhk.edu.hk |
| Office | Rm 908 | Rm 1013 | Rm 110 | 
| Telephone | 2609 8398 | 2609 8431 | TBD | 
| Office Hour(s) | TBD | Thursday 3:00 pm - 5:00 pm | TBD | 
Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.
Syllabus
The PDF files were created with Acrobat 6.0. Please obtain the correct version of Acrobat Reader from Adobe.
Notes:
- For detailed tutorial information, please go to Tutorial Page.
- Please submit your homework assignment to csc5250@cse.cuhk.edu.hk.
Class Project
Class Project Assessment Scheme
Presentation (50%)
- Key points of the project, e.g., problem definition, proposed solution, validation, etc.
- Demonstration
Report (50%)
- It is due at 11:59 on Dec 24, 2006.
- No more than 15 pages, single-column, single spacing.
- Submit a single compressed directory file that includes: (1) the written report, preferably in PDF (it is fine to include both the PDF and the original file); (2) presentation files; (3) program files; (4) references; (5) other relevant files, e.g., webpages, lexicons, databases, and other supporting files. Name your top-level directory using your student ID(s), e.g., 99123456 for one student or 99123456-00123456 for two students.
- It should contain a cover page with abstract, introduction/background, problem definition, proposed solution, validation procedure, etc. It should use graphs, figures, tables, etc. to illustrate the outcome.
- You will be assessed on: (1) (5%) understanding of the problem. (2) (10%) your proposed solution. (3) (10%) the extent of the solution. (4) (10%) the result (any update from the presentation and demonstration). (5) (5%) the validation procedure. (6) (10%) the clarity and the technical quality of the report.
- If your files are larger than 5 MB, please prepare a CD instead of sending only an email copy.
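As a rough illustration of the submission rules above, the following Python sketch builds a top-level directory name from student IDs and checks the size limit. The function names are hypothetical, the hyphen-joined naming is one reading of the example given in the list, and treating "5M" as 5 × 1024 × 1024 bytes is an assumption.

```python
# Hypothetical helpers; not part of any official course tooling.

SIZE_LIMIT_BYTES = 5 * 1024 * 1024  # assumption: "5M" means 5 MiB


def submission_dir_name(student_ids):
    """Join one or more student IDs with hyphens to form the directory name."""
    if not student_ids:
        raise ValueError("at least one student ID is required")
    return "-".join(student_ids)


def needs_cd(total_size_bytes):
    """Return True when the files exceed the size limit for email submission."""
    return total_size_bytes > SIZE_LIMIT_BYTES


if __name__ == "__main__":
    print(submission_dir_name(["99123456"]))
    print(submission_dir_name(["99123456", "00123456"]))
    print(needs_cd(6 * 1024 * 1024))
```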
CSC5250 Class Project Presentation Schedule
The class presentations will be on either December 18 or December 19. Please sign up by sending an email to xpeng@cse.cuhk.edu.hk and king@cse.cuhk.edu.hk.
Notice:
- The project presentation schedule has been adjusted. The 10:30 - 12:00 time slot on December 18 is not available because of an invited seminar in the CSE Department. Presentations will therefore take place in the morning, excluding the 10:30 - 12:00 slot, and throughout the afternoon from 2:00 to 6:30. Each group has 30 minutes to present its work.
- The venue is Room 1022, Ho Sin Hang Engineering Building.
| December 18 Presentation Time | Team Name | 
|---|---|
| 9:00 AM | TBD | 
| 9:30 AM | Chan Hoi Tung (04730173) and Chan Kam Tong (04730584) | 
| 10:00 AM | Hongbo Deng (06239340) and Yingyi Bu (06238270) | 
| 10:30 AM - 12:00 PM | Not Available | 
| Break | Break | 
| 2:00 PM | Wong Tsz Keung (03558911) | 
| 2:30 PM | Wu Di (06236600) and Zhou Tu (06238680) | 
| 3:00 PM | Wong Tik Shun (06235200) | 
| 3:30 PM | LIU Renting (06239670) and SHAN Qi (06238500) | 
| 4:00 PM | Chu Ka Cheong (06836804) | 
| 4:30 PM | CHIO Ka In (04527223) and PUN Iek Hoi (04532334) | 
| 5:00 PM | TBD | 
| 5:30 PM | TBD | 
| 6:00 PM | TBD | 
Examination Schedule
| | Time | Venue | Notes |
|---|---|---|---|
| Midterm Examination | 2:30 pm - 4:15 pm Tue, Nov 7 2006 | LSB C3 | This is a closed-book and closed-note examination. It will cover all topics discussed in the class. Remarks: One A4, double-sided “cheat sheet” and approved calculators are allowed. | 
| Final Examination | 9:30 am - 11:30 am Fri, Dec 12 2006 | Thomas H C Cheung Gymnasium, UC | This is a closed-book and closed-note examination. It will cover all topics discussed in the class. Remarks: One A4, double-sided “cheat sheet” and approved calculators are allowed. | 
Grade Assessment Scheme
- Homework Assignments and Quizzes, 20%
- Midterm Examination, 15%
- Project, 25%
  - Proposal and Work, 10%
  - Presentation and Demonstration, 10%
  - Report, 5%
- Final Examination, 40%
- Optional Extra Credits
Note: To pass the course, you must score at least 40 out of 100 in the final examination.
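The weighting above can be sketched in a few lines of Python. This is only an illustration: it assumes each component is scored out of 100, it ignores the optional extra credits (whose weight is not stated), and the names `WEIGHTS`, `weighted_total`, and `meets_final_minimum` are hypothetical.

```python
# Component weights in percent, taken from the assessment scheme above.
WEIGHTS = {"homework": 20, "midterm": 15, "project": 25, "final": 40}

# Minimum final examination score stated in the note above.
FINAL_MINIMUM = 40


def weighted_total(scores):
    """Combine component scores (each assumed to be out of 100) into a total out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def meets_final_minimum(scores):
    """Check the final examination threshold separately from the weighted total."""
    return scores["final"] >= FINAL_MINIMUM


if __name__ == "__main__":
    example = {"homework": 80, "midterm": 70, "project": 90, "final": 60}
    print(weighted_total(example))       # 73.0
    print(meets_final_minimum(example))  # True
```

The threshold check is kept separate because a high weighted total does not compensate for a final examination score below 40.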
Required Background
- Pre-requisites: None.
Programming Requirement
Familiarity with the following topics is highly recommended:
- Data Structure: data types and structures, lists, queues, stacks, trees, sets, etc.
- Algorithm: analysis, design, sorting methods, numerical methods, algorithms on graphs, etc.
- Operating System & Programming Environment: Unix systems, C, SQL, and Matlab.
Reference Books
- Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto, ACM Press, 1999.
- Information Retrieval: Data Structures and Algorithms, William B. Frakes and Ricardo Baeza-Yates, Prentice Hall, 1992.
- Information Retrieval: Algorithms and Heuristics, David A. Grossman and Ophir Frieder, Springer, 2004.
- Understanding Search Engines: Mathematical Modeling and Text Retrieval, Michael Berry, Murray Browne, and Jack Dongarra, SIAM Society for Industrial and Applied Mathematics, 2005.
- Managing Gigabytes: Compressing and Indexing Documents and Images, Ian H. Witten, Alistair Moffat, and Timothy C. Bell, Morgan Kaufmann, 1999.
- The Geometry of Information Retrieval, C. J. van Rijsbergen, Cambridge University Press, 2004.
Book Sources
- Academic & Professional Book Centre, 1H Cheong Ming Bldg., 80-86 Argyle St., Kowloon, 2398-2191, 2391-7430 (fax)
- Caves Books (H. K.), 4B Ferry St., G/F., Yaumatei, Kowloon, 2780-0987, 2771-2298
- Man Yuen Book Company, 45 Parkes Street, Jordan Road, Kowloon, Hong Kong, 2366-0594. Not very large, Asian edition books, fair prices, wide range, some at a 10% discount.
- Swindon Book Co. Ltd, 13-15 Lock Road, Tsim Sha Tsui, Kowloon, 2366-8001. One of the largest book stores in Hong Kong; the exchange rate is not favorable.
- Hongkong Book Centre, 522-7064. A branch of the Swindon book shop.
FAQ
- 1. Q: Where is a good jumping-off point for this class?
A: Currently, we are building an extensive archive on this topic at the Knowledge Bank. You are definitely welcome to contribute your expertise and knowledge on this site.
- 2. Q: Should I learn PERL?
A: Yes, you should absolutely learn PERL. PERL is a very nice text manipulation language for the web. Using it will cut down development time, since it is an easy-to-learn shell-type language.
- 3. Q: Where can I get the source code of the algorithms in Information Retrieval: Data Structures and Algorithms by William B. Frakes and Ricardo Baeza-Yates?
A: You can get the source code from ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/.
- 4. Q: Where are the past homework assignments?
A: For 2000:
- 5. Q: What happens if a person is caught plagiarizing someone else's work?
A: The CSE department has a very strict guideline on this issue. The guideline is as follows:
If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying of the whole or parts of written assignments, programming exercises, reports, quiz papers, mid-term examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work has been copied unwittingly. Furthermore, inclusion of others' works or results without citation in assignments and reports is also regarded as plagiarism with similar penalty to the offender.
A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and appropriate disciplinary authorities for further action, in addition to failing the course.
Resources
- Please refer to Multimedia Information Processing Lab's (MIP Lab) knowledge bank for more resources on this and other related issues.
- CS460 by Michael Berry
- CS219-Programming for the WWW by David Beazley
- Relevant Conferences