====== CSC5250 Information Retrieval and Search Engine ======

==== Fall 2006 ====

|  ^ Lecture I ^ Lecture II ^ Tutorial ^
^ Time | M2, 9:30 am - 10:30 am | T7-8, 2:30 pm - 4:15 pm | T9, 4:30 pm - 5:15 pm |
^ Venue | SHB 503 | LSB C3 | ERB 804 |

The Golden Rule of CSC5250: No member of the CSC5250 community shall take unfair advantage of any other member of the CSC5250 community.

[[http://www.cse.cuhk.edu.hk/~king/Photos/CUHK/2006/CSC5250_2006/|{{:teaching:csc5250:s8.jpg |:teaching:csc5250:s8.jpg}}]]

These are the class photos for CSC5250 in 2006. More photos are available <[[http://www.cse.cuhk.edu.hk/~king/Photos/CUHK/2006/CSC5250_2006/|here]]>. :-)

====== Course Description ======

  - This course surveys current research in information retrieval for the Internet and related topics.
  - The course focuses on the theoretical development of information retrieval systems for multimedia content, as well as practical design and implementation issues associated with Internet search engines.
  - Topics include probabilistic retrieval, relevance feedback, indexing of multimedia data, and applications in e-commerce.

====== Personnel ======

|  ^ Lecturer ^ Tutor ^ Tutor ^
^ Name | [[http://www.cse.cuhk.edu.hk/~king|Irwin King]] | [[http://www.cse.cuhk.edu.hk/~xpeng|Xiang Peng]] | [[http://www.cse.cuhk.edu.hk/~jkzhu|Jacky Zhu]] |
^ Email | king AT cse.cuhk.edu.hk | xpeng AT cse.cuhk.edu.hk | jkzhu AT cse.cuhk.edu.hk |
^ Office | Rm 908 | Rm 1013 | Rm 110 |
^ Telephone | 2609 8398 | 2609 8431 | TBD |
^ Office Hour(s) | TBD | Thursday 3:00 pm - 5:00 pm | TBD |

Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.

====== Syllabus ======

The PDF files were created with Acrobat 6.0. Please obtain the correct version of the [[http://www.adobe.com/prodindex/acrobat/readstep.html#reader | Acrobat Reader]] from Adobe.

^ Week ^ Date ^ Topics ^ Tutorials ^ Homework & Events ^ Resources ^
| 1 | 4/9 | 1. 
Course Information [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/00_CS5250c.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/00_CS5250.pdf|b&w]] ]\\ 2. Introduction [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/01_Introc.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/01_Intro.pdf|b&w]] ] | No Tutorial | Readings:\\ 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/Pages%20from%20PoF104art15_1194688.pdf|Google]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/internet97.pdf|Information Retrieval on the World Wide Web]] | 1. [[http://www.dcs.gla.ac.uk/~iain/keith/index.htm|Information Retrieval (2nd Ed.) by C.J. van Rijsbergen]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/Baeza-Yates98.pdf|Searching on the Web]]\\ 3. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/Meng99.pdf|Usefulness of Search Engines]]\\ 4. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/schwartz98a.pdf|Web Search Engines]]\\ 5. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/Kingoff97.pdf|Comparing Search Engines]]\\ 6. [[http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm|The Anatomy of a Large-Scale Hypertextual Web Search Engine]] |
| 2-3 | 11-18/9 | 1. Search Engine [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/01.1_SearchEnginec.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/01.1_SearchEngine.pdf|b&w]] ]\\ 2. Modeling [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/02_Modelingc.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/02_Modeling.pdf|b&w]] ] | Introduction to Python I | Readings:\\ 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/01196112.pdf|The Google Cluster Architecture]] \\ Programming Exercises:\\ 1. No.17 in [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/hw099b.pdf|v.099b]]--Simple Page Crawler in Python: Due on Friday, 29 Sept on or before 11:59 PM | 1. [[http://www.searchenginewatch.com/links/Major_Search_Engines/The_Major_Search_Engines/index.html|Major Search Engines]]\\ 2. 
[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/MODEL/p182-lee.pdf|Extended Boolean model]] |
| 4 | TBD | 1. Probabilistic Modeling [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/02.1_ProbModelc.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/02.1_ProbModel.pdf|b&w]] ] | Introduction to Python II | Readings:\\ 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/kobayashi00.pdf|Information Retrieval on the Web]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INTRO/raghavan.pdf|Information Retrieval Algorithms]] | 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/MODEL/lee97.pdf|Vector space model and document ranking]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/MODEL/Wolff00.pdf|Probabilistic model]] |
| 5 | 25/9 | Performance Evaluation [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/04_Evaluationc.pdf|color]]-[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/04_Evaluation.pdf|b&w]] ] | Introduction to Python III | TBD | 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/EVAL/burgin99.pdf|Retrieval System Performance]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/EVAL/leighton99.pdf|Precision of Web Search Services]]\\ 3. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/EVAL/losee99.pdf|Measuring Search Engine Quality]]\\ 4. [[http://www.microsoft.com/siteserver/site/30/downloads/per_search.doc|MS Site Server's Capacity and Performance Analysis Search]]\\ 5. [[http://www.microsoft.com/siteserver/site/30/downloads/DatIndex.doc|MS Site Server Indexing]] |
| 6 | 2/10 | Text Algorithm\\ 1. Text Algorithm [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/07_TextAlgoc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/07_TextAlgo.pdf|B&W]] ]\\ 2. Text Analysis [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/07_TextAnalysisc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/07_TextAnalysis.pdf|B&W]] ] | Scipy I | Written Exercises:\\ 1. No. 
2.1, 2.3, 2.4, 2.5, 3.2, 5.1, 5.2, 5.3, 5.4, 5.5(1, 3) in [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/hw099c.pdf|v.099c]]: Due on Friday, 27 Oct on or before 11:59 PM | 1. [[http://www.w3.org/Protocols/rfc822/3_Lexical.html|Lexical Analysis of Messages]]\\ 2. [[http://open.muscat.com/developer/docs/porterstem.html|Porter's Stemming Algorithm]]\\ 3. [[http://open.muscat.com/stemming/|Different versions and analysis of Porter's Stemming Algorithm]]\\ 4. [[http://sunburn.informatik.uni-tuebingen.de/~buehler/BM/BM.html|Animation of String Matching Algorithm I]]\\ 5. [[http://www-igm.univ-mlv.fr/~lecroq/string/node14.html|Animation of String Matching Algorithm II]] |
| 7 | 9/10 | 1. Indexing and Searching [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/08_Indexingc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/08_Indexing.pdf|B&W]] ]\\ 2. Chinese Language Processing [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/16_ChineseIRc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/16_ChineseIR.pdf|B&W]] ] | Scipy II | TBD | 1. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INDEX/lee95.pdf|Index Structure]]\\ 2. [[http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic7/#COMPACT|On tries and suffix trees]]\\ 3. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INDEX/shang98.pdf|Approximate String Matching]]\\ 4. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INDEX/lee95.pdf|Signature File Method]]\\ 5. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INDEX/beck90.pdf|Lexical Analysis]]\\ 6. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/INDEX/Moffat96.pdf|Inverted File Compression]]\\ 7. [[http://vlc.polyu.edu.hk/lexiconindex/|Chinese Lexicon and Brown Corpus]]\\ 8. [[http://www.cogsci.princeton.edu/~wn/|WordNet]] |
| 8 | 16/10 | 1. Ranking and Metasearching [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/Ranking_and_Metasearching.ppt|ppt]] ]\\ 2. 
Query Language [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/05_QueryLangc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/05_QueryLang.pdf|B&W]] ]\\ \\ 3. Query Operations [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/06_QueryOpsc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/06_QueryOps.pdf|B&W]] ] | Information Retrieval Software | TBD | 1. [[http://citeseer.nj.nec.com/317732.html|The shape of the web]]\\ 2. [[http://citeseer.nj.nec.com/328280.html|The structure of the web]]\\ 3. [[http://xxx.lanl.gov/abs/cond-mat/9901071/|Evolutionary Dynamics of the World Wide Web]]\\ 4. [[http://www.sigmaxi.org/amsci/issues/Comsci00/compsci2000-03.html|Graph Theory in Practice: Part II]]\\ 5. [[http://www.parc.xerox.com/istl/groups/iea/topics/internetecologies.shtml|Internet Ecologies]]\\ 6. [[http://www.ams.org/featurecolumn/archive/pagerank.html|How Google Finds Your Needle in the Web's Haystack]]\\ 7. [[http://www.nytimes.com/2007/06/03/business/yourmoney/03google.html?_r=1&oref=slogin|Google Keeps Tweaking Its Search Engine]] |
| 9 | 23/10 | 1. Web Structures [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/12_WebStructurec.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/12_WebStructure.pdf|B&W]] ]\\ \\ 2. Clustering Algorithms [ [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/06_Clusteringc.pdf|Color]] - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/06_Clustering.pdf|B&W]] ] | TBD | TBD | 1. [[http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/|Suffix Tree notes]]\\ 2. [[http://www.cs.aau.dk/~simas/aalg04/slides/aalg4.ppt|PAT Tree algorithm and data structure notes]]\\ 3. [[http://www.csie.ncu.edu.tw/~chia/Course/IR/IR1999/PAT.ppt|PAT Trees and PAT Arrays]] |
| 10 | 30/10 | Multimedia Information Retrieval | TBD | TBD | 1. [[http://www.sims.berkeley.edu/how-much-info/how-much-info.pdf|How much information is on the Internet?]]\\ 2. 
[[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/SE/abiteboul97.pdf|Computation on the Web]] |
| 11 | 6/11 | Teoma and P2P | TBD | TBD | 1. [[http://www.sims.berkeley.edu/courses/is202/f00/lectures/Lecture14.ppt|Original lecture notes from Berkeley]]\\ 2. [[http://www.teoma.com/|Teoma Search Engine]]\\ 3. [[http://www.gnutella.com/|Gnutella]] |
| 12 | 13/11 | Wikis | TBD | TBD | 1. [[http://www.sims.berkeley.edu/how-much-info/how-much-info.pdf|How much information is on the Internet?]]\\ 2. [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/SE/abiteboul97.pdf|Computation on the Web]] |
| 13 | 20/11 | Web Business Models | TBD | TBD | TBD |
| 14 | 27/11 | Wrap up and Summary | TBD | TBD | TBD |

Notes:

  - For detailed tutorial information, please go to the [[http://www.cse.cuhk.edu.hk/~csc5250/tutorial.html|Tutorial Page]].
  - Please submit your homework assignments to csc5250@cse.cuhk.edu.hk.

====== Class Project ======

===== Class Project Assessment Scheme =====

==== Presentation (50%) ====

  - Key points of the project, e.g., problem definition, proposed solution, validation, etc.
  - Demonstration

==== Report (50%) ====

  - It is due at 11:59 on **Dec 24, 2006**.
  - No more than 15 pages, single-column, single spacing.
  - Submit a single compressed directory file that includes: (1) The written report, preferably in pdf, but it is fine to include both the pdf and the original file. (2) Presentation files. (3) Program files. (4) References. (5) Other relevant files, e.g., webpages, lexicons, databases, other supporting files, etc. (6) Name your top level directory using your student ID(s), e.g., one student-99123456, two students-99123456-00123456.
  - It should contain a cover page with abstract, introduction/background, problem definition, proposed solution, validation procedure, etc. It should use graphs, figures, tables, etc. to illustrate the outcome.
  - You will be assessed by: (1) (5%) understanding of the problem. (2) (10%) your proposed solution. (3) (10%) the extent of the solution. 
(4) (10%) the result (any update from the presentation and demonstration). (5) (5%) the validation procedure. (6) (10%) the clarity and the technical quality of the report.
  - If your files total more than 5 MB, please prepare a CD instead of just an email copy.

===== CSC5250 Class Project Presentation Schedule =====

The class presentations will be on either December 18 or December 19. Please sign up by sending an email to [[xpeng@cse.cuhk.edu.hk]] and [[king@cse.cuhk.edu.hk]].

**Notice:**

  - The project presentation schedule has been adjusted. The 10:30 - 12:00 time slot on December 18 is not available because of an invited seminar in the CSE Department. Presentations will therefore be held in the morning, excluding the 10:30 - 12:00 slot, and throughout the afternoon from 2:00 to 6:30. Each group has 30 minutes to present its work. Thank you for your attention.
  - The venue is Room 1022, Ho Sin Hang Engineering Building.

^ December 18\\ Presentation Time ^ Team Name ^
| 9:00 AM | TBD |
| 9:30 AM | [[Chan Hoi Tung (04730173) and Chan Kam Tong (04730584)]] |
| 10:00 AM | Hongbo Deng (06239340) and Yingyi Bu (06238270) |
| 10:30 AM - 12:00 PM | Not Available |
| Break | Break |
| 2:00 PM | Wong Tsz Keung (03558911) |
| 2:30 PM | Wu Di (06236600) and Zhou Tu (06238680) |
| 3:00 PM | Wong Tik Shun (06235200) |
| 3:30 PM | LIU Renting (06239670) and SHAN Qi (06238500) |
| 4:00 PM | Chu Ka Cheong (06836804) |
| 4:30 PM | CHIO Ka In (04527223) and PUN Iek Hoi (04532334) |
| 5:00 PM | TBD |
| 5:30 PM | TBD |
| 6:00 PM | TBD |

====== Examination Schedule ======

|  ^ Time ^ Venue ^ Notes ^
^ Midterm Examination | 2:30 pm - 4:15 pm, Tue, Nov 7 2006 | LSB C3 | This is a closed-book and closed-note examination. It will cover all topics discussed in the class. Remarks: One double-sided A4 "cheat sheet" and approved calculators are allowed. |
^ Final Examination | 9:30 am - 11:30 am, Fri, Dec 12 2006 | Thomas H C Cheung Gymnasium, UC | This is a closed-book and closed-note examination. 
It will cover all topics discussed in the class. Remarks: One double-sided A4 "cheat sheet" and approved calculators are allowed. |

====== Grade Assessment Scheme ======

  - Homework Assignments and Quizzes, 20%
  - Midterm Examination, 15%
  - Project, 25%
    - Proposal and Work, 10%
    - Presentation and Demonstration, 10%
    - Report, 5%
  - Final Examination, 40%
  - Optional Extra Credits

Note: To pass the course, you must score at least 40 out of 100 in the final examination.

====== Required Background ======

  - Pre-requisites -- None.

====== Programming Requirement ======

Familiarity with the following topics is highly recommended:

  - Data Structure: data types and structures, lists, queues, stacks, trees, sets, etc.
  - Algorithm: analysis, design, sorting methods, numerical methods, algorithms on graphs, etc.
  - Operating System & Programming Environment: Unix systems, C, SQL, and Matlab.

====== Reference Books ======

  - [[http://www.amazon.com/exec/obidos/ASIN/020139829X/qid=1012285904/sr=8-1/ref=sr_8_67_1/002-5887768-5049630|Modern Information Retrieval]], **Ricardo Baeza-Yates and Berthier Ribeiro-Neto**, ACM Press, 1999.
  - [[http://www.amazon.com/gp/product/0134638379/sr=8-3/qid=1156821082/ref=pd_bbs_3/002-1222185-8472817?ie=UTF8|Information Retrieval: Data Structures and Algorithms]], **William B. Frakes and Ricardo Baeza-Yates**, Prentice Hall, 1992.
  - [[http://www.amazon.com/gp/product/1402030045/sr=8-2/qid=1156821082/ref=pd_bbs_2/002-1222185-8472817?ie=UTF8|Information Retrieval: Algorithms and Heuristics]], **David A. Grossman and Ophir Frieder**, Springer, 2004.
  - [[http://www.amazon.com/gp/product/0898715814/sr=8-5/qid=1156821082/ref=pd_bbs_5/002-1222185-8472817?ie=UTF8|Understanding Search Engines: Mathematical Modeling and Text Retrieval]], **Michael Berry, Murray Browne, and Jack Dongarra**, SIAM Society for Industrial and Applied Mathematics, 2005.
  - [[http://www.amazon.com/gp/product/1558605703/sr=8-7/qid=1156821082/ref=pd_bbs_7/002-1222185-8472817?ie=UTF8|Managing Gigabytes: Compressing and Indexing Documents and Images]], **Ian H. Witten, Alistair Moffat, and Timothy C. Bell**, Morgan Kaufmann, 1999.
  - [[http://www.amazon.com/gp/product/0521838053/sr=8-8/qid=1156821082/ref=pd_bbs_8/002-1222185-8472817?ie=UTF8|The Geometry of Information Retrieval]], **C. J. van Rijsbergen**, Cambridge University Press, 2004.

====== Book Sources ======

  - **Academic & Professional Book Centre**, 1H Cheong Ming Bldg., 80-86 Argyle St., Kowloon, 2398-2191, 2391-7430 (fax)
  - **Caves Books (H. K.)**, 4B Ferry St., G/F., Yaumatei, Kowloon, 2780-0987, 2771-2298
  - **Man Yuen Book Company**, 45 Parkes Street, Jordan Road, Kowloon, Hong Kong, 2366-0594. Not very large, Asian edition books, fair prices, wide range, some 10% discounts.
  - **Swindon Book Co. Ltd**, 13-15 Lock Road, Tsim Sha Tsui, Kowloon, 2366-8001. One of the largest book stores in Hong Kong; the exchange rate is not favorable.
  - **Hongkong Book Centre**, 522-7064. A branch of the Swindon book shop.

====== FAQ ======

  * 1. **Q: Where is a good jumping-off point for this class?** A: Currently, we are building an extensive archive on this topic at the [[http://www.cse.cuhk.edu.hk/~miplab|Knowledge Bank]]. You are definitely welcome to contribute your expertise and knowledge on this site.

----

  * 2. **Q: Should I learn PERL?** A: Yes, you should absolutely learn PERL. PERL is a very nice text manipulation language for the web. You will find that using it cuts down development time, since it is an easy-to-learn shell-type language.

----

  * 3. **Q: Where can I get the source code of the algorithms in Information Retrieval: Data Structures and Algorithms by William B. Frakes and Ricardo Baeza-Yates?** A: You can get the source code from [[ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/|ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/irbook/]].

----

  * 4. 
**Q: Where are the past homework assignments?** A: For 2000:
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/HW/hw1_00.pdf|Homework Assignment #1 (2001/9/13)]]
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/HW/hw2_00.pdf|Homework Assignment #2 (2000/9/20)]]
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/HW/hw3_00.pdf|Homework Assignment #3 (2000/10/11)]]
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PUB/doc.zip|Reference Documents (2000/10/10)]]
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PUB/query.txt|Some query examples Version 2.0 (2000/10/26)]]
    - [[http://www.cse.cuhk.edu.hk/~king/csc5250/PDF/HW/hw4_00.pdf|Homework Assignment #4 (2000/11/7)]]

----

  * 5. **Q: What happens if a person is caught plagiarizing someone else's work?** A: The CSE department has a very strict guideline on this issue. The guideline is as follows: If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying of the whole or parts of written assignments, programming exercises, reports, quiz papers, mid-term examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work has been copied unwittingly. Furthermore, inclusion of others' works or results without citation in assignments and reports is also regarded as plagiarism with similar penalty to the offender. A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and appropriate disciplinary authorities for further action, in addition to failing the course.

====== Resources ======

  - Please refer to Multimedia Information Processing Lab's (MIP Lab) [[http://www.cse.cuhk.edu.hk/~miplab|knowledge bank]] for more resources on this and other related issues.
  - [[http://external.nj.nec.com/~giles/course/readinglist.html|A great course website (Lee Giles) that has a lot of interesting articles on the web]].
  - CS460 by Michael Berry [[http://www.cs.utk.edu/~cs460.is&r/|CS460]].
  - [[http://www.classes.cs.uchicago.edu/classes/archive/1999/winter/CS219/|CS219-Programming for the WWW]] by David Beazley
  - [[http://www.cs.jhu.edu/~weiss/glossary.html|Glossary for Information Retrieval]].
  - [[http://hissa.nist.gov/dads/terms.html|Dictionary of Algorithms, Data Structures, and Problems]].
  - [[http://town.hall.org/util/wais_help.html|WAIS Help]].
  - [[http://www.metacrawler.com/index.html|MetaCrawler]].
  - [[http://linkpolice.mycomputer.com/|Link Police]].
  - [[http://www.cis.ohio-state.edu/rfc/rfc1321.txt|The MD5 Message-Digest Algorithm]].
  - [[http://www.cl.cam.ac.uk/Research/SRG/bluebook/21/crc/crc.html|Fast CRC32 in Software]].
  - [[http://www.sciam.com/0397issue/0397intro.html|Scientific American's Special Report on the Internet]].
  - [[http://www.ouc.bc.ca/library/eil/research/internet3.html|Library & Internet Research]].
  - [[http://zing.ncsl.nist.gov/WebTools/|Web Metrics]]
  - Relevant Conferences -- WWW [[http://www.www1999.org|1999]], [[http://www.www2000.org|2000]], [[http://www.www2001.org|2001]], [[http://www.www2002.org|2002]], [[http://www.www2003.org|2003]], [[http://www.www2004.org|2004]] -- INET [[http://www.isoc.org/inet2000/cdproceedings/index.htm|2000]], [[http://www.isoc.org/inet2001/CD_proceedings/index.shtml|2001]], [[http://inet2002.org/CD-ROM/lu65rw2n/|2002]], 2003, [[http://www.isoc.org/inet2004/|2004]]

====== Python ======

[[teaching:csc5250:Python|Python Discussion Site]]