Query Categorization
Pre-processing
- Stemming
- Abbreviation expansion
- Stop-word filtering
  - Use the stop-word list
- Misspelled words
- Location-based queries
  - NER for location detection
- Part-of-speech (POS) tagging
- Named entity recognition (NER)
  - Person (e.g., Bill Gates)
  - Location (e.g., Hong Kong)
  - Thing (e.g., Table)
 
Knowledge Base
- Lexicon (e.g., DBpedia person, location, organization, and product lists)
- Stop-word list (e.g., of, the)
- Abbreviation list (e.g., ad for advertisement)
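The stop-word and abbreviation lists above could drive two of the pre-processing steps directly. The following is a minimal sketch, not a definitive implementation: the names `STOP_WORDS`, `ABBREVIATIONS`, and `preprocess` are hypothetical, and the two tiny tables only contain the examples given in this section.

```python
# Toy knowledge base, limited to the examples listed above (assumption).
STOP_WORDS = {"of", "the"}
ABBREVIATIONS = {"ad": "advertisement"}


def preprocess(query):
    """Lowercase a query, expand known abbreviations, then drop stop words."""
    tokens = query.lower().split()
    # Abbreviation expansion: replace a token if it appears in the list.
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    # Stop-word filtering: keep only tokens that are not in the stop-word list.
    return [token for token in expanded if token not in STOP_WORDS]


print(preprocess("the ad of the year"))  # ['advertisement', 'year']
```

Filtering stop words blindly would also strip "the" and "of" from an entity name such as "the chinese university of hk", so in practice a lexicon lookup may need to run before this step.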
Useful tools
Input Examples
- the chinese university of hk
- new york pizza
- How do I play mp3 using the java programming language
Crowdsourcing
- Take the top 1000 queries and label them into 32 categories
Centroid Method
- Function Query2Term(string query)
 Input: a query, Output: terms of this query
 
 Example 1: the chinese university of hk → [the chinese university of hk]1 (where []i denotes the i-th term of the query)
 Example 2: new york pizza → [new york]1 [pizza]2
 Example 3: How do I play mp3 using the java programming language → [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6
 
 
- Function Term2Centroid(string terms)
 Input: terms of a query, Output: centroid of this query
 
 Example 1: [the chinese university of hk]1 → the chinese university of hk
 Example 2: [new york]1 [pizza]2 → pizza
 Example 3: [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 → mp3
 
 
- Function synonym(string keyword)
 Input: a word, Output: a set of synonyms of this term in WordNet
 
 Example: synonym(car)
 auto, automobile, machine, motorcar
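
The three Query2Term examples above can be reproduced with a greedy longest-match lookup against a lexicon, followed by stop-word removal and word normalization. This is only a sketch under stated assumptions: `LEXICON`, `STOP_WORDS`, and `NORMALIZED` are hypothetical toy tables built from the input examples, and `NORMALIZED` is a hand-written stand-in for a real stemming step.

```python
# Toy data covering only the three input examples (all assumptions).
LEXICON = {"the chinese university of hk", "new york"}
STOP_WORDS = {"how", "do", "i", "the", "of"}
NORMALIZED = {"using": "use", "programming": "program"}


def query2term(query):
    """Split a query into terms, preferring the longest lexicon entry."""
    tokens = query.lower().split()
    terms = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at position i first.
        match_len = 0
        for j in range(len(tokens), i, -1):
            if " ".join(tokens[i:j]) in LEXICON:
                match_len = j - i
                break
        if match_len:
            # A lexicon entry is kept whole, including any stop words inside it.
            terms.append(" ".join(tokens[i:i + match_len]))
            i += match_len
            continue
        token = tokens[i]
        i += 1
        if token in STOP_WORDS:
            continue
        terms.append(NORMALIZED.get(token, token))
    return terms


print(query2term("new york pizza"))  # ['new york', 'pizza']
```

Running the lexicon lookup before stop-word removal is what keeps "the chinese university of hk" together as a single term. The rule that Term2Centroid uses to pick one term is not specified here, so it is not sketched.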
Similarity-based Method
- Function catURL(string category, string engine, int n)
 Input: a category, a search engine, and a count n, Output: the top n URLs that the search engine (e.g., Google) returns for this category
 
 Example: catURL(cuhk, Google, 3)
 www.cuhk.edu.hk/
 www.cuhk.edu.hk/chinese/
 www.cuhk.edu.hk/gss/
 
 
- Function keywordsURL(string URL)
 Input: a URL, Output: keywords of the Web page at this URL
 
 Example: keywordsURL(http://www.cuhk.edu.hk/english/)
 research, education, shatin, campus, college, etc.
 
 
- Function synonym(string keyword)
 Input: a word, Output: a set of synonyms of this term in WordNet
 
 Example: synonym(car)
 auto, automobile, machine, motorcar
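
One possible way to approximate keywordsURL is to strip the markup from a page and rank the remaining words by frequency. The sketch below is an assumption, not the method used here: it works on an HTML string rather than fetching the URL, the names `keywords_from_html` and `STOP_WORDS` are hypothetical, and plain term frequency is a stand-in for whatever keyword weighting is actually intended.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Small illustrative stop-word list (assumption).
STOP_WORDS = {"the", "of", "and", "for", "with"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keywords_from_html(html, n=5):
    """Return the n most frequent non-stop-words in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    # Ignore stop words and very short tokens.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(n)]
```

The page content could be obtained with the standard library's `urllib.request.urlopen` before being passed to this function.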