Query Categorization

Pre-processing

Knowledge Base

Useful tools

Input Examples

  • the chinese university of hk
  • new york pizza
  • How do I play mp3 using the java programming language

Crowdsourcing

  • Take the top 1000 queries and label each with one of 32 categories

Centroid Method

  1. Function Query2Term(string query)
    Input: a query, Output: terms of this query

    Example1: the chinese university of hk → [the chinese university of hk]1 (here []i marks the i-th term of the query)
    Example2: new york pizza → [new york]1 [pizza]2
    Example3: How do I play mp3 using the java programming language → [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6

  2. Function Term2Centroid(string terms)
    Input: terms of a query, Output: centroid of this query

    Example1: [the chinese university of hk]1 → the chinese university of hk
    Example2: [new york]1 [pizza]2 → pizza
    Example3: [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 → mp3

  3. Function synonym(string keyword)
    Input: a word, Output: the set of synonyms of this word in WordNet

    Example: synonym(car)
    auto, automobile, machine, motorcar
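
The three functions above can be sketched in Python as follows. This is only an illustration that reproduces the examples listed here, not the project's implementation: the page does not say how phrases are detected, how words are normalized, or how the centroid is chosen. The phrase list, stop-word list, lemma table, and the `weight` mapping are all hypothetical stand-ins. The WordNet lookup assumes NLTK with its WordNet corpus installed and assumes only the first sense is used, which is what the synonym(car) example suggests.

```python
# Hypothetical resources; the page does not say how phrases, stop words,
# or word normalization are actually obtained.
KNOWN_PHRASES = {"the chinese university of hk", "new york"}
STOPWORDS = {"how", "do", "i", "the", "of", "a", "an", "to"}
LEMMAS = {"using": "use", "programming": "program"}


def query2term(query):
    """Split a query into terms: known phrases first, then single words."""
    words = query.lower().split()
    terms = []
    i = 0
    while i < len(words):
        # Longest match: try the widest multi-word window starting at i.
        for j in range(len(words), i + 1, -1):
            phrase = " ".join(words[i:j])
            if phrase in KNOWN_PHRASES:
                terms.append(phrase)
                i = j
                break
        else:
            # No phrase matched: keep the word unless it is a stop word.
            word = words[i]
            if word not in STOPWORDS:
                terms.append(LEMMAS.get(word, word))
            i += 1
    return terms


def term2centroid(terms, weight):
    """Return the highest-weighted term; ties go to the earliest term.

    `weight` is a hypothetical term -> score mapping (assumption).
    """
    if not terms:
        return None
    return max(terms, key=lambda t: weight.get(t, 0.0))


def first_sense_lemmas(word):
    """Lemma names of the first WordNet sense (needs NLTK + WordNet data)."""
    from nltk.corpus import wordnet
    synsets = wordnet.synsets(word)
    return synsets[0].lemma_names() if synsets else []


def synonym(keyword, sense_lemmas=first_sense_lemmas):
    """Synonyms of `keyword`, excluding the keyword itself."""
    lemmas = sense_lemmas(keyword)
    return [l.replace("_", " ") for l in lemmas if l.lower() != keyword.lower()]
```

With made-up weights such as {"pizza": 2.0, "new york": 1.0}, calling term2centroid(query2term("new york pizza"), weights) selects pizza, matching Example2. The lookup function is passed into synonym as a parameter so that a different WordNet interface can be substituted.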

Similarity-based Method

  1. Function catURL(string category, string engine, int n)
    Input: a category, a search engine, and a count n, Output: the top n URLs returned by that search engine (e.g., Google) for the category

    Example: catURL(cuhk, Google, 3)
    www.cuhk.edu.hk/
    www.cuhk.edu.hk/chinese/
    www.cuhk.edu.hk/gss/

  2. Function keywordsURL(string URL)
    Input: a URL, Output: keywords of the Web page at this URL

    Example: keywordsURL(http://www.cuhk.edu.hk/english/)
    research, education, shatin, campus, college, etc.

  3. Function synonym(string keyword)
    Input: a word, Output: the set of synonyms of this word in WordNet

    Example: synonym(car)
    auto, automobile, machine, motorcar
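
A rough Python sketch of catURL and keywordsURL is given below, under several assumptions. The page names no search API, so `backends` is a hypothetical mapping from an engine name to a caller-supplied function that returns result URLs for a query. The page also does not say how keywords are picked. Plain word frequency over the visible page text is used here as a stand-in, with a hypothetical stop-word list.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical stop-word list (assumption, not from the page).
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "for", "is", "on", "at"}


def cat_url(category, engine, n, backends):
    """Top n distinct URLs for `category` from the chosen engine.

    `backends` maps an engine name to a callable(query) -> iterable of URLs;
    the callable itself is supplied by the caller (assumption).
    """
    urls = []
    for url in backends[engine](category):
        if len(urls) >= n:
            break
        if url not in urls:
            urls.append(url)
    return urls


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def keywords_from_html(html, k=5):
    """The k most frequent non-stop-words in the page text."""
    parser = _TextExtractor()
    parser.feed(html)
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(k)]


def keywords_url(url, k=5):
    """Fetch a page over HTTP and extract its keywords."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return keywords_from_html(html, k)
```

Keyword extraction is split from fetching so that it can be tried on a saved HTML string without network access. A weighting scheme such as TF-IDF could replace the raw counts if that is closer to what the project uses.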

Overall Workflow

 
projs/qcat/home.txt · Last modified: 2011/04/08 17:16 by xfyu