Differences

This shows you the differences between two versions of the page.

projs:qcat:home [2011/04/07 20:04]
xfyu
projs:qcat:home [2011/04/08 17:16] (current)
xfyu
Line 4: Line 4:
    * [[http://tartarus.org/~martin/PorterStemmer/index-old.html|The Porter Stemming Algorithm]]     * [[http://tartarus.org/~martin/PorterStemmer/index-old.html|The Porter Stemming Algorithm]]
  * Abbreviation Expansion   * Abbreviation Expansion
 +    * Use [[http://www.indiana.edu/~letrs/help-services/QuickGuides/oed-abbr.html|Abbreviation list]]
  * Stopword filtering   * Stopword filtering
 +    * Use [[http://snowball.tartarus.org/algorithms/english/stop.txt|Stop-word list]]
  * Misspelled words   * Misspelled words
 +    * [[http://aspell.net/|GNU Aspell]]
  * Location-based queries   * Location-based queries
 +    * NER for location detection
  * Part-of-speech (POS) tagging   * Part-of-speech (POS) tagging
 +    * [[http://nlp.stanford.edu/software/tagger.shtml|Stanford POS tagger]]
  * Named entity recognition (NER)   * Named entity recognition (NER)
 +    * [[http://nlp.stanford.edu/software/CRF-NER.shtml|Stanford NER tagger]]
    * Person (e.g., Bill Gates)     * Person (e.g., Bill Gates)
    * Location (e.g., Hong Kong)     * Location (e.g., Hong Kong)
Line 15: Line 21:
===== Knowledge Base ===== ===== Knowledge Base =====
-  * Lexicon (e.g., DBpedia person, location, organization, and product lists) +  * Lexicon (e.g., [[http://dbpedia.org/About|DBpedia]] person, location, organization, and product lists) 
-  * Stop-word lexicon (e.g., of, the) +  * [[http://snowball.tartarus.org/algorithms/english/stop.txt|Stop-word list]] (e.g., of, the) 
-  * Abbreviation lexicon (e.g., ad for advertisement)+  * [[http://www.indiana.edu/~letrs/help-services/QuickGuides/oed-abbr.html|Abbreviation list]] (e.g., ad for advertisement) 
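
As a rough illustration of how the stop-word and abbreviation lists above might be applied to a query, here is a minimal Python sketch. The two tiny lexicons, the function names, and the whitespace tokenization are all assumptions made for the example; in practice the lexicons would be loaded from the linked lists.

```python
# Hypothetical miniature lexicons; the real ones would be loaded from the
# stop-word and abbreviation lists linked above.
STOP_WORDS = {"of", "the"}
ABBREVIATIONS = {"ad": "advertisement"}


def expand_abbreviations(tokens, abbreviations=ABBREVIATIONS):
    """Replace each token that is a known abbreviation with its long form."""
    return [abbreviations.get(token, token) for token in tokens]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word lexicon."""
    return [token for token in tokens if token not in stop_words]


def normalize(query):
    """Lowercase, split on whitespace, expand abbreviations, drop stop words."""
    tokens = query.lower().split()
    return remove_stop_words(expand_abbreviations(tokens))


print(normalize("the ad of the day"))
```

A plain dictionary lookup like this expands every occurrence of an abbreviation regardless of context, so an ambiguous short form would probably need an extra disambiguation step.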
===== Useful tools ===== ===== Useful tools =====
  *[[http://tartarus.org/~martin/PorterStemmer/index-old.html|The Porter Stemming Algorithm]]   *[[http://tartarus.org/~martin/PorterStemmer/index-old.html|The Porter Stemming Algorithm]]
 +  *[[http://aspell.net/|GNU Aspell]]
  *[[http://htmlparser.sourceforge.net/|Web page structure analysis]]   *[[http://htmlparser.sourceforge.net/|Web page structure analysis]]
  *[[http://www.nzdl.org/Kea/|KEA for key word extraction]]   *[[http://www.nzdl.org/Kea/|KEA for key word extraction]]
 +  *[[http://nlp.stanford.edu/software/tagger.shtml|Stanford POS tagger]]
 +  *[[http://nlp.stanford.edu/software/CRF-NER.shtml|Stanford NER tagger]]
  *[[http://wordnet.princeton.edu/|WordNet]]   *[[http://wordnet.princeton.edu/|WordNet]]
  *[[http://search.cpan.org/dist/WordNet-Similarity/|WordNet::Similarity]]   *[[http://search.cpan.org/dist/WordNet-Similarity/|WordNet::Similarity]]
Line 37: Line 47:
===== Centroid Method ===== ===== Centroid Method =====
  - Function <color red>Query2Term(string query)</color> \\ **Input**: a query, **Output**: terms of this query \\ \\ Example1: the chinese university of hk -> [the chinese university of hk]1 ([]i is the i-th term of this query) \\ Example2: new york pizza -> [new york]1 [pizza]2 \\ Example3: How do I play mp3 using the java programming language -> [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 \\ \\   - Function <color red>Query2Term(string query)</color> \\ **Input**: a query, **Output**: terms of this query \\ \\ Example1: the chinese university of hk -> [the chinese university of hk]1 ([]i is the i-th term of this query) \\ Example2: new york pizza -> [new york]1 [pizza]2 \\ Example3: How do I play mp3 using the java programming language -> [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 \\ \\
-  - Function <color red>Term2Centroid(string terms)</color> \\ **Input**: terms of a query, **Output**: centroid of this query \\ \\ Example1: [the chinese university of hk]1-> the chinese university of hk \\ Example2: [new york]1 [pizza]2 -> pizza \\ Example3: [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 -> mp3 +  - Function <color red>Term2Centroid(string terms)</color> \\ **Input**: terms of a query, **Output**: centroid of this query \\ \\ Example1: [the chinese university of hk]1-> the chinese university of hk \\ Example2: [new york]1 [pizza]2 -> pizza \\ Example3: [play]1 [mp3]2 [use]3 [java]4 [program]5 [language]6 -> mp3 \\ \\ 
 +  - Function <color red>synonym(string keyword)</color> \\ **Input**: a word, **Output**: a set of synonyms of this term in WordNet \\ \\ Example: synonym(car) \\ auto, automobile, machine, motorcar 
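
The first two functions above can be sketched in Python as follows. This is only an illustration that reproduces the three listed examples: the entity, stop-word, lemma, and verb lexicons are hypothetical toy data, and the centroid rule (skip location entities and verbs, then take the first remaining term) is an assumption, since the page does not state how the centroid is chosen.

```python
# All lexicons below are hypothetical toy data standing in for the knowledge base.
ENTITIES = {
    ("the", "chinese", "university", "of", "hk"): "organization",
    ("new", "york"): "location",
}
STOP_WORDS = {"how", "do", "i", "the", "of"}
LEMMAS = {"using": "use", "programming": "program"}
VERBS = {"play", "use"}


def query2term(query):
    """Split a query into terms: multi-word entities are kept whole, stop
    words are dropped, and remaining words are mapped to a base form."""
    tokens = query.lower().split()
    max_len = max(len(entity) for entity in ENTITIES)
    terms = []
    i = 0
    while i < len(tokens):
        # Greedy longest match against the entity lexicon.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in ENTITIES:
                terms.append(" ".join(candidate))
                i += n
                break
        else:
            # No entity starts here: treat the token as an ordinary word.
            token = tokens[i]
            if token not in STOP_WORDS:
                terms.append(LEMMAS.get(token, token))
            i += 1
    return terms


def term2centroid(terms):
    """Pick a centroid term using an assumed heuristic (not from the source):
    a single term is its own centroid; otherwise return the first term that
    is neither a location entity nor a verb."""
    if not terms:
        return None
    if len(terms) == 1:
        return terms[0]
    for term in terms:
        if ENTITIES.get(tuple(term.split())) == "location":
            continue
        if term in VERBS:
            continue
        return term
    return terms[0]


for q in ("the chinese university of hk",
          "new york pizza",
          "How do I play mp3 using the java programming language"):
    terms = query2term(q)
    print(terms, "->", term2centroid(terms))
```

A fuller version would presumably take entity types from the NER tagger and parts of speech from the POS tagger listed earlier, instead of these hand-written sets.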
 +
===== Similarity-based Method ===== ===== Similarity-based Method =====
Line 45: Line 57:
===== Overall Workflow ===== ===== Overall Workflow =====
 +{{:projs:qcat:figure-updated1.pdf|Workflow for query categorization}}
 
projs/qcat/home.1302177843.txt.gz · Last modified: 2011/04/07 20:04 by xfyu