A filtering-keyword extraction function that uses the posts classified by a naive classification method as the training set. It obtains each person's filtering keywords and their weights, which are given by their TF-IDF values.


  1. name: personname; type: string; description: the name of the person whose filtering keywords are to be extracted


  1. A boolean value indicating whether the training process succeeded

Detailed Information

  • Filter out the posts in which both person A's name and the related company's stock name appear.
  • Take the filtered-out posts as a reliable post set related to person A. Randomly pick 1000 posts in which person A's name appears; treat each of them as one document, and treat the whole reliable post set as one document.
  • Then calculate the TF-IDF (term frequency-inverse document frequency) of each notional (content) word in the reliable post set document. Set each word's weight to its TF-IDF and sort the words in descending order of weight.
  • Take the top 10 words as the filtering keywords for person A.
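
The steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions, not the project's actual implementation: the function name extract_filtering_keywords, the parameters stock_name, posts, tokenize, sample_size, top_k and seed are all hypothetical; name matching is done by plain substring search; tokenize is assumed to return only the notional words of a post; and the weight uses the common tf * log(N / df) variant, which the description above does not specify. Unlike the documented function, which returns a boolean, the sketch returns the keyword list directly so that the result is easy to inspect.

```python
import math
import random
from collections import Counter


def extract_filtering_keywords(person_name, stock_name, posts, tokenize,
                               sample_size=1000, top_k=10, seed=0):
    """Return up to top_k (word, tfidf) pairs, highest weight first.

    All parameter names are hypothetical; tokenize(post) is assumed to
    return the list of notional words in a post.
    """
    # Step 1: posts mentioning both names form the reliable post set.
    reliable = [p for p in posts if person_name in p and stock_name in p]
    if not reliable:
        return []

    # Step 2: randomly pick up to sample_size posts that mention the person.
    mentioning = [p for p in posts if person_name in p]
    rng = random.Random(seed)
    sampled = rng.sample(mentioning, min(sample_size, len(mentioning)))

    # Each sampled post is one document; the reliable set is one document.
    reliable_tokens = [w for p in reliable for w in tokenize(p)]
    documents = [set(tokenize(p)) for p in sampled] + [set(reliable_tokens)]

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for doc in documents:
        df.update(doc)

    # Step 3: TF-IDF of every word in the reliable-set document.
    tf = Counter(reliable_tokens)
    total = len(reliable_tokens)
    n_docs = len(documents)
    weights = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }

    # Step 4: sort by descending weight (ties broken alphabetically), keep top_k.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

A call such as extract_filtering_keywords("A", "STOCK", posts, str.split) would use whitespace splitting as a stand-in tokenizer. Note that with this weighting, a word that occurs in every document (typically the person's own name) gets weight 0 and therefore sinks to the bottom of the ranking.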
projs/clans/docs/entity_disambiguation_xu/extractkeywords.txt · Last modified: 2014/01/21 09:28 by xmill.zod