extractFilterKeyword(pid, potential_posts)

Description

Extracts the filtering keywords for a person. Posts labeled by a naive classification method serve as the training set. For each person, the function obtains the filtering keywords and their weights, where each weight is the keyword's TF-IDF value.

Parameters

Parameter        Necessity  Type  Description
pid              required   int   The serial number of a person in the database
potential_posts  required   list  A list of potentially related posts in which the name of the person in the database appears

Return

Name          Type  Description
filter_words  list  A list of filter keywords for the person in the database
fw_weight     list  A list of weights corresponding to the filter keywords
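
A minimal sketch of how a caller might consume the two returned lists, assuming they are parallel (the i-th weight belongs to the i-th keyword). The sample values below are made up for illustration and are not output of the real function.

```python
# Hypothetical return values; in practice they would come from
# extractFilterKeyword(pid, potential_posts).
filter_words = ["merger", "chairman", "shares"]
fw_weight = [0.42, 0.31, 0.27]

# Pair each keyword with its weight (assumes the two lists are parallel).
keyword_weights = dict(zip(filter_words, fw_weight))

# Look up the weight of a single keyword; unknown words get 0.0.
weight_of_merger = keyword_weights.get("merger", 0.0)
weight_of_unknown = keyword_weights.get("weather", 0.0)
```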

Implementation

  1. Select the posts in which both person A's name and the related company's stock name appear. Treat the selected posts as a reliable set of posts related to person A.
  2. Randomly pick 1000 posts in which person A's name appears. Treat each of them as a separate document, and treat the whole reliable post set as one additional document.
  3. Calculate the TF-IDF (term frequency-inverse document frequency) of each notional (content) word in the reliable-post-set document. Use the TF-IDF values as the weights and sort the words in descending order of weight.
  4. Take the top 10 words as the filter keywords for person A.
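
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual code: the function name `extract_filter_keywords_sketch` is hypothetical, the person's name and the stock name are passed in directly instead of being looked up from `pid`, the whitespace tokenizer and the optional stopword set stand in for real word segmentation and notional-word filtering, and the TF-IDF variant (relative term frequency times the natural log of N/df) is one common choice among several.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase whitespace split. Real posts
    # would likely need a word segmenter and a part-of-speech filter.
    return text.lower().split()


def extract_filter_keywords_sketch(person_name, stock_name, potential_posts,
                                   sample_size=1000, top_k=10,
                                   stopwords=frozenset(), rng=None):
    rng = rng or random.Random()

    # Step 1: posts mentioning both names form the reliable post set.
    reliable = [p for p in potential_posts
                if person_name in p and stock_name in p]

    # Step 2: sample up to sample_size posts mentioning the person. Each one
    # is a document; the whole reliable set is one additional document.
    mentions = [p for p in potential_posts if person_name in p]
    sampled = rng.sample(mentions, min(sample_size, len(mentions)))
    reliable_doc = [t for p in reliable for t in tokenize(p)
                    if t not in stopwords]
    documents = [set(tokenize(p)) for p in sampled] + [set(reliable_doc)]

    # Step 3: TF-IDF of each term in the reliable-set document.
    tf = Counter(reliable_doc)
    total = len(reliable_doc)
    n_docs = len(documents)
    weights = {}
    for term, count in tf.items():
        # df >= 1 because the reliable-set document itself contains the term.
        df = sum(1 for d in documents if term in d)
        weights[term] = (count / total) * math.log(n_docs / df)

    # Step 4: keep the top_k terms, sorted by weight in descending order.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    ranked = ranked[:top_k]
    filter_words = [word for word, _ in ranked]
    fw_weight = [weight for _, weight in ranked]
    return filter_words, fw_weight


# Usage with made-up posts (illustrative data only).
posts = [
    "alice acme merger",
    "alice acme merger profit",
    "alice went hiking",
    "alice likes hiking",
    "bob acme",
]
words, scores = extract_filter_keywords_sketch("alice", "acme", posts)
```

In this sketch a word that occurs in every document, such as the person's own name, gets a weight of zero, so it sinks to the bottom of the ranking; words concentrated in the reliable post set rise to the top.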
 
projs/clans/docs/extractfilterkeyword.txt · Last modified: 2014/01/26 21:49 by xmill.zod