====== extractFilterKeyword(pid, potential_posts) ======

===== Description =====

A filter-keyword extraction function that uses the posts selected by a naive classification method as the training set. It obtains each person's filter keywords and their weights, which are given by their TF-IDF scores.

===== Parameters =====

^ Parameters ^ Necessity ^ Type ^ Description ^
| pid | required | int | The serial number of a person in the database |
| potential_posts | required | list | A list of potentially related posts in which the name of the person in the database appears |

===== Return =====

^ Parameters ^ Type ^ Description ^
| filter_words | list | A list of filter keywords for the person in the database |
| fw_weight | list | A list of weights corresponding to the filter keywords |

===== Implementation =====

  - Select the posts in which both person A's name and the related company's stock name appear. Treat the selected posts as a reliable set of posts related to person A.
  - Randomly pick 1000 posts in which person A's name appears. Treat each of them as a separate document, and treat the whole reliable post set as one document.
  - Calculate the TF-IDF (term frequency - inverse document frequency) of each notional (content) word in the reliable post set document. Use the TF-IDF values as the weights and sort the words by weight in descending order.
  - Take the top 10 words as the filter keywords for person A.
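
The steps above can be sketched in Python as follows. This is only a minimal illustration, not the actual implementation: the function name ''extract_filter_keywords'', the helper ''tokenize'', and the parameters ''reliable_posts'', ''name_posts'', ''sample_size'', ''stopwords'' and ''seed'' are hypothetical. It assumes the reliable post set has already been selected, that posts can be split on whitespace (real posts may need a proper word segmenter and a notional-word filter), and that the unsmoothed form ''tf * log(N / df)'' is an acceptable TF-IDF variant.

```python
import math
import random
from collections import Counter


def tokenize(text, stopwords=frozenset()):
    # Hypothetical tokenizer: lowercase whitespace split, dropping stopwords.
    # This stands in for real word segmentation and notional-word filtering.
    return [w for w in text.lower().split() if w not in stopwords]


def extract_filter_keywords(reliable_posts, name_posts, top_k=10,
                            sample_size=1000, stopwords=frozenset(), seed=0):
    # Randomly pick up to sample_size posts that mention the person's name.
    rng = random.Random(seed)
    sample = rng.sample(name_posts, min(sample_size, len(name_posts)))

    # The whole reliable post set is treated as ONE document.
    reliable_doc = [t for post in reliable_posts
                    for t in tokenize(post, stopwords)]
    if not reliable_doc:
        return [], []

    # Each sampled post is its own document; sets are enough for df counts.
    docs = [set(reliable_doc)] + [set(tokenize(p, stopwords)) for p in sample]
    n_docs = len(docs)

    tf = Counter(reliable_doc)
    total = len(reliable_doc)

    scores = {}
    for term, count in tf.items():
        # df >= 1 because every term occurs in the reliable document.
        df = sum(1 for d in docs if term in d)
        scores[term] = (count / total) * math.log(n_docs / df)

    # Sort by weight descending; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    filter_words = [term for term, _ in ranked]
    fw_weight = [weight for _, weight in ranked]
    return filter_words, fw_weight


if __name__ == "__main__":
    reliable = ["alice acme shares rise", "alice acme earnings beat"]
    mentions = ["alice goes hiking", "alice likes tea", "alice reads books"]
    print(extract_filter_keywords(reliable, mentions, top_k=3))
```

In this sketch the person's own name appears in every document, so its inverse document frequency is zero and it drops to the bottom of the ranking, while words that are frequent in the reliable set but rare in the random sample rise to the top.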