extractFilterKeyword(pid, potential_posts)
Description
A function that extracts filtering keywords. It uses the posts classified by a naive classification method as its training set and returns each person's filtering keywords together with their weights, which are given by the keywords' TF-IDF scores.
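For reference, one common TF-IDF variant is written out below. The source does not say which variant is used, so this formula is an assumption. Here $d_R$ is the single document formed from the reliable post set, $D$ is the collection of $N$ documents (the sampled posts plus $d_R$), and $f_{t,d}$ is the number of times word $t$ occurs in document $d$:

$$
\mathrm{tfidf}(t, d_R, D) = \frac{f_{t,d_R}}{\sum_{t'} f_{t',d_R}} \cdot \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}
$$

Under this variant, a word that appears in every document (for example, the person's own name) receives a weight of 0.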
Parameters
Parameter | Necessity | Type | Description |
---|---|---|---|
pid | required | int | The serial number (ID) of the person in the database |
potential_posts | required | list | A list of potentially related posts in which the person's name appears |
Return
Return value | Type | Description |
---|---|---|
filter_words | list | A list of filter keywords for the person |
fw_weight | list | A list of weights (TF-IDF scores), one per filter keyword, in the same order as filter_words |
Implementation
- Select the posts in which both person A's name and the stock name of the related company appear. Treat these selected posts as a reliable set of posts related to person A.
- Randomly pick 1000 posts in which person A's name appears. Treat each of them as a separate document, and treat the whole reliable post set as one additional document.
- Calculate the TF-IDF (term frequency-inverse document frequency) of each notional word (content word) in the reliable-post-set document. Use each word's TF-IDF as its weight and sort the words in descending order of weight.
- Take the top 10 words as the filter keywords for person A.
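The four steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the original implementation: the database lookup from `pid` is replaced by a hypothetical `people` mapping from `pid` to `(person_name, stock_name)`, notional-word extraction is approximated by a regex tokenizer with a small hypothetical stopword list, and TF-IDF uses normalized term counts with a `log(N / df)` IDF. The function name `extract_filter_keyword_sketch` and its `sample_size`, `top_k`, and `seed` parameters are also hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stand-in for notional-word filtering; a real system would more
# likely use a word segmenter with part-of-speech tagging.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "with"}


def tokenize(text):
    """Assumed tokenizer: lower-cased word tokens with stopwords removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def extract_filter_keyword_sketch(pid, potential_posts, people,
                                  sample_size=1000, top_k=10, seed=None):
    """Return (filter_words, fw_weight) for the person identified by pid.

    `people` is a hypothetical mapping pid -> (person_name, stock_name)
    standing in for the database lookup.
    """
    name, stock_name = people[pid]

    # Step 1: reliable posts mention both the person's name and the stock name.
    reliable = [p for p in potential_posts if name in p and stock_name in p]
    if not reliable:
        return [], []

    # Step 2: randomly sample posts mentioning the name (each is one document);
    # the whole reliable set is concatenated into one extra document.
    rng = random.Random(seed)
    sampled = rng.sample(potential_posts, min(sample_size, len(potential_posts)))
    reliable_doc = tokenize(" ".join(reliable))
    documents = [reliable_doc] + [tokenize(p) for p in sampled]

    # Step 3: TF-IDF of each word in the reliable document.
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per document
    counts = Counter(reliable_doc)
    total = sum(counts.values())
    scores = {
        w: (c / total) * math.log(n_docs / doc_freq[w])
        for w, c in counts.items()
    }

    # Step 4: sort by descending weight (ties broken alphabetically), keep top_k.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    filter_words = [w for w, _ in ranked]
    fw_weight = [s for _, s in ranked]
    return filter_words, fw_weight
```

With this IDF variant, a word that occurs in every document, such as the person's own name (which by construction appears in all sampled posts), gets a weight of 0 and sinks to the bottom of the ranking instead of crowding out more informative keywords.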