Differences
This shows you the differences between two versions of the page.
projs:clans:docs:entity_disambiguation_xu:extractkeywords [2014/01/21 09:28] xmill.zod |
projs:clans:docs:entity_disambiguation_xu:extractkeywords [2014/01/21 09:28] (current) xmill.zod |
||
---|---|---|---|
Line 8: | Line 8: | ||
===== Detail Information ===== | ===== Detail Information ===== | ||
* Select the posts in which both person A's name and the related company's stock name appear. | * Select the posts in which both person A's name and the related company's stock name appear. | ||
- | Take the selected posts as a reliable post set related to person A. Randomly pick 1000 posts in which person A's name appears; treat each of them as one document and the whole reliable post set as one document. | + | * Take the selected posts as a reliable post set related to person A. Randomly pick 1000 posts in which person A's name appears; treat each of them as one document and the whole reliable post set as one document. |
- | Then calculate the TF-IDF (term frequency-inverse document frequency) of each notional word in the reliable-post-set document. Use the TF-IDF value as the word's weight and sort the words in descending order of weight. | + | * Then calculate the TF-IDF (term frequency-inverse document frequency) of each notional word in the reliable-post-set document. Use the TF-IDF value as the word's weight and sort the words in descending order of weight. |
- | Take the top 10 words as the filtering keywords for person A. | + | * Take the top 10 words as the filtering keywords for person A. |
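
The steps listed above can be sketched roughly as follows. This is only a minimal illustration under several assumptions that the page does not state: posts are tokenized by whitespace, a caller-supplied stopword set stands in for the notional-word filter, TF is the relative frequency within the reliable-post-set document, and IDF is the plain log(N / df) variant. The function name `top_keywords` and its parameters are hypothetical.

```python
import math
from collections import Counter


def top_keywords(reliable_posts, sampled_posts, k=10, stopwords=frozenset()):
    """Rank the words of the reliable post set by TF-IDF and return the top k.

    Assumptions (not from the original page): whitespace tokenization,
    a stopword set as a stand-in for notional-word filtering, and
    idf = log(N / df).
    """
    # The whole reliable post set is treated as ONE document.
    reliable_doc = [
        word
        for post in reliable_posts
        for word in post.split()
        if word not in stopwords
    ]
    # Each sampled post is its own document; sets are enough for df counting.
    other_docs = [set(post.split()) for post in sampled_posts]
    n_docs = len(other_docs) + 1  # sampled posts + the reliable document

    term_counts = Counter(reliable_doc)
    total_terms = len(reliable_doc)

    weights = {}
    for term, count in term_counts.items():
        # Document frequency: the reliable document always contains the term.
        df = 1 + sum(1 for doc in other_docs if term in doc)
        tf = count / total_terms
        weights[term] = tf * math.log(n_docs / df)

    # Sort by weight in descending order and keep the k best terms.
    return sorted(weights, key=weights.get, reverse=True)[:k]


if __name__ == "__main__":
    reliable = ["alice acme stock rises", "alice acme earnings"]
    sampled = ["alice went hiking", "alice likes tea", "stock market news"]
    print(top_keywords(reliable, sampled, k=3))
```

In this toy input the name itself ranks last, because it appears in most sampled posts and therefore has a low IDF, while words specific to the reliable set rank first. In practice the sample would hold 1000 posts and `k` would stay at 10, as described above.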