| — |
wisc_lab:opinion_mining [2008/09/04 11:52] (current) |
||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Opinion Mining and Sentiment Analysis ====== | ||
| + | |||
| + | ===== Introduction ===== | ||
| + | |||
| + | ==== What is opinion mining? ==== | ||
| + | Informally: Extract the opinions given in a piece of text. | ||
| + | |||
| + | Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques. | ||
| + | |||
| + | ==== What's the big deal with opinion mining? ==== | ||
| + | |||
| + | === Motivating Scenario === | ||
| + | * People who want to buy a camera | ||
| + | * Look for comments and reviews | ||
| + | * People who just bought a camera | ||
| + | * Comment on it | ||
| + | * Write down the usage experience | ||
| + | * Camera Manufacturers | ||
| + | * Get feedback from customers | ||
| + | * Improve their products | ||
| + | * Adjust Marketing Strategies | ||
| + | Big business, right? | ||
| + | |||
| + | Web 2.0 gives people a great medium to share whatever they want to share. The result is a rich source of unstructured information (especially opinions) that may be useful (and perhaps worth a lot of money?) | ||
| + | |||
| + | ===== People ===== | ||
| + | * [[people:kam_tong_chan|Thomas]] | ||
| + | * [[people:wei_wei|Wei Wei]] | ||
| + | |||
| + | ===== Research Issues ===== | ||
| + | ==== Opinion Extraction ==== | ||
| + | Identify the segments of text that contain opinions. | ||
| + | |||
| + | e.g. Opinions are in **boldface** | ||
| + | |||
| + | I have just entered into dslr world with 400d, before I used slr cameras. | ||
| + | |||
| + | **400d is extremly well made, precise and overall feeling is vey good.** | ||
| + | |||
| + | ==== Sentiment Classification / Subjectivity Analysis ==== | ||
| + | Decide the sentiment orientation of a given opinion. | ||
| + | |||
| + | === What is Sentiment Orientation? === | ||
| + | * Polarity | ||
| + | * Positive (e.g. This camera is great!) | ||
| + | * Negative (e.g. The battery life is too short.) | ||
| + | * Neutral | ||
| + | |||
| + | * Polarity Scale? | ||
| + | * (Most Negative) -10 ... -5 ... 0 (Neutral) ... 5 ... 10 (Most Positive) | ||
| + | |||
| + | e.g. //The picture quality is good.// (A positive opinion) | ||
| + | e.g. //The battery life is short.// (A negative opinion) | ||
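The polarity scale above can be illustrated with a lexicon lookup. The following is only a minimal sketch, not a method described on this page: `TOY_LEXICON`, its scores, `polarity_score`, and `polarity_label` are all hypothetical names and values made up for illustration. Note that scoring "short" as negative only makes sense for features such as battery life, which is one reason a domain-specific lexicon may be needed.

```python
# Hypothetical toy lexicon (assumption): term -> score on the -10..10 scale.
TOY_LEXICON = {"great": 8, "good": 5, "short": -4, "bad": -6}


def polarity_score(sentence, lexicon=TOY_LEXICON):
    """Average the scores of the lexicon words found in the sentence.

    Returns 0.0 (neutral) when no word of the sentence is in the lexicon.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0
    average = sum(scores) / len(scores)
    # Clamp to the -10..10 polarity scale.
    return max(-10.0, min(10.0, average))


def polarity_label(score):
    """Map a numeric score to one of the three polarity classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["The picture quality is good.", "The battery life is short."]:
    print(text, "->", polarity_label(polarity_score(text)))
```

A plain average ignores negation ("not good") and intensifiers ("very good"), so this is best read as a baseline rather than a usable classifier.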
| + | |||
| + | ==== Feature-Opinion Association ==== | ||
| + | A problem proposed by [[People:Kam Tong CHAN]]. The problem is related to natural language processing: | ||
| + | |||
| + | //Given a text with target features and opinions extracted, decide which opinions comment on which features.// | ||
| + | |||
| + | It is known to be a difficult problem in natural language processing. Take the following example (from http://en.wikipedia.org/wiki/Natural_language_processing): | ||
| + | |||
| + | Consider the phrase "pretty little girls' school": | ||
| + | * Does the school look little? | ||
| + | * Do the girls look little? | ||
| + | * Do the girls look pretty? | ||
| + | * Does the school look pretty? | ||
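To see why the ambiguity above matters, consider a naive baseline that attaches each opinion word to the nearest feature word by token distance. This is a sketch under stated assumptions, not the approach proposed on this page: the function name `associate` and the hand-tokenized input are hypothetical, and the features and opinions are assumed to be already extracted.

```python
def associate(tokens, features, opinions):
    """Attach each opinion word to the nearest feature word (by token distance).

    tokens:   list of words in sentence order
    features: set of words already identified as target features
    opinions: set of words already identified as opinion words
    Ties go to the feature that appears first in the sentence.
    """
    feature_positions = [(i, t) for i, t in enumerate(tokens) if t in features]
    pairs = {}
    for i, token in enumerate(tokens):
        if token in opinions and feature_positions:
            _, nearest = min(feature_positions, key=lambda p: abs(p[0] - i))
            pairs[token] = nearest
    return pairs


# Hand-tokenized version of the phrase above (possessive apostrophe dropped).
tokens = ["pretty", "little", "girls", "school"]
print(associate(tokens, features={"girls", "school"}, opinions={"pretty", "little"}))
```

The heuristic commits to a single reading (both adjectives describe the girls) and cannot represent the other three, which is exactly the limitation the example is meant to expose.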
| + | |||
| + | |||
| + | |||
| + | ===== Advanced Issues ===== | ||
| + | |||
| + | ==== Target Identification ==== | ||
| + | Which object (or who) is being commented on? | ||
| + | |||
| + | e.g. He is a kind person. | ||
| + | |||
| + | Who is "he"? | ||
| + | |||
| + | e.g. The camera is great! | ||
| + | |||
| + | Which camera model are you talking about? | ||
| + | |||
| + | ==== Source Identification ==== | ||
| + | Given a review text, identify who made the comment. | ||
| + | |||
| + | Achieving this will allow us to build a Question-Answering System. | ||
| + | |||
| + | e.g. Who supports Obama as the next U.S. president? | ||
| + | |||
| + | |||
| + | ==== Opinion Summarization and Visualization ==== | ||
| + | Given a set of documents (crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed about the target object. | ||
| + | |||
| + | e.g. For Camera | ||
| + | * Picture Quality (+ve: 290, -ve: 73) | ||
| + | * Ease of use (+ve: 57, -ve: 10) | ||
| + | * etc. | ||
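Once each opinion has been tied to a feature and given a polarity, a summary like the one above reduces to counting. The sketch below assumes the input is a list of `(feature, polarity)` pairs produced by earlier steps; the function names `summarize` and `format_summary` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(opinions):
    """Count positive and negative opinions per feature.

    opinions: iterable of (feature, polarity) pairs,
              where polarity is "positive" or "negative".
    """
    summary = defaultdict(Counter)
    for feature, polarity in opinions:
        summary[feature][polarity] += 1
    return summary


def format_summary(summary):
    """Render one line per feature, in the style of the list above."""
    lines = []
    for feature in sorted(summary):
        counts = summary[feature]
        lines.append(
            f"{feature} (+ve: {counts['positive']}, -ve: {counts['negative']})"
        )
    return lines


# Made-up sample input for illustration only.
sample = [
    ("Picture Quality", "positive"),
    ("Picture Quality", "negative"),
    ("Picture Quality", "positive"),
    ("Ease of use", "positive"),
]
for line in format_summary(summarize(sample)):
    print(line)
```

The same counts could feed a bar chart per feature for the visualization side of the problem.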
| + | |||
| + | ==== Opinion Spam Detection ==== | ||
| + | Detect whether opinions are written by spammers. | ||
| + | |||
| + | === Why is there opinion spam? === | ||
| + | - Someone may write something to promote their own image / products | ||
| + | - Someone may write something to hurt their enemies | ||
| + | |||
| + | |||
| + | ==== Others ==== | ||
| + | |||
| + | === Linguistic Tools for Opinion Mining === | ||
| + | |||
| + | == [Domain-Specific] Sentiment lexicon == | ||
| + | A lexicon that contains the sentiment orientation of each term. It may be domain-specific or general. | ||
| + | |||
| + | * Is there a way to generate it automatically from a large corpus? | ||
| + | |||
| + | == Ontology == | ||
| + | An ontology is a structured description of concepts. It defines the terminology and hierarchical relationships of a domain. | ||
| + | |||
| + | * How can ontologies be incorporated into opinion mining? e.g.: | ||
| + | * Opinion Summarization | ||
| + | * Processing Comparative Statements | ||
| + | |||
| + | * Is there a way to generate them automatically? | ||
| + | |||
| + | * Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like? | ||
| + | |||
| + | === Scalability === | ||
| + | * Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion? | ||
| + | |||
| + | ===== Related Software Packages for Opinion Mining ===== | ||
| + | * WordNet, SentiWordNet | ||
| + | * Thesaurus | ||
| + | * Python | ||
| + | * NLTK (Natural Language Toolkit) | ||
| + | * NumPy, SciPy | ||
| + | * Matplotlib | ||
| + | * Text Processing Tools | ||
| + | * Sentence Splitters | ||
| + | * POS (Part-of-speech) Taggers | ||
| + | * Stemmers | ||
| + | * Crawler | ||
| + | |||
| + | ===== Opinion Mining Related Resources ===== | ||
| + | |||
| + | ==== Research Papers ==== | ||
| + | |||
| + | * Sentiment Classification bibliography | ||
| + | http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html | ||
| + | |||
| + | * ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics | ||
| + | http://acl.ldc.upenn.edu/ | ||
| + | |||
| + | ==== Datasets ==== | ||
| + | |||
| + | * Movie Review Data | ||
| + | http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/ | ||
| + | |||
| + | * Customer Review Data | ||
| + | http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html | ||
| + | |||
| + | * MPQA Opinion Corpus | ||
| + | http://www.cs.pitt.edu/mpqa/databaserelease/ | ||
| + | |||
| + | ==== Tools ==== | ||
| + | |||
| + | * SentiWordNet | ||
| + | http://sentiwordnet.isti.cnr.it/ | ||
| + | |||
| + | * NLTK - Natural Language Toolkit for Python | ||
| + | http://nltk.sourceforge.net/ | ||
| + | |||
| + | * WordNet | ||
| + | http://wordnet.princeton.edu/ | ||
| + | |||
| + | ==== Web Resources ==== | ||
| + | |||
| + | * The Sentiment & Affect Yahoo! Group | ||
| + | http://groups.yahoo.com/group/SentimentAI | ||
| + | |||
| + | * GI - General Inquirer | ||
| + | http://www.webuse.umd.edu:9090/ http://www.webuse.umd.edu:9090/tags/ | ||
| + | |||
| + | * LDC Catalog | ||
| + | http://www.ldc.upenn.edu/Catalog/ | ||
| + | |||
| + | * Opinmind | ||
| + | http://opinmind.com/ | ||
| + | |||
| + | * Data Mining Resources | ||
| + | http://www.kdnuggets.com/index.html | ||
| + | |||
| + | |||
| + | ==== Related Conferences ==== | ||
| + | |||
| + | * SIGIR - ACM SIGIR Special Interest Group on Information Retrieval | ||
| + | http://www.sigir.org | ||
| + | |||
| + | * CIKM - Conference on Information and Knowledge Management | ||
| + | http://www.cikm.org | ||
| + | |||
| + | * IDEAL - International Conference on Intelligent Data Engineering and Automated Learning | ||
| + | http://www.ideal2008.org/ | ||
| + | |||
| + | * SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | ||
| + | http://www.sigkdd.org | ||
| + | |||
| + | * AAAI - Association for the Advancement of Artificial Intelligence | ||
| + | http://www.aaai.org | ||
| + | |||
| + | * WWW - International World Wide Web Conferences | ||
| + | http://www.iw3c2.org/ | ||
| + | |||
| + | * TREC - Text REtrieval Conference | ||
| + | http://trec.nist.gov/ | ||
| + | |||
| + | * ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing | ||
| + | http://www.acl-ijcnlp-2009.org/ | ||
| + | |||
| + | * WSDM - ACM International Conference on Web Search and Data Mining | ||
| + | http://wsdm2009.org/ | ||
| + | |||
| + | * SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing | ||
| + | http://www.cs.jhu.edu/~yarowsky/sigdat.html | ||
| + | |||
| + | * WI - ACM International Conference on Web Intelligence | ||
| + | http://wi-consortium.org/ | ||
| + | |||
| + | * SIGWEB | ||
| + | http://www.sigweb.org/about/ | ||
| + | |||
| + | |||