— |
wisc_lab:opinion_mining [2008/09/04 11:52] (current) |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Opinion Mining and Sentiment Analysis ====== | ||
+ | |||
+ | ===== Introduction ===== | ||
+ | |||
+ | ==== What is opinion mining? ==== | ||
+ | Informally: Extract the opinions given in a piece of text. | ||
+ | |||
+ | Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques. | ||
+ | |||
+ | ==== What's the big deal with opinion mining? ==== | ||
+ | |||
+ | === Motivating Scenario === | ||
+ | * People who want to buy a camera | ||
+ | * Look for comments and reviews | ||
+ | * People who just bought a camera | ||
+ | * Comment on it | ||
+ | * Write down the usage experience | ||
+ | * Camera Manufacturer | ||
+ | * Get feedback from customers | ||
+ | * Improve their products | ||
+ | * Adjust Marketing Strategies | ||
+ | Big business, right? | ||
+ | |||
+ | Web 2.0 gives people a convenient medium for sharing whatever they want to share. The result is a large source of unstructured information, especially opinions, that may be useful and commercially valuable. | ||
+ | |||
+ | ===== People ===== | ||
+ | * [[people:kam_tong_chan|Thomas]] | ||
+ | * [[people:wei_wei|Wei Wei]] | ||
+ | |||
+ | ===== Research Issues ===== | ||
+ | ==== Opinion Extraction ==== | ||
+ | Identify the segments of text that contain opinions. | ||
+ | |||
+ | e.g. Opinions are in **boldface** | ||
+ | |||
+ | I have just entered into dslr world with 400d, before I used slr cameras. | ||
+ | |||
+ | **400d is extremly well made, precise and overall feeling is vey good.** | ||
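
The extraction step above can be sketched with a very small lexicon-based filter. This is only an illustration under stated assumptions: the word list `OPINION_WORDS` and the helper names are hypothetical, and a real system would more likely rely on a resource such as SentiWordNet or a trained classifier.

```python
import re

# Hypothetical toy lexicon of opinion-bearing words (an assumption for this
# sketch, not a resource named on this page).
OPINION_WORDS = {"good", "great", "precise", "bad", "poor", "short", "excellent"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_opinions(text):
    """Return the sentences that contain at least one word from the lexicon."""
    opinions = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & OPINION_WORDS:
            opinions.append(sentence)
    return opinions


review = (
    "I have just entered into dslr world with 400d, before I used slr cameras. "
    "400d is extremly well made, precise and overall feeling is vey good."
)
for sentence in extract_opinions(review):
    print(sentence)
```

On the review above, only the second sentence is kept, because it contains "precise" and "good". Note that the misspelled "extremly" and "vey" are simply ignored; a pure lexicon lookup has no way to recover from typos.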
+ | |||
+ | ==== Sentiment Classification / Subjectivity Analysis ==== | ||
+ | Decide the sentiment orientation of a given piece of opinion. | ||
+ | |||
+ | === What is Sentiment Orientation? === | ||
+ | * Polarity | ||
+ | * Positive (e.g. This camera is great!) | ||
+ | * Negative (e.g. The battery life is too short.) | ||
+ | * Neutral | ||
+ | |||
+ | * Polarity Scale? | ||
+ | * (Most Negative) -10 ... -5 ... 0 (Neutral) ... 5 ... 10 (Most Positive) | ||
+ | |||
+ | e.g. //The picture quality is good.// (A positive opinion) | ||
+ | e.g. //The battery life is short.// (A negative opinion) | ||
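
A minimal sketch of how a polarity on the -10 to 10 scale might be computed from a lexicon. The word scores in `LEXICON`, the negation list, and the function names are all assumptions made for illustration; in particular, treating "short" as negative only makes sense in a domain such as battery life.

```python
import re

# Hypothetical word scores on the -10..10 scale (illustrative values only).
LEXICON = {"great": 8, "good": 5, "short": -4, "terrible": -9}
NEGATIONS = {"not", "no", "never"}


def polarity_score(sentence):
    """Sum the lexicon scores of the words, flipping the sign of the next
    lexicon word after a negation, and clamp the result to [-10, 10]."""
    total = 0
    negate = False
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            score = LEXICON[token]
            total += -score if negate else score
            negate = False
    return max(-10, min(10, total))


def polarity_label(sentence):
    """Map the numeric score to positive / negative / neutral."""
    score = polarity_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for s in ["The picture quality is good.", "The battery life is short."]:
    print(s, polarity_score(s), polarity_label(s))
```

With these assumed scores, the first sentence gets 5 (positive) and the second gets -4 (negative). A sentence with no lexicon words scores 0 and is labelled neutral.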
+ | |||
+ | ==== Feature-Opinion Association ==== | ||
+ | A problem proposed by [[People:Kam Tong CHAN]]. The problem is related to natural language processing: | ||
+ | |||
+ | //Given a text with target features and opinions extracted, decide which opinions comment on which features.// | ||
+ | |||
+ | It is known to be a difficult problem in natural language processing. Let's take a look at the following example (taken from http://en.wikipedia.org/wiki/Natural_language_processing): | ||
+ | |||
+ | Consider the phrase "pretty little girls' school", | ||
+ | * Does the school look little? | ||
+ | * Do the girls look little? | ||
+ | * Do the girls look pretty? | ||
+ | * Does the school look pretty? | ||
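
One simple baseline for this association problem (an assumption of this sketch, not necessarily the method intended here) is to attach each opinion word to the nearest feature word by token distance. The function name `associate` and the first example sentence are hypothetical.

```python
def associate(tokens, features, opinions):
    """Pair each opinion token with the nearest feature token.

    Distance is measured in token positions; on a tie the earlier
    feature wins because min() keeps the first minimum it sees.
    """
    feature_positions = [i for i, t in enumerate(tokens) if t in features]
    pairs = []
    for i, token in enumerate(tokens):
        if token in opinions and feature_positions:
            nearest = min(feature_positions, key=lambda j: abs(j - i))
            pairs.append((token, tokens[nearest]))
    return pairs


# A made-up sentence where the heuristic works.
tokens = "the lens is sharp but the battery is weak".split()
print(associate(tokens, {"lens", "battery"}, {"sharp", "weak"}))
# → [('sharp', 'lens'), ('weak', 'battery')]

# The ambiguous phrase from above: the heuristic commits to a single reading.
tokens = "pretty little girls school".split()
print(associate(tokens, {"girls", "school"}, {"pretty", "little"}))
# → [('pretty', 'girls'), ('little', 'girls')]
```

The second call shows why the problem is hard: the nearest-feature rule always picks "girls" and cannot represent the readings in which the school is pretty or little.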
+ | |||
+ | |||
+ | |||
+ | ===== Advanced Issues ===== | ||
+ | |||
+ | ==== Target Identification ==== | ||
+ | Which object (or who) is being commented on? | ||
+ | |||
+ | e.g. He is a kind person. | ||
+ | |||
+ | Who is "he"? | ||
+ | |||
+ | e.g. The camera is great! | ||
+ | |||
+ | Which camera model are you talking about? | ||
+ | |||
+ | ==== Source Identification ==== | ||
+ | Given a review text, identify who made the comment. | ||
+ | |||
+ | Achieving this will allow us to build a Question-Answering System. | ||
+ | |||
+ | e.g. Who supports Obama as the next U.S. president? | ||
+ | |||
+ | |||
+ | ==== Opinion Summarization and Visualization ==== | ||
+ | Given a set of documents (pages crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed with respect to the target object. | ||
+ | |||
+ | e.g. For Camera | ||
+ | * Picture Quality (+ve: 290, -ve: 73) | ||
+ | * Ease of use (+ve: 57, -ve: 10) | ||
+ | * etc. | ||
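
A summary of this shape can be produced by counting polarity labels per feature. The sketch below assumes the earlier stages have already yielded `(feature, polarity)` pairs; the function names and the sample pairs are made up for illustration and are not real review data.

```python
from collections import defaultdict


def summarize(opinions):
    """Count positive and negative opinions per feature.

    `opinions` is an iterable of (feature, polarity) pairs where polarity
    is "+ve" or "-ve"; any other label (e.g. neutral) is skipped.
    """
    summary = defaultdict(lambda: {"+ve": 0, "-ve": 0})
    for feature, polarity in opinions:
        if polarity in ("+ve", "-ve"):
            summary[feature][polarity] += 1
    return dict(summary)


def format_summary(summary):
    """Render each feature as one line in the style of the list above."""
    return [
        f"{feature} (+ve: {counts['+ve']}, -ve: {counts['-ve']})"
        for feature, counts in summary.items()
    ]


# Made-up input pairs, standing in for the output of earlier stages.
sample = [
    ("Picture Quality", "+ve"),
    ("Picture Quality", "-ve"),
    ("Ease of use", "+ve"),
    ("Picture Quality", "+ve"),
]
for line in format_summary(summarize(sample)):
    print(line)
```

Keeping the counting separate from the formatting means the same counts could also feed a chart, for example with Matplotlib.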
+ | |||
+ | ==== Opinion Spam Detection ==== | ||
+ | Detect whether opinions are written by spammers. | ||
+ | |||
+ | === Why is there opinion spam? === | ||
+ | - Someone may write something to promote their own image or products | ||
+ | - Someone may write something to hurt their enemies | ||
+ | |||
+ | |||
+ | ==== Others ==== | ||
+ | |||
+ | === Linguistic Tools for Opinion Mining === | ||
+ | |||
+ | == [Domain-Specific] Sentiment lexicon == | ||
+ | A lexicon that contains the sentiment orientation of each term. It may be domain-specific or general. | ||
+ | |||
+ | * Is there a way to generate it automatically from a large corpus? | ||
+ | |||
+ | == Ontology == | ||
+ | An ontology is a structural description of concepts. It defines the terminology and hierarchical relationships of a domain. | ||
+ | |||
+ | * How can ontologies be incorporated in opinion mining? e.g.: | ||
+ | * Opinion Summarization | ||
+ | * Processing Comparative Statements | ||
+ | |||
+ | * Is there a way to generate them automatically? | ||
+ | |||
+ | * Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like? | ||
+ | |||
+ | === Scalability === | ||
+ | * Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion? | ||
+ | |||
+ | ===== Related Software Packages for Opinion Mining ===== | ||
+ | * WordNet, SentiWordNet | ||
+ | * Thesaurus | ||
+ | * Python | ||
+ | * NLTK (Natural Language Toolkit) | ||
+ | * Numpy, Scipy | ||
+ | * Matplotlib | ||
+ | * Text Processing Tools | ||
+ | * Sentence Splitters | ||
+ | * POS (Part-of-speech) Taggers | ||
+ | * Stemmers | ||
+ | * Crawler | ||
+ | |||
+ | ===== Opinion Mining Related Resources ===== | ||
+ | |||
+ | ==== Research Papers ==== | ||
+ | |||
+ | * Sentiment Classification bibliography | ||
+ | http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html | ||
+ | |||
+ | * ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics | ||
+ | http://acl.ldc.upenn.edu/ | ||
+ | |||
+ | ==== Datasets ==== | ||
+ | |||
+ | * Movie Review Data | ||
+ | http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/ | ||
+ | |||
+ | * Customer Review Data | ||
+ | http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html | ||
+ | |||
+ | * MPQA Opinion Corpus | ||
+ | http://www.cs.pitt.edu/mpqa/databaserelease/ | ||
+ | |||
+ | ==== Tools ==== | ||
+ | |||
+ | * SentiWordNet | ||
+ | http://sentiwordnet.isti.cnr.it/ | ||
+ | |||
+ | * NLTK - Natural Language Toolkit for Python | ||
+ | http://nltk.sourceforge.net/ | ||
+ | |||
+ | * WordNet | ||
+ | http://wordnet.princeton.edu/ | ||
+ | |||
+ | ==== Web Resources ==== | ||
+ | |||
+ | * The Sentiment & Affect Yahoo! Group | ||
+ | http://groups.yahoo.com/group/SentimentAI | ||
+ | |||
+ | * GI - General Inquirer | ||
+ | http://www.webuse.umd.edu:9090/ http://www.webuse.umd.edu:9090/tags/ | ||
+ | |||
+ | * LDC Catalog | ||
+ | http://www.ldc.upenn.edu/Catalog/ | ||
+ | |||
+ | * Opinmind | ||
+ | http://opinmind.com/ | ||
+ | |||
+ | * Data Mining Resources | ||
+ | http://www.kdnuggets.com/index.html | ||
+ | |||
+ | |||
+ | ==== Related Conferences ==== | ||
+ | |||
+ | * SIGIR - ACM SIGIR Special Interest Group on Information Retrieval | ||
+ | http://www.sigir.org | ||
+ | |||
+ | * CIKM - Conference on Information and Knowledge Management | ||
+ | http://www.cikm.org | ||
+ | |||
+ | * IDEAL - International Conference on Intelligent Data Engineering and Automated Learning | ||
+ | http://www.ideal2008.org/ | ||
+ | |||
+ | * SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | ||
+ | http://www.sigkdd.org | ||
+ | |||
+ | * AAAI - Association for the Advancement of Artificial Intelligence | ||
+ | http://www.aaai.org | ||
+ | |||
+ | * WWW - International World Wide Web Conferences | ||
+ | http://www.iw3c2.org/ | ||
+ | |||
+ | * TREC - Text REtrieval Conference | ||
+ | http://trec.nist.gov/ | ||
+ | |||
+ | * ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing | ||
+ | http://www.acl-ijcnlp-2009.org/ | ||
+ | |||
+ | * WSDM - ACM International Conference on Web Search and Data Mining | ||
+ | http://wsdm2009.org/ | ||
+ | |||
+ | * SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing | ||
+ | http://www.cs.jhu.edu/~yarowsky/sigdat.html | ||
+ | |||
+ | * WI - ACM International Conference on Web Intelligence | ||
+ | http://wi-consortium.org/ | ||
+ | |||
+ | * SIGWEB | ||
+ | http://www.sigweb.org/about/ | ||
+ | |||
+ | |||