====== Opinion Mining - A Brief Introduction ======

This page is a brief introduction to opinion mining. It contains some background information for doing research in opinion mining.

==== Opinion Mining Group ====

Supervisors:
  * [[http://wiki.cse.cuhk.edu.hk/irwin.king/home|Prof. Irwin King]]
  * [[http://www.cse.cuhk.edu.hk/~jlee|Prof. Jimmy Lee]]

Students:
  * [[People:Kam Tong CHAN]]
  * [[People:Wei WEI]]

==== What is opinion? ====

Subjective views on a certain topic.

Views can be:
  * Pros / Cons
  * Suggestions for improvement
  * Comparisons

Topics:
  * Any object (usually a noun), e.g.
    * Products: Car, Music Player, Camera, etc.
    * People: Barack Obama, Hillary Clinton, etc.
    * Organizations: Bank, Government, School, etc.
    * Others...

==== What is opinion mining? ====

Informally: Extract the opinions given in a piece of text.

Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques.

==== What's the big deal with opinion mining? ====

=== Motivating Scenario ===

  * People who want to buy a camera
    * Look for comments and reviews
  * People who just bought a camera
    * Comment on it
    * Write down the usage experience
  * Camera manufacturers
    * Get feedback from customers
    * Improve their products
    * Adjust marketing strategies

Big business, right?

Web 2.0 nowadays provides a great medium for people to share what they want to share. This makes the web a great source of unstructured information (especially opinions) that may be useful, and potentially worth a lot of money.

===== Major Issues =====

==== Opinion Extraction ====

Identify the segments of text that contain opinions.

e.g. Opinions are in **boldface**

I have just entered into dslr world with 400d, before I used slr cameras. **400d is extremly well made, precise and overall feeling is vey good.**

==== Sentiment Classification / Subjectivity Analysis ====

Decide the sentiment orientation of a given piece of opinion.
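As a rough illustration of what deciding the sentiment orientation can mean in practice, here is a minimal lexicon-based sketch in Python. The word list ''TOY_LEXICON'', its scores, and the function names are made up for this example and are not taken from any real lexicon or from the group's work. The approach simply adds up word scores, so it ignores negation, context, and domain (e.g. "short" is only negative for something like battery life).

```python
import re

# Hypothetical toy lexicon: term -> score on an assumed -10..10 scale.
# A real system would use a much larger, possibly domain-specific lexicon.
TOY_LEXICON = {
    "great": 8,
    "good": 5,
    "precise": 4,
    "short": -4,
    "poor": -6,
    "terrible": -9,
}


def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to 'positive', 'negative', or 'neutral'."""
    score = polarity_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("The picture quality is good.",
                     "The battery life is short.",
                     "I bought a camera."):
        print(sentence, "->", classify(sentence))
```

Sentences containing none of the listed words fall through to "neutral", which shows how strongly a method like this depends on the coverage of its lexicon.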
=== What is Sentiment Orientation? ===

  * Polarity
    * Positive (e.g. This camera is great!)
    * Negative (e.g. The battery life is too short.)
    * Neutral
  * Polarity Scale?
    * (Most Negative) -10 ... -5 ... 0 (Neutral) ... 5 ... 10 (Most Positive)

e.g. //The picture quality is good.// (A positive opinion)

e.g. //The battery life is short.// (A negative opinion)

==== Feature-Opinion Association ====

A problem proposed by [[People:Kam Tong CHAN]]. The problem is related to natural language processing:

//Given a text with target features and opinions extracted, decide which opinions comment on which features.//

It is known to be a difficult problem in natural language processing. Let's take a look at the following example (taken from http://en.wikipedia.org/wiki/Natural_language_processing).

Consider the phrase "pretty little girls' school":
  * Does the school look little?
  * Do the girls look little?
  * Do the girls look pretty?
  * Does the school look pretty?

**Reference**:

  @inproceedings{DBLP:conf/pakdd/KtCHAN09,
    author    = {Kam Tong Chan and Irwin King},
    title     = {Let's Tango -- Finding the Right Couple for Feature-Opinion Association in Sentiment Analysis},
    booktitle = {PAKDD 2009: Advances in Knowledge Discovery and Data Mining, 13th Pacific-Asia Conference},
    year      = {2009},
    crossref  = {DBLP:conf/pakdd/2009},
    bibsource = {DBLP, http://dblp.uni-trier.de},
    address   = {Bangkok, Thailand},
    month     = {April 27-30}
  }

===== Advanced Issues =====

==== Target Identification ====

Which object (or who) is being commented on?

  * e.g. He is a kind person. Who is "he"?
  * e.g. The camera is great! Which camera model are you talking about?

==== Source Identification ====

Given a review text, identify who made the comment. Achieving this will allow us to build a Question-Answering System.

e.g. Who supports Obama to be the next U.S. president?
==== Opinion Summarization and Visualization ====

Given a set of documents (crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed with respect to the target object.

e.g. For Camera
  * Picture Quality (+ve: 290, -ve: 73)
  * Ease of use (+ve: 57, -ve: 10)
  * etc.

==== Opinion Spam Detection ====

Detect whether opinions are written by spammers.

=== Why is there opinion spam? ===

  - Someone may write something to promote their own image / products
  - Someone may write something to hurt their enemies

==== Others ====

=== Linguistic Tools for Opinion Mining ===

== [Domain-Specific] Sentiment lexicon ==

A lexicon that contains the sentiment orientation of each term. It may be a domain-specific one or a general one.
  * Is there a way to generate it automatically from a large corpus?

== Ontology ==

An ontology is a structural description of concepts. It defines the terminology and hierarchical relationships of a domain.
  * How can ontologies be incorporated in opinion mining? e.g.:
    * Opinion Summarization
    * Processing Comparative Statements
  * Is there a way to generate them automatically?
  * Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like?

=== Scalability ===

  * Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion?
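The per-feature counts shown under Opinion Summarization and Visualization (e.g. "Picture Quality (+ve: 290, -ve: 73)") can be produced by a very simple aggregation step once features and polarities have been extracted. The following Python sketch assumes the input is already a list of (feature, polarity) pairs. The input format, the function names ''summarize'' and ''format_summary'', and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict


def summarize(opinions):
    """Count positive and negative opinions per feature.

    opinions: iterable of (feature, polarity) pairs, where polarity is
    "+" or "-" (an assumed input format, not a standard one).
    """
    summary = defaultdict(lambda: {"+ve": 0, "-ve": 0})
    for feature, polarity in opinions:
        if polarity not in ("+", "-"):
            raise ValueError(f"unknown polarity: {polarity!r}")
        key = "+ve" if polarity == "+" else "-ve"
        summary[feature][key] += 1
    return dict(summary)


def format_summary(summary):
    """Render the counts in the 'Feature (+ve: n, -ve: m)' style."""
    lines = []
    for feature, counts in sorted(summary.items()):
        lines.append(f"{feature} (+ve: {counts['+ve']}, -ve: {counts['-ve']})")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the output.
    sample = [
        ("Picture Quality", "+"),
        ("Picture Quality", "-"),
        ("Picture Quality", "+"),
        ("Ease of use", "+"),
    ]
    print(format_summary(summarize(sample)))
```

The hard part of summarization lies upstream, in extracting the features and polarities reliably. The counting itself is cheap, which matters for the scalability question raised above.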
===== Related Software Packages for Opinion Mining =====

  * WordNet, SentiWordNet
  * Thesaurus
  * Python
    * NLTK (Natural Language Toolkit)
    * NumPy, SciPy
    * Matplotlib
  * Text Processing Tools
    * Sentence Splitters
    * POS (Part-of-speech) Taggers
    * Stemmers
  * Crawler

===== Opinion Mining Related Resources =====

==== Research Papers ====

  * Sentiment Classification bibliography http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html
  * ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics http://acl.ldc.upenn.edu/

==== Datasets ====

  * Movie Review Data http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/
  * Customer Review Data http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
  * MPQA Opinion Corpus http://www.cs.pitt.edu/mpqa/databaserelease/

==== Tools ====

  * SentiWordNet http://sentiwordnet.isti.cnr.it/
  * NLTK - Natural Language Toolkit for Python http://nltk.sourceforge.net/
  * WordNet http://wordnet.princeton.edu/

==== Web Resources ====

  * The Sentiment & Affect Yahoo! Group http://groups.yahoo.com/group/SentimentAI
  * GI - General Inquirer http://www.webuse.umd.edu:9090/ http://www.webuse.umd.edu:9090/tags/
  * LDC Catalog http://www.ldc.upenn.edu/Catalog/
  * Opinmind http://opinmind.com/
  * Data Mining Resources http://www.kdnuggets.com/index.html

==== Related Conferences ====

  * SIGIR - ACM SIGIR Special Interest Group on Information Retrieval http://www.sigir.org
  * CIKM - Conference on Information and Knowledge Management http://www.cikm.org
  * IDEAL - International Conference on Intelligent Data Engineering and Automated Learning http://www.ideal2008.org/
  * SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining http://www.sigkdd.org
  * AAAI - Association for the Advancement of Artificial Intelligence http://www.aaai.org
  * WWW - International World Wide Web Conferences http://www.iw3c2.org/
  * TREC - Text REtrieval Conference http://trec.nist.gov/
  * ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for