wisc_lab:opinion_mining [2008/09/04 11:52] (current)
 +====== Opinion Mining and Sentiment Analysis ======
 +
 +===== Introduction =====
 +
 +==== What is opinion mining? ====
 +Informally: Extract the opinions given in a piece of text.
 +
 +Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques.
 +
 +==== What's the big deal with opinion mining? ====
 +
 +=== Motivating Scenario ===
 +  * People who want to buy a camera
 +    * Look for comments and reviews
 +  * People who just bought a camera
 +    * Comment on it
 +    * Write down the usage experience
 +  * Camera manufacturers
 +    * Get feedback from customers
 +    * Improve their products
 +    * Adjust marketing strategies
 +Big business, right?
 +
 +Web 2.0 gives people a great medium for sharing whatever they want to share. That makes it a rich source of unstructured information (especially opinions) that may be useful (and perhaps profitable?).
 +
 +===== People =====
 +  * [[people:kam_tong_chan|Thomas]]
 +  * [[people:wei_wei|Wei Wei]]
 +
 +===== Research Issues =====
 +==== Opinion Extraction ====
 +Identify the segments of text that contain opinions.
 +
 +e.g. Opinions are in **boldface**
 +
 +I have just entered into dslr world with 400d, before I used slr cameras.
 +
 +**400d is extremely well made, precise and overall feeling is very good.**
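
The extraction step above can be sketched with a very simple heuristic: split the text into sentences and keep those that contain at least one word from an opinion word list. This is only an illustration, not the method used by the lab; `OPINION_WORDS`, `split_sentences` and `extract_opinions` are hypothetical names, and the word list is a toy assumption.

```python
import re

# Hypothetical toy list of opinion-bearing words (an assumption for this
# sketch); a real system would rely on a much larger sentiment lexicon.
OPINION_WORDS = {"good", "great", "bad", "poor", "precise", "short", "excellent"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_opinions(text):
    """Return the sentences that contain at least one word from OPINION_WORDS."""
    opinions = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & OPINION_WORDS:
            opinions.append(sentence)
    return opinions


review = (
    "I have just entered the DSLR world with the 400D; before, I used SLR cameras. "
    "The 400D is extremely well made, precise, and the overall feeling is very good."
)
print(extract_opinions(review))
```

A word-list lookup like this misses opinions phrased without any listed word and flags factual sentences that happen to contain one, which is part of why opinion extraction is treated as a research issue.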
 +
 +==== Sentiment Classification / Subjectivity Analysis ====
 +Decide the sentiment orientation of a given piece of opinion.
 +
 +=== What is Sentiment Orientation? ===
 +  * Polarity
 +    * Positive (e.g. This camera is great!)
 +    * Negative (e.g. The battery life is too short.)
 +    * Neutral
 +
 +  * Polarity Scale?
 +    * (Most Negative) -10 ... -5 ... 0 (Neutral) ... 5 ... 10 (Most Positive)
 +
 +e.g. //The picture quality is good.// (A positive opinion)
 +e.g. //The battery life is short.// (A negative opinion)
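
As a rough sketch of how a polarity and a position on the -10 to 10 scale could be computed, the snippet below sums per-word scores and clamps the result. The scores in `WORD_SCORES` are invented for illustration, and the function names are hypothetical; negation ("not good") and domain effects are deliberately ignored.

```python
import re

# Hypothetical per-word scores on the -10..10 scale (assumed values).
WORD_SCORES = {"great": 8, "good": 5, "short": -4, "bad": -6, "terrible": -9}


def polarity_score(opinion):
    """Sum the scores of known words and clamp the total to [-10, 10]."""
    words = re.findall(r"[a-z]+", opinion.lower())
    total = sum(WORD_SCORES.get(word, 0) for word in words)
    return max(-10, min(10, total))


def polarity_label(opinion):
    """Map the numeric score to positive / negative / neutral."""
    score = polarity_score(opinion)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["The picture quality is good.", "The battery life is short."]:
    print(text, polarity_score(text), polarity_label(text))
```

Note that "short" is scored negatively here only because of the battery-life example; in another context (e.g. a short shutter lag) the same word could be positive, which is one reason a domain-specific lexicon may be needed.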
 +
 +==== Feature-Opinion Association ====
 +A problem proposed by [[People:Kam Tong CHAN]]. The problem is related to natural language processing:
 +
 +//Given a text with target features and opinions extracted, decide which opinions comment on which features.//
 +
 +It is known to be a difficult problem in natural language processing. Let's take a look at the following example (from http://en.wikipedia.org/wiki/Natural_language_processing).
 +
 +Consider the phrase "pretty little girls' school":
 +    * Does the school look little?
 +    * Do the girls look little?
 +    * Do the girls look pretty?
 +    * Does the school look pretty?
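
To make the difficulty concrete, here is a minimal sketch of one naive baseline: attach each opinion word to the nearest feature word by token distance. This is an assumed heuristic chosen for illustration, not the approach proposed for this problem, and `associate` is a hypothetical name.

```python
def associate(tokens, features, opinions):
    """Pair each opinion word with the closest feature word by token distance."""
    feature_positions = [(i, t) for i, t in enumerate(tokens) if t in features]
    pairs = []
    for i, token in enumerate(tokens):
        if token in opinions and feature_positions:
            # On a tie, min() keeps the first (leftmost) feature.
            _, nearest = min(feature_positions, key=lambda p: abs(p[0] - i))
            pairs.append((token, nearest))
    return pairs


tokens = "pretty little girls school".split()
print(associate(tokens, {"girls", "school"}, {"pretty", "little"}))
# -> [('pretty', 'girls'), ('little', 'girls')]
```

The heuristic commits to a single reading (pretty girls, little girls) and cannot represent the other interpretations listed above, which is exactly the ambiguity that makes the problem hard.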
 +
 +
 +
 +===== Advanced Issues =====
 +
 +==== Target Identification ====
 +Which object (or who) is being commented on?
 +
 +e.g. He is a kind person.
 +
 +Who is "he"?
 +
 +e.g. The camera is great!
 +
 +Which camera model are you talking about?
 +
 +==== Source Identification ====
 +Given a review text, identify who made the comment.
 +
 +Achieving this will allow us to build a Question-Answering System.
 +
 +e.g. Who supports Obama as the next U.S. president?
 +
 +
 +==== Opinion Summarization and Visualization ====
 +Given a set of documents (e.g. pages crawled from the web, all the reviews from a particular forum, or survey results), summarize the opinions expressed about the target object.
 +
 +e.g. For Camera
 +  * Picture Quality (+ve: 290, -ve: 73)
 +  * Ease of use (+ve: 57, -ve: 10)
 +  * etc.
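
Assuming the earlier steps have already produced (feature, polarity) pairs, a summary like the one above is essentially a per-feature count. The sketch below shows this; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize(labeled_opinions):
    """Count positive and negative opinions per feature.

    labeled_opinions: iterable of (feature, polarity) pairs,
    where polarity is "+ve" or "-ve".
    """
    summary = defaultdict(lambda: {"+ve": 0, "-ve": 0})
    for feature, polarity in labeled_opinions:
        summary[feature][polarity] += 1
    return dict(summary)


def format_summary(summary):
    """Render the counts in the list style used on this page."""
    return [
        f"{feature} (+ve: {counts['+ve']}, -ve: {counts['-ve']})"
        for feature, counts in summary.items()
    ]


# Made-up sample input for illustration only.
sample = [
    ("Picture Quality", "+ve"),
    ("Picture Quality", "+ve"),
    ("Picture Quality", "-ve"),
    ("Ease of use", "+ve"),
]
for line in format_summary(summarize(sample)):
    print(line)
```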
 +
 +==== Opinion Spam Detection ====
 +Detect whether opinions are written by spammers.
 +
 +=== Why is there opinion spam? ===
 +  - Someone may write something to promote their own image or products
 +  - Someone may write something to hurt their enemies
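
One simple signal that is sometimes used as a starting point, and is assumed here purely for illustration, is near-duplicate text: promotional or hostile reviews are often copied with small changes. The sketch below flags review pairs whose word-bigram sets have a high Jaccard similarity. The threshold and all function names are hypothetical, and duplicate detection alone does not establish that a review is spam.

```python
import re
from itertools import combinations


def shingles(text, n=2):
    """Return the set of word n-grams (default: bigrams) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(reviews, threshold=0.8):
    """Return index pairs (i, j) of reviews whose similarity meets the threshold."""
    sets = [shingles(review) for review in reviews]
    return [
        (i, j)
        for i, j in combinations(range(len(reviews)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


reviews = [
    "Best camera ever, buy it now!",
    "Best camera ever buy it now",
    "The battery life is too short.",
]
print(near_duplicates(reviews))  # -> [(0, 1)]
```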
 +
 +
 +==== Others ====
 +
 +=== Linguistic Tools for Opinion Mining ===
 +
 +== [Domain-Specific] Sentiment lexicon ==
 +A lexicon that contains the sentiment orientation of each term. It may be a domain-specific one or a general one.
 +
 +  * Is there a way to generate it automatically from a large corpus?
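
One possible direction for the question above, sketched under strong assumptions, is to start from a few seed words of known polarity and score other words by how often they share a sentence with positive versus negative seeds. The seed sets, the scoring rule and the function name `induce_lexicon` are all assumptions for illustration; this is not a claim about what works on a real corpus.

```python
import re
from collections import Counter

# Hypothetical seed words with known polarity.
POSITIVE_SEEDS = {"good", "great"}
NEGATIVE_SEEDS = {"bad", "poor"}


def induce_lexicon(sentences):
    """Score each non-seed word by the number of sentences it shares with a
    positive seed minus the number it shares with a negative seed."""
    scores = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        # +1 if only positive seeds occur, -1 if only negative, 0 otherwise.
        delta = bool(words & POSITIVE_SEEDS) - bool(words & NEGATIVE_SEEDS)
        if delta == 0:
            continue
        for word in words - POSITIVE_SEEDS - NEGATIVE_SEEDS:
            scores[word] += delta
    return dict(scores)


corpus = [
    "The lens is sharp and the colours are great.",
    "Sharp pictures, good grip.",
    "Blurry photos and poor battery.",
]
lexicon = induce_lexicon(corpus)
print(lexicon["sharp"], lexicon["blurry"])  # -> 2 -1
```

Function words such as "the" also receive scores in this sketch, so a practical version would need at least stop-word filtering and far more data.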
 +
 +== Ontology ==
 +An ontology is a structured description of concepts. It defines the terminology and hierarchical relationships of a domain.
 +
 +  * How can ontologies be incorporated in opinion mining? e.g.:
 +    * Opinion Summarization
 +    * Processing Comparative Statements
 +
 +  * Is there a way to generate them automatically?
 +
 +  * Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like?
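
As one illustration of how a hierarchy could support opinion summarization, the sketch below stores a tiny, made-up camera ontology as child-to-parent links and rolls opinion counts up to every ancestor concept. The concepts in `PARENT` and the function names are hypothetical; a real ontology would carry more relation types than a single parent link.

```python
# Hypothetical camera ontology: child concept -> parent concept.
PARENT = {
    "lens": "camera",
    "battery": "camera",
    "battery life": "battery",
    "zoom": "lens",
}


def ancestors(concept):
    """Yield the concept itself, then each ancestor up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def roll_up(counts):
    """Propagate per-concept opinion counts to all ancestor concepts."""
    totals = {}
    for concept, n in counts.items():
        for node in ancestors(concept):
            totals[node] = totals.get(node, 0) + n
    return totals


print(roll_up({"battery life": 3, "zoom": 2, "camera": 1}))
```

With such a roll-up, an opinion about "battery life" also counts toward "battery" and "camera", so a summary can be shown at whichever level of detail the reader asks for.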
 +
 +=== Scalability ===
 +  * Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion?
 +
 +===== Related Software Packages for Opinion Mining =====
 +  * WordNet, SentiWordNet
 +  * Thesaurus
 +  * Python
 +    * NLTK (Natural Language Toolkit)
 +    * NumPy, SciPy
 +    * Matplotlib
 +  * Text Processing Tools
 +    * Sentence Splitters
 +    * POS (Part-of-speech) Taggers
 +    * Stemmers
 +  * Crawler
 +
 +===== Opinion Mining Related Resources =====
 +
 +==== Research Papers ====
 +
 +  * Sentiment Classification bibliography
 +    http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html
 +
 +  * ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics
 +    http://acl.ldc.upenn.edu/
 +
 +==== Datasets ====
 +
 +  * Movie Review Data
 +    http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/
 +
 +  * Customer Review Data
 +    http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
 +
 +  * MPQA Opinion Corpus
 +    http://www.cs.pitt.edu/mpqa/databaserelease/
 +
 +==== Tools ====
 +
 +  * SentiWordNet
 +    http://sentiwordnet.isti.cnr.it/
 +
 +  * NLTK - Natural Language Toolkit for Python
 +    http://nltk.sourceforge.net/
 +
 +  * WordNet
 +    http://wordnet.princeton.edu/
 +
 +==== Web Resources ====
 +
 +  * The Sentiment & Affect Yahoo! Group
 +    http://groups.yahoo.com/group/SentimentAI
 +
 +  * GI - General Inquirer
 +    http://www.webuse.umd.edu:9090/ http://www.webuse.umd.edu:9090/tags/
 +
 +  * LDC Catalog
 +    http://www.ldc.upenn.edu/Catalog/
 +
 +  * Opinmind
 +    http://opinmind.com/
 +
 +  * Data Mining Resources
 +    http://www.kdnuggets.com/index.html
 +
 +
 +==== Related Conferences ====
 +
 +  * SIGIR - ACM SIGIR Special Interest Group on Information Retrieval
 +    http://www.sigir.org
 +
 +  * CIKM - Conference on Information and Knowledge Management
 +    http://www.cikm.org
 +
 +  * IDEAL - International Conference on Intelligent Data Engineering and Automated Learning
 +    http://www.ideal2008.org/
 +
 +  * SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
 +    http://www.sigkdd.org
 +
 +  * AAAI - Association for the Advancement of Artificial Intelligence
 +    http://www.aaai.org
 +
 +  * WWW - International World Wide Web Conferences
 +    http://www.iw3c2.org/
 +
 +  * TREC - Text REtrieval Conference
 +    http://trec.nist.gov/
 +
 +  * ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing
 +    http://www.acl-ijcnlp-2009.org/
 +
 +  * WSDM - ACM International Conference on Web Search and Data Mining
 +    http://wsdm2009.org/
 +
 +  * SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing
 +    http://www.cs.jhu.edu/~yarowsky/sigdat.html
 +
 +  * WI - ACM International Conference on Web Intelligence
 +    http://wi-consortium.org/
 +
 +  * SIGWEB
 +    http://www.sigweb.org/about/
 +
 +
 
wisc_lab/opinion_mining.txt · Last modified: 2008/09/04 11:52 (external edit)