@@ Line 1: / Line 1: @@
+====== Opinion Mining and Sentiment Analysis ======
+===== Introduction =====
+==== What is opinion mining? ====
+Informally: Extract the opinions given in a piece of text.
+Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques.
+==== What's the big deal with opinion mining? ====
+=== Motivating Scenario ===
+  * People who want to buy a camera
+    * Look for comments and reviews
+  * People who just bought a camera
+    * Comment on it
+    * Write about their usage experience
+  * Camera Manufacturer
+    * Get feedback from customers
+    * Improve their products
+    * Adjust Marketing Strategies
+Big business, right?
+Web 2.0 gives people a great medium for sharing whatever they want to share. This makes it a rich source of unstructured information (especially opinions) that may be useful (and may make a lot of money?).
+===== People =====
+  * [[people:kam_tong_chan|Thomas]]
+  * [[people:wei_wei|Wei Wei]]
+===== Research Issues =====
+==== Opinion Extraction ====
+Identify the segments of text that contain opinions.
+e.g. In the review below, the opinion is in **boldface**:
+I have just entered into dslr world with 400d, before I used slr cameras.
+**400d is extremly well made, precise and overall feeling is vey good.**
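One very simple way to approximate this step is to split the text into sentences and keep those that contain at least one opinion word. The sketch below is only an illustration under stated assumptions: the word list `OPINION_WORDS` is hand-made and hypothetical, and the regex sentence splitter is deliberately naive.

```python
import re

# Hypothetical, hand-made list of opinion-bearing words (illustration only).
OPINION_WORDS = {"good", "great", "bad", "poor", "precise", "short", "excellent"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_opinions(text):
    """Return the sentences that contain at least one opinion word."""
    opinions = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & OPINION_WORDS:
            opinions.append(sentence)
    return opinions

review = ("I have just entered into dslr world with 400d, before I used slr cameras. "
          "400d is extremly well made, precise and overall feeling is vey good.")
print(extract_opinions(review))
```

On the example review, only the second sentence is kept, because it contains "precise" and "good". A real system would probably need more than keyword matching, since opinions can be expressed without any word from a fixed list.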
+==== Sentiment Classification / Subjectivity Analysis ====
+Decide the sentiment orientation of a given piece of opinion.
+=== What is Sentiment Orientation? ===
+  * Polarity
+    * Positive (e.g. This camera is great!)
+    * Negative (e.g. The battery life is too short.)
+    * Neutral
+  * Polarity Scale?
+    * (Most Negative) -10 ... -5 ... 0 (Neutral) ... 5 ... 10 (Most Positive)
+e.g. //The picture quality is good.// (A positive opinion)
+e.g. //The battery life is short.// (A negative opinion)
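As a rough illustration of polarity and the polarity scale, the sketch below scores a sentence by summing per-word scores from a small lexicon and clamping the sum to the -10 ... 10 range. The lexicon entries and their scores are invented for this example, and summing word scores is just one simple baseline, not a recommended method.

```python
import re

# Hypothetical lexicon: word -> score on a -10..10 scale (values are made up).
LEXICON = {"great": 8, "good": 5, "short": -4, "bad": -6, "terrible": -9}

def polarity_score(sentence):
    """Sum the lexicon scores of the words and clamp the result to [-10, 10]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    total = sum(LEXICON.get(word, 0) for word in words)
    return max(-10, min(10, total))

def polarity_label(sentence):
    """Map the numeric score to positive / negative / neutral."""
    score = polarity_score(sentence)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for text in ["The picture quality is good.", "The battery life is short."]:
    print(text, polarity_score(text), polarity_label(text))
```

Note that treating "short" as negative only makes sense for phrases such as battery life; in another context (a short delay, say) the same word could be positive, which is one reason a domain-specific lexicon may be needed.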
+==== Feature-Opinion Association ====
+A problem proposed by [[People:Kam Tong CHAN]]. The problem is related to natural language processing:
+//Given a text with target features and opinions extracted, decide which opinions comment on which features.//
+It is known to be a difficult problem in natural language processing. Let's take a look at the following example (taken from http://en.wikipedia.org/wiki/Natural_language_processing).
+Consider the phrase "pretty little girls' school":
+    * Does the school look little?
+    * Do the girls look little?
+    * Do the girls look pretty?
+    * Does the school look pretty?
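To make the task concrete, here is a minimal sketch of a naive baseline: attach each opinion word to the nearest feature word, measured in token distance. The function name `associate` and the single-word feature and opinion sets are assumptions made for this example; this is not the approach proposed on this page.

```python
def associate(tokens, features, opinions):
    """Pair each opinion word with the closest feature word by token distance."""
    feature_positions = [(i, t) for i, t in enumerate(tokens) if t in features]
    pairs = []
    for i, token in enumerate(tokens):
        if token in opinions and feature_positions:
            # On a tie, min() keeps the feature that appears first in the text.
            _, nearest = min(feature_positions, key=lambda p: abs(p[0] - i))
            pairs.append((token, nearest))
    return pairs

tokens = "the picture quality is good but the battery life is short".split()
print(associate(tokens, features={"quality", "life"}, opinions={"good", "short"}))

ambiguous = "pretty little girls school".split()
print(associate(ambiguous, features={"girls", "school"}, opinions={"pretty", "little"}))
```

For the ambiguous phrase, the heuristic attaches both "pretty" and "little" to "girls". That is only one of the possible readings listed above, which shows why distance alone cannot settle the question.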
+===== Advanced Issues =====
+==== Target Identification ====
+Which object (or who) is being commented on?
+e.g. He is a kind person.
+Who is "he"?
+e.g. The camera is great!
+Which camera model are you talking about?
+==== Source Identification ====
+Given a review text, identify who made the comment.
+Achieving this will allow us to build a Question-Answering System.
+e.g. Who supports Obama as the next U.S. president?
+==== Opinion Summarization and Visualization ====
+Given a set of documents (crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed with respect to the target object.
+e.g. For Camera
+  * Picture Quality (+ve: 290, -ve: 73)
+  * Ease of use (+ve: 57, -ve: 10)
+  * etc.
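Assuming opinions have already been extracted, classified, and associated with features, a summary like the one above is a simple count per feature. The sketch below uses a handful of invented `(feature, polarity)` pairs as input; the data and the function names are illustrative only.

```python
from collections import Counter, defaultdict

def summarize(opinions):
    """Count positive and negative opinions for each feature."""
    summary = defaultdict(Counter)
    for feature, polarity in opinions:
        summary[feature][polarity] += 1
    return summary

def format_summary(summary):
    """Render one line per feature in the '+ve / -ve' style used above."""
    lines = []
    for feature in sorted(summary):
        counts = summary[feature]
        lines.append(f"{feature} (+ve: {counts['+ve']}, -ve: {counts['-ve']})")
    return lines

# Invented input: (feature, polarity) pairs produced by earlier steps.
opinions = [
    ("Picture Quality", "+ve"),
    ("Picture Quality", "+ve"),
    ("Picture Quality", "-ve"),
    ("Ease of use", "+ve"),
]
for line in format_summary(summarize(opinions)):
    print(line)
```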
+==== Opinion Spam Detection ====
+Detect whether opinions are written by spammers.
+=== Why is there opinion spam? ===
+  - Someone may write something to promote their own image or products
+  - Someone may write something to hurt their enemies
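One signal that could be checked is whether the same review text is posted repeatedly with small changes. The sketch below flags near-duplicate reviews using Jaccard similarity over word bigrams. Treating near-duplicates as suspicious, the bigram size, and the `threshold` value are all assumptions of this example, and a flagged pair is at most a hint, not proof of spam.

```python
import re
from itertools import combinations

def shingles(text, n=2):
    """Return the set of word n-grams (as tuples) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def near_duplicates(reviews, threshold=0.6):
    """Return index pairs of reviews whose similarity reaches the threshold."""
    sets = [shingles(r) for r in reviews]
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]

# Invented example reviews.
reviews = [
    "This camera is great and the battery lasts forever",
    "This camera is great and the battery lasts forever!!!",
    "The lens is sharp but the menu is confusing",
]
print(near_duplicates(reviews))
```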
+==== Others ====
+=== Linguistic Tools for Opinion Mining ===
+== [Domain-Specific] Sentiment lexicon ==
+A lexicon that contains the sentiment orientation of each term. It may be a domain specific one or a general one.
+  * Is there a way to generate it automatically from a large corpus?
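As one possible answer to the question above, the sketch below derives word scores from reviews that are already labeled positive or negative, using a smoothed log-ratio of document counts. The labeled corpus, the `min_count` cut-off, and the scoring formula are assumptions chosen for illustration; this is a simple baseline rather than an established recipe.

```python
import math
import re
from collections import Counter

def build_lexicon(labeled_texts, min_count=2):
    """Score each word by log((pos_docs + 1) / (neg_docs + 1))."""
    pos, neg = Counter(), Counter()
    for text, label in labeled_texts:
        # Count each word once per document (document frequency).
        words = set(re.findall(r"[a-z']+", text.lower()))
        (pos if label == "pos" else neg).update(words)
    lexicon = {}
    for word in set(pos) | set(neg):
        if pos[word] + neg[word] >= min_count:
            lexicon[word] = math.log((pos[word] + 1) / (neg[word] + 1))
    return lexicon

# Invented, tiny labeled corpus.
corpus = [
    ("The picture quality is great", "pos"),
    ("Great lens and great colors", "pos"),
    ("The battery life is short", "neg"),
    ("Short battery life and slow focus", "neg"),
]
lexicon = build_lexicon(corpus)
for word in sorted(lexicon, key=lexicon.get):
    print(word, round(lexicon[word], 2))
```

On this toy corpus "great" gets a positive score and "short" a negative one, but so do "battery" and "life", simply because they only occur in negative reviews. With so little data the scores mostly reflect topic, so a much larger corpus and some filtering would be needed in practice.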
+== Ontology ==
+Ontology is a structural description of concepts. It defines the terminologies and hierarchical relationships of a domain.
+  * How can ontologies be incorporated in opinion mining? e.g.:
+    * Opinion Summarization
+    * Processing Comparative Statements
+  * Is there a way to generate them automatically?
+  * Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like?
+=== Scalability ===
+  * Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion?
+===== Related Software Packages for Opinion Mining =====
+  * WordNet, SentiWordNet
+  * Thesaurus
+  * Python
+    * NLTK (Natural Language Toolkit)
+    * NumPy, SciPy
+    * Matplotlib
+  * Text Processing Tools
+    * Sentence Splitters
+    * POS (Part-of-speech) Taggers
+    * Stemmers
+  * Crawler
+===== Opinion Mining Related Resources =====
+==== Research Papers ====
+  * Sentiment Classification bibliography
+    http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html
+  * ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics
+    http://acl.ldc.upenn.edu/
+==== Datasets ====
+  * Movie Review Data
+    http://www.cs.cornell.edu/people/pabo/movie%2Dreview%2Ddata/
+  * Customer Review Data
+    http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
+  * MPQA Opinion Corpus
+    http://www.cs.pitt.edu/mpqa/databaserelease/
+==== Tools ====
+  * SentiWordNet
+    http://sentiwordnet.isti.cnr.it/
+  * NLTK - Natural Language Toolkit for Python
+    http://nltk.sourceforge.net/
+  * WordNet
+    http://wordnet.princeton.edu/
+==== Web Resources ====
+  * The Sentiment & Affect Yahoo! Group
+    http://groups.yahoo.com/group/SentimentAI
+  * GI - General Inquirer
+    http://www.webuse.umd.edu:9090/ http://www.webuse.umd.edu:9090/tags/
+  * LDC Catalog
+    http://www.ldc.upenn.edu/Catalog/
+  * Opinmind
+    http://opinmind.com/
+  * Data Mining Resources
+    http://www.kdnuggets.com/index.html
+==== Related Conferences ====
+  * SIGIR - ACM SIGIR Special Interest Group on Information Retrieval
+    http://www.sigir.org
+  * CIKM - Conference on Information and Knowledge Management
+    http://www.cikm.org
+  * IDEAL - International Conference on Intelligent Data Engineering and Automated Learning
+    http://www.ideal2008.org/
+  * SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
+    http://www.sigkdd.org
+  * AAAI - Association for the Advancement of Artificial Intelligence
+    http://www.aaai.org
+  * WWW - International World Wide Web Conferences
+    http://www.iw3c2.org/
+  * TREC - Text REtrieval Conference
+    http://trec.nist.gov/
+  * ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing
+    http://www.acl-ijcnlp-2009.org/
+  * WSDM - ACM International Conference on Web Search and Data Mining
+    http://wsdm2009.org/
+  * SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing
+    http://www.cs.jhu.edu/~yarowsky/sigdat.html
+  * WI - ACM International Conference on Web Intelligence
+    http://wi-consortium.org/
+  * SIGWEB
+    http://www.sigweb.org/about/
