Opinion Mining and Sentiment Analysis


What is opinion mining?

Informally: Extract the opinions given in a piece of text.

Or, more formally: A recent discipline that studies the extraction of opinions using Information Retrieval (IR), Artificial Intelligence (AI), and Natural Language Processing (NLP) techniques.

What's the big deal with opinion mining?

Motivating Scenario

  • People who want to buy a camera
    • Look for comments and reviews
  • People who just bought a camera
    • Comment on it
    • Write down the usage experience
  • Camera Manufacturer
    • Get feedback from customers
    • Improve their products
    • Adjust marketing strategies

Big business, right?

Web 2.0 provides a great medium for people to share whatever they want to share. This makes it a rich source of unstructured information (especially opinions) that may be useful (and possibly lucrative).


Research Issues

Opinion Extraction

Identify the segments of text that contain opinions.

e.g. Opinions are in boldface

I have just entered into dslr world with 400d, before I used slr cameras.

400d is extremly well made, precise and overall feeling is vey good.
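A minimal sketch of opinion extraction on the example above. It assumes a tiny hand-made word list (`OPINION_WORDS` is hypothetical, not a real lexicon) and a naive regex sentence splitter. It only flags sentences that contain an opinion word; a real system would need far richer cues.

```python
import re

# Hypothetical opinion-word list, for illustration only.
OPINION_WORDS = {"good", "great", "precise", "bad", "poor", "short"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def extract_opinion_sentences(text):
    """Return the sentences that contain at least one opinion word."""
    result = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & OPINION_WORDS:
            result.append(sentence)
    return result

# The review text is quoted as-is from the example above.
review = (
    "I have just entered into dslr world with 400d, before I used slr cameras. "
    "400d is extremly well made, precise and overall feeling is vey good."
)
opinions = extract_opinion_sentences(review)
```

Here only the second sentence is returned, because it contains "precise" and "good". Working at sentence level is a simplification: the opinions in a sentence usually span only part of it.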

Sentiment Classification / Subjectivity Analysis

Decide the sentiment orientation of a given piece of opinion.

What is Sentiment Orientation?

  • Polarity
    • Positive (e.g. This camera is great!)
    • Negative (e.g. The battery life is too short.)
    • Neutral
  • Polarity Scale?
    • (Most Negative) -10 … -5 … 0 (Neutral) … 5 … 10 (Most Positive)

e.g. The picture quality is good. (A positive opinion)

e.g. The battery life is short. (A negative opinion)
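As a rough illustration of sentiment orientation, here is a lexicon-based sketch that maps a piece of text onto the -10 to 10 scale above. The `LEXICON` entries and their scores are made up for this example; summing word scores is only one simple way to do it and ignores negation, intensifiers and context.

```python
import re

# Hypothetical lexicon: term -> polarity on the -10..10 scale.
# Note that "short" is negative for battery life but may not be
# negative for other features; the scores are domain-specific guesses.
LEXICON = {"great": 8, "good": 5, "short": -4, "bad": -6, "terrible": -9}

def polarity_score(text):
    """Sum the lexicon scores of the words in `text`, clamped to [-10, 10]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(tok, 0) for tok in tokens)
    return max(-10, min(10, total))

def polarity_label(text):
    """Map the score to positive / negative / neutral."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, `polarity_label("The picture quality is good.")` gives "positive" and `polarity_label("The battery life is short.")` gives "negative"; text with no lexicon words comes out as "neutral".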

Feature-Opinion Association

A problem proposed by Kam Tong CHAN. The problem is related to natural language processing:

Given a text with target features and opinions extracted, decide which opinions comment on which features.

It is known to be a difficult problem in natural language processing. Let's take a look at the following example (taken from http://en.wikipedia.org/wiki/Natural_language_processing):

Consider the phrase “pretty little girls' school”,

  • Does the school look little?
  • Do the girls look little?
  • Do the girls look pretty?
  • Does the school look pretty?
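To see why this is hard, consider a naive baseline that attaches each opinion word to the nearest feature word by token distance. This is only an illustrative heuristic, not a proposed solution. The function name `associate` is hypothetical, the feature and opinion sets are assumed to be given (as in the problem statement), and the phrase is tokenized with the apostrophe dropped for simplicity.

```python
def associate(tokens, features, opinions):
    """Pair each opinion word with the nearest feature word by token distance.

    Ties go to the feature that appears first in the token list.
    """
    feature_pos = [(i, t) for i, t in enumerate(tokens) if t in features]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in opinions and feature_pos:
            _, nearest = min(feature_pos, key=lambda p: abs(p[0] - i))
            pairs.append((tok, nearest))
    return pairs

tokens = "pretty little girls school".split()
pairs = associate(
    tokens,
    features={"girls", "school"},
    opinions={"pretty", "little"},
)
# pairs == [("pretty", "girls"), ("little", "girls")]
```

The heuristic silently commits to one reading (both words describe the girls) and can never produce the readings in which the school is pretty or little, which is exactly the ambiguity listed above.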

Advanced Issues

Target Identification

Which object (or who) is being commented on?

e.g. He is a kind person.

Who is “he”?

e.g. The camera is great!

Which camera model are you talking about?

Source Identification

Given a review text, identify who made the comment.

Achieving this will allow us to build a Question-Answering System.

e.g. Who supports Obama as the next U.S. president?

Opinion Summarization and Visualization

Given a set of documents (crawled from the web, all the reviews from a particular forum, survey results, etc.), summarize the opinions expressed with respect to the target object.

e.g. For Camera

  • Picture Quality (+ve: 290, -ve: 73)
  • Ease of use (+ve: 57, -ve: 10)
  • etc.
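A summary like the one above boils down to counting polarities per feature. The sketch below assumes the (feature, polarity) pairs have already been produced by the extraction and classification steps; the function names and the sample input are made up for illustration.

```python
from collections import defaultdict

def summarize(judgements):
    """Count positive and negative opinions per feature.

    `judgements` is an iterable of (feature, polarity) pairs,
    where polarity is "+ve" or "-ve".
    """
    summary = defaultdict(lambda: {"+ve": 0, "-ve": 0})
    for feature, polarity in judgements:
        summary[feature][polarity] += 1
    return dict(summary)

def format_summary(summary):
    """Render one line per feature in the style used on this page."""
    return [
        f"{feat} (+ve: {c['+ve']}, -ve: {c['-ve']})"
        for feat, c in summary.items()
    ]

# Made-up input, not real review data.
sample = [
    ("Picture Quality", "+ve"),
    ("Picture Quality", "-ve"),
    ("Picture Quality", "+ve"),
    ("Ease of use", "+ve"),
]
lines = format_summary(summarize(sample))
```

For the sample input, `lines` is `["Picture Quality (+ve: 2, -ve: 1)", "Ease of use (+ve: 1, -ve: 0)"]`. The same counts could feed a bar chart for the visualization part.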

Opinion Spam Detection

Detect whether opinions are written by spammers.

Why is there opinion spam?

  1. Someone may write something to promote their own image / products
  2. Someone may write something to hurt their competitors


Linguistic Tools for Opinion Mining

[Domain-Specific] Sentiment lexicon

A lexicon that contains the sentiment orientation of each term. It may be a domain specific one or a general one.

  • Is there a way to generate it automatically from a large corpus?

An ontology is a structured description of concepts. It defines the terminology and hierarchical relationships of a domain.

  • How can ontologies be incorporated into opinion mining? e.g.:
    • Opinion Summarization
    • Processing Comparative Statements
  • Is there a way to generate them automatically?
  • Which ontology elements are essential for opinion mining? In other words, what should an ontology for opinion mining look like?


  • Can an opinion summarization system work as efficiently as a search engine, so that all the opinions on the web are crawled and users are able to search for any opinion?

Related Software Packages for Opinion Mining

  • WordNet, SentiWordNet
  • Thesaurus
  • Python
    • NLTK (Natural Language Toolkit)
    • NumPy, SciPy
    • Matplotlib
  • Text Processing Tools
    • Sentence Splitters
    • POS (Part-of-speech) Taggers
    • Stemmers
  • Crawler

Opinion Mining Related Resources

Research Papers

  • Sentiment Classification bibliography
  • ACL Anthology - A Digital Archive of Research Papers in Computational Linguistics

  • SentiWordNet
  • NLTK - Natural Language Toolkit for Python
  • WordNet

Web Resources

Related Conferences

  • SIGIR - ACM SIGIR Special Interest Group on Information Retrieval
  • CIKM - Conference on Information and Knowledge Management
  • IDEAL - International Conference on Intelligent Data Engineering and Automated Learning
  • SIGKDD - ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  • AAAI - Association for the Advancement of Artificial Intelligence
  • WWW - International World Wide Web Conferences
  • TREC - Text REtrieval Conference
  • ACL-IJCNLP - A Joint Conference of the Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing
  • WSDM - ACM International Conference on Web Search and Data Mining
  • SIGDAT / EMNLP - Conference on Empirical Methods in Natural Language Processing
  • WI - IEEE/WIC/ACM International Conference on Web Intelligence



wisc_lab/opinion_mining.txt · Last modified: 2008/09/04 11:52 (external edit)