CSCI5510 Big Data Analytics

Breaking News

  • September 3, 2013. The course homepage is migrated to permanently.
  • September 2, 2013. The new semester begins.
  • September 2, 2013. News group address: cuhk.cse.csci5510
  • September 2, 2013. The first tutorial will be conducted on Sept. 10. There is no tutorial in the first week.
  • September 3, 2013. The tutorial class room is YIA LT7.

2013-14 Term 1

        Lecture                     Tutorial
Time    M2-4, 9:30 am - 12:30 pm    T3, 10:30 am - 11:15 am
Venue   KKB101                      YIA LT7

The Golden Rule of CSCI5510: No member of the CSCI5510 community shall take unfair advantage of any other member of the CSCI5510 community.

Course Description

This course teaches state-of-the-art big data analytics, including the techniques, software, applications, and perspectives involved in working with massive data. The class will cover, but not be limited to, the following topics: distributed file systems such as Google File System, Hadoop Distributed File System, and CloudStore, together with map-reduce technology; similarity search techniques for big data such as minhash and locality-sensitive hashing; specialized processing and algorithms for data streams; big data search and query technology; big graph analysis; and recommendation systems for Web applications. Applications may include business uses such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services, as well as scientific and astrophysics uses such as environmental sensor applications and nebula search and query.


Learning Objectives

  1. To understand the current key issues in big data and the associated business/scientific data applications
  2. To teach the fundamental techniques and principles in achieving big data analytics with scalability and streaming capability
  3. To interpret business models and scientific computing results
  4. To apply software tools for big data analytics

Learning Outcomes

At the end of the course of studies, students will have acquired the ability to

  1. Understand the key issues in big data and the associated applications in intelligent business and scientific computing.
  2. Acquire fundamental enabling techniques and scalable algorithms in big data analytics.
  3. Interpret business models and scientific computing paradigms, and apply software tools for big data analytics.
  4. Achieve adequate perspectives of big data analytics in areas such as marketing, financial services, health services, social networking, astrophysics exploration, and environmental sensor applications.

Learning Activities

  1. Lectures
  2. Tutorials
  3. Web resources
  4. Projects
  5. Presentations
  6. Lab Reports
  7. Examinations


                 Lecturer      Lecturer               Tutor         Tutor
Name             Irwin King    Michael R. Lyu         Guang Ling    Chen Cheng
Email            king AT       lyu AT                 gling AT      ccheng AT
Office           Rm 908        Rm 927                 Rm 1024       Rm 1024
Telephone        3943 8398     3943 8429              3943 4252     3943 4252
Office Hour(s)   TBA           10:00-12:00 Tuesday    TBA           TBA

Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.


The pdf files are created in Acrobat 6.0. Please obtain the correct version of the Acrobat Reader from Adobe.

Week  Date   Topics                                    Tutorials    Homework & Events  Resources
1     2/9    Introduction and Motivation               No Tutorial                     Ch. 1 of MMDS
2     9/9    MapReduce                                                                 Ch. 2 of MMDS; Ch. 6 of MMDS
3     16/9   Locality Sensitive Hashing                                                Ch. 3 of MMDS
4     23/9   Mining Data Streams                                                       Ch. 4 of MMDS
5     30/9   Scalable Clustering                                                       Ch. 7 of MMDS
6     7/10   Dimensionality Reduction                                                  Ch. 11 of MMDS
7     14/10  Public Holiday
8     21/10  Recommender systems/Matrix Factorization                                  Ch. 9 of MMDS
9     28/10  Massive Link Analysis                                                     Ch. 5 of MMDS
10    4/11   Mid-term
11    11/11  Analysis of Massive Graph                                                 Ch. 10 of MMDS
12    18/11  Large Scale SVM                                                           SVM tutorial
13    25/11  Online Learning                                                           Online learning survey

Class Project

Class Project Presentation Schedule

  • TBA

Class Project Presentation Requirements

Examination Matters

Examination Schedule

                      Time                         Venue  Notes
Midterm Examination   Nov. 4, 9:30am-12:00 noon    TBA    TBA
Final Examination     TBA                          TBA    TBA

Written Midterm Matters

  1. The midterm will test your knowledge of the materials.
  2. Answer all questions using the answer booklet. There will be more available at the venue if needed.
  3. Write legibly. Anything we cannot decipher will be considered incorrect.
  4. You may bring one A4-sized cheat-sheet page.

Grade Assessment Scheme

  1. Assignments (20%)
    1. Written assignments
    2. Coding
  2. Mid-term Examination (30%)
  3. Project (50%)
    1. Proposal
    2. Presentations
    3. Report

Reference Books


  1. Q: What is the departmental guideline on plagiarism?
    A: If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying the whole or parts of written assignments, programming exercises, reports, quiz papers, and mid-term examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work has been copied unwittingly. Furthermore, including others' works or results without citation in assignments and reports is also regarded as plagiarism, with a similar penalty for the offender. A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and appropriate disciplinary authorities for further action, in addition to failing the course.


Big Data Analytics

Graph Mining

Link Analysis

Learning to Rank

Recommender Systems

Human Computation/Social Games

Opinion Mining/Sentiment Analysis



Last modified: 2013/09/03 16:47 by gling