CSCI5510 Big Data Analytics
[ Discussion Forum | Blogs ]
Breaking News
- September 2, 2013. The new semester begins.
Extra Credit Assignments
2013-14 Term 1
 | Lecture I | Lecture II | Tutorial I | Tutorial II |
---|---|---|---|---|
Time | M2-4, 9:30 am - 12:30 pm | TBA | | |
Venue | TBA | TBA | | |
The Golden Rule of CSCI5510: No member of the CSCI5510 community shall take unfair advantage of any other member of the CSCI5510 community.
Course Description
This course teaches state-of-the-art big data analytics, including the techniques, software, applications, and perspectives associated with massive data. The class will cover, but not be limited to, the following topics: distributed file systems such as Google File System, Hadoop Distributed File System, and CloudStore, together with map-reduce technology; similarity search techniques for big data such as minhash and locality-sensitive hashing; specialized processing and algorithms for data streams; big data search and query technology; big graph analysis; and recommendation systems for Web applications. The applications may involve business applications such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services, as well as scientific and astrophysics applications such as environmental sensor applications and nebula search and query.
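This course aims to teach students state-of-the-art analysis of big data, including techniques, software, applications, and perspectives. The course content will include, but is not limited to, the following: distributed file systems such as Google File System, the Hadoop file system, and CloudStore, and Map-reduce technology; similarity search techniques for big data, such as minhash and locality-sensitive hashing; specialized processing methods and algorithms for data streams; big data search and query technology; and advertising management and recommender systems in Internet applications. The applications covered in this course may include business applications such as online marketing, computational advertising, location-based services, social networks, recommender systems, and healthcare services, as well as applications in science and astrophysics, such as environmental sensor applications and nebula search and query.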
Learning Objectives
- To understand the current key issues on big data and the associated business/scientific data applications
- To teach the fundamental techniques and principles in achieving big data analytics with scalability and streaming capability
- To interpret business models and scientific computing results
- To apply software tools for big data analytics
Learning Outcomes
At the end of the course of studies, students will have acquired the ability to
- Understand the key issues on big data and the associated applications in intelligent business and scientific computing.
- Acquire fundamental enabling techniques and scalable algorithms in big data analytics.
- Interpret business models and scientific computing paradigms, and apply software tools for big data analytics.
- Achieve adequate perspectives of big data analytics in marketing, financial services, health services, social networking, astrophysics exploration, and environmental sensor applications, etc.
Learning Activities
- Lectures
- Tutorials
- Web resources
- Videos
- Quizzes
- Examinations
Personnel
 | Lecturer | Tutor | Tutor |
---|---|---|---|
Name | Irwin King/Michael R. Lyu | | |
Email | king/lyu AT cse.cuhk.edu.hk | | |
Office | Rm 908 | | |
Telephone | 3943 8398/8429 | | |
Office Hour(s) | TBA | | |
Note: This class will be taught in English. Homework assignments and examinations will be conducted in English.
Syllabus
The PDF files were created in Acrobat 6.0. Please obtain a compatible version of Acrobat Reader from Adobe.
Week | Date | Topics | Tutorials | Homework & Events | Resources |
---|---|---|---|---|---|
1 | 2/9 | Introduction and Motivation (01-Introduction.pdf) | | | Ch. 1 of MMDS |
2 | 9/9 | MapReduce (02-MapReduce.pdf) | | | Ch. 2 of MMDS, Ch. 6 of MMDS |
3 | 16/9 | Locality Sensitive Hashing (03-lsh.pdf) | | | Ch. 3 of MMDS |
4 | 23/9 | Mining Data Streams (04-stream.pdf) | | | Ch. 4 of MMDS |
5 | 30/9 | Scalable Clustering (05-clustering.pdf) | | | Ch. 7 of MMDS |
6 | 7/10 | Dimensionality Reduction (06-DR.pdf) | | | Ch. 11 of MMDS |
7 | 14/10 | Public Holiday | Public Holiday | | |
8 | 21/10 | Recommender systems/Matrix Factorization (07-mf.pdf) | | | Ch. 9 of MMDS |
9 | 28/10 | Massive Link Analysis (08-link.pdf) | | | Ch. 5 of MMDS |
10 | 4/11 | Mid-term | | | |
11 | 11/11 | Analysis of Massive Graph (09-graph.pdf) | | | Ch. 10 of MMDS |
12 | 18/11 | Large Scale SVM (10-svm.pdf) | | | SVM tutorial |
13 | 25/11 | Online Learning (11-ol.pdf) | | | Online learning survey |
Class Project
Class Project Presentation Schedule
- TBA
Class Project Presentation Requirements
Examination Matters
Examination Schedule
 | Time | Venue | Notes |
---|---|---|---|
Midterm Examination (Written) | TBA | TBA | TBA |
Midterm Examination (Programming) | TBA | TBA | TBA |
Final Examination | TBA | TBA | TBA |
Written Midterm Matters
- The midterm will test your knowledge of the materials.
- Answer all questions using the answer booklet. There will be more available at the venue if needed.
- Write legibly. Anything we cannot decipher will be considered incorrect.
- One A4-sized cheat-sheet page is allowed.
Grade Assessment Scheme
Homework Assignments | Project Report | Project Presentation | One-hour Examination |
---|---|---|---|
30% | 25% | 25% | 20% |
- Assignments (30%)
- Written assignments
- Coding
- One-hour Examination (20%)
- Project (50%)
- Report (25%)
- Presentations (25%)
- Extra Credit (There is no penalty for not doing the extra credit problems. Extra credit will only help you in borderline cases.)
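As a rough illustration of how the weights above combine, the following minimal sketch computes a weighted total from component scores. It assumes each component is marked on a 0-100 scale, which the scheme does not state; the names `WEIGHTS` and `weighted_total` and the sample scores are hypothetical. Extra credit is left out because it only applies to borderline cases.

```python
# Weights taken from the Grade Assessment Scheme table above.
WEIGHTS = {
    "assignments": 0.30,
    "project_report": 0.25,
    "project_presentation": 0.25,
    "examination": 0.20,
}


def weighted_total(scores):
    """Combine component scores into one weighted total.

    Assumes every score is on a 0-100 scale (an assumption, not part of
    the official scheme).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "assignments": 80,
    "project_report": 70,
    "project_presentation": 90,
    "examination": 60,
}
print(weighted_total(example_scores))
```

The project components together carry half of the total, so under this sketch a weak examination result can be offset by a strong report and presentation.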
Required Background
- Pre-requisites
Reference Books
FAQ
- Q: What is the departmental guideline on plagiarism?
A: If a student is found plagiarizing, his/her case will be reported to the Department Discipline Committee. If the case is proven after deliberation, the student will automatically fail the course in which he/she committed plagiarism. The definition of plagiarism includes copying of the whole or parts of written assignments, programming exercises, reports, quiz papers, mid-term examinations. The penalty will apply to both the one who copies the work and the one whose work is being copied, unless the latter can prove his/her work has been copied unwittingly. Furthermore, inclusion of others' works or results without citation in assignments and reports is also regarded as plagiarism with similar penalty to the offender. A student caught plagiarizing during tests or examinations will be reported to the Faculty Office and appropriate disciplinary authorities for further action, in addition to failing the course.
Resources
Big Data Analytics
Graph Mining
Link Analysis
Learning to Rank
Recommender Systems
Human Computation/Social Games
Opinion Mining/Sentiment Analysis
Visualization
Programming
Midterm Evaluation Sign-up Sheet
- The time slots are for Thursday, November 8, 2012.
- The venue is in HSH Room 1022 (seminar room).
- Please enter all team members' names in a slot in either the A.M. or the P.M. table.
- Instructions
- Put the names of all your team members under the “Real name” column.
- Select at least one slot from either the A.M. or P.M. table.
- Press “Submit”.
- Make sure your slot does not conflict with those of other teams.
Midterm Project Evaluation AM
Real name | 9:30 | 9:45 | 10:00 | 10:15 | 10:30 | 10:45 | 11:00 | 11:15 | 11:30 | 11:45 |
---|---|---|---|---|---|---|---|---|---|---|
Midterm Project Evaluation PM
Real name | 12:30 | 12:45 | 1:00 | 1:15 | 1:30 | 1:45 | 2:00 | 2:15 | 2:30 | 2:45 | 3:00 |
---|---|---|---|---|---|---|---|---|---|---|---|
Final Project Presentation Sign-up Sheet
- The time slots are for Tuesday, December 4, 2012.
- The venue is in KKB 101 (classroom).
- Please enter all team members' names in a slot in either the Session 1 or the Session 2 table.
- Instructions
- Put the names of all your team members under the “Real name” column.
- Select at least one slot from either the Session 1 or Session 2 table.
- Press “Submit”.
- Make sure your slot does not conflict with those of other teams.
Final Project Presentation Session 1
Real name | 9:00 | 9:15 | 9:30 | 9:45 | 10:00 | 10:15 | 10:30 | 10:45 | 11:00 |
---|---|---|---|---|---|---|---|---|---|
Final Project Presentation Session 2
Real name | 11:15 | 11:30 | 11:45 | 12:00 | 12:15 | 12:30 | 12:45 | 13:00 | 13:15 | 13:30 |
---|---|---|---|---|---|---|---|---|---|---|