CMSC5724 Data Mining and Knowledge Discovery

Fall 2023

Professor: Yufei Tao
TA: Ru Wang (rwang21@cse)

Brief Description

This course covers the conceptual and algorithmic aspects of fundamental problems in data mining and knowledge discovery, including (time permitting) classification, clustering, and association rule analysis. On completion, students are expected to be able to perform an array of mining tasks that are essential to numerous practical applications.

Announcements

News 20 (20 Dec): The final exam scores have been released on the Blackboard system. We will arrange a paper-checking session from 10am to 11:59am and then from 1pm to 2:30pm on Dec 22 (Fri). Please follow the instructions below:
  • Please look for Ms. Ru Wang, whose office is SHB 1013.
  • You may not take your exam paper away, nor will you be allowed to take pictures of it.
  • You can ask a friend to check your paper on your behalf. We will not send you pictures of your paper.
  • No other paper-review sessions will be organized.

News 19 (29 Nov): In the final exam, you will be allowed to bring in a single-sided, A4-sized note sheet on which you can print/write anything you deem useful. The exam covers everything in Lectures 1-12.

News 18 (21 Nov): Project 6 has been released. The deadline is 11:59pm, 20 Dec, 2023.

News 17 (21 Nov): The scope of Quiz 3 includes Lectures 8-11.

News 16 (18 Nov): Project 5 has been released. The deadline is 11:59pm, 16 Dec, 2023.

News 15 (12 Nov): Projects 3 and 4 have been released. The deadline is 11:59pm, 10 Dec, 2023.

News 14 (8 Nov): The final exam venue is ERB LT (same as our lecture venue).

News 13 (7 Nov): The final exam is scheduled for 6:30pm - 8:30pm, 12 Dec. The venue will be announced later.

News 12 (5 Nov): Quiz 2 solutions and statistics have been released (see the bottom of the page). You can now find your scores in Blackboard and make appointments with the TA to collect your papers.

News 11 (29 Oct): The scope of Quiz 2 includes Lectures 4-7. For Lecture 7, only Slides 1-14 are covered.

News 10 (14 Oct): Due to the instructor's sick leave on Oct 10, please note the new test schedule for this course:
Quiz 2 (20 minutes): To be held in the lecture of Oct 31 (Tue, Week 9)
Quiz 3 (20 minutes): To be held in the lecture of Nov 28 (Tue, Week 13)

News 9 (14 Oct): A make-up lecture will be given at 6:30pm on Nov 28. The venue is William M W Mong Eng Bldg LT (i.e., the same classroom).

News 8 (3 Oct): Project 2 has been released. The deadline is 11:59pm, 31 Oct, 2023.

News 7 (28 Sep): Project 1 has been released. The deadline is 11:59pm, 26 Oct, 2023. Please see the "Project" section of the page.

News 6 (28 Sep): Quiz 1 solutions and statistics have been released (see the bottom of the page). You can now find your scores in Blackboard and make appointments with the TA (see her information at the top of the website) to collect your papers.

News 5 (21 Sep): The scope of Quiz 1 includes Lectures 1-3.

News 4 (8 Sep): Exercise 1 and its solutions have been made available. You can locate them at the bottom of this page. No more announcements will be made regarding the release of exercises. Please check this page regularly for updates.

News 3 (6 Sep): Please note the test schedule for this course:
Quiz 1 (20 minutes): To be held in the lecture of Sep 26 (Tue, Week 4)
Quiz 2 (20 minutes): To be held in the lecture of Oct 31 (Tue, Week 9); originally scheduled for Oct 24 (Week 8)
Quiz 3 (20 minutes): To be held in the lecture of Nov 28 (Tue, Week 13); originally scheduled for Nov 21 (Week 12)

News 2 (6 Sep): The videos for the first lecture have been made available. Be aware that the university has resumed in-person classes, so the release of videos is not mandatory. Please treat the videos as an additional resource rather than a guaranteed provision: whether to release a video is at the discretion of the instructor. There will be no further announcements about video releases.

News 1 (1 Sep): Hello all.

Time, Venues, and Zoom Link

Lecture: 6:30pm - 9:30pm Tue, William M W Mong Eng Bldg LT
Zoom Link: https://cuhk.zoom.us/j/99419802521

Click here for a map of the campus.

Grading Scheme

Project: 30%
Short Tests or Assignments: 30%
Final: 40%

Textbook and Lecture Notes

No textbooks cover all the material of this course. Some reference books may be useful for extra reading:

[Book 1] Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms.
[Book 2] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science.

You are not required to own the above books. The instructor will make lecture notes available before each class. His notes cover all the content required in this course, some of which is not covered in the above books.

As usual, lecture attendance is vital for a thorough understanding.

Lecture Notes and Extra Reading

Lecture 1: [Classification] Decision Trees and a Generalization Theorem
  (video 1)
  (video 2)
  The last 40 minutes of the lecture could not be captured. You may find the content in this video from the course's offering last year.
  (video 3, start - 02:10:00)
  Extra reading: Chapter 19 of [Book 1]; Sections 5.5-5.6 of [Book 2]

Lecture 2: [Classification] The Bayesian Method
  (video, 02:10:00 - end)
  Extra reading: Sections 18.1-18.2 of [Book 1]

Lecture 3: [Classification] Perceptron
  (video)
  Extra reading: Section 5.8.3 of [Book 2]

Lecture 4: [Classification] Generalization Theorems Using VC-Dims and Margins
  (video)
  Extra reading: --

Lecture 5: [Classification] SVM and Margin Perceptron
  (video)
  Extra reading: Sections 21.1-21.2 of [Book 1]

Lecture 6: [Classification] The Kernel Method
  (video)
  Extra reading: Section 21.4 of [Book 1]

Lecture 7: [Classification] Multiclass Perceptron
  (video 1)
  (video 2, start - 00:45)
  Extra reading: --

Lecture 8: [Clustering] Centroid Methods
  (video 1, 00:45 - end)
  (video 2, start - 01:40)
  Extra reading: Section 13.1 of [Book 1]

Lecture 9: [Clustering] Connectivity Methods
  (video, 01:40 - end)
  Extra reading: Chapter 14 and Section 15.1 of [Book 1]

Lecture 10: [Dimensionality Reduction] PCA
  See here for a nice example of using PCA for image compression.
  (video)
  Extra reading: Section 7.2 of [Book 1]

Lecture 11: [Association Rules] Apriori
  (video)
  Extra reading: Sections 8.1, 8.2.1, and 8.3 of [Book 1]

Lecture 12: [Graph Mining] Page Ranks and Random Walks
  (video)
  Extra reading: --

Exercises and Quizzes

Exercise List 1 (Solutions)
Exercise List 2 (Solutions) Note: Problem 5 is outside the scope of quizzes and exams
Exercise List 3 (Solutions)
Exercise List 4 (Solutions)
Exercise List 5 (Solutions)
Exercise List 6 (Solutions)
Exercise List 7 (Solutions)
Exercise List 8 (Solutions)
Exercise List 9 (Solutions)
Exercise List 10 (Solutions)
Exercise List 11 (Solutions)
Exercise List 12 (Solutions)

Quiz 1 solutions. Average = 71, Std. Dev. = 25.5
Quiz 2 solutions. Average = 85, Std. Dev. = 19.8
Quiz 3 solutions.
Final exam: Max = 115, Average = 82.6, Std. Dev. = 18

Project

The project page is here.

Deadlines:
  • Project 1: 11:59pm, 26 Oct, 2023
  • Project 2: 11:59pm, 31 Oct, 2023
  • Project 3: 11:59pm, 10 Dec, 2023
  • Project 4: 11:59pm, 10 Dec, 2023
  • Project 5: 11:59pm, 16 Dec, 2023
  • Project 6: 11:59pm, 20 Dec, 2023
To submit your project, email all the deliverables (as detailed on the project page) to the TA. Please use the subject "CMSC5724 project submission" for your email. Remember to list the IDs and names of all the members of your team.