CMSC 5724 Project Page


Team Coding

Each team can contain up to 5 members and will implement a designated algorithm in C++, Java, or Python. The implementation must be from scratch, i.e., it can use only functions from a standard library. Use of any function outside such libraries is not permitted unless prior approval has been obtained from the instructor. All source code is subject to plagiarism scrutiny. All kinds of dishonesty will be reported to the university for disciplinary action.

Using a programming language other than the above requires approval from the instructor.

Project List

Each team needs to work on only one project of its choice from the following list. The list is still growing and will eventually contain 5 or 6 projects. Additional projects will be released after their topics have been covered in the lectures.


Project #1: Decision Tree

Goal

Implement Hunt's algorithm for decision tree classification.

Dataset

We will use the Adult dataset, whose description is available here. The training set (adult.data) and evaluation set (adult.test) can be downloaded here.

Preprocessing

Remove all the records containing '?' (i.e., missing values). Also, remove the attribute "native-country".
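The preprocessing step above could be sketched as follows. This is only an illustration under stated assumptions: the file is comma-separated, "native-country" is the 14th column (0-based index 13), and a record is dropped if any field equals '?', including the "native-country" field itself. The function name preprocess is hypothetical.

```python
import csv

# Assumption: 0-based position of "native-country" in adult.data / adult.test.
NATIVE_COUNTRY_INDEX = 13


def preprocess(lines):
    """Drop records with a '?' field, then remove the native-country column."""
    cleaned = []
    for row in csv.reader(lines, skipinitialspace=True):
        fields = [f.strip() for f in row]
        if len(fields) <= NATIVE_COUNTRY_INDEX:
            continue  # skip blank lines and non-record lines
        if "?" in fields:
            continue  # record has at least one missing value
        del fields[NATIVE_COUNTRY_INDEX]
        cleaned.append(fields)
    return cleaned
```

The function accepts any iterable of text lines, so an open file object can be passed directly, e.g. preprocess(open("adult.data")).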

Deliverables


Project #2: Margin Perceptron

Goal

Implement the margin perceptron algorithm.

Dataset

Your implementation should work on any dataset in the following format:

We have prepared three datasets for you:
2d-r16-n10000
4d-r24-n10000
8d-r12-n10000

Deliverables


Project #3: Bayes Classifier, K-Center, K-Means

This project has two parts.

=============
=== PART I ===
=============

Goal

Implement the Bayes Classifier.

Dataset, Preprocessing

Same as Project #1.

Deliverables

=============
=== PART II ===
=============

Goal

Implement the k-means algorithm using the k-center algorithm for center initialization.

Dataset

Download here (obtained from the data collection here). Each line has the following format:

x y

which gives the x- and y-coordinates of a point.
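A minimal sketch of reading this format, assuming the two coordinates are separated by whitespace and that lines without exactly two fields can be skipped. The function name parse_points is hypothetical.

```python
def parse_points(lines):
    """Parse 'x y' lines into a list of (x, y) float tuples."""
    points = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines (assumption)
        points.append((float(parts[0]), float(parts[1])))
    return points
```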

Task

Partition the dataset into 8 clusters.
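One way the task could be approached is sketched below, using only the standard library. It assumes that "k-center" refers to the greedy farthest-first traversal and that k-means is the usual alternation of assignment and centroid updates; the version discussed in the lecture may differ in details such as the choice of the first center (here simply the first point) or the stopping rule. The function names and the max_iter parameter are hypothetical, and math.dist requires Python 3.8 or later.

```python
import math


def k_center_init(points, k):
    """Greedy k-center: repeatedly pick the point farthest from chosen centers."""
    centers = [points[0]]  # assumption: the first center is arbitrary
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        # Keep, for each point, its distance to the nearest chosen center.
        dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
    return centers


def k_means(points, k, max_iter=100):
    """Run k-means from k-center seeds; return (centers, assignment)."""
    centers = k_center_init(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centers will not move
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave a center in place if its cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assign
```

For this task the call would be k_means(points, 8), where points is a list of (x, y) tuples.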

Deliverables


Project #4: DBSCAN

Goal

Implement the DBSCAN algorithm.

Dataset

Download here (obtained from the data collection here). Each line has the following format:

x y

which gives the x- and y-coordinates of a point.

Task

Partition the dataset into 3 clusters.
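The following is a minimal sketch of a common formulation of DBSCAN, not necessarily the exact variant from the lecture. The parameters eps and min_pts are not specified on this page and must be chosen so that 3 clusters result; the names dbscan, eps, min_pts, and NOISE are hypothetical. A point's neighborhood is assumed to include the point itself, and neighbors are found by a brute-force scan.

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbors(i):
        # Brute-force range query; includes point i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels
```

In this sketch a border point reachable from several clusters goes to whichever cluster reaches it first, which is one of several accepted conventions.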

Deliverables


Project #5: PCA for Image Compression

Goal

Reproduce the experimental results on this page.

Dataset

The original image can be downloaded here.

Task

Implement the PCA-based compression method discussed in the lecture.

Deliverables


Project #6: Association Rule Mining

Goal

Implement the Apriori algorithm for association rule mining.

Dataset

Download here (by courtesy of Alexander Dekhtyar). Each line has the following format:

tid, a, b, c, ...

where tid is the transaction id, and a, b, c ... are the items of the transaction (each item is represented by an integer).

Task

Find all the association rules with support at least 0.1n and confidence at least 0.9, where n is the number of transactions.
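The thresholds above could be applied as in the sketch below. It assumes the usual definitions: for a rule X -> Y, the support is the number of transactions containing X ∪ Y, and the confidence is that count divided by the number of transactions containing X. If the lecture defines these differently, the checks need to be adjusted. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def parse_transaction(line):
    """Parse 'tid, a, b, c, ...' into a frozenset of item ids, dropping tid."""
    fields = [f.strip() for f in line.split(",")]
    return frozenset(int(f) for f in fields[1:] if f)


def frequent_itemsets(transactions, min_count):
    """Level-wise search; transactions must be sets. Returns {itemset: count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


def association_rules(transactions, min_support_ratio=0.1, min_confidence=0.9):
    """Return (lhs, rhs, support_count, confidence) for every qualifying rule."""
    transactions = [frozenset(t) for t in transactions]
    # Fraction avoids floating-point error in the 0.1n support threshold.
    min_count = Fraction(str(min_support_ratio)) * len(transactions)
    frequent = frequent_itemsets(transactions, min_count)
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # lhs is frequent because it is a subset of a frequent itemset.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules
```

Rules are generated only from frequent itemsets, so every reported rule already meets the support threshold and only the confidence needs to be checked.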

Deliverables