INFS4205/7205 Advanced Techniques for High Dimensional Data

Offered by Yufei Tao, Semester 1, 2017.

Tutor: Junhao Gan (j.gan@uq.edu.au).

Brief Description

This course will study techniques that are designed for the analysis of multidimensional data, and therefore, are useful in a large variety of disciplines such as geographic information systems, data mining and machine learning, multimedia information retrieval, to mention just a few. Focus will be placed on a series of fundamental problems defined on such data, and data structures and algorithms known to solve those problems very well in reality. Tentative topics include the R-tree, range reporting, nearest neighbor search, skyline analysis, spatial join, closest pair, locality sensitive hashing, and clustering.

Announcements

News 1 (22 Feb): Hello, all.
News 2 (2 Mar): It has come to my attention that the lecture video has been uploaded to INFS4205 only in Blackboard but not to INFS7205. I am working with the helpdesk on the issue. Until a better solution is found, please use this link to watch the video.
News 3 (2 Mar): The tutor (Mr. Junhao Gan) will organize a Q&A session from 2pm to 3pm every Monday, starting from Week 3. His contact and office location can be found here.
News 4 (5 Mar): The blackboard issue has been resolved.
News 5 (13 Mar): Exercise list 1 released. Please scroll to the bottom of the page.
News 6 (14 Mar): As mentioned by the instructor in the first lecture, the first assignment will be a coding project, while the second assignment will be a group presentation. The due day of Assignment 1 will be 11:59pm of 29 May---this is later than and will replace the due day listed on the course profile. For Assignment 2, all the group presentations will take place in the lecture of 29 May (i.e., the last lecture).
News 7 (14 Mar): Assignment 1 will be released on 20 Mar.
News 8 (19 Mar): Exercise list 2 released.
News 10 (19 Mar): Assignment 1 released.
News 11 (25 Mar): Exercise list 3 released. There will be no accouncements for exercise releases. Please check this page regularly.
News 12 (28 Apr): There will be no lecture on May 1.
News 13 (28 Apr): The instructor will be on leave on May 15. The lecture on May 15 will be delievered by the tutor Dr. Junhao Gan.
News 14 (9 May): Assignment 2 released. Scroll to the bottom of the page to see it.
News 15 (29 May): You may now make appointments with the tutor for demonstrating your assignment 1 programs. Please scroll down to the assignment section to see more details.
News 16 (1 Jun): Some information about the final exam can be found at the bottom of the page.

Time and Venues

Lectures
Mon 4pm - 5:50pm Room 212, Building 35

Click here for a map of the St Lucia campus.


Q&A Sessions
Mon 2pm - 3pm Room 642, Building 78
Grading Scheme

Assignment 1: 25%
Assignment 2: 25%
Final Exam: 50%

Lecture Notes

There will be no textbooks for this course. The instructor will make lecture notes available before each lecture. His notes will be self-contained, namely, containing all the details required in this course.

Reference papers will also be provided for extra, but non-examinable, reading.


Lecture Notes
1
Introduction (uploaded at 3am, 27 Feb)

2
The B-tree (uploaded at 3am, 27 Feb)

3
The R-tree (uploaded at 11:55pm, 5 Mar)

4
Cost Model of the R-tree (uploaded at 2pm, 13 Mar)

5
Nearest Neighbor Search (uploaded at 2am, 19 Mar)

6
Skylines (uploaded at 11pm, 25 Mar)

7
Multidimensional Divide and Conquer 1 -- Skylines (uploaded at 11pm, 2 Apr)

8
Multidimensional Divide and Conquer 2 -- Spatial Joins (uploaded at 7pm, 22 Apr)


Old Version (uploaded at 11pm, 9 Apr). Major changes: Lines 3 and 4 in the psuedocode of Slide 28.

9
Grid Decomposition -- Closest Pair and Close Pairs (uploaded at 1pm, 5 Jun)

Old Version2 (uploaded at 6:30pm, 24 Apr). Major changes: Fixed typos on Slides 25 and 28.

Old Version (uploaded at 2am, 24 Apr). Major changes: The equation in the proof on Slide 21.
10
High-Dimensional Indexing by Clustering (uploaded at 3:30am, 8 May)

11
[Non-Examinable] Locality Sensitive Hashing (Prepared by Junhao Gan) (uploaded at 11:50pm, 11 May)

12
[Non-Examinable] High-Dimensional Indexing by Distributed Aggregation (uploaded at 8am, 22 May)


Exercises

The instructor and tutor have prepared the following exercises which are useful for students to consolidate their understanding of the course materials. Solutions to these are provided in full.

Exercises carrying a "*" may be difficult.

Exercises for Week 2 (Solutions).
Exercises for Week 3 (Solutions).
Exercises for Week 4 (Solutions).
Exercises for Week 5 (Solutions).
Exercises for Week 6 (Solutions).
Exercises for Week 7 (Solutions).
Exercises for Week 8 (Solutions).
Exercises for Week 10 (Solutions).
There will be no more exercises (materials in Weeks 11-13 will not be tested in the final exam).

Assignments and Exam

Assignment 1 (uploaded at 10:20pm, 19 Mar).
You may now make appointments with the tutor Dr. Junhao Gan (j.gan@uq.edu.au) to demonstrate your project. The available time slots are here. The slots will be allocated on a first-come-first-serve basis.

Assignment 2 (uploaded at 2:30am, 9 May).

The final exam has been scheduled on 14 June, starting at 5:45pm. Below is some useful information about the exam:

1. You are allowed to bring in one single-sided, A4-sized, sheet of notes. The notes can be hand-written or typed. UQ-approved calculators will also be allowed.

2. The scope of the exam covers Lecture Notes 1-10. In fact, there will be questions on each of them.

3. The 7205 paper will be a bit more difficult than the 4205 paper.

4. Every question is in the same style as the exercises on the course homepage. This means:
  • Every question will have an unambiguous answer (e.g., there won't be essay questions like "describe the functionalities of spatial databases").
  • Each question aims to test your understanding on the technical details of the algorithms and analysis taught in the course.
  • There won't be any multiple-choice questions.
All the questions are new (i.e., do not expect any exercise to appear in the exam verbatim), although some of them may be similar to certain exercises.

5. The instructor considers that the best way to prepare yourselves for the exam is to understand the details of Lecture Notes 1-10 as well as possible. Also, make sure you can solve all the exercises released.

6. Most questions are quite "standard" in the sense that, if you understand the lectures, you should be able to solve them. About 15% of the marks are on questions more challenging -- these questions are to distinguish those students that should get a 7.

7. You can make appointments with the tutor or the instructor for consultation. Of course, we will not tell you what the exam questions are. But we can answer your questions regarding the course contents.