¡¡

CSC6210 Advanced multidimensional search

Offered by Yufei Tao

¡¡
Course description

This course will discuss data structures that can be used to solve several important geometric problems efficiently. We will assume that data are stored in the external memory, so the goal is to minimize the number of I/Os performed (nevertheless, most of the techniques taught can be used in the internal memory as well). The first main problem discussed is range search. Here, given a set S of points, we need to pre-process it into a structure such that, given any rectangle Q at run time, all the points of S that fall in Q can be reported efficiently. We will learn several asymptotically optimal structures that can achieve the best tradeoff between the space consumption and query cost.

The second major problem is nearest neighbor (NN) search. Given a query point q, a NN query returns the point in S nearest to q. Again, our objective is to build a structure on S so that all NN queries can be processed efficiently in the worst case, especially when the dimensionality is high. 

All the structures we need for the above purposes are tailored for specific problems. In practice, it would be nice to have a single, generic, structure that is reasonably good for all types of queries. The course will also cover the R-tree, which is the most popular multidimensional structure in practical systems. We will also study the cost model of the R-tree, which is a useful tool for analyzing its practical performance.

Target audience
Students in the areas of theory, database, data mining, machine learning, and pattern recognition.
¡¡
Time table


Tue lecture    11:30am-12:15pm       MMW 707
Wed lecture   11:30am-1:15pm         ERB 408

No tutorials.
¡¡

Grading scheme


Based on assignments and in-class quiz.

¡¡

Lecture notes

No textbook.
Lecture notes and/or key references will be provided for download before lectures.

Lecture 1      Introduction

Lecture 2      Persistent B-trees

Lecture 3      Point location query (revised on Sep 20; the original version)

Lecture 4      Distribution sweep and logarithmic rebuilding

Lecture 5      The buffer-tree technique (Sections 1-4)

News: Assignment 1 can be downloaded here. Due day: 31 Oct 2009. (Deadline extended to 7 Nov 2009).

Lecture 6      Stabbing query

Lectures 6-9 will be based on Prof. Lars Arge's excellent notes, which can be downloaded here.
Prof. Arge's homepage can be found here.

Lecture 7      3-sided range search

Lecture 8      Range search 1 (linear space)

Lecture 9      Range search 2 (nonlinear space)

The description of assignment 1 has a typo in Problem 2. The corrected version is available here. For comparison, the old version is still available for download. The due day is postponed to 7 Nov 2009.

Lecture 10    Nearest neighbor search 1 (LSH)

Lecture 11    Nearest neighbor search 2 (LSB)

News: Assignment 2 can be downloaded here. Due day: 20 Dec 2009.
Solutions (by Jingtian Deng)

Lecture 12    R-trees

Lecture 13    Best-first algorithm

Lecture 14    Cost models of R-trees

¡¡