CSCI5620 Algorithms for Data Science


Course code CSCI5620
Course title Algorithms for Data Science
Course description This is a graduate-level course that provides the mathematical and algorithmic foundations for data science. The target audience is students interested in research in algorithms, statistics, machine learning, or data mining. Topics include probability and concentration bounds, high-dimensional space and its properties, nearest-neighbour search and projection methods, singular value decomposition and its algorithms, random walks and Markov chains, constrained convex optimization, online stochastic gradient descent, boosting, streaming and sampling algorithms, sketches (e.g., count-min), graph sketching, and random graphs.
Unit(s) 3
Course level Postgraduate
Semester 1 or 2
Grading basis Graded
Grade Descriptors A/A-: EXCELLENT – exceptionally good performance and far exceeding expectation in all or most of the course learning outcomes; demonstration of superior understanding of the subject matter, the ability to analyze problems and apply extensive knowledge, and skillful use of concepts and materials to derive proper solutions.
B+/B/B-: GOOD – good performance in all course learning outcomes and exceeding expectation in some of them; demonstration of good understanding of the subject matter and the ability to use proper concepts and materials to solve most of the problems encountered.
C+/C/C-: FAIR – adequate performance and meeting expectation in all course learning outcomes; demonstration of adequate understanding of the subject matter and the ability to solve simple problems.
D+/D: MARGINAL – performance barely meets the expectation in the essential course learning outcomes; demonstration of partial understanding of the subject matter and the ability to solve simple problems.
F: FAILURE – performance does not meet the expectation in the essential course learning outcomes; demonstration of serious deficiencies and the need to retake the course.
Learning outcomes At the end of the course, students will be familiar with:
1. the concept of concentration and various tail bounds
2. data in high-dimensional space and its properties
3. singular value decomposition (SVD) and its algorithms
4. random walks, Markov chains and their applications
5. online learning and VC-dimension
6. streaming, sketching and sampling techniques
7. graph structures and their applications
Assessment (for reference only)
Exam: 50%
Project: 30%
Homework or assignment: 20%
Recommended Reading List 1. Foundations of Data Science, by Avrim Blum, John Hopcroft, and Ravindran Kannan
2. Various research papers


CSCIN programme learning outcomes Course mapping
Upon completion of their studies, students will be able to:  
1. identify, formulate, and solve computer science problems (K/S);
2. design, implement, test, and evaluate a computer system, component, or algorithm to meet desired needs (K/S);
3. receive the broad education necessary to understand the impact of computer science solutions in a global and societal context (K/V);
4. communicate effectively (S/V);
5. succeed in research or industry related to computer science (K/S/V);
6. have solid knowledge in computer science and engineering, including programming and languages, algorithms, theory, databases, etc. (K/S);
7. integrate well into and contribute to the local society and the global community related to computer science (K/S/V);
8. practise a high standard of professional ethics (V);
9. draw on and integrate knowledge from many related areas (K/S/V).
Remarks: K = Knowledge outcomes; S = Skills outcomes; V = Values and attitude outcomes; T = Teach; P = Practice; M = Measured