CSCI3320: Fundamentals of Machine Learning
General Expectations:
Student/Faculty's Expectations on Teaching and Learning
Message:
2020-2021 was my last year of teaching CSCI3320, Fundamentals of Machine Learning,
a course which I have taught for 7 or 8 years. It has been fun and, as someone said,
it is usually the teacher who learns the most. In any case,
students who want to take CSCI3320 should refer to the syllabus provided by the
current course instructor.
Instructor:
Prof. John C.S. Lui, office hours: Thursday, 8:30-10:30am.
Machine learning (ML) is a method of data analysis that automates analytical model building.
Some people say that ML is a branch of artificial intelligence.
Personally, I think that ML is really a branch of statistics.
In any case, this course provides an introduction to machine learning.
It is designed to give undergraduate students
a taste of various machine learning techniques.
Students need to have a good background in
probability, statistics, a bit of optimization as well as
programming (e.g., Python) to appreciate various methods.
Furthermore, students need to spend time to read the textbook,
as well as to put in the effort to read various resources on the Internet,
do the homework, attend the lectures and tutorials
to understand and keep pace with this course.
If you skip some classes, please remember that you are solely responsible
for any missed lectures or announcements.
Although skipping classes is now the norm at CUHK,
I would like to emphasize that if you skip lectures or tutorials in this course,
you will easily get lost and will not be able to keep pace with the lectures.
So, a word of advice: do not skip any classes or tutorials.
Machine learning is essential knowledge in computer science/engineering,
and a highly sought-after skill in industry.
If you are well-trained in this subject,
surely you can find a good job.
Nevertheless, the subject is
not for the faint-hearted.
I will discuss the mathematics, theories, algorithms and programming techniques
behind different machine learning
methods, and students need to do various homework and exercises to understand
the subject.
References:

Bayesian Reasoning and Machine Learning, by David Barber

Pattern Recognition and Machine Learning, by Christopher M. Bishop

Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy

Learning from Data, by Yaser S. Abu-Mostafa

Machine Learning: An Algorithmic Perspective, by Stephen Marsland

Machine Learning with R, by Brett Lantz

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, Jerome Friedman

An Introduction to Statistical Learning: with Applications in R,
by Gareth James, Trevor Hastie

Mastering Machine Learning with scikit-learn,
by Gavin Hackeling

Machine Learning for Hackers, by Drew Conway, John Myles White

Probabilistic Graphical Models: Principles and Techniques, by Daphne Koller, Nir Friedman

Machine Learning in Action, by Peter Harrington
 Abundant resources available on the web.
Course Grades:
 Written homework (will still be given out): 0%;
 Python/scikit-learn programming: 40%;
 Final Examination: 60%
 Policy on letter grades !!!!!
Policies:
Announcement:
Final Examination:
The topics covered
in the final exam are, in general, the materials we went through
in the lectures and tutorials. These include:
 Statistics, sampling, curve fitting, correlation theory
 Basic concepts in matrix calculus, linear algebra, Lagrangian Optimization
 Supervised and unsupervised learning
 VC dimension
 Bayesian Decision Theory
 Parametric Methods: Univariate and Multivariate methods
 Dimensionality Reduction via PCA, Feature Embedding, LDA.
 Clustering via the K-Means Algorithm
 EM Algorithm
 Matrix Factorization
 Linear Discriminant: logistic classification and regression
 Decision trees
 Random forests
 Support vector machines
 Neural networks
 ...etc
Lecture Notes:
Lecture and tutorial notes and videos can be downloaded from the Blackboard at CUHK.

Introduction on Machine Learning (online lecture)

Review on Statistics (prerecorded lecture)
 Statistical Sampling
 Estimation Theory
 Hypothesis Testing
 Curve Fitting
 Least Squares Regression
 Regression
 Correlation Theory
 Q-Q Plot

Derivation of Least Squares

Some exercises on "Review of Statistics" (online lecture)

Overview of Supervised Learning (prerecorded lecture)
 What is supervised learning in classification?
 Probably approximately correct (PAC)
 Vapnik-Chervonenkis (VC) Dimension
 What is supervised learning in regression?

Examining Your Data or Cleaning your Data: PANDAS Tutorial
(online lecture with Python code in Jupyter notebook)

loading a CSV file

find out various display options

examine the data and schema of the data file

relationship with Python's dictionary

select some features to display

set up a filter and select the data that satisfy it

use Python's string library to set up a filter

modify feature names as well as data in the dataframe

add data to and remove data from the dataframe

sort data in the dataframe

grouping the data

aggregating the data

exploring the data

casting datatypes and handling missing values

working with dates

working with time series data

reading/writing data to different sources: Excel, JSON, ...etc.

Overview of Bayesian Decision Theory (prerecorded lecture)

Bayes' Rule: Machine Learning perspective

Loss/Risk Functions, discriminant functions

Introduction to correlation and causality

Introduction to causal and diagnostic inference

Simple Bayesian Networks and Simple Bayes' Classifiers

Association Rules

Regression, Overfitting, Underfitting and Prediction in Python
(online lecture with Python code in Jupyter notebook)
 Numpy array
 Shape and reshape of Numpy array
 Use Numpy array as index on another Numpy array
 Elementwise logical comparison in Numpy array
 Setting a minimum and maximum for all elements of a NumPy array via clip()
 Cleaning numpy array by filtering NaN entries
 Brief introduction to Scipy
 Loading datafile via Scipy
 Checking NaN and filtering them out via Scipy from the array
 Drawing a scatter plot in matplotlib
 Performing a polynomial best fit on the data
 Piecewise polynomial fit via "one" (or multiple) change point
 Fit model after the change point, and use models for "future" prediction
 Split the data into training and testing sets, and do prediction
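
The fitting and train/test steps listed above can be sketched in plain Python. This is only a minimal illustration, assuming a degree-1 polynomial (a straight line); the helper names `fit_line` and `mean_squared_error` and the data are made up for the example and are not taken from the course notebook.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a degree-1 polynomial)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


def mean_squared_error(xs, ys, a, b):
    """Average squared residual of the line on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]

# Fit on the first four points, hold out the last two for testing.
train_x, test_x = xs[:4], xs[4:]
train_y, test_y = ys[:4], ys[4:]

a, b = fit_line(train_x, train_y)
test_error = mean_squared_error(test_x, test_y, a, b)
```

With NumPy, `np.polyfit(train_x, train_y, 1)` computes the same straight-line fit, and a higher degree gives the polynomial fits mentioned in the list.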

Evaluation metrics (prerecorded lecture)
 Confusion matrix, accuracy, precision and recall

Example code on ML background, confusion matrix, accuracy and recall
(with scikit-learn code in Jupyter notebook)
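
As a small illustration of the confusion matrix, accuracy, precision and recall for a binary classifier, the sketch below counts the four outcomes by hand. The labels are made up, and the function names are illustrative rather than the ones used in the lecture notebook.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, precision and recall derived from the four counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = summarize(y_true, y_pred)
```

In scikit-learn the same quantities are available as `confusion_matrix`, `accuracy_score`, `precision_score` and `recall_score` in `sklearn.metrics`.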

Data cleansing and data processing in scikit-learn (prerecorded lecture)
(with scikit-learn code in Jupyter notebook)
 CSV file as input
 Data cleansing, relabelling, one-hot encoding
 Split and test
 Decision tree and random forest

Classification in scikit-learn (prerecorded lecture)
(with scikit-learn code in Jupyter notebook)
 Decision tree and how it outputs feature importances
 Display of result using decision tree
 Use of DummyClassifier and how we loop through different classifier strategies
 A glimpse of other classifiers such as neural networks, k-NN, SVC, SVM, Linear SVC, AdaBoost, etc.
 Concept of training time and score of each classifier
 Feature importance of AdaBoost
 Multiclass classification
 Example of digit recognition
 Confusion matrix and the use of mglearn to display confusion matrix
 Use of classification_report to display precision, recall, f1-score and support for all classes
 Prediction probabilities for each testing input

Parametric Methods (prerecorded lecture)

Maximum likelihood estimator

Estimator: bias vs. variance

Unbiased estimator, consistent estimator, asymptotically unbiased estimator

Bayes' estimator

Parametric Classification

Parametric Regression

Bias/Variance Dilemma

Illustration of Model Selection

Introduction to Classification in Python and scikit-learn
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Visualization of subset of features in our dataset
 From visualization, discover classification rules
 Use of simple threshold technique to do classification
 The need to split up the data into training and validation
 From leave-one-out cross-validation to k-fold cross-validation
 Using 1-NN and k-NN as classifiers
 The need to normalize all features
 Color scatter plot of results in k-NN
(with different values of k)
 Classification via random forest
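
To illustrate why features need to be normalized before running k-NN, and how k-fold cross-validation estimates accuracy, here is a minimal plain-Python sketch. The data set, the min-max scaling choice and all function names are assumptions made for this example; the lecture notebook may do this differently.

```python
def min_max_normalize(rows):
    """Rescale every feature (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def k_fold_accuracy(xs, ys, n_folds, k):
    """Fraction of points classified correctly when each is held out once."""
    correct = 0
    for fold in range(n_folds):
        val_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        correct += sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in val_idx)
    return correct / len(xs)


# Made-up data: feature 0 separates the classes, feature 1 is large-scale noise.
xs = [[1.0, 1000], [1.2, 5000], [0.9, 3000], [1.1, 4000],
      [3.0, 2000], [3.2, 4500], [2.9, 1500], [3.1, 3500]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

raw_accuracy = k_fold_accuracy(xs, ys, n_folds=4, k=1)
# For brevity the scaling uses all rows; in practice it should be
# computed from the training fold only.
scaled_accuracy = k_fold_accuracy(min_max_normalize(xs), ys, n_folds=4, k=1)
```

On the raw data the distance is dominated by the large-scale second feature, so 1-NN makes mistakes that disappear once both features share the same scale.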

How to do regression in Python and scikit-learn
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Single feature linear regression (or least-squares fit)
 Multidimensional linear regression
 Regression using Ridge, Lasso and ElasticNet Model
 Tuning hyperparameters within a learner
 Illustrate the problem of not using cross-validation (i.e., using ALL DATA for training).
 Illustrate how to use ElasticNet for regression and how to use the
L_{1} ratio to tune λ_{1} and λ_{2}.
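
The relation between the L_{1} ratio and the two penalty weights can be made concrete with a short sketch. It assumes scikit-learn's `ElasticNet` parameterization, in which the penalty is alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2, and it assumes the penalty is written as λ_{1} ||w||_1 + λ_{2} ||w||_2^2; the lecture may use a different convention for λ_{2}. The function names are illustrative.

```python
def elastic_net_lambdas(alpha, l1_ratio):
    """Map (alpha, l1_ratio) to (lambda_1, lambda_2).

    Assumes the penalty lambda_1 * ||w||_1 + lambda_2 * ||w||_2^2 and
    scikit-learn's ElasticNet parameterization.
    """
    lambda_1 = alpha * l1_ratio
    lambda_2 = 0.5 * alpha * (1.0 - l1_ratio)
    return lambda_1, lambda_2


def elastic_net_penalty(weights, alpha, l1_ratio):
    """Value of the combined L1 + L2 penalty for a weight vector."""
    lambda_1, lambda_2 = elastic_net_lambdas(alpha, l1_ratio)
    l1_norm = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return lambda_1 * l1_norm + lambda_2 * l2_squared


lasso_like = elastic_net_lambdas(alpha=1.0, l1_ratio=1.0)   # pure L1 penalty
ridge_like = elastic_net_lambdas(alpha=1.0, l1_ratio=0.0)   # pure L2 penalty
mixed_penalty = elastic_net_penalty([3.0, -4.0], alpha=1.0, l1_ratio=0.5)
```

Setting the ratio to 1 recovers a Lasso-style penalty and setting it to 0 a Ridge-style penalty, which is why a single ratio is enough to move between the two.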

Regression in scikit-learn
(prerecorded lecture with scikit-learn code in Jupyter notebook)
 load/fetch/make_ datasets in scikit-learn
 understanding the metadata in a compressed pickle file (PKZ)
 Regression metrics: explained variance score, mean absolute error, r2 score
 Doing regression with multiple linear learners
 Understanding various regularization methods
 Doing regression with multiple nonlinear learners

Real Life Classification: rating answers in Stack Overflow
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Fetch and preprocess 90GB of raw XML data (yes, it is painful)
 Creating a first nearest-neighbor classifier
 Looking into how to improve the classifier's performance
 Change from nearest-neighbor to logistic regression
 Use precision, recall and AUC to better understand the classifier's performance
 Prepare the final version

Dimensionality Reduction (prerecorded lecture)

Dimensionality Reduction in action
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Feature selection vs. feature projection methods
 How to use correlation, in particular the Pearson correlation coefficient, to find a linear relationship between two features
 Discuss how to use mutual information to discover linear and nonlinear relations between two features.
 Discuss how to use a wrapper method such as recursive feature elimination to select features.
 Discuss PCA, LDA and Multidimensional Scaling (MDS).

Clustering (prerecorded lecture)

Text Preprocessing, NLTK and Finding top k documents via Clustering Technique
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Preprocessing documents or text via NLTK, e.g., bag-of-words technique
 Compare similarity of a document with a set of documents using raw vectors
 Compare similarity of a document with a set of documents using normalized vectors
 Applying "stop words" into the vectorizer
 Applying "stemming" into the vectorizer
 Applying Term Frequency (TF) and Inverse Document Frequency (IDF) into the vectorizer
 Applying the K-Means algorithm and plotting decision space
 Clustering on a realistic dataset
 Given a new post, find "similar" posts in a corpus
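
The TF-IDF weighting and the document-similarity comparison above can be sketched in plain Python. This is a simplified illustration: it assumes whitespace tokenization and the plain formula idf = log(N / df), whereas scikit-learn's `TfidfVectorizer` uses a smoothed variant by default. The corpus and function names are made up for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


corpus = [
    "machine learning is fun",
    "learning python is fun",
    "databases store data",
]
vectors = tf_idf_vectors(corpus)
similar = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
```

Finding posts "similar" to a new post then amounts to vectorizing it in the same way and ranking the corpus by this similarity.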

Multivariate Parametric Methods (prerecorded lecture)

Multivariate Parameters and Estimation

Multivariate Normal Distributions

Multivariate Parametric Classification in Multivariate Normal Distributions

Multivariate Parametric Classification in Multivariate
Bernoulli/Multinomial Distributions

Multivariate Regression

Linear Discrimination (prerecorded video)

Generalizing the Linear Model

Geometry of the Linear Discriminant

Linear Discriminant via Pairwise Separation

Logistic Discriminant: Two and Multiple Classes

Discriminant by Regression

Discriminant via Ranking

Recommender Systems
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Making recommendations in machine learning based on previous user-product ratings (Netflix-like recommendation)
 Visualization of matrix sparsity
 Finding similar users or similar products to make recommendation
 Using regression technique to make recommendation
 Using ensemble learning to make recommendation
 Basket analysis for nonnumeric data
 Apriori algorithm, association rules and their implementation
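
For the basket analysis topics above, support and confidence, together with the Apriori pruning idea, can be illustrated with a small sketch. It only goes up to item pairs, and the baskets and function names are made up; it is not the implementation used in the lecture.

```python
from itertools import combinations


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def frequent_pairs(baskets, min_support):
    """Apriori idea: only pairs made of frequent single items are candidates."""
    items = sorted({item for basket in baskets for item in basket})
    frequent_items = [i for i in items if support([i], baskets) >= min_support]
    return [
        pair for pair in combinations(frequent_items, 2)
        if support(pair, baskets) >= min_support
    ]


# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pairs = frequent_pairs(baskets, min_support=0.5)
rule_confidence = confidence(["butter"], ["bread"], baskets)
```

Here every basket containing butter also contains bread, so the rule butter -> bread has confidence 1 even though its support is only 0.5.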

Nonparametric Methods
(prerecorded video)

Nonparametric density estimation: Histogram Estimator

Nonparametric density estimation: Kernel Estimator

Nonparametric density estimation: k-Nearest Neighbor Estimator

Nonparametric density estimation: Generalization to Multivariate Data

Condensed Nearest Neighbor

Distance-Based Classification

Nonparametric Regression: Smoothing Models

Decision Trees (prerecorded video)

Univariate Trees

Pruning of Decision Trees

Rule Extraction from Decision Trees

Learning Rules from Decision Trees

Multivariate Decision Trees

Sentiment Analysis on Twitter-like data
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Learn about Naive Bayes classifier (NBC)
 Apply NBC on tweets to do sentiment analysis
 Learn various smoothing techniques in "NBC":
(a) Add-one smoothing and (b) Lidstone smoothing
 Learn various performance metrics such as (a) true positive, (b) false positive, (c) false negative and (d) true negative in the confusion matrix.
 Extend the performance metrics to: (a) accuracy, (b) error rate, (c) recall, (d) specificity, (e) precision, (f) false positive rate, (g) Matthews correlation coefficient, (h) F-score
 Basic working principle of the Precision-Recall Curve (PRC)
 Cleaning the tweets' texts can improve accuracy
 Use `part-of-speech' (POS) and substitution to refine the classification process
 Learn how to use Pipeline mode of data analysis
 Learn how to use grid search to find optimal
hyperparameter values
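
The two smoothing techniques for the Naive Bayes classifier can be illustrated with one parameter: alpha = 1 gives add-one smoothing, and other positive values (typically below 1) give Lidstone smoothing. The sketch below is a minimal multinomial Naive Bayes under that assumption; the tiny training set and the function names are made up for the example.

```python
import math
from collections import Counter


def train_naive_bayes(labelled_docs, alpha=1.0):
    """Count words per class; alpha is the additive smoothing constant."""
    class_docs = Counter()
    word_counts = {}
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary, alpha


def word_probability(word, label, model):
    """Smoothed P(word | label) = (count + alpha) / (total + alpha * |V|)."""
    _, word_counts, vocabulary, alpha = model
    counts = word_counts[label]
    return (counts[word] + alpha) / (sum(counts.values()) + alpha * len(vocabulary))


def classify(tokens, model):
    """Pick the class with the highest log posterior."""
    class_docs, _, vocabulary, _ = model
    total_docs = sum(class_docs.values())

    def log_posterior(label):
        score = math.log(class_docs[label] / total_docs)
        for word in tokens:
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log(word_probability(word, label, model))
        return score

    return max(class_docs, key=log_posterior)


# Made-up training tweets.
training = [
    ("good great fun".split(), "pos"),
    ("great movie".split(), "pos"),
    ("bad boring".split(), "neg"),
    ("bad movie".split(), "neg"),
]
add_one_model = train_naive_bayes(training, alpha=1.0)
lidstone_model = train_naive_bayes(training, alpha=0.5)
```

Without smoothing, "great" would have probability zero under the negative class and would veto that class outright; with smoothing it only lowers the score.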

Good video explaining Area Under the Curve (AUC) and Receiver Operating Characteristic (ROC)

Kernel Machines (prerecorded video)

Quick Review of Logistic Classification/Regression

From Logistic Classification to SVM Classification

Concept of Large Margin

Landmarks to Kernels

Theory of Margin and Support Vectors

Nonseparable Case: Soft Margin Hyperplane

Hinge Loss

Kernel Tricks and Kernel Functions

Multiple Kernel Learning and Multiclass Kernel Machines

SVM for Regression

SVM for Ranking

Large Margin Nearest Neighbor

Kernel Dimensionality Reduction
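
As a small illustration of the hinge loss and the soft-margin objective listed above, the sketch below evaluates both for a linear classifier. It assumes labels in {-1, +1} and the standard objective 0.5 * ||w||^2 + C * (sum of hinge losses); the data and names are made up.

```python
def hinge_loss(label, score):
    """Hinge loss for a label in {-1, +1} and a raw score w.x + b."""
    return max(0.0, 1.0 - label * score)


def soft_margin_objective(weights, bias, xs, ys, c):
    """0.5 * ||w||^2 plus C times the total hinge loss."""
    margin_term = 0.5 * sum(w * w for w in weights)
    slack_term = sum(
        hinge_loss(y, sum(w * xi for w, xi in zip(weights, x)) + bias)
        for x, y in zip(xs, ys)
    )
    return margin_term + c * slack_term


# Made-up points: the second one is on the correct side but inside the margin.
xs = [[2.0, 0.0], [0.5, 0.0], [-3.0, 0.0]]
ys = [1, 1, -1]
objective = soft_margin_objective([1.0, 0.0], 0.0, xs, ys, c=1.0)
```

Points classified correctly with margin at least 1 contribute nothing, so only margin violations add to the objective.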

Optional Reading 1:
Constrained Optimization

Optional Reading 2:
Inequality Constraints and Kuhn-Tucker method

Multilayer Perceptrons (Artificial Neural Networks)
(prerecorded video)
 Perceptron
 Training a Perceptron
 Learning Boolean Functions
 Multilayer Perceptrons
 Backpropagation Algorithm
 Training Procedures
 Tuning the Network Size
 Bayesian View of Learning
 Dimensionality Reduction
 Deep Learning
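
Training a perceptron to learn a Boolean function can be sketched as follows. This is a minimal illustration using the classic perceptron update rule on the AND function, with a step activation and zero initial weights chosen for the example; it is not taken from the lecture material.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if w.x + b > 0, otherwise 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron rule: w <- w + lr * (target - output) * x."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Truth table of the Boolean AND function.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
outputs = [predict(weights, bias, x) for x, _ in and_samples]
```

AND is linearly separable, so a single perceptron can represent it; XOR is not, which is one motivation for multilayer perceptrons.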

Topic Modeling: Comparing or searching documents by topics instead of words
(online lecture with Python and scikit-learn code in Jupyter notebook)
 Learn about the importance of topic modeling and how to search documents within a topic.
 Learn (at the high level) about latent Dirichlet allocation (LDA)
 Learn about the gensim package and how to generate topics for corpora
 Learn about visualizing topic distribution and how to use $\alpha$
to vary how many topics are associated with each document
 Learn about wordcloud package to visualize the words within a topic
 Learn about how to find closest topics or documents.

Music Genre Classification
(with Python and scikit-learn code in Jupyter notebook (will be uploaded to Blackboard))
 How to do music genre classification
 How to use fast Fourier transform (FFT) to convert songs into a vector of numbers, then use these vectors to train our learner
 We use Mel-frequency cepstral coefficients (MFCCs) to convert songs into a vector of numbers, then use these vectors to train our learner
 We learn about the physical meaning of the precision/recall curve and the ROC curve
 We learn how to examine and visualize the confusion matrix

Graphical Models (To be uploaded if time allows)

Conditional Independence

Generative Models

d-Separation

Belief Propagation

Undirected Graphs and Markov Random Fields

Learning Structures from Graphical Model

Influence Diagram

Hidden Markov Models (To be uploaded if time allows)
 Discrete Markov Processes
 Hidden Markov Models (HMM)
 Basic Problems of HMM
 Evaluation Problem
 Learning the State Sequence
 Learning the Model Parameters
 The HMM as a Graphical Model

Bayesian Estimation (To be uploaded if time allows)
 Bayesian Estimation of Parameters of a Discrete Distribution
 Bayesian Estimation of Parameters of a Gaussian Distribution
 Bayesian Estimation of Parameters of a Function
 Choosing a Prior
 Bayesian Model Comparison
 Bayesian Estimation of a Mixed Model
 Gaussian and Dirichlet Processes, Chinese Restaurants
 Latent Dirichlet Allocation
 Beta Processes and Indian Buffets

Reinforcement Learning (e.g., Game Theory, Markov Decision Process, etc.) (To be uploaded if time allows)
 Single State Case: K-Armed Bandit
 Elements of Reinforcement Learning
 Model-Based Learning
 Temporal Difference Learning
 Partially Observed States
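
For the single-state K-armed bandit case, a common strategy is epsilon-greedy action selection with an incremental sample-mean estimate of each arm's value. The sketch below assumes that strategy and Gaussian rewards with unit variance; these choices, the seed and all names are illustrative assumptions and may differ from the treatment in the course.

```python
import random


def select_arm(estimates, epsilon, rng):
    """Explore a random arm with probability epsilon, else exploit the best."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda arm: estimates[arm])


def update_estimate(estimate, pulls, reward):
    """Incremental mean: Q <- Q + (r - Q) / n, where n counts this pull too."""
    return estimate + (reward - estimate) / pulls


def run_bandit(true_means, steps, epsilon, seed=0):
    """Simulate epsilon-greedy play on arms with Gaussian rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    pulls = [0] * len(true_means)
    for _ in range(steps):
        arm = select_arm(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        pulls[arm] += 1
        estimates[arm] = update_estimate(estimates[arm], pulls[arm], reward)
    return estimates, pulls


estimates, pulls = run_bandit(true_means=[0.2, 1.0, 0.5], steps=1000, epsilon=0.1)
```

A larger epsilon explores more and learns all arm values faster, while a smaller one exploits the current best estimate more often.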

Brief Introduction to Game Theory
Additional References

Exploring Python by Timothy A. Budd

Think Python: How to Think Like a Computer Scientist, by Allen B. Downey

Python Tutorial

Python Programming at Youtube

Reference note on matrix differentiation

Matrix notations and operations

Vector notations and operations

The Matrix Cookbook by K.B. Petersen and M.S. Pedersen

Brief Introduction to Kalman Filters
Tutorial Notes (Available on Blackboard)

Tutorial 0: Introduction to Python

Tutorial 1 (Quick Introduction to scikit-learn with Jupyter notebook)

Tutorial 2 (Review on Linear Algebra and Matrix Calculus, with Jupyter notebook)

Tutorial 3 (Review on Gradient Descent For Linear Regression with Jupyter notebook)

Tutorial 4 (Review on Linear Regression)

Tutorial 5 (Regularization and Cross Validation with Python code)

Tutorial 6 (Parametric Classification and Implementation with sample code)

Tutorial 7 (Principal Component Analysis)

Project Tutorial (Horse Racing Prediction)

Tutorial 8 (Kernel Machines) (To be uploaded)

Tutorial 9 (Ensemble Methods) (To be uploaded)
Homework (Available on Blackboard)
 Will be posted on Blackboard.
Programming homework
 Will be posted on Blackboard.
Programming Project :
 Will be posted on Blackboard.