# CSCI3220 Algorithms for Bioinformatics

 Course code CSCI3220 Course title Algorithms for Bioinformatics 生物信息學中的算法 Course description This course describes some algorithms commonly used in contemporary bioinformatics. After a brief introduction to basic molecular biology and genetics, four main topics will be covered with corresponding lists of algorithms: 1. Sequence alignment and assembly: dynamic programming for optimal sequence alignment, FASTA and BLAST for heuristic alignment of long sequences, tree-based methods for multiple sequence alignment, suffix-tree, suffix-array and Burrows-Wheeler Transform for short read alignment, and algorithms based on de Bruijn graphs for sequence assembly 2. Statistical modeling: forward, backward, Viterbi and Baum-Welch algorithms for hidden Markov models, Gibbs sampling for sequence motif discovery, and Bayesian classifiers, logistic regression and expectation-maximization for data classification and clustering 3. Phylogenetics: methods based on Jukes-Cantor and Kimura models for divergence time estimation, maximum parsimony, UPGMA, Neighbor-joining and maximum likelihood methods for phylogenetic tree reconstruction. 4. High-throughput data analysis: Hierarchical clustering and k-means for data clustering, and algorithms for selected problems such as signal peak calling, detection of gene fusion, haplotype phasing. Other topics such as RNA secondary structure prediction may also be covered depending on the available time. 本科介紹一些現代生物信息學中常用的算法。本科以基礎份子生物學及遺傳學簡介開始，接著圍繞四個主題討論相關的算法： 1. 序列對準及組裝：為最佳化對準的動態規劃；為對準長序列的FASTA和BLAST；為多序列對準的樹狀方法；為短序列的後綴樹、後綴數組和Burrows-Wheeler變換；以及為序列組裝，基於de Bruijn圖形的算法 2. 統計模型：為隱Markov模型的前向、反向、Viterbi和Baum-Welch算法；為找尋序列主題的Gibbs取樣程序；以及為數據分類和群組的Bayesian分類法、邏輯迴歸法和期望值最大化 3. 物種發生學：為估測物種分離時間，基於Jukes-Cantor和Kimua模型的算法；以及為重構物種樹的最大簡約法、UPGMA、鄰舍結合法和最大似然法 4. 高量數據分析：為數據群組的等級群組法和k-means；以及特選課題的相關算法，例如搜尋訊息頂位、基因合併和單體型建構若時間許可，本科可能會加入諸如核糖核酸二級結構估計等其他題目。 Unit(s) 3 Course level Undergraduate Semester 1 or 2 Grading basis Graded Grade Descriptors A/A-:  EXCELLENT – exceptionally good performance and far exceeding expectation in all or most of the course learning outcomes; demonstration of superior understanding of the subject matter, the ability to analyze problems and apply extensive knowledge, and skillful use of concepts and materials to derive proper solutions. B+/B/B-:  GOOD – good performance in all course learning outcomes and exceeding expectation in some of them; demonstration of good understanding of the subject matter and the ability to use proper concepts and materials to solve most of the problems encountered. C+/C/C-: FAIR – adequate performance and meeting expectation in all course learning outcomes; demonstration of adequate understanding of the subject matter and the ability to solve simple problems. D+/D: MARGINAL – performance barely meets the expectation in the essential course learning outcomes; demonstration of partial understanding of the subject matter and the ability to solve simple problems. F: FAILURE – performance does not meet the expectation in the essential course learning outcomes; demonstration of serious deficiencies and the need to retake the course. Learning outcomes At the end of the course of studies, students will be able to 1. Explain the problems studied in both biological and computational terms. 2. Explain the details of the studied algorithms. 3. Apply the learned algorithms to solving both problems in bioinformatics and other domains. 4. Study new algorithms proposed for the same problems. 5. Understand contents of other classes in computational biology and bioinformatics. Assessment (for reference only) Exam: 65% Assignments: 35% Recommended Reading List 1. Wing-Kin Sung, Algorithms in Bioinformatics: A Practical Introduction, Chapman & Hall, 2009 (and corresponding online materials at http://www.comp.nus.edu.sg/~ksung/algo_in_bioinfo/) 2. Neil C. Jones and Pavel A. Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004

 CSCIN programme learning outcomes Course mapping Upon completion of their studies, students will be able to: 1. identify, formulate, and solve computer science problems (K/S); TP 2. design, implement, test, and evaluate a computer system, component, or algorithm to meet desired needs (K/S); TP 3. receive the broad education necessary to understand the impact of computer science solutions in a global and societal context (K/V); TP 4. communicate effectively (S/V); 5. succeed in research or industry related to computer science (K/S/V); T 6. have solid knowledge in computer science and engineering, including programming and languages, algorithms, theory, databases, etc. (K/S); TP 7. integrate well into and contribute to the local society and the global community related to computer science (K/S/V); 8. practise high standard of professional ethics (V); 9. draw on and integrate knowledge from many related areas (K/S/V); T Remarks: K = Knowledge outcomes; S = Skills outcomes; V = Values and attitude outcomes; T = Teach; P = Practice; M = Measured