CSCI5550 Advanced File and Storage Systems


Course code CSCI5550
Course title Advanced File and Storage Systems
Course description This course introduces important systems-level research topics in the design and implementation of practical file and storage systems. Topics include: (i) storage device organization (e.g., disk drives, disk arrays, RAID, solid state drives), (ii) file system design (e.g., log-structured file systems, distributed file systems), (iii) data availability (e.g., erasure coding techniques, data integrity protection), (iv) data consistency (e.g., journaling techniques), (v) data compression (e.g., deduplication), and (vi) benchmarking (e.g., I/O metrics, benchmarking tools). Depending on current research trends, the course also discusses the latest applied storage topics, especially those related to scalable and dependable big data management.
Advisory: Students are expected to have taken CSCI3150 or ESTR3102 or equivalent.
Unit(s) 3
Course level Postgraduate
Semester 1 or 2
Grading basis Graded
Grade Descriptors A/A-:  EXCELLENT – exceptionally good performance and far exceeding expectation in all or most of the course learning outcomes; demonstration of superior understanding of the subject matter, the ability to analyze problems and apply extensive knowledge, and skillful use of concepts and materials to derive proper solutions.
B+/B/B-:  GOOD – good performance in all course learning outcomes and exceeding expectation in some of them; demonstration of good understanding of the subject matter and the ability to use proper concepts and materials to solve most of the problems encountered.
C+/C/C-: FAIR – adequate performance and meeting expectation in all course learning outcomes; demonstration of adequate understanding of the subject matter and the ability to solve simple problems.
D+/D: MARGINAL – performance barely meets the expectation in the essential course learning outcomes; demonstration of partial understanding of the subject matter and the ability to solve simple problems.
F: FAILURE – performance does not meet the expectation in the essential course learning outcomes; demonstration of serious deficiencies and the need to retake the course.
Learning outcomes At the end of the course of studies, students will have acquired the ability to
1. design, implement, evaluate, and deploy a practical storage system,
2. identify the research trends in file and storage systems, and
3. develop research skillsets in the area of file and storage systems.
Assessment type (for reference only)
Essay test or exam: 30%
Presentation: 30%
Others: 40%
Recommended Reading List Reference academic papers:
1. C. Ruemmler and J. Wilkes. An Introduction to Disk Drive Modeling, IEEE Computer, 27(3):17-29, March 1994.
2. B. Schroeder and G. Gibson. Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you? Proceedings of the USENIX Conference on File and Storage Technologies (FAST), 2007.
3. M. Holland, G. A. Gibson, and D. P. Siewiorek. Architectures and Algorithms for On-Line Failure Recovery in Redundant Disk Arrays. Journal of Distributed and Parallel Databases, Vol. 2, No. 3, 1994.
4. P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: High-Performance, Reliable Secondary Storage. ACM Computing Surveys, 26(2):145-185, 1994.
5. N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy. Design Tradeoffs for SSD Performance. Proceedings of the USENIX Annual Technical Conference (ATC), 2008.
6. M. Rosenblum and J. K. Ousterhout. The Design and Implementation of a Log-structured File System. ACM Transactions on Computer Systems, 10:26–52, 1992.
7. S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), 2003.
8. J. S. Plank. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software, Practice & Experience, 27(9):995–1012, 1997.
9. G. Sivathanu, C. P. Wright, and E. Zadok. Ensuring data integrity in storage: techniques and applications. Proceedings of the 2005 ACM Workshop on Storage Security and Survivability, 2005.
10. V. Prabhakaran, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Analysis and evolution of journaling file systems. Proceedings of the USENIX Annual Technical Conference (ATC), 2005.
11. B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the Data Domain deduplication file system. Proceedings of the USENIX Conference on File and Storage Technologies (FAST), 2008.
Reference book:
12. D. Giampaolo. Practical File System Design with the Be File System. Morgan Kaufmann Publishers, 1999.


CSCIN programme learning outcomes (course mapping shown after each outcome, where applicable)
Upon completion of their studies, students will be able to:
1. identify, formulate, and solve computer science problems (K/S); Course mapping: T
2. design, implement, test, and evaluate a computer system, component, or algorithm to meet desired needs (K/S);
3. receive the broad education necessary to understand the impact of computer science solutions in a global and societal context (K/V);
4. communicate effectively (S/V);
5. succeed in research or industry related to computer science (K/S/V);
6. have solid knowledge in computer science and engineering, including programming and languages, algorithms, theory, databases, etc. (K/S); Course mapping: TP
7. integrate well into and contribute to the local society and the global community related to computer science (K/S/V);
8. practise a high standard of professional ethics (V);
9. draw on and integrate knowledge from many related areas (K/S/V).
Remarks: K = Knowledge outcomes; S = Skills outcomes; V = Values and attitude outcomes; T = Teach; P = Practice; M = Measured