ST444 Half Unit
Computational Data Science
This information is for the 2020/21 session.
Teacher responsible
Dr Yining Chen COL 5.08
Availability
This course is available on the MSc in Data Science, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (LSE and Fudan), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.
Pre-requisites
Basic knowledge of calculus and linear algebra, as well as a first course in probability and statistics.
Course content
An introduction to the use of popular algorithms in statistics and data science, including (but not limited to) numerical linear algebra, optimisation, graph data and massive data processing, as well as their applications. Examples include least squares, maximum likelihood, principal component analysis, LASSO and graphical LASSO, and PageRank. Throughout the course, students will gain practical experience of implementing these computational methods in a programming language. Learning support will be provided for at least one programming language, such as R, Python or C++, but the choice of language supported may vary between years, depending on the judged benefits to students, whether in terms of pedagogy or resulting skills. This year, the default choice is Python.
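As an illustration of the kind of implementation exercise involved, the following is a minimal sketch of PageRank computed by power iteration in plain Python. It is not taken from the course materials: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance and the four-node `toy_graph` are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a dict {node: [nodes it links to]}.

    All parameter defaults here are illustrative assumptions.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its rank in equal shares to the nodes it links to.
        for v in nodes:
            for w in links[v]:
                new_rank[w] += damping * rank[v] / len(links[v])
        # Stop once the L1 change between successive iterates is negligible.
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical toy graph: A links to B and C, B to C, C to A, D to C.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

In this toy graph, node C receives links from every other node and so ends up with the largest rank, while D, which nothing links to, keeps only the baseline share (1 - damping) / n.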
Teaching
This course will be delivered through a combination of classes/computer workshops and lectures totalling a minimum of 30 hours across Michaelmas Term. This year, some or all of this teaching may be delivered through a combination of virtual classes and flipped lectures delivered as short online videos. This course includes a reading week in Week 6 of Michaelmas Term.
Lectures will cover:
(1) Introduction: overview of the topics to be discussed, how numbers are represented in memory, floating point arithmetic, stability of numerical algorithms
(2) Basic algorithms: overview of different types of algorithms, Big-O notation, elementary complexity analysis, and their applications in data science
(3) Tools in optimisation: bisection, steepest descent, Newton’s method, quasi-Newton methods, stochastic search, stochastic gradient; convex optimisation (coordinate descent, ADMM, etc.)
(4) Tools in numerical linear algebra: Gaussian elimination, Cholesky decomposition, LU decomposition, matrix inversion and condition, computing eigenvalues and eigenvectors, and their applications
(5) Other topics: graph data processing and massive data processing
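To give a flavour of topics (1) and (3), here is a short Python sketch, not drawn from the course materials, showing a floating point rounding effect and two of the listed root-finding tools (bisection and Newton's method) applied to x^2 - 2 = 0. The function names, tolerances and starting values are assumptions made for the example.

```python
import math

# Floating point arithmetic: 0.1, 0.2 and 0.3 have no exact binary
# representation, so an exact equality test fails.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead


def bisection(f, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # sign change is in the left half
        else:
            lo, f_lo = mid, f_mid     # sign change is in the right half
    return 0.5 * (lo + hi)


def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x) / f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    return x


root_b = bisection(lambda x: x * x - 2, 0.0, 2.0)
root_n = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
print(root_b, root_n, math.sqrt(2))
```

Bisection halves the bracketing interval at every step, so it needs around forty iterations to reach this tolerance, whereas Newton's method typically converges in a handful of steps from a reasonable starting point but requires the derivative.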
Formative coursework
Students will be expected to produce 4 problem sets in the MT.
Bi-weekly exercises, involving computer programming and some theory.
Indicative reading
Computational Statistics by Givens and Hoeting
Statistical Computing in C++ and R by Eubank and Kupresanin
Foundations of Data Science by Blum, Hopcroft and Kannan
Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein
The Art of R Programming: A Tour of Statistical Software Design by Matloff
Think Python: How to Think Like a Computer Scientist by Downey
Assessment
Exam (70%, duration: 2 hours) in the summer exam period.
Coursework (30%) in the MT.
Student performance results
(2016/17 - 2018/19 combined)
Classification | % of students |
---|---|
Distinction | 24.5 |
Merit | 45.7 |
Pass | 23.4 |
Fail | 6.4 |
Important information in response to COVID-19
Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students in attendance on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to mode of delivery and/or the format or weighting of assessments. Changes will only be made if required and students will be notified about any changes to teaching or assessment plans at the earliest opportunity.
Key facts
Department: Statistics
Total students 2019/20: 20
Average class size 2019/20: 20
Controlled access 2019/20: Yes
Value: Half Unit
Personal development skills
- Self-management
- Team working
- Problem solving
- Application of information skills
- Communication
- Application of numeracy skills