ST457      Half Unit
Graph Data Analytics and Representation Learning

This information is for the 2024/25 session.

Teacher responsible

Prof. Zoltan Szabo (COL.5.14)

Homepage: https://zoltansz.github.io/

Availability

This course is available on the MSc in Data Science, MSc in Geographic Data Science, MSc in Health Data Science, MSc in Operations Research & Analytics, MSc in Quantitative Methods for Risk Management, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.

Pre-requisites

No particular course is required as a pre-requisite. The course requires basic knowledge of linear algebra, calculus, probability and (un)supervised learning, as well as programming experience in Python (used throughout the classes). Familiarity with notions such as vector, matrix, matrix-vector multiplication, inner product and distance between vectors, transpose and inverse of a matrix, eigenvalue, eigenvector, derivative of a function, probability mass/density function, and some formulation of regression, classification and clustering is beneficial.

Course content

Graphs are among the most widely used data structures in machine learning. Their power comes from the flexibility of capturing relations (edges) among collections of entities (nodes), which arise in a variety of contexts including economic, communication, transportation, citation, social, neuron, computer, or particle networks, knowledge, scene or code graphs, molecules or 3D shapes. Graphs naturally generalize unstructured vectorial data and structured data such as time series, images or bags of entities. The goal of this course is to provide an overview of the fundamental computational methods that leverage this additional relational structure and lead to improved prediction. We will cover examples and techniques for node classification (which can be applied, for example, to determine whether a user is a bot, to classify the topic of papers, or to determine the function of proteins), link prediction (for instance to recommend content on online platforms, to complete knowledge graphs, or to predict drug side-effects), clustering and community detection (for example to determine collaborating communities in citation networks, or to reveal fraudulent groups of users in financial transaction networks), graph classification / regression / clustering (for