Social Statistics

Anyone can make a theory or a claim about anything, but the important thing is whether you can provide real-world evidence to back up and support that claim, and that really is where Statistics comes into its element.

Research in Social Statistics is concerned with the development of statistical methods that can be used across the social and human sciences. Statisticians play an essential role in all aspects of social inquiry, including study design, measurement, data linkage, development of statistical models that account for the complex structure of social data, model selection and evaluation, and modelling, analysis and interpretation of the data to answer substantive research questions.

Members of the Social Statistics group have interests in statistical methods in each of these areas. They regularly collaborate with social scientists whose questions motivate new lines of methodological research. We have experience in a range of social science disciplines, including demography, education, epidemiology, psychology and sociology.

Members of the Social Statistics group conduct research in many areas of statistical theory and methods that are important for answering research questions in the modern social sciences.

The methods we research apply to many types of research question, including out-of-sample prediction based on complex data, description of population relationships using data from surveys and other sources, and causal inference from experimental or observational studies using approaches such as regression discontinuity, interrupted time series, and synthetic and negative control designs.

Data in these applications are often complex, high-dimensional and challenging to analyse. We develop methods that can cope with this complexity, for example: multivariate analysis of high-dimensional data; analysis of clustered data with complex correlation structures such as multivariate longitudinal data and multiprocess survival data; detection of outliers; analysis of problems with missing data, drop-out, misclassification and measurement error; dealing with non-informative missingness in the presence of time-varying confounding in causal inference; and combining data from multiple different sources.

We develop and employ various statistical frameworks, models, methods of estimation, and computational algorithms. These include: different types of latent variable, mixture and random effects models for continuous and categorical variables; Gaussian processes; interpretable machine learning methods; marginal modelling; composite likelihood methods; models for dependence using reproducing kernel Hilbert space methods; Markov decision and reinforcement learning methods; Bayesian methods; and computationally efficient Markov chain Monte Carlo and sequential Monte Carlo techniques to facilitate parameter estimation, statistical inference, model choice, and prediction.

The statistical methods that are studied and developed by members of the Social Statistics group can be used in substantive research in the social and human sciences, business and policy. We work on such applications in collaboration with researchers in criminology, demography, education, epidemiology, political science, psychology, social policy, and sociology, as well as researchers outside academia. Many of these projects are funded by research grants from external funders such as the Economic and Social Research Council and the Wellcome Trust.

Areas of application that we have worked on include international large-scale assessments in education; methods of election polling; effects of statin prescription in the general population; public attitudes to the police; longitudinal analysis of exchanges of financial and practical support between parents and their adult children; the role of education in social class mobility; resource allocation and sequential decision making in crowdsourcing platforms; stochastic epidemic models for infectious diseases, including COVID-19, influenza, HIV and sheeppox; estimation of the prevalence of problem gambling; cheating detection in educational tests; the effect of changing sentencing guidelines on sentence severity; ethnic diversity and social cohesion; associations between change in beliefs and mood following cardiac surgery and subsequent attendance at outpatient rehabilitation; sequential design of personalised learning systems; safety citizenship behaviour in organizations; the impact of austerity measures on mental health, in particular that of ethnic minority communities in London; and exit poll forecasting of election results.

Bakk, Z. and Kuha, J. (2018). Two-step estimation of models between latent classes and external variables. Psychometrika, 83, 871-892.

Bergsma, W. (2020). Regression with I-priors. Econometrics and Statistics, 14, 89-111.

Chen, Y., Lee, Y.-H., and Li, X. (2021). Item quality control in educational testing: Change point model, compound risk, and sequential detection. To appear in Journal of Educational and Behavioral Statistics.

Chen, Y. and Li, X. (2021). Determining the number of factors in high-dimensional generalised latent factor models. To appear in Biometrika.

Doretti, M., Geneletti, S., and Stanghellini, E. (2017). Missing data: A unified taxonomy guided by conditional independence. International Statistical Review, 86, 189-204.

Dureau, J., Kalogeropoulos, K., Vickerman, P., Pickles, M., and Boily, M. C. (2016). A Bayesian approach to estimate changes in condom use from limited human immunodeficiency virus prevalence data. Journal of the Royal Statistical Society, Series C, 65, 237-257.

Geminiani, E., Marra, G., and Moustaki, I. (2021). Single and multiple-group penalized factor analysis: a trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika, 86, 65-95.

Geneletti, S., Ricciardi, F., O’Keeffe, A. G., and Baio, G. (2019). Bayesian modelling for binary outcomes in the regression discontinuity design. Journal of the Royal Statistical Society, Series A, 182, 983-1002.

Katsikatsou, M., Moustaki, I., and Jamil, H. (2022). Pairwise likelihood estimation for confirmatory factor analysis models with categorical variables and data that are missing at random. British Journal of Mathematical and Statistical Psychology, 75, 23-45.

Kuha, J., Bukodi, E., and Goldthorpe, J. H. (2021). Mediation analysis for associations of categorical variables: The role of education in social class mobility in Britain. Annals of Applied Statistics, 15, 2061-2082.

Malesios, C., Demiris, N., Kalogeropoulos, K., and Ntzoufras, I. (2017). Bayesian epidemic models for spatially aggregated count data. Statistics in Medicine, 36, 3216-3230.

Shi, C., Xu, T., Bergsma, W., and Li, L. (2021). Double generative adversarial networks for conditional independence testing. Journal of Machine Learning Research, 22, 1-32.

Steele, F., Clarke, P. S., and Kuha, J. (2019). Modeling within-household associations in household panel studies. Annals of Applied Statistics, 13, 367-392.

Steele, F. and Grundy, E. (2021). Random effects dynamic panel models for unequally-spaced multivariate categorical repeated measures: an application to child-parent exchanges of support. Journal of the Royal Statistical Society, Series C, 70, 3-23.

Wicher Bergsma - Professor

Research interests: Reproducing kernels; dependence modelling; graphical models; I-priors; categorical data.

Yunxiao Chen - Assistant Professor

Research interests: Latent variable models; high-dimensional multivariate analysis; empirical Bayes; process data; sequential decision making.

Sara Geneletti - Associate Professor

Research interests: Causal inference; natural experiments; Bayesian methods; synthetic controls; graphical models.

Kostas Kalogeropoulos - Associate Professor

Research interests: Bayesian inference; stochastic epidemic modelling; factor analysis; sequential learning.

Jouni Kuha - Professor

Research interests: Categorical data; incomplete data problems; latent variable modelling; survey data analysis.

Irini Moustaki - Professor

Research interests: Latent variable and structural equation models; estimation methods; treatment of missing values; outlier detection; categorical data analysis.