Data Science and Computational Statistics
The focus of the Data Science and Computational Statistics group is network modelling and inference. Networks are an important paradigm for scientific research in the 21st century. Our group develops modelling and inference approaches for this emerging field.
The research programme spans a wide range of methodological developments and applied projects, including random graph models, sparse network model inference, systems biology, high-dimensional inference, and inference of ODEs and SDEs. From systems biology, where inference of large gene regulatory networks constitutes an important theme, to modern quantitative sociology with its interest in large social networks, novel methods to infer networks are required.
Social network inference. For social scientists, discovering the driving forces behind social networks is of crucial importance. This requires, on the one hand, stochastic models enriched with covariates and, on the other hand, efficient inference methods to estimate the influence of these potential driving forces. We have developed novel penalized approaches to deal with the potentially high-dimensional nature of the interactions.
Penalized graphical models. Flexible graphical models with many parameters are prone to over-fitting when data are limited. One of the major goals of our research is therefore to address these over-fitting problems without losing too much modelling flexibility. To this end, our research focuses on developing novel regularized statistical methodologies and information-coupling schemes, which infer the right trade-off between flexibility and inference accuracy automatically from the available data. Our methods offer considerable flexibility along with various possibilities for learning and tuning coupling strengths in light of the data, such that features which are not supported by the data are automatically down-weighted. This increases inferential certainty for the model features that the data do support.
ODE and SDE inference. Differential equations are the bread and butter of many quantitative sciences, from climate studies to genomics. Often the aim is to match systems of stochastic differential equations (SDEs) or ordinary differential equations (ODEs) to data observed from the real-life process. Two important questions are (i) how to match the SDE or ODE to the data by varying its parameters and (ii) how to choose between alternative descriptions of the system based on the data.
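As a rough illustration of questions (i) and (ii), the sketch below fits a single rate parameter of two candidate ODEs to noisy simulated data by least squares and compares the fits with a Gaussian AIC. It is a toy example and not the group's own methodology: the two models, the helper names (`rk4`, `decay`, `saturating`, `fit`, `aic`), the noise level and the grid search are all assumptions chosen for brevity.

```python
import numpy as np

def rk4(f, x0, t, theta):
    """Integrate dx/dt = f(x, theta) on the time grid t with classical RK4."""
    x = np.empty(len(t))
    x[0] = x0
    for i in range(len(t) - 1):
        h = t[i + 1] - t[i]
        k1 = f(x[i], theta)
        k2 = f(x[i] + 0.5 * h * k1, theta)
        k3 = f(x[i] + 0.5 * h * k2, theta)
        k4 = f(x[i] + h * k3, theta)
        x[i + 1] = x[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

def decay(x, theta):
    """Hypothetical candidate model A: dx/dt = -theta * x."""
    return -theta * x

def saturating(x, theta):
    """Hypothetical candidate model B: dx/dt = -theta * x / (1 + x)."""
    return -theta * x / (1.0 + x)

def fit(f, x0, t, y, grid):
    """Question (i): pick the theta on the grid that minimises the residual sum of squares."""
    rss = np.array([np.sum((y - rk4(f, x0, t, th)) ** 2) for th in grid])
    j = int(np.argmin(rss))
    return grid[j], rss[j]

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * log(rss / n) + 2 * k."""
    return n * np.log(rss / n) + 2 * k

# Simulated observations from model A with theta = 0.8 (assumed values).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 51)
y = rk4(decay, 2.0, t, 0.8) + rng.normal(0.0, 0.02, size=t.size)

grid = np.linspace(0.1, 3.0, 291)  # candidate rate parameters, step 0.01
theta_decay, rss_decay = fit(decay, 2.0, t, y, grid)
theta_sat, rss_sat = fit(saturating, 2.0, t, y, grid)

# Question (ii): the model with the lower AIC is preferred.
aic_decay = aic(rss_decay, t.size, 1)
aic_sat = aic(rss_sat, t.size, 1)
print("decay:      theta =", theta_decay, "AIC =", aic_decay)
print("saturating: theta =", theta_sat, "AIC =", aic_sat)
```

A grid search is used only to keep the example dependency-free. In practice a gradient-based optimiser and, for SDEs, a likelihood that accounts for process noise would be more appropriate.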
High-dimensional inference. Traditional statistical models, such as linear regression, consider a small number of covariates relative to the number of observations. With the advent of high-throughput and sensing technology, in many areas the data situation has reversed. Although the typical number of independent observations has not changed, the potential number of features has exploded. The field of high-dimensional inference is concerned with discovering and estimating the effect of true features that are hidden like needles in a haystack. Together with collaborators from Palermo, Wit has worked on differential geometric extensions of the least angle regression method for generalized linear models.
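To make the needle-in-a-haystack setting concrete, the sketch below simulates a regression with more features than observations and recovers the few active features with an L1-penalized (lasso) fit. Note the swap: this uses plain coordinate descent for the lasso, a related but simpler technique, and not the differential geometric least angle regression mentioned above. The dimensions, coefficients, penalty level and function names (`soft_threshold`, `lasso_cd`) are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly zero."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scale x_j' x_j / n
    resid = y.copy()                    # current residual y - X beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta

# Simulated data: 50 observations, 200 features, only 3 of which matter (assumed values).
rng = np.random.default_rng(1)
n, p = 50, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[[3, 50, 120]] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(0.0, 0.1, size=n)

beta_hat = lasso_cd(X, y, lam=0.1)

# Indices of the three largest estimated effects, for comparison with the true support.
top3 = sorted(int(j) for j in np.argsort(-np.abs(beta_hat))[:3])
print("largest estimated effects at features:", top3)
print("number of non-zero coefficients:", int(np.count_nonzero(beta_hat)))
```

The penalty level `lam` is fixed by hand here. In applications it would typically be chosen from the data, for example by cross-validation or an information criterion.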