Data Science

Research Centre

Providing expertise in data science

The Data Science Research Centre undertakes research into a diverse range of subjects, addressing real-world problems in areas such as health, high-performance sport, human resources and business.

Expertise is drawn from across the University with members from Applied Mathematics, Computer Science and the Liverpool Business School.

The Centre, led by Professor Paulo Lisboa, has 25 years of research expertise in the following key areas, each with a significant component of real-world applications.

Our expertise

Below you'll find information about our key areas of expertise.

Advanced methodology

Advanced machine learning research in the group is well established in two main areas:

  • interpretation of machine learning methods to enable them to be used by experts in the subject area, rather than data scientists. This interpretation is based on statistical principles and is necessary for a full and rigorous validation of the application of complex methods to real-world problems.

    Methods include:
    • graphical methods to find the non-redundant correlation structure in large and complex data, forming a basis for predictive modelling and a rigorous way to define hypotheses for structural equation modelling (Kinderman et al 2015, Bacciu et al 2013)
    • similarity networks which map business questions into an informative data structure for case-based reasoning (Ruiz et al 2013)
    • efficient methods for Boolean rule extraction (i.e. with low-order predicates) from noisy data
    • interpretation of SVMs (Van Belle et al 2016)
  • application of second-generation statistics and structural equation modelling for market segmentation. Latent class analysis (LCA) has been found to be superior to CHAID and k-means cluster analysis (Hagenaars and McCutcheon, 2009), the techniques generally regarded as the industry gold standard. Using LCA, in a data-driven manner and based on statistical indicators, we can establish the number of market segments, the variables that are significant for segmentation, the size of each segment, and the probability that each observation belongs to each identified segment. The results can serve as a training set for machine learning, so that consumer profiles can be assessed independently and the resources needed for targeting and relationship maximisation allocated accordingly. The method is robust and can use nominal, ordinal and numerical data simultaneously.
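
As a rough illustration of the latent class idea described above, the following sketch fits a latent class model to binary indicators with the EM algorithm and compares candidate numbers of segments by BIC. It is a minimal example under stated assumptions, not the Centre's implementation: the function name `fit_lca`, the synthetic survey data and the choice of BIC as the statistical indicator are all hypothetical, and real applications would also handle nominal, ordinal and numerical variables.

```python
import numpy as np

def fit_lca(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary data X (n_obs x n_items) by EM.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_items = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)          # segment sizes
    theta = rng.uniform(0.25, 0.75, (n_classes, n_items))  # P(item = 1 | segment)

    def e_step(weights, theta):
        # log P(x_i, segment k) for every observation and segment
        log_joint = (X @ np.log(theta).T
                     + (1 - X) @ np.log(1 - theta).T
                     + np.log(weights))
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        return np.exp(log_joint - log_norm), float(log_norm.sum())

    for _ in range(n_iter):
        post, _ = e_step(weights, theta)
        # M-step: re-estimate segment sizes and item probabilities
        weights = np.clip(post.mean(axis=0), 1e-12, None)
        theta = (post.T @ X) / (post.sum(axis=0)[:, None] + 1e-12)
        theta = np.clip(theta, 1e-6, 1 - 1e-6)

    post, loglik = e_step(weights, theta)
    n_params = (n_classes - 1) + n_classes * n_items
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    return {"weights": weights, "theta": theta,
            "posterior": post, "loglik": loglik, "bic": bic}

# Synthetic data (assumed for illustration): two segments, six yes/no items
rng = np.random.default_rng(1)
true_segment = rng.integers(0, 2, size=400)
true_theta = np.array([[0.9] * 6, [0.1] * 6])
X = (rng.random((400, 6)) < true_theta[true_segment]).astype(float)

# Fit models with 1 to 3 segments and keep the one with the lowest BIC
fits = {k: fit_lca(X, k) for k in (1, 2, 3)}
best_k = min(fits, key=lambda k: fits[k]["bic"])
membership = fits[best_k]["posterior"]  # P(segment | observation)
print("BIC by number of segments:", {k: round(f["bic"], 1) for k, f in fits.items()})
print("Chosen number of segments:", best_k)
```

Each row of `membership` gives the probability that an observation belongs to each segment, and `weights` gives the estimated segment sizes, which correspond to the quantities listed in the paragraph above.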

Key publications:

Kinderman P, Tai S, Pontin E, Schwannauer M, Jarman I, Lisboa P. 2015. Causal and mediating factors for anxiety, depression and well-being. British Journal of Psychiatry, 206: 456-460.

Bacciu D, Etchells TA, Lisboa PJG, Whittaker J. 2013. Efficient identification of independence networks using mutual information. Computational Statistics, 28: 621-646.

Ruiz H, Etchells TA, Jarman IH, Martín JD, Lisboa PJG. 2013. A principled approach to network-based classification and data representation. Neurocomputing, 112: 79-91.

Lisboa PJG, Ellis IO, Green AR, Ambrogi F, Dias MB. 2008. Cluster-based visualisation with scatter matrices. Pattern Recognition Letters, 29: 1814-1823.

Romero E, Mu T, Lisboa PJG. 2012. Cohort-based kernel visualisation with scatter matrices. Pattern Recognition, 45: 1436-1454.

Hagenaars, J. A. and McCutcheon, A. L. (2009). Applied Latent Class Analysis. Cambridge: Cambridge University Press.

Van Belle, V., Van Calster, B., Van Huffel, S., Suykens, J.A.K. and Lisboa, P.J.G. 2016. Explaining support vector machines: a colour based nomogram. PLOS ONE, 11(10):01.

Digital marketing

The explosion of data over the past few years has created a major problem for commercial organisations. The most dramatic impact has been within the area of digital marketing as companies are now able to track and monitor online customer activity – all of which is analysed in digital form. This is popularly called a "digital footprint".

We provide expertise in advanced methods for data analysis to deliver the following capability:

  • strategic and operational support for SMEs in the digital sector and more broadly, to embed digital analytics in new products and services
  • delivery of Continual Professional Development (CPD) on data analytics, algorithms and big data programming
  • training graduates and postgraduates with a strong complement of data analysis and transferable skills

This expertise is founded on long-term relationships with major multinationals including the development, implementation and validation of personalised recommender systems for retail (Li et al 2009, Dias et al 2008, Li et al 2007).

Key publications:

Li M, Dias B, Jarman I, El-Deredy W, Lisboa PJG. 2009. Grocery shopping recommendations based on basket-sensitive random walk. KDD-09: 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 1215-1223.

Dias MB, Locher D, Li M, El-Deredy W, Lisboa PJG. 2008. The value of personalised recommender systems to e-business: a case study. RecSys'08: Proceedings of the 2008 ACM Conference on Recommender Systems: 291-294.

Li M, Dias B, El-Deredy W, Lisboa PJG. 2007. A probabilistic model for item-based recommender systems. RecSys'07: Proceedings of the 2007 ACM Conference on Recommender Systems: 129-132.

Profiling and inference in public health

Risk modelling for commissioning in health and social care has involved the development of bespoke conditional independence maps to estimate the level of individual interventions required in a multi-sectoral plan to achieve desired changes in outcome variables. This is one of the few tools available for the design of multi-sectoral interventions, and it is currently under consideration by Public Health England.

A closely related application with the Merseyside Fire and Rescue Service (MFRS) produced a risk model for Accidental Dwelling Fires, which is used by MFRS to prioritise domestic dwellings for visits by the Fire Service with the aim of fire prevention and to identify referral needs to other support services (Taylor et al 2016).
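
The conditional independence maps mentioned above rest on one basic test: two variables are left unconnected when they carry no information about each other once a third variable is known. The sketch below illustrates that test with mutual information computed from an exact joint probability table. It is only an assumption-laden toy example: the function names and the three binary variables are hypothetical, and in practice the quantities would be estimated from data rather than from a known table.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats from a joint probability table p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

def conditional_mutual_information(p_xyz):
    """I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z=z), from p_xyz[x, y, z]."""
    total = 0.0
    for z in range(p_xyz.shape[2]):
        p_z = p_xyz[:, :, z].sum()
        if p_z > 0:
            total += p_z * mutual_information(p_xyz[:, :, z] / p_z)
    return total

# Hypothetical example: Z drives both X and Y, which do not interact directly
p_z = np.array([0.5, 0.5])
p_x_given_z = np.array([[0.8, 0.2],   # rows: x, columns: z
                        [0.2, 0.8]])
p_y_given_z = np.array([[0.8, 0.2],   # rows: y, columns: z
                        [0.2, 0.8]])
p_xyz = np.einsum("z,xz,yz->xyz", p_z, p_x_given_z, p_y_given_z)

marginal_mi = mutual_information(p_xyz.sum(axis=2))
conditional_mi = conditional_mutual_information(p_xyz)
print("I(X;Y)   =", marginal_mi)
print("I(X;Y|Z) =", conditional_mi)
```

Here X and Y are associated marginally, but the association vanishes once Z is known, so a conditional independence map would link each of them to Z and draw no direct edge between X and Y.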

Key publications:

Taylor M, Higgins E, Lisboa P, Jarman I, Hussain A. 2016. Community fire prevention via population segmentation modelling. Community Development Journal, 51: 229-247.

Clinical decision support

We have focused on two specific specialist domains in oncology. The development and application of flexible models of survival enables the detailed modelling of event rates, or hazard functions, which are useful both to gain insights into response to therapy and to better inform the choice of adjuvant therapy. Uniquely, we provide interpretable models for support vector machines (Van Belle and Lisboa 2014).
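
To make the notion of a hazard function concrete, the sketch below computes the standard nonparametric discrete-time hazard (events divided by patients still at risk) and the corresponding Kaplan-Meier survival curve. This is a textbook estimator used here for illustration, not the flexible survival models referred to above; the function name and the follow-up data are hypothetical.

```python
import numpy as np

def discrete_hazard(times, events):
    """Return event times, hazard h(t) = d_t / n_t and Kaplan-Meier survival.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    times = np.asarray(times)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    # n_t: patients still under observation just before time t
    at_risk = np.array([(times >= t).sum() for t in event_times])
    # d_t: events observed exactly at time t
    deaths = np.array([((times == t) & events).sum() for t in event_times])
    hazard = deaths / at_risk
    survival = np.cumprod(1.0 - hazard)
    return event_times, hazard, survival

# Hypothetical follow-up times in months; 0 marks a censored patient
follow_up = [2, 3, 3, 5, 6, 8]
observed = [1, 1, 0, 1, 0, 1]
event_times, hazard, survival = discrete_hazard(follow_up, observed)
for t, h, s in zip(event_times, hazard, survival):
    print(f"t={t}: hazard={h:.3f}, survival={s:.3f}")
```

Censored patients contribute to the number at risk until they leave the study but never count as events, which is why the hazard at each time is a ratio over those still being followed.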

In parallel, we introduced source signal separation as a methodology for tissue identification from Magnetic Resonance Spectra (MRS) in brain oncology. This research is closely related to practical clinical interfaces developed by our collaborators at the Universitat Autònoma de Barcelona and the Universitat Politècnica de Catalunya and is currently funded by a Marie Curie Fellowship (Delgado-Goñi et al 2016, Ortega-Martorell et al 2013, 2012a,b).
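
The idea of source separation for spectra can be sketched with non-negative matrix factorisation (NMF): each measured spectrum is modelled as a non-negative mixture of a few non-negative source spectra. The example below uses plain NMF with multiplicative updates on synthetic data. Note the assumptions: the cited papers use convex and semi-supervised NMF variants, which this sketch does not implement, and the function name, the Gaussian-shaped "sources" and the mixing weights are all invented for illustration.

```python
import numpy as np

def nmf(V, n_sources, n_iter=500, seed=0, eps=1e-12):
    """Factorise non-negative V (n_spectra x n_points) as W @ H with W, H >= 0.

    Uses multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_sources)) + eps  # mixing weights per spectrum
    H = rng.random((n_sources, V.shape[1])) + eps  # source spectra
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Synthetic "spectra": mixtures of two non-negative peaks (assumed shapes)
axis = np.linspace(0.0, 1.0, 50)
sources = np.vstack([np.exp(-((axis - 0.3) / 0.05) ** 2),
                     np.exp(-((axis - 0.7) / 0.05) ** 2)])
mixing = np.random.default_rng(2).random((20, 2))
V = mixing @ sources

W, H, errors = nmf(V, n_sources=2)
print("reconstruction error after first and last iteration:",
      errors[0], errors[-1])
```

The rows of `H` play the role of tissue-specific source spectra and the rows of `W` give the contribution of each source to each measured spectrum; the non-negativity constraint is what makes both directly interpretable.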

Key publications:

Van Belle V, Lisboa P. 2014. White box radial basis function classifiers with component selection for clinical prediction models. Artificial Intelligence in Medicine, 60:53-64.

Delgado-Goñi T, Ortega-Martorell S, Ciezka M, Olier I, Candiota AP, Julià-Sapé M, Fernández F, Pumarola M, Lisboa PJ, Arús C. 2016. MRSI-based molecular imaging of therapy response to temozolomide in preclinical glioblastoma using source analysis. NMR in Biomedicine, 29:732-743.

Ortega-Martorell S, Ruiz H, Vellido A, Olier I, Romero E, Julia-Sape M, Martin JD, Jarman IH, Arus C, Lisboa PJG. 2013. A novel semi-supervised methodology for extracting tumor type-specific MRS sources in human brain data. PLOS ONE, 8.

Ortega-Martorell S, Lisboa PJ, Vellido A, Simões RV, Pumarola M, Julià-Sapé M, Arús C. 2012a. Convex non-negative matrix factorization for brain tumor delimitation from MRSI data. PLOS ONE, 7:e47824.

Ortega-Martorell S, Lisboa PJ, Vellido A, Julià-Sapé M, Arús C. 2012b. Non-negative matrix factorisation methods for the spectral decomposition of MRS data from human brain tumours. BMC Bioinformatics, 13:38.

Human resources (HR) analytics

HR analytics provides an increased opportunity for HR professionals to contribute strategically to their businesses through the linking of employee insight data to organisational level performance outcomes. However, the HR function faces challenges that go beyond the immediate technical and IT investments that become necessary, or management and cultural concerns.

The Data Science Research Centre provides expertise, new tools and methods for analysing HR data such as employee engagement, training, recruitment, retention and EDI, with the aim of providing employee insight that can help define the link between people practices and performance.

Key publications:

Sparrow, P. R., Otaye-Ebede, L. E. and Chen, P., The HR Analytics Journey: Is The HR Architecture Fit For Purpose? Presented at the CIPD conference for applied research 2015, London, December, 2015.

Otaye-Ebede, L. (In press). Antecedents and Outcomes of Managing Diversity in a UK Context: Test of a Mediation Model. International Journal of Human Resource Management.

Guillaume, Y. R. F., Dawson, J. F., Otaye-Ebede, L., Woods, S. A., and West, M. A. (2015). Harnessing demographic differences in organizations: what moderates the effects of workplace diversity? Journal of Organizational Behavior. 10.1002/job.2040

Otaye, L., and Wong, W. (2014). Mapping the contours of fairness: The impact of unfairness and leadership (in) action on job satisfaction, turnover intention and employer advocacy. Journal of Organizational Effectiveness: People and Performance, 1(2), 191-204.

Aryee, S., Walumbwa, F. O., Seidu, E. Y., and Otaye, L. E. (2013). Developing and Leveraging Human Capital Resource to Promote Service Quality: Testing a Theory of Performance. Journal of Management. 10.1177/0149206312471394.

Aryee, S., Walumbwa, F. O., Seidu, E. Y., and Otaye, L. E. (2012). Impact of high-performance work systems on individual- and branch-level performance: test of a multilevel model of intermediate linkages. Journal of Applied Psychology, 97(2), 287.

Sport analytics

We provide analytical expertise at operational and strategic levels to the Football Exchange in the Research Institute for Sport and Exercise Sciences at LJMU and to the Advisory Board for the Performance Lab at Prozone, a member of the leading sports data and technology company STATS.

A recent Knowledge Transfer Partnership grant with Prozone introduced the first multivariate model for automatic identification of playing style from annotated football match data. This model is currently undergoing evaluation by coaching staff from the English Premier League (EPL). Further, a quantitative model of player performance was also developed which lays a solid foundation for player ranking using data of a granularity never before possible for elite football. This project won the Educate North Award for Commercial Engagement.

Moreover, high-performance sport increasingly relies on measurement data to monitor and profile players not only by performance but also by risk of injury. Analytical risk models make use of structure-finding algorithms combined with rigorous failure time models and advanced visualisation methods, in order to systematise the analysis of complex databases and to make this available to sports specialists in a way that is informative and readily understood. A co-sponsored PhD project with a major EPL team has been secured and is currently recruiting (Nedergaard et al 2016, Datson et al 2016).

Key publications:

Datson N, Drust B, Weston M, Jarman I, Lisboa P, Gregson W. 2016. Match physical performance of elite female soccer players during international competition. Journal of Strength and Conditioning Research/National Strength and Conditioning Association.

Nedergaard NJ, Robinson MA, Eusterwiemann E, Drust B, Lisboa PJ, Vanrenterghem J. 2016. The relationship between whole-body external loading and body-worn accelerometry during team sports movements. International Journal of Sports Physiology and Performance.



Publications written by members of the Data Science Research Centre:

  • A Data Science and Machine Learning Approach to Measure and Monitor Physical Activity in Children

    Fergus P, Hussain A, Al-Jumeily D, Kaky A and Lunn J

    Publish date: 2017

  • A machine learning approach to measure and monitor physical activity in children

    Fergus P, Hussain AJ, Hearty J, Fairclough S, Boddy L, Mackintosh K, Stratton G, Ridgers N, Al-Jumeily D, Aljaaf AJ and Lunn J

    Publish date: 2017

  • A performance evaluation of systematic analysis for combining multi-class models for sickle cell disorder data sets

    Khalaf M, Hussain AJ, Al-Jumeily D, Keight R, Keenan R, Al Kafri AS, Chalmers C, Fergus P and Idowu IO

    Publish date: 2017

Contact us

If you need any further information about the Data Science Research Centre, please contact Professor Paulo Lisboa.