A method for statistical learning in large databases of heterogeneous imaging, cognitive and behavioral data - 07/05/18

Doi : 10.1016/j.respe.2018.03.306

L. Antelmi ^a,^⁎ , M. Lorenzi ^a, V. Manera ^b,^c, P. Robert ^c,^d,^e, N. Ayache ^a
^a UCA, Inria Sophia Antipolis, Epione Research Project, Sophia Antipolis, France
^b UCA, Inria Sophia Antipolis, Stars Research Project, Sophia Antipolis, France
^c CoBTeK, université de Nice Sophia Antipolis, Nice, France
^d Centre Mémoire, Nice, France
^e CHU de Nice, Nice, France

^⁎Corresponding author.

Abstract

Introduction

The aim of this study is to develop a generative, probabilistic statistical learning model for the joint analysis of heterogeneous biomedical data. The model will be applied to the investigation of neurological disorders from collections of brain imaging, body sensor, biological and clinical data available in current large-scale health databases. The resulting methodological framework will be tested on the UK Biobank, as well as on pathology-specific clinical data, as provided by the ADNI or INSIGHT initiatives.

Methods

We propose a variational approximation of Bayesian Canonical Correlation Analysis (CCA). The proposed formulation is inspired by recent advances in variational learning and offers the potential to scale to high-dimensional observations, such as medical images and arrays of biological data. We showed that the variational lower bound can be optimized through modern learning libraries such as Torch and TensorFlow.
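
To make the notion of a variational lower bound concrete, the block below sketches it for a two-view latent variable model. The generative model shown is the standard probabilistic CCA formulation and is an assumption here, as are the symbols z, x_c, W_c, mu_c, Psi_c and the approximate posterior q; the abstract does not specify the exact model or bound used.

```latex
% Assumed generative model (standard probabilistic CCA; not confirmed by the abstract):
% a shared latent variable z generates both views x_1 and x_2.
z \sim \mathcal{N}(0, I), \qquad
x_c \mid z \sim \mathcal{N}(W_c z + \mu_c, \Psi_c), \quad c \in \{1, 2\}

% Evidence lower bound for an approximate posterior q(z | x_1, x_2),
% using the conditional independence of the views given z:
\log p(x_1, x_2) \;\ge\;
\mathbb{E}_{q(z \mid x_1, x_2)}\!\left[ \log p(x_1 \mid z) + \log p(x_2 \mid z) \right]
- \mathrm{KL}\!\left( q(z \mid x_1, x_2) \,\|\, p(z) \right)
```

When q is Gaussian, the KL term has a closed form and the expectation can be estimated by sampling, which is what makes the bound amenable to gradient-based optimization in automatic differentiation libraries.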

Results

So far, we have benchmarked the method against classical CCA on synthetic data and on a classical machine learning benchmark (the Iris dataset). On the synthetic dataset (Fig. 1A), we observed strong agreement between the score components computed with classical CCA and with our method. Moreover, the classification results on Iris showed that the two methods provide essentially the same latent representation (Fig. 1B).
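
As an illustration of the classical CCA baseline used for comparison, the following minimal sketch computes canonical directions and score components on synthetic two-view data. The data-generating process, dimensions, noise level, and the function name `classical_cca` are assumptions chosen for illustration; they are not the authors' experimental setup.

```python
import numpy as np

def classical_cca(X, Y, n_components, reg=1e-8):
    """Classical CCA via whitening and SVD (illustrative helper).

    Returns projection matrices A, B and the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariances, with a small ridge term for numerical stability
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    # Map singular vectors back to the original coordinates
    A = np.linalg.solve(Lx.T, U[:, :n_components])
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return A, B, s[:n_components]

# Synthetic two-view data driven by a shared 2-D latent variable (assumed setup)
rng = np.random.default_rng(0)
n, latent_dim = 500, 2
Z = rng.standard_normal((n, latent_dim))
Wx = rng.standard_normal((latent_dim, 5))
Wy = rng.standard_normal((latent_dim, 4))
X = Z @ Wx + 0.1 * rng.standard_normal((n, 5))
Y = Z @ Wy + 0.1 * rng.standard_normal((n, 4))

A, B, corrs = classical_cca(X, Y, n_components=2)
scores_x = (X - X.mean(axis=0)) @ A  # score components for view X
scores_y = (Y - Y.mean(axis=0)) @ B  # score components for view Y
print("canonical correlations:", corrs)
```

The score components `scores_x` and `scores_y` are the quantities one would compare against the latent representation of a probabilistic model when checking agreement between the two approaches.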

Conclusion

Our method shows promising results for future application to medical data. The method is computationally efficient and scalable, and hence able to process complex multivariate, multidimensional datasets. We expect to highlight meaningful relationships among biomarkers that could be used to develop optimal strategies for disease classification, quantification, and prediction. In the future, the proposed approach will be tested in several experimental settings:

– classification/stratification;

– prediction and imputation from a set of observed data (e.g., predict biological and clinical output from medical imaging information).
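
The prediction setting above can be illustrated with a simple linear Gaussian baseline: if two views are jointly Gaussian, the expected value of one view given the other is a linear function of the observed view. The sketch below uses this plain conditional mean, not the authors' variational model; the synthetic data, the train/test split, and the helper names `fit_gaussian_predictor` and `predict` are all assumptions for illustration.

```python
import numpy as np

def fit_gaussian_predictor(X, Y, reg=1e-6):
    """Fit E[y | x] = mu_y + (x - mu_x) Sxx^{-1} Sxy under a joint Gaussian assumption."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    coef = np.linalg.solve(Sxx, Sxy)  # shape: (dim_x, dim_y)
    return mu_x, mu_y, coef

def predict(X_new, mu_x, mu_y, coef):
    """Conditional mean of the unobserved view given the observed one."""
    return mu_y + (X_new - mu_x) @ coef

# Synthetic two-view data sharing a latent variable (assumed setup)
rng = np.random.default_rng(1)
n, latent_dim = 600, 2
Z = rng.standard_normal((n, latent_dim))
X = Z @ rng.standard_normal((latent_dim, 5)) + 0.1 * rng.standard_normal((n, 5))
Y = Z @ rng.standard_normal((latent_dim, 4)) + 0.1 * rng.standard_normal((n, 4))

# Fit on the first 400 samples, predict the held-out view for the rest
X_train, Y_train = X[:400], Y[:400]
X_test, Y_test = X[400:], Y[400:]
mu_x, mu_y, coef = fit_gaussian_predictor(X_train, Y_train)
Y_pred = predict(X_test, mu_x, mu_y, coef)

# Pooled coefficient of determination on held-out data
ss_res = np.sum((Y_test - Y_pred) ** 2)
ss_tot = np.sum((Y_test - Y_train.mean(axis=0)) ** 2)
r2 = 1.0 - ss_res / ss_tot
print("held-out R^2:", r2)
```

A latent variable model would make the same kind of prediction by first inferring the shared latent variable from the observed view and then decoding the missing one.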

The full text of this article is available in PDF.

Keywords: CCA, Statistical learning


Vol 66 - No. S3

P. S180 - May 2018
