The overarching goal of our brain imaging and genetics lab is to enable effective mining of big yet imperfect brain imaging and multi-omic data for biomarker discovery, early diagnosis, and exploration of the mechanisms of Alzheimer’s disease and other complex diseases. In particular, we are interested in curating and leveraging various forms of prior knowledge about brain and biological networks to make the most of these imperfect big data.

Here are some of the projects we are currently working on:

Computational Methods to Mine Multi-omic Data for Systems Biology of Complex Diseases

We have been developing new computational methods that integrate large-scale, heterogeneous multi-omic data with rich domain knowledge for better biomarker and association discovery. In particular, we developed a novel biological-knowledge-guided structured sparse learning model, together with large-scale optimization methods, that integrates -omic data and biological networks from multiple sources and discovers -omic modules of heterogeneous biomarkers that accurately predict outcomes of interest. In addition, we coupled multi-task learning with structured sparse association models to jointly learn, across multiple groups, the bi-multivariate associations between imaging phenotypes and -omic features with dense functional connections. The project contributes a new solution framework spanning machine learning, data mining, and network science, and offers new perspectives on how to effectively integrate large-scale, heterogeneous -omic data for the systems biology of complex diseases.
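To give a flavor of network-guided sparse learning, here is a minimal sketch of one generic formulation: a lasso with an added graph-Laplacian penalty that encourages features linked in a prior biological network to receive similar weights. This is not the lab's model. The objective, the function names (`laplacian`, `network_guided_lasso`), the penalty weights, and the toy edge list are illustrative assumptions, and the solver is plain proximal gradient descent (ISTA).

```python
import numpy as np

def laplacian(p, edges):
    """Unnormalized Laplacian L = D - A of an undirected feature network."""
    A = np.zeros((p, p))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def network_guided_lasso(X, y, L, lam1=0.05, lam2=0.01, n_iter=2000):
    """Minimize (1/2n)||y - Xw||^2 + lam2 * w^T L w + lam1 * ||w||_1 by ISTA.
    The Laplacian term equals lam2 * sum over edges (i, j) of (w_i - w_j)^2,
    so features linked in the prior network are pushed toward similar weights."""
    n, p = X.shape
    # Step size = 1 / (Lipschitz constant of the smooth part's gradient).
    lip = np.linalg.norm(X, 2) ** 2 / n + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ (y - X @ w) / n + 2 * lam2 * (L @ w)
        w = soft_threshold(w - step * grad, step * lam1)
    return w

# Toy data (assumption): 20 features, only features 0-2, linked in the network, matter.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[[0, 1, 2]] = [2.0, 1.5, 1.0]
y = X @ w_true + 0.1 * rng.standard_normal(n)
L = laplacian(p, [(0, 1), (1, 2), (5, 6), (6, 7)])
w_hat = network_guided_lasso(X, y, L)
print(np.round(w_hat, 2))
```

The L1 term selects a sparse set of features, while the Laplacian term injects the prior network; richer structured penalties (e.g., group or multi-source network terms) follow the same proximal-gradient pattern with a different proximal step.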

Computational strategies for incompleteness and heterogeneity in multi-omic data

Multi-omics refers to the integrative analysis of multiple types of -omics data (e.g., genotype, gene expression, and protein expression). The growing availability of multi-omic data creates opportunities to discover disease biomarkers across multiple molecular scales and can therefore further our understanding of the underlying disease mechanisms. Despite this great potential, existing multi-omic data collections are mostly incomplete and of heterogeneous types (e.g., continuous and categorical variables). Integrating these data for joint analysis typically requires excluding many subjects with missing values; as a consequence, a large portion of the data goes unused. This project aims to develop new classes of computational methods that enable the joint mining of incomplete and heterogeneous multi-omic data by leveraging various biological networks to discover functionally connected biomarkers. It will provide new perspectives on handling incompleteness and heterogeneity in multi-omic data, and thereby allow biomedical researchers to gain more insight from rapidly growing yet imperfect biomedical data.
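As a toy illustration of why complete-case analysis wastes data, the sketch below counts how many subjects survive when only those measured on every modality are kept. The modality names and subject counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical example: subject IDs for which each -omic modality was measured.
available = {
    "genotype": set(range(0, 1000)),             # 1000 subjects genotyped
    "gene_expression": set(range(0, 600)),       # 600 with gene expression
    "protein_expression": set(range(400, 900)),  # 500 with protein expression
}

all_subjects = set().union(*available.values())
# Complete-case analysis keeps only subjects measured on every modality.
complete_cases = set.intersection(*available.values())

print(f"subjects with any data:  {len(all_subjects)}")
print(f"complete cases retained: {len(complete_cases)}")
for name, ids in available.items():
    unused = len(ids - complete_cases)
    print(f"{name}: {unused} of {len(ids)} measured subjects discarded")
```

In this made-up setting only 200 of 1000 subjects remain, and 800 of the 1000 genotype profiles are thrown away, which is the kind of loss that methods for incomplete data aim to avoid.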

Gene co-expression underlying the connectomic alterations in Alzheimer’s disease

At the macroscale, a brain connectome is typically represented as a network whose nodes are brain regions of interest (ROIs) and whose links indicate functional or structural connections between them. Both functional and structural brain network architectures are heritable and have been found to be disrupted in Alzheimer’s disease (AD) or its prodromal stage. The recent availability of brain-wide transcriptome data has made possible another type of brain connectome, the brain co-expression network, which captures spatial variation in gene expression, with links representing transcriptional coupling between ROIs. Some studies have shown that co-expression networks are closely related to structural and functional brain networks. However, the genes that drive this relationship remain unknown. Identifying these genes would transform our understanding of the biological underpinnings of the altered neural system in AD and could have a major impact on the development of new diagnostic, therapeutic, and preventive approaches for AD. Yet the complexity of network data presents critical computational challenges that require new concepts and enabling approaches. To address these challenges, we propose novel integrative approaches to identify the genes underlying the association between co-expression networks and AD-altered networks. By leveraging brain-wide transcriptome data, we will learn a small set of genes whose co-expression patterns across ROIs best explain the altered connections among those ROIs in AD.
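One way to make this idea concrete: given a gene-by-ROI expression matrix, a co-expression network restricted to a candidate gene set can be built by correlating ROI expression profiles, and its agreement with a connectivity network can be scored by correlating the two networks' edge weights. The sketch below assumes Pearson correlation for both steps and uses synthetic data; the function names are hypothetical, and the search over gene subsets that the project would actually require (e.g., via sparse learning) is not shown.

```python
import numpy as np

def coexpression_network(expr, genes):
    """ROI-by-ROI co-expression network from a (genes x ROIs) expression matrix,
    using only the selected genes: entry (a, b) is the Pearson correlation of
    ROI a's and ROI b's expression profiles over those genes."""
    return np.corrcoef(expr[genes, :], rowvar=False)

def edge_similarity(A, B):
    """Pearson correlation between the upper-triangular edge weights of two networks."""
    iu = np.triu_indices(A.shape[0], k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]

rng = np.random.default_rng(0)
n_genes, n_rois = 200, 30
expr = rng.standard_normal((n_genes, n_rois))

# Synthetic "altered connectivity" network driven by genes 0-9 (toy assumption).
driver_genes = np.arange(10)
target = coexpression_network(expr, driver_genes)

sim_driver = edge_similarity(coexpression_network(expr, driver_genes), target)
sim_other = edge_similarity(coexpression_network(expr, np.arange(100, 110)), target)
print(f"driver genes: {sim_driver:.3f}, other genes: {sim_other:.3f}")
```

Scoring candidate gene sets this way turns "which genes explain the altered connections" into an optimization over gene subsets; real data would of course require handling noise, covariates, and the combinatorial search.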

Integrative Predictive Modeling of Alzheimer’s Disease

Alzheimer’s disease (AD) is a medical emergency that has, to date, proven impossible to defeat. The ability to accurately predict disease progression in the early symptomatic stages would critically advance our field. Predictive models built around a single salient disease feature (e.g., amyloid deposition) would, by design, offer narrow-scope advances and will invariably fall short of accurate disease modeling and outcome prediction. This project proposes a systems-level, multimodal approach to identify promising imaging-genetics biomarkers that reliably predict cognitive decline at early disease stages. Our long-term research goal is to develop a method for cost-efficient risk assessment and predictive modeling, and to apply it in therapeutic drug development. The overall objective of this project is to develop an integrative predictive framework for mild cognitive impairment (MCI) based on biomarker signatures. Our central hypothesis is that state-of-the-art statistical and topological multimodal data analysis will significantly improve diagnostic and predictive accuracy in MCI and help close knowledge gaps in our understanding of AD pathogenesis. To this end, we propose to characterize the gene expression patterns and neuroimaging endophenotypes of MCI using persistent homology, and to develop a multi-kernel learning framework for diagnostic prediction of MCI and of its progression to AD dementia. The proposed methods offer significant advances over the status quo through state-of-the-art analytic approaches such as persistent homology-based topological surface analysis and a Bayesian multi-kernel learning framework. We will also integrate prior knowledge, augmenting the strengths of data-driven methods with interpretable domain expertise.
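Persistent homology summarizes how the topology of data changes across all thresholds rather than at one arbitrary cutoff. As a minimal sketch of the simplest case, the code below computes the 0-dimensional barcode (connected components) of the Vietoris-Rips filtration of a distance matrix, using the standard fact that components die at the edge lengths of a minimum spanning tree (Kruskal's algorithm with union-find). This is far simpler than the topological surface analysis described above; the function name and the choice of distance (for brain networks, for example, 1 minus the correlation between ROIs) are assumptions for illustration.

```python
import numpy as np

def h0_barcode(D):
    """0-dimensional persistence barcode of the Vietoris-Rips filtration of a
    symmetric distance matrix D. Every point is born at 0; a component dies at
    the edge length that merges it into another component (Kruskal's algorithm)."""
    n = D.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Grow the filtration by adding edges in order of increasing length.
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge merges two components: one of them dies here
            parent[ri] = rj
            deaths.append(float(d))
    # One component persists through the whole filtration.
    return [(0.0, d) for d in deaths] + [(0.0, float("inf"))]

points = np.array([0.0, 1.0, 3.0])
D = np.abs(points[:, None] - points[None, :])
print(h0_barcode(D))  # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

Long bars correspond to well-separated clusters that persist over a wide range of thresholds, which is why barcodes can serve as threshold-free features for downstream predictive models.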

… and more.