Our research addresses the following questions:
(1) How to learn meaningful representations of high-dimensional data. Representation learning is an emerging technique, largely based on unsupervised learning, that captures the internal structure of the data. The output is a new (usually lower-dimensional) representation that makes downstream machine learning tasks more effective.
(2) How to identify association and causality using data mining techniques. Identifying association and causality is a long-standing problem in statistics with broad applications in genomics and precision medicine. A specific focus of our research is identifying them stably in the presence of noise and complicated correlation structures.
(3) How to carry out statistical inference on multi-scale omics data (e.g., single-cell sequencing). Integrating multiple layers of data is an increasingly prominent direction in statistical inference. We focus on how to integrate omics data, including genomics, transcriptomics, epigenomics, and other omics layers, to form predictors. In particular, we are interested in analyzing single-cell omics data to infer within-tissue and within-individual dynamics at single-cell resolution.
My current research focuses on developing statistical methods to
- analyze lifetime data involving latent processes in which the underlying disease may resolve and some covariates are incompletely observed or subject to misclassification, so as to account for patient heterogeneity and avoid biased estimates and invalid inference,
- construct joint models for classification and prediction based on mixed measurements involving surrogate classifiers or observations subject to measurement error, in order to achieve higher accuracy and precision in subgroup attribution or diagnostic testing,
- conduct causal inference using advanced statistical learning methods that address missing and/or misclassified confounders, in order to produce unbiased estimates of treatment effects,
- devise advanced, adaptive methods for variable selection and group-variable selection in recurrent event analysis and survival analysis, and investigate their oracle properties, and
- jointly model longitudinal and survival data, or multivariate lifetime data, and propose computationally efficient methods for algorithm implementation and statistical inference.
I am keen on supporting medical research through transdisciplinary partnerships. My collaborators include epidemiologists, oncologists, radiologists, medical physicists, gastroenterologists, cardiologists, and rheumatologists. Researchers in other areas are also welcome to contact me about prospective collaborations.
I am interested in working with students at both the graduate and undergraduate levels. Students with a good work ethic, strong research interests, and a solid background in statistics, biostatistics, applied mathematics, computer science, or related areas are welcome to inquire about graduate studies or postdoctoral positions.