Statistics

Research Topics

  • Statistical estimation and inference
  • Likelihood methods
  • Minimum distance methods
  • Bayesian methods
  • Methods for high-dimensional/big data
  • Statistical learning
  • Data mining and machine learning
  • Classification and clustering
  • Dimension reduction and feature selection
  • Pattern recognition
  • Experimental design
  • Reliability and survival analysis
  • Extreme-value analysis
  • Stochastic processes
  • Statistical engineering
  • Financial statistics
  • Industrial statistics

Research Spotlight: Dr. Thierry Chekouo


My research interests are in developing new statistical frameworks for analyzing datasets characterized by high dimensionality and complex structure. I am particularly interested in developing novel Bayesian methodologies motivated by real problems in integromics, imaging genetics and genomics. Many of these methods incorporate biological and other external knowledge through prior distributions.

A. Development of innovative Bayesian statistical methods for biclustering.

I have developed innovative statistical methods for biclustering, which aims to cluster the rows and columns of a data matrix simultaneously. In (A1), we proposed a Bayesian biclustering model that, when applied to gene expression data, incorporates gene-gene relationships (using gene ontologies) through prior distributions. We developed a hybrid MCMC (Markov chain Monte Carlo) procedure that mixes the Metropolis–Hastings sampler with a variant of the Wang–Landau algorithm, and we provided a theoretical proof of the convergence of this algorithm. In (A2), we shed light on the statistical models underlying existing biclustering algorithms. It turns out that most of the known techniques have a hidden Bayesian flavor. We then proposed a Bayesian biclustering model that controls the degree of overlap between biclusters. We applied our methods to gene expression databases with the aim of confirming known, or finding novel, subnetworks of proteins/genes associated with disease.
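For readers new to the plaid model that the Gibbs-plaid model (A1) builds on, the sketch below simulates a data matrix from the plaid mean structure of Lazzeroni and Owen: Y_ij = mu_0 + sum over k of (mu_k + alpha_ik + beta_jk) * rho_ik * kappa_jk + noise, where rho_ik and kappa_jk indicate whether row i and column j belong to bicluster k. It is only meant to show how overlapping biclusters add their effects; it does not implement the gene-ontology prior or the hybrid Metropolis–Hastings/Wang–Landau sampler described above. The function name, the layer dictionary format and the toy values are hypothetical.

```python
import random


def simulate_plaid(n_rows, n_cols, layers, mu0=0.0, noise_sd=0.0, seed=0):
    """Simulate a matrix from a plaid-style additive bicluster model (sketch).

    Each layer (bicluster) k is a dict with hypothetical keys:
      "rows":  set of row indices i with rho_ik = 1
      "cols":  set of column indices j with kappa_jk = 1
      "mu":    layer mean mu_k
      "alpha": optional dict {row: alpha_ik}
      "beta":  optional dict {col: beta_jk}
    """
    rng = random.Random(seed)
    # Background level mu0 shared by every cell.
    Y = [[mu0] * n_cols for _ in range(n_rows)]
    for layer in layers:
        alpha = layer.get("alpha", {})
        beta = layer.get("beta", {})
        for i in layer["rows"]:
            for j in layer["cols"]:
                # Cells covered by several biclusters accumulate every layer's effect.
                Y[i][j] += layer["mu"] + alpha.get(i, 0.0) + beta.get(j, 0.0)
    if noise_sd > 0:
        # Optional Gaussian noise epsilon_ij added to every cell.
        for row in Y:
            for j in range(n_cols):
                row[j] += rng.gauss(0.0, noise_sd)
    return Y


# Two overlapping biclusters on a 5 x 4 matrix, without noise.
layers = [
    {"rows": {0, 1, 2}, "cols": {0, 1}, "mu": 2.0},
    {"rows": {2, 3}, "cols": {1, 2}, "mu": -1.0, "beta": {2: 0.5}},
]
Y = simulate_plaid(5, 4, layers)
print(Y[2][1])  # prints 1.0: row 2, column 1 lies in both biclusters (2.0 - 1.0)
```

Allowing a cell to belong to several layers is what distinguishes plaid-type biclustering from partitioning methods, and it is why controlling the amount of overlap, as in (A2), becomes a modeling question.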

(A1) Chekouo T, Murua A and Raffelsberger W. The Gibbs-plaid biclustering model. The Annals of Applied Statistics. 2016; 9:1643–1670.

(A2) Chekouo T and Murua A. The penalized biclustering model and related algorithms. Journal of Applied Statistics. 2015; 42(6):1255–1277.
 

B. Development of innovative statistical methods for integration of multiplatform -omic data.

I have developed innovative statistical methods for jointly analyzing multiple types of genomic data, such as mRNA expression, DNA methylation and miRNA expression. These methods can identify a small set of prognostic markers that are associated with clinical outcomes (e.g. survival data). In (B1), our approach is built in a Bayesian framework and incorporates the complex dependence between data types through prior distributions. In (B2), the novel integrative Bayesian approach fully exploits the information available across platforms and does not exclude any subjects from the analysis. By applying our methods to kidney cancer data, we were able to identify and validate some biomarkers that are predictive of the disease.
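One common way to let a prior encode known dependence between data types is a Markov random field (Ising-type) prior on the binary indicators that mark which features are selected. The sketch below illustrates that general idea only; it is not the exact prior used in (B1) or (B2). The function name, the hyperparameters a and b, and the three-feature miRNA/gene network are hypothetical.

```python
import math


def mrf_log_prior(gamma, edges, a=-2.0, b=1.0):
    """Unnormalized log prior of an Ising-type Markov random field prior.

    gamma: list of 0/1 inclusion indicators, one per feature
    edges: pairs (i, j) of features linked in an external network
    log p(gamma) = a * sum_j gamma_j + b * sum_{(i, j) in edges} gamma_i * gamma_j + const
    a < 0 encourages sparse selections; b > 0 rewards selecting linked features.
    """
    sparsity_term = a * sum(gamma)
    network_term = b * sum(gamma[i] * gamma[j] for i, j in edges)
    return sparsity_term + network_term


# Toy network (hypothetical): feature 0 is a miRNA, feature 1 a gene it is
# predicted to target, feature 2 an unrelated gene.
edges = [(0, 1)]
lp_linked = mrf_log_prior([1, 1, 0], edges)    # selects the miRNA and its target
lp_unlinked = mrf_log_prior([1, 0, 1], edges)  # selects the miRNA and an unrelated gene
prior_odds = math.exp(lp_linked - lp_unlinked)
print(lp_linked, lp_unlinked, round(prior_odds, 3))
```

With these assumed values the prior favors the linked selection over an equally sized unlinked one by a factor of e (about 2.72). In a full model this prior would be combined with the likelihood of the genomic and clinical data, for instance inside an MCMC sampler over the indicators.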

(B1) Chekouo T, Stingo FC, Doecke JD and Do KA. miRNA-target gene regulatory networks: A Bayesian integrative approach to biomarker selection with application to kidney cancer. Biometrics. 2015 Jun;71(2):428–438. PubMed PMID: 25639276; PubMed Central PMCID: PMC4499566.

(B2) Chekouo T, Stingo FC, Doecke JD and Do KA. A Bayesian integrative approach for multiplatform genomic data: A kidney cancer case study. Biometrics. 2017 Jun;73(2):615–624. PubMed PMID: 27669160.
 

C. Development of innovative statistical methods for imaging-genetics.

I have developed an innovative statistical method for jointly analyzing imaging and genetic data. In (C1), we propose an integrative Bayesian risk prediction model that combines single nucleotide polymorphism (SNP) arrays and functional magnetic resonance imaging (fMRI) data. By incorporating the dependence between imaging and genetic data, the method discriminates between individuals with schizophrenia and healthy controls on the basis of a sparse set of discriminatory regions of interest and SNPs. In terms of both prediction and feature selection, we found that our approach outperforms competing methods that do not use the fMRI-SNP dependence when selecting discriminatory markers.

(C1) Chekouo T, Stingo F, Guindani M and Do K. A Bayesian predictive model for imaging genetics with application to schizophrenia. The Annals of Applied Statistics. 2016; 10(3):1547–1571.


D. Development of innovative statistical methods for simultaneous clustering and variable selection.

In (D1), inspired by the plaid biclustering model, we proposed a model that performs clustering and variable selection simultaneously. Unlike in conventional clustering, an observation in this model may be explained by several clusters. This characteristic makes it especially suitable for gene expression data, where genes may participate in multiple biological pathways. Parameters are estimated with a Monte Carlo expectation-maximization algorithm combined with importance sampling. Applying our approach to gene expression data from kidney renal clear cell carcinoma validates some previously identified cancer biomarkers.
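A Monte Carlo EM algorithm replaces an E-step expectation that has no closed form with a Monte Carlo average, and importance sampling is one way to compute that average when the relevant conditional distribution can only be evaluated up to a constant. The sketch below is a generic self-normalized importance sampling estimator, not the estimation code of (D1); the toy target N(1, 1), the proposal N(0, 2²) and all function names are assumptions for illustration.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution N(mean, sd^2)."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def snis_expectation(f, log_target, sample_proposal, log_proposal, n, rng):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    log_target only needs to be known up to an additive constant, which is
    the typical situation for latent variables in a Monte Carlo E-step.
    """
    xs = [sample_proposal(rng) for _ in range(n)]
    log_w = [log_target(x) - log_proposal(x) for x in xs]
    shift = max(log_w)  # subtract the largest log-weight to avoid overflow
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)


# Toy check: target N(1, 1) given only up to a constant, proposal N(0, 2^2).
rng = random.Random(42)
estimate = snis_expectation(
    f=lambda x: x,
    log_target=lambda x: -0.5 * (x - 1.0) ** 2,
    sample_proposal=lambda r: r.gauss(0.0, 2.0),
    log_proposal=lambda x: normal_logpdf(x, 0.0, 2.0),
    n=20000,
    rng=rng,
)
print(round(estimate, 2))  # close to the true mean 1.0
```

In an actual MCEM run the expectation would be recomputed at each iteration under the current parameter values. The proposal is deliberately wider than the target (standard deviation 2 versus 1) so that the importance weights stay bounded and the estimate remains stable.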

(D1) Chekouo T and Murua A. High-dimensional variable selection with the plaid mixture model for clustering. Computational Statistics. 2018; 33(3):1475–1496.

 

E. Bayesian approaches for detecting patterns of markers for extremely small sample sizes.

In (E1), we developed an innovative Bayesian approach that efficiently identifies groups of markers sharing similar patterns of biological relevance. Motivated by the availability of ion mobility mass spectrometry data from cell line experiments in myelodysplastic syndromes and acute myeloid leukemia, generated by the "Moon Shots" Program at MD Anderson Cancer Center, our methodology can identify protein markers that follow biologically meaningful trends. Extensive simulation studies demonstrate the good performance of the proposed method even with relatively small treatment effects and sample sizes.

(E1) Chekouo T, Stingo F, Class C, Yan Y, Bohannan Z, Wei Y, Garcia-Manero G, Hanash S and Do K. Investigating protein patterns in human leukemia cell line experiments: A Bayesian approach for extremely small sample sizes. Statistical Methods in Medical Research. 2019. DOI: 10.1177/0962280219852721.
