Combinatorial inference
Combinatorial inference develops new theory on high-dimensional universality phenomena and combinatorial information-theoretic lower bounds, and proposes methods for global inference on the topological properties of graphs in graphical models and of other combinatorial structures.
The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control
[arXiv] [Package] |
Ranking of Large Language Model with Nonparametric Prompts
[arXiv] |
StarTrek: Combinatorial Variable Selection with False Discovery Rate Control
The Annals of Statistics 52.1: 78-102. 2024 [Journal] [Package] |
Lagrangian Inference for Ranking Problems
Operations Research 71.1: 202-223. 2022 [arXiv] [Journal] [Package] |
Inferring Differential Hub Nodes on Differential Gaussian Graphical Models.
Statistica Sinica. 35(4), 2023. [Journal] |
Combinatorial-Probabilistic Trade-Off: Community Properties Test in the Stochastic Block Models
Conference version: International Conference on Learning Representations (spotlight paper). [Video] Journal version: IEEE Transactions on Information Theory, 2023. [Journal] |
Inference on the optimal assortment in the multinomial logit model
ACM Conference on Economics and Computation, 2023. [arXiv] [Journal] |
Computational and Statistical Tradeoffs in Inferring Combinatorial Structures of Ising Model
International Conference on Machine Learning, pp. 4901-4910. PMLR, 2020. [Journal] |
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
Journal of the American Statistical Association, 116(534), 746-755, 2021 [arXiv][Journal] |
Sketching Method for Large Scale Combinatorial Inference
Advances in Neural Information Processing Systems 31, 10598-10607, 2018 [Journal] |
Combinatorial Inference for Graphical Models
(*: equal contribution) Annals of Statistics, 47(2), pp.795-827, 2018 [arXiv] [Journal] |
Complex Biomedical Data Analysis
Modern statistical methods must handle biomedical datasets with complex structure, including high dimensionality, heterogeneity, heavy-tailedness, and time dependence. Complex data inference aims to develop a new generation of inferential methods, such as hypothesis tests, confidence intervals, and false discovery control, for such datasets.
Electronic Health Records
ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis
Journal of Biomedical Informatics, 2025+ [Package] |
DOME: Directional Medical Embedding Vectors from Electronic Health Records.
Journal of Biomedical Informatics, 162, 104768, 2025 [Journal] |
LATTE: Label-efficient incident phenotyping from longitudinal electronic health records
Patterns, 5(1), 2024. [Journal] [Package] |
Prompt Discriminative Language Models for Domain Adaptation
Proceedings of the 5th Clinical Natural Language Processing Workshop, pp. 247-258. 2023. [Journal] |
Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices
Journal of Machine Learning Research. 24(221), 1-43, 2023. [Journal] [Package] |
Multimodal representation learning for predicting molecule–disease relations
Bioinformatics, 39(2), btad085. 2023. [Journal] |
Penalized estimation of frailty-based illness–death models for semi-competing risks
Biometrics, 79(3), 1657-1669, 2023 [Journal] |
Multiview Incomplete Knowledge Graph Integration with Application to Cross-institutional EHR Data Harmonization
(*: co-senior author) Journal of Biomedical Informatics 133: 104147. 2022. [Journal] |
Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data.
NPJ Digital Medicine 4, no. 1, 151. 2021 [Journal] [Package] |
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
Advances in Neural Information Processing Systems 33: 18967-18977, 2020. [Journal] |
Medical Imaging Data
The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control
[arXiv] [Package] |
Progression of traction bronchiectasis/bronchiolectasis in interstitial lung abnormalities is associated with increased all-cause mortality: Age Gene/Environment Susceptibility-Reykjavik Study.
European Journal of Radiology Open 8, 100334, 2021 [Journal] |
Interstitial lung abnormalities in patients with stage I non-small cell lung cancer are associated with shorter overall survival: the Boston lung cancer study.
Cancer Imaging 21, no. 1, 1-7, 2021 [Journal] |
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
Journal of the American Statistical Association, 116(534), 746-755, 2021 [arXiv][Journal] |
Estimating and inferring the maximum degree of stimulus-locked time-varying brain connectivity networks.
Biometrics. Jun;77(2):379-390, 2020 [Journal] |
Environmental Mobile Health
Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery
Annals of Applied Statistics, to appear. [arXiv] |
Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects
[arXiv] [Package] |
Genomic Data
FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data
[arXiv] [Package] |
StarTrek: Combinatorial Variable Selection with False Discovery Rate Control
The Annals of Statistics 52.1: 78-102. 2024 [Journal] |
Inferring Differential Hub Nodes on Differential Gaussian Graphical Models.
Statistica Sinica. 35(4), 2023. [Journal] |
Inferential Machine Learning
Modern data analysis introduces novel estimation methods, including distributed algorithms, privacy-preserving methods, and kernel estimators. We aim to develop inferential methods for uncertainty assessment under these algorithms.
Federated Learning
Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects
[arXiv] [Package] |
Federated Offline Reinforcement Learning
Journal of the American Statistical Association, 2024. [Journal] |
FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data
[arXiv] |
Deep Learning
Graph over-parameterization: Why the graph helps the training of deep graph convolutional network
Neurocomputing, 534, 77-85. 2023. [Journal] |
Heteroskedastic and imbalanced deep learning with adaptive regularization.
International Conference on Learning Representations. 2021 [arXiv] |
On tighter generalization bound for deep neural networks: CNNs, ResNets, and beyond.
[arXiv] |
Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization
IEEE Transactions on Information Theory, 65(6):3489-3514, 2019. [arXiv][Journal] |
Reinforcement Learning
Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery
[arXiv] |
Federated Offline Reinforcement Learning
Journal of the American Statistical Association, 2024. [Journal] |
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
Advances in Neural Information Processing Systems 33: 18967-18977, 2020. [Journal] |