Research

Combinatorial inference


Combinatorial inference develops new theory on high-dimensional universality phenomena and combinatorial information-theoretic lower bounds, and proposes methods for global inference on the topological properties of graphs in graphical models and other combinatorial structures.

The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control
J. Liang, Y. Liu, D. Zhou, S. Zhang, J. Lu
[arXiv] [Package]
Ranking of Large Language Model with Nonparametric Prompts
Z. Wang, Y. Han, E. X. Fang, L. Wang, J. Lu
[arXiv]
StarTrek: Combinatorial Variable Selection with False Discovery Rate Control
L. Zhang and J. Lu
The Annals of Statistics, 52(1), 78-102, 2024
[Journal] [Package]
Lagrangian Inference for Ranking Problems
Y. Liu, E.X. Fang, J. Lu
Operations Research, 71(1), 202-223, 2022
[arXiv] [Journal] [Package]
Inferring Differential Hub Nodes on Differential Gaussian Graphical Models
X. Zhou, K.M. Tan, J. Lu
Statistica Sinica, 35(4), 2023.
[Journal]
Combinatorial-Probabilistic Trade-Off: Community Properties Test in the Stochastic Block Models
S. Shen, J. Lu
Conference version: International Conference on Learning Representations (spotlight paper). [Video]
Journal version: IEEE Transactions on Information Theory, 2023.
[Journal]
Inference on the optimal assortment in the multinomial logit model
X. Chen, S. Shen, E. Fang, J. Lu
ACM Conference on Economics and Computation, 2023.
[arXiv] [Journal]
Computational and Statistical Tradeoffs in Inferring Combinatorial Structures of Ising Model
J. Ying, Z. Wang, J. Lu
International Conference on Machine Learning, pp. 4901-4910. PMLR, 2020.
[Journal]
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
C. Ma, J. Lu, H. Liu
Journal of the American Statistical Association, 116(534), 746-755, 2021
[arXiv] [Journal]
Sketching Method for Large Scale Combinatorial Inference
W. Sun, J. Lu, H. Liu
Advances in Neural Information Processing Systems 31, 10598-10607, 2018
[Journal]
Combinatorial Inference for Graphical Models
M. Neykov*, J. Lu*, H. Liu. (*: equal contribution)
Annals of Statistics, 47(2), pp. 795-827, 2018
[arXiv] [Journal]

Complex Biomedical Data Analysis


Modern biomedical datasets have complex structure, including high dimensionality, heterogeneity, heavy tails, and temporal dependence. Complex data inference aims to develop a new generation of inferential methods, such as hypothesis tests, confidence intervals, and false discovery rate control, for these datasets.

Electronic Health Records


ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis
Z. Gan, D. Zhou, E. Rush, V. Panickan, Y. Ho, G. Ostrouchov, Z. Xu, S. Shen, X. Xiong, K. Greco, C. Hong, C. Bonzel, J. Wen, L. Costa, T. Cai, E. Begoli, Z. Xia, M G, K. Liao, K. Cho, T. Cai, J. Lu
Journal of Biomedical Informatics, 2025+
[Package]
DOME: Directional Medical Embedding Vectors from Electronic Health Records
J. Wen, H. Xue, E. Rush, V. A. Panickan, T. Cai, D. Zhou, Y. Ho, L. Costa, E. Begoli, C. Hong, J. M. Gaziano, K. Cho, K. Liao, J. Lu, T. Cai
Journal of Biomedical Informatics, 162, 104768, 2025
[Journal]
LATTE: Label-efficient incident phenotyping from longitudinal electronic health records
J. Wen, J. Hou, C.L. Bonzel, ..., J. Lu, K. Cho, T. Cai
Patterns, 5(1), 2024.
[Journal] [Package]
Prompt Discriminative Language Models for Domain Adaptation
K. Lu, P. Potash, X. Lin, Y. Sun, Z. Qian, Z. Yuan, T. Naumann, T. Cai, J. Lu
Proceedings of the 5th Clinical Natural Language Processing Workshop, pp. 247-258, 2023.
[Journal]
Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices
D. Zhou, T. Cai, J. Lu
Journal of Machine Learning Research, 24(221), 1-43, 2023.
[Journal] [Package]
Multimodal representation learning for predicting molecule–disease relations
J. Wen, X. Zhang, E. Rush, V.A. Panickan, X. Li, T. Cai, D. Zhou, Y. Ho, L. Costa, E. Begoli, C. Hong, J. Gaziano, K. Cho, J. Lu, K. Liao, M. Zitnik, T. Cai
Bioinformatics, 39(2), btad085, 2023.
[Journal]
Penalized estimation of frailty-based illness–death models for semi-competing risks
H.T. Reeder, J. Lu, S. Haneuse
Biometrics, 79(3), 1657-1669, 2023
[Journal]
Multiview Incomplete Knowledge Graph Integration with Application to Cross-institutional EHR Data Harmonization
D. Zhou, Z. Gan, X. Shi, A. Patwari, E. Rush, C.L. Bonzel, V. A. Panickan, C. Hong, Y.L. Ho, T. Cai, L. Costa, X. Li, V.M. Castro, S.N. Murphy, G. Brat, G. Weber, P. Avillach, J.M. Gaziano, K. Cho, K. Liao, J. Lu*, T. Cai* (*: co-senior author)
Journal of Biomedical Informatics, 133, 104147, 2022.
[Journal]
Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data
C. Hong, E. Rush, M. Liu, D. Zhou, J. Sun, A. Sonabend, V. M. Castro, P. Schubert, V. Panickan, T. Cai, L. Costa, Z. He, N. Link, R. Hauser, J.M. Gaziano, S. Murphy, G. Ostrouchov, Y. Ho, E. Begoli, J. Lu, K. Cho, K. Liao, T. Cai
NPJ Digital Medicine, 4(1), 151, 2021
[Journal] [Package]
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
A. Sonabend W., J. Lu, L.A. Celi, T. Cai, P. Szolovits
Advances in Neural Information Processing Systems 33: 18967-18977, 2020.
[Journal]

Medical Imaging Data


The Wreaths of KHAN: Uniform Graph Feature Selection with False Discovery Rate Control
J. Liang, Y. Liu, D. Zhou, S. Zhang, J. Lu
[arXiv] [Package]
Progression of traction bronchiectasis/bronchiolectasis in interstitial lung abnormalities is associated with increased all-cause mortality: Age Gene/Environment Susceptibility-Reykjavik Study
H. Takuya, T. Hida, M. Nishino, J. Lu, R. Putman, E.F. Gudmundsson, A. Hata
European Journal of Radiology Open, 8, 100334, 2021
[Journal]
Interstitial lung abnormalities in patients with stage I non-small cell lung cancer are associated with shorter overall survival: the Boston lung cancer study
H. Tomoyuki, A. Hata, J. Lu, V. Valtchinov, T. Hino, M. Nishino, H. Honda, N. Tomiyama, D. C. Christiani, H. Hatabu
Cancer Imaging, 21(1), 1-7, 2021
[Journal]
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
C. Ma, J. Lu, H. Liu
Journal of the American Statistical Association, 116(534), 746-755, 2021
[arXiv] [Journal]
Estimating and inferring the maximum degree of stimulus-locked time-varying brain connectivity networks
K.M. Tan, J. Lu, T. Zhang, H. Liu
Biometrics, 77(2), 379-390, 2020
[Journal]

Environmental Mobile Health


Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery
P. Emedom-Nnamdi, T. Smith, J-P. Onnela, J. Lu
Annals of Applied Statistics, to appear.
[arXiv]
Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects
A. Sonabend, J. Zhang, J. Schwartz, B. A. Coull, J. Lu
[arXiv] [Package]

Genomic Data


FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data
S. Shen, J. Lu, X. Lin
[arXiv] [Package]
StarTrek: Combinatorial Variable Selection with False Discovery Rate Control
L. Zhang and J. Lu
The Annals of Statistics, 52(1), 78-102, 2024
[Journal]
Inferring Differential Hub Nodes on Differential Gaussian Graphical Models
X. Zhou, K.M. Tan, J. Lu
Statistica Sinica, 35(4), 2023.
[Journal]

Inferential Machine Learning


Modern data analysis relies on new estimation methods, including distributed algorithms, privacy-preserving methods, and kernel estimators. We aim to develop inferential methods that quantify the uncertainty of estimates produced by these algorithms.

Federated Learning


Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects
A. Sonabend, J. Zhang, J. Schwartz, B. A. Coull, J. Lu
[arXiv] [Package]
Federated Offline Reinforcement Learning
D. Zhou, Y. Zhang, A. Sonabend, Z. Wang, J. Lu, T. Cai
Journal of the American Statistical Association, 2024.
[Journal]
FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data
S. Shen, J. Lu, X. Lin
[arXiv]

Deep Learning


Graph over-parameterization: Why the graph helps the training of deep graph convolutional network
Y. Lin, S. Li, J. Xu, J. Xu, D. Huang, W. Zheng, Y. Cao, J. Lu
Neurocomputing, 534, 77-85. 2023.
[Journal]
Heteroskedastic and imbalanced deep learning with adaptive regularization
K. Cao, Y. Chen, J. Lu, N. Arechiga, A. Gaidon, T. Ma
International Conference on Learning Representations. 2021
[arXiv]
On tighter generalization bound for deep neural networks: CNNs, ResNets, and beyond
X. Li, J. Lu, Z. Wang, J. Haupt, T. Zhao
[arXiv]
Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization
X. Li, Z. Wang, J. Lu, R. Arora, J. Haupt, H. Liu, T. Zhao
IEEE Transactions on Information Theory, 65(6):3489-3514, 2019.
[arXiv] [Journal]

Reinforcement Learning


Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery
P. Emedom-Nnamdi, T. Smith, J-P. Onnela, J. Lu
[arXiv]
Federated Offline Reinforcement Learning
D. Zhou, Y. Zhang, A. Sonabend, Z. Wang, J. Lu, T. Cai
Journal of the American Statistical Association, 2024.
[Journal]
Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
A. Sonabend W., J. Lu, L.A. Celi, T. Cai, P. Szolovits
Advances in Neural Information Processing Systems 33: 18967-18977, 2020.
[Journal]