Bayesian MI-LASSO for Variable Selection on Multiply-Imputed Data

Aug 6, 2025

Jungang Zou

Sijian Wang

Qixuan Chen


Abstract

Multiple imputation is widely used to handle missing data in real-world applications. When variable selection is performed separately on each imputed dataset, however, the selected variables can differ across datasets. MI-LASSO, one of the most commonly used approaches to this problem, treats the coefficients of the same variable across all imputed datasets as a group and applies the group LASSO, so that each variable is selected or excluded consistently across all multiply-imputed datasets. In this paper, we extend MI-LASSO to a Bayesian framework and propose four Bayesian MI-LASSO models for variable selection on multiply-imputed data: three based on shrinkage priors and one based on a Spike-Slab prior. To further support robust variable selection, we develop a four-step projection predictive variable selection procedure that avoids ad hoc thresholding and facilitates valid post-selection inference. In simulation studies, Bayesian MI-LASSO outperformed MI-LASSO and other alternative approaches, achieving higher specificity and lower mean squared error across a range of settings. We further demonstrate these methods in a case study using a multiply-imputed dataset from the University of Michigan Dioxin Exposure Study. The R package BMIselect is available on CRAN.
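The grouping idea behind MI-LASSO can be illustrated with a minimal sketch. It is not the authors' estimator and does not use the BMIselect API. It only applies the standard group soft-thresholding step (the proximal operator of a group LASSO penalty) to made-up per-dataset coefficients. The function name `group_soft_threshold`, the variable names, the coefficient values, and the penalty level `lam` are all assumptions chosen for illustration.

```python
import math

def group_soft_threshold(coefs, lam):
    """Proximal operator of the group LASSO penalty lam * ||coefs||_2.

    `coefs` holds one variable's coefficients across the D imputed datasets,
    so the whole group is either shrunk together or set to zero together.
    """
    norm = math.sqrt(sum(b * b for b in coefs))
    if norm <= lam:
        # The group as a whole is too weak: drop the variable in every dataset.
        return [0.0] * len(coefs)
    scale = 1.0 - lam / norm
    return [scale * b for b in coefs]

# Hypothetical per-dataset estimates for two variables across D = 3 imputations.
estimates = {
    "x1": [0.90, 1.10, 1.00],   # consistently large effect
    "x2": [0.05, -0.02, 0.00],  # consistently near zero
}
lam = 0.5  # assumed penalty level, not a recommended value

shrunk = {name: group_soft_threshold(b, lam) for name, b in estimates.items()}

# A variable counts as selected if its group survived the thresholding.
selected = [name for name, b in shrunk.items() if any(v != 0.0 for v in b)]
print(selected)
```

Because the penalty acts on the joint norm of each variable's coefficients, a variable cannot be kept in one imputed dataset and dropped in another, which is the consistency property described above.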

Type

Preprint

Last updated on Aug 6, 2025

Bayesian Models, Group LASSO, Multiple Imputation, Projection Predictive Variable Selection.

Authors

Jungang Zou (he/him)

Ph.D. student
