Statistical Computing and Graphics

k-POD: A Method for k-Means Clustering of Missing Data

Pages 91-99
Received 01 Nov 2014
Accepted author version posted online: 11 Sep 2015
Published online: 31 Mar 2016
 

The k-means algorithm is often used in clustering applications, but it requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to clustering missing data reduce the problem to a complete-data formulation through either deletion or imputation, but these solutions may incur significant costs. Our k-POD method presents a simple extension of k-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.
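
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such an extension can work, assuming an iterative fill-and-recluster scheme: missing entries are filled with the corresponding coordinates of the centroid of the row's current cluster, ordinary k-means is rerun on the filled matrix, and the two steps repeat until assignments and centroids stop changing. The function names (`kpod_sketch`, `_kmeans`, `_nearest`), the use of `None` to mark a missing entry, the column-mean starting fill, the `init_centroids` parameter, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import random


def _nearest(row, centroids):
    """Return the index of the centroid closest to a complete row."""
    dists = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
    return dists.index(min(dists))


def _kmeans(rows, centroids, max_iter=100):
    """Lloyd's algorithm on complete data, warm-started at `centroids`."""
    labels = [_nearest(r, centroids) for r in rows]
    for _ in range(max_iter):
        updated = []
        for k, old in enumerate(centroids):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else list(old))
        centroids = updated
        new_labels = [_nearest(r, centroids) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def kpod_sketch(data, k, init_centroids=None, max_outer=100, tol=1e-9, seed=0):
    """Cluster rows that may contain None (missing) entries.

    Assumes every column has at least one observed value.
    """
    n_cols = len(data[0])
    # Starting fill (an assumption of this sketch): observed column means.
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(observed) / len(observed))
    filled = [[col_means[j] if v is None else v for j, v in enumerate(row)]
              for row in data]
    if init_centroids is None:
        init_centroids = random.Random(seed).sample(filled, k)
    labels, centroids = _kmeans(filled, [list(c) for c in init_centroids])

    for _ in range(max_outer):
        # Fill each missing entry from the centroid of the row's cluster.
        filled = [[centroids[lab][j] if v is None else v
                   for j, v in enumerate(row)]
                  for row, lab in zip(data, labels)]
        new_labels, new_centroids = _kmeans(filled, centroids)
        shift = max(abs(a - b)
                    for old, new in zip(centroids, new_centroids)
                    for a, b in zip(old, new))
        done = new_labels == labels and shift < tol
        labels, centroids = new_labels, new_centroids
        if done:
            break
    return labels, centroids


# Toy data: two well-separated groups, each with two partially observed rows.
data = [
    [0.0, 0.0], [0.2, None], [None, 0.1], [0.1, 0.2],
    [10.0, 10.0], [10.2, None], [None, 9.9], [9.9, 10.1],
]
labels, centroids = kpod_sketch(
    data, k=2, init_centroids=[[0.0, 0.0], [10.0, 10.0]]
)
```

In this sketch no row is deleted and no external imputation model is used. At convergence, each centroid coordinate equals the mean of the observed values in its cluster, because a filled entry sits exactly on the centroid and so no longer pulls it in any direction.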

[Received November 2014. Revised August 2015.]

Additional information

Notes on contributors

Jocelyn T. Chi

Jocelyn T. Chi is Ph.D. Student (E-mail: )

Eric C. Chi

Eric C. Chi is Assistant Professor (E-mail: eric_chi@ncsu.edu), Department of Statistics, North Carolina State University, Raleigh, NC 27695.

Richard G. Baraniuk

Richard G. Baraniuk is Professor, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005 (E-mail: richb@rice.edu). This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number ARO MURI W911NF0910383.