Pages 987-999
Received 01 Jul 2014
Accepted author version posted online: 24 Jun 2015
Published online: 07 Nov 2015
 

Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods to identify the variables to change followed by hot deck imputation. We present an approach that fully integrates editing and imputation for continuous microdata under linear constraints. Our approach relies on a Bayesian hierarchical model that includes (i) a flexible joint probability model for the underlying true values of the data with support only on the set of values that satisfy all editing constraints, (ii) a model for latent indicators of the variables that are in error, and (iii) a model for the reported responses for variables in error. We illustrate the potential advantages of the Bayesian editing approach over existing approaches using simulation studies. We apply the model to edit faulty data from the 2007 U.S. Census of Manufactures. Supplementary materials for this article are available online.
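To make the two kinds of linear constraints mentioned in the abstract concrete, the following is a minimal sketch of how a single record might be checked against a balance edit (components sum to a total) and a ratio edit (a ratio bounded by constants). It illustrates only the constraint check, not the Bayesian hierarchical model or the error localization described in the article. The variable names, the bounds, and the function `violated_edits` are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]
# (total variable, component variables): components must sum to the total.
BalanceEdit = Tuple[str, Sequence[str]]
# (numerator, denominator, lower, upper): lower <= numerator/denominator <= upper.
RatioEdit = Tuple[str, str, float, float]

# Hypothetical edit rules; the bounds stand in for expert-specified constants.
BALANCE_EDITS: List[BalanceEdit] = [
    ("total_wages", ("production_wages", "other_wages")),
]
RATIO_EDITS: List[RatioEdit] = [
    ("total_wages", "employees", 10.0, 200.0),
]


def violated_edits(
    record: Record,
    balance_edits: Sequence[BalanceEdit],
    ratio_edits: Sequence[RatioEdit],
    tol: float = 1e-9,
) -> List[str]:
    """Return a label for every edit rule that the record fails."""
    failures: List[str] = []
    for total, components in balance_edits:
        component_sum = sum(record[name] for name in components)
        if abs(component_sum - record[total]) > tol:
            failures.append(f"balance:{total}")
    for num, den, lower, upper in ratio_edits:
        # For a positive denominator, the ratio bound is the pair of linear
        # inequalities lower * den <= num <= upper * den.
        low_ok = record[num] >= lower * record[den] - tol
        high_ok = record[num] <= upper * record[den] + tol
        if not (low_ok and high_ok):
            failures.append(f"ratio:{num}/{den}")
    return failures


# A record that satisfies both edits, and one with a misreported total.
clean = {"production_wages": 300.0, "other_wages": 200.0,
         "total_wages": 500.0, "employees": 10.0}
faulty = {"production_wages": 300.0, "other_wages": 200.0,
          "total_wages": 5000.0, "employees": 10.0}

print(violated_edits(clean, BALANCE_EDITS, RATIO_EDITS))
print(violated_edits(faulty, BALANCE_EDITS, RATIO_EDITS))
```

A check of this kind only flags that a record is inconsistent. It does not say which of the variables involved is in error, which is the question the article's latent error indicators are meant to address.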

Additional information

Notes on contributors

Hang J. Kim

Hang J. Kim is Assistant Professor, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, formerly Postdoctoral Associate, Department of Statistical Science, Duke University, Durham, NC 27708, and National Institute of Statistical Sciences, Research Triangle Park, NC 27709 (E-mail: hang.kim0@uc.edu). Lawrence H. Cox is Assistant Director for Official Statistics, National Institute of Statistical Sciences, Research Triangle Park, NC 27709 (E-mail: cox@niss.org). Alan F. Karr is Director, Center of Excellence for Complex Data Analysis, RTI International, Research Triangle Park, NC 27709 (E-mail: karr@rti.org). Jerome P. Reiter (E-mail: jerry@stat.duke.edu) is Mrs. Alexander Hehmeyer Professor and Quanli Wang (E-mail: quanli@stat.duke.edu) is Associate in Research, Department of Statistical Science, Duke University, Durham, NC 27708. This research was supported by the National Science Foundation (SES-11-31897). Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau or the National Science Foundation. All results have been reviewed to ensure that no confidential information is disclosed.

