Economic and Social Research Council

A distributed search infrastructure for Statistical Disclosure Control on a Grid

K. R. Mayes, M. J. Elliot, A. M. Manning, D. Haglin and J. R. Gurd

K. R. Mayes, A. M. Manning, J. R. Gurd
Centre for Novel Computing, School of Computer Science,
University of Manchester

M. J. Elliot
Centre for Census and Survey Research,
University of Manchester.

D. Haglin
Department of Computer and Information Sciences,
Minnesota State University.

Email address of corresponding author: ken@manchester.ac.uk.

Statistical disclosure control is concerned with preventing the release of data that might allow an individual population unit to be identified, or allow new attributions to be made about specific population units. However, the time needed to search for risky records can grow exponentially as record size increases, which severely restricts the usefulness of such methods. This paper presents a prototype infrastructure that allows SUDA2, an algorithm for efficiently identifying risky records in microdata, to execute on a heterogeneous network of computers. The search space is divided into subspaces, each of which can be searched on a separate computer. The work involved in searching a given subspace depends on the nature of the data rather than its amount, and so is not always predictable. The infrastructure must therefore act adaptively, particularly as resources may vary in their capabilities. It combines a master-worker computational structure with work stealing and checkpointing in an attempt to keep resources busy for as long as possible, and it is able to reduce the execution time of searches.
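
The effect of work stealing on unevenly sized subspaces can be illustrated with a small simulation. The sketch below is not the infrastructure described in the paper and does not implement SUDA2. It is a toy model under stated assumptions: time advances in discrete ticks, each work item stands for one subspace with a known positive cost in ticks, an idle worker steals the back half of the longest queue, and checkpointing is not modelled. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(assignments, steal=True):
    """Return the number of ticks needed to finish all work items.

    assignments: one list of positive item costs (in ticks) per worker.
    steal: if True, an idle worker takes work from the longest queue.
    """
    queues = [deque(items) for items in assignments]
    remaining = [0] * len(queues)  # ticks left on each worker's current item
    ticks = 0
    while any(queues) or any(remaining):
        for w, q in enumerate(queues):
            if remaining[w] == 0 and not q and steal:
                # Idle worker: take the back half of the longest queue.
                victim = max(queues, key=len)
                for _ in range(len(victim) // 2):
                    q.append(victim.pop())
            if remaining[w] == 0 and q:
                # Start the next item from this worker's own queue.
                remaining[w] = q.popleft()
            if remaining[w] > 0:
                # Spend one tick on the current item.
                remaining[w] -= 1
        ticks += 1
    return ticks


if __name__ == "__main__":
    # One worker is given four expensive subspaces, the others one cheap one each.
    work = [[5, 5, 5, 5], [1], [1]]
    print("without stealing:", simulate(work, steal=False))
    print("with stealing:   ", simulate(work, steal=True))
```

With this deliberately unbalanced assignment the run without stealing takes 20 ticks, because one worker processes all four expensive items alone, while the run with stealing takes 10 ticks. A real deployment would not know the costs in advance, which is the situation in which adaptive redistribution of this kind is useful.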
