When and how to subsample: report on the KDD-2001 panel
Source: ACM SIGKDD Explorations Newsletter
Volume 3, Issue 2 (January 2002)
Column: Reports from KDD-2001
Pages: 74-75
Year of Publication: 2002
ISSN: 1931-0145
Author
Pedro Domingos  University of Washington, Seattle, WA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/507515.507528

ABSTRACT

Databases in the terabyte range are now common. In many domains, mining all the data available in reasonable time is already beyond the reach of current systems. Yet the size of databases continues to grow rapidly. Is subsampling unavoidable? Or should it be avoided at all costs? If we subsample, what is the best way to do it? What issues must be taken into account? The KDD-2001 Panel on When and How to Subsample addressed these and related questions, with the twin goals of developing practical guidelines and identifying key research issues. It was chaired by Pedro Domingos (University of Washington), and the participants were Surajit Chaudhuri (Microsoft Research), David Jensen (University of Massachusetts at Amherst), Ronny Kohavi (Blue Martini), and Foster Provost (New York University). Below is each panelist's summary of his position.