Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
Poster session: Industry/government track poster
Pages: 756-762
Year of publication: 2005
ISBN: 1-59593-135-X
Authors
Daniel R. Jeske, University of California, Riverside, CA
Behrokh Samadi, Lucent Technologies, Holmdel, NJ
Pengyue J. Lin, University of California, Riverside, CA
Lan Ye, University of California, Riverside, CA
Sean Cox, University of California, Riverside, CA
Rui Xiao, University of California, Riverside, CA
Ted Younglove, University of California, Riverside, CA
Minh Ly, University of California, Riverside, CA
Douglas Holt, University of California, Riverside, CA
Ryan Rich, University of California, Riverside, CA
ABSTRACT
Information Discovery and Analysis Systems (IDAS) are designed to correlate multiple sources of data and to use data mining techniques to identify potentially significant events. Application domains for IDAS are numerous and include the emerging area of homeland security.

Developing test cases for an IDAS requires background data sets into which hypothetical future scenarios can be overlaid. The IDAS can then be evaluated in terms of its false positive and false negative error rates. Obtaining such test data sets can be an obstacle, owing both to privacy issues and to the time and cost of collecting a diverse set of data sources.

In this paper, we give an overview of the design and architecture of an IDAS Data Set Generator (IDSG) that enables fast and comprehensive testing of an IDAS. The IDSG generates data using statistical and rule-based algorithms, as well as semantic graphs that represent interdependencies between attributes. A credit card transaction application is used to illustrate the approach.
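The abstract describes the IDSG only at the architectural level. As a rough illustration of how its pieces might fit together, the Python sketch below (our own assumption, not the authors' implementation) treats a small semantic graph as a dependency graph over credit card transaction attributes, samples each attribute only after its parents (the statistical step), enforces a credit-limit constraint (the rule-based step), overlays a batch of hypothetical fraud transactions with known labels, and scores a detector by its false positive and false negative rates. Every attribute name, distribution, credit limit, the injected scenario, and the 1500 threshold detector are hypothetical placeholders.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical semantic graph: each attribute maps to the attributes it depends on.
# Names, distributions, and limits below are illustrative assumptions only.
DEPENDS_ON = {
    "customer_age": set(),
    "card_type": {"customer_age"},
    "merchant_category": set(),
    "amount": {"card_type", "merchant_category"},
}
# Parents always come before children in this order.
ORDER = list(TopologicalSorter(DEPENDS_ON).static_order())

BASE_AMOUNT = {"grocery": 60.0, "fuel": 45.0, "travel": 400.0, "electronics": 250.0}
CREDIT_LIMIT = {"standard": 2000.0, "premium": 10000.0}


def sample_attribute(name, record, rng):
    """Statistical step: draw one value conditioned on already-sampled parents."""
    if name == "customer_age":
        return rng.randint(18, 85)
    if name == "card_type":
        # Assumed: customers aged 30+ are more likely to hold premium cards.
        p_premium = 0.1 if record["customer_age"] < 30 else 0.3
        return "premium" if rng.random() < p_premium else "standard"
    if name == "merchant_category":
        return rng.choice(list(BASE_AMOUNT))
    if name == "amount":
        scale = 1.5 if record["card_type"] == "premium" else 1.0
        base = BASE_AMOUNT[record["merchant_category"]]
        return round(base * scale * rng.lognormvariate(0.0, 0.5), 2)
    raise KeyError(f"no sampler for attribute {name!r}")


def apply_rules(record):
    """Rule-based step: enforce a constraint the sampler can violate."""
    record["amount"] = min(record["amount"], CREDIT_LIMIT[record["card_type"]])


def generate_background(n, seed=0):
    """Background (scenario-free) transactions, sampled in graph order."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        record = {}
        for attr in ORDER:
            record[attr] = sample_attribute(attr, record, rng)
        apply_rules(record)
        record["is_scenario"] = False  # ground-truth label
        records.append(record)
    return records


def overlay_scenario(records, k, seed=1):
    """Overlay k hypothetical fraud transactions whose labels are known."""
    rng = random.Random(seed)
    injected = [
        {"customer_age": rng.randint(18, 85), "card_type": "standard",
         "merchant_category": "electronics", "amount": 1950.0, "is_scenario": True}
        for _ in range(k)
    ]
    return records + injected


def error_rates(records, flag):
    """Return (false positive rate, false negative rate) of a detector `flag`."""
    negatives = [r for r in records if not r["is_scenario"]]
    positives = [r for r in records if r["is_scenario"]]
    fp = sum(1 for r in negatives if flag(r))
    fn = sum(1 for r in positives if not flag(r))
    return fp / len(negatives), fn / len(positives)


# Usage: a toy threshold detector stands in for the IDAS under test.
data = overlay_scenario(generate_background(10_000), k=50)
fpr, fnr = error_rates(data, lambda r: r["amount"] > 1500)
print(f"false positive rate = {fpr:.4f}, false negative rate = {fnr:.4f}")
```

Because every overlaid record carries a ground-truth flag, both error rates can be computed exactly; replacing the toy lambda with the alerts produced by a real system would give an evaluation of the kind the abstract outlines.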