ABSTRACT
Mashup is a web technology that combines information from more than one source into a single web application. This technique gives data providers a new platform on which to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources can reveal person-specific sensitive information. In this paper, we study and resolve a real-life privacy problem in a data mashup application for the financial industry in Sweden. We propose a privacy-preserving data mashup (PPMashup) algorithm that securely integrates private data from different data providers while the integrated data still retain the essential information needed to support general data exploration or a specific data mining task, such as classification analysis. Experiments on real-life data suggest that the proposed method is effective at simultaneously preserving both privacy and information usefulness, and that it scales to large volumes of data.
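The privacy risk described above can be illustrated with a minimal sketch. It is not the PPMashup algorithm; the two providers, the attribute names (`job`, `age_band`), and the four toy records are hypothetical and chosen only to show how a join can single out individuals who were indistinguishable in each source on its own.

```python
from collections import Counter

# Hypothetical toy records held by two providers, keyed by a shared customer ID.
provider_a = {
    1: {"job": "Lawyer"},
    2: {"job": "Lawyer"},
    3: {"job": "Clerk"},
    4: {"job": "Clerk"},
}
provider_b = {
    1: {"age_band": "30-39"},
    2: {"age_band": "40-49"},
    3: {"age_band": "30-39"},
    4: {"age_band": "40-49"},
}


def smallest_group(records, attributes):
    """Return the size of the smallest group of records that share
    the same values on the given attributes."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return min(counts.values())


def mashup(a, b):
    """Join the two sources on the shared ID (a plain, unprotected join)."""
    return [{**a[k], **b[k]} for k in sorted(a.keys() & b.keys())]


joined = mashup(provider_a, provider_b)

# Each source alone hides every customer among at least two people...
size_a = smallest_group(provider_a.values(), ["job"])
size_b = smallest_group(provider_b.values(), ["age_band"])
# ...but the combined attributes may isolate a single person.
size_joined = smallest_group(joined, ["job", "age_band"])

print(size_a, size_b, size_joined)
```

In this toy data every record shares its `job` value, and separately its `age_band` value, with one other record, yet each `(job, age_band)` pair occurs only once after the join. This is the kind of disclosure that a privacy-preserving integration has to prevent.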