Meta methods for model sharing in personal information systems
Source
ACM Transactions on Information Systems (TOIS)
Volume 26, Issue 4 (September 2008)
Article No. 22
Year of Publication: 2008
ISSN: 1046-8188
Authors
Stefan Siersdorfer  University of Sheffield, UK
Sergej Sizov  University of Koblenz-Landau, Germany
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1402256.1402261

ABSTRACT

This article introduces a methodology for automatically organizing document collections into thematic categories for Personal Information Management (PIM) through collaborative sharing of machine learning models in an efficient and privacy-preserving way. Our objective is to combine multiple independently learned models from several users into an advanced ensemble-based decision model that takes the knowledge of multiple users into account in a decentralized manner, for example, in a peer-to-peer overlay network. High accuracy of the corresponding supervised (classification) and unsupervised (clustering) methods is achieved by restrictively leaving out uncertain documents rather than assigning them to inappropriate topics or clusters with low confidence. We introduce a formal probabilistic model for the resulting ensemble-based meta methods and explain how it can be used for constructing estimators and for goal-oriented tuning. Comprehensive evaluation results on different reference data sets illustrate the viability of our approach.
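To make the restrictive ensemble idea from the abstract concrete, the Python sketch below combines the labels that several independently trained user models assign to one document and abstains when their agreement is too weak. The abstract does not specify the combination rule, so the weighted-voting rule, the `threshold` parameter, and the function names `restrictive_meta_vote` and `unanimous_vote_estimates` are hypothetical illustrations, not the authors' method. The second function is a standard textbook estimate of how a unanimous binary vote trades coverage for precision, under the strong (assumed) condition that base models err independently; it is not the article's probabilistic model.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def restrictive_meta_vote(
    predictions: Sequence[Hashable],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.75,
) -> Optional[Hashable]:
    """Combine base-model labels for one document; return None to abstain.

    Hypothetical rule: accept the label with the largest (weighted) vote
    share only if that share reaches `threshold`; otherwise leave the
    document unassigned instead of forcing a low-confidence decision.
    """
    if not predictions:
        return None
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need one weight per prediction")
    votes: Counter = Counter()
    for label, w in zip(predictions, weights):
        votes[label] += w  # accumulate the weight each label receives
    total = sum(weights)
    label, score = votes.most_common(1)[0]
    return label if total > 0 and score / total >= threshold else None


def unanimous_vote_estimates(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Estimated (precision, coverage) of a unanimous binary vote.

    Assumes two classes and independent errors: the models agree either
    when all are right or when all are wrong; otherwise the meta method
    abstains.
    """
    all_right = math.prod(accuracies)
    all_wrong = math.prod(1.0 - p for p in accuracies)
    coverage = all_right + all_wrong
    precision = all_right / coverage if coverage > 0 else float("nan")
    return precision, coverage
```

For example, with the default threshold, `restrictive_meta_vote(["sports", "sports", "politics"])` returns `None` (a 2/3 share is below 0.75), while three independent binary models with accuracy 0.8 each give `unanimous_vote_estimates([0.8, 0.8, 0.8])` of roughly (0.985, 0.52): precision rises well above 0.8, at the cost of abstaining on about half of the documents.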


REFERENCES

4
Bender, M., Michel, S., Triantafillou, P., Weikum, G., and Zimmer, C. 2006. P2P content search: Give the Web back to the people. In Proceedings of the 5th International Workshop on Peer-to-Peer Systems (IPTPS).
 
5
Bender, M., Michel, S., Weikum, G., and Zimmer, C. 2004. Bookmark-driven query routing in peer-to-peer Web search. In Proceedings of the SIGIR Workshop on P2P Information Retrieval.
 
9
Brank, J., Grobelnik, M., Milic-Frayling, N., and Mladenic, D. 2003. Training text classifiers with SVM on very few positive examples. Tech. rep. MSR-TR-2003-34, Microsoft Corp.
 
11
Brinker, K. and Hüllermeier, E. 2006. Case-based label ranking. In Machine Learning: Proceedings of the 17th European Conference on Machine Learning (ECML'06), J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, Eds. Lecture Notes in Computer Science. Springer, 566--573.
 
14
Buckland, M. K. 1992. Emmanuel Goldberg, electronic document retrieval, and Vannevar Bush's memex. J. Amer. Soc. Inform. Sci. 43, 4, 284--294.
 
16
Bush, V. 1945. As we may think. Atlantic Monthly 176, 1, 101--108.
 
20
Cormack, G. V. 2006. TREC 2006 spam evaluation track overview. In Proceedings of the 15th Text Retrieval Conference (TREC'06).
 
27
Dong, X. and Halevy, A. 2005. A platform for personal information management and integration. In Proceedings of the 2nd Conference on Innovative Data Systems Research (CIDR). 119--130.
 
31
Ester, M., Kriegel, H.-P., and Sander, J. 2001. Knowledge Discovery in Databases. Springer.
 
38
Goerlitz, O., Sizov, S., and Staab, S. 2008. PINTS: Peer-to-Peer infrastructure for tagging systems. In Proceedings of the 7th International Workshop on Peer-to-Peer Systems (IPTPS).
 
39
Groza, T., Handschuh, S., Moeller, K., Grimnes, G., Sauermann, L., Minack, E., Mesnage, C., Jazayeri, M., Reif, G., and Gudjonsdottir, R. 2007. The NEPOMUK Project—On the way to the social semantic desktop. In Proceedings of the International Conference on Semantic Technologies (I-Semantics). 201--211.
 
42
Hartigan, J. and Wong, M. 1979. A k-Means clustering algorithm. Appl. Stat. 28, 100--108.
 
43
IMDb. Internet Movie Database. http://www.imdb.com.
 
46
Klimt, B. and Yang, Y. 2004. The Enron corpus: A new dataset for email classification research. In Proceedings of the 15th European Conference on Machine Learning (ECML'04). Lecture Notes in Computer Science, Springer, 217--226.
 
48
Kuhn, H. 1955. The Hungarian method for the assignment problem. Naval Resear. Logistics Quart. 2, 83--97.
 
57
Masulli, F. and Valentini, G. 2000. Comparing decomposition methods for classification. In Proceedings of the International Conference on Knowledge-Based Intelligent Engineering Systems and Applied Technologies (KES). 788--792.
 
59
Millen, D., Yeng, M., Whittaker, S., and Feinberg, J. 2007. Social bookmarking and exploratory search. In Proceedings of the European Conference on Computer Supported Cooperative Work.
 
61
Pierskalla, W. 1968. The multi-dimensional assignment problem. Operations Res. 16, 422--431.
 
62
Platt, J. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, MIT Press, 61--74.
 
64
Quan, D., Huynh, D., and Karger, D. 2003. Haystack: A platform for authoring end user semantic web applications. In Proceedings of the International Semantic Web Conference. 738--753.
 
68
Shvaiko, P. and Euzenat, J. 2005. A survey of schema-based matching approaches. Lecture Notes in Computer Science, vol. 3730, Springer, 146--171.
 
69
Siersdorfer, S. and Sizov, S. 2003. Construction of feature spaces and meta methods for classification of Web documents. In Proceedings of the 10th Conference Datenbanksysteme für Business, Technologie und Web (BTW). 197--206.
 
71
Siersdorfer, S. and Sizov, S. 2006. Automatic document organization in a P2P environment. In Proceedings of the 28th European Conference on IR Research (ECIR). 265--276.
 
72
Siersdorfer, S. and Sizov, S. 2007. Restrictive methods and meta methods for thematically focused web search. In Handbook of Research on Web Information Systems Quality, Idea Group.
 
74
Siersdorfer, S. and Weikum, G. 2005. Using restrictive classification and meta classification for junk elimination. In Proceedings of the 27th European Conference on Information Retrieval (ECIR'05), D. Losada and J. M. F. Luna, Eds. Lecture Notes in Computer Science, vol. 3408. Springer, 287--299.
 
77
Surendran, A. C., Platt, J. C., and Renshaw, E. 2005. Automatic discovery of personal topics to organize email. In Proceedings of the 2nd Conference on Email and Anti-Spam.
 
80
Vaidya, J. and Clifton, C. 2004. Privacy preserving naive Bayes classifier for vertically partitioned data. In Proceedings of the SIAM International Conference on Data Mining.
 
81
Vailaya, A. and Jain, A. K. 2000. Reject option for VQ-based Bayesian classification. In Proceedings of the International Conference on Pattern Recognition (ICPR'00). 2048--2051.
 
82
Van Rijsbergen, C. 1977. A theoretical basis for the use of co-occurrence data in information retrieval. J. Document. 33, 2, 106--119.
 
87
Zhang, R. and Metaxas, D. 2006. RO-SVM: Support vector machine with reject option for image categorization. In Proceedings of the British Machine Vision Conference (BMVC'06). vol. 3, 1209--1218.
