|
ABSTRACT
In the information filtering paradigm, clients subscribe to a server with continuous queries or profiles that express their information needs. Clients can also publish documents to servers. Whenever a document is published, the continuous queries satisfying this document are found and notifications are sent to appropriate clients. This article deals with the filtering problem that needs to be solved efficiently by each server: Given a database of continuous queries db and a document d, find all queries q ∈ db that match d. We present data structures and indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries expressed in the model AWP. AWP is based on named attributes with values of type text, and its query language includes Boolean and word proximity operators.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Marcos K. Aguilera , Robert E. Strom , Daniel C. Sturman , Mark Astley , Tushar D. Chandra, Matching events in a content-based subscription system, Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing, p.53-61, May 04-06, 1999, Atlanta, Georgia, United States
[doi> 10.1145/301308.301326]
|
| |
3
|
|
| |
4
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
 |
5
|
Mehmet Altinel , Demet Aksoy , Thomas Baby , Michael Franklin , William Shapiro , Stan Zdonik, DBIS-toolkit: adaptable middleware for large scale data delivery, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.544-546, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
 |
9
|
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
|
 |
14
|
Ashwin R. Bharambe , Mukesh Agrawal , Srinivasan Seshan, Mercury: supporting scalable multi-attribute range queries, Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, August 30-September 03, 2004, Portland, Oregon, USA
|
 |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
Callan, J., Croft, W., and Harding, S. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications. Springer-Verlag, 78--83.
|
| |
19
|
Alexis Campailla , Sagar Chaki , Edmund Clarke , Somesh Jha , Helmut Veith, Efficient filtering in publish-subscribe systems using binary decision diagrams, Proceedings of the 23rd International Conference on Software Engineering, p.443-452, May 12-19, 2001, Toronto, Ontario, Canada
|
 |
20
|
|
 |
21
|
Antonio Carzaniga , David S. Rosenblum , Alexander L. Wolf, Achieving scalability and expressiveness in an Internet-scale event notification service, Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, p.219-227, July 16-19, 2000, Portland, Oregon, United States
[doi> 10.1145/343477.343622]
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
 |
25
|
|
 |
26
|
|
| |
27
|
|
 |
28
|
|
 |
29
|
|
| |
30
|
|
 |
31
|
|
 |
32
|
|
| |
33
|
Devroye, L. 1992. A study of trie-like structures under the density model. Annals Appl. Prob. 2, 2, 402--434.
|
 |
34
|
David J DeWitt , Randy H Katz , Frank Olken , Leonard D Shapiro , Michael R Stonebraker , David Wood, Implementation techniques for main memory database systems, Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts
|
 |
35
|
|
| |
36
|
Dong, L. 2002. Automatic term extraction and similarity assessment in a domain specific document corpus. M.S. thesis, Department of Computer Science, Dalhousie University, Halifax, Canada.
|
 |
37
|
Françoise Fabret , H. Arno Jacobsen , François Llirbat , Joăo Pereira , Kenneth A. Ross , Dennis Shasha, Filtering algorithms and implementation for very fast publish/subscribe systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.115-126, May 21-24, 2001, Santa Barbara, California, United States
|
| |
38
|
Flajolet, P. 1983. On the performance evaluation of extendible hashing and trie searching. Acta Informatica 20, 345--369.
|
 |
39
|
|
 |
40
|
|
 |
41
|
|
| |
42
|
Frantzi, K., Ananiadou, S., and Mima, H. 2000. Automatic recognition of multiword terms:the c-value/nc-value method. JODL 5, 2.
|
 |
43
|
|
 |
44
|
|
| |
45
|
|
| |
46
|
|
| |
47
|
|
| |
48
|
|
 |
49
|
|
 |
50
|
|
| |
51
|
Idreos, S., Koubarakis, M., and Tryfonopoulos, C. 2004b. P2P-DIET: One-time and continuous queries in super-peer networks. In Proceedings of the 9th International Conference on Extending Database Technology (EDBT). 851--853.
|
| |
52
|
Jacquet, P. and Szpankowski, W. 1991. Analysis of digital tries with Markovian dependency. IEEE Trans. Inform. Theor. 37, 5, 1470--1475.
|
 |
53
|
David Karger , Eric Lehman , Tom Leighton , Rina Panigrahy , Matthew Levine , Daniel Lewin, Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web, Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, p.654-663, May 04-06, 1997, El Paso, Texas, United States
[doi> 10.1145/258533.258660]
|
| |
54
|
Knuth, D. 1973a. The Art of Computer Programming. Vol. 3: Sorting and Searching. Addison-Wesley, Reading, MA.
|
| |
55
|
Knuth, D. 1973b. The Art of Computer Programming. Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, MA.
|
| |
56
|
Manolis Koubarakis , T. Koutris , C. Tryfonopoulos , P. Raftopoulou, Information Alert in Distributed Digital Libraries: The Models, Languages, and Architecture of DIAS, Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries, p.527-542, September 16-18, 2002
|
| |
57
|
|
 |
58
|
|
| |
59
|
|
| |
60
|
Luhn, H. 1958. A business intelligence system. IBM J. Reasear. Devel. 2, 4, 314--319.
|
| |
61
|
Milios, E., Zhang, Y., He, B., and Dong, L. 2003. Automatic term extraction and document similarity in special text corpora. In Proceedings of the 6th Conference of the Pacific Association for Computational Linguistics (PACLing). 275--284.
|
| |
62
|
|
 |
63
|
|
 |
64
|
Benjamin Nguyen , Serge Abiteboul , Grégory Cobena , Mihaí Preda, Monitoring XML data on the Web, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.437-448, May 21-24, 2001, Santa Barbara, California, United States
|
| |
65
|
Nilsson, S. and Karlsson, G. 1999. IP-address lookup using LC-tries. IEEE J. Select. Areas Comm. 17, 6, 1083--1092.
|
 |
66
|
|
| |
67
|
|
| |
68
|
Pietzuch, P. and Bacon, J. 2002. Hermes: A distributed event-based middleware architecture. In Proceedings of the 1st International Workshop on Distributed Event-Based Systems (DEBS'02).
|
| |
69
|
|
 |
70
|
Sylvia Ratnasamy , Paul Francis , Mark Handley , Richard Karp , Scott Schenker, A scalable content-addressable network, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.161-172, August 2001, San Diego, California, United States
|
| |
71
|
Regnier, M. and Jacquet, P. 1989. New results on the size of tries. IEEE Trans. Inform. Theor. 35, 1, 203--205.
|
| |
72
|
Rivest, R. L. 1976. Partial-match retrieval algorithms. SIAM J. Comput. 5, 1, 19--50.
|
| |
73
|
|
| |
74
|
|
| |
75
|
Severance, C. and Pramanik, S. 1990. Distributed linear hashing for main memory databases. In Proceedings of the International Conference on Parallel Processing. 92--95.
|
 |
76
|
Ion Stoica , Robert Morris , David Karger , M. Frans Kaashoek , Hari Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, p.149-160, August 2001, San Diego, California, United States
|
 |
77
|
|
| |
78
|
Tam, D., Azimi, R., and Jacobsen, H.-A. 2003. Building content-based publish/subscribe systems with distributed hash tables. In Proceedings of the 1st International Workshop On Databases, Information Systems and Peer-to-Peer Computing.
|
| |
79
|
Tang, C. and Xu, Z. 2003. pFilter: Global information filtering and dissemination using structured overlays. In FTDCS.
|
 |
80
|
Wesley W. Terpstra , Stefan Behnel , Ludger Fiege , Andreas Zeidler , Alejandro P. Buchmann, A peer-to-peer approach to content-based publish/subscribe, Proceedings of the 2nd international workshop on Distributed event-based systems, June 08-08, 2003, San Diego, California
[doi> 10.1145/966618.966627]
|
| |
81
|
|
| |
82
|
|
| |
83
|
Tryfonopoulos, C., Idreos, S., and Koubarakis, M. 2005a. LibraRing: An architecture for distributed digital libraries based on DHTs. In Proceedings of the 9th European Conference on Research and Advanced Technology for Digital Libraries (ECDL). 25--36.
|
 |
84
|
|
| |
85
|
Tryfonopoulos, C. and Koubarakis, M. 2002. Selective dissemination of information in P2P systems: Data models, query languages, algorithms and computational complexity. Tech. Rep. TR-ISL-02-2003, Department of Electronic and Computer Engineering, Technical University of Crete.
|
 |
86
|
|
| |
87
|
|
| |
88
|
|
 |
89
|
|
 |
90
|
|
| |
91
|
Yang, B. and Garcia-Molina, H. 2003. Designing a super-peer network. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03).
|
| |
92
|
Yochum, J. A. 1985. A high-speed text scanning algorithm utilising least frequent trigraphs. In Proceedings of the IEEE Symposium on New Directions in Computing.
|
 |
93
|
|
| |
94
|
Christian Zimmer , Christos Tryfonopoulos , Klaus Berberich , Manolis Koubarakis , Gerhard Weikum, Approximate Information Filtering in Peer-to-Peer Networks, Proceedings of the 9th international conference on Web Information Systems Engineering, September 01-03, 2008, Auckland, New Zealand
[doi> 10.1007/978-3-540-85481-4_3]
|
| |
95
|
Zimmer, C., Tryfonopoulos, C., and Weikum, G. 2007. MinervaDL: An architecture for information retrieval and filtering in distributed digital libraries. In Proceedings of the 11th European Conference on Research and Advanced Technology for Digital Libraries (ECDL). 148--160.
|
 |
96
|
|
|