ABSTRACT
Information filtering includes monitoring text streams to detect patterns that are more complex than those handled by search engines. Text stream monitoring and pattern detection have far-reaching applications, such as tracking information flow among terrorist outfits, web parental control, and business intelligence. The pattern characterization requirements of these applications call for a more expressive pattern specification language than what is currently provided by Information Retrieval Query Languages (IRQLs) and current information filtering systems. Pattern specification alone does not suffice; detecting these complex patterns is equally important if such systems are to be used in real-world applications. InfoFilter, the content-based information filtering system presented in this paper, allows users to specify complex patterns and detects them in incoming text streams from various sources such as news feeds, emails, web pages, and caption text from streaming videos. Complex patterns, such as combinations of sequential and structural patterns, wild cards, word frequencies, proximity, Boolean operators, and synonyms, are formulated using the expressive pattern specification language, PSL, proposed in this paper. Once specified, these complex patterns are detected using a data flow paradigm over Pattern Detection Graphs (PDGs).
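As an illustration only, the idea of detecting a composite pattern by propagating occurrences through a graph can be sketched as follows. This is a minimal sketch under stated assumptions, not the InfoFilter implementation and not PSL syntax: the class names `WordNode` and `NearNode`, the helper `run`, and the representation of an occurrence as a `(start, end)` pair of token offsets are all hypothetical. Leaf nodes fire when a word appears in the stream, and an operator node combines the occurrences reported by its children.

```python
# Hypothetical sketch of data-flow pattern detection over a small graph.
# None of these names come from the paper; they are illustrative assumptions.

class Node:
    """Base node: records its own matches and forwards them to parent nodes."""

    def __init__(self):
        self.parents = []   # list of (parent_node, child_slot) pairs
        self.matches = []   # occurrences detected at this node

    def emit(self, occurrence):
        self.matches.append(occurrence)
        for parent, slot in self.parents:
            parent.receive(slot, occurrence)


class WordNode(Node):
    """Leaf node: fires whenever its word appears in the token stream."""

    def __init__(self, word):
        super().__init__()
        self.word = word.lower()

    def feed(self, token, position):
        if token.lower() == self.word:
            self.emit((position, position))


class NearNode(Node):
    """Operator node: fires when its two children occur within
    `distance` tokens of each other, in either order."""

    def __init__(self, left, right, distance):
        super().__init__()
        self.distance = distance
        self.seen = ([], [])  # occurrences received from each child
        left.parents.append((self, 0))
        right.parents.append((self, 1))

    def receive(self, slot, occurrence):
        self.seen[slot].append(occurrence)
        for other in self.seen[1 - slot]:
            # Token gap between the two occurrences.
            gap = max(occurrence[0], other[0]) - min(occurrence[1], other[1])
            if 0 < gap <= self.distance:
                start = min(occurrence[0], other[0])
                end = max(occurrence[1], other[1])
                self.emit((start, end))


def run(leaves, text):
    """Push each whitespace-separated token of `text` through the leaf nodes."""
    for position, token in enumerate(text.split()):
        for leaf in leaves:
            leaf.feed(token, position)


# Usage: detect "bomb" within 6 tokens of "embassy".
bomb = WordNode("bomb")
embassy = WordNode("embassy")
near = NearNode(bomb, embassy, distance=6)
run([bomb, embassy], "the bomb was hidden near the old embassy building")
print(near.matches)
```

Because each node only reacts to occurrences pushed to it by its children, further operators (Boolean, sequence, frequency) could be added as additional node types without changing how the stream is fed in; that extension is likewise an assumption about how such a graph might be organized.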