|
ABSTRACT
This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Booth, T.U Sequential Machines and Automata Theory. Wiley, New York, 1967.
|
 |
3
|
|
| |
4
|
Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.
|
| |
11
|
|
| |
12
|
Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.
|
| |
13
|
Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.
|
| |
14
|
|
| |
15
|
McNaughton, R., and Yamada, H. Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9:1 (1960), 39-47.
|
| |
16
|
Rabin, M.O., and Scott, D. Finite automata and their decision problems. IBM J. Research and Development 3, (1959), 114-125.
|
 |
17
|
|
CITED BY 220
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amihood Amir , Gary Benson , Martin Farach, Alphabet independent two dimensional matching, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, p.59-68, May 04-06, 1992, Victoria, British Columbia, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
G. Morris , K. Taylor , M. Harwood, Handing of significant deviations from boilerplate text, Proceedings of the 1st international conference on Artificial intelligence and law, p.145-154, May 27-29, 1997, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amihood Amir , Martin Farach , Ramana M. Idury , Johannes A. La Poutré , Alejandro A. Schäffer, Improved dynamic dictionary matching, Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, p.392-401, January 25-27, 1993, Austin, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amihood Amir , Moshe Lewenstein , Ely Porat, Faster algorithms for string matching with k mismatches, Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, p.794-803, January 09-11, 2000, San Francisco, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Maxime Crochemore , Costas S. Iliopoulos , Thierry Lecroq , Yoan J. Pinzon , Wojciech Plandowski , Wojciech Rytter, Occurrence and substring heuristics for δ-matching, Fundamenta Informaticae, v.56 n.1,2, p.1-21, July 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yuki Kadoya , Masao Fuketa , Elsayed Atlam , Kazuhiro Morita , Shinkaku Kashiji , Jun-ichi Aoe, An efficient e-mail filtering using time priority measurement, Information Sciences—Informatics and Computer Science: An International Journal, v.166 n.1-4, p.213-229, 29 October 2004
|
|
|
Bijit Hore , Hakan Hacigumus , Bala Iyer , Sharad Mehrotra, Indexing text data under space constraints, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Y. Kadoya , M. Fuketa , El-Sayed Atlam , K. Morita , T. Sumitomo , J. Aoe, A compression algorithm using integrated record information for translation dictionaries, Information Sciences—Informatics and Computer Science: An International Journal, v.165 n.3-4, p.171-186, 19 October 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yaxuan Qi , Zongwei Zhou , Baohua Yang , Fei He , Yibo Xue , Jun Li, Towards effective network algorithms on multi-core network processors, Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, November 06-07, 2008, San Jose, California
|
|
|
|
|
|
Loek Cleophas , Bruce W. Watson , Derrick G. Kourie , Andrew Boake, TABASCO: a taxonomy-based domain engineering method, Proceedings of the 2005 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, p.38-47, September 20-22, 2005, White River, South Africa
|
|
|
|
|
|
Paolo Ferragina , S. Muthukrishnan , Mark de Berg, Multi-method dispatching: a geometric approach with applications to string matching problems, Proceedings of the thirty-first annual ACM symposium on Theory of computing, p.483-491, May 01-04, 1999, Atlanta, Georgia, United States
|
|
|
Amihood Amir , Gad M. Landau , Usi Vishkin, Efficient pattern matching with scaling, Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, p.344-357, January 22-24, 1990, San Francisco, California, United States
|
|
|
|
|
|
K. Selçuk Candan , Mehmet E. Dönderler , J. Ramamoorthy , Jong W. Kim, Clustering and indexing of experience sequences for popularity-driven recommendations, Proceedings of the 3rd ACM workshop on Continuous archival and retrival of personal experences, October 28-28, 2006, Santa Barbara, California, USA
|
|
|
|
|
|
|
|
|
Wim De Pauw , David Lorenz , John Vlissides , Mark Wegman, Execution patterns in object-oriented visualization, Proceedings of the 4th conference on USENIX Conference on Object-Oriented Technologies and Systems (COOTS), p.16-16, April 27-30, 1998, Santa Fe, New Mexico
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yevgeniy Miretskiy , Abhijith Das , Charles P. Wright , Erez Zadok, Avfs: an on-access anti-virus file system, Proceedings of the 13th conference on USENIX Security Symposium, p.6-6, August 09-13, 2004, San Diego, CA
|
|
|
Sungwon Yi , Byoung-koo Kim , Jintae Oh , Jongsoo Jang , George Kesidis , Chita R. Das, Memory-efficient content filtering hardware for high-speed intrusion detection systems, Proceedings of the 2007 ACM symposium on Applied computing, March 11-15, 2007, Seoul, Korea
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Susumu Yata , Masaki Oono , Kazuhiro Morita , Masao Fuketa , Toru Sumitomo , Jun-ichi Aoe, A compact static double-array keeping character codes, Information Processing and Management: an International Journal, v.43 n.1, p.237-247, January 2007
|
|
|
|
|
|
Ying-Dar Lin , Kuo-Kun Tseng , Tsern-Huei Lee , Yi-Neng Lin , Chen-Chou Hung , Yuan-Cheng Lai, A platform-based SoC design and implementation of scalable automaton matching for deep packet inspection, Journal of Systems Architecture: the EUROMICRO Journal, v.53 n.12, p.937-950, December, 2007
|
|
|
|
|
|
Sailesh Kumar , Balakrishnan Chandrasekaran , Jonathan Turner , George Varghese, Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia, Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems, December 03-04, 2007, Orlando, Florida, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kurt A. Shoens , Allen Luniewski , Peter M. Schwarz , James W. Stamos , Joachim Thomas, II, The Rufus System: Information Organization for Semi-Structured Data, Proceedings of the 19th International Conference on Very Large Data Bases, p.97-107, August 24-27, 1993
|
|
|
|
|
|
|
|
|
|
|
|
Z. M. Kedem , G. M. Landau , K. V. Palem, Optimal parallel suffix-prefix matching algorithm and applications, Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, p.388-398, June 18-21, 1989, Santa Fe, New Mexico, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Olivier Morandi , Fulvio Risso , Silvio Valenti , Paolo Veglia, Design and implementation of a framework for creating portable and efficient packet-processing applications, Proceedings of the 7th ACM international conference on Embedded software, October 19-24, 2008, Atlanta, GA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Atul Mahajan , Benfano Soewito , Sai K. Parsi , Ning Weng , Haibo Wang, Implementing high-speed string matching hardware for network intrusion detection systems, Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, February 24-26, 2008, Monterey, California, USA
|
|
|
|
|
|
|
|
|
Chen Chen , Xifeng Yan , Philip S. Yu , Jiawei Han , Dong-Qing Zhang , Xiaohui Gu, Towards graph containment search and indexing, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
|
|
|
|
|
|
|
|
Ying Liu , Lucian V. Lita , R. Stefan Niculescu , Kun Bai , Prasenjit Mitra , C. Lee Giles, Real-time data pre-processing technique for efficient feature extraction in large scale datasets, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
|
|
|
|
|
|
Jun Harada , Masao Fuketa , Kazuhiro Morita , Touru Sumitomo , Wataru Hiraishi , El-Sayed Atlam , Jun-Ichi Aoe, Estimation of FAQ knowledge bases by using semantic expressions for questions and answers, International Journal of Computer Applications in Technology, v.32 n.1, p.69-81, July 2008
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Elias Athanasopoulos , Antonis Krithinakis , Georgios Kopidakis , Graeme Maxwell , Alistair Poustie , Bob Manning , Rod Webb , Martin Koyabe , Carla Di Cairano-Gilfedder, WISDOM: security-aware fibres, Proceedings of the Second European Workshop on System Security, p.22-27, March 31-31, 2009, Nuremburg, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Masayuki Kessoku , Kazuhiko Tsuda , El-Sayed Atlam , Kazuhiro Morita , Masao Fuketa , Jun-ichi Aoe, A method to implement effective My-page service system using three-dimensional vectors, International Journal of Computer Applications in Technology, v.35 n.2/3/4, p.262-270, June 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
F.
Theory of Computation
F.2
ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.2
Nonnumerical Algorithms and Problems
Subjects:
Pattern matching
Additional Classification:
H.
Information Systems
H.3
INFORMATION STORAGE AND RETRIEVAL
H.3.3
Information Search and Retrieval
Subjects:
Retrieval models;
Search process
General Terms:
Algorithms,
Design,
Management,
Performance,
Theory
Keywords:
bibliographic search,
computational complexity,
finite state machines,
information retrieval,
keywords and phrases,
string pattern matching,
text-editing
|