|
ABSTRACT
Given a regular expression R of length P and a word A of length N, the membership problem is to determine if A is in the language denoted by R. An O(PN/lgN) time algorithm is presented that is based on a lgN speedup of the standard O(PN) time simulation of R's nonderministic finite automaton on A using a combination of the node-listing and “Four-Russians” paradigms. This result places a new worst-case upper bound on regular expression pattern matching. Moreover, in practice the method provides an implementation that is faster than existing software for small regular expressions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Ano, A. V. Pattern matching in strings. In R. Book, ed. In Formal Language Theory. Academic Press, Orlando, Fla., 1980.
|
| |
2
|
Ano, A. V., HOPCROVT, J. E., AND ULLMAN, J. D. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1975.
|
| |
3
|
Alfred V. Aho , Ravi Sethi , Jeffrey D. Ullman, Compilers: principles, techniques, and tools, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986
|
| |
4
|
ARLAZAROV, V. L., DINIC, E. A., KRONROD, M. A., AND FARADZEV, I. A. On economic construction of the transitive closure of a directed graph. Dokl. Acad. Nauk SSSR 194 (1970), 487-488 (in Russian). English translation in Soz'iet Math. Dokl. 11 (1975), 1209-1210.
|
| |
5
|
GALIL, Z. Open problems in stringology. A. Apostolico and Z. Galil, eds. In Combinatorzal Algorithms on Words. Springer-Verlag, New York, 1985, pp. 1-8.
|
| |
6
|
HECHT, M. S., AND ULLMAN, J. D. A simple algorithm for global flow analysis programs. SIAM J. Comput. 4, 4 (1975), 519-532.
|
 |
7
|
|
| |
8
|
MASEK, W. J., AND PATERSON, M.S. A faster algorithm for computing string-edit distances. J. Comput. Syst. Sci. 20, } (1980), 18-31.
|
| |
9
|
MASEK, W. J., AND PATERSON, M. S. How to compute string-edit distances quickly. D. Sankoff and j. B. Kruskal, eds. In Time Warps, String Edzts, and Macromolecules: The Theoly and Practice of Sequence Comparison. Addison-Wesley, Reading, Mass., 1983, pp. 337-349.
|
| |
10
|
MILLER. W. Efficient searching of biosequence databases. Tech. Rep. CS-88-34, Dept. Comput. Sci., The Pennsylvania State Univ., University Park, Pa, I988.
|
| |
11
|
MYERS, E. W., AND MILLER, W. Approximate matching of regular expressions. Bull. Math. Biol. 51, 1 (1989), 5-37.
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
 |
15
|
|
| |
16
|
WAGNER, e. A., AND SEIFERAS, J.I. Correcting counter-automaton-recognizable languages. SIAM. J. Comput. 7, 3 (1978), 357-375.
|
CITED BY 9
|
|
|
|
|
Gonzalo Navarro , Mathieu Raffinot, Fast and simple character classes and bounded gaps pattern matching, with application to protein searching, Proceedings of the fifth annual international conference on Computational biology, p.231-240, April 22-25, 2001, Montreal, Quebec, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|