skip to main content
article
Free access

Efficient string matching: an aid to bibliographic search

Published: 01 June 1975 Publication History

Abstract

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

References

[1]
Aho, A.V.Hopscroft, J,E. and Ullman, D. J., the Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
[2]
Booth, T.U Sequential Machines and Automata Theory. Wiley, New York, 1967.
[3]
Brzozowski, J.A. Derivatives of regular expressions. J. ACM 11:4 (October 1964), 481-494.
[4]
Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.
[5]
Fischer, M.J., and Paterson, M.S. String matching and other products. Technical Report 41, Project MAC, M.I.T., 1974.
[6]
Gimpel, J.A. A theory of discrete.patterns and their implementation in SNOBOL4. Comm. ACM 16:2 (February 1973), 91-100.
[7]
Harrison, M.C. Implementation of the substring test by hashing. Comm. ACM14:12 (December 1971), 777-779.
[8]
Johnson, W.L., Porter, J.H., Ackley, S.I., and Ross, D.T. Automatic generation of efficient lexical processors using finite state techniques. Comm. ACMll:I2 (December 1968), 805-813.
[9]
Kernighan, B.W., and Cherry, L.L. A system for typesetting mathematics. Comm. ACM18:3 (March 1975), 151-156.
[10]
Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.
[11]
Knuth, D.E. Fundamental Algorithms, second edition, The Art of Computer Programming 1, Addison-Wesley, Reading, Mass., 1973.
[12]
Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.
[13]
Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.
[14]
Kohavi, Z. Switching and Finite Automata Theory. McGraw- Hill, New York, 1970.
[15]
McNaughton, R., and Yamada, H. Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9:1 (1960), 39-47.
[16]
Rabin, M.O., and Scott, D. Finite automata and their decision problems. IBM J. Research and Development 3, (1959), 114-125.
[17]
Thompson, K. Regular search expression algorithm. Comm. ACM 11:6 (June 1968), 419-422.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Communications of the ACM
Communications of the ACM  Volume 18, Issue 6
June 1975
51 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/360825
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1975
Published in CACM Volume 18, Issue 6

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bibliographic search
  2. computational complexity
  3. finite state machines
  4. information retrieval
  5. keywords and phrases
  6. string pattern matching
  7. text-editing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,581
  • Downloads (Last 6 weeks)195
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media