DOI: 10.1145/3079079.3079100
research-article

Demystifying automata processing: GPUs, FPGAs or Micron's AP?

Published: 14 June 2017

Abstract

Many established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. As a consequence, in recent years there has been a substantial amount of work on high-speed automata processing, which has led to a number of implementations targeting a variety of parallel platforms: CPUs, GPUs, FPGAs, ASICs, and Network Processors. More recently, Micron has announced its Automata Processor (AP), a DRAM-based accelerator of non-deterministic finite automata (NFA). Despite the abundance of work in this domain, the advantages and disadvantages of different automata processing accelerators and the innovation space in this area are still unclear.
In this work we target this problem and propose a toolchain to allow an apples-to-apples comparison of NFA acceleration engines on three platforms: GPUs, FPGAs and Micron's AP. We discuss the automata optimizations that are applicable to these three platforms. We perform an evaluation on large-scale datasets: to this end, we propose an NFA partitioning algorithm that minimizes the number of state replications required to maintain functional equivalence with an unpartitioned NFA, and we evaluate the scalability of each implementation to both large NFAs and large numbers of input streams. Our experimental evaluation covers resource utilization, traversal throughput, and preprocessing overhead and shows that the FPGA provides the best traversal throughputs (on the order of Gbps) at the cost of significant preprocessing times (on the order of hours); GPUs deliver modest traversal throughputs (on the order of Mbps), but offer low preprocessing times (on the order of seconds or minutes) and good pattern densities (they can accommodate large datasets on a single device); Micron's AP delivers throughputs, pattern densities, and preprocessing times that are intermediate between those of FPGAs and GPUs, and it is most suited for applications that use datasets consisting of many small NFAs with a topology that is fixed and known a priori.
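To make the notion of NFA traversal concrete, the following is a minimal software sketch of matching one input stream against one NFA by tracking the set of active states. It is not the paper's GPU, FPGA, or AP implementation; the `NFA` class, the `traverse` function, and the example pattern `ab+c` are hypothetical names chosen only for illustration.

```python
from collections import defaultdict


class NFA:
    """Hypothetical minimal NFA: integer states, one symbol per transition."""

    def __init__(self, start_states, accepting_states):
        self.start_states = frozenset(start_states)
        self.accepting_states = frozenset(accepting_states)
        # Maps (state, symbol) to the set of successor states.
        self.transitions = defaultdict(set)

    def add_transition(self, src, symbol, dst):
        self.transitions[(src, symbol)].add(dst)


def traverse(nfa, stream):
    """Return the input offsets at which an accepting state becomes active."""
    active = set(nfa.start_states)
    matches = []
    for offset, symbol in enumerate(stream):
        next_active = set()
        for state in active:
            next_active |= nfa.transitions.get((state, symbol), set())
        # Assumption: start states stay active so a match may begin anywhere.
        next_active |= nfa.start_states
        if next_active & nfa.accepting_states:
            matches.append(offset)
        active = next_active
    return matches


def build_ab_plus_c():
    """Build an NFA for the illustrative pattern ab+c."""
    nfa = NFA(start_states=[0], accepting_states=[3])
    nfa.add_transition(0, "a", 1)
    nfa.add_transition(1, "b", 2)
    nfa.add_transition(2, "b", 2)
    nfa.add_transition(2, "c", 3)
    return nfa
```

In this sketch the per-symbol work grows with the number of active states, which is the cost that parallel platforms try to hide by evaluating many states, or many input streams, at once.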

Published In

ICS '17: Proceedings of the International Conference on Supercomputing
June 2017
300 pages
ISBN:9781450350204
DOI:10.1145/3079079
General Chairs: William D. Gropp, Pete Beckman
Program Chairs: Zhiyuan Li, Francisco J. Cazorla
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. GPU
  3. Micron's AP
  4. automata processing
  5. pattern matching
  6. reconfigurable computing

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
