DOI: 10.1145/3123939.3123983
research-article
Public Access

UDP: a programmable accelerator for extract-transform-load workloads and more

Published: 14 October 2017

Abstract

Big data analytic applications give rise to large-scale extract-transform-load (ETL) as a fundamental step to transform new data into a native representation. ETL workloads pose significant performance challenges on conventional architectures, so we propose the design of the unstructured data processor (UDP), a software-programmable accelerator that includes multi-way dispatch, variable-size symbol support, flexible-source dispatch (stream buffer and scalar registers), and memory addressing to accelerate ETL kernels for both current and novel future encodings and compression schemes. Specifically, UDP excels at branch-intensive, symbol-oriented, and pattern-oriented workloads, and can offload them from CPUs.
To evaluate UDP, we use a broad set of data processing workloads inspired by ETL, but broad enough to also apply to query execution, stream processing, and intrusion detection/monitoring. A single UDP accelerates these data processing tasks 20-fold (geometric mean; the largest increase is from 0.4 GB/s to 40 GB/s) and improves performance per watt by a geometric mean of 1,900-fold. An ASIC implementation of UDP in 28 nm CMOS shows a logic area of 3.82 mm² (8.69 mm² with 1 MB of local memory) and a logic power of 0.149 W (0.864 W with 1 MB of local memory), both much smaller than a single core.
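
As a rough software analogy for the multi-way dispatch idea named in the abstract, the sketch below parses a small CSV-like input with a transition table: each input symbol selects the next state and action through a single lookup rather than a chain of comparisons on the symbol. This is only an illustration under our own assumptions; the table layout, state numbering, action names, and the `parse` function are hypothetical and do not reflect the UDP instruction set or its programming model.

```python
# Illustrative software analogy only; NOT the UDP ISA or its programming model.
# All names here (TABLE, DEFAULT, parse, the action strings) are hypothetical.

# States: 0 = inside an unquoted field, 1 = inside a quoted field.
# TABLE[state][symbol] -> (next_state, action); other symbols use DEFAULT.
TABLE = {
    0: {",": (0, "end_field"), "\n": (0, "end_record"), '"': (1, "skip")},
    1: {'"': (0, "skip")},
}
DEFAULT = {0: (0, "emit"), 1: (1, "emit")}


def parse(text):
    """Split text into records of fields using one table lookup per symbol."""
    state = 0
    field, record, records = [], [], []
    for ch in text:
        # Multi-way dispatch: the symbol indexes the table directly.
        state_next, action = TABLE[state].get(ch, DEFAULT[state])
        if action == "emit":
            field.append(ch)
        elif action == "end_field":
            record.append("".join(field))
            field = []
        elif action == "end_record":
            record.append("".join(field))
            field = []
            records.append(record)
            record = []
        # "skip" consumes the symbol without emitting it.
        state = state_next
    # Flush a final record that lacks a trailing newline.
    if field or record:
        record.append("".join(field))
        records.append(record)
    return records


if __name__ == "__main__":
    print(parse('a,"b,c"\nd,e\n'))
```

On a conventional CPU the per-symbol lookup still becomes a data-dependent branch or indirect jump, which is the kind of branch-intensive behavior the abstract says UDP targets.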


Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
October 2017
850 pages
ISBN:9781450349529
DOI:10.1145/3123939
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compression
  2. control-flow accelerator
  3. data analytics
  4. data encoding and transformation
  5. parsing

Qualifiers

  • Research-article

Funding Sources

  • Defense Advanced Research Projects Agency

Conference

MICRO-50

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
