DOI: 10.1145/3293883.3295705
research-article

QTLS: high-performance TLS asynchronous offload framework with Intel® QuickAssist technology

Published: 16 February 2019

Abstract

Hardware accelerators are a promising way to reduce the Total Cost of Ownership (TCO) of cloud datacenters. This paper targets the costly Transport Layer Security (TLS) protocol and investigates TLS acceleration for widely deployed event-driven TLS servers and terminators. Our study reveals an important fact: straightforward offloading of TLS crypto operations suffers from frequent, long-lasting blocking in the offload I/O, which leaves both CPU and accelerator resources underutilized.
To achieve efficient TLS acceleration for the event-driven web architecture, we propose QTLS, a high-performance TLS asynchronous offload framework based on Intel® QuickAssist Technology (QAT). QTLS re-engineers the TLS software stack and divides TLS offloading into four phases to eliminate blocking. As a result, multiple crypto operations from different TLS connections can be offloaded concurrently in one process or thread, which boosts performance. Moreover, QTLS includes a heuristic polling scheme that retrieves accelerator responses efficiently and in a timely manner, and a kernel-bypass notification scheme that avoids expensive switches between user mode and kernel mode when delivering async events. A comprehensive evaluation shows that, compared to the software baseline, QTLS provides up to 9x the connections per second (CPS) with TLS-RSA (2048-bit), 2x the secure data transfer throughput, and an 85% reduction in average response time.
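The contrast between blocking and asynchronous offload can be illustrated with a toy model. The sketch below is not the QTLS implementation and does not use any QAT or OpenSSL API: `ToyAccelerator`, its fixed `LATENCY`, and both driver functions are hypothetical names invented for illustration. It only shows why a single worker that waits on each offloaded operation stalls, while a worker that submits operations from several connections and then polls for responses overlaps their latency.

```python
from collections import deque


class ToyAccelerator:
    """Hypothetical stand-in for a crypto accelerator.

    Every request completes LATENCY simulated ticks after it is submitted.
    """
    LATENCY = 5

    def __init__(self):
        self.now = 0
        self.inflight = deque()  # entries are (ready_at_tick, conn_id)

    def submit(self, conn_id):
        self.inflight.append((self.now + self.LATENCY, conn_id))

    def poll(self):
        """Return the ids of all requests that have completed by now."""
        done = []
        while self.inflight and self.inflight[0][0] <= self.now:
            done.append(self.inflight.popleft()[1])
        return done

    def tick(self):
        self.now += 1


def blocking_offload(conns):
    """Submit one operation, then wait for it before touching the next."""
    acc = ToyAccelerator()
    finished = []
    for conn in conns:
        acc.submit(conn)
        while True:
            done = acc.poll()
            if done:
                finished.extend(done)
                break
            acc.tick()  # the worker is stalled and serves no other connection
    return acc.now, finished


def async_offload(conns):
    """Submit operations from all connections first, then poll for responses."""
    acc = ToyAccelerator()
    finished = []
    for conn in conns:
        acc.submit(conn)  # return to the event loop right after submitting
    while len(finished) < len(conns):
        finished.extend(acc.poll())
        if len(finished) < len(conns):
            acc.tick()
    return acc.now, finished


if __name__ == "__main__":
    print("blocking ticks:", blocking_offload(range(4))[0])
    print("async ticks:   ", async_offload(range(4))[0])
```

In this model the blocking worker pays the accelerator latency once per connection, while the asynchronous worker pays it roughly once for the whole batch. Real accelerators have bounded parallelism and variable latency, so the gap in practice depends on the hardware and the workload.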

Cited By

  • (2024) LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. Proceedings of the 32nd ACM International Conference on Multimedia, 7918-7927. DOI: 10.1145/3664647.3681670. Online publication date: 28-Oct-2024
  • (2024) HD-IOV: SW-HW Co-designed I/O Virtualization with Scalability and Flexibility for Hyper-Density Cloud. Proceedings of the Nineteenth European Conference on Computer Systems, 834-850. DOI: 10.1145/3627703.3629557. Online publication date: 22-Apr-2024
  • (2024) vCrypto: a Unified Para-Virtualization Framework for Heterogeneous Cryptographic Resources. IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 781-790. DOI: 10.1109/INFOCOM52122.2024.10621287. Online publication date: 20-May-2024

Published In

PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
February 2019
472 pages
ISBN: 9781450362252
DOI: 10.1145/3293883
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. SSL/TLS
  2. asynchronous offload
  3. crypto accelerator
  4. crypto operations
  5. event-driven web architecture

Qualifiers

  • Research-article

Conference

PPoPP '19

Acceptance Rates

PPoPP '19 Paper Acceptance Rate 29 of 152 submissions, 19%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 125
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 28 Dec 2024
