Polynomial Multiplication in NTRU Prime: Comparison of Optimization Strategies on Cortex-M4

Erdem Alkim; Dean Yun-Li  Cheng; Chi-Ming Marvin Chung; Hülya Evkan; Leo Wei-Lun  Huang; Vincent Hwang; Ching-Lin Trista Li; Ruben Niederhagen; Cheng-Jhih Shih; Julian Wälde; Bo-Yin  Yang

doi:10.46586/tches.v2021.i1.217-238

Authors

Erdem Alkim Ondokuz Mayıs University, Samsun, Turkey; Fraunhofer SIT, Darmstadt, Germany
Dean Yun-Li Cheng Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Chi-Ming Marvin Chung Academia Sinica, Taipei, Taiwan
Hülya Evkan Fraunhofer SIT, Darmstadt, Germany
Leo Wei-Lun Huang Academia Sinica, Taipei, Taiwan
Vincent Hwang Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Ching-Lin Trista Li Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Ruben Niederhagen University of Southern Denmark, Odense, Denmark
Cheng-Jhih Shih Academia Sinica, Taipei, Taiwan
Julian Wälde Fraunhofer SIT, Darmstadt, Germany
Bo-Yin Yang Academia Sinica, Taipei, Taiwan

DOI:

https://doi.org/10.46586/tches.v2021.i1.217-238

Keywords:

NTT, polynomial multiplication, Cortex-M4, NTRU Prime, PQC

Abstract

This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with NTT. One of our approaches is using Good’s trick and focuses on speed and supporting more than one parameter set with a single implementation. The other approach is using a mixed radix NTT and focuses on the use of smaller multipliers and less memory. On a ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter-set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which is using Toom-Cook-based polynomial multiplication.

Polynomial Multiplication in NTRU Prime

Comparison of Optimization Strategies on Cortex-M4

Authors

DOI:

Keywords:

Abstract

Downloads

Published

Issue

Section

License

How to Cite

iacr-logo