research-article

Sorting in Memristive Memory

Published: 13 October 2022

Abstract

Sorting is needed in many application domains. Traditionally, data is read from memory and sent to a general-purpose processor or application-specific hardware for sorting, and the sorted data is then written back to memory. Reading and writing the data and transferring it between the memory and the processing unit incur significant latency and energy overhead. In this work, we develop what are, to the best of our knowledge, the first architectures for in-memory sorting of data. We propose two architectures. The first applies to the conventional data representation, i.e., weighted binary radix. The second targets unary processing systems, in which data is encoded as uniform unary bit-streams. As we show, each architecture has its own advantages and disadvantages, making one or the other more suitable for a specific application. Common to both, however, is a significant reduction in processing time compared to prior sorting designs. Our evaluations show, on average, 37× and 138× energy reduction for the binary and unary designs, respectively, compared to conventional CMOS off-memory sorting systems in a 45 nm technology. We designed 3×3 and 5×5 median filters using the proposed sorting solutions and used them to process 64×64-pixel images. For the 3×3 median filtering system, our results show reductions of 14× in energy and 634× in latency with the proposed binary approach, and of 5.6× in energy and a factor of 152×10³ in latency with the proposed unary approach, compared to the off-memory binary and unary designs, respectively.
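
As a purely software illustration of the unary idea mentioned in the abstract, the sketch below encodes values as unary bit-streams and sorts them with a compare-and-swap network in which the minimum is a bitwise AND and the maximum is a bitwise OR. This is not the in-memory architecture of the article: the function names, the ones-first alignment of the streams, the stream length of 255, and the odd-even transposition network are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

Stream = List[int]


def to_unary(value: int, length: int) -> Stream:
    """Encode `value` as a unary bit-stream: `value` ones, then zeros (assumed alignment)."""
    if not 0 <= value <= length:
        raise ValueError("value must lie in [0, length]")
    return [1] * value + [0] * (length - value)


def from_unary(stream: Sequence[int]) -> int:
    """Decode a unary bit-stream by counting its ones."""
    return sum(stream)


def compare_and_swap(a: Stream, b: Stream) -> Tuple[Stream, Stream]:
    """For aligned unary streams, bitwise AND yields the minimum and OR the maximum."""
    low = [x & y for x, y in zip(a, b)]
    high = [x | y for x, y in zip(a, b)]
    return low, high


def sort_unary(streams: Sequence[Stream]) -> List[Stream]:
    """Sort streams in ascending order with an odd-even transposition network."""
    s = list(streams)
    n = len(s)
    for rnd in range(n):  # n rounds suffice for n inputs
        for i in range(rnd % 2, n - 1, 2):
            s[i], s[i + 1] = compare_and_swap(s[i], s[i + 1])
    return s


def median_filter_window(pixels: Sequence[int], length: int = 255) -> int:
    """Median of an odd-sized pixel window (e.g. 3x3 = 9 pixels) via unary sorting."""
    streams = [to_unary(p, length) for p in pixels]
    return from_unary(sort_unary(streams)[len(streams) // 2])


if __name__ == "__main__":
    window = [12, 200, 7, 90, 90, 3, 255, 45, 60]  # hypothetical 3x3 pixel window
    print(median_filter_window(window))
```

The sketch only shows why unary compare-and-swap needs nothing more than AND and OR per bit position. The latency and energy figures quoted in the abstract concern the memristive implementations and cannot be inferred from it.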

Published In

ACM Journal on Emerging Technologies in Computing Systems, Volume 18, Issue 4
October 2022
429 pages
ISSN: 1550-4832
EISSN: 1550-4840
DOI: 10.1145/3563906
Editor: Ramesh Karri

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 October 2022
Online AM: 11 February 2022
Accepted: 21 January 2022
Revised: 28 October 2021
Received: 25 June 2021
Published in JETC Volume 18, Issue 4

Author Tags

  1. In-memory computation
  2. sorting networks
  3. unary processing
  4. stochastic computing
  5. memristor
  6. median filtering
  7. ReRAM

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Louisiana Board of Regents Support
  • National Science Foundation
