research-article

RAPIDx: High-Performance ReRAM Processing In-Memory Accelerator for Sequence Alignment

Published: 01 October 2023

Abstract

Genome sequence alignment is the core of many biological applications. Advances in sequencing technology produce a tremendous amount of data, making sequence alignment a critical bottleneck in bioinformatics analysis. Existing hardware accelerators for alignment suffer from limited on-chip memory, costly data movement, and poorly optimized alignment algorithms, so they cannot concurrently process the massive amount of data generated by sequencing machines. In this article, we propose a ReRAM-based accelerator, RAPIDx, that uses processing in-memory (PIM) for sequence alignment. RAPIDx achieves superior efficiency and performance via software–hardware co-design. First, we propose an adaptive banded parallelism alignment algorithm suitable for PIM architecture. Compared to the original dynamic programming-based alignment, the proposed algorithm significantly reduces the required complexity, data bit width, and memory footprint at the cost of negligible accuracy degradation. Then, we propose an efficient PIM architecture that implements the proposed algorithm. The data flow in RAPIDx achieves four-level parallelism, and we design an in-situ alignment computation flow in ReRAM, delivering 5.5–9.7× efficiency and throughput improvements compared to our previous PIM design, RAPID. The proposed RAPIDx is reconfigurable and can serve as a co-processor integrated into the existing genome analysis pipeline to boost sequence alignment or edit distance calculation. On short-read alignment, RAPIDx delivers 131.1× and 46.8× throughput improvements over state-of-the-art CPU and GPU libraries, respectively. Compared to ASIC accelerators for long-read alignment, the performance of RAPIDx is 1.8×–2.9× higher.
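To illustrate why restricting dynamic programming to a band reduces the work, the sketch below computes edit distance over only the cells within a fixed distance of the main diagonal. It is a minimal illustration under stated assumptions, not the RAPIDx algorithm: the abstract describes an adaptive band, whereas this sketch uses a fixed band, and the function name `banded_edit_distance` and its `band` parameter are hypothetical.

```python
def banded_edit_distance(a: str, b: str, band: int):
    """Edit distance computed only on cells with |i - j| <= band.

    Illustrative fixed-band variant (an assumption, not the paper's
    adaptive scheme). Returns None if the final cell lies outside the
    band. The result is exact whenever the true distance is <= band;
    otherwise it is an upper bound.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None  # the end cell (n, m) is not inside the band

    INF = float("inf")
    # Row 0: cell (0, j) costs j insertions, but only inside the band.
    prev = [j if j <= band else INF for j in range(m + 1)]

    for i in range(1, n + 1):
        cur = [INF] * (m + 1)  # cells outside the band stay unreachable
        if i <= band:
            cur[0] = i  # i deletions, still inside the band
        for j in range(max(1, i - band), min(m, i + band) + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur

    return prev[m]


print(banded_edit_distance("kitten", "sitting", band=3))
```

Each row evaluates at most `2 * band + 1` cells instead of `m + 1`, so the number of evaluated cells grows roughly linearly with sequence length for a fixed band. For simplicity this sketch still allocates full-length rows; an implementation aimed at a small memory footprint would store only the band.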


Published In

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 42, Issue 10
Oct. 2023
350 pages

Publisher

IEEE Press
