FET: Small: AlignMEM: Fast and Efficient DNA Sequence Alignment in Non-Volatile Magnetic RAM

Funding Source

U.S. National Science Foundation (NSF)

Principal Investigator

Deliang Fan

Awardee Institution

Arizona State University

Fiscal Year

2025, 2020

Award Date

Not disclosed

Award Number

2528723

Project Period

Unknown / Unknown

Project Level

National

Award Amount

USD 613,079.00

Discipline

Not disclosed

Discipline Code

Not disclosed

Award Instrument

Standard Grant

Keywords

FET-Fndtns of Emerging Tech; FET: Foundations of Emerging Technologies; DES AUTO FOR MICRO&NANO SYST

Participants

Not disclosed

Participating Institutions

ARIZONA STATE UNIVERSITY

Project Abstract: State-of-the-art DNA sequencing technologies can generate terabytes of DNA sequence data in a single run, and their throughput is expected to increase 3-5 times each year in the coming years. Before these big DNA data can be applied to follow-up complex disease diagnostics/prognostics, such as cancer risk assessment, tailored patient treatment, and prenatal testing, they must first be aligned to the 3.2-billion-base human reference genome. However, even on today's most powerful computing systems, existing software tools may need hours or days to align such a large amount of DNA sequence data, due to the 'memory wall' challenge in state-of-the-art computing architectures: the speed mismatch between memory units and computing units. To this end, this project leverages innovations from non-volatile nano-magnet-based Magnetic Random Access Memory (MRAM) technology and in-memory computing architecture. If successful, it can achieve up to two orders of magnitude higher computing performance, speed, and energy efficiency for next-generation DNA sequence analysis systems, enabling large-scale, fast genomic data analytics to support research on various diseases and biomedical applications. The project will also develop new undergraduate/graduate-level course modules on in-memory computing architecture and bioinformatics.

This project follows two main research tracks. The first explores how to leverage the intrinsic properties of non-volatile MRAM devices to efficiently build the ultra-parallel, reconfigurable in-memory logic required by DNA alignment computation and its big-DNA-data Processing-in-Memory (PIM) accelerator architecture. The second investigates how to develop a fast DNA alignment-in-memory algorithm based on the Burrows-Wheeler Transform to match the proposed MRAM-based PIM platform, and its large-scale genomic analysis application in disease phenotype prediction. The alignments generated will be used to estimate gene expression and identify single-nucleotide mutation events in patient samples, leading to molecular signatures for disease risk assessment. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes Report: This project explores leveraging innovations from both post-CMOS non-volatile nano-magnet-based Magnetic Random Access Memory (MRAM) device technology and in-memory computing architecture to develop a revolutionary DNA sequence alignment-in-memory (AlignMEM) system. It advances a next-generation ultra-fast, high-throughput DNA short-read alignment paradigm, targeting two orders of magnitude higher speed, throughput, and energy efficiency than existing CPU/GPU computing systems.

Intellectual merit: Over the life of this project, the PIs' team successfully completed the two proposed research tracks. For the first track, the team designed different types of non-volatile-memory-based ultra-parallel, reconfigurable in-memory logic required by DNA alignment computation and its big-DNA-data Processing-in-Memory (PIM) accelerator architecture. For the second track, the team developed a fast DNA alignment-in-memory algorithm based on the Burrows-Wheeler Transform to match the proposed MRAM-based PIM platform and its large-scale genomic analysis application in disease phenotype prediction. The team further used the processed data to estimate gene expression and for many other types of genome processing. The fabricated, world-first genome processing-in-memory chip prototype achieved the targeted energy efficiency, around two orders of magnitude higher than state-of-the-art counterparts, a strong indication of the project's success.

Research publications: These research outcomes have led to 10+ IEEE/ACM international journal and conference publications from the PIs' team, in venues such as JSSC, SSCL, TCAD, JLPEA, ENM, CICC, DAC, GLSVLSI, and ISQED.

PhD thesis: A PhD student graduated with the thesis "Compute-in-memory Circuits and Architectures for Efficient Acceleration of AI and Data Centric Workloads".

Genome processing chip prototype fabrication: The PIs' team designed and fabricated the world's first genome processing-in-memory chip prototype. The prototype accelerates two key genome processing applications: state-of-the-art (SOTA) Burrows-Wheeler Transform (BWT)-based DNA short-read alignment and alignment-free mRNA quantification. The chip achieves 2.12 G suffixes/J (suffixes per joule) at 1.0 V, the most energy-efficient solution to date for genome processing.

Broader impacts: Student training: Four PhD students were partially supported through this project at ASU and UCF, conducting research on in-memory computing circuit hardware and genome processing algorithms. Multiple master's and undergraduate students from the PI's classes were trained in state-of-the-art non-volatile memory and circuit design, and the PI supervised several senior design teams on topics related to this project. Outreach: The PI organized and chaired an in-memory computing workshop associated with the community's top-tier conference, the Design Automation Conference; the workshop attracted 100+ attendees each year, serving as a platform to promote the project's research outcomes. Open-source tools/models: Multiple open-source software tools were produced and shared on GitHub, free for the public to download.

Last Modified: 12/11/2025; Modified by: Deliang Fan
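The Burrows-Wheeler Transform alignment at the center of both the abstract and the outcomes report can be illustrated in a few lines of software. The sketch below is a minimal FM-index backward search over a toy reference, assuming simple exact matching; all names are illustrative and not taken from the project's hardware or codebase.

```python
# Minimal FM-index backward search: the core loop of BWT-based
# short-read alignment. Toy-scale sketch for illustration only.

def bwt_index(ref):
    """Build suffix array, first-column offsets C, and occurrence table Occ."""
    ref += "$"                                 # sentinel terminator
    sa = sorted(range(len(ref)), key=lambda i: ref[i:])
    bwt = [ref[i - 1] for i in sa]             # last column of sorted rotations
    alphabet = sorted(set(ref))
    # C[c] = number of characters in ref strictly smaller than c
    C = {c: sum(ch < c for ch in ref) for c in alphabet}
    # Occ[c][i] = occurrences of c in bwt[:i]
    Occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            Occ[c].append(Occ[c][-1] + (ch == c))
    return sa, C, Occ

def backward_search(read, C, Occ, n):
    """Return the suffix-array interval [lo, hi) of exact matches."""
    lo, hi = 0, n + 1                          # n + 1 = len(ref) + sentinel
    for c in reversed(read):                   # consume the read right-to-left
        if c not in C:
            return 0, 0
        lo = C[c] + Occ[c][lo]
        hi = C[c] + Occ[c][hi]
        if lo >= hi:
            return 0, 0                        # no occurrence
    return lo, hi

ref = "ACGTACGT"
sa, C, Occ = bwt_index(ref)
lo, hi = backward_search("ACG", C, Occ, len(ref))
positions = sorted(sa[i] for i in range(lo, hi))   # match positions in ref
```

The per-character count and interval-update additions in this loop are the kind of primitives an alignment-in-memory accelerator would parallelize across many reads at once.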

Personnel

Deliang Fan (Principal Investigator): dfan@asu.edu;

Institution Information

[Arizona State University (Performance Institution)] Street Address: 660 S MILL AVENUE STE 204, TEMPE, Arizona, United States / Zip Code: 85281-3670; [ARIZONA STATE UNIVERSITY] Street Address: 1475 N SCOTTSDALE RD STE 200, SCOTTSDALE, Arizona, United States / Phone Number: 480-965-5479 / Zip Code: 85257-3538;

Managing Organization

Directorate for Computer and Information Science and Engineering (CSE) - Division of Computing and Communication Foundations (CCF)

Program Officer

Sankar Basu (Email: sabasu@nsf.gov; Phone: 703-292-7843)

  • 1.SAFER: Sparsity Integrated Compute-in-Memory AI Accelerator with a Fused Dot-Product Engine and a RISC-V CPU

    • Keywords:
    • Energy efficiency;Engines;Indium alloys;Program processors;Reduced instruction set computing;Static random access storage;Data movements;Digital in-memory computing;Floating points;Floating-point and integer acceleration;Memory circuits;Memory footprint;Memory macro;Multiply-and-accumulate;Peak energy;RISC-V
    • Sridharan, Amitesh;Ali, Asmer Hamid;Lee, Yongjae;Anupreetham, Anupreetham;Liu, Yaotian;Zhang, Jeff;Seo, Jae-Sun;Fan, Deliang
    • 《51st IEEE European Solid-State Electronics Research Conference, ESSERC 2025》
    • 2025
    • September 8, 2025 - September 11, 2025
    • Munich, Germany
    • Conference

    We present a sparsity-aware in-SRAM multiply-and-accumulate (MAC) accelerator with a fused dot-product engine (SAFE) and a RISC-V CPU (SAFER). For the first time, we implement a unified dot-product compute methodology in compute-in-memory (CIM) circuits, vastly reducing the hardware footprint for simultaneously supporting both floating-point (FP) and integer (INT) MACs. Additionally, we integrate various N:M sparsity formats, allowing the CIM macro to store and operate exclusively on compressed non-zero weights. We also tightly integrate a 32-bit RISC-V CPU with SAFE for efficient data movement across the chip. The 28 nm SAFER prototype achieves a peak energy efficiency of 105.7 TOPS/W (78.9 TOPS/W) and 79.9 TOPS/W (63 TOPS/W) at the macro (chip) level for FP8 and INT8 workloads, respectively. SAFER also achieves a memory-footprint reduction proportional to sparsity through compressed storage, vastly reducing the macro count required for large AI models. For our proposed figure of merit (FoM), which accounts for PPA along with memory footprint, SAFER improves on current SoTA CIMs by 13.8× for FP8 workloads. © 2025 IEEE.

    ...
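As a software illustration of the compressed-storage idea in the SAFER abstract, the sketch below prunes weights to an N:M structured-sparsity pattern and computes a dot product over the stored non-zeros only. Group sizes, the index encoding, and all names are assumptions for illustration, not SAFER's actual format.

```python
# Illustrative N:M structured-sparsity compression: per group of m
# weights, keep only the n largest-magnitude values plus their
# in-group indices, then MAC over stored non-zeros only.

def compress_nm(weights, n=2, m=4):
    """Prune each group of m weights to its n largest magnitudes."""
    assert len(weights) % m == 0
    values, indices = [], []
    for g in range(0, len(weights), m):
        group = weights[g:g + m]
        keep = sorted(range(m), key=lambda i: -abs(group[i]))[:n]
        for i in sorted(keep):
            values.append(group[i])   # stored non-zero weight
            indices.append(i)         # log2(m)-bit in-group index
    return values, indices

def sparse_dot(values, indices, x, n=2, m=4):
    """Dot product touching only stored non-zeros (the sparse MAC loop)."""
    acc = 0.0
    for k, (v, i) in enumerate(zip(values, indices)):
        acc += v * x[(k // n) * m + i]
    return acc

w = [0.1, -2.0, 0.0, 3.0, 1.5, 0.2, -0.1, 0.05]
vals, idx = compress_nm(w)            # only half the values are stored
x = [1.0] * 8
y = sparse_dot(vals, idx, x)          # matches the pruned-dense result
```

Storing `indices` alongside `vals` is what makes the memory footprint shrink in proportion to sparsity: the zeroed weights never occupy macro capacity.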
  • 2.Efficient Self-Supervised Continual Learning with Progressive Task-Correlated Layer Freezing

    • Keywords:
    • Semi-supervised learning;Support vector machines;% reductions;Catastrophic forgetting;Continual learning;Layer freezing;Learning methods;Multiple tasks;Training process;Training time;Unlabeled data;Visual representations
    • Yang, Li;Lin, Sen;Zhang, Fan;Zhang, Junshan;Fan, Deliang
    • 《26th International Symposium on Quality Electronic Design, ISQED 2025》
    • 2025
    • April 23, 2025 - April 25, 2025
    • Hybrid, San Francisco, CA, United States
    • Conference

    Inspired by the success of Self-Supervised Learning (SSL) in learning visual representations from unlabeled data, a few recent works have studied SSL in the context of Continual Learning (CL), where multiple tasks are learned sequentially, giving rise to a new paradigm, namely Self-Supervised Continual Learning (SSCL). It has been shown that SSCL outperforms Supervised Continual Learning (SCL), as the learned representations are more informative and robust to catastrophic forgetting. However, building upon the training process of SSL, prior SSCL studies involve training all the parameters for each task, resulting in prohibitively high training cost. In this work, we first analyze the training time and memory consumption and reveal that the backward gradient calculation is the bottleneck. Moreover, by investigating the task correlations in SSCL, we discover an interesting phenomenon: with the SSL-learned backbone model, the intermediate features are highly correlated between tasks. Based on these new findings, we propose a new SSCL method with layer-wise freezing, which progressively freezes the partial layers with the highest correlation ratios for each task to improve training computation and memory efficiency. Extensive experiments are performed across multiple datasets, where our proposed method shows superior performance over SoTA SSCL methods under various SSL frameworks. For example, compared to LUMP, our method achieves 1.18x, 1.15x, and 1.2x GPU training-time reduction, 1.65x, 1.61x, and 1.6x memory reduction, 1.46x, 1.44x, and 1.46x backward-FLOPs reduction, and 1.31%/1.98%/1.21% forgetting reduction without accuracy degradation on three datasets, respectively. © 2025 IEEE.

    ...
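The correlation-guided freezing described in the abstract above can be sketched as follows, assuming Pearson correlation of intermediate features and a fixed threshold; the paper's actual correlation measure, schedule, and threshold may differ.

```python
import numpy as np

# Sketch of correlation-guided progressive layer freezing: layers whose
# intermediate features stay highly correlated across tasks are frozen,
# skipping their backward pass. Shapes and the threshold are assumptions.

def correlation_ratio(feat_a, feat_b):
    """Mean absolute per-dimension Pearson correlation of two batches."""
    a = (feat_a - feat_a.mean(0)) / (feat_a.std(0) + 1e-8)
    b = (feat_b - feat_b.mean(0)) / (feat_b.std(0) + 1e-8)
    return float(np.abs((a * b).mean(0)).mean())

def layers_to_freeze(feats_prev, feats_curr, threshold=0.9):
    """Progressively freeze the leading layers whose cross-task feature
    correlation exceeds the threshold; stop at the first layer below it."""
    frozen = []
    for i, (fp, fc) in enumerate(zip(feats_prev, feats_curr)):
        if correlation_ratio(fp, fc) < threshold:
            break
        frozen.append(i)
    return frozen

rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 16))          # features reused across tasks
feats_prev = [shared, shared, rng.normal(size=(64, 16))]
feats_curr = [shared,
              shared + 0.01 * rng.normal(size=(64, 16)),
              rng.normal(size=(64, 16))]
frozen = layers_to_freeze(feats_prev, feats_curr)   # early layers frozen
```

Because frozen layers need no gradient, the saving grows with how early in the network the correlation stays high, which matches the backward-FLOPs reductions the abstract reports.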
  • 3.DSPIMM: A Fully Digital SParse In-Memory Matrix Vector Multiplier for Communication Applications

    • Keywords:
    • Backpropagation;Channel coding;Decoding;Energy efficiency;Matrix algebra;Static random access storage;Belief propagation;Channel decoder;Communication application;Hardware performance;In-memory-computing;MAC;Matrix-vector multipliers;Memory matrix;Neural decoder;Sparsity
    • Sridharan, Amitesh;Zhang, Fan;Sui, Yang;Yuan, Bo;Fan, Deliang
    • 《60th ACM/IEEE Design Automation Conference, DAC 2023》
    • 2023
    • July 9, 2023 - July 13, 2023
    • San Francisco, CA, United States
    • Conference

    Channel decoders are key computing modules in wired/wireless communication systems. Recently neural network (NN)-based decoders have shown their promising error-correcting performance because of their end-to-end learning capability. However, compared with the traditional approaches, the emerging neural belief propagation (NBP) solution suffers higher storage and computational complexity, limiting its hardware performance. To address this challenge and develop a channel decoder that can achieve high decoding performance and hardware performance simultaneously, in this paper we take a first step towards exploring SRAM-based in-memory computing for efficient NBP channel decoding. We first analyze the unique sparsity pattern in the NBP processing, and then propose an efficient and fully Digital Sparse In-Memory Matrix vector Multiplier (DSPIMM) computing platform. Extensive experiments demonstrate that our proposed DSPIMM achieves significantly higher energy efficiency and throughput than the state-of-the-art counterparts. © 2023 IEEE.

    ...
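The sparsity pattern DSPIMM exploits can be modeled in software with a compressed sparse row (CSR) matrix-vector product, performing one MAC per stored non-zero, as a digital in-memory macro would. The toy matrix below is illustrative, not an actual neural-belief-propagation layout.

```python
# CSR sparse matrix-vector multiply: store only non-zeros and do one
# multiply-accumulate per stored entry, skipping zeros entirely.

def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    data, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)        # stored non-zero value
                col_idx.append(j)     # its column position
        row_ptr.append(len(data))     # end of this row's non-zeros
    return data, col_idx, row_ptr

def csr_mv(data, col_idx, row_ptr, x):
    """Sparse matrix-vector product: one MAC per stored non-zero."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += data[k] * x[col_idx[k]]
        y.append(acc)
    return y

H = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0]]                    # 6 non-zeros out of 12 entries
data, cols, ptr = to_csr(H)
y = csr_mv(data, cols, ptr, [1, 2, 3, 4])
```

Skipping the zero entries is exactly where the energy and throughput gains of a sparsity-aware in-memory multiplier come from.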
  • 4.A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment

    • Keywords:
    • Energy efficiency;Genes;Hafnium oxides;RRAM;Alignment algorithms;Compute-in-memory;Genome sequencing;Genome sequencing alignment;Genomics analysis;Macro design;Memory macro;Memory wall;Short-read alignments;State of the art
    • Zhang, Fan;He, Wangxin;Yeo, Injune;Liehr, Maximilian;Cady, Nathaniel;Cao, Yu;Seo, Jae-Sun;Fan, Deliang
    • 《49th IEEE European Solid State Circuits Conference, ESSCIRC 2023》
    • 2023
    • September 11, 2023 - September 14, 2023
    • Lisbon, Portugal
    • Conference

    In genomic analysis, the major computation bottleneck is the memory- and compute-intensive DNA short-read alignment, due to the memory-wall challenge. This work presents the first Resistive RAM (RRAM) based Compute-in-Memory (CIM) macro design for accelerating state-of-the-art BWT-based genome sequencing alignment. Our design supports all the core instructions, i.e., XNOR-based match, count, and addition, required by the alignment algorithm. The proposed CIM macro, implemented in an integration of HfO2 RRAM and 65nm CMOS, demonstrates the best energy efficiency to date, with 2.07 TOPS/W and 2.12 G suffixes/J at 1.0 V. © 2023 IEEE.

    ...
  • 5.A 1.23-GHz 16-kb Programmable and Generic Processing-in-SRAM Accelerator in 65nm

    • Keywords:
    • Computation theory;Cryptography;Energy efficiency;Integrated circuit design;Boolean logic operations;Chip design;Complete sets;Computing platform;Full adders;In-memory computing;Parallel vectors;Programmability;Single cycle;Vector operations
    • Sridharan, Amitesh;Angizi, Shaahin;Cherupally, Sai Kiran;Zhang, Fan;Seo, Jae-Sun;Fan, Deliang
    • 《48th IEEE European Solid State Circuits Conference, ESSCIRC 2022》
    • 2022
    • September 19, 2022 - September 22, 2022
    • Milan, Italy
    • Conference

    We present a generic and programmable Processing-in-SRAM (PSRAM) accelerator chip design based on an 8T-SRAM array to accommodate a complete set of Boolean logic operations (e.g., NOR/NAND/XOR, both 2- and 3-input), majority, and full adder, for the first time all in a single cycle. PSRAM provides the programmability required for in-memory computing platforms used in applications such as parallel vector operations, neural networks, and data encryption. The prototype is implemented in a 16 kb SRAM macro, demonstrating one of the fastest programmable in-memory computing systems to date, operating at 1.23 GHz. The 65nm prototype chip achieves a system-level peak throughput of 1.2 TOPS and energy efficiency of 34.98 TOPS/W at 1.2 V. © 2022 IEEE.

    ...
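PSRAM's single-cycle bulk operations can be modeled in software by treating each integer as one SRAM row, with a bitwise operator acting on every column of the array at once. The 8-bit word width and helper names below are assumptions for illustration.

```python
# Software model of bulk bit-parallel in-SRAM operations: NOR, NAND,
# 3-input XOR, bitwise majority, and a column-parallel full adder.

W = 8
MASK = (1 << W) - 1                   # confine results to the word width

def nor(a, b):
    return ~(a | b) & MASK

def nand(a, b):
    return ~(a & b) & MASK

def xor3(a, b, c):
    return a ^ b ^ c                  # 3-input XOR (full-adder sum)

def majority(a, b, c):
    """Bitwise majority of three rows (the full-adder carry)."""
    return (a & b) | (b & c) | (a & c)

def full_adder_rows(a, b, cin):
    """Column-parallel full adder over three rows: (sum, carry)."""
    return xor3(a, b, cin), majority(a, b, cin)

a, b = 0b1100, 0b1010
s, c = full_adder_rows(a, b, 0)       # s = 0b0110, c = 0b1000
```

Every column position computes its own full adder simultaneously, which is what makes a single in-SRAM cycle equivalent to a wide vector operation.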
  • 6.XST: A Crossbar Column-wise Sparse Training for Efficient Continual Learning

    • Keywords:
    • Continual Learning; In-Memory-Computing; Sparse Learning
    • Zhang, Fan;Yang, Li;Meng, Jian;Seo, Jae-sun;Cao, Yu ;Fan, Deliang
    • 《25th Design, Automation and Test in Europe Conference and Exhibition》
    • 2022
    • MAR 14-23, 2022
    • Virtual, Online
    • Conference

    Leveraging ReRAM crossbar-based In-Memory-Computing (IMC) to accelerate single-task DNN inference has been widely studied. However, using the ReRAM crossbar for continual learning has not been explored yet. In this work, we propose XST, a novel crossbar column-wise sparse training framework for continual learning. XST significantly reduces the training cost and saves inference energy. More importantly, it is friendly to existing crossbar-based convolution engines with almost no hardware overhead. Compared with the state-of-the-art CPG method, experiments show that XST achieves 4.95% higher accuracy. Furthermore, XST demonstrates ~5.59x training speedup and 1.5x inference energy saving.

    ...
  • 7.Efficient Multi-task Adaption for Crossbar-based In-Memory Computing

    • Keywords:
    • Energy efficiency;Energy utilization;Green computing;Learning systems;RRAM;Computing platform;Continual learning;Energy-efficient computing;Highly parallels;In-memory computing;Multi tasks;Multi-task continual learning;Non-volatile memory;Specific tasks;Task inference
    • Zhang, Fan;Yang, Li;Fan, Deliang
    • 《56th Asilomar Conference on Signals, Systems and Computers, ACSSC 2022》
    • 2022
    • October 31, 2022 - November 2, 2022
    • Virtual, Online, United States
    • Conference

    ReRAM crossbar non-volatile memory (NVM) based In-Memory Computing (IMC) has been widely investigated as a highly parallel, fast, and energy-efficient computing platform for Deep Neural Networks (DNNs), especially for single-task inference. However, due to the intrinsically high energy consumption of weight re-programming and the relatively low endurance of the devices, adapting ReRAM crossbar-based IMC hardware for continual learning or multi-task learning has not been well explored. In this paper, we discuss a crossbar-aware learning method with a 2-tier masking technique that enables efficient and fast new-task adaption for a deployed DNN model with minor hardware overhead. © 2022 IEEE.

    ...
  • 8.APA-Scan: detection and visualization of 3'-UTR alternative polyadenylation with RNA-seq and 3'-end-seq data

    • Keywords:
    • Alternative polyadenylation;Transcriptome;RNA-seq;3'-End-seq;CLEAVAGE;3'UTRS;MODEL
    • Fahmi, Naima Ahmed;Ahmed, Khandakar Tanvir;Chang, Jae-Woong;Nassereddeen, Heba;Fan, Deliang;Yong, Jeongsik;Zhang, Wei
    • 《International Conference on Intelligent Biology and Medicine 》
    • 2022
    • AUG 08-10, 2021
    • Philadelphia, PA
    • Conference

    Background: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3'-untranslated region (3'-UTR) of mRNA produces transcripts with shorter or longer 3'-UTRs. Often, the 3'-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3'-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3'-UTR APA events due to incomplete annotations and low-resolution analyzing power: widely available bioinformatics pipelines do not reference actionable polyadenylation (cleavage) sites but simulate 3'-UTR APA using only RNA-seq read coverage, causing false-positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3'-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. Methods: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3'-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3'-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events between two biological conditions; (iii) graphically represent user-specified events with 3'-UTR annotation and read coverage on the 3'-UTR regions. APA-Scan is implemented in Python 3. Source code and a comprehensive user's manual are freely available at https://github.com/compbiolabucf/APA-Scan. Results: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap. In simulation, APA-Scan significantly improved the accuracy of 3'-UTR APA identification compared to the other baselines. The performance of APA-Scan was also validated by 3'-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3'-UTR APA events and improve genome annotation. Conclusion: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3'-UTR APA events. The pipeline integrates both RNA-seq and 3'-end-seq data and can efficiently identify significant events with high-resolution short-read coverage plots.

    ...
  • 9.XBM: A Crossbar Column-wise Binary Mask Learning Method for Efficient Multiple Task Adaption

    • Keywords:
    • Computer aided design;Learning algorithms;Learning systems;Binary masks;Catastrophic forgetting;Cell reprogramming;Crossbar arrays;Design innovations;Learn+;Learning methods;Memory overheads;Multiple tasks;Power
    • Zhang, Fan;Yang, Li;Meng, Jian;Cao, Yu Kevin;Seo, Jae-Sun;Fan, Deliang
    • 《27th Asia and South Pacific Design Automation Conference, ASP-DAC 2022》
    • 2022
    • January 17, 2022 - January 20, 2022
    • Virtual, Online, Taiwan
    • Conference

    Recently, utilizing ReRAM crossbar arrays to accelerate DNN inference on a single task has been widely studied. However, using the crossbar array for multiple-task adaption has not been well explored. In this paper, for the first time, we propose XBM, a novel crossbar column-wise binary mask learning method for multiple-task adaption in ReRAM crossbar DNN accelerators. XBM leverages the benefit of mask-based learning algorithms in avoiding catastrophic forgetting to learn a task-specific mask for each new task. With our hardware-aware design innovation, the masking operation required to adapt to a new task can be easily implemented in an existing crossbar-based convolution engine with minimal hardware/memory overhead and, more importantly, no need for power-hungry cell re-programming, unlike prior works. Extensive experimental results show that, compared with state-of-the-art multiple-task adaption methods, XBM keeps similar accuracy on new tasks while requiring only 1.4% of the mask memory size of the popular Piggyback method. Moreover, the elimination of cell re-programming or tuning saves up to 40% energy during new-task adaption. © 2022 IEEE.

    ...
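The column-wise binary masking idea behind XBM can be sketched as follows: the crossbar weights stay fixed, and each task stores only a 1-bit gate per column. Shapes, masks, and names are illustrative, and the mask-learning step itself is omitted.

```python
import numpy as np

# Sketch of crossbar column-wise binary masking for multi-task adaption:
# the weight matrix is never re-programmed; switching tasks swaps only a
# tiny per-column bit mask that gates the column outputs.

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))           # weights fixed in the crossbar

task_masks = {                        # learned per task; 1 bit per column
    "task_a": np.array([1, 1, 0, 0, 1, 0], dtype=np.uint8),
    "task_b": np.array([0, 1, 1, 1, 0, 0], dtype=np.uint8),
}

def forward(x, task):
    """Column-gated matrix-vector product: masked columns output zero,
    so no analog cell contents change between tasks."""
    return (x @ W) * task_masks[task]

x = rng.normal(size=4)
out_a = forward(x, "task_a")          # columns 2, 3, 5 gated to zero
out_b = forward(x, "task_b")          # columns 0, 4, 5 gated to zero
```

A mask costs one bit per column versus full-precision weights per task, which is where the large memory-size reduction over per-task weight storage comes from.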
  • 10.Max-PIM: Fast and Efficient Max/Min Searching in DRAM

    • Keywords:
    • Big data;Data handling;Iterative methods;Computation theory;Boolean logic;Fixed points;IMC;In-DRAM computing;Max/min;Memory algorithms;Memory wall;Min-max;PIM;Speed up
    • Zhang, Fan;Angizi, Shaahin;Fan, Deliang
    • 《58th ACM/IEEE Design Automation Conference, DAC 2021》
    • 2021
    • December 5, 2021 - December 9, 2021
    • San Francisco, CA, United States
    • Conference

    Recently, in-DRAM computing has become one promising technique to address the notorious 'memory-wall' issue in big-data processing. In this work, for the first time, we propose a novel 'Min/Max-in-memory' algorithm based on iterative XNOR bit-wise comparison, which supports parallel in-memory searching for the minimum and maximum of bulk data stored in DRAM as unsigned/signed integers, fixed-point, or floating-point numbers. We then develop a new processing-in-DRAM architecture, called Max-PIM, that supports complete bit-wise Boolean logic and beyond. Differentiating from prior works, Max-PIM is optimized with one-cycle fast XNOR logic-in-DRAM operation and in-memory data transpose, which are heavily used and key to accelerating the proposed Min/Max-in-memory algorithm efficiently. Extensive experiments utilizing Max-PIM in big-data sorting and graph processing applications show that it can achieve 50X and 1000X speedup over GPU and CPU, respectively, while consuming only 10% and 1% of their energy. Moreover, compared with recent representative in-DRAM computing platforms, i.e., Ambit [1] and DRISA [2], our design achieves 3X-10X speedup. © 2021 IEEE.

    ...
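The Min/Max-in-memory search that Max-PIM accelerates can be sketched in software as MSB-first bitwise elimination over a candidate set; the hardware performs the per-bit comparison with XNOR operations across DRAM rows in parallel. The 8-bit word width below is an assumption, and min search would keep the 0-bit candidates instead.

```python
# Software sketch of MSB-first bitwise max search: scan bit positions
# from MSB to LSB and, at each position, keep only candidates whose bit
# is 1 (if any exist). All surviving indices hold the maximum.

def max_in_memory(values, width=8):
    """Return indices of all maxima among unsigned ints via
    MSB-first bitwise elimination."""
    candidates = list(range(len(values)))
    for bit in range(width - 1, -1, -1):
        ones = [i for i in candidates if (values[i] >> bit) & 1]
        if ones:                      # this bit separates the field:
            candidates = ones         # drop every candidate with a 0
        # otherwise no candidate has the bit set; keep the set unchanged
    return candidates

data = [23, 200, 77, 200, 5]
idx = max_in_memory(data)             # both copies of the max survive
```

Because each bit position is examined once across all rows, the search cost grows with word width, not with the number of stored values, which is what makes the in-DRAM version so much faster than a sequential CPU scan.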