FET: Small: AlignMEM: Fast and Efficient DNA Sequence Alignment in Non-Volatile Magnetic RAM
Funding source
Principal investigator
Awardee organization
Fiscal year
Award date
Award number
Project period
Award level
Award amount
Discipline
Discipline code
Funding category
Keywords
Participants
Participating organizations
Personnel information
Organization information
Managing agency
Program officer
1. SAFER: Sparsity Integrated Compute-in-Memory AI Accelerator with a Fused Dot-Product Engine and a RISC-V CPU
- Keywords:
- Energy efficiency;Engines;Indium alloys;Program processors;Reduced instruction set computing;Static random access storage;Data movements;Digital in-memory computing;Floating points;Floating-point and integer acceleration;Memory circuits;Memory footprint;Memory macro;Multiply-and-accumulate;Peak energy;RISC-V
- Sridharan, Amitesh;Ali, Asmer Hamid;Lee, Yongjae;Anupreetham, Anupreetham;Liu, Yaotian;Zhang, Jeff;Seo, Jae-Sun;Fan, Deliang
- 《51st IEEE European Solid-State Electronics Research Conference, ESSERC 2025》
- 2025
- September 8, 2025 - September 11, 2025
- Munich, Germany
- Conference
We present a sparsity-aware in-SRAM multiply-and-accumulate (MAC) accelerator with a fused dot-product engine (SAFE) and a RISC-V CPU (SAFER). For the first time, we implement a unified dot-product compute methodology in Compute-in-Memory (CIM) circuits, vastly reducing the hardware footprint for simultaneously supporting both floating-point (FP) and integer (INT) MACs. Additionally, we integrate various N:M sparsity formats, allowing the CIM macro to store and operate exclusively on compressed non-zero weights. We also tightly integrate a 32-bit RISC-V CPU with SAFE for efficient data movement across the chip. The 28 nm SAFER prototype achieves peak energy efficiencies of 105.7 TOPS/W (78.9 TOPS/W) and 79.9 TOPS/W (63 TOPS/W) at the macro (chip) level for FP8 and INT8 workloads, respectively. SAFER also achieves a memory-footprint reduction proportional to sparsity through compressed storage, vastly reducing the macro count required for large AI models. Under our proposed figure of merit (FoM), which accounts for PPA along with memory footprint, SAFER improves on current SoTA CIMs by 13.8× for FP8 workloads. © 2025 IEEE.
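The compressed N:M storage described in the abstract above can be sketched in a few lines: for every group of M weights only the N largest-magnitude values and their positions are kept, and the MAC then runs over non-zeros only. This is a minimal software illustration with hypothetical helper names, not the paper's circuit implementation.

```python
def compress_n_m(weights, n=2, m=4):
    """N:M structured sparsity: keep the n largest-magnitude entries in
    every group of m, storing only non-zero values and their indices."""
    vals, idxs = [], []
    for g in range(0, len(weights), m):
        group = weights[g:g + m]
        keep = sorted(range(len(group)), key=lambda i: abs(group[i]))[-n:]
        for k in sorted(keep):          # preserve original ordering
            vals.append(group[k])
            idxs.append(g + k)
    return vals, idxs

def sparse_dot(vals, idxs, x):
    """MAC over the compressed weights only; zeros are never touched."""
    return sum(v * x[i] for v, i in zip(vals, idxs))
```

With n=2, m=4 the stored weight footprint halves, which is the source of the macro-count reduction the abstract claims for large models.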
2. Efficient Self-Supervised Continual Learning with Progressive Task-Correlated Layer Freezing
- Keywords:
- Semi-supervised learning;Support vector machines;% reductions;Catastrophic forgetting;Continual learning;Layer freezing;Learning methods;Multiple tasks;Training process;Training time;Unlabeled data;Visual representations
- Yang, Li;Lin, Sen;Zhang, Fan;Zhang, Junshan;Fan, Deliang
- 《26th International Symposium on Quality Electronic Design, ISQED 2025》
- 2025
- April 23, 2025 - April 25, 2025
- Hybrid, San Francisco, CA, United States
- Conference
Inspired by the success of Self-Supervised Learning (SSL) in learning visual representations from unlabeled data, a few recent works have studied SSL in the context of Continual Learning (CL), where multiple tasks are learned sequentially, giving rise to a new paradigm, namely Self-Supervised Continual Learning (SSCL). It has been shown that SSCL outperforms Supervised Continual Learning (SCL), as the learned representations are more informative and robust to catastrophic forgetting. However, building upon the training process of SSL, prior SSCL studies train all the parameters for each task, resulting in prohibitively high training cost. In this work, we first analyze the training time and memory consumption and reveal that the backward gradient calculation is the bottleneck. Moreover, by investigating the task correlations in SSCL, we discover an interesting phenomenon: with the SSL-learned backbone model, the intermediate features are highly correlated between tasks. Based on these new findings, we propose a new SSCL method with layer-wise freezing, which progressively freezes the partial layers with the highest correlation ratios for each task to improve training computation and memory efficiency. Extensive experiments across multiple datasets are performed, where our proposed method shows superior performance against SoTA SSCL methods under various SSL frameworks. For example, compared to LUMP, our method achieves 1.18×, 1.15×, and 1.2× GPU training-time reduction, 1.65×, 1.61×, and 1.6× memory reduction, 1.46×, 1.44×, and 1.46× backward-FLOPs reduction, and 1.31%/1.98%/1.21% forgetting reduction without accuracy degradation on three datasets, respectively. © 2025 IEEE.
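The freezing policy sketched in the abstract above can be illustrated with a toy model: rank layers by their cross-task feature-correlation ratio, freeze the top-ranked ones, and skip their backward-gradient work. The function names and the unit-cost FLOPs model are assumptions for illustration; the paper's actual criterion is more elaborate.

```python
def progressive_freeze(corr_per_layer, k):
    """Pick the k layers whose intermediate features correlate most
    strongly with the previous task; these are frozen for the new task."""
    ranked = sorted(range(len(corr_per_layer)),
                    key=lambda l: corr_per_layer[l], reverse=True)
    return sorted(ranked[:k])

def backward_flops(depth, frozen):
    """Toy cost model: each unfrozen layer contributes one unit of
    backward-gradient work, so freezing cuts backward FLOPs directly."""
    return depth - len(frozen)
```

Because the backward pass is the measured bottleneck, reducing `backward_flops` translates into the training-time and memory savings reported against LUMP.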
3. Dichotomous intronic polyadenylation profiles reveal multifaceted gene functions in the pan-cancer transcriptome.
- Sun, Jiao;Kim, Jin-Young;Jun, Semo;Park, Meeyeon;de Jong, Ebbing;Chang, Jae-Woong;Cheng, Sze;Fan, Deliang;Chen, Yue;Griffin, Timothy J;Lee, Jung-Hee;You, Ho Jin;Zhang, Wei;Yong, Jeongsik
- 《Experimental & Molecular Medicine》
- 2024
- Journal
Alternative cleavage and polyadenylation within introns (intronic APA) generate shorter mRNA isoforms; however, their physiological significance remains elusive. In this study, we developed a comprehensive workflow to analyze intronic APA profiles using the mammalian target of rapamycin (mTOR)-regulated transcriptome as a model system. Our investigation revealed two contrasting effects within the transcriptome in response to fluctuations in cellular mTOR activity: an increase in intronic APA for a subset of genes and a decrease for another subset of genes. The application of this workflow to RNA-seq data from The Cancer Genome Atlas demonstrated that this dichotomous intronic APA pattern is a consistent feature in transcriptomes across both normal tissues and various cancer types. Notably, our analyses of protein length changes resulting from intronic APA events revealed two distinct phenomena in proteome programming: a loss of functional domains due to significant changes in protein length or minimal alterations in C-terminal protein sequences within unstructured regions. Focusing on conserved intronic APA events across 10 different cancer types highlighted the prevalence of the latter cases in cancer transcriptomes, whereas the former cases were relatively enriched in normal tissue transcriptomes. These observations suggest potential, yet distinct, roles for intronic APA events during pathogenic processes and emphasize the abundance of protein isoforms with similar lengths in the cancer proteome. Furthermore, our investigation into the isoform-specific functions of JMJD6 intronic APA events supported the hypothesis that alterations in unstructured C-terminal protein regions lead to functional differences. Collectively, our findings underscore intronic APA events as a discrete molecular signature present in both normal tissues and cancer transcriptomes, highlighting the contribution of APA to the multifaceted functionality of the cancer proteome. © 2024. 
4. Aligner-D: Leveraging In-DRAM Computing to Accelerate DNA Short Read Alignment
- Keywords:
- DNA; Random access memory; Task analysis; Genomics; Bioinformatics; Throughput; Sequential analysis; DNA short read alignment; Processing-in-memory; DRAM; Accelerator
- Zhang, Fan;Angizi, Shaahin;Sun, Jiao;Zhang, Wei;Fan, Deliang
- 《IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS》
- 2023
- Vol. 13
- Issue 1
- Journal
The DNA short-read alignment task has become a major sequential bottleneck in processing the enormous volumes of data generated by next-generation sequencing platforms. In this paper, an energy-efficient and high-throughput Processing-in-Memory (PIM) accelerator based on DRAM (named Aligner-D) is presented to execute DNA short-read alignment with the state-of-the-art BWT alignment algorithm. We first present the PIM design, which exploits DRAM's internal parallelism and high throughput, converting each DRAM array into a potent processing unit for alignment tasks. The proposed Aligner-D can efficiently execute the bulk bit-wise XNOR-based matching operation required by the alignment task with only a 3-transistor-per-column overhead. We then introduce a highly parallel and customized read-alignment algorithm based on BWT that supports both exact- and inexact-match tasks. Next, we present how to map the correlated data of the alignment task so as to maximally exploit the parallelism of both the new hardware and the algorithm. The experimental results demonstrate that Aligner-D obtains ~4×, ~2.45×, ~3.26×, and ~1.65× improvement, respectively, over other in-memory computing platforms: Ambit (Seshadri et al., 2017), DRISA-1T1C (Li et al., 2017), DRISA-3T1C (Li et al., 2017), and ReDRAM (Angizi and Fan, 2019). For DNA short-read alignment, Aligner-D boosts the alignment throughput per watt by ~20104×, ~3522×, ~927×, ~88×, ~5.28×, and ~2.34× over ReCAM, CPU, GPU, FPGA, Ambit, and DRISA, respectively.
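The BWT exact-match search that Aligner-D parallelizes in DRAM can be sketched in plain Python. These are illustrative reference implementations of the textbook FM-index backward search, not the paper's hardware mapping; the naive rotation sort and `str.count` rank query would be replaced by sampled occurrence tables in a real aligner.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations ('$' terminates)."""
    text += "$"
    rots = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rots)

def backward_search(bwt_str, pattern):
    """FM-index exact match: walk the pattern right-to-left, narrowing a
    suffix-array interval [lo, hi) with rank queries; returns match count."""
    chars = sorted(set(bwt_str))
    # C[c] = number of characters in the text lexicographically smaller than c
    C = {c: sum(1 for x in bwt_str if x < c) for c in chars}

    def occ(c, i):                      # rank: occurrences of c in bwt_str[:i]
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:                    # interval emptied: no match
            return 0
    return hi - lo
```

Each step of the loop is one "match + count" round, which is exactly the primitive the in-DRAM XNOR/count logic accelerates across many reads at once.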
5. DSPIMM: A Fully Digital SParse In-Memory Matrix Vector Multiplier for Communication Applications
- Keywords:
- Backpropagation;Channel coding;Decoding;Energy efficiency;Matrix algebra;Static random access storage;Belief propagation;Channel decoder;Communication application;Hardware performance;In-memory-computing;MAC;Matrix-vector multipliers;Memory matrix;Neural decoder;Sparsity
- Sridharan, Amitesh;Zhang, Fan;Sui, Yang;Yuan, Bo;Fan, Deliang
- 《60th ACM/IEEE Design Automation Conference, DAC 2023》
- 2023
- July 9, 2023 - July 13, 2023
- San Francisco, CA, United States
- Conference
Channel decoders are key computing modules in wired/wireless communication systems. Recently, neural network (NN)-based decoders have shown promising error-correcting performance because of their end-to-end learning capability. However, compared with traditional approaches, the emerging neural belief propagation (NBP) solution suffers from higher storage and computational complexity, limiting its hardware performance. To address this challenge and develop a channel decoder that achieves high decoding performance and hardware efficiency simultaneously, in this paper we take a first step towards exploring SRAM-based in-memory computing for efficient NBP channel decoding. We first analyze the unique sparsity pattern in NBP processing, and then propose an efficient and fully Digital Sparse In-Memory Matrix vector Multiplier (DSPIMM) computing platform. Extensive experiments demonstrate that our proposed DSPIMM achieves significantly higher energy efficiency and throughput than state-of-the-art counterparts. © 2023 IEEE.
6. A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment
- Keywords:
- Energy efficiency;Genes;Hafnium oxides;RRAM;Alignment algorithms;Compute-in-memory;Genome sequencing;Genome sequencing alignment;Genomics analysis;Macro design;Memory macro;Memory wall;Short-read alignments;State of the art
- Zhang, Fan;He, Wangxin;Yeo, Injune;Liehr, Maximilian;Cady, Nathaniel;Cao, Yu;Seo, Jae-Sun;Fan, Deliang
- 《49th IEEE European Solid State Circuits Conference, ESSCIRC 2023》
- 2023
- September 11, 2023 - September 14, 2023
- Lisbon, Portugal
- Conference
In genomic analysis, the major computation bottleneck is the memory- and compute-intensive DNA short-read alignment, due to the memory-wall challenge. This work presents the first Resistive RAM (RRAM) based Compute-in-Memory (CIM) macro design for accelerating state-of-the-art BWT-based genome sequencing alignment. Our design supports all the core instructions, i.e., XNOR-based match, count, and addition, required by the alignment algorithm. The proposed CIM macro, implemented in an integration of HfO2 RRAM and 65nm CMOS, demonstrates the best energy efficiency to date, with 2.07 TOPS/W and 2.12 Gsuffixes/J at 1.0 V. © 2023 IEEE.
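The XNOR-based match instruction named above can be mimicked in software: with a 2-bit base encoding (an assumed encoding, for illustration only), two bases match exactly when the bitwise XNOR of their codes is all ones, and the per-position hits feed the count/addition instructions.

```python
ENC = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed 2-bit encoding

def xnor_match(ref, query):
    """Bulk bit-wise XNOR compare of two equal-length DNA strings, the
    core in-array primitive: a position matches when both encoded bits
    agree, i.e., the 2-bit XNOR result is 0b11."""
    hits = 0
    for r, q in zip(ref, query):
        xnor = ~(ENC[r] ^ ENC[q]) & 0b11   # bitwise XNOR on the 2-bit codes
        hits += (xnor == 0b11)             # both bits equal -> base match
    return hits
```

In the macro this comparison happens across an entire row in parallel; the loop here stands in for that row-wide operation.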
7. MeF-RAM: A New Non-Volatile Cache Memory Based on Magneto-Electric FET
- Keywords:
- Magneto-electric FETs; non-volatile memory; memory bit-cell; cache design; PERFORMANCE; BENCHMARKING; OPTIMIZATION; CIRCUIT; ENERGY; WSE2
- Angizi, Shaahin;Khoshavi, Navid;Marshall, Andrew;Dowben, Peter;Fan, Deliang
- 《ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS》
- 2022
- Vol. 27
- Issue 2
- Journal
Magneto-Electric FET (MEFET) is a recently developed post-CMOS FET, which offers intriguing characteristics for high-speed and low-power design in both logic and memory applications. In this article, we present MeF-RAM, a non-volatile cache memory design based on a 2-Transistor-1-MEFET (2T1M) memory bit-cell with separate read and write paths. We show that with proper co-design across the MEFET device, memory cell circuit, and array architecture, MeF-RAM is a promising candidate for fast non-volatile memory (NVM). To evaluate its cache performance in the memory system, we, for the first time, build a device-to-architecture cross-layer evaluation framework to quantitatively analyze and benchmark the MeF-RAM design against other memory technologies, including both volatile memory (i.e., SRAM, eDRAM) and popular emerging non-volatile memory (i.e., ReRAM, STT-MRAM, and SOT-MRAM). The experimental results for the PARSEC benchmark suite indicate that, as an L2 cache memory, MeF-RAM reduces the Energy-Area-Latency (EAT) product on average by ~98% and ~70% compared with typical 6T-SRAM and 2T1R SOT-MRAM counterparts, respectively.
8. APA-Scan: detection and visualization of 3'-UTR alternative polyadenylation with RNA-seq and 3'-end-seq data.
- Keywords:
- 3' Untranslated Regions; MicroRNAs; Protein Isoforms; RNA Precursors; RNA, Messenger; 3'-End-seq; Alternative polyadenylation; RNA-seq; Transcriptome
- Fahmi, Naima Ahmed;Ahmed, Khandakar Tanvir;Chang, Jae-Woong;Nassereddeen, Heba;Fan, Deliang;Yong, Jeongsik;Zhang, Wei
- 《BMC Bioinformatics》
- 2022
- Vol. 23
- Issue Suppl 3
- Journal
BACKGROUND: The eukaryotic genome is capable of producing multiple isoforms from a gene by alternative polyadenylation (APA) during pre-mRNA processing. APA in the 3'-untranslated region (3'-UTR) of mRNA produces transcripts with shorter or longer 3'-UTRs. Often, the 3'-UTR serves as a binding platform for microRNAs and RNA-binding proteins, which affect the fate of the mRNA transcript. Thus, 3'-UTR APA is known to modulate translation and provides a means to regulate gene expression at the post-transcriptional level. Current bioinformatics pipelines have limited capability in profiling 3'-UTR APA events due to incomplete annotations and low-resolution analysis: widely available pipelines do not reference actionable polyadenylation (cleavage) sites but infer 3'-UTR APA solely from RNA-seq read coverage, causing false-positive identifications. To overcome these limitations, we developed APA-Scan, a robust program that identifies 3'-UTR APA events and visualizes the RNA-seq short-read coverage with gene annotations. METHODS: APA-Scan utilizes either predicted or experimentally validated actionable polyadenylation signals as a reference for polyadenylation sites and calculates the quantity of long and short 3'-UTR transcripts in the RNA-seq data. APA-Scan works in three major steps: (i) calculate the read coverage of the 3'-UTR regions of genes; (ii) identify the potential APA sites and evaluate the significance of the events between two biological conditions; (iii) graphically represent user-specified events with 3'-UTR annotation and read coverage. APA-Scan is implemented in Python 3. Source code and a comprehensive user's manual are freely available at https://github.com/compbiolabucf/APA-Scan . RESULTS: APA-Scan was applied to both simulated and real RNA-seq datasets and compared with two widely used baselines, DaPars and APAtrap.
In simulations, APA-Scan significantly improved the accuracy of 3'-UTR APA identification compared to the other baselines. Its performance was also validated by 3'-end-seq data and qPCR on mouse embryonic fibroblast cells. The experiments confirm that APA-Scan can detect unannotated 3'-UTR APA events and improve genome annotation. CONCLUSION: APA-Scan is a comprehensive computational pipeline to detect transcriptome-wide 3'-UTR APA events. The pipeline integrates both RNA-seq and 3'-end-seq data and can efficiently identify significant events with high-resolution short-read coverage plots. © 2022. The Author(s).
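The coverage-based quantification in step (ii) can be sketched as follows: reads mapping downstream of a candidate cleavage site come only from the long isoform, so the downstream/upstream coverage ratio estimates long-3'-UTR usage, and its shift between conditions flags a candidate APA event. This is a simplified illustration with hypothetical names; APA-Scan's actual statistics are more involved.

```python
def utr_usage(up_cov, down_cov):
    """Fraction of transcripts keeping the long 3'-UTR, from mean read
    coverage upstream (both isoforms) vs downstream (long isoform only)
    of a candidate polyadenylation site."""
    up = sum(up_cov) / len(up_cov)
    down = sum(down_cov) / len(down_cov)
    return down / up if up else 0.0

def apa_shift(cond_a, cond_b):
    """Difference in long-isoform usage between two biological conditions;
    a large absolute shift marks a candidate 3'-UTR APA event."""
    return utr_usage(*cond_a) - utr_usage(*cond_b)
```

A real pipeline would add a significance test on the read counts before calling an event; the ratio alone only ranks candidates.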
9. A 1.23-GHz 16-kb Programmable and Generic Processing-in-SRAM Accelerator in 65nm
- Keywords:
- Computation theory; Cryptography; Energy efficiency; Integrated circuit design; Boolean logic operations; Chip design; Complete sets; Computing platform; Full adders; In-memory computing; Parallel vectors; Programmability; Single cycle; Vector operations
- Sridharan, Amitesh;Angizi, Shaahin;Cherupally, Sai Kiran;Zhang, Fan;Seo, Jae-Sun;Fan, Deliang
- 《48th IEEE European Solid State Circuits Conference, ESSCIRC 2022》
- 2022
- September 19, 2022 - September 22, 2022
- Milan, Italy
- Conference
We present a generic and programmable Processing-in-SRAM (PSRAM) accelerator chip design based on an 8T-SRAM array that accommodates, for the first time, a complete set of Boolean logic operations (e.g., NOR/NAND/XOR, both 2- and 3-input), majority, and full-adder operations, all in a single cycle. PSRAM provides the programmability required for in-memory computing platforms used in applications such as parallel vector operations, neural networks, and data encryption. The prototype is implemented in a 16-kb SRAM macro and is one of the fastest programmable in-memory computing systems to date, operating at 1.23 GHz. The 65nm prototype chip achieves a system-level peak throughput of 1.2 TOPS and an energy efficiency of 34.98 TOPS/W at 1.2 V. © 2022 IEEE.
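The majority and full-adder primitives above map directly onto word-wide bitwise logic. A software analogue, operating on Python ints as bit vectors, shows the relationship the macro exploits: the full adder's carry is exactly the 3-input majority.

```python
def majority(a, b, c):
    """Bit-parallel 3-input majority across a whole word:
    each output bit is 1 when at least two input bits are 1."""
    return (a & b) | (b & c) | (a & c)

def full_add(a, b, cin):
    """Word-wide full adder from the same primitives:
    sum = a XOR b XOR cin, carry = majority(a, b, cin)."""
    return a ^ b ^ cin, majority(a, b, cin)
```

In the chip both expressions are evaluated in one array cycle over an entire row; here each call stands in for that single-cycle, word-parallel operation.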
10. XST: A Crossbar Column-wise Sparse Training for Efficient Continual Learning
- Keywords:
- Continual Learning; In-Memory-Computing; Sparse Learning
- Zhang, Fan;Yang, Li;Meng, Jian;Seo, Jae-sun;Cao, Yu;Fan, Deliang
- 《25th Design, Automation and Test in Europe Conference and Exhibition》
- 2022
- March 14-23, 2022
- Virtual (online)
- Conference
Leveraging ReRAM crossbar-based In-Memory-Computing (IMC) to accelerate single-task DNN inference has been widely studied; however, using the ReRAM crossbar for continual learning remains unexplored. In this work, we propose XST, a novel crossbar column-wise sparse training framework for continual learning. XST significantly reduces the training cost and saves inference energy. More importantly, it is friendly to existing crossbar-based convolution engines, with almost no hardware overhead. The experiments show that XST achieves 4.95% higher accuracy than the state-of-the-art CPG method, along with a ~5.59× training speedup and ~1.5× inference energy saving.
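Column-wise sparsity can be illustrated with a small masking routine: scoring and zeroing whole crossbar columns (here by L1 norm, an assumed criterion for this sketch) keeps the crossbar's column-parallel MAC dataflow intact, unlike element-wise pruning, which would need per-element index bookkeeping.

```python
def column_sparse_mask(weight, keep_cols):
    """Zero entire columns of a weight matrix (crossbar-friendly sparsity):
    score each column by its L1 norm and keep the strongest `keep_cols`."""
    cols = len(weight[0])
    norms = [sum(abs(row[c]) for row in weight) for c in range(cols)]
    keep = set(sorted(range(cols), key=lambda c: norms[c],
                      reverse=True)[:keep_cols])
    return [[w if c in keep else 0.0 for c, w in enumerate(row)]
            for row in weight]
```

Because whole bitline columns are switched off rather than scattered cells, the masked matrix maps onto an unmodified crossbar convolution engine, which is the "almost no hardware overhead" claim above.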
