XOR-CiM: An Efficient Computing-in-SOT-MRAM Design for Binary Neural Network Acceleration

Conference Paper
Mehrdad Morsali, Ranyang Zhou, Sepehr Tabrizchi, Arman Roohi, and Shaahin Angizi
24th International Symposium on Quality Electronic Design (ISQED)
Publication year: 2023

In this work, we leverage the uni-polar switching behavior of Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) to develop an efficient digital Computing-in-Memory (CiM) platform named XOR-CiM. XOR-CiM converts typical MRAM sub-arrays to massively parallel computational cores with ultra-high bandwidth, greatly reducing the energy consumed by convolutional layers and accelerating inference of X(N)OR-intensive Binary Neural Networks (BNNs). With inference accuracy similar to digital CiMs, XOR-CiM achieves ∼4.5× higher energy efficiency and 1.8× speed-up compared to recent MRAM-based CiM platforms.
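
The X(N)OR-intensive arithmetic mentioned in this abstract can be illustrated with the standard XNOR-popcount form of a binary dot product. This is a minimal software sketch, not the XOR-CiM circuit; the function names and the bit-packing convention (bit 1 encodes +1, bit 0 encodes -1) are assumptions made for illustration.

```python
def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n {-1,+1} vectors packed into n-bit integers.

    Assumed encoding: bit value 1 means +1, bit value 0 means -1.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask      # 1 wherever the two signs agree
    matches = bin(xnor).count("1")        # popcount
    return 2 * matches - n                # agreements minus disagreements


def unpack(bits: int, n: int) -> list:
    """Expand a packed vector into explicit -1/+1 values (reference only)."""
    return [1 if (bits >> i) & 1 else -1 for i in range(n)]


a, w, n = 0b1011, 0b1101, 4
reference = sum(x * y for x, y in zip(unpack(a, n), unpack(w, n)))
print(binary_dot(a, w, n), reference)
```

Both printed values agree, which shows that one XNOR plus a popcount replaces n multiply-accumulate steps.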

SenTer: A Reconfigurable Processing-in-Sensor Architecture Enabling Efficient Ternary MLP

Conference Paper
Sepehr Tabrizchi, Rebati Gaire, Shaahin Angizi, and Arman Roohi
Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI)
Publication year: 2023

Recently, Intelligent IoT (IIoT), including various sensors, has gained significant attention due to its capability of sensing, deciding, and acting by leveraging artificial neural networks (ANN). Nevertheless, to achieve acceptable accuracy and high performance in visual systems, a power-delay-efficient architecture is required. In this paper, we propose an ultra-low-power processing-in-sensor architecture, namely SenTer, realizing low-precision ternary multi-layer perceptron networks, which can operate in detection and classification modes. Moreover, SenTer supports two activation functions based on user needs and the desired accuracy-energy trade-off. SenTer is capable of performing all the required computations for the MLP’s first layer in the analog domain and then submitting its results to a co-processor. Therefore, SenTer significantly reduces the overhead of analog buffers, data conversion, and transmission power consumption by using only one ADC. Additionally, our simulation results demonstrate acceptable accuracy on various datasets compared to the full-precision models.

PISA: A Non-Volatile Processing-In-Sensor Accelerator for Imaging Systems

Journal Paper
Sh. Angizi, S. Tabrizchi, D. Pan, and A. Roohi
IEEE Transactions on Emerging Topics in Computing (TETC)
Publication year: 2023

This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor in-memory computing unit to process the remaining network layers. Once the object is detected, PISA switches to typical sensing mode to capture the image for a fine-grained convolution using only a near-sensor processing unit. Our circuit-to-application co-simulation results on a BWNN acceleration demonstrate minor accuracy degradation on various image datasets in coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 and efficiency of 1.74 TOp/s/W. Lastly, PISA substantially reduces data conversion and transmission energy by 84% compared to a baseline.

P-PIM: A Parallel Processing-in-DRAM Framework Enabling RowHammer Protection

Conference Paper
Ranyang Zhou, Sepehr Tabrizchi, Mehrdad Morsali, Arman Roohi, and Shaahin Angizi
Design, Automation & Test in Europe Conference & Exhibition (DATE)
Publication year: 2023

In this work, we propose a Parallel Processing-In-DRAM architecture named P-PIM leveraging the high density of DRAM to enable fast and flexible computation. P-PIM enables bulk bit-wise in-DRAM logic between operands in the same bit-line by elevating the analog operation of the memory sub-array based on a novel dual-row activation mechanism. With this, P-PIM can opportunistically perform a complete and inexpensive in-DRAM RowHammer (RH) self-tracking and mitigation technique to protect the memory unit against such a challenging security vulnerability. Our results show that P-PIM achieves ~72% higher energy efficiency than the fastest charge-sharing-based designs. As for the RH protection, with a worst-case slowdown of ~0.8%, P-PIM achieves up to 71% energy saving over the SRAM/CAM-based frameworks and about 90% saving over DRAM-based frameworks.

Ocellus: Highly Parallel Convolution-in-Pixel Scheme Realizing Power-Delay-Efficient Edge Intelligence

Conference Paper
S. Tabrizchi, Sh. Angizi, and A. Roohi
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)
Publication year: 2023

NeSe: Near-Sensor Event-Driven Scheme for Low Power Energy Harvesting Sensors

Conference Paper
Sepehr Tabrizchi, Mehrdad Morsali, Shaahin Angizi, Arman Roohi
IEEE International Symposium on Circuits and Systems (ISCAS)
Publication year: 2023

Digital technologies have made it possible to deploy visual sensor nodes capable of detecting motion events in the coverage area cost-effectively. However, background subtraction, as a widely used approach, remains an intractable task due to its inability to achieve competitive accuracy and reduced computation cost simultaneously. In this paper, an effective background subtraction approach, namely NeSe, for tiny energy-harvested sensors is proposed leveraging non-volatile memory (NVM). Using the developed software/hardware method, the accuracy and efficiency of event detection can be adjusted at runtime by changing the precision depending on the application’s needs. Due to the near-sensor implementation of background subtraction and NVM usage, the proposed design reduces the data movement overhead while ensuring intermittent resiliency. The background is stored for a specific time interval within NVMs and compared with the next frame. If the power is cut, the background remains unchanged and is updated after the interval passes. Once the moving object is detected, the device switches to the high-powered sensor mode to capture the image.
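
The precision-adjustable background comparison described above can be sketched in software as follows. This is only an illustration under stated assumptions: pixels are 8-bit grayscale values in flat lists, and the names `detect_event`, `precision_bits`, and `min_changed` are hypothetical, not taken from the paper.

```python
def detect_event(background, frame, precision_bits=4, min_changed=2):
    """Report an event when enough pixels differ from the stored background.

    Only the top `precision_bits` bits of each 8-bit pixel are compared, so a
    lower precision ignores small intensity changes and costs fewer bit
    comparisons (hypothetical knob for the runtime accuracy/efficiency trade-off).
    """
    shift = 8 - precision_bits
    changed = sum(
        1 for b, f in zip(background, frame) if (b >> shift) != (f >> shift)
    )
    return changed >= min_changed


background = [10, 200, 30, 40]       # stored reference frame
noisy      = [12, 201, 31, 41]       # same scene with small sensor noise
moved      = [10, 90, 220, 40]       # an object changed two pixels

print(detect_event(background, noisy, precision_bits=4))   # False
print(detect_event(background, moved, precision_bits=4))   # True
print(detect_event(background, noisy, precision_bits=8))   # True
```

At 4-bit precision the noisy frame is treated as unchanged, while at full 8-bit precision the same noise triggers an event, which is the kind of trade-off the abstract attributes to runtime precision changes.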

Design and Evaluation of a Near-Sensor Magneto-Electric FET-based Event Detector

Journal Paper
Mehrdad Morsali; Sepehr Tabrizchi; Andrew Marshall; Arman Roohi
IEEE Transactions on Electron Devices (TED)
Publication year: 2023

As recently developed post-CMOS FETs, magneto-electric FETs (MEFETs) offer high-speed and low-power design characteristics for logic and memory applications. In this article, a near-sensor processing (NSP) platform leveraging MEFETs is presented that enables event detection for edge vision sensors at a low cost by eliminating the need for power-hungry analog-to-digital circuits (ADCs). In addition, an efficient background comparison method with adjustable precision is presented that offers an output quality-efficiency tradeoff, depending on the application’s needs. Our device-to-architecture evaluations show that the proposed hardware–software codesign reduces energy consumption and execution time on average by factors of 15× and 2.4×, respectively, compared to the SOT-MRAM counterpart employing the same method.

Comparative Study of Low Bit-width DNN Accelerators: Opportunities and Challenges

Conference Paper
D. Vungarala, M. Morsali, S. Tabrizchi, A. Roohi, and Sh. Angizi
66th International Midwest Symposium on Circuits and Systems (MWSCAS)
Publication year: 2023

AppCiP: Energy-Efficient Approximate Convolution-in-Pixel Scheme for Neural Network Acceleration

Journal Paper
S. Tabrizchi, A. Nezhadi, Sh. Angizi, and A. Roohi
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)
Publication year: 2023

Nowadays, always-on intelligent and self-powered visual perception systems have gained considerable attention and are widely used. However, capturing data and analyzing it via a backend/cloud processor is energy-intensive and long-latency, resulting in a memory bottleneck and low-speed feature extraction at the edge. This paper presents the AppCiP architecture as a sensing and computing integration design to efficiently enable Artificial Intelligence (AI) on resource-limited sensing devices. AppCiP provides a number of unique capabilities, including instant and reconfigurable RGB to grayscale conversion, highly parallel analog convolution-in-pixel, and realizing low-precision quinary weight neural networks. These features significantly mitigate the overhead of analog-to-digital converters and analog buffers, leading to a considerable reduction in power consumption and area overhead. Our circuit-to-application co-simulation results demonstrate that AppCiP achieves ~3 orders of magnitude higher efficiency on power consumption compared with the fastest existing designs considering different CNN workloads. It reaches a frame rate of 3000 and an efficiency of ~4.12 TOp/s/W. The accuracy of the AppCiP architecture on different datasets such as SVHN, Pest, CIFAR-10, MHIST, and CBL Face detection is evaluated and compared with the state-of-the-art design. The results are the best among processing-in/near-pixel architectures, while AppCiP degrades accuracy by less than 1% on average compared to the floating-point baseline.

A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks

Journal Paper
Sh. Angizi, M. Morsali, S. Tabrizchi, and A. Roohi
IEEE Transactions on Emerging Topics in Computing (TETC)
Publication year: 2023

In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on MNIST and SVHN datasets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves 1.25 GHz and an energy-efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2× and execution time by a factor of 4× compared to the best recent LBP-based networks.
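
To show why a comparator-based LBP datapath needs no multiply-accumulate operations, here is a minimal sketch of the classic 3×3 local binary pattern operator. It is the textbook operator, not the approximate Ap-LBP network from the paper; the function name and the clockwise neighbor ordering are assumptions.

```python
def lbp_code(patch):
    """Classic 3x3 LBP: one comparison per neighbor, no multiplications.

    `patch` is a 3x3 list of intensities. Each neighbor that is >= the center
    sets one bit; neighbors are visited clockwise from the top-left corner
    (an assumed ordering, other conventions exist).
    """
    center = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(order):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # 241
```

Each output bit comes from a single comparison against the center pixel, so the whole feature is computed with comparators and bit assembly only.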

TizBin: A Low-Power Image Sensor with Event and Object Detection Using Efficient Processing-in-Pixel Schemes

Conference Paper
Sepehr Tabrizchi, Shaahin Angizi, and Arman Roohi
40th International Conference on Computer Design (ICCD)
Publication year: 2022

In the Artificial Intelligence of Things (AIoT) era, always-on intelligent and self-powered visual perception systems have gained considerable attention and are widely used. Thus, this paper proposes TizBin, a low-power processing in-sensor scheme with event and object detection capabilities to eliminate power costs of data conversion and transmission and enable data-intensive neural network tasks. Once the moving object is detected, TizBin architecture switches to the high-power object detection mode to capture the image. TizBin offers several unique features, such as analog convolutions enabling low-precision ternary weight neural networks (TWNN) to mitigate the overhead of analog buffer and analog-to-digital converters. Moreover, TizBin exploits non-volatile magnetic RAMs to store NN’s weights, remarkably reducing static power consumption. Our circuit-to-application co-simulation results for TWNNs demonstrate minor accuracy degradation on various image datasets, while TizBin achieves a frame rate of 1000 and efficiency of ∼1.83 TOp/s/W.

semiMul: Floating-Point Free Implementations for Efficient and Accurate Neural Network Training

Conference Paper
Ali Nezhadi, Shaahin Angizi, and Arman Roohi
21st IEEE International Conference on Machine Learning and Applications (ICMLA)
Publication year: 2022

Multiply–accumulate operation (MAC) is a fundamental component of machine learning tasks, where multiplication (either integer or float multiplication) compared to addition is costly in terms of hardware implementation or power consumption. In this paper, we approximate floating-point multiplication by converting it to integer addition while preserving the test accuracy of shallow and deep neural networks. We mathematically show and prove that our proposed method can be utilized with any floating-point format (e.g., FP8, FP16, FP32, etc.). It is also highly compatible with conventional hardware architectures and can be employed in CPU, GPU, or ASIC accelerators for neural network tasks with minimum hardware cost. Moreover, the proposed method can be utilized in embedded processors without a floating-point unit to perform neural network tasks. We evaluated our method on various datasets such as MNIST, FashionMNIST, SVHN, Cifar-10, and Cifar-100, with both FP16 and FP32 arithmetics. The proposed method preserves the test accuracy and, in some cases, overcomes the overfitting problem and improves the test accuracy.
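
The idea of replacing floating-point multiplication with integer addition can be illustrated with the well-known logarithmic approximation on IEEE 754 bit patterns. This sketch is not claimed to be the semiMul method itself; it assumes FP32, uses hypothetical helper names, and ignores overflow, underflow, subnormals, infinities, and NaNs.

```python
import struct

BIAS = 127 << 23  # FP32 exponent bias, positioned in the exponent field


def f2i(x: float) -> int:
    """Reinterpret an FP32 value as its 32-bit unsigned bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def i2f(bits: int) -> float:
    """Reinterpret a 32-bit unsigned bit pattern as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def approx_mul(x: float, y: float) -> float:
    """Approximate x * y with one integer addition of the magnitude bits.

    Adding the bit patterns adds the exponents exactly and the mantissas
    approximately; the bias is subtracted once because it was added twice.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    bx, by = f2i(x), f2i(y)
    sign = (bx ^ by) & 0x80000000
    magnitude = (bx & 0x7FFFFFFF) + (by & 0x7FFFFFFF) - BIAS
    return i2f(sign | (magnitude & 0x7FFFFFFF))


print(approx_mul(2.0, 4.0))    # 8.0 (exact: both mantissas are zero)
print(approx_mul(-3.0, 2.0))   # -6.0
print(approx_mul(1.5, 1.5))    # 2.0, versus the exact 2.25
```

The result is exact when either mantissa is zero and underestimates otherwise; 1.5 × 1.5 is the worst case for this particular approximation, with a relative error of about 11%.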

SCiMA: a Generic Single-Cycle Compute-in-Memory Acceleration Scheme for Matrix Computations

Conference Paper
Sepehr Tabrizchi, Shaahin Angizi, and Arman Roohi
IEEE International Symposium on Circuits and Systems (ISCAS)
Publication year: 2022

This work proposes a new generic Single-cycle Compute-in-Memory (CiM) Accelerator for matrix computation named SCiMA. SCiMA is developed on top of the existing commodity Spin-Orbit Torque Magnetic Random-Access Memory chip. Every sub-array’s peripherals are transformed to realize a full set of single-cycle 2- and 3-input in-memory bulk bitwise functions specifically designed to accelerate a wide variety of graph and matrix multiplication tasks. We explore SCiMA’s efficiency by selecting a complex matrix processing operation, i.e., calculating determinant as an essential and under-explored application in the CiM domain. The cross-layer device-to-architecture simulation framework shows the presented platform can reduce energy consumption by 70.43% compared with the most recent CiM designs implemented with the same memory technology. SCiMA also achieves up to 2.5x speedup compared with current CiM platforms.

ReFACE: Efficient Design Methodology for Acceleration of Digital Filter Implementations

Conference Paper
Arman Roohi, Shaahin Angizi, Pooriya Navaeilavasani, and MohammadReza Taheri
23rd International Symposium on Quality Electronic Design (ISQED)
Publication year: 2022

Because of the impressive performance and success of artificial intelligence (AI)-based applications, filters as a primary part of digital signal processing systems are widely used, especially finite impulse response (FIR) filtering. Although they offer several advantages, such as stability, they are computationally intensive. Hence, in this paper, we propose a systematic methodology to efficiently implement computing in-memory (CIM) accelerators for FIR filters using various CMOS and post-CMOS technologies, referred to as ReFACE. ReFACE leverages a residue number system (RNS) to speed up the essential operations of digital filters, instead of traditional arithmetic implementation that suffers from the inevitable lengthy carry propagation chain. Moreover, the CIM architecture eliminates the off-chip data transfer by leveraging the maximum internal bandwidth of memory chips to realize a local and parallel computation on small residues independently. Taking advantage of both RNS and CIM results in significant power and latency reduction. As a proof-of-concept, ReFACE is leveraged to implement a 4-tap RNS FIR. The simulation results verified its superior performance with up to 85× and 12× improvement in energy consumption and execution time, respectively, compared with an ASIC accelerator.
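
The carry-free, per-residue computation that the abstract attributes to RNS can be sketched for a 4-tap FIR filter as below. The moduli (7, 8, 9), the helper names, and the restriction to small non-negative integers are assumptions for illustration; the paper's actual moduli set and hardware mapping are not given here.

```python
from math import prod

MODULI = (7, 8, 9)      # hypothetical pairwise-coprime moduli
M = prod(MODULI)        # dynamic range: results must stay below 504


def to_rns(x: int) -> tuple:
    """Forward conversion: one small residue per modulus."""
    return tuple(x % m for m in MODULI)


def from_rns(residues: tuple) -> int:
    """Reverse conversion using the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # modular inverse (Python 3.8+)
    return total % M


def fir_rns(samples, taps):
    """FIR filter whose multiply-accumulate runs independently per residue channel."""
    rns_taps = [to_rns(t) for t in taps]
    history = [to_rns(0)] * len(taps)
    out = []
    for s in samples:
        history = [to_rns(s)] + history[:-1]
        acc = tuple(
            sum(h[c] * t[c] for h, t in zip(history, rns_taps)) % m
            for c, m in enumerate(MODULI)
        )
        out.append(from_rns(acc))
    return out


print(fir_rns([5, 6, 7, 8, 9], [1, 2, 3, 4]))  # [5, 16, 34, 60, 70]
```

Each residue channel works on values smaller than its modulus and never exchanges carries with the others, which is what makes the channels easy to run in parallel.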

ReD-LUT: Reconfigurable In-DRAM LUTs Enabling Massive Parallel Computation

Conference Paper
Ranyang Zhou, Arman Roohi, Durga Misra, and Shaahin Angizi
41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Publication year: 2022

In this paper, we propose a reconfigurable processing-in-DRAM architecture named ReD-LUT leveraging the high density of commodity main memory to enable a flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication, division, etc.) via only memory read operation. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored in the same bit-line beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally-intensive applications, i.e., low-precision deep learning acceleration, and the Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4× compared with the GPU and achieves ~37.8× speedup and 2.1× energy-efficiency over the best in-DRAM bit-wise accelerators. As for AES data-encryption, it reduces energy consumption by a factor of ~2.2× compared to an ASIC implementation.
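
The LUT-query approach to multiplication can be illustrated in software: a small table of 4-bit products is filled once, and an 8-bit multiplication is then assembled from four table reads plus shifts and additions. The table size, the operand split, and the names `LUT` and `lut_mul8` are assumptions for illustration and do not describe ReD-LUT's actual table organization.

```python
# Precomputed once (offline): all 4-bit x 4-bit products, 256 entries.
LUT = [[a * b for b in range(16)] for a in range(16)]


def lut_mul8(x: int, y: int) -> int:
    """Multiply two 8-bit unsigned integers using only table reads, shifts, adds.

    x = 16*xh + xl and y = 16*yh + yl, so
    x*y = 256*xh*yh + 16*(xh*yl + xl*yh) + xl*yl.
    """
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return (
        (LUT[xh][yh] << 8)
        + ((LUT[xh][yl] + LUT[xl][yh]) << 4)
        + LUT[xl][yl]
    )


print(lut_mul8(200, 123), 200 * 123)
```

At query time no multiplier is involved: the only operations are reads from the precomputed table and cheap shift-and-add recombination.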

Ocelli: Efficient Processing-in-Pixel Array Enabling Edge Inference of Ternary Neural Networks

Journal Paper
S. Tabrizchi, Sh. Angizi, and A. Roohi
Journal of Low Power Electronics and Applications
Publication year: 2022

Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data, that require high processing demands. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli, which includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features: (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further reduce power; (III) Ocelli exploits non-volatile magnetic RAMs to store CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, including sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared with existing edge detection algorithms on conventional pixels, it achieves an average 10% power-efficiency improvement in lane detection. Moreover, considering different CNN workloads, our design shows more than 23% power efficiency over conventional designs, while it can achieve better accuracy.
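
A ternary dot product with zero-skipping can be sketched as follows. The sketch assumes both pixel outputs and weights are already ternary (−1, 0, +1) and that a term is skipped whenever either operand is zero; whether Ocelli skips on inputs, weights, or both is not stated in the abstract, and the function name is hypothetical.

```python
def ternary_dot(inputs, weights):
    """Dot product of ternary vectors; returns (result, number of skipped terms).

    With values restricted to -1, 0, +1, every non-zero product is +1 when the
    signs match and -1 otherwise, so no multiplier is needed. Zero terms are
    skipped entirely (assumed skipping rule).
    """
    acc = 0
    skipped = 0
    for x, w in zip(inputs, weights):
        if x == 0 or w == 0:
            skipped += 1
            continue
        acc += 1 if x == w else -1
    return acc, skipped


print(ternary_dot([1, -1, 0, 1], [1, 0, -1, -1]))  # (0, 2)
```

Half of the terms in this example are never evaluated, which is the source of the power saving a zero-skipping scheme aims for.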

MR-PIPA: An Integrated Multi-level RRAM (HfOx) based Processing-In-Pixel Accelerator

Journal Paper
M. Abedin, A. Roohi, M. Liehr, N. Cady, and Sh. Angizi
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
Publication year: 2022

This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks (NNs) leveraging a novel compute-pixel with nonvolatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor maintaining accuracy compared with the recent in-sensor computing designs. Our proposed design, namely an integrated multilevel RRAM (HfOx)-based processing-in-pixel accelerator (MR-PIPA), achieves a frame rate of 1000 and efficiency of ~1.89 TOp/s/W, while it substantially reduces data conversion and transmission energy by ~84% compared to a baseline at the cost of minor accuracy degradation.

LT-PIM: An LUT-based Processing-in-DRAM Architecture with RowHammer Self-Tracking

Journal Paper
R. Zhou, S. Tabrizchi, A. Roohi, and Sh. Angizi
IEEE Computer Architecture Letters (CAL)
Publication year: 2022

Herein, we propose LT-PIM as a Lookup Table-based Processing-In-Memory architecture leveraging the high density of DRAM to enable massively parallel and flexible computation. LT-PIM supports lookup table queries to execute complex arithmetic operations, such as multiplication, via only memory read operations. In addition, LT-PIM enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands in the same bit-line. With this, LT-PIM enables a complete and inexpensive in-DRAM RowHammer (RH) self-tracking approach. Our results demonstrate that LT-PIM achieves 70% higher energy efficiency than the fastest charge-sharing-based designs and 32% over the best LUT-based designs. As for the RH self-tracking, with a worst-case slowdown of 0.2%, LT-PIM achieves up to 80% energy saving over the best designs.

Integrated Sensing and Computing using Energy-Efficient Magnetic Synapses

Conference Paper
Shaahin Angizi, and Arman Roohi
23rd International Symposium on Quality Electronic Design (ISQED)
Publication year: 2022

This work presents a processing-in-sensor platform leveraging magnetic devices as a flexible and efficient solution for real-time and smart image processing in AI devices. The main idea is to combine the typical sensing mechanism with an intrinsic coarse-grained convolution operation at the edge to remarkably reduce the power consumption of data conversion and transmission to an off-chip processor imposed by the first layer of deep neural networks. Our initial results demonstrate acceptable accuracy on the SVHN image data-set, while the proposed platform substantially reduces data conversion and transmission energy compared with a baseline sensor-CPU platform.

HARDeNN: Hardware-assisted Attack-resilient Deep Neural Network Architectures

Journal Paper
N. Khoshavi, M. Maghsoudloo, A. Roohi, S. Sargolzaei, and B. Yu
Microprocessors and Microsystems
Publication year: 2022

We propose HARDeNN, a low-overhead end-to-end inference accelerator methodology to armor the underlying pre-trained neural network architecture against black-box non-input adversarial attacks. In order to find the most vulnerable neural network architecture parameters, a hardware-assisted fault injection tool and a statistical stress model have been proposed that combine uniform fault assessment across layers with targeted in-layer fault assessment to realize a holistic, rigorous fault evaluation in NN topologies susceptible to non-input adversarial black-box attacks. The key observation from the assessment is that the weights and activation functions are the most vulnerable neural network parameters and are susceptible to both single-bit and multiple-bit flip attacks. Concerning the aforementioned parameters, a multi-objective design space exploration is conducted to find a superior design under different resource constraints. The error-resiliency magnitude offered by HARDeNN can be adjusted based on the given boundaries. The experimental results show that HARDeNN methodology enhances the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 multi-bit upsets that target weight and activation layers,

FlexiDRAM: A Flexible in-DRAM Framework to Enable Parallel General-Purpose Computation

Conference Paper
Ranyang Zhou, Arman Roohi, Durga Misra, and Shaahin Angizi
Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)
Publication year: 2022

In this paper, we propose a Flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. This framework is developed on top of a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and elevates it to implement XOR2-MAJ3 operations between operands stored in the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then appropriately allocates DRAM rows to the operands to execute any in-DRAM computation. We develop the ISA and software support required for in-DRAM computation. FlexiDRAM transforms current memory architecture to a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that, averaged across two well-known deep learning workloads, FlexiDRAM achieves ∼15× energy saving and 13× speedup over the GPU, outperforming recent processing-in-DRAM platforms.

Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks

Journal Paper
M. Alali, A. Roohi, Sh. Angizi, and J. S. Deogun
Micromachines
Publication year: 2022

Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, whereas these images are very complex and their analyses by pathologists require large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., <8:8>. For energy estimation on the used CNN model, we found that the energy used in RGB color mode with 32-bit is considerably higher than the other lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in the Internet of Things (IoT) systems that facilitate healthcare systems.

Enabling efficient training of convolutional neural networks for histopathology images

Conference Paper
Mohammed H. Alali, Arman Roohi, and Jitender S. Deogun
International Conference on Image Analysis and Processing
Publication year: 2022

Convolutional Neural Networks (CNNs) have gained lots of attention in various digital imaging applications. They have proven to produce incredible results, especially on big data, that require high processing demands. With the increasing size of datasets, especially in computational pathology, CNN processing takes even longer and uses higher computational resources. Considerable research has been conducted to improve the efficiency of CNNs, such as quantization. This paper aims to apply efficient training and inference of ResNet using quantization on histopathology images from the Patch Camelyon (PCam) dataset. An analysis of efficient approaches to classify histopathology images is presented. First, the original RGB-colored images are evaluated. Then, compression methods such as channel reduction and sparsity are applied. Sparsity on grayscale and RGB modes yield roughly the same classification accuracy, but sparsity on grayscale requires 77% fewer MACs than RGB. A higher classification result was achieved by grayscale mode, which requires far fewer MACs than the original RGB mode. Our method’s low energy and processing requirements make it suitable for inference on low-power wearable healthcare devices and in mobile hospitals in rural areas or developing countries. It also assists pathologists by presenting a preliminary diagnosis.

Enabling Edge Computing Using Emerging Memory Technologies: From Device to Architecture

Book Chapter
Arman Roohi, Shaahin Angizi, Deliang Fan
Frontiers of Quality Electronic Design (QED) AI, IoT and Hardware Security
Publication year: 2022

This book chapter describes, explores, and analyzes the designs and framework for energy-efficient and reliable edge computing from device to architecture to handle and compute data-intensive tasks and applications. First, we present a comprehensive study regarding magnetic random-access memory (MRAM) as a promising nonvolatile memory component due to its interesting features, including nonvolatility, near-zero standby power, high integration density, and radiation hardness. To enable efficient and reliable computing units, optimized in-memory processing accelerators for data and compute-intensive tasks via algorithm and hardware codesign approaches are discussed. Moreover, two other high attention topics, namely, normally off computing and hardware security, are examined. Thus, two design methodologies are introduced to mitigate MRAM write energy cost while provided benefits are efficiently utilized. The first design methodology approach, referred to as NV-clustering, is developed to realize middleware-transparent intermittent computing. The foundations of our work are advanced from the ground up by extending this emerging MRAM device to discover logic-in-memory methods that leverage intrinsic nonvolatility to realize intermittent robust computation. Then power analysis-resilient circuit (PARC) procedure as an extension of NV-clustering is developed as a power-masked synthesis technique in the presence of power analysis attacks.

Efficient Targeted Bit-Flip Attack Against the Local Binary Pattern Network

Conference Paper
Arman Roohi, and Shaahin Angizi
IEEE International Symposium on Hardware Oriented Security and Trust (HOST)
Publication year: 2022

Deep neural networks (DNNs) have shown their great capability of surpassing human performance in many areas. With the help of quantization, artificial intelligence (AI) powered devices are ubiquitously deployed. Yet, the easily accessible AI-powered edge devices become the target of malicious users who can deteriorate the privacy and integrity of the inference process. This paper proposes two adversarial attack scenarios, including three threat models, which crush local binary pattern networks (LBPNet). These attacks can be applied maliciously to flip a limited number of susceptible bits in kernels within the system’s shared memory. The threat could be driven through the Row-Hammer attack and significantly drops the model’s accuracy. Our preliminary simulation results demonstrate that flipping only the most significant bit of the first LBP layer decreases the accuracy from 99.51% down to 18% on the MNIST data-set. We then briefly discuss potential hardware/software-oriented defense mechanisms as countermeasures to such attacks.

EaseMiss: HW/SW Co-Optimization for Efficient Large Matrix-Matrix Multiply Operations

Conference Paper
Ali Nezhadi, Shaahin Angizi, and Arman Roohi
IEEE 15th Dallas Circuit And System Conference (DCAS)
Publication year: 2022

Due to the essential role of matrix multiplication in many scientific applications, especially in data- and compute-intensive applications, we explore the efficiency of widely used matrix product algorithms. This paper proposes an HW/SW co-optimization technique, entitled EaseMiss, to reduce the cache miss ratio for large general matrix-matrix multiplications. First, we revise the algorithms by applying three software optimization techniques to improve performance. How to choose the proper algorithm to achieve the best performance is examined and formulated. By leveraging the proposed optimizations, the number of cache misses decreases by a factor of 3 in a conventional data cache. To improve further, we then propose SPLiTCACHE to virtually split the data cache according to the matrices’ dimensions for better data reuse. This method can be easily embedded into conventional general-purpose processors or GPUs at the cost of negligible logic circuit overhead. With a correct and valid splitting, the obtained results show that cache misses are reduced by a factor of 2 on average compared to the conventional data cache in machine learning workloads.

Design and Evaluation of a Robust Power-Efficient Ternary SRAM Cell

Conference Paper
Sepehr Tabrizchi, Shaahin Angizi, and Arman Roohi
IEEE 65th International Midwest Symposium on Circuits and Systems (MWSCAS)
Publication year: 2022

This paper presents a novel ternary Static Random Access Memory (T-SRAM) cell. To validate the functionality of the proposed T-SRAM, carbon nanotube field-effect transistors are selected as a proof-of-concept, although either post-CMOS or CMOS technologies can replace them. Our T-SRAM intrinsically eliminates the need to store the intermediate ternary state’s voltage level, thus significantly reducing leakage power and increasing robustness. Extensive SPICE simulation and comparison results show that the proposed T-SRAM can be a promising alternative to CMOS SRAMs deployed in low-power edge AI. Further, the analysis verifies that the proposed design is more robust than previous implementations.

A Processing-in-Pixel Accelerator based on Multi-level HfOx ReRAM

Conference Paper
Minhaz Abedin, Arman Roohi, Nathaniel Cady, and Shaahin Angizi
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES)
Publication year: 2022

This work paves the way to realize a processing-in-pixel accelerator based on a multi-level HfOx ReRAM as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks, leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor while maintaining accuracy compared with recent in-sensor computing designs.

RNSiM: Efficient Deep Neural Network Accelerator Using Residue Number Systems

Conference Paper
Arman Roohi, MohammadReza Taheri, Shaahin Angizi, and Deliang Fan
2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
Publication year: 2021

In this paper, we propose an efficient convolutional neural network (CNN) accelerator design, entitled RNSiM, based on the Residue Number System (RNS) as an alternative to the conventional binary number representation. Instead of a traditional arithmetic implementation that suffers from the inevitable lengthy carry propagation chain, the novelty of RNSiM lies in that all the data, including stored weights and communication/computation, are handled in the RNS domain. Due to the inherent parallelism of RNS arithmetic, power and latency are significantly reduced. Moreover, an enhanced integrated intermodulo operation core is developed to decrease the overhead imposed by non-modular operations. Further improvement in the system’s performance efficiency is achieved by developing efficient Processing-in-Memory (PIM) designs using various volatile CMOS and non-volatile post-CMOS technologies to accelerate RNS-based multiplication-and-accumulations (MACs). The RNSiM accelerator’s performance on different datasets, including MNIST, SVHN, and CIFAR-10, is evaluated. With almost the same accuracy as the baseline CNN, the RNSiM accelerator can significantly increase both energy-efficiency and speedup compared with state-of-the-art FPGA, GPU, and PIM designs. RNSiM and other RNS-PIMs, based on our method, reduce the energy consumption by factors of 2877× and 331897× compared with the FPGA and the GPU platforms, respectively.
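
To illustrate why RNS arithmetic avoids long carry chains, the following minimal sketch runs a multiply-accumulate independently in each residue channel and converts back to binary only once, at the end, using the Chinese Remainder Theorem. The moduli set (31, 32, 33) and the function names `to_rns`, `rns_mac`, and `from_rns` are assumptions chosen for illustration; they are not taken from the paper, and the sketch handles unsigned values only.

```python
from math import prod

# Hypothetical pairwise-coprime moduli of the form (2^n - 1, 2^n, 2^n + 1), n = 5.
MODULI = (31, 32, 33)
M = prod(MODULI)  # dynamic range: results must stay below M = 32736


def to_rns(x):
    """Forward conversion: one small residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_mac(weights, activations):
    """Multiply-accumulate carried out separately in each residue channel.

    No carry ever crosses from one channel to another, so the channels
    could be processed in parallel by narrow arithmetic units.
    """
    acc = [0] * len(MODULI)
    for w, a in zip(weights, activations):
        wr, ar = to_rns(w), to_rns(a)
        for i, m in enumerate(MODULI):
            acc[i] = (acc[i] + wr[i] * ar[i]) % m
    return tuple(acc)


def from_rns(residues):
    """Reverse conversion via the Chinese Remainder Theorem (Python 3.8+)."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(..., -1, m) is the modular inverse
    return total % M
```

For example, `from_rns(rns_mac([3, 7, 12, 5], [10, 4, 9, 20]))` recovers the ordinary dot product of the two vectors as long as the true result is smaller than `M`. The reverse conversion is the kind of non-modular step whose overhead the abstract mentions.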

Processing-in-Memory Acceleration of MAC-based Applications Using Residue Number System: A Comparative Study

Conference Paper
Shaahin Angizi, Arman Roohi, MohammadReza Taheri, Deliang Fan
31st ACM Great Lakes Symposium on VLSI (GLSVLSI 2021), June 22-25, 2021
Publication year: 2021

Entropy-Based Modeling for Estimating Adversarial Bit-flip Attack Impact on Binarized Neural Network

Conference Paper
Navid Khoshavi, Saman Sargolzaei, Yu Bi, A. Roohi
26th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan.18-21
Publication year: 2021

Over the past years, the high demand to efficiently process deep learning (DL) models has driven the market of chip design companies. However, the new Deep Chip architectures, a common term for DL hardware accelerators, have paid little attention to the security requirements of quantized neural networks (QNNs), while black/white-box adversarial attacks can jeopardize the integrity of the inference accelerator. Therefore, in this paper, a comprehensive study of the resiliency of QNN topologies to black-box attacks is presented. Herein, different attack scenarios are performed on an FPGA-processor co-design, and the collected results are extensively analyzed to estimate the degree of impact of different types of attacks on the QNN topology. Specifically, we evaluate the sensitivity of the QNN accelerator to a range of bit-flip attacks (BFAs) that might occur in the operational lifetime of the device. The BFAs are injected at uniformly distributed times either across the entire QNN or per individual layer during image classification. The acquired results are utilized to build an entropy-based model that can be leveraged to construct QNN architectures resilient to bit-flip attacks.

SHIELDeNN: Online Accelerated Framework for Fault-Tolerant Deep Neural Network Architectures

Conference Paper
Navid Khoshavi, Arman Roohi, Connor Broyles, Saman Sargolzaei, Yu Bi, David Z. Pan
57th Design Automation Conference (DAC), San Francisco, CA, USA, July 19-23
Publication year: 2020

We propose SHIELDeNN, an end-to-end inference accelerator framework that synergizes the mitigation approach and computational resources to realize a low-overhead error-resilient Neural Network (NN) overlay. We develop a rigorous fault assessment paradigm to delineate a ground-truth fault-skeleton map for revealing the most vulnerable parameters in the NN. The error-susceptible parameters and resource constraints are given to a function to find a superior design. The error-resiliency magnitude offered by SHIELDeNN can be adjusted based on the given boundaries. The SHIELDeNN methodology improves the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 MBUs that target weight and activation layers, respectively.

Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures

Conference Paper
Arman Roohi
International Green and Sustainable Computing Conference, October 19-22
Publication year: 2020

This work shows a promising solution to efficiently implement normally-off computing (NoC) and power analysis side-channel attack resilient designs using spin-based devices. Spintronics, as post-CMOS devices, provide interesting features such as non-volatility, which plays an essential role in NoC structures. Besides, spin-based components can naturally function as a polymorphic gate (PG) that realizes reconfigurable logic functions with inherent security attributes. However, the use of spintronics imposes power and area overhead compared to CMOS-based designs. Thus, herein an efficient design methodology is introduced to mitigate these overheads. This approach, entitled NV-Clustering, is first extended to realize the targeted insertion of PG modules within VLSI implementations to make them resilient against power failure. Then PARC, as an extension of NV-Clustering, is developed as a power-masked synthesis method in the presence of power analysis side-channel attacks. PARC randomly generates power-maskable building blocks with the optimum PDP and area overhead. In addition to NV-Clustering, PARC can be expanded against fault injection approaches due to the PG modules’ reconfigurability to cover the faults.

Hardware-assisted Black-box Adversarial Attack Evaluation Framework on Binarized Neural Network

Article
Navid Khoshavi, Arman Roohi, Yu Bi
41st IEEE Symposium on Security and Privacy
Publication year: 2020

Fiji-FIN: A Fault Injection Framework on Quantized Neural Network Inference Accelerator

Conference Paper
Navid Khoshavi, Connor Broyles, Yu Bi, Arman Roohi
IEEE International Conference on Machine Learning and Applications (ICMLA), October 19-22
Publication year: 2020

In recent years, the big data boom has boosted the development of highly accurate prediction models driven by machine learning (ML) and deep learning (DL) algorithms. These models can be orchestrated on customized hardware in safety-critical missions to accelerate the inference process in ML/DL-powered IoT. However, radiation-induced transient faults and black/white-box attacks can potentially impact the individual parameters in ML/DL models, which may result in generating noisy data/labels or compromising the pre-trained model. In this paper, we propose Fiji-FIN, a suitable framework for evaluating the resiliency of IoT devices during ML/DL model execution with respect to major security challenges such as bit perturbation attacks and soft errors. Fiji-FIN is capable of injecting both single bit/event flip/upset and multi-bit flip/upset faults on the architectural ML/DL accelerator embedded in ML/DL-powered IoT. Fiji-FIN is significantly more accurate compared to the existing software-level fault injection paradigms on ML/DL-driven IoT devices.

Entropy-Based Modeling for Estimating Soft Errors Impact on Binarized Neural Network Inference

Article
Navid Khoshavi, Saman Sargolzaei, Arman Roohi, Connor Broyles, Yu Bi
arXiv preprint
Publication year: 2020

Over the past years, easy access to large-scale datasets has significantly shifted the paradigm for developing highly accurate prediction models driven by Neural Networks (NNs). These models can potentially be impacted by radiation-induced transient faults that might lead to the gradual degradation of a long-running NN inference accelerator. The crucial observation from our rigorous vulnerability assessment of the NN inference accelerator is that the weights and activation functions are unevenly susceptible to both single-event upset (SEU) and multi-bit upset (MBU), especially in the first five layers of our selected convolutional neural network. In this paper, we present relatively accurate statistical models to delineate the impact of both SEU and MBU across layers and per layer of the selected NN. These models can be used for evaluating the error-resiliency magnitude of NN topologies before adopting them in safety-critical applications.

Keywords

  • Fault Injection
  • Deep Neural Network Accelerator
  • Machine Learning
  • Soft Error
  • Statistical Model

Processing-In-Memory Acceleration of Convolutional Neural Networks for Energy-Efficiency, and Power-Intermittency Resilience

Conference Paper
Arman Roohi, Shaahin Angizi, Deliang Fan, Ronald F DeMara
20th International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, USA, 2019, pp. 8-13.
Publication year: 2019

Abstract

Herein, a bit-wise Convolutional Neural Network (CNN) in-memory accelerator is implemented using Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational sub-arrays. It utilizes a novel AND-Accumulation method capable of significantly-reduced energy consumption within convolutional layers and performs various low bitwidth CNN inference operations entirely within MRAM. Power-intermittence resiliency is also enhanced by retaining the partial state information needed to maintain computational forward-progress, which is advantageous for battery-less IoT nodes. Simulation results indicate ~5.4× higher energy-efficiency and 9× speedup over ReRAM-based acceleration, or roughly ~9.7× higher energy-efficiency and 13.5× speedup over recent CMOS-only approaches, while maintaining inference accuracy comparable to baseline designs.
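
As a rough software analogue of the AND-Accumulation idea, the sketch below computes a low-bit-width dot product using only bitwise AND, bit counting, and shifts. It assumes the common bit-plane decomposition of unsigned operands, which may differ from the in-memory circuit described in the paper; the function names and the packing scheme are hypothetical.

```python
def bit_plane(values, k):
    """Pack bit k of every element into one integer (element i -> bit i)."""
    plane = 0
    for i, v in enumerate(values):
        plane |= ((v >> k) & 1) << i
    return plane


def and_accumulate_dot(inputs, weights, in_bits, w_bits):
    """Dot product of unsigned low-bit vectors via AND, popcount, and shift.

    For each pair of bit planes (m, n), the AND marks the positions where
    both bits are 1, the popcount sums them, and the shift applies the
    combined significance 2^(m + n).
    """
    total = 0
    for m in range(in_bits):
        x_plane = bit_plane(inputs, m)
        for n in range(w_bits):
            w_plane = bit_plane(weights, n)
            ones = bin(x_plane & w_plane).count("1")
            total += ones << (m + n)
    return total
```

With 2-bit inputs `[3, 1, 2, 0]` and 2-bit weights `[1, 2, 3, 1]`, `and_accumulate_dot(..., in_bits=2, w_bits=2)` matches the ordinary dot product. In an in-memory setting, the AND of two planes would correspond to a single parallel sub-array operation rather than a software loop.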

PARC: A Novel Design Methodology for Power Analysis Resilient Circuits using Spintronics

Journal Paper
Arman Roohi, , and Ronald F. DeMara.
IEEE Transactions on Nanotechnology, vol. 18, pp. 885-889, 2019
Publication year: 2019

Abstract

A prevalent class of side-channel attacks relies on Differential Power Analysis (DPA) methods, which monitor power traces during cryptographic processing to discover secret keys. Reconfigurability via Polymorphic Gate Modules (PGMs) offers an approach to obscure DPA information by dynamically rearranging the operation of constituent sub-circuits, albeit at the cost of increased area and power consumption. Thus, we develop the Power Analysis-Resilient Circuit (PARC) design methodology to instantiate the use of spin-based devices as an extension to conventional Register Transfer Language (RTL) specifications. PARC replaces a specific portion of the circuit to maximize a new Effectiveness of Design (EoD) metric, which quantifies DPA impact versus its performance overhead. To validate functionality, PARC is applied to various benchmark circuits including ISCAS-89, MCNC, and ITC-99. EoD results indicate that PARC significantly increases the number of power traces that an adversary would need to use in order to extract power information but incurs a low cost to the functional circuit itself.

IRC: Cross-layer design exploration of Intermittent Robust Computation units for IoTs

Conference Paper
Arman Roohi, Ronald F DeMara
IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, USA, 2019, pp. 354-359.
Publication year: 2019

Abstract

Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of the Internet of Things (IoT) devices by utilizing ambient sources of energy to achieve battery-free computing. In order to operate within the restricted energy capacity and intermittency profile, it is proposed to innovate Intermittent Robust Computation (IRC) Unit as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in spin-based switching devices. The foundations of IRC will be advanced from the device-level upwards, by extending a Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device. The device will then be used to realize SHE-MTJ Majority/Polymorphic Gate (MG/PG) logic approaches and libraries. Then a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Finally, the NV-Clustering synthesis procedure and corresponding tool module are proposed to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality, based on energy and area minimization criteria. It also realizes middleware-coherent, intermittent computation without checkpointing, micro-tasking, or software bloat and energy overheads vital to IoT. Simulation results for various benchmark circuits including ISCAS-89 validate functionality and power dissipation, area, and delay benefits.

ApGAN: Approximate GAN for Robust Low-Energy Learning from Imprecise Components

Journal Paper
Arman Roohi, Shadi Sheikhfaal, Shaahin Angizi, Deliang Fan, and Ronald F. DeMara
IEEE Transactions on Computers, vol. 69, no. 3, pp. 349-360, 1 March 2020.
Publication year: 2019

Abstract

A Generative Adversarial Network (GAN) is an adversarial learning approach which empowers conventional deep learning methods by alleviating the demands of massive labeled datasets. However, GAN training can be computationally-intensive limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithm and hardware implementation perspectives. First, inspired by the binary pattern feature extraction method along with binarized representation entropy, the existing Deep Convolutional GAN (DCGAN) algorithm is modified by binarizing the weights for a specific portion of layers within both the generator and discriminator models. Further reduction in storage and computation resources is achieved by leveraging a novel hardware-configurable in-memory addition scheme, which can operate in the accurate and approximate modes. Finally, a memristor-based processing-in-memory accelerator for ApGAN is developed. The performance of the ApGAN accelerator on different data-sets such as Fashion-MNIST, CIFAR-10, STL-10, and celeb-A is evaluated and compared with recent GAN accelerator designs. With almost the same Inception Score (IS) to the baseline GAN, the ApGAN accelerator can increase the energy-efficiency by ~28.6× achieving 35-fold speedup compared with a baseline GPU platform. Additionally, it shows 2.5× and 5.8× higher energy-efficiency and speedup over CMOS-ASIC accelerator subject to an 11 percent reduction in IS.

Keywords

  • Generative adversarial network
  • in-memory processing platform
  • neural network acceleration
  • hardware mapping

Synthesis of Normally-Off Boolean Circuits: An Evolutionary Optimization Approach Utilizing Spintronic Devices

Conference Paper
Arman Roohi, Ramtin Zand, Ronald F DeMara
19th International Symposium on Quality Electronic Design (ISQED), Santa Clara, CA, 2018, pp. 49-54.
Publication year: 2018

Abstract

In this paper, we develop an evolutionary-driven circuit optimization methodology, which can be leveraged for the synthesis of spintronic-based normally-off computing (NoC) circuits. NoC architectures distribute nonvolatile memory elements throughout the CMOS logic plane, creating a new class of fine-grained functionally-constrained synthesis challenges. The synthesis objectives for spin-based NoC circuits include increased computational throughput and reduced static power consumption. Our proposed methodology utilizes Genetic Algorithms (GAs) to optimize the implementation of a Boolean logic expression in terms of area, delay, or power consumption. It first leverages the spin-based device characteristics to achieve a primary semi-optimized implementation, then further performance optimization is applied to the implemented design based on the NoC requirements and optimization criteria. As a proof-of-concept, the optimization approach is leveraged to implement a functionally-complete set of Boolean logic gates using spin Hall effect (SHE)-magnetic tunnel junctions (MTJs), which are optimized for both power and delay objectives. The resulting NoC synthesis methodology supports NoC circuit design for emerging-device and hybrid CMOS logic applications. Finally, simulation results and analyses verify the functionality of our proposed optimization tool for NoC circuit implementations.

NV-Clustering: Normally-Off Computing Using Non-Volatile Datapaths

Journal Paper
Arman Roohi, Ronald F DeMara
IEEE Transactions on Computers, vol. 67, no. 7, pp. 949-959, 1 July 2018
Publication year: 2018

Abstract

With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a Non-Volatile (NV) flip-flop design approach, referred to as NV-Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are utilized to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality, based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks including ISCAS-89, MCNC, and ITC-99 computational circuits using an LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and power dissipation, area, and delay benefits. For instance, results for ISCAS-89 benchmarks indicate 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay as compared to alternative NV-FF based designs, as evaluated via SPICE simulation at the 45-nm technology node.

Logic-Encrypted Synthesis for Energy-Harvesting-Powered Spintronic-Embedded Datapath Design

Conference Paper
Arman Roohi, Ramtin Zand, Ronald F DeMara
Great Lakes Symposium on VLSI (GLSVLSI ’18). Association for Computing Machinery, New York, NY, USA, 9–14.
Publication year: 2018

Abstract

The objectives of advancing secure, intermittency-tolerant, and energy-aware logic datapaths are addressed herein by developing a spin-based design methodology and its corresponding synthesis steps. The approach selectively inserts Non-Volatile (NV) Polymorphic Gates (PGs) to realize datapaths which are suitable for intrinsic operation in Energy-Harvesting-Powered (EHP) devices. Spin Hall Effect (SHE)-based Magnetic Tunnel Junctions (MTJs) are utilized to design NV-PGs, which are combined within a Flip-Flop (FF) circuit to develop a PG-FF realizing Boolean logic functions with inherent state-holding capability. The reconfigurability of PGs is leveraged for logic-encryption to enhance the security of the developed intermittency-resilient circuits, which are applied to ISCAS-89, MCNC, and ITC-99 benchmarks. The results obtained indicate that the PG-FF based design can achieve up to 7.1% and 13.6% improvements in terms of area and Power Delay Product (PDP), respectively, compared to NV-FF based methodologies that replace the CMOS-based FFs with NV-FFs. Further PDP improvements are achieved by using low-energy barrier SHE-MTJ devices within the PG-FF circuit. SHE-MTJs with 30kT energy exhibit 40.5% reduction in PDP at the cost of lower retention times in the range of minutes, which is still sufficient to achieve forward progress in EHP devices having more than hundreds of power-on and power-off cycles per minute.

Heterogeneous technology configurable fabrics for field-programmable co-design of CMOS and spin-based devices

Conference Paper
Ronald F DeMara, Arman Roohi, Ramtin Zand, Steven D Pyle
Journal of Consumer Psychology, Volume 22, Issue 2, April 2012, Pages 191-194
Publication year: 2018

Abstract

The architecture, operation, and characteristics of two post-CMOS reconfigurable fabrics are identified to realize energy-sparing and resilience features, while remaining feasible for near-term fabrication. First, Storage Cell Replacement Fabrics (SCRFs) provide a reconfigurable computing platform utilizing near-zero leakage Spin Hall Effect devices which replace SRAM bit-cells within Look-Up Tables (LUTs) and/or switch boxes to complement the advantages of MOS transistor-based multiplexer select trees. Second, Heterogeneous Technology Configurable Fabrics (HTCFs) are identified to extend reconfigurable computing platforms via a palette of CMOS, spin-based, or other emerging device technologies, such as various Magnetic Tunnel Junction (MTJ) and Domain Wall Motion devices. HTCFs are composed of a triad of Emerging Device Blocks, CMOS Logic Blocks, and Signal Conversion Blocks. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HTCFs enable new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. Both SCRFs and HTCFs offer a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. SPICE simulations indicate 6% to 67% reduction in read energy, 21% reduction in reconfiguration energy, and 78% higher clock frequency versus alternative fabricated emerging device architectures, and a significant reduction in leakage compared to CMOS-based approaches.

Fundamentals, Modeling, and Application of Magnetic Tunnel Junctions

Book Chapter
Ramtin Zand, Arman Roohi, Ronald F DeMara
Nanoscale Devices: Physics, Modeling, and Their Application (2018): 337.
Publication year: 2018

Abstract

Aggressive Metal Oxide Semiconductor (MOS) technology scaling in digital circuits has resulted in important challenges including a significant increase in leakage currents, short-channel effects, and drain saturation growth while reducing the power supply voltage for digital applications. Furthermore, by extensions to sub 10-nm regimes, error resiliency has become a major challenge for the microelectronics industry, particularly mission-critical systems, e.g. space and terrestrial applications. Therefore, emerging devices and technologies have attracted considerable attention in recent years as an alternative for CMOS based technologies such as spintronics [1-6], resistive random access memory (RRAM) [7-10], phase-change memory (PCM) [11, 12], and Quantum Cellular Automata (QCA) [13-18]. Among promising devices, the 2014 Magnetism Roadmap [19] identifies nanomagnetic devices as capable post-CMOS candidates, of which Magnetic Tunnel Junctions (MTJs) are considered as one of the most promising technologies spanning both logic [20-22] and memory functionalities [23-26]. MTJs are characterized by nonvolatility, near-zero standby power, high integration density, and radiation-hardness, as a technology progression from CMOS. Moreover, MTJ can be readily integrated at the back-end process of the CMOS fabrication, due to its vertical structure [27, 28]. In this book chapter, we will focus on the fundamentals and modeling of the MTJs using precise physics equations. Moreover, some of their applications in reconfigurable fabrics and logic-in-memory architectures will be studied.

Voltage-based concatenatable full adder using spin hall effect switching

Journal Paper
Arman Roohi, Ramtin Zand, Deliang Fan, Ronald F DeMara
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 12, pp. 2134-2138, Dec. 2017.
Publication year: 2017

Abstract

Magnetic tunnel junction (MTJ)-based devices have been studied extensively as a promising candidate to implement hybrid energy-efficient computing circuits due to their nonvolatility, high integration density, and CMOS compatibility. In this paper, MTJs are leveraged to develop a novel full adder (FA) based on 3- and 5-input majority gates. The Spin Hall effect (SHE) is utilized for changing the MTJ states, resulting in low-energy switching behavior. SHE-MTJ devices are modeled in Verilog-A using precise physical equations. A SPICE circuit simulator is used to validate the functionality of the 1-bit SHE-based FA. The simulation results show 76% and 32% improvement over previous voltage-mode MTJ-based FAs in terms of energy consumption and device count, respectively. The concatenatability of our proposed 1-bit SHE-FA is investigated through developing a 4-bit SHE-FA. Finally, the delay and power consumption of an n-bit SHE-based adder are formulated to provide a basis for developing an energy-efficient SHE-based n-bit arithmetic logic unit.
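
The logic behind a full adder built from 3- and 5-input majority gates can be checked with a short behavioral sketch. It assumes the well-known construction in which the carry is MAJ3(a, b, cin) and the sum is MAJ5(a, b, cin, NOT cout, NOT cout); the abstract does not spell out the exact wiring, so this mapping and the function names are illustrative assumptions rather than the paper's circuit.

```python
from itertools import product


def maj(*bits):
    """Majority gate: 1 when more than half of the (odd number of) inputs are 1."""
    return int(sum(bits) > len(bits) // 2)


def full_adder(a, b, cin):
    """1-bit full adder from one 3-input and one 5-input majority gate."""
    cout = maj(a, b, cin)              # carry-out is the majority of the inputs
    ncout = 1 - cout                   # inverted carry, fed twice into MAJ5
    s = maj(a, b, cin, ncout, ncout)   # sum bit
    return s, cout


def ripple_add(x, y, n_bits=4):
    """Concatenate 1-bit adders into an n-bit ripple-carry adder."""
    carry, result = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Exhaustive check of the 1-bit truth table.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert s + 2 * cout == a + b + c
```

Chaining the 1-bit cell through `ripple_add` mirrors, at the logic level only, the 4-bit concatenation discussed in the abstract; device-level behavior such as switching energy and delay is outside the scope of this sketch.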

Towards ultra-efficient QCA reversible circuits

Journal Paper
Amir Mokhtar Chabi, Arman Roohi, Hossein Khademolhosseini, Shadi Sheikhfaal, Shaahin Angizi, Keivan Navi, Ronald F DeMara
Publication year: 2017

Abstract

Nanotechnologies, remarkably Quantum-dot Cellular Automata (QCA), offer an attractive perspective for future computing technologies. In this paper, QCA is investigated as an implementation method for reversible logic. A novel XOR gate and also a new approach to implement 2:1 multiplexer are presented. Moreover, an efficient and potent universal reversible gate based on the proposed XOR gate is designed. The proposed reversible gate has a superb performance in implementing the QCA standard benchmark combinational functions in terms of area, complexity, power consumption, and cost function in comparison to the other reversible gates. The gate achieves the lowest overall cost among the most cost-efficient designs presented so far, with a reduction of 24%. In order to employ the merits of reversibility, the proposed reversible gate is leveraged to design the four common latches (D latch, T latch, JK latch, and SR latch). Specialized structures of the proposed circuits could be used as building blocks in designing sequential and combinational circuits in QCA architectures.

Secure Intermittent-Robust Computation for Energy Harvesting Device Security and Outage Resilience

Conference Paper
Arman Roohi, Ronald F DeMara, Longfei Wang, Selçuk Köse
IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation
Publication year: 2017

Abstract

In this paper, we propose Secure Intermittent-Robust Computation (SIRC) for Energy Harvesting Powered Internet of Things (IoT) Devices. This effort innovates a new duty-cycle-variable computing approach to facilitate and invigorate security in energy-harvesting-powered IoT network nodes. The proposed SIRC architecture is developed from the ground up by extending emerging post-CMOS switching elements to realize majority-gate logic that is intrinsically capable of middleware-coherent, battery-free operation without check-pointing or micro-tasking, and that can be resilient to wireless power transfer attacks including charge attacks and data attacks. Potential countermeasures for these attacks are identified at the circuit level through gate-resolution immunity to power interruption. As a proof-of-concept, a power-maskable design using the SIRC approach is developed for the s27 circuit from the ISCAS-89 benchmark. The obtained results show that SIRC provides reduced area consumption and increases the number of power traces needed to extract encrypted data.

Heterogeneous energy-sparing reconfigurable logic: spin-based storage and CNFET-based multiplexing

Journal Paper
Mohan Krishna Gopi Krishna, Arman Roohi, Ramtin Zand, Ronald F DeMara
IET Circuits, Devices & Systems, vol. 11, no. 3, pp. 274-279, May 2017.
Publication year: 2017

Abstract

Field programmable gate array (FPGA) attributes of logic configurability, bitstream storage, and dynamic signal routing can be realised by leveraging the complementary benefits of emerging devices with complementary metal oxide semiconductor (CMOS)-based devices. A novel carbon/magnet lookup table (CM-LUT) is developed and evaluated by trading off a range of mixed heterogeneous technologies to balance energy, delay, and reliability attributes. Herein, magnetic spintronic devices are employed in the configuration memory to contribute non-volatility and high scalability. Meanwhile, carbon nanotube field-effect transistors (CNFETs) provide desirable conductivity, low delay, and low power consumption. The proposed CM-LUT offers ultra-low power and high-speed operation while maintaining high endurance re-programmability with increased radiation-induced soft-error immunity. The proposed four-input one-output CM-LUT utilises 41 CNFETs and 20 magnetic tunnel junctions for read operations and 35 CNFETs to perform write operations. Results indicate that CM-LUT achieves an average four-fold energy reduction, eight-fold faster circuit operation and 9.3% reconfiguration power delay product improvement in comparison with spin-based look-up tables. Finally, additional hybrid technology designs are considered to balance performance with the demands of energy consumption for near-threshold operation.

Energy-efficient and process-variation-resilient write circuit schemes for spin Hall effect MRAM device

Journal Paper
Ramtin Zand, Arman Roohi, Ronald F DeMara
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 9, pp. 2394-2401, Sept. 2017
Publication year: 2017

Abstract

In this paper, various energy-efficient write schemes are proposed for switching operation of spin Hall effect (SHE)-based magnetic tunnel junctions (MTJs). A transmission gate (TG)-based write scheme is proposed, which provides a symmetric and energy-efficient switching behavior. We have modeled an SHE-MTJ using precise physics equations, and then leveraged the model in a SPICE circuit simulator to verify the functionality of our designs. Simulation results show the TG-based write scheme's advantages in terms of device count and switching energy. In particular, it can operate at 12% higher clock frequency while realizing at least 13% reduction in energy consumption compared to the most energy-efficient write circuits. We have analyzed the performance of the implemented write circuits in the presence of process variation (PV) in the transistors’ threshold voltage and SHE-MTJ dimensions. Results show that the proposed TG-based design is the second most PV-resilient write circuit scheme for SHE-MTJs among the implemented designs. Finally, we have proposed the 1TG-1T-1R SHE-based magnetic random access memory (MRAM) bit cell based on the TG-based write circuit. Comparisons with several of the most energy-efficient and variation-resilient SHE-MRAM cells indicate that 1TG-1T-1R delivers reduced energy consumption with 43.9% and 10.7% energy-delay product improvement, while incurring low area overhead.

Scalable adaptive spintronic reconfigurable logic using area-matched MTJ design

Journal Paper
Ramtin Zand, Arman Roohi, Soheil Salehi, Ronald F DeMara
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 7, pp. 678-682, July 2016.
Publication year: 2016

Abstract:

Spin-transfer torque (STT) random access memory has been researched as a promising alternative for static random access memory in reconfigurable fabrics, particularly in lookup tables (LUTs), due to its nonvolatility, low standby and static power, and high integration density features. In this brief, we leverage physical characteristics of magnetic tunnel junctions (MTJs) to design a unique reference MTJ which has a calibrated resistance matching the STT-based LUT (STT-LUT) circuit requirements to provide optimal reading operation. Results obtained show 42% and 70% power-delay product (PDP) improvement over previous MTJ-based LUT designs. Moreover, a four-input adaptive STT-based LUT (A-LUT) is proposed based on the developed STT-LUT, which is configurable to function in seven independent modes. An n-input A-LUT exhibits PDP which can be a fraction of n-input STT-LUT PDP, when performing two-input to (n-1)-input Boolean logic functions.

Loss-aware switch design and non-blocking detection algorithm for intra-chip scale photonic interconnection networks

Journal Paper
Hesam Shabani, Arman Roohi, Akram Reza, Midia Reshadi, Nader Bagherzadeh, Ronald F DeMara
IEEE Transactions on Computers, vol. 65, no. 6, pp. 1789-1801, 1 June 2016.
Publication year: 2016

Abstract:

As the number of on-chip processor cores increases, power-efficient solutions are sought for data communication between cores. The Helix-h non-blocking photonic switch is developed to improve physical-layer and network performance parameters for a wide range of silicon nano-photonic multicore interconnection topologies. Traffic benchmarks and practical case studies using a cycle-accurate simulation environment indicate significantly reduced insertion loss providing improved bandwidth density and scalability to manycore plurality. Improvements in system performance parameters are quantified for network bandwidth, transmission efficiency, and latency in popular photonic interconnection topologies, in comparison to previous switch designs. For instance, utilizing the Helix-h switch in a mesh topology, the bandwidth is increased by 112 percent compared to the previously highest performing switch design. Execution time and energy efficiency are improved by up to 92 and 99 percent, respectively, for representative multicore applications. Finally, the technique is generalized to a novel graph-theoretic method for articulating blocking conditions in photonic switches.

Energy-efficient nonvolatile reconfigurable logic using spin Hall effect-based lookup tables

Journal Paper
Ramtin Zand, Arman Roohi, Deliang Fan, Ronald F DeMara
IEEE Transactions on Nanotechnology, vol. 16, no. 1, pp. 32-43, Jan. 2017.
Publication year: 2016

Abstract:

In this paper, we leverage magnetic tunnel junction (MTJ) devices to design an energy-efficient nonvolatile lookup table (LUT), which utilizes a spin Hall effect (SHE) assisted switching approach for MTJ storage cells. SHE-MTJ characteristics are modeled in Verilog-A based on precise physical equations. Functionality of the proposed SHE-MTJ-based LUT is validated using SPICE simulation. Our proposed SHE-MTJ-based LUT (SHE-LUT) is compared with the most energy-efficient MTJ-based LUT circuits. The obtained results show more than 6%, 37%, and 67% improvement over three previous MTJ-based designs in terms of read energy consumption. Moreover, the reconfiguration delay and energy of the proposed design are compared with those of the MTJ-based LUTs which utilize the spin transfer torque (STT) switching approach for reconfiguration. The results show that SHE-LUT can operate at 78% higher clock frequency while achieving at least 21% improvement in terms of reconfiguration energy consumption. The operation-specific clocking mechanisms for managing the SHE-LUT operations are introduced along with detailed analyses concerning tradeoffs. Results are extended to design a 6-input fracturable LUT using SHE-MTJs.

A tunable majority gate-based full adder using current-induced domain wall nanomagnets

Journal Paper
Arman Roohi, Ramtin Zand, Ronald F DeMara
IEEE Transactions on Magnetics, vol. 52, no. 8, pp. 1-7, Aug. 2016, Art no. 3401507.
Publication year: 2016

Abstract:

Domain wall nanomagnet (DWNM)-based devices have been extensively studied as a promising alternative to the conventional CMOS technology in both the memory and logic implementations due to their non-volatility, near-zero standby power, and high integration density characteristics. In this paper, we leverage a physics-based model of a DWNM device to design a highly scalable current-mode majority gate to achieve a novel one bit full-adder (FA) circuit. The modeled DWNM specifications are calibrated with the experimentally measured data. The functionality of the proposed DWNM-based FA (DWNM-FA) is verified using a SPICE circuit simulator. The detailed analysis and the calculations have been performed to realize the proposed DWNM-FA delay and power consumption corresponding to the various induced input currents at different operating temperatures. The power-delay product of DWNM-FA is examined to tune the operation within the optimum induced input current region to obtain desired power-delay requirements over a range of 200 μA to 1 mA at temperatures from 298 to 378 K. Finally, the comparison results exhibit 52% and 49% area improvement as well as 41% and 31% improvement in device count complexity over CMOS-based and magnetic tunnel junction-based FA designs, respectively.

A parity-preserving reversible QCA gate with self-checking cascadable resiliency

Journal Paper
Arman Roohi, Ramtin Zand, Shaahin Angizi, Ronald F DeMara
IEEE Transactions on Emerging Topics in Computing, vol. 6, no. 4, pp. 450-459, 1 Oct.-Dec. 2018.
Publication year: 2016

Abstract:

A novel Parity-Preserving Reversible Gate (PPRG) is developed using Quantum-dot Cellular Automata (QCA) technology. PPRG enables rich fault-tolerance features, as well as reversibility attributes sought for energy-neutral computation. Performance of the PPRG design is validated through implementing thirteen standard combinational Boolean functions of three variables, which demonstrate from 10.7 to 41.9 percent improvement over the previous gate counts obtained with other reversible and/or preserving gate designs. Switching and leakage energy dissipation as low as 0.141 eV and 0.294 eV, respectively, are achieved using PPRG for the 1.5 Ek energy level. The utility of PPRG is leveraged to design a one-bit full adder with 171 cells occupying only 0.19 mm² area. Finally, fault detection and isolation properties are formalized into a concise procedure. PPRG-based circuits capable of self-configuring active recovery for selected three-variable standard functions are realized using a memoryless method irrespective of garbage outputs.
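The two properties named in this abstract, reversibility and parity preservation, can be checked by brute force over a truth table. The sketch below is illustrative only: it does not model the PPRG itself (its truth table is not given here) and instead uses the well-known Fredkin and Toffoli gates as stand-ins. The helper names `is_reversible` and `preserves_parity` are assumptions made for this example.

```python
from itertools import product

def fredkin(a, b, c):
    """Controlled swap (stand-in gate): when a == 1, b and c are exchanged."""
    return (a, c, b) if a else (a, b, c)

def toffoli(a, b, c):
    """Controlled-controlled NOT (stand-in gate): c is flipped when a == b == 1."""
    return (a, b, c ^ (a & b))

def is_reversible(gate, n):
    """A gate is reversible when it maps the 2**n inputs onto 2**n distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n

def preserves_parity(gate, n):
    """Parity is preserved when the XOR of inputs equals the XOR of outputs in every row."""
    return all(
        sum(bits) % 2 == sum(gate(*bits)) % 2
        for bits in product((0, 1), repeat=n)
    )

for gate in (fredkin, toffoli):
    print(gate.__name__, is_reversible(gate, 3), preserves_parity(gate, 3))
```

Both stand-in gates are reversible, but only the Fredkin gate keeps input and output parity equal, which is the property that lets a single parity check flag a single-bit fault at the outputs.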

Wire crossing constrained QCA circuit design using bilayer logic decomposition

Journal Paper
A Roohi, H Thapliyal, RF DeMara
Electronics Letters, vol. 51, no. 21, pp. 1677-1679, 8 Oct. 2015.
Publication year: 2015

Abstract:

Quantum-dot cellular automata (QCA) seek potential benefits over CMOS devices such as low-power consumption, small dimensions, and high-speed operation. Two prominent QCA concerns of wire crossing complexity and circuit robustness are addressed by developing a three-step bilayer logic decomposition (BLD) methodology to design QCA-based logic circuits. The partitioning of QCA computing operations into logic layers realises considerable improvements in complexity, area, and modularity metrics. Moreover, since larger circuits are divided into two increasingly disjoint sub-planes, verification of the functionality of the design becomes compartmentalised. Design capability of the proposed approach is illustrated and analysed by implementing an area-efficient full comparator (FC) based on a novel logic realisation. The resulting 1-bit FC achieves 32% improvement in complexity metrics in comparison with the previous optimal QCA-based FC. The related waveforms used in verification of the BLD-generated FC which are obtained by the QCADesigner simulation tool are discussed as a motivating example of the BLD methodology.

Reactive rejuvenation of CMOS logic paths using self-activating voltage domains

Conference Paper
Rizwan A Ashraf, Ahmad Al-Zahrani, Navid Khoshavi, Ramtin Zand, Soheil Salehi, Arman Roohi, Mingjie Lin, Ronald F DeMara
IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, 2015, pp. 2944-2947.
Publication year: 2015

Abstract:

Although the trend of technology scaling is sought to realize higher performance computer systems, it also results in Integrated Circuits (ICs) suffering from increasing Process, Voltage, and Temperature (PVT) variations and adverse aging effects. In most cases, these reliability threats manifest themselves as timing errors on critical speed-paths of the circuit, if a large design guardband is not reserved. In this work, we propose the Reactive Rejuvenation (RR) architectural approach consisting of detection and recovery phases to mitigate BTI-induced aging of the circuit. The BTI impact on the performance of critical and near-critical paths is continuously examined through a lightweight logic circuit which asserts an error signal in the case of any timing violation in those paths. By utilizing timing violation occurrence in the system, the timing-sensitive portion of the circuit is recovered from BTI through switching computations to a redundant aging-critical voltage domain. The proposed technique achieves aging mitigation and reduced energy consumption as compared to a baseline circuit. Thus, significant voltage guardbands to meet the desired timing specification are avoided.

Modeling an Improved Modified Type in Metallic Quantum-Dot Fixed Cell for Nano Structure Implementation

Conference Paper
Samira Sayedsalehi, Arman Roohi
23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turku, 2015, pp. 412-415.
Publication year: 2015

Abstract:

Quantum-dot cellular automata (QCA) is a transistor-less computation approach which encodes binary information via configuration of charges among quantum dots. The fundamental QCA logic primitives are the majority gate and the inverter gate, which can be employed to design various QCA circuits. In this study, a detailed model of a modified type of fixed metal-dot QCA cell is explored by applying fixed predefined levels of polarization. An efficient architecture controlled by the predefined polarization of fixed cells positioned next to the input cells is presented for implementing a desired nano structure. The efficiency of the proposed approach is verified by implementing several important examples of Boolean functions.

Design and verification of new n-bit quantum-dot synchronous counters using majority function-based JK flip-flops

Journal Paper
Shaahin Angizi, Samira Sayedsalehi, Arman Roohi, Nader Bagherzadeh, Keivan Navi
Journal of Circuits, Systems and Computers, Vol. 24, No. 10, 1550153 (2015)
Publication year: 2015

Quantum-dot Cellular Automata (QCA) is an attractive nanoelectronics paradigm which is widely advocated as a possible replacement of conventional CMOS technology. Designing memory cells is a very interesting field of research in the QCA domain. In this paper, we propose novel nanotechnology-compatible designs based on majority gate structures. In the first step, this objective is accomplished by QCA implementation of two well-organized JK flip-flop designs, and in the second step, synchronous counters with different sizes are presented as an application. To evaluate the functional correctness of the proposed designs and compare them with the state of the art, the QCADesigner tool is employed.

Design and evaluation of an ultra-area-efficient fault-tolerant QCA full adder

Journal Paper
Arman Roohi, Ronald F DeMara, Navid Khoshavi
Microelectronics Journal 46, no. 6 (2015): 531-542.
Publication year: 2015

Abstract

Quantum-dot cellular automata (QCA) has been studied extensively as a promising switching technology at the nanoscale level. Despite several potential advantages of QCA-based designs over conventional CMOS logic, some deposition defects are likely to occur in QCA-based systems, which has necessitated fault-tolerant structures. Since binary adders are among the most frequently used components in digital systems, this work targets designing a highly optimized robust full adder in a QCA framework. Results demonstrate the superiority of the proposed full adder in terms of latency, complexity and area with respect to previous full adder designs. Further, the functionality and the defect tolerance of the proposed full adder in the presence of QCA deposition faults are studied. The functionality and correctness of our design are confirmed using high-level synthesis, which is followed by delineating its normal and faulty behavior using a Probabilistic Transfer Matrix (PTM) method. The related waveforms, which verify the robustness of the proposed designs and are generated using the QCADesigner simulation tool, are discussed.

Cost-efficient QCA reversible combinational circuits based on a new reversible gate

Conference Paper
Amir Mokhtar Chabi, Arman Roohi, Ronald F DeMara, Shaahin Angizi, Keivan Navi, Hossein Khademolhosseini
18th CSI International Symposium on Computer Architecture and Digital Systems (CADS), Tehran, 2015, pp. 1-6.
Publication year: 2015

Abstract:

Nanotechnologies, notably Quantum-dot Cellular Automata (QCA), provide an attractive perspective for future computing technologies. In this paper, QCA is investigated as an implementation method for reversible logic. A novel XOR gate and also a new approach to implement a 2:1 multiplexer are presented. Moreover, an efficient and potent universal reversible gate based on the proposed XOR gate is designed. The proposed reversible gate has a superb performance in implementing the QCA standard benchmark combinational functions in terms of area, complexity, power consumption and cost function in comparison to the other reversible gates. The gate achieves the lowest overall cost among the most cost-efficient designs presented so far, with a reduction of 24%.

Reconfigurable Spintronic Fabric using Domain Wall Devices

Article
Ronald F DeMara, Ramtin Zand, Arman Roohi, Soheil Salehi, Steven Pyle
Publication year: 2014

Abstract

While spintronic-based neuromorphic architectures offer analog computation strategies [2], in this proposal we exploit reconfigurability and associative processing using a Logic-In-Memory (LIM) paradigm. LIM is compatible with conventional computing algorithms and integrates logical operations with data storage, making it an ideal choice for parallel SIMD operations to eliminate frequent accesses to memory, which are extreme contributors to energy consumption. Spin-based LIM architectures have the capability to increase computational throughput, reduce the die area, provide instant-on functionality, and reduce static power consumption [3]. Feasibility of a low power spintronic LIM chip has recently been demonstrated in [4] for database applications.

Quantum-dot cellular automata: computing in nanoscale

Journal Paper
Arman Roohi, Hossein Khademolhosseini
Reviews in Theoretical Science, Volume 2, Number 1, March 2014, pp. 46-76(31)
Publication year: 2014

Abstract

Traditional CMOS technology is approaching its end-of-life, so novel technologies such as nano-scale ones are being deployed. Quantum-dot cellular automata (QCA) is a new computing method in nanotechnology that offers low power consumption, small dimensions, and high switching speed. In this paper, a comprehensive study of QCA is provided in which we discuss the preliminaries and describe the different aspects of QCA. The state of the art in this field is also presented.

A symmetric quantum-dot cellular automata design for 5-input majority gate

Journal Paper
Arman Roohi, Hossein Khademolhosseini, Samira Sayedsalehi, Keivan Navi
J Comput Electron 13, 701–708 (2014).
Publication year: 2014

Abstract

With the inevitable scaling down of the feature size of MOS transistors deeper into the nanometer range, CMOS technology has encountered many critical challenges and problems such as very high leakage currents, reduced gate control, high power density, increased circuit noise sensitivity and very high lithography costs. Quantum-dot cellular automata (QCA), owing to its high device density, extremely low power consumption and very high switching speed, could be a feasible competitive alternative. In this paper, a novel 5-input majority gate, an important fundamental building block in QCA circuits, is designed in a symmetric form. In addition to the majority gate, an SR latch, an SR gate and an efficient one-bit QCA full adder are implemented employing the new 5-input majority gate. In order to verify the functionality of the proposed designs, the QCADesigner tool is used. The results demonstrate that the proposed SR latch and full adder perform equally well or in many cases better than previous circuits.
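To illustrate why a 5-input majority gate is a useful building block for a one-bit full adder, here is a minimal logic-level sketch. It uses a widely known majority-based formulation (carry from a 3-input majority, sum from a 5-input majority fed with the inverted carry twice); whether the paper's adder uses exactly this arrangement is an assumption, and the function names are chosen for this example only.

```python
from itertools import product

def maj(*bits):
    """Majority vote over an odd number of binary inputs."""
    return int(sum(bits) > len(bits) // 2)

def full_adder(a, b, cin):
    """One-bit full adder built from one 3-input and one 5-input majority gate."""
    cout = maj(a, b, cin)
    # The inverted carry is applied to two inputs so that it outweighs
    # any single operand bit in the 5-input vote.
    total = maj(a, b, cin, 1 - cout, 1 - cout)
    return total, cout

# Exhaustive check against ordinary binary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert 2 * c + s == a + b + cin
```

This models only the Boolean behavior; cell layout, clocking zones, and the symmetry of the physical gate are outside what a truth-table check can show.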

Parallel-XY: a novel loss-aware non-blocking photonic router for silicon nano-photonic networks-on-chip

Journal Paper
Hesam Shabani, Arman Roohi, Akram Reza, Hossein Khademolhosseini, Midia Reshadi
Journal of Computational and Theoretical Nanoscience, Volume 10, Number 6, June 2013, pp. 1510-1514(5)
Publication year: 2013

Abstract

Photonic technology is now recognized as a promising platform among the existing solutions to the challenges facing interconnection networks in current chips. Recent progress in silicon nanophotonic technologies has provided an adequate infrastructure to construct photonic communication links with higher bandwidth and power efficiency in comparison with traditional electrical communications. The router is a core component in photonic networks-on-chip. This paper proposes a 5 × 5 non-blocking photonic router which is designed for the XY routing algorithm. The simulation results show that our proposed router achieves significant improvements in terms of insertion loss owing to a reduction in the number of waveguide crossings. These improvements, leveraging wavelength-division multiplexing, lead to higher bandwidth density.
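For readers unfamiliar with the XY routing algorithm that the router targets, the following sketch shows dimension-order routing on a 2D mesh: a packet first travels along the X dimension until its column matches the destination, then along Y. The coordinate convention and the function name `xy_route` are assumptions for illustration and say nothing about the photonic switch design itself.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst, X dimension first."""
    x, y = src
    path = [(x, y)]
    # Phase 1: correct the X coordinate.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because a packet never turns from the Y dimension back into X, some input-to-output turns never occur at a router, which is the kind of restriction a router tailored to XY routing can exploit.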

Designing reconfigurable quantum-dot cellular automata logic circuits

Journal Paper
Keivan Navi, Arman Roohi, Samira Sayedsalehi
Journal of Computational and Theoretical Nanoscience, Volume 10, Number 5, May 2013, pp. 1137-1146(10)
Publication year: 2013

Abstract

Quantum-dot cellular automata (QCA) is an emerging nanoscale technology and a possible alternative to conventional CMOS technology. QCA has attractive features such as high speed, low power consumption and smaller area occupation. In this paper, a novel configurable QCA circuit is presented. This circuit can be configured to perform various logic functions such as 2-input or 3-input AND, 2-input or 3-input OR, and other possible variations. By using these kinds of circuits, the hardware requirements for a QCA design can be reduced and various functions can be obtained. We then propose an efficient QCA design of a one-bit full adder constructed based on our circuit. Our design will be compared with previous designs. The comparison shows that the proposed adder has better performance in terms of latency, complexity, and size in QCA. In order to verify the functionality of the proposed circuit, it is checked by means of computer simulations using QCADesigner tool.

Design and evaluation of a reconfigurable fault tolerant quantum-dot cellular automata gate

Journal Paper
Arman Roohi, Samira Sayedsalehi, Hossein Khademolhosseini, Keivan Navi
Journal of Computational and Theoretical Nanoscience, Volume 10, Number 2, February 2013, pp. 380-388(9)
Publication year: 2013

Abstract

Quantum-dot cellular automata (QCA), which encodes binary information by means of the charge configuration of quantum-dot cells rather than current, represents a new computing platform at the nanotechnology level. On the plus side, it offers significant improvements over CMOS due to its low power consumption, high speed and small dimensions. On the negative side, a large number of manufacturing defects are likely to occur, requiring new fault-tolerant architectures. The defects might occur in the manufacturing or synthesis phases of the design process. In this paper, we present and analyze a novel reconfigurable fault-tolerant gate. In addition to its reconfigurability, which makes it capable of covering some commonly used functions, the gate is designed in such a way that it tolerates synthesis-phase defects. In order to simulate the functionality of the proposed gate, QCADesigner is used and related waveforms are presented.

Implementation of reversible logic design in nanoelectronics on basis of majority gates

Conference Paper
Arman Roohi, Hossein Khademolhosseini, Samira Sayedsalehi, Keivan Navi
The 16th CSI International Symposium on Computer Architecture and Digital Systems (CADS 2012), Shiraz, Fars, 2012, pp. 1-6.
Publication year: 2012

Abstract:

Due to low power dissipation in computing, reversible logic is an attractive field of research in quantum and optical computing. Since conventional CMOS technology cannot be used for implementing reversible gates owing to its high power dissipation, novel technologies such as nano-scale ones are being deployed. In this paper, we utilize Quantum-dot Cellular Automata (QCA) as a candidate technology for implementing reversible logic gates. This paper presents a new realization approach to reversible logic based on majority gates (MGs), and a new reversible gate is proposed as well. The gate is compared with an existing MG-based structure in terms of delay, complexity and area. The results show that even though our gate requires more cells, it returns the outputs in fewer clock cycles and hence the design is faster.

A novel genetic algorithm based method for efficient QCA circuit design

Book Chapter
Mohsen Kamrani, Hossein Khademolhosseini, Arman Roohi, Poornik Aloustanimirmahalleh
Wyld D., Zizka J., Nagamalai D. (eds) Advances in Computer Science, Engineering & Applications. Advances in Intelligent and Soft Computing, vol 166. Springer, Berlin, Heidelberg
Publication year: 2012

Abstract

In this paper, we propose an efficient method based on Genetic Algorithms (GAs) to design quantum cellular automata (QCA) circuits with the minimum possible number of gates. The basic gates used to design these circuits are 2-input and 3-input NAND gates in addition to the inverter gate. Due to the use of these two types of NAND gates and their contradictory effects, a new fitness function has been defined. In addition, in this method we have used a type of mutation operator that can significantly help the GA to avoid local optima. The results show that the proposed approach is very efficient in deriving NAND-based QCA designs.

A novel architecture for quantum-dot cellular automata multiplexer

Journal Paper
Arman Roohi, Hossein Khademolhosseini, Samira Sayedsalehi, Keivan Navi
International Journal of Computer Science Issues
Publication year: 2011

Abstract

Quantum-dot Cellular Automata (QCA) technology is attractive due to its low power consumption, fast speed and small dimensions; therefore, it is a promising alternative to CMOS technology. Additionally, the multiplexer is a useful part in many important circuits. In this paper, we propose a novel design of a 2:1 MUX in QCA. Moreover, a 4:1 multiplexer, an XOR gate and a latch are proposed based on our 2:1 multiplexer design. The simulation results have been verified using QCADesigner.
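The abstract's claim that a 4:1 multiplexer, an XOR gate and a latch can all be derived from a 2:1 multiplexer can be illustrated at the logic level. The sketch below uses textbook constructions and is not the paper's QCA layout; the function names and the select-bit ordering are assumptions for this example.

```python
def mux2(d0, d1, sel):
    """2:1 multiplexer: returns d0 when sel == 0 and d1 when sel == 1."""
    return d1 if sel else d0

def mux4(d0, d1, d2, d3, s1, s0):
    """4:1 multiplexer as a two-level tree of three 2:1 multiplexers."""
    return mux2(mux2(d0, d1, s0), mux2(d2, d3, s0), s1)

def xor2(a, b):
    """XOR from one 2:1 multiplexer: b selects between a and NOT a."""
    return mux2(a, 1 - a, b)

def d_latch(q, d, enable):
    """Level-sensitive latch step: follows d while enable is 1, otherwise holds q."""
    return mux2(q, d, enable)
```

In hardware, the latch corresponds to feeding the multiplexer output back to its own `d0` input; here the held state `q` is passed in explicitly so each step stays a pure function.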

A new redundant method on representing numbers with moduli set {3^n, 3^n−1, 3^n−2}

Conference Paper
Hossein Khademolhosseini, Arman Roohi
2011 International Conference on Computer, Communication and Electrical Technology (ICCCET), Tamilnadu, 2011, pp. 163-166.
Publication year: 2011

Abstract:

The residue number system (RNS) is a system for representing numbers. It uses the residues of numbers with respect to a moduli set. Because operations can be carried out in parallel on residues that are smaller than their binary equivalents, calculations can be performed at higher speed. Because of these features, RNS is used in many applications such as DSP devices and filters. Summation is the most widely used operation in this system, and conversions and other operations may be built on it. The method offered in this paper is a new definition for number representation using the moduli set {3^n−2, 3^n−1, 3^n}. We use redundancy to improve the representation of the residues. This method makes conversions, summation and consequently subtraction and multiplication faster, and makes their circuits much simpler.
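As a worked illustration of residue arithmetic with this moduli set, the sketch below converts integers to residues, adds them channel by channel, and reconstructs the result with the Chinese Remainder Theorem. It shows plain, non-redundant RNS only; the redundant representation proposed in the paper is not modeled, and all function names are assumptions. It requires Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod

def moduli(n):
    """Moduli set {3^n - 2, 3^n - 1, 3^n}; the three values are pairwise coprime."""
    return (3**n - 2, 3**n - 1, 3**n)

def to_rns(x, mods):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in mods)

def add_rns(r1, r2, mods):
    """Addition is carried out independently in each residue channel."""
    return tuple((a + b) % m for a, b, m in zip(r1, r2, mods))

def from_rns(res, mods):
    """Reverse conversion via the Chinese Remainder Theorem."""
    big_m = prod(mods)
    total = 0
    for r, m in zip(res, mods):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return total % big_m

mods = moduli(2)                      # (7, 8, 9), dynamic range 7 * 8 * 9 = 504
s = add_rns(to_rns(100, mods), to_rns(200, mods), mods)
print(s, from_rns(s, mods))           # (6, 4, 3) 300
```

The result is only unique while the true sum stays below the dynamic range (504 for n = 2); larger values wrap around modulo that range.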

A different design approach for high performance in nanostructure using Quantum Cellular Automata

Journal Paper
S Sayedsalehi, A Roohi, K Navi
Canadian J. on Electrical and Electronics Eng 2 (2011): 526-530.
Publication year: 2011

Abstract

Quantum-dot cellular automaton (QCA) is an emerging technology that can be considered as a possible alternative for semiconductor transistor technology. In this paper, an efficient approach to design circuits is explored in nanoscale. This field is an attractive and exciting blend of computer architecture. The majority gate and the inverter gate are used as the fundamental building blocks in the conventional method for QCA circuit implementation. In the proposed approach, an arbitrary circuit is implemented based on a suitable arrangement of QCA cells. The functionality of the proposed circuit is verified using kink energy computations. In addition, several QCA circuits are suggested for evaluating the performance of the proposed approach. Finally, these proposed QCA circuits are compared with other classical and conventional circuits in terms of area, cell count and latency.

A combinational logic optimization for majority gate-based nanoelectronic circuits based on GA

Conference Paper
A Roohi, M Kamrani, S Sayedsalehi, K Navi
2011 International Semiconductor Device Research Symposium (ISDRS), College Park, MD, 2011, pp. 1-2.
Publication year: 2011

Abstract

Quantum-dot cellular automata (QCA) is a new computing method in nanotechnology that offers low power consumption, small dimensions, and high switching speed. A QCA device stores logic based on the position of individual electrons. The fundamental logic elements in QCA are the majority (Fig. 1(a)) and inverter (Fig. 1(b)) gates, which operate based on the Coulomb repulsion between electrons [1].