Journal Paper

PISA: A Non-Volatile Processing-In-Sensor Accelerator for Imaging Systems

Journal Paper

Sh. Angizi, S. Tabrizchi, D. Pan, and A. Roohi

IEEE Transactions on Emerging Topics in Computing (TETC)

Publication year: 2023

This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor in-memory computing unit to process the remaining network layers. Once the object is detected, PISA switches to typical sensing mode to capture the image for a fine-grained convolution using only a near-sensor processing unit. Our circuit-to-application co-simulation results on a BWNN acceleration demonstrate minor accuracy degradation on various image datasets in coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 and efficiency of ∼ 1.74 TOp/s/W. Lastly, PISA substantially reduces data conversion and transmission energy by ∼ 84% compared to a baseline.
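To illustrate why binarized weights simplify a convolution, the following minimal Python sketch shows that with weights restricted to +1 and -1 every output value reduces to additions and subtractions of pixel values. It is a software analogy only, not the PISA compute-pixel circuit; the function name `binary_weight_conv2d` and the toy arrays are assumptions made for this example.

```python
def binary_weight_conv2d(image, weights):
    """Valid (no padding) 2-D correlation with weights limited to +1 / -1.

    Hypothetical helper for illustration; no multiplications are needed
    because each weight only selects between adding and subtracting a pixel.
    """
    kh, kw = len(weights), len(weights[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    pixel = image[r + i][c + j]
                    if weights[i][j] > 0:
                        acc += pixel   # weight +1: accumulate
                    else:
                        acc -= pixel   # weight -1: subtract
            out[r][c] = acc
    return out


# Toy 3x3 "image" and a 2x2 binarized kernel (assumed values).
image = [[1, 2, 0],
         [0, 1, 3],
         [2, 0, 1]]
weights = [[1, -1],
           [-1, 1]]
result = binary_weight_conv2d(image, weights)
```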

Design and Evaluation of a Near-Sensor Magneto-Electric FET-based Event Detector

Journal Paper

Mehrdad Morsali; Sepehr Tabrizchi; Andrew Marshall; Arman Roohi

IEEE Transactions on Electron Devices (TED)

Publication year: 2023

As a recently developed post-CMOS FET, magneto-electric FETs (MEFETs) offer high-speed and low-power design characteristics for logic and memory applications. In this article, a near-sensor processing (NSP) platform leveraging MEFETs is presented that enables event detection for edge vision sensors at a low cost by eliminating the need for power-hungry analog-to-digital circuits (ADCs). In addition, an efficient background comparison method with adjustable precision is presented that offers a tradeoff between output quality and efficiency, depending on the application’s needs. Our device-to-architecture evaluations show that the proposed hardware–software codesign reduces the energy consumption and execution time on average by a factor of ∼ 15 × and ∼ 2.4 ×, respectively, compared to the SOT-MRAM counterpart employing the same method.

AppCiP: Energy-Efficient Approximate Convolution-in-Pixel Scheme for Neural Network Acceleration

Journal Paper

S. Tabrizchi, A. Nezhadi, Sh. Angizi, and A. Roohi

IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)

Publication year: 2023

Always-on, intelligent, and self-powered visual perception systems have gained considerable attention and are widely used. However, capturing data and analyzing it via a backend/cloud processor is energy-intensive and incurs long latency, resulting in a memory bottleneck and low-speed feature extraction at the edge. This paper presents the AppCiP architecture as a sensing and computing integration design to efficiently enable Artificial Intelligence (AI) on resource-limited sensing devices. AppCiP provides a number of unique capabilities, including instant and reconfigurable RGB to grayscale conversion, highly parallel analog convolution-in-pixel, and realizing low-precision quinary weight neural networks. These features significantly mitigate the overhead of analog-to-digital converters and analog buffers, leading to a considerable reduction in power consumption and area overhead. Our circuit-to-application co-simulation results demonstrate that AppCiP achieves ~3 orders of magnitude higher efficiency on power consumption compared with the fastest existing designs considering different CNN workloads. It reaches a frame rate of 3000 and an efficiency of ~4.12 TOp/s/W. The accuracy of the AppCiP architecture on different datasets such as SVHN, Pest, CIFAR-10, MHIST, and CBL Face detection is evaluated and compared with the state-of-the-art design. The obtained results are the best among processing in/near pixel architectures, while AppCiP degrades the accuracy by less than 1% on average compared to the floating-point baseline.

A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks

Journal Paper

Sh. Angizi, M. Morsali, S. Tabrizchi, and A. Roohi

IEEE Transactions on Emerging Topics in Computing (TETC)

Publication year: 2023

In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on MNIST and SVHN datasets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves 1.25 GHz and an energy-efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2× and execution time by a factor of 4× compared to the best recent LBP-based networks.
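For readers unfamiliar with local binary patterns, the sketch below computes the classic 8-neighbor LBP code of one pixel using only comparisons, which is what makes the operator MAC-free. It shows the textbook operator rather than the Ap-LBP network or the NS-LBP hardware; the neighbor ordering, the `>=` comparison, and the name `lbp_code` are assumptions chosen for this example.

```python
# Neighbor offsets, clockwise from the top-left pixel (ordering is an assumption).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c).

    Each neighbor contributes one bit: 1 if it is at least as bright as
    the center pixel, else 0. Only comparisons are used, no multiplies.
    """
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


# Toy 3x3 patch (assumed values); the center pixel is 6.
patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
code = lbp_code(patch, 1, 1)  # neighbors 9, 7, and 8 set bits 1, 3, and 5
```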

Ocelli: Efficient Processing-in-Pixel Array Enabling Edge Inference of Ternary Neural Networks

Journal Paper

S. Tabrizchi, Sh. Angizi, and A. Roohi

Journal of Low Power Electronics and Applications

Publication year: 2022

Convolutional Neural Networks (CNNs), due to their recent successes, have gained considerable attention in various vision-based applications. They have proven to produce excellent results, especially on big data, which imposes high processing demands. However, these processing demands have limited the use of CNNs in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli, which includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features: (I) Because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) regarding the light intensity as pixels’ inputs; (II) Ocelli realizes analog convolutions enabling low-precision ternary weight neural networks. Since the first layer’s convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters. Moreover, our design supports a zero-skipping scheme to further reduce power; (III) Ocelli exploits non-volatile magnetic RAMs to store the CNN’s weights, which remarkably reduces the static power consumption; and finally, (IV) Ocelli has two modes, sensing and processing. Once the object is detected, the architecture switches to the typical sensing mode to capture the image. Compared with conventional pixels running existing edge detection algorithms, Ocelli improves power efficiency on lane detection by 10% on average. Moreover, considering different CNN workloads, our design shows more than 23% higher power efficiency than conventional designs, while achieving better accuracy.

MR-PIPA: An Integrated Multi-level RRAM (HfOx) based Processing-In-Pixel Accelerator

Journal Paper

M. Abedin, A. Roohi, M. Liehr, N. Cady, and Sh. Angizi

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

Publication year: 2022

This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks (NNs) leveraging a novel compute-pixel with nonvolatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor maintaining accuracy compared with the recent in-sensor computing designs. Our proposed design, namely an integrated multilevel RRAM (HfOx)-based processing-in-pixel accelerator (MR-PIPA), achieves a frame rate of 1000 and efficiency of ~1.89 TOp/s/W, while it substantially reduces data conversion and transmission energy by ~84% compared to a baseline at the cost of minor accuracy degradation.

LT-PIM: An LUT-based Processing-in-DRAM Architecture with RowHammer Self-Tracking

Journal Paper

R. Zhou, S. Tabrizchi, A. Roohi, and Sh. Angizi

IEEE Computer Architecture Letters (CAL)

Publication year: 2022

Herein, we propose LT-PIM as a Lookup Table-based Processing-In-Memory architecture leveraging the high density of DRAM to enable massively parallel and flexible computation. LT-PIM supports lookup table queries to execute complex arithmetic operations, such as multiplication, using only memory read operations. In addition, LT-PIM enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands in the same bit-line. With this, LT-PIM enables a complete and inexpensive in-DRAM RowHammer (RH) self-tracking approach. Our results demonstrate that LT-PIM achieves ∼ 70% higher energy efficiency than the fastest charge-sharing-based designs and ∼ 32% over the best LUT-based designs. As for the RH self-tracking, with a worst-case slowdown of ∼ 0.2%, LT-PIM achieves up to ∼ 80% energy-saving over the best designs.
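The idea of replacing an arithmetic operation with a memory read can be sketched in a few lines of Python. The example below precomputes every product of two 4-bit operands and then "multiplies" by forming an address and reading the table. The 4-bit operand width, the flat table layout, and the names `LUT` and `lut_multiply` are assumptions for illustration and do not describe how LT-PIM organizes its DRAM sub-arrays.

```python
BITS = 4  # assumed operand width for this toy example

# Precomputed table: entry at address (a << BITS) | b holds a * b.
LUT = [a * b for a in range(1 << BITS) for b in range(1 << BITS)]


def lut_multiply(a, b):
    """Multiply two BITS-wide unsigned integers with a single table read."""
    limit = 1 << BITS
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in %d bits" % BITS)
    address = (a << BITS) | b  # concatenate the operands to form the address
    return LUT[address]        # the 'computation' is just a read


product = lut_multiply(7, 9)
```

The table grows as 2^(2·BITS) entries, which is why this approach suits short operands.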

HARDeNN: Hardware-assisted Attack-resilient Deep Neural Network Architectures

Journal Paper

N. Khoshavi, M. Maghsoudloo, A. Roohi, S. Sargolzaei, and B. Yu

Microprocessors and Microsystems

Publication year: 2022

We propose HARDeNN, a low-overhead end-to-end inference accelerator methodology to armor the underlying pre-trained neural network architecture against black-box non-input adversarial attacks. In order to find the most vulnerable neural network architecture parameters, a hardware-assisted fault injection tool and a statistical stress model are proposed that combine uniform fault assessment across layers with targeted in-layer fault assessment to realize a holistic, rigorous fault evaluation in NN topologies susceptible to non-input adversarial black-box attacks. The key observation from the assessment is that the weights and activation functions are the most vulnerable neural network parameters, being susceptible to both single-bit and multiple-bit flip attacks. Concerning the aforementioned parameters, a multi-objective design space exploration is conducted to find a superior design under different resource constraints. The error-resiliency magnitude offered by HARDeNN can be adjusted based on the given boundaries. The experimental results show that the HARDeNN methodology enhances the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 multi-bit upsets that target weight and activation layers, respectively.

Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks

Journal Paper

M. Alali, A. Roohi, Sh. Angizi, and J. S. Deogun

Micromachines

Publication year: 2022

Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, but these images are very complex and their analysis by pathologists requires large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize the inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., <8:8>. For energy estimation on the used CNN model, we found that the energy used in RGB color mode with 32-bit is considerably higher than the other lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in the Internet of Things (IoT) systems that facilitate healthcare systems.

PARC: A Novel Design Methodology for Power Analysis Resilient Circuits using Spintronics

Journal Paper

Arman Roohi, , and Ronald F. DeMara.

IEEE Transactions on Nanotechnology, vol. 18, pp. 885-889, 2019

Publication year: 2019

Abstract

A prevalent class of side-channel attacks relies on Differential Power Analysis (DPA) methods, which monitor power traces during cryptographic processing to discover secret keys. Reconfigurability via Polymorphic Gate Modules (PGMs) offers an approach to obscure DPA information by dynamically rearranging the operation of constituent sub-circuits, albeit at the cost of increased area and power consumption. Thus, we develop the Power Analysis-Resilient Circuit (PARC) design methodology to instantiate the use of spin-based devices as an extension to conventional Register Transfer Language (RTL) specifications. PARC replaces a specific portion of the circuit to maximize a new Effectiveness of Design (EoD) metric, which quantifies DPA impact versus its performance overhead. To validate functionality, PARC is applied to various benchmark circuits including ISCAS-89, MCNC, and ITC-99. EoD results indicate that PARC significantly increases the number of power traces that an adversary would need to use in order to extract power information but incurs a low cost to the functional circuit itself.

ApGAN: Approximate GAN for Robust Low-Energy Learning from Imprecise Components

Journal Paper

Arman Roohi, Shadi Sheikhfaal, Shaahin Angizi, Deliang Fan, and Ronald F. DeMara

IEEE Transactions on Computers, vol. 69, no. 3, pp. 349-360, 1 March 2020.

Publication year: 2019

Abstract

A Generative Adversarial Network (GAN) is an adversarial learning approach which empowers conventional deep learning methods by alleviating the demands of massive labeled datasets. However, GAN training can be computationally-intensive limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithm and hardware implementation perspectives. First, inspired by the binary pattern feature extraction method along with binarized representation entropy, the existing Deep Convolutional GAN (DCGAN) algorithm is modified by binarizing the weights for a specific portion of layers within both the generator and discriminator models. Further reduction in storage and computation resources is achieved by leveraging a novel hardware-configurable in-memory addition scheme, which can operate in the accurate and approximate modes. Finally, a memristor-based processing-in-memory accelerator for ApGAN is developed. The performance of the ApGAN accelerator on different data-sets such as Fashion-MNIST, CIFAR-10, STL-10, and celeb-A is evaluated and compared with recent GAN accelerator designs. With almost the same Inception Score (IS) to the baseline GAN, the ApGAN accelerator can increase the energy-efficiency by ~28.6× achieving 35-fold speedup compared with a baseline GPU platform. Additionally, it shows 2.5× and 5.8× higher energy-efficiency and speedup over CMOS-ASIC accelerator subject to an 11 percent reduction in IS.

Keywords

Generative adversarial network
in-memory processing platform
neural network acceleration
hardware mapping

NV-Clustering: Normally-Off Computing Using Non-Volatile Datapaths

Journal Paper

Arman Roohi, Ronald F DeMara

IEEE Transactions on Computers, vol. 67, no. 7, pp. 949-959, 1 July 2018

Publication year: 2018

Abstract

With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a Non-Volatile (NV) flip-flop design approach, referred to as NV-Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are utilized to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality, based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks including ISCAS-89, MCNC, and ITC-99 computational circuits using an LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and power dissipation, area, and delay benefits. For instance, results for ISCAS-89 benchmarks indicate 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay as compared to alternative NV-FF based designs, as evaluated via SPICE simulation at the 45-nm technology node.

Voltage-based concatenatable full adder using spin hall effect switching

Journal Paper

Arman Roohi, Ramtin Zand, Deliang Fan, Ronald F DeMara

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 12, pp. 2134-2138, Dec. 2017.

Publication year: 2017

Abstract

Magnetic tunnel junction (MTJ)-based devices have been studied extensively as a promising candidate to implement hybrid energy-efficient computing circuits due to their nonvolatility, high integration density, and CMOS compatibility. In this paper, MTJs are leveraged to develop a novel full adder (FA) based on 3- and 5-input majority gates. The Spin Hall effect (SHE) is utilized for changing the MTJ states, resulting in low-energy switching behavior. SHE-MTJ devices are modeled in Verilog-A using precise physical equations. A SPICE circuit simulator is used to validate the functionality of the 1-bit SHE-based FA. The simulation results show 76% and 32% improvement over previous voltage-mode MTJ-based FAs in terms of energy consumption and device count, respectively. The concatenatability of our proposed 1-bit SHE-FA is investigated by developing a 4-bit SHE-FA. Finally, the delay and power consumption of an n-bit SHE-based adder have been formulated to provide a basis for developing an energy-efficient SHE-based n-bit arithmetic logic unit.
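A full adder can be built from one 3-input and one 5-input majority gate using the well-known decomposition Cout = MAJ3(A, B, Cin) and Sum = MAJ5(A, B, Cin, NOT Cout, NOT Cout). The Python sketch below checks this logic and chains four 1-bit stages into a ripple-carry adder. It models Boolean behavior only; whether the SHE-MTJ circuit wires its gates in exactly this way is an assumption, and the function names are hypothetical.

```python
def majority(*bits):
    """Return 1 if more than half of the input bits are 1."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def full_adder(a, b, cin):
    """1-bit full adder from a 3-input and a 5-input majority gate."""
    cout = majority(a, b, cin)                     # carry: MAJ3
    not_cout = 1 - cout
    s = majority(a, b, cin, not_cout, not_cout)    # sum: MAJ5 with inverted carry twice
    return s, cout


def ripple_add(x, y, width=4):
    """Concatenate `width` 1-bit full adders; returns (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


total, carry_out = ripple_add(5, 6)  # 4-bit example
```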

Towards ultra-efficient QCA reversible circuits

Journal Paper

Amir Mokhtar Chabi, Arman Roohi, Hossein Khademolhosseini, Shadi Sheikhfaal, Shaahin Angizi, Keivan Navi, Ronald F DeMara

Publication year: 2017

Abstract

Nanotechnologies, notably Quantum-dot Cellular Automata (QCA), offer an attractive perspective for future computing technologies. In this paper, QCA is investigated as an implementation method for reversible logic. A novel XOR gate and a new approach to implementing a 2:1 multiplexer are presented. Moreover, an efficient and potent universal reversible gate based on the proposed XOR gate is designed. The proposed reversible gate has a superb performance in implementing the QCA standard benchmark combinational functions in terms of area, complexity, power consumption, and cost function in comparison to the other reversible gates. The gate achieves the lowest overall cost among the most cost-efficient designs presented so far, with a reduction of 24%. In order to employ the merits of reversibility, the proposed reversible gate is leveraged to design the four common latches (D latch, T latch, JK latch, and SR latch). Specialized structures of the proposed circuits could be used as building blocks in designing sequential and combinational circuits in QCA architectures.

Heterogeneous energy-sparing reconfigurable logic: spin-based storage and CNFET-based multiplexing

Journal Paper

Mohan Krishna Gopi Krishna, Arman Roohi, Ramtin Zand, Ronald F DeMara

IET Circuits, Devices & Systems, vol. 11, no. 3, pp. 274-279, May 2017.

Publication year: 2017

Abstract

Field programmable gate array (FPGA) attributes of logic configurability, bitstream storage, and dynamic signal routing can be realised by leveraging the complementary benefits of emerging devices with complementary metal oxide semiconductor (CMOS)-based devices. A novel carbon/magnet lookup table (CM-LUT) is developed and evaluated by trading off a range of mixed heterogeneous technologies to balance energy, delay, and reliability attributes. Herein, magnetic spintronic devices are employed in the configuration memory to contribute non-volatility and high scalability. Meanwhile, carbon nanotube field-effect transistors (CNFETs) provide desirable conductivity, low delay, and low power consumption. The proposed CM-LUT offers ultra-low power and high-speed operation while maintaining high endurance re-programmability with increased radiation-induced soft-error immunity. The proposed four-input one-output CM-LUT utilises 41 CNFETs and 20 magnetic tunnel junctions for read operations and 35 CNFETs to perform write operations. Results indicate that CM-LUT achieves an average four-fold energy reduction, eight-fold faster circuit operation and 9.3% reconfiguration power delay product improvement in comparison with spin-based look-up tables. Finally, additional hybrid technology designs are considered to balance performance with the demands of energy consumption for near-threshold operation.

Energy-efficient and process-variation-resilient write circuit schemes for spin hall effect mram device

Journal Paper

Ramtin Zand, Arman Roohi, Ronald F DeMara

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 9, pp. 2394-2401, Sept. 2017

Publication year: 2017

Abstract

In this paper, various energy-efficient write schemes are proposed for switching operation of spin hall effect (SHE)-based magnetic tunnel junctions (MTJs). A transmission gate (TG)-based write scheme is proposed, which provides a symmetric and energy-efficient switching behavior. We have modeled an SHE-MTJ using precise physics equations, and then leveraged the model in SPICE circuit simulator to verify the functionality of our designs. Simulation results show the TG-based write scheme advantages in terms of device count and switching energy. In particular, it can operate at 12% higher clock frequency while realizing at least 13% reduction in energy consumption compared to the most energy-efficient write circuits. We have analyzed the performance of the implemented write circuits in presence of process variation (PV) in the transistors’ threshold voltage and SHE-MTJ dimensions. Results show that the proposed TG-based design is the second most PV-resilient write circuit scheme for SHE-MTJs among the implemented designs. Finally, we have proposed the 1TG-1T-1R SHE-based magnetic random access memory (MRAM) bit cell based on the TG-based write circuit. Comparisons with several of the most energy-efficient and variation-resilient SHE-MRAM cells indicate that 1TG-1T-1R delivers reduced energy consumption with 43.9% and 10.7% energy-delay product improvement, while incurring low area overhead.

Scalable adaptive spintronic reconfigurable logic using area-matched MTJ design

Journal Paper

Ramtin Zand, Arman Roohi, Soheil Salehi, Ronald F DeMara

IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 7, pp. 678-682, July 2016.

Publication year: 2016

Abstract:

Spin-transfer torque (STT) random access memory has been researched as a promising alternative for static random access memory in reconfigurable fabrics, particularly in lookup tables (LUTs), due to its nonvolatility, low standby and static power, and high integration density features. In this brief, we leverage physical characteristics of magnetic tunnel junctions (MTJs) to design a unique reference MTJ which has a calibrated resistance matching the STT-based LUT (STT-LUT) circuit requirements to provide optimal reading operation. Results obtained show 42% and 70% power-delay product (PDP) improvement over previous MTJ-based LUT designs. Moreover, a four-input adaptive STT-based LUT (A-LUT) is proposed based on the developed STT-LUT, which is configurable to function in seven independent modes. An n-input A-LUT exhibits PDP which can be a fraction of n-input STT-LUT PDP, when performing two-input to (n-1)-input Boolean logic functions.

Loss-aware switch design and non-blocking detection algorithm for intra-chip scale photonic interconnection networks

Journal Paper

Hesam Shabani, Arman Roohi, Akram Reza, Midia Reshadi, Nader Bagherzadeh, Ronald F DeMara

IEEE Transactions on Computers, vol. 65, no. 6, pp. 1789-1801, 1 June 2016.

Publication year: 2016

Abstract:

As the number of on-chip processor cores increases, power-efficient solutions are sought for data communication between cores. The Helix-h non-blocking photonic switch is developed to improve physical-layer and network performance parameters for a wide range of silicon nano-photonic multicore interconnection topologies. Traffic benchmarks and practical case studies using a cycle-accurate simulation environment indicate significantly reduced insertion loss providing improved bandwidth density and scalability to manycore plurality. Improvements in system performance parameters are quantified for network bandwidth, transmission efficiency, and latency in popular photonic interconnection topologies, in comparison to previous switch designs. For instance, utilizing the Helix-h switch in a mesh topology, the bandwidth is increased by 112 percent compared to the previously highest performing switch design. Execution time and energy efficiency are improved by up to 92 and 99 percent, respectively, for representative multicore applications. Finally, the technique is generalized to a novel graph-theoretic method for articulating blocking conditions in photonic switches.

Energy-efficient nonvolatile reconfigurable logic using spin hall effect-based lookup tables

Journal Paper

Ramtin Zand, Arman Roohi, Deliang Fan, Ronald F DeMara

IEEE Transactions on Nanotechnology, vol. 16, no. 1, pp. 32-43, Jan. 2017.

Publication year: 2016

Abstract:

In this paper, we leverage magnetic tunnel junction (MTJ) devices to design an energy-efficient nonvolatile lookup table (LUT), which utilizes a spin Hall effect (SHE) assisted switching approach for MTJ storage cells. SHE-MTJ characteristics are modeled in Verilog-A based on precise physical equations. Functionality of the proposed SHE-MTJ-based LUT is validated using SPICE simulation. Our proposed SHE-MTJ-based LUT (SHE-LUT) is compared with the most energy-efficient MTJ-based LUT circuits. The obtained results show more than 6%, 37%, and 67% improvement over three previous MTJ-based designs in terms of read energy consumption. Moreover, the reconfiguration delay and energy of the proposed design are compared with those of the MTJ-based LUTs which utilize the spin transfer torque (STT) switching approach for reconfiguration. The results show that SHE-LUT can operate at 78% higher clock frequency while achieving at least 21% improvement in terms of reconfiguration energy consumption. The operation-specific clocking mechanisms for managing the SHE-LUT operations are introduced along with detailed analyses concerning tradeoffs. Results are extended to design a 6-input fracturable LUT using SHE-MTJs.

A tunable majority gate-based full adder using current-induced domain wall nanomagnets

Journal Paper

Arman Roohi, Ramtin Zand, Ronald F DeMara

IEEE Transactions on Magnetics, vol. 52, no. 8, pp. 1-7, Aug. 2016, Art no. 3401507.

Publication year: 2016

Abstract:

Domain wall nanomagnet (DWNM)-based devices have been extensively studied as a promising alternative to the conventional CMOS technology in both the memory and logic implementations due to their non-volatility, near-zero standby power, and high integration density characteristics. In this paper, we leverage a physics-based model of a DWNM device to design a highly scalable current-mode majority gate to achieve a novel one bit full-adder (FA) circuit. The modeled DWNM specifications are calibrated with the experimentally measured data. The functionality of the proposed DWNM-based FA (DWNM-FA) is verified using a SPICE circuit simulator. The detailed analysis and the calculations have been performed to realize the proposed DWNM-FA delay and power consumption corresponding to the various induced input currents at different operating temperatures. The power-delay product of DWNM-FA is examined to tune the operation within the optimum induced input current region to obtain desired power-delay requirements over a range of 200 μA to 1 mA at temperatures from 298 to 378 K. Finally, the comparison results exhibit 52% and 49% area improvement as well as 41% and 31% improvement in device count complexity over CMOS-based and magnetic tunnel junction-based FA designs, respectively.

A parity-preserving reversible QCA gate with self-checking cascadable resiliency

Journal Paper

Arman Roohi, Ramtin Zand, Shaahin Angizi, Ronald F DeMara

IEEE Transactions on Emerging Topics in Computing, vol. 6, no. 4, pp. 450-459, 1 Oct.-Dec. 2018.

Publication year: 2016

Abstract:

A novel Parity-Preserving Reversible Gate (PPRG) is developed using Quantum-dot Cellular Automata (QCA) technology. PPRG enables rich fault-tolerance features, as well as reversibility attributes sought for energy-neutral computation. Performance of the PPRG design is validated through implementing thirteen standard combinational Boolean functions of three variables, which demonstrate from 10.7 to 41.9 percent improvement over the previous gate counts obtained with other reversible and/or preserving gate designs. Switching and leakage energy dissipation as low as 0.141 eV and 0.294 eV, respectively, are achieved using PPRG for the 1.5 Ek energy level. The utility of PPRG is leveraged to design a one-bit full adder with 171 cells occupying only 0.19 mm² area. Finally, fault detection and isolation properties are formalized into a concise procedure. PPRG-based circuits capable of self-configuring active recovery for selected three-variable standard functions are realized using a memoryless method irrespective of garbage outputs.
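The two properties named in the title can be checked exhaustively for any small gate. The Python sketch below does so for two well-known three-input gates: the Fredkin gate, which is both reversible and parity-preserving, and the Toffoli gate, which is reversible but not parity-preserving. These gates stand in for illustration only; the PPRG truth table is not given in the abstract and is not reproduced here, and the helper names are assumptions.

```python
from itertools import product


def fredkin(a, b, c):
    """Controlled swap: when a == 1 the outputs b and c are exchanged."""
    return (a, c, b) if a else (a, b, c)


def toffoli(a, b, c):
    """Controlled-controlled NOT: c is flipped when a == b == 1."""
    return (a, b, c ^ (a & b))


def is_reversible(gate, n):
    """A gate is reversible if its truth table is a bijection on n bits."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n


def is_parity_preserving(gate, n):
    """Input parity must equal output parity for every input vector."""
    return all(sum(bits) % 2 == sum(gate(*bits)) % 2
               for bits in product((0, 1), repeat=n))


fredkin_ok = is_reversible(fredkin, 3) and is_parity_preserving(fredkin, 3)
toffoli_parity = is_parity_preserving(toffoli, 3)
```

Parity preservation is what enables self-checking: a single-bit fault changes the output parity, so a mismatch with the input parity flags the error.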

Wire crossing constrained QCA circuit design using bilayer logic decomposition

Journal Paper

A Roohi, H Thapliyal, RF DeMara

Electronics Letters, vol. 51, no. 21, pp. 1677-1679, 8 Oct. 2015.

Publication year: 2015

Abstract:

Quantum-dot cellular automata (QCA) seek potential benefits over CMOS devices such as low-power consumption, small dimensions, and high-speed operation. Two prominent QCA concerns of wire crossing complexity and circuit robustness are addressed by developing a three-step bilayer logic decomposition (BLD) methodology to design QCA-based logic circuits. The partitioning of QCA computing operations into logic layers realises considerable improvements in complexity, area, and modularity metrics. Moreover, since larger circuits are divided into two increasingly disjoint sub-planes, verification of the functionality of the design becomes compartmentalised. Design capability of the proposed approach is illustrated and analysed by implementing an area-efficient full comparator (FC) based on a novel logic realisation. The resulting 1-bit FC achieves 32% improvement in complexity metrics in comparison with the previous optimal QCA-based FC. The related waveforms used in verification of the BLD-generated FC which are obtained by the QCADesigner simulation tool are discussed as a motivating example of the BLD methodology.

Design and verification of new n-bit quantum-dot synchronous counters using majority function-based JK flip-flops

Journal Paper

Shaahin Angizi, Samira Sayedsalehi, Arman Roohi, Nader Bagherzadeh, Keivan Navi

Journal of Circuits, Systems and Computers, Vol. 24, No. 10, 1550153 (2015)

Publication year: 2015

Abstract

Quantum-dot Cellular Automata (QCA) is an attractive nanoelectronics paradigm that is widely advocated as a possible replacement for conventional CMOS technology. Designing memory cells is an active field of research in the QCA domain. In this paper, we propose novel nanotechnology-compatible designs based on majority gate structures. In the first step, two well-organized JK flip-flop designs are implemented in QCA; in the second step, synchronous counters of different sizes are presented as an application. The QCADesigner tool is employed to evaluate the functional correctness of the proposed designs and to compare them with the state of the art.
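As a rough behavioral illustration of the idea in this abstract, the sketch below builds a JK flip-flop next-state function from 3-input majority gates and chains it into a synchronous up counter. It is a logic-level model only, not the paper's QCA layout; the function names (`maj`, `jk_next`, `counter_step`) and the choice of an up counter with J = K tied to the AND of all lower bits are assumptions made for the example.

```python
def maj(a, b, c):
    """3-input majority gate, the basic QCA logic primitive."""
    return int(a + b + c >= 2)

def and2(a, b):
    return maj(a, b, 0)  # majority with one input fixed to 0 acts as AND

def or2(a, b):
    return maj(a, b, 1)  # majority with one input fixed to 1 acts as OR

def jk_next(j, k, q):
    """JK characteristic equation: Q+ = J.(not Q) + (not K).Q"""
    return or2(and2(j, 1 - q), and2(1 - k, q))

def counter_step(state):
    """One clock tick of a synchronous up counter; state is LSB first."""
    nxt, enable = [], 1
    for q in state:
        nxt.append(jk_next(enable, enable, q))  # J = K = enable toggles the bit
        enable = and2(enable, q)                # next bit toggles only if all lower bits are 1
    return nxt

state = [0, 0, 0]
for _ in range(5):
    state = counter_step(state)
print(state)  # LSB-first encoding of the count
```

Because every flip-flop reads the current state and all bits update together, the model reflects synchronous rather than ripple behavior.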

Design and evaluation of an ultra-area-efficient fault-tolerant QCA full adder

Journal Paper

Arman Roohi, Ronald F DeMara, Navid Khoshavi

Microelectronics Journal 46, no. 6 (2015): 531-542.

Publication year: 2015

Abstract

Quantum-dot cellular automata (QCA) has been studied extensively as a promising switching technology at the nanoscale. Despite several potential advantages of QCA-based designs over conventional CMOS logic, deposition defects are likely to occur in QCA-based systems, which necessitates fault-tolerant structures. Since binary adders are among the most frequently used components in digital systems, this work targets the design of a highly optimized, robust full adder in a QCA framework. Results demonstrate the superiority of the proposed full adder in terms of latency, complexity, and area with respect to previous full adder designs. Further, the functionality and the defect tolerance of the proposed full adder in the presence of QCA deposition faults are studied. The functionality and correctness of our design are confirmed using high-level synthesis, followed by delineating its normal and faulty behavior using a Probabilistic Transfer Matrix (PTM) method. The related waveforms, generated with the QCADesigner simulation tool, are discussed to verify the robustness of the proposed designs.

Quantum-dot cellular automata: computing in nanoscale

Journal Paper

Arman Roohi, Hossein Khademolhosseini

Reviews in Theoretical Science, Volume 2, Number 1, March 2014, pp. 46-76(31)

Publication year: 2014

Abstract

Traditional CMOS technology is approaching its end of life, so novel nanoscale technologies are being explored as alternatives. Quantum-dot cellular automata (QCA) is a new computing method in nanotechnology with attractive features such as low power, small dimensions, and high switching speed. In this paper, a comprehensive study of QCA is provided in which we discuss the preliminaries and describe the different aspects of QCA. The state of the art in this field is also presented.

A symmetric quantum-dot cellular automata design for 5-input majority gate

Journal Paper

Arman Roohi, Hossein Khademolhosseini, Samira Sayedsalehi, Keivan Navi

J Comput Electron 13, 701–708 (2014).

Publication year: 2014

Abstract

With the inevitable scaling of MOS transistor feature sizes deeper into the nanometer range, CMOS technology has encountered many critical challenges, such as very high leakage currents, reduced gate control, high power density, increased circuit noise sensitivity, and very high lithography costs. Quantum-dot cellular automata (QCA), owing to its high device density, extremely low power consumption, and very high switching speed, could be a feasible competitive alternative. In this paper, a novel 5-input majority gate, an important fundamental building block in QCA circuits, is designed in a symmetric form. In addition to the majority gate, an SR latch, an SR gate, and an efficient one-bit QCA full adder are implemented employing the new 5-input majority gate. The QCADesigner tool is used to verify the functionality of the proposed designs. The results demonstrate that the proposed SR latch and full adder perform equally well or, in many cases, better than previous circuits.
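To make the role of the 5-input majority gate concrete, the following sketch uses a commonly cited majority-logic construction of a one-bit full adder: the carry is a 3-input majority, and the sum is a 5-input majority of the three operands plus two copies of the inverted carry. This is a logic-level model under that assumption and is not claimed to match the cell layout of the paper; `maj` and `full_adder` are names chosen for the example.

```python
from itertools import product

def maj(*bits):
    """Majority vote over an odd number of 0/1 inputs."""
    return int(sum(bits) > len(bits) // 2)

def full_adder(a, b, cin):
    carry = maj(a, b, cin)                        # 3-input majority gives the carry
    total = maj(a, b, cin, 1 - carry, 1 - carry)  # 5-input majority gives the sum
    return total, carry

# Exhaustively check the construction against integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert 2 * c + s == a + b + cin
```

The exhaustive loop over all eight input combinations is cheap and serves as the functional check that a simulator would otherwise provide at the layout level.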

Parallel-XY: a novel loss-aware non-blocking photonic router for silicon nano-photonic networks-on-chip

Journal Paper

Hesam Shabani, Arman Roohi, Akram Reza, Hossein Khademolhosseini, Midia Reshadi

Journal of Computational and Theoretical Nanoscience, Volume 10, Number 6, June 2013, pp. 1510-1514(5)

Publication year: 2013

Abstract

Photonic technology is now recognized as a promising platform among the existing solutions to the challenges facing on-chip interconnection networks. Recent progress in silicon nanophotonic technologies has provided an adequate infrastructure to construct photonic communication links with higher bandwidth and power efficiency than traditional electrical communication. The router is a core component of photonic networks-on-chip. This paper proposes a 5 × 5 non-blocking photonic router designed for the XY routing algorithm. The simulation results show that the proposed router achieves significant improvements in insertion loss owing to a reduction in the number of waveguide crossings. Combined with wavelength-division multiplexing, these improvements lead to higher bandwidth density.
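For readers unfamiliar with XY routing, the sketch below shows the dimension-ordered decision a router makes on a 2D mesh: travel along X until the column matches, then along Y. The five possible outcomes (East, West, North, South, Local) correspond to the five ports of a 5 × 5 router. The coordinate convention (x grows eastward, y grows northward) and the function names are assumptions for illustration; the sketch models only the routing decision, not the photonic switching fabric.

```python
def xy_next_port(cur, dst):
    """Output port chosen at node `cur` for a packet headed to `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:                       # resolve the X dimension first
        return "E" if dx > cx else "W"
    if dy != cy:                       # then resolve the Y dimension
        return "N" if dy > cy else "S"
    return "L"                         # arrived: deliver to the local port

def xy_path(src, dst):
    """List of mesh nodes visited from `src` to `dst` under XY routing."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    path, cur = [src], src
    while (port := xy_next_port(cur, dst)) != "L":
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])
        path.append(cur)
    return path

print(xy_path((0, 0), (2, 1)))
```

Since a packet never turns from the Y dimension back to X, some input-to-output port pairs are never requested, which is the kind of property a router tailored to XY routing can exploit.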

Designing reconfigurable quantum-dot cellular automata logic circuits

Journal Paper

Keivan Navi, Arman Roohi, Samira Sayedsalehi

Journal of Computational and Theoretical Nanoscience, Volume 10, Number 5, May 2013, pp. 1137-1146(10)

Publication year: 2013

Abstract

Quantum-dot cellular automata (QCA) is an emerging nanoscale technology and a possible alternative to conventional CMOS technology. QCA has attractive features such as high speed, low power consumption, and small area. In this paper, a novel configurable QCA circuit is presented. This circuit can be configured to perform various logic functions such as 2-input or 3-input AND, 2-input or 3-input OR, and other possible variations. Using such circuits reduces the hardware requirements of a QCA design while still providing a range of functions. We then propose an efficient QCA design of a one-bit full adder based on our circuit. A comparison with previous designs shows that the proposed adder performs better in terms of latency, complexity, and size. The functionality of the proposed circuit is verified by computer simulations using the QCADesigner tool.

Design and evaluation of a reconfigurable fault tolerant quantum-dot cellular automata gate

Journal Paper

Arman Roohi, Samira Sayedsalehi, Hossein Khademolhosseini, Keivan Navi

Journal of Computational and Theoretical Nanoscience, Volume 10, Number 2, February 2013, pp. 380-388(9)

Publication year: 2013

Abstract

Quantum-dot cellular automata (QCA), which encodes binary information by means of the charge configuration of quantum-dot cells rather than current, represents a new computing platform at the nanotechnology level. It offers significant improvements over CMOS due to its low power consumption, high speed, and small dimensions. However, a large number of manufacturing defects are likely to occur, requiring new fault-tolerant architectures. The defects might occur in the manufacturing or synthesis phases of the design process. In this paper, we present and analyze a novel reconfigurable fault-tolerant gate. In addition to its reconfigurability, which allows it to cover some commonly used functions, the gate is designed to tolerate synthesis-phase defects. QCADesigner is used to simulate the functionality of the proposed gate, and the related waveforms are presented.

A novel architecture for quantum-dot cellular automata multiplexer

Journal Paper

Arman Roohi, Hossein Khademolhosseini, Samira Sayedsalehi, Keivan Navi

International Journal of Computer Science Issues

Publication year: 2011

Abstract

Quantum-dot Cellular Automata (QCA) technology is attractive due to its low power consumption, high speed, and small dimensions; therefore, it is a promising alternative to CMOS technology. Additionally, the multiplexer is a useful component in many important circuits. In this paper, we propose a novel design of a 2:1 MUX in QCA. Moreover, a 4:1 multiplexer, an XOR gate, and a latch are proposed based on our 2:1 multiplexer design. The simulation results have been verified using QCADesigner.
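The way a 2:1 multiplexer can serve as the building block for the other circuits mentioned here can be sketched at the logic level as follows. These are standard textbook compositions (a tree of three 2:1 multiplexers for the 4:1 case, a multiplexer choosing between an input and its complement for XOR, and a multiplexer with feedback for a level-sensitive latch); whether the paper wires its QCA cells the same way is an assumption, and all function names are illustrative.

```python
def mux2(d0, d1, sel):
    """2:1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0

def mux4(d0, d1, d2, d3, s1, s0):
    """4:1 multiplexer built as a two-level tree of 2:1 multiplexers."""
    return mux2(mux2(d0, d1, s0), mux2(d2, d3, s0), s1)

def xor(a, b):
    """XOR: b selects between a and its complement."""
    return mux2(a, 1 - a, b)

def d_latch(q, d, enable):
    """Level-sensitive latch: hold q when enable is 0, pass d when enable is 1."""
    return mux2(q, d, enable)

# Select line (s1, s0) = (0, 1) picks input d1.
print(mux4(0, 1, 0, 0, 0, 1))
```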

A different design approach for high performance in nanostructure using Quantum Cellular Automata

Journal Paper

S Sayedsalehi, A Roohi, K Navi

Canadian J. on Electrical and Electronics Eng 2 (2011): 526-530.

Publication year: 2011

Abstract

Quantum-dot cellular automata (QCA) is an emerging technology that can be considered a possible alternative to semiconductor transistor technology. In this paper, an efficient approach to designing circuits at the nanoscale is explored. The conventional method for QCA circuit implementation uses the majority gate and the inverter as fundamental building blocks. In the proposed approach, an arbitrary circuit is implemented based on a suitable arrangement of QCA cells. The functionality of the proposed circuit is verified using kink energy computations. In addition, several QCA circuits are suggested to evaluate the performance of the proposed approach. Finally, the proposed QCA circuits are compared with classical and conventional circuits in terms of area, cell count, and latency.
