In this work, we leverage the uni-polar switching behavior of Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) to develop an efficient digital Computing-in-Memory (CiM) platform named XOR-CiM. XOR-CiM converts typical MRAM sub-arrays into massively parallel computational cores with ultra-high bandwidth, greatly reducing the energy consumption of convolutional layers and accelerating X(N)OR-intensive Binary Neural Network (BNN) inference. With inference accuracy similar to digital CiMs, XOR-CiM achieves ∼4.5× higher energy efficiency and 1.8× speedup compared to recent MRAM-based CiM platforms.
Recently, the Intelligent IoT (IIoT), including various sensors, has gained significant attention due to its capability of sensing, deciding, and acting by leveraging artificial neural networks (ANNs). Nevertheless, to achieve acceptable accuracy and high performance in visual systems, a power- and delay-efficient architecture is required. In this paper, we propose an ultra-low-power processing-in-sensor architecture, namely SenTer, realizing low-precision ternary multi-layer perceptron (MLP) networks, which can operate in detection and classification modes. Moreover, SenTer supports two activation functions based on user needs and the desired accuracy-energy trade-off. SenTer is capable of performing all the required computations for the MLP's first layer in the analog domain and then submitting the results to a co-processor. Therefore, SenTer significantly reduces the overhead of analog buffers, data conversion, and transmission power consumption by using only one ADC. Additionally, our simulation results demonstrate acceptable accuracy on various datasets compared to the full-precision models.
This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor in-memory computing unit to process the remaining network layers. Once an object is detected, PISA switches to the typical sensing mode to capture the image for a fine-grained convolution using only a near-sensor processing unit. Our circuit-to-application co-simulation results on BWNN acceleration demonstrate minor accuracy degradation on various image datasets in the coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 fps and an efficiency of ∼1.74 TOp/s/W. Lastly, PISA substantially reduces data conversion and transmission energy by ∼84% compared to a baseline.
In this work, we propose a Parallel Processing-In-DRAM architecture named P-PIM that leverages the high density of DRAM to enable fast and flexible computation. P-PIM enables bulk bit-wise in-DRAM logic between operands in the same bit-line by elevating the analog operation of the memory sub-array based on a novel dual-row activation mechanism. With this, P-PIM can opportunistically perform a complete and inexpensive in-DRAM RowHammer (RH) self-tracking and mitigation technique to protect the memory unit against this challenging security vulnerability. Our results show that P-PIM achieves ~72% higher energy efficiency than the fastest charge-sharing-based designs. As for RH protection, with a worst-case slowdown of ~0.8%, P-PIM achieves up to 71% energy saving over SRAM/CAM-based frameworks and about 90% over DRAM-based frameworks.
Digital technologies have made it possible to deploy visual sensor nodes capable of detecting motion events in the coverage area cost-effectively. However, background subtraction, as a widely used approach, remains an intractable task due to its inability to achieve competitive accuracy and reduced computation cost simultaneously. In this paper, we propose NeSe, an effective background subtraction approach for tiny energy-harvested sensors, leveraging non-volatile memory (NVM). Using the developed software/hardware method, the accuracy and efficiency of event detection can be adjusted at runtime by changing the precision depending on the application's needs. Due to the near-sensor implementation of background subtraction and the use of NVM, the proposed design reduces the data movement overhead while ensuring intermittent resiliency. The background is stored for a specific time interval within NVMs and compared with the next frame. If the power is cut, the background remains unchanged and is updated after the interval passes. Once a moving object is detected, the device switches to the high-power sensing mode to capture the image.
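The adjustable-precision comparison described above can be sketched as simple frame differencing in which only the top bits of each pixel are compared; lowering the bit width cheapens the comparison at some accuracy cost. This is a generic illustration of the idea only; the thresholds, bit widths, and pixel values below are illustrative, not NeSe's actual parameters:

```python
def motion_mask(background, frame, bits=8, thresh=1):
    """Per-pixel background subtraction with a runtime precision knob:
    comparing only the top `bits` bits of each 8-bit pixel trades
    detection accuracy for cheaper comparisons."""
    shift = 8 - bits
    return [[abs((b >> shift) - (f >> shift)) >= thresh
             for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

bg  = [[10, 200], [10, 10]]   # stored background (illustrative values)
cur = [[12, 60],  [10, 11]]   # incoming frame

# At 4-bit precision, only the large intensity change is flagged.
print(motion_mask(bg, cur, bits=4))
```

At full 8-bit precision the small 2-level change in the first pixel would also trigger, showing how the precision knob filters out minor fluctuations.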
As a recently developed post-CMOS FET, magneto-electric FETs (MEFETs) offer high-speed and low-power design characteristics for logic and memory applications. In this article, a near-sensor processing (NSP) platform leveraging MEFETs is presented that enables event detection for edge vision sensors at a low cost by eliminating the need for power-hungry analog-to-digital converters (ADCs). Besides, an efficient background comparison method is presented with adjustable precision that offers an output quality-efficiency tradeoff depending on the application's needs. Our device-to-architecture evaluations show that the proposed hardware-software co-design reduces the energy consumption and execution time on average by factors of ∼15× and ∼2.4×, respectively, compared to an SOT-MRAM counterpart employing the same method.
Nowadays, always-on intelligent and self-powered visual perception systems have gained considerable attention and are widely used. However, capturing data and analyzing it via a backend/cloud processor is energy-intensive and long-latency, resulting in a memory bottleneck and low-speed feature extraction at the edge. This paper presents the AppCiP architecture as a sensing-and-computing integration design to efficiently enable Artificial Intelligence (AI) on resource-limited sensing devices. AppCiP provides a number of unique capabilities, including instant and reconfigurable RGB-to-grayscale conversion, highly parallel analog convolution-in-pixel, and support for low-precision quinary-weight neural networks. These features significantly mitigate the overhead of analog-to-digital converters and analog buffers, leading to a considerable reduction in power consumption and area overhead. Our circuit-to-application co-simulation results demonstrate that AppCiP achieves ~3 orders of magnitude higher power efficiency than the fastest existing designs across different CNN workloads. It reaches a frame rate of 3000 fps and an efficiency of ~4.12 TOp/s/W. The accuracy of the AppCiP architecture on different datasets such as SVHN, Pest, CIFAR-10, MHIST, and CBL Face Detection is evaluated and compared with state-of-the-art designs. The obtained results are the best among processing in-/near-pixel architectures, while AppCiP degrades accuracy by less than 1% on average compared to the floating-point baseline.
In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on the MNIST and SVHN datasets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves a 1.25 GHz operating frequency and an energy efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2× and execution time by a factor of 4× compared to the best recent LBP-based networks.
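For reference, a local binary pattern code is computed purely with comparisons: each neighbor is thresholded against the center pixel and the resulting bits are packed into a byte, which is why LBP-based feature extraction can be MAC-free. Below is a generic 3×3 sketch; the clockwise neighbor ordering is one common convention, not necessarily the one Ap-LBP uses:

```python
def lbp_code(patch):
    """patch: 3x3 list of pixel intensities; returns the 8-bit LBP code.
    Only comparisons are used, no multiplications (MAC-free)."""
    c = patch[1][1]  # center pixel
    # Neighbors in clockwise order starting at the top-left corner.
    nbrs = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
            patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, p in enumerate(nbrs):
        code |= (1 if p >= c else 0) << bit
    return code

print(lbp_code([[9, 8, 1],
                [7, 5, 2],
                [6, 4, 3]]))  # → 195
```

In hardware, the eight threshold comparisons map naturally onto simple comparators, which is the basis of the comparator-based design described above.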
In the Artificial Intelligence of Things (AIoT) era, always-on intelligent and self-powered visual perception systems have gained considerable attention and are widely used. Thus, this paper proposes TizBin, a low-power processing-in-sensor scheme with event and object detection capabilities to eliminate the power costs of data conversion and transmission and to enable data-intensive neural network tasks. Once a moving object is detected, the TizBin architecture switches to the high-power object detection mode to capture the image. TizBin offers several unique features, such as analog convolutions enabling low-precision ternary weight neural networks (TWNNs) to mitigate the overhead of analog buffers and analog-to-digital converters. Moreover, TizBin exploits non-volatile magnetic RAMs to store the NN weights, remarkably reducing static power consumption. Our circuit-to-application co-simulation results for TWNNs demonstrate minor accuracy degradation on various image datasets, while TizBin achieves a frame rate of 1000 fps and an efficiency of ∼1.83 TOp/s/W.
Multiply-accumulate (MAC) operations are a fundamental component of machine learning tasks, where multiplication (whether integer or floating-point) is costly compared to addition in terms of hardware implementation and power consumption. In this paper, we approximate floating-point multiplication by converting it to integer addition while preserving the test accuracy of shallow and deep neural networks. We mathematically show and prove that our proposed method can be utilized with any floating-point format (e.g., FP8, FP16, FP32, etc.). It is also highly compatible with conventional hardware architectures and can be employed in CPU, GPU, or ASIC accelerators for neural network tasks with minimal hardware cost. Moreover, the proposed method can be utilized in embedded processors without a floating-point unit to perform neural network tasks. We evaluated our method on various datasets such as MNIST, FashionMNIST, SVHN, CIFAR-10, and CIFAR-100, with both FP16 and FP32 arithmetic. The proposed method preserves the test accuracy and, in some cases, overcomes the overfitting problem and improves the test accuracy.
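The general idea behind such schemes can be illustrated in a few lines: reinterpreting the IEEE-754 bit pattern of a float as an integer gives an approximate (scaled, biased) logarithm, so adding the integer views of two operands and subtracting one exponent bias approximates their product. The FP32 sketch below shows this classical trick for positive values; the paper's exact conversion may differ:

```python
import struct

BIAS = 0x3F800000  # bit pattern of 1.0 in FP32

def f2i(x: float) -> int:
    """Reinterpret an FP32 value's bits as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def i2f(n: int) -> float:
    """Reinterpret an unsigned 32-bit integer's bits as an FP32 value."""
    return struct.unpack("<f", struct.pack("<I", n & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    # Adding the integer views approximately adds the logarithms,
    # i.e. multiplies; subtracting one bias renormalizes the exponent.
    return i2f(f2i(a) + f2i(b) - BIAS)

print(approx_mul(2.0, 4.0))  # exact for powers of two: 8.0
print(approx_mul(3.0, 5.0))  # approximately 15 (within ~10%)
```

Powers of two multiply exactly (the mantissa term vanishes), while general operands incur a bounded relative error, which is what allows neural networks to tolerate the approximation.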
This work proposes a new generic Single-cycle Compute-in-Memory (CiM) Accelerator for matrix computation named SCiMA. SCiMA is developed on top of an existing commodity Spin-Orbit Torque Magnetic Random-Access Memory chip. Every sub-array's peripherals are transformed to realize a full set of single-cycle 2- and 3-input in-memory bulk bitwise functions specifically designed to accelerate a wide variety of graph and matrix multiplication tasks. We explore SCiMA's efficiency by selecting a complex matrix processing operation, i.e., calculating the determinant, as an essential and under-explored application in the CiM domain. The cross-layer device-to-architecture simulation framework shows the presented platform can reduce energy consumption by 70.43% compared with the most recent CiM designs implemented with the same memory technology. SCiMA also achieves up to 2.5× speedup compared with current CiM platforms.
Because of the impressive performance and success of artificial intelligence (AI)-based applications, filters are widely used as a primary part of digital signal processing systems, especially finite impulse response (FIR) filters. Although they offer several advantages, such as stability, they are computationally intensive. Hence, in this paper, we propose a systematic methodology, referred to as ReFACE, to efficiently implement computing-in-memory (CIM) accelerators for FIR filters using various CMOS and post-CMOS technologies. ReFACE leverages a residue number system (RNS) to speed up the essential operations of digital filters, instead of a traditional arithmetic implementation that suffers from the inevitable lengthy carry-propagation chain. Moreover, the CIM architecture eliminates off-chip data transfer by leveraging the maximum internal bandwidth of memory chips to realize local and parallel computation on small residues independently. Taking advantage of both RNS and CIM results in significant power and latency reductions. As a proof of concept, ReFACE is leveraged to implement a 4-tap RNS FIR filter. The simulation results verify its superior performance, with up to 85× and 12× improvements in energy consumption and execution time, respectively, compared with an ASIC accelerator.
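The RNS idea can be sketched briefly: operands are decomposed into residues modulo pairwise-coprime bases, MACs proceed independently per residue channel with no carries crossing channels, and the final result is recovered via the Chinese Remainder Theorem. The small 4-tap FIR sketch below illustrates this general scheme; the moduli and coefficients are chosen for illustration only, not taken from ReFACE:

```python
from math import prod

MODULI = (7, 11, 13)  # pairwise coprime; dynamic range = 7*11*13 = 1001

def to_rns(x):
    """Decompose an integer into its residues."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    # Multiply-accumulate channel-wise: each small residue channel
    # computes independently, with no carry propagation between channels.
    return tuple((r + p * q) % m
                 for r, p, q, m in zip(acc, to_rns(a), to_rns(b), MODULI))

def from_rns(res):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(res, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return x % M

# 4-tap FIR output sample: y = sum(h[k] * x[k])
h, x = [3, 1, 4, 2], [5, 9, 2, 6]
acc = (0, 0, 0)
for hk, xk in zip(h, x):
    acc = rns_mac(acc, hk, xk)
print(from_rns(acc))  # 3*5 + 1*9 + 4*2 + 2*6 = 44
```

Because each channel only ever touches small residues, the per-channel MACs are cheap and fully parallel, which is the property the in-memory implementation exploits.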
In this paper, we propose a reconfigurable processing-in-DRAM architecture named ReD-LUT that leverages the high density of commodity main memory to enable flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication, division, etc.) via only memory read operations. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored in the same bit-line, beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally intensive applications, i.e., low-precision deep learning acceleration and the Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4× compared with the GPU and achieves ~37.8× speedup and 2.1× higher energy efficiency than the best in-DRAM bit-wise accelerators. As for AES data encryption, it reduces energy consumption by a factor of ~2.2× compared to an ASIC implementation.
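As a software analogy of the LUT-query idea, a multiplication can be reduced to a single memory read once a table has been precomputed: the concatenated operand bits form the row address. This is only an illustration of the principle; the actual design partitions and stores sub-tables inside DRAM sub-arrays, and the 8-bit operand width below is illustrative:

```python
# Precompute an 8-bit x 8-bit multiplication LUT once; each multiply
# afterwards is a single table read, mimicking an in-DRAM LUT query.
N = 8
LUT = [a * b for a in range(1 << N) for b in range(1 << N)]

def lut_mul(a: int, b: int) -> int:
    # The concatenated operand bits {a, b} form the row address.
    return LUT[(a << N) | b]

print(lut_mul(13, 21))  # → 273
```

The trade-off is table size versus operand width, which is why LUT-based PIM designs typically operate on narrow slices and compose wider results.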
Convolutional Neural Networks (CNNs), due to their recent successes, have gained lots of attention in various vision-based applications. They have proven to produce incredible results, especially on big data that require high processing capability. However, CNN processing demands have limited their usage in embedded edge devices with constrained energy budgets and hardware. This paper proposes an efficient new architecture, namely Ocelli, which includes a ternary compute pixel (TCP) consisting of a CMOS-based pixel and a compute add-on. The proposed Ocelli architecture offers several features: (I) because of the compute add-on, TCPs can produce ternary values (i.e., −1, 0, +1) according to the light intensity at the pixel's input; (II) Ocelli realizes analog convolutions enabling low-precision ternary-weight neural networks. Since the first layer's convolution operations are the performance bottleneck of accelerators, Ocelli mitigates the overhead of analog buffers and analog-to-digital converters; moreover, our design supports a zero-skipping scheme for further power reduction; (III) Ocelli exploits non-volatile magnetic RAMs to store the CNN weights, which remarkably reduces static power consumption; and finally, (IV) Ocelli has two modes: sensing and processing. Once an object is detected, the architecture switches to the typical sensing mode to capture the image. Compared to conventional pixels, Ocelli reduces lane-detection power consumption by 10% on average relative to existing edge-detection algorithms. Moreover, across different CNN workloads, our design shows more than 23% power efficiency over conventional designs while achieving better accuracy.
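The TCP's mapping from light intensity to a ternary value can be sketched generically as a two-threshold quantizer; the threshold values below are illustrative placeholders, not Ocelli's actual analog levels:

```python
def ternarize(intensity: int, lo: int = 85, hi: int = 170) -> int:
    """Map an 8-bit light intensity to a ternary value in {-1, 0, +1}
    using two thresholds (values are illustrative)."""
    if intensity < lo:
        return -1
    return 0 if intensity < hi else 1

print([ternarize(v) for v in (10, 120, 200)])  # → [-1, 0, 1]
```

With both pixel outputs and weights restricted to {−1, 0, +1}, first-layer "multiplications" collapse to sign flips and skips, which is what the zero-skipping scheme exploits.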
This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks (NNs) leveraging a novel compute-pixel with nonvolatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor while maintaining accuracy compared with recent in-sensor computing designs. Our proposed design, namely an integrated multilevel RRAM (HfOx)-based processing-in-pixel accelerator (MR-PIPA), achieves a frame rate of 1000 fps and an efficiency of ~1.89 TOp/s/W, while it substantially reduces data conversion and transmission energy by ~84% compared to a baseline at the cost of minor accuracy degradation.
Herein, we propose LT-PIM, a Lookup-Table-based Processing-In-Memory architecture leveraging the high density of DRAM to enable massively parallel and flexible computation. LT-PIM supports lookup table queries to execute complex arithmetic operations, such as multiplication, via only memory read operations. In addition, LT-PIM enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands in the same bit-line. With this, LT-PIM enables a complete and inexpensive in-DRAM RowHammer (RH) self-tracking approach. Our results demonstrate that LT-PIM achieves ∼70% higher energy efficiency than the fastest charge-sharing-based designs and ∼32% over the best LUT-based designs. As for RH self-tracking, with a worst-case slowdown of ∼0.2%, LT-PIM achieves up to ∼80% energy saving over the best designs.
This work presents a processing-in-sensor platform leveraging magnetic devices as a flexible and efficient solution for real-time and smart image processing in AI devices. The main idea is to combine the typical sensing mechanism with an intrinsic coarse-grained convolution operation at the edge to remarkably reduce the power consumption of data conversion and transmission to an off-chip processor imposed by the first layer of deep neural networks. Our initial results demonstrate acceptable accuracy on the SVHN image dataset, while the proposed platform substantially reduces data conversion and transmission energy compared with a baseline sensor-CPU platform.
We propose HARDeNN, a low-overhead end-to-end inference accelerator methodology to armor an underlying pre-trained neural network architecture against black-box non-input adversarial attacks. To find the most vulnerable neural network architecture parameters, a hardware-assisted fault injection tool and a statistical stress model are proposed that synergize uniform fault assessment across layers with targeted in-layer fault assessment, realizing a holistic, rigorous fault evaluation in NN topologies susceptible to non-input adversarial black-box attacks. The key observation from the assessment is that the weights and activation functions are the neural network parameters most susceptible to both single-bit and multiple-bit flip attacks. Concerning these parameters, a multi-objective design space exploration is conducted to find a superior design under different resource constraints. The error-resiliency magnitude offered by HARDeNN can be adjusted based on the given boundaries. The experimental results show that the HARDeNN methodology enhances the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 multi-bit upsets that target the weight and activation layers, respectively.
In this paper, we propose a flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. This framework is developed on top of a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and elevates it to implement XOR2-MAJ3 operations between operands stored in the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then appropriately allocates DRAM rows to the operands to execute any in-DRAM computation. We develop the ISA and software support required to perform in-DRAM computation. FlexiDRAM transforms the current memory architecture into a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that, averaged across two well-known deep learning workloads, FlexiDRAM achieves ∼15× energy saving and 13× speedup over the GPU, outperforming recent processing-in-DRAM platforms.
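The XOR2-MAJ3 pair maps directly onto arithmetic: a full adder's carry-out is exactly MAJ3 of its inputs, and its sum is two cascaded XOR2 gates, so ripple-carry addition decomposes entirely into these primitives. The sketch below uses plain Python operators as stand-ins for the in-DRAM operations:

```python
def XOR2(a, b):
    return a ^ b

def MAJ3(a, b, c):
    # Majority of three bits.
    return (a & b) | (a & c) | (b & c)

def full_adder(a, b, cin):
    # Sum = two cascaded XOR2 gates; carry-out = one MAJ3 gate.
    s = XOR2(XOR2(a, b), cin)
    cout = MAJ3(a, b, cin)
    return s, cout

def add4(x, y):
    """4-bit ripple-carry addition built only from XOR2/MAJ3."""
    c, out = 0, 0
    for i in range(4):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, c)
        out |= s << i
    return out | (c << 4)  # include the final carry

print(add4(9, 7))  # → 16
```

Generating an XOR-MAJ representation of arbitrary logic, as FlexiDRAM does, generalizes this decomposition beyond adders.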
Medical imaging is an essential data source that has been leveraged worldwide in healthcare systems. In pathology, histopathology images are used for cancer diagnosis, but these images are very complex, and their analysis by pathologists requires large amounts of time and effort. On the other hand, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time is becoming longer and they need higher computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are utilized for the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show an accuracy increase of 6% from RGB on 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., <8:8>. For energy estimation on the CNN model used, we found that the energy used in 32-bit RGB color mode is considerably higher than in the lower bit-width and compressed color modes. Moreover, we show that lower bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly being used in Internet of Things (IoT) systems that facilitate healthcare.
Convolutional Neural Networks (CNNs) have gained lots of attention in various digital imaging applications. They have proven to produce incredible results, especially on big data that require high processing capability. With the increasing size of datasets, especially in computational pathology, CNN processing takes even longer and uses higher computational resources. Considerable research has been conducted to improve the efficiency of CNNs, such as quantization. This paper applies efficient training and inference of ResNet using quantization on histopathology images, the PatchCamelyon (PCam) dataset, and presents an analysis of efficient approaches to classifying histopathology images. First, the original RGB-colored images are evaluated. Then, compression methods such as channel reduction and sparsity are applied. When comparing sparsity on grayscale with RGB modes, classification accuracy is relatively the same, but sparsity on grayscale uses 77% fewer MACs than RGB. A higher classification result was achieved by grayscale mode, which requires far fewer MACs than the original RGB mode. Our method's low energy and processing requirements make it suitable for inference on wearable low-power healthcare devices and mobile hospitals in rural areas or developing countries, and it can also assist pathologists by providing a preliminary diagnosis.
This book chapter describes, explores, and analyzes designs and frameworks for energy-efficient and reliable edge computing, from device to architecture, to handle and compute data-intensive tasks and applications. First, we present a comprehensive study of magnetic random-access memory (MRAM) as a promising nonvolatile memory component due to its interesting features, including nonvolatility, near-zero standby power, high integration density, and radiation hardness. To enable efficient and reliable computing units, optimized in-memory processing accelerators for data- and compute-intensive tasks via algorithm and hardware co-design approaches are discussed. Moreover, two other high-attention topics, namely normally-off computing and hardware security, are examined. Two design methodologies are introduced to mitigate the MRAM write energy cost while efficiently utilizing the benefits it provides. The first design methodology, referred to as NV-clustering, is developed to realize middleware-transparent intermittent computing. The foundations of our work are advanced from the ground up by extending this emerging MRAM device to discover logic-in-memory methods that leverage intrinsic nonvolatility to realize intermittent robust computation. Then, the power-analysis-resilient circuit (PARC) procedure is developed as an extension of NV-clustering, providing a power-masked synthesis technique in the presence of power analysis attacks.
Deep neural networks (DNNs) have shown a great capability to surpass human performance in many areas. With the help of quantization, artificial intelligence (AI)-powered devices are ubiquitously deployed. Yet, easily accessible AI-powered edge devices become the target of malicious users who can deteriorate the privacy and integrity of the inference process. This paper proposes two adversarial attack scenarios, including three threat models, which crush local binary pattern networks (LBPNet). These attacks can be applied maliciously to flip a limited number of susceptible bits in kernels within the system's shared memory. The threat could be driven through a RowHammer attack and significantly drops the model's accuracy. Our preliminary simulation results demonstrate that flipping only the most significant bit of the first LBP layer decreases the accuracy from 99.51% down to 18% on the MNIST dataset. We then briefly discuss potential hardware/software-oriented defense mechanisms as countermeasures to such attacks.
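The severity of a single most-significant-bit flip is easy to see on a two's-complement value: flipping bit 7 of an 8-bit weight shifts it by 128, typically inverting its sign. The illustration below uses a hypothetical weight value, not one taken from LBPNet:

```python
def flip_bit(byte_val: int, bit: int) -> int:
    """Flip one bit of an unsigned byte (RowHammer-style single upset)."""
    return byte_val ^ (1 << bit)

def as_int8(byte_val: int) -> int:
    """Interpret an unsigned byte as a signed two's-complement int8."""
    return byte_val - 256 if byte_val >= 128 else byte_val

w = 0b0110_0100              # hypothetical weight: +100 as int8
w_attacked = flip_bit(w, 7)  # single flip of the most significant bit
print(as_int8(w), "->", as_int8(w_attacked))  # 100 -> -28
```

A 128-unit swing in a single kernel weight propagates through every activation it touches, which is why one well-placed MSB flip can collapse accuracy.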
Due to the essential role of matrix multiplication in many scientific applications, especially in data- and compute-intensive applications, we explore the efficiency of widely used matrix multiplication algorithms. This paper proposes an HW/SW co-optimization technique, entitled EaseMiss, to reduce the cache miss ratio for large general matrix-matrix multiplications. First, we revise the algorithms by applying three software optimization techniques to improve performance, and we examine and formulate how to choose the proper algorithm to achieve the best performance. By leveraging the proposed optimizations, the number of cache misses decreases by a factor of 3 in a conventional data cache. To improve further, we then propose SPLiTCACHE to virtually split the data cache according to the matrices' dimensions for better data reuse. This method can be easily embedded into conventional general-purpose processors or GPUs at the cost of negligible logic overhead. With proper and valid splitting, the obtained results show that cache misses are reduced by a factor of 2 on average compared to a conventional data cache in machine learning workloads.
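Loop tiling is the classic software technique for improving matmul data reuse: the three loops are blocked so that each tile of the operands stays resident in cache while it is reused. The sketch below is a generic illustration of this blocking idea, not EaseMiss's specific algorithm; the tile size `T` is illustrative and would be tuned to the cache:

```python
def blocked_matmul(A, B, n, T=32):
    """Tiled n x n matrix multiplication. Each T x T tile of A, B, and C
    is reused from cache before being evicted, cutting cache misses
    relative to the naive triple loop."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                # Work entirely within one tile combination.
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]  # loaded once, reused across j
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(blocked_matmul(A, B, 2, T=1))  # → [[19.0, 22.0], [43.0, 50.0]]
```

Virtually partitioning the cache per matrix, as SPLiTCACHE does, pushes the same reuse idea into hardware: each operand's tiles stop evicting the others'.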
This paper presents a novel ternary Static Random Access Memory (T-SRAM) cell. To validate the functionality of the proposed T-SRAM, carbon nanotube field-effect transistors are selected as a proof of concept, although other CMOS or post-CMOS technologies could be used instead. Our T-SRAM intrinsically eliminates the need to store the intermediate ternary state's voltage level, thus significantly reducing leakage power and increasing robustness. Extensive SPICE simulation and comparison results show that the proposed T-SRAM can be a promising alternative to CMOS SRAMs for deployment in low-power edge AI. Further, the analysis verifies that the proposed design is more robust than previous implementations.
This work paves the way to realize a processing-in-pixel accelerator based on a multi-level HfOx ReRAM as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor while maintaining accuracy compared with recent in-sensor computing designs.
In this paper, we propose an efficient convolutional neural network (CNN) accelerator design, entitled RNSiM, based on the Residue Number System (RNS) as an alternative to the conventional binary number representation. Instead of a traditional arithmetic implementation that suffers from the inevitable lengthy carry-propagation chain, the novelty of RNSiM lies in that all the data, including stored weights and communication/computation, are handled in the RNS domain. Due to the inherent parallelism of RNS arithmetic, power and latency are significantly reduced. Moreover, an enhanced integrated inter-modulo operation core is developed to decrease the overhead imposed by non-modular operations. Further improvement in the system's performance efficiency is achieved by developing efficient Processing-in-Memory (PIM) designs using various volatile CMOS and non-volatile post-CMOS technologies to accelerate RNS-based multiply-and-accumulate (MAC) operations. The RNSiM accelerator's performance on different datasets, including MNIST, SVHN, and CIFAR-10, is evaluated. With almost the same accuracy as the baseline CNN, the RNSiM accelerator significantly increases both energy efficiency and speedup compared with state-of-the-art FPGA, GPU, and PIM designs. RNSiM and other RNS-PIMs based on our method reduce energy consumption by 28–77× and 331–897× compared with the FPGA and GPU platforms, respectively.
Over the past years, the high demand to efficiently process deep learning (DL) models has driven the market of chip design companies. However, the new Deep Chip architectures, a common term for DL hardware accelerators, have paid little attention to the security requirements of quantized neural networks (QNNs), while black-/white-box adversarial attacks can jeopardize the integrity of the inference accelerator. Therefore, in this paper, a comprehensive study of the resiliency of QNN topologies to black-box attacks is conducted. Herein, different attack scenarios are performed on an FPGA-processor co-design, and the collected results are extensively analyzed to estimate the degree of impact of different types of attacks on the QNN topology. Specifically, we evaluated the sensitivity of the QNN accelerator to a range of bit-flip attacks (BFAs) that might occur in the operational lifetime of the device. The BFAs are injected at uniformly distributed times, either across the entire QNN or per individual layer, during image classification. The acquired results are utilized to build an entropy-based model that can be leveraged to construct QNN architectures resilient to bit-flip attacks.
We propose SHIELDeNN, an end-to-end inference accelerator framework that synergizes a mitigation approach and computational resources to realize a low-overhead, error-resilient Neural Network (NN) overlay. We develop a rigorous fault assessment paradigm to delineate a ground-truth fault-skeleton map for revealing the most vulnerable parameters in the NN. The error-susceptible parameters and resource constraints are then given to a design-space exploration function to find a superior design. The error-resiliency magnitude offered by SHIELDeNN can be adjusted based on the given boundaries. The SHIELDeNN methodology improves the error-resiliency magnitude of cnvW1A1 by 17.19% and 96.15% for 100 multi-bit upsets (MBUs) that target the weight and activation layers, respectively.
This work shows a promising solution for efficiently implementing normally-off computing (NoC) and power-analysis side-channel-attack-resilient designs using spin-based devices. Spintronic devices, as post-CMOS devices, provide interesting features such as non-volatility, which plays an essential role in NoC structures. Besides, spin-based components can naturally function as polymorphic gates (PGs) that realize reconfigurable logic functions with inherent security attributes. However, utilizing spintronics imposes power and area overheads compared to CMOS-based designs; thus, an efficient design methodology is introduced herein to mitigate these overheads. This approach, entitled NV-Clustering, is first applied to realize targeted insertion of PG modules within VLSI implementations to make them resilient against power failure. Then, PARC is developed as an extension of NV-Clustering, providing a power-masked synthesis method in the presence of power-analysis side-channel attacks. PARC randomly generates power-maskable building blocks with optimal PDP and area overhead. In addition to NV-Clustering's benefits, PARC can be extended against fault injection approaches, owing to the PG modules' reconfigurability to cover the faults.
In recent years, the big-data boom has propelled the development of highly accurate prediction models derived from machine learning (ML) and deep learning (DL) algorithms. These models can be orchestrated on customized hardware in safety-critical missions to accelerate the inference process in ML/DL-powered IoT. However, radiation-induced transient faults and black-/white-box attacks can impact individual parameters of ML/DL models, which may generate noisy data/labels or compromise the pre-trained model. In this paper, we propose Fiji-FIN, a framework for evaluating the resiliency of IoT devices during ML/DL model execution with respect to major security challenges such as bit-perturbation attacks and soft errors. Fiji-FIN is capable of injecting both single-bit/event flip/upset and multi-bit flip/upset faults on the architectural ML/DL accelerator embedded in ML/DL-powered IoT. Fiji-FIN is significantly more accurate than existing software-level fault-injection paradigms for ML/DL-driven IoT devices.
Over the past years, easy access to large-scale datasets has significantly shifted the paradigm for developing highly accurate prediction models derived from Neural Networks (NNs). These models can be impacted by radiation-induced transient faults that may lead to the gradual degradation of long-running NN inference accelerators. The crucial observation from our rigorous vulnerability assessment of the NN inference accelerator is that the weights and activation functions are unevenly susceptible to both single-event upsets (SEUs) and multi-bit upsets (MBUs), especially in the first five layers of our selected convolutional neural network. In this paper, we present relatively accurate statistical models to delineate the impact of both SEUs and MBUs across layers and per individual layer of the selected NN. These models can be used to evaluate the error-resiliency magnitude of an NN topology before adopting it in safety-critical applications.
Herein, a bit-wise Convolutional Neural Network (CNN) in-memory accelerator is implemented using Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational sub-arrays. It utilizes a novel AND-Accumulation method that significantly reduces energy consumption within convolutional layers and performs various low-bitwidth CNN inference operations entirely within MRAM. Power-intermittence resiliency is also enhanced by retaining the partial state information needed to maintain computational forward progress, which is advantageous for battery-less IoT nodes. Simulation results indicate ~5.4× higher energy-efficiency and 9× speedup over ReRAM-based acceleration, or roughly ~9.7× higher energy-efficiency and 13.5× speedup over recent CMOS-only approaches, while maintaining inference accuracy comparable to baseline designs.
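A minimal software sketch of the AND-Accumulation idea, assuming binary activations and weights packed into machine words (the names are illustrative, not the accelerator's actual interface): a dot product of two binary vectors reduces to a bit-wise AND followed by a population count, which is what the sub-array evaluates in place.

```python
def and_accumulate(activation_bits: int, weight_bits: int) -> int:
    """Dot product of two binary {0,1} vectors packed into integers:
    bit-wise AND, then popcount of the surviving ones."""
    return bin(activation_bits & weight_bits).count("1")

# 8-element binary vectors packed LSB-first
acts = 0b10110101
wts  = 0b11010110
print(and_accumulate(acts, wts))  # AND = 0b10010100 -> 3
```

In the accelerator, the AND happens across an entire sub-array row in parallel, and the accumulation is shared across many such rows, which is where the energy savings come from.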
A prevalent class of side-channel attacks relies on Differential Power Analysis (DPA) methods, which monitor power traces during cryptographic processing to discover secret keys. Reconfigurability via Polymorphic Gate Modules (PGMs) offers an approach to obscure DPA information by dynamically rearranging the operation of constituent sub-circuits, albeit at the cost of increased area and power consumption. Thus, we develop the Power Analysis-Resilient Circuit (PARC) design methodology to instantiate the use of spin-based devices as an extension to conventional Register Transfer Language (RTL) specifications. PARC replaces a specific portion of the circuit to maximize a new Effectiveness of Design (EoD) metric, which quantifies DPA impact versus its performance overhead. To validate functionality, PARC is applied to various benchmark circuits including ISCAS-89, MCNC, and ITC-99. EoD results indicate that PARC significantly increases the number of power traces that an adversary would need to use in order to extract power information but incurs a low cost to the functional circuit itself.
Energy-harvesting-powered computing offers intriguing and vast opportunities to dramatically transform the landscape of Internet of Things (IoT) devices by utilizing ambient sources of energy to achieve battery-free computing. To operate within the restricted energy capacity and intermittency profile, we propose an Intermittent Robust Computation (IRC) unit as a new duty-cycle-variable computing approach leveraging the non-volatility inherent in spin-based switching devices. The foundations of IRC are advanced from the device level upwards by extending a Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device. The device is then used to realize SHE-MTJ Majority/Polymorphic Gate (MG/PG) logic approaches and libraries. Next, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Finally, the NV-Clustering synthesis procedure and corresponding tool module are proposed to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality based on energy and area minimization criteria. It also realizes middleware-coherent, intermittent computation without the checkpointing, micro-tasking, or software bloat and energy overheads that are critical concerns for IoT. Simulation results for various benchmark circuits, including ISCAS-89, validate functionality and demonstrate power dissipation, area, and delay benefits.
A Generative Adversarial Network (GAN) is an adversarial learning approach that empowers conventional deep learning methods by alleviating the demand for massive labeled datasets. However, GAN training can be computationally intensive, limiting its feasibility in resource-limited edge devices. In this paper, we propose an approximate GAN (ApGAN) for accelerating GANs from both algorithmic and hardware-implementation perspectives. First, inspired by the binary-pattern feature-extraction method along with binarized representation entropy, the existing Deep Convolutional GAN (DCGAN) algorithm is modified by binarizing the weights for a specific portion of layers within both the generator and discriminator models. Further reduction in storage and computation resources is achieved by leveraging a novel hardware-configurable in-memory addition scheme, which can operate in accurate or approximate mode. Finally, a memristor-based processing-in-memory accelerator for ApGAN is developed. The performance of the ApGAN accelerator on datasets such as Fashion-MNIST, CIFAR-10, STL-10, and CelebA is evaluated and compared with recent GAN accelerator designs. With almost the same Inception Score (IS) as the baseline GAN, the ApGAN accelerator can increase energy-efficiency by ~28.6×, achieving a 35-fold speedup compared with a baseline GPU platform. Additionally, it shows 2.5× and 5.8× higher energy-efficiency and speedup over a CMOS-ASIC accelerator, at the cost of an 11 percent reduction in IS.
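As a hedged sketch of the weight-binarization step, the snippet below follows the common mean-absolute-value scaling used in binarized networks; ApGAN's exact per-layer scheme may differ, and the function name is illustrative:

```python
def binarize_weights(w):
    """Binarize a weight matrix to {-alpha, +alpha}, where alpha is the
    mean absolute value of the layer's weights (a common scaling choice)."""
    alpha = sum(abs(x) for row in w for x in row) / sum(len(row) for row in w)
    return [[alpha if x >= 0 else -alpha for x in row] for row in w]

w = [[0.5, -1.5], [1.0, -1.0]]
print(binarize_weights(w))  # alpha = (0.5+1.5+1.0+1.0)/4 = 1.0 -> [[1.0, -1.0], [1.0, -1.0]]
```

Storing only the sign bits plus one scale per layer is what shrinks the storage footprint and lets the in-memory addition scheme replace full-precision multiply-accumulates.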
In this paper, we develop an evolutionary-driven circuit optimization methodology that can be leveraged for the synthesis of spintronic-based normally-off computing (NoC) circuits. NoC architectures distribute non-volatile memory elements throughout the CMOS logic plane, creating a new class of fine-grained, functionally-constrained synthesis challenges. Spin-based NoC circuit synthesis objectives include increased computational throughput and reduced static power consumption. Our proposed methodology utilizes Genetic Algorithms (GAs) to optimize the implementation of a Boolean logic expression in terms of area, delay, or power consumption. It first leverages spin-based device characteristics to achieve a primary semi-optimized implementation; further performance optimization is then applied to the implemented design based on the NoC requirements and optimization criteria. As a proof of concept, the optimization approach is leveraged to implement a functionally-complete set of Boolean logic gates using spin Hall effect (SHE) magnetic tunnel junctions (MTJs), optimized for both power and delay objectives. The resulting synthesis methodology supports NoC circuit design for emerging-device and hybrid-CMOS logic applications. Finally, simulation results and analyses verified the functionality of our proposed optimization tool for NoC circuit implementations.
With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a non-volatile (NV) flip-flop design approach, referred to as NV-Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are utilized to instantiate the LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters together logic and NV state-holding functionality based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks, including the ISCAS-89, MCNC, and ITC-99 computational circuits, using an LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and power dissipation, area, and delay benefits. For instance, results for ISCAS-89 benchmarks indicate a 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay compared to alternative NV-FF-based designs, as evaluated via SPICE simulation at the 45-nm technology node.
The objectives of advancing secure, intermittency-tolerant, and energy-aware logic datapaths are addressed herein by developing a spin-based design methodology and its corresponding synthesis steps. The approach selectively inserts Non-Volatile (NV) Polymorphic Gates (PGs) to realize datapaths suitable for intrinsic operation in Energy-Harvesting-Powered (EHP) devices. Spin Hall Effect (SHE)-based Magnetic Tunnel Junctions (MTJs) are utilized to design NV-PGs, which are combined within a Flip-Flop (FF) circuit to develop a PG-FF realizing Boolean logic functions with inherent state-holding capability. The reconfigurability of PGs is leveraged for logic encryption to enhance the security of the developed intermittency-resilient circuits, which are applied to the ISCAS-89, MCNC, and ITC-99 benchmarks. The results obtained indicate that the PG-FF-based design can achieve up to 7.1% and 13.6% improvements in terms of area and Power-Delay Product (PDP), respectively, compared to NV-FF-based methodologies that replace CMOS-based FFs with NV-FFs. Further PDP improvements are achieved by using low-energy-barrier SHE-MTJ devices within the PG-FF circuit. SHE-MTJs with 30 kT energy barriers exhibit a 40.5% reduction in PDP at the cost of lower retention times, in the range of minutes, which is still sufficient to achieve forward progress in EHP devices having more than hundreds of power-on and power-off cycles per minute.
Abstract
The architecture, operation, and characteristics of two post-CMOS reconfigurable fabrics are identified to realize energy-sparing and resilience features, while remaining feasible for near-term fabrication. First, Storage Cell Replacement Fabrics (SCRFs) provide a reconfigurable computing platform utilizing near-zero leakage Spin Hall Effect devices which replace SRAM bit-cells within Look-Up Tables (LUTs) and/or switch boxes to complement the advantages of MOS transistor-based multiplexer select trees. Second, Heterogeneous Technology Configurable Fabrics (HTCFs) are identified to extend reconfigurable computing platforms via a palette of CMOS, spin-based, or other emerging device technologies, such as various Magnetic Tunnel Junction (MTJ) and Domain Wall Motion devices. HTCFs are composed of a triad of Emerging Device Blocks, CMOS Logic Blocks, and Signal Conversion Blocks. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HTCFs enable new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. Both SCRFs and HTCFs offer a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. SPICE simulations indicate 6% to 67% reduction in read energy, 21% reduction in reconfiguration energy, and 78% higher clock frequency versus alternative fabricated emerging device architectures, and a significant reduction in leakage compared to CMOS-based approaches.
Aggressive Metal Oxide Semiconductor (MOS) technology scaling in digital circuits has introduced important challenges, including a significant increase in leakage currents, short-channel effects, and drain saturation growth, while the power supply voltage is reduced for digital applications. Furthermore, with extensions to sub-10-nm regimes, error resiliency has become a major challenge for the microelectronics industry, particularly for mission-critical systems, e.g., space and terrestrial applications. Therefore, emerging devices and technologies have attracted considerable attention in recent years as alternatives to CMOS-based technologies; these include spintronics [1-6], resistive random access memory (RRAM) [7-10], phase-change memory (PCM) [11, 12], and Quantum Cellular Automata (QCA) [13-18]. Among promising devices, the 2014 Magnetism Roadmap [19] identifies nanomagnetic devices as capable post-CMOS candidates, of which Magnetic Tunnel Junctions (MTJs) are considered one of the most promising technologies, spanning both logic [20-22] and memory functionalities [23-26]. MTJs are characterized by non-volatility, near-zero standby power, high integration density, and radiation hardness as a technology progression from CMOS. Moreover, MTJs can be readily integrated in the back-end process of CMOS fabrication, owing to their vertical structure [27, 28]. In this book chapter, we focus on the fundamentals and modeling of MTJs using precise physics equations. Moreover, some of their applications in reconfigurable fabrics and logic-in-memory architectures are studied.
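As a hedged illustration of the "precise physics equations" referenced above, free-layer magnetization dynamics in an MTJ are commonly captured by the Landau-Lifshitz-Gilbert (LLG) equation with an added Slonczewski-like spin-torque term for SHE-driven switching. The symbols below (γ: gyromagnetic ratio, α: Gilbert damping, θ_SH: spin Hall angle, J_c: charge current density, M_s: saturation magnetization, t_FM: free-layer thickness, σ̂: spin-polarization direction) are stated here as assumptions rather than taken from the chapter:

```latex
\frac{d\hat{m}}{dt} = -\gamma\mu_0\,\hat{m}\times\vec{H}_{\mathrm{eff}}
+ \alpha\,\hat{m}\times\frac{d\hat{m}}{dt}
+ \frac{\gamma\hbar\,\theta_{\mathrm{SH}}\,J_c}{2e\,\mu_0 M_s t_{\mathrm{FM}}}\,\hat{m}\times\left(\hat{m}\times\hat{\sigma}\right)
```

The first two terms describe precession about the effective field and damping; the third is the current-induced torque that a Verilog-A compact model would integrate to predict switching delay and energy.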
Magnetic tunnel junction (MTJ)-based devices have been studied extensively as a promising candidate for implementing hybrid energy-efficient computing circuits due to their non-volatility, high integration density, and CMOS compatibility. In this paper, MTJs are leveraged to develop a novel full adder (FA) based on 3- and 5-input majority gates. The Spin Hall effect (SHE) is utilized for changing the MTJ states, resulting in low-energy switching behavior. SHE-MTJ devices are modeled in Verilog-A using precise physical equations, and the SPICE circuit simulator is used to validate the functionality of the 1-bit SHE-based FA. The simulation results show 76% and 32% improvement over a previous voltage-mode MTJ-based FA in terms of energy consumption and device count, respectively. The concatenability of our proposed 1-bit SHE-FA is investigated by developing a 4-bit SHE-FA. Finally, the delay and power consumption of an n-bit SHE-based adder are formulated to provide a basis for developing an energy-efficient SHE-based n-bit arithmetic logic unit.
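The majority-gate decomposition underlying such a full adder can be checked in a few lines. This is a logic-level sketch only, not the SHE-MTJ circuit itself; it uses the standard identities Cout = MAJ3(A, B, Cin) and Sum = MAJ5(A, B, Cin, ¬Cout, ¬Cout):

```python
def maj(*bits):
    """n-input majority gate (n odd): output 1 iff more than half the inputs are 1."""
    return int(sum(bits) > len(bits) // 2)

def majority_fa(a, b, cin):
    """1-bit full adder from one 3-input and one 5-input majority gate plus inversion."""
    cout = maj(a, b, cin)                    # carry-out: 3-input majority
    s = maj(a, b, cin, 1 - cout, 1 - cout)   # sum: 5-input majority with two inverted carries
    return s, cout

# exhaustive check of the truth table
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = majority_fa(a, b, cin)
            assert s + 2 * cout == a + b + cin
print("full adder verified")
```

The same decomposition concatenates naturally: feeding each stage's `cout` into the next stage's `cin` yields the 4-bit (and, in general, n-bit) ripple adder formulated in the abstract.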
Nanotechnologies, notably Quantum-dot Cellular Automata (QCA), offer an attractive perspective for future computing technologies. In this paper, QCA is investigated as an implementation method for reversible logic. A novel XOR gate and a new approach to implementing a 2:1 multiplexer are presented. Moreover, an efficient and potent universal reversible gate based on the proposed XOR gate is designed. The proposed reversible gate delivers superb performance in implementing the QCA standard benchmark combinational functions in terms of area, complexity, power consumption, and cost function in comparison to other reversible gates. The gate achieves the lowest overall cost among the most cost-efficient designs presented so far, with a reduction of 24%. To employ the merits of reversibility, the proposed reversible gate is leveraged to design the four common latches (D latch, T latch, JK latch, and SR latch). Specialized structures of the proposed circuits could be used as building blocks in designing sequential and combinational circuits in QCA architectures.
Field-programmable gate array (FPGA) attributes of logic configurability, bitstream storage, and dynamic signal routing can be realised by leveraging the complementary benefits of emerging devices together with complementary metal-oxide-semiconductor (CMOS)-based devices. A novel carbon/magnet lookup table (CM-LUT) is developed and evaluated by trading off a range of mixed heterogeneous technologies to balance energy, delay, and reliability attributes. Herein, magnetic spintronic devices are employed in the configuration memory to contribute non-volatility and high scalability, while carbon nanotube field-effect transistors (CNFETs) provide desirable conductivity, low delay, and low power consumption. The proposed CM-LUT offers ultra-low-power and high-speed operation while maintaining high-endurance re-programmability with increased radiation-induced soft-error immunity. The proposed four-input, one-output CM-LUT utilises 41 CNFETs and 20 magnetic tunnel junctions for read operations and 35 CNFETs to perform write operations. Results indicate that the CM-LUT achieves an average four-fold energy reduction, eight-fold faster circuit operation, and a 9.3% reconfiguration power-delay product improvement in comparison with spin-based look-up tables. Finally, additional hybrid technology designs are considered to balance performance with the demands of energy consumption for near-threshold operation.
In this paper, various energy-efficient write schemes are proposed for the switching operation of spin Hall effect (SHE)-based magnetic tunnel junctions (MTJs). A transmission gate (TG)-based write scheme is proposed, which provides symmetric and energy-efficient switching behavior. We have modeled an SHE-MTJ using precise physics equations and then leveraged the model in the SPICE circuit simulator to verify the functionality of our designs. Simulation results show the TG-based write scheme's advantages in terms of device count and switching energy. In particular, it can operate at a 12% higher clock frequency while realizing at least a 13% reduction in energy consumption compared to the most energy-efficient write circuits. We have analyzed the performance of the implemented write circuits in the presence of process variation (PV) in the transistors' threshold voltage and the SHE-MTJ dimensions. Results show that the proposed TG-based design is the second most PV-resilient write circuit scheme for SHE-MTJs among the implemented designs. Finally, we have proposed a 1TG-1T-1R SHE-based magnetic random access memory (MRAM) bit cell based on the TG-based write circuit. Comparisons with several of the most energy-efficient and variation-resilient SHE-MRAM cells indicate that 1TG-1T-1R delivers reduced energy consumption, with 43.9% and 10.7% energy-delay product improvements, while incurring low area overhead.
Quantum-dot Cellular Automata (QCA) is an attractive nanoelectronics paradigm that is widely advocated as a possible replacement for conventional CMOS technology. Designing memory cells is a very interesting field of research in the QCA domain. In this paper, we propose novel nanotechnology-compatible designs based on majority-gate structures. In the first step, this objective is accomplished through the QCA implementation of two well-organized JK flip-flop designs; in the second step, synchronous counters of different sizes are presented as an application. To evaluate the functional correctness of the proposed designs and compare them with the state of the art, the QCADesigner tool is employed.
Quantum-dot cellular automata (QCA) has been studied extensively as a promising switching technology at the nanoscale. Despite the several potential advantages of QCA-based designs over conventional CMOS logic, deposition defects are likely to occur in QCA-based systems, which necessitates fault-tolerant structures. As binary adders are among the most frequently used components in digital systems, this work targets the design of a highly optimized, robust full adder in a QCA framework. Results demonstrate the superiority of the proposed full adder in terms of latency, complexity, and area with respect to previous full adder designs. Further, the functionality and the defect tolerance of the proposed full adder in the presence of QCA deposition faults are studied. The functionality and correctness of our design are confirmed using high-level synthesis, followed by delineating its normal and faulty behavior using a Probabilistic Transfer Matrix (PTM) method. The related waveforms, generated using the QCADesigner simulation tool, verify the robustness of the proposed designs.
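A minimal sketch of the PTM formalism referenced above, using illustrative single-bit gates (the matrices for a full adder are larger): a PTM lists P(output | input) per row, and serial composition of stages is just matrix multiplication, so end-to-end error probabilities fall out of a product.

```python
def faulty_inverter_ptm(p):
    """2x2 PTM of an inverter whose output flips with probability p.
    Rows index the input value (0/1); columns index the output value (0/1)."""
    return [[p, 1 - p],
            [1 - p, p]]

def compose(ptm_a, ptm_b):
    """Serial composition of two single-bit PTMs = matrix product."""
    return [[sum(ptm_a[i][k] * ptm_b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# two faulty inverters in series: the identity is recovered with prob (1-p)^2 + p^2
chain = compose(faulty_inverter_ptm(0.1), faulty_inverter_ptm(0.1))
print(round(chain[0][0], 4))  # P(out=0 | in=0) = 0.81 + 0.01 -> 0.82
```

For multi-input gates like the full adder, the PTM rows enumerate all input combinations and parallel stages combine via tensor products, which is how the faulty behavior under deposition defects is delineated.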
While spintronic-based neuromorphic architectures offer analog computation strategies [2], in this proposal we exploit reconfigurability and associative processing using a Logic-In-Memory (LIM) paradigm. LIM is compatible with conventional computing algorithms and integrates logical operations with data storage, making it an ideal choice for parallel SIMD operations that eliminate frequent accesses to memory, an extreme contributor to energy consumption. Spin-based LIM architectures have the capability to increase computational throughput, reduce die area, provide instant-on functionality, and reduce static power consumption [3]. The feasibility of a low-power spintronic LIM chip has recently been demonstrated in [4] for database applications.
Traditional CMOS technology is approaching its end of life, so novel technologies, particularly nanoscale ones, are being explored. Quantum-dot cellular automata (QCA) is a new computing method in nanotechnology with considerable features such as low power, small dimensions, and high-speed switching. In this paper, a comprehensive study of QCA is provided, in which we discuss the preliminaries and describe the different aspects of QCA. The state of the art in this field is also presented.
With the inevitable scaling down of MOS transistor feature sizes deeper into the nanometer range, CMOS technology has encountered many critical challenges and problems, such as very high leakage currents, reduced gate control, high power density, increased circuit noise sensitivity, and very high lithography costs. Quantum-dot cellular automata (QCA), owing to its high device density, extremely low power consumption, and very high switching speed, could be a feasible competitive alternative. In this paper, a novel 5-input majority gate, an important fundamental building block in QCA circuits, is designed in a symmetric form. In addition to the majority gate, an SR latch, an SR gate, and an efficient one-bit QCA full adder are implemented employing the new 5-input majority gate. To verify the functionality of the proposed designs, the QCADesigner tool is used. The results demonstrate that the proposed SR latch and full adder perform equally well as, or in many cases better than, previous circuits.
Photonic technology is now recognized as a promising platform among the existing solutions to the challenges facing interconnection networks in current chips. Recent progress in silicon nanophotonic technologies has provided an adequate infrastructure for constructing photonic communication links with higher bandwidths and power efficiency than traditional electrical communication. The router is a core component in photonic networks-on-chip. This paper proposes a 5 × 5 non-blocking photonic router designed for the XY routing algorithm. The simulation results show that our proposed router achieves significant improvements in terms of insertion loss owing to a reduction in the number of waveguide crossings. These improvements, leveraging wavelength-division multiplexing, lead to higher bandwidth density.
Quantum-dot cellular automata (QCA) is an emerging nanoscale technology and a possible alternative to conventional CMOS technology. QCA has attractive features such as high speed, low power consumption, and small area occupation. In this paper, a novel configurable QCA circuit is presented. This circuit can be configured to perform various logic functions such as 2-input or 3-input AND, 2-input or 3-input OR, and other possible variations. By using these kinds of circuits, the hardware requirements of a QCA design can be reduced while various functions are obtained. We then propose an efficient QCA design of a one-bit full adder constructed from our circuit and compare it with previous designs. The comparison shows that the proposed adder has better performance in terms of latency, complexity, and size in QCA. The functionality of the proposed circuit is verified through computer simulations using the QCADesigner tool.
Quantum-dot cellular automata (QCA), which encodes binary information by means of the charge configuration of quantum-dot cells rather than current, represents a new computing platform at the nanotechnology level. On the plus side, it offers significant improvements over CMOS due to its low power consumption, high speed, and small dimensions; on the negative side, however, a large number of manufacturing defects are likely to occur, requiring new fault-tolerant architectures. The defects might occur in the manufacturing or synthesis phases of the design process. In this paper, we present and analyze a novel reconfigurable fault-tolerant gate. In addition to its reconfigurability, which makes it capable of covering some commonly used functions, the gate is designed in such a way that it is defect-tolerant against synthesis-phase defects. To simulate the functionality of the proposed gate, QCADesigner is used, and the related waveforms are presented.
In this paper, we propose an efficient method based on Genetic Algorithms (GAs) to design quantum-dot cellular automata (QCA) circuits with the minimum possible number of gates. The basic gates used to design these circuits are 2-input and 3-input NAND gates, in addition to the inverter gate. Due to the use of these two types of NAND gates and their contradictory effects, a new fitness function is defined. In addition, this method uses a type of mutation operator that significantly helps the GA avoid local optima. The results show that the proposed approach is very efficient in deriving NAND-based QCA designs.
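A hedged sketch of the two core ingredients a GA of this kind needs: a netlist evaluator and a fitness function that rewards truth-table matches while penalizing 3-input NANDs more heavily than 2-input ones (reflecting their contradictory effects). The encoding and weights below are illustrative assumptions, not the paper's actual parameters:

```python
def eval_netlist(netlist, a, b):
    """Evaluate a NAND/inverter netlist. Signals 0 and 1 are the inputs a, b;
    each gate appends a new signal. Gate = ('nand2'|'nand3'|'inv', input indices)."""
    sig = [a, b]
    for kind, ins in netlist:
        vals = [sig[i] for i in ins]
        sig.append(1 - vals[0] if kind == 'inv' else (0 if all(vals) else 1))
    return sig[-1]

def fitness(netlist, target, w2=1.0, w3=2.0):
    """Truth-table matches minus a weighted gate-count penalty (3-input NANDs
    cost more than 2-input ones), the kind of trade-off the GA must balance."""
    matches = sum(eval_netlist(netlist, a, b) == target(a, b)
                  for a in (0, 1) for b in (0, 1))
    penalty = sum(w3 if kind == 'nand3' else w2 for kind, _ in netlist)
    return 4 * matches - 0.1 * penalty

# XOR from four 2-input NANDs: n1 = NAND(a,b); out = NAND(NAND(a,n1), NAND(b,n1))
xor_netlist = [('nand2', (0, 1)),   # signal 2
               ('nand2', (0, 2)),   # signal 3
               ('nand2', (1, 2)),   # signal 4
               ('nand2', (3, 4))]   # signal 5 = a XOR b
print(all(eval_netlist(xor_netlist, a, b) == (a ^ b)
          for a in (0, 1) for b in (0, 1)))  # True
```

A GA would then mutate gate types and input indices, and crossover netlists, keeping the highest-fitness candidates per generation.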
Quantum-dot Cellular Automata (QCA) technology is attractive due to its low power consumption, fast speed and small dimension; therefore it is a promising alternative to CMOS technology. Additionally, multiplexer is a useful part in many important circuits. In this paper we propose a novel design of 2: 1 MUX in QCA. Moreover, a 4: 1 multiplexer, an XOR gate and a latch are proposed based on our 2: 1 multiplexer design. The simulation results have been verified using the QCADesigner.
Quantum-dot cellular automaton (QCA) is an emerging technology that can be considered a possible alternative to semiconductor transistor technology. In this paper, an efficient approach to designing circuits at the nanoscale is explored; this field is an attractive and exciting blend of nanotechnology and computer architecture. In the conventional method for QCA circuit implementation, the majority gate and the inverter gate are used as the fundamental building blocks. In the proposed approach, an arbitrary circuit is instead implemented through a suitable arrangement of QCA cells. The functionality of the proposed circuits is verified using kink-energy computations. Moreover, to evaluate the performance of the proposed approach, several QCA circuits are suggested. Finally, these proposed QCA circuits are compared with other classical and conventional circuits in terms of area, cell count, and latency.
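A simplified sketch of the kink-energy idea used for such verification: the kink energy between two neighboring cells is the electrostatic cost of misaligned polarizations, computed here as a bare Coulomb sum over the occupied dots. The cell size, gap, and point-charge model are illustrative assumptions (a full treatment also includes the neutralizing background charge):

```python
import math

K = 8.9875e9    # Coulomb constant (N*m^2/C^2)
Q = 1.602e-19   # electron charge (C)
CELL = 18e-9    # assumed cell side length (m)
GAP = 2e-9      # assumed gap between adjacent cells (m)

def dots(x0, polarization):
    """Occupied corner-dot positions of a 4-dot cell at x-offset x0; the two
    electrons sit on one diagonal (P=+1) or the other (P=-1)."""
    corners = [(0, 0), (CELL, 0), (CELL, CELL), (0, CELL)]
    occupied = [0, 2] if polarization == 1 else [1, 3]
    return [(x0 + corners[i][0], corners[i][1]) for i in occupied]

def interaction(p1, p2):
    """Electrostatic energy between the electrons of two neighboring cells (J)."""
    c1 = dots(0.0, p1)
    c2 = dots(CELL + GAP, p2)
    return sum(K * Q * Q / math.dist(d1, d2) for d1 in c1 for d2 in c2)

kink = interaction(1, -1) - interaction(1, 1)
print(kink > 0)  # misaligned polarization costs extra energy -> True
```

A positive kink energy is what drives a cell to copy its neighbor's polarization, which is the mechanism the cell-arrangement-based designs above rely on.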
Quantum-dot cellular automata (QCA) is a new computing method in nanotechnology with considerable features such as low power, small dimensions, and high-speed switching. A QCA device stores logic based on the position of individual electrons. The fundamental logic elements in QCA are the majority gate (Fig. 1(a)) and the inverter gate (Fig. 1(b)), which operate based on the Coulomb repulsion between electrons [1].