Abstract

Objective: To design, model, verify, and synthesize a low-power Neural Processing Element (NPE) tailored for Edge AI applications, focusing on efficient execution of neural network operations within resource-constrained environments.

Design: A modular hardware architecture based on an adaptation of the ARM Ethos-U microNPU, incorporating blocks for multiply-accumulate operations, weight decoding, data buffering, and output formatting.

Subjects/Patients: Not Applicable

Methods: The design was implemented in Verilog HDL, verified using Cadence Xcelium for functional correctness, and synthesized with Cadence Genus to evaluate area, power, and timing metrics. A finite state machine controls data flow, and four key blocks (MAC Unit, Weight Decoder, Shared Buffer, Output Unit) were simulated and partially integrated.

Results: Simulations confirmed correct functionality of the implemented blocks, with accurate multiply-accumulate operations, weight decoding, data storage/retrieval, and output formatting. The architecture demonstrates scalability through parallel instantiation, reduced memory accesses, and suitability for low-power edge devices.

Conclusion: The proposed Neural Processing Element provides a scalable, efficient hardware solution for Edge AI, enabling low-latency inference on IoT devices while minimizing power consumption.
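The multiply-accumulate (MAC) operation at the core of the NPE described in the Methods can be sketched as a simple behavioral reference model. This is an illustrative sketch only; the function name, operand types, and the assumed 32-bit accumulator width are not taken from the paper.

```python
# Behavioral reference model of a single MAC step, as performed by the
# NPE's MAC Unit. Illustrative only: names and bit-widths are assumptions,
# not details from the paper.

def mac(accumulator: int, activation: int, weight: int, acc_bits: int = 32) -> int:
    """Multiply an activation by a weight, add the product to the running
    accumulator, and wrap at the assumed accumulator width."""
    product = activation * weight
    mask = (1 << acc_bits) - 1  # model finite register width by masking
    return (accumulator + product) & mask

# Accumulating a small dot product element by element, as the hardware
# would over successive cycles:
acc = 0
for a, w in zip([1, 2, 3], [4, 5, 6]):
    acc = mac(acc, a, w)
# acc == 1*4 + 2*5 + 3*6 == 32
```

In hardware, each call to `mac` corresponds to one cycle of the MAC pipeline; a reference model like this is a common way to check the Verilog implementation's outputs during simulation.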

How to Cite

[1] Palani, L. et al. 2025. Design and Implementation of a Neural Processing Element (NPE) Design for Edge AI Applications. International Journal of Science and Engineering Invention. 11, 10 (Oct. 2025), 141–147. DOI:https://doi.org/10.23958.300.
