Abstract
Objective: To design, model, verify, and synthesize a low-power Neural Processing Element tailored for Edge AI applications, focusing on efficient execution of neural network operations within resource-constrained environments.
Design: A modular hardware architecture adapted from the Arm Ethos-U microNPU, incorporating dedicated blocks for multiply-accumulate (MAC) operations, weight decoding, data buffering, and output formatting.
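To make the MAC block concrete, the following is a minimal Verilog sketch of one signed multiply-accumulate element; the module name, port list, and parameter widths are illustrative assumptions, not the interface of the design reported here.

// Minimal sketch of a signed multiply-accumulate element.
// Port names and widths are assumptions for illustration,
// not the interface of the actual design.
module mac_unit #(
    parameter DATA_W = 8,   // assumed activation/weight width
    parameter ACC_W  = 32   // assumed accumulator width
) (
    input  wire                     clk,
    input  wire                     rst_n,  // active-low reset
    input  wire                     en,     // accumulate when high
    input  wire                     clr,    // clear the running sum
    input  wire signed [DATA_W-1:0] act,    // input activation
    input  wire signed [DATA_W-1:0] wgt,    // decoded weight
    output reg  signed [ACC_W-1:0]  acc     // accumulated result
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            acc <= {ACC_W{1'b0}};
        else if (clr)
            acc <= {ACC_W{1'b0}};
        else if (en)
            acc <= acc + act * wgt;  // one multiply-accumulate step
    end
endmodule

Because each element is self-contained, many such units can be instantiated side by side, which is the parallel-instantiation scalability noted in the Results.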
Methods: The design was implemented in Verilog HDL, verified for functional correctness with Cadence Xcelium, and synthesized with Cadence Genus to evaluate area, power, and timing. A finite state machine controls data flow, and the four key blocks (MAC Unit, Weight Decoder, Shared Buffer, Output Unit) were simulated and partially integrated.
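As a sketch of the control scheme just described, the FSM below sequences a load, compute, and write-back pass across the four blocks; the state encoding and the handshake signals (start, load_done, mac_done, out_done) are hypothetical stand-ins for the actual control interface.

// Hypothetical control FSM: load operands into the Shared Buffer,
// run the MAC Unit, then let the Output Unit format and write
// results. Handshake signal names are assumptions.
module npe_ctrl (
    input  wire clk,
    input  wire rst_n,      // active-low reset
    input  wire start,      // begin a compute pass
    input  wire load_done,  // buffer loading finished
    input  wire mac_done,   // MAC computation finished
    input  wire out_done,   // output write-back finished
    output reg  load_en,    // enable buffer loading
    output reg  mac_en,     // enable MAC operation
    output reg  out_en      // enable output formatting
);
    localparam IDLE = 2'd0, LOAD = 2'd1, COMPUTE = 2'd2, WRITE = 2'd3;
    reg [1:0] state, next;

    always @(posedge clk or negedge rst_n)
        if (!rst_n) state <= IDLE;
        else        state <= next;

    always @(*) begin
        {load_en, mac_en, out_en} = 3'b000;
        next = state;
        case (state)
            IDLE:    if (start) next = LOAD;
            LOAD:    begin load_en = 1'b1; if (load_done) next = COMPUTE; end
            COMPUTE: begin mac_en  = 1'b1; if (mac_done)  next = WRITE;   end
            WRITE:   begin out_en  = 1'b1; if (out_done)  next = IDLE;    end
        endcase
    end
endmodule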
Results: Simulations confirmed the correct functionality of the implemented blocks, including accurate multiply-accumulate arithmetic, weight decoding, data storage and retrieval, and output formatting. The architecture scales through parallel instantiation of processing elements, reduces memory accesses, and is well suited to low-power edge devices.
Conclusion: The proposed Neural Processing Element provides a scalable, efficient hardware solution for Edge AI, enabling low-latency inference on IoT devices while minimizing power consumption.