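Arm Releases the Cortex-M55 Processor and the Ethos-U55 microNPU Architecture
On February 10, 2020, Arm announced the newest member of its Cortex-M family, the Cortex-M55. Alongside the improvements of the new CPU microarchitecture, Arm also introduced a new NPU IP, the Ethos-U55, which is designed to be integrated with the M55 core. Arm's new IP aims to raise the machine-learning and inference capabilities of billions of low-power embedded devices over the coming years, and it extends Arm's portfolio to cover new use cases.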
https://www.arm.com/company/news/2020/02/new-ai-technology-from-arm
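Cortex-M55: Arm's Most AI-Capable Cortex-M Processor
The Cortex-M55 is the first processor based on the Armv8.1-M architecture and the first to implement Arm Helium technology. Helium is Arm's name for the M-Profile Vector Extension (MVE) for Cortex-M processors. MVE is an extension to the Armv8.1-M architecture that adds new 128-bit SIMD vector operations, intended to improve the performance of DSP and ML applications.
Key features: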
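To make the 128-bit vector width concrete, the sketch below models on a host how one 128-bit vector register splits into 16 8-bit, 8 16-bit, or 4 32-bit lanes, and applies a lane-wise multiply-add to all lanes in one call. It is plain Python, not MVE intrinsics or real Helium instructions; the helper names `lanes`, `wrap`, and `vector_mul_add` are hypothetical, and the wrapping (non-saturating) overflow behavior is an assumption chosen for simplicity.

```python
# Host-side model of how a 128-bit vector register is divided into lanes.
# Illustration only: no real MVE intrinsics are used, the helper names are
# hypothetical, and overflow is assumed to wrap (two's complement).

VECTOR_BITS = 128  # MVE vector width stated in the text


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit vector."""
    if VECTOR_BITS % element_bits:
        raise ValueError("element width must divide 128")
    return VECTOR_BITS // element_bits


def wrap(value: int, bits: int) -> int:
    """Wrap an integer to a signed two's-complement value of `bits` width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def vector_mul_add(acc: list[int], a: list[int], b: list[int],
                   element_bits: int) -> list[int]:
    """Lane-wise acc[i] + a[i] * b[i], wrapped to the element width."""
    n = lanes(element_bits)
    if not (len(acc) == len(a) == len(b) == n):
        raise ValueError(f"expected {n} lanes")
    return [wrap(x + y * z, element_bits) for x, y, z in zip(acc, a, b)]


for bits in (8, 16, 32):
    print(f"{bits}-bit elements: {lanes(bits)} lanes per 128-bit vector")
```

The loop prints 16, 8, and 4 lanes for 8-, 16-, and 32-bit elements, which is the same data-width trade-off that shows up in the MAC/cycle figures below.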
- Architecture – Armv8.1-M
- Bus interface – AMBA 5 AXI5 64-bit master (compatible with AXI4 IPs)
- Pipeline – 4-stage (for main integer pipeline)
- Security – Arm TrustZone technology (optional)
- DSP extension – 32-bit DSP/SIMD extension
- M-Profile Vector Extension (MVE) – Helium (optional)
- Optional Floating-point Unit (FPU)
- Coprocessor interface – 64-bit (optional)
- Instruction cache – Up to 64KB with ECC (optional)
- Data cache – Up to 64KB with ECC (optional)
- Instruction TCM (ITCM) – Up to 16MB with ECC (optional)
- Data TCM (DTCM) – Up to 16MB with ECC (optional)
- Interrupts – Up to 480 interrupts + Non-maskable interrupt (NMI)
- Wake-up Interrupt Controller (WIC) – Internal and/or external (optional)
- Multiply-accumulate (MAC) / cycle – Up to: 2 x 32-bit MACs/cycle, 4 x 16-bit MACs/cycle, 8 x 8-bit MACs/cycle
- Sleep modes – Multiple power domains, Sleep modes (sleep and deep sleep), Sleep-on-exit, Optional retention support for memories and logic
- Debug – Hardware and software breakpoints, Performance Monitoring Unit (PMU)
- Trace – Optional Instruction trace with Embedded Trace Macrocell (ETM), Data Trace (DWT) (selective data-trace), and Instrumentation Trace (ITM) (software trace)
- Arm Custom Instructions – Optional (available in 2021)
- Robustness – ECC on instruction cache, data cache, instruction TCM, data TCM (optional); Bus interface protection (optional); PMC-100 (Programmable MBIST Controller, optional); Reliability, availability, and serviceability (RAS) extension
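The MAC/cycle figures above translate directly into a lower bound on cycle counts. The sketch below is a back-of-the-envelope model rather than a performance estimate for real code: it assumes every cycle issues the peak number of MACs and ignores loads, loop overhead, and pipeline stalls. The function names and any clock frequency passed in are hypothetical.

```python
import math

# Peak MACs per cycle for the Cortex-M55 as listed above
# (element width in bits -> MACs per cycle).
PEAK_MACS_PER_CYCLE = {32: 2, 16: 4, 8: 8}


def min_cycles_for_dot(length: int, element_bits: int) -> int:
    """Lower bound on cycles for a dot product of `length` elements.

    Assumes the peak MAC rate every cycle; memory loads, loop overhead and
    pipeline effects are ignored, so real code will be slower.
    """
    per_cycle = PEAK_MACS_PER_CYCLE[element_bits]
    return math.ceil(length / per_cycle)


def peak_macs_per_second(element_bits: int, clock_hz: float) -> float:
    """Peak MAC rate at a given clock frequency (a hypothetical input)."""
    return PEAK_MACS_PER_CYCLE[element_bits] * clock_hz


n = 1024
for bits in (32, 16, 8):
    print(f"{bits}-bit: >= {min_cycles_for_dot(n, bits)} cycles for n={n}")
```

For a 1024-element dot product this gives at least 512, 256, and 128 cycles for 32-, 16-, and 8-bit data respectively, so narrower data types directly raise the achievable MAC rate.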
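Ethos-U55: Arm's First microNPU for Cortex-M
Arm describes this as the first AI processor architecture aimed at microcontrollers and low-power microprocessors, and it must be paired with a Cortex-M processor. Its processing array is scalable, from a minimum of 32 MAC engines up to a maximum of 256. Arm has not disclosed more detailed design information, but the stated design priorities are small silicon area and low power consumption.
Key features: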
- Performance (At 1 GHz) – 64 to 512 GOP/s
- MACs (8×8) – 32, 64, 128, 256
- Utilization on popular networks – Up to 85%
- Data Types – Int-8 and Int-16
- Network Support – CNN and RNN/LSTM
- Winograd Support – No
- Sparsity – Yes
- Memory System
  - Internal SRAM – 18 to 50 KB
  - External On-Chip SRAM – KB to multi-MB
  - Compression – Weights only
  - Memory Optimizations – Extended compression, layer/operator fusion
- Debug and Profile – Layer-by-layer visibility with PMUs
- Evaluation and Early Prototyping – Performance Model, Cycle Accurate Model, or FPGA evaluations
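The performance range in the list follows from the MAC counts. Assuming the common convention that one MAC counts as two operations (a multiply and an add), peak throughput is MACs per cycle × 2 × clock frequency, so 32 MACs at 1 GHz gives 64 GOP/s and 256 MACs gives 512 GOP/s. The sketch below reproduces that arithmetic and scales it by a utilization fraction; plugging in 85% simply reuses Arm's "up to" figure and is not a measured result for any specific network. The function names are hypothetical.

```python
# Ethos-U55 MAC configurations listed above (8x8-bit MACs per cycle).
MAC_CONFIGS = (32, 64, 128, 256)
OPS_PER_MAC = 2  # one multiply + one add, the usual convention for "ops"


def peak_gops(macs: int, clock_ghz: float = 1.0) -> float:
    """Peak throughput in GOP/s: MACs/cycle * 2 ops/MAC * clock in GHz."""
    return macs * OPS_PER_MAC * clock_ghz


def effective_gops(macs: int, utilization: float, clock_ghz: float = 1.0) -> float:
    """Peak throughput scaled by a utilization fraction in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_gops(macs, clock_ghz) * utilization


for macs in MAC_CONFIGS:
    print(f"{macs:3d} MACs: {peak_gops(macs):.0f} GOP/s peak, "
          f"{effective_gops(macs, 0.85):.1f} GOP/s at 85% utilization")
```

With 256 MACs, for example, this prints 512 GOP/s peak and 435.2 GOP/s at 85% utilization.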
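According to Arm, combining the two technologies can deliver up to a 480x uplift in machine-learning performance over existing Cortex-M-based embedded devices. The NPU's development framework is TensorFlow Lite Micro.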
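Because the Ethos-U55 works on Int-8 and Int-16 data and its development flow is built on TensorFlow Lite Micro, models generally need to be quantized before deployment. As an illustration, the sketch below implements the affine int8 mapping used by TensorFlow Lite's 8-bit quantization scheme, real_value = scale × (q − zero_point), in plain Python. It does not call the TensorFlow Lite API, the scale and zero point are made-up values, the function names are hypothetical, and Python's `round()` (round half to even) may differ from the rounding a given runtime uses.

```python
# Affine int8 quantization: real_value = scale * (q - zero_point).
# Plain-Python illustration; scale and zero_point below are made-up values.

INT8_MIN, INT8_MAX = -128, 127


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to int8, rounding to nearest and clamping to range."""
    q = round(x / scale) + zero_point
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value represented by an int8 code."""
    return scale * (q - zero_point)


scale, zero_point = 0.05, -10  # hypothetical tensor parameters
for x in (-1.0, 0.0, 0.37, 10.0):
    q = quantize(x, scale, zero_point)
    print(f"{x:6.2f} -> q={q:4d} -> {dequantize(q, scale, zero_point):.2f}")
```

The input 10.0 falls outside the range these parameters can represent and is clamped to 127 (about 6.85 after dequantization), which shows how the choice of scale limits the range an int8 tensor can cover.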