Abstract
This paper presents an integrated circuit (IC) realization of a random forest (RF) machine learning classifier in a 65-nm CMOS process. Algorithm, architecture, and circuits are co-optimized to achieve aggressive energy and delay benefits by taking advantage of the inherent error resiliency derived from the ensemble nature of an RF classifier. Deterministic sub-sampling (DSS) and regularized decision trees reduce interconnect complexity and avoid irregular memory access patterns and computations, thereby reducing the energy-delay product (EDP). The prototype IC also employs low-swing analog in-memory computations embedded in a standard 6T SRAM to enable massively parallel tree node comparisons, thereby minimizing memory fetches and reducing the EDP further. The 65-nm CMOS prototype IC achieves 3.1× higher energy efficiency and 2.2× higher throughput, for a 6.8× lower EDP, than a conventional digital system at the same accuracies of 94% and 97.5% on two tasks: 1) eight-class traffic sign recognition and 2) face detection, respectively.
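The reported EDP figure follows from the two gains quoted in the abstract. The sketch below works through that arithmetic; it assumes that EDP is the product of energy per decision and delay per decision, and that delay per decision scales as the inverse of throughput. The function name `edp_improvement` is hypothetical and used only for illustration.

```python
def edp_improvement(energy_gain: float, throughput_gain: float) -> float:
    """Factor by which the energy-delay product (EDP) drops.

    Assumption: EDP = (energy per decision) * (delay per decision),
    and delay per decision is inversely proportional to throughput.
    An energy-efficiency gain of E divides energy per decision by E;
    a throughput gain of T divides delay per decision by T.
    The EDP therefore shrinks by a factor of E * T.
    """
    return energy_gain * throughput_gain


# Gains quoted in the abstract: 3.1x energy efficiency, 2.2x throughput.
ratio = edp_improvement(3.1, 2.2)
print(f"EDP reduction: {ratio:.2f}x")  # 6.82, which the abstract rounds to 6.8
```

Under these assumptions the two gains multiply, which is consistent with the 6.8× figure reported for the prototype.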
Original language | English (US) |
---|---|
Pages (from-to) | 2126-2135 |
Number of pages | 10 |
Journal | IEEE Journal of Solid-State Circuits |
Volume | 53 |
Issue number | 7 |
State | Published - Jul 2018 |
Keywords
- Accelerator
- analog processing
- in-memory computing
- machine learning (ML)
- random forest (RF)
ASJC Scopus subject areas
- Electrical and Electronic Engineering