Speech by Dr. Tei-Wei Kuo (CTO of Delta Electronics)
Data-Centric Computing
Dr. Tei-Wei Kuo received his Bachelor's degree in Computer Science & Information Engineering from National Taiwan University (NTU) in 1986 and his Ph.D. in Computer Science from The University of Texas at Austin in 1994. He is currently the CTO of Delta Electronics (since February 2024) and a Distinguished Professor in the Department of Computer Science & Information Engineering at NTU. He previously served as the Acting President of National Taiwan University (from October 2017 to January 2019) and as Vice President for Academic Affairs (from August 2016 to January 2019). Professor Kuo was also an Adjunct/Visiting Professor and Senior Advisor at the Mohamed bin Zayed University of Artificial Intelligence (from February 2023 to January 2024), and the Lee Shau-Kee Chair Professor of Information Engineering, Advisor to the President (Information Technology), and Dean of the College of Engineering at the City University of Hong Kong (from August 2019 to July 2022). His research areas include Embedded Systems, Non-volatile Memory Software Designs, Neuromorphic Computing, and Real-time Systems.
Over two decades ago, flash memory transformed the computing industry. Since then, storage devices have made notable advances in performance, energy efficiency, and access characteristics. In recent years, their performance has increased more than 1,000-fold, creating new challenges in computer design, particularly in eliminating the traditional I/O bottleneck. During his speech, Professor Kuo introduced various solutions in neuromorphic computing that endow memory chips with new computational capabilities. He specifically addressed the challenges of application co-design for in-memory computing and demonstrated how the characteristics of non-volatile memory can be leveraged to optimize deep learning.
The lecture focused on two main strategies: using Resistive Random-Access Memory (ReRAM) and Phase-Change Memory (PCM) as optimization substrates for deep learning. It covered approaches to addressing the accuracy issues of ReRAM and the endurance issues of PCM in deep learning computations.
Professor Kuo began by discussing the use of deep neural networks (DNNs) in embedded systems in the Internet of Things (IoT) era, focusing on image and speech recognition on edge devices. To enhance the computational efficiency of DNNs, he introduced a technology called Processing in Memory (PIM), which integrates computation with the memory units, substantially reducing data movement and power consumption.
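To make the motivation concrete, here is a minimal NumPy sketch (my own illustration, not from the talk; the layer sizes are arbitrary). A fully connected DNN layer is essentially a matrix-vector multiply, and on a conventional von Neumann machine every weight must cross the memory bus for each inference; this data movement is exactly what PIM avoids by computing inside the memory array where the weights reside.

```python
import numpy as np

# A fully connected DNN layer is, at its core, a matrix-vector multiply.
# On a conventional (von Neumann) machine, every weight below must be
# fetched across the memory bus for each inference -- the data movement
# that PIM architectures avoid by computing inside the memory array.
rng = np.random.default_rng(0)

weights = rng.standard_normal((256, 784))   # hypothetical layer: 784 -> 256
x = rng.standard_normal(784)                # one input activation vector

y = weights @ x                             # 256 * 784 weight fetches on a CPU;
                                            # computed in place by a PIM array

bytes_moved = weights.nbytes                # ~1.5 MiB of weights per inference
print(f"Weights fetched per inference: {bytes_moved / 1024:.0f} KiB")
```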
In recent years, crossbar accelerators built from resistive random-access memory (ReRAM) have attracted much attention as a promising solution for IoT devices. ReRAM stores data by modulating the resistance of its cells, and a crossbar of such cells can also perform computation in place, making it highly valuable for IoT and edge applications. However, programming variation errors in ReRAM hinder its scalability in large-scale applications, especially in multi-bit ReRAM designs and crossbar scaling. Professor Kuo is dedicated to addressing these challenges through self-adaptive data manipulation strategies that reduce the analog variation errors of ReRAM crossbar accelerators. He introduced three key designs: Weight Rounding Design (WRD), Input Sub-cycle Design, and Bit-line Redundancy Design (BRD). These designs not only mitigate overlapping variation errors but also enhance inference accuracy.
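The following toy simulation is my own sketch, not Professor Kuo's WRD, Input Sub-cycle, or BRD designs; it only illustrates the underlying problem and the flavor of a redundancy-based remedy. The crossbar computes a matrix-vector product through cell conductances (Ohm's law plus Kirchhoff's current law), Gaussian noise stands in for programming variation, and averaging several independently programmed bit-lines per weight column partially cancels that variation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ReRAM crossbar: weights are stored as cell conductances, and applying
# the input as word-line voltages yields bit-line currents equal to a
# matrix-vector product.
W = rng.uniform(0.0, 1.0, size=(64, 64))   # ideal conductances (arbitrary units)
x = rng.uniform(0.0, 1.0, size=64)         # input voltages

sigma = 0.05                                # assumed programming-variation level

def program(weights):
    """Model programming variation as independent Gaussian noise per cell."""
    return weights + rng.normal(0.0, sigma, size=weights.shape)

ideal = W @ x
noisy = program(W) @ x

# Toy analogue of redundancy: map each weight column onto k bit-lines and
# average their currents, so independent variation partially cancels
# (standard deviation shrinks roughly by sqrt(k)).
k = 4
redundant = np.mean([program(W) @ x for _ in range(k)], axis=0)

def rel_err(y):
    return np.linalg.norm(y - ideal) / np.linalg.norm(ideal)

print(f"relative error, 1 bit-line per weight:  {rel_err(noisy):.4f}")
print(f"relative error, {k} bit-lines averaged: {rel_err(redundant):.4f}")
```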
Phase-change memory (PCM) is also a promising substrate for neural networks thanks to its excellent performance, high density, and near-zero leakage power. However, challenges such as limited write endurance and asymmetric read/write performance hinder its application to neural networks. Professor Kuo is exploring how to make the best use of PCM-based systems for training neural networks while maintaining accuracy. Neural network operation involves two crucial stages: training and inference. The training stage requires significant computational resources and memory capacity for operations such as backpropagation and gradient descent, while the inference stage applies the trained network to tasks such as classification. Over the past decade, researchers have addressed the computational and storage challenges by shrinking model structures and optimizing data flow and content, but research on non-volatile memory (NVM) for this purpose remains limited. Professor Kuo proposed a data-aware programming design that optimizes PCM write operations, reduces memory-access latency during training, and extends PCM's lifespan without compromising neural network accuracy. Experimental results indicated that this method significantly improved training performance and extended PCM's lifespan by up to 3.4 times while maintaining neural network accuracy.
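As a hedged illustration of the write-reduction idea, the sketch below implements the generic data-comparison write known from the PCM literature, which reads the current contents and programs only the cells whose value actually changes; the data-aware programming design from the talk is more sophisticated, and the array and sizes here are hypothetical.

```python
import numpy as np

def data_comparison_write(pcm, addr, new_data):
    """
    Generic data-comparison write: read the current contents and program
    only the cells whose value actually changes. Skipped writes save
    latency and, more importantly, PCM write endurance.
    """
    changed = pcm[addr] != new_data
    pcm[addr][changed] = new_data[changed]   # program only the modified cells
    return int(changed.sum())                # number of cells written

rng = np.random.default_rng(2)
pcm = rng.integers(0, 2, size=(1024, 64), dtype=np.uint8)  # hypothetical array

# During training, many updated values overlap with what is already stored
# (e.g., unchanged or slowly changing parameters), so most cell writes can
# be skipped.
update = pcm[0].copy()
update[:8] ^= 1                              # only 8 of 64 bits change
written = data_comparison_write(pcm, 0, update)
print(f"cells programmed: {written} of {update.size}")
```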
In closing, Professor Kuo emphasized the significant impact of memory performance on neural network computations. During the Q&A session that followed, he offered valuable insights in response to questions from the audience. I am very grateful for the opportunity to learn from the invaluable research experience shared by Professor Tei-Wei Kuo of National Taiwan University.