MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Research Scientist @ NVIDIA
I design efficient learning algorithms for large language models, with a focus on inference calibration, adaptive tuning, and human-in-the-loop hardware design. My work bridges LLM foundations with practical deployment on data- and compute-constrained platforms.
Currently, I'm exploring on-the-fly inference upgrades for foundation models and co-designing AI accelerators with LLM assistance.
Recent milestones, publications, and recognitions.
Received the Best Paper Award at the inaugural IEEE LAD 2024 workshop on LLM-Aided Design.
Our attention calibration framework for LLMs has been accepted to ICML 2024.
Layer-wise compression and adaptive tuning techniques for on-device LLM adaptation accepted by DAC 2024.
Our LLM-driven accelerator design framework has been accepted by ICCAD 2023.
Won 2nd place in the University Demo Best Demonstration Award at DAC 2023.
Modularized multilingual ASR system accepted by ICML 2023.
Few-shot ViT tuning framework accepted by CVPR 2023.
Efficiency boosting framework for tiny neural networks accepted by DAC 2023.
I study how to make large language models adaptable, efficient, and dependable in real-world settings. My research spans three pillars: inference-time calibration of LLMs, adaptive tuning for data- and compute-constrained deployment, and human-in-the-loop, LLM-assisted hardware design.
Previously, I earned my Ph.D. in Computer Science from Georgia Tech, advised by Prof. Yingyan (Celine) Lin. I also hold an M.S. from Columbia University and a B.Eng. from Zhejiang University, and I have collaborated with MIT-IBM Watson AI Lab.
* denotes equal contribution
A systematic study of attention sink behaviors in LLMs paired with an inference-time calibration method.
Compression-aware fine-tuning that unlocks on-device LLM adaptation with minimal hardware overhead.
The first multi-granularity Verilog dataset enabling precise LLM-guided hardware code generation.
Human-in-the-loop workflows that translate natural language into accelerator design artifacts.
A modular ASR architecture that balances multilingual performance and low-resource specialization.
Integrating attention-aware data augmentation to amplify few-shot ViT tuning effectiveness.
A blueprint for uplifting compact neural networks via expansion-then-contraction strategies.
A compact lensless eye-tracking system achieving high throughput through algorithm-hardware co-design.
An attention-aware augmentation framework that consistently lifts ViT accuracy across hardware tiers.
A multi-grained input-adaptive ViT that dynamically adjusts depth, heads, and tokens.
Temporal and spatial precision scheduling for DNN training that optimizes accuracy-efficiency trade-offs.
Reviewer: ICLR 2023, NeurIPS 2022–2023, ICML 2023, CVPR 2023, AAAI 2022, AICAS 2022.
MICRO 2023 Artifact Evaluation Committee.