Zhongzhi Yu  

I am a 4th-year Ph.D. student at Georgia Tech, advised by Prof. Yingyan (Celine) Lin. My research interests lie in efficient deep learning algorithms, with a focus on designing efficient inference and tuning methods for large-scale transformer models. Additionally, I work on enabling the effective adaptation of transformer models to downstream tasks in data-scarce scenarios.

Before starting my Ph.D. journey, I received my M.S. from Columbia University and my B.Eng. from Zhejiang University.

I am currently a research intern at Nvidia Research, where I am fortunate to work with Dr. Mark Ren and Dr. Mingjie Liu. Before that, I was a research intern at the MIT-IBM Watson AI Lab in 2021, where I was fortunate to work with Dr. Yang Zhang and Dr. Kaizhi Qian.

Email / Google Scholar / CV / LinkedIn / GitHub



Research Interests

My research interests lie in developing efficient deep learning algorithms, particularly effective inference and tuning methods that further optimize the achievable accuracy-efficiency trade-offs of large-scale transformer models. Recently, I have also been interested in designing data-efficient techniques that enable large language models to generate hardware designs, including but not limited to hardware code, with the aim of reducing the tedious manual effort involved in the hardware design process.
Selected Publications (*: Equal Contributions)
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, and Yingyan (Celine) Lin
Paper

We conduct a comprehensive exploration of the attention sink mechanism in LLMs and propose an attention calibration technique that improves their performance on the fly during inference.

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang (Katie) Zhao, and Yingyan (Celine) Lin
Paper

We develop Edge-LLM, a computation- and memory-efficient tuning framework that enables on-device tuning of LLMs.

MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, and Yingyan (Celine) Lin
Paper

We release MG-Verilog, the first-of-its-kind Verilog code generation dataset with labels of multiple granularities.

GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
Yonggan Fu*, Yongan Zhang*, Zhongzhi Yu*, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, and Yingyan (Celine) Lin
Paper

We develop GPT4AIGChip, a framework that aims to democratize AI accelerator design by leveraging natural language instead of domain-specific languages.

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modularized Learning
Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, and Yingyan (Celine) Lin
Paper

We propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability in a modularize-then-assemble manner.

Hint-Aug: Drawing Hints from Vision Foundation Models towards Boosted Few-shot Parameter-Efficient ViT Tuning
Zhongzhi Yu, Shang Wu, Yonggan Fu, Cheng Wan, Shunyao Zhang, Chaojian Li, and Yingyan (Celine) Lin
Paper

We propose Hint-based Data Augmentation (Hint-Aug), a framework dedicated to boosting the effectiveness of few-shot parameter-efficient tuning of foundation ViT models.

NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Zhongzhi Yu, Yonggan Fu, Jiayi Yuan, Haoran You, and Yingyan (Celine) Lin
Paper

We propose NetBooster, a framework that empowers tiny deep learning by augmenting the architectures of tiny neural networks (TNNs) via an expansion-then-contraction strategy.

EyeCoD: Eye Tracking System Acceleration via FlatCam-based Algorithm/Hardware Co-Design
Haoran You*, Cheng Wan*, Yang Zhao*, Zhongzhi Yu*, Yonggan Fu, Jiayi Yuan, Shang Wu, Shunyao Zhang, Yongan Zhang, Chaojian Li, Vivek Boominathan, Ashok Veeraraghavan, Ziyun Li, and Yingyan (Celine) Lin
Paper

We devise EyeCoD, a lensless FlatCam-based eye tracking algorithm and hardware accelerator co-design framework, which is the first system to meet the high-throughput requirement of eye tracking with a smaller form factor.

AugViT: Improving Vision Transformer Training by Marrying Attention and Data Augmentation
Zhongzhi Yu, Yonggan Fu, Chaojian Li, and Yingyan (Celine) Lin
Paper

We propose AugViT, a data augmentation framework dedicated to incorporating ViTs' key component, i.e., self-attention, into data augmentation intensity to enable ViTs' outstanding accuracy across various devices.

MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation
Zhongzhi Yu, Yonggan Fu, Sicheng Li, Mengquan Li, Chaojian Li, and Yingyan (Celine) Lin
Paper

We propose MIA-Former, a Multi-grained Input-Adaptive Vision Transformer framework that can input-adaptively adjust the structure of ViTs at three coarse-to-fine granularities (i.e., model depth and the number of heads/tokens).

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference
Zhongzhi Yu, Yonggan Fu, Shang Wu, Mengquan Li, Haoran You, and Yingyan Lin
Paper

We propose LDP, a Learnable Dynamic Precision DNN training framework that can automatically learn a temporally and spatially dynamic precision schedule during training toward optimal accuracy-efficiency trade-offs.

Academic Service

  • Conference Reviewer: ICLR'23, NeurIPS'23, ICML'23, CVPR'23, AAAI'22, NeurIPS'22, AICAS'22.
  • Artifact Evaluation Committee: MICRO'23.

Last Update: Sep. 2023

Website Template