News & Highlights

Recent milestones, publications, and recognitions.

Feb 2024 Paper

EDGE-LLM

Layer-wise compression and adaptive tuning techniques for on-device LLM adaptation accepted by DAC 2024.

Jul 2023 Paper

GPT4AIGChip

LLM-driven accelerator design automation framework accepted by ICCAD 2023.

Jul 2023 Award

Gen-NeRF Demo

Won 2nd place in the Best Demonstration Award at the DAC 2023 University Demo.

Apr 2023 Paper

Master-ASR

Modularized multilingual ASR system accepted by ICML 2023.

Feb 2023 Paper

Hint-Aug

Few-shot ViT tuning framework accepted by CVPR 2023.

Feb 2023 Paper

NetBooster

Efficiency boosting framework for tiny neural networks accepted by DAC 2023.

Research Focus

I study how to make large language models adaptable, efficient, and dependable in real-world settings. My research spans three pillars:

  • Inference Calibration: Improving LLM output quality without additional training via lightweight attention calibration and dynamic evaluation strategies (see the sketch after this list).
  • Adaptive Tuning: Designing compression-aware fine-tuning and voting schemes so that powerful LLMs can be adapted and served directly on edge devices.
  • LLM-for-Hardware: Co-designing silicon and software by steering LLMs to generate and verify hardware designs through human-in-the-loop workflows.
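
To make the first pillar concrete, here is a minimal, hypothetical sketch of inference-time attention calibration: it simply down-weights the attention mass absorbed by an assumed "sink" position and renormalizes each row. It conveys the general idea only and is not the method from the paper listed below.

    import torch

    def calibrate_attention(attn: torch.Tensor, sink_index: int = 0, damping: float = 0.5) -> torch.Tensor:
        """Toy inference-time calibration (illustrative only, not the published method).

        attn: softmax-normalized attention weights of shape (batch, heads, queries, keys).
        Shrinks the column belonging to the assumed "sink" token, then renormalizes each
        row so the remaining tokens receive a larger share of attention.
        """
        calibrated = attn.clone()
        calibrated[..., sink_index] *= damping                          # reduce the sink token's share
        calibrated = calibrated / calibrated.sum(dim=-1, keepdim=True)  # rows sum to 1 again
        return calibrated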

Previously, I earned my Ph.D. in Computer Science from Georgia Tech, advised by Prof. Yingyan (Celine) Lin. I also hold an M.S. from Columbia University and a B.Eng. from Zhejiang University, and I have collaborated with the MIT-IBM Watson AI Lab.

Selected Publications

* denotes equal contribution

Visualization for attention calibration technique

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan (Celine) Lin

A systematic study of attention sink behaviors in LLMs paired with an inference-time calibration method.

Paper
Visualization for EDGE-LLM

EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting

Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reddy Bommu, Yang (Katie) Zhao, Yingyan (Celine) Lin

Compression-aware fine-tuning that unlocks on-device LLM adaptation with minimal hardware overhead.

Paper
Visualization for MG-Verilog dataset

MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, Yingyan (Celine) Lin

The first multi-granularity Verilog dataset enabling precise LLM-guided hardware code generation.

Paper
Visualization for GPT4AIGChip framework

GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

Yonggan Fu*, Yongan Zhang*, Zhongzhi Yu*, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, Yingyan (Celine) Lin

Human-in-the-loop workflows that translate natural language into accelerator design artifacts.

Paper
Visualization for Master-ASR framework

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modularized Learning

Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Cheng Wan, Yonggan Fu, Yongan Zhang, Yingyan (Celine) Lin

A modular ASR architecture that balances multilingual performance and low-resource specialization.

Paper
Visualization for Hint-Aug

Hint-Aug: Drawing Hints from Vision Foundation Models towards Boosted Few-shot Parameter-Efficient ViT Tuning

Zhongzhi Yu, Shang Wu, Yonggan Fu, Cheng Wan, Shunyao Zhang, Chaojian Li, Yingyan (Celine) Lin

Integrating attention-aware data augmentation to amplify few-shot ViT tuning effectiveness.

Paper
Visualization for NetBooster

NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants

Zhongzhi Yu, Yonggan Fu, Jiayi Yuan, Haoran You, Yingyan (Celine) Lin

A blueprint for uplifting compact neural networks via expansion-then-contraction strategies.

Paper
Visualization for EyeCoD system

EyeCoD: Eye Tracking System Acceleration via FlatCam-based Algorithm/Hardware Co-Design

Haoran You*, Cheng Wan*, Yang Zhao*, Zhongzhi Yu*, Yonggan Fu, Jiayi Yuan, Shang Wu, Shunyao Zhang, Yongan Zhang, Chaojian Li, Vivek Boominathan, Ashok Veeraraghavan, Ziyun Li, Yingyan (Celine) Lin

A compact lensless eye-tracking system achieving high throughput through algorithm-hardware co-design.

Paper
Visualization for AugViT

AugViT: Improving Vision Transformer Training by Marrying Attention and Data Augmentation

Zhongzhi Yu, Yonggan Fu, Chaojian Li, Yingyan (Celine) Lin

An attention-aware data augmentation framework that consistently improves ViT training accuracy.

Paper
Visualization for MIA-Former

MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation

Zhongzhi Yu, Yonggan Fu, Sicheng Li, Mengquan Li, Chaojian Li, Yingyan (Celine) Lin

A multi-grained input-adaptive ViT that dynamically adjusts depth, heads, and tokens.

Paper
Visualization for LDP framework

LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference

Zhongzhi Yu, Yonggan Fu, Shang Wu, Mengquan Li, Haoran You, Yingyan (Celine) Lin

Temporal and spatial precision scheduling for DNN training and inference that optimizes accuracy-efficiency trade-offs.

Paper

Academic Service

Conference Reviewer

ICLR 2023, NeurIPS 2022–2023, ICML 2023, CVPR 2023, AAAI 2022, AICAS 2022.

Artifact Evaluation

MICRO 2023 Artifact Evaluation Committee.