
I am a Ph.D. student in the School of Electronic Information and Communications (EIC) at Huazhong University of Science and Technology, supervised by Prof. Jun Sun and Prof. Yingzhuang Liu.
Previously, I received my B.Eng. degree (2014-2018) in Electronic Information Engineering from Huazhong University of Science and Technology, where I worked with Prof. Xin Yang.
Research Interests
My research interests broadly lie in the theoretical understanding of deep learning. I am particularly fascinated by simple yet principled approaches that can shed light on the fundamental capabilities and limitations of modern models, and I aim to develop frameworks that bridge theory and practice—both explaining why existing methods work and guiding the design of future ones.
Currently, I focus on three interrelated directions: model compression and capacity, generalization, and robustness. These aspects are central to the efficiency, reliability, and interpretability of deep learning, and I am especially interested in developing theoretical tools that help us better understand the trade-offs between them.
I am open to academic collaborations; feel free to get in touch if you are interested.
Publications
- ICLR 2026 Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
Qiaozhe Zhang, Jun Sun, Ruijie Zhang, Yingzhuang Liu
The Fourteenth International Conference on Learning Representations (ICLR), 2026
@misc{zhang2025renyisharpnessnovelsharpness,
  title={R\'enyi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization},
  author={Qiaozhe Zhang and Jun Sun and Ruijie Zhang and Yingzhuang Liu},
  year={2025},
  eprint={2510.07758},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2510.07758}
}

Sharpness (of the loss minima) is widely believed to be a good indicator of the generalization of neural networks. Unfortunately, the correlation between existing sharpness measures and generalization is not as strong as expected, and contradictions sometimes even occur. To address this problem, the key observation of this paper is that what really matters for generalization is the average spread (or unevenness) of the spectrum of the loss Hessian $\mathbf{H}$. For this reason, conventional sharpness measures, such as the trace sharpness $\operatorname{tr}(\mathbf{H})$, which captures the average value of the spectrum, or the max-eigenvalue sharpness $\lambda_{\max}(\mathbf{H})$, which concerns the maximum spread of the spectrum, are not sufficient to predict generalization well. To finely characterize the average spread of the Hessian spectrum, we leverage the notion of Rényi entropy from information theory, which captures the unevenness of a probability vector and can be extended to describe the unevenness of a general non-negative vector (which is the case for the Hessian spectrum at loss minima). Specifically, we propose the Rényi sharpness, defined as the negative Rényi entropy of the loss Hessian $\mathbf{H}$. Extensive experiments demonstrate that Rényi sharpness exhibits a strong and consistent correlation with generalization in various scenarios.
Moreover, on the theoretical side, we establish two generalization bounds with respect to the Rényi sharpness by exploiting its desirable reparametrization-invariance property. Finally, as an initial attempt to exploit the Rényi sharpness for regularization, we propose the Rényi Sharpness Aware Minimization (RSAM) algorithm, which uses a variant of the Rényi sharpness as the regularizer. RSAM turns out to be competitive with state-of-the-art SAM variants, and far better than the conventional SAM algorithm based on the max-eigenvalue sharpness.
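The Rényi-sharpness idea can be sketched numerically as follows. This is an illustrative sketch only: the Rényi order alpha = 2 and the spectrum normalization used here are assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def renyi_sharpness(eigvals, alpha=2.0):
    """Negative Renyi entropy of the (normalized) Hessian spectrum.

    eigvals: non-negative Hessian eigenvalues at a loss minimum.
    alpha:   Renyi order (alpha != 1); alpha -> 1 recovers Shannon entropy.
    """
    lam = np.asarray(eigvals, dtype=float)
    lam = lam[lam > 0]                                     # keep the positive part of the spectrum
    p = lam / lam.sum()                                    # normalize to a probability vector
    h_alpha = np.log(np.sum(p ** alpha)) / (1.0 - alpha)   # Renyi entropy of order alpha
    return -h_alpha                                        # sharpness = negative entropy

# A perfectly even spectrum is maximally "flat": entropy log(n), sharpness -log(n).
flat = renyi_sharpness(np.ones(8))
# An uneven (spiky) spectrum with the same trace has lower entropy, hence larger sharpness.
spiky = renyi_sharpness([7.0, *([1.0 / 7] * 7)])
```

The point of the toy comparison is that both spectra have the same trace, so trace sharpness cannot distinguish them, while the Rényi sharpness ranks the uneven one as sharper.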
- NeurIPS 2024 How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective
Qiaozhe Zhang, Ruijie Zhang, Jun Sun, Yingzhuang Liu
The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
@article{zhang2024sparse,
  title={How sparse can we prune a deep network: A fundamental limit perspective},
  author={Zhang, Qiaozhe and Zhang, Ruijie and Sun, Jun and Liu, Yingzhuang},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={91337--91372},
  year={2024}
}

Network pruning is a commonly used technique to alleviate the storage and computational burden of deep neural networks. However, a characterization of the fundamental limit of network pruning is still lacking. To close this gap, we take a first-principles approach: we directly impose the sparsity constraint on the loss function and leverage the framework of statistical dimension in convex geometry, which enables us to characterize the sharp phase-transition point that can be regarded as the fundamental limit of the pruning ratio. Through this limit, we identify two key factors that determine the pruning-ratio limit, namely weight magnitude and network sharpness. Generally speaking, the flatter the loss landscape or the smaller the weight magnitude, the smaller the pruning ratio. We also provide efficient countermeasures for the challenges in computing the pruning limit, which mainly involve accurate spectrum estimation of a large-scale, non-positive Hessian matrix. Moreover, through the lens of the pruning-ratio threshold, we provide rigorous interpretations of several heuristics in existing pruning algorithms. Extensive experiments demonstrate that our theoretical pruning-ratio threshold coincides very well with the empirical results.
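Spectrum estimation for a large Hessian typically relies on matrix-free probing, since the Hessian is never formed explicitly. A standard building block is Hutchinson's trace estimator from Hessian-vector products; the sketch below illustrates that idea only, and is not the paper's specific spectrum-estimation method.

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(hvp, dim, n_samples=1000, rng=rng):
    """Estimate tr(H) using only Hessian-vector products.

    hvp: callable v -> H @ v (the explicit Hessian is never materialized).
    For Rademacher probes v, E[v^T H v] = tr(H), so averaging gives an
    unbiased estimate that also works for indefinite (non-positive) H.
    """
    est = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += v @ hvp(v)
    return est / n_samples

# Toy check with an explicit symmetric (and generally indefinite) matrix.
A = rng.standard_normal((50, 50))
H = (A + A.T) / 2
approx = hutchinson_trace(lambda v: H @ v, 50)
exact = np.trace(H)
```

In practice the `hvp` callable would be supplied by an autodiff framework, so the cost per probe is one extra backward pass rather than an O(dim^2) matrix product.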
- Preprint Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification
Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu, Xinggang Wang
arXiv 2023
@article{zhang2023multi,
  title={Multi-level multiple instance learning with transformer for whole slide image classification},
  author={Zhang, Ruijie and Zhang, Qiaozhe and Liu, Yingzhuang and Xin, Hao and Liu, Yan and Wang, Xinggang},
  journal={arXiv preprint arXiv:2306.05029},
  year={2023}
}

Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make employing deep learning methods for WSI-based digital diagnosis challenging. Recently, integrating multiple instance learning (MIL) and Transformers for WSI analysis has shown very promising results. However, designing effective Transformers for this weakly-supervised, high-resolution image analysis is an underexplored yet important problem. In this paper, we propose a Multi-level MIL (MMIL) scheme that introduces a hierarchical structure to MIL, enabling efficient handling of MIL tasks involving a large number of instances. Based on MMIL, we instantiate the MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks. To validate its effectiveness, we conducted a set of experiments on WSI classification tasks, where the MMIL-Transformer demonstrates superior performance compared to existing state-of-the-art methods, i.e., 96.80% test AUC and 97.67% test accuracy on the CAMELYON16 dataset, and 99.04% test AUC and 94.37% test accuracy on the TCGA-NSCLC dataset, respectively.
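The hierarchical idea behind multi-level MIL can be illustrated with a deliberately simplified two-level aggregation. This sketch uses plain mean pooling in place of the paper's windowed self-attention, so it shows the grouping structure only, not the MMIL-Transformer itself.

```python
import numpy as np

def multilevel_mil_pool(instances, group_size=4):
    """Minimal two-level MIL aggregation (illustrative, not the MMIL-Transformer).

    Level 1: pool fixed-size groups of instance features into sub-bag embeddings.
    Level 2: pool the sub-bag embeddings into a single bag embedding.
    Grouping first keeps every pooling step small, which is the efficiency
    motivation behind hierarchical MIL for slides with very many instances.
    """
    x = np.asarray(instances, dtype=float)   # (n_instances, feat_dim)
    n, d = x.shape
    pad = (-n) % group_size                  # zero-pad so n divides evenly
    if pad:
        x = np.vstack([x, np.zeros((pad, d))])
    groups = x.reshape(-1, group_size, d)    # (n_groups, group_size, feat_dim)
    subbags = groups.mean(axis=1)            # level-1 pooling: sub-bag embeddings
    return subbags.mean(axis=0)              # level-2 pooling: bag embedding (feat_dim,)

# A bag of 8 identical instances pools to that same feature vector.
bag = multilevel_mil_pool(np.ones((8, 3)), group_size=4)
```

In a Transformer instantiation, each pooling step would be replaced by attention restricted to a window (a group), so the quadratic attention cost applies only within small groups rather than across all instances.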
- ACM MM 2018 Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network
Xin Yang, Jinyu Chen, Zhiwei Wang, Qiaozhe Zhang, Wenyu Liu, Chunyuan Liao, Kwang-Ting Cheng
Proceedings of the 26th ACM international conference on Multimedia (ACM MM), 2018
@inproceedings{yang2018monocular,
  title={Monocular camera based real-time dense mapping using generative adversarial network},
  author={Yang, Xin and Chen, Jinyu and Wang, Zhiwei and Zhang, Qiaozhe and Liu, Wenyu and Liao, Chunyuan and Cheng, Kwang-Ting},
  booktitle={Proceedings of the 26th ACM international conference on Multimedia},
  pages={896--904},
  year={2018}
}

Monocular simultaneous localization and mapping (SLAM) is a key enabling technique for many computer vision and robotics applications. However, existing methods either obtain only sparse or semi-dense maps in highly-textured image areas or fail to achieve a satisfactory reconstruction accuracy. In this paper, we present a new method based on a generative adversarial network, named DM-GAN, for real-time dense mapping with a monocular camera. Specifically, our depth generator network takes a semi-dense map obtained from motion stereo matching as guidance to supervise dense depth prediction from a single RGB image. The depth generator is trained with a combination of two loss functions, i.e., an adversarial loss that enforces the generated depth maps to reside on the manifold of true depth maps, and a pixel-wise mean square error (MSE) that ensures correct absolute depth values. Extensive experiments on three public datasets demonstrate that our DM-GAN significantly outperforms the state-of-the-art methods in terms of reconstruction accuracy and depth completeness.
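The two-term generator objective described above (adversarial + pixel-wise MSE) can be sketched as follows. The weighting `lam` and the non-saturating form of the adversarial term are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def generator_loss(d_fake, depth_pred, depth_gt, lam=10.0):
    """Combined generator objective (illustrative sketch of the DM-GAN recipe).

    d_fake: discriminator scores in (0, 1) for generated depth maps.
    The adversarial term pushes predictions toward the manifold of real depth
    maps; the pixel-wise MSE term anchors absolute depth values. `lam` is an
    assumed weighting between the two terms.
    """
    eps = 1e-8
    adv = -np.mean(np.log(d_fake + eps))         # non-saturating adversarial loss
    mse = np.mean((depth_pred - depth_gt) ** 2)  # pixel-wise mean square error
    return adv + lam * mse

# When the prediction matches ground truth, only the adversarial term remains.
loss = generator_loss(np.array([0.5]), np.zeros((4, 4)), np.zeros((4, 4)))
```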
Services
Conference Reviewers
- Neural Information Processing Systems (NeurIPS) 2025
- International Conference on Learning Representations (ICLR) 2026