About

I am a Ph.D. student in the School of Electronic Information and Communications (EIC) at Huazhong University of Science and Technology, supervised by Prof. Jun Sun and Prof. Yingzhuang Liu.

Previously, I received my B.Eng. degree (2014-2018) in Electronic Information Engineering from Huazhong University of Science and Technology, where I worked with Prof. Xin Yang.

Research Interests

My research interests broadly lie in the theoretical understanding of deep learning. I am particularly fascinated by simple yet principled approaches that can shed light on the fundamental capabilities and limitations of modern models, and I aim to develop frameworks that bridge theory and practice—both explaining why existing methods work and guiding the design of future ones.

These days, I am mostly drawn to three interrelated directions: model compression and capacity, generalization, and robustness. These aspects are central to the efficiency, reliability, and interpretability of deep learning, and I am especially interested in developing theoretical tools that help us better understand the trade-offs between them.

I am open to academic collaborations; feel free to get in touch if you are interested.

Publications

  • Preprint Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization
    Qiaozhe Zhang, Jun Sun, Ruijie Zhang, Yingzhuang Liu
    arXiv 2025
    pdf code
    @misc{zhang2025renyisharpnessnovelsharpness,
        title={R\'enyi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization}, 
        author={Qiaozhe Zhang and Jun Sun and Ruijie Zhang and Yingzhuang Liu},
        year={2025},
        eprint={2510.07758},
        archivePrefix={arXiv},
        primaryClass={cs.LG},
        url={https://arxiv.org/abs/2510.07758}, 
    }
    Sharpness (of the loss minima) is a common measure to investigate the generalization of neural networks. Intuitively speaking, the flatter the landscape near the minima is, the better the generalization might be. Unfortunately, the correlation between many existing sharpness measures and generalization is usually not strong, sometimes even weak. To close the gap between the intuition and the reality, we propose a novel sharpness measure, i.e., Rényi sharpness, which is defined as the negative Rényi entropy (a generalization of the classical Shannon entropy) of the loss Hessian. The main ideas are as follows: 1) we realize that uniform (identical) eigenvalues of the loss Hessian are most desirable (while keeping the sum constant) to achieve good generalization; 2) we employ the Rényi entropy to concisely characterize the extent of the spread of the eigenvalues of the loss Hessian. Normally, the larger the spread, the smaller the (Rényi) entropy. To rigorously establish the relationship between generalization and (Rényi) sharpness, we provide several generalization bounds in terms of Rényi sharpness, by taking advantage of the reparametrization invariance property of Rényi sharpness, as well as the trick of translating the data discrepancy into a weight perturbation. Furthermore, extensive experiments are conducted to verify the strong correlation (specifically, Kendall rank correlation) between Rényi sharpness and generalization. Moreover, we propose to use a variant of Rényi sharpness as a regularizer during training, i.e., Rényi Sharpness Aware Minimization (RSAM), which turns out to outperform all existing sharpness-aware minimization methods. It is worth noting that the test accuracy gain of our proposed RSAM method can be as high as nearly 2.5% compared against the classical SAM method.
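    The entropy side of this definition is easy to illustrate numerically. Below is a minimal sketch (not the paper's implementation), assuming a positive-semidefinite Hessian spectrum and entropy order α = 2; the function name and toy spectra are hypothetical. A uniform spectrum maximizes the entropy, hence minimizes the (Rényi) sharpness:

    ```python
    import numpy as np

    def renyi_entropy(eigvals, alpha=2.0):
        # Normalize the eigenvalues into a probability distribution
        # (assumes a nonnegative spectrum with positive sum)
        p = eigvals / eigvals.sum()
        # Rényi entropy of order alpha: (1 / (1 - alpha)) * log(sum_i p_i^alpha)
        return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

    uniform = np.ones(4)                       # identical eigenvalues
    spread = np.array([3.7, 0.1, 0.1, 0.1])    # same sum, concentrated spectrum

    # The more spread-out spectrum has lower entropy, i.e. higher sharpness
    print(renyi_entropy(uniform) > renyi_entropy(spread))  # True
    ```
    
    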
  • NeurIPS 2024 How Sparse Can We Prune A Deep Network: A Fundamental Limit Perspective
    Qiaozhe Zhang, Ruijie Zhang, Jun Sun, Yingzhuang Liu
    The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
    pdf code NeurIPS
    @article{zhang2024sparse,
        title={How sparse can we prune a deep network: A fundamental limit perspective},
        author={Zhang, Qiaozhe and Zhang, Ruijie and Sun, Jun and Liu, Yingzhuang},
        journal={Advances in Neural Information Processing Systems},
        volume={37},
        pages={91337--91372},
        year={2024}
    }
    Network pruning is a commonly used measure to alleviate the storage and computational burden of deep neural networks. However, a characterization of the fundamental limit of network pruning has been lacking. To close the gap, in this work we take a first-principles approach: we directly impose the sparsity constraint on the loss function and leverage the framework of statistical dimension in convex geometry, which enables us to characterize the sharp phase transition point that can be regarded as the fundamental limit of the pruning ratio. Through this limit, we identify two key factors that determine the pruning ratio limit, namely, weight magnitude and network sharpness. Generally speaking, the flatter the loss landscape or the smaller the weight magnitude, the smaller the achievable pruning ratio. Moreover, we provide efficient countermeasures to address the challenges in computing the pruning limit, which mainly involve the accurate spectrum estimation of a large-scale, non-positive-definite Hessian matrix. In addition, through the lens of the pruning ratio threshold, we provide rigorous interpretations of several heuristics in existing pruning algorithms. Extensive experiments demonstrate that our theoretical pruning ratio threshold coincides very well with the empirical results.
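    Of the two factors above, weight magnitude is the one most practical pruning heuristics act on directly. As a rough illustration, here is a generic magnitude-pruning sketch (one of the heuristics the paper interprets, not the paper's phase-transition computation; `magnitude_prune` is a hypothetical helper):

    ```python
    import numpy as np

    def magnitude_prune(weights, ratio):
        # Zero out roughly the `ratio` fraction of weights with smallest magnitude
        flat = np.abs(weights).ravel()
        k = int(ratio * flat.size)
        if k == 0:
            return weights.copy()
        # The k-th smallest magnitude serves as the pruning threshold
        # (ties at the threshold may prune a few extra weights)
        threshold = np.partition(flat, k - 1)[k - 1]
        return np.where(np.abs(weights) <= threshold, 0.0, weights)

    w = np.array([0.1, -0.5, 2.0, -0.05])
    print(magnitude_prune(w, 0.5))  # the two smallest-magnitude weights are zeroed
    ```
    
    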
  • Preprint Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification
    Ruijie Zhang, Qiaozhe Zhang, Yingzhuang Liu, Hao Xin, Yan Liu, Xinggang Wang
    arXiv 2023
    pdf code
    @article{zhang2023multi,
          title={Multi-level multiple instance learning with transformer for whole slide image classification},
          author={Zhang, Ruijie and Zhang, Qiaozhe and Liu, Yingzhuang and Xin, Hao and Liu, Yan and Wang, Xinggang},
          journal={arXiv preprint arXiv:2306.05029},
          year={2023}
    }
    Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make employing deep learning methods for WSI-based digital diagnosis challenging. Recently, integrating multiple instance learning (MIL) and Transformers for WSI analysis has shown very promising results. However, designing effective Transformers for this weakly-supervised high-resolution image analysis is an underexplored yet important problem. In this paper, we propose a Multi-level MIL (MMIL) scheme by introducing a hierarchical structure to MIL, which enables efficient handling of MIL tasks involving a large number of instances. Based on MMIL, we instantiated MMIL-Transformer, an efficient Transformer model with windowed exact self-attention for large-scale MIL tasks. To validate its effectiveness, we conducted a set of experiments on WSI classification tasks, where MMIL-Transformer demonstrates superior performance compared to existing state-of-the-art methods, i.e., 96.80% test AUC and 97.67% test accuracy on the CAMELYON16 dataset, and 99.04% test AUC and 94.37% test accuracy on the TCGA-NSCLC dataset, respectively.
  • ACM MM 2018 Monocular Camera Based Real-Time Dense Mapping Using Generative Adversarial Network
    Xin Yang, Jinyu Chen, Zhiwei Wang, Qiaozhe Zhang, Wenyu Liu, Chunyuan Liao, Kwang-Ting Cheng
    Proceedings of the 26th ACM international conference on Multimedia (ACM MM), 2018
    pdf
    @inproceedings{yang2018monocular,
          title={Monocular camera based real-time dense mapping using generative adversarial network},
          author={Yang, Xin and Chen, Jinyu and Wang, Zhiwei and Zhang, Qiaozhe and Liu, Wenyu and Liao, Chunyuan and Cheng, Kwang-Ting},
          booktitle={Proceedings of the 26th ACM international conference on Multimedia},
          pages={896--904},
          year={2018}
    }
    Monocular simultaneous localization and mapping (SLAM) is a key enabling technique for many computer vision and robotics applications. However, existing methods either can obtain only sparse or semi-dense maps in highly-textured image areas or fail to achieve a satisfactory reconstruction accuracy. In this paper, we present a new method based on a generative adversarial network, named DM-GAN, for real-time dense mapping based on a monocular camera. Specifically, our depth generator network takes a semi-dense map obtained from motion stereo matching as a guidance to supervise dense depth prediction of a single RGB image. The depth generator is trained based on a combination of two loss functions, i.e., an adversarial loss for enforcing the generated depth maps to reside on the manifold of the true depth maps and a pixel-wise mean square error (MSE) for ensuring the correct absolute depth values. Extensive experiments on three public datasets demonstrate that our DM-GAN significantly outperforms the state-of-the-art methods in terms of reconstruction accuracy and depth completeness.
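    The two-term generator objective described above can be sketched as follows; the weighting `lam` and the non-saturating form of the adversarial term are assumptions for illustration, not the paper's exact formulation:

    ```python
    import numpy as np

    def generator_loss(pred_depth, true_depth, disc_score_on_fake, lam=0.01):
        # Pixel-wise MSE anchors the absolute depth values
        mse = np.mean((pred_depth - true_depth) ** 2)
        # Adversarial term: a higher discriminator score on the generated depth
        # lowers the loss, pushing predictions toward the manifold of true depth maps
        adv = -np.log(disc_score_on_fake + 1e-8)
        return mse + lam * adv

    pred = np.zeros((2, 2))
    true = np.ones((2, 2))
    print(generator_loss(pred, true, disc_score_on_fake=0.5))
    ```
    
    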

Services

Conference Reviewers

  • Neural Information Processing Systems (NeurIPS) 2025
  • International Conference on Learning Representations (ICLR) 2026