Papers that have not been peer-reviewed, including preprints, do not appear here. You can access all of my papers at 🔗Google Scholar.

2024

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Benyou Wang† († corresponding author)

arXiv 2024 Conference

We introduce HuatuoGPT-o1, a medical LLM designed for advanced medical reasoning. It identifies mistakes, explores alternative strategies, and refines its answers by leveraging a specialized medical verifier. The model enhances reasoning through two key approaches: guiding complex reasoning trajectories for fine-tuning with the verifier and applying reinforcement learning (PPO) with verifier-based rewards to further improve reasoning.

Med-MAT: On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang† († corresponding author)

arXiv 2024 Conference

We introduce Med-MAT, a VQA dataset comprising 106 open-source medical datasets designed to advance generalization experiments and support the training of powerful medical multimodal large language models (MLLMs). This dataset highlights Compositional Generalization (CG) as a key mechanism, enabling MLLMs to better understand unseen images and achieve more data-efficient training.

3MT-Net: A Multi-modal Multi-task Model for Breast Cancer and Pathological Subtype Classification Based on a Multicenter Study

Yaofei Duan, Patrick Cheong-Iao Pang, Ping He, Rongsheng Wang, Yue Sun, Chuntao Liu, Xiaorong Zhang, Xirong Yuan, Pengjie Song, Chan-Tong Lam, Ligang Cui, Tao Tan† († corresponding author)

IEEE Journal of Biomedical and Health Informatics 2024 Journal

This study introduces "Multi-modal Multi-task Network" (3MT-Net), a deep learning architecture using clinical data, B-mode, and color Doppler ultrasound. 3MT-Net employs AM-CapsNet for tumor feature extraction, cross-attention for data fusion, and ensemble learning for optimization. Extensive testing on two datasets showed 3MT-Net outperforms the industrial-grade CAD product S-detect, achieving higher AUC.

XrayGLM: Summarizing Chest X-ray Reports Using a Large Medical Visual Language Model

Lin Li, Rongsheng Wang, Qimin Yang, Jiexin Chen, Patrick Cheong-Iao Pang, Yapeng Wang, Ka-Hou Chan, Tao Tan, Jie Ma† († corresponding author)

RSNA’s Cutting-Edge Research 2024 Conference Oral

We introduce XrayGLM, a conversational medical visual language model that analyzes and summarizes chest X-rays, aimed at improving domain-specific expertise for radiology tasks compared to general large models.

Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan† († corresponding author)

Long-Context Foundation Models (LCFM) at ICML 2024 2024 Conference Poster

This study investigates the decline in long-context understanding for medical LLMs after domain-specific fine-tuning, conducting experiments to determine the best composition of general and medical training data to balance diagnostic knowledge with comprehensive reading abilities.

Xiaoqing: A Q&A Model for Glaucoma Based on LLMs

Xiaojuan Xue, Deshiwei Zhang, Chengyang Sun, Yiqiao Shi, Rongsheng Wang, Tao Tan, Peng Gao, Sujie Fan, Guangtao Zhai, Menghan Hu, Yue Wu† († corresponding author)

Computers in Biology and Medicine 2024 Journal

We introduce Xiaoqing, an LLM tailored for glaucoma and developed through comparative and experiential experiments. The experiments demonstrate that it can serve glaucoma patients and medical research better than general and clinical AI assistants by providing more informative and readable responses to glaucoma-related questions in Chinese.

2023

RDT-FSDet: Few-shot Object Detection for Rapid Antigen Test

Yaofei Duan, Rongsheng Wang, Tao Tan†, Xiaoyan Jin, ChanTong Lam†, Sio-Kei Im† († corresponding author)

Interdisciplinary Nursing Research 2023 Journal

We employed a few-shot object detection strategy: we trained the Faster R-CNN detector with the mainland data set as the base class, then fine-tuned it with the few-shot approach on the Macau RDT result data set. We also introduced two novel data augmentation methods for an unbalanced data set, the “light simulation mask method” and “synthetic positive samples”, to increase the sample size and balance the data set of the RDT detection task.

Focus-RCNet: A Lightweight Recyclable Waste Classification Algorithm Based on Focus and Knowledge Distillation

Dashun Zheng, Rongsheng Wang, Yaofei Duan, Patrick Cheong-Iao Pang†, Tao Tan († corresponding author)

Visual Computing for Industry, Biomedicine, and Art 2023 Journal

We propose a lightweight Focus-RCNet using depthwise separable convolutions and a channel attention module for real-time recyclable waste classification, achieving state-of-the-art accuracy on public datasets while being compact for embedded applications through knowledge distillation.

PCTMF-Net: Heart Sound Classification with Parallel CNNs-Transformer and Second-Order Spectral Analysis

Rongsheng Wang, Yaofei Duan, Yukun Li, Dashun Zheng, Xiaohong Liu, ChanTong Lam, Tao Tan† († corresponding author)

The Visual Computer 2023 Journal

We propose a Parallel CNNs-Transformer network with multi-scale feature context aggregation (PCTMF-Net) for heart sound classification. It combines CNNs and a transformer encoder to extract hierarchical features and achieves state-of-the-art performance on publicly available datasets.

IvyGPT: Interactive Chinese Pathway Language Model in the Medical Domain

Rongsheng Wang, Yaofei Duan, ChanTong Lam, Jiexin Chen, Jiangsheng Xu, Haoming Chen, Xiaohong Liu, Patrick Cheong-Iao Pang, Tao Tan† († corresponding author)

2023 CAAI International Conference on Artificial Intelligence 2023 Conference Poster

We propose IvyGPT, a large language model trained on medical question-answering data and reinforced with human feedback. It achieves state-of-the-art performance for clinical conversational agents, and its more than 33 billion parameters remain manageable on a small GPU cluster.

LightR-YOLOv5: A Compact Rotating Detector for SARS-CoV-2 Antigen-Detection Rapid Diagnostic Test Results

Rongsheng Wang, Yaofei Duan, Menghan Hu, Xiaohong Liu, Yukun Li, Qinquan Gao, Tong Tong, Tao Tan† († corresponding author)

Displays 2023 Journal

We propose LightR-YOLOv5, a compact detector for SARS-CoV-2 antigen rapid test results that uses a lightweight feature extractor and attention modules to localize results. It outperforms other object detectors while being only 2.03 MB in size, which allows efficient deployment as a verification tool.

Aux-ViT: Classification of Alzheimer's Disease from MRI based on Vision Transformer with Auxiliary Branch

Yaofei Duan†, Rongsheng Wang, Yukun Li († corresponding author)

2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE) 2023 Conference Poster

We propose an Aux-ViT model for Alzheimer's diagnosis using MRI that adds an auxiliary branch to the Vision Transformer backbone to preserve shallow features and reduce overfitting, achieving improved accuracy over the baseline ViT model through multi-scale data preprocessing and augmentation techniques.

2022

A Neural Network Recommender Algorithm Based on Bidirectional Graph Attention

Hui Ning†, Rongsheng Wang, Pengwei Yang († corresponding author)

Advances in International Computer Science 2022 Journal

We propose BGANR, a recommendation model that applies bidirectional graph attention on knowledge graphs to capture symmetric relationships and uses a dynamic activation function to overcome gradient vanishing, outperforming state-of-the-art methods on benchmark datasets.

EfficientNet-YOLOv5: Improved YOLOv5 Based on EfficientNet Backbone for Object Detection on Marine Microalgae

Rongsheng Wang, Yukun Li, Yaofei Duan, Tao Tan† († corresponding author)

2022 6th International Conference on Universal Village (UV) 2022 Conference

We propose EfficientNet-YOLOv5 for marine microalgae detection to address the challenges of tiny objects and imbalanced categories, achieving improved accuracy over baseline models on microscopy datasets.
