Papers that have not been peer-reviewed, including preprints, do not appear here. You can access all of my papers at 🔗Google Scholar.
Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Benyou Wang† († corresponding author)
arXiv 2024 Conference
We introduce HuatuoGPT-o1, a medical LLM designed for advanced medical reasoning. It identifies mistakes, explores alternative strategies, and refines its answers by leveraging a specialized medical verifier. The model improves its reasoning in two stages: it uses the verifier to guide the search for complex reasoning trajectories used in fine-tuning, and it then applies reinforcement learning (PPO) with verifier-based rewards.
Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang† († corresponding author)
arXiv 2024 Conference
We introduce Med-MAT, a VQA dataset comprising 106 open-source medical datasets designed to advance generalization experiments and support the training of powerful medical multimodal large language models (MLLMs). This dataset highlights Compositional Generalization (CG) as a key mechanism, enabling MLLMs to better understand unseen images and achieve more data-efficient training.
Yaofei Duan, Patrick Cheong-Iao Pang, Ping He, Rongsheng Wang, Yue Sun, Chuntao Liu, Xiaorong Zhang, Xirong Yuan, Pengjie Song, Chan-Tong Lam, Ligang Cui, Tao Tan† († corresponding author)
IEEE Journal of Biomedical and Health Informatics 2024 Journal
This study introduces "Multi-modal Multi-task Network" (3MT-Net), a deep learning architecture using clinical data, B-mode, and color Doppler ultrasound. 3MT-Net employs AM-CapsNet for tumor feature extraction, cross-attention for data fusion, and ensemble learning for optimization. Extensive testing on two datasets showed 3MT-Net outperforms the industrial-grade CAD product S-detect, achieving higher AUC.
Lin Li, Rongsheng Wang, Qimin Yang, Jiexin Chen, Patrick Cheong-Iao Pang, Yapeng Wang, Ka-Hou Chan, Tao Tan, Jie Ma† († corresponding author)
RSNA’s Cutting-Edge Research 2024 Conference Oral
We introduce XrayGLM, a conversational medical visual language model that analyzes and summarizes chest X-rays, aimed at improving domain-specific expertise for radiology tasks compared to general large models.
Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan† († corresponding author)
Long-Context Foundation Models (LCFM) at ICML 2024 2024 Conference Poster
This study investigates the decline in long-context understanding for medical LLMs after domain-specific fine-tuning, conducting experiments to determine the best composition of general and medical training data to balance diagnostic knowledge with comprehensive reading abilities.
Xiaojuan Xue, Deshiwei Zhang, Chengyang Sun, Yiqiao Shi, Rongsheng Wang, Tao Tan, Peng Gao, Sujie Fan, Guangtao Zhai, Menghan Hu, Yue Wu† († corresponding author)
Computers in Biology and Medicine 2024 Journal
We introduce Xiaoqing, an LLM tailored for glaucoma and developed through comparative and experiential experiments. It serves glaucoma patients and medical research better than general and clinical AI assistants by providing more informative and readable responses to glaucoma-related questions in Chinese.
Yaofei Duan, Rongsheng Wang, Tao Tan†, Xiaoyan Jin, ChanTong Lam†, Sio-Kei Im† († corresponding author)
Interdisciplinary Nursing Research 2023 Journal
We employed a few-shot object detection strategy: we trained a Faster R-CNN detector with the mainland data set as the base class, then fine-tuned it with the few-shot approach on the Macau RDT result data set. We also introduced two novel data augmentation methods for unbalanced data sets, the “light simulation mask method” and “synthetic positive samples”, to increase the sample size and balance the data set for the RDT detection task.
Dashun Zheng, Rongsheng Wang, Yaofei Duan, Patrick Cheong-Iao Pang†, Tao Tan († corresponding author)
Visual Computing for Industry, Biomedicine, and Art 2023 Journal
We propose a lightweight Focus-RCNet using depthwise separable convolutions and a channel attention module for real-time recyclable waste classification, achieving state-of-the-art accuracy on public datasets while being compact for embedded applications through knowledge distillation.
Rongsheng Wang, Yaofei Duan, Yukun Li, Dashun Zheng, Xiaohong Liu, ChanTong Lam, Tao Tan† († corresponding author)
The Visual Computer 2023 Journal
We propose a Parallel CNNs-Transformer network with multi-scale feature context aggregation (PCTMF-Net) for electrocardiogram heart sound classification, which combines CNNs and a transformer encoder to extract hierarchical features and achieves state-of-the-art performance on publicly available datasets.
Rongsheng Wang, Yaofei Duan, ChanTong Lam, Jiexin Chen, Jiangsheng Xu, Haoming Chen, Xiaohong Liu, Patrick Cheong-Iao Pang, Tao Tan† († corresponding author)
2023 CAAI International Conference on Artificial Intelligence 2023 Conference Poster
We propose IvyGPT, a large language model trained on medical question-answering data and reinforced with human feedback. It achieves state-of-the-art performance for clinical conversational agents while keeping its more than 33 billion parameters manageable on a small GPU cluster.
Rongsheng Wang, Yaofei Duan, Menghan Hu, Xiaohong Liu, Yukun Li, Qinquan Gao, Tong Tong, Tao Tan† († corresponding author)
Displays 2023 Journal
We propose LightR-YOLOv5, a compact detector for SARS-CoV-2 antigen rapid test results that uses a lightweight feature extractor and attention modules to localize results. It outperforms other object detectors while being only 2.03 MB in size, which makes it efficient to deploy as a verification tool.
Yaofei Duan†, Rongsheng Wang, Yukun Li († corresponding author)
2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE) 2023 Conference Poster
We propose an Aux-ViT model for Alzheimer's diagnosis using MRI that adds an auxiliary branch to the Vision Transformer backbone to preserve shallow features and reduce overfitting, achieving improved accuracy over the baseline ViT model through multi-scale data preprocessing and augmentation techniques.
Hui Ning†, Rongsheng Wang, Pengwei Yang († corresponding author)
Advances in International Computer Science 2022 Journal
We propose BGANR, a recommendation model that applies bidirectional graph attention on knowledge graphs to capture symmetric relationships and uses a dynamic activation function to overcome gradient vanishing, outperforming state-of-the-art methods on benchmark datasets.
Rongsheng Wang, Yukun Li, Yaofei Duan, Tao Tan† († corresponding author)
2022 6th International Conference on Universal Village (UV) 2022 Conference
We propose EfficientNet-YOLOv5 for marine microalgae detection to address the challenges of tiny objects and imbalanced categories, achieving higher accuracy than baseline models on microscopy datasets.