Rongsheng Wang

M.S., Macao Polytechnic University (2024)

I am Rongsheng Wang. I studied at Macao Polytechnic University (MPU) under the supervision of Prof. Tao Tan. My research interests include computer vision, LLMs, trustworthy LLMs, RAG, and agent systems. I am also interested in research on generative data. I am currently seeking a PhD opportunity.
I publish many open-source projects on 🔗GitHub. I share some of the datasets and model weights used by these projects on 🔗HuggingFace, and I also like to release small-parameter language models on 🔗Ollama so that people can run them locally.


Education
  • Macao Polytechnic University

    M.S. in Big Data and Internet of Things Sep. 2022 - Jul. 2024

  • Henan Polytechnic University

    B.S. in Computer Science (AI) Sep. 2018 - Jul. 2022

Honors & Awards
  • Outstanding Award of JingDong Health - Global AI Innovation Competition 2024
  • First Prize of Baidu PaddlePaddle AGI Hackathon 2024
  • Third Prize of Digital Medical Technology and Application Innovation Competition 2023
  • Third Prize of Baichuan Intelligence and Amazon Cloud AGI Hackathon 2023
  • Silver Medal of Kaggle RSNA Screening Mammography Breast Cancer Detection 2023
  • Outstanding Award of IEEE UV 2022 "Vision Meets Algae" Object Detection Challenge 2022
  • Baidu PaddlePaddle Developers Experts 2021
Experience
  • Qiyuan.Tech

    CTO, Researcher Oct. 2023 - Present

  • HKUST (GZ)

    Research Assistant (Supervisors: Yun Bai and Xiang Liu) Feb. 2024 - May 2024

News
2025
We have launched a blog website to introduce our latest work. Welcome to visit! Visit My Blog
Jan 16
Selected Publications (view all)
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Junying Chen, Zhenyang Cai, Ke Ji, Xidong Wang, Wanlong Liu, Rongsheng Wang, Benyou Wang†(† corresponding author)

arXiv 2024 Conference

We introduce HuatuoGPT-o1, a medical LLM designed for advanced medical reasoning. It identifies mistakes, explores alternative strategies, and refines its answers by leveraging a specialized medical verifier. The model enhances reasoning through two key approaches: guiding complex reasoning trajectories for fine-tuning with the verifier and applying reinforcement learning (PPO) with verifier-based rewards to further improve reasoning.

Med-MAT: On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Zhenyang Cai, Junying Chen, Rongsheng Wang, Weihong Wang, Yonglin Deng, Dingjie Song, Yize Chen, Zixu Zhang, Benyou Wang†(† corresponding author)

arXiv 2024 Conference

We introduce Med-MAT, a VQA dataset comprising 106 open-source medical datasets designed to advance generalization experiments and support the training of powerful medical multimodal large language models (MLLMs). This dataset highlights Compositional Generalization (CG) as a key mechanism, enabling MLLMs to better understand unseen images and achieve more data-efficient training.

3MT-Net: A Multi-modal Multi-task Model for Breast Cancer and Pathological Subtype Classification Based on a Multicenter Study

Yaofei Duan, Patrick Cheong-Iao Pang, Ping He, Rongsheng Wang, Yue Sun, Chuntao Liu, Xiaorong Zhang, Xirong Yuan, Pengjie Song, Chan-Tong Lam, Ligang Cui, Tao Tan†(† corresponding author)

IEEE Journal of Biomedical and Health Informatics 2024 Journal

This study introduces "Multi-modal Multi-task Network" (3MT-Net), a deep learning architecture using clinical data, B-mode, and color Doppler ultrasound. 3MT-Net employs AM-CapsNet for tumor feature extraction, cross-attention for data fusion, and ensemble learning for optimization. Extensive testing on two datasets showed 3MT-Net outperforms the industrial-grade CAD product S-detect, achieving higher AUC.

XrayGLM: Summarizing Chest X-ray Reports Using a Large Medical Visual Language Model

Lin Li, Rongsheng Wang, Qimin Yang, Jiexin Chen, Patrick Cheong-Iao Pang, Yapeng Wang, Ka-Hou Chan, Tao Tan, Jie Ma†(† corresponding author)

RSNA’s Cutting-Edge Research 2024 Conference, Oral

We introduce XrayGLM, a conversational medical visual language model that analyzes and summarizes chest X-rays, aimed at improving domain-specific expertise for radiology tasks compared to general large models.

Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan†(† corresponding author)

Long-Context Foundation Models (LCFM) at ICML 2024 Conference, Poster

This study investigates the decline in long-context understanding for medical LLMs after domain-specific fine-tuning, conducting experiments to determine the best composition of general and medical training data to balance diagnostic knowledge with comprehensive reading abilities.

Xiaoqing: A Q&A Model for Glaucoma Based on LLMs

Xiaojuan Xue, Deshiwei Zhang, Chengyang Sun, Yiqiao Shi, Rongsheng Wang, Tao Tan, Peng Gao, Sujie Fan, Guangtao Zhai, Menghan Hu, Yue Wu†(† corresponding author)

Computers in Biology and Medicine 2024 Journal

We introduce Xiaoqing, an LLM model tailored for glaucoma developed through comparative and experiential experiments, demonstrating it can better serve glaucoma patients and medical research compared to general and clinical AI assistants by providing more informative and readable responses to glaucoma-related questions in Chinese.

PCTMF-Net: Heart Sound Classification with Parallel CNNs-Transformer and Second-Order Spectral Analysis

Rongsheng Wang, Yaofei Duan, Yukun Li, Dashun Zheng, Xiaohong Liu, Chan-Tong Lam, Tao Tan†(† corresponding author)

The Visual Computer 2023 Journal

We propose a Parallel CNNs-Transformer network with multi-scale feature context aggregation (PCTMF-Net) for heart sound classification, which combines CNNs and a Transformer encoder to extract hierarchical features and achieves state-of-the-art performance on publicly available datasets.

All publications