Wei Li (李巍)
I am Wei Li, a second-year Ph.D. student in the School of Computer Science and Technology at Harbin Institute
of Technology (Shenzhen), advised by Prof. Rui Shao.
My research interests include Computer Vision, Vision-Language Models,
and Vision-Language-Action Models.
Email: liwei2024@stu.hit.edu.cn / Google Scholar / GitHub
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
Wei Li, Renshan Zhang, Rui Shao, Jie He, Liqiang Nie
NeurIPS 2025
arXiv / GitHub
We present CogVLA, a cognition-aligned, instruction-driven Vision-Language-Action framework designed
to address the computational inefficiency and semantic fragmentation of existing VLA models.
SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation
Wei Li, Renshan Zhang, Rui Shao, Zhijian Fang, Kaiwen Zhou, Zhuotao Tian, Liqiang Nie
AAAI 2026, Oral
arXiv / GitHub
SemanticVLA formulates vision-language-action learning as a semantic selection-fusion-action
coupling process, providing an interpretable and efficient pathway from multimodal perception to robotic action.
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie
CVPR 2025
arXiv / GitHub
LION-FS adopts a two-stage optimization strategy: 1) Fast Path: Routing-Based Response Determination evaluates,
frame by frame, whether an immediate response is necessary. 2) Slow Path: Multi-granularity Keyframe Augmentation
optimizes keyframes during response generation.
Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey
Rui Shao, Wei Li, Lingsen Zhang, Renshan Zhang, Zhiyang Liu, Ran Chen, Liqiang Nie
arXiv / GitHub
This survey provides the first systematic, taxonomy-oriented review of large VLM-based VLA models for robotic manipulation.
SCOOT: Self-supervised Centric Open-set Object Tracking
Wei Li, Weiliang Meng, Bowen Li, Jiguang Zhang, Xiaopeng Zhang
SIGGRAPH Asia 2023, Poster
page
SCOOT unifies self-supervised appearance learning, textual-visual feature fusion,
and reconstruction-driven association to advance open-set object tracking.
• Harbin Institute of Technology (Shenzhen)
2024.09 - Present
Ph.D. in Computer Science and Technology, School of Computer Science and Technology
Advised by Rui Shao
• University of Chinese Academy of Sciences
2021.09 - 2024.06
M.S. in Artificial Intelligence, Institute of Automation, Chinese Academy of Sciences
Advised by Weiliang Meng
• Jilin University
2017.09 - 2021.06
B.S. in Electronic Information, School of Communication Engineering
• Conference Reviewer: CVPR, ICML, and ECCV.
Last updated: January 2026
Template by Jon Barron.