Siyu Xu is a final-year M.Phil student at The University of Sydney, supervised by A/Prof. Chang Xu. His research focuses on large multimodal models (LMMs), particularly Vision-Language-Action (VLA) models. His goal is to develop efficient, general-purpose robotic systems that understand and interact with the world through vision, language, and action, so that they can assist humans in real-world tasks and everyday life.

🔥 News

  • 2025.01: 🎉🎉 One paper accepted to NAACL 2025 Findings.

📝 Publications

arXiv

VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation

Siyu Xu, Yunke Wang, Chenghao Xia, Dihao Zhu, Tao Huang, Chang Xu

[Under review] [Project]

NAACL 2025

CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V

Siyu Xu, Yunke Wang, Daochang Liu, Bo Du, Chang Xu

[Project] [Code]

Conference of the Nations of the Americas Chapter of the ACL (NAACL), 2025

🎖 Honors and Awards

  • 2022.05 Second Class Prize - ASC22 Student Supercomputer Challenge, Asia Supercomputer Community
  • 2020.10 First Place Award - Pre-training for Video Captioning Challenge, ACM International Conference on Multimedia
  • 2020.10 7th Place - Artificial Intelligence Competition for Video Generation Challenge, Zhejiang Lab
  • 2020.05 9th Place - International Audio and Video Algorithm Optimization Competition, Mongo Media

🧑‍🏫 Teaching

  • 2024 S1, Tutor of COMP5329, Deep Learning, USYD
  • 2023 S2, Tutor of COMP5328, Advanced Machine Learning, USYD

📖 Education

  • 2023.07 - 2025.06, M.Phil in Computer Vision, University of Sydney
  • 2022.08 - 2023.06, Master of Information Technology, University of Sydney

💻 Internships