Embodied Vision, Language, and Action Models for Consumer Applications

#ServiceRobot #ElderlyAssistance #VLAFoundationModel #GeneralizableRobots #HumanRobotInteraction #QualityOfLife

The IEEE CTSoc is pleased to invite all interested attendees to this live Distinguished Lecture session.


Abstract

We are developing a versatile and efficient service robot designed to assist the elderly and individuals in need with their daily tasks. The robot can perform actions such as picking up a water cup and opening doors, with more advanced interactions planned for the future. By leveraging our Vision-Language-Action (VLA) foundation model, we have achieved promising results in manipulation tasks, demonstrating its effectiveness in handling everyday objects. A key innovation in our approach is generalizable robot manipulation: once pre-trained, our robot foundation model can be adapted to new objects, environments, and robot platforms using few-shot learning techniques. This capability allows the robot to learn quickly and adapt to its surroundings, enhancing its utility and effectiveness in real-world scenarios. By integrating large action foundation models, we aim to create a service robot that not only performs tasks efficiently but also interacts meaningfully with people, ultimately improving their quality of life.



  Date and Time

  Location

  Hosts

  Registration



  • Date: 29 Nov 2024
  • Time: 04:00 PM to 05:00 PM
  • All times are (UTC+08:00) Beijing
  • Add Event to Calendar
  • For more information about the IEEE CTSoc DL Program or Education Activities, contact: mahsa.pourazad@gmail.com

  • Survey: Fill out the survey
  • Starts 22 October 2024 12:00 AM
  • Ends 29 November 2024 04:00 PM
  • All times are (UTC+08:00) Beijing
  • No Admission Charge


  Speakers

Dr. Jianlong Fu, Principal Research Manager, Microsoft Research Asia

Topic:

Embodied Vision, Language, and Action Models for Consumer Applications

For a copy of the flyer, click here: https://drive.google.com/file/d/1SNSkbwG8OS6RYCW-8Ypr5vGJxIGH9bZ4/view?usp=sharing

Biography:

Dr. Jianlong Fu is a Principal Research Manager at Microsoft Research Asia (MSRA), where he leads research and innovation in the multimodal computing group. He received his Ph.D. from the Institute of Automation, Chinese Academy of Sciences. His research focuses on multimedia content understanding and multimodal perceptual computing in images, videos, and embodied agents. He has published over 100 peer-reviewed technical papers and holds over 20 US patents; his Google Scholar h-index is currently 50. Dr. Fu serves as vice-chair of the Automotive CE Applications Technical Committee under the IEEE Consumer Technology Society and as an editorial board member for IEEE TMM and IEEE CTSoc-NCT, and served as a guest editor for IEEE TPAMI from 2019 to 2021. He has also chaired several specialized committees at international multimedia flagship conferences such as ACM Multimedia 2021 and ACM ICMR 2021/2023. He has received multiple awards, including the ACM SIGMM Rising Star Award 2022, the Best Paper Award at the 2018 ACM Multimedia Conference, and over 10 championships in international competitions at CVPR/ICCV/ECCV. Additionally, his research has been applied to various Microsoft products such as Windows, Office, Bing, Edge, and Xiaoice.

Address: China





     Brought to you by the IEEE CTSoc Education Activities Committee. More DL sessions will be held this year. For the latest list, visit: https://ctsoc.ieee.org/education/dl-schedules-and-webinars.html