Embodied Vision, Language, and Action Models for Consumer Applications

#ServiceRobot #ElderlyAssistance #VLAFoundationModel #GeneralizableRobots #HumanRobotInteraction #QualityOfLife

The IEEE CTSoc is pleased to invite all interested attendees to this live Distinguished Lecture session.


Abstract

We are developing a versatile and efficient service robot designed to assist the elderly and individuals in need with their daily tasks. The robot can perform actions such as picking up a water cup and opening doors, with more advanced interactions planned for the future. By leveraging our Vision-Language-Action (VLA) foundation model, we have achieved promising results in manipulation tasks, demonstrating its effectiveness in handling everyday objects. A key innovation in our approach is generalizable robot manipulation: once pre-trained, our robot foundation model can be adapted to new objects, environments, and robot platforms using few-shot learning techniques. This capability allows the robot to learn quickly and adapt to its surroundings, enhancing its utility and effectiveness in real-world scenarios. By integrating large action foundation models, we aim to create a service robot that not only performs tasks efficiently but also interacts meaningfully with people, ultimately improving their quality of life.



  Date and Time

  Location

  Hosts

  Registration



  • Date: 29 Nov 2024
  • Time: 04:00 PM to 05:00 PM
  • All times are (UTC+08:00) Beijing
  • Add Event to Calendar
  • For more information about the IEEE CTSoc DL Program or Education Activities, contact: mahsa.pourazad@gmail.com

  • Survey: Fill out the survey
  • Starts 22 October 2024 12:00 AM
  • Ends 29 November 2024 04:00 PM
  • All times are (UTC+08:00) Beijing
  • No Admission Charge


  Speakers

Dr. Jianlong Fu, Principal Research Manager, Microsoft Research Asia

Topic:

Embodied Vision, Language, and Action Models for Consumer Applications

For a copy of the flyer, click here: https://drive.google.com/file/d/1SNSkbwG8OS6RYCW-8Ypr5vGJxIGH9bZ4/view?usp=sharing

Biography:

Dr. Jianlong Fu is a Principal Research Manager at Microsoft Research Asia (MSRA), where he leads research and innovation in the multimodal computing group. He received his Ph.D. from the Institute of Automation, Chinese Academy of Sciences. His research focuses on multimedia content understanding and multimodal perceptual computing in images, videos, and embodied agents. He has published over 100 peer-reviewed technical papers and holds over 20 US patents; his Google Scholar h-index is currently 50. Dr. Fu serves as vice-chair of the Automotive CE Applications Technical Committee under the IEEE Consumer Technology Society and as an editorial board member for IEEE TMM and IEEE CTSoc-NCT, and served as a guest editor for IEEE TPAMI from 2019 to 2021. He has also chaired several specialized committees at international multimedia flagship conferences such as ACM Multimedia 2021 and ACM ICMR 2021/2023. He has received multiple awards, including the ACM SIGMM Rising Star Award 2022, the Best Paper Award at the 2018 ACM Multimedia Conference, and over 10 championships in international competitions at CVPR/ICCV/ECCV. Additionally, his research has been applied to various Microsoft products such as Windows, Office, Bing, Edge, and Xiaoice.

Address: China





     Brought to you by the IEEE CTSoc Education Activities Committee. More DL sessions will be held this year. For the latest list, visit: https://ctsoc.ieee.org/education/dl-schedules-and-webinars.html