Creating a 3D World from 2D Images
Brought to you by the IEEE Consultants Network of Long Island (LICN) and the LI Photonics Society
Have you ever tried Google’s Live View feature, where you just point your phone at a building and it instantly figures out exactly where you are? Or maybe you’ve wondered how heist movies pull off those slick 3D fly-throughs of the buildings the characters are about to rob? Ever stopped to think about how virtual tours and street views come to life?
In this talk, we will explore the algorithms used to create vivid, comprehensive 3D scenes from just a handful of images collected from the internet. The talk will be divided into three key sections:
Artifact Mitigation
Images collected from the internet are rarely perfect. Some have memes or text overlaid on them, while others have been compressed multiple times for efficient storage, resulting in pixelation. Some images may be blurry, overexposed, or taken at night. There can also be moving objects, such as people and cars, which aren’t needed and can obstruct the 3D reconstruction of a scene.
In this section, we’ll learn about the deep learning algorithms used to remove these kinds of artifacts and transient objects from images.
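As a toy illustration of the idea, the sketch below excludes pixels flagged by a transient-object mask so they don’t contaminate the reconstruction. In practice the mask would come from a segmentation network; here the image and mask are made-up stand-in arrays:

```python
import numpy as np

# Toy 4x4 grayscale "image"; in practice this would be a photo, and the
# mask the output of a segmentation model flagging people, cars, etc.
image = np.arange(16, dtype=float).reshape(4, 4)
transient_mask = np.zeros((4, 4), dtype=bool)
transient_mask[1:3, 1:3] = True  # pretend a pedestrian occupies this region

# Exclude masked pixels from downstream reconstruction instead of
# trusting them as static scene content.
valid = ~transient_mask
usable_pixels = image[valid]

print(valid.sum())  # 12 of the 16 pixels remain usable
```

Real pipelines go further (e.g. inpainting the masked region), but the core move is the same: mark transient pixels and keep them out of the 3D model.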
Image Registration and Geo-localization
Once the images have been pre-processed, the next step is to determine their poses relative to one another. Imagine trying to figure out whether a photo was taken from the left side of the Eiffel Tower or the right, or whether the photographer was 100 meters away or just 50. Sometimes the images might even come from a drone! So how do we place all these different frames into a common reference frame? The more viewpoints we have of an area, the more complete our 3D model will be.
In this section, we’ll learn how Structure-from-Motion (SfM) is used to assign poses to these images. We’ll explore techniques for using background details to determine pose, especially when the object of interest looks the same from every angle. And finally, we’ll briefly discuss how these images can be geo-localized, meaning their latitude and longitude can be estimated even when no GPS information is available.
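To give a rough sense of what “a common reference frame” means, here is a minimal NumPy sketch that chains two camera poses: camera A’s pose in the world, and camera B’s pose relative to A (the kind of relative pose SfM estimates from matched image features). All numbers are invented for the example:

```python
import numpy as np

def make_pose(R, t):
    """4x4 homogeneous transform mapping camera coordinates to world coordinates."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_y(deg):
    """Rotation matrix about the y (vertical) axis."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Camera A's pose in the world frame (e.g. anchored to a known landmark),
# and camera B's pose relative to A (e.g. recovered from feature matches).
world_T_a = make_pose(rot_y(30), np.array([10.0, 0.0, 5.0]))
a_T_b = make_pose(rot_y(-15), np.array([2.0, 0.0, 0.0]))

# Chaining the transforms places camera B in the shared world frame,
# so both photos can contribute to one 3D model.
world_T_b = world_T_a @ a_T_b
```

Each new image registered this way extends the chain, which is how hundreds of unrelated internet photos end up expressed in a single coordinate system.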
3D Reconstruction
Now that our images are cleaned up and their poses are known, we can dive into the techniques used to transform them into a 3D scene. We’ll discuss, at a high level, some traditional 3D reconstruction methods along with more recent AI-based approaches such as neural rendering and Gaussian splatting.
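To give a flavor of the geometry behind the traditional methods, here is a minimal sketch of linear (DLT) triangulation: recovering one 3D point from its projections in two cameras with known poses. The cameras and the point are toy values, not from any real dataset:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: (u, v) observations of the same point in each image.
    """
    # Each observation gives two linear constraints on the homogeneous point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The point is the null vector of A (smallest singular value).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0)
x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0)
x2 = x2[:2] / x2[2]

X_est = triangulate(P1, P2, x1, x2)
print(np.allclose(X_est, X_true, atol=1e-6))  # True
```

Repeat this over thousands of matched points and you get the sparse point cloud that classical pipelines start from; neural rendering and Gaussian splatting then build dense, photorealistic scenes on top of such geometry.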
The applications of the computer vision and deep learning techniques we’ve talked about are widespread, and I hope you’ll start noticing them all around you, whether in mobile phones, self-driving technology, biometric scanners, delivery robots, security cameras, or many other places.
Date and Time
Location
Hosts
Registration
Speakers
Kshitij Minhas
Kshitij Minhas works at the intersection of computer vision, robotics, and AI. In his current role at SRI, he works on multi-faceted projects tackling novel problems, including algorithms for autonomous drones, warehouse robot navigation, 3D scene reconstruction, precise human pose estimation for security, GPS-free robot localization, augmented reality mentoring systems, and automation for heavy construction equipment. Previously, he worked at computer vision-based hardware startups, where he focused more on hardware design and deployment.
Kshitij brings end-to-end expertise spanning algorithm development, system integration, and hardware design. He holds degrees in Electrical and Computer Engineering as well as Mechanical Engineering, has published work at premier IEEE conferences, and is co-inventor on two patents. Through this talk, he hopes to pass on some of what he has learned about computer vision and the latest trends in 3D reconstruction and AI to a general audience.
LinkedIn: https://www.linkedin.com/in/kshitijminhas
Agenda
7:00 PM Networking and Announcements
7:10 PM Presentation