This article is the second part of a weekly series about what it takes to create a digital twin. This week, we will deal with the many challenges of building a digital twin from 2D images and our first breakthrough in overcoming them.
Last week, we talked about how we can create digital twins by just using 360° cameras, which are accessible and affordable, instead of 3D scanners, which can get quite pricey. Beamo.ai makes the process seem easy, but there were some hurdles that we had to leap over.
3D reconstruction from 2D photography is considered one of the core challenges of computer vision, because depth and camera position are not directly recorded in a 2D image and must be inferred.
To create a high-fidelity digital twin, you first need to gather a sufficient number of 2D panoramas, also called scans or capture points. Once each capture point is turned into a 360° sphere, the next step is to precisely position and align all the spheres relative to one another to recreate an immersive, browsable 3D space. This positioning is paramount to a workable digital twin and to the best user experience. Otherwise, the perspective can shift and users become disoriented.
To pinpoint each capture point's location relative to the others, most current systems require multiple photos of the same object taken from different angles. In this scenario, deep neural networks do the job much better and more efficiently than hand-crafted models and algorithms. To improve the speed and accuracy of 3D reconstruction, we have trained our deep learning-based models to retrieve the best image features and patterns from the entire dataset.
Imagine you have a pile of puzzle pieces, all scrambled up. You would first sort them by color and compare them with one another, searching for similar patterns or features. Beamo's ASC (Auto Scene Connection) algorithm took a similar approach. It was our first system to automate the 3D reconstruction process. It extracts feature points from each photo and compares the sets of feature points between photos. If two sets match, the next step is to estimate the relative position of the images. By repeating this process, we could map out the location of every image and how they connect. With the ASC algorithm, we fully automated the process of 3D reconstruction. Still, it needed a lot of computing power to achieve optimal results, so there was room for improvement.
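The matching step can be sketched as follows. This is a minimal illustration, not Beamo's ASC code: it assumes each feature point has already been turned into a numeric descriptor vector, and the function names, the 0.8 ratio threshold and the mutual nearest-neighbor check are our own assumptions (the ratio test is the one popularized by SIFT). Estimating the relative position from the resulting matches is left out.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_two(query: Descriptor, pool: List[Descriptor]) -> Tuple[int, float, float]:
    """Return (index of nearest, nearest distance, second-nearest distance)."""
    best_i, best_d, second_d = -1, math.inf, math.inf
    for i, d in enumerate(l2(query, p) for p in pool):
        if d < best_d:
            best_i, best_d, second_d = i, d, best_d
        elif d < second_d:
            second_d = d
    return best_i, best_d, second_d


def match_features(desc_a: List[Descriptor], desc_b: List[Descriptor],
                   ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Hypothetical matcher: pair features of image A with features of image B."""
    matches = []
    for i, q in enumerate(desc_a):
        j, d1, d2 = nearest_two(q, desc_b)
        # Ratio test: keep a match only if it is clearly better than the runner-up.
        if j < 0 or d1 >= ratio * d2:
            continue
        # Mutual check: feature j in B must also pick feature i in A as its nearest.
        back_i, _, _ = nearest_two(desc_b[j], desc_a)
        if back_i == i:
            matches.append((i, j))
    return matches


# Toy 2D "descriptors" for two overlapping photos.
photo_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]]
photo_b = [[0.1, 0.0], [10.0, 9.9], [50.0, 50.0]]
print(match_features(photo_a, photo_b))
```

Here the third feature of `photo_a` passes the ratio test but fails the mutual check, so only two of its three features are kept as matches. Real descriptors have hundreds of dimensions, and production systems use indexed nearest-neighbor search rather than this brute-force loop.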
This is where most digital twin solutions fail once they reach a certain number of capture points. The digital twin becomes too large to process properly, and computing failures can occur, rendering all the hard work of capturing useless. To get around this, users may go as far as splitting a space into separate captures and later merging them into one larger digital twin. This usually requires costly or manual operations.
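A quick calculation shows why the cost can grow so fast. Assuming, for illustration only, that every capture point is compared against every other one, the number of image pairs is n(n-1)/2, and a brute-force matcher compares every feature in one image with every feature in the other. The figure of 1,000 features per image is a hypothetical value, not a Beamo parameter.

```python
def exhaustive_pairs(n_images: int) -> int:
    """Number of image pairs when every capture point is compared with every other."""
    return n_images * (n_images - 1) // 2


def brute_force_comparisons(n_images: int, features_per_image: int) -> int:
    """Descriptor comparisons if each pair compares all features against all features."""
    return exhaustive_pairs(n_images) * features_per_image ** 2


# Hypothetical project sizes, with an assumed 1,000 features per image.
for n in (10, 100, 1000):
    print(f"{n} capture points: {exhaustive_pairs(n):,} pairs, "
          f"{brute_force_comparisons(n, 1000):,} descriptor comparisons")
```

Going from 100 to 1,000 capture points multiplies the pair count by roughly 100 (from 4,950 to 499,500 pairs), which is why exhaustive matching stops scaling long before a large building is fully captured.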
Processing also becomes virtually impossible if no common feature points are found between captures. This can happen when moving from room to room without capturing the transition between them. This is why users often have to capture at every doorway to make sure common features are found during processing.
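One way to picture this failure is as a graph problem. In the sketch below, which is our own illustration and not Beamo's implementation, each capture point is a node and each pair with enough matched features is an edge; the `min_matches` threshold of 30 and the match counts are hypothetical. A union-find structure then groups capture points that are linked, directly or indirectly. If a room transition is skipped, the graph splits into separate groups that cannot be positioned relative to each other.

```python
from typing import Dict, List, Tuple


def connected_groups(n_points: int,
                     pair_matches: Dict[Tuple[int, int], int],
                     min_matches: int = 30) -> List[List[int]]:
    """Group capture points that are linked by enough feature matches."""
    parent = list(range(n_points))

    def find(x: int) -> int:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Only pairs with enough common features connect two capture points.
    for (a, b), count in pair_matches.items():
        if count >= min_matches:
            union(a, b)

    groups: Dict[int, List[int]] = {}
    for p in range(n_points):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Room A = points 0-2, room B = points 3-5; only 4 matches across the door.
matches = {(0, 1): 120, (1, 2): 85, (3, 4): 140, (4, 5): 60, (2, 3): 4}
print(connected_groups(6, matches))

# Adding a capture at the doorway (point 6) that sees both rooms links them.
matches.update({(2, 6): 50, (6, 3): 45})
print(connected_groups(7, matches))
```

The first call returns two disconnected groups; the second returns a single group, which mirrors why capturing at each doorway helps processing succeed.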
Capturing outdoors is also often impossible, because common features are scarce or too far apart.
“Indeed, in 3D reconstruction, searching for similar patterns or features by comparing images with one another is not suitable for very large environments. We needed to find a mobile and accessible solution.”
Farkhod Khudayberganov, Lead Engineer at 3i Inc.
Did we hit a roadblock? Or did we indeed find a solution that will make capturing and processing simpler and easier, something that works even in vast open spaces? Not to get ahead of ourselves, but the ending of this series is a happy one. Come back next week as we wrap up this series on “How can we capture 3D space without a 3D scanner”.