Francisco Pereira
Machine Learning Engineer
This blog post is the second part of our ongoing blog series about 3D computer vision. If you haven’t read the first blog post, you can check it out here. This second article (Part 2) provides an overview of 3D optical acquisition methods. We cover the differences between various types of sensors and how they can benefit specific use-cases. We also cover different 3D data formats and storage options.
We’ve seen in the first blog post of this series how the ability to perceive and interpret the three-dimensional structure of the surrounding world is becoming increasingly important in a wide range of industries. But how can machines get this extra layer of information? There’s a wide range of optical 3D acquisition methods that enable them to capture (or estimate) depth and spatial information about their environments.
Figure 1 presents different acquisition techniques categorized into active and passive methods. Active methods use a dedicated light source: they emit a signal into the scene and measure the reflected or returned signal, whereas passive methods rely only on ambient light.
Following the structure of the pipeline presented in the previous post (also seen in Figure 2), in this second part we will focus on capturing and storing 3D data. We will specifically focus on four prominent methods, namely Stereo Vision, Structured Light, Time of Flight, and LiDAR. For each method, we will explore the operating principle, advantages, disadvantages, and real-world use cases where the technique excels. Finally, we conclude by providing you with a decision map to help choose the most appropriate method based on various factors, as well as a brief discussion of future trends in 3D optical acquisition.
By providing a thorough understanding of these optical 3D acquisition methods, we aim to equip readers with the knowledge needed to make informed decisions when selecting the right technique for a specific application or industry.
TLDR: For those short on time or just too lazy to read the full blog post, we provide a decision map at the end that shows when to use each type of sensor based on different application requirements and external factors. It summarizes the article in a short, compact, and visual way.
Stereo Vision, also known as stereoscopic vision, is a passive 3D acquisition method that mimics the way humans perceive depth. It utilizes two or more cameras, positioned at a certain distance apart (known as the baseline), to capture images of the same scene from slightly different viewpoints. These images, called stereo pairs, are then processed by a stereo matching algorithm to identify corresponding points (features) in both images. The disparity between these points is calculated, which is the difference in their horizontal positions in the left and right images.¹
By leveraging the geometry of the camera setup and using triangulation, the depth (or 3D coordinates) of each point in the scene can be determined. The details of how depth is estimated are out of scope for this blog post; however, for those interested in a more mathematical treatment of the topic, you may want to check the following material.
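To make the triangulation step concrete, here is a minimal sketch (plain NumPy, with hypothetical focal length and baseline values) of the standard rectified-stereo relation depth = f * B / disparity:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a disparity map (in pixels) to a depth map (in meters)
    using the rectified-stereo relation depth = f * B / disparity."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0  # zero disparity means no match / point at infinity
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Hypothetical calibration: 700 px focal length, 12 cm baseline
disparity = np.random.uniform(1.0, 64.0, size=(480, 640)).astype(np.float32)
depth = disparity_to_depth(disparity, focal_length_px=700.0, baseline_m=0.12)
```

In practice, the disparity map itself would come from a stereo matching algorithm such as OpenCV's StereoBM or StereoSGBM.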
Stereo vision sensors generate two primary types of data: stereo pairs (left and right images) and depth maps (disparity maps). By combining the depth information from the depth map with the original 2D images, a 3D representation of the scene can be reconstructed.
Valuable characteristics of Stereo Vision systems are:
On the other hand, these usually suffer from:
While the previously discussed drawbacks pertain to passive Stereo Vision, Active Stereo Vision techniques utilize a light source, such as a laser or structured light, to illuminate the scene being captured. This approach enhances stereo matching and enables the method to perform well in low-light settings. However, it comes at a higher cost due to the requirement of an extra component — the projector.
Stereo Vision is a popular acquisition method, mainly due to its flexibility and low cost. Real-world applications of stereoscopic vision are numerous and can be seen in:
In summary, Stereo Vision is a versatile and cost-effective 3D acquisition method suitable for a range of applications, particularly when real-time depth information is required. However, its dependence on texture and sensitivity to lighting changes can pose challenges in certain scenarios.
Structured light is an active optical 3D acquisition method that involves projecting a known pattern (often a series of stripes or a grid) onto the scene or object being scanned. The deformation of the projected pattern on the object’s surface is captured by a camera placed at a known position and orientation relative to the projector. The relationship between the projector, camera, and the deformation of the pattern allows for the extraction of depth information.⁴
The data generated by structured light systems include the captured 2D image with the deformed pattern and the resulting 3D point cloud or depth map, which represent the 3D structure of the scanned object or scene. Depending on the characteristics of the projected/encoded pattern, different algorithms can be used to decode the deformed pattern and compute the depth information.
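As an illustration, one widely used family of patterns is phase-shifted sinusoidal fringes. The sketch below (synthetic images, NumPy only) shows the decoding step for a three-step phase shift; it recovers the wrapped phase that the surface imposes on the fringes, and phase unwrapping plus triangulation against the projector would then yield depth:

```python
import numpy as np

def decode_three_step_phase(i1, i2, i3):
    """Recover the wrapped phase from three fringe images shifted by
    -120, 0 and +120 degrees (three-step phase-shifting profilometry)."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Synthetic fringe images over a 480 x 640 view
h, w = 480, 640
true_phase = np.tile(np.linspace(0, 6 * np.pi, w), (h, 1))
i1 = 0.5 + 0.4 * np.cos(true_phase - 2 * np.pi / 3)
i2 = 0.5 + 0.4 * np.cos(true_phase)
i3 = 0.5 + 0.4 * np.cos(true_phase + 2 * np.pi / 3)
wrapped_phase = decode_three_step_phase(i1, i2, i3)  # values in (-pi, pi]
```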
Structured Light setups benefit from:
Problems associated with these setups include:
Real world situations where this acquisition method thrives include:
Time of Flight (ToF) sensors are an active optical 3D acquisition method that measures the time it takes for emitted light, usually infrared (IR) light, to travel from the sensor to the object and back. The ToF sensor emits light pulses (direct ToF sensors) or continuous waves (indirect ToF sensors), which are reflected by the object’s surface and then detected by the sensor. The sensor’s imaging lens collects the reflected light from the scene and converts it into depth data on each pixel of the array. The depth (or distance to the object) is calculated by knowing the speed of light and measuring the round-trip time of the light. This depth map is a 2D representation of the 3D structure of the scene, and it can be combined with additional data, such as RGB images from a separate camera, to create a more complete 3D representation.⁷
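Both variants reduce to a simple distance formula: d = c * t / 2 for direct (pulsed) ToF and d = c * Δφ / (4π * f_mod) for indirect (continuous-wave) ToF. A small sketch with illustrative numbers:

```python
import math

C = 299_792_458.0  # speed of light in m/s

def direct_tof_distance(round_trip_time_s):
    """Direct (pulsed) ToF: half the distance light travels in the round-trip time."""
    return C * round_trip_time_s / 2.0

def indirect_tof_distance(phase_shift_rad, modulation_freq_hz):
    """Indirect (continuous-wave) ToF: distance from the measured phase shift.
    Unambiguous only up to c / (2 * f_mod)."""
    return C * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

print(direct_tof_distance(20e-9))                # ~3.0 m for a 20 ns round trip
print(indirect_tof_distance(math.pi / 2, 20e6))  # ~1.9 m at 20 MHz modulation
```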
Good properties of ToF sensors are:
Looking at the downside of these sensors, we have:
Time of Flight sensors are commonly seen in:
LiDAR (Light Detection and Ranging) operates on the time-of-flight (ToF) principle, similar to ToF sensors: it determines distance from the round-trip time of light and the known speed of light. However, LiDAR generally uses multiple laser beams (high-power light sources) and a rotating or oscillating mechanism to cover a larger area or achieve a full 360-degree view of the surroundings. Each laser beam is aimed in a specific direction and angle, and the distance is measured for those coordinates. Because of this, the resulting data is a point cloud (rather than a depth map): a direct representation of the environment that provides accurate spatial information.
The data generated by LiDAR sensors include the raw timing and intensity information for each laser pulse and the resulting 3D point cloud that represents the 3D structure of the scanned environment. The point cloud contains the X, Y, and Z coordinates of each point in the 3D space, and in some cases, additional information such as intensity or color can be included.
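As a rough illustration of how raw returns become a point cloud, the sketch below (NumPy, with made-up beam geometry) converts per-return range, azimuth, and elevation into Cartesian X, Y, Z coordinates, optionally carrying intensity along as a fourth column:

```python
import numpy as np

def lidar_returns_to_points(ranges_m, azimuth_rad, elevation_rad, intensity=None):
    """Convert raw LiDAR returns (range plus beam angles) into an N x 3
    point cloud via a spherical-to-Cartesian transform; intensity, if given,
    is appended as a fourth column."""
    x = ranges_m * np.cos(elevation_rad) * np.cos(azimuth_rad)
    y = ranges_m * np.cos(elevation_rad) * np.sin(azimuth_rad)
    z = ranges_m * np.sin(elevation_rad)
    points = np.stack([x, y, z], axis=-1)
    if intensity is not None:
        points = np.concatenate([points, intensity[:, None]], axis=-1)
    return points

# One beam at -2 degrees elevation sweeping a full rotation in 0.2 degree steps
azimuth = np.deg2rad(np.arange(0.0, 360.0, 0.2))
ranges = np.full_like(azimuth, 25.0)  # pretend every return is at 25 m
elevation = np.full_like(azimuth, np.deg2rad(-2.0))
cloud = lidar_returns_to_points(ranges, azimuth, elevation)
```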
The benefits of LiDAR include:
Less desirable properties of these sensors are:
Now that we have looked into the different types of 3D acquisition methods, it is also important to think about the type of data these sensors generate and the best way to store it.
The data collected by these sensors typically comes in one of these forms: depth maps or point clouds.
To generate a point cloud from a 2D depth map, the depth information (Z coordinate) of each pixel in the depth map is combined with the corresponding spatial information (X and Y coordinates) of the pixel in the sensor’s field of view. This process is called “back-projection” or “unprojection.”
The back-projection process involves applying the intrinsic and extrinsic parameters of the sensor, such as focal length, sensor resolution, and sensor pose, to convert the 2D depth map information into 3D coordinates. This process is usually implemented in software and is available in various open-source libraries like Point Cloud Library (PCL), Open3D, and OpenCV.
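For illustration, here is a minimal NumPy sketch of the back-projection itself, assuming a simple pinhole model with hypothetical intrinsics (fx, fy, cx, cy) and ignoring lens distortion and any extrinsic transform:

```python
import numpy as np

def backproject_depth(depth_m, fx, fy, cx, cy):
    """Back-project ("unproject") an H x W metric depth map into an N x 3 point
    cloud using pinhole intrinsics; pixels with zero depth are dropped."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]

# Hypothetical intrinsics for a 640 x 480 depth sensor
depth = np.random.uniform(0.5, 4.0, size=(480, 640))
cloud = backproject_depth(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```

Libraries such as Open3D and PCL expose this operation out of the box, so in practice you rarely need to write it by hand.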
There are two main categories of formats for storing point cloud data: ASCII and LAS/LAZ.¹³
ASCII formats use plain text files where the X, Y, and Z coordinates of each point are separated by a character, such as a space or a comma. These files may also include a table header with metadata and additional information for each point, such as intensity or amplitude. Common file extensions for ASCII files include TXT, XYZ, PTS, and PTX. OBJ files can also be used to store point cloud data, although this method can be inefficient for large datasets (OBJ is intended to store geometric properties of objects and will include unnecessary amounts of information for point cloud data).
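Because ASCII formats are just delimited text, reading and writing them takes only a couple of lines. A minimal example with NumPy (hypothetical file name and a made-up intensity column):

```python
import numpy as np

# A small synthetic cloud: X, Y, Z plus an intensity column
points = np.random.rand(1000, 4)

# Write a space-separated ASCII .xyz file with a one-line header...
np.savetxt("scan.xyz", points, fmt="%.6f", header="x y z intensity", comments="")

# ...and read it back, skipping the header row
loaded = np.loadtxt("scan.xyz", skiprows=1)
```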
In contrast, LAS/LAZ formats are binary file formats specifically designed for LiDAR data storage and exchange; LAS is the open standard and LAZ is its losslessly compressed counterpart.
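Reading these binary files is typically done through a dedicated library. A short sketch assuming the open-source laspy package (the file name is hypothetical):

```python
import numpy as np
import laspy  # pip install laspy (add the laszip/lazrs extra for .laz files)

# Read a LAS file and pull out coordinates plus per-point intensity
las = laspy.read("scan.las")
xyz = np.vstack([las.x, las.y, las.z]).T  # N x 3 array of coordinates
intensity = np.asarray(las.intensity)
```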
Given that this data is unstructured, it is common to store it in a Data Lake, either in the cloud or on-premise, depending on your setup. Cloud-based storage services like Google Cloud Storage, Amazon S3, and Azure Blob Storage can be used to store and manage large point cloud datasets.
In this blog post, we have explored various optical 3D acquisition methods, including Stereo Vision, Structured Light, Time of Flight, and LiDAR. Each technique has its unique operating principles, advantages, and disadvantages, making them suitable for different applications and scenarios. The decision map below (Figure 9) provides an easy way to choose the most appropriate sensor to use, given a set of common business or practical requirements. Keep in mind that this decision map is a general guideline, and the best choice for a specific application may depend on various other factors.
In addition to the methods discussed, it is also worth noting the emergence of hybrid systems that combine multiple 3D acquisition techniques to overcome limitations and improve overall performance. Advancements in hardware and software will improve real-time processing of 3D data, enabling faster and more efficient analysis of scenes. Integration of 3D sensing technology and computer vision with other technologies such as augmented reality, virtual reality, and robotics, will create new possibilities for interaction and automation. And of course, as machine learning techniques continue to improve, we can expect to see more accurate and robust algorithms that will facilitate 3D reconstruction of complex environments, as well as object detection and tracking with more spatial awareness.
We hope this blog post has provided you with valuable insights into the world of optical 3D acquisition methods and will help you make informed decisions when selecting the appropriate technique for your needs.
[1] — Sanja Fidler. Intro to Image Understanding: Depth from Stereo. University of Toronto — CSC420, 2021.
[2] — Toyota Research Institute. Seeing Clearly: Advancing Robotic Stereo Vision.
[3] — Stereo Labs. Spatial Analytics Solution.
[4] — D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light,” 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., Madison, WI, USA, 2003, pp. I-I, doi: 10.1109/CVPR.2003.1211354.
[5] — Zivid Applications. Industrial Maintenance Inspection.
[6] — “Laboratory Dental 3d Scanner Technology: Structured Light Or Laser Light Scanning?”. BIZ Dental.
[7] — Larry Li. “Time-of-Flight Camera — An Introduction”. Texas Instruments.
[8] — Pat Marion. “Flipping the Script with Atlas”. Boston Dynamics.
[9] — Magic Leap 2, an immersive headset with 3D Time-of-Flight.
[10] — Liu, Shan. 3D Point Cloud Analysis : Traditional, Deep Learning, and Explainable Machine Learning Methods. Cham: Springer International Publishing AG, 2022.
[11] — “Informing smarter lidar solutions for the future”. Waymo, September 21, 2022.
[12] — “3 ways LiDAR can transform modern farming”. ACI Corporation.
[13] — “A Review of Options for Storage and Access of Point Cloud Data in the Cloud”. NASA ESDIS Standards Coordination Office, February 2022.