Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3D shape measurement

Abstract: Fringe projection profilometry (FPP) is one of the most popular three-dimensional (3D) shape measurement techniques and has become increasingly prevalent in intelligent manufacturing, defect detection, and other important applications. In FPP, efficiently recovering the absolute phase has always been a great challenge. Stereo phase unwrapping (SPU) technologies based on geometric constraints can eliminate phase ambiguity without projecting any additional fringe patterns, which maximizes the efficiency of absolute phase retrieval. Inspired by the recent success of deep learning technologies for phase analysis, we demonstrate that deep learning can be an effective tool that organically unifies the phase retrieval, geometric constraints, and phase unwrapping steps into a comprehensive framework. Driven by an extensive training dataset, the neural network can gradually "learn" how to transform a single high-frequency fringe pattern into the "physically meaningful" and "most likely" absolute phase, instead of proceeding "step by step" as in conventional approaches. Based on the properly trained framework, high-quality phase retrieval and robust phase ambiguity removal can be achieved with only a single-frame projection. Experimental results demonstrate that, compared with traditional SPU, our method can more efficiently and stably unwrap the phase of dense fringe images over a larger measurement volume.


Introduction
Optical non-contact three-dimensional (3D) shape measurement techniques have been widely applied in many fields, such as intelligent manufacturing, reverse engineering, heritage digitalization, and bio-medicine [1,2]. Fringe projection profilometry (FPP) [3] is one of the most popular optical 3D imaging methods due to its simple hardware configuration, flexibility in implementation, and high measurement accuracy. In FPP, a series of fringe patterns is projected onto the target object by the projector and deformed by the object, so that the depth information of the object is encoded in the phase of the deformed fringes. After these patterns are captured by the digital camera, the wrapped phase modulated by the object can be calculated by fringe retrieval techniques [4-10]. To establish the unique correspondence between camera and projector pixels, the ambiguity of the wrapped phase needs to be eliminated through phase unwrapping methods [11-14]. Then the 3D information of the object can be reconstructed by optical triangulation based on the pre-calibrated geometric parameters [15,16] of the FPP system.
Nowadays, with the development of imaging devices and projection technologies, high-speed 3D imaging based on FPP has become possible [17]. Meanwhile, rapid acquisition of high-quality 3D information is crucial to many applications, such as online quality inspection, stress deformation analysis, and rapid reverse molding [18,19]. To obtain 3D data in high-speed scenarios based on FPP, efforts are usually made in two aspects: (1) improving the speed of the hardware and (2) improving the efficiency of a single 3D reconstruction in the algorithm. The first aspect can realize high-speed 3D measurement at the kHz level [20] or even 10 kHz [18] through binary defocusing projection [21] or other projection technologies [22-24] combined with high-speed cameras. However, the hardware cost of these methods is very high. The other aspect focuses on reducing the number of images required per reconstruction to improve measurement efficiency, thereby achieving 3D data acquisition in high-speed scenarios. The ideal way is to obtain the 3D data of the object in a single frame. Recently, we have realized high-accuracy phase acquisition from a single fringe pattern by combining deep learning with the physical model of phase retrieval [25,26]. However, these works only acquire the single-shot wrapped phase. To realize 3D measurement, phase unwrapping is required, which is one of the operations in FPP that most affects measurement efficiency and is most time-sensitive. The commonly used phase unwrapping methods are temporal phase unwrapping (TPU) algorithms [11,27], which can recover the absolute phase with the assistance of Gray-code patterns or multi-wavelength fringes. However, due to the requirement of a large number of additional patterns, which are only used for phase unwrapping and contribute nothing to the measurement accuracy, TPU decreases the efficiency of 3D measurement in high-speed scenarios. Although some
improved TPU schemes [28-30] have been proposed to solve the phase ambiguity problem with as few patterns as possible, the use of auxiliary patterns is still inevitable. In 2007, Weise et al. [31] introduced geometric constraints into the FPP method and proposed a novel phase unwrapping technology called stereo phase unwrapping (SPU), which can solve the phase ambiguity problem based on the spatial relationship between two cameras and one projector without projecting any auxiliary patterns. Although requiring more cameras than traditional methods, SPU indeed maximizes the efficiency of FPP and is well suited for 3D shape measurement in high-speed scenarios. However, traditional SPU is not robust enough to unwrap the phase of dense fringe images, while increasing the fringe frequency is essential to the precision of 3D reconstruction. To solve this issue, researchers have proposed auxiliary algorithms to enhance the robustness of SPU, which usually follow four directions. (1) The first direction combines spatial phase unwrapping methods with SPU to reduce the phase unwrapping errors of SPU, but these methods struggle with discontinuous or disjoint phases. (2) The second direction concentrates on auxiliary-information-embedded technologies [14,32]. Since these methods obtain the absolute phase essentially with the aid of modulation information, they fail when measuring objects with large surface reflectivity variations. (3) The third is to increase the number of perspectives and eliminate the ambiguity of the high-frequency phase with more geometric constraints [33]. Compared with the two methods mentioned above, this method is more adaptive to objects with complex surfaces, but at the cost of further increasing the number of cameras. Besides, simply increasing the number of views is not enough to robustly recover the absolute phase of dense fringe images, and it usually needs to be combined with (4) the depth constraint
strategy [34-37]. However, when using high-frequency fringes, the traditional depth constraint strategy can only unwrap the phase in a narrow depth range, and how to set a suitable range for the depth constraint is also difficult. Tao et al. [38] proposed an adaptive depth constraint (ADC) approach, with which the measurement volume is enlarged and the range of the depth constraint can be automatically selected, but only if the correct absolute phase can be obtained in the first measurement. In addition, since SPU relies on the similarity of phase information between multiple perspectives to realize phase unwrapping [33], (1) on the one hand, it has high requirements for the quality of the wrapped phase. The wrapped phase in SPU is therefore usually acquired by the phase-shifting (PS) method [4,5], a multi-frame phase acquisition method that provides phase information with high spatial resolution and high measurement accuracy. However, the use of more than one fringe pattern reduces the efficiency of SPU. Other commonly used phase acquisition technologies are Fourier transform (FT) methods [6-10], which are single-shot in nature but are not suitable for SPU due to the poor imaging quality around discontinuities and isolated areas in the phase map. (2) On the other hand, SPU requires high-quality system calibration, and it is more difficult to implement algorithmically than other traditional methods such as TPU.
From the above discussion, it is clear that although SPU is best suited for 3D measurement in high-speed scenes, it still has some defects: a limited measurement volume, inability to robustly unwrap the phase of high-frequency fringe images, sensitivity of multiple perspectives to phase noise and calibration errors, loss of measurement efficiency due to reliance on multi-frame phase acquisition methods, algorithmic complexity, and higher hardware cost. The ideal SPU should use only two cameras and one projector, the most basic hardware requirement for SPU, to achieve robust phase unwrapping of high-frequency fringe images over a large measurement range with single-frame projection.
Inspired by (1) the success of deep learning in single-shot wrapped phase acquisition and (2) the advance of geometric constraints, we further push deep learning into the retrieval of the absolute phase. On the basis of our previous deep-learning works, we incorporate geometric constraints into the neural network. In our work, the geometric constraints are implicit in the network rather than directly using calibration parameters, which simplifies the entire phase unwrapping process and avoids the complex adjustment of various parameters. With our method, we demonstrate that deep learning can "learn" to obtain the "physically meaningful" absolute phase from a single-frame projection through extensive data learning, without the conventional "step-by-step" process. Compared with traditional SPU, our method more robustly unwraps the phase of higher-frequency fringes with fewer perspectives over a larger range. Experiments verify that our method can achieve high-quality 3D shape measurement of multiple isolated objects with complex surfaces in both static and dynamic scenarios. However, the single-frame method inherently has uncertainties and cannot cope with all situations without fail. Therefore, we also analyze the limitations of our method in the conclusion and discussion sections.

Phase retrieval and unwrapping with PS and SPU
A typical SPU-based 3D imaging system consists of one projector and two cameras [31], as shown in Fig. 1. The fringe patterns projected by the projector are deformed by the measured object and then captured by the two cameras. For the N-step phase-shifting algorithm, the patterns captured by Camera 1 can be expressed as:

I^c_n(u^c, v^c) = A^c(u^c, v^c) + B^c(u^c, v^c) cos[Φ^c(u^c, v^c) − 2πn/N],   (1)

where n = 0, 1, ..., N − 1 is the phase-shift index, the superscript c denotes the camera, (u^c, v^c) is the camera pixel coordinate, I^c_n represents the (n + 1)th captured fringe pattern, A^c is the average intensity map, B^c is the amplitude intensity map, Φ^c is the absolute phase map, and 2π/N is the phase-shift step. With the least-squares method [39], the wrapped phase can be obtained:

φ^c = arctan(M/D),  M = Σ_{n=0}^{N−1} I^c_n sin(2πn/N),  D = Σ_{n=0}^{N−1} I^c_n cos(2πn/N),   (2)

where φ^c denotes the wrapped phase, and M and D represent the numerator and denominator of the arctangent function, respectively. The absolute and wrapped phase maps satisfy the following relation:

Φ^c = φ^c + 2πk^c,   (3)

where k^c is the fringe order, k^c ∈ [0, K − 1], and K denotes the number of fringe periods. The fringe order can be obtained by using geometric constraints. For an arbitrary pixel o^c_1 in Camera 1, each of its K candidate absolute phases corresponds to a 3D candidate point, which can be projected into Camera 2 to obtain the corresponding 2D candidates, as shown by the red and green 2D pixel points in Camera 2 in Fig. 1. Among these 2D candidates, the correct matching point should have a wrapped phase more similar to that of o^c_1 than the other candidates. With this feature, we can find the matching point through a phase similarity check and thereby unwrap the phase at o^c_1. However, because of system errors and ambient light interference, some wrong 2D candidates may have phase values even closer to that of o^c_1 than the true matching point, which leads to the instability of SPU. Furthermore, the higher the fringe frequency, the more candidate points there are, and the more likely such errors become. Due to this vulnerability to noise, SPU must rely on multi-frame phase acquisition technologies that are robust to ambient illumination, as well as accurate system calibration parameters.
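The phase-shifting relations in Eqs. 1-3 can be sketched numerically. The following NumPy snippet is a minimal illustration on synthetic data rather than real camera images: it generates N-step fringe patterns for a known absolute phase ramp, recovers the wrapped phase by the least-squares formula, and restores the absolute phase from the fringe order.

```python
import numpy as np

def wrapped_phase(patterns):
    """Least-squares wrapped phase from N phase-shifted fringe patterns (Eq. 2)."""
    N = len(patterns)
    delta = 2 * np.pi * np.arange(N).reshape(-1, 1, 1) / N  # phase-shift steps
    M = np.sum(patterns * np.sin(delta), axis=0)  # numerator
    D = np.sum(patterns * np.cos(delta), axis=0)  # denominator
    return np.arctan2(M, D)                       # wrapped into (-pi, pi]

# Synthetic example: absolute phase ramp covering K = 4 fringe periods
H, W, K, N = 4, 64, 4, 3
Phi = np.linspace(0, 2 * np.pi * K, W)[None, :].repeat(H, axis=0)
A, B = 0.5, 0.4
patterns = np.stack([A + B * np.cos(Phi - 2 * np.pi * n / N) for n in range(N)])  # Eq. 1

phi = wrapped_phase(patterns)                # wrapped phase (Eq. 2)
k = np.round((Phi - phi) / (2 * np.pi))      # fringe order (known here for checking)
Phi_rec = phi + 2 * np.pi * k                # absolute phase (Eq. 3)
```

In a real measurement the fringe order k is unknown; recovering it per pixel is exactly the phase unwrapping problem that SPU, or the network proposed here, must solve.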
Moreover, high-frequency fringe patterns cannot be used reliably in traditional SPU, which further aggravates its shortcomings.
To improve the stability of SPU, the common methods are to (1) increase the number of cameras or (2) assist it with a depth constraint strategy. The first direction, at the cost of increased hardware expense, projects the 2D candidates in Camera 2 onto a third or even a fourth camera for additional phase similarity tests to exclude more wrong 2D candidates. The other direction, at the cost of increased algorithmic complexity, eliminates some wrong 3D candidates by using the depth constraint strategy to reduce the burden of the subsequent phase similarity check. However, this method is only effective within a narrow depth range, and how to choose a suitable depth constraint range is also a challenge. Generally, SPU with at least three cameras, assisted by the ADC strategy [38,40], the most advanced depth constraint algorithm, can robustly remove phase ambiguity, provided that the correct absolute phase is obtained in the first measurement. However, complex systems and algorithms make such technology difficult to implement. The ideal SPU method should use only two cameras, no complicated auxiliary algorithms, and only a single-frame projection to achieve robust phase unwrapping of dense fringe images in a large measurement volume.
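The two directions above can be sketched together in a toy, single-pixel example. The snippet below is purely illustrative: `depth_of` and `phi2_at` are hypothetical stand-ins for the calibrated triangulation and for the Camera 2 phase lookup along the epipolar line, not the actual system geometry.

```python
import numpy as np

def wrap(p):
    """Wrap a phase value into (-pi, pi]."""
    return float(np.angle(np.exp(1j * p)))

def spu_match(phi1, K, depth_of, phi2_at, z_range):
    """Toy stereo phase unwrapping for a single Camera 1 pixel.

    Each fringe order k gives a candidate absolute phase phi1 + 2*pi*k, hence a
    candidate 3D point; the depth constraint discards implausible candidates,
    and the phase similarity check picks the best survivor.
    """
    best_k, best_err = None, np.inf
    for k in range(K):
        Phi_c = phi1 + 2 * np.pi * k              # candidate absolute phase
        z = depth_of(Phi_c)                       # candidate depth (toy triangulation)
        if not (z_range[0] <= z <= z_range[1]):   # depth constraint strategy
            continue
        err = abs(wrap(phi2_at(z) - phi1))        # phase similarity check in Camera 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Hypothetical linear phase-to-depth model and Camera 2 lookup (illustration only)
z0, s, K = 100.0, 0.5, 8
Phi_true = 0.3 + 2 * np.pi * 5                    # ground-truth fringe order is 5
phi1 = wrap(Phi_true)
depth_of = lambda Phi: z0 + s * Phi
z_true = depth_of(Phi_true)
# A wrong candidate lands on a different scene point, hence a different phase:
phi2_at = lambda z: wrap(Phi_true + 0.7 * (z - z_true))
k_hat = spu_match(phi1, K, depth_of, phi2_at, z_range=(100.0, 120.0))  # k_hat == 5
```

In this toy setup the depth constraint already removes the farthest candidate before the similarity check; with noise added to `phi2_at`, wrong candidates can win, which is exactly the instability discussed above.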

Phase retrieval and unwrapping with deep learning
In order to enable the geometry-constraint-based SPU method to achieve high-accuracy 3D measurement with lower cost, simpler algorithms, and fewer images per reconstruction, and inspired by the recent success of deep learning in phase analysis, we combine deep neural networks (DNN) with the SPU algorithm to propose a deep-learning-enabled geometric constraints and phase unwrapping method. With our method, only one fringe pattern is projected and no other complex algorithm is required; the phase unwrapping of dense fringe images can then be realized with only two cameras. The flowchart of our method is shown in Fig. 2. We construct a four-path convolutional neural network (CNN), as shown in the top right of Fig. 2, which can learn not only to obtain the wrapped phase but also to unwrap it. For each convolutional layer of this CNN, the kernel size is 4 × 4 with a convolution stride of one, and the output is a 3D tensor of shape (H, W, C), where (H, W) is the size of the input pattern and C is the number of filters used in each convolutional layer, equal to the number of channels of the output data. In this work, W = 640, H = 480, and C = 64. In the first path of the CNN, the input is processed by a convolutional layer, followed by four residual blocks and another convolutional layer. In the other three paths, the data is down-sampled by pooling layers by factors of two, four, and eight, respectively, for better feature extraction, and then up-sampled by upsampling blocks to match the original size. The outputs of the four paths are concatenated into a tensor with quadrupled channels. Finally, the last convolutional layer generates two channels when training for phase retrieval and one channel when training for phase unwrapping.
Next we discuss the steps of our algorithm. Step 1: To achieve high-resolution phase retrieval with single-frame projection, we separately input the single-frame fringe images captured by Camera 1 and Camera 2 into the constructed CNN; the outputs are the numerators M^c and denominators D^c corresponding to the two fringe patterns. Step 2: The high-precision wrapped phase maps can then be obtained according to Eq. 2. Step 3: To realize phase unwrapping, enlightened by the geometry-constraint-based SPU technology described in Section 2.1, which removes phase ambiguity through the spatial relationships between multiple perspectives, we feed the single-frame fringe patterns of the two perspectives into the network. Information from two perspectives is used because the fringe information of a single frequency in one perspective is not enough to solve the phase ambiguity of objects with discontinuous surfaces or multiple isolated objects. Meanwhile, following the reference-information-based phase unwrapping method of Zhang et al. [41], we add the information of a reference plane to the inputs to allow the network to quickly and accurately acquire the absolute phase of the measured object. Thus, the single-frame fringe patterns captured by the two cameras, as well as the pre-acquired fringe pattern of a reference plane and its fringe order map in the perspective of Camera 1, are input into the network. It is worth mentioning that the reference plane information is obtained in advance, so subsequent experiments do not need to acquire it repeatedly. The output is the fringe order map of the measured object in Camera 1. Step 4: We can then achieve phase unwrapping by Eq. 3. Step 5: After acquiring the high-precision absolute phase, 3D reconstruction can be carried out with the calibration parameters between the two cameras to obtain the 3D information of the measured object.
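Once the network has produced M^c, D^c (Step 1), and the fringe order map (Step 3), Steps 2 and 4 reduce to simple per-pixel array operations. The sketch below uses ideal stand-ins for the CNN outputs on a synthetic phase ramp; no actual network is run.

```python
import numpy as np

def absolute_phase(M, D, k):
    """Step 2: wrapped phase from the predicted numerator/denominator (Eq. 2);
    Step 4: unwrapping with the predicted fringe order map (Eq. 3)."""
    phi = np.arctan2(M, D)
    return phi + 2 * np.pi * k

# Ideal stand-ins for the CNN predictions on a ramp covering three fringe periods
Phi_true = np.linspace(0, 6 * np.pi, 128, endpoint=False)
M, D = np.sin(Phi_true), np.cos(Phi_true)      # what a well-trained network would emit
phi = np.arctan2(M, D)
k = np.round((Phi_true - phi) / (2 * np.pi))   # ideal fringe order map
Phi = absolute_phase(M, D, k)                  # recovers Phi_true
```

The point of the framework is that both the (M, D) pair and the fringe order map k come from the network itself, so the whole chain runs from a single captured fringe image per camera.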

Experiments
To verify the effectiveness of the developed algorithm, we construct a triple-camera 3D imaging system (three cameras are used to collect the ground-truth data for training, and only two cameras are used when validating our method), which includes a LightCrafter 4500Pro projector (912 × 1140 resolution) and three Basler acA640-750um cameras (640 × 480 resolution). Each camera is equipped with a 12 mm Computar lens, and all the cameras are synchronized by the trigger signal from the projector. 48-period phase-shifting fringe patterns are used in our experiments.
During the training session, we use the triple-camera system to collect data from 1001 different scenes, of which the first scene is the reference plane. Each set of data consists of three-step phase-shifting fringe patterns captured by the three cameras. For the network that obtains the wrapped phase, the inputs are the first of the three-step phase-shifting images captured by the first camera, and the ground-truth data are the corresponding numerators M^c and denominators D^c calculated by Eq. 2. For the unwrapping network, the inputs are the first of the three-step phase-shifting fringe patterns acquired by the first and third cameras, as well as those of the reference plane and its fringe order map in the first camera. The ground-truth data are the fringe order map in the first camera perspective obtained by the SPU method with three cameras assisted by the ADC strategy. When training these two networks, 800 sets of data are used for training and 200 sets for validation.

Fig. 3. Measurement results of four static scenes. (a), (e), (i), (m) Results obtained by using PS to obtain the wrapped phase and triple-camera SPU with the ADC strategy to obtain the absolute phase. (b), (f), (j), (n) Results obtained by using PS to obtain the wrapped phase and dual-camera SPU with the traditional depth constraint strategy to obtain the absolute phase. (c), (g), (k), (o) Results obtained by using PS to obtain the wrapped phase and directly using the reference phase to unwrap the phase. (d), (h), (l), (p) Results measured by our method.

Qualitative evaluation
To verify the effectiveness of our method, we first measure four static scenes, none of which appear in the training or validation sets. We measure these scenes with four methods. The first uses PS to obtain the wrapped phase and triple-camera SPU with the ADC strategy to obtain the absolute phase (its results are taken as the ground truth); the second uses PS to obtain the wrapped phase and dual-camera SPU with the traditional depth constraint strategy; the third uses PS to obtain the wrapped phase and directly uses the reference phase to unwrap it; the fourth is our method. The measurement results are shown in Fig. 3, where the columns from left to right correspond to the first, second, third, and fourth methods, respectively. The results of the second method show that traditional SPU with depth constraints is not enough to unwrap the phase of high-frequency fringes over a large depth range. The parts marked by the black dotted boxes in Fig. 3 are the phase unwrapping errors of the third method, from which we can see that the reference plane can only unwrap the wrapped phase of an object within a limited range, namely between −π and π of the absolute phase of the reference plane. With our method, the ambiguity of the wrapped phase of a measured object over a large depth range can be accurately eliminated with only two cameras. Moreover, our deep-learning-assisted method can acquire high-quality measurement results from single-frame projection, of almost the same quality as the results obtained by the traditional PS, SPU, and ADC methods with three cameras. We also measure four dynamic scenes to demonstrate the ability of the proposed method to measure dynamic objects. The measurement results are shown in Fig. 4.
It can be seen from the left three columns of Fig. 4 that, because of the multi-frame nature of PS, the measured results exhibit obvious motion ripples when the object is moving. Due to its reliance on high-quality phase information, the results of the second method are significantly worse. The reference plane method is still insufficient to unwrap the phase of objects with large depth. Since our method needs only a single-frame projection, it yields high-quality 3D reconstruction results for moving objects, just as in the static case.

Quantitative evaluation
In this experiment, we measure a pair of standard spheres to demonstrate the 3D reconstruction accuracy of our method. The spheres' radii are 25.3989 mm and 25.4038 mm, respectively, and their center-to-center distance is 100.0532 mm, as shown in Fig. 5(a). The measurement result is shown in Fig. 5(b). We perform sphere fitting on the measured results of the two spheres, and the errors are shown in Fig. 5(c). The radii of the reconstructed spheres are 25.4616 mm and 25.4648 mm, with deviations of 62.7 µm and 61.0 µm, respectively. The measured center-to-center distance is 99.9878 mm, with an error of 65.4 µm. These experiments validate that our method can provide high-quality 3D measurements with fewer cameras, fewer projection images, and simpler algorithms.
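The sphere fitting used in this evaluation can be done with a standard algebraic least-squares fit. The snippet below is a generic sketch on synthetic noise-free points; the center and radius values are illustrative, not our measured data.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit.

    Solves |p|^2 = 2 p.c + (r^2 - |c|^2) as a linear system in the unknowns
    (c_x, c_y, c_z, r^2 - |c|^2), then recovers the radius r.
    """
    A = np.c_[2 * points, np.ones(len(points))]
    b = np.sum(points ** 2, axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = x[:3]
    radius = np.sqrt(x[3] + center @ center)
    return center, radius

# Synthetic check on points sampled from a known sphere
rng = np.random.default_rng(0)
true_center, true_radius = np.array([10.0, -5.0, 300.0]), 25.4
dirs = rng.normal(size=(500, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
points = true_center + true_radius * dirs
c_hat, r_hat = fit_sphere(points)  # recovers the center and radius
```

On real measured point clouds, the fitted radii and the distance between the two fitted centers are compared against the calibrated reference values to obtain the reported deviations.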

Conclusion
In this work, we have presented a deep-learning-enabled geometric constraints and phase unwrapping method for single-shot absolute 3D shape measurement. Our method avoids the shortcomings of many traditional methods, such as the trade-off between efficiency and precision in traditional phase retrieval, and the trade-off in SPU between phase unwrapping robustness, measurement range, and the use of high-frequency fringe patterns. On the premise of single-frame projection, our method can solve the phase ambiguity problem of dense fringes over a larger measurement range with less perspective information and simpler algorithms. We believe that our work provides important guidance for high-speed, high-accuracy, motion-artifact-free absolute 3D shape measurement from single-frame fringe projection, and further verifies the power of deep learning technologies in the field of fringe projection profilometry.

Discussion
Traditional methods usually proceed step by step based on prior knowledge. For SPU, for example, one first finds the 3D candidates, then uses depth constraints to remove unreliable candidates, then projects them to another perspective, and finally performs the phase similarity check. Because of this step-by-step process, the information in the data, spatial, temporal, and so on, is not effectively exploited; comprehensively utilizing all valid information would require strong, specialized prior knowledge, which is very difficult to achieve. Deep learning, however, can accomplish this. Through training on data, these problems can be effectively integrated into a comprehensive framework. In our work, this framework organically incorporates phase acquisition, geometric constraints, and phase unwrapping: these components are no longer reproduced step by step as in the traditional pipeline, but are integrated together. However, since the data source of our method is 2D images, when the image itself is ambiguous, deep learning is by no means always reliable, as shown in Fig. 6. In the future, we will further integrate the physical model into deep-learning-based FPP and construct an FPP framework driven by both data and physics.

Fig. 4. Measurement results of four dynamic scenes. (a), (e), (i), (m) Results obtained by using PS to obtain the wrapped phase and triple-camera SPU with the ADC strategy to obtain the absolute phase. (b), (f), (j), (n) Results obtained by using PS to obtain the wrapped phase and dual-camera SPU with the traditional depth constraint strategy to obtain the absolute phase. (c), (g), (k), (o) Results obtained by using PS to obtain the wrapped phase and directly using the reference phase to unwrap the phase. (d), (h), (l), (p) Results measured by our method (see Visualization 1, Visualization 2, Visualization 3, and Visualization 4 for the whole process of the first scene).

Fig. 5. Quantitative analysis of the reconstruction accuracy of our method. (a) The measured standard spheres. (b) 3D reconstruction result of our method. (c) The error distribution of the measured standard spheres.

Fig. 6. Analysis of the limitations of our method. (a) Image of two flat plates captured by Camera 1 (no ambiguity in the 2D image). (b) Absolute phase of the two plates in (a). (c) The result for the two plates in (a) obtained by our method. (d) Image of two flat plates captured by Camera 1 (there is ambiguity in the 2D image: the fringes present in the red dotted box in (a) are missing). (e) Absolute phase of the two plates in (d). (f) The result for the two plates in (d) obtained by our method.