Few-shot Transfer Learning for Holographic Image Reconstruction using a Recurrent Neural Network

Deep learning-based methods in computational microscopy have been shown to be powerful but in general face some challenges due to limited generalization to new types of samples and requirements for large and diverse training data. Here, we demonstrate a few-shot transfer learning method that helps a holographic image reconstruction deep neural network rapidly generalize to new types of samples using small datasets. We pre-trained a convolutional recurrent neural network on a large dataset with diverse types of samples, which serves as the backbone model. By fixing the recurrent blocks and transferring the rest of the convolutional blocks of the pre-trained model, we reduced the number of trainable parameters by ~90% compared with standard transfer learning, while achieving equivalent generalization. We validated the effectiveness of this approach by successfully generalizing to new types of samples using small holographic datasets for training, and achieved (i) ~2.5-fold convergence speed acceleration, (ii) ~20% computation time reduction per epoch, and (iii) improved reconstruction performance over baseline network models trained from scratch. This few-shot transfer learning approach can potentially be applied in other microscopic imaging methods, helping to generalize to new types of samples without the need for extensive training time and data.

However, applying these deep learning-based image reconstruction methods also has some challenges, including, e.g., generalization of trained models to new datasets from new types of samples. Due to the limited scale and diversity of available training data, and potential distribution shifts in acquired images, resulting from, e.g., varying sample preparation and imaging protocols, a trained neural network model can face inference errors, failing to generalize to new, unknown types of samples. Transfer learning [26] and domain adaptation [27] are two popular few-shot learning methods to help generalize network models on limited labeled data. The first technique fine-tunes a portion or all of the parameters of a pre-trained model on a small dataset with the new distribution, and the second one aims to generalize models on a known target domain during the training without using labels of the target domain. For this, the pre-trained model demands both a high-capacity network that can adapt to different domains well, and a large, diverse training set that represents the distribution variations.
Here, we demonstrate a few-shot transfer learning method for holographic image reconstruction using a recurrent neural network (RNN) that achieves successful generalization to new sample types, never seen during the training. To demonstrate this approach, we pre-trained a convolutional RNN on a large holographic dataset composed of various types of samples (blood smears, Pap smears and lung tissue sections), which served as our backbone model. We show that this backbone model can be rapidly transferred to small training sets of new sample types (e.g., prostate and salivary gland tissue sections, never seen/used before), converging ~2.5x faster compared with baseline models trained from scratch, while also saving ~20% training time per epoch and using ~90% fewer trainable parameters by fixing the RNN blocks (backbone) in the model. The main contributions of this work include: (1) building a pre-trained holographic image reconstruction model that is suitable for fast few-shot learning on new types of samples, and (2) demonstrating a rapid transfer learning scheme that reduces the training time and the number of trainable parameters by fixing the RNN blocks in the model. We successfully transferred the backbone RNN model to small-scale prostate and salivary gland datasets that were never used during the training phase and achieved microscopic reconstruction of in-line holograms, correctly revealing the phase and amplitude distributions of these new types of tissue samples with a minimal amount of training data and time.

Results and Discussions
For holographic image reconstruction we used a convolutional RNN architecture (named RH-M [25]), which was demonstrated to be effective for multi-height phase retrieval and holographic image reconstruction (see Fig. 1a). Intensity-only holograms recorded by a lens-free in-line holographic microscope [31] are first back-propagated with zero phase (i.e., without any phase retrieval) by an axial distance of $\bar{z}_2 = 500$ µm, and then the resulting image sequence is fed into the network as its input (see the Methods section). In the RH-M network, the features in the propagated holograms are extracted by a series of convolutional blocks at different scales, and then aggregated by the RNN blocks (marked by the dashed box in Fig. 1a). The ground truth/target sample fields (including sample phase and amplitude) were created by a multi-height phase retrieval (MH-PR) algorithm using 8 separate holograms captured at different sample-to-sensor distances [29]. Here, we pre-trained an RH-M model using a large holographic image dataset including three types of samples: blood smears, Pap smears, and lung tissue sections. On average, each sample type has $N \cong 700$ non-overlapping, unique fields-of-view (FOVs), and $M = 5$ input holograms were used for reconstruction.

However, standard RH-M networks suffer from limited generalization and fail on the reconstruction of entirely new types of samples that were never seen by the network before; see for example the blind testing results of prostate tissue sections in Fig. 1a. Inspired by the fact that the axial differences in hologram intensity patterns reflect/encode the sample's phase information [32], we fixed the RNN blocks (backbone) of the pre-trained RH-M model, which reflect the differences between input holograms and merge their features, and then transferred the rest of the model to new types of samples as shown in Fig. 1b. After a rapid transfer learning process using a small dataset of the new sample type, the resulting new model adapts to the new data distribution of prostate tissue sections and generalizes well, reconstructing both the phase and amplitude information of the sample (see Figs. 1b and 2).

Major advantages of our approach resulting from the generalization of a pre-trained model in transfer learning include faster convergence speed and better adaptability to small training datasets. To better quantify these advantages, we evaluated the transfer learning performance of the backbone model using a series of datasets with different $N_t/N$ ratios. As illustrated in Fig. 2, the pre-trained model was transferred onto prostate datasets using different amounts of unique sample FOVs, i.e., $N_t$. We created models using the reported transfer learning scheme with a fixed backbone (model 1) and standard transfer learning (without fixed backbone, model 2) on 4 training datasets with different $N_t/N$ ratios. Furthermore, as noted in Fig. 2b, the reported approach (model 1) used only ~4.33 million trainable parameters, compared to >36 million for the other models, saving ~90% of the trainable parameters. Another advantage of the reported approach is the reduced time cost of transfer learning, as reported in Fig. 2d.

Next, we evaluated the generalization of the backbone model to different input sequence lengths $M$, using an additional, new sample type. The training set was captured on a few salivary gland tissue sections ($N_t/N = 24.8\%$), and the backbone model was transferred to datasets with $M = 2, 3, 4$ holograms, respectively, with and without a fixed backbone. In addition, baseline models with $M = 2, 3, 4$ were also trained from scratch (using the same amount of data) for comparison purposes. Then, the models were blindly tested using another testing set (Fig. 3a). As shown in Fig. 3b, transferred models (models 1, 2) successfully reconstructed the sample complex field with high fidelity, reflected by the low amplitude RMSE and high ECC. Furthermore, by adding extra input holograms, i.e., increasing $M$, the reconstruction accuracy can be further enhanced. In contrast, the output images of the baseline models (labeled green) are severely contaminated by artifacts (due to the limited amount of training data available for the new sample type), scoring worse amplitude RMSE and ECC values than those achieved by our transferred models. Figure 3c further illustrates the amplitude RMSE values of these models' outputs on an external testing set with 40 input-ground truth pairs. For all $M$ values, the transferred models (blue and orange bars) significantly outperform the baseline models (green bars). Both Figs. 3b and 3c indicate that the reconstruction accuracy of the transferred models can be further improved by increasing $M$, confirming the effectiveness of the pre-trained RNN backbone in multi-height holographic image reconstruction. On the other hand, baseline models trained from scratch failed to learn using small training sets and cannot efficiently utilize additional holograms, resulting in a flat trend of the amplitude RMSE values (green bars in Fig. 3c). Figure 3d further compares the training time of the transferred models and the baseline model with respect to $M$, demonstrating the fast generalization of the backbone model.
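The fixed-backbone transfer scheme described above can be illustrated with a minimal, framework-free sketch. It assumes a toy parameter dictionary whose group names (`conv_down`, `rnn`, `conv_up`), array shapes, and learning rate are hypothetical and unrelated to the actual RH-M layers. Only the groups outside the frozen set receive gradient updates, and the trainable-parameter saving is computed in the same way as for the reported counts (~4.33 million versus >36 million).

```python
import numpy as np

def count_parameters(params, frozen=()):
    """Return (trainable, total) parameter counts for a dict of arrays."""
    total = sum(p.size for p in params.values())
    trainable = sum(p.size for name, p in params.items() if name not in frozen)
    return trainable, total

def sgd_step(params, grads, frozen=(), lr=1e-3):
    """One gradient-descent step that leaves the frozen groups untouched."""
    return {
        name: p if name in frozen else p - lr * grads[name]
        for name, p in params.items()
    }

rng = np.random.default_rng(0)
# Hypothetical toy parameter groups (not the real RH-M shapes).
params = {
    "conv_down": rng.normal(size=(8, 8)),
    "rnn": rng.normal(size=(32, 32)),
    "conv_up": rng.normal(size=(8, 8)),
}
grads = {name: np.ones_like(p) for name, p in params.items()}
frozen = ("rnn",)  # the pre-trained recurrent backbone stays fixed

updated = sgd_step(params, grads, frozen=frozen)
trainable, total = count_parameters(params, frozen=frozen)
toy_saving = 1.0 - trainable / total

# Saving implied by the reported counts, taking 36 million as the lower
# bound for the fully trainable models.
reported_saving = 1.0 - 4.33e6 / 36e6
```

With the reported counts, `reported_saving` evaluates to about 0.88, which is consistent with the ~90% figure quoted in the text given that the fully trainable models have more than 36 million parameters.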

Conclusions
In this work, in order to improve deep neural networks' generalization to successfully image new sample types in computational holographic microscopy, we presented a transfer learning-based few-shot learning method to generalize models when only small datasets of new sample types are available; we demonstrated the success of this new approach using a convolutional RNN to perform multi-height holographic image reconstruction. We established an RNN backbone model and a transfer learning scheme to reduce both the computation time and the number of trainable parameters compared to standard transfer learning approaches. The generalization of the backbone model was validated on prostate and salivary gland tissue datasets that were never seen by the network before. Compared with baseline models trained from scratch, the RNN models transferred from the backbone gained faster convergence speed and improved the image reconstruction quality. The reported transfer learning framework substantially enhances the generalization of deep neural network-based holographic imaging for new types of samples.

Imaging system and samples
Experiments in this work were implemented using a lens-free in-line holographic microscope. A broadband light source (WhiteLase Micro, NKT Photonics) was used for illumination, filtered by an acousto-optic tunable filter (530 nm). A complementary metal-oxide-semiconductor (CMOS) image sensor (IMX 081, Sony) captured the raw holograms. A 3D positioning stage (MAX606, Thorlabs, Inc.) was used to move the CMOS sensor to perform precise lateral and axial shifts. The samples of interest were directly placed between the light source and the CMOS sensor. The typical sample-to-source distance ($z_1$) and sample-to-sensor distance ($z_2$) used for imaging ranged from ~5-10 cm and ~400-600 µm, respectively. All hardware was controlled by a customized LabVIEW program during the imaging process.
All human samples involved in this work were deidentified and prepared from existing specimens, without a link or identifier to the patient. Human prostate and lung tissue slides were prepared by and acquired from the UCLA Translational Pathology Core Laboratory (TPCL). Pap smear slides were provided by the UCLA Department of Pathology. Blood smear slides were provided by the UCLA Department of Internal Medicine.
The raw in-line holograms were algorithmically super-resolved. To implement pixel super-resolution, 6-by-6 in-line holograms were captured for each FOV with subpixel shifts using a 3D positioning stage (MAX606, Thorlabs, Inc.). Relative lateral shifts were first estimated by an image correlation-based algorithm, and then the 36 holograms were shifted and added to obtain a super-resolved hologram [29]. The resulting super-resolved holograms were used for both multi-height phase retrieval and neural network-based hologram reconstructions.
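The shift-and-add step described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it assumes that the lateral shifts are already known (the correlation-based shift estimation is omitted), that shifts are given in low-resolution pixels, and that each frame is an ideal point-sampled version of the fine grid. The function name `shift_and_add` and its arguments are hypothetical.

```python
import numpy as np

def shift_and_add(low_res_stack, shifts, factor):
    """Combine sub-pixel-shifted low-resolution frames on a finer grid.

    low_res_stack: array (K, h, w) of low-resolution holograms.
    shifts: K pairs of (row, col) shifts in low-resolution pixels (assumed known).
    factor: integer up-sampling factor of the output grid.
    """
    _, h, w = low_res_stack.shape
    accum = np.zeros((h * factor, w * factor))
    count = np.zeros_like(accum)
    for frame, (dy, dx) in zip(low_res_stack, shifts):
        # Round each shift to the nearest position on the fine grid.
        oy = int(round(dy * factor)) % factor
        ox = int(round(dx * factor)) % factor
        accum[oy::factor, ox::factor] += frame
        count[oy::factor, ox::factor] += 1
    # Average where several frames landed; unfilled pixels stay at zero.
    return np.divide(accum, count, out=np.zeros_like(accum), where=count > 0)
```

For a 6-by-6 scan with ideal shifts of multiples of 1/6 pixel, every position of the fine grid is filled exactly once; with estimated, non-ideal shifts, several frames may land on the same position and are averaged.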
Multi-height phase recovery and free space propagation
Eight in-line holograms at different sample-to-sensor distances were captured to perform MH-PR for each sample FOV [29]. An autofocusing algorithm is first applied to estimate the sample-to-sensor distance for each hologram based on the edge sparsity criterion [33]. Then the first hologram with zero phase padding is propagated to the remaining hologram planes, using the angular spectrum-based wave propagation [34] and the estimated sample-to-sensor distances. At the designated hologram position, the resulting field is updated using the measured hologram at the same position, where the amplitude of the resulting field is averaged with the measured hologram amplitude and the phase is retained. One iteration is completed when all the measured holograms have been used in a sequence, and this iterative algorithm typically converges after 10-30 iterations.

The down- and up-sampling paths of RH-M each consist of 4 consecutive convolutional blocks, with 4 RNN blocks connecting the corresponding convolutional blocks in the down- and up-sampling paths. Each RNN block adopts two convolutional gated recurrent unit (GRU) [36] layers and one 1 × 1 convolutional layer.
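The angular spectrum propagation and the iterative multi-height update described in this section can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the default iteration count, the pixel size passed by the caller, and the dropping of evanescent components are choices made here for illustration, and the autofocusing step is assumed to have already provided the distances.

```python
import numpy as np

def angular_spectrum_propagate(field, dz, wavelength, pixel_size):
    """Propagate a complex field by dz (all lengths in the same unit)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    propagating = arg > 0  # evanescent components are dropped (assumption)
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * dz), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def multi_height_phase_retrieval(holograms, distances, wavelength,
                                 pixel_size, n_iter=20):
    """Iterative multi-height phase retrieval from intensity holograms.

    holograms: list of 2D intensity images; distances: matching
    sample-to-sensor distances. Returns the field at the sample plane.
    """
    amplitudes = [np.sqrt(np.asarray(h, dtype=float)) for h in holograms]
    field = amplitudes[0].astype(complex)  # first hologram, zero phase
    current = 0
    visit_order = list(range(1, len(holograms))) + [0]
    for _ in range(n_iter):
        for target in visit_order:
            field = angular_spectrum_propagate(
                field, distances[target] - distances[current],
                wavelength, pixel_size)
            # Average the amplitude with the measurement, keep the phase.
            mean_amp = 0.5 * (np.abs(field) + amplitudes[target])
            field = mean_amp * np.exp(1j * np.angle(field))
            current = target
    # Back-propagate from the last visited plane to the sample plane.
    return angular_spectrum_propagate(
        field, -distances[current], wavelength, pixel_size)
```

A typical call would pass the 8 measured holograms with their autofocused distances, for example `multi_height_phase_retrieval(holograms, distances, wavelength=0.53, pixel_size=0.5)` with all lengths in micrometers; the pixel size used here is a placeholder value.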

Image quality evaluation metrics
A generative adversarial network (GAN) [37] framework was employed in this work. The loss function used for both training and transfer learning is a linear combination of (1) the pixel-wise MAE loss, $L_{\mathrm{MAE}}$, (2) the multiscale SSIM loss, $L_{\mathrm{SSIM}}$, between the network output $\hat{y}$ and the ground truth image $y$, and (3) the adversarial loss, $L_{\mathrm{adv}}$, given by the discriminator ($D$) network. Accordingly, the total loss for the generator (RH-M) can be expressed as:

$$L_G = \alpha L_{\mathrm{MAE}} + \beta L_{\mathrm{SSIM}} + \gamma L_{\mathrm{adv}}$$

where $\alpha$, $\beta$, $\gamma$ are empirically set as 3, 1, 0.3 for all models. MAE and SSIM losses are defined as:

$$L_{\mathrm{MAE}} = \mathrm{MAE}(y, \hat{y}), \qquad L_{\mathrm{SSIM}} = 1 - \mathrm{SSIM}(y, \hat{y})$$

Squared loss terms were employed for the adversarial loss and the total discriminator loss ($L_D$) [38].

Adam optimizers [39] with learning rates of $10^{-5}$ and $10^{-6}$ were used for the generator and discriminator networks, respectively, in the backbone training. Decaying learning rates with initial values of $2 \times 10^{-4}$ and $2 \times 10^{-5}$ were applied for transfer learning on new datasets for the generator and the discriminator, respectively. Based on the validation losses, models

Figure 1. The pre-trained RNN backbone model and few-shot transfer learning for holographic image reconstruction. (a) RH-M backbone network pre-trained from scratch on a large dataset (composed of $N$ FOVs for each sample type) with 3 types of samples (blood smears, Pap smears and lung tissue sections). The network is later directly tested on a new type of sample (prostate tissue section) but fails in its reconstruction since this type of sample was never seen by the network before. (b) Transfer learning of an RH-M network from the pre-trained model with fixed RNN backbone. A small dataset of the new sample type with $N_t$ image FOVs is used for transfer learning, where $N_t \ll N$. After a fast transfer learning process, the RH-M can generalize on testing slides of the new sample type very well. Scale bar: 50 µm.

Figure 2. Transfer learning results of RH-M backbone on prostate tissue sections. Models 1 and 2 were transferred from the pre-trained RH-M model with and without fixed RNN backbone, respectively. Baseline models were trained from scratch. (a) Back-propagated input holograms and ground truth complex field obtained by MH-PR using 8 input holograms. (b) Reconstruction results of models 1, 2 and the baseline model transferred/trained on small training datasets with 4 different scales (i.e., different $N_t/N$ ratios).

The input sequence contained $M = 2$ back-propagated holograms during the transfer. We additionally trained the same network from scratch (baseline model) on the same 4 datasets with different $N_t/N$ ratios for comparison. After convergence (about 120 epochs), we tested them on an additional testing dataset of a prostate tissue excluded from all training sets. As indicated in Fig. 2b, models 1 and 2 showed equivalent reconstruction performance compared with the baseline model on all 4 datasets with different $N_t/N$ ratios. Increasing the ratio of the training dataset benefited the reconstruction quality, as further indicated by the decreasing amplitude RMSE and the increasing enhanced correlation coefficient (ECC) values (see the Methods section). Figure 2c further confirmed the same conclusion by calculating the amplitude RMSE of the model outputs on the testing set of 49 unique FOVs.
As reported in Fig. 2d, the training time of our approach (model 1, blue bars) was reduced by up to 19% on the same GPU machine (see Methods) compared to standard transfer learning (model 2, orange bars) and the baseline model (green bars).
Figure 2e further compared the training and validation mean absolute error (MAE) values for the transferred models and the baseline model on a small prostate tissue section dataset (with $N_t/N = 12.1\%$). The transferred models (models 1 and 2, blue and orange curves, respectively) both converged ~2.5x faster than the baseline model (green curves), saving about 60% of the training epochs to reach the same performance in terms of the validation MAE loss.
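The ~2.5x convergence speed-up and the ~60% saving in training epochs are two views of the same number. Assuming, for illustration only, that the baseline model needs $E$ epochs to reach a given validation MAE (the symbol $E$ is introduced here and is not part of the original notation), a model that converges 2.5 times faster needs $E/2.5$ epochs, so the fraction of epochs saved is:

```latex
\frac{E - E/2.5}{E} = 1 - \frac{1}{2.5} = 0.6
```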

Figure 3. Transfer learning results of RH-M backbone on salivary gland tissue sections. Models 1 and 2 were transferred from the pre-trained RH-M model with and without fixed RNN backbone, respectively. The baseline model was trained from scratch. (a) Back-propagated input holograms and the ground truth field obtained by MH-PR from 8 input holograms. (b) Reconstruction results of models 1, 2 and the baseline model transferred/trained on small training datasets with different numbers of input holograms.

RMSE, ECC, MAE and multiscale structural similarity index (SSIM) were used to evaluate the similarity between two images during the training and testing of the presented neural networks. Denoting the two images as $x$ and $y$ with dimensions of $H \times W$, these metrics are defined as follows:

$$\mathrm{RMSE}(x, y) = \sqrt{\frac{1}{HW} \sum_{p=1}^{H} \sum_{q=1}^{W} |x(p, q) - y(p, q)|^2}$$

$$\mathrm{ECC}(x, y) = \frac{\mathrm{Re}\left\{ \sum_{p=1}^{H} \sum_{q=1}^{W} x^{*}(p, q)\, y(p, q) \right\}}{\sqrt{\left( \sum_{p=1}^{H} \sum_{q=1}^{W} |x(p, q)|^2 \right) \cdot \left( \sum_{p=1}^{H} \sum_{q=1}^{W} |y(p, q)|^2 \right)}}$$

$$\mathrm{MAE}(x, y) = \frac{1}{HW} \sum_{p=1}^{H} \sum_{q=1}^{W} |x(p, q) - y(p, q)|$$

where $p$, $q$ are the indices of image pixels. For the multiscale SSIM, $\mu_{x_j}$ and $\sigma_{x_j}$ are the mean and standard deviation values of the $2^{j-1}$ downsampled version of $x$, respectively, and $\sigma_{x_j, y_j}$ is the covariance between the $2^{j-1}$ downsampled versions of $x$ and $y$. The other parameters used for the SSIM calculations are empirically determined as $\beta_1 = \gamma_1 = 0.0448$, $\beta_2 = \gamma_2 = 0.2856$, $\beta_3 = \gamma_3 = 0.3001$, $\beta_4 = \gamma_4 = 0.2363$, $\alpha_5 = \beta_5 = \gamma_5 = 0.1333$, with 5 scales, $C_1 = (0.01L)^2$, $C_2 = 2C_3 = (0.03L)^2$, where $L = 255$ for 8-bit gray scale images [35]. ECC values are calculated on complex images using both amplitude and phase channels.

Holograms and their corresponding ground truth fields were cropped into non-overlapping 512 × 512-pixel image patches, each corresponding to a ~0.2 × 0.2 mm² unique sample FOV. Then the image pair sets of each sample type were divided into training, validation, and testing sets, where the testing set was strictly captured on a different patient slide not used in training and validation sets. During the transfer learning, data augmentation techniques (flipping and rotation) were applied to expand the training set by 8 times.

The holographic image reconstruction network (RH-M) follows the convolutional recurrent neural network architecture as in Ref.
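The RMSE, MAE and ECC metrics and the 8-fold augmentation mentioned in this section can be sketched in NumPy as follows. Two assumptions are made here and are not confirmed by the text: RMSE and MAE are normalized by the number of pixels, and the 8 augmented variants are taken to be the four 90-degree rotations of a patch and of its mirror image. All function names are hypothetical, and the multiscale SSIM is omitted.

```python
import numpy as np

def rmse(x, y):
    """Root-mean-square error between two (real or complex) images."""
    return float(np.sqrt(np.mean(np.abs(x - y) ** 2)))

def mae(x, y):
    """Mean absolute error between two (real or complex) images."""
    return float(np.mean(np.abs(x - y)))

def ecc(x, y):
    """Real part of the normalized inner product of two complex images."""
    numerator = np.real(np.sum(np.conj(x) * y))
    denominator = np.sqrt(np.sum(np.abs(x) ** 2) * np.sum(np.abs(y) ** 2))
    return float(numerator / denominator)

def augment_8x(image):
    """Four 90-degree rotations of the image and of its mirror image.

    Assumed form of the 'flipping and rotation' augmentation.
    """
    variants = []
    for base in (image, np.fliplr(image)):
        for k in range(4):
            variants.append(np.rot90(base, k))
    return variants
```

In practice the same geometric transform would be applied to an input hologram stack and its ground truth field together, so that each augmented pair stays aligned.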