Extracting quantitative biological information from bright-field cell images using deep learning

Quantitative analysis of cell structures is essential for biomedical and pharmaceutical research. The standard imaging approach relies on fluorescence microscopy, where cell structures of interest are labeled by chemical staining techniques. However, these techniques are often invasive and sometimes even toxic to the cells, in addition to being time consuming, labor intensive, and expensive. Here, we introduce an alternative deep-learning–powered approach based on the analysis of bright-field images by a conditional generative adversarial neural network (cGAN). We show that this is a robust and fast-converging approach to generate virtually stained images from the bright-field images and, in subsequent downstream analyses, to quantify the properties of cell structures. Specifically, we train a cGAN to virtually stain lipid droplets, cytoplasm, and nuclei using bright-field images of human stem-cell–derived fat cells (adipocytes), which are of particular interest for nanomedicine and vaccine development. Subsequently, we use these virtually stained images to extract quantitative measures about these cell structures. Generating virtually stained fluorescence images is less invasive, less expensive, and more reproducible than standard chemical staining; furthermore, it frees up the fluorescence microscopy channels for other analytical probes, thus increasing the amount of information that can be extracted from each cell. To make this deep-learning–powered approach readily available for other users, we provide a Python software package, which can be easily personalized and optimized for specific virtual-staining and cell-profiling applications.

Biomedical and pharmaceutical research often relies on the quantitative analysis of cell structures. For example, changes in the morphological properties of cell structures are used to monitor the physiological state of a cell culture [1], to identify abnormalities [2], and to determine the uptake and toxicity of drugs [3]. The standard workflow is shown in Figure 1a: the cell structures of interest are chemically stained using fluorescence staining techniques; fluorescence images are acquired; and, finally, these images are analyzed to retrieve quantitative measures about the cell structures of interest. One key advantage is that multiple fluorescence images of the same cell culture can be acquired in parallel using the appropriate combination of chemical dyes and light filters, with the resulting images containing information about different cell structures.
However, fluorescence cell imaging has significant drawbacks. First, it requires a fluorescence microscope equipped with appropriate filters that match the spectral profiles of the dyes. Besides the complexity of the optical setup, usually only one dye is imaged at each specific wavelength, limiting the combination of dyes and cell structures that can be imaged in a single experiment. Second, the staining of the cell structures is typically achieved by adding chemical fluorescence dyes to a cell sample, which is an invasive (due to the required culture-media exchange and dye uptake [4]) and sometimes even toxic process [5]. Third, phototoxicity and photobleaching can also occur while acquiring the fluorescence images, which results in a trade-off between data quality, time scales available for live-cell imaging (duration and speed), and cell health [6]. Furthermore, a cell-permeable form of some dyes enters a cell and then reacts to form a stable and impermeable reaction product that is transferred to daughter cells; as a consequence, the dye intensity dilutes at every cell division and is eventually lost. Fourth, fluorescence staining techniques are often expensive, time-consuming, and labor-intensive, as they may require long protocol optimizations (e.g., dye concentration, incubation and washing times have to be optimized for each cell type and dye). Also, care has to be taken when choosing multiple dye partners to avoid spectral bleed-through [7]. All these drawbacks aggravate, or hinder completely, the collection of reliable long-term longitudinal data on the same population, such as when studying cell behavior or drug uptake over time. Therefore, there is an interest in extracting the same information using cheaper, non-invasive methods. In particular, it would be desirable to replace fluorescence images with brightfield images, which are much easier to acquire and do not require specialized sample preparation, eliminating concerns about the toxicity of the fluorescence dyes or damage related to the staining and imaging procedures. However, while brightfield images do provide some information about cellular organization, they lack the clear contrast of fluorescence images, which limits their use in subsequent downstream quantitative analyses.
Recently, the use of deep learning has been proposed as a way to create images of virtually-stained cell structures, thus mitigating the inherent problems associated with conventional chemical staining. These proposals come in the wake of the deep-learning revolution [8,9], where convolutional neural networks have been widely used to analyze images, e.g., for microscopy [10] and particle tracking [11-14]. Virtually-stained images have been created from images acquired with various imaging modalities. For example, virtual staining of cells and histopathology slides has been achieved using quantitative phase imaging [15,16], autofluorescence imaging [17], and holographic microscopy [18]. Furthermore, more recent work suggests that the information required to reproduce different stainings is in fact available within brightfield images [6,19,20].
Here, we propose a deep-learning-based approach to extract quantitative biological information from brightfield microscopy. A high-level description of the proposed workflow is shown in Figure 1b. Specifically, we train a conditional generative adversarial neural network (cGAN) to use a stack of brightfield images of human stem-cell-derived adipocytes to generate virtual fluorescence-stained images of their lipid droplets, cytoplasm, and nuclei. Subsequently, we demonstrate that these virtually-stained images can be successfully employed to extract a series of quantitative biologically-relevant measures in a downstream cell-profiling analysis. In order to make this deep-learning-powered approach readily available for other users, we provide a Python software package, which can be easily personalized and optimized for specific virtual-staining and cell-profiling applications.

Adipocyte cell culture, imaging, and cell profiling
Adipocytes, or fat cells, are the primary cell type composing adipose tissue. They store energy in the form of lipids, mainly triglycerides, in organelles called lipid droplets. Adipocyte cell cultures are commonly employed to study how the adipocyte metabolic profile responds to therapies for metabolic diseases such as diabetes and nonalcoholic fatty liver disease [21]. They are also important therapeutically, as they are present in the subcutaneous skin layers, and many relatively complex therapeutics, such as nanomedicines, vaccines, or biologicals, are dosed using subcutaneous injections. For example, in the case of nanomedicines and vaccines containing mRNA, the adipocytes are important for creating the active therapeutic protein product [22].
The mature adipocyte cultures, fixed using 4% paraformaldehyde, are chemically stained to label lipid droplets (Bodipy, green fluorescent), cell cytoplasm (CellTracker Deep Red, red fluorescent), and nuclei (Hoechst 33342, blue fluorescent). All fluorescent reagents are from Thermo Fisher Scientific and are used according to the manufacturer's instructions.
The cell cultures are imaged using a robotic confocal microscope (Yokogawa CV7000) equipped with a 60× water-immersion objective (Olympus UPLSAPO 60XW, NA = 1.2) and a 16-bit camera (Andor Zyla 5.5). Illumination correction is applied during acquisition so that the fluorescence intensities are consistent over the field of view. In each well, brightfield and fluorescence images are captured for 12 non-overlapping fields of view (280 µm × 230 µm, 2560 × 2160 pixels), for a total of 96 fields of view. For each field of view, a set of four images (one brightfield image and three fluorescence images for lipid droplets, cytoplasm, and nuclei) is acquired at 7 different z-positions separated by 1 µm. Subsequently, the fluorescence images at different z-positions are projected onto a single image using a maximum intensity projection to create a single fluorescence image per field of view and fluorescence channel.
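The maximum intensity projection described above can be sketched in a few lines of NumPy (the array sizes here are illustrative, not the actual 2560 × 2160 images):

```python
import numpy as np

# Illustrative z-stack of one fluorescence channel: 7 z-slices,
# as in the acquisition protocol (tiny spatial size for illustration).
rng = np.random.default_rng(0)
stack = rng.random((7, 4, 4))

# The maximum intensity projection collapses the z-stack into a single
# image by keeping, at each pixel, the brightest value across z.
mip = stack.max(axis=0)
```

In practice the projection is applied per fluorescence channel and per field of view, yielding one 2D target image per channel.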
Using the maximum intensity projections of the confocal fluorescence images, semi-quantitative phenotypic data are extracted from the cell structures using the open-source cytometric image-analysis software CellProfiler (https://cellprofiler.org, version 4.07 [24]) and a custom-made analysis pipeline (the analysis pipelines are available in the supplementary information [25]). Measured parameters include object numbers (nuclei, cells, lipid droplets), morphological characteristics (areas), and intensity data.

Neural network architecture
Neural networks are one of the most successful tools for machine learning [8,26]. They consist of a series of layers of interconnected artificial neurons. These artificial neurons are simple computational units that, when appropriately trained, output increasingly meaningful representations of the input data, leading to the sought-after result. Depending on the problem, the architecture of the neural network varies. In particular, generative adversarial networks (GANs) [27] have been shown to perform well in image-to-image transformation tasks, including, recently, virtual staining [15-18,20]. A GAN consists of two networks [27]: a generator, which generates images, and a discriminator, which discriminates whether images are real or created by the generator. The adversarial aspect refers to the fact that these two networks compete against each other: during the training, the generator progressively becomes better at generating synthetic images that can fool the discriminator, while the discriminator becomes better at discriminating real from synthetic images.
In this work, we employ a conditional GAN (cGAN) [30]. A schematic of its architecture is shown in Figure 2. The generator receives as input a stack of brightfield images of the same field of view acquired at different z-positions and generates virtually-stained fluorescence images of lipid droplets, cytoplasm, and nuclei. The discriminator attempts to distinguish the generated images from fluorescently-stained samples, classifying them as either real or synthetic data. The conditional aspect of the cGAN refers to the fact that the discriminator receives both the brightfield stack and the stained images as inputs. Thus, the task of the discriminator is conditioned on the brightfield images, i.e., instead of answering "is this a real staining?", the discriminator answers "is this a real staining for this stack of brightfield images?" In our implementation, the generator is based on the U-Net architecture [31], where the input image is first downsampled to a smaller representation and then upsampled to its original size, with skip connections between the downsampling and upsampling paths to retain local information. We have modified the original U-Net architecture to optimize its performance for virtual staining. First, each encoder convolutional block (Figure 2) concatenates its input with the result of two sequential convolutional layers before downsampling; this helps the network propagate information deeper, because it preserves the input information without the need for the convolutional layers to learn to preserve it. Second, in order to tackle the vanishing-gradient problem and to improve the latent-space representation (i.e.,
a low-dimensional representation of the input data, usually corresponding to the innermost layers of the U-Net where the input is most compressed) [14,18,32], we have implemented the bottleneck of the U-Net architecture using two residual network blocks (ResNet blocks, which, like the encoder convolutional blocks, preserve information from the previous layer, but add the input and output of the block instead of concatenating them [28]; Figure 2), each with 512 feature channels. Third, every layer (except the last two) uses instance normalization and a leaky ReLU activation (defined as φ(x) = α · x, with α = 1 for x > 0 and α = 0.1 for x < 0), which, differently from the standard ReLU, has the advantage of retaining a gradient in the backpropagation step even for negative layer outputs [33].
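The leaky ReLU activation defined above can be sketched in a couple of lines (a minimal NumPy illustration, not the network implementation itself):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # phi(x) = x for x > 0 and alpha * x otherwise: the small nonzero
    # slope for negative inputs keeps a gradient flowing during
    # backpropagation, unlike the standard ReLU, which is flat there.
    return np.where(x > 0, x, alpha * x)

out = leaky_relu(np.array([2.0, -2.0]))
```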
FIG. 2. Conditional generative adversarial neural network (cGAN) for virtual staining. The generator transforms an input stack of brightfield images into virtually-stained fluorescence images of lipid droplets, cytoplasm, and nuclei, using a U-Net architecture with the most condensed layer replaced by two residual network (ResNet) blocks [28]. In the first layer of the generator, we normalize each input channel (i.e., each brightfield z-slice) to the range [−1, 1] using Equation (1). The U-Net encoder consists of convolutional blocks followed by max-pooling layers for downsampling. Each convolutional block contains two paths (a sequence of two 3 × 3 convolutional layers, and the identity operation), which are merged by concatenation. The U-Net decoder uses bilinear interpolations for upsampling, followed by concatenation layers and convolutional blocks. Next, the hyperbolic tangent activation transforms the output to the range [−1, 1]. In the last layer of the U-Net, the network learns to denormalize the output images back to original pixel values by scaling and adding an offset to the output. Every layer in the generator, except the last two layers and the pooling layers, is followed by an instance normalization and a leaky ReLU activation. The discriminator is designed similarly to the PatchGAN discriminator [29] and receives both the brightfield images and fluorescence images (either the target fluorescence images or those predicted by the generator). The inputs to the discriminator are normalized like those to the generator. The discriminator convolutional blocks consist of 4 × 4 strided convolutions for downsampling. In all layers of the discriminator, we use instance normalization (with no learnable parameters) and leaky ReLU activation. Finally, the discriminator outputs a matrix containing the predicted probability for each patch of 32 × 32 pixels.
In the first layer of the generator, we normalize the input brightfield z-stack as

x̃_i = tanh( 2 (x_i − q_i^{p1}) / (q_i^{p2} − q_i^{p1}) − 1 ),   (1)

where x_i is the pixel value of the i-th z-slice of the original stack, x̃_i is that of the rescaled z-slice, and q_i^p denotes the p-th percentile pixel value of that z-slice calculated on the entire training set, with p1 a low and p2 a high percentile. By estimating the percentiles on the entire training set instead of on a patch-by-patch basis, the normalization becomes more resilient to outliers. Furthermore, by normalizing with statistical properties of the distribution of intensities rather than the minimum and maximum of the intensities, we prevent the normalization from depending on the image size and we preserve a local correspondence between the intensities of the different channels, which aids the training procedure. Finally, the choice of the hyperbolic tangent as a normalization function ensures that all values fall in the range [−1, 1], while mitigating the effect of outliers in the intensity distribution. In the last layer of the U-Net, the network learns to denormalize the output images back to original pixel values by scaling and offsetting the output.
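The percentile-based tanh normalization can be sketched as follows (a NumPy illustration; the percentile pair is an implementation choice, and the 1st and 99th percentiles are assumed here purely for illustration):

```python
import numpy as np

def normalize_slice(x, q_low, q_high):
    # Linearly rescale using percentile statistics precomputed on the
    # whole training set, then squash with tanh so that all values fall
    # in (-1, 1) and intensity outliers are compressed rather than clipped.
    return np.tanh(2.0 * (x - q_low) / (q_high - q_low) - 1.0)

# Percentiles estimated once on the entire training set, not per patch.
rng = np.random.default_rng(1)
training_slices = rng.normal(1000.0, 200.0, size=(10, 64, 64))
q1, q99 = np.percentile(training_slices, [1, 99])
out = normalize_slice(training_slices[0], q1, q99)
```

Because tanh is bounded, even pixels far outside the percentile range map smoothly into (−1, 1) instead of being clipped.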
We employ a discriminator that follows a conditional PatchGAN architecture [29]: it receives the stack of brightfield images and the fluorescence images (either the target fluorescence images or the virtually-stained images), divides them into overlapping patches, and classifies each patch as real or fake (rather than using a single descriptor for the whole input). This splitting arises naturally as a consequence of the discriminator's convolutional architecture [34]. As shown in Figure 2, the discriminator's convolutional blocks consist of 4 × 4 strided convolutions for downsampling. In all layers, we use instance normalization (with no learnable parameters) and leaky ReLU activation. Finally, the discriminator output is a matrix that represents the predicted classification probability for each patch. The benefit of using a PatchGAN is that the discriminator evaluates the input images based on their style rather than their content. This makes the generator's task of fooling the discriminator more specialized, thus improving the quality of the generated virtual stainings [18].
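The geometry of the PatchGAN output can be sketched as follows (a NumPy illustration of the per-patch scoring only; the sizes are illustrative, and the mean is a stand-in for the discriminator's learned per-patch real/fake probability):

```python
import numpy as np

# A PatchGAN-style discriminator outputs one score per local patch
# rather than a single scalar for the whole image. For a 256 x 256
# input and an effective patch size of 32 x 32 pixels, the score
# matrix is 8 x 8.
image = np.random.default_rng(2).random((256, 256))
patch = 32
grid = image.reshape(256 // patch, patch, 256 // patch, patch)
scores = grid.mean(axis=(1, 3))  # one "score" per 32 x 32 patch
```

In the real network this grid emerges from the strided convolutions: each output element has a 32 × 32 pixel receptive field, so no explicit reshaping is needed.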
We have implemented this neural network using DeepTrack 2.0, an open-source software package for quantitative microscopy using deep learning that we have recently developed [14,35], which uses a Python-based TensorFlow backend [36,37].

Training procedure
Once the network architecture is defined, we need to train it using z-stacks of brightfield images for which we know the corresponding target fluorescence images. As described above, the dataset consists of 96 sets of images (each consisting of seven brightfield images and three fluorescence targets with 2560 × 2160 pixels). We randomly split these data into a training dataset and a validation dataset, corresponding to 81 and 15 sets of images, respectively.
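The random split can be sketched at the index level as follows (the seed is arbitrary; the actual file handling is in the supplementary code):

```python
import numpy as np

# 96 fields of view, randomly partitioned into 81 training
# and 15 validation sets.
rng = np.random.default_rng(42)
indices = rng.permutation(96)
train_idx, val_idx = indices[:81], indices[81:]
```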
Before starting the training process, the brightfield images and corresponding fluorescence targets need to be carefully aligned (a slight misalignment results from the different optics employed to capture the brightfield and fluorescence images). We use a Fourier-space correlation method that calculates a correction factor in terms of a pixel offset and a scale factor (see code in the supplementary information [25]). Afterward, we stochastically extract 512 × 512 pixel patches from the corrected images and augment the training dataset using rotational and mirroring augmentations. Importantly, the misalignment must be corrected before the augmentation step, because otherwise the augmentations would introduce irreducible errors and put a fundamental limit on high-frequency information.
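The core of such a Fourier-space correlation can be sketched with phase correlation, which recovers the integer pixel offset between two images (a simplified stand-in for the method used here, which also estimates a scale factor; the function name is illustrative):

```python
import numpy as np

def estimate_shift(ref, img):
    # Phase correlation: the peak of the inverse FFT of the normalized
    # cross-power spectrum gives the circular pixel offset between the
    # two images.
    f_ref = np.fft.fft2(ref)
    f_img = np.fft.fft2(img)
    cross = f_ref * np.conj(f_img)
    cross /= np.abs(cross) + 1e-12  # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap offsets into the range [-N/2, N/2).
    return tuple(int(p - s) if p > s // 2 else int(p)
                 for p, s in zip(peak, ref.shape))

rng = np.random.default_rng(3)
a = rng.random((64, 64))
b = np.roll(a, shift=(3, -5), axis=(0, 1))  # known misalignment
shift = estimate_shift(b, a)
```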
During training, the trainable parameters of the neural network (i.e., the weights and biases of the artificial neurons in the neural network layers) are iteratively optimized using the backpropagation training algorithm [38] to minimize the loss function, i.e., the difference between the virtually-stained images and the target chemically-stained images. Initially, we set the weights of the convolutional layers of both the generator and the discriminator to be randomly (normally) distributed with a mean of 0 and a standard deviation of 0.02; all of the biases are set to 0.
In each training step, we alternately train the generator and the discriminator. First, the generator is tasked with predicting the fluorescence images corresponding to stacks of brightfield images. Then, the discriminator receives both the brightfield images and the fluorescence images (either the target fluorescence images or the virtually-stained images predicted by the generator) and classifies them as real (chemically-stained images, labeled with 1's) or fake (virtually-stained images, labeled with 0's).
The loss function of the generator is

L_gen = β · MAE{z_target, z_output} + BCE{D(z_output), 1},

where z_target represents the chemically-stained (target) images, z_output represents the virtually-stained (generated) images, MAE{z_target, z_output} is the mean absolute error between the target and generated images, BCE{p, y} denotes the binary cross-entropy between a prediction p and a label y, D(·) is the discriminator prediction, and β is a weighting factor between the two parts of the loss function (we set β = 0.001, which makes the typical value of the MAE term roughly half the discriminator term). Importantly, L_gen depends on the discriminator prediction and penalizes the generator for producing images classified as fake. The loss function of the discriminator is

L_disc = BCE{D(z_target), 1} + BCE{D(z_output), 0},

which penalizes the discriminator for misclassifying real images as generated or generated images as real. Thus, the generator tries to minimize its loss by achieving D(z_output) = 1 for the images it generates, while the discriminator tries to achieve D(z_output) = 0 for generated images and D(z_target) = 1 for the chemically-stained fluorescence targets. This leads to an adversarial behavior between the generator and the discriminator.
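The two loss functions can be sketched in NumPy as follows (the cross-entropy form of the adversarial term is an assumption for illustration; the training code itself uses the TensorFlow backend):

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    # Binary cross-entropy between the discriminator's predicted
    # probabilities and the real (1) / fake (0) labels.
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(label * np.log(pred)
                    + (1.0 - label) * np.log(1.0 - pred))

def generator_loss(z_target, z_output, d_output, beta=0.001):
    # Pixel-wise mean absolute error weighted by beta, plus an
    # adversarial term that penalizes the generator when the
    # discriminator labels its output as fake (pushing D toward 1).
    mae = np.mean(np.abs(z_target - z_output))
    return beta * mae + bce(d_output, 1.0)

def discriminator_loss(d_target, d_output):
    # Penalize classifying real images as fake or fake images as real.
    return bce(d_target, 1.0) + bce(d_output, 0.0)
```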
We have trained both networks for 8000 epochs (each consisting of 24 batches of 8 images) using the Adam optimizer [39] with a learning rate of 0.0002 and β_1 = 0.5 (the exponential decay rate for the first-moment estimates). Each epoch takes about 10 seconds on an NVIDIA A100 GPU (40 GB VRAM, 2430 MHz effective core clock, 6912 CUDA cores), for a total training time of about 22 hours.

Qualitative analysis
Figure 3 shows a representative example of virtual staining for one of the validation data realized with the cGAN described in the previous section (images for all validation data are available in the supplementary information [25]). Figure 3a shows the first of the seven brightfield slices used as input for the cGAN, and Figures 3b, 3c, and 3d show the corresponding target chemically-stained fluorescence images. Comparing the brightfield inputs with the fluorescence targets, it can be seen that the brightfield image contains information about the cellular structures, but this information is less readily accessible than in the fluorescence images. Furthermore, it can be noticed that different cell structures have different prominence in the brightfield image, with the lipid droplets being more clearly visible than the cytoplasm and, in turn, the cytoplasm clearer than the nuclei.
FIG. 3. Virtually-stained fluorescence images. a Brightfield image and corresponding b-d chemically-stained and e-g virtually-stained fluorescence images for lipid droplets, cytoplasm, and nuclei. h-o Enlarged crops corresponding to the dotted boxes in a-g. The lipid droplets are clearly visible in the brightfield image (a and h) thanks to their high refractive index, so that the cGAN manages to accurately predict the chemically-stained images (b and i), generating accurate virtual stainings (e and m) and even reproducing some details of the internal structure of the lipid droplets (indicated by the arrows in i and m). The chemical staining of the cytoplasm (c and j) is also closely reproduced by the virtual staining (f and n). The virtually-stained nuclei (g and o) deviate more prominently from the chemically-stained ones (d and k), especially in the details of their shape and texture, which can be explained by the fact that the nuclei are not clearly visible in the brightfield image, so that the cGAN seems to use the surrounding cell structures to infer the presence and properties of the nuclei.
Despite the limited information in the brightfield image, the cGAN manages to predict the fluorescence targets, as can be seen in Figures 3e, 3f, and 3g for lipid droplets, cytoplasm, and nuclei, respectively. Overall, the virtually-stained images appear qualitatively very similar to the chemically-stained ones. Figures 3h-o show enlarged crops of Figures 3a-g, where details can be more clearly appreciated.
The lipid droplets are virtually stained in great detail, as can be appreciated by comparing the enlarged crop of the chemical staining (Figure 3i) with that of the virtual staining (Figure 3m). This is to be expected because the lipid droplets, consisting primarily of lipids at high concentration, have a higher refractive index than most other intracellular objects [40], which makes them clearly visible in the brightfield images. Interestingly, even some details of the internal structure of the lipid droplets can be seen in the virtual staining (e.g., those indicated by the arrows in Figures 3i and 3m). These structures are probably due to proteins embedded in the surface or core of the droplets that affect the appearance of the chemically-stained cells [41]: since most of the space inside adipocytes is occupied by lipid droplets, when these cells need to increase their metabolic activity (e.g., during protein synthesis), they rearrange their contents, creating textural imprints on the surfaces of the lipid droplets that result in golf-ball-like textures.
A lot of detail can also be found in the virtually-stained cytoplasm, as can be seen by comparing the enlarged chemically-stained image (Figure 3j) with the corresponding enlarged virtually-stained image (Figure 3n). As for the lipid droplets, the high quality of the virtually-stained cytoplasm images is to be expected, as the cytoplasm also has good contrast in the brightfield images, although less than the lipid droplets. We can see that some of the fine structures appear slightly different, which is particularly evident in the contrast between various cytoplasmic structures (see, e.g., those indicated by the arrows in Figures 3j and 3n). However, since the cytoplasm dye (CellTracker Deep Red) reacts with amine groups present in intracellular proteins dispersed in the cytoplasm, the resulting staining patterns in the chemically-stained image are probably uneven in an intrinsically random way and, therefore, cannot be reproduced by the virtual-staining procedure.
The nuclei are more difficult to virtually stain because they have a refractive index very similar to that of the surrounding cytoplasm [42], so that there is little information about them in the brightfield image. Nevertheless, the cGAN manages to identify them, as can be seen by comparing the enlarged crop of the chemically-stained nuclei (Figure 3k) with the corresponding virtual staining (Figure 3o), although without resolving the details of their internal structure. The cGAN seems to extract information about the nuclei shape mostly from the surrounding cell structures, making it difficult to predict nuclei that are not surrounded by lipid droplets. Considering that the cell is typically at its thickest around the position of the nucleus, complementing the brightfield images with phase-contrast images may provide additional information that is helpful for increasing the robustness of the virtual nuclei staining.

Quantitative analysis
The stained images are then used to extract quantitative biological information about the cell structures. For example, quantitative information about the cellular lipid-droplet content is critical to study metabolic diseases in which the fat storage in adipocytes plays a pivotal role, and to dissect the mechanisms leading to organ injury due to lipid deposition in ectopic tissue [43]. As a consequence, the generation of accurate and relevant quantitative cell-structure data is of key importance for biomedical and pharmaceutical research as well as for clinical therapeutic decisions.
Here, we have used the open-source software CellProfiler (version 4.07 [24]) to identify and segment the lipid droplets, cytoplasm, and nuclei in both the chemically-stained and virtually-stained fluorescence images (the analysis pipeline is available in the supplementary information [25]). For each cell structure, we employ a feature-extraction pipeline that calculates the number of cell structures in each image, their mean area in pixels, their integrated intensity, their mean intensity, and the standard deviation of their mean intensity. The results of this quantitative analysis are shown in Figure 4 for the same representative set of validation images used in Figure 3 (the results for all validation data are available in the supplementary information [25]). The aggregated results for the whole validation dataset are presented in Table I.
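Given a segmentation mask, the per-structure metrics listed above can be sketched in NumPy (a minimal illustration in the spirit of the CellProfiler pipeline, not the pipeline itself; the function name is illustrative):

```python
import numpy as np

def structure_metrics(intensity, labels):
    # Per-object metrics: object count, mean area (pixels), integrated
    # intensity, mean intensity, and the standard deviation of the
    # per-object mean intensities. `labels` is a segmentation mask
    # (0 = background, 1..n = object ids), e.g., as produced by
    # CellProfiler.
    ids = np.unique(labels)
    ids = ids[ids != 0]
    areas = np.array([(labels == i).sum() for i in ids])
    integrated = np.array([intensity[labels == i].sum() for i in ids])
    per_object_mean = integrated / areas
    return {
        "count": len(ids),
        "mean_area": areas.mean(),
        "integrated_intensity": integrated.sum(),
        "mean_intensity": per_object_mean.mean(),
        "std_mean_intensity": per_object_mean.std(),
    }

# Toy example: two objects of 4 and 9 pixels in a uniform image.
labels = np.zeros((8, 8), dtype=int)
labels[1:3, 1:3] = 1
labels[5:8, 5:8] = 2
metrics = structure_metrics(np.ones((8, 8)), labels)
```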
The first step of the feature-extraction pipeline is to segment the relevant cell structures. Starting from the fluorescence images, the feature-extraction pipeline identifies relevant cellular structures based on threshold values for intensity, size, and shape. Figures 4a-c show the segmentations obtained from the chemically-stained images, and Figures 4d-f the corresponding segmentations obtained from the virtually-stained images.
In the feature-extraction pipeline, the nuclei are identified first (Figures 4c and 4f). Since the lipid droplets in the adipocytes may occlude the nuclei and physically change their size and shape, a wide range of possible nuclear diameters and shapes is selected to ensure a successful segmentation. Furthermore, since the intensity of the nuclei varies, an adaptive thresholding strategy is chosen (i.e., for each pixel, the threshold is calculated based on the surrounding pixels within a given neighborhood). As a last step, nuclei that are clumped together are distinguished by their shape. Identifying the nuclei is critically important because the number of nuclei is often used for the quantification of different biological phenomena, for example the average amount of lipids per cell in the context of diabetes research.
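The idea behind adaptive thresholding can be sketched with a local-mean threshold (a simplified NumPy stand-in for CellProfiler's adaptive strategy; the window size and offset are illustrative):

```python
import numpy as np

def adaptive_threshold(img, window=5, offset=0.0):
    # For each pixel, the threshold is the mean intensity of the
    # surrounding window, so dim objects on a dim background can still
    # be detected. Edge padding avoids boundary artifacts.
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")
    local_mean = np.zeros_like(img, dtype=float)
    for di in range(window):
        for dj in range(window):
            local_mean += padded[di:di + img.shape[0],
                                 dj:dj + img.shape[1]]
    local_mean /= window * window
    return img > local_mean + offset

# Toy example: a single bright pixel on a dark background.
img = np.zeros((7, 7))
img[3, 3] = 1.0
mask = adaptive_threshold(img, window=5)
```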
In the second part of the feature-extraction pipeline, the cytoplasm is segmented to determine the cell boundaries, starting from the locations of the previously identified nuclei (Figures 4b and 4e). An adaptive thresholding strategy is again used, with a larger adaptive window (the neighborhood considered for the calculation of the threshold) compared to that used for the nuclei segmentation. Identifying the cytoplasm structure is important because it gives information about the cell size (measured area) and morphology (e.g., presence of protrusions or blebbing features), which are in turn related to the physiological state of the cell [44].
FIG. 4. Quantitative information from chemically-stained and virtually-stained fluorescence images. Segmentation obtained using CellProfiler (https://cellprofiler.org, version 4.07 [24]) of a-c the chemically-stained target image and d-f the virtually-stained generated image for lipid droplets, cytoplasm, and nuclei. Probability distributions for g-i the size and j-m the mean intensity of the individual lipid droplets, cytoplasmatic regions, and nuclei identified by CellProfiler for the chemically-stained (gray histograms) and virtually-stained (colored histograms) segmentations. n-p Total cell-structure count, mean area, integrated intensity, mean intensity, and standard deviation of the mean intensity for the lipid droplets, cytoplasmatic regions, and nuclei identified by CellProfiler in the virtually-stained segmentations (colored outlines), normalized to those identified in the chemically-stained segmentations (gray outlines).
In the final part of the feature-extraction pipeline, the lipid droplets are segmented independently from the nuclei and cytoplasm (Figures 4a and 4d). This segmentation is done in two steps to separately target the smaller and larger lipid droplets. For each of the two steps, a range of expected diameters and intensities is selected for the image thresholding. Since the lipid droplets in each of the size distributions have similar peak intensities, a global thresholding strategy is used for their identification. Lipid droplets that are clumped together are distinguished by their intensity rather than their shape, which is consistently round for all the lipid droplets.
The segmented images are then used to count and characterize the cell structures. Figures 4g-m show that there is good agreement between the probability distribution histograms for the area and mean intensity of the cell structures identified from the chemically-stained (gray histograms) and virtually-stained (colored histograms) segmentations for lipid droplets (Figures 4g and 4j), cytoplasm (Figures 4h and 4k), and nuclei (Figures 4i and 4m). Figures 4n-p show the cell-structure count in the image, their mean area, their combined integrated intensity over the image, the mean intensity of the cell structures in the image, and the standard deviation of the mean intensity identified by CellProfiler in the virtually-stained images (colored outlines), normalized to those identified in the chemically-stained images (gray outlines), for the lipid droplets (Figure 4n), cytoplasmic regions (Figure 4o), and nuclei (Figure 4p).
The aggregated results for the features extracted using CellProfiler for the whole validation dataset are presented in Table I. Importantly, there is a high correlation (Pearson correlation coefficient ρ in Table I) between all metrics obtained with the chemically-stained and virtually-stained images. This indicates that any deviation between these metrics is systematic and consistent, which is highly relevant for biological experiments, where the focus is often not on absolute values but rather on the comparison of different samples.
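The Pearson correlation coefficient ρ reported in Table I can be computed directly from paired per-image metrics. Here `x` and `y` are hypothetical arrays holding one metric value per validation image for the chemically-stained and virtually-stained analyses, respectively:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two paired metric series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))
```

Note that a purely systematic deviation (e.g., y = 2x + 1) still yields ρ = 1, which is precisely why a high ρ indicates that sample-to-sample comparisons remain valid even when absolute values differ.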
The feature extraction from the virtually-stained images shows the best performance for the lipid droplets. This is very useful for potential applications because, e.g., lipid droplets are often used to measure the effect of drugs for metabolic diseases. In this context, the amount of fat in the cells can be quantified by normalizing the number of lipid droplets, their mean area, or their integrated intensity to the number of cells in the image. A systematically lower number of larger lipid droplets is identified in the segmented virtually-stained images (Figure 4d) compared to the segmented chemically-stained images (Figure 4a). This can be partly explained by the fact that the chemically-stained fluorescence images of the lipid droplets have some intensity variations (see, e.g., those indicated by the arrows in Figures 3i and 3m), which may result in the erroneous segmentation of a single lipid droplet into multiple parts (see, e.g., the inset in Figure 4a). Even though these intensity variations are reproduced in the virtually-stained images (see, e.g., those indicated by the arrows in Figures 3j and 3n), they do not translate into an erroneous segmentation of the image by CellProfiler, leading to the identification of fewer but larger lipid droplets (see, e.g., the inset in Figure 4d). Therefore, the lipid-droplet count is lower, their area is larger, and their integrated intensity is higher when analyzing the virtually-stained images compared to the chemically-stained ones (Table I). Nevertheless, the average and standard deviation of their mean intensity are more closely estimated (probably because these are intensive quantities).
The main information extracted from the cytoplasm staining is related to the cell boundaries and morphology. In this respect, the cell count and mean area are the most important metrics, and both are reproduced very well by the analysis of the virtually-stained images (Table I). The other metrics are related to the intensity of the cytoplasm, which can be inconsistent even in the chemically-stained images because the cytoplasmic dye (CellTracker Deep Red) reacts with amine groups present in intracellular proteins dispersed in the cytoplasm, producing an uneven texture. This explains why the cGAN cannot predict the exact spatial distribution and amount of the chemical dye from which the chemically-stained images are obtained. On the other hand, the metrics describing the integrated intensity, mean intensity, and standard deviation of the mean intensity are reproduced accurately from the virtually-stained images.
The nuclei are used to identify the individual cells, for which both the number and the morphological properties of the nuclei are needed. In this respect, the most important measures are the nuclei count and mean area, which are determined accurately using the virtually-stained images (Table I). The other metrics (pixel value, mean intensity, and standard deviation of the intensity) are less comparable to those from the chemically-stained fluorescence images. The cGAN does not entirely capture the dynamic content of the nuclei, possibly because of the non-static chromatin conformations present in living cells, which result in different levels of dye accessibility. Since this information is not visible in the brightfield images, it is not surprising that the virtual staining does not reproduce these textural details. Nevertheless, this is generally not a problem because in most studies the nuclear morphology or chromatin conformation is not the aim; rather, the nuclei serve as cell structures useful as normalization factors. The virtual staining does offer sensitive cell-number determination and, as such, enables cell-to-cell comparison of other measured parameters. Considering the known phototoxicity of Hoechst 33342 in time-lapse imaging of living cells [45], the cGAN, which enables accurate nuclear counts and cell segmentation, may even be preferred over chemical staining.

CONCLUSIONS
We have developed a deep-learning-powered method for the quantitative analysis of intracellular structures in terms of their size, morphology, and content. The method is based on virtually-stained images of cells derived from brightfield images and on a subsequent downstream analysis to quantify the properties of the virtually-stained cell structures.
We have demonstrated the accuracy and reliability of our method by virtually staining and quantifying the lipid droplets, cytoplasm, and cell nuclei from brightfield images of stem-cell-derived adipocytes. While the lipid droplets are easily visible in the brightfield images, direct quantification of their size and content using conventional analysis techniques is challenging, so fluorescent staining techniques are typically used. The cytoplasm and cell nuclei are almost indistinguishable based on their optical contrast, but even in this case the neural network manages to reconstruct them, probably also making use of information contained in the spatial distribution of the lipid droplets.
Compared to standard approaches based on fluorescent staining, our approach is less labor-intensive, and the results do not depend on a careful optimization of the staining procedure. Therefore, the results are more robust and can potentially be compared across experiments and even across labs. We note also that the proposed approach is not limited to the structures quantified in this work, but can be applied to virtually stain and quantify any intracellular object with unique optical characteristics. Furthermore, virtual staining does not exclude fluorescent imaging, so additional information can also be obtained from the liberated fluorescence channels, such as particle uptake or protein expression, both of which are important, e.g., for the subcutaneous dosing of nanomedicines and vaccines.
To make this method readily available for future applications, we provide a Python open-source software package, which can be personalized and optimized for the needs of specific users and applications [25].

FUNDING AND ACKNOWLEDGMENTS
The authors would like to thank Anders Broo and Lars Tornberg from AstraZeneca and Johanna Bergman and Sheetal Reddy from AI Sweden for enlightening discussions. AI Sweden provided access to their computational resources. The authors would also like to acknowledge that the idea for this work was inspired by the Adipocyte Cell Imaging Challenge held by AI Sweden and AstraZeneca. This work was partly supported by the H2020 European Research Council (ERC) Starting Grant ComplexSwimmers (677511), the Knut and Alice Wallenberg Foundation, and the Swedish Strategic Research Foundation (ITM17-0384).

FIG. 1. From cell cultures to quantitative biological information. a The standard workflow entails chemically staining the cell structures of interest, imaging them using fluorescence microscopy (in multiple light channels), and, finally, using these fluorescence images to retrieve quantitative, biologically-relevant measures about the cell structures of interest. b The deep-learning-powered approach we propose replaces the chemical staining and fluorescence microscopy with a conditional generative adversarial neural network (cGAN) that uses brightfield images to generate virtual fluorescence-stained images.

TABLE I. Comparison of features extracted from chemically-stained and virtually-stained images for the whole validation dataset. Average and standard deviation of various metrics (pixel value, count, mean area, integrated intensity, mean intensity, and standard deviation of the mean intensity of lipid droplets, cytoplasmic regions, and nuclei) calculated over the 15 sets of target chemically-stained images and of the predicted virtually-stained images of the validation dataset. We also report the value and percentage of the mean absolute error (MAE) as well as the Pearson correlation ρ between the metrics calculated on the target and predicted images. Note that the pixel values are in the original image range [0, …], while the intensity measurements are extracted with CellProfiler from images rescaled to the range [0, 1]. The features that are most biologically relevant for each cell structure are highlighted in gray.