Quantitative Digital Microscopy with Deep Learning

Video microscopy has a long history of providing insights and breakthroughs for a broad range of disciplines, from physics to biology. Image analysis to extract quantitative information from video microscopy data has traditionally relied on algorithmic approaches, which are often difficult to implement, time consuming, and computationally expensive. Recently, alternative data-driven approaches using deep learning have greatly improved quantitative digital microscopy, potentially offering automatized, accurate, and fast image analysis. However, the combination of deep learning and video microscopy remains underutilized primarily due to the steep learning curve involved in developing custom deep-learning solutions. To overcome this issue, we introduce a software, DeepTrack 2.0, to design, train and validate deep-learning solutions for digital microscopy. We use it to exemplify how deep learning can be employed for a broad range of applications, from particle localization, tracking and characterization to cell counting and classification. Thanks to its user-friendly graphical interface, DeepTrack 2.0 can be easily customized for user-specific applications, and, thanks to its open-source object-oriented programming, it can be easily expanded to add features and functionalities, potentially introducing deep-learning-enhanced video microscopy to a far wider audience.


I. INTRODUCTION
During the last century, the quantitative analysis of microscopy images has provided important insights for various disciplines, ranging from physics to biology. An early example is the pioneering experiment performed by Jean Perrin in 1910 that demonstrated beyond any doubt the physical existence of atoms [1]: he manually tracked the positions of microscopic colloidal particles in a solution by projecting their image on a sheet of paper (Fig. 1A) and, despite a time resolution of just 30 seconds, he managed to quantify their Brownian motion and connect it to the atomic nature of matter. In the following decades, several scientists followed in Perrin's footsteps, improving the time resolution of the experiment down to seconds [2,8] (Fig. 1B). Despite these improvements, manual tracking of particles intrinsically limits the time resolution of conceivable experiments.
In the 1950s, analog electronics provided some tools to increase acquisition and analysis speed. According to Preston [9], the history of digital microscopy begins in Britain in 1951 with an unlikely actor: the British National Coal Committee convened to investigate "the possibility of making a machine to replace the human observer" to measure coal dust in mining operations [10]. In 1955, Causley and Young developed a flying-spot microscope to count and size particles and cells [3]: The flying-spot microscope used a cathode-ray tube to scan a sample pixel by pixel, while the cells were counted and sized by a simple analog integrated circuit (Fig. 1C). This device allowed counting over an order of magnitude faster than human operators, while maintaining the same accuracy.
During the 1950s, and in earnest in the 1960s, researchers started employing digital computers to add speed and functionalities to microscopic image analysis, with a growing focus on biomedical applications. In 1965, Prewitt and Mendelsohn managed to distinguish cells in a blood smear by analyzing with a computer their images, obtained by a flying-spot microscope and recorded as 8-bit values on a magnetic tape [11]. In the following years, digital microscopy went from research labs to clinical settings with the development of the computerized tomography scanner (CT-scanner) in 1972 [12] and of the automated flow cytometer in 1974 [13].
In soft matter physics, despite the early success of Perrin's experiment [1], most studies focused on the ensemble behavior of colloidal particles employing methods such as selective photobleaching and image correlation [14-16]. These methods can resolve fast dynamics, but they can only measure the average behavior of a homogeneous colloidal solution [16]. To overcome these limitations, Geerts et al. automated particle tracking in 1987, developing what is now known as single particle tracking, and used it to track individual gold nanoparticles on the surface of living cells from images acquired with a differential interference contrast (DIC) microscope [4] (Fig. 1D). In the following decades, researchers have also used fluorescent molecules [17-19] and quantum dots [20,21] as tracers within biological systems.
It became quickly evident that highly accurate tracking algorithms were needed to analyze the collected data. In 1996, Crocker and Grier proposed an algorithm to determine particle positions based on the measurement of the centroids of their images [5] (Fig. 1E). The main advantage of this algorithm is that it is largely setup-agnostic, i.e., it does not depend on the specific properties of the imaging system and of the particle. Other setup-agnostic approaches have been proposed in more recent years, analyzing, e.g., the Fourier transform of the particle image [22] or its radial symmetry [23]. Other algorithms, instead, made a model of the image based on the properties of the imaging system and of the particle [16,24-29]. These alternative methods were less general and often more computationally expensive, but they often achieved higher accuracy and could also provide quantitative information about the particle, such as its size [30] or its out-of-plane position [31-33]. Despite the large number of methods being introduced, digital video microscopy remained a hard problem, requiring the development of ad hoc algorithms tuned to the needs of each experiment.

In fact, a 2014 comparison of 14 tracking methods found that, when compared on several different simulated scenarios, no single algorithm performed best in all of them [34].

Figure 1. A Examples of manually tracked trajectories of colloids in a suspension from Perrin's experiment that convinced the world of the existence of atoms [1]; the time resolution is 30 seconds. B Kappler manually tracked the rotational Brownian motion of a suspended micromirror to determine the Avogadro number [2]. C-E 1951-2015: The digital microscopy era. C Causley and Young developed a computerized microscope to count particles and cells using a flying-spot microscope and an analog analysis circuit [3]. D Geerts et al. developed an automatized method to track single gold nanoparticles on the membranes of living cells [4]. E Crocker and Grier kickstarted modern particle tracking, achieving high accuracy using a largely setup-agnostic approach [5]. F-H 2015-2020: The deep-learning-enhanced microscopy era. F Ronneberger et al. developed the U-Net, a variation of a convolutional neural network that is particularly suited for image segmentation and has been very successful for biomedical applications [6]. G Helgadottir et al. developed a software to track particles using convolutional neural networks (DeepTrack 1.0) and demonstrated that it can achieve higher tracking accuracy than traditional algorithmic approaches [7]. H This article presents DeepTrack 2.0, which provides an integrated environment to design, train and validate deep-learning solutions for quantitative digital microscopy.
Only in the last few years has machine learning started to be employed for the analysis of images obtained from digital microscopy. This comes in the wake of the deep learning revolution [35], thanks to which computer-vision tasks such as image recognition [36], semantic segmentation [37], and image generation [38] are now automatized with relative ease. Recent results have demonstrated the potential of applying deep learning to microscopy, vastly improving techniques for particle tracking [7,39,40], cell segmentation and classification [6,41-44], particle characterization [39,45,46], object counting [47], depth-of-field extension [48], and image resolution [49,50]. In 2015, Ronneberger et al. developed a special kind of neural network (U-Net) for the segmentation of cell images [6] (Fig. 1F), which is now widely used for the segmentation of biomedical images. In particle tracking, Hannel et al. employed deep learning to track and measure colloids from their holographic images [39], Newby et al. demonstrated how deep learning can be used for the simultaneous tracking of multiple particles [40], and Helgadottir et al. achieved tracking accuracy surpassing standard methods [7] (Fig. 1G). These early successes clearly demonstrate the potential of deep learning to analyze microscopy data. However, they also point to a key limiting factor for the development and deployment of deep-learning solutions for microscopy: the availability of high-quality training data. In fact, training data often need to be experimentally acquired specifically for each application and, especially for biomedical applications, to be manually annotated by experts, which is an expensive, time-consuming, and potentially biased process [51].
In this article, we provide a brief review of the applications of deep learning to digital microscopy and we introduce a comprehensive software (DeepTrack 2.0, Fig. 1H) to design, train and validate deep-learning solutions for quantitative digital microscopy. In section II, we review the main applications of deep learning to microscopy and the most frequently employed neural-network architectures. In section III, we introduce DeepTrack 2.0, which greatly expands the functionalities of the particle-tracking software DeepTrack 1.0 [7], and features a user-friendly graphical interface and a modular (object-oriented) architecture that can be easily expanded and customized for specific applications. Finally, in section IV, we demonstrate the versatility and power of deep learning and DeepTrack 2.0 by using it to tackle a variety of physical and biological quantitative digital microscopy challenges, from particle localization, tracking and characterization to cell counting and classification.

II. DEEP LEARNING FOR MICROSCOPY
In this section, we will start by providing an overview of machine learning and deep learning, in particular introducing and comparing the deep-learning models that are most commonly used in microscopy: fully connected neural networks, convolutional neural networks, convolutional encoder-decoders, U-Nets, and generative adversarial networks. Subsequently, we will review some key applications of deep learning in microscopy, focusing on three key areas: image segmentation, image enhancement, and particle tracking.
Image segmentation partitions an image into multiple segments, each corresponding to a specific object (e.g., cells of different kinds). In this context, deep learning has been very successful, especially in the segmentation of biological and biomedical images. However, one limiting factor is the need for high-quality training datasets, which often need to be annotated manually by experts, a time-consuming and tedious task.
Image enhancement includes tasks such as noise reduction, deaberration, refocusing, and superresolution. Also in this case, deep learning has been widely employed in the last few years, especially in the context of computational microscopy. Differently from image segmentation, image enhancement can often utilize training datasets that are directly acquired from experiments.
Particle tracking deals with the localization of objects (often microscopic colloidal particles or tracer molecules) in 2D or 3D. Deep-learning-powered solutions are more accurate than algorithmic approaches, work in extremely difficult environments with poor signal-to-noise ratio, and can extract quantitative information about the particles. Particle-tracking algorithms can often be trained using simulated data by employing physical simulations of the required images.

A. Deep learning
In contrast to standard computer algorithms, where the user is required to define explicit rules to process the data, machine-learning algorithms can learn patterns and rules to perform specific tasks directly from the data. In supervised learning, machine-learning algorithms learn by adjusting their behavior according to a set of input data and corresponding desired outputs (the ground truth). These input-output pairs constitute the training dataset, which can be obtained either from experiments or from simulations.
Deep learning is a kind of machine learning built on artificial neural networks (ANNs) [35]. ANNs were originally conceived to emulate the capabilities of the brain, specifically its ability to learn [52]. They are constituted by interconnected artificial neurons (simple computing units, often just returning a non-linear function of their inputs). Often, these artificial neurons are organized in layers (typically with tens or hundreds of artificial neurons each). In the most commonly employed architectures, each layer receives the output of the previous layer, computes some transformation, and feeds the result into the next layer. In many machine-vision applications, the number of layers is in the tens (this number is the "depth" of the ANN; hence the term "deep learning").
The weights of the connections between artificial neurons and layers are the parameters that are adjusted in the training process. The training can be broken down into the following steps (referred to as the error backpropagation algorithm [53]): First, the ANN receives an input and calculates a predicted output based on its current weights. Second, the output is compared to the true, desired output, and the error is measured using a loss function. Third, the ANN propagates this error backwards, calculating for each weight whether it should be increased or decreased in order to reduce the error (and a local estimate of the rate of change of the error depending on that weight). Finally, the weights are updated using an optimizer, which determines how much each weight should be changed. By feeding the network additional training data, it typically improves its performance, gradually converging to some optimum weight configuration.
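The steps above can be sketched with a single linear neuron trained by gradient descent (a minimal pure-Python illustration with toy data; a real ANN repeats these steps across many layers and weights, with the gradients computed by backpropagation):

```python
# Toy training loop for a single linear neuron y = w * x + b,
# fitted to data generated by y_true = 2 * x + 1 (hypothetical data).

data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
w, b = 0.0, 0.0          # the weights, adjusted during training
lr = 0.05                # learning rate used by the optimizer step

def loss(w, b):
    # Mean squared error between prediction and ground truth (step 2)
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = loss(w, b)
for epoch in range(500):
    for x, y in data:
        y_pred = w * x + b        # step 1: forward pass
        error = y_pred - y        # step 2: compare to the ground truth
        grad_w = 2 * error * x    # step 3: error gradient w.r.t. w
        grad_b = 2 * error        # step 3: error gradient w.r.t. b
        w -= lr * grad_w          # step 4: optimizer updates the weights
        b -= lr * grad_b
```

After training, the loss has decreased and (w, b) has converged close to the generating parameters (2, 1), illustrating the gradual convergence to an optimum weight configuration.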
In microscopy applications, the most commonly employed ANN architectures are dense neural networks, convolutional neural networks, convolutional encoder-decoders, U-Nets, and generative adversarial networks (Table I).
The workhorse of ANNs is the dense neural network (DNN), which consists of a series of layers fully connected in sequence. While sufficiently large DNNs can approximate any function [54], the number of weights required quickly grows to unmanageable levels, especially for large inputs such as images. Furthermore, DNNs have a rigid structure, where the dimensions of both the input and the output are fixed. Therefore, when analyzing images, they are rarely used on their own; instead, they are often employed as the final step of some other network, generating the final output from already pre-processed data.
In contrast, convolutional neural networks (CNNs) are particularly useful to analyze images. They are primarily built upon convolutional layers. In each convolutional layer, a series of 2D filters is convolved with the input image, producing as output a series of feature maps. The size of the filters with respect to the input image determines the features that can be detected in each layer. To gradually detect larger features, the feature maps are downsampled after each convolutional layer. The downsampled feature maps are then fed as input to the next network layer. There is often a dense top after the last convolutional layer, i.e., a relatively small DNN that integrates the information contained in the output feature maps of the last layer to determine the sought-after result.
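The convolution-and-downsampling step can be sketched as follows (a toy pure-Python illustration; the image, the filter, and the pooling choices are illustrative, and real CNNs use many filters per layer and optimized numerical libraries):

```python
# One convolutional layer followed by 2x2 downsampling, illustrating
# how a filter produces a feature map from an input image.

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (in the CNN sense, i.e., cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def max_pool2x2(fmap):
    """2x2 max pooling: downsamples the feature map by a factor of 2."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A 6x6 image containing a bright 2x2 "particle" in the upper left.
image = [[1 if (1 <= i <= 2 and 1 <= j <= 2) else 0 for j in range(6)]
         for i in range(6)]
edge_filter = [[1, -1], [1, -1]]   # responds to vertical edges

feature_map = convolve2d(image, edge_filter)   # 5x5 feature map
pooled = max_pool2x2(feature_map)              # downsampled to 2x2
```

The filter responds strongly at the vertical edges of the bright region, and the pooled map retains these responses at a coarser resolution, which is then fed to the next layer.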
Convolutional encoder-decoders are convolutional neural networks constituted by two paths. First, there is the encoder path, which reduces the dimensionality of the input through a series of convolutional layers and downsampling layers, thereby encoding the information about the original image. Then, there is the decoder path, which uses the encoded information to reconstruct either the original image or some transformed version of it (e.g., in segmentation tasks). Therefore, when trained to reconstruct the input image at the output, the information at the end of the encoder path can serve as a compressed version of the input image. When trained to reconstruct a transformed version of the input image, the encoded information can serve as a powerful representation of the input image useful for the specific task at hand.
U-Nets are an especially useful evolution of convolutional encoder-decoders. In addition to the encoder and decoder convolutional paths, U-Nets also feature forward concatenation steps between corresponding levels of these two paths. This permits them to preserve positional information that would otherwise be lost when the image resolution is reduced. They have been particularly successful in analyzing and segmenting biological and biomedical images.
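The resolution bookkeeping behind the forward concatenations can be sketched as follows (a shape-level illustration with an assumed 64x64 input and depth 3; real U-Nets also apply convolutions at every level and track channel counts):

```python
# Track the spatial resolution through the encoder and decoder paths
# of a U-Net. Each forward concatenation joins a decoder level with the
# encoder feature map of the *same* resolution, which is what lets the
# network recover positional information lost during downsampling.

def unet_shapes(h, w, depth=3):
    encoder = []
    for _ in range(depth):
        encoder.append((h, w))     # stored for later concatenation
        h, w = h // 2, w // 2      # downsampling halves the resolution
    bottleneck = (h, w)
    decoder = []
    for skip in reversed(encoder):
        h, w = h * 2, w * 2        # upsampling doubles the resolution
        assert (h, w) == skip      # concatenation requires matching sizes
        decoder.append((h, w))
    return bottleneck, decoder

bottleneck, decoder = unet_shapes(64, 64)
```

The final decoder level recovers the full input resolution, which is why U-Nets can output pixel-accurate segmentation masks.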
Differently from the previous cases, generative adversarial networks (GANs) combine two networks, a generator and a discriminator, regardless of their specific architectures [55]. The generator manufactures data, usually images, from some input. The discriminator, in turn, classifies its input as either real data or synthetic data created by the generator. The term adversarial refers to the fact that these two networks compete against each other: The generator tries to fool the discriminator with manufactured data, while the discriminator tries to expose the generator. The generator can be trained either to transform images, by feeding it a real image as input, or to make up images, by feeding it a random input. The generator is typically either a convolutional encoder-decoder or a U-Net, while the discriminator is often a convolutional neural network. While GANs are a breakthrough for data generation and offer many benefits, they are difficult to train and highly sensitive to hyperparameter tuning: Slight changes in their overall architecture can lead to vanishing gradients, lack of convergence, and a generator loss that is uncorrelated with image quality [56,57].

B. Image segmentation
Deep learning has been extremely successful at segmentation tasks, especially for biomedical applications [44,59-67], but also in materials science [68,69]. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation assigns a label to every pixel in an image such that pixels with the same label share certain characteristics (e.g., represent objects of the same type).
Generally, deep-learning models performing image segmentation are trained using experimental data that need to be manually annotated by experts.In some cases, to alleviate the need for annotated images, pretrained neural networks are employed (i.e., neural networks that have been trained for classification tasks on a large dataset of different images, often not directly related to the task at hand) and fine-tuned using a relatively small set of manually annotated data [44,59].
If the exact topography of the sample is not needed, one can downsample the image using several convolutional layers, obtaining a coarse classification of its various regions. For example, this approach was used by Coudray et al. [42] to distinguish cancerous lung cells from normal tissue by fine-tuning a neural network pre-trained for image analysis and object detection (Inception v3 [58]) (Fig. 2A).

Dense Neural Network (DNN)
Advantages:
• Can use all available information.
• Can represent any transformation between the input and output.
• Input and output can easily have any dimensions.
Disadvantages:
• The number of weights increases quickly with the number of layers and the dimension of the input.
• The input and output dimensions must be known in advance.

Convolutional Neural Network (CNN)
Advantages:
• Can be constructed with a limited number of weights.
• Highly effective at extracting local information from images.
• Analysis is position-independent.
Disadvantages:
• Cannot access global information.
• Can be computationally expensive.
• Difficult to retain an exact output shape.

Convolutional Encoder-Decoder (CED)
Advantages:
• Only the number of features needs to be known in advance.
• Returns an image as output, which can be more interpretable by humans.
• Can be trained as an auto-encoder without annotated data.
Disadvantages:
• Positional information is lost during downsampling.
• Data can be hard to annotate.

U-Net
Advantages:
• Only the number of features needs to be known in advance.
• Returns an image as output, which can be more interpretable by humans.
Disadvantages:
• Can quickly grow large.
• Forward concatenation layers disallow use as an auto-encoder.

Generative Adversarial Network (GAN)
Advantages:
• Can create very realistic images.
• Can be trained without annotated data.
Disadvantages:
• Very hard to train.
• The outputs are designed to look correct, not to be correct.
• Output quality is very sensitive to the details of the architecture.

Table I. A comparison of common deep-learning architectures. Advantages and disadvantages of deep-learning architectures commonly employed for microscopy, i.e., the dense neural network (DNN), the convolutional neural network (CNN), the convolutional encoder-decoder (CED), the U-Net, and the generative adversarial network (GAN). For each model, we also show a miniature example of the architecture, where gray lines with orange circles represent dense layers, blue rectangles represent convolutional layers, red rectangles represent pooling layers, and magenta rectangles represent deconvolutional layers. The arrows depict the forward concatenation steps.
Figure 2. Image segmentation with deep learning. A Lung cell classification and mutation prediction [42], using Inception v3 [58]: Non-overlapping tiles in the input are analyzed, returning a low-resolution segmentation mask containing either just a binary classification as tumor or healthy, or a complete prediction of the mutation type. B The U-Net architecture as originally proposed by Ronneberger [6] differs from the CED by the addition of forward concatenation steps between the encoder and decoder parts, which allow the network to forward positional information lost during encoding. The network is used to segment nearly overlapping cells. C A cell segmentation software, based on a model closely resembling the U-Net [59], can be automatically retrained by feeding it additional fluorescence images.
U-Nets were originally developed for cell segmentation, where one of the key requirements is to clearly mark the cell edges, such that neighboring cells can be distinguished [6] (Fig. 2B).
High-quality image segmentation annotations are time-consuming to obtain, which is why many researchers opt to design networks that can be retrained for a specific task using a much smaller dataset. For example, Sadanandan et al. developed a neural network that can automatically be retrained using fluorescently labeled cells [59] (Fig. 2C). With such an approach, the neural network can relatively easily be adapted to different experimental setups, even though the process requires some experimental effort to acquire the additional training data.
Segmentation has also been used for three-dimensional images. For example, Li et al. used a three-dimensional convolutional neural network to reconstruct the interconnections between biological neurons [71]. Similar approaches have also been employed for the volumetric reconstruction of organs [72,73].

C. Image enhancement
Deep learning has been widely employed for image enhancement. This is particularly interesting because it also makes it possible to perform tasks that would be extremely difficult or impossible with microscopy alone because of intrinsic physical limitations.
Deep learning has been employed to achieve superresolution by using diffraction-limited images to reconstruct images beyond the diffraction limit. For example, Ouyang et al. trained a GAN to imitate the output of the standard super-resolution method PALM [74], significantly improving the resolution of fluorescence images [50] (Fig. 3A).
An interesting application of deep learning is to realize cross-modality analysis, where a neural network learns how to translate the output of a certain optical device into that of another. For example, Wu et al. used a U-Net to translate between holography and brightfield microscopy, enabling volumetric imaging without the speckle and artifacts associated with holography [75] (Fig. 3B). This method uses experimental pairs of images collected simultaneously by two different optical devices.
Going one step further, deep learning can also be used to generate images that cannot be directly obtained from the sample using optical devices, but would require some other kind of analysis of the sample. For example, Rivenson et al. used phase information obtained from holography to create a virtually stained sample corresponding to a histologically stained brightfield image [76] (Fig. 3C).

Figure 3 caption (beginning truncated): ... We see how the quality of the produced image increases with the acquisition time. B A GAN is used to transform holography images into brightfield images [75]. From top to bottom: holographic images of pollen, backpropagated to the focal plane, and finally transformed into brightfield images; the bottom set of images are the real brightfield images, shown for comparison. C A GAN is used to convert quantitative phase images (in-line holography) into virtual tissue stainings, mimicking histologically stained brightfield images [76].
Image-enhancement techniques typically train networks using experimental images that do not need any manual annotation. Either the target is calculated using known methods, or it is collected simultaneously using an alternate path for the light. While this reduces the amount of manual labor required, both approaches have their drawbacks. A network trained to imitate a traditional method is unlikely to improve upon it on primary metrics; instead, it can improve robustness to less ideal inputs or reduce execution time. On the other hand, using a dual microscope leads to networks specialized for the optical devices used to acquire the training images. Moreover, such a dual-purpose microscope is usually non-standard, so the user needs to alter and customize their setup.

D. Particle tracking
Single particle tracking has become a crucial tool for probing the microscopic world. Standard approaches are typically limited by the complexity of the system: Higher particle densities, higher levels of noise, and more complex point-spread functions often lead to worse results. Developments using deep learning have shown that it is possible to largely overcome these limitations. A big advantage of deep-learning solutions for particle tracking is that simulated data can often be used to train the networks.
Newby et al. demonstrated that deep learning can be used for the detection of particles in high-density, low-signal-to-noise-ratio images [40]. Their method uses a small CNN to construct a pixel-by-pixel classification probability map of background versus particle (Fig. 4A). Standard algorithms can then be applied to this probability map to track the particles.
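As a minimal illustration of such a post-processing step, the following sketch thresholds a toy probability map and computes the intensity-weighted centroid of the remaining pixels (a hypothetical single-particle example; a multi-particle map would first require connected-component labeling to separate the detections):

```python
# Toy 4x4 particle-probability map with one bright region.
prob_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.8, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
threshold = 0.5  # pixels below this are treated as background

# Intensity-weighted centroid of the above-threshold pixels,
# giving a sub-pixel estimate of the particle position.
total = sum(p for row in prob_map for p in row if p > threshold)
row_c = sum(i * p for i, row in enumerate(prob_map)
            for p in row if p > threshold) / total
col_c = sum(j * p for i, row in enumerate(prob_map)
            for j, p in enumerate(row) if p > threshold) / total
```

For this symmetric map, the estimated position is (1.5, 1.5), i.e., the center of the bright 2x2 region, with sub-pixel precision.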
Helgadottir et al. achieved a tracking accuracy surpassing that of traditional methods using a convolutional neural network with a dense top to detect particle centroids in challenging conditions [7] (Fig. 4B).
Along with particle localization, deep learning has also been used to measure other characteristics of particles.

Figure 4 caption (beginning truncated): ... [40], where a relatively small network of three convolutional layers estimates a pixel-by-pixel probability map of background versus particle. B High-accuracy single particle localization using a CNN with a dense top [7]. The network is scanned across the image to detect and localize all particles (dots) and bacteria (circles) in the image. C Particle tracking and characterization in terms of radius and refractive index using in-line holography images [46], where bounding boxes for each particle in the field of view are extracted and fed to a CNN. Accurate measurements are showcased for particles between 0.5 µm and 1.5 µm. D Particle characterization in terms of radius and refractive index using off-axis holography images [45]. A CNN with latent-space temporal averaging combines multiple observations of a single particle to improve accuracy, allowing characterization of particles down to around 0.2 µm.
For example, Altman et al. used a convolutional neural network to measure the radius and refractive index of colloids from images acquired by an in-line holographic microscope [46] (Fig. 4C). Midtvedt et al. used an off-axis microscope and a convolutional neural network with temporal averaging to measure the radius and refractive index of even smaller particles [45] (Fig. 4D).

III. DEEPTRACK 2.0
In this section, we introduce DeepTrack 2.0, which is an integrated software environment to design, train, and validate deep-learning solutions for digital microscopy [87]. DeepTrack 2.0 builds on the particle-tracking software package DeepTrack, which we introduced in 2019 [7], and greatly expands it beyond particle tracking towards a whole new range of quantitative microscopy applications, such as classification, segmentation, and cell counting.
To accommodate users with any level of experience in programming and deep learning, we provide access to the software through several channels, from a high-level graphical user interface that can be used without any programming knowledge, to scripts that can be adapted for specific applications, to a low-level set of abstract classes to implement new functionalities.Furthermore, we provide various tutorials to use the software at each level of complexity, including several video tutorials to guide the user through each step of a deep-learning analysis for microscopy: from defining the training image generation routine, to deciding the neural network model, to training and validating the network, to applying the trained network to real data.
As the main entry point, we provide a completely standalone graphical user interface, which delivers all the power of DeepTrack 2.0 without requiring programming knowledge. This is available both for Windows and for macOS [88]. In fact, we recommend that all users start with the graphical user interface, which provides a visual approach to deep learning and an intuitive feel for how the various software components interact.
When more precise control is desired, we recommend that users peruse the available Jupyter notebooks [87], which provide complete examples of how to write scripts for DeepTrack 2.0.
For most applications, DeepTrack 2.0 already includes all necessary components. However, if more advanced functionalities are required that are not already included, it is easy to extend DeepTrack 2.0 by building on its framework of abstract objects and its native communication with the popular deep-learning package Keras [89]. In fact, we expect the most advanced users to expand the functionalities of DeepTrack 2.0 according to their needs.
All users are also welcome to report any bugs and to propose additions to DeepTrack 2.0 through its GitHub page [87].

A. Graphical user interface
The graphical user interface of DeepTrack 2.0 provides an intuitive way to perform the various steps that are necessary for the realization of a deep-learning analysis for microscopy. Through the graphical user interface, users can define and visualize image generation pipelines (Fig. 5A), train models (Fig. 5B-C), and analyze experimental data (Fig. 5D).
A typical workflow is the following:
1. Define the image generation pipeline, e.g., a pipeline to generate images of a particle corrupted by noise.
2. Define the ground-truth training target, e.g., the particle image without noise (image target), or the particle position (numeric target).
3. Define the neural-network model.
4. Train and evaluate the deep-learning model.
5. Apply the deep-learning model to the user's experimental data.
Projects realized with DeepTrack 2.0 can be saved and subsequently loaded, which is useful for archival purposes as well as to share deep-learning models and results between users and platforms. Furthermore, projects can be automatically exported as Python code, which can then be executed to train a network from the command line or imported into an existing project.
At a more advanced level, it is possible to extend the capabilities of the graphical user interface by adding new Python-coded objects.We envision that this possibility will motivate users to create and share additional software compatible with DeepTrack 2.0.

B. Scripts
We provide several Jupyter notebooks, both as examples of how to write scripts using DeepTrack 2.0 and as a foundation to create customized solutions. To facilitate this, we provide several video tutorials [87], which detail how the solutions are constructed and how they can be modified.

C. Code
The software architecture of DeepTrack 2.0 (Fig. 6) is built on four main components: features, properties, images, and deep-learning models.

Features: They are the foundations on which DeepTrack 2.0 is built. They receive a list of images as input, and either apply some transformation to all of them (e.g., adding noise), or add a new image to the list (e.g., adding a scatterer), or merge them into a single image (e.g., imaging a list of scatterers through an optical device). By defining a set of features and how they connect, we produce a single feature that defines the entire image creation process.
Properties: They are the parameters that determine how features operate. For example, a property can control the position of a particle, the intensity of the added noise, or the amount by which an image is rotated. They can have a constant value or be defined by a function (e.g., to place a particle at a random position), which can depend on the value of other properties, either from the same feature or from other features.
Images: They are the objects on which the features operate. They behave like n-dimensional NumPy arrays (ndarray) and can therefore be directly used with most Python packages. They contain a list of the property values used by the features that created the image, which can be used to generate ground-truth labels for training neural networks as well as to evaluate how the error in deep-learning models depends on the properties defining the image (e.g., signal-to-noise ratio, background gradient, illumination wavelength).
Models: They are the deep-learning models. A series of standard models is already provided in DeepTrack 2.0, including models for DNNs, CNNs, U-Nets, and GANs. In each case, the parameters of the model (e.g., number of layers and number of artificial neurons) can be defined by the user.
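The interplay between features and properties can be sketched with a minimal, self-contained toy implementation. All class and method names below are illustrative stand-ins, not the actual DeepTrack 2.0 API:

```python
import numpy as np

class Feature:
    """Toy sketch of a DeepTrack-style feature: it holds properties,
    which may be constants or callables sampled anew on every call."""
    def __init__(self, **properties):
        self.properties = properties

    def _current(self):
        # Sample each property: call it if it is a function, else use as-is.
        return {k: (v() if callable(v) else v)
                for k, v in self.properties.items()}

    def __call__(self, images):
        return self.get(list(images), **self._current())

class AddParticle(Feature):
    """Appends a new image containing a Gaussian 'particle' to the list."""
    def get(self, images, position, size=64, sigma=2.0, **kwargs):
        y, x = np.mgrid[:size, :size]
        img = np.exp(-((x - position[0])**2 + (y - position[1])**2)
                     / (2 * sigma**2))
        images.append(img)
        return images

class Merge(Feature):
    """Merges all images in the list into one by summation
    (a stand-in for imaging through an optical device)."""
    def get(self, images, **kwargs):
        return [np.sum(images, axis=0)]

# Chain the features into a single image creation process.
pipeline = [AddParticle(position=lambda: np.random.uniform(10, 54, 2)),
            AddParticle(position=lambda: np.random.uniform(10, 54, 2)),
            Merge()]
images = []
for f in pipeline:
    images = f(images)
final = images[0]
```

Because the particle positions are callables, every evaluation of the pipeline yields a new random image, which is the mechanism DeepTrack 2.0 exploits to generate unlimited training data.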
DeepTrack 2.0 solutions depend on the interactions between these objects. In general, there are three distinct typical operations a feature can perform. The first operation is to add an image to a list of images (notably Scatterers): here, a list of n images is fed to the feature F. Each of these images has a list of properties P_i, which describe the process used to create that image. The feature is controlled by some properties P, and returns a new list of images. The first n images are unchanged, but a new image I is appended to the end, on which the properties P are imprinted.
The second operation is to transform all images in the list in some way (the standard behavior of features, including noise, augmentations, and most mathematical operations): here, the feature returns a list of the same length, but each image is altered (e.g., some noise is added or it is rotated). The properties P characterizing this alteration are imprinted on all images.
The third operation is to merge several images into a single image (notably optical devices): here, all the properties of the input images, as well as the feature's own properties, are imprinted on the resulting image.
A typical complete image generation pipeline can look something like this: the start is an empty list. Two initial features (F_s1, F_s2) append images to that list, creating a list of two images (I_s1, I_s2) (e.g., these could be two scattering particles in the field of view). Each such image is modified by a feature F (e.g., by adding some noise), before being merged into a single image by F_o (e.g., representing the output of a microscope). Note that P is not added to the list of properties twice; the list is in fact a set, and cannot contain duplicate properties.
We show an even more concrete example in Fig. 6A. Here, we have an initial feature Ellipse, which creates a single image of an ellipse. We follow this by the feature Duplicate, which creates a fixed number of duplicates (here five). (Note that Duplicate duplicates the feature Ellipse, not the generated image, which is why it can create several different ellipses, i.e., with different radius, intensity, or in-image position.) This list of images is sent to the feature Fluorescence, which images them through a simulated fluorescence microscope. After this, a background offset is added, and Poisson noise is introduced.
Since the positions of all ellipses are stored as properties, they are imprinted on the final image. This allows us to create a segmented mask, shown in Fig. 6B, which we can use as ground-truth label to train the deep-learning model. These two image creation pipelines (one for the data, one for the label) are passed to a generator, which continuously creates new images by updating the properties that control the features using user-defined update rules. These images are fed to a neural network to train it (Fig. 6C), which results in a trained model that can analyze experimental data (Fig. 6D).
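The data/label generator pattern described above can be sketched as follows, with a hypothetical NumPy stand-in (not the DeepTrack 2.0 API) in which the same randomly drawn "properties" produce both the noisy image and its ground-truth mask:

```python
import numpy as np

def make_sample(size=64, sigma=2.0, rng=np.random.default_rng()):
    """Illustrative sketch: draw random particle positions (the
    'properties'), render a noisy image, and build a segmentation
    mask from the same positions, mimicking how imprinted property
    values become ground-truth labels."""
    positions = rng.uniform(8, size - 8, size=(3, 2))
    y, x = np.mgrid[:size, :size]
    image = np.zeros((size, size))
    mask = np.zeros((size, size))
    for px, py in positions:
        image += np.exp(-((x - px)**2 + (y - py)**2) / (2 * sigma**2))
        mask[(x - px)**2 + (y - py)**2 < 9] = 1   # ground-truth disk
    # Shot noise plus a background offset, as a toy imaging model.
    image = rng.poisson(50 * image + 10) / 50.0
    return image, mask

def generator(batch_size=8):
    """Continuously yields freshly simulated (image, label) batches."""
    while True:
        samples = [make_sample() for _ in range(batch_size)]
        images, masks = zip(*samples)
        yield np.stack(images), np.stack(masks)

images, masks = next(generator())
```

Each call to the generator resamples every property, so the network never sees the same training image twice.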
Writing code directly using the DeepTrack 2.0 framework allows the user to extend the capabilities of the package. In most cases, it is sufficient to use the Lambda feature, which allows any external function to be incorporated into the framework. However, certain scenarios may require the user to write custom features. For example, the user can extend the feature Optics (features that simulate optical devices) to create a new imaging modality, the feature Scatterer (features that represent some object in the sample) to create a custom scatterer, or the feature Augmentation (features that augment an image to cheaply broaden the input space) to expand the range of available augmentations. It is also straightforward to add new neural-network models: any Keras model can be directly merged with DeepTrack 2.0 without any configuration, while models from other packages can easily be wrapped.
To help users get started writing code using DeepTrack 2.0, we provide several comprehensive video tutorials [87], ranging in scope from implementing custom features to writing complete solutions.

IV. CASE STUDIES
In this section, we use DeepTrack 2.0 to exemplify how deep learning can be employed for a broad range of microscopy applications. We start with a standard benchmark for image classification: the MNIST digit recognition challenge [90] (Section IV A). Afterwards, we employ DeepTrack 2.0 to analyze microscopy images. First, we develop a model to track particles whose images are acquired by brightfield microscopy, training a single-particle tracker whose accuracy surpasses standard algorithmic approaches, especially in noisy imaging conditions [7] (Section IV B). Then, we expand this example to also extract quantitative information about the particle, namely its size and refractive index (Section IV C). Deep learning is especially powerful in tracking multiple particles in noisy environments. As a demonstration of this, we develop a model that can detect quantum dots on a living cell imaged by fluorescence microscopy (Section IV D). Again, we expand this example to demonstrate three-dimensional tracking of multiple particles whose images are acquired using holography (Section IV E). We also develop a neural network to count the number of cells in fluorescence images (Section IV F). Finally, we train a GAN to create synthetic images of cells from a semantic mask (Section IV G). All these examples are available both as project files for the DeepTrack 2.0 graphical user interface [87] and as Jupyter notebooks [88], and they are complemented by video tutorials.

A. MNIST digit recognition
Recognizing hand-written digits of the MNIST dataset is a classical benchmark for machine learning [90]. The task consists of recognizing handwritten digits from 0 to 9 in 28 × 28 pixel images. The dataset contains 60,000 training images and 10,000 validation images, some examples of which are provided in Fig. 7A.
Since this is a relatively simple task, we employ a DNN (Fig. 7B). The architecture of the network is that of Cireşan et al., which has achieved the best results using DNNs among the attempts listed on the MNIST webpage [90]. As the loss function, we use categorical cross-entropy, which is a standard loss function for classification tasks.
We train the network using the 60,000 training images augmented using affine transformations and elastic distortions, exemplified in Fig. 7C. We train it for 1000 epochs, where one epoch represents one pass through all training images. This results in a validation loss of 0.05, as compared to a training loss of 0.20 (Fig. 7C). The higher training loss is likely due to the augmentations, which make the training data harder than the validation data. It is, as such, unlikely that the network has overfitted the training set.
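The augmentation recipe above (affine transformations plus elastic distortions, in the style popularized by Simard et al. for MNIST) can be sketched as follows; the parameter values are illustrative, not the ones used here:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def augment(image, rng=np.random.default_rng(0), alpha=8.0, sigma=3.0):
    """Toy augmentation: a small random affine transform (rotation)
    followed by an elastic distortion via a smoothed random
    displacement field. alpha/sigma values are made up."""
    # Affine part: small random rotation, keeping the 28x28 shape.
    image = rotate(image, angle=rng.uniform(-10, 10),
                   reshape=False, order=1)
    # Elastic part: smooth random displacement field scaled by alpha.
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.mgrid[:image.shape[0], :image.shape[1]]
    return map_coordinates(image, [y + dy, x + dx], order=1)

digit = np.zeros((28, 28))
digit[8:20, 12:16] = 1.0      # a crude stand-in for a handwritten "1"
distorted = augment(digit)
```

Applying a fresh distortion at every epoch effectively multiplies the size of the training set, which is why the training loss can end up higher than the validation loss.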
Finally, we validate the trained network on the 10,000 validation images. The network achieves an accuracy of 99.34%, which is in line with the best performance achieved by DNNs on the MNIST digit recognition task [90]. The confusion matrix (Fig. 7D) shows that the incorrectly classified digits consist mainly of 9s classified as 7s and of 4s classified as 9s.

B. Particle localization
Determining the position of objects within an image is a fundamental task for digital microscopy. In this example, we aim to localize the position of an optically trapped particle with very high accuracy. Two videos are captured of the same particle in the same optical trap, one with good image quality and one with poor image quality, from which we want to extract the particle's x and y positions (Fig. 8A).
To analyze these images, we first use a CNN to transform the 51 × 51 pixel input image to a 6 × 6 × 64 tensor. Subsequently, we pass this result to a DNN, which outputs an estimate of the particle's in-plane position (Fig. 8B). This model is based on the one described by Helgadottir et al. [7]. We use the mean absolute error (MAE) as the loss function. (Alternatively, we could also use the mean squared error (MSE), which delivers equally accurate results.) The network is trained purely on synthetic data generated using DeepTrack 2.0. The generation of synthetic microscopy data for training a network generally entails the following steps. First, the optical properties of the instrument used to capture the data are replicated (e.g., NA, illumination spectra, magnification, and pixel size). This ensures that the point-spread function of the simulated optical system closely matches that of the experimental setup. Second, the properties of the sample are specified, including the radius and refractive index of the particle. As a final step, noise is added to the simulated images to make them representative of experimental data. During training, each parameter of the simulation (e.g., the optical properties, the sample properties, and the noise strength) is stochastically varied around the expected experimental values to make the network more robust. In Fig. 8C, we show a few outputs from the image generation pipeline with SNR increasing from left to right. We also demonstrate that the network outperforms the radial center algorithm [23]. This is achieved by training the network for 110 epochs on a set of 10,000 synthetic images. The validation set consists of 1000 images. We can see that the loss is still decreasing at this point, but the gain is minimal (Fig. 8C). No signs of overfitting can be seen.
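The three simulation steps above can be sketched as follows, with a toy Gaussian spot standing in for the true point-spread function; all parameter ranges are made up for illustration:

```python
import numpy as np

def simulate_particle(size=51, rng=np.random.default_rng()):
    """Toy version of the synthetic-data recipe: replicate the optics,
    randomize the sample properties, then add noise at a randomly
    varied SNR. Returns the image and the ground-truth position."""
    # 1. Optics: a PSF approximated here by a Gaussian of random width
    #    (standing in for variations in NA, magnification, pixel size).
    sigma = rng.uniform(2.0, 3.0)
    # 2. Sample: particle at a random sub-pixel position near the center.
    cx, cy = rng.uniform(20, 31, 2)
    y, x = np.mgrid[:size, :size]
    signal = np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * sigma**2))
    # 3. Noise: additive Gaussian noise at a stochastically varied SNR.
    snr = rng.uniform(2, 20)
    image = signal + rng.normal(0, 1 / snr, signal.shape)
    return image, (cx, cy)

image, (cx, cy) = simulate_particle()
```

Because the ground-truth position (cx, cy) is known exactly for every simulated image, the MAE between the network output and the true position can be computed without any manual annotation.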
Finally, we use the network to track the two videos of the optically trapped particle. In Fig. 8D, we see that, in the low-noise video, the radial center method and the DeepTrack model agree, while, for the high-noise video, the radial center method makes large, sporadic jumps. Since the videos are of the same particle in the same optical trap, we expect the dynamics of the particle to be similar. Since only the DeepTrack method gives consistent dynamics in the two cases, this indicates that DeepTrack is better able to track this more difficult case. A more detailed discussion of this example can be found in Ref. [7].

C. Particle characterization
Microscopy images contain quantitative information about the morphology and optical properties of the sample. However, extracting this information using conventional image analysis techniques is extremely demanding. Deep learning has proven to offer a remedy to this [45,46]. In this example, we employ DeepTrack 2.0 to develop a model to quantify the radius and refractive index of particles based on their complex-valued scattering patterns. As experimental verification, we record the scattering patterns of a heterogeneous mix of two particle populations (150 nm and 230 nm polystyrene beads flowing in a microfluidic channel) using an off-axis holographic microscope.
In line with the previous example, we use a CNN to downsample the 64 × 64 × 2 pixel input (the two channels corresponding to the real and the imaginary parts of the field) to an 8 × 8 × 128 tensor. Subsequently, we pass this tensor to a DNN, which outputs an estimate of the particle's radius and refractive index (Fig. 9B). The number of channels in each layer is doubled compared to the previous example, which may help capture subtle changes in the scattered field. We use the MAE loss.
To account for imperfections in the experimental system, we approximate the experimental PSF for the simulated images by adding coma aberrations with random strength. In Fig. 9C, we show three examples of outputs from the image generation pipeline. The network is trained for 110 epochs on a set of 10,000 synthetic images. The validation set consists of 1000 images. The validation loss diverges from the training loss after only 20 epochs, suggesting that the training could be terminated earlier, or that a larger training set could be beneficial.
Finally, we evaluate the model on the experimental dataset. In each frame, all particles are roughly localized using a standard tracking algorithm and focused using the criteria described in Ref. [91]. These observations are subsequently linked frame to frame to form traces. We use the fact that we observe each particle multiple times to improve the accuracy of the sizing. Specifically, we predict the size and refractive index of a particle using an image from each observation of that particle. We then average the results to obtain the final prediction for that particle. (This deviates slightly from the method proposed in Ref. [45], where the averaging is performed in the latent space, which results in a more complex and accurate method.) We can see the results in Fig. 9D, showing the radius versus the refractive index of each measured particle. We clearly distinguish two populations, which closely match the nominal characteristics of the particles (shown by the two circles).
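The per-trace averaging step can be sketched as follows; the trace identifiers and prediction values below are entirely made up for illustration:

```python
import numpy as np

# Each row is one (radius_nm, refractive_index) prediction for one
# frame in which the particle was observed, grouped by trace.
predictions = {
    "trace_1": np.array([[151., 1.58], [148., 1.60], [153., 1.59]]),
    "trace_2": np.array([[228., 1.45], [233., 1.44]]),
}

# Average the per-frame predictions to obtain one estimate per particle.
final = {tid: p.mean(axis=0) for tid, p in predictions.items()}
```

Averaging N independent observations reduces the standard error of the estimate by roughly a factor of sqrt(N), which is why linking observations into traces improves the sizing accuracy.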

D. Multiparticle tracking
The previous examples have focused on analyzing a single particle at a time. Frequently, however, microscopy involves detecting multiple particles at once. In this example, we extract the positions of quantum dots situated on the surface of a living cell. A small slice of an image is shown in Fig. 10A, with each particle circled in white.
We train a U-Net model to transform the input into a binarized representation, where each pixel within 3 pixels of a particle in the input is set to 1, and every other pixel is set to 0, as shown in Fig. 10B. The network returns a probability for each pixel, which is thresholded into the binary representation. (Note that in this example we can use a network that is smaller than the original U-Net because the information is highly localized; however, if, for example, the data were aberrated, a deeper network would be better.) The network is compiled with a binary cross-entropy loss.
The network is trained purely on synthetic data, simulating the appearance of a quantum dot as the PSF of the optical device. In Fig. 10C, we show several examples of the outputs from the image generation pipeline. The network is trained on 2000 128 × 128 pixel images in two sessions. The first session consists of 10 epochs where the loss is weighted such that setting a pixel value of 1 to 0 is penalized 10 times more than setting a value of 0 to 1. This helps the network avoid the very simple local optimum of setting every pixel to 0. For the following 100 epochs, the two types of errors are penalized equally. This explains the sudden change of the training loss after 10 epochs seen in Fig. 10C. The validation set consists of 256 images and shows no signs of overfitting.
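The class-weighted loss used in the first training session can be sketched as a weighted binary cross-entropy, in which missing a particle pixel is penalized 10 times more than a false positive (the function below is a NumPy illustration, not the actual implementation):

```python
import numpy as np

def weighted_bce(y_true, y_pred, false_negative_weight=10.0, eps=1e-7):
    """Weighted binary cross-entropy: a true 1 predicted as ~0 costs
    `false_negative_weight` times more than a true 0 predicted as ~1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(false_negative_weight * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

y_true = np.array([1.0, 0.0])
# Predicting everything as background is heavily penalized...
all_zero = weighted_bce(y_true, np.array([0.01, 0.01]))
# ...compared with the unweighted loss (weight 1), which discourages
# the trivial all-background optimum during the first session.
unweighted = weighted_bce(y_true, np.array([0.01, 0.01]),
                          false_negative_weight=1.0)
```

Dropping the weight back to 1 for the remaining epochs then lets the network balance the two error types, which is what produces the sudden change in the training loss.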
In Fig. 10D, we show a single image tracked using the trained network. It detects all obvious particles, as well as a few that are hard to verify as real observations. However, most such detections reappear in the next frame, indicating that they are real observations rather than just random noise. (However, this method to verify the tracking is not conclusive, since quantum dots are known to frequently flicker, meaning that they are not guaranteed to be visible in the next frame. Conversely, two observations in a row do not necessarily correspond to a real observation; they can be a product of optical effects that are consistent between frames.)

E. 3D multiparticle tracking
Similarly to single-particle analysis, multi-particle analysis can be extended to extract quantitative information about the particles. In this example, we locate spherical particles in 3D space from the intensity of the scattered field captured by an in-line holographic microscope (Fig. 11A). In order to be able to validate the out-of-plane positioning with ground-truth experimental data, we capture the experimental data using an off-axis holographic microscope. This allows us to accurately track the particles in 3D space using standard methods [91]. Off-axis holographic microscopes, unlike their in-line counterparts, retrieve the complex field instead of just the field intensity. We approximate the conversion from off-axis to in-line holographic microscopy by squaring the amplitude of the retrieved complex field. Similarly to the previous example, we represent each particle in the input by a region of pixel values of 1 in the output. The difference is that this network returns a volume, with each particle instead represented by a sphere with radius 3 pixels (Fig. 11B). The last dimension of the output represents the out-of-plane position of the particle, ranging from 2 µm to 30 µm. The network is slightly larger than in the previous example, since it needs to extract more information about the particles. Just as in the previous example, binary cross-entropy is used as the loss function.
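The off-axis-to-in-line conversion amounts to discarding the phase and keeping only the intensity. A minimal sketch, with a random complex field standing in for real holographic data:

```python
import numpy as np

# An off-axis hologram retrieves the complex field; an in-line hologram
# records only the intensity, approximated here as the squared amplitude.
rng = np.random.default_rng(1)
complex_field = rng.normal(size=(64, 64)) + 1j * rng.normal(size=(64, 64))
inline_intensity = np.abs(complex_field) ** 2   # drops the phase information
```

Training the network on these squared amplitudes lets the off-axis measurement serve as ground truth while the network only ever sees in-line-style inputs.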
The network is trained purely on synthetic data. We replicate the optical properties of the instrument used to capture the data, simulating the appearance of a particle using Mie theory. Each parameter of the simulation is stochastically varied around the experimentally expected values, making the network more robust. Additionally, we approximate the experimental PSF by adding coma aberrations with random strength. In Fig. 11C, we show a few images from this pipeline. The network is trained for 100 epochs on a set of 1000 synthetic images. The validation set consists of 256 images; the validation loss diverges from the training loss after 10 epochs, suggesting that the training could be terminated earlier, or that a larger training set could be beneficial (Fig. 11C).
In Fig. 11D, we show a single particle tracked in three dimensions, with the position found using the trained network in orange and the off-axis method in gray. The two methods overlap almost exactly. Moreover, we also show the out-of-plane positioning found by the off-axis method and the trained model for all detections. We see that most observations fall very close to the central line, with few outliers. Moreover, we see a divergence from the central line at the edges of the range. This is due to a limitation of how the position is extracted from the binarized image: a sphere close to the edge of the volume will not be entirely contained within the image, so that its centroid will not be the center of the sphere, resulting in a bias.

F. Cell counting
DeepTrack 2.0 is not limited to particle analysis. Counting the number of cells in an image has traditionally been a tedious task performed manually by trained experts. In this example, we count the number of U2OS cells (cells cultivated from the bone tissue of a patient suffering from osteosarcoma [92]) in the fluorescence images shown in Fig. 12A. We use the BBBC039 dataset for evaluation [93].
We once again use a U-Net model. This time, we represent each cell by a Gaussian distribution with a standard deviation of 10 pixels, whose intensity values integrate to one (Fig. 12B). In this way, the integral of the output intensity corresponds to the number of cells in the image. By representing the cell by a Gaussian profile, we also reduce the emphasis on the absolute position of the cell, while retaining the ability for a human to validate the output visually. We compile the network using the MAE loss.
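The counting target described above can be sketched directly: each cell becomes a unit-integral Gaussian, so summing the map recovers the cell count (the positions below are made up for illustration):

```python
import numpy as np

def density_map(positions, size=256, sigma=10.0):
    """Each cell becomes a Gaussian of standard deviation `sigma`
    normalized to integrate to one, so the integral of the map
    equals the number of cells."""
    y, x = np.mgrid[:size, :size]
    out = np.zeros((size, size))
    for cx, cy in positions:
        g = np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * sigma**2))
        out += g / (2 * np.pi * sigma**2)   # unit-integral normalization
    return out

cells = [(60, 70), (128, 128), (200, 90)]
target = density_map(cells)
count = target.sum()   # approximately the number of cells
```

Because the target is smooth, small localization errors in the network output barely change the integral, which is exactly the robustness the Gaussian representation is meant to provide.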
The training data consists of synthetic data generated by imaging cell-like objects through a simulated fluorescence microscope. A few example cells, as well as a single training input-output pair, are shown in Fig. 12C. The network is trained for 190 epochs on a set of 1000 synthetic images. Since the training set of the BBBC039 dataset is not used for training, we merge the training set and the validation set and use the merged set for validation. The validation loss is consistently higher than the training loss, but follows a very similar curve (Fig. 12C). This suggests that the synthetic data is a decent approximation of the experimental images. The offset can largely be explained by a few images in the validation set that are particularly hard for the network.
For large images, errors can average out, which can result in deceptively accurate counting. To eliminate this concern, we show the predicted number of cells versus the true number of cells for smaller slices of images (256 × 256 pixels) in the BBBC039 dataset in Fig. 12D. The network predicts the correct number of cells to within just a few percent. As a comparison, we show that the images cannot be analyzed by simply integrating the intensity of the input images (Fig. 12D). In order to show a best-case scenario for the sum-of-pixels method, we transform each sum by an affine transformation that minimizes the squared error on the test set itself. It is apparent that this is not sufficient to achieve an acceptable counting accuracy.
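The best-case calibration of the sum-of-pixels baseline amounts to a least-squares affine fit, count ≈ a · pixel_sum + b, performed on the test set itself (the numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical per-image intensity sums and true cell counts.
pixel_sums = np.array([1.2e4, 2.9e4, 4.4e4, 6.1e4])
true_counts = np.array([5, 11, 19, 24])

# Fit the affine transformation by least squares on the test set itself,
# i.e., the most favorable possible calibration of the naive method.
a, b = np.polyfit(pixel_sums, true_counts, deg=1)
predicted = a * pixel_sums + b
```

Since this calibration uses the test labels directly, any remaining error is a lower bound on what the sum-of-pixels method can achieve in practice.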

G. GAN image generation
DeepTrack 2.0 can also efficiently handle cases where the training set is derived directly from experimentally captured data, instead of being simulated. In this example, we combine the two approaches by using experimental data to train a GAN to create new data from a semantic representation of the image (Fig. 13A). More specifically, the GAN creates images of the Drosophila melanogaster third instar larva ventral nerve cord from a semantic representation of background, membrane, and mitochondria. This GAN, once trained, can subsequently be used as a part of an image simulation pipeline, just as any other DeepTrack feature.
The architecture of the neural network we employ is shown in Fig. 13B. The model is composed of a generator that learns the mapping relation between the input mask and its corresponding cell image, and of a discriminator that, given the semantic segmentation, determines whether the generated image could plausibly have been drawn from a real sample.

Figure 13. A conditional GAN is used to create cell images from a semantic mask. A Example masks (left) from which images of the Drosophila melanogaster third instar larva ventral nerve cord (right) are generated using the segmented anisotropic ssTEM dataset [94]. B The network architecture is a conditional generative adversarial network. The generator transforms an input semantic mask into a realistic cell image, using a U-Net architecture with the most condensed layer replaced by two residual network blocks [95]. The discriminator is designed similarly to the PatchGan discriminator [96], and receives both the mask and an image as input. The generator is trained using an MAE loss between the experimental image and the generated image, as well as an MSE loss on the discriminator output. Conversely, the discriminator is trained with an MSE loss. C Examples of masks and corresponding experimental images. The losses of the generator (left) and of the discriminator (right) are shown over 1000 training epochs, each of which consists of 16 mini-batches of 7 samples. We see that the generator loss increases towards the end of the training, a signature that continuing training beyond this point destabilizes the generator. D Mask images from a validation set, with the corresponding generated and real images. The generated images are qualitatively similar to the real images.
The generator follows a U-Net design with symmetric encoder and decoder paths connected through skip connections. The encoder consists of convolutional blocks followed by strided convolutions for downsampling. Each convolutional block contains two sequences of 3 × 3 convolutional layers. At each step of the encoding path, we increase the number of feature channels by a factor of 2.
The encoder connects to the decoder through two residual network (ResNet) blocks [95], each with 1024 feature channels. For upsampling, we use bilinear interpolations, followed by a convolutional layer (stride = 1). This operation is followed by concatenation with the corresponding feature map from the encoding path. Furthermore, we add two convolutional blocks with 16 feature channels at the final layer of the decoder. We use a 1 × 1 convolutional layer to map each 16-component feature vector to the output image. Here, the hyperbolic tangent activation (tanh) is employed to transform the output to the range [−1, 1]. Every layer in the generator, except the last layer, is followed by an instance normalization layer and a LeakyReLU activation (alpha = 0.2).
The discriminator follows a PatchGan architecture [96], which divides the input images into overlapping patches and classifies each patch as real or fake, rather than using a single descriptor. This splitting arises naturally as a consequence of the discriminator's convolutional architecture [57]. The discriminator's convolutional blocks consist of 4 × 4 convolutional layers with a stride of 2, which decrease the input resolution to half the width and height. In all layers, we use instance normalization (with no learnable parameters) and LeakyReLU activation. Finally, the network outputs an 8 × 8 single-channel tensor containing the predicted probability for each patch.
The training data consists of experimental data from the segmented anisotropic ssTEM dataset [94]. Each sample is normalized between -1 and 1, and augmented by mirroring, rotating, shearing, and scaling. Moreover, Gaussian random noise, with a standard deviation randomly sampled (per image) between 0 and 0.1, is added to the mask. Adding noise to the mask qualitatively improves the image quality. Specifically, without adding noise, the network is prone to tiling very similar internal structures, especially far away from the border of a mask. This occurs because there is no internal structure in the input, making two nearby regions of the input virtually identical from the point of view of the network. By introducing some internal structure to the mask in the form of noise, we help the network distinguish otherwise very similar regions in the input. An additional benefit is that it is possible to generate many images from a single mask, just by varying the noise. A few example training input-output pairs are shown in Fig. 13C. For this example, we define the loss function of the generator as l_G = γ · MAE{z_label, z_output} + (1 − D(z_output))², and that of the discriminator as l_D = D(z_output)² + (1 − D(z_label))², where D(·) denotes the discriminator network prediction, z_label refers to the ground-truth cell image, and z_output is the generated image. Note that the generator loss function l_G aims to minimize the MAE between the generator output image and its target, with the regularization parameter γ set to 0.8. For training, we use the Adam optimizer with a learning rate of 0.0002 and β_1 = 0.5 for 1000 epochs, each consisting of 16 mini-batches.
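The two loss functions above can be written out directly. In this sketch the discriminator output D(·) is simplified to a single scalar, whereas the actual PatchGan discriminator returns an 8 × 8 grid of patch probabilities; all numeric inputs are made up:

```python
import numpy as np

def generator_loss(z_label, z_output, d_of_output, gamma=0.8):
    """l_G = gamma * MAE(z_label, z_output) + (1 - D(z_output))^2,
    with the discriminator's prediction passed in as a scalar."""
    return gamma * np.abs(z_label - z_output).mean() + (1 - d_of_output) ** 2

def discriminator_loss(d_of_output, d_of_label):
    """l_D = D(z_output)^2 + (1 - D(z_label))^2."""
    return d_of_output ** 2 + (1 - d_of_label) ** 2

z_label = np.zeros((8, 8))          # toy ground-truth cell image
z_output = np.full((8, 8), 0.1)     # toy generated image
l_g = generator_loss(z_label, z_output, d_of_output=0.3)
l_d = discriminator_loss(d_of_output=0.3, d_of_label=0.9)
```

The MAE term anchors the generator to the experimental image, while the squared terms implement the least-squares adversarial objective: the generator is rewarded when D(z_output) approaches 1, and the discriminator when it approaches 0.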
The resulting model is able to create new images from masks it has never seen before. We show five such cases in Fig. 13D. The generated images are not identical to the real cell images in terms of texture and appearance, which is expected since the masks only contain spatial information about the cells' structures. However, the generated images are qualitatively similar to images from the experimental dataset.

V. OUTLOOK
The adoption of new deep-learning methods for the analysis of microscopy data is extremely promising, but it has been hampered by difficulties in generating high-quality training datasets. While manually annotated experimental data ensures that the training set is representative of the validation set, it is not guaranteed that the trained network can correctly analyze data obtained with another setup or annotated by another operator. Moreover, it limits the network to human-level accuracy, which is not sufficient for tasks requiring a higher level of accuracy, such as single-particle tracking. Synthetically generated data bypasses these issues, because the ground truth can be known exactly, and the networks can be trained with parameters that exactly match each user's setup.
Thanks to increasing inference speeds, it will become easier to perform real-time analysis of microscopy data. This can be used to make real-time decisions, from simple experiment control (e.g., controlling the sample flow speed) to more complex decisions (e.g., real-time sorting and optical force feedback systems). For example, one could imagine a completely automated experimental feedback system that applies optical forces to optimize imaging parameters and to acquire the best possible measurements of the quantities of interest.
In this article, we have introduced DeepTrack 2.0, which provides a software environment to develop neural-network models for quantitative digital microscopy, from the generation of training datasets to the deployment of deep-learning solutions tailored to the needs of each user. We have shown that DeepTrack 2.0 is capable of training neural networks that perform a broad range of tasks using purely synthetic training data. For tasks where it is infeasible to simulate the training set, DeepTrack 2.0 can augment images on the fly to expand the available training set. Moreover, DeepTrack 2.0 is complemented by a graphical user interface, allowing users with minimal programming experience to explore and create deep-learning models.
We envision DeepTrack 2.0 as an open-source project, where contributors with different areas of expertise can help improve and expand the framework to cover the users' needs. Interesting possible directions for the future expansion of DeepTrack 2.0 include, for example, tools for the analysis of time sequences using recurrent neural networks, tools to understand physical processes using reservoir computing, and even support for physical implementations of neural networks for greater execution speed and higher energy efficiency.
Deep learning has the potential to revolutionize how we do microscopy. However, there are still many challenges to overcome, not least of which is obtaining enough training data for the model to generalize. We believe that physical simulations will play a crucial part in overcoming this roadblock. As such, we strongly encourage researchers and community collaborators to contribute objects and models in their area of expertise: from specialized in-sample structures and improved optics simulation methods, to new and exciting neural-network architectures.

Figure 1. Brief history of quantitative microscopy and particle tracking. A-B 1910-1950: The manual analysis era. A Examples of manually tracked trajectories of colloids in a suspension from Perrin's experiment that convinced the world of the existence of atoms [1]. The time resolution is 30 seconds. B Kappler manually tracked the rotational Brownian motion of a suspended micromirror to determine the Avogadro number [2]. C-E 1951-2015: The digital microscopy era. C Causley and Young developed a computerized microscope to count particles and cells using a flying-spot microscope and an analog analysis circuit [3]. D Geerts et al. developed an automatized method to track single gold nanoparticles on the membranes of living cells [4]. E Crocker and Grier kickstarted modern particle tracking, achieving high accuracy using a largely setup-agnostic approach [5]. F-I 2015-2020: The deep-learning-enhanced microscopy era. F Ronneberger et al. developed the U-Net, a variation of a convolutional neural network that is particularly suited for image segmentation and has been very successful for biomedical applications [6]. G Helgadottir et al. developed a software to track particles using convolutional neural networks (DeepTrack 1.0) and demonstrated that it can achieve higher tracking accuracy than traditional algorithmic approaches [7]. J This article presents DeepTrack 2.0, which provides an integrated environment to design, train, and validate deep-learning solutions for quantitative digital microscopy.

Figure 3. Image enhancement with deep learning. A Fluorescence super-resolution localization microscopy using deep learning [50]: sparse PALM data [74] (optionally together with a widefield (WF) image) are used to construct a super-resolved image. The quality of the produced image increases with the acquisition time. B A GAN is used to transform holographic images into brightfield images [75]. From top to bottom: holographic images of pollen, the same images backpropagated to the focal plane, and finally their transformation into brightfield images. The bottom row shows the real brightfield images for comparison. C A GAN is used to convert quantitative phase images (in-line holography) into virtual tissue stainings, mimicking histologically stained brightfield images [76].

Figure 4. Particle tracking with deep learning. A Particle detection in dense images with varying diffraction patterns [40], where a relatively small network of three convolutional layers estimates a pixel-by-pixel probability map of background versus particle. B High-accuracy single-particle localization using a CNN with a dense top [7]. The network is scanned across the image to detect and localize all particles (dots) and bacteria (circles) in the image. C Particle tracking and characterization in terms of radius and refractive index using in-line holography images [46], where bounding boxes for each particle in the field of view are extracted and fed to a CNN. The authors showcase accurate measurements on data for particles between 0.5 µm and 1.5 µm. D Particle characterization in terms of radius and refractive index using off-axis holography images [45]. A CNN with latent-space temporal averaging combines multiple observations of a single particle to improve accuracy. This allows characterization of particles down to around 0.2 µm.

Figure 5. DeepTrack 2.0 graphical user interface. A The main interface: 1: The image generation pipeline is defined using drag-and-drop components. 2: An image created using the pipeline and the corresponding label are shown. 3: A comparison image is also shown to help ensure that the generated image is similar to experimental images. B The training loss and validation loss can be monitored in real time during training. It is also possible to monitor custom metrics, or any metric as a function of some property of the image (e.g., particle size, signal-to-noise ratio, aberration strength). C The model prediction on individual images in the validation set can be compared to the corresponding target in real time during training, providing another way to concretely visualize the improvement of the model performance over time. D Finally, the model can be evaluated on experimental images during training, which can help quickly home in on a model that correctly handles specific experimental data.

Figure 7. A dense network to classify hand-written digits. A Three example images from the MNIST dataset with their corresponding labels. B The network architecture consists of five fully connected layers of decreasing size, with the final layer having 10 nodes, whose outputs correspond to classification probabilities. C Examples of augmented training images: The network is trained on a set of 6 × 10^4 28 × 28 pixel images augmented by translations, rotations, shear, and elastic distortions, using a categorical cross-entropy loss. The validation loss (magenta line) is significantly lower than the training loss (orange line), likely because the augmentations make the training set harder than the validation set. D Confusion matrix showing how the 1 × 10^4 validation images are classified by the network: The diagonal represents the correctly classified digits, constituting the vast majority. The off-diagonal cells represent incorrectly classified digits.
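The five-fully-connected-layer architecture described in panel B can be sketched as a plain NumPy forward pass. This is a minimal illustration, not the trained model: the hidden-layer widths and random weights are assumptions; only the 10-node softmax output is taken from the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed hidden-layer widths of decreasing size; only the final
# 10-node output layer is specified in the caption.
sizes = [28 * 28, 256, 128, 64, 32, 10]
weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(image):
    """Map a 28x28 image to 10 class probabilities."""
    x = image.reshape(-1)
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers
    return softmax(x @ weights[-1] + biases[-1])

probs = forward(rng.random((28, 28)))
```

Training with a categorical cross-entropy loss, as in the caption, would then fit `weights` and `biases` by gradient descent; a deep-learning framework would handle that part in practice.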

Figure 8. A convolutional neural network to track a single particle. A Frames of the same particle held in the same optical trap, but with different illumination, resulting in a low-noise video (left) and a high-noise video (right). B The network architecture consists of 3 convolutional layers, each followed by a pooling layer. The resulting tensor is flattened and passed through three fully connected layers, which return the predicted x and y position of the particle. C Five examples of the outputs of the image generation pipeline at increasing signal-to-noise ratio (SNR), and the pixel tracking error for 1000 images using the DeepTrack model (orange markers) and the radial-center algorithm (gray markers). The DeepTrack model systematically outperforms radial center, especially at low SNR. The model was trained for 110 epochs on a set of 1 × 10^4 synthetic images. The validation loss (magenta line) and the training loss (orange line) remain similar for the whole training session. D The predicted position of the particle in the low-noise video (top panel) and the high-noise video (bottom panel) as found by the radial-center algorithm (gray line) and by the DeepTrack model (dotted orange line). In the low-noise case, they overlap within a fraction of a pixel, while in the high-noise case, the radial-center algorithm produces erratic predictions.
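For intuition about the algorithmic baseline the network is compared against, the sketch below localizes a synthetic diffraction-limited spot with an intensity-weighted centroid — a simpler cousin of the radial-center algorithm, used here only as an illustration (spot size and position are arbitrary choices, not values from the figure).

```python
import numpy as np

def intensity_centroid(image):
    """Estimate a particle's (x, y) position as the intensity-weighted
    centroid of the image, after removing a constant background offset."""
    yy, xx = np.mgrid[:image.shape[0], :image.shape[1]]
    w = image - image.min()
    total = w.sum()
    return (xx * w).sum() / total, (yy * w).sum() / total

# Synthetic noise-free spot at a known subpixel position.
true_x, true_y = 20.3, 14.7
yy, xx = np.mgrid[:64, :64]
spot = np.exp(-((xx - true_x)**2 + (yy - true_y)**2) / (2 * 3.0**2))

est_x, est_y = intensity_centroid(spot)
```

On a clean, symmetric spot this recovers the position to well below a pixel; the point of the figure is that at low SNR such algorithmic estimators degrade, while the trained CNN remains accurate.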

Figure 9. A convolutional neural network to measure the radius and refractive index of a single particle. A The real and imaginary parts of the scattered field are used to measure the radius and refractive index of a single particle. The field is captured using an off-axis holographic microscope and numerically propagated such that the particle is in focus. The total dataset consists of roughly 8 × 10^3 such observations, belonging to 352 individual polystyrene particles with 150 nm or 230 nm radius. B The network architecture consists of 3 convolutional layers, each followed by a pooling layer. The resulting tensor is flattened and passed through three fully connected layers, which return the predicted radius and refractive index of the particle. C Three pairs of real and imaginary parts of the scattered field from a single particle. The network is trained for 110 epochs on 1000 64 × 64 pixel images, using MAE loss. The validation loss (magenta line) stops decreasing significantly after only 20 epochs, while the training loss (orange line) keeps decreasing. D Measured radius versus measured refractive index for an ensemble of particles. There are two clearly distinguished populations, which closely match the modal characteristics of the particles (shown by the two circles).
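The caption's preprocessing step — numerically propagating the captured field so that the particle is in focus — is commonly done with the angular-spectrum method. The sketch below shows the standard textbook version; the wavelength, pixel size, and propagation distance are illustrative assumptions, not values from the paper.

```python
import numpy as np

def angular_spectrum_propagate(field, dz, wavelength, pixel_size):
    """Propagate a complex optical field a distance dz using the
    angular-spectrum method (evanescent components are discarded)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX)**2 - (wavelength * FY)**2
    kz = 2 * np.pi / wavelength * np.sqrt(np.clip(arg, 0.0, None))
    H = np.exp(1j * kz * dz)
    H[arg < 0] = 0.0  # suppress evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Demo: a smooth test field propagated out of focus and back again.
yy, xx = np.mgrid[:64, :64]
field = np.exp(-((xx - 32)**2 + (yy - 32)**2) / (2 * 5.0**2)).astype(complex)
out = angular_spectrum_propagate(field, dz=10e-6,
                                 wavelength=0.633e-6, pixel_size=0.1e-6)
back = angular_spectrum_propagate(out, dz=-10e-6,
                                  wavelength=0.633e-6, pixel_size=0.1e-6)
```

Because propagation is (up to the discarded evanescent components) unitary, propagating forward and back recovers the original field, which is a convenient sanity check for an implementation.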

Figure 10. A U-Net to detect quantum dots in fluorescence images. A A small slice of an image depicting quantum dots situated on a living cell, imaged through a fluorescence microscope (data kindly provided by Carlo Manzo). The quantum dots in the image are circled in white. B The network architecture is a small U-Net. A final convolutional layer outputs a single image, where each particle in the input is represented by a circle of 1s. C Examples of synthetic images used in the training process. The network is trained on 2000 128 × 128 pixel images for 110 epochs using binary cross-entropy loss. The validation loss (magenta line) and the training loss (orange line) are similar in magnitude for the entire training session. After 10 epochs, both losses start decreasing more rapidly, due to a change in the weighting of the loss function, as explained in the text. D A single frame tracked using the trained model. It detects all obvious particles, as well as a few that are hard to conclusively verify as real observations.
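The target representation described in panel B — each particle drawn as a disk of 1s on a zero background — can be generated directly from the known particle positions of the synthetic images. A minimal sketch (the disk radius is an assumption; the caption does not specify it):

```python
import numpy as np

def make_label(shape, positions, radius=3):
    """Binary target map: a disk of 1s centered on each particle position."""
    label = np.zeros(shape)
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    for y, x in positions:
        # Mark all pixels within `radius` of the particle center.
        label[(yy - y)**2 + (xx - x)**2 <= radius**2] = 1.0
    return label

label = make_label((32, 32), [(10, 10), (24, 20)])
```

The U-Net is then trained to map each synthetic input image to its label map with a binary cross-entropy loss, and detections are recovered from the predicted map by thresholding and grouping connected pixels.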

Figure 11. A U-Net to track spherical particles in three dimensions. A A sample network input, consisting of scattering patterns of several spherical particles. The sample contains a mixture of 150 nm and 230 nm polystyrene particles. B The network architecture is a small U-Net. A final convolutional layer outputs a volume, where each particle in the input is represented by a sphere of 1s. The out-of-plane direction spans 2 µm to 30 µm. C Examples of synthetic images used in the training process. The network is trained on 2000 256 × 256 pixel images for 100 epochs using a binary cross-entropy loss. The validation loss (magenta line) diverges from the training loss (orange line) after roughly 10 epochs. D A single particle tracked using the DeepTrack model (dotted orange line) and off-axis holography (gray line), showing the x, y, and z positions over time. The two methods almost perfectly overlap. Moreover, we show the predicted out-of-plane position of all detections found using the DeepTrack model versus off-axis holography. Most observations fall close to the central line, with a few outliers and some deviations near the edges of the range.

Figure 12. A U-Net to count cells in a fluorescence image. A Two slices from the BBBC039 dataset, with the corresponding number of cells in the image. B The network architecture is a small U-Net. A final convolutional layer outputs an image with a single feature, where each cell in the input is represented by a Gaussian distribution with a standard deviation of 10 pixels and an intensity that integrates to one. Thus, integrating the intensity of the output yields the number of cells in the image. C Examples of the cell images created by the image simulation pipeline, followed by a sample input-output pair containing six cells. The network is trained on 1000 256 × 256 pixel images and validated on 150 512 × 688 pixel images, using MAE loss. The validation loss (magenta line) is consistently higher than the training loss (orange line), but follows a similar curve. D The number of cells as found by the DeepTrack model compared to a naive approach based on summing the pixel values of the input image. Each data point represents a 256 × 256 pixel slice of one of the 50 images in the test set. Three points are circled and have their corresponding input-output pairs shown on the right.
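The counting trick in panel B — one Gaussian per cell, each normalized to integrate to one, so the sum over the output map equals the cell count — can be reproduced directly. The σ = 10 pixels is from the caption; the image size and cell positions below are arbitrary examples.

```python
import numpy as np

def density_map(shape, centers, sigma=10.0):
    """Target map: one Gaussian per cell, each normalized to integrate
    to 1, so the total intensity equals the number of cells."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    dmap = np.zeros(shape, dtype=float)
    for cy, cx in centers:
        g = np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2))
        dmap += g / g.sum()  # per-cell normalization
    return dmap

dmap = density_map((256, 256), [(50, 60), (120, 200), (200, 30)])
count = dmap.sum()  # recovers the number of cells
```

A U-Net trained to regress such maps therefore counts cells simply by integrating its output, without any explicit detection step.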
Figure 13. A conditional GAN is used to create cell images from a semantic mask. A Example masks (left) from which images of Drosophila melanogaster third-instar larva ventral nerve cord (right) are generated, using the segmented anisotropic ssTEM dataset [94]. B The network architecture is a conditional generative adversarial network. The generator transforms an input semantic mask into a realistic cell image, using a U-Net architecture with the most condensed layer replaced by two residual network blocks [95]. The discriminator is designed similarly to the PatchGAN discriminator [96] and receives both the mask and an image as input. The generator is trained using a MAE loss between the experimental image and the generated image, as well as a MSE loss on the discriminator output. Conversely, the discriminator is trained with a MSE loss. C Examples of masks and corresponding experimental images. The loss of the generator (left) and of the discriminator (right) are shown over 1000 training epochs, each consisting of 16 mini-batches of 7 samples. The generator loss increases towards the end of the training, a signature that continuing training beyond this point destabilizes the generator. D Mask images from a validation set, with the corresponding generated and real images. The generated images are qualitatively similar to the real images.
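The composite objectives described in panel B — an image-space MAE reconstruction term plus a least-squares adversarial term for the generator, and an MSE loss for the discriminator — can be sketched as plain functions. The relative weighting `lam` is an assumption not given in the caption.

```python
import numpy as np

def generator_loss(fake_img, real_img, disc_on_fake, lam=1.0):
    """MAE reconstruction term plus MSE adversarial term that pushes
    the discriminator's output on generated images toward 'real' (= 1)."""
    mae = np.mean(np.abs(fake_img - real_img))
    adv = np.mean((disc_on_fake - 1.0)**2)
    return mae + lam * adv  # lam is an assumed weighting

def discriminator_loss(disc_on_real, disc_on_fake):
    """MSE toward 1 for real inputs and toward 0 for generated inputs."""
    return np.mean((disc_on_real - 1.0)**2) + np.mean(disc_on_fake**2)

# Toy check: a perfect generator facing a fully fooled discriminator
# incurs zero generator loss.
img = np.ones((8, 8))
g_loss = generator_loss(img, img, disc_on_fake=np.ones((4, 4)))
```

In training, the two losses are minimized alternately: the discriminator output here would come from a PatchGAN-style network evaluated on (mask, image) pairs, as described in the caption.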