Linking information theory and thermodynamics to spatial resolution in photothermal and photoacoustic imaging

In this tutorial, we combine the different scientific fields of information theory, thermodynamics, regularization theory and non-destructive imaging, especially for photoacoustic and photothermal imaging. The goal is to get a better understanding of how information gaining for subsurface imaging works and how the spatial resolution limit can be overcome by using additional information. Here, the resolution limit in photoacoustic and photothermal imaging is derived from the irreversibility of attenuation of the pressure wave and of heat diffusion during propagation of the signals from the imaged subsurface structures to the sample surface, respectively. The acoustic or temperature signals are converted into so-called virtual waves, which are their reversible counterparts and which can be used for image reconstruction by well-known ultrasound reconstruction methods. The conversion into virtual waves is an ill-posed inverse problem which needs regularization. The reason for that is the information loss during signal propagation to the sample surface, which turns out to be equal to the entropy production. As the entropy production from acoustic attenuation is usually small compared to the entropy production from heat diffusion, the spatial resolution in acoustic imaging is higher than in thermal imaging. Therefore, it is especially necessary to overcome this resolution limit for thermographic imaging by using additional information. Incorporating sparsity and non-negativity in iterative regularization methods gives a significant resolution enhancement, which was experimentally demonstrated by one-dimensional imaging of thin layers with varying depth or by three-dimensional imaging, either from a single detection plane or from three perpendicular detection planes on the surface of a sample cube.


I. INTRODUCTION
Metrology can be described as collecting information from samples. Imaging techniques provide a lot of information, as is indicated by the saying "a picture is worth a thousand words". Non-destructive evaluation (NDE) or biomedical imaging often image deep structures in the samples interior, without destroying the samples. Therefore, an information theoretical viewpoint is instructive to determine the amount of information on subsurface or other embedded structures, which can be gained by measurements on the sample surface. Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information. Information theory was developed by Shannon 1 to find fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data. Information processing is a physical activity, which has to obey the laws of (non-equilibrium) thermodynamics. This was originally recognized by Szilárd 2 , and Landauer 3 captured it with his aphorism: "Information is physical".
Imaging of subsurface features from data measured on the sample surface usually results in an ill-posed or an illconditioned inverse problem, which needs regularization [4][5][6] , as illustrated in Fig. 1 for photoacoustic and photothermal imaging. Regularization typically involves additional assumptions, such as the smoothness of the solution. Usually, scia) Electronic mail: peter.burgholzer@recendt.at entists working on the inverse problem of subsurface imaging are barely working on non-equilibrium thermodynamics and information theory, and vice versa. This tutorial should enable researchers from both sides to get more insight from the counterpart. The used models for subsurface imaging and for thermodynamics are kept simple in the beginning, allowing to investigate the essential connections between thermodynamics and regularization of inverse problems. As a first step, a one-dimensional (1D) model without boundaries and a Dirac delta-like structure at a varying depth is used for imaging, either inducing thermal waves described by heat diffusion or acoustic pressure signals showing acoustic attenuation as in a liquid. For describing the relevant results from non-equilibrium thermodynamics, we follow a paper from Esposito and Van den Broeck about the "Second law and Landauer principle far from equilibrium" 7 , where we describe the photothermal or photoacoustic measurement process in terms of thermodynamics. The description can be simplified as the Hamiltonian does not change in time. A short laser light pulse (Dirac delta-like) is used for excitation, which brings the sample suddenly out of thermal equilibrium by heating the structures via optical absorption and thereby generating an initial pressure distribution, and then the system returns slowly to equilibrium either by heat diffusion ( "thermal") or by attenuation of the pressure wave ("acoustic").
After presenting these simple models for the wave propagation and the thermodynamic approach, they are linked in frequency domain to evaluate the physical background of regularization. Later on, more realistic samples in higher dimensions and with realistic boundary conditions are described. But before stepping into the details, in the remaining part of arXiv:2008.04696v1 [physics.app-ph] 11 Aug 2020 the Introduction section previous work in subsurface imaging and in thermodynamics is described.
In former times, images always have shown what could be detected with human eyes. Looking under the sample surface in a non-destructive way was possible only for optically transparent samples. This situation changed drastically with the discovery of X-rays by Röntgen in 1895. Nowadays, the interaction of sample structures with electromagnetic waves in the whole frequency range from eddy current (EC), radar, terahertz (THz), infrared (IR), up to UV, and X-ray radiation is used for imaging, but also with particles or elastic (acoustic) waves. The amount of information, which can be gained by measurements on the sample surface, depends on the interaction of these waves or particles with structures of the sample, such as scattering or absorption, and on the detector (e.g. sensitivity or bandwidth). For interior structures, the available information for imaging is limited also by the information loss during wave propagation from the imaged structures to the sample's surface, caused by diffusion, dissipation, or scattering ( Fig. 1 or Burgholzer et al. 8 ). A unifying framework for treating diverse diffusion-related periodic phenomena under the global mathematical label of diffusion-wave fields has been developed by Mandelis 9 , including thermal waves, charge-carrier-density waves, diffuse-photon-density waves, but also modulated eddy currents, neutron waves, or harmonic mass-transport diffusion waves.
There have been made several attempts to compensate the diffusion, dissipation, or scattering during wave propagation to get a higher resolution for the reconstructed images of the samples interior. We have shown that thermodynamical fluctuations are the cause for an entropy production, which is equal to the information loss and limits this compensation 8 . Therefore, also the spatial resolution for nondestructive imaging (NDI) at a certain depth is limited. For waves that satisfy the wave equation, no information is lost during the propagation, because the time reversed wave is a solution of the same wave equation. The inverse problem of "back-projection" can be exactly solved by, e.g. time reversal 10 . It is called a virtual wave signal, because in reality always dissipation, diffusion, or scattering causes a loss of information during propagation. For attenuated acoustic waves [11][12][13][14][15][16][17][18][19][20][21][22][23] and thermal diffusion 8,24-29 a linear relation between the time discretized measured signal Y meas and the virtual wave signal Y virt , which is a solution of the wave equation, was established: The matrix M depends on the kind of wave propagation and was calculated already for acoustic waves having a power law attenuation and for heat diffusion. Because of severe noise amplification, it cannot be inverted directly, but by using regularization methods (ill-posed / ill-conditioned inverse problem). This is a consequence of the information loss during signal propagation to the surface. Y meas could be the measured acoustic pressure or temperature, and Y virt could be the virtual pressure or temperature, respectively. Eq. (1) was derived for attenuated acoustic waves and thermal diffusion, but the concept of deriving the resolution limit from entropy production seems to be useful for nondestructive and biomedical imaging in general. For example, applications of a transformation of the diffusive electromagnetic wave into a wave field were shown by Lee et al. 30,31 and Gershenson 32 for geophysical inverse problems. The physical reason for the entropy production are fluctuations, which can be statistically described by stochastic processes 8 . One comprehensive letter about the relevant non-equilibrium thermodynamics, the second law and the connection between entropy production and information loss, was published in 2011 by Esposito and van den Broeck 7 . They showed that for two different non-equilibrium states evolving to the same equilibrium state, the entropy production ∆ i S during the evolution from one state to the other is equal to the information loss ∆I = k B ∆D, where k B is the Boltzmann constant and ∆D is the difference of the Kullback-Leibler divergence D, also called relative entropy of these states. D is a measure of how "far" a certain state is away from equilibrium 33 . The entropy production for the macroscopic states with small fluctuations around equilibrium turns out to be, in a good approximation, equal to the dissipated energy ∆Q, which is the dissipated heat or the heat transported by diffusion, divided by the mean temperature T mean , so The thermodynamic fluctuations in macroscopic samples are usually so small that they can be neglected -but not for ill-posed inverse problems such as described by the inversion of Eq. (1). The fluctuations are highly amplified due to the ill-posed problem of image reconstruction. As long as the macroscopic mean-value-equations describe the mean entropy production, the resolution limit depends only on the amplitude of the fluctuations and not on the actual stochastic process including all correlations 25 . Therefore, we have used in the past a simple Gauss-Markov process to describe the measured signal as a time-dependent random variable 8 . In addition, we have performed preliminary work to describe heat diffusion as a Wiener process and have compared for a simulated signal the spatial resolution limits derived in temporal and spatial frequency domain, ω-and k-space, respectively 34 . Experimentally, these theoretical resolution limits were verified by thermographic reconstructions of inclined steel rods in an epoxy sample, heated by a short light pulse 26 (section IV B 1).

II. PHOTOACOUSTIC AND PHOTOTHERMAL IMAGING MODEL
In photothermal and photoacoustic imaging a short light pulse, e.g. from a laser, is used to heat subsurface structures by the absorbed (scattered) light, such as blood vessels in tissue or light absorbing structures in epoxy resin, creating a temperature distribution T 0 (r) at location r immediately after the pulse (Fig.1a). This sudden temperature increase causes an initial pressure distribution p 0 (r) proportional to the absorbed optical energy density A(r) = T 0 (r)C p ρ 0 , with a dimensionless material constant Γ = β c 2 /C p , the Grüneisen coefficient [35][36][37] : where C p is the specific heat capacity, ρ 0 is the ambient density, c is the sound speed, and β is the thermal volume expansion coefficient at constant pressure. The initial pressure distribution causes a virtual pressure wave p virt (r,t) as a function of space r and time t. This wave is called "ideal" as no acoustic attenuation or dispersion during propagation is assumed and is "virtual" as attenuation and dispersion are always present for real waves, even if acoustic attenuation can often be neglected for lower frequencies. The wave equation describes the acoustic pressure p virt (r,t) as a function of space r and time t and can be written as 36,37 : where ∇ 2 is the Laplacian (second derivative in space). The source term on the right side of Eq. (3) ensures that the pressure just after the short excitation pulse, which is modeled by the temporal Dirac delta function δ (t), is the initial pressure distribution p 0 (r). The same equation is valid for a virtual temperature wave, which is defined as T virt := p virt (r,t)/ β c 2 ρ 0 : The virtual wave is the induced virtual pressure wave, but multiplied by a material constant to get a temperature measured in Kelvin. As it has no direct physical representation but is a mathematical model for thermographic reconstruction, it is called "virtual temperature wave". The actual temperature evolution T (r,t) can be described by the heat diffusion equation 38 : where T (r,t) is the temperature as a function of space and time, and α is the thermal diffusivity, which is assumed to be homogeneous in the sample. This description of the thermal diffusion is based on Fourier's law, which is valid for macroscopic samples, where the propagation distance is much larger than the phonon mean free path 39 . We denote byT (r, ω) the temperature signal in the frequency domain, the ω-space, which is related to T (r,t) by the temporal Fourier transform and its inverse: Similarly,p virt (r, ω) andT virt (r, ω) denote the Fourier transforms of the virtual waves p virt (r,t) and T virt (r,t), respectively. Taking the Fourier transform according Eq. (6) of Eqs.
(3), (4), and (5), and using δ (t) = 1 2π ∞ −∞ exp (−iωt)dω results in the Helmholtz equations: with k(ω) ≡ ω/c and σ (ω) 2 ≡ iω/α. For the virtual pressure wave and the virtual temperature wave the wavenumber k(ω) is real, for the temperature the wavenumber σ (ω) is complex with a real and imaginary part of equal size to describe the diffusive effect. Acoustic attenuation and dispersion can also be described by a complex wavenumber, but usually the imaginary part is much smaller than the real part. Stokes 40 could already show in 1845, that for liquids the acoustical absorption α(ω) increases with the square of the angular frequency ω, that α(ω) = α 0 ω 2 , with the material constant α 0 taking into account e.g. the viscosity of the liquid. The pressure caused by a monochromatic acoustic plane wave of frequency ω in a uniform attenuating medium at a point x and at an instant t can be expressed as where the acoustic attenuation coefficient α(ω) can be written as the imaginary part of a complex wavenumber K(ω) = k(ω) + iα(ω) = ω/c + iα 0 ω 2 . For the attenuated acoustic wave the Helmholtz equation is the same as for the virtual pressure wave in Eq. (7), but with the complex wavenumber 12-14 : The source term on the right side of Eq. (11) is the same as for the virtual pressure wavep virt (r, ω) in Eq. (7) and therefore a direct relation can be derived between the attenuated pressure wave and the virtual pressure wave on the same location r [12][13][14] . Replacing ω by cK(ω) in Eq. (7) results in: which is equal to Eq. (11), if multiplied by ω/cK(ω), and: This is the sought relation between the attenuated pressure signal and the virtual pressure signal. It is a local transformation, as the location r is the same forp virt andp, and it is valid in all dimensions. Transformation back from ω -space to the time domain with the inverse Fourier transformation (Eq. (6b)) results in: which can be written as: The factor ω/(cK(ω)) turns out to be approximately one for relevant frequencies and attenuation coefficients. Therefore, M p (t,t ) is in a good approximation a Gaussian pulse, where the amplitude is reduced and the width is increased by a factor of √ α 0 ct . For a thin absorbing planar layer in an infinite medium, which can be modeled as a Dirac delta function in the thickness coordinate z, the virtual pressure p virt (z,t) for z > 0 is proportional to δ (t − z/c). From Eq. (15) one gets for the attenuated pressure wave: which is the inverse Fourier transform (6b) of exp (iK(ω)z) 41 . This is also true for a general complex K(ω), e.g. with different power laws than the square of the frequency, such as for porcine fat tissue with a power law exponent of 1.5 (section IV A 1). More details on compensation of acoustic attenuation can be found in references [11][12][13][14][15][16][17][18][19][20][21][22][23] . Eq. (15) can be discretized to produce a matrix equation where p and p virt are the vectors of the attenuated and virtual pressure signal at discrete time steps, respectively. M p is the matrix at these time steps calculated from Eq. (15). In a similar way, we have calculated a local relation between the actual temperature T (r,t) and the virtual wave signal T virt (r,t) from Eq. (4) and (5), or in frequency domain from the Helmholtz equations (8) and (9) 24 . Replacing ω by cσ (ω) in Eq. (8) gives Eq. (9) by multiplying with ic ασ (ω) and identifyingT The inverse Fourier transformation in Eq. (6b) can be integrated analytically and can be written as 24 : Eq. (19), like Eq. (18) connects the temperature signal to the virtual wave signal at the same location r, but in time domain instead of the temporal frequency domain.
For the example of a thin absorbing layer in an infinite medium the virtual temperature wave consists of two Dirac delta pulses running in the positive and negative z-direction proportional to 1 2 δ (z ± ct) = 1 2c δ t ± z c . From Eq. (19) we get the 1D Green's function: Eq. (19) can be discretized to produce a matrix equation, similar to Eq. (17): Eq. (21) and Eq. (17) can be inverted only with appropriate regularization, as the matrices M T and M p are rank deficient. The truncated-singular value decomposition (T-SVD) method is used in the following to get the reconstructed signal T rec as an estimate for T virt from the thermographic signal T or p rec as an estimate for p virt from the pressure signal p, respectively. The truncation value for the smallest singular values is 1/SNR, with SNR being the signal-to-noise ratio for the measured temperature or pressure. Image reconstruction is now a two-stage process, where two successive inverse problems are solved. In photoacoustic imaging in a first step p virt from the measured pressure signal p on the sample surface is determined and then in a second step the initial pressure distribution p 0 (r) is determined by acoustic reconstruction methods, such as back-propagation or time reversal reconstruction algorithms 10 . In photothermal imaging the initial temperature distribution T 0 (r) is determined by the same reconstruction methods from the virtual temperature wave T virt , which was calculated in the first step from the measured surface temperature T . This two-stage process for imaging can be used in 1D, 2D, and 3D. In one dimension the second step, the reconstruction by back-projection is rather trivial: the time signal of the virtual pressure or temperature wave multiplied by the constant sound speed c gives directly the initial pressure or temperature at the depth z = ct. Therefore, in the beginning we will show a 1D example.

A. Compensation of acoustic attenuation for 1D photoacoustic imaging
In frequency domain, according to Eq. (10) the amplitude of the wave component with frequency ω is damped by the factor exp (−α 0 ω 2 z) during propagation from depth z to the sample surface. For frequencies larger than the truncation frequency ω cut , the amplitude of these wave components is damped below the noise level: For the spatial resolution in photoacoustic imaging the minimal possible width of the acoustic signal in the time domain is essential. A small width enables high spatial resolution, which corresponds to a broad frequency bandwidth. If the frequency bandwidth is limited according to Eq. (22), the spatial resolution limit according to Nyquist is half the wavelength at this frequency: .
The resolution limit can be validated by the reconstruction of a thin absorbing planar layer in an infinite medium, which is modeled as a Dirac delta function in the thickness coordinate z, and where the virtual pressure p virt (z,t) for z > 0 is proportional to δ (t − z/c). The Fourier transformation of a delta pulse is exp(iωz/c) and shows all frequencies with equal amplitude. The reconstruction in the attenuation limited frequency bandwidth is the inverse Fourier transformation, where the frequency integral is taken from −ω cut to +ω cut : This is a sinc-function, where the maximum at a distance z is at t = z/c, which is the arrival time of the virtual wave. The zero points are at t = z c ± π ω cut = z±δ resolution c with δ resolution from Eq. (23). Compared to the measured pressure pulse without any compensation of attenuation given in Eq. (16), the width of the reconstructed pressure pulse (24) is reduced by a factor of ln(SNR). A higher SNR allows a better resolution for the reconstruction.
Glycerine is a liquid with rather high viscosity. Acoustic attenuation in glycerine is about 100 times higher than in water. At a temperature of 25 • C and a frequency of 1 MHz the sound velocity is c = 1923 m/s and the attenuation is 2.5 m −1 , which gives 22 dB/m or 0.22 dB/cm (multiplying the attenuation with 20 log 10 (e)) 42 . This gives α 0 = α/ω 2 = 2.5/(2π10 6 ) 2 s 2 /m = 6.3 10 −14 s 2 /m. Even for high acoustic attenuation as in glycerine and for high frequencies, e.g. up to 100 MHz, ω/(cK(ω)) can still be approximated by one in Eq. (16), because cα 0 ω = 0.076 is small compared to one.
We will start with a simple 1D imaging problem: a thin planar layer at a certain depth z below the surface of our detection plane. Due to acoustic attenuation the rectangular pressure pulse gets a different shape according to Eq. (15), calculated by the discretized version of the matrix equation in (17). In our example, we take 3 layers at a depth z of 1 mm, 5 mm, and 10 mm, with a layer thickness of 0.2 mm. Tab. I shows the truncation frequencies according to Eq. (22) and the resulting resolution limits from Eq. (23). Fig. 2 shows the initial pressure distribution, which corresponds in 1D directly to the virtual pressure wave p virt , and the "measured" pressure signal p, which was calculated from Eq. (7), where 1 % of Gaussian noise was added (SNR = 100). For image reconstruction, the inverse matrix was calculated by using the truncated singular value decomposition (T-SVD) 24 . The SVD of the matrix M p is a factorization of the form UΣV t , where U and V are unitary matrices and Σ is a diagonal matrix with non-negative diagonal elements in decreasing order, called the singular values. For the pseudo-inverse matrix Σ + the inverse of the non-vanishing diagonal elements in Σ is taken; zero diagonal are kept. In the truncated SVD reconstruction, if the singular values get less than 1/SNR, they are set to zero. The pseudo-inverse of the matrix M p is M + p = VΣ + U t . Then the reconstructed virtual pressure wave is: which is shown in Fig. 2 around the depth of 1 mm and around the depth of 10 mm.

B. Virtual wave concept for thermographic reconstruction
In frequency domain, we can calculate the decay in amplitude of a thermal wave with frequency ω after a distance z either by directly solving the Helmholtz equation (9) or by inserting the Ansatz 43 with the complex wave number σ (ω) = iω α ≡ 1+i µ and a thin absorbing layer T 0 (z) = T 0 δ (z) into the heat diffusion equation (5), which results in: where ℜ is the real part and µ(ω) ≡ 2α/ω is defined as the thermal diffusion length 44 . The amplitude of the thermal wave is reduced by a factor 1/e after propagation of that length. The wavenumber or spatial frequency is k(ω) ≡ 1/µ(ω) = ω/2α. Similar as for acoustic attenuation described in Sec. II A, for frequencies larger than the truncation frequency ω cut the amplitude of this wave components are damped below the noise level, with the truncation wavenumber k cut : , with a layer thickness of 0.2 mm. At a depth of 1 mm the attenuation of the measured pressure is significant lower. Therefore, also the reconstructed pressure p rec shows a better spatial resolution. The theoretical resolution given in Table I is  Similar to Eq. (23) the spatial resolution is half of the wavelength a the truncation frequency: For thermographic reconstruction, the resolution limit decays linear with depth z. In comparison, for photoacoustic reconstruction the resolution limit decays proportional to the square root of z (Eq. (23)). The objective of the solution of the inverse problem in thermographic reconstruction is to calculate the virtual wave temperature from the noisy temperature measurements. The onedimensional solution of the wave equation is a wave package of constant shape travelling with sound velocity, because the group velocity is equal to the phase velocity. Therefore, in 1D the virtual wave immediately gives the reconstruction at time t = 0 by projecting it back at a distance z = ct. Similar to photoacoustic imaging, the optimum reconstruction can be achieved in the temporal frequency domain (ω-space) by considering frequencies up to the truncation frequency ω cut . If T virt has the shape of sharp pulse, ideally a Dirac delta distribution, the pulse will keep its shape over a travel distance, as the virtual wave equation (8) has only solutions without dispersion and attenuation. In ω-space, forT virt also the high frequencies will contribute to the signal after some propagation. From the delta pulse in T virt , the related temperature signal T gets broadened with increasing distance z according to Eq. 19.
In Fig. 3(a) the time-dependent temperature signal calculated using Eq. (21) is shown for three different heat sources at layer depths of 1, 3 and 5 mm. The layer thickness is the same as in the photoacoustic example with 0.2 mm. A low carbon steel (<0.4 %C) with the following material parameters is theoretically investigated: k = 43 W/(mK), C p = 465 J/(kgK) and ρ = 7850 kg/m 3 . As steel is a rather good electrical conductor, also the thermal conductivity is high. The resulting thermal diffusivity, which describes the speed of heat propagation, is α = k/(ρC p ) = 11.8 10 −6 m 2 /s. The characteristic time (or time scale) to reach the maximum temperature in an 1D heat diffusive process is given by t D = z 2 /(2α). Due to this quadratic dependence, the temperature curves in Fig. 3(a) show very widespread time scales for different layer depths z. This phenomena can be interpreted by the "thermal wave" theory: The short time behavior is dominated by the fastest propagating, high frequency components of the pulse and the long time behavior by the low frequency components with a lower phase velocity: v = √ 2αω 43 . The sample can be considered as a low pass filter, because the high frequency components are heavily attenuated as they propagate, and so only the slow moving low frequency components reach the surface from deeper layer sources in the sample. In Fig. 3(b), the reconstructed signal T rec is shown for the three rectangular signals of T virt , whereby for the regularization the truncated-SVD is used with a truncation value for the smallest singular value of 1 / SNR with a SNR of 100. The truncation frequencies are listed in Tab. II for the three different layer depths. As the resolution limit is the size of a small, or ideally point-like, reconstructed source at a certain depth, the resolution after some propagation distance z = ct , is the half the wavelength at this frequency or the width of the reconstructed signal calculated by the T-SVD. This is in good agreement with the spatial resolution in Eq. (29). The thermographic resolution limit is proportional to the travel distance z of the signal between the excitation location and the detection surface and inversely proportional to the natural logarithm of the SNR, whereby the thermal diffusivity has no influence. The resolution limits for a SNR of 100 are given in Tab. II.

III. THERMODYNAMICS AND INFORMATION
The process of information gaining during photoacoustic and photothermal imaging is a physical one which has to obey the laws of (non-equilibrium) thermodynamics. Determining the virtual pressure or temperature wave from the measured pressure or temperature is an ill-posed or an ill-conditioned  inverse problem, which needs regularization 4 . In the previous section the truncation frequency ω cut was chosen as regularization parameter, where it was assumed by regularization using the T-SVD method that all signal components below ω cut could be used for reconstruction, and all components above ω cut , where the signal is damped below the noise level, provide no additional information for reconstruction. Therefore, the signal amplitude compared to noise level, which is the SNR, plays an important role to determine the truncation frequency ω cut as the regularization parameter of the inverse problem. Where does this noise come from and what is its relation to information gaining in imaging? Before answering these essential questions some basics of thermodynamics and statistical physics should be reviewed. A microstate is a specific microscopic configuration of a thermodynamic system, e.g. with certain locations or velocities of all of the molecules. A microstate can be represented as a single point with coordinates x in a usually very high dimensional phase space. In contrast, the macrostate of a system refers to a few macroscopic properties, such as its temperature, pressure, volume or density. A macrostate is characterized by a probability distribution ρ(x) in the phase space of possible microstates. This distribution describes the probability of finding the system in a certain microstate. In classical mechanics, the position and momentum variables of a particle can vary continuously, so the set of microstates is actually uncountable and x gets a continuous variable. The time evolution of all the particles is determined by the Hamiltonian H, which gives the total energy of a microstate, the sum of the kinetic and potential energy. The systems energy U t at time t is the mean value or also called expectation value of the system Hamiltonian H: where ρ t (x) is the probability distribution at time t.
Here, we follow the work of Esposito and van den Broeck 7 about the second law and the connection between entropy production and information loss. The system entropy is the Shannon entropy, where k B is the Boltzmann constant, and the corresponding system free energy is given by Here T is the temperature in Kelvin of an ideal heat bath with which the system is in contact -the ambient temperature of the sample. The excitation laser pulse deposes the work W at the time t = 0, which either diffuses as heat out of the system or is the heat from dissipation of the photoacoustic pulse by acoustic attenuation, called Q t from the laser pulse till the time t. After a long time, the system is in the same thermal equilibrium as in the time before the excitation pulse, and all the energy W of the laser pulse has left the system and is in the heat bath (Q t = W ), as sketched in Fig. 4. Following conservation of energy (first law of thermodynamics) the corresponding energy change ∆U t := U t −U eq of the system is given by At t → ∞ equilibrium is reached again, Q t gets W , and the change in system energy ∆U t = 0.
If the system is in equilibrium with the surroundings at the ambient temperature T , the equilibrium distribution is the canonical distribution where the normalization function of the equilibrium distribution is called the canonical participation function Z and the thermodynamic β = 1/(k B T ). This Maxwell-Boltzmann distribution ρ eq can be determined by minimizing the entropy S t under the two constraints on the total particle number and on the average energy per particle, e.g. 45,46 . For the equilibrium entropy one gets using Eq. (31) and Eq. (30): and for the equilibrium free energy According to the second law of thermodynamics the entropy of an adiabatically insulated system increases monotonously until thermodynamic equilibrium is established, where the relative entropy −k B D(ρ t | ρ eq ) gets zero, with the Kullback-Leibler divergence between the non-equilibrium distribution ρ t at time t and the equilibrium distribution ρ eq 33 . The Chernoff-Stein's Lemma gives a precise meaning to D( f | g) as a "distance" between two distributions: if n data from g are given, the probability of guessing incorrectly that the data come from f is bound by the error ε = exp(−nD( f | g)), for n large 33,47 . The free energy of a non-equilibrium state ρ t is higher than that of the equilibrium state by an amount equal to the temperature times the information I t needed to specify the non-equilibrium state:  (36). Due to the excitation pulse at time t = 0, the free energy jumps from F eq to F kick = F eq + W , because the pulse shifts the distribution to a higher energy ∆U kick = W (Eq. 33), but does not change the entropy (Fig.  4), S kick = S eq . The change in the non-equilibrium system entropy ∆S t = S t − S eq can be written as a reversible contribution to the heat flow −Q t /T and of an irreversible contribution, called entropy production ∆ i S t : FIG. 4. Illustration of the thermodynamics of the imaging process in phase space: a system in equilibrium state ρ eq is kicked at a time t = 0 with magnitude x 0 and performed work W to a state ρ kick far from equilibrium, followed by a dissipative or diffusive process back to equilibrium. Till the time t the heat Q t flows to the surroundings of the system at an ambient temperature T . At t → ∞ equilibrium is reached again and Q t gets W . x is the coordinate in phase space or a set of reduced variables which captures the information on the work (see text). The arrows connecting ρ kick at time t = 0, ρ t at t > 0, and ρ eq at t → ∞ indicate the tube of trajectories, which is "thin" for macroscopic systems as deviations from the mean values x(t) are small.
Together with the definition of the free energy in Eq. (32), the first law Eq. (33), and Eq. (38) we obtain for the entropy production: The free energy ∆F t jumps at time t = 0 from zero to W and then decreases and gets zero, the information according to Eq. (38) makes the jump to W /T , and then decreases to zero. ∆I t is defined as the information loss and according to Eq. (40) we get the very important result, that the entropy production is equal to the information loss: For a macroscopic sample the fluctuations, which are the variance or the "noise" around the mean values x(t) are small and the change of the "shape" of the distribution during time evolution shown in Fig. 4 can be neglected. Therefore, the change of the Shannon system entropy can be neglected (∆S t ≈ 0 in Eq. (39)) and the change in free energy gets equal to the change in system energy, ∆F t ≈ ∆U t , which gives from Eq. (38): In this very good approximation for macroscopic samples, the Kullback-Leibler divergence gets a distance in the mathematical sense, and the energy W − Q t , which has not yet dissipated or diffused to the surroundings at temperature T , divided by this temperature is the information content. After a long time, when all the work W has dissipated or diffused, I t gets zero and the equilibrium distribution ρ eq is reached. But according to Chernoff-Stein's Lemma after some time t cut the distribution ρ t cut cannot be statistically distinguished from ρ eq , if k B T D(ρ t | ρ eq ) gets smaller than the equilibrium energy U eq . For real world examples having a high dimensional phase space x the actual distribution density ρ t can be hardly determined, but for the information loss from Eq. (42) it is sufficient to evaluate the dissipated or diffused work from the mean value equations. Usually, a set of reduced variables with a drastically lower dimensionality than the phase space can be used, which captures the information on the mean dissipated or diffused work. In the macroscopic approximation the actual noise or fluctuation around the mean value is irrelevant. Only its amplitude is significant for setting the lower bound in Chernoff-Stein's Lemma. The following subsections give a small example for dissipation and for diffusion to clarify the relation between entropy production and information loss.

A. Kicked Brownian Particle
We now consider the velocity v of a particle as a stochastic process, which was used to describe Brownian motion of a particle, but for simplicity only one velocity component (1D). The essential relation between fluctuation (noise) and dissipation (viscous damping) is shown. Stochastic processes can be described mathematically, e.g. by Master equations, Langevin-or Fokker-Planck equations, shown e.g. in the books of van Kampen 48 , Gardiner 49 , or Risken 50 . In the Langevin-equation the environmental forces on a particle in Newton's law are a linear damping term together with random noise: The linear damping −γv is a viscous drag and σ is the amplitude of the white noise η, which has a zero mean value and is uncorrelated in time: The Langevin equation governs an Ornstein-Uhlenbeck (O-U) process, named after L. S. Ornstein and G. E. Uhlenbeck, who formalized the properties of this continuous Markov process 51 . It was shown by using the statistical properties and the continuum limit of the white noise η, that the Langevin equation is equivalent to a description based on a Fokker-Planck equation. For the time-dependent distribution density ρ t (v) of the velocity a linear differential equation can be derived (e.g. 50,52 ): We start with the equilibrium state (zero time derivative if inserted into Eq. (44)), as the initial velocity distribution: which is a Gaussian distribution with mean value zero and a variance Var(v) = σ 2 /(2γ), representing the statistical "noise". Comparison with Eq. (34) and the kinetic energy H(v) = mv 2 /2 gives: This relation states a connection between the strength of the fluctuations, given by σ , and the strength of the dissipation γ. This is the fluctuation-dissipation theorem for uncorrelated white noise. At a time zero the particle is kicked, which causes an immediate change in velocity of v 0 (kick magnitude) and the distribution density after the kick is The solution of Eq. (44) for the time-dependent distribution density ρ t (v) at t > 0 with ρ kick (v) as initial condition at t = 0 is which gives a Gaussian distribution with time dependent mean valuev(t) but a constant variance Var(v) = σ 2 /(2γ). As shown in 8 this is a general feature of Gauss-Markov processes, also for higher dimensions: taking the equilibrium as an initial condition results for all times after the kick in a distribution with a constant (co)variance (matrix) equal to the equilibrium variance. One realization of the kicked O-U process is shown in FIG 5. At the time t = 0 a kick with a magnitude of v 0 = 10 occurs. For a time t > 0 the information about the magnitude of the kick gets more and more lost due to the fluctuations. Now, this increasing information loss is quantified and compared to the mean entropy production. According to Eq. (30) and (33) due to the kick at time zero the energy of the particle is increased by W = mv 2 0 /2 and From Eq. (31) the entropy stays constant in time (∆S t = 0), as the distribution is shifted in time but with a constant "shape". Therefore the approximation in Eq. (42) gets exact: According to Chernoff-Stein's Lemma, if the information I t at a time t cut gets less than a certain limit, here chosen as the zero-point energy divided by the temperature (U eq /T = k B /2), the distribution ρ t (v) cannot be statistically distinguished from the equilibrium distribution ρ eq (v): which gives At this truncation time t cut , not only the distribution cannot be distinguished from equilibrium, but also the mean velocitȳ v(t) = v 0 exp(−γt) gets less than the noise level v 0 /SNR. In Fig. 5, this is at the scaled time t cut γ = ln(10) ≈ 2.3. This coincidence, that the information related criterion in Eq. (52) gives the same time t cut when the signal gets less than the noise always happens, if the energy described by the Hamiltonian H is proportional to the square of the signal amplitude.

B. Ideal gas diffusion between two boxes
Another example for an Ornstein-Uhlenbeck process using diffusion instead of dissipation is the distribution of 2N particles of an ideal gas between two boxes of volume V , which are connected by a small hole where particles can slip through between the two boxes (effusion). The constant time rate γ is the reciprocal value of the mean time of a particle to stay in one of the boxes before it slips through the hole. At a certain time t, N + δ N(t) particles are situated in the left box and N − δ N(t) particles are in the right box (Fig. 6).
δ N(t) can be approximated by an Ornstein-Uhlenbeck process for a big number of particles (see e.g. the tutorial introduction to stochastic processes by Lemons 53 ). At the time t = 0, N 0 particles are "pumped" from the right to the left box. The mean value δN(t) = N 0 exp(−γt) for t > 0, is given by an exponential decay in time, and the variance of the Gauss distribution is again constant in time with Var(δ N) = σ 2 2γ , according to Eq. (48). However, Var(δ N) is not an energy, and therefore σ 2 cannot be expressed in terms of a rate γ and a temperature T by requiring the equipartition of energy at equilibrium. The fluctuation -dissipation theorem as in Eq. (46) does not apply; Var(δ N) fluctuates without dissipation 53 . The information loss is again described analog to Eq. (51) by an exponential decay: Spatial diffusion is the cause for the entropy production and is described according to Boltzmann by S = k B lnW . W is the number of possibilities to distribute the particles into the two boxes for a certain δN. Using Stirlings formula for a big number of particles N one gets for the entropy 45 : By comparison with Eq. (54) one gets again a relation between the fluctuations, given by σ , and the strength of the diffusion, given by the rate γ (fluctuation-"diffusion" theorem) and for the variance of δ N we get:

IV. IMAGING AS AN INVERSE PROBLEM
In the previous section, imaging with an excitation pulse at time t = 0 has been described by means of nonequilibrium statistical physics. Pressure or temperature are proportional to the momentum or the kinetic energy of many particles, which fluctuate around their mean value. For macroscopic samples these fluctuations are very small and can usually be neglected in the thermodynamic limit. But for inverse problems these fluctuations are highly "amplified", which was shown in the previous section for a system kicked out of the equilibrium, followed by a dissipative process back to equilibrium (Fig. 4). The inverse problem of estimating the kick-magnitude from an intermediate state a certain time after the kick is ill-posed. Just after the kick its magnitude can be estimated very well. A long time after the kick the state is nearly in equilibrium and all the information about the kick magnitude is lost. For macroscopic systems it could be shown that the information loss is in a good approximation just the mean dissipated energy or diffused heat divided by the temperature, which is the mean entropy production. In imaging, the spatial resolution and the information content are strongly correlated, and therefore a loss of information results in a loss of resolution, which is quantified for 1D examples in sections II A and II B.
In sections III A and III B, the mean dissipated energy and the resulting loss of information is explicitely calculated for a kicked Brownian particle and for an ideal gas diffusing between two boxes. For these study cases also the timedependent probability distribution could be calculated explicitly, and the deduced information losses are in agreement with the results from mean value calculations, even when the distributions are broad and still far away from the thermodynamic limit. A truncation time could be given, for which the state after that time cannot be distinguished from the equilibrium distribution according to the Chernoff-Stein's Lemma, when the Kullback-Leibler divergence gets too small. The amplitude of the fluctuations and the dissipation or the diffusion are not independent. Fluctuations and mean entropy production can be thought as two sides of the same coin and are connected by the fluctuation -dissipation theorem. For systems near thermal equilibrium in the linear regime such relations between entropy production and fluctuation properties have been found by Callen 54 , Welton 55 , and Greene 56 . This fluctuation-dissipation theorem is a generalization of the famous Johnson 57 -Nyquist 58 formula in the theory of electric noise. It is based on the fact that in the linear regime the fluctuations decay according the same law as a deviation from equilibrium following an external perturbation.
For macroscopic systems it is not necessary to describe the full stochastic process to get the influence of the fluctuations. The relevant information loss can be calculated from the averaged behavior (mean value), which describes the usually known macroscopic evolution of the system in time, as the mean dissipated work or diffused heat divided by the temperature. This remarkable feature might be the reason that regularization methods for ill-posed inverse problems work so well, although they use only the mean value equations and not the detailed stochastic process to describe the time evolution. The choice of an adequate regularization parameter, e.g. the truncation value for the truncated singular value decomposition (SVD) method, is equivalent to the choice of the error level ε in the Chernoff-Stein's Lemma.
A prominent class of ill-posed inverse problems is nondestructive imaging (NDI), where the information about the spatial pattern of a sample's interior has to be transferred to the sample surface by certain waves, e.g., ultrasound or thermal waves (Fig. 1). Imaging is done by reconstruction of the interior structure from the signals measured on the sample surface, e.g., by back-projection or time-reversal for a photo-acoustically induced ultrasound wave 10,59 . There are several effects which limit the spatial resolution for photoacoustic imaging. Beside insufficient instrumentation and data processing one principle limitation comes from attenuation of the acoustic wave or from heat diffusion (section II). This information loss during the acoustic wave propagation cannot be compensated by any signal processing algorithm. In this section it is shown that the loss of information, which is equal to the entropy production (section III) as the dissipated energy or diffused heat divided by the temperature, is a principle thermodynamic limit, which cannot be compensated. Using the information loss and entropy production for a kicked process derived in section III it is shown that the resolution limit depends just on the macroscopic mean-value equations and is independent of the actual stochastic process, as long as the macroscopic equations describe the mean work and therefore also the mean dissipated work.
In this section we do not attempt to model the process of acoustic attenuation or heat diffusion as a stochastic process, which we have done earlier as a Gaussian process 8 . We use from section II the mean-value equations for the pressure p(r,t) or temperature T (r,t) evaluation in the frequency domain,p(r, ω) orT (r, ω), respectively.
For an acoustic wave in frequency domain, according to Eq. (10) the amplitude of the wave component with frequency ω is damped by the factor exp(−α 0 ω 2 r) after propagating a dis-tance r. The energy ∆U ω of the acoustic wave with frequency ω is proportional to the square of the pressure amplitude: ∆U ω = 0.5χ∆V |p(r, ω)| 2 = 0.5χ∆V exp(−2α 0 ω 2 r), (57) which can be found e.g. in Morse and Ingard 60 , where χ = 1/(c 2 ρ) is the adiabatic compressibility with the density ρ and ∆V is the measurement volume. Using that Var(ρ) = k B T /(χ∆V ) is the variance of the pressure (e.g. from Landau and Lifshitz 61 ) one gets from Eq. (42) for the information content of the Fourier component with frequency ω: 2α 0 ω 2 r). The signal-tonoise ratio SNR at the distance r = 0 is the reciprocal value of the square root of the variance of the pressure, as the signal amplitude in frequency domain is normalized to one (Eq. (10)). The truncation frequency ω cut is defined that for larger frequencies than ω cut the information content in this frequency component is so low that its distribution cannot be distinguished from the equilibrium distribution within a certain statistical error level (Chernoff-Stein's Lemma). This level is ∆U eq /T = 0.5χ∆V Var(p)/T = 0.5χ∆V k B T /(χ∆V )/T = 0.5k B : This gives the same truncation frequency as in Eq. (22), where the amplitude is damped just below the noise level. As described already in section III A, the information related criterion in Eq. (58) gives the same truncation value as when the signal amplitude gets less than the noise, if the energy ∆U ω is proportional to the square of the signal amplitude |p(r, ω)| (Eq. (57)). Similarly, also for particle diffusion the information content I t is proportional to the square of the mean difference in the partical number δN(t) (Eq. (54). For heat diffusion and thermal waves the information content is proportional to the square of the temperature deviation from the equilibrium temperature, in real time space and in the frequency domain. Heat diffusion can be described as an Ornstein-Uhlenbeck process 8,62 . Therefore, the information related criterion in Eq. (42) for thermal waves gives the same truncation frequency ω cut when the signal gets less than the noise with the results given in section II B.
As mentioned at the end of section II, imaging in two (2D) or three dimensions (3D) can always be reduced using a twostage process: first, for each detector location the virtual pressure signal in the absence of attenuation or the virtual thermal wave is calculated from the measured acoustic or thermal signal, respectively. This is a one-dimensional (1D) reconstruction problem, described in discrete time steps by Eq. (17) and Eq. (21). In a second step, any reconstruction method for photoacoustic tomography without acoustic attenuation, such as time-reversal or backprojection, can be used for reconstructions in higher dimensions 10 . In 1D the virtual wave immediately gives the reconstruction at time t = 0 by projecting it back at a distance ct (see section II A and II B). Therefore, it is sufficient to examine the acoustic attenuation or thermal diffusion of 1D waves and the reconstruction in 1D. Compensation of acoustic attenuation or thermal diffusion in higher Kapitel IV A 1. Acoustic attenuation in a fat tissue layer:   The fat tissue is fastened between two aperture disks applying a small axial force on the tissue. Distance bolts with 6 mm or 20 mm ensure two precise lengths of the attenuation path. For these two lengths, attenuated acoustic plane waves were detected by an unfocused piezoelectric transducer, which was aligned by worm screws to ensure a one-dimensional signal propagation and detection. Image from Burgholzer et al. 65 was edited and is used under CC BY 4.0.
dimensions can always be reduced to 1D, which was also explicitly shown for signals from a layer (1D), cylinder (2D), and a sphere (3D) 19 . Therefore, in the beginning we will show a 1D example.

Acoustic attenuation in a fat tissue layer
For excitation of plane acoustic waves the surface of a silicon wafer was illuminated by a nanosecond laser pulse (Fig. 7). Due to the abrupt local heating due to absorption of the 532 nm wavelength laser light pulse, the silicon wafer expands thermoelastically and generates an acoustic wave, propagating through a layer of fatty tissue and is detected by an unfocused ultrasound transducer (V358-SU, Panametrics, Waltham, MA). The measured acoustic pressure without fat, through 6 mm thick porcine fat tissue, and through 20 mm thick tissue is shown in Fig. 8(a) as a function of time 63,64 . For comparison, the measured signals are scaled and time-shifted. The frequency spectrum was calculated by Fourier transformation in time and the frequency dependent attenuation for the 6 mm and 20 mm fat layer was determined by dividing their spectrum through the spectrum of the reference signal without fat. It turned out that a power law α(ω) = α 0 ω n with an exponent n = 1.5 and α 0 = 0.87dB MHz −n cm −1 fits the attenuation in a wide frequency range very well 63 .
The experimental set-up now is slightly different compared to the photoacoustic imaging set-up described in section II. In photoacoustic imaging, the short excitation pulse at the time zero excites a pressure wave in the whole sample at the same time. If an acoustic signal arrives later at the detector this comes from a longer propagation distance and causes a higher attenuation. Here, the fat layer always has the same thickness and therefore the attenuation for earlier or later arriving parts of the signal at the detector is the same. Only for the excitation as a Dirac delta in space and time as described in Eq. (16), this is the same. Since the differential equations are linear, the general attenuated signal is a temporal convolution of the input signal with the attenuated solution for Dirac excitation, or a simple multiplication in the frequency domain 15 . The discrete version of the relationship between the measured pressure and the virtual pressure wave now, instead of Eq. (17), becomes where p and p virt are the vectors of the attenuated and virtual pressure signal at discrete time steps, respectively. Writing the discrete Fourier and inverse Fourier transformation as multiplication by F and its conjugate transpose F * , and diag(.) forms a diagonal matrix, M z can be written by neglecting the dispersion as 63 which immediately shows that the singular values of M z decay exponentially and according to Eq. (22) for larger frequencies than the truncation frequency ω cut the amplitude of these wave components is damped below the noise level: SNR exp (−α 0 ω n cut z) = 1 or ω cut = n ln(SNR) The spatial resolution according to Eq. (23) is half the wavelength at the truncation frequency ω cut . For the SNR of 1358 (63 dB), the truncation frequency for a fat thickness of 6 mm and 20 mm is 24 MHz and 11 MHz, respectively, which corresponds to a spatial resolution of 32 µm after 6 mm of fat and 70 µm after 20 mm of fat.
The acoustic attenuation in water compared to fat can be neglected 63 . Therefore, the signal measured in water without fat can be taken as the virtual pressure signal p virt , which is no single Dirac delta pulse, but gets negative and shows additional "ringing" because of the laser ultrasound excitation within the silicon wafer and the characteristic of the piezoelectric transducer and the amplifier. All these influences on the signal can be described by a matrix M water with p virt = M water p δ , and Eq. (59) can be written as Using the truncated SVD for the inversion of M z M water , the reconstructions of p δ from the measurements p for water, 6 mm fat, and 20 mm fat are shown in Fig. 8(b). Multiplying the temporal width of these peaks of 21 ns and 46 ns by the sound velocity of 1512 m/s in fatty tissue perfectly fits to the resolution limits of 32 µm after 6 mm of fat and 70 µm after 20 mm of fat derived above from the truncation frequencies. To make comparison easier, the signals were time-shifted and scaled and (b) the reconstruction results using T-SVD for regularization to compensate attenuation in fatty tissue of 6 mm thickness and 20 mm thickness. This corresponds to a spatial resolution limit of 32 µm for 6 mm fat and 70 µm for 20 mm fat, resulting from entropy production. The matrix M z was multiplied by the convolution matrix M water of the water signal to get a δ -like pulse for the pure water-signal without fatty tissue in the measurement chamber.

Axial thermal profiling of planar heat sources
To generate an internal planar heat source for an experimental study, a thin plate (foil) was embedded in pure epoxy resin. Laser light is absorbed for thermal excitation and heats the foil (Fig.9 (b)). In the case of an electrically conductive film, electromagnetic induction can also be used to heat the film ( Fig.9 (a)). The epoxy resin is opaque in the spectral sensitivity range of the mid-wave IR camera (3 to 5 µm) . Therefore, the measured infrared radiation only comes from the sample surface (z = 0 mm).
The thermal diffusivity of the epoxy sample was obtained by the Parker method 66   ferent epoxy samples, with the embedded film at a depth of z = 7.2 mm and 11.7 mm, the surface temperature as a function of time has been measured with a quantum IR detector ( Fig.  10(a)). Due to the low temperature increase on the sample surface, the measurement noise is approximately additive white Gaussian noise with a standard deviation of 25 mK 67 . The measurement parameters are listed in Tab. III, with the pulse duration t p , the number of discretizations in time and space N and the temporal and spatial discretization ∆t and ∆z. We invert these experimental temperature signals into virtual wave signals to reconstruct the location, the width and amplitude of the internal heat sources, showing the capability of incorporating prior information in the regularization. The reconstructed depth profile of the initial temperature using the direct regularization method T-SVD 4 is shown in Fig. 10(b). The comparison of the different reconstructed depth profiles clearly shows the entropy production caused by thermal diffusion that is equal to information loss with the result of lower axial resolution for the deeper defect, as listed in Tab. III. The SNR, which was derived from the singular values, is nearly doubled for the defect depth of 11.7 mm, as the duration t p of the heating pulse is increased from 0.5s to 1s. The truncation frequency ω cut and the spatial resolution δ resolution are derived by Eq. (28) and Eq. (29), respectively.
To increase the information content in the regularization process, we introduce prior knowledge in form of positivity, which is reasonable due to the solely non-negative amplitude values of the virtual wave, as well as for the photoacoustic pressure, in the 1D regime 68 . In contrast to the bipolar reconstruction with T-SVD, non-negative values for the reconstructed virtual wave field can be enforced using iterative regularization procedures, such as the Alternating Direction Method of Multipliers (ADMM) 69,70 . Additionally, for a majority of practical problems in photothermal imaging, we can assume sparsity, since in the application of defect detection mostly isolated singular defects occur, e.g. cracks, pores, delaminations or inclusions of other materials, or in the application of parameter estimation, only the positions of the sources, the back wall or individual inner boundary layers should be identified. Consequently, we have only a few point or line scatterers, which leads to a sparse virtual wave signal. Sparsity is introduced in the ADMM by an appropriate formulation of the objective function using 1 -norm minimization. As shown in the reconstructed depth profiles in Fig. 10(b), iterative regularization with prior information leads to an improvement in energy localization and hence to a higher axial resolution.

B. Active thermographic computed tomography (ATCT)
Active thermographic computed tomography (ATCT) is a new hybrid reconstruction technique that utilizes the photothermal (PT) effect 71,72 , or other thermal excitation sources as inductive heating 24,26 , mechanical friction through ultrasound, microwaves, for signal generation. In the case of photothermal computed tomography (PTCT) an optical pulse is used to irradiate an object, such as biological tissue or manufactured materials and products, to generate thermal waves within the object. The thermal waves diffuse to the sample surface, where the temperature change can be measured using infrared cameras. Due to the focal plane array of the infrared camera up to 10 6 signals can be recorded simultaneously, which is equivalent to an extremely large aperture in contrast to photoacoustic transducers. The aim of PTCT is to reconstruct an image that represents a map of the initial temperature distribution within the object from the measured photothermally-induced thermal signals. The initial temperature distribution is proportional to the absorbed optical energy, which can reveal useful information of the internal structure, such as material defects, or inhomogeneous material parameters.
The reconstruction process in ATCT is a two step algorithm: • Computation of the virtual wave field: The original thermographic problem is converted to a photoacoustic imaging task by a pointwise transformation, that means separately for each pixel, of the measured temperature field into a virtual wave field as shown in section (Sec.IV A 2).
• Virtual wave back-propagation: The resulting virtual wave field exhibits wave properties, such as wave- front propagation, reflection and refraction. Frequency domain-synthetic aperture focusing technique (F-SAFT), a well known acoustic reconstruction method 73,74 , can be applied to image the initial temperature distribution. This allows a multidimensional image reconstruction with the advantage of a higher SNR.
The common inverse heat conduction solutions for multidimensional thermographic imaging leads to large scale optimization problems with high computation costs. The virtual wave concept for ATCT splits the severely ill-posed 3D large scale problem into a variety number of small-scale 1D problems, where efficient algorithms, e.g. ADMM, can compute parallel for the solution. The 1D depth-encoded signals calculated for each location could be gathered and then reconstructed to a 3D image using F-SAFT with fast computational architecture, which could be also accelerated by graph- ical processing unit (GPU) programming 75 .
In 3D imaging, we prefer the regularization technique ADMM including the prior information sparsity and positivity because of its improvements in terms of sensitivity and depth resolving capability compared to direct regulizers. The energy input during thermal excitation always results in positive temperature signals. By converting the temperature field into a virtual wave field, positivity cannot be applied directly for 2D or 3D wave propagation. An initially non-negative acoustic signal will take negative values during propagation 68 . The circular and spherical projections,which is in 2D the Abel transformation and in 3D the time integral of the virtual wave 17 , retain positivity of the initial source. For one data point the information gain by a positivity constraint would be only a factor of 2, but for a signal with N data points this factor becomes 2 N , which can be large for higher N.
In the next sections we show two examples of ATCT with limited data (limited view), when the detectors cannot be placed around the complete object. In practical measurements, only a limited space around the specimen is available for photothermal detection. The thermal diffusivity and the resulting virtual speed of sound of the bulk is assumed to be constant and isotropic.

Reconstruction from a single detector plane
The experimental phantom is built up with two steel rods, which are embedded in epoxy resin (Fig. 11). The cylindrical axes of the steel rods are parallel with a distance of 5 mm and inclined to the object surface. The inductive heated steel rods work as volumetric heat sources with a heating time of t p = 2 s similar to the 1D photothermal example (Sec. IV A 2). The temperature field is measured on a single surface at z=0. The measured signal decreases with increasing depth of the rods, due to the homogeneous diffusion in each direction. The spatial-and time discretizations were ∆ x = 168 µm and ∆ t = 40 ms.
Reconstruction results based on this phantom and T-SVD regularization were also published by Burgholzer et al. 26 . In this tutorial, not only T-SVD is used for regularization, but also the iterative regularization scheme ADMM is used, which enables the incorporation of prior information in the form of positivity and sparsity for defect reconstruction. A cross section of the virtual wave field reconstructed by T-SVD and ADMM is illustrated in Fig. 12. The virtual wave fields were normalized to the maximum wave amplitude, that was obtained by ADMM regularization. Each cross section shows the typical scattering hyperbola as consequence of the cylindrical steel rods, but the ADMM regularization leads to a much sharper localization. Based on the virtual wave field, the initial temperature distribution can be calculated applying inverse wave propagation methods, like F-SAFT. Fig. 13 illustrates the resulting 3D initial temperature distribution for the different regularization methods. The amplitude of the reconstructed initial temperature field was binarized with a threshold value to represent the localization of thermal sources at a point within the 3D sample volume. The circles indicate the real position of the steel rods in the epoxy cylinder. Fig. 13 (a) exhibits the T-SVD based reconstruction. It is visible, that the steel rods can be distinguished till z = 4 mm, as expected for direct reconstructions without any prior information. The resolution is in the range of the depth of the sources. Moreover, artefacts are present in regions with low SNR. 13 (b) illustrates the reconstruction employing ADMM. In this process, positivity and sparsity was introduced to improve the regularised solution. Here, the steel rods are separable in the domain 10 < x < 40 mm. Moreover, the artefacts are significantly reduced compared to T-SVD.

3D plane detection
An example of ATCT with temperature data from three orthogonal detection planes is shown in Fig. 14. The phantom is an epoxy cube with an edge length of a = 15 mm containing four spherical steel spheres with a diameter of 2 mm. The epoxy material, the steel beads, the inductive heating device and the position of the sample inside the coil were the same as in Sec. IV B 1. The spatial and temporal discretization of the IR camera were ∆x = ∆y = ∆z = 172 µm and ∆t = 50 ms. The bottom and the sidewalls of the cube were thermally isolated to ensure adiabatic boundary conditions. The timedependent temperature fields were measured on the right side plane (x = a), front plane (y = 0) and on the top plane (z = a) for a time duration of 120 s after the pulse excitation. The pulse time of 7.5 s was chosen to obtain also thermographic signals from the deeper spheres.
Reconstructions based on this phantom and T-SVD regularization were also published in Burgholzer et al. 24 . In this tutorial and in contrast to prior works, we use ADMM regularization and incorporate prior information in the form of positivity and sparsity to improve the quality of the regularized solution. The three corresponding reconstructions using ADMM for regularization and F-SAFT for image reconstruction are shown in Fig. 14. For all three reconstruction planes, limited view artifacts are visible and the inclusions far from the detector planes cannot be reconstructed at all.

V. SUMMARY, CONCLUSIONS AND OUTLOOK
In nature, real processes are always irreversible and show a clear direction of the arrow of time. Some processes, such as the propagation of an acoustic wave, especially for low frequencies in a medium with low viscosity, are in a good approximation reversible. The corresponding wave equation for pressure is symmetric in time; for every pressure wave, which is a solution of this equation, the time-reversed pressure wave is also a solution. Even if some acoustic waves can be described in a good approximation by the wave equation, acoustic attenuation cannot be avoided totally. Therefore, we call this ideal solution a "virtual" wave, in comparison to the real measured signal. In this tutorial it is demonstrated for thermal and acoustic waves, that the measured and the virtual signal are linked by a local relation (Eq. 1).
For non-destructive imaging of subsurface structures, from an information theoretical viewpoint it is advantageous to use "nearly reversible" waves to transport the information from the structures to the sample surface, such as acoustic waves with low attenuation. Due to heat diffusion thermal waves exhibit a higher entropy production, which is shown to be equal to the information loss (section III) and thus results in a lower spatial resolution (Tab. I and II or Fig. 2 and 3). In section III two simple examples for stochastic processes are given, where the close connection between fluctuation and dissipation ("kicked Brownian particle") and diffusion ("ideal gas diffusion between two boxes") can be shown explicitely.
In section IV the information theoretical cut-off criterion using the Chernoff-Stein's Lemma (section III) is applied in frequency domain to photoacoustic and photothermal imaging. The information related criterion gives the same truncation value as when the signal amplitude gets less than the noise level. This is always the case if the mean energy of the wave component and therefore also its information content is proportional to the square of the signal amplitude. In section IV A for the 1D imaging even in fat with rather high acoustic damping the resolution is always much better than for ther- mographic imaging (see also the simulation in Glycerine in section II A). Therefore, for thermographic imaging with this hight entropy production from heat diffusion, it is essential to use additonal information such as positivity and sparsity by implementing iterative algrorhms, e.g. by ADMM to get a better resolution, also in the sub-mm range ( Fig. 10(b)). This has been used in thermographic computed tomography (section IV B), either by 2D reconstruction from single detector plane measurements or on three perpendicular planes for a cubic sample containing steel spheres to be imaged. Using the T-SVD without incorporating additional information for a single 1D reconstruction, the spatial resolution does not get better compared to a direct inversion of the heat diffusion. The advantage of the virtual wave concept in 1 D is, that more advanced regularization techniques, incorporating a priori information such as sparsity or positivity can be utilized in the reconstruction process 71,72 . In 2D and 3D, after the computation of the virtual wave field, well known acoustic reconstruction methods, such as F-SAFT are used (section IV B). Here, the SNR is significantly enhanced as the heat flow in all directions is taken into account for reconstructions. This is similar to averaging of many 1D measurements when imaging a layered structure, but with the essential advantage, that the virtual wave concept can be used for any 2D or 3D structure to be imaged.
For future work, deep learning neural networks are planned to be used as the second step instead of e.g. F-SAFT. For the same date from Fig. 13 this gives another significant enhancement in resolution. In Fig. 13 the threshold-level for the isosurface plot is rather critical to avoid additional artifacts. With a deep neural network trained only by simulated data the reconstructions are very stable and choosing a threshold-level for the isosurface plot is not critical as reconstruction artifacts hardly appear 76 .

ACKNOWLEDGMENTS
The financial support by the Austrian Federal Ministry of Science, Research and Economy and the National Foundation for Research, Technology and Development is gratefully acknowledged. Furthermore, this work has been supported by the project "multimodal and in-situ characterization of inhomogeneous materials" (MiCi), by the the federal government of Upper Austria and the European Regional Development Fund (EFRE) in the framework of the EU-program IWB2020. Financial support was also provided by the Austrian research funding association (FFG) under the scope of the COMET programme within the research project "Photonic Sensing for Smarter Processes (PSSP)" (contract number 871974). This programme is promoted by BMK, BMDW, the federal state of Upper Austria and the federal state of Styria, represented by SFG. Parts of this work have been supported by the Austrian Science Fund (FWF), projects P 30747-N32 and P 33019-N. The data that support the findings of this study are available from the corresponding author upon reasonable request.