Nonlinear Optimal Filter Technique For Analyzing Energy Depositions In TES Sensors Driven Into Saturation

We present a detailed thermal and electrical model of superconducting transition edge sensors (TESs) connected to quasiparticle (qp) traps, such as the W TESs connected to Al qp traps used for CDMS (Cryogenic Dark Matter Search) Ge and Si detectors. We show that this improved model, together with a straightforward time-domain optimal filter, can be used to analyze pulses well into the nonlinear saturation region and reconstruct absorbed energies with optimal energy resolution.


I. MODEL OF PHONON SENSOR
CDMS (Cryogenic Dark Matter Search) relies on superconducting W transition-edge sensors (TESs) connected to Al collector fins to measure energy deposited as hot phonons in Si and Ge substrates by potential dark matter collisions.¹ For a voltage-biased TES, small changes in temperature yield measurable changes in current. Good energy resolution requires small TESs, but the small cross-section for rare particle interactions requires large detectors. To bridge these competing design criteria, CDMS uses an array of superconducting Al fins coupled to 2-µm-wide W-TESs. These are all wired in parallel so that all have the same voltage bias, and those closest to the event location are driven into saturation even for small energy depositions. In these detectors, phonons created by an event propagate to the detector surface, where some break Cooper pairs in the Al, forming quasiparticles (qp's). The qp's diffuse to an Al-W interface, where they are trapped in the lower-gap W and heat the W electrons. However, qp's that are trapped in local gap variations in the Al do not reach the Al-W interface, and their energy is lost. The model described in this paper was used to analyze data from a recent study of the energy collection in CDMS-style W/Al QETs (Quasi-particle Trap Assisted Electrothermal Feedback Transition Edge Sensors) by Yen,² in which collimated 2.62 keV Cl Kα x-rays were used to study the energy response of square W-TESs (250 µm on a side) at the ends of 300-nm-thick Al films of different lengths on Si substrates. These TESs were designed to operate in saturation because the reduced heat capacity gives a better theoretical energy resolution than larger devices designed to operate in their linear regimes (see Eqs. (12) and (14)).
In typical voltage-biased operation, a TES is held in its superconducting transition using negative feedback, whereby Joule heating balances the TES power loss to the much colder substrate (defined by κ and n below):
$$ P_J = \frac{V^2}{R} = \kappa \left( T^n - T_b^n \right) \tag{1} $$
where $T$ is the TES temperature and $T_b$ is the substrate temperature. In the case of W below 100 mK, the limiting energy loss mechanism is electron-phonon coupling, with $n = 5$.³ Negative feedback also speeds up the return of a perturbed TES to its quiescent state. The characteristic recovery time for electrothermal feedback (ETF) in a TES is:
$$ \tau_{\mathrm{ETF}} = \frac{\tau_0}{1 + \alpha/n} \tag{2} $$
where $\tau_0 = C/G$ is the intrinsic thermal time constant and $\alpha \equiv \frac{\partial \log R}{\partial \log T} = \frac{T}{R}\frac{\partial R}{\partial T}$ is the unitless steepness parameter for the slope of the resistive transition. For small energy depositions, and near-constant $T_{\mathrm{TES}}$, the energy deposited in the TES is simply the decrease in Joule heating integrated over the pulse. In practice, such estimates are systematically low for pulses that span a significant portion of the transition region. Additionally, integrating the pulse results in substantially worse energy resolution than any filter technique that takes into account the spectrum of the noise relative to that of the pulse. For this nonlinear and in principle non-stationary problem, template matching to simulated pulses provides the optimal filter.⁴
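The two quantities discussed above, the ETF recovery time and the integral energy estimate, can be sketched in a few lines. This is only an illustration: it assumes the standard small-signal form $\tau_{\mathrm{ETF}} = \tau_0/(1 + \alpha/n)$ for a substrate much colder than the TES, a constant bias voltage, and a quiescent current taken from pre-pulse samples. The function names and the `n_baseline` parameter are hypothetical.

```python
def tau_etf(tau0, alpha, n=5.0):
    """Electrothermal-feedback recovery time, tau0 / (1 + alpha / n).

    Assumes the substrate is much colder than the TES, so the
    bath-temperature correction to the loop gain is neglected.
    n = 5 corresponds to electron-phonon coupling, as quoted for W.
    """
    return tau0 / (1.0 + alpha / n)


def integral_energy(current, v_bias, dt, n_baseline=50):
    """Estimate deposited energy as the integrated drop in Joule heating.

    current    : sampled TES current (A)
    v_bias     : bias voltage (V), assumed constant during the pulse
    dt         : sampling interval (s)
    n_baseline : number of pre-pulse samples used for the quiescent current
    """
    i0 = sum(current[:n_baseline]) / n_baseline
    # Joule power is V * I, so the power removed by ETF is V * (I0 - I).
    return v_bias * dt * sum(i0 - i for i in current)
```

As the text notes, an estimate of this kind is expected to read low once a pulse spans a significant part of the transition, which is the motivation for template matching.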

II. MODEL OF QET DEVICE
The earliest model of Al-W qp devices assumed a uniform sheet of current flowing from the Al to the W-TES, which then warmed as a single lumped element. Results of this model for small energy depositions are shown in Fig. 1(a) along with data from an actual device. In 2005, Pyle⁵ showed that sharp initial spikes observed in real data were a result of the fast but non-instantaneous conduction of heat across the W-TES. In his revised model, the TES was divided into strips along its length, and the thermal conduction between strips was found using the measured TES normal-state resistance and the Wiedemann-Franz law. The revised model was better, but it still did not reproduce pulse decays accurately (see Fig. 1(b)). Further improvements, described here, were made after SEM data² showed that, due to step-coverage issues, the 40-nm-thick W-TESs in many of our devices were connected to their adjoining 300-nm-thick Al films by W filaments alone (cross-sectional area constricted to ∼2.7% for the devices studied here). Such film constrictions increase the local current density and keep that region from becoming superconducting at temperatures well below the intrinsic $T_c$ of the film. This effect creates a small normal region that acts as a heater and allows the TES to lie below the steep part of its transition without going fully superconducting. The resulting reduction in ETF leads to increased pulse decay times. Figure 1(c) shows that this model fits our experimental data well.
Because pulse shapes are linked to device temperature, energy reconstructions are potentially sensitive to the heat capacity of the W-TES. Below, we adopt a BCS-like⁶ model for the heat capacity in the normal state, $C_n$, and in the superconducting state, $C_s$. In the normal state, the electronic specific-heat coefficient is γ = 0.85 meV/mK²/µm³ for single-phase, bulk W.⁶ In order to achieve a linear energy scale after matching data, we use about half of this γ value. The discrepancy is likely due to the polycrystalline properties of our sputtered films.
The constant $a$ that sets the scale for $C_s$ is computed from Ginzburg-Landau theory while holding the wave number constant through the transition and minimizing the free energy,⁷ yielding $C_s(T_c) = 2.43\,C_n(T_c)$. To avoid a discontinuity, which clearly does not appear in the data, we adopt a two-fluid model. Taking the normal fraction $f_n$ to be some function of the resistance, we have:
$$ C = f_n C_n + (1 - f_n)\, C_s $$
In a uniform-current, large-device approximation, e.g., a vortex-induced resistance model,⁸
$$ f_n = R / R_n $$
More complicated forms for $f_n$ can also be used.⁹ We find that energy reconstructions are relatively insensitive to the shape of $f_n$ as long as other device parameters are fit to data after that choice is made.

III. TEMPLATE MATCHING
When matching a signal $S_i$ to a series of energy templates $T_{i,j}$ (in this case representing energies $E_j$), a standard procedure is to minimize $\chi^2$:
$$ \chi_j^2 = \sum_i \frac{(S_i - T_{i,j})^2}{\sigma_{i,j}^2} $$
where $\sigma_{i,j}$ is the expected rms noise at each template point. In a typical TES system, inherent noise sources include Johnson noise ($V_{\mathrm{rms}} = \sqrt{4 k_B T R\,\Delta f}$, where $\Delta f$ is the inverse of twice the sampling interval) and thermal fluctuations in the link to the thermal bath ($P_{\mathrm{rms}} = \sqrt{4 k_B T^2 g\,\Delta f}$, where $g \equiv dP/dT = n\kappa T^{n-1}$). For our voltage-biased TESs the output is measured as a current, so $I = V/R$. We construct two independent noise terms, one for each source, in which $C_e$ is the TES electron heat capacity, $\alpha \equiv \frac{T}{R}\frac{\partial R}{\partial T}$, and $\Delta t$ is the sampling interval. Although many recent efforts have been made to empirically map out $R(T, I)$ for superconducting transitions, we use here a model¹⁰ motivated by Ginzburg-Landau theory for a TES with normal resistance $R_n$, critical temperature $T_c$, and 10-90% transition width $T_w$. The constant $A$ denotes $I_c / T_c^{3/2}$, the strength of the suppression of $T_c$ by non-zero current density in the film.
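The standard $\chi^2$ comparison of one signal against a comb of energy templates can be sketched as below. This is a minimal illustration and not the analysis code used for the paper: the array layout (one row per energy), the function names, and the toy exponential pulse shape are assumptions, and the noise is taken as a known rms value at every template point.

```python
import numpy as np


def chi2_per_template(signal, templates, sigma):
    """Return chi^2 of the signal against each energy template.

    signal    : shape (n_bins,), measured pulse S_i
    templates : shape (n_energies, n_bins), noiseless templates T_{i,j}
                stored with one row per energy E_j
    sigma     : shape (n_energies, n_bins), expected rms noise sigma_{i,j}
    """
    resid = (signal[None, :] - templates) / sigma
    return np.sum(resid**2, axis=1)


def best_template(signal, templates, sigma, energies):
    """Pick the comb energy whose template minimizes chi^2."""
    chi2 = chi2_per_template(signal, templates, sigma)
    return energies[int(np.argmin(chi2))], chi2


# Toy usage with a hypothetical linear pulse shape (no saturation modeled).
t = np.arange(64)
shape = np.exp(-t / 10.0)
energies = np.array([0.0, 1.0, 2.0, 3.0])
templates = energies[:, None] * shape[None, :]
sigma = np.full(templates.shape, 0.1)
e_best, chi2 = best_template(templates[2], templates, sigma, energies)
```

In a saturated TES the templates are not scaled copies of one shape, which is why they would come from a device simulation rather than from the toy scaling used here.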

IV. NON-STATIONARY NOISE
Defining a variance $\sigma_i^2 = \langle (S_i - T_i)^2 \rangle$ is sufficient if the noise varies so rapidly as to be uncorrelated between measurements. But the possibility of large thermal fluctuations that dissipate on a time scale $\tau_{\mathrm{ETF}} \gg \Delta t$ calls for a covariance matrix with its goodness-of-fit metric:
$$ \chi^2 = \sum_{i,i'} (S_i - T_i)\, W_{i,i'}\, (S_{i'} - T_{i'}) $$
where the weighting matrix $W \propto 1/\sigma^2$ is the inverse of the covariance matrix $\Sigma^2$. In the same way that rms noise from different sources is added in quadrature, covariance matrices from different sources, e.g., TES and amplifier, can be added linearly ($\Sigma^2 = \sum_i \Sigma_i^2$). In principle, each element of the simulation could have an independently calculated covariance matrix, but once a simulator with the relevant physics and noise terms is created, it is computationally less costly to make a noiseless template $T_{i,j}$, add a few thousand noisy pulses $S_{i,j,k}$ at each of a comb of energies $E_j$, and calculate the weighting matrices $W$ by Monte Carlo. The $(i, j, k)$ indices represent time bin, input energy, and pulse number, respectively. Since the energy of real pulses will fall between energies on the comb, we minimize $\chi^2$ by parabolic or third-order fitting to $\chi^2(E)$.
Figure 2 shows covariance matrices for small, medium, and large event energy templates. As expected, the 0 eV covariance matrix is a band along the diagonal (i.e. stationary) with a width of ∼100 µs ≈ $\tau_{\mathrm{ETF}}$. As the pulse approaches saturation (middle panel), the diagonal is suppressed, although too little to see in the figure. The diagonal is the quantity that would be used in a traditional $\chi^2$ calculation. Most strikingly, during saturation (right panel) the off-diagonal elements are suppressed. Off the transition curve, power fluctuations have negligible coupling to the current readout, so correlations on the scale of $\tau_{\mathrm{ETF}}$ are essentially absent. This feature shows the extent to which non-stationary noise matters for a saturated TES.
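The Monte Carlo procedure described above (noiseless template, many noisy pulses per comb energy, weighting matrix as the inverse covariance, then a parabolic fit to $\chi^2(E)$) can be sketched as follows. Everything specific here is an assumption made for illustration: the pulse shape is a simple linear exponential rather than a simulated saturating TES pulse, `toy_noise` is a hypothetical stand-in for the simulator's noise terms, and the function names are not from the original analysis.

```python
import numpy as np


def weighting_matrix(noisy_pulses, template):
    """Monte Carlo weighting matrix W (inverse covariance) for one energy.

    noisy_pulses : shape (n_pulses, n_bins), simulated pulses S_{i,j,k}
    template     : shape (n_bins,), noiseless template T_{i,j}
    """
    resid = noisy_pulses - template
    cov = resid.T @ resid / resid.shape[0]   # <(S_i - T_i)(S_i' - T_i')>
    return np.linalg.inv(cov)


def chi2_cov(signal, template, weight):
    """Covariance chi^2 = (S - T)^T W (S - T)."""
    r = signal - template
    return float(r @ weight @ r)


def parabolic_minimum(energies, chi2):
    """Refine the comb minimum with a parabola through three points."""
    j = int(np.argmin(chi2))
    j = min(max(j, 1), len(energies) - 2)    # keep one neighbour on each side
    a, b, _ = np.polyfit(energies[j - 1:j + 2], chi2[j - 1:j + 2], 2)
    return -b / (2.0 * a)


# Toy demonstration; all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n_bins, n_pulses = 16, 2000
shape = np.exp(-np.arange(n_bins) / 4.0)     # hypothetical pulse shape
energies = np.arange(0.0, 6.0)               # energy comb E_j, arbitrary units
templates = energies[:, None] * shape


def toy_noise(n):
    """Correlated noise: each bin shares part of the previous bin's fluctuation."""
    w = rng.normal(size=(n, n_bins + 1))
    return 0.05 * (w[:, 1:] + 0.5 * w[:, :-1])


weights = [weighting_matrix(t + toy_noise(n_pulses), t) for t in templates]

true_energy = 2.3                            # falls between comb points
signal = true_energy * shape + toy_noise(1)[0]
chi2 = np.array([chi2_cov(signal, t, w) for t, w in zip(templates, weights)])
e_hat = parabolic_minimum(energies, chi2)
```

Because each comb energy has its own weighting matrix, the same machinery accommodates noise whose correlations change with pulse height, which is the non-stationary behaviour the covariance matrices of Fig. 2 display. The number of simulated pulses must comfortably exceed the number of time bins for the estimated covariance to be invertible.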

V. ENERGY RESOLUTION
Models of TES energy resolution are well known.¹¹ Here we adopt a small-signal model by Irwin,¹² applied specifically to our two regimes of interest, where $T_0$ and $C_0$ refer to the temperature and heat capacity of the device in its quiescent state. The saturation energy $E_{\mathrm{sat}}$ can be estimated as the maximum energy removed by ETF in one time constant, obtained by combining Equations (1) and (2). Equation (12) is only valid for small pulses under the quasi-equilibrium assumption that the device has a single temperature at quiescence. At some point, the energy resolution is limited by the ability of ETF to cool the TES. For $E > E_{\mathrm{sat}}$, we replace $E_{\mathrm{sat}}$ in Eq. (12) with $E_{\mathrm{absorbed}}$ to obtain the resolution for saturated pulses.
Figure 3 shows the energy resolution achieved by the methods described above when attempting to reconstruct the energy of simulated pulses with the irreducible noise in Equations (7) and (8) but no amplifier or environmental noise. We used a 1-D device model for conduction across the TES. The black line marks the theoretical best possible resolution. For the integral method, the resolution was scaled up as if we had adjusted real pulse integrals for a known energy loss computed from the model. For perfect connection (left panel) the model slightly outperforms the theory in the small-pulse limit. This improvement reflects the fact that, for these parameters, almost no heat reaches the part of the TES farthest from the Al film, effectively reducing the volume of the W.
FIG. 3. FWHM energy resolution of covariance $\chi^2$ (black dots), standard $\chi^2$ (red triangles) and integral method (blue circles) reconstructions of 1024 simulated pulses for: (left) ideal link from Al-W qp trap to W-TES of Fig. 1(b), and (right) weak-link model of Fig. 1(c). The black line shows the calculated theoretical noise limit.
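The estimate of the saturation energy as the energy removed by ETF in one time constant can be worked through explicitly. The derivation below is a sketch that assumes the standard forms $P_J = \kappa (T^n - T_b^n)$ for the power balance and $\tau_{\mathrm{ETF}} = \tau_0/(1 + \alpha/n)$ with $\tau_0 = C/G$ and $G = n\kappa T^{n-1}$; it may differ in detail from the expression used in the original analysis.

```latex
% Assumed inputs (standard small-signal forms, stated as assumptions):
%   P_J = \kappa (T^n - T_b^n), \quad \tau_{ETF} = \tau_0 / (1 + \alpha/n),
%   \tau_0 = C/G, \quad G = n \kappa T^{n-1}.
\begin{align}
E_{\mathrm{sat}} &\approx P_J \, \tau_{\mathrm{ETF}} \\
  &\approx \kappa T_0^{\,n} \cdot \frac{C_0}{n \kappa T_0^{\,n-1}}
     \cdot \frac{1}{1 + \alpha/n}
     \qquad (T_b \ll T_0) \\
  &= \frac{C_0 T_0}{n + \alpha}
  \;\approx\; \frac{C_0 T_0}{\alpha}
     \qquad (\alpha \gg n).
\end{align}
```

Under these assumptions $E_{\mathrm{sat}}$ scales with the heat capacity, so a device with a smaller $C_0$ saturates at a correspondingly lower energy.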
To process data through the optimal filter in a reasonable time, both real and simulated pulses are reduced from 4096 to 256 time bins, and the weighting matrices are 256 bins square. Deviation of the optimal filter performance from theory at high energies is likely due to loss of high-frequency information in the down-selection process, which limits our ability to detect the end of the saturation region. In the weak-link model (Fig. 3, right), the increased current density in the link can drive it normal even in the quiescent state. This ∼0.2 Ω normal section creates a quiescent temperature gradient across the TES. The excess heat is dumped into the TES where Joule heating should be most suppressed after an event, which degrades the ETF. This is especially damaging for small-pulse reconstructions, where peak shape is important. The extra heat also allows a TES to operate well below the linear region of its transition curve without going fully superconducting. The reduced transition steepness, α, reduces the effect of $P_{\mathrm{rms}}$ on the integral method (Eq. (8)), although this effect is partially offset by the increased energy loss, particularly at low energies.

VI. ENERGY SCALE
When an x-ray strikes a metal film, a fraction of the energy gets deposited in the electron system, as modeled in detail by Kozorezov,¹³ with the remainder being lost to phonons. Figure 4 shows the energy distribution of Cl Kα and Kβ x-rays incident directly on the W-TES films of Yen et al.² using integral (left) and optimal filter (right) energy reconstructions. For each method, the correct location of the Kβ peak relative to the center of the Kα peak is marked with a vertical line. It is apparent that the optimal filter correctly separates the two x-ray peaks while the integral method does not. The 57% energy recovery reconstructed by the optimal filter method is consistent with the 51% deposition into our electron system predicted by Kozorezov plus a modest amount of phonon reabsorption from our comparatively thin substrates. On the arbitrary energy scale of the integral reconstruction both peaks appear narrower than when optimally filtered, but when adjusted to their respective local energy scales (by assuming 200 eV separation between Kα and Kβ), the optimal filter wins out by nearly a factor of two.
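The rescaling to a local energy scale mentioned above amounts to converting each method's peak width from its own reconstructed units into eV using the known Kα to Kβ spacing. The sketch below illustrates the arithmetic only. The function names are hypothetical, and the peak positions and widths in the usage lines are invented numbers chosen to show how a peak that looks narrower in raw units can still be broader in eV; they are not values from Fig. 4.

```python
def local_energy_scale(peak_ka, peak_kb, separation_ev=200.0):
    """eV per reconstructed unit, from the K-alpha to K-beta peak spacing."""
    return separation_ev / (peak_kb - peak_ka)


def fwhm_in_ev(fwhm_raw, peak_ka, peak_kb, separation_ev=200.0):
    """Convert a raw peak width to eV using the local energy scale."""
    return fwhm_raw * local_energy_scale(peak_ka, peak_kb, separation_ev)


# Invented example numbers (arbitrary reconstructed units), not measured data.
# Method A compresses the scale: the peaks sit close together.
width_a = fwhm_in_ev(fwhm_raw=20.0, peak_ka=1000.0, peak_kb=1040.0)
# Method B has a larger raw width but a correctly spread scale.
width_b = fwhm_in_ev(fwhm_raw=50.0, peak_ka=1500.0, peak_kb=1700.0)
```

With these made-up inputs, method A's raw width of 20 units corresponds to 100 eV while method B's raw width of 50 units corresponds to 50 eV, which mirrors the qualitative point made in the text.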
In principle, the optimal filtering method can reconstruct 1.5 keV pulses with TES-dominated noise to 1 eV precision. Clearly these experiments have not reached the threshold where dealing with non-stationary TES noise matters. We believe the improved resolution seen with the optimal filter in Figure 4 is due more to proper removal of environmental noise, à la Fixsen,⁴ than to TES physics. For now, the features seen in Figure 2 are wiped out when a covariance matrix due to environmental noise, derived from the TES-normal noise spectrum using Equation (10), is added. Even so, the covariance-based approach to template matching shown here is a powerful tool that has improved our understanding of TESs.

VII. CONCLUSION
We have shown that our TES weak-link model captures the relevant physics governing TES behavior and produces good fits to observed data. Matching templates from this model to real data using a time-domain optimal filter yields significantly improved energy linearity and event energy reconstruction.
This analysis technique allows us to characterize the performance of actual CDMS-style detectors. For the CDMS array of 2-µm-wide TESs in parallel, the connections to the ends of the TESs are typically 40 µm wide. A waterfall constriction of the type that we measure in these test devices would still carry more than half of the critical current of the 2-µm-wide section. The test devices are therefore more than an order of magnitude more sensitive to waterfall defects than actual detectors, allowing us to monitor fabrication integrity and easily catch defect levels that do impact detector performance.