Multi-strata subsurface laser die singulation to enable defect-free ultra-thin stacked memory dies

We report the extension of multi-strata subsurface infrared (1.342 µm) pulsed laser die singulation to the fabrication of defect-free ultra-thin stacked memory dies. We exploit the multi-strata interactions between generated thermal shockwaves and the preceding high dislocation density layers formed to initiate crack fractures that separate the individual dies from within the interior of the die. We show that optimized inter-strata distances between the high dislocation density layers, together with an effective laser energy dose, can be used to compensate for the high backside reflectance (up to ∼82%) of the wafers. This work has successfully demonstrated defect-free eight-die stacks of 25 µm thick mechanically functional and 46 µm thick electrically functional memory dies. © 2015 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution 3.0 Unported License.

Memory packaging is trending toward smaller form factors, increased heterogeneity, and higher performance. One method to address these challenges is to stack ultra-thin, high performance memory dies. However, ultra-thin dies are more sensitive to assembly interactions, which results in more defect modes (e.g., chipping, die sidewall damage, microcracks) and degraded quality characteristics (e.g., die strength, kerf geometry, reliability). Advanced mechanical dicing, laser ablation dicing, or combinations thereof (hybrid or sequential), based on dicing after grinding (DAG) or dicing before grinding (DBG) integration, have been developed to help address the associated challenges. 1 Although they help, the fabrication of defect-free ultra-thin dies remains problematic because of surface perturbations (frontside and/or backside damage to the wafer, depending on the integration approach) caused by mechanical interactions or direct laser ablation of the Si.
Stealth dicing (SD), a subsurface nanosecond pulsed, permeable laser die singulation technology, 2-6 offers a potential solution. The principal single-SD-layer method involves laser-induced "perforation" within the bulk Si, followed by fracture mechanics to physically "cleave out" the individual dies from within. SD avoids the ablation defects of conventional laser dicing, which operates at wavelengths that are highly absorbed by the materials to be diced. Those who have conducted experimental SD work have assessed processing quality based on photodiode characteristics, 3,4,7 demonstrated the importance of focal plane depth, 4 and reported SD-related defects, 7 die strength, 7,8 and stress distribution analysis. 7 Most experiments have been performed to realize singulated 50 µm thick Si die using the SD After Backgrinding (SDAG) approach. 3,4,7 Attempts have also been made to apply SD to 100 µm thick through-silicon via (TSV) wafers, 9 to subsurface machining of transparent materials, 10 and to MEMS fabrication. 11 However, little work has been reported on enabling SD on high backside reflectance wafers, where the amount of SD energy coupling into the wafer is extremely limited. In this letter, we report and demonstrate a multi-strata SD process on high backside reflectance (up to ∼82%) wafers for the fabrication of defect-free eight-die stacks of 25 µm thick and 46 µm thick NAND memory dies. This is achieved by exploiting the multi-strata interactions between generated thermal shockwaves and the preceding multi-SD layers by optimizing the inter-strata distances, the effective SD laser dose, and the number of SD layers. Fig. 1 shows the partial-stealth dicing before grinding (p-SDBG) integration flow newly created as a result of this work, spanning from frontside taping of the wafer to the final framed, diced, ultra-thin 300 mm diameter wafer staged inside a specialty carrier.
The three key processing modules in the p-SDBG integration flow are the SD module, the integrated wafer backgrinding module, and the die separation (DDS) module. Fig. 1 also illustrates a representative multi-strata SD process. The SD layer focal z-height, Z_SDi, measures the vertical distance from the frontside of the wafer (facing down) to the z-focal plane of the SD laser, which is incident on the backside of the wafer. The SD layer height, T_SDi, measures the vertical height of the SD-induced "dislocation belt" layer formed as the SD laser scans across the wafer.
The SD laser source is a 90 kHz pulsed laser operating at a near-infrared wavelength of 1.342 µm. Multi-strata SD layers within the wafer are defined by translating the chuck table relative to the position of the laser, with scanning speeds ranging from 50 to 900 mm/s and z-height focal points positioned from 25 to 200 µm, as measured from the wafer frontside surface. An integrated measurement laser operates at a near-infrared wavelength of 830±20 nm and is primarily used to detect the backside surface of the wafer in situ during dicing, in order to account for undesired wafer warpage effects on the z-height focal point positioning of SD scanning. A precise z-direction spatial offset between the measurement laser's focal spot and the SD laser's focal spot, coupled with a calibrated displacement versus measured photodiode voltage curve (generated by the reflection of the measurement laser beam from the wafer backside), not only allows wafer warpage compensation along the SD scanning line but also facilitates the "deep trace" capability of this integrated tool, i.e., the ability to form SD layers deep within the wafer, beyond 500 µm, by ensuring that the operating regime for the warpage compensation stays linear. Wafers used for the SD experiments were not background beforehand and thus retain their original full thickness (775 µm), with dicing tape laminated on their frontside. The SD laser is incident on the wafer backside to avoid the metallized frontside test element group (TEG) structures along the dicing streets, which would block laser radiation. As a result, the SD laser is subject to the challenges associated with high backside reflectance R of the wafers. Three types of 300 mm Si substrates are used for this work: patterned 2-D NAND memory wafers with measured absolute R of 13.4% (wafer A), 65% (wafer B), and 82.3% (wafer C) at the laser's operating wavelength.
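To give a feel for how strongly backside reflectance limits the dose entering the wafer, the following is a minimal Python sketch. It assumes, as a simplification not stated in the text, that every photon that is not reflected at the backside is coupled into the Si (no surface absorption, scatter, or multiple reflections). The function names are illustrative only.

```python
# Illustrative estimate only: assumes coupled fraction = 1 - R, ignoring
# backside absorption, scatter, and multiple reflections (an assumption).
BACKSIDE_REFLECTANCE = {"A": 0.134, "B": 0.65, "C": 0.823}  # R values reported in the text


def coupled_fraction(reflectance: float) -> float:
    """Return the fraction (1 - R) of incident energy that is not reflected."""
    if not 0.0 <= reflectance <= 1.0:
        raise ValueError("reflectance must lie in [0, 1]")
    return 1.0 - reflectance


def relative_coupling(wafer: str, reference: str = "A") -> float:
    """Coupled fraction of `wafer` divided by that of the `reference` wafer."""
    return coupled_fraction(BACKSIDE_REFLECTANCE[wafer]) / coupled_fraction(
        BACKSIDE_REFLECTANCE[reference]
    )


for name, r in BACKSIDE_REFLECTANCE.items():
    print(f"wafer {name}: R = {r:.1%}, coupled fraction = {coupled_fraction(r):.3f}")
```

Under this simplified assumption, wafer C couples roughly one fifth of the energy that wafer A does at the same incident dose, which is in line with the need for compensation described in this letter.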
The p-SDBG approach is used to fabricate the ultra-thin memory dies for subsequent die stacking and wirebonding.
When the tightly-focused nanosecond SD laser pulse permeates through the Si wafer (without ablating the backside surface) and exceeds a peak power density (typically more than 100 MW/cm²) during the condensing process, a highly nonlinear absorption effect occurs at the focal point due to the interactions between the Si medium and the laser field. 2,4 A localized temperature field larger than 1000 K within the vicinity of the focal spot is established within nanoseconds. As a result, at the focal point vicinity, a void ∼1-3 µm in size is formed due to the melting and vaporization of Si. Thereafter, a high dislocation density is generated due to the thermal shock wave produced upwards from the focal point because the absorption coefficient increases non-linearly with increasing temperature. 12 As the SD laser scans in the horizontal direction, a dislocation "belt" layer known as the SD layer is formed. Fig. 2 plots the mean SD layer height T_SD1 as a function of laser scanning speed, v (50 to 900 mm/s), for different R, with selected insets illustrating microstructural and dimensional transitions. It can be seen that as v increases from 50 mm/s to 900 mm/s, T_SD1 decreases non-linearly across all wafer technologies. In addition, it is found that at significantly lower effective energy doses (achieved with higher v and higher R), the SD layer becomes less dense with dislocation damage. For example, at a laser average power of 2.0 W (PLE = 22.2 µJ), a sub-optimal "fishbone" SD layer microstructure arises at v = 900 mm/s. When comparing across wafer technologies, C wafers exhibit a globally lower T_SD1 than A/B wafers because of a higher R that limits the effective dose entering the Si wafer from its backside. For the A/B wafers, despite scanning at 900 mm/s with a lower laser average power of 1.7 W (PLE = 18.8 µJ), a clear transition to the "fishbone" microstructure is not immediately obvious.
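The PLE values quoted alongside each average power are consistent with dividing the average power by the 90 kHz repetition rate. The short sketch below makes that arithmetic explicit, on the assumption that PLE denotes the energy per pulse. The function name is illustrative.

```python
REP_RATE_HZ = 90e3  # laser repetition rate reported in the text


def pulse_energy_uj(avg_power_w: float, rep_rate_hz: float = REP_RATE_HZ) -> float:
    """Energy per pulse in microjoules, E = P_avg / f (assumed meaning of PLE)."""
    if rep_rate_hz <= 0:
        raise ValueError("repetition rate must be positive")
    return avg_power_w / rep_rate_hz * 1e6


for power_w in (1.0, 1.7, 2.0, 2.2):
    print(f"{power_w:.1f} W -> {pulse_energy_uj(power_w):.2f} µJ per pulse")
```

For 1.7 W this gives about 18.89 µJ, which matches the quoted 18.8 µJ if that figure was truncated rather than rounded.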
As a result, for lower R wafers, the optimal v can be set at higher values, i.e., 700 mm/s for A/B instead of 500 mm/s for C, thereby improving the SD throughput time. These optimal speeds for a given set of conditions are extracted not only qualitatively from microstructural observations, but also from the non-linear dependency plotted in Fig. 2, where T_SD1 starts to plateau beyond a certain scan speed. The plateauing of T_SD1 can be explained by the fact that, as the scan speed increases, the irradiation pulses separate further and further from one another until individual pulses no longer overlap. Beyond that point, T_SD1 remains similar because the effective dose becomes constant. One can expect the "fishbone" structure to emerge when the vertical microcracks stabilize in size while T_SD1 decreases and plateaus. Fig. 3 plots T_SD1 as a function of laser average power (1.0 W to 2.2 W, i.e., PLE from 11.1 µJ to 24.4 µJ) for wafers with different R. For the C wafer (R = 82%), two passes of SD processing were necessary in order to facilitate manual separation using the scribe and break technique for cross-sectional inspection. The results in Fig. 3 show qualitative and quantitative evidence that as PLE increases, T_SD1 increases non-linearly. At lower effective energy doses (achieved with lower PLE and higher R), the SD layer becomes less dense with dislocation damage and vertical microcracks become more prominent. For example, at a laser average power of 1.0 W (PLE = 11.1 µJ), the suboptimal "fishbone" SD layer arises at v = 500 mm/s. Similar to the results in Fig. 2, when comparing across wafers, C wafers have a generally lower T_SD1 than A/B wafers because of their higher R. For the A/B wafers, despite a low laser average power of 1.0 W (PLE = 11.1 µJ) and a higher scan speed of 700 mm/s, there is no obvious transition to the "fishbone" microstructure.
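The pulse-overlap argument for the plateau can be illustrated by computing the spacing between successive pulses along the scan line, which is the scan speed divided by the repetition rate. The sketch below does this in Python. The effective interaction diameter used in the overlap check is a hypothetical parameter chosen for illustration, since the text does not report one.

```python
REP_RATE_HZ = 90e3  # laser repetition rate reported in the text


def pulse_pitch_um(scan_speed_mm_s: float, rep_rate_hz: float = REP_RATE_HZ) -> float:
    """Centre-to-centre spacing of successive pulses along the scan line, in µm."""
    return scan_speed_mm_s * 1e3 / rep_rate_hz


def pulses_overlap(scan_speed_mm_s: float, interaction_diameter_um: float) -> bool:
    """True if neighbouring pulses overlap for the given interaction diameter.

    `interaction_diameter_um` is a hypothetical effective diameter of the
    region modified by one pulse; it is not a value taken from the text.
    """
    return pulse_pitch_um(scan_speed_mm_s) < interaction_diameter_um


ASSUMED_DIAMETER_UM = 6.0  # illustrative assumption only
for v in (50, 500, 700, 900):
    pitch = pulse_pitch_um(v)
    print(f"v = {v:3d} mm/s: pitch = {pitch:5.2f} µm, "
          f"overlap = {pulses_overlap(v, ASSUMED_DIAMETER_UM)}")
```

With a 90 kHz source the pitch grows from about 0.56 µm at 50 mm/s to 10 µm at 900 mm/s, so the accumulated dose per unit length falls steeply at first and then stops changing once neighbouring pulses no longer overlap.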
Therefore, for lower R values, the optimal laser average power (and hence PLE) can be set at a lower value, i.e., 1.7 W for A/B wafers instead of 2.0 W for C, thereby improving laser lifetime (cost of ownership) for SD processing. In addition to qualitative observations, the optimal PLE for a given SD condition can also be validated from the non-linear plot shown in Fig. 3. From Fig. 3, it can be seen that T_SD1 increases as PLE increases but begins to plateau beyond a certain point, thus resembling a sigmoidal curve (this is more apparent for A/B wafers given the PLE sweeping range). The decreasing sensitivity of T_SD1 to PLE as the latter increases reinforces the need to fully comprehend the minimal "safety" T_SD1 (or T_SDi if using multi-passes) required to initiate crack fracture without unnecessarily high PLE. It is clear from Figs. 2 and 3 that T_SD1 can be well controlled by using different PLEs in combination with different scanning speeds, with optimal conditions usually close to the rising edges shown in Fig. 3. Fig. 4 shows cross-sectional optical micrographs of the developed three-strata SD process for C wafers (chosen for their highest R, so that the process can also encompass A/B wafers, i.e., those with lower R values) before backgrinding. The inset of Fig. 4 shows a magnified optical image of the boxed region demonstrating well-controlled definition of the three SD layers, SD1-SD3, with no undesired defects (e.g., frontside ablation, interference, cleavage defects). Additionally, a total of 22 runs with two SD-processed C-type wafers per run over a period of ∼2 weeks were performed to characterize the run-to-run (RtR) and within-wafer (WIW) variation of the developed three-strata SD process. It was found that all three SD layer heights have a well-controlled grand mean of ∼19-20 µm with a RtR mean variability (one-sigma) of ∼1.3-1.4 µm. As for the WIW variation, all three SD layer heights have a grand mean of ∼1.4-2.3 µm with a variability (one-sigma) of ∼0.6-0.4 µm.
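The text does not state exactly how the run-to-run and within-wafer statistics were computed. One plausible reading, sketched below in Python, is that each run yields a mean and a within-wafer spread, and that the grand mean and one-sigma are then taken across runs. The function name and the data are synthetic placeholders for illustration, not measurements from this work.

```python
from statistics import mean, stdev


def summarize_runs(runs: list[list[float]]) -> dict[str, float]:
    """Summarize per-run measurements of one SD layer (values in µm).

    Assumed definitions (not confirmed by the text):
      grand_mean: mean of the per-run means
      rtr_sigma:  sample standard deviation of the per-run means
      wiw_mean:   mean of the per-run within-wafer standard deviations
      wiw_sigma:  sample standard deviation of those within-wafer values
    """
    run_means = [mean(run) for run in runs]
    run_wiw = [stdev(run) for run in runs]
    return {
        "grand_mean": mean(run_means),
        "rtr_sigma": stdev(run_means),
        "wiw_mean": mean(run_wiw),
        "wiw_sigma": stdev(run_wiw),
    }


# Synthetic example data, three runs with three sites each (not from the paper).
synthetic_runs = [
    [18.0, 20.0, 22.0],
    [17.0, 19.0, 21.0],
    [19.0, 21.0, 23.0],
]
summary = summarize_runs(synthetic_runs)
for key, value in summary.items():
    print(f"{key}: {value:.2f} µm")
```

If the authors instead used a range or a pooled standard deviation for the within-wafer term, only the `run_wiw` line would need to change.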
At the same time, it was found that the SD layer focal plane z-heights for the SD1, SD2, and SD3 layers have respective well-controlled grand means of 69 µm, 115 µm, and 158 µm. Their RtR mean variability (one-sigma) ranges between 3.2 and 4.0 µm. As for the WIW variation, all three SD layer heights have a grand mean of ∼1.9-3.4 µm with a variability (one-sigma) of ∼0.6-1.3 µm. These values demonstrate the potential for SD technology and p-SDBG to enable controlled fabrication of thinned, singulated die measuring 25 µm and below in thickness, because the size and the positioning of the SD "damaged" layers within Si have very low variation, much lower than that of backgrinding.
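The reported focal z-heights and layer heights can be combined to sketch where the damaged bands sit relative to the final die thickness. The Python example below is a rough geometric check that assumes each SD layer extends from its focal plane toward the wafer backside by a nominal 20 µm (the upper end of the reported ∼19-20 µm grand mean), and that the gap between strata is measured from the top of one band to the focal plane of the next. Both are assumptions, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDLayer:
    name: str
    focal_z_um: float  # focal z-height measured from the wafer frontside
    height_um: float   # SD layer height (nominal value, assumed)

    @property
    def top_um(self) -> float:
        """Upper edge of the damaged band, assuming it grows toward the backside."""
        return self.focal_z_um + self.height_um


# Focal z-heights are the reported grand means; 20 µm height is a nominal assumption.
LAYERS = [
    SDLayer("SD1", 69.0, 20.0),
    SDLayer("SD2", 115.0, 20.0),
    SDLayer("SD3", 158.0, 20.0),
]


def layers_left_in_die(final_thickness_um: float, layers=LAYERS) -> list[str]:
    """Names of layers whose damaged band starts inside the final die thickness."""
    return [layer.name for layer in layers if layer.focal_z_um < final_thickness_um]


def gap_to_lowest_layer_um(final_thickness_um: float, layers=LAYERS) -> float:
    """Distance from the ground die surface to the lowest focal plane."""
    return min(layer.focal_z_um for layer in layers) - final_thickness_um


def inter_strata_gaps_um(layers=LAYERS) -> list[float]:
    """Gaps between the top of each band and the focal plane of the next one."""
    ordered = sorted(layers, key=lambda layer: layer.focal_z_um)
    return [upper.focal_z_um - lower.top_um for lower, upper in zip(ordered, ordered[1:])]


for thickness in (25.0, 46.0):
    print(thickness, layers_left_in_die(thickness), gap_to_lowest_layer_um(thickness))
print(inter_strata_gaps_um())
```

Under these assumptions none of the three damaged bands remains in a 25 µm or 46 µm die after backgrinding, which would be consistent with the "partial" nature of p-SDBG, where only the crack that has propagated to the frontside is left in the finished die.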
At the same time, Fig. 5 shows top view optical micrographs of the frontside surface of SD-singulated dies with well-defined, high quality SD kerfs (identified by the arrows) initiated along the dicing streets regardless of the presence of complex TEG structures. There are no signs of kerf geometric defects, such as issues with kerf width, kerf loss, kerf perpendicularity, or kerf straightness, when using the developed three-strata SD process. The kerf measures about 2 µm wide on average, with near-zero kerf loss observed, as expected. Post-SD, the static loading from backgrinding will "finish the job" of full kerf separation of individual dies, which was originally started by frontside-directed crack fractures initiated from "within" due to the multi-strata interactions between generated thermal shockwaves and the preceding SD layers formed as the laser scans horizontally.
FIG. 5. Top view optical micrographs of frontside surface of singulated dies on a full 300 mm memory wafer with well-defined SD kerfs (identified by the arrows) initiated along the dicing streets, even across complex TEG structures. There are no signs of kerf geometric defects such as kerf width, kerf loss, kerf perpendicularity, and kerf straightness issues.
Finally, Fig. 6 shows SEM micrographs of SDBG-integrated defect-free memory dies progressively stacked with single-sided bonding pads using (a) two four-die blocks and (b) one eight-die block. The respective insets show magnified SEM images to illustrate the integrity of the defect-free sidewalls/edges and the flush profile across the 25 and 46 µm thick die to the 10 µm thick DAF, both of which are characteristics enabled by an optimal SD process and SDBG integration flow.
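As a quick sanity check on what the eight-die stacks imply for package height, the sketch below adds up die and DAF thicknesses in Python. It assumes one 10 µm DAF layer per die and ignores substrate, wire loops, and any compression of the DAF during die attach. The function name is illustrative.

```python
DAF_THICKNESS_UM = 10.0  # DAF thickness reported in the text


def stack_height_um(die_thickness_um: float, n_dies: int = 8,
                    daf_um: float = DAF_THICKNESS_UM) -> float:
    """Nominal stack height assuming one DAF layer under every die (an assumption)."""
    if n_dies < 1:
        raise ValueError("n_dies must be at least 1")
    return n_dies * (die_thickness_um + daf_um)


for die_um in (25.0, 46.0):
    print(f"8 x {die_um:.0f} µm dies: {stack_height_um(die_um):.0f} µm nominal stack height")
```

On this basis an eight-die stack of 25 µm dies is nominally 280 µm tall, compared with 448 µm for 46 µm dies.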