Scanning probe image wizard: A toolbox for automated scanning probe microscopy data analysis

We describe SPIW (scanning probe image wizard), a new image processing toolbox for SPM (scanning probe microscope) images. SPIW can be used to automate many aspects of SPM data analysis, even for images with surface contamination and step edges present. Specialised routines are available for images with atomic or molecular resolution to improve image visualisation and generate statistical data on surface structure.


I. INTRODUCTION
Analysis of scanning probe microscope (SPM) data, a standard tool for investigating nanoscale surface structure in real space, can be a very time-consuming task. A huge portion of researcher time is invested in manual data analysis, often in multiple software packages, or in writing custom analysis scripts. Considering, also, the considerable time taken to perform SPM experiments, we believe that automation of both data collection and analysis is of high priority.
While software packages, including Gwyddion, 1 WSXM, 2 and SPIP, 3 are available for manipulating SPM images, such packages require a user to decide how to process and extract statistics from the data. This can consume a large portion of an SPM researcher's time. Time can be saved by batch processing, such as subtracting a fitted plane from all images and exporting to a suitable image file (for example, the Gwyddion libraries can be accessed through gwybatch). While this greatly improves the speed of processing for sets of similar images, it still requires time to manually sort the images and decide on the processing needed. In addition, as the libraries were designed with human interaction in mind, only a limited amount of batch processing is possible.
Often further data analysis is needed to extract the desired information from the image. This ranges from measuring lattice constants or step heights, to more complicated feature location, counting, and measuring. The standard SPM processing software mentioned above has little support for such analysis, instead concentrating on plane subtraction, filtering, and basic roughness statistics. 4 Some support for this analysis is available in software such as ImageJ. 5 ImageJ, however, is designed for conventional optical images and electron microscopy images, and thus many SPM specific analysis functions are not natively supported. Due to this limitation a great deal of SPM data analysis is performed by purpose written scripts, 6,7 or even manual counting and masking in conventional image manipulation software. It is difficult to estimate the researcher-time wasted on avoidable manual processing or on duplicated script functionality, when so many groups write their own analysis code.
a) Electronic mail: ppxjs1@nottingham.ac.uk
We present Scanning Probe Image Wizard (SPIW), 8 a new open source software toolbox built entirely around the concept of automated scanning probe data processing. SPIW is written as a MATLAB toolbox, allowing the user to easily combine standard SPM image processing functions with new feature-locating functions designed specifically for SPM images. For more complicated or specialised analysis it is possible for researchers to combine SPIW functionality with their own code as well as with any of the great range of data processing operations already included in MATLAB.

II. OVERVIEW OF CAPABILITIES
SPIW was originally written as part of a wider project to fully automate scanning probe experiments. This project combines SPIW image analysis with machine learning techniques to successfully automate STM tip conditioning. Initial experiments in ambient conditions with highly oriented pyrolytic graphite (HOPG) samples 9 relied heavily on prior knowledge of the expected images. Moreover, STM images of HOPG often result from the sliding of graphite layers, 10 which causes an averaging effect; step edges, lattice defects, and contamination are therefore rarely seen.
The STM automation project has since moved to ultra high vacuum (UHV) conditions, with a Si(111) 7 × 7 surface. In such conditions, the image analysis must reliably recognise step edges with flat terraces, process images accurately in the presence of contamination, and identify atomic resolution even on areas of the surface with a high defect density. Figure 1 shows a series of images from an automated tip conditioning run on the Si(111) 7 × 7 surface. A video of a full optimisation run has been included in the supplementary material. 11 As no human is present during the automation process, the image analysis must work autonomously with a wide range of images. Successful tip conditioning was achieved with no specific information about the surface reconstruction and without target images. Below we provide a detailed explanation of the most important of SPIW's capabilities.

A. Adaptive masking and flattening
Raw SPM images show the topography traced by the probe. As the height of the features in an image is generally much smaller than the width or length of the image, a very small sample tilt can result in an image where features are very difficult to recognise (Figure 2(a)). Most SPM software avoids this problem by fitting a line to each line of data in the fast scan direction and subtracting it, which we will refer to as line-by-line fitting. Although this allows the user to see structure more clearly, large surface features such as contamination and adsorbates can have a strong effect on only certain lines, causing previously flat areas to become bowed (Figure 2(b)). Final processed images are usually plane fitted, to provide a realistic impression of the scanned surface (Figure 2(c)). Certain scanners (such as tube scanners) can exhibit a bowed motion. To correct for this, one can subtract second, or higher, order polynomial planes. Again, large surface features affect the plane fitting algorithm, in this case causing the surface to remain tilted or (for higher order planes) even to become distorted. As such, the user must mask large features from the surface before fitting. Erickson et al. 12 have produced an automatic method of adaptive thresholding to produce masks and then used second order polynomial planes to flatten images. 13 This method does not translate well to images with atomic scale surface corrugation or molecular networks. SPIW offers similar capabilities, but it also offers more powerful methods in the case of these corrugated surfaces. The method involves locating every atom/molecule on the surface, via the methods described in Sec. II C.
By comparing the median maxima and minima of the surface, the corrugation height can be calculated. High and low areas are defined as any part of the surface which is a user defined fraction of the corrugation height above/below the median maxima/minima height (Figure 3(a)). These pixels are added to a mask (Figures 3(c) and 3(e)), and will not be included in the plane fitting. Before the plane fitting algorithm is executed, the mask is processed to remove any small areas which can arise from artifacts such as feedback instabilities.
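The masking rule above can be illustrated with a minimal Python sketch. This is not SPIW code (SPIW is a MATLAB toolbox); the function name `corrugation_mask`, the parameter name `m`, and the list-of-lists image representation are assumptions made for illustration. It presumes the maxima and minima positions have already been located, and it omits the later removal of small masked areas.

```python
from statistics import median

def corrugation_mask(z, maxima, minima, m=0.5):
    """Mask pixels lying more than m * h_c above the median maximum height
    or below the median minimum height.

    z       : image as a list of rows of heights
    maxima  : list of (row, col) positions of local maxima
    minima  : list of (row, col) positions of local minima
    m       : user defined fraction of the corrugation height
    """
    med_max = median(z[r][c] for r, c in maxima)
    med_min = median(z[r][c] for r, c in minima)
    h_c = med_max - med_min            # corrugation height
    upper = med_max + m * h_c          # anything above this is "high"
    lower = med_min - m * h_c          # anything below this is "low"
    high = [[h > upper for h in row] for row in z]
    low = [[h < lower for h in row] for row in z]
    return h_c, high, low

# Toy image: corrugation between 0 and 1, one protrusion (5) and one hole (-3).
z = [[1, 0, 1, 0],
     [0, 1, 5, 1],
     [1, 0, 1, -3]]
h_c, high, low = corrugation_mask(
    z, maxima=[(0, 0), (0, 2), (1, 1)], minima=[(0, 1), (1, 0), (2, 1)], m=0.5)
```

In the toy image only the protrusion ends up in the high mask and only the hole in the low mask, while the regular corrugation is left unmasked.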
The full flattening procedure is as follows. The image is first flattened with a first order polynomial plane (as the distortions of higher polynomials are undesirable). Next a mask is produced using the method described above, and the surface is again flattened, ignoring any masked pixels. This process can be iterated until the mask does not change within a given tolerance (Figures 3(b)-3(e)).
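The control flow of this iteration can be sketched as follows. The driver `iterate_flattening` and its arguments are hypothetical names, not SPIW functions; the flattening and masking steps are passed in as callables, and the tolerance is taken here to be a number of changed mask pixels, which is one possible reading of "within a given tolerance". The toy callables at the end only exercise the loop and are not realistic image operations.

```python
def iterate_flattening(z, flatten, make_mask, tol=0, max_iter=20):
    """Alternate flattening and masking until the mask stops changing.

    flatten(z, mask) -> flattened image, ignoring masked pixels in the fit
    make_mask(z)     -> boolean mask of high/low areas
    tol              -> number of mask pixels allowed to change (assumed metric)
    """
    rows, cols = len(z), len(z[0])
    mask = [[False] * cols for _ in range(rows)]   # first pass: nothing masked
    for iteration in range(1, max_iter + 1):
        z = flatten(z, mask)
        new_mask = make_mask(z)
        changed = sum(a != b
                      for old_row, new_row in zip(mask, new_mask)
                      for a, b in zip(old_row, new_row))
        mask = new_mask
        if changed <= tol:
            break
    return z, mask, iteration

# Toy stand-ins: "flatten" removes the mean of unmasked pixels,
# "make_mask" masks anything more than 2 units above zero.
def toy_flatten(z, mask):
    vals = [h for row, mrow in zip(z, mask) for h, m in zip(row, mrow) if not m]
    mean = sum(vals) / len(vals)
    return [[h - mean for h in row] for row in z]

def toy_mask(z):
    return [[h > 2 for h in row] for row in z]

flat, final_mask, n_iter = iterate_flattening(
    [[0.0, 0.0, 0.0, 0.0, 10.0]], toy_flatten, toy_mask)
```

In the toy run the tall pixel biases the first pass, is masked, and the second pass then returns the background to zero with an unchanged mask, so the loop stops.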
A final improvement to the flattening can optionally be applied. In this method, we fit a second order polynomial plane through just the surface maxima which are not inside the masked region (Figures 3(f) and 3(g)). This removes the effect of scanner bow, without less densely packed areas of the surface appearing lower and thus distorting the final image.
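The masked plane fit that underlies the flattening steps in this section can be sketched in Python as below. This is an illustrative least-squares fit of a first order plane, not the SPIW implementation; `fit_plane`, `subtract_plane`, and the image layout are assumed names and conventions. A second order fit, or a fit through unmasked maxima only, would follow the same pattern with more terms or fewer points.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_plane(z, mask=None):
    """Least-squares fit of z = a*x + b*y + c over unmasked pixels."""
    n = sx = sy = sz = sxx = sxy = syy = sxz = syz = 0.0
    for y, row in enumerate(z):
        for x, h in enumerate(row):
            if mask is not None and mask[y][x]:
                continue                      # masked pixels do not enter the fit
            n += 1.0
            sx += x; sy += y; sz += h
            sxx += x * x; sxy += x * y; syy += y * y
            sxz += x * h; syz += y * h
    # Normal equations A @ (a, b, c) = rhs, solved with Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    coeffs = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][k] = rhs[r]
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)

def subtract_plane(z, coeffs):
    a, b, c = coeffs
    return [[h - (a * x + b * y + c) for x, h in enumerate(row)]
            for y, row in enumerate(z)]

# Tilted 5x5 surface with one tall contaminant at (x=4, y=1).
z = [[0.5 * x - 0.2 * y + 3.0 for x in range(5)] for y in range(5)]
z[1][4] += 10.0
mask = [[(x, y) == (4, 1) for x in range(5)] for y in range(5)]
a, b, c = fit_plane(z, mask)
flat = subtract_plane(z, (a, b, c))
```

With the contaminant masked, the fit recovers the underlying tilt, so after subtraction the background is flat and only the contaminant remains.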

B. Step edge finding
Other features which commonly appear in high resolution SPM images are step edges (Figure 4). Step edges pose problems both for flattening routines and for generating statistics about images. SPIW detects step edges using Sobel filtering to calculate the square magnitude of the pixel height gradient. 14 These areas are thresholded with respect to the mean square gradient, to create masks of high gradient regions. The resulting masks are thinned to single pixel lines. Further processing consists of hole filling and dilation, followed by re-thinning to a single pixel width. This improves the continuity of the single pixel mask along the step edge.
Once steps have been located, they can be taken into account during flattening by using a specially designed plane-flattening routine. The routine does not fit the whole image; instead it carries out line fits to each line separately, as in line-by-line fitting. If a line is broken by a step, then each line segment is fitted separately. This is repeated for both the fast and slow scan directions. A weighted average of all gradients in each direction is used to produce a first order polynomial plane. As no line segment contains a step, the step does not affect the calculated gradients, leaving correctly flattened images. The advantage of this method over defining a plane from three points in the image, a feature available in most SPM software, is that it can be applied automatically, rather than using manually selected points.
Locating the positions of step edges opens up another opportunity for automated image processing, as the image has now been divided into terraces. SPIW can be set to divide the image into terraces, ordering them by size and removing terraces smaller than a set area. These terraces can then be flattened and processed separately, to give statistics specific to each terrace.
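As an illustration of the first two stages only (Sobel gradient and thresholding against the mean square gradient), a minimal Python sketch is given below. The function names and the `factor` parameter are assumptions, border pixels are simply left at zero, and the thinning, hole filling, and dilation stages are not shown.

```python
def sobel_sq_gradient(z):
    """Squared gradient magnitude from 3x3 Sobel kernels (borders left at 0)."""
    rows, cols = len(z), len(z[0])
    g2 = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = ((z[r - 1][c + 1] + 2 * z[r][c + 1] + z[r + 1][c + 1])
                  - (z[r - 1][c - 1] + 2 * z[r][c - 1] + z[r + 1][c - 1]))
            gy = ((z[r + 1][c - 1] + 2 * z[r + 1][c] + z[r + 1][c + 1])
                  - (z[r - 1][c - 1] + 2 * z[r - 1][c] + z[r - 1][c + 1]))
            g2[r][c] = gx * gx + gy * gy
    return g2

def step_mask(z, factor=2.0):
    """Mask pixels whose squared gradient exceeds factor * mean squared gradient."""
    g2 = sobel_sq_gradient(z)
    values = [v for row in g2 for v in row]
    mean_g2 = sum(values) / len(values)
    return [[v > factor * mean_g2 for v in row] for row in g2]

# Toy image: a single vertical step between columns 3 and 4.
z = [[0.0 if c < 4 else 1.0 for c in range(8)] for r in range(5)]
edges = step_mask(z)
```

For the toy step, the mask is a band two pixels wide along the step, which is the kind of region the thinning stage would then reduce to a single pixel line.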
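A rough Python sketch of the segment-wise gradient estimate follows. The text does not state how the average is weighted, so weighting each segment by its number of pixels is an assumption here, as are the function names and the `min_len` cut-off for very short segments.

```python
def slope(hs):
    """Least-squares slope of heights hs sampled at x = 0, 1, 2, ..."""
    n = len(hs)
    mx = (n - 1) / 2.0
    mh = sum(hs) / n
    den = sum((x - mx) ** 2 for x in range(n))
    return sum((x - mx) * (h - mh) for x, h in enumerate(hs)) / den

def weighted_gradient(lines, step_lines, min_len=3):
    """Average slope over all step-free segments, weighted by segment length."""
    num = den = 0.0
    for line, steps in zip(lines, step_lines):
        start = 0
        for i in range(len(line) + 1):
            if i == len(line) or steps[i]:     # segment ends at a step or line end
                seg = line[start:i]
                if len(seg) >= min_len:
                    num += len(seg) * slope(seg)
                    den += len(seg)
                start = i + 1
    return num / den if den else 0.0

def step_aware_gradients(z, steps):
    """Gradients along the fast (row) and slow (column) scan directions."""
    a = weighted_gradient(z, steps)
    cols = [list(col) for col in zip(*z)]
    step_cols = [list(col) for col in zip(*steps)]
    b = weighted_gradient(cols, step_cols)
    return a, b

# Tilted surface with a 5-unit step at column 4, which is marked as a step pixel.
z = [[0.1 * c + 0.3 * r + (5.0 if c >= 4 else 0.0) for c in range(8)]
     for r in range(6)]
steps = [[c == 4 for c in range(8)] for r in range(6)]
a, b = step_aware_gradients(z, steps)
```

Because no fitted segment crosses the step, the recovered gradients match the underlying tilt rather than being inflated by the step height.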
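One way to divide an image into terraces from a step mask is a flood fill over non-step pixels, sketched below in Python. The function name, the use of 4-connectivity, and the `min_area` parameter are illustrative assumptions rather than a description of the SPIW routine.

```python
from collections import deque

def label_terraces(step_mask, min_area=1):
    """Return terraces as lists of (row, col), largest first.

    Pixels are joined with 4-connectivity so that a single pixel wide
    diagonal step line still separates neighbouring terraces.
    """
    rows, cols = len(step_mask), len(step_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    terraces = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or step_mask[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and not step_mask[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            terraces.append(pixels)
    terraces = [t for t in terraces if len(t) >= min_area]   # drop small terraces
    terraces.sort(key=len, reverse=True)                     # order by size
    return terraces

# Steps along columns 3 and 6 of a 3x8 image give terraces of 9, 6 and 3 pixels.
mask = [[c in (3, 6) for c in range(8)] for r in range(3)]
terraces = label_terraces(mask, min_area=4)
```

Each returned pixel list can then serve as a mask for per-terrace flattening or statistics.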

C. Atom/molecule recognition
Locating the position of surface features such as atoms and molecules is an essential part of many of the SPIW routines. For images with atomic or molecular resolution the process of locating the features is relatively simple. The image is first filtered using a 2D Gaussian kernel. The aim of this filtering is to remove white noise, not to significantly alter the image. As such, the default Gaussian kernel has a standard deviation of just one pixel width. After this, local maxima in the filtered image are used as a first approximation of atom/molecule positions. Local minima can also be located, as they are required for certain functions such as calculating corrugation heights. Both lists of points can be improved by removing any points which fall within a masked region. To accurately resolve atoms/molecules the peak to peak separation should be 5 pixels. For images with a low signal to noise ratio the size of the Gaussian kernel may need to be increased for better results.
Fitting of peaks is not used to improve the accuracy of the atomic positions, as this was found to considerably increase the time to process images for no measurable improvement. Moreover, fitting algorithms were found to regularly fail to provide a good fit when image features overlap, causing a decrease in accuracy.
Further properties of the features can be analysed, such as the shape and the area. This is done by looping through all maxima, and comparing each to its closest local minimum. A local section of the image is then masked at some fraction of the minima-to-maxima height (Figure 5). The user has control of both the height threshold and the local area size, but both are defined relative to image features rather than as fixed values, to improve the applicability of the routine to multiple surfaces. Any masked feature which is not entirely contained in the local image is removed from the statistics. Thus, badly resolved molecules or spuriously defined points do not affect the final statistics. Features too close to the edge of the image are also not included, as they may overlap the edge of the image, which would distort the statistics.
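The smoothing and peak-finding steps can be sketched in Python as follows. This is a simplified illustration with assumed names; the kernel truncation at three standard deviations, the clamped edge handling, and the 8-neighbour definition of a local maximum are choices made for the sketch, not details taken from SPIW.

```python
import math

def gaussian_kernel(sigma=1.0):
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth(z, sigma=1.0):
    """Separable 2D Gaussian filter; edge pixels are repeated at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2

    def filter_rows(img):
        out = []
        for row in img:
            n = len(row)
            out.append([sum(k[j + r] * row[min(max(i + j, 0), n - 1)]
                            for j in range(-r, r + 1)) for i in range(n)])
        return out

    tmp = filter_rows(z)                                    # along rows
    tmp_t = filter_rows([list(col) for col in zip(*tmp)])   # along columns
    return [list(col) for col in zip(*tmp_t)]

def local_maxima(z, mask=None):
    """Interior pixels strictly higher than all 8 neighbours and not masked."""
    peaks = []
    for r in range(1, len(z) - 1):
        for c in range(1, len(z[0]) - 1):
            if mask is not None and mask[r][c]:
                continue
            h = z[r][c]
            if all(h > z[r + dr][c + dc]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)):
                peaks.append((r, c))
    return peaks

# Two point-like features on an 11x11 image, smoothed with the default sigma.
z = [[0.0] * 11 for _ in range(11)]
z[3][3] = 1.0
z[7][7] = 1.0
peaks = local_maxima(smooth(z, sigma=1.0))
```

Local minima can be found in the same way by negating the image before calling `local_maxima`.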

D. Generating image statistics
Section II C touched upon SPIW's ability to generate image statistics. With SPIW it is possible to generate statistics only for sections of the image defined by masks. Thus contamination, surface defects, and steps have minimal effect on the final results.
Lattice periodicity and step heights can also be measured automatically and used to calibrate images. Step edge heights can be measured using a function which fits Gaussian functions to each terrace identified in a histogram of pixel heights. For example, for Figure 4(c), the mean step height detected was 2.67 Å, giving a calibration factor of 1.17, as the step height for Si(111) is 3.135 Å. Lattice periodicity can be measured without knowledge of the expected lattice structure by calculating the distance between each detected atom/molecule and its nearest neighbour. In the case of adatoms of the Si(111) 7 × 7 reconstruction, the closest pairing is 6.71 Å across the divide between the faulted and unfaulted halves of the unit cell. 15 SPIW measures an average closest distance of 6.88 Å for Figure 3(f), and 6.64 Å for Figure 6(c). Not only is the percentage error of these values less than 3%, but the absolute error is also much smaller than the pixel width of 0.625 Å. This method was chosen over using a Fourier transform, as many surface structures produce a number of peaks in k-space, which are best analysed with specific routines for the expected structure. The relative intensities and clarity of Fourier peaks can also be affected significantly by contamination and defects. These problems are removed by directly using the atomic positions in real space.
RMS surface roughness (R_RMS) can be calculated as a simple standard deviation of the surface heights. As an example, the RMS roughnesses of the four detected terraces in Figure 4(b), from left to right, are R_RMS = 53 pm, 29 pm, 39 pm, and 58 pm, after z-calibration. Compare these values to R_RMS = 353 pm for the whole image, or R_RMS = 152 pm for the plane flattened image (Figure 4(a)). Surface corrugation, h_c, can also be measured for atomic resolution images using the method explained in Sec. II A (see also Figure 3(a)).
Another benefit of writing SPIW in MATLAB is that, for more specialised statistics, it is easy to pass data from areas located or masked in SPIW into the wide range of built-in MATLAB functions or home-written scripts. This can dramatically speed up script writing for very specialised image analysis not available "out of the box" in any software package.
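The arithmetic quoted above can be checked with a short Python sketch, which also shows one straightforward way to compute a mean nearest-neighbour distance from a list of detected positions. The function names are illustrative, and the brute-force neighbour search is chosen for clarity, not speed.

```python
import math

def mean_nearest_neighbour(points, pixel_width=1.0):
    """Mean distance from each point to its nearest neighbour, in real units."""
    dists = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(math.hypot(x1 - x2, y1 - y2)
                      for j, (x2, y2) in enumerate(points) if j != i)
        dists.append(nearest * pixel_width)
    return sum(dists) / len(dists)

def calibration_factor(expected, measured):
    return expected / measured

def percent_error(measured, expected):
    return 100.0 * abs(measured - expected) / expected

# Step height calibration: expected 3.135 Angstrom, measured 2.67 Angstrom.
z_factor = calibration_factor(3.135, 2.67)

# Nearest-neighbour spacing: expected 6.71 Angstrom, measured 6.88 and 6.64.
errors = [percent_error(d, 6.71) for d in (6.88, 6.64)]
abs_errors = [abs(d - 6.71) for d in (6.88, 6.64)]

# Toy positions in pixels: nearest-neighbour distances are 3, 4 and 3.
spacing = mean_nearest_neighbour([(0, 0), (3, 4), (3, 0)])
```

The calibration factor rounds to 1.17, the two percentage errors are roughly 2.5% and 1.0%, and the absolute errors (0.17 Å and 0.07 Å) are well below the 0.625 Å pixel width.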
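A minimal Python sketch of a masked roughness calculation is shown below. The function name and the `include` argument are assumptions, and the population standard deviation is used as one reading of "simple standard deviation".

```python
from statistics import pstdev

def rms_roughness(z, include=None):
    """Standard deviation of heights, optionally restricted to included pixels."""
    heights = [h for r, row in enumerate(z) for c, h in enumerate(row)
               if include is None or include[r][c]]
    return pstdev(heights)

# Two rough terraces separated by a 10-unit step between columns 1 and 2.
z = [[1.0, -1.0, 11.0, 9.0],
     [-1.0, 1.0, 9.0, 11.0]]
left = [[c < 2 for c in range(4)] for r in range(2)]

whole_image = rms_roughness(z)           # dominated by the step height
left_terrace = rms_roughness(z, left)    # roughness of one terrace only
```

As with the values quoted for Figure 4, the whole-image figure is dominated by the step, while the per-terrace figure reflects the actual surface roughness.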

E. Computer vision outputs
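Generating image statistics is of little use without being able to verify that the image analysis they rely on is working correctly. With this in mind, SPIW is able to produce computer vision outputs which allow the user to see what features were recognised, the positions of steps or masks, and how the image was flattened. SPIW can easily be set to loop through a large batch of images and save image files with the computer vision outputs along with the statistics. These computer vision outputs can be used to monitor script behaviour to ensure accuracy. All SPM images in this paper are examples of the possible outputs in SPIW.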

A. Step edges with atomic resolution
A particularly difficult test for any step edge locating routine is to find step edges in an image with atomic resolution. The problem arises from edge-finding techniques' use of gradients. Often the gradient from the atomic corrugations is as strong as the gradient at the step edge. SPIW has tools to create images where each pixel is the height of the nearest located atom. This image can then be fed into the step edge locating routine with excellent results (Figure 6).
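The nearest-atom image can be sketched in a few lines of Python. The function name is illustrative and the brute-force search over all atoms is used only for clarity; ties between equidistant atoms are resolved in favour of the atom listed first.

```python
def nearest_atom_image(z, atoms):
    """Replace each pixel by the height of the nearest located atom.

    z     : image as a list of rows of heights
    atoms : list of (row, col) positions of located atoms
    """
    out = []
    for r in range(len(z)):
        out_row = []
        for c in range(len(z[0])):
            ar, ac = min(atoms, key=lambda a: (a[0] - r) ** 2 + (a[1] - c) ** 2)
            out_row.append(z[ar][ac])
        out.append(out_row)
    return out

# One scan line crossing a step: corrugation 1/0 on the left, 5/4 on the right.
z = [[1, 0, 1, 5, 4, 5]]
atoms = [(0, 0), (0, 2), (0, 3), (0, 5)]
terrace_heights = nearest_atom_image(z, atoms)
```

In the toy line the atomic corrugation disappears and only the step between the two terraces remains, which is why a gradient-based edge finder performs better on this image.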

B. Feature locating for molecular networks
This paper has concentrated on UHV STM images of Si(111) 7 × 7 as this is a key prototype. The same routines, however, apply equally well to a number of more complex surfaces. In Figure 7, we have used the same routines as for Figure 5 but on a liquid STM image of a quaterphenyl-tetracarboxylic acid and terphenyl benzene assembly on HOPG. 16 The only change was the size of the kernel used to generate the peak locations: the standard deviation was increased from 1 to 3 pixels due to the more complex shapes of the surface features. The results compare favourably to the results for masking atoms previously presented.

C. Known issues
SPM image processing presents a number of very specific image processing challenges. This is due to the process by which the image is acquired. Image artifacts which can arise from improper imaging parameters, such as feedback gains, can be difficult to separate from real surface features. Changes at the apex of the scanning probe can cause sudden changes in height and/or resolution in the middle of an image. Sample drift or piezoelectric creep from the scanners can cause distortions not only in the x-y plane but also in z. Images with periodic structure can be corrected in the x-y plane, 17 a method not currently implemented in SPIW. However, images dominated by such drift or creep in z are very difficult to process in SPIW as flattening the image is near impossible, and no tools exist to reliably correct such distortions automatically. Line-by-line fitting can give visually pleasing results, yet a combination of inherent distortion and added distortion from the fitting results in images which cannot be responsibly used for most purposes. SPIW can be used to output such images to alert the user to interesting features, yet the raw data will still need manual processing elsewhere.

IV. CONCLUSION
We have presented a number of tools from SPIW that can be used to automatically perform SPM data analysis. The tools are applicable to a wide range of SPM data sets, and can be used in numerous ways, from simply flattening SPM images and saving to image files which can be easily browsed for interesting data, to scripted routines which select only certain images to be processed and analysed statistically. SPIW, like all software projects, is in ongoing development. We hope that by releasing it as an open source project, SPM and image processing experts can also share their acquired knowledge to improve the toolbox for the benefit of the entire SPM community.

FIG. 1. A sequence of images from automated STM tip conditioning on the Si(111) 7 × 7 surface; image widths are 128 nm for (a) and (b), and 32 nm for (c)-(e). (a) First scan shows an unstable tip. (b) Less than 7 min into the run, a flat area is detected, despite the presence of a step in the scan region. The automation algorithm zooms in for finer tuning. Poor quality atomic resolution is detected (c), as are steps in atomic resolution images (d). After less than 80 min good quality imaging is detected, despite surface contamination being present. SPIW algorithms are used to determine the surface structure and image quality after each image is taken.

FIG. 2. (a) Raw STM image of Si(111) 7 × 7 reconstruction. (b) Line-by-line flattening of the same image, resulting in distortion of the surface near contamination. (c) Iterative plane flattening (with masking) of the same image using a SPIW algorithm. (Scale bars 6 nm.)

FIG. 3. (a) 2D schematic of masking procedure. Maxima/minima are marked with red/blue points, their means by solid lines. h_c is the calculated corrugation height, and m is the fraction of h_c above/below which features are masked. (b) STM image of Si(111) 7 × 7 reconstruction flattened using a first order polynomial plane. (c) Resulting mask of high and low areas of (b), using surface corrugations to set threshold height. (d) Result of 5 iterations of flattening non-masked regions, and re-masking. (e) Processed mask of (d). (f) Result of second order polynomial flattening only unmasked peaks in (d). (g) Computer vision image of (f). Cyan points represent atoms, red/blue outlines high/low masked areas. Note that the image is now flat enough that all defects and corner holes are masked. (Scale bars 3 nm.)

FIG. 4. (a) STM image of Si(111) step edges flattened using a first order polynomial plane, with computer vision overlay of located step edges. (b) Image flattened in SPIW with steps taken into account. (c) Histogram of pixel heights for image flattened with the SPIW step method (red), compared to first and second order polynomial plane methods (green and blue, respectively). z-heights not yet calibrated, see Sec. II D. (Scale bars 20 nm.)

FIG. 5. (a) STM image of Si(111) 7 × 7 reconstruction flattened using SPIW mask and flatten routines. (b) Computer vision image of (a) with all well-resolved atoms masked for shape. (c) Zoom of boxed region of (b). (Scale bars 3 nm.)

FIG. 6. (a) STM image of Si(111) step edge flattened using a first order polynomial plane, with computer vision overlay showing located atoms in cyan. (b) Image constructed such that each pixel height is equal to the height of the nearest located atom, with computer vision overlay of located step edge. (c) Image flattened in SPIW with step taken into account. (d) Histogram of pixel heights for image flattened with the SPIW step method (red), compared to first and second order polynomial plane methods (green and blue, respectively). (Scale bars 6 nm.)

FIG. 7. (a) and (b) Liquid STM image of quaterphenyl-tetracarboxylic acid and terphenyl benzene assembly on HOPG. (c) and (d) Computer vision image of (a) and (b), respectively, with all well-resolved molecules masked for shape. (e) and (f) Zoom of boxed region of (c) and (d), respectively. (Scale bars 10 nm.)