Supercell-core software: a useful tool to generate an optimal supercell for vertically stacked nanomaterials

Vertically oriented materials, such as van der Waals heterostructures, that have novel hybrid properties are crucial for fundamental scientific research and the design of new nano-devices. Currently, most available theoretical methods require applying a supercell approach with periodic boundary conditions to explore the electronic properties of such nanomaterials. Herein, we present supercell-core software, which provides a way to determine the supercell of non-commensurate lattices, in particular, van der Waals heterostructures. Although this approach is very common, most of the reported work still uses supercells that are constructed 'by hand' and on a temporary basis. The developed software is designed to facilitate finding and constructing optimised supercells (i.e., with small size and minimal strain accumulation in adjacent layers) of vertically stacked lattices.


I. INTRODUCTION
When a 2D crystal is placed on top of another layer (substrate) it can either adjust its position (e.g., by rotating to follow the periodic potential of the substrate), resulting in a commensurate state, or the layers can exhibit a small lattice mismatch, in which case, the interlayer binding energy (via weak van der Waals forces) can compensate for the increase in the elastic energy. Both of these occurrences may lead to experimentally observed superperiodic structures, which are commonly known as moiré patterns. Such moiré superlattices have been widely studied in the context of many different systems, including bilayer graphene 1-4 , trilayer graphene 5 , and bilayer black phosphorus 6 , as well as for many heterostructures, such as graphene on hexagonal boron nitride (hBN) 7,8 , or bilayer transition metal dichalcogenides (TMDC) 9,10 .
Hybrid structures consisting of different types of layers connected via weak van der Waals forces represent a new class of hybrid crystals known as van der Waals heterostructures 11 . These structures are of broad interest to many researchers throughout the world, due to the novel hybrid properties arising in these materials, which are distinct from their individual layer components [12][13][14] . Many of these properties can be precisely controlled by rotating the two stacked atomic layers with respect to one another 4,6,13 , highlighting the fact that manipulating this unique "twist angle" degree of freedom can allow for control of the nanoscale properties of such nanomaterials 13,15 .
The strain itself is the subject of a new research field in solid state physics, called straintronics 16 , in which strain engineering methods are used to develop next-generation devices for information, sensor, and energy-saving technologies. The strain-induced physical effects in nanolayers or heterostructures, such as changes in the band structure, or electronic, optical, or magnetic properties [17][18][19] , are fundamentally important. Understanding these relationships is a prerequisite for developing new technologies using such nanomaterials. The strain distribution can be investigated experimentally using a variety of experimental methods 8,20 , such as atomic force a) Electronic mail: Magdalena.Birowska@fuw.edu.pl microscopy (AFM), scanning tunnelling microscopy (STM), and/or Raman spectroscopy.
Proper modelling is required in order to better understand the phenomena and underlying physics behind the structureproperties relationships. The modelling of this type of nanomaterials mostly focuses on electronic band structure calculations, and one of the most accurate ab initio methods is based on density functional theory (DFT). Widely-used software packages, such as VASP 21,22 , Quantum Espresso 23 , and SIESTA 24 , are limited in terms of their supercell approaches, because only a few hundred atoms can be considered. However, stacking adjacent layers with different lattice parameters or different lateral crystal symmetries may allow appropriate modelling of superperiodicty in very large lateral cells, or detection of aperiodic structures. Incommensurate lattices are outside the scope of the present paper. One approach directed towards describing aperiodic layered structures can be found in reference 5 . In the current work, we discuss a developed approach based on commensurate lattices. We propose a supercell software capable of finding the optimal supercell, i.e., those with small size and low strain distribution. Specifically, we have developed a software package that searches through all possible superperiodicities arising from multiples of primary cells for a given rotation angle between the top and bottom 2D lattices. The software allows determination of the optimal "magic angles" between the adjacent vertically-stacked layers and the resulting moiré patterns.
There are few available builders that can handle the construction of supercells. To the best of our knowledge, those which are freely available, such as VESTA 25 , only enable ad hoc construction by hand, without finding the optimal supercell. The other capable programs, such as Quantum Wise 26 , which has an appropriately implemented builder, must be purchased. Herein, we present the package, supercell-core, which is free of charge and allows the user to find the optimal supercell for a given twist angle and "n" number of layers constituting the van der Waals structures. To the best of our knowledge, none of the above mentioned codes, calculate the strain distribution of adjacent layers for a particular angle between the layers.
In this article, we first present the mathematical background for the software, along with explanations of the applied algorithms and discussion of relevant technical details. Then, the code is briefly described, focusing on its functionalities. Finally, we present the practical applications of the software and provide several examples.

A. Mathematical description
Our methodology is based on a commensurability condition, which requires long-range order in the sets of lattice planes at the interfaces of n vertically stacked layers.
In order to formulate this condition mathematically, we can consider two planar lattices: A (bottom layer; substrate) and B (top layer), where lattice B is placed on top of lattice A. We denote the unit vectors of both lattices as a i and b i , respectively, where i ∈ {1, 2}. The task is to find two new vectors, c i , that are commensurate to both a i and b i 27 , such that, (1) where R θ is the 2D rotation matrix based on the rotation angle between layers, θ (Fig. 1a). Depending on the specific case, θ might be fixed, or it might be a free parameter. The angle, θ , is hereafter referred to as the twist angle.
In principle, there may be no set of integers, m i j , n i j , which satisfies both of the above equations, even if θ is treated as a free parameter (Fig. 1b). Thus, the commensurability condition is enforced by applying strain to the top layer. This corresponds to a linear transformation of the B lattice's elementary cell. Note that layer A (the substrate) is always unstrained. The new task is to find the linear transformation that introduces minimal strain into the system.
For now, let us consider a fixed value of θ . We denote the modified vectors of the rotated lattice B asb i . Then, we can define its strain tensor, ε, as a 2D matrix: The m i j coefficients can easily be calculated given the n i j and b j values. We must find a matrix,B, for every set of four numbers, n i j , such thatB(b i ) =b i . Then, from all sets of n i j , we choose the one with minimal strain. To quantify the strain, we define the norm L (1,1) (ε), which is equal to ∑ i j |ε i j |. Note that, to find a nearly unstrained supercell for a given value of θ , the software searches all possible values of n i j (up to some maximum value, defined by the max_el parameter in the code) in order to be certain that the optimal supercell is determined. However, this approach (hereafter referred to as the Direct algorithm), has a time complexity of O(N 4 ), where N is the upper limit of n i j that is considered. However, the problem of finding a nearly unstrained supercell can be greatly simplified. It is important to note that, when the strain is zero, then b i =b i (see Eq. 2). For every pair of n i1 , n i2 , it is possible to write independent linear equations for m i1 , m i2 : The "quality" of each n i j pair can be assessed by taking a measure of how far the corresponding m i j value is from an in- known as the Fast algorithm, effectively describes O(N 2 ) in time complexity, and gives the same results as a direct search for cases where an optimal supercell's strain is zero or near-zero. It can be used to find the moiré patterns or nearly unstrained unit cells. However, the Fast algorithm does not guarantee an optimal solution for cases where an optimised supercell is significantly strained. Our analysis indicates that both algorithms give the same results when the norm L (1,1) (ε) does not exceed a value of 0.015 (see Fig. 2). Thus, it guarantees that the strain calculated with the Fast algorithm is not overestimated. These problems trivially extend to cases where θ is a free parameter, by treating it as an additional degree of freedom when searching the parameter space. When considering multiple layers, B 1 , B 2 , etc., we can exploit the fact that all lattices must ultimately be commensurate, and we do not modify lattice A at all. Effectively, each layer can be fit to the same lattice A supercell. Therefore, we can repeat most of the above steps for each layer independently, and record the calculated "qualities" of every set of m i j values for each layer. To select the final supercell (which is determined by its m i j coefficients), we then use the criterion of the lowest sum of the norm L (1,1) (ε) for all B i layers.

B. Implementation
The code is implemented as a Python package using the Python numerical library, NumPy 28 . Every 2D crystal and every heterostructure is represented by a Python object. Optimisation of the strain for the given values of the twist angle, θ , for each layer is carried out by first preparing a list of all possible combinations of those values. Then, for each combination,

FIG. 2. Comparison of the Fast and Direct algorithms implemented
in the supercell-core software. The norm of the strain tensor is calculated for an optimal supercell for a phosphorene/graphene heterostructure as a function of the twist angle, θ . The red line indicates the value of L (1,1) (ε)=0.015, for which both algorithms give the same results. Note that, for the highly-strained lattices, the discrepancy between the algorithms can be substantial, and in those cases, use of the Direct algorithm is recommended.
an array of all possible vectors of the supercell in the substrate lattice vectors basis is prepared, which generates an N × N × 2 array, where each two-dimensional vector stored in that array is a grid point in the substrate lattice vector basis. The value of N describes the extent to which we check N = 2n + 1 (where n is the highest integer that can be accepted as the n i j coefficient). Certain basic symmetries are utilised to decrease the number of required calculations (decreasing the number of n i j values, e.g., the symmetry with respect to interchanges − → c 1 and − → c 1 , rotation about 180 • ).
In the Fast algorithm ("fast" option), the strain tensor is calculated for all possible c i vectors, and the L (1,1) (ε) is analysed to assess the quality of the specific configuration in a vectorised operation. Ultimately, for a given combination of interlayer angles, the configuration that gives the lowest strain is chosen. In the minimisation process, smaller, less narrow cells are treated preferentially. If the Direct algorithm is applied by the user, the procedure follows the mathematical description of our solution for cases in which the desired strain is near-zero (see Eq. (3)).
For drawings, the code uses the MatPlotLib Python library 28 . An optional dependency of the library is pandas 29 , which is applied if the user wants to log intermediate steps of the strain minimisation procedure.

III. DESCRIPTION OF THE CODE
The software is available in the official Python package index as supercell_core. Assuming that a distribution of Python software is installed, a user can download the package with the command: pip install supercell_core. The recommended way to use the package in a Python script, Python console, or Jupyter notebook is via an import supercell_core as sc statement. The source code is also available from 30 in the form of a Git repository. The repository contains examples files, described in the next section in the supercell_core/examples subdirectory, and a README file with usage examples and a link to online documentation. The package documentation is also provided interactively in the Python console via Python's built-in help function.
The software enables the user to find the optimal configuration for an arbitrary number of vertically stacked layers, and for any set of acceptable interlayer angles between them. The parameter, L max (A) = max |m i j |, can be used to control the trade-off between computational cost, the quality of the resulting supercell, and its size (max_el in the code, referrers to the substrate layer denoted as A). Supercell-core accepts definitions of the lattices either performed manually using the lattice function, or read from a VASP POSCAR file with the read_POSCAR command. For convenience, users are encouraged to use supercell-core with a Jupyter notebook or an interactive Python console.
The user can control the positions of the layers in the zdirection by changing the z-components of the unit cell vectors of the layers. Note, that the strain distribution is only calculated in lateral coordinates. This software stacks unit cells of the layers directly on top of each other. For example, if one layer with unit cell vectors (a, b, 0), (c, d, 0), (0, 0, h) contains an atom at position (x, y, z 1 ), and the layer directly above it contains an atom at (x, y, z 2 ), then the distance between these atoms in the calculated heterostructure will be h − z 1 + z 2 .
It is important to note that the inclusion of different stacking configurations in primary cells of the layers can be achieved by translating the atoms via a corresponding vector in one of the cells. The described approach constructs the supercell independent of the stacking configurations of the layers, since it is only based on the lattice vectors. Thus, the atomic positions are correspondingly scaled and translated in a new supercell.
Supercell optimisation and strain calculation methods can be performed on a Heterostructure object. Choices related to the optimisation algorithm, the maximum n i j value, and whether to store intermediate results for each combination of interlayer angles in a log, are conducted by specific arguments in the opt method. All results of these calculations are stored in Result objects. Saving results to XCrysDen XSF and/or VASP POSCAR files is possible using methods for the Lattice object. Every Result object contains a superlattice method, which returns a Lattice corresponding to the heterostructure (containing the correct positions of atoms in all layers of the new supercell). If logging is enabled, the log can be saved to a CSV file. When this option is used, for each twist angle the code lists in the order the following details: twist angles "theta_0" etc. in radian unit (at the end in • ); the sum of norm L (1,1) (ε) "max_strain"; lateral size in Å 2 "supercell_size"; matrices describing the vectors of the supercell on the basis of substrate and Cartesian coordinates denoted as "M", "N" etc.; "supercell_vectors" ( c 1 =(supercell_vectors_11, supercel_vectors_21), c 2 =(supercell_vectors_12, supercel_vectors_22) given in Å); strain tensor ε i j denoted as "strain_tensor_layer" for each layer B i and number of atoms "atom_count". If there are multiple layers, the user obtains a table with a row for every combination of twist angles. The list takes the form of a pandas.DataFrame file, which makes it easy to view, transform, and save the data (e.g., to CSV format).
The package is also able to draw the positions of atoms in the supercell (projected onto an xy plane) using the Matplotlib plotting library. Here we examine the construction of the supercell of bilayer graphene, which is a prototypical example of a bilayer system. For this reason, we use a two-atom unit cell 2 , a 2 ), assumed for both layers, where a = 2.46 Åis a lattice constant of graphene. One of the graphene layers can be rotated with respect to the other by a relative angle, θ (twist angle). Supposing we want to find the twist angles that lead to the appearance of moiré patterns, we can use the Heterostructure.opt method from the code, and set the algorithm parameter to moire. It is also recommended to increase the value of the L max (A) (e.g., max_el to 20 or 50) in order to catch patterns that are only visible at long distance, and to use a small step for the values of θ (e.g., 0.025 • ). Then, the user can apply the log functionality to assess which angles are most promising (for details see section III).
In this example we set max_el to 20 and the range for the twist angle as θ ∈ (0 • , 30 • ), with the step equal to 0.1 • . The optimal supercells were found by applying these parameters, and the maximum strain tensor components of these supercells are presented in Fig. 3. Particular angles exist when the supercells are small (tens to hundreds of atoms) and have low strain values (< 0.03%). Some of the lowest-strain angles are collected in Table I. In the case where two identical layers with a twist angle equal to 0 • are considered, the strain tensor is exactly zero, and the generated supercell is identical to the primary cell. Because of this obvious result, we do not include this case in Table I. Note that the lattice mismatch is commonly calculated as the difference between the lattice constants, even for rotated layers with strain distributions that are different from those determined by this simple approximation. Therefore, supercells resulting from construction of such twisted layers are assumed to exhibit low strain values, although, in fact, the strain components (e.g., shear strains) can be substantial, and can influence the material's electronic properties. The number of atoms in the substrate layer can be calculated using the equation, det(M) × N at , where N at is the number of atoms in primary cell, and M is equal to M = (m i j ). The number of atoms in other layers can be calculated in an analogous manner.
The results (Table I) demonstrate that, for small twist an-TABLE I. Structural information corresponding to bilayer graphene supercells generated by supercell-core software. The results are presented for the optimal supercells (small system size and nearly unstrained layers) and for particular twist angles between the graphene layers. The supercells of the bilayer graphene have been generated with twist angle resolution 0.1 • , and L max (A) ≤ 20 parameters. Note, that the supercell vectors can be calculated from c 1 = m 11 a 1 + m 21 a 2 and c 2 = m 12 a 1 + m 22 a 2 or from 2D rotation matrix and n i j parameters (see Eq. 1). The m i j , n i j , ε i j matrices are taken from log files. Only maximal strain tensor component is presented. gles, the size of the supercell is large and the number of atoms is high, whereas for large twist angles, the situation is reversed. These results confirm a well-known fact observed for twisted bilayer structures where the relevant length scale is on the order of 1/θ 31 . In particular, for small twist angles such as 1.1 • or 2 • , the periodicity becomes large enough that it cannot be handled as lattice inputs for DFT codes. However, other potential approaches, such as tight binding methods (TB) 32 , can be used instead. Moreover, our supercell-core software successfully identifies the angles that have been reported previously based on STM experiments, specifically the 21.8% 1 and 1.1% 13 for bilayer systems. In addition, we present the visualisation of the optimal supercell generated for the twist angle of 6 • as shown in Fig. 4, where the moiré pattern is clearly visible along with various stacking configurations. The different stacking configurations of the bilayer graphene and theirs crucial impact on the electronic properties have been highlighted in many papers so far (see i.e. Ref. 33 ), however, for a twisted layers many stacking configurations are automatically included within one supercell (see e.g. Fig. 4).
B. Trilayer systems: trilayer graphene (TG) and hBN/graphene/phosphorene The natural extension of the bilayer system is to simply add another layer. In principle, our code allows the user to find the optimal supercell for n layers exhibiting different types of symmetry.
Thus, in this example, we consider a trilayer van der Waals heterostructure consisting of hexagonal boron nitride, graphene, and phosphorene. This structure is significantly more complicated than the previous example. Although graphene and hBN share the same hexagonal symmetry and have similar lattice constants (2.46 Åand 2.52Å, respectively), phosphorene has a rectangular lateral cell with lattice constants equal to a = 3.26 Åand b = 4.35 Å(optimised lattice constants obtained in LDA approximation for monolayer taken from 14 , (see Fig. 6). These differences between the layers makes finding an optimal supercell more difficult.
The supercell-core software can be used to find the optimal configuration for this example with any combination of twist angles. The layer strains resulting from the generated supercells are presented in Fig. 5 as a function of two twist angles, θ 1 and θ 2 . These angles (θ 1 , θ 2 ) correspond to the relative rotation of the graphene (layer B 1 ) and phosporene (layer B 2 ) layers with respect to the hBN layer (layer A), respectively. The dark navy blue regions in Fig. 5 indicate the set of angles that correspond to supercells with relatively low strains. Table II provides the details regarding a few generated supercells that contain hundreds of atoms. The results clearly show that the strains in these supercells are relatively high. This is because we constrained the supercell size (L max (A) ≤ 10) so that the number of atoms is less than one TABLE II. Structural information corresponding to the optimal supercells generated by supercell-core software for the trilayer systems: trilayer graphene (TG) and hexagonal boron nitride/graphene/phosphorene (hBN/G/P). The supercells of the trilayer graphene have been generated with twist angle resolution equl to 0.1 • , and L max (A) ≤ 80. Only maximal strain tensor component is presented. The twist angles θ 1 , θ 2 correspond to relative rotation of the graphene and phosphorene layers in respect to the hBN layer, respectively. For the clarity of presentation, only one of the maximal strain components is presented, however, in some of the cases presented here the |ε xy | = |ε yx | or |ε xx | = |ε yy |.

L max (A)
Twist angles thousand, while our system is composed of three incommensurate lattices. This is a fundamental problem of the system, and not a problem of the program. To find better supercells, one can loosen the constraints on the supercell size, although this introduces the risk of finding supercells that are too big for DFT calculations. In order to minimise the strain, it is recommended that the user increases the value of the L max (A) parameter. By increasing this parameter, one directly enlarges the supercell size considered, and thus, minimises the strain (see the last raw results in Table II). Increasing the resolution of the twist angle may also help, in principle, but there is little chance that an optimal supercell would not be found with a resolution on the order of 0.1 • .
In the Table II, we also present selected supercells for trilayer graphene (TG). Note, that whenever one of the twist angles is zero, the results correspond to bilayer system (first raw in Table II for θ 1 = 0 • , θ 2 = 21.8 • ). However, the software found different result than presented for bilayer system (see Table I for θ 1 = 21.8 • ). When supercell-core finds a true moiré pattern, any parallelogram with vertices on the repeated patterns' nodes is a zero-strain supercell. The program should then find the smallest of such supercells. However, the calculated strain for each of those supercell is non-zero because of floating point errors. When choosing the optimal supercell, the program has some fixed tolerancy (epsilon) of the variation in the strain value but sometimes it happens to be too low to find the smallest supercell from a set of equally good su-percells. A potential solution is to re-run the calculations for low-strain supercells with lower value of the L max (A) parameter.
Generally, for cases where n > 3, it is difficult to obtain the commensurability condition for a relatively small system size, especially one with different types of layers that exhibit different symmetries.

V. CONCLUSIONS
In this report, we present the novel supercell-core software which has been developed to facilitate construction and analysis of vertically-aligned 2D layers. In principle, the code is designed to handle structures comprised of "n" different types of layers exhibiting different lateral symmetries. The developed software allows the user to find the optimal supercell, with the smallest number of atoms and lowest strains experienced by adjacent layers. The methodology is based on a commensurability condition, which implies long-range crystalline order in the sets of lattice planes that make up the van der Waals heterostructure. This condition is enforced by applying strain to the top layers of the structures. In the described approach, the bottom layer is always unstrained.
The software works with POSCAR files, and it is therefore compatible with the DFT software, VASP, and with the widely-used visualisation software, VESTA. In addition, there phoshorene layers used for constructing the optimal van der Waals trilayer hBN/G/P supercell and D. an example of an optimal supercell for a twisted trilayer hBN/G/P structure with twist angles equal to θ 1 = 10.9 • and θ 2 = 29.9 • , corresponding to graphene and phosphorene rotations with respect to the hBN layer.
are two algorithms that are implemented, namely Direct and Fast. The former spans all possible configurations from which the supercells can be constructed, while the latter is more efficient and designed to search for unstrained configurations (e.g., those exhibiting moiré patterns). The former is more hefty and resource-intensive, while the latter is a more efficient algorithm. Depending on the user's requirements, the developed software also enables construction of the optimal supercells based on the twist angle(s) between the layers. It also allows users to study strained supercells in order to examine the impact of the strain distribution on the various properties of van der Waals heterostructures. Both types of results can be used as structural inputs for further calculations based on an ab ini-tio approach (using DFT software, such as VASP, Quantum Espresso, or SIESTA). Particularly for cases of large supercells containing thousands of atoms, the results provided by our software can be used as structural inputs for software based on tight binding methods.