Silicon single-electron random number generator based on random telegraph signals at room temperature

The need for hardware random number generators (HRNGs) that can be integrated in a Silicon (Si) complementary-metal-oxide-semiconductor (CMOS) platform has become increasingly important in the era of the Internet-of-Things (IoT). Si MOSFETs exhibiting random telegraph signals (RTSs) have been considered as such a candidate for HRNG, though its application has been hindered by RTS’s variability and uncontrollable, unpredictable characteristics. In this paper, we report the generation and randomness evaluation of random numbers from RTSs in a Si single electron pump (SEP) device at room temperatures. SEP devices are known to consistently produce RTSs, due to a quantum dot electrically deﬁned by multi-layer polycrystalline Si gates. Using RTSs observed in our devices, random numbers were extracted by a classiﬁer supported by supervised learning, where part of data was used to train the classiﬁer before it is applied to the rest to generate random numbers. The random numbers generated from RTSs were used as inputs for the Monte Carlo method to calculate the values of π , and the distribution was compared against the result obtained from Mersenne twister, a representative pseudo random number generator (PRNG), under the same condition. π was estimated more than 80000 times, and the distribution of the estimated values has a central value at 3.14 with a variance of 0.273, which is only twice larger than the result from PRNG. Our result paves a way to fully electronic CMOS compatible HRNG that can be integrated in a modern System-on-a-Chip in IoT devices.


I. INTRODUCTION
Password authentication is one of the most widely used protocols to protect one's private information 1-3 . This protocol accepts input from a user in a form of a password, and from this password a function called 'hashing' 4,5 (md5() 6 , for example) generates a 128bit output, which is stored in the server. From this generated sequence of characters, called hash, it is virtually impossible to guess the original password. When the server is attacked, the hackers will obtain hashes, but not original passwords. Therefore, the security of the user is still protected. However, it is critically important to use a password which is new, long enough and random enough to avoid a so-called 'dictionary attack' 1-3 , where a hacker tries to guess the password from common candidates, such as those on personal details (for example, birthday, surname and forename), a password used in elsewhere and simple sequences of numbers and letters (1234567890, abc), that matches the obtained hash. To generate a password with enough strength, pseudo random number generators (PRNGs) are used to suggest strong passwords. PRNGs use a date and time, for example, as a seed, and an algorithm produces random numbers as a password.
Another way of generating a strong password is to use a hardware random number generator (HRNG) [7][8][9][10] . HRNG uses the stochastic nature of certain physical phenomena or physical values, such as thermal noise of electronic devices 7 , a) Electronic mail: K.Ibukuro@soton.ac.uk b) Present address: Center for Exploratory Research Laboratory, Research & Development Group, Hitachi, Ltd. Tokyo 185-8601, Japan; Electronic mail: S.Saito@soton.ac.uk detection of single photons 8 , timing of Ovonic threshold switching 9 , chaotic behaviour of semiconductor lasers 10 and so forth. As HRNG does not rely on algorithm, the security of a password that is generated by HRNG is fundamentally superior to the one generated by PRNG.
Recently, theoretical 11 and experimental [12][13][14] studies have sought the possibility of utilising random telegraph signals (RTSs) for random number generation. RTSs are a stochastic shift of threshold voltage over time 15 , and were first found in 1985 in silicon (Si) metal-oxide-semiconductor field effect transistors (MOSFETs) 16 . The shift in V th has been considered to be caused by trapping and de-trapping of a single carrier (an electron in a case of n-type MOSFET (n-MOSFET), and a hole in a case of a p-type MOSFET (p-MOSFET)) into a charge trap in a gate-oxide 15 . As a consequence of aggressive scaling of Si technology, RTSs started to become recognised as a reliability issue in complementary MOS (CMOS) technology in the 2010s [17][18][19][20][21][22][23] . The merit of using RTSs as a source of randomness is primarily because of the compatibility of CMOS technology 15 . Also, it is well known that in many cases the probability to observe two discrete current states (high and low current states) can be well controlled by gate voltage 15,[24][25][26][27] , such that one can identify a bias condition where two current states are equally observed ('balance point') 9 . The difficulty in utilising RTSs as HRNG is that the presence and absence of RTSs cannot be controlled, and usually the probability to find a device that exhibit RTSs is less than 1% 15,28 .
In this paper, we report the generation and evaluation of random numbers from RTSs in a Si single electron pump (SEP) device at room temperature. Our device is based on silicon-on-insulator (SOI) Fin-FET structure with multiple This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT 2 poly-crystalline Si gates, and the formation of an electrically defined quantum dot (QD) is achieved when negative bias is applied to the two gates 25,26 . Our previous study reveals that RTSs can be caused by trapping and de-trapping of an electron in the QD, whose charge state is read by the net current flowing through the device 25,26 . We selected a different device this time, and RTSs were observed again, showing the expected characteristics such as average lifetime of about 1s and tunability of the charge state by the top gate. Moreover, we observed anomalous three-level RTSs, which can be explained by the presence of multiple energy levels in a QD 25 . The advantage of this mechanism is that we do not need to rely on fluctuant device features like oxide traps 13,14 or incident potential pocket 12 , whose presence can only be determined probabilistically. With the future possibility of real-time random number generation in sight, we considered the generation of random numbers from RTSs as a classification problem of time-series data 29 . That is, the classifier was built from training data before it was applied to a new data to achieve random number generation. Finally, the quality of random numbers was evaluated by calculating values of π using Monte Carlo method with the generated random numbers as input. π was calculated more than 80000 times, whose distribution has a median at 3.14 with a deviation of 0.273. This variance was only twice larger than the distribution obtained from Mersenne twister (MT), a representative pseudo random number generator (PRNG), calculated under the same conditions. The reason for the wider variance was attributed to the misclassification in the test data.

II. DEVICE FABRICATION
Our device was fabricated on a 6-inch SOI wafer with the surface orientation being (100). The stack of the wafer was 100nm-thick SOI and 145nm-thick buried oxide (BOX) on a handle layer. After the thickness of the SOI was thinned down to 24nm, the nanowire and mesa for source and drain (S/D) were patterned by electron beam lithography (e-beam). This was followed by anisotropic wet etching using tetramethylammonium hydroxide (TMAH) as wet etchant. This resulted in the sidewall of the nanowire being atomically flat (111) crystalline surface [30][31][32] . The nanowire width of e-beam mask was designed to be 100nm. After the oxidation and wet etching, the width of the nanowire was further reduced to about 60nm. 17.6nm-thick silicon dioxide (SiO 2 ) for gate dielectric was grown on SOI at 1000 • C, followed by its partial removal for S/D contact. Then, polycrystalline Si (poly-Si) was deposited across the entire wafer, and phosphorous was heavily doped using spin-on-dopant (SOD) technique aiming for 5×10 19 cm −3 . The dopants were activated by rapid-thermalannealing (RTA) at 950 • C for one minute. This layer was patterned by e-beam and inductively-coupled-plasma (ICP) etching for raised S/D as well as gates for electrically defining a QD in the channel, called left-gate (LG) and right-gate (RG). The length of L/RG (L L/RG ) was 100nm. In order to allow the formation of an inversion layers in the regions not covered by the L/RG, another layer of poly-Si was deposited and patterned using the same technique used for L/RG. This gate is called top-gate (TG). TG and L/RG were electrically insulated by 9-nm thick thermal oxide. The length of the TG (L TG ) is defined as the separation between LG and RG, which is 150nm. Figure 1 (a) shows a schematic of cross-section of our device along the nanowire. Formation of the passivation oxide was followed by contact opening, aluminium deposition and its patterning for metal interconnect. Finally hydrogen termination was achieved by forming gas anneal at 450 • C for 30 minutes 25 . Figure 1 (b) shows a scanning-electron-microscope (SEM) image of a similar device before the deposition of poly-Si for TG, in which the position of a QD is highlighted. Figure 1 (c) shows the potential profile along the nanowire when LG and RG were kept around V th while TG inverted the entire channel. RTSs were consistently observed in devices from the same fabrication lot 25,26 , and the physical mechanism of the RTSs were attributed to a trapping and detrapping of a single electron in the electrically defined QD. This kind of discretised signals were observed at room temperatures in devices fabricated using the similar technique, while the trapping and detrapping of a single electron was detected by a nearby charge sensor 33,34 . Therefore, the observation of RTSs in this particular device was also expected. LG

III. CHARACTERISATION
A Keysight B1500 and Cascade M150 probe station were used for electrical characterisation. The detection limit of this experimental setup is 100fA. Figure 2 (a), (b) and (c) show the This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT 3 transfer characteristics of the device when TG, LG and RG were swept, respectively, at two different drain voltage (V d ) values, 50mV and 1V. Leakage current was absent between the channel and the gates as well as between the gates. The device exhibited acceptable behaviour as a MOSFET, with the subthreshold swing of the I d -V TG curve being 77mV/decade. The degradation in subthreshold swing in I d -V RG and I d -V LG curves was attributed to the bias condition of the measurement, where voltage on TG (V TG ) was set to 0V. Threshold voltage of TG (V th ) was 0V, and above V th , current fluctuations were observed in the transfer characteristics. Similar current fluctuations were observed in a device from the same fabrication process, which has shorter gate lengths as well as narrower channel width 25,26 . This prompted to perform time domain measurements to identify the nature of the current instability in the transfer characteristics.  shows the histogram of the corresponding time trace, which is a probability distribution against the value of drain current (I d ). V TG was set to be 0.2V, and V LG and V RG were -50mV in this case. The average lifetimes of |0 , |1 were about 1s, while the lifetime of |2 was shorter than the other two current states. In Figure 3 (b), two major peaks and one minor peak can be seen. From this distribution, the probability to observe the highest current state (|0 ) and the middle state (|1 ) is almost the same, while the probability to observe the lowest current state (|2 ) is significantly lower than |0 and |1 . Figure 3 (c) shows the histograms at different voltage conditions, where V LG = V RG were varied from -200mV (blue dash line) to 0mV (green solid line) with 50mV increments. When V LG = V RG = -200mV, the dominant current state was |0 . As V LG = V RG increased, |1 started to observe more frequently, and when V LG = V RG = 0mV, the dominant current state was |1 . Figure 3 (d) shows the occupancy of the current states as a function of V LG = V RG . The balance point was achieved at around V LG = V RG = -50mV. Also, as V LG = V RG approaches to 0mV, the probability to observe |2 became negligible.
In this device, three current states were observed, which cannot be captured by a simple physical picture describing RTSs. In the most simple case of RTSs with two current states, this can be explained by a presence of a single isolated energy level with two different charge states, occupied or empty, corresponding to the two current levels 15 . If there are n independent energy levels were present, the number of current levels should be 2 n . In order to explain the existence of three current levels, a following physical model was constructed, which is shown as schematics in Figure 4. Four diagrams that represent potential diagrams from source to drain are displayed in Figure 4. In this model, we assume the existence of two discrete energy levels in the QD. The separation between the energy levels are expected to be in the order of 100mV from our previous result 25 , justifying the observation of single electron phenomena at room temperature. When the energy barriers created by LG and RG were high (V LG = V RG = -200mV, for example), both energy levels are above Fermi energy of source (E F ) and the ground state is occasionally occupied by an electron and it would be released soon after (Figure 4 (a)). This occasional occupation of the level in the QD and following release caused corresponding positive shift in V th followed by negative V th shifts at random. As the probability for the level to be occupied is still low, the probability distribution is highly asymmetric in favor of |0 . When the barrier is lowered and the ground state aligned with E F , the probability to observe |0 and |1 is almost equal (Figure 4 (b)). Finally, as the barriers were further lowered, both levels can interact with the electron reservoir. This means that the ground state can still capture and release an electron (Fig-This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT ure 4 (c)), and the excited states can also trap and de-trap an electron ( Figure 4 (d)). However, the trapping and de-trapping of an electron in an excited state can be considered to be allowed only when the ground state is occupied. Here, we exclude a possibility where the ground state is empty and the excited state is occupied, which is a reasonable assumption based on this multi-level model in a single QD. Indeed, in the time domain measurement shown in Figure 3 (a), the transition to the current state |2 seems mostly occurred from |1 .
Overall, this physical model explains the characteristics of the observed RTSs reasonably well.

IV. THE PROCEDURE TO EXTRACT CHARACTERISTIC PARAMETERS OF RTS FOR RANDOM NUMBER GENERATION
Following on from discussing the physical model behind the RTS observed in our device, in the next two sections the procedure of random number generation from the RTS is introduced. Data for random number generation was prepared as follows. Nine time domain measurements were taken (called Data1,2,3...9) at the fixed bias condition of V TG = 300mV and V LG = V RG = -106mV, which still realised the potential profile depicted in Figure 4 (b). V TG was increased from 200mV (as shown in Figure 3) to 300mV, in order to increase the amplitude of RTS (difference in current value between high and low current states). The probability of observing high and low current states were almost equal at V LG = V RG = -106mV, determining the bias condition of the time domain measurements. RTSs were observed with similar average lifetime as the one shown in Figure 3 but with higher average current (about 9nA), which were used to generate random numbers (part of data shown in Figure 7 (a)). The sampling interval of each measurement (t int ) was set to be 24ms, which is shorter than the average RTS lifetimes. 100001 points were taken, resulting in the total measurement time of 2400s in a single time domain measurement. After one time domain measurement was completed, 20s of hold time, sufficiently longer than the average lifetime of RTSs, was provided such that the auto-correlation between two adjacent measurements were safely removed.
In order to generate a sequence of random numbers from RTS, high current state (|0 ) was attributed to a digital number 1, while the low current state (|1 and |2 ) was designated to be a digital number 0. This can be considered as a classification problem of time-series data 29 , and to achieve this classification, a certain mathematical model (classifier) needed to be established. Several methods can be proposed. The most simple method is to create a model every time after a measurement was completed ( Figure 5 (a)). That is, after a single time domain measurement was completed, a histogram of I d was created, and the analog-to-digital conversion (ADC) was performed based on the threshold value, determined from the histogram, that segregates one current state to the other. The merit of this procedure is that the ADC from RTS to random numbers can be achieved with high accuracy, as the best classifier is produced for a given time domain measurement. However, the problem of this method is that real-time random number generation is not possible, as the random number is returned as a result of the post processing.
In order to overcome this issue, initialisation of a classifier is necessary. That is, by using several time domain measurements at the same bias condition as 'training data', a classifier (for example, allocate 1 if I d exceeds a certain value determined from training data, else 0, and the sampling time twice longer than average lifetime of RTS in training data) was created, before the model was applied to a new RTS ('test data') to generate a sequence of random numbers ( Figure 5 (b)). Dividing dataset into training and test data, building a mathematical model before applying it to the test data is a typical method used in the context to supervised learning. The clear advantage of this method over the aforementioned one is that the real time random number generation is possible. That is, after the classifier is established, a sequence of random numbers can be generated by applying the classifier to the measurement as the instrument records I d over time. A disadvantage of this method is that as the classifier is optimised to the training data and not the test data, there may be a misclassification of current states. This misclassification is unavoidable in classification problems. However, the accuracy can be im-This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset. proved by refining the model (taking into account a drift of average current due to bias temperature instability [35][36][37][38] , for instance). The merit and demerit of both methods were summarised as a table in Figure 5 (c). In the following, the procedure to generate a classifier is explained based on our data, whose flowchart is schematically shown in Figure 6. While the method itself can be used for real-time random number generation, in this paper real-time random number generation was emulated by using test data, rather than obtain more time domain measurements. Firstly, seven time domain measurements out of nine results (Data1,2...9) were used as training data (called train-ingData1,2...7), and the other two measurements were reserved for emulating random number generation (called test-Data1,2). The selection of training and test data was not necessarily in the order of time, as all measurements were assumed to be independent due to the hold time of 20s. This means that there were 9 C 2 = 36 combinations of training and test data, that can be used to evaluate randomness of the generated random numbers in section VI. For simplicity, one particular combination was focused to explain the procedure in detail. Out of trainingData1,2...7, one result was selected (trainingData1, for example) and four parameters that characterise the time domain measurements were extracted, which are I th , I max , I min and T . The first three parameters were used for ADC, while T determines the re-sampling frequency of test data. The definition of these parameters are explained in detail in the next two paragraphs. Extraction of these parameters was achieved by creating two histograms, which are histogram of current values (calculated by a function called 'histcount1') and histogram of individual lifetimes (calculated by a function called 'histcount2'). 'histcount1' was defined to accept one time domain measurement (trainingData1, for example) as an input and firstly calculate a probability distribution to observe a certain current value of I d over the measurement time. Due to the presence of RTS, two major peaks were observed in the histogram (Fig  3 (b). These two peaks were then fitted by a double Gaussian function

PLEASE CITE THIS ARTICLE AS
where a 1 , b 1 , c 1 , a 2 , b 2 and c 2 are all fitting parameters. In particular, b 1 and b 2 are the medians of the two Gaussian functions fitted to the probability distribution of current values, representing the average value for high and low current state. b 1 , c 1 , b 2 and c 2 have the dimension of Ampere, while a 1 and a 2 are dimensionless. Then, b 1 and b 2 were used to determine the threshold current value to classify the current state (called I th ), and maximum and minimum current values accepted in the given time domain measurement (called I max and I min , respectively) to remove outliers. I th was simply defined as an average of b 1 and b 2 while I max and I min were defined as This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT 6 respectively. This means that when I d exceeded I max , the digital number would not be allocated and NaN was returned instead of 1. Similarly, when I d was lower than I min , NaN was given instead of 0. When I d is bigger than I th and smaller than I max , a digital number of 1 was designated, while when I d is smaller than I th and larger than I min , a digital number of 0 would be given. This digitisation was performed to obtain a sequence of digital numbers, called 'digitisedSignal', which constituted, together with I th , I max and I min , the output of function 'histcount1'. Figure 7 (a) shows part of Data8, I th , I max , I min calculated from 'histcount1' with Data8 as an input, and 'digitisedSignal' plotted over the raw data. When an outlier was observed, like the points around t=1910s, NaN was given instead of 0. This outlier might have been possibly caused by another RTS with lower probability to be observed compared to the one that has been discussed so far 15 .
'histcount2' was designed to accept 'digitisedSignal' as an input and firstly calculate a probability distribution of individual lifetime of one current state (high current state, for example) of a given training data (Data8, for example) 15 . As the capture and emission process of a single electron in a QD is considered to follow Poisson process, the distribution was expected to be fitted by an exponential function 15 . Figure 7 (b) shows a histogram of individual lifetime of high current state in Data8 as a training data, and indeed the probability to observe the same current state became exponentially smaller. The histogram was fitted by a simple exponential function where τ High is a fitting parameter, whose physical meaning can be understand as follows; after τ High , the probability for high current state not making a transit to low current state is 1/e ∼ 37% 15 . In this sense, τ High represents the average lifetime of high current state in a given time domain measurement. Also, this value can be understood as a measure to evaluate the autocorrelation of the RTS signal. That is, if RTS was monitored longer than τ High , the probability for I d not making a transit from high current state to low current state can be considered as sufficiently small. The same histogram was calculated for low current state, and τ Low , average lifetime of low current state, were obtained; τ High and τ Low were almost identical as the probability to observe high and low current states were nearly equal (see Figure 7 (a) and 10 (c)). However, it is virtually impossible to balance the probabilities to be exactly 50% with infinite precision, due to the limited resolution of voltage source (1µV). This means that either τ High or τ Low would be slightly longer than the other. For consistency, the longer one was defined as a typical lifetime of RTS; The time-scale parameter extracted from histcount2 would be used for re-sampling of test data for RNG, explained in the next section V. For this purpose, the re-sampling interval should be sufficiently longer than a typical lifetime of RTS in order to reduce autocorrelation. Therefore, τ was multiplied twice such that the probability for one current state did not make a transit to the other current state was less than 1/e 2 ∼ 14%; which is an output of 'histcount2'. After the four parameters were extracted from training-Data1 by 'histcount1' and 'histcount2', the same procedure was iterated for the remaining six time domain measurements (trainingData2,3...7). This resulted in seven sets of four parameters, and finally the average of each parameter was calculated, resulting in I th , I max , I min and T ; where k is the number of training data (7 in this case), I th,i , I max,i , I min,i and T i are I th , I max , I min and T of trainingData i, respectively. Note that ' T /2' is average of typical lifetimes of RTSs in trainingData1,2...7. These four parameters underpin the classifier for extracting random numbers from RTS. This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT 7

V. RANDOM NUMBER GENERATED FROM RANDOM TELEGRAPH SIGNALS OBSERVED IN THE SINGLE ELECTRON PUMP DEVICE
Using the classifier determined from the training data via histcounts1 and histcounts2, finally a sequence of random numbers was extracted from test data. A schematic of this procedure is shown in Figure 8. The extraction procedure is 1) allocate a digital number '1' if I max > I d > I th was satisfied 2) allocate '0' if I th > I d > I min was satisfied 3) allocate NaN if I d exceeded I max or I d was smaller than I min . 4) resample 'digitisedSignal' with an interval of where t int is a measurement interval (24ms), and is a floor function, fixing the value of a real number T /t int to the closest integer equal to or less than T /t int . The re-sampling can only be achieved with an interval of T when T happened to be an integer multiple of t int . Since T is not necessarily an integer multiple of t int , the floor function was practically necessary. The purpose of the re-sampling step was to remove autocorrelation between one digital number extracted from one point to the other 14 . When RTS is sampled with interval sufficiently shorter than the average lifetime of RTS, it is likely to observe the same current state at the next measurement point, which is obvious from the definition of lifetime of one current state. In order to extract a sequence of random numbers from RTS, it is required to be sampled with interval sufficiently longer than the average lifetime such that a digital number at one point is unpredictable from the previous digital number. Each T calculated from training data was defined to be twice of the typical lifetime of RTS, and therefore, on average, T /t int t int can be considered to be sufficiently longer than the lifetime of testData. Note that in this paper the real-time random number generation is emulated and not actually performed, and the test data has the same number of measurement points as the training data. Depending on the initial point of re-sampling was taken, there were T /t int ways of extracting a sequence of binary numbers from a test data (a circle, triangle and cross mark in Figure 8). This is because the selection of the initial point was purely arbitrary; as the test data is considered to be independent of the training data, the measurement could have started at time t 0 = t int , t 0 = 2t int , ... t 0 = ( T /t int − 1)t int . This means that sequences of random numbers starting at t 0 = t int , t 0 = 2t int , ... t 0 = ( T /t int − 1)t int were all equally likely and should not be discarded. Therefore, all sequences were saved for further analysis. The number of sequences depended on which data (Data1,2...9) was used for test data, particularly the number of NaN in the digitised signal of the test data. In our case, T /t int ∼ 55 sequences could be extracted from one time domain measurement. Finally, eight adjacent binary numbers were combined to form one decimal number (from 0 to 255), resulting in T /t int sequences of 8-bit decimal random numbers, which was saved as a matrix called RN1 (one row of RN1 corresponds to a sequence of random numbers). To implement the real-time random number generation, time domain measurement should be taken with an interval of T /t int t int in the first place, before the classifier was applied for ADC. This procedure was repeated for testData2, and a matrix of random numbers with the same size as the one generated from testData1 was obtained, called RN2. The accurate description of the classifier is given in section V. testData1 and 2 were digitised by the classifier and also reorganised in a matrix depending on the initial sampling points. Binary random numbers were finally converted to 8bit decimal numbers, and two matrices called RN1 and RN2 were produced from testData1 and 2.

VI. EVALUATION OF RANDOMNESS GENERATED FROM RANDOM TELEGRAPH SIGNALS
In the section IV and V, the algorithm to generate random numbers from RTSs was introduced, and two matrices containing the sequences of random numbers (RN1 and RN2) were generated as an output. In this section, the randomness of the random numbers is evaluated by estimating the value of π using Monte Carlo method. This estimation of π is generally achieved as follows. Firstly, two sequences of N random numbers distributed in [0,1] were prepared using any random number generator. These two sequences of random number can be used as a coordinate to designate a point in the x-y plane (0 ≤ x ≤ 1, 0 ≤ y ≤ 1). Then, a number of points that satisfy the condition x 2 + y 2 < 1 were counted, which is an This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset.

PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647
Si SE-RNG based on RTSs at RT 8 estimate of π/4. Because a random number generator should generate a different sequence of random numbers every time it runs, this estimate of π will also be different depending on the random number used as inputs. The important point is the distribution of a large number of estimated values of π, and the quality of random number can be considered as good when the distribution has a median around the true value (3.14...) with smaller deviation from the median. The accuracy of the median is determined by the randomness of the random number, while the precision of the estimation, gauged by the deviation from the median, is limited by the number of elements in a sequence (number of points in the x-y plane). This means that if a longer sequence was used as an input, it is more likely that the distribution of estimated values would be smaller.
Random numbers generated from RTS was then used as inputs of this procedure to calculate π (Figure 9 (a)). One sequence of random numbers was selected from both RN1 and RN2, and these random numbers were normalised such that all values would be fit in [0,1], called rn1 and rn2. rn1 and rn2 were then used as a coordinate of a point in the x-y plane (rn1 for x, rn2 for y, for example), creating a scatter plot in the area 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Finally π was estimated by counting number of points in x 2 + y 2 < 1. This estimation was repeated for all the combinations that can be taken from RN1 and RN2, which means that π can be calculated T /t int 2 times from RN1 and RN2. Figure 9 9. (a) shows the procedure to estimate π from random numbers in RN1 and RN2. One sequence of random numbers were chosen from both RN1 and RN2, normalised in [0,1], and used as a coordinate to designate a point in the x-y plane. Red filled circles are the points in the area x 2 + y 2 < 1, while blue filled circles were points outside. (b) shows one of the examples of rn1 and rn2 taken from RN1 and RN2, generated from Data3 and Data8 as test data. Figure 10 (a) shows two examples of the distribution of estimated π, which are both compared with the values of π estimated from Monte Carlo method using uniform random number generated by Mersenne twister (MT). The blue points in Figure 10 (a) are probability distribution of calculated values of π using Data1,2,4,5,6,7,9 to generate the classifier (train-ingData) and Data3,8 to generate random numbers. When fitted by a Gaussian function, the median was 3.16 and the variance was 0.0750, which was comparable with the result obtained from uniformly generated random number by MT as input (plotted as orange dashed line in Figure 10 (a)). On the other hand, magenta points in Figure 10 (a) are the one using Data 3,4,5,6,7,8,9 for the classifier and Data1,2 for random numbers. As can be seen, the median of the distribution was off from the true value. This implies that the numbers generated from Data1,2 by the classifier based on Data3,4...9 were not uniform, which means that a number of smaller values (0,1,2...) was larger than that of bigger values (...253, 254, 255).
In order to seek the source of non-uniformity in random numbers generated from Data1,2 as test data, the validity of classification parameters were investigated. Figure 10 (b), (c), (d) and (e) show the histograms of Data3, Data8, Data1 and Data2 (light-blue lines) with corresponding I th and I th values. These values are highlighted by blue solid line and magenta dotted line, respectively. I th is defined to be an average of two medians of double Gaussian functions fitted to a given histogram, meaning that the digitised signal generated using I th can be used as a reference. On the other hand, as I th is an average of I th of multiple training data, which is not optimised to a test data, chances are that a current state can be misclassified to a different one due to fluctuation of measured current values around the median. Therefore, the difference between I th and I th can be used as a metric to characterise the extent of misclassification; where i and j are natural numbers that satisfy 1 ≤ i = j ≤ 9, I th,i is I th for Data'i' and I th i j is I th when Data'i' and Data' j' were used as test data. Positive d i j means that misclassification of '1' as '0' frequently occurred when Data'i' was used as test data alongside with Data' j'. Likewise, negative d i j implies that '0' were often misclassified as '1' when Data'i' was used as test data alongside with Data' j'. As shown in Figure 10 (b) and (c), d 38 and d 83 has a different sign, while d 12 and d 21 were both positive, seen in Figure  10 (d) and (e). This means that because more misclassification towards '0' occurred in both Data1 and Data2, RN1 and RN2 has a larger number of smaller values, leading to a larger number of points in the x-y plane that satisfy x 2 + y 2 < 1 and hence larger estimated values of π. In the case of Data3 and Data8, certainly there were misclassification in both cases, though the opposite sign of d 38 and d 83 helped to mask the non-uniformity in generated random number in the estimated values of π. This result suggests that even if the uniformity of a sequence of random numbers is not perfect, by knowing the deviation with its sign (d i j , for example) and cancelling the deviations by combining two or three sequences of random numbers (d Σ,i j ), total uniformity of the points in higher dimension (2D,3D...) may be improved.
Combined with the distribution shown in Figure 10 (a), the randomness of points in the x-y plane can be evaluated by the This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset. sum of d i j and d ji ;

PLEASE CITE THIS ARTICLE AS
Figure 10 (f) shows the values of this metric d Σ,i j against the median of the distribution of estimated values of π, measured from its true value. Blue diamonds denote the results from all the combinations of choosing two test data from nine measurement results, and a green filled circle highlights the case when Data3,8 were used as test data, while the red filled circle does the case when Data1,2 were used as test data. A clear pattern can be seen in this scatter plot, where the median of the distribution deviate from the true value when the value of |d Σ,i j | becomes larger. This suggests that the metric d Σ,i j can be used as a predictor of how good the estimation would be. In other words, if the value of d Σ,i j is small, the median of the distribution is expected to be closer to the true value than otherwise. Finally, in order to evaluate the overall quality of the random number generated from our device, a histogram of the estimated values of π from all possible combinations was calculated, which is shown in Figure 11. In total, π was calculated 80855 times using the Monte Carlo method. The blue filled circles are the data points, while the blue dotted line is a Gaussian fit with the median of 3.14 and the variance of 0.273. This distribution can be broken down into individual 36 histograms, two of which are shown in Fig 10 (a). Therefore, the variance of the total distribution is determined by distribu-tions like the one obtained from Data1,2, which miscalculate values of π due to misclassification. By reducing the misclassification (improving the learning algorithms, for example 29 ), the variance can also be smaller. In the same figure, the result from the uniform random number generated by MT is also shown, which has the median at 3.13 and the variance of 0.105, the latter of which is smaller than the one obtained from RTS, as expected.

VII. CONCLUSION
In this paper, random number generation based on random telegraph signals observed in a Si SEP device was discussed. RTSs was observed in one of the SEP devices, which showed a tunability of current states as expected 25,26 . The observed RTS has three current levels, whose physical origin was attributed to smaller energy gap between the ground state and the first excited state. In order to achieve real-time random number generation, the classifier to perform analog-to-digital conversion should be learned from previous measurement results, rather than a result from which random numbers are to be extracted. Therefore, such an algorithm was developed and introduced in Sec.IV and V in detail. After the classifier was established using training data, random numbers were extracted from a new test data, whose uniformity was benchmarked by calculating values of π using Monte Carlo method This is the author's peer reviewed, accepted manuscript. However, the online version of record will be different from this version once it has been copyedited and typeset. PLEASE CITE THIS ARTICLE AS DOI:10.1063/5.0023647 FIG. 11. Comparison between random number generated from RTS and from Mersenne twister (a PRNG). The blue solid circles are the distribution of calculated π from all data, while the blue dotted line show Gaussian fit to the distribution. The magenta filled circles are the distribution obtained from MT, while the magenta dash-dotted line show Gaussian fit to the distribution. and compared with the result obtained from pseudo random numbers generated by MT. It turned out that the difference between the classifying parameters obtained from previous results and generated from the new result can serve as a predictor to evaluate the extent of misclassification and hence the randomness in the generated random numbers.
RTSs have been considered as a reliability issue in modern CMOS devices 17-23 mainly because the variability of RTSs cannot be controlled across wafers or fabrication lots, and also parameters that characterise RTSs seem to be not correlated with each other 15,28,39 . This certainly hinders the application of RTSs as HRNG, since the observation of RTSs is rare in the first place, and additionally the clock frequency at which the sampling would be made cannot be determined a priori. The device and algorithm shown in this paper addressed both difficulties, in a sense that SEP devices reliably exhibited RTSs 25,26 and the unpredictability of average lifetimes was harnessed by implementing 'supervised learning'.
While the bit rate of this proposed RNG turned out to be relatively slow, this was limited by the characteristics of RTSs used in this study. Traditionally, RTSs with average lifetime of less than µs were rarely reported in Si devices [40][41][42][43][44][45] . However, several studies show that the trapping and de-trapping of an electron in a QD or in a trap at Si-Silicon dioxide (SiO 2 ) interface can be cycled at a frequency of GHz 46,47 . This means that, combined with the CMOS compatibility, utilising RTSs as a source of randomness can be a promising candidate for future hardware security.