Bitcoin Transaction Behavior Modeling Based on Balance Data
Abstract
When analyzing the balance distribution of Bitcoin users, we observed that it follows a log-normal pattern. Gibrat’s law of proportional growth has successfully explained city size and word frequency distributions, so we tested whether the same principle could account for the log-normal distribution of Bitcoin balances. However, our calculations revealed that the exponents in both the drift and variance terms deviate slightly from one. This suggests that Gibrat’s proportional growth rule alone does not fully explain the log-normal distribution observed in Bitcoin users’ balances. During our exploration, we found that Bitcoin users fall into two distinct categories based on their behavior, which we refer to as “poor” and “wealthy” users. Poor users, who initially hold only a small amount of Bitcoin, tend to buy more bitcoins first and then gradually sell all their holdings. The longer the time interval, the more certain it is that they sell all their coins. In contrast, wealthy users, who hold a large amount of Bitcoin from the start, tend to sell off their holdings over time. The speed at which they sell decreases over time, and in the end they keep at least a small part of their initial holdings. Interestingly, the wealthier the user, the larger the proportion of their balance they tend to sell and the higher the certainty with which they sell it. This research provides a perspective on Bitcoin users’ behavior that may be applicable to other financial markets.
Index Terms:
Balance distribution, log-normal distribution, Gibrat’s proportional growth, transaction behavior, poor user, wealthy user.
I Introduction
The Bitcoin transaction network offers an opportunity to study the behavior of Bitcoin users because it traces the flow history of every unspent transaction output (UTXO), and addresses can be clustered into users by various methods, such as heuristic clustering. It is widely noted that the Bitcoin balance distribution is highly decentralized and has scale-free characteristics, but the mechanism that leads to this distribution is seldom explored. Because Gibrat’s proportional growth rule has successfully explained the distributions of city sizes and word usage, we explore whether it can describe how Bitcoin users’ balances change and how they are distributed. If we define $S$ as a Bitcoin user’s balance, the question is: can the change of balance ($\Delta S$) over a time interval $\Delta t$ be modeled by the stochastic equation $\Delta S = \mu S^{\beta}\Delta t + \sigma S^{\beta}\Delta W$ with $\beta = 1$? This exploration also reveals how Bitcoin users behave over time if we vary $\Delta t$ in the above equation.
Many papers have confirmed that the in-degree and out-degree of Bitcoin transaction networks follow power-law distributions, a result that can be explained by linear preferential attachment on degree. For users’ Bitcoin balances (the number of bitcoins owned by each user), however, the formation mechanism is not linear preferential attachment according to [3], even though the balance distribution follows scale-free rules. [4] visually compared a constructed index, the “cumulative distribution function of rank function”, with its theoretical counterpart and concluded that Bitcoin transactions follow sublinear preferential attachment. One shortcoming of that research is that every address was treated as one node, without clustering addresses to the user level. Another shortcoming is that the conclusions were drawn from plots alone rather than from statistical tests, which easily leads to wrong conclusions [5]. Thus, it is necessary to examine closely how users’ Bitcoin balances evolve and what mechanism leads to the current balance distribution. Besides explaining the mechanism behind the balance distribution, it is also important to know how Bitcoin users behave during their transactions, which is the second question explored in this paper.
In the following sections, we first choose suitable Bitcoin balance data and explore them to establish the basic facts; secondly, we analyze the mechanism that leads to the current balance distribution based on the Geometric Brownian Motion (GBM) model and then interpret users’ behavior during transactions; in the last part, we summarize and discuss this paper.
II Data Description and Exploration
II-A Data Description
We chose the Bitcoin balance data on 2016-01-23 because the Bitcoin transaction network was more mature and relatively more stationary at that time than at earlier dates. The log-log scale histogram in Fig. 1 indicates that the users’ Bitcoin balance follows a heavy-tailed distribution.
When it comes to the balance distribution, we need to distinguish two kinds of users. The first kind, user group A, consists of users whose balances on 2016-01-23 are positive and who have transactions during the next period $\Delta t$ (28 days in the right panel of Fig. 1). The second kind, user group B, consists of users whose balances on 2016-01-23 are positive but who have no transactions during the next period $\Delta t$. This distinction is necessary because data from users who have not transacted for a long time, for example a dead Bitcoin address, would otherwise affect the accuracy of our analysis.
The left panel in Fig. 1 depicts the probability density function (pdf) of the balance data of both kinds of users. The right panel in Fig. 1 depicts the pdf of the balance data of group A only, that is, users whose balances on 2016-01-23 are positive and who have transactions during the next period $\Delta t$.
Then, applying the Python powerlaw package developed by [3], we fitted the Bitcoin balance data on 2016-01-23 to the power-law and log-normal distributions and found that the log-normal distribution fits the data better. Fig. 2 and the statistical test in Fig. 3 also confirm that the log-normal distribution fits the balance distribution data better than the power law.
We also compared the fitting results of the log-normal and power-law distributions while gradually increasing the minimum value of the data used for fitting. Starting from an initial minimum value, we increase the minimum in steps of 1 bitcoin, so that the second fit uses a minimum of 1 bitcoin, the third a minimum of 2 bitcoins, and so on. The result shows that the log-normal function fits better than the power law in most cases. However, even if the data are generated by a power-law distribution, a fit with the package developed by [3] cannot reject the hypothesis that the data come from a log-normal distribution when the variance of that log-normal distribution is very large. So, this statistical package alone is not enough to conclude whether our empirical data come from a power-law or a log-normal distribution.
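The comparison described above can be illustrated with a small likelihood-ratio sketch. This is not the powerlaw package used in the paper; it is a standard-library stand-in under stated assumptions: the power-law exponent is the usual continuous maximum-likelihood estimate, and the log-normal parameters are simply the mean and standard deviation of the log data above the minimum (a simplification of the truncated maximum-likelihood fit). The function names and the synthetic data are hypothetical.

```python
import math
import random


def normal_sf(z):
    """Survival function 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def loglik_power_law(xs, xmin):
    """Maximised log-likelihood of a continuous power law on [xmin, inf)."""
    n = len(xs)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in xs)  # MLE of the exponent
    return sum(math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin) for x in xs)


def loglik_lognormal(xs, xmin):
    """Log-likelihood of a log-normal density truncated at xmin.

    Assumption: m and s are the plain mean and standard deviation of log(x),
    not the exact truncated maximum-likelihood estimates.
    """
    logs = [math.log(x) for x in xs]
    n = len(logs)
    m = sum(logs) / n
    s = math.sqrt(sum((v - m) ** 2 for v in logs) / n)
    log_norm = math.log(normal_sf((math.log(xmin) - m) / s))  # truncation constant
    body = sum(
        -v - math.log(s) - 0.5 * math.log(2.0 * math.pi) - (v - m) ** 2 / (2.0 * s * s)
        for v in logs
    )
    return body - n * log_norm


def log_likelihood_ratio(balances, xmin):
    """Power-law minus log-normal log-likelihood; negative favours the log-normal."""
    tail = [x for x in balances if x >= xmin]
    return loglik_power_law(tail, xmin) - loglik_lognormal(tail, xmin)


if __name__ == "__main__":
    random.seed(0)
    synthetic = [random.lognormvariate(0.0, 1.5) for _ in range(5000)]  # made-up balances
    for xmin in (0.01, 1.0, 2.0):  # increasing minimum value, as in the text
        print(xmin, log_likelihood_ratio(synthetic, xmin))
```

Repeating the call for a growing `xmin` mirrors the stepwise procedure in the text. A ratio close to zero indicates that the two models cannot be separated by likelihood alone, which is the motivation for a dedicated test.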
The uniformly most powerful unbiased (UMPU) Wilks test suggested by [9] can be used to distinguish a power-law distribution from a log-normal distribution. The method rests on two ideas: exponentiality can be tested against normality [10][11] using the saddle point approximation, and taking the logarithm of the Bitcoin balance data transforms a power-law distribution into an exponential distribution and a log-normal distribution into a normal distribution. The null hypothesis of this test is that the data follow a power law, and the alternative hypothesis is that the data follow a log-normal distribution. The test is performed as follows: firstly, we choose a threshold for the Bitcoin balance; secondly, we perform the UMPU Wilks test on the balances larger than the threshold by computing the p-value. Though the Monte Carlo method can also be used to calculate the p-value, it is very time-consuming here because we have millions of data points. As shown in Fig. 3, we can reject the null hypothesis and accept the alternative hypothesis in almost all regions of the Bitcoin balance, except regions that include only the few tens of largest balances. However, these few tens of largest balances account for less than 5% of the total number of bitcoins at our chosen time point.
II-B Data Exploration
Now that we know that the balance distribution may be log-normal, a natural question is what is the mechanism behind the transaction that leads to the log-normal distribution?
Gibrat’s proportional growth law is an important tool for explaining how power-law distributions form if we modify Gibrat’s proportional growth equation slightly [3], and especially how Zipf’s distribution forms when other mechanisms, such as birth and death processes, are taken into account. However, the probability density function (pdf) of the solution of the standard Gibrat’s proportional growth equation is a log-normal distribution, which fits our Bitcoin balance data better than the power-law distribution. At the same time, our test confirms that the Bitcoin balance distribution is fitted well by a log-normal distribution. We also checked the distribution of Bitcoin balances on 2019-01-19, almost three years after 2016-01-23, and found that it is also log-normal. Can Gibrat’s proportional growth law be the mechanism that explains our data? To answer this question, we need to investigate our data first and then check whether the exponent in Gibrat’s proportional growth equation is one or not.
We first draw a scatter plot of the Bitcoin balance ($S$) versus the Bitcoin balance change ($\Delta S$) to get an overall impression of how the data are distributed. Fig. 4 is the scatter plot of Bitcoin balance versus Bitcoin balance change within a half year. Four sub-plots with different scales are shown so that the data can be examined closely.
There exist different clusters in the data. A Hopkins statistic test with 100 random samples (null hypothesis: the data contain only one cluster; alternative hypothesis: the data contain more than one cluster) also confirms the existence of multiple clusters. Many methods exist for clustering data, such as the distance-based K-means method and probability-based Gaussian mixture models. Both work well if the data points of each cluster are spread in a roughly circular shape. Gaussian mixture models additionally assume that the data are Gaussian distributed in every dimension, which is not the case for our data. Based on this analysis, to the best of our knowledge, no existing method seems able to cluster Bitcoin users directly and well.
However, as shown in Fig. 4, there are three straight lines, a vertical line, a horizontal line, and a diagonal line, that correspond to different situations. The vertical line corresponds to users who owned no bitcoins or only a small number of bitcoins on 2016-01-23 but acquired many bitcoins by trading before 2016-02-20 (28 days after 2016-01-23). The horizontal line corresponds to users who owned bitcoins and whose number of bitcoins did not change between 2016-01-23 and 2016-02-20. The diagonal line corresponds to users who owned bitcoins on 2016-01-23 but sold them all before 2016-02-20.
Based on our data exploration, we think that the data points on the horizontal line should be removed because the corresponding users did not take part in trading activities in our specific time span.
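The three lines can be expressed as simple conditions on a user's starting balance and balance change. The sketch below is only an illustration: the function names, the tolerance `tol`, and the cut-off `small` for "a small number of bitcoins" are hypothetical choices, not values taken from the paper.

```python
def classify_point(balance, change, tol=1e-8, small=1e-3):
    """Label one user by where the point (balance, change) lies in the scatter plot.

    tol and small are assumed thresholds chosen for illustration only.
    """
    if abs(change) <= tol:
        return "horizontal"  # balance did not change: no trading in the period
    if balance > 0 and abs(change + balance) <= tol * max(1.0, balance):
        return "diagonal"    # change == -balance: the user sold everything
    if balance <= small and change > 0:
        return "vertical"    # (almost) empty at the start, acquired coins later
    return "other"


def drop_inactive(points):
    """Remove the horizontal-line users, as suggested in the text."""
    return [(s, ds) for s, ds in points if classify_point(s, ds) != "horizontal"]


if __name__ == "__main__":
    sample = [(5.0, 0.0), (5.0, -5.0), (0.0, 12.0), (5.0, -1.0)]  # made-up users
    for s, ds in sample:
        print(s, ds, classify_point(s, ds))
    print(drop_inactive(sample))
```

Only the horizontal-line users are filtered out here; the vertical and diagonal users did trade during the period and therefore stay in the data.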
III Mechanism Detection
Now, we explore the mechanism behind the Bitcoin balance distribution. As before, we define $S$ as the cryptocurrency balance owned by the user. The change of cryptocurrency balance ($\Delta S$) can be modeled as follows if it follows the Geometric Brownian Motion (GBM) mechanism:
$$\Delta S = \mu S^{\beta}\,\Delta t + \sigma S^{\beta}\,\Delta W \tag{1}$$
where $S$ is the user’s balance at the starting time; $\Delta t$ is the time interval between the starting time and the ending time for measuring the balance change; $\Delta S$ is the balance change of the user during $\Delta t$; $W$ is the Brownian Motion, and $\mu$ and $\sigma$ are the drift and volatility, respectively. $\beta$ is the exponent we will focus on. Taking the expectation and the standard deviation (the square root of the variance) on both sides of equation 1 gives:
$$\mathrm{E}[\Delta S] = \mu S^{\beta}\,\Delta t, \qquad \mathrm{Std}[\Delta S] = \sigma S^{\beta}\sqrt{\Delta t} \tag{2}$$
We can plot equation 2 to explore the parameters $\beta$, $\mu$, and $\sigma$. The whole process includes three main steps:
• At first, the range of Bitcoin balances is split into consecutive bins (with constant size or with size that increases exponentially);
• Then, we assign the Bitcoin balance data ($S$) and the corresponding balance change data ($\Delta S$) to the bins that we chose. After this step, we delete the bins that contain fewer than 50 balance data points (other cut-offs, such as 100, can also be used), together with their corresponding balance change data ($\Delta S$).
• At last, we calculate the average and standard deviation of $\Delta S$ in each bin.
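The three steps above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name, the bin growth factor `base`, and the synthetic data are assumptions, while the cut-off of 50 points per bin comes from the text.

```python
import math
from statistics import mean, pstdev


def exponential_bin_stats(balances, changes, base=1.2, min_count=50):
    """Group users into exponentially growing balance bins and summarise each bin.

    Returns one tuple (mean balance, mean change, std of change, count) per kept bin.
    base is an assumed growth factor between consecutive bin edges.
    """
    bins = {}
    for s, ds in zip(balances, changes):
        if s <= 0:
            continue  # the logarithm needs a positive balance
        k = math.floor(math.log(s) / math.log(base))  # bin index: base**k <= s < base**(k+1)
        bins.setdefault(k, []).append((s, ds))

    rows = []
    for k in sorted(bins):
        pts = bins[k]
        if len(pts) < min_count:
            continue  # step 2: drop sparsely populated bins
        ds_values = [p[1] for p in pts]
        rows.append((mean(p[0] for p in pts), mean(ds_values), pstdev(ds_values), len(pts)))
    return rows


if __name__ == "__main__":
    # Made-up data: 1000 users with balance 10 and changes of +1 or -1,
    # plus 10 large holders that fall into a sparse bin and are dropped.
    balances = [10.0] * 1000 + [1000.0] * 10
    changes = [1.0, -1.0] * 500 + [0.5] * 10
    for row in exponential_bin_stats(balances, changes):
        print(row)
```

Exponential bins keep the number of points per bin more even when the balances are heavy-tailed, which is why they are a natural choice for this kind of data.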
Because the bitcoin balance distribution is scale-free, there are no data or only a few data points in lots of bins that correspond to large bitcoin balances, and most data are located in bins that correspond to several small balances. So, we think it would be a good choice to apply exponential bins and we got 167 data points which were shown in Fig. 5.
We first calculated the fitting results based on equation 2. Because the average balance change takes both negative and positive values, it is not possible to show the fit of equation 2 in a log-log coordinate system, so we use a linear coordinate system to show our data and fitting results. In the left panel of Fig. 5, the red straight line corresponds to the case of proportional growth (the exponent $\beta$ is set to 1 in equation 2), for which we only need to estimate $\mu$. The green line corresponds to the case in which both the exponent $\beta$ and $\mu$ are obtained by fitting. Judging from visual comparison and regression, the hypothesis that the exponent in the first equation of equation 2 equals 1 can be accepted.
In the right panel of Fig. 5, the relationship between $\mathrm{Std}[\Delta S]$ and $S$ is shown. The red line is obtained by fitting the model with the exponent fixed to one. The green line corresponds to the case in which the exponent is estimated by regression, which gives a value of 0.739. The exponent obtained from the first equation in equation 2 is thus very different from the exponent obtained by fitting the second equation in equation 2. Does this result mean that the exponent in the volatility term differs from the exponent in the drift term?
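One common way to estimate such an exponent is an ordinary least-squares fit on log-transformed values, since a relation of the form std = c * S**beta becomes a straight line with slope beta in log-log coordinates. The paper does not state which regression was used, so the sketch below is an assumption; the function name and the synthetic numbers are hypothetical. It applies to the standard deviation only, because the bin averages can be negative and have no logarithm.

```python
import math


def fit_power_exponent(s_values, std_values):
    """Least-squares fit of log(std) = log(c) + beta * log(S); returns (beta, c)."""
    xs = [math.log(s) for s in s_values]
    ys = [math.log(v) for v in std_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                         # slope = exponent
    prefactor = math.exp(y_bar - beta * x_bar)  # intercept back-transformed
    return beta, prefactor


if __name__ == "__main__":
    # Synthetic, noise-free bin summaries built with exponent 0.739 for illustration.
    s_bins = [2.0 ** k for k in range(1, 11)]
    std_bins = [0.5 * s ** 0.739 for s in s_bins]
    print(fit_power_exponent(s_bins, std_bins))
```

With real bin summaries the points scatter around the line, so the fitted slope should be reported together with its uncertainty before it is compared with one.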
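Because the absolute value of the Bitcoin balance change varies a lot across users, we turn to the ratio of balance change to balance, $\Delta S / S$. We get the following equation 3 by dividing both sides of equation 2 by $S$:
$$\mathrm{E}\!\left[\frac{\Delta S}{S}\right] = \mu S^{\beta-1}\,\Delta t, \qquad \mathrm{Std}\!\left[\frac{\Delta S}{S}\right] = \sigma S^{\beta-1}\sqrt{\Delta t} \tag{3}$$
Based on equation 3, neither $\mathrm{E}[\Delta S/S]$ nor $\mathrm{Std}[\Delta S/S]$ depends on $S$ if $\beta = 1$. The only difference between the current calculation and the previous ones is that we now calculate the average and standard deviation of $\Delta S / S$ in each bin, rather than those of $\Delta S$.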
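This property can be checked numerically with a one-step simulation. The sketch assumes the model form dS = mu * S**beta * dt + sigma * S**beta * dW, with symbol names and parameter values chosen here purely for illustration. It shows that the spread of the relative change dS / S is the same for small and large balances when beta equals one, and shrinks like S**(beta - 1) when beta is below one.

```python
import math
import random


def ratio_stats(s0, beta, mu=0.05, sigma=0.2, dt=1.0, n=20000, seed=0):
    """Simulate n one-step balance changes and return (mean, std) of dS / S.

    All parameter values are made up for the demonstration.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        ds = mu * s0 ** beta * dt + sigma * s0 ** beta * dw
        ratios.append(ds / s0)
    m = sum(ratios) / n
    sd = math.sqrt(sum((r - m) ** 2 for r in ratios) / n)
    return m, sd


if __name__ == "__main__":
    for beta in (1.0, 0.7):
        _, sd_small = ratio_stats(1.0, beta)
        _, sd_large = ratio_stats(100.0, beta)
        # The ratio of spreads should be close to 100 ** (beta - 1).
        print(beta, sd_large / sd_small, 100.0 ** (beta - 1.0))
```

Because both calls reuse the same random seed, the two spreads differ only through the factor S**(beta - 1), which makes the scaling easy to see.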
As Fig. 6 shows, surprisingly, there are two different modes (blue points and red points) of users’ balance changes. For users whose Bitcoin balance is less than a specific value (blue points, which we also call poor Bitcoin users), the average of $\Delta S/S$ is positive, namely $\mu > 0$ in equation 3. The left panel of Fig. 6 shows a linear relationship between $\mathrm{E}[\Delta S/S]$ and the balance $S$ with a negative slope, which means that $\beta < 1$ for the blue points. The right panel of Fig. 6 shows that $\mathrm{Std}[\Delta S/S]$ is also negatively correlated with $S$, which again indicates $\beta < 1$ for the blue points. By contrast, for users whose Bitcoin balance is larger than a specific value (red points, which we also call wealthy Bitcoin users), the average of $\Delta S/S$ is negative, which means that $\mu < 0$ in equation 3 for the red points. The line in the left panel of Fig. 6 is almost horizontal, though it slopes slightly upward, which means that $\beta$ is larger than or close to one for the red points. However, the right panel of Fig. 6 shows that the fitted line is not exactly horizontal for the red points, which means that $\beta < 1$ in the volatility term. These analyses show that there exist two different balance growth models for Bitcoin users, which are as follows:
$$\Delta S = \begin{cases} \mu_{>0}\, S^{\beta_{<1}}\,\Delta t + \sigma S^{\beta_{<1}}\,\Delta W, & S < S^{*} \\ \mu_{<0}\, S^{\beta_{=1}}\,\Delta t + \sigma S^{\beta_{<1}}\,\Delta W, & S \ge S^{*} \end{cases} \tag{4}$$
where the subscripts $<1$, $=1$, $>0$, and $<0$ denote that the corresponding value is smaller than one, equal to one, larger than zero, and smaller than zero, respectively. For example, $\beta_{<1} < 1$ and $\mu_{>0} > 0$. $S^{*}$ is a threshold value. That means that for users whose Bitcoin balance is smaller than $S^{*}$, the balance grows according to the first model of equation 4. Because the corresponding exponent satisfies $\beta_{<1} < 1$ and $\mu_{>0} > 0$, we cannot obtain an analytical solution for this model, so we cannot calculate exactly how the Bitcoin balance of these users changes on average with time. For users whose Bitcoin balance is larger than $S^{*}$, the balance grows according to the second model of equation 4.
We now examine whether this type of growth model is stable by changing the time interval $\Delta t$, which also lets us explore how Bitcoin users behave over time. Our targets are the exponent $\beta$, the drift parameter $\mu$, and the volatility parameter $\sigma$. For every time interval $\Delta t$, we calculated the average and the variance of $\Delta S/S$ in each bin of Bitcoin balance. Then, we obtained the target parameters ($\beta$, $\mu$, $\sigma$) by linear regression of the average and the variance on the balance $S$, respectively, as shown in sub-Fig. 7(a).
As shown in sub-Fig. 7(b), for the first stochastic equation (poor users) in equation 4, the exponents in both the drift term and the volatility term fluctuate a lot but are both less than 1. The exponent in the drift term is negatively correlated with the time interval ($\Delta t$), while the exponent in the volatility term appears constant despite considerable fluctuation. The second figure in sub-Fig. 7(b) shows that the parameter $\mu$ is a monotonically decreasing function of the time interval ($\Delta t$) and eventually tends to zero. This means that poor Bitcoin users, who initially hold very few bitcoins, buy more in the next time interval and then gradually sell all their bitcoins in the future. The fourth and fifth figures of sub-Fig. 7(b) show that the volatility parameter $\sigma$ fluctuates a lot but is negatively correlated with the time interval ($\Delta t$), decreasing as $\Delta t$ grows. This means that poor Bitcoin users tend to sell all their coins with growing certainty as $\Delta t$ increases. Based on this analysis, the more precise form of the equation is:
$$\Delta S = \mu(\Delta t)\, S^{\beta_{\mu}(\Delta t)}\,\Delta t + \sigma(\Delta t)\, S^{\beta_{\sigma}}\,\Delta W, \qquad S < S^{*} \tag{5}$$
where $\beta_{\mu}(\Delta t)$ denotes that the exponent in the drift term is a monotonically increasing function of the time interval ($\Delta t$) and smaller than 1; $\mu(\Delta t)$ denotes that $\mu$ is a monotonically decreasing function of the time interval ($\Delta t$) and larger than zero; $\beta_{\sigma}$ denotes that the exponent in the volatility term is constant and smaller than 1; $\sigma(\Delta t)$ denotes that $\sigma$ is a monotonically decreasing function of the time interval ($\Delta t$).
Now, we focus on the second formula of equation 4, which applies to wealthy users whose balance lies above the threshold. As shown in the first figure of sub-Fig. 7(c), the exponents in both the drift term and the volatility term are nearly constant with respect to the time interval. However, the exponent in the drift term is larger than one, while the exponent in the volatility term is smaller than one. The third figure of sub-Fig. 7(c) shows that the drift parameter $\mu$ is negative and decreases with $\Delta t$, which means that users who own many bitcoins tend to sell their bitcoins as time passes.
The fourth figure of sub-Fig. 7(c) shows that $\sigma$ decreases with $\Delta t$, so the parameter in the volatility term of the second equation in equation 4 should be a monotonically decreasing function of the time interval $\Delta t$, which means that wealthy Bitcoin users sell their bitcoins with growing certainty as $\Delta t$ increases.
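Based on this analysis, the more precise form of the equation is:
$$\Delta S = \mu(\Delta t)\, S^{\beta_{\mu}}\,\Delta t + \sigma(\Delta t)\, S^{\beta_{\sigma}}\,\Delta W, \qquad S \ge S^{*} \tag{6}$$
where $\mu(\Delta t)$ and $\sigma(\Delta t)$ denote that $\mu$ and $\sigma$ are monotonically decreasing functions of the time interval $\Delta t$, with $\mu(\Delta t) < 0$. The other parameters ($\beta_{\mu}$, $\beta_{\sigma}$) in this equation are constant. The model will be:
$$\Delta S = \begin{cases} \mu(\Delta t)\, S^{\beta_{\mu}(\Delta t)}\,\Delta t + \sigma(\Delta t)\, S^{\beta_{\sigma}}\,\Delta W, & S < S^{*} \\ \mu(\Delta t)\, S^{\beta_{\mu}}\,\Delta t + \sigma(\Delta t)\, S^{\beta_{\sigma}}\,\Delta W, & S \ge S^{*} \end{cases} \tag{7}$$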
IV Summary and Discussion
In this paper, we explore the transaction patterns of Bitcoin users. Firstly, we explored users’ balance distribution and found that the log-normal distribution fits the balance data better than the power law. Secondly, we explored whether Bitcoin users’ transaction behavior follows Gibrat’s proportional growth rule and found that it does not. By extending the related analysis, we find that there exist two kinds of Bitcoin users: wealthy users, who own plenty of bitcoins in the beginning, tend to sell their bitcoins; poor users, who have few bitcoins in the beginning, tend to buy a few more in the next period and then sell all their bitcoins in the future.
By analyzing the balance data for wealthy users, we found that the exponent of $S$ in the drift term is almost constant and slightly larger than one, and the exponent of $S$ in the volatility term is also almost constant and slightly smaller than one, as shown in the second equation in equation 7.
The UTXO-based blockchain records each coin’s flow history and provides us the chance to research human economic behaviors. The research on the patterns of users’ transaction behaviors on UTXO-based blockchain is still in its early stages and deserves more attention in the future. This paper provides a good starting point in this direction and the research results may also be applicable in other traditional fields.
References
- [1] PL Krapivsky, S Redner, F Leyvraz, Connectivity of growing random networks, Physical review letters 85 (21), 4629 (2000).
- [2] PL Krapivsky, S Redner, Organization of growing random networks, Physical Review E 63 (6), 066123 (2001).
- [3] Alstott J, Bullmore E, Plenz D (2014) powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions. PLoS ONE 9(1): e85777. https://doi.org/10.1371/journal.pone.0085777.
- [4] Clauset, Aaron, et al. “Power-Law Distributions in Empirical Data.” SIAM Review, vol. 51, no. 4, Society for Industrial and Applied Mathematics, 2009, pp. 661–703, http://www.jstor.org/stable/25662336.
- [5] Sheridan, P., Onodera, T. A Preferential Attachment Paradox: How Preferential Attachment Combines with Growth to Produce Networks with Log-normal In-degree Distributions. Sci Rep 8, 2811 (2018). https://doi.org/10.1038/s41598-018-21133-2
- [6] Aspembitova A, Feng L, Melnikov V, Chew LY (2019) Fitness preferential attachment as a driving mechanism in bitcoin transaction network. PLoS ONE 14(8): e0219346. https://doi.org/10.1371/journal.pone.0219346
- [7] Kondor D, Pósfai M, Csabai I, Vattay G (2014) Do the Rich Get Richer? An Empirical Analysis of the Bitcoin Transaction Network. PLoS ONE 9(2): e86197. doi:10.1371/journal.pone.0086197
- [8] Maillart T, Sornette D, Spaeth S, von Krogh G. Empirical tests of Zipf’s law mechanism in open source Linux distribution. Phys Rev Lett. 2008 Nov 21;101(21):218701. doi: 10.1103/PhysRevLett.101.218701. Epub 2008 Nov 19. PMID: 19113459.
- [9] Malevergne Y, Pisarenko V, Sornette D. Testing the Pareto against the lognormal distributions with the uniformly most powerful unbiased test applied to the distribution of cities. Phys Rev E Stat Nonlin Soft Matter Phys. 2011 Mar;83(3 Pt 2):036111. doi: 10.1103/PhysRevE.83.036111. Epub 2011 Mar 22. PMID: 21517562.
- [10] del Castillo, Joan, Pedro Puig. “The Best Test of Exponentiality against Singly Truncated Normal Alternatives.” Journal of the American Statistical Association, vol. 94, no. 446, [American Statistical Association, Taylor & Francis, Ltd.], 1999, pp. 529–32, https://doi.org/10.2307/2670173.
- [11] Gatto, R. and Rao Jammalamadaka, S. (2002). A saddlepoint approximation for testing exponentiality against some increasing failure rate alternatives. Statistics & Probability Letters (Vol. 58).
- [12] Didier Sornette, Rama Cont. Convergent Multiplicative Processes Repelled from Zero: Power Laws and Truncated Power Laws. Journal de Physique I, EDP Sciences, 1997, 7 (3), pp. 431-444. doi:10.1051/jp1:1997169. jpa-00247337
- [13] A. Saichev and D. Sornette, Zipf’s law and maximum sustainable growth, Journal of Economic Dynamics and Control 37 (6), 1195-1212 (2013).