This paper was presented at SAS Global Forum 2018 in Denver. Special thanks to my team for making this happen. We were selected as one of the top 8 teams out of more than 60 participating teams in the student symposium competition. Our team was named “The number Cruncher”. You can find the proceedings here: https://www.sas.com/en_us/events/sas-global-forum/program/awards-academic-programs.html#student-symposium
Happy reading! Please feel free to drop a like or comment at the end of the article.
As the world’s first decentralized electronic currency system, Bitcoin has achieved great success and represents a fundamental change in financial systems. A distinctive feature of Bitcoin is that its price fluctuation depends mostly on public opinion rather than on institutionalized monetary regulation. Therefore, understanding the interplay between social media and the value of Bitcoin is crucial for Bitcoin price prediction. In this study, related comments posted on the Bitcoin forum were analyzed, and concept links among the extracted keywords of interest were developed. Text mining analysis yielded five major clusters that give insight into Bitcoin users’ opinions. Furthermore, the cross-correlations between Bitcoin price fluctuation and web search (Google Trends) and social media (Twitter) were examined. To support Bitcoin investors’ future investment decisions, forecasting models were built, and their effectiveness was validated by comparing AIC and MAPE values. The forecasting model using both Bitcoin daily transaction data and Google Trends data was selected for its better performance.
Bitcoin, a decentralized electronic currency system, has represented a radical change in financial systems since its creation in 2008 by Satoshi Nakamoto. It was released as open-source software in 2009 on a peer-to-peer system where transactions take place between users without an intermediary. In contrast to the traditional banking system, Bitcoin allows users to move away from operational fees and from authorities prone to fraud and corruption. At the beginning of 2017, the Bitcoin price was under $1,000; it has since rocketed to nearly $14,000, a roughly 14-fold gain. This unprecedented jump has triggered an explosion of worldwide attention to digital currencies. It has also raised questions about the nature of digital currencies and the driving force behind such a dramatic rise within a short time.
Previous studies have shown that a distinctive feature of Bitcoin is that its price fluctuation depends mostly on public opinion rather than on institutionalized monetary regulation. Twitter, as one of the major social media platforms, gathers perspectives from people worldwide. As an example, at the end of November 2017, Warren Buffet, an influential figure in finance, tweeted his optimistic opinion of the cryptocurrency world and offered to send 1 $BTC to everyone who retweeted his post if Bitcoin hit $12,500 by the next day. This positive post on Warren Buffet’s social media, together with other optimistic perspectives from various platforms, greatly boosted Bitcoin’s appeal. Based on these previous findings, understanding the interplay between social media and the value of Bitcoin is crucial to understanding the key factors behind its price fluctuation.
The present study examines the correlation between social media activity, measured by the volume of Tweets, and Bitcoin price fluctuation. The impact of web search data, measured by Google Trends, on Bitcoin price was also explored. To gain insight into user opinions formed around Bitcoin, user comments from an online forum were extracted and analyzed. To support Bitcoin investors’ future investment decisions, forecasting models with different properties were proposed and compared.
Four different data sources were used during the study (Appendix A):
- Historical BTC/USD
- Bitcoin forum
- Twitter
- Google Trends
Figure 1. Project Flow
TEXT MINING–USER OPINIONS
With the increasing popularity of Bitcoin, a growing number of Bitcoin users share information on online forums. Bitcoin Forum (https://Bitcointalk.org/), one of the popular online communities for Bitcoin users, provides a platform for discussing topics such as Bitcoin mining, development, technical issues and the general Bitcoin ecosystem. All comments under the general Bitcoin Discussion section were gathered, and the terms associated with key concepts were explored. To understand users’ major concerns when they talk about Bitcoin, the large volume of user comments was grouped into several major clusters through text mining analysis using the SAS Enterprise Miner text mining node.
I. CONCEPT LINKS OF KEYWORDS OF INTEREST
In this part of the study, a characteristic is treated as a concept describing a certain phenomenon or subject. A concept is constructed from a set of keywords with related meanings. The interactive text filter shows that the concept ‘Bitcoin’ is strongly associated with terms such as profit, future, increase, popular, country, ban, value and China (Appendix B Fig. B.2). The words constituting the concept ‘China’ are Asia, Russia, contributor, big, fail, ban, effect and country (Appendix B Fig. B.3).
II. TEXT CLUSTERS
There are 5 major clusters identified from Bitcoin forum’s user comments through text mining. The content and frequency of the relevant clusters are summarized as follows:
Figure 2. Bitcoin Forum User Opinion Clusters Summation
The text cluster results show that the perception of Bitcoin is the most discussed topic among users. The investment value of Bitcoin and the legality of cryptocurrency in general also attracted considerable attention among forum users. The impact of China, as a large economy, on Bitcoin price fluctuation forms another major topic in the forum discussion. Security issues related to Bitcoin and other cryptocurrencies also raise concerns among users.
TWITTER vs. BITCOIN PRICE
Topics of public interest are now widely discussed on social media such as Twitter. The emergence of Bitcoin, a digital currency that needs no middlemen in transactions, has drawn tremendous public attention because of the recent jump in its price. At the beginning of 2017, the Bitcoin price was under $1,000; it has since rocketed to nearly $14,000, a roughly 14-fold gain. Many people wanted to know what Bitcoin is, why it experienced such a dramatic rise within a short time and, above all, how they could buy it and make a quick profit. The graph below shows the trend of Bitcoin price and the number of Tweets over the last month.
Figure 3. Trend Pattern Comparison Between Bitcoin Price and Tweets Volume
As shown in Figure 3, the two lines follow a similar trend: the higher the price, the larger the number of Tweets. A correlation test was performed to check this visual impression. The Pearson test in Fig. C.1 (p-value less than 0.05) shows a strong positive correlation of 0.86 between Bitcoin price and the volume of Tweets. The test supports the observation that the higher the Bitcoin price is, the more people talk about it on social media.
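For readers who want to reproduce this kind of check outside SAS, here is a minimal sketch of the Pearson coefficient and its t statistic in plain Python. The function names and the toy numbers are assumptions made for illustration only; they are not the study's data and will not reproduce the reported 0.86.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical toy values, NOT the study's data.
price = [9800.0, 10100.0, 10900.0, 11500.0, 11200.0, 12600.0, 13800.0]
tweets = [41000, 45000, 52000, 60000, 55000, 70000, 83000]

r = pearson_r(price, tweets)
t = t_statistic(r, len(price))
print(f"r = {r:.3f}, t = {t:.2f} on {len(price) - 2} degrees of freedom")
```

The p-value is obtained by comparing the t statistic with a Student t distribution; that lookup is left out here to keep the sketch free of third-party libraries.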
GOOGLE TRENDS vs. BITCOIN PRICE
In this part of the study, the correlation between web search data and Bitcoin price was examined. Figure 4 suggests a lead-lag relationship between the Bitcoin price and the web search component, Google Trends. The maximum correlation is at lag 0, followed by lag -1 and then lag -2, as displayed in Figure 5. The correlation is positive, implying that when Google search volume increases on a given day, the Bitcoin price tends to increase the next day. To test this observation statistically, the Granger causality test was employed. The output of the test (Appendix D Fig. D.22) indicates that Google Trends Granger-causes the Bitcoin price, that is, past search volume helps predict the price over time.
Figures 4 and 5. Cross-Correlation Function between Google Trends and Bitcoin Price
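The lag analysis can be illustrated with a small sketch that correlates one series against shifted copies of the other. The sign convention here is an assumption of this example (a positive lag k means the first series leads the second by k steps) and may not match the convention in the SAS output. The toy series are hypothetical and built so that the second one simply repeats the first one a day later.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)

def lagged_corr(x, y, k):
    """Correlation between x[t] and y[t + k].

    k > 0 means x leads y by k steps; k < 0 means y leads x.
    Means are taken over the overlapping window only, so values can
    differ slightly from a textbook CCF estimator.
    """
    if abs(k) > len(x) - 3:
        raise ValueError("lag too large for the series length")
    if k >= 0:
        xs, ys = x[:len(x) - k], y[k:]
    else:
        xs, ys = x[-k:], y[:len(y) + k]
    return pearson(xs, ys)

# Hypothetical toy series: "price" repeats "trends" one day later.
trends = [3, 5, 4, 8, 6, 9, 7, 12, 10, 14]
price = [2] + trends[:-1]

ccf = {k: lagged_corr(trends, price, k) for k in range(-3, 4)}
best_lag = max(ccf, key=ccf.get)
for k in sorted(ccf):
    print(f"lag {k:+d}: {ccf[k]:.3f}")
print("strongest correlation at lag", best_lag)
```

A peak at a non-zero lag only shows that one series tends to move before the other; a formal test such as Granger causality is still needed before treating one as a predictor.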
TIME SERIES MODELING
A series of autoregressive models were fitted to the stationary data for model comparison. An ARMA(p,q) model could have been used to model returns data. However, in this project the original data were modeled using ARIMA(p,d,q) models, where the differencing order d handles the non-stationarity.
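The difference between the two routes can be sketched in a few lines: an ARMA model on returns works with a transformed series, while an ARIMA model with d = 1 differences the original prices internally. The helper names and the toy prices below are assumptions for illustration, not part of the study.

```python
import math

def difference(series, d=1):
    """Apply d rounds of first differencing: y[t] - y[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def undifference(first_value, diffs):
    """Rebuild the levels from a starting level and first differences (d = 1)."""
    levels = [first_value]
    for step in diffs:
        levels.append(levels[-1] + step)
    return levels

def log_returns(series):
    """Log returns, the usual input when an ARMA model is fitted to returns."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]

# Hypothetical toy prices, NOT the study's data.
prices = [4200, 4350, 4300, 4600, 4800]

diffs = difference(prices)           # what ARIMA(p,1,q) models internally
rebuilt = undifference(prices[0], diffs)
print("first differences:", diffs)
print("rebuilt levels:   ", rebuilt)
print("log returns:      ", [round(v, 4) for v in log_returns(prices)])
```

Because differencing is reversible given the starting level, forecasts made on the differenced scale can be accumulated back into price forecasts.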
Figure 6. Bitcoin Time Series Model Comparison—Fits Statistics
The summary table (Figure 6) shows that the ARIMAX(2,1,0) model with Google Trends as an independent variable (X) has the lowest AIC and MAPE values. General assumptions of time series models, such as stationarity, normality and significance of parameters, were examined and satisfied. A detailed description of the output of each model is given in Appendix D.
The ARIMAX(2,1,0) model was selected for Bitcoin price forecasting, based on the lowest AIC and MAPE. Using SAS Studio, we selected the two weeks from Oct 4th 2017 to Oct 18th 2017 as a 14-day holdout sample for validation.
Figure 7. Comparison of Actual and Predicted Bitcoin Price
To evaluate the prediction result, the forecasted price was compared with the actual price. The model is not used to forecast further ahead because ARIMA forecasts start approaching the mean value as the horizon grows. A detailed description of the forecasting procedure is given in Appendix D.
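The two selection criteria are simple to compute once a model's log-likelihood and forecasts are available. The sketch below uses the standard definitions, AIC = 2k - 2 ln L and MAPE as the mean of |actual - forecast| / |actual| in percent. The log-likelihoods and parameter counts are hypothetical placeholders, not the values in Figure 6.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (lower is better)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical placeholder fits, NOT the values reported in the paper.
candidates = {
    "ARIMA(2,1,0)": {"loglik": -812.4, "k": 3},
    "ARIMAX(2,1,0) + Google Trends": {"loglik": -805.1, "k": 4},
}

scores = {name: aic(m["loglik"], m["k"]) for name, m in candidates.items()}
best = min(scores, key=scores.get)
for name, value in scores.items():
    print(f"{name}: AIC = {value:.1f}")
print("lowest AIC:", best)

# MAPE on a tiny hypothetical holdout sample.
print("MAPE:", round(mape([100.0, 200.0], [110.0, 180.0]), 2), "%")
```

AIC penalizes the extra parameter of the exogenous term, so the model with Google Trends wins only if its likelihood improves enough to offset that penalty.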
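Why long-horizon forecasts flatten can be seen by iterating the autoregressive recursion by hand. In an ARIMA(2,1,0) model the daily changes follow an AR(2) process, so each forecast change is a weighted sum of the previous two. With stationary coefficients, the forecast changes decay toward their mean. The coefficients and starting values below are hypothetical, the mean change is assumed to be zero, and the Google Trends term is left out, so this is only a simplified sketch of the mechanism.

```python
def ar2_forecast(last_level, last_two_diffs, phi1, phi2, horizon):
    """Iterate an AR(2) recursion on first differences and accumulate levels.

    last_two_diffs is (older, most_recent). No drift and no exogenous
    input are modeled; this is an illustrative simplification.
    """
    prev2, prev1 = last_two_diffs
    level = last_level
    levels, diffs = [], []
    for _ in range(horizon):
        step = phi1 * prev1 + phi2 * prev2   # forecast of the next change
        level += step                        # accumulate back to a price level
        diffs.append(step)
        levels.append(level)
        prev2, prev1 = prev1, step
    return levels, diffs

# Hypothetical coefficients and starting point, NOT fitted values.
levels, diffs = ar2_forecast(
    last_level=5000.0, last_two_diffs=(120.0, 80.0),
    phi1=0.5, phi2=0.2, horizon=30,
)
for h in (1, 2, 5, 14, 30):
    print(f"h={h:2d}: change {diffs[h - 1]:8.3f}, level {levels[h - 1]:9.2f}")
```

After about two weeks the forecast changes are already small in this toy setting, which is consistent with restricting the evaluation to a short horizon.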
With Bitcoin’s recent breakthrough of the $10,000 barrier, its acceptance and popularity have drawn much attention. The present study is noteworthy in that three major perspectives were considered in understanding the key driving factors behind Bitcoin’s price fluctuation, especially the impact of both web search trends and social media. User opinions were identified from text analysis of Bitcoin forum comments, which shows that Bitcoin perception, investment value, cryptocurrency legality, China and security are the five major concerns among users. Both Google Trends and Twitter volume are correlated with Bitcoin’s price fluctuation. Specifically, Twitter volume and Bitcoin price show a strong positive correlation. In addition, the proposed forecasting model using the web search component, Google Trends, as an extra predictor yields better Bitcoin price prediction performance, with a MAPE of 1.07%.
For future studies, we would like to explore the Twitter data over a longer time frame. Additionally, we plan to examine the correlation coefficient between keywords of interest and Bitcoin price. Along these lines, the contribution of these keywords to Bitcoin price prediction would be worth investigating. Furthermore, we expect to combine different time series models with special events added to achieve better prediction accuracy. We will also consider examining the effect of other financial assets on the Bitcoin price so that we can control for the impact of economic trends.
- Nakamoto S. Bitcoin: A peer-to-peer electronic cash system. 2008.
- Li N, Wu DD. Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decision support systems. 2010;48(2):354–68.
- Juea W, Jian-pinga Z, Bao-huab Z, Cheng-ronga W. Online Forum Opinion Leaders Discovering Method Based on Clustering Analysis [J]. Computer Engineering. 2011;5:017.
- Kim YB, Lee SH, Kang SJ, Choi MJ, Lee J, Kim CH. Virtual world currency value fluctuation prediction system based on user sentiment analysis. PloS one. 2015;10(8):e0132944. pmid:26241496
- Kim YB, Lee SH, Kang SJ, Choi MJ, Lee J, Kim CH. When Bitcoin encounters information in an online forum: Using text mining to analyse user opinions and predict value fluctuation. PloS one. 2017;12(5):e0177630.
- Matta M, Lunesu I, Marchesi M, editors. Bitcoin Spread Prediction Using Social and Web Search Media. UMAP Workshops; 2015.
- Bitcoin Wiki, Available from https://en.Bitcoin.it/wiki/Main_Page
- Hodgson, C. Retrieved from Business Insider: http://www.businessinsider.com/the-worlds-unbanked-population-in-6-charts-2017-8
- Kevin Lu. What is Bitcoin’s correlation with other financial assets? https://www.signalplot.com/what-is-bitcoins-correlation-with-other-financial-assets/