Bitcoin, a decentralized electronic currency system, has represented a radical change in financial systems since its creation in 2008 by Satoshi Nakamoto. It was released as open-source software in 2009 on a peer-to-peer network where transactions take place between users without an intermediary. In contrast to the traditional banking system, Bitcoin allows users to avoid operational fees and central authorities prone to fraud and corruption. At the beginning of 2017, the Bitcoin price was under $1,000; it rocketed to nearly $18,000 by the end of 2017 and has since come down to around $11,000. These sudden jumps in price have triggered an explosion of worldwide attention to digital currencies, and they raise questions about the nature of digital currencies and the driving force behind the dramatic rise of the Bitcoin price within such a short time.

Seeing this as a perfect opportunity, I decided to explore Bitcoin and do some forecasting of my own, using R as my platform. The following is my draft paper. Do read through it, and please give me your suggestions.

__Time Series Analysis for Forecasting Bitcoin Price Using ARIMA Models__

__Abstract:-__

As the world’s first decentralized electronic currency system, bitcoin has achieved great success and represents a fundamental change in financial systems. A unique feature of bitcoin is that its price fluctuation doesn’t rely on any institutionalized monetary regulation. To help bitcoin investors with future investment and payment decisions, this project builds a forecasting model from historical bitcoin data. Our data suggests that an ARIMA(2,1,2) model works well, with an AIC of -4195.35 and residuals that pass the white-noise check.

__Introduction:-__

Bitcoin allows people to move away from an intermediary banking system that charges fees, makes mistakes, and is prone to corruption. The goal here is to help potential investors by predicting the price in advance so that they can make informed decisions. For the analysis, historical data is pulled from Kaggle (https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory/data). It consists of three years of bitcoin price data with the variables Date, Open Price, High, Low, Close Price, Volume (number of transactions), and Market Cap. In this paper only the closing price is used as the transaction price.

In this study two ARIMA models are examined: (2,1,0) and (2,1,2). The usual residual assumptions are checked using the Box-Ljung test for white noise in the residuals and the Shapiro test for normality. To compare the models, AIC, parsimony, and the significance of the parameters are checked. In the later part of the paper, bitcoin prices are forecast and reported on the original price scale.

__Methodology:-__

**1.Creating a Stationary Series:**

**Note**: The code is available at (https://github.com/rishanki/Bitcoin_TimeSeries)

The graph of the bitcoin transaction price is shown below. It shows explosive growth from January 2017 through December 2017. The series is clearly not stationary, as it does not have a constant mean or a constant variance free of long-run trends.

To verify this, an Augmented Dickey-Fuller (ADF) test is performed. It gives a p-value of 0.99, which is greater than 0.05, so we retain the null hypothesis that the series is not stationary.

To make the series stationary, the exponentially increasing trend is first controlled by applying a log transformation to the data. The log data is then differenced once to bring the series to a stationary condition. The ADF test is performed again, and this time the p-value is 0.01, which is less than 0.05, so we reject the null hypothesis that the series is not stationary.
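The transformation described above can be sketched in a few lines. This is a minimal Python illustration rather than the R code from the repository, and the prices in `closes` are made-up values, not the Kaggle data. It only shows that differencing the logs turns steady percentage growth into a flat series.

```python
import math

def log_diff(prices):
    """Return the first differences of the log prices (log returns)."""
    logs = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical closing prices growing 10% per step (illustration only).
closes = [1000.0, 1100.0, 1210.0, 1331.0]
returns = log_diff(closes)
print(returns)  # three roughly equal values, each close to log(1.1)
```

An exponentially growing price becomes a straight line after the log, and a constant after one difference, which is why the two steps are applied in that order.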

The differenced log series looks stationary when plotted against the timestamp.

**2.Identifying the p,d,q:**

To identify p, d, and q for the dataset we use the ACF, the EACF, and a BIC plot. The ACF plot shows significant correlations at lags 2, 4, 6, and 8.
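As a rough illustration of how "significant" lags are read off an ACF plot, the sketch below computes the sample autocorrelations and flags those outside the usual approximate band of ±2/√n. It is written in plain Python under the assumption that this band is the one drawn on the plot; the alternating `series` is a toy input, not the bitcoin data.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def significant_lags(x, max_lag):
    """Lags whose autocorrelation falls outside the +/- 2/sqrt(n) band."""
    bound = 2.0 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

# Toy series that flips sign every step (illustration only).
series = [1.0, -1.0] * 50
print(sample_acf(series, 2))
print(significant_lags(series, 4))
```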

The EACF plot shows a triangular pattern of crosses and zeros with its vertex at ARMA(2,2), (4,2), or (6,2).

To further check for the presence of an ARMA model, a BIC plot is created. It selects an AR(6) model, which has the lowest BIC value of 6.7. Taking these diagnostics together, ARIMA(2,1,0) and ARIMA(2,1,2) are fitted to the data.

**3.Fitting the Models:**

- Model 1: ARIMA(2,1,0)
- Model 2: ARIMA(2,1,2)

Assumptions stated: the series is a stationary time series, and the errors e_t satisfy:

- E(e_t) = 0
- Var(e_t) = constant
- Cov(e_t, e_{t-k}) = 0 for k >= 1
- e_t are normally distributed

Residual Analysis:

For model 1 we look at three graphs for residual analysis: a plot of the residuals, and the ACF and PACF of the residuals for the first 20 lags. The plot of residuals doesn’t show any long trends, suggesting zero mean and constant variance. The ACF plot shows significant autocorrelations at lags 6 and 18. A similar pattern can be seen in the PACF plot.

The Box-Ljung test gives a p-value of 0.101, which is greater than 0.05, suggesting that a white noise model for the residuals is reasonable.
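For readers who want to see what the Box-Ljung test computes, here is a minimal Python sketch of the statistic Q = n(n+2) Σ r_k²/(n-k) and its chi-square p-value. Several things here are assumptions for illustration: the residuals are simulated noise rather than the model's residuals, the lag count of 20 and `fitdf = 2` (the number of fitted ARMA coefficients for model 1) are illustrative choices, and the p-value helper uses a closed form that only holds for an even number of degrees of freedom. In R, `Box.test` with `type = "Ljung-Box"` covers the general case.

```python
import math
import random

def ljung_box_q(resid, lags):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n-k)."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [v - mean for v in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

def chi2_upper_tail_even(x, df):
    """P(X > x) for a chi-square variable with an even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Simulated white noise standing in for model residuals (illustration only).
rng = random.Random(42)
resid = [rng.gauss(0.0, 1.0) for _ in range(500)]

lags, fitdf = 20, 2  # assumed settings; df = lags - fitdf = 18
q = ljung_box_q(resid, lags)
p_value = chi2_upper_tail_even(q, lags - fitdf)
print(f"Q = {q:.2f}, p-value = {p_value:.3f}")
```

A p-value above 0.05 means the autocorrelations up to the chosen lag are jointly small enough to be consistent with white noise, which is the reading used in the text.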

For normality, the histogram and Q-Q plot show skewness and heavy tails, with points deviating from the 45-degree line. The Shapiro test output below gives a p-value of 2.2e-16, so we reject the null hypothesis that the residuals are normal.

Similarly, for model 2 we refer to the output below. The plot of residuals doesn’t show any long trends, suggesting zero mean and constant variance. The ACF plot doesn’t show significant autocorrelation up to lag 45.

A similar pattern can be seen in the PACF plot. The Box-Ljung test gives a p-value of 0.34, which is greater than 0.05, suggesting that a white noise model for the residuals is reasonable.

For normality, the histogram and Q-Q plot show skewness and heavy tails, with points deviating from the 45-degree line. The Shapiro test output in (A.9.d) shows a p-value of 2.2e-16, so we reject the null hypothesis that the residuals are normal.

**4.Evaluation and Iteration:**

Overparameterization:

**Significance of additional parameter estimate-**

The added ma1 and ma2 estimates in model 2 are statistically significant, as neither confidence interval contains 0. We are approximately 95% confident that Θ_1 is between (0.8538 - 2*0.257) = 0.3398 and (0.8538 + 2*0.257) = 1.3678, and that Θ_2 is between (0.5961 - 2*0.171) = 0.2541 and (0.5961 + 2*0.171) = 0.9381.
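The interval arithmetic above is simple enough to check directly. The sketch below uses the estimates and standard errors quoted in the text with the rule of thumb "estimate ± 2 standard errors"; the helper names are hypothetical and chosen only for this illustration.

```python
def approx_ci(estimate, std_error, width=2.0):
    """Approximate 95% interval: estimate +/- width * standard error."""
    half = width * std_error
    return estimate - half, estimate + half

def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0

# Estimates and standard errors as quoted in the text for model 2.
ma1_ci = approx_ci(0.8538, 0.257)
ma2_ci = approx_ci(0.5961, 0.171)

for name, ci in (("ma1", ma1_ci), ("ma2", ma2_ci)):
    print(f"{name}: [{ci[0]:.4f}, {ci[1]:.4f}] excludes zero: {excludes_zero(ci)}")
```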

**Existing parameters approximately the same for both models-**

The ar1 estimate is around -0.009 in model 1 and -0.85 in model 2, a substantial change. Similarly, the ar2 estimates for model 1 and model 2 (-0.06 and -0.65) are substantially different. The additional parameters therefore seem to matter for this model.

**AIC-** The AIC is smaller for model 2, at around -4195.32, compared with the model 1 AIC of -4193.32.

**Parsimony-** On parsimony alone we would prefer model 1, as it has only 2 AR components, whereas model 2 adds 2 extra MA components.

**5.Forecasting Price:**

Here we have created the table of predicted values with a 95% confidence interval. These values have been back-transformed from the log scale to the original price scale.
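The back-transformation can be sketched as follows, assuming the forecasts and their standard errors are on the natural-log scale and the interval uses the normal multiplier 1.96. The input values are hypothetical, not output from the fitted models.

```python
import math

def back_transform(log_forecast, log_se, z=1.96):
    """Map a log-scale forecast and standard error to price-scale limits.

    Returns (lower, point, upper). The interval is symmetric on the log
    scale, so it is asymmetric around the point value on the price scale.
    """
    lower = math.exp(log_forecast - z * log_se)
    point = math.exp(log_forecast)
    upper = math.exp(log_forecast + z * log_se)
    return lower, point, upper

# Hypothetical log-scale forecast and standard error (illustration only).
lower, point, upper = back_transform(math.log(4300.0), 0.05)
print(f"lower = {lower:.2f}, point = {point:.2f}, upper = {upper:.2f}")
```

One caveat worth keeping in mind: exponentiating a log-scale forecast gives the median of the implied price distribution rather than its mean, so the point value is slightly below the expected price.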

**ARIMA(2,1,0)-**

**Forecast Values**– Studying the plot, we see that the forecast values very quickly reach a constant value of $4323.67 as we forecast up to 100 future values.

**Forecast Error-** Referring to the plot, the forecast error variance grows larger as the lead time L increases. The limits become wider without bound, and the error variance tends to infinity.

**ARIMA(2,1,2)-**

**Forecast Values**– Studying the plot, we see that the forecast values increase for a short time before reaching an approximately constant value of $4317.89 as we forecast up to 100 future values.

**Forecast Error-** Referring to the plot, the forecast error variance grows larger as the lead time L increases. The limits become wider without bound, and the error variance tends to infinity.
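The unbounded growth of the forecast error variance follows from the differencing term. For an ARIMA model the lead-L error variance is σ² times the sum of the first L squared ψ-weights, and with d = 1 those weights do not die out. The Python sketch below computes them under two assumptions: the MA part is written with plus signs (the convention R's `arima` reports), and σ² is set to 1 as a placeholder, since the fitted innovation variance is not given in the text. The AR coefficients are the rounded estimates quoted above for model 1, and everything is on the log scale.

```python
def psi_weights(ar, ma, d, n):
    """First n psi-weights of an ARIMA(p,d,q) model (MA written with + signs)."""
    # AR polynomial as coefficients of 1 - phi_1 B - phi_2 B^2 - ...
    poly = [1.0] + [-a for a in ar]
    for _ in range(d):
        # Multiply by (1 - B) once per difference.
        poly = [c - p for c, p in zip(poly + [0.0], [0.0] + poly)]
    phi_star = [-c for c in poly[1:]]
    psi = [1.0]
    for j in range(1, n):
        theta_j = ma[j - 1] if j <= len(ma) else 0.0
        recursion = sum(
            phi_star[i - 1] * psi[j - i]
            for i in range(1, min(j, len(phi_star)) + 1)
        )
        psi.append(theta_j + recursion)
    return psi

def forecast_variances(ar, ma, d, sigma2, horizon):
    """Forecast error variance for lead times 1..horizon."""
    variances, running = [], 0.0
    for weight in psi_weights(ar, ma, d, horizon):
        running += weight * weight
        variances.append(sigma2 * running)
    return variances

# Rounded model 1 estimates from the text; sigma2 = 1.0 is a placeholder.
model1 = forecast_variances([-0.009, -0.06], [], 1, 1.0, 100)
print(model1[0], model1[9], model1[99])
```

For comparison, a stationary AR(1) with coefficient 0.5 has ψ-weights 1, 0.5, 0.25, and so on, so its variance levels off, while a pure random walk has all weights equal to 1 and a variance that grows linearly with the lead time.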

__Results:__

From the tests conducted above to find the best model between the AR and ARMA specifications, we can support the conclusion that model 2 performs better. Its residual analysis gives a higher Box-Ljung p-value, indicating residuals closer to white noise, and no significant autocorrelations at least up to lag 100. When we test for overparameterization, we find that the MA estimates are significant to the model. It also improved the AIC of the model from -4194 to -4195. For the forecast values, we can observe from our data that for both models the actual values lie within the 95% confidence interval generated by the model. Hence, these intervals seem reasonable.

__Conclusions:__

In this study we consider ARIMA(2,1,2) our champion model, as justified in the sections above. We have used it to forecast future bitcoin prices, and from the output we can see that the actual values lie within the model’s 95% confidence interval for the predictions. We have back-transformed the log data to get actual price values. The forecasts increase for some time before settling at a constant value. Our model will help in forecasting the next-day price of bitcoin, which should be helpful to people investing their hard-earned money in bitcoin.

__Future Scope:__

As bitcoin is a volatile investment, it could be severely impacted by other financial assets, macroeconomic trends, or social media and search trends such as Twitter activity and Google searches. In future studies we would like to include these variables in our model, try to find new patterns, and work on seasonality to see whether we can improve our model even marginally and whether these components are useful for bitcoin price prediction.
