Abstract: This study investigates the influence of sentiment analysis on predicting Bitcoin returns by integrating sentiment data from various news sources with historical Bitcoin prices. Utilizing Long Short-Term Memory (LSTM) modeling for sentiment classification, we analyze 1,826 Google News articles, categorizing sentiments as positive, neutral, or negative. We then employ Support Vector Regression (SVR) to predict Bitcoin log returns, incorporating sentiment features derived from the sentiment analysis. Our findings show that positive sentiment correlates with higher returns, highlighting investor optimism's role in market performance. In contrast, short-term sentiment fluctuations, especially changes from the previous day, negatively impact returns, reflecting investors' preference for stability. The model incorporating these features reduces MSE and MAE while raising R-squared, with MSE falling by approximately 13%. These findings highlight the critical role of market sentiment in financial forecasting, providing valuable insights for investors in the cryptocurrency landscape. All analysis is conducted in R and Python.
Key Words: Bitcoin; Sentiment Analysis; Statistical Modeling; Financial Forecasting
Contents
1. Introduction
2. Data
2.1 Data Collection
2.2 Data Cleaning
3. Methodology
3.1 Sentiment Analysis
3.1.1 LSTM Modeling
3.1.2 LSTM Results
3.1.3 Sentiment Analysis of Google News Articles
3.2 Feature Engineering
3.2.1 Bitcoin Features
3.2.2 Sentiment Features
3.3 Bitcoin Log Return Prediction
3.3.1 Support Vector Regression Modeling
3.3.2 Support Vector Regression Results
3.3.3 Support Vector Regression Interpreting
4. Conclusion
1 Introduction
The rapid growth of cryptocurrencies has sparked significant interest in understanding the factors that influence their market dynamics. As digital assets like Bitcoin continue to gain traction, the interplay between market sentiment and price fluctuations becomes increasingly crucial. Recent studies suggest that news headlines and social media sentiments can profoundly impact investor behavior and, consequently, market prices. By analyzing sentiment data from various news sources, specifically Google News, Crypto News, Crypto Potato, and Coin Telegraph, alongside historical Bitcoin prices, this research aims to establish a predictive model that leverages sentiment analysis to enhance the accuracy of Bitcoin price forecasts. Through the integration of Natural Language Processing (NLP) techniques and Support Vector Regression, this study seeks to contribute to the emerging field of financial technology, offering valuable insights for investors and researchers alike.
2 Data
2.1 Data Collection
This study draws on three distinct datasets to analyze the interplay between cryptocurrency news sentiment and Bitcoin pricing.
The first dataset consists of cryptocurrency news reviews, with fields for title, text, source, subject, and a sentiment label (positive, neutral, or negative) assigned by TextBlob. The sources are Crypto News, Crypto Potato, and Coin Telegraph. Because it pairs the raw review text, covering Bitcoin and other cryptocurrencies, with TextBlob's sentiment classification, this dataset serves as the training set for the LSTM sentiment analysis model in the study.
The second dataset is collected through web scraping from the first page of Google News for various cryptocurrencies over several years. In this study, articles from this dataset are classified using the trained LSTM model, creating sentiment features that will be integrated into the time series model to aid in predictions.
Lastly, the third dataset provides a comprehensive overview of Bitcoin's trading activity, detailing daily market statistics from July 27, 2010, to May 22, 2024. It captures essential fields such as opening, closing, high, and low prices, trading volume, and market capitalization. This dataset is used to obtain market data for Bitcoin and to establish Bitcoin's feature engineering.
2.2 Data Cleaning
I convert the sentiment categories in the first dataset into numeric values: positive to 2, neutral to 1, and negative to 0. Only reviews about Bitcoin are retained; reviews about other cryptocurrencies are dropped.
The second dataset includes headlines from Google News within the time range of February 26, 2018, to February 25, 2023, based on the daily closing time.
Only the data from the third dataset within the time range of February 26, 2018, to February 25, 2023, will be retained (based on daily closing time) to align with the time frame of the Google News text in the second dataset.
3 Methodology
3.1 Sentiment Analysis
3.1.1 LSTM Modeling
For sentiment analysis, the Long Short-Term Memory (LSTM) model was chosen due to its ability to capture sequential dependencies within the textual data, especially for financial headlines, where temporal sentiment trends are crucial.
For training, the labeled dataset of financial reviews (positive=2, neutral=1, negative=0) was first cleaned, tokenized, and padded to a fixed length of 256 tokens. Text sequences were tokenized using a Keras Tokenizer with a vocabulary size of 1000, then padded to a uniform input length, which the LSTM architecture requires. The padding length of 256 tokens was chosen based on the distribution of sentence lengths in the training data. An 80/20 split divided the data into training and test sets, with a further 90/10 split of the training portion for validation during training. The temporal order of the data was preserved when splitting so that the model's predictive ability aligns with real-world time-series applications. The model was trained for 5 epochs with a batch size of 64, and early stopping was employed to prevent overfitting once validation loss plateaued.
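The preprocessing steps above can be sketched in plain Python. This is a simplified stand-in, not the Keras Tokenizer itself (whose filtering and indexing rules differ); the function names are hypothetical, and reserving id 0 for padding is an assumption. The left-padding and front-truncation mirror the defaults of Keras `pad_sequences`, though the source does not say which settings were used.

```python
from collections import Counter


def build_vocab(texts, vocab_size=1000):
    """Map the vocab_size most frequent words to ids 1..vocab_size (0 = padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(vocab_size)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=1)}


def encode(text, vocab):
    """Convert a text to a list of ids, dropping out-of-vocabulary words."""
    return [vocab[word] for word in text.lower().split() if word in vocab]


def pad_sequence(token_ids, max_len=256, pad_id=0):
    """Left-pad, or keep only the last max_len ids, so the length is exactly max_len."""
    trimmed = token_ids[-max_len:]
    return [pad_id] * (max_len - len(trimmed)) + trimmed


def chronological_split(items, test_frac=0.2, val_frac=0.1):
    """Split an ordered sequence into train/validation/test without shuffling."""
    n_test = round(len(items) * test_frac)
    train_val, test = items[: len(items) - n_test], items[len(items) - n_test:]
    n_val = round(len(train_val) * val_frac)
    train, val = train_val[: len(train_val) - n_val], train_val[len(train_val) - n_val:]
    return train, val, test
```

With ids 1 to 1000 for words and 0 for padding, an embedding layer needs 1001 input indices, which is one possible reading of the 1000 versus 1001 figures in this section.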
Parameters in the LSTM model training process included:
Epochs: 5 epochs of training
Batch Size: 64
Optimizer: Adam
Loss Function: Categorical cross-entropy, suited for multi-class sentiment classification
The LSTM model consists of an embedding layer with a vocabulary size of 1001, followed by a spatial dropout layer and an LSTM layer with 100 units to capture temporal patterns. A dense layer with 3 units and a softmax activation provided class probabilities. The model was compiled with the categorical cross-entropy loss function, Adam optimizer, and accuracy as the evaluation metric.
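To make the output layer and loss concrete, the following minimal sketch shows how three raw output scores become class probabilities through softmax, and how categorical cross-entropy scores them against a one-hot label. The raw scores are hypothetical values chosen for illustration, and the helper names are not taken from the study's code.

```python
import math

# Label encoding used in this study: 0 = negative, 1 = neutral, 2 = positive.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes=3):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))


probs = softmax([0.2, -1.0, 1.5])  # hypothetical raw outputs of the dense layer
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
loss = categorical_cross_entropy(one_hot(2), probs)  # true label: positive
```

The loss is small when the probability assigned to the true class is close to 1 and grows as that probability shrinks, which is what the optimizer drives down during training.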
3.1.2 LSTM Results
The LSTM model achieved a test accuracy of approximately 68.05%, indicating a moderate performance in sentiment classification. During training, the model demonstrated an upward trend in accuracy, reaching about 80.67% by the end of five epochs, while the validation accuracy peaked at 73.93%. The loss metrics showed a consistent decline throughout training, suggesting effective learning; however, validation loss did not significantly improve in the later epochs. The confusion matrix revealed that the model faced challenges with the "negative" class, attaining a precision of 70% and a recall of around. In contrast, the "positive" class performed better, with a precision of 78% and a recall of 71%. These results highlight the need for improvement, especially in distinguishing negative sentiments.
Fig 1. Loss over Time: This plot shows the training and validation loss across epochs. A decreasing trend in both lines indicates that the model is learning effectively, while any divergence might suggest overfitting.
Fig 2. First Accuracy Plot: This shows the training accuracy and validation accuracy from the initial model training. It captures how well the model performed on the training data and how it generalized to unseen validation data during those epochs.
Fig 3. Second Accuracy Plot: This comes from the second training phase where the model is trained again with different parameters or more epochs. It illustrates a new set of training and validation accuracies, allowing you to compare improvements from the first training cycle. This plot helps assess whether the model has benefited from adjustments made in this second run.
Fig 4. Confusion Matrix: This matrix visualizes the performance of the model on the test set by displaying the true versus predicted classifications for each sentiment label (negative, neutral, positive). It helps identify which classes are being misclassified and to what extent.
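Per-class precision and recall, as quoted in the results above, follow directly from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; the matrix values are invented for illustration and are not the study's actual counts.

```python
def precision_recall(confusion):
    """Return {class: (precision, recall)} for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    size = len(confusion)
    result = {}
    for k in range(size):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(size))
        actually_k = sum(confusion[k])
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        result[k] = (precision, recall)
    return result


def accuracy(confusion):
    """Share of all samples that lie on the diagonal."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts; class order is negative (0), neutral (1), positive (2).
example = [
    [50, 30, 20],
    [10, 70, 20],
    [5, 15, 80],
]
scores = precision_recall(example)
overall = accuracy(example)
```

A class with acceptable precision but low recall, as in the first row of this toy matrix, is one the model rarely predicts even though it is usually right when it does.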
3.1.3 Sentiment Analysis of Google News Articles
With the LSTM model trained and evaluated, we applied it to articles from Google News to create sentiment features that are added to the time series for enhanced prediction. For this dataset of 1,826 chronologically ordered Google News articles, we plotted the sentiment classification frequency chart:
Fig 5. Sentiment Classification Frequency Chart of Google News Articles: The frequency of positive articles is significantly high, reaching 84%, while neutral and negative articles each account for 8% of the total.
As a comparison, I plotted the sentiment classification frequency chart for the training data drawn from Crypto News, Crypto Potato, and Coin Telegraph.
Fig 6. Sentiment Classification Frequency Chart of Articles from Crypto News, Crypto Potato and Coin Telegraph: The frequency of positive articles is the highest, reaching 45.1%, neutral articles 32.7%, negative articles 22.2%.
It is evident that the sentiment frequencies of articles from Google News differ markedly from those of the other three websites. To verify that the sentiment proportions of articles from different sources are significantly different, we conducted the following chi-square test:
Null Hypothesis: the source of a news article and its sentiment label are independent, meaning there is no significant association between the source of the articles and the sentiment of the articles.
Alternative Hypothesis: the sentiment proportions of news articles from different sources are not independent, indicating a significant association between the source and the sentiment label.
The results of the chi-square test are as follows:
The chi-square test yielded a very small p-value, indicating that we reject the null hypothesis. Thus, we have sufficient evidence to conclude that there is a significant association between the source and the sentiment label.
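The test statistic behind this conclusion can be sketched as follows. The counts are hypothetical (per 100 articles, loosely shaped like the proportions in Fig 5 and Fig 6) and are not the study's contingency table. For a 2 x 3 table there are 2 degrees of freedom, in which case the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed here.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 dof."""
    return math.exp(-statistic / 2.0)


# Hypothetical counts. Rows: Google News, training sources.
# Columns: positive, neutral, negative.
observed = [
    [84, 8, 8],
    [45, 33, 22],
]
statistic, dof = chi_square_independence(observed)
p_value = p_value_two_dof(statistic)
```

Even with only 100 articles per source, proportions this far apart give a very small p-value, which is consistent with the rejection of the null hypothesis reported above.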
3.2 Feature Engineering
We create five features related to Bitcoin and four features related to sentiment, with Log_Return serving as our target variable to measure daily Bitcoin returns. The SVR model is first trained on the Bitcoin features alone, and its accuracy is evaluated. The model then undergoes a second training phase that incorporates the sentiment features. We anticipate that the predictive accuracy in the second training phase will improve on the first, demonstrating the impact of sentiment features on the predictions. Missing values resulting from taking logarithms, calculating ratios, and creating long-term window data are removed, as they make up only 0.01% of the total dataset.
3.2.1 Bitcoin Features
For the Bitcoin features, I created five variables: Log_Return, Market_Cap_Change_Rate, Price_Range, Volume_Change_Rate, and SMA_30. Among these, Log_Return is our target variable, which measures Bitcoin's daily returns.
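Log_Return is our target variable, measuring Bitcoin's daily returns, calculated as $\text{Log\_Return}_t = \ln(P_t / P_{t-1})$, where $P_t$ is the Bitcoin price on day $t$.
Price_Range captures the difference between the highest and lowest prices for the day.
Volume_Change_Rate measures the percentage change in trading volume compared to the previous day, calculated as: $\text{Volume\_Change\_Rate}_t = (V_t - V_{t-1}) / V_{t-1}$, where $V_t$ is the trading volume on day $t$.
Market_Cap_Change_Rate measures the percentage change in market capitalization compared to the previous day, calculated as: $\text{Market\_Cap\_Change\_Rate}_t = (M_t - M_{t-1}) / M_{t-1}$, where $M_t$ is the market capitalization on day $t$.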
SMA_30 represents the 30-day simple moving average of Bitcoin prices. This window feature provides insight into longer-term trends.
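The Bitcoin features defined above can be sketched with the standard library alone. Two details are assumptions, since the source does not state them: that the price series is the daily closing price, and that SMA_30 is a trailing window ending on the current day. Function names are hypothetical. Days with no previous value or an incomplete window are returned as None, matching the missing values the study drops.

```python
import math


def log_returns(prices):
    """ln(P_t / P_{t-1}) for each day; the first day has no previous price."""
    if not prices:
        return []
    return [None] + [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def change_rate(values):
    """(x_t - x_{t-1}) / x_{t-1}; None where the previous value is missing or zero."""
    if not values:
        return []
    rates = [None]
    for prev, cur in zip(values, values[1:]):
        rates.append((cur - prev) / prev if prev else None)
    return rates


def price_range(highs, lows):
    """Daily high minus daily low."""
    return [high - low for high, low in zip(highs, lows)]


def simple_moving_average(values, window=30):
    """Trailing mean over `window` days; None until a full window is available."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window: i + 1]) / window)
    return averages
```

For example, `change_rate` applied to the volume column gives Volume_Change_Rate and applied to the market capitalization column gives Market_Cap_Change_Rate.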
3.2.2 Sentiment Features
For the sentiment features, I created four variables: sentiment, sentiment_change, sentiment_ma_7d, and sentiment_std_7d.
Sentiment represents the sentiment attitude for the day, categorized as positive, neutral, or negative, with values of 2, 1, and 0, respectively.
Sentiment_change reflects the instantaneous change in sentiment compared to the previous day, calculated as today's sentiment minus yesterday's sentiment.
Sentiment_ma_7d is the mean sentiment over the current day and the three days before and after (a total of seven days). This window feature examines how sentiment over a slightly longer period influences Bitcoin's daily returns.
Sentiment_std_7d represents the standard deviation of sentiment over the same seven-day window, helping to analyze how the variability in sentiment affects Bitcoin's daily returns.
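A minimal sketch of the sentiment features follows, using the centred seven-day window described above (three days before and three days after the current day). The use of the sample standard deviation is an assumption, as the source does not say which estimator was used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def sentiment_change(scores):
    """Today's sentiment minus yesterday's; the first day has no previous value."""
    if not scores:
        return []
    return [None] + [cur - prev for prev, cur in zip(scores, scores[1:])]


def centered_window_stats(scores, half_width=3):
    """Mean and sample standard deviation over a centred window.

    With half_width=3 the window covers seven days. Days too close to either
    end of the series to have a full window are returned as None.
    """
    means, stds = [], []
    for i in range(len(scores)):
        start, end = i - half_width, i + half_width
        if start < 0 or end >= len(scores):
            means.append(None)
            stds.append(None)
        else:
            window = scores[start: end + 1]
            means.append(mean(window))
            stds.append(stdev(window))
    return means, stds


# Hypothetical daily sentiment codes (2 = positive, 1 = neutral, 0 = negative).
daily_sentiment = [2, 2, 1, 0, 2, 2, 2, 1]
changes = sentiment_change(daily_sentiment)
ma_7d, std_7d = centered_window_stats(daily_sentiment)
```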
3.3 Bitcoin Log Return Prediction
3.3.1 Support Vector Regression Modeling
We choose the SVR (Support Vector Regression) model for time series modeling due to its ability to effectively handle nonlinear relationships and its robustness against noise. The model's flexibility in capturing data features through appropriate kernel functions and its excellent performance in high-dimensional data further support its use. To address the model's limitations in accounting for time series characteristics, we include long-term window data, such as SMA_30, Sentiment_ma_7d and Sentiment_std_7d, and ensure that the training and testing sets are split in chronological order.
Another important reason for choosing the SVR model is that it has low requirements for time series stationarity, making it suitable for our data. After conducting the ADF test on all independent variables, we found that two of them, Price_Range and SMA_30, are not stationary. A p-value of 0 in the table indicates that the p-value is so small that the computer rounds it to zero. The results of the ADF test, with the random seed set to 42, are as follows:
Fig 7. The time series plot for Price_Range
Fig 8. The time series plot for SMA_30
To verify whether the inclusion of sentiment features enhances predictive accuracy, we first predict Log_Return using only the Bitcoin features and evaluate the results. Following this, we retrain the model by incorporating both Bitcoin and sentiment features. We anticipate that the second model will demonstrate improved performance in terms of MSE, MAE, and R-squared metrics. Both models are trained using the same training and testing sets, employing the SVR model with a linear kernel. The data is split chronologically, with the first 80% designated as the training set and the remaining 20% as the testing set.
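The three evaluation metrics can be written out explicitly, which also shows why R-squared can be negative: it compares the model's squared error with that of a constant prediction equal to the mean of the evaluated targets, so a model that does worse than that baseline scores below zero. This is a generic sketch with hypothetical function names, not the study's evaluation code.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Predicting the mean for every observation gives an R-squared of exactly 0, so a small positive value means the model explains slightly more variance than that baseline.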
3.3.2 Support Vector Regression Results
The performance of the first SVR model is indicated by the following metrics: the Mean Squared Error (MSE) is 0.00111, the Mean Absolute Error (MAE) is 0.0259, and the R-squared (R²) value is -0.0563. These results indicate that the model may not effectively capture the underlying patterns in the data, as evidenced by the negative R² value. Using Recursive Feature Elimination (RFE), the selected features include Market_Cap_Change_Rate, Price_Range, Volume_Change_Rate, and SMA_30, all of which received a ranking of 1. The coefficients for these features are as follows:
Fig 9. The feature coefficients of features in the first model
These coefficients illustrate the influence of each feature on predicting Log_Return, with Market_Cap_Change_Rate showing the most significant positive effect and SMA_30 demonstrating a negative impact. Overall, this analysis of the first model's performance and feature importance provides a foundation for further refinement in future iterations. Given that the coefficient for Volume_Change_Rate is the lowest, indicating it contributes the least to the model, we decided to remove this feature. This simplification aims to reduce the number of features and streamline the model, potentially enhancing its performance and interpretability in subsequent iterations.
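The workflow of fitting a linear-kernel SVR, reading its coefficients, and ranking features with RFE can be sketched with scikit-learn. The data here is synthetic and the feature names are placeholders; standardising the inputs and selecting two features are assumptions made for this illustration, and the study's actual settings are not stated.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data standing in for the real feature table (hypothetical names).
rng = np.random.default_rng(42)
n_samples = 300
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]
X = rng.normal(size=(n_samples, 4))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

# Chronological split: first 80% for training, last 20% for testing, no shuffling.
split = int(n_samples * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Scale using training statistics only, so the test period does not leak in.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# A linear kernel exposes one coefficient per feature via coef_.
model = SVR(kernel="linear").fit(X_train_s, y_train)
coefficients = dict(zip(feature_names, model.coef_.ravel()))

# RFE repeatedly drops the feature with the smallest squared coefficient.
selector = RFE(SVR(kernel="linear"), n_features_to_select=2)
selector.fit(X_train_s, y_train)
ranking = dict(zip(feature_names, selector.ranking_))

test_r2 = model.score(X_test_s, y_test)  # R-squared on the held-out period
```

Features kept by RFE receive rank 1, so asking RFE to keep every feature yields a ranking of 1 for all of them, as reported for the first model.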
The second SVR model achieves an MSE of approximately 0.00097, indicating improved predictive performance compared to the first model. The Mean Absolute Error (MAE) is about 0.023, demonstrating a reduction in average prediction error. The R-squared value of 0.078 suggests a modest ability to explain the variability in the target variable. Overall, these results indicate a positive impact of including sentiment features in the model.
Fig 10. The feature coefficients of features in the second model
We observe that the features Sentiment_std_7d and Price_Range have relatively low contributions compared to the other features based on their coefficients. Therefore, we removed these features to simplify the model and retrained it. This adjustment aims to enhance the model's performance by focusing on more influential predictors.
The third SVR model achieves an MSE of 0.0009655, a Mean Absolute Error (MAE) of 0.0232, and an R-squared (R²) value of 0.083. These metrics indicate a slight improvement in predictive performance compared to the previous models. The model continues to demonstrate the ability to capture some relationships in the data, although the R-squared value suggests that there is still room for enhancement in explaining the variance in the target variable.
Fig 11. The feature coefficients of features in the third model
3.3.3 Support Vector Regression Interpreting
Comparing the third SVR model to the first highlights significant improvements in predictive performance following the integration of sentiment features.
The first model records an MSE of 0.00111, a MAE of 0.0259, and an R-squared (R²) value of -0.0563, indicating a lack of effectiveness in capturing data patterns. In contrast, the third model achieves an MSE of 0.0009655, a MAE of 0.0232, and an R² of 0.083. This indicates not only a reduction in prediction errors but also a modest ability to explain variance in the target variable.
The feature analysis reveals that in the first model, all selected features were ranked equally, while the third model showcases a more refined selection, emphasizing sentiment_encoded, sentiment_ma_7d, and Market_Cap_Change_Rate as influential predictors. The feature coefficients obtained through RFE indicate that Bitcoin's returns are influenced more strongly by the sentiment of the day and the average sentiment over a period than by the volatility of sentiment during that period. More positive sentiment is associated with higher returns; conversely, short-term changes in sentiment relative to the previous day are negatively correlated with returns, reflecting a psychological tendency for individuals to pursue stability. Overall, the enhancements in the third model underline the valuable contribution of sentiment features in improving predictive performance. Even though the difference in MSE between the first and third models appears small in absolute terms, it is meaningful because the target is a daily log return, which is itself a small number.
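The size of the improvement can be checked with simple arithmetic on the metrics reported above. The sketch below computes the relative reduction in MSE and MAE and the absolute change in R-squared between the first and third models; the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Metrics as reported for the first and third SVR models.
first_model = {"mse": 0.00111, "mae": 0.0259, "r2": -0.0563}
third_model = {"mse": 0.0009655, "mae": 0.0232, "r2": 0.083}

mse_drop = relative_reduction(first_model["mse"], third_model["mse"])  # about 0.13
mae_drop = relative_reduction(first_model["mae"], third_model["mae"])  # about 0.10
r2_gain = third_model["r2"] - first_model["r2"]  # about 0.14
```

The MSE reduction of roughly 13% matches the improvement quoted in the abstract, if that figure refers to MSE.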
4 Conclusion
This study highlights the significant impact of sentiment analysis on predicting Bitcoin returns. Our findings reveal that daily sentiment and average sentiment over time are more influential than sentiment volatility. Positive sentiment correlates with higher returns, while short-term sentiment changes relative to the previous day show a negative correlation, reflecting investors' preference for stability.
Using sentiment features from news articles in a Support Vector Regression (SVR) model improved predictive accuracy, achieving a Mean Squared Error (MSE) of 0.0009655 and a positive R-squared value of 0.083. This underscores the importance of sentiment in financial forecasting and offers insights for navigating the cryptocurrency market. Future research could focus on enhancing feature selection and model complexity for better predictive performance.