IIIS - 2024 Conferences Proceedings

2024 Summer Conferences Proceedings

Comparison of Machine Learning and Deep Learning Algorithms in Detecting Fake News
Li-Jing Chang
Department of Journalism and Media Studies, Jackson State University, Jackson, Mississippi, United States

Proceedings of the 28th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2024, pp. 203-209 (2024); https://doi.org/10.54808/WMSCI2024.01.203
The 28th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2024
Virtual Conference, September 10 - 13, 2024

Cite this paper as: Chang, L. (2024). Comparison of Machine Learning and Deep Learning Algorithms in Detecting Fake News. In N. Callaos, E. Gaile-Sarkane, N. Lace, B. Sánchez, M. Savoie (Eds.), Proceedings of the 28th World Multi-Conference on Systemics, Cybernetics and Informatics: WMSCI 2024, pp. 203-209. International Institute of Informatics and Cybernetics. https://doi.org/10.54808/WMSCI2024.01.203

DOI: 10.54808/WMSCI2024.01.203
ISBN (Volume): 978-1-950492-79-4 (Print)
ISSN: 2771-0947 (Print)
Copyright: © International Institute of Informatics and Systemics 2024
Publisher: International Institute of Informatics and Cybernetics
Abstract

Detecting fake news has become increasingly urgent amid the constant surge of misinformation across social media and other platforms. Because it could weaken individuals’ ability to use accurate information to make informed decisions, fake news could impact our lives in several ways. For example, empirical evidence has shown that spreading healthcare rumors could worsen existing pandemics. Likewise, false financial information could mislead investors into making poor investment decisions and suffering capital losses. Additionally, fabricated scientific claims can misguide policymakers, leading to poor choices that may have long-term consequences. In everyday life, deceptive product reviews could lure customers into making unnecessary purchases. As such, an effective mechanism for identifying fake news is the first step toward combating it, alleviating its social and economic impact, and providing a much-needed safeguard for information integrity in the digital era.

Dozens of studies have used the following machine learning algorithms to detect fake news: Support Vector Machine (SVM), Logistic Regression (LR), Passive-Aggressive Classifier (PAC), Stochastic Gradient Descent (SGD), Random Forest (RF), Naïve Bayes (NB), Decision Tree (DT), XGBoost (XGB), AdaBoost (AB), Gradient Boosting (GB), and K-Nearest Neighbors (KNN). Past studies have also used deep learning algorithms such as BERT, Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and Neural Network (NN) to build fake news detection models. The current study compares these algorithms’ accuracies in detecting fake news.

The study uses the ISOT dataset, which has 23,481 fake news articles and 21,417 real news articles. The dataset was preprocessed to delete punctuation, links, special characters, and stop words. Other text preprocessing steps included lowercasing and stemming to eliminate unnecessary information and reduce data size.
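The preprocessing steps named above (lowercasing, removing links, punctuation, special characters, and stop words, then stemming) might look roughly like the following minimal sketch. It is not the study's code: the stop-word list is a tiny illustrative sample, and `naive_stem` is a hypothetical toy suffix stripper standing in for whichever stemmer was actually used, which the abstract does not name.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "that", "this", "it", "for", "with", "as", "at", "by",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")       # punctuation and special characters


def naive_stem(word):
    """Toy suffix stripper (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Return the cleaned, stemmed tokens of one news article."""
    text = text.lower()                  # lowercasing
    text = URL_RE.sub(" ", text)         # delete links
    text = NON_ALNUM_RE.sub(" ", text)   # delete punctuation and special characters
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # delete stop words
    return [naive_stem(t) for t in tokens]                     # stemming


if __name__ == "__main__":
    print(preprocess("Breaking: the senators CLAIMED it on https://example.com/x?id=1 !!!"))
```

Links are removed before punctuation so that a URL is dropped as a whole rather than broken into word-like fragments.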
After preprocessing, the data was first used to train the machine learning algorithms. The text data was vectorized through the TF-IDF procedure, as past research showed that such a process could improve model performance. The vectorized data was split into train and test sets with an 80:20 ratio, and the training dataset was used to train each machine learning algorithm with a grid search via 10-fold cross-validation. Each trained machine learning model was individually evaluated on the test dataset.

Following the training of the machine learning models, the text data was preprocessed for the LSTM and Bi-LSTM deep learning models to have a fixed vocabulary size and sequence length. The preprocessed data was split into train and test datasets with the same 80:20 ratio. Then, the LSTM and Bi-LSTM models were each specified to have input, embedding, dropout, LSTM, dropout, and output layers. The LSTM and Bi-LSTM models were each trained for ten epochs on the train data and then evaluated on the test data.

Finally, the text data was split into train and test datasets to train and evaluate the BERT deep learning model. The train and test datasets were tokenized and padded to limit the input sequence length. Afterward, the BERT model was trained on the train data for ten epochs and then evaluated on the test dataset.

After the machine learning and deep learning models were trained and evaluated, the results were used to compare model performance. The comparison showed BERT as the top performer with an accuracy rate of 99.95%, followed by Bi-LSTM (99.00%), LSTM (98.81%), SVM (98.65%), LR (98.52%), SGD (98.39%), PAC (98.27%), NN (98.00%), RF (97.96%), AB (97.13%), NB (96.97%), DT (95.90%), XGB (94.00%), GB (92.82%), and KNN (61.10%). Fourteen of the 15 algorithms tested, all except KNN, had an accuracy rate exceeding 90.0%.
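For the machine learning branch, the data-handling steps described above (TF-IDF weighting, an 80:20 split, and 10-fold cross-validation folds) can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the IDF formula is one common smoothed variant, and the abstract does not say which library or settings the study used. Classifier training and the grid search itself are left out.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # L2-normalize so long and short articles are comparable.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def split_80_20(items, test_ratio=0.2, seed=0):
    """Shuffle once, then hold out test_ratio of the items as the test set."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(items) * test_ratio)
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) for k contiguous folds over n training items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    vectors = tfidf([["fake", "news"], ["real", "news"]])
    train, test = split_80_20(vectors)
    print(len(train), len(test))
```

In a grid search, each candidate hyperparameter setting would be scored by averaging validation accuracy over the ten folds, and the best setting would then be refit on the full training set before the single evaluation on the held-out test set.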
The findings highlight the BERT model’s high accuracy and the potential of some machine learning models, such as NB and DT.
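The comparison itself is simple arithmetic on test-set accuracy, the share of test articles classified correctly. The sketch below tabulates the accuracies reported in the abstract and checks how many exceed 90.0%. The helper names are hypothetical and the code is only an illustration, not part of the study.

```python
# Accuracy rates (%) as reported in the abstract.
REPORTED_ACCURACY = {
    "BERT": 99.95, "Bi-LSTM": 99.00, "LSTM": 98.81, "SVM": 98.65,
    "LR": 98.52, "SGD": 98.39, "PAC": 98.27, "NN": 98.00,
    "RF": 97.96, "AB": 97.13, "NB": 96.97, "DT": 95.90,
    "XGB": 94.00, "GB": 92.82, "KNN": 61.10,
}


def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def rank_models(scores):
    """Return (model, accuracy) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def above_threshold(scores, threshold=90.0):
    """Return the names of models whose accuracy exceeds the threshold."""
    return [name for name, value in scores.items() if value > threshold]


if __name__ == "__main__":
    for name, value in rank_models(REPORTED_ACCURACY):
        print(f"{name:8s} {value:6.2f}")
    print(len(above_threshold(REPORTED_ACCURACY)), "of", len(REPORTED_ACCURACY))
```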