Fake Review Detection Model Based on Comment Content and Review Behavior
Abstract
1. Introduction
- (1) An end-to-end fake review detection (FRD) model is proposed.
- (2) The proposed FRD model improves classification performance by combining text and behavioral features, with the former extracted using BERT and the latter captured using a transformer encoder.
- (3) Experiments were conducted on a real-world dataset from the Dianping website. The results show that the proposed FRD model outperforms the comparison models in detecting fake reviews.
2. Related Work
2.1. Fake Comment Detection Based on Text Features
2.2. Detection of Fake Reviews Based on Behavioral Features and Text Features
3. Method
Algorithm 1: FRD fake review detection algorithm
Data: online review dataset
Result: FRD model for user review classification
// Data engineering
(FRD_textcontent_dataset, FRD_reviewerbehavior_dataset, FRD_merchantbehavior_dataset) <- preprocess(FRD_dataset) // dataset preprocessing
// Model building
// Text content processing module
B_out <- BERT(FRD_textcontent_dataset) // use the BERT-wwm-ext pretrained model to extract review text features
// Reviewer behavior processing module
R_out <- TransformerEncoder(FRD_reviewerbehavior_dataset) // pass reviewer behavior data through a transformer encoder
R_out <- GlobalAveragePooling(R_out) // globally pool the transformer encoder output
// Merchant behavior processing module
S_out <- TransformerEncoder(FRD_merchantbehavior_dataset) // pass merchant behavior data through a transformer encoder
S_out <- GlobalAveragePooling(S_out) // globally pool the transformer encoder output
// Dense processing module
Concat_input <- Concat(B_out, R_out, S_out) // concatenate the outputs of BERT and the two transformer encoders
Final_output <- Dense(Concat_input) // pass Concat_input through the dense layer
Labelled_review <- Label(Final_output) // read Final_output from the previous layer and label the review as real or fake
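The last steps of Algorithm 1 (global average pooling, concatenation, dense layer, labeling) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the vector sizes are toy values, the weights are random rather than trained, and the sigmoid activation and the 0.5 decision threshold are assumed here rather than taken from the paper. The names `frd_head` and `dense_sigmoid` are hypothetical.

```python
import math
import random


def global_average_pooling(sequence):
    """Average a (seq_len x hidden) list of vectors over the sequence axis."""
    seq_len = len(sequence)
    hidden = len(sequence[0])
    return [sum(step[j] for step in sequence) / seq_len for j in range(hidden)]


def dense_sigmoid(features, weights, bias):
    """One dense unit with a sigmoid activation (assumed), giving P(fake)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def frd_head(b_out, r_seq, s_seq, weights, bias, threshold=0.5):
    """Pool both behavior sequences, concatenate with text features, classify."""
    r_out = global_average_pooling(r_seq)  # reviewer behavior features
    s_out = global_average_pooling(s_seq)  # merchant behavior features
    concat_input = b_out + r_out + s_out   # list concatenation = Concat(...)
    prob_fake = dense_sigmoid(concat_input, weights, bias)
    label = "fake" if prob_fake >= threshold else "real"  # threshold is assumed
    return prob_fake, label


# Toy stand-ins for the BERT output and the two encoder outputs.
rng = random.Random(0)
b_out = [rng.uniform(-1, 1) for _ in range(8)]
r_seq = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
s_seq = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(8 + 4 + 4)]

prob_fake, label = frd_head(b_out, r_seq, s_seq, weights, bias=0.0)
print(f"P(fake) = {prob_fake:.3f}, label = {label}")
```

Pooling turns each variable-length behavior sequence into a fixed-size vector, which is what allows the three feature sources to be concatenated into a single dense-layer input.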
- (1) Comment text processing module: The preprocessed comment text is passed to the pre-trained BERT model, which learns the contextual relationships between words in the text and outputs global semantic features.
- (2) Reviewer behavior processing module: Positional information is encoded into the preprocessed reviewer behavior sequence, which is then fed into a transformer encoder. The encoder output sequence undergoes global pooling to obtain the reviewer behavior features, which are forwarded to the next module.
- (3) Reviewed merchant behavior processing module: As in the reviewer module, positional encoding is applied to the preprocessed reviewed merchant behavior sequence, after which it is fed into a transformer encoder. The resulting sequence is globally pooled to derive the behavioral features of the reviewed merchant, which are passed to the subsequent module.
- (4) Dense processing module: The global semantic features from the comment text processing module are combined with the reviewer and reviewed merchant behavior features. The combined features are sent to the dense module, which classifies the comment as either a real or a fake review.
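The positional encoding step in the two behavior modules can be illustrated as follows. The text above does not say which encoding scheme is used, so this sketch assumes the fixed sinusoidal encoding from the original transformer paper; the function names are hypothetical, and the sizes in the usage lines (100 steps, 200 dimensions) are taken from the hyperparameter table.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sinusoidal position codes."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Even index i holds the sine term, i + 1 the matching cosine term.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(sequence):
    """Add position codes element-wise to a (seq_len x d_model) sequence."""
    seq_len, d_model = len(sequence), len(sequence[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [
        [x + p for x, p in zip(step, pe_row)]
        for step, pe_row in zip(sequence, pe)
    ]


# A behavior sequence of 100 steps embedded in 200 dimensions (all zeros here).
behavior = [[0.0] * 200 for _ in range(100)]
encoded = add_positional_encoding(behavior)
print(len(encoded), len(encoded[0]))
```

Because self-attention is order-agnostic on its own, some such position signal is needed for the encoder to distinguish an early review in a behavior sequence from a late one.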
3.1. Data Preprocessing
3.2. Model Architecture
3.3. Model Training and Optimization
4. Experimental Evaluation
4.1. Experimental Setup
4.2. Experimental Dataset
4.3. Evaluation Metrics
4.4. Experimental Results
- a. Comparison of different numbers of transformer hidden nodes
- b. Training process
- c. Comparison of the FRD model with other models
- (1) Following the method in reference [24], a model characterizing reviewing behavior was built from the aspect ratings and overall rating of each review and combined with a model of the review content to detect fake reviews. The experimental results show that the AUC of this model is lower than that of the proposed FRD model.
- (2) Following the model in [33], a CNN is used to model the review text content and an LSTM is used to model the behavior sequences of reviewers and merchants. The features from both parts are combined for fake review detection. The word embedding layer of the CNN is based on Tencent's released word2vec vectors, tencent-ailab-embedding-zh-d200-v0.2.0. The experimental results show that the AUC of this model is lower than that of the proposed FRD model.
- (3) Following the feature engineering approach in [34], features are extracted from the perspectives of reviewer credibility, merchant credibility, and review text credibility, and a logistic regression model is trained on them. The experimental results show that this model has a lower AUC than the proposed FRD model.
- (4) The BERT model is used to model the review text content for fake review detection.
- (5) The RoBERTa model is used to model the review text content for fake review detection.
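The comparisons above are stated in terms of AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen fake review receives a higher score than a randomly chosen real one, with ties counted as one half. The labels and scores are made-up toy values, not results from the paper, and treating "fake" as the positive class (label 1) is an assumption.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = fake, 0 = real; scores are model outputs for "fake".
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(roc_auc(labels, scores), 4))  # 8 of 9 pairs ordered correctly
```

Unlike accuracy, this quantity does not depend on a decision threshold, which makes it a convenient basis for comparing models that output scores on different scales.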
5. Conclusions and Future Work
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Harris, C. Detecting deceptive opinion spam using human computation. In Workshops at AAAI on Artificial Intelligence; AAAI: Philadelphia, PA, USA, 2012; pp. 87–93.
- [2] Heydari, A.; Tavakoli, M.A.; Salim, N.; Heydari, Z. Detection of review spam: A survey. Expert Syst. Appl. 2015, 42, 3634–3642.
- [3] Jindal, N.; Liu, B. Analyzing and detecting review spam. In Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007, Omaha, NE, USA, 28–31 October 2008; pp. 547–552.
- [4] Ott, M.; Choi, Y.; Cardie, C. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; Volume 6, pp. 309–319.
- [5] Elmurngi, E.; Gherbi, A. Detecting Fake Reviews through Sentiment Analysis Using Machine Learning Techniques. In Proceedings of the IARIA/DATA ANALYTICS, Barcelona, Spain, 12–16 November 2017.
- [6] Ren, Y.; Ji, D. Neural networks for deceptive opinion spam detection: An empirical study. Inf. Sci. 2017, 385, 213–224.
- [7] Hajek, P.; Barushka, A.; Munk, M. Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Comput. Appl. 2020, 32, 17259–17274.
- [8] Melleng, A.; Jurek-Loughrey, A.; Deepak, P. Data Fusion for Better Fake Reviews Detection. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, RANLP 2023, Varna, Bulgaria, 4–6 September 2023; pp. 730–738.
- [9] Budhi, G.S.; Chiong, R. A Multi-type Classifier Ensemble for Detecting Fake Reviews Through Textual-based Feature Extraction. ACM Trans. Internet Techn. 2023, 23, 1–24.
- [10] Mewada, A.; Dewang, R.K.; Goldar, P.; Maurya, S.K. SentiBERT: A Novel Approach for Fake Review Detection Incorporating Sentiment Features with Contextual Features. In Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing, Noida, India, 3–5 August 2023; pp. 230–235.
- [11] Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multim. Tools Appl. 2021, 80, 11765–11788.
- [12] Gupta, P.; Gandhi, S.; Chakravarthi, B.R. Leveraging Transfer learning techniques-BERT, RoBERTa, ALBERT and DistilBERT for Fake Review Detection. In Proceedings of the Forum for Information Retrieval Evaluation, Virtual, 13–17 December 2021.
- [13] Mohawesh, R.; Xu, S.; Tran, S.N. Fake reviews detection: A survey. IEEE Access 2021, 9, 65771–65802.
- [14] Shang, Y.; Liu, M.; Zhao, T.; Zhou, J. T-Bert: A Spam Review Detection Model Combining Group Intelligence and Personalized Sentiment Information; Springer: Cham, Switzerland, 2021; pp. 409–421.
- [15] Refaeli, D.; Hajek, P. Detecting fake online reviews using fine-tuned BERT. In Proceedings of the 2021 5th International Conference on E-Business and Internet, Singapore, 15–17 October 2021.
- [16] Huang, J.; Qian, T.; He, G. Detecting Professional Spam Reviewers, Advanced Data Mining and Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 288–299.
- [17] Lim, E.P.; Nguyen, V.A.; Jindal, N.; Liu, B.; Lauw, H.W. Detecting product review spammers using rating behaviors. In Proceedings of the ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; Volume 7, pp. 939–948.
- [18] Kumar, A.; Gopal, R.D.; Shankar, R.; Tan, K.H. Fraudulent review detection model focusing on emotional expressions and explicit aspects: Investigating the potential of feature engineering. Decis. Support Syst. 2022, 155, 113728.
- [19] Jindal, N.; Liu, B. Review spam detection. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007.
- [20] Faris, H.; Ala’m, A.Z.; Heidari, A.A.; Aljarah, I.; Mafarja, M.; Hassonah, M.A.; Fujita, H. An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks. Inf. Fusion 2019, 4, 67–83.
- [21] Fang, Y.; Wang, H.; Zhao, L.; Yu, F.; Wang, C. Dynamic knowledge graph based fake-review detection. Appl. Intell. 2020, 50, 4281–4295.
- [22] Savage, D.; Zhang, X.; Chou, P. Detection of opinion spam based on anomalous rating deviation. Expert. Syst. Appl. Int. J. 2015, 42, 8650–8657.
- [23] Manaskasemsak, B.; Tantisuwankul, J.; Rungsawang, A. Detection of fake reviews and reviewers using behavioral graph partitioning integrated with deep neural network. Neural Comput. Appl. 2023, 35, 1169–1182.
- [24] Duma, R.A.; Niu, Z.; Nyamawe, A.S.; Tchaye-Kondi, J.; Yusuf, A.A. A Deep Hybrid Model for fake review detection by jointly leveraging review text, overall ratings, and aspect ratings. Soft Comput. 2023, 27, 6281–6296.
- [25] Zhang, D.; Niu, B.; Li, W.; Wu, C. A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information. Decis. Support Syst. 2023, 166, 113911.
- [26] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008.
- [27] Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
- [28] Chen, J.; Wang, M.; Zhang, X.L.; Huang, Z.; Rahardja, S. End-to-end multi-modal speech recognition with air and bone conducted speech. In Proceedings of the ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 6052–6056.
- [29] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- [30] Ji, Y.; Zhou, Z.; Liu, H.; Davuluri, R.V. DNABERT: Pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 2021, 37, 2112–2120.
- [31] Ba, J.; Chen, J.; Wang, M.; Muhammad, S.A. A squeeze-and-excitation and transformer based cross-task system for environmental sound recognition. arXiv 2022, arXiv:2203.08350.
- [32] Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-Training with Whole Word Masking for Chinese BERT. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514.
- [33] Balwant, M.K. Bidirectional LSTM Based on POS tags and CNN Architecture for Fake News Detection. In Proceedings of the 2019 10th International Conference on Computing and Networking Technology (ICCNT), Kanpur, India, 6–8 July 2019.
- [34] Li, J.; Wu, G.; Xie, F.; Yao, X.; Qi, J.; Sun, P. Research of Fraud Review Detection Model on O2O Platform. Acta Electron. Sin. 2016, 44, 2855–2860.
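Processing Module | Parameter | Value |
---|---|---|
Reviewer behavior processing module | Hidden size | 200 |
Reviewer behavior processing module | Number of encoder blocks | 2 |
Reviewer behavior processing module | Self-attention heads | 10 |
Reviewer behavior processing module | Dropout rate | 0.3 |
Reviewer behavior processing module | Max_seq_length | 100 |
Reviewed merchant behavior processing module | Hidden size | 200 |
Reviewed merchant behavior processing module | Number of encoder blocks | 2 |
Reviewed merchant behavior processing module | Self-attention heads | 10 |
Reviewed merchant behavior processing module | Dropout rate | 0.3 |
Reviewed merchant behavior processing module | Max_seq_length | 100 |
Comment text processing module | Number of layers | 12 |
Comment text processing module | Hidden size | 768 |
Comment text processing module | Self-attention heads | 12 |
Comment text processing module | Dropout rate | 0.3 |
Dense processing module | Number of dense layers | 1 |
Dense processing module | Dropout rate | 0.3 |
Dense processing module | Batch size | 64 |
Dense processing module | Loss | Binary cross-entropy |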
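
The hyperparameters in this table can be written down as a small configuration object, which also makes one implicit constraint visible: in standard multi-head attention the hidden size is split evenly across heads, so it must be divisible by the head count (200 / 10 = 20 and 768 / 12 = 64 here). The class and constant names below are hypothetical, and the loss function is a plain-Python sketch of binary cross-entropy, not the authors' training code.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncoderConfig:
    hidden_size: int
    num_blocks: int
    attention_heads: int
    dropout: float
    max_seq_length: Optional[int] = None  # not listed for the text module

    @property
    def head_dim(self) -> int:
        """Per-head width; requires hidden_size divisible by the head count."""
        if self.hidden_size % self.attention_heads != 0:
            raise ValueError("hidden_size must be divisible by attention_heads")
        return self.hidden_size // self.attention_heads


# Values copied from the table above.
REVIEWER = EncoderConfig(200, 2, 10, 0.3, 100)
MERCHANT = EncoderConfig(200, 2, 10, 0.3, 100)
TEXT = EncoderConfig(768, 12, 12, 0.3)
BATCH_SIZE = 64


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


print(REVIEWER.head_dim, TEXT.head_dim)
print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # equals ln 2 for p = 0.5
```
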
Data Item | Meaning |
---|---|
reviewid | Review ID number |
reviewbody | Review content |
updatetime | Time the review was posted |
userid | Reviewer ID number |
shopid | Merchant ID number |
star | Overall rating |
score1 | Taste rating |
score2 | Environment rating |
score3 | Service rating |
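
One way to picture how these fields could feed the two behavior modules is to group reviews by `userid` and by `shopid`, sort each group by `updatetime`, and keep at most the most recent 100 entries (the Max_seq_length value from the hyperparameter table). The sketch below does exactly that. It is an assumption-laden illustration: which fields actually enter the behavior sequences, the timestamp format, and the truncation rule are not specified in this section, and the sample reviews are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    reviewid: str
    reviewbody: str
    updatetime: str  # assumed ISO-8601, so string order is time order
    userid: str
    shopid: str
    star: float
    score1: float    # taste
    score2: float    # environment
    score3: float    # service


def behavior_sequences(reviews, key, max_seq_length=100):
    """Group reviews by `key` and return time-ordered rating sequences."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[getattr(review, key)].append(review)
    sequences = {}
    for entity, items in grouped.items():
        items.sort(key=lambda r: r.updatetime)
        recent = items[-max_seq_length:]  # keep the latest entries (assumed)
        sequences[entity] = [
            (r.updatetime, r.star, r.score1, r.score2, r.score3) for r in recent
        ]
    return sequences


# Invented sample data for illustration only.
reviews = [
    Review("r1", "Great noodles", "2020-01-03T12:00:00", "u1", "s1", 5, 5, 4, 5),
    Review("r2", "Slow service", "2020-01-01T09:30:00", "u1", "s2", 2, 3, 3, 1),
    Review("r3", "Nice place", "2020-01-02T18:45:00", "u2", "s1", 4, 4, 5, 4),
]
by_reviewer = behavior_sequences(reviews, "userid")
by_merchant = behavior_sequences(reviews, "shopid")
print(by_reviewer["u1"])
print(by_merchant["s1"])
```
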
Evaluation Parameter | Predicted Value | Actual Value |
---|---|---|
True-positive (TP) | Yes | Yes |
True-negative (TN) | No | No |
False-positive (FP) | Yes | No |
False-negative (FN) | No | Yes |
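
The four counts in this table are enough to compute accuracy and F1, the threshold-based metrics reported in the result tables. The sketch below uses the standard definitions; the counts in the usage example are made up, and treating "Yes" as the fake class is an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for 100 reviews.
metrics = classification_metrics(tp=45, tn=44, fp=6, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```
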
Hidden Nodes | Accuracy | F1 | AUC |
---|---|---|---|
20 | 0.8893 | 0.8874 | 0.9526 |
32 | 0.8882 | 0.8889 | 0.9516 |
64 | 0.8903 | 0.8916 | 0.9530 |
80 | 0.8892 | 0.8909 | 0.9524 |
128 | 0.8936 | 0.8939 | 0.9522 |
200 | 0.8939 | 0.8948 | 0.9531 |
256 | 0.8917 | 0.8909 | 0.9520 |
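
Reading the best setting off this table can be done mechanically. The short sketch below copies the table values into a dictionary and picks the hidden size that maximizes each metric; it also reports how narrow the AUC range is across settings. The variable names are hypothetical, and the snippet only restates the table.

```python
# hidden nodes -> (accuracy, F1, AUC), copied from the table above
results = {
    20: (0.8893, 0.8874, 0.9526),
    32: (0.8882, 0.8889, 0.9516),
    64: (0.8903, 0.8916, 0.9530),
    80: (0.8892, 0.8909, 0.9524),
    128: (0.8936, 0.8939, 0.9522),
    200: (0.8939, 0.8948, 0.9531),
    256: (0.8917, 0.8909, 0.9520),
}

best_by_accuracy = max(results, key=lambda h: results[h][0])
best_by_f1 = max(results, key=lambda h: results[h][1])
best_by_auc = max(results, key=lambda h: results[h][2])

aucs = [row[2] for row in results.values()]
auc_spread = round(max(aucs) - min(aucs), 4)

print(best_by_accuracy, best_by_f1, best_by_auc)  # 200 on all three metrics
print(auc_spread)  # 0.0015
```

The setting with 200 hidden nodes is best on all three metrics, which is consistent with the hidden size of 200 listed in the hyperparameter table, although the differences between settings are small.
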
Model | Accuracy | F1 | AUC |
---|---|---|---|
FRD without reviewer and reviewed merchant processing modules | 0.8556 | 0.8551 | 0.9123 |
FRD without reviewed merchant processing module | 0.8752 | 0.8747 | 0.9397 |
FRD without reviewer processing module | 0.8771 | 0.8745 | 0.9357 |
FRD without star and updatetime | 0.8683 | 0.8688 | 0.9407 |
FRD | 0.8939 | 0.8948 | 0.9531 |
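
The ablation rows are easier to compare as AUC drops relative to the full model. The sketch below computes those differences from the table values. It reads each variant row as the full FRD model with the named component removed, which is an interpretation of the row labels, and the dictionary keys are shorthand introduced here.

```python
FULL_FRD_AUC = 0.9531

# AUC of each ablated variant, copied from the table above.
ablation_auc = {
    "no reviewer or merchant module": 0.9123,
    "no merchant module": 0.9397,
    "no reviewer module": 0.9357,
    "no star or updatetime": 0.9407,
}

# Drop in AUC when the component is removed, rounded to table precision.
auc_drop = {
    name: round(FULL_FRD_AUC - auc, 4) for name, auc in ablation_auc.items()
}

for name, drop in sorted(auc_drop.items(), key=lambda kv: -kv[1]):
    print(f"{name}: -{drop:.4f}")
```

Removing both behavior modules costs about 0.04 AUC, while removing either one alone costs less than 0.02, which suggests that the two behavior sources carry partly overlapping information.
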
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, P.; Bi, W.; Zhang, Y.; Wang, Q.; Kou, F.; Lu, T.; Chen, J. Fake Review Detection Model Based on Comment Content and Review Behavior. Electronics 2024, 13, 4322. https://doi.org/10.3390/electronics13214322