Article

Fake Review Detection Model Based on Comment Content and Review Behavior

1
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2
Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
*
Author to whom correspondence should be addressed.
Submission received: 3 September 2024 / Revised: 16 October 2024 / Accepted: 30 October 2024 / Published: 4 November 2024
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)

Abstract

With the development of the Internet, services such as catering, beauty, accommodation, and entertainment can be reserved or consumed online. Consumers therefore increasingly rely on online information to choose merchants, products, and services, and reviews have become a crucial factor in their decision making. However, the authenticity of reviews is highly debated in the field of Internet-based daily-life service consumption. In recent years, due to the rapid growth of these industries, the detection of fake reviews has gained increasing attention. Fake reviews seriously mislead customers and damage the credibility of online reviews. Various fake review classifiers have been developed that take into account the content of reviews and the behavior associated with them, such as rating and posting time. However, there has been no research considering the credibility of reviewers and merchants as part of identifying fake reviews. To improve the accuracy of existing fake review detection methods, this study uses a comment text processing module to model the content of reviews, and a reviewer behavior processing module and a reviewed merchant behavior processing module to model, respectively, the reviewer behavior sequences that imply reviewer credibility and the merchant behavior sequences that imply merchant credibility. The text and behavior features are then merged for fake review classification. The experimental results show that, compared to other models, the model proposed in this paper improves classification performance by simultaneously modeling the content of reviews and the credibility of reviewers and merchants.

1. Introduction

Online reviews play an important role in various online shopping environments, such as online hotel and restaurant reservations, e-commerce website consumer scenarios, etc. Consumers use reviews to evaluate the cost-effectiveness of different products and seller services and to make purchasing decisions. In recent years, the number of users using online shopping platforms has significantly increased. Taking the Dianping platform as an example, according to the Dianping catering market survey report in 2023, the total number of active users on the Dianping platform reached 470 million, with a daily transaction data figure of 36.82 million and a daily turnover of 73.157 billion RMB. On 31 December 2023, the number of merchants on the Dianping platform reached 1.84 million.
However, some unethical merchants, for their own benefit, employ reviewers to write negative reviews to mislead consumers, thereby tarnishing the reputation of their competitors. In addition, the authenticity of positive reviews also needs to be verified, as many merchants may induce consumers to give positive reviews through various means, such as cashback or free products.
Fake reviews have become a serious issue in the Internet era. With the popularity of online shopping, restaurant reviews, and other services, users often rely on the reviews of others to make important decisions. Fake reviews seriously impact and threaten the authenticity of the online shopping environment, misleading users and causing negative effects on the reputation of merchants and platforms. Therefore, detecting and removing these fake reviews has become an important research direction. The Dianping platform once established an integrity team consisting of nearly 300 engineers to identify fake reviews. However, manual detection is costly, time-consuming, and inaccurate compared to automated fake review detection methods [1]. Therefore, developing an effective fake review detection model is crucial for maintaining fair trade and providing trustworthy information.
Currently, significant progress has been made in the automatic detection of fake reviews. Fake review detection methods can generally be divided into two types: (1) those based solely on the content features of the review and (2) those combining the content features of the review with reviewer behavior features. There have been many studies on the first type [2,3,4,5,6,7,8,9,10], with many using word embeddings (such as word2vec) to process raw text and input it into machine learning or deep learning classifiers, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) models. Recently, some studies have utilized bidirectional encoder representations from transformers (BERT) to detect fake reviews [11,12,13,14,15]. However, this method does not combine behavior features.
For the second type of research, previous studies mainly combine raw behavior features or engineered behavior features with text features [16,17,18]. Existing studies rarely combine transformed behavior features with context-aware text representations to detect suspicious reviews. This paper proposes a method that combines reviewer behavior and review content features to improve the accuracy of fake review detection.
The main contributions of this paper are summarized as follows:
(1)
An end-to-end fake review detection (FRD) model is proposed.
(2)
The proposed FRD model improves classification performance by combining text and behavioral features, with the former extracted using BERT and the latter captured using transformer encoders.
(3)
Experiments were conducted on a real dataset from the Dianping website. The results show that the FRD model proposed in this paper outperforms various comparison models in terms of detecting fake reviews.
The structure of this paper is as follows: Section 2 briefly introduces an overview of work related to fake review detection tasks. In Section 3, the proposed FRD model is described in detail. Section 4 discusses the experimental settings and evaluation metrics, as well as the experimental and performance analysis results, of the proposed model, and possible future directions are introduced in Section 5.

2. Related Work

Currently, people heavily rely on online shopping. From this perspective, online user reviews are very important and one of the current hot research topics. Since 2007, a lot of work has been dedicated to detecting fake reviews [19]. The features used to detect fake reviews can be classified mainly as text-based and behavior-based features. Text features refer to the analysis and description of the content of comment text. Behavioral features describe the non-linguistic features of comment data corpora, mainly involving the behaviors of commenters, such as the distribution of evaluation scores and comment frequency.

2.1. Fake Comment Detection Based on Text Features

Liu [3] proposed a comment detection model that uses comment similarity and product features as criteria. The article distinguished three types of repeated/near-repeated comments and used manually labeled training samples to detect fake comments using supervised learning. Ott [4] proposed a text polarity classification model based on comment data corpora; created a dataset containing 400 comments, divided into true and false groups; and calculated the effectiveness of the model using different classifiers based on different system configurations. Farris [20] presented an intelligent detection system based on genetic algorithms and random weighted networks for detecting spam in emails. During the detection process, the most relevant features are detected through automatic recognition. Elmurngi [5] applied sentiment analysis and text classification methods to a dataset of movie comments, using supervised machine learning techniques such as Naive Bayes, Support Vector Machine (SVM), K-Star, KNN, and Decision Trees to classify comments. Y Ren [6] used methods such as RNN, Average GRNN, GRNN, CNN, and Bidirectional Average GRNN to classify data corpora. The article used CNN to learn sentence representations, and then combined the sentence representations with gated recurrent neural networks to model information found in discourse and generate document vectors. Hajek [7] proposed two neural network models that combine traditional bag-of-words models with word context and consumer sentiment to classify fake comments.
Rohit Kumar Kaliyar [11] proposed a deep learning method based on BERT called FakeBERT, which combines multiple parallel single-layer deep CNN blocks with different kernel sizes and filters with BERT to detect fake news. Other studies also used BERT to detect fake comments [12,13,14,15]. For example, Mohawesh et al. [13] extensively applied BERT, DistilBERT, and RoBERTa to detect fake comments, but did not combine the behavioral features of comments. Alimuddin Melleng [8] combined different data representation methods, including sentiment, document embeddings, n-grams, and noun phrases, to enhance the detection of fake comments. Gregorius Satia Budhi [9] proposed an ensemble model based on text features to detect fake online consumer comments. This included several custom machine learning classifiers (including deep learning models) as base classifiers. Arvind Mewada [10] proposed a method to introduce emotional features into the comment detection process, using a pre-built sentiment lexicon to enhance detection performance.

2.2. Detection of Fake Reviews Based on Behavioral Features and Text Features

While the above methods perform well in detecting fake reviews, Fang [21] argues that detecting fake reviewers is an area that requires further research. The article points out that fake review detection often ignores the correlation between review text and time, which is often hidden in the background context of the reviews. In addition, fake review detection should consider the complex, high-dimensional, and heterogeneous relationships between reviewers, reviews, stores, and products. Savage [22] studied the manipulation of product average ratings by fake review posters, focusing on the differences between the ratings of fake review posters and the majority of honest reviewers, and proposed a lightweight and effective method based on these differences to detect fake review posters. The proposed method uses binomial regression to identify reviewers who have rating proportions that deviate significantly from the majority opinions. Huang [16] pointed out that fake reviewers often post reviews frequently within specific time periods. Therefore, the article sets a time frame for review posting and estimates whether the frequency of posting fake reviews differs from that of genuine reviews. Lim [17] surveyed a large number of reviewers and found two characteristics of fake reviewers: (1) they post reviews targeting specific products or online services; (2) their reviewing patterns, styles, and posting frequencies are usually different from those of genuine reviewers. Based on these two review characteristics, the article models reviewer behavior and calculates reviewer scores using algorithms. Ajay Kumar [18] proposed a feature engineering method to extract multiple reviewer-related features from the dataset and combined these features into a unified model to represent the overall behavior of fake reviewers. 
Manaskasemsak [23] used behavior graphs of reviewers with common features to detect fake reviews and further analyzed the semantic content and emotions expressed in reviews to enhance detection performance. Duma [24] proposes a novel Deep Hybrid Model for fake review detection, which jointly learns from latent text feature vectors, aspect ratings, and overall ratings. This study uses rating behavior data, but it also pertains to the rating of corresponding comments. Further exploration can be carried out from the perspective of modeling the features of the reviewers behind these reviews.
Zhang [25] proposed a novel end-to-end framework to detect fake reviewers based on behavior and textual information. It had two key components: (1) a behavior-sensitive feature extractor that learns the underlying patterns of reviewing behavior; (2) a context-aware attention mechanism that extracts valuable features from online reviews. The model in this study is used to identify fake reviewers. In online review platforms, fake reviewers and fake reviews are sometimes not consistent, and so further exploration is needed to identify fake reviews.
In summary, for the detection of fake reviews, many previous studies have mainly relied on feature engineering methods such as word frequency and syntactic analysis. However, these methods often depend on manually designed rules and are difficult to adapt to the complex and varied forms of fake reviews. With the rise of deep learning, methods based on neural networks have gradually become mainstream, with various sequence models (RNN, LSTM, BiLSTM, etc.) and Text-CNN models being used for text content processing. However, these models have shortcomings in terms of addressing long-term dependencies between words and parallelization during training. Therefore, in recent years, research has adopted the BERT model to address these issues. The BERT model uses transformer encoder modules to extract high-level contextual word embeddings, and its self-attention mechanism facilitates parallelized training. However, there are not many studies combining review content with review behaviors (including reviewer behaviors and the merchant behaviors being reviewed) to detect fake reviews. Existing studies have proposed modeling review behaviors through feature engineering methods, but these methods rely on the quality of feature engineering. Recent studies have also incorporated the modeling of review behaviors, but they focus on behavioral features directly related to reviews such as review ratings. Currently, there is no research from the perspective of simultaneously modeling the content of reviews and using behavior features to model the credibility of reviewers and the credibility of businesses being reviewed. The method proposed in this paper combines automatically extracted reviewer behavior features, the merchant behavior features being reviewed, and the content features of reviews in order to detect fake reviews, thereby enhancing detection performance.

3. Method

This section introduces the design of the proposed FRD fake review detection model. Figure 1 shows the architecture of the FRD model, which is also described in Algorithm 1. From left to right, it comprises the comment text processing module, the reviewer behavior processing module, and the reviewed merchant behavior processing module, with the dense processing module above them modeling the features taken from all three. The Chinese text in Figure 1 comes from the Chinese dataset used in this paper, where “口感” means “taste” and “服务” means “service.” The dataset used is the review dataset from the Dianping website, which includes the review text, scores, and review posting times.
Algorithm 1: FRD fake review detection algorithm
Data: online review dataset
Result: FRD model for user review classification
// Data engineering: dataset preprocessing
(FRD_textcontent_dataset, FRD_reviewerbehavior_dataset, FRD_merchantbehavior_dataset) <- Preprocess(FRD_dataset)
// Model building
// Comment text processing module: use the pre-trained BERT-wwm-ext model to extract review text features
  B_out <- BERT(FRD_textcontent_dataset)
// Reviewer behavior processing module: pass reviewer behavior data through a transformer encoder, then pool globally
  R_out <- TransformerEncoder(FRD_reviewerbehavior_dataset)
  R_out <- GlobalAveragePooling(R_out)
// Merchant behavior processing module: pass merchant behavior data through a transformer encoder, then pool globally
  S_out <- TransformerEncoder(FRD_merchantbehavior_dataset)
  S_out <- GlobalAveragePooling(S_out)
// Dense processing module: concatenate the outputs of BERT and the two transformer encoders and pass them through the dense layers
  Concat_input <- Concat(B_out, R_out, S_out)
  Final_output <- Dense(Concat_input)
// Read Final_output from the previous layer and label the review as real/fake
Labelled_review <- Label(Final_output)
The data preprocessing in this study involves typical natural language processing procedures such as removing stop words and tokenization for comment texts. Additionally, it includes the processing of reviewers and reviewed merchants in relation to comment texts, generating sequences of reviewer behaviors and sequences of reviewed merchant behavior. The model construction integrates semantic information from comment texts and behavioral features from reviews to identify fake reviews, comprising four modules.
(1)
Comment text processing module: The preprocessed comment text data are passed to the BERT pre-trained model, which learns the contextual relationships between words in the comment text to generate and output global semantic features.
(2)
Reviewer behavior processing module: After encoding the positional information of the preprocessed reviewer behavior sequence data, it is fed into a transformer encoder. The output sequence data from the encoder undergo global pooling to obtain reviewer behavior features, which are then forwarded to the next module.
(3)
Reviewed merchant behavior processing module: Similar to the reviewer module, positional encoding is applied to the preprocessed reviewed merchant behavior sequence data, after which they are fed into a transformer encoder. The resulting sequence data are globally pooled to derive the behavioral features of the reviewed merchant, which are then passed to the subsequent module.
(4)
Dense processing module: The global semantic features from the comment text processing module are combined with reviewer and reviewed merchant behavior features. These combined data are then sent to the dense module to perform text classification, labeling the comment text as either a real or fake review.

3.1. Data Preprocessing

Data preprocessing covers both review texts and review behaviors. First, the preprocessing of review texts breaks longer input paragraphs into smaller sentences while preserving sentence boundaries for subsequent processing. Second, the preprocessing of review behaviors extracts the review behavior sequence of each reviewer, including shop ID, score, star ratings for various aspects (e.g., taste, environment, service), and posting time, as well as the review behavior sequence of each reviewed merchant, including user ID, score, star ratings, and posting time. Here, the user ID identifies the reviewer, while the shop ID identifies the merchant being reviewed. Because the number of reviews published by each reviewer and the number of reviews received by each merchant differ, the behavior sequences in the dataset vary in length. In this paper, we limit the sequence length to 100 and discard any behavior records beyond this length. The vast majority of sequences in the dataset fall within this limit and can therefore be fed into the model in full. These preprocessing steps prepare the review texts, together with the corresponding behavior sequences of reviewers and reviewed merchants, as inputs for the FRD model proposed in this study.
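The behavior-sequence step described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the field names (`user_id`, `shop_id`, `score`, `star1` to `star3`, `time`), the chronological ordering, and the choice to keep the first 100 records are all assumptions made for the example.

```python
from collections import defaultdict

MAX_SEQ_LEN = 100  # sequence cap stated in the text


def build_behavior_sequences(reviews, max_len=MAX_SEQ_LEN):
    """Group reviews into per-reviewer and per-merchant behavior sequences.

    Each review is a dict with hypothetical keys: user_id, shop_id, score,
    star1, star2, star3, time.
    """
    by_reviewer = defaultdict(list)
    by_merchant = defaultdict(list)
    # Assumption: events are ordered by posting time within each sequence.
    for r in sorted(reviews, key=lambda r: r["time"]):
        ratings = (r["score"], r["star1"], r["star2"], r["star3"], r["time"])
        by_reviewer[r["user_id"]].append((r["shop_id"],) + ratings)
        by_merchant[r["shop_id"]].append((r["user_id"],) + ratings)
    # Assumption: keep the first max_len events and discard the rest.
    reviewer_seqs = {k: v[:max_len] for k, v in by_reviewer.items()}
    merchant_seqs = {k: v[:max_len] for k, v in by_merchant.items()}
    return reviewer_seqs, merchant_seqs


# One reviewer posting 120 reviews, each for a different shop.
reviews = [
    {"user_id": "u1", "shop_id": f"s{i}", "score": 4,
     "star1": 4, "star2": 3, "star3": 5, "time": i}
    for i in range(120)
]
reviewer_seqs, merchant_seqs = build_behavior_sequences(reviews)
print(len(reviewer_seqs["u1"]), len(merchant_seqs["s0"]))  # prints "100 1"
```

Each reviewer event starts with the shop ID and each merchant event starts with the user ID, mirroring the two field lists given in the text.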

3.2. Model Architecture

The model comprises four main components: a comment text processing module, a reviewer behavior processing module, a reviewed merchant behavior processing module, and a dense processing module. The structures of each processing module are described below.
(1) The reviewer behavior processing module and the reviewed merchant behavior processing module are utilized to model the features of reviewer behavior. These two modules share the same model structure; therefore, the model structure of the reviewer behavior processing module is detailed here. The reviewer behavior processing module employs a transformer encoder to extract behavior features. The transformer architecture [26] incorporates stacked multi-head self-attention and fully connected layers, enabling it to capture more contextual information than RNN. It also utilizes positional embeddings to encode additional information about token positions. Consequently, transformer has been swiftly applied in various fields such as machine translation [27], speech [28], image [29], and genomics [30]. The processed reviewer’s behavior data are typical sequential data, including reviewer’s review behaviors at different time points (such as overall rating, aspect ratings, etc.). Transformer architecture has shown good modeling effectiveness in handling this type of sequential data in recent years. In the task of fake review detection, using a transformer encoder to extract the features of reviewer behaviors enables the capture of behavioral features exhibited by users during the review process, such as commenting frequency, the distribution of review scores, etc. By modeling these features, a better differentiation between authentic and fake reviews can be achieved.
The transformer-encoder-based reviewer behavior processing module takes the preprocessed reviewer behavior sequences as its input and consists of three core components: scaled dot-product attention, multi-head attention, and a position-wise feed-forward network (FFN).
In the encoder architecture, the input sequence undergoes a learnable linear projection to obtain queries (Q), keys (K), and values (V). Subsequently, the scaled dot-product attention mechanism is employed, which essentially functions as a self-attention mechanism in order to effectively combine the different positions of the input sequence to generate a representation, as illustrated in Figure 2.
The dot-product attention mechanism is applied to these queries, keys, and values. It can be observed from Figure 2 that attention values are derived from keys and queries, and then used to weight and compute the values to produce the output of this module. The computations of individual scaled dot-product attention can be performed in parallel.
$$Q_i = Z W_i^Q, \quad K_i = Z W_i^K, \quad V_i = Z W_i^V, \quad i \in [1, h]$$
$$\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^T}{\sqrt{d}}\right) V_i$$
In the review behavior data processed in this paper, each assessment of the reviewer includes the ID of the reviewed merchant, the review score, the scores of individual review items, and the review release time. It is possible to extract the credibility-related features of the reviewer from the distribution of review scores and review time, which may indicate whether the reviewer has posted fake reviews. Similarly, the credibility features of the merchant can also be extracted from the reviewed behavior data of the merchant, which may indicate whether the merchant has possibly hired a group of people to post fake reviews. This paper uses the aforementioned scaled dot-product attention mechanism to model the features of the review behavior sequence.
A multi-head attention mechanism is utilized in the encoder architecture to consider the various forms of relationship existing between the elements in the input sequence. As depicted in Figure 3, multi-head attention comprises multiple scaled dot-product attention modules.
Within each multi-head attention module, each scaled dot-product attention module focuses on a different subspace of the input vectors, enabling the extraction of richer feature information independently from each subspace.
$$\mathrm{MultiHead} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$$
In this paper, a multi-head attention module is utilized to more comprehensively model the correlations between behavior in the user review behavior sequence.
The position-wise feedforward network performs the same independent operation on each element at every position, consisting of two linear transformations with rectified linear unit (ReLU) activation in between. In addition to these three core modules, the encoder incorporates multiple residual and normalization layers, with layer normalization being utilized in this study [31].
$$\mathrm{Mid} = \mathrm{LayerNorm}(Z + \mathrm{MultiHead})$$
$$\mathrm{FFN} = \mathrm{ReLU}(\mathrm{Mid}\, W_1 + b_1)\, W_2 + b_2$$
$$\mathrm{Output} = \mathrm{LayerNorm}(\mathrm{Mid} + \mathrm{FFN})$$
where $Z \in \mathbb{R}^{l \times d}$ is an input of length $l$ and dimension $d$, while $Q_i, K_i, V_i \in \mathbb{R}^{l \times d/h}$ are the transformed query, key, and value, respectively. $W_i^Q, W_i^K, W_i^V \in \mathbb{R}^{d \times d/h}$ and $W^O \in \mathbb{R}^{d \times d}$ are trainable parameter matrices. The output of the position-wise feed-forward network is denoted as FFN, where $W_1 \in \mathbb{R}^{d \times d_{ffn}}$, $W_2 \in \mathbb{R}^{d_{ffn} \times d}$, $b_1 \in \mathbb{R}^{d_{ffn}}$, and $b_2 \in \mathbb{R}^{d}$. In this study, the dimension $d$ represents the embedding encoding of the review behavior sequence (shop ID, score, star1, star2, star3, posting time) within the reviewer behavior processing module, while in the reviewed merchant behavior processing module, $d$ represents the embedding encoding of the reviewed behavior sequence (user ID, score, star1, star2, star3, and posting time). The hyperparameters are set to be the same in both processing modules, with $d$ set at 20, $h$ at 10, and $d_{ffn}$ at 256.
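To make the shapes in these equations concrete, the following dependency-free sketch runs one encoder layer with $d = 20$, $h = 10$, and $d_{ffn} = 256$, followed by global average pooling. It is an illustration under stated assumptions, not the authors' PyTorch implementation: the weights are random rather than trained, the biases $b_1$ and $b_2$ are taken as zero, layer normalization has no learnable gain or bias, and the input embedding and positional encoding steps are skipped. All helper names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(x, eps=1e-5):
    """Normalise each row to zero mean and unit variance."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def encoder_layer(z, d=20, h=10, d_ffn=256, seed=0):
    rng = random.Random(seed)
    heads = []
    for _ in range(h):
        # Per-head projections W_i^Q, W_i^K, W_i^V of shape d x d/h.
        w_q, w_k, w_v = (rand_matrix(d, d // h, rng) for _ in range(3))
        q, k, v = matmul(z, w_q), matmul(z, w_k), matmul(z, w_v)
        k_t = [list(col) for col in zip(*k)]
        # Scaled by sqrt(d), following the equation in the text.
        scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, k_t)]
        heads.append(matmul([softmax(row) for row in scores], v))
    # Concatenate the h head outputs position by position, then apply W^O.
    concat = [[v for head in heads for v in head[i]] for i in range(len(z))]
    multi_head = matmul(concat, rand_matrix(d, d, rng))
    mid = layer_norm(add(z, multi_head))
    w_1, w_2 = rand_matrix(d, d_ffn, rng), rand_matrix(d_ffn, d, rng)
    hidden = [[max(0.0, v) for v in row] for row in matmul(mid, w_1)]  # ReLU
    ffn = matmul(hidden, w_2)
    return layer_norm(add(mid, ffn))


def global_average_pool(x):
    """Average over the sequence axis, giving one d-dimensional vector."""
    return [sum(col) / len(x) for col in zip(*x)]


z = rand_matrix(5, 20, random.Random(1))  # a behavior sequence of length 5
out = encoder_layer(z)
pooled = global_average_pool(out)
print(len(out), len(out[0]), len(pooled))  # prints "5 20 20"
```

With $h = 10$ and $d = 20$, each head works in a 2-dimensional subspace, and the pooled 20-dimensional vector is what each behavior module hands to the dense processing module.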
(2) The comment text processing module is utilized for the feature modeling of comment content. It employs the BERT pre-trained model to extract features from the review text. BERT, based on the transformer structure, is a pre-trained language model that, after fine-tuning, can be applied to various downstream tasks such as classification, question-answering, and sequence-to-sequence learning, demonstrating superior accuracy compared to models trained from scratch. In the task of distinguishing fake reviews, utilizing BERT for feature extraction from review text enables the capture of semantic information within comments, including sentiment and themes. Modeling these features facilitates the detection of fake reviews. Because the dataset used in this paper is in Chinese, BERT-wwm-ext [32], which is trained with whole-word masking tailored to the characteristics of Chinese, is selected from the many BERT variants.
Within the comment text processing module, preprocessed review text serves as the input, and three types of data embedding are applied to the input text to enrich metadata for subsequent text feature modeling using the BERT architecture. These embeddings include word embeddings, segment embeddings, and position embeddings.
Word embeddings involve special tokens, namely, the [CLS] token at the beginning of the comment and the [SEP] token at the end of each sentence, where W1A and W2A represent the first and second words of the first sentence, and W1B and W2B represent the first and second words of the second sentence.
Segment embeddings add a special token for different sentences, where EA and EB represent the paragraph embeddings for the first and second sentences.
Position embeddings specify the position of tokens within the sentence, with Em and En representing the m-th and n-th elements in the comment. The BERT processing module converts each token into a 768-dimension embedding vector and passes it through the BERT-wwm-ext model, consisting of 12 encoding layers. Upon completing the processing in the 12th layer, the information stored in the [CLS] token serves as the output of the comment text processing module.
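The input layout described above can be sketched in a few lines. This is a minimal illustration of the standard BERT two-sentence convention and assumes the comment has already been tokenized; the function name and the placeholder tokens are hypothetical, and in practice a tokenizer from a BERT library would produce these ids.

```python
def build_bert_inputs(sentence_a, sentence_b):
    """Return tokens, segment ids, and position ids for a two-sentence comment."""
    tokens = ["[CLS]"] + sentence_a + ["[SEP]"] + sentence_b + ["[SEP]"]
    # Segment A covers [CLS], the first sentence, and its [SEP];
    # segment B covers the second sentence and the final [SEP].
    segment_ids = [0] * (len(sentence_a) + 2) + [1] * (len(sentence_b) + 1)
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


tokens, segment_ids, position_ids = build_bert_inputs(["W1A", "W2A"], ["W1B", "W2B"])
print(tokens)        # ['[CLS]', 'W1A', 'W2A', '[SEP]', 'W1B', 'W2B', '[SEP]']
print(segment_ids)   # [0, 0, 0, 0, 1, 1, 1]
print(position_ids)  # [0, 1, 2, 3, 4, 5, 6]
```

The three lists index the word, segment, and position embedding tables, whose vectors are summed per token before entering the first encoding layer.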
(3) Dense processing module: As shown in Figure 1, the 768-dimension vector output from the comment text processing module is combined with the two 20-dimension vectors output from the reviewer behavior processing module and the reviewed merchant behavior processing module to form an 808-dimension vector. Within the dense processing module, dropout is employed to mitigate overfitting. The rectified linear unit (ReLU) activation function is utilized in the hidden layers, while the Sigmoid function is applied in the output layer. After passing through multiple hidden layers, the module produces the final classification result, determining whether a review is authentic or fake.
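The concatenation and classification step can be illustrated with a small forward pass. This sketch makes several assumptions that the text does not specify: a single hidden layer of 64 units (the paper's layer sizes are given in its Table 1, not reproduced here), random untrained weights, and stand-in feature vectors in place of real module outputs. Dropout is omitted because it is only active during training.

```python
import math
import random


def dense(x, weights, bias, activation):
    """Fully connected layer: one output per weight row."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


rng = random.Random(0)
b_out = [rng.gauss(0.0, 1.0) for _ in range(768)]  # stand-in for the [CLS] vector
r_out = [rng.gauss(0.0, 1.0) for _ in range(20)]   # stand-in reviewer features
s_out = [rng.gauss(0.0, 1.0) for _ in range(20)]   # stand-in merchant features
concat_input = b_out + r_out + s_out               # 768 + 20 + 20 = 808

HIDDEN = 64  # hypothetical hidden size, chosen only for this example
w_hidden = [[rng.gauss(0.0, 0.05) for _ in range(len(concat_input))]
            for _ in range(HIDDEN)]
w_out = [[rng.gauss(0.0, 0.05) for _ in range(HIDDEN)]]

hidden = dense(concat_input, w_hidden, [0.0] * HIDDEN, relu)
prob = dense(hidden, w_out, [0.0], sigmoid)[0]
print(len(concat_input), 0.0 < prob < 1.0)  # prints "808 True"
```

The single Sigmoid output is a probability that a decision threshold then converts into the real/fake label.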

3.3. Model Training and Optimization

This study utilized a random search model tuning technique to train the FRD model. The model applied the ReLU activation function in the comment text processing module, reviewer behavior processing module, and reviewed merchant behavior processing module. Additionally, the model employed an Adam optimizer in the dense processing module and utilized a Sigmoid activation function. Table 1 provides detailed parameters following model tuning. The complete configuration file for the model is given in the Supplementary Materials.
The Adam optimizer is an enhanced version of the gradient descent method that improves memory space and computational efficiency. It combines the strengths of the AdaGrad and RMSProp optimizers, yielding improvements in various applications, such as NLP and deep learning-based image processing.
The rectified linear unit (ReLU) activation function is simple, converges quickly when sparsely activated, and outperforms other activation functions, making it the default choice for most neural network training.
$$f(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases}$$
The Sigmoid activation function takes a real number as its input and produces an output within the (0, 1) interval. It is nonlinear, continuously differentiable, monotonic, and has a fixed output range.
$$f(x) = \frac{1}{1 + e^{-x}}$$
For binary problems, binary cross-entropy is used as the loss function in this study. It takes the following mathematical expression:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log P(y_i) + (1 - y_i) \log\left(1 - P(y_i)\right) \right)$$
where $y_i$ represents the actual label and $P(y_i)$ denotes the predicted probability that sample $i$ belongs to the positive class ($y_i = 1$).
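As a worked check of the Sigmoid and binary cross-entropy formulas, the short sketch below evaluates both on a few hand-picked values. The clipping constant `eps` is an assumption added for numerical safety and is not part of the formula in the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probs are predicted positive-class probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in logits]
loss = binary_cross_entropy(labels, probs)
print(round(loss, 4))
```

A prediction of exactly 0.5 for a positive sample contributes $\log 2 \approx 0.693$ to the loss, which is a convenient reference point when reading training curves.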

4. Experimental Evaluation

The proposed FRD model outlined in this study requires comprehensive data on review texts, reviewers, and reviewed merchants. Thus, the FRD model was trained using a dataset from a specific review website, evaluated with multiple metrics, and compared with other baseline methods.

4.1. Experimental Setup

The FRD model proposed in this study was implemented using the PyTorch 2.3 deep learning library. The training of the FRD model was conducted on an Intel (R) Core (TM) i9-10940X CPU @ 3.30 GHz server running the Linux Ubuntu 22.04 5.15.0-101-generic operating system. The Python environment version used was Python 3.10.12.

4.2. Experimental Dataset

The experimental dataset used is from the Dianping website, and fake reviews are labeled using the Dianping website. The data format of the review dataset used in this study is shown in Table 2.
The dataset utilized in this study consists of 250,000 reviews, with the ratio of authentic to fake reviews being 34:66. It involves 184,000 reviewers and 125,000 merchants. The experiments were conducted in accordance with an 8:1:1 ratio for the training set, validation set, and test set.
We grouped reviewers and reviewed merchants based on the proportion of real reviews among their published comments. We then calculated the overall ratings of the reviews published by each group and plotted the results in Figure 4. The figure shows that reviewers and merchants with different proportions of real reviews have different distributions of review ratings, which suggests that rating behavior is informative for modeling the credibility of reviewers and merchants.

4.3. Evaluation Metrics

This study evaluates the proposed FRD model using metrics based on the relationship between the predicted values and the actual values, as illustrated in Table 3.
$$ACC = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$$
$$Precision = \frac{TP}{TP + FP} \times 100\%$$
$$Recall = \frac{TP}{TP + FN} \times 100\%$$
$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%$$
Accuracy (ACC) is a measure of a model’s overall performance, precision indicates the ratio of true positive instances to the total predicted positive values, and recall signifies the ratio of true positive instances to the sum of true positive instances and false negative instances. The F1 score provides an assessment of the classifier’s accuracy, calculated as the weighted harmonic mean of the classifier’s precision and recall measures.
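The four metrics can be computed directly from the confusion counts defined in Table 3. The following sketch returns them as fractions in [0, 1], which is how the result tables report them, rather than as percentages. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion counts, used only to exercise the formulas.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

The zero-denominator guards are a convenience for degenerate cases, such as a classifier that never predicts the positive class.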
This study also employed the AUC metric to evaluate the models.
$$AUC = \frac{\sum_{i \in positiveClass} rank_i - \frac{M(1 + M)}{2}}{M \times N}$$
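Specifically, the test samples are sorted by their predicted probability of belonging to the positive class and ranked so that the sample with the lowest probability receives rank 1 and the sample with the highest probability receives rank $M + N$. Here, $rank_i$ is the rank of the $i$-th positive sample, $M$ represents the number of positive class samples, and $N$ represents the number of negative class samples.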
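The rank-based AUC formula can be checked with a short sketch. Here ranks are assigned in ascending order of score (rank 1 for the lowest score), which is the convention under which the formula equals the fraction of positive-negative pairs that are ordered correctly. Tied scores receive their average rank; that tie handling, the function name, and the example data are assumptions not described in the paper.

```python
def rank_auc(labels, scores):
    """AUC from the rank-sum formula; labels are 1 (positive) or 0 (negative)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending by score
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m = sum(1 for y in labels if y == 1)  # number of positive samples (M)
    n = len(labels) - m                   # number of negative samples (N)
    if m == 0 or n == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - m * (m + 1) / 2) / (m * n)


# Hypothetical example: 3 positives, 2 negatives.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = rank_auc(labels, scores)
```

In this example the positive samples hold ranks 5, 3, and 1, so the rank sum is 9 and the AUC is (9 - 6) / 6 = 0.5, matching the three of six positive-negative pairs that are ordered correctly.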

4.4. Experimental Results

The experimental results of this paper are presented below in three parts: hyperparameter tuning, training process, and model comparison.
a. Comparison of different numbers of transformer hidden nodes
In Table 4, the performance of the FRD model is compared under different numbers of hidden nodes in the transformer encoders of the reviewer behavior processing module and the reviewed merchant behavior processing module.
The experimental results show that the model achieves its best accuracy (0.8939), F1 score (0.8948), and AUC (0.9531) when the number of hidden nodes is set to 200, although the differences between settings are small (the AUC ranges from 0.9516 to 0.9531). Therefore, this study adopts a hidden size of 200, as listed in Table 1.
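The selection step can be reproduced from the values reported in Table 4. The sketch below picks the hidden size with the highest AUC; choosing by AUC rather than by accuracy or F1 is an assumption, since the paper does not state which metric drove the decision, although all three agree here.

```python
# (hidden_nodes, accuracy, f1, auc) rows copied from Table 4.
table4 = [
    (20, 0.8893, 0.8874, 0.9526),
    (32, 0.8882, 0.8889, 0.9516),
    (64, 0.8903, 0.8916, 0.9530),
    (80, 0.8892, 0.8909, 0.9524),
    (128, 0.8936, 0.8939, 0.9522),
    (200, 0.8939, 0.8948, 0.9531),
    (256, 0.8917, 0.8909, 0.9520),
]

best_by_auc = max(table4, key=lambda row: row[3])
best_by_accuracy = max(table4, key=lambda row: row[1])
best_by_f1 = max(table4, key=lambda row: row[2])
```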
b. Training process
Figure 5 illustrates the learning process of the FRD model, presenting learning curves for the training and validation sets for three evaluation metrics: accuracy, F1 score, and AUC. Each subfigure shows how one metric changes on the training and validation sets as the number of training epochs increases. On the dataset used in this study, the validation metrics converge after about three epochs of training.
c. Comparison of FRD model with other models
This study conducted comparative experiments from two perspectives. First, ablation experiments were performed on the FRD model: we compared the full model with variants in which the reviewer processing module, the reviewed merchant processing module, or both were removed, and with a variant trained after removing the overall ratings and update times from the behavior sequences. As shown in Table 5, the model that used only the textual content of reviews had the lowest AUC (0.9123). The variants that removed either the reviewer processing module (0.9357) or the reviewed merchant processing module (0.9397) had similar AUC values, higher than that of the text-only model but lower than that of the full FRD model (0.9531). The variant trained without overall ratings and update times (0.9407) also had a lower AUC than the full model. These results confirm that incorporating the sequential modeling of review behaviors improves fake review detection.
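To make the size of each ablation effect explicit, the AUC gaps between the full model and each variant can be computed from Table 5. The short labels used as dictionary keys are paraphrases introduced here for readability; the AUC values are those reported in the table.

```python
# AUC values copied from Table 5; the keys are shortened labels.
auc = {
    "FRD": 0.9531,
    "no reviewer and no merchant module": 0.9123,
    "no merchant module": 0.9397,
    "no reviewer module": 0.9357,
    "no star and no updatetime": 0.9407,
}

# Gap to the full model, expressed in percentage points of AUC.
gaps = {
    name: round((auc["FRD"] - value) * 100, 2)
    for name, value in auc.items()
    if name != "FRD"
}
```

The resulting gaps are about 4.08, 1.34, 1.74, and 1.24 percentage points, respectively.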
Furthermore, considering both the perspectives of review text and review behavior, this paper compares five models:
(1) Deep Hybrid Model [24]: following the method in [24], reviewing behavior is characterized using the aspect ratings and the overall rating of each review, and this is combined with features of the review content to detect fake reviews. The experimental results show that the AUC value of this model is lower than that of the proposed FRD model.
(2) CNN + LSTM [33]: following the model in [33], a CNN is used to model the content of the review text, and an LSTM is used to model the behavior sequences of reviewers and merchants. The features from both parts are combined for fake review detection. The word embedding layer of the CNN is based on Tencent's released word2vec word vectors, tencent-ailab-embedding-zh-d200-v0.2.0. The AUC value of this model is lower than that of the proposed FRD model.
(3) Feature engineering + logistic regression [34]: following the approach in [34], features are extracted from the perspectives of reviewer credibility, merchant credibility, and review text credibility, and a logistic regression model is then trained on them. The AUC value of this model is lower than that of the proposed FRD model.
(4) BERT [12]: the BERT model is used to model the content of review texts for fake review detection.
(5) RoBERTa [13]: the RoBERTa model is used to model the content of review texts for fake review detection.
The comparison results are given in Table 6. In addition, the statistical significance test results of the proposed FRD model compared with [12,13,24,33,34] are tabulated in Table 7. The results indicate that FRD is significantly better than all other models.

5. Conclusions and Future Work

This paper introduces a fake review detection model, FRD, which utilizes BERT for the feature modeling of review texts, employs transformer encoders to model the behavior sequences of reviewers and reviewed merchants, which imply their credibility, and finally combines the features from both parts to identify fake reviews. Experimental results suggest that incorporating features from both review texts and review behaviors improves the detection performance for fake reviews.
Experiments conducted on a dataset of 250,000 reviews show that, in terms of AUC, the proposed FRD model outperforms a model that removes the input overall ratings and update times by approximately 1.2 percentage points; outperforms a model that does not factor in the behavior sequences of reviewed merchants by approximately 1.3 percentage points; surpasses a model that does not factor in the behavior sequences of reviewers by about 1.7 percentage points; and surpasses a model that does not factor in either sequence by approximately 4 percentage points. This validates the effectiveness of jointly modeling review content and the credibility of reviewers and merchants in detecting fake reviews.
Furthermore, in terms of AUC, the FRD model outperforms the Deep Hybrid Model, which models review text, overall ratings, and aspect ratings, by around 1.5 percentage points; outperforms a model that uses a CNN for text modeling and an LSTM for behavior modeling by around 10.9 percentage points (0.9531 vs. 0.8442); outperforms a model that uses feature engineering to extract text and behavior features followed by logistic regression by about 6 percentage points; and outperforms a model that uses RoBERTa for text feature modeling by approximately 1.8 percentage points.
Modeling the credibility of reviewers and merchants in the proposed FRD model relies on the reviewers and businesses having a certain number of reviews and review actions. If such behavioral sequences are sufficient, they can better assist in training the model.
In practice, online platforms face reviewers who are paid to post fake reviews and businesses that hire "water armies" (groups of paid posters) to embellish their reputation. In this setting, the proposed FRD model can better identify fake reviews by modeling the behavioral patterns specific to such reviewers and merchants, which differ from those of normal reviewers and merchants. In contrast to previous methods, the FRD model incorporates the credibility of reviewers and merchants into the detection of fake reviews, and the experiments confirm that this achieves the expected improvement.
Within the proposed model, although the behavioral sequences of reviews were modeled, no exploration was conducted on fake reviewer groups. The next step in fake review detection will involve detecting potential fake reviewer groups to enhance the accuracy of fake review detection.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics13214322/s1, Model parameter file.

Author Contributions

Conceptualization, J.C.; Methodology, P.S.; Validation, W.B.; Investigation, Y.Z. and Q.W.; Writing—original draft, P.S.; Funding acquisition, F.K., J.C. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Fundamental Research Funds for the Central Universities 2023RC08, Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology) (Grant No. HBIR 202302), Open Fund (DGERA 20231101) of Key Laboratory of Deep-time Geography and Environment Reconstruction and Applications of Ministry of Natural Resources, Chengdu University of Technology, the Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, No. MMC202408, and by the Fundamental Research Funds for the Central Universities, JLU (No. 93K172024K17).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Harris, C. Detecting deceptive opinion spam using human computation. In Workshops at AAAI on Artificial Intelligence; AAAI: Philadelphia, PA, USA, 2012; pp. 87–93.
  2. Heydari, A.; Tavakoli, M.A.; Salim, N.; Heydari, Z. Detection of review spam: A survey. Expert Syst. Appl. 2015, 42, 3634–3642.
  3. Jindal, N.; Liu, B. Analyzing and detecting review spam. In Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007, Omaha, NE, USA, 28–31 October 2008; pp. 547–552.
  4. Ott, M.; Choi, Y.; Cardie, C. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; Volume 6, pp. 309–319.
  5. Elmurngi, E.; Gherbi, A. Detecting Fake Reviews through Sentiment Analysis Using Machine Learning Techniques. In Proceedings of the IARIA/DATA ANALYTICS, Barcelona, Spain, 12–16 November 2017.
  6. Ren, Y.; Ji, D. Neural networks for deceptive opinion spam detection: An empirical study. Inf. Sci. 2017, 385, 213–224.
  7. Hajek, P.; Barushka, A.; Munk, M. Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Comput. Appl. 2020, 32, 17259–17274.
  8. Melleng, A.; Jurek-Loughrey, A.; Deepak, P. Data Fusion for Better Fake Reviews Detection. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, RANLP 2023, Varna, Bulgaria, 4–6 September 2023; pp. 730–738.
  9. Budhi, G.S.; Chiong, R. A Multi-type Classifier Ensemble for Detecting Fake Reviews Through Textual-based Feature Extraction. ACM Trans. Internet Techn. 2023, 23, 1–24.
  10. Mewada, A.; Dewang, R.K.; Goldar, P.; Maurya, S.K. SentiBERT: A Novel Approach for Fake Review Detection Incorporating Sentiment Features with Contextual Features. In Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing, Noida, India, 3–5 August 2023; pp. 230–235.
  11. Kaliyar, R.K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multim. Tools Appl. 2021, 80, 11765–11788.
  12. Gupta, P.; Gandhi, S.; Chakravarthi, B.R. Leveraging Transfer learning techniques-BERT, RoBERTa, ALBERT and DistilBERT for Fake Review Detection. In Proceedings of the Forum for Information Retrieval Evaluation, Virtual, 13–17 December 2021.
  13. Mohawesh, R.; Xu, S.; Tran, S.N. Fake reviews detection: A survey. IEEE Access 2021, 9, 65771–65802.
  14. Shang, Y.; Liu, M.; Zhao, T.; Zhou, J. T-Bert: A Spam Review Detection Model Combining Group Intelligence and Personalized Sentiment Information; Springer: Cham, Switzerland, 2021; pp. 409–421.
  15. Refaeli, D.; Hajek, P. Detecting fake online reviews using fine-tuned BERT. In Proceedings of the 2021 5th International Conference on E-Business and Internet, Singapore, 15–17 October 2021.
  16. Huang, J.; Qian, T.; He, G. Detecting Professional Spam Reviewers, Advanced Data Mining and Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 288–299.
  17. Lim, E.P.; Nguyen, V.A.; Jindal, N.; Liu, B.; Lauw, H.W. Detecting product review spammers using rating behaviors. In Proceedings of the ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; Volume 7, pp. 939–948.
  18. Kumar, A.; Gopal, R.D.; Shankar, R.; Tan, K.H. Fraudulent review detection model focusing on emotional expressions and explicit aspects: Investigating the potential of feature engineering. Decis. Support Syst. 2022, 155, 113728.
  19. Jindal, N.; Liu, B. Review spam detection. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007.
  20. Faris, H.; Ala’m, A.Z.; Heidari, A.A.; Aljarah, I.; Mafarja, M.; Hassonah, M.A.; Fujita, H. An intelligent system for spam detection and identification of the most relevant features based on evolutionary random weight networks. Inf. Fusion 2019, 4, 67–83.
  21. Fang, Y.; Wang, H.; Zhao, L.; Yu, F.; Wang, C. Dynamic knowledge graph based fake-review detection. Appl. Intell. 2020, 50, 4281–4295.
  22. Savage, D.; Zhang, X.; Chou, P. Detection of opinion spam based on anomalous rating deviation. Expert. Syst. Appl. Int. J. 2015, 42, 8650–8657.
  23. Manaskasemsak, B.; Tantisuwankul, J.; Rungsawang, A. Detection of fake reviews and reviewers using behavioral graph partitioning integrated with deep neural network. Neural Comput. Appl. 2023, 35, 1169–1182.
  24. Duma, R.A.; Niu, Z.; Nyamawe, A.S.; Tchaye-Kondi, J.; Yusuf, A.A. A Deep Hybrid Model for fake review detection by jointly leveraging review text, overall ratings, and aspect ratings. Soft Comput. 2023, 27, 6281–6296.
  25. Zhang, D.; Niu, B.; Li, W.; Wu, C. A deep learning approach for detecting fake reviewers: Exploiting reviewing behavior and textual information. Decis. Support Syst. 2023, 166, 113911.
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2017; pp. 5998–6008.
  27. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
  28. Chen, J.; Wang, M.; Zhang, X.L.; Huang, Z.; Rahardja, S. End-to-end multi-modal speech recognition with air and bone conducted speech. In Proceedings of the ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 6052–6056.
  29. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  30. Ji, Y.; Zhou, Z.; Liu, H.; Davuluri, R.V. DNABERT: Pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 2021, 37, 2112–2120.
  31. Ba, J.; Chen, J.; Wang, M.; Muhammad, S.A. A squeeze-and-excitation and transformer based cross-task system for environmental sound recognition. arXiv 2022, arXiv:2203.08350.
  32. Cui, Y.; Che, W.; Liu, T.; Qin, B.; Yang, Z. Pre-Training with Whole Word Masking for Chinese BERT. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3504–3514.
  33. Balwant, M.K. Bidirectional LSTM Based on POS tags and CNN Architecture for Fake News Detection. In Proceedings of the 2019 10th International Conference on Computing and Networking Technology (ICCNT), Kanpur, India, 6–8 July 2019.
  34. Li, J.; Wu, G.; Xie, F.; Yao, X.; Qi, J.; Sun, P. Research of Fraud Review Detection Model on O2O Platform. Acta Electron. Sin. 2016, 44, 2855–2860.
Figure 1. FRD model architecture.
Figure 2. Scaled dot-product attention.
Figure 3. Multi-head attention.
Figure 4. (a). Distribution of total scores for merchants with different proportions of real reviews. (b). Distribution of total scores for reviewers with different proportions of real reviews.
Figure 5. (a). FRD model accuracy. (b). FRD model F1. (c). FRD model AUC.
Table 1. FRD model parameters.
| Processing Module | Parameters | Value |
| --- | --- | --- |
| Reviewer behavior processing module | Hidden size | 200 |
| | Number of encoder blocks | 2 |
| | Self-attention heads | 10 |
| | Dropout rate | 0.3 |
| | Max_seq_length | 100 |
| Reviewed merchant behavior processing module | Hidden size | 200 |
| | Number of encoder blocks | 2 |
| | Self-attention heads | 10 |
| | Dropout rate | 0.3 |
| | Max_seq_length | 100 |
| Comment text processing module | Number of layers | 12 |
| | Hidden size | 768 |
| | Self-attention heads | 12 |
| | Dropout rate | 0.3 |
| Dense processing module | Number of dense layers | 1 |
| | Dropout rate | 0.3 |
| | Batch size | 64 |
| | Loss | Binary cross-entropy |
Table 2. Data items in dataset.
| Data Item | Meaning |
| --- | --- |
| reviewid | The ID number of the review |
| reviewbody | The review content |
| updatetime | The time of posting a review |
| userid | The ID number of the reviewer |
| shopid | The ID number of the merchant |
| star | The overall rating |
| score1 | The taste rating |
| score2 | The environment rating |
| score3 | The service rating |
Table 3. Fake review evaluation parameters.
| Evaluation Parameters | Predictive Value | Actual Value |
| --- | --- | --- |
| True-positive (TP) | Yes | Yes |
| True-negative (TN) | No | No |
| False-positive (FP) | Yes | No |
| False-negative (FN) | No | Yes |
Table 4. FRD model performance under different numbers of transformer hidden nodes.
| Hidden Nodes | Accuracy | F1 | AUC |
| --- | --- | --- | --- |
| 20 | 0.8893 | 0.8874 | 0.9526 |
| 32 | 0.8882 | 0.8889 | 0.9516 |
| 64 | 0.8903 | 0.8916 | 0.9530 |
| 80 | 0.8892 | 0.8909 | 0.9524 |
| 128 | 0.8936 | 0.8939 | 0.9522 |
| 200 | 0.8939 | 0.8948 | 0.9531 |
| 256 | 0.8917 | 0.8909 | 0.9520 |
Table 5. Results of ablation experiment using FRD model.
| Model | Accuracy | F1 | AUC |
| --- | --- | --- | --- |
| FRD − reviewer processing module − reviewed merchant processing module | 0.8556 | 0.8551 | 0.9123 |
| FRD − reviewed merchant processing module | 0.8752 | 0.8747 | 0.9397 |
| FRD − reviewer processing module | 0.8771 | 0.8745 | 0.9357 |
| FRD − star − updatetime | 0.8683 | 0.8688 | 0.9407 |
| FRD | 0.8939 | 0.8948 | 0.9531 |
Table 6. Comparison of FRD model with other models.
| Model | Accuracy | F1 | AUC |
| --- | --- | --- | --- |
| Deep Hybrid Model [24] | 0.8785 | 0.8776 | 0.9384 |
| CNN (model comment text) + LSTM (model review behavior) [33] | 0.7595 | 0.7572 | 0.8442 |
| Feature engineering + logistic regression [34] | 0.8353 | 0.8347 | 0.8930 |
| BERT [12] | 0.8557 | 0.8552 | 0.9123 |
| RoBERTa [13] | 0.8616 | 0.8597 | 0.9347 |
| FRD | 0.8939 | 0.8948 | 0.9531 |
Table 7. Test results of AUC value.
| Performance Metrics | Test Results |
| --- | --- |
| FRD vs. Deep Hybrid Model [24] | 2.2635257 × 10^-13 |
| FRD vs. CNN + LSTM [33] | 3.4537827 × 10^-33 |
| FRD vs. feature engineering + logistic regression [34] | 1.1005201 × 10^-28 |
| FRD vs. BERT [12] | 1.0220840 × 10^-25 |
| FRD vs. RoBERTa [13] | 2.4640677 × 10^-22 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Citation

Sun, P.; Bi, W.; Zhang, Y.; Wang, Q.; Kou, F.; Lu, T.; Chen, J. Fake Review Detection Model Based on Comment Content and Review Behavior. Electronics 2024, 13, 4322. https://doi.org/10.3390/electronics13214322
