Article

A Hybrid Deep Learning Approach for Enhanced Sentiment Classification and Consistency Analysis in Customer Reviews

1
Department of Management Information Systems, School of Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia
2
Faculty of Specific Education, Kafrelsheikh University, Kafrelsheikh 33511, Egypt
3
Department of Computer Science, Higher Institute of Management and Information Technology, Kafrelsheikh 33511, Egypt
4
Faculty of Specific Education, Mansoura University, Mansoura 35516, Egypt
5
Faculty of Computers and Information, Kafrelsheikh University, Kafrelsheikh 33511, Egypt
*
Author to whom correspondence should be addressed.
Submission received: 13 November 2024 / Revised: 28 November 2024 / Accepted: 6 December 2024 / Published: 7 December 2024

Abstract

Consumer reviews play a pivotal role in shaping purchasing decisions and influencing the reputation of businesses in today’s digital economy. This paper presents a novel hybrid deep learning model, WDE-CNN-LSTM, designed to enhance the sentiment classification of consumer reviews. The model leverages the strengths of Word Embeddings (WDE), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs) to capture both temporal and local features of text data. Extensive experiments were conducted across binary, three-class, and five-class classification tasks, with the proposed model achieving an accuracy of 98% for binary classification, 98% for three-class classification, and 95.21% for five-class classification. The WDE-CNN-LSTM model consistently outperformed standalone CNN, LSTM, and WDE-LSTM models in precision, recall, and F1-score, achieving an F1-score of up to 98.26% for three-class classification. The consistency analysis also revealed a high alignment between the predicted sentiment and customer ratings, with a consistency rate of 96.00%. These results demonstrate the efficacy of this hybrid architecture in handling complex sentiment classification tasks (SCTs), offering significant improvements in accuracy, classification metrics, and sentiment consistency. The findings have important implications for improving sentiment analysis in customer review systems, contributing to more reliable and accurate sentiment classification.

1. Introduction

Online product reviews are growing in importance as e-commerce expands: global retail revenues are projected to reach USD 4.88 trillion in the coming years due to the rise of internet shopping and mobile applications [1]. In recent years, e-commerce has increasingly replaced traditional face-to-face interactions, with digital interfaces becoming the primary mode of customer engagement [2,3]. In this context, consumers must carefully evaluate online product offerings by cross-referencing the provided specifications with their requirements. A pivotal element of this decision-making process is analyzing consumer reviews, which are critical sources of information on product quality and performance. These reviews, often substantiated by detailed written assessments, offer valuable insights grounded in personal experience, revealing user opinions, identifying potential shortcomings, and providing recommendations crucial for informing prospective buyers [4,5,6]. One study highlights that the vast amounts of data generated from user reviews are crucial for predicting consumer preferences and trends [7]. This information can help companies refine their marketing strategies and enhance their products. User reviews also strongly influence purchasing decisions: another study indicates that many customers change their buying decisions based on positive or negative feedback [8]. A recent public survey indicates that 92% of consumers actively use online reviews as a reference while shopping, and 90% of these individuals say that favorable reviews significantly increase their likelihood of considering a product [9]. However, the authenticity of these reviews can often be questionable, leading to concerns about deceptive information and unfavorable online purchasing experiences [10,11].
This wealth of information plays a dual role in supporting consumers and e-commerce platforms. For consumers, reviews are vital for making well-informed purchasing decisions, enhancing their trust in the platform, and fostering brand loyalty [12]. From the e-commerce website’s perspective, developers gain valuable feedback that enables them to focus on core customer issues, ultimately improving service quality and increasing overall customer satisfaction [13,14,15].
Customer ratings and reviews are pivotal in e-commerce, serving as trusted sources of information that directly influence purchasing decisions and consumer trust. They convey vital information that helps customers make more informed decisions before purchasing. In addition to textual reviews, customers often provide star ratings, numerical values that summarize their general opinion of the product. Typically, these ratings are on a scale from one to five, where one or two stars indicate poor performance, three stars represent a neutral stance, and four or five stars express a positive experience with the product [16,17,18]. Online reviews and ratings are critical factors that affect consumers’ willingness to purchase products: consumers increasingly rely on reading and providing product reviews and ratings to evaluate potential purchases and communicate with retailers and other consumers. However, reviews without ratings can complicate the assessment of a product’s appeal, especially when the reviews are vague or overly simplistic [19]. Discrepancies between consumer reviews and their corresponding ratings present a significant challenge in e-commerce. There are instances where reviews may be artificially created, either by paying individuals to produce favorable content for authentic products or by using text-generation algorithms to generate fake reviews [20,21]. This often leads to inconsistencies where a customer’s written review does not align with the star rating they provide. For example, a review might be highly critical, yet the customer gives the product a four- or five-star rating, or conversely, a glowing review might be paired with a one- or two-star rating. Competitors might exploit this inconsistency by flooding the market with negative reviews about rival products, thereby manipulating online platforms’ ranking algorithms to lower the visibility of the targeted company [22].
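The rating convention described above can be written as a small lookup. The following is only an illustrative sketch: the function name `rating_to_sentiment` and the label strings are hypothetical and are not taken from the study.

```python
from typing import Literal

Sentiment = Literal["negative", "neutral", "positive"]


def rating_to_sentiment(stars: int) -> Sentiment:
    """Map a 1-5 star rating to a three-way sentiment label."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"expected a star rating from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "negative"  # one or two stars: poor performance
    if stars == 3:
        return "neutral"   # three stars: neutral stance
    return "positive"      # four or five stars: positive experience


labels = [rating_to_sentiment(s) for s in (1, 2, 3, 4, 5)]
```

A review whose text reads as strongly negative but whose rating maps to "positive" under this scheme would be one of the inconsistent cases discussed above.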
Addressing these inconsistencies is a priority for many e-commerce platforms, as they can make it difficult for customers to discern helpful information from the plethora of available reviews [23]. To combat this issue, there is a need for a model that can evaluate reviews based on their sentiment polarity—positive or negative—and compare the sentiment-derived rating with the actual customer rating. This comparison would reveal whether the review and rating are consistent and reliable or if the inconsistency suggests that the review should not be trusted when making purchasing decisions [4,24,25]. Organizations also face challenges in managing large volumes of consumer reviews. They need to gather feedback, forecast sales trends, and manage their reputation, which can be overwhelming without proper tools [26]. Therefore, sentiment analysis (SA) plays a crucial role in e-commerce by enabling businesses to understand customer opinions and enhance decision-making processes. Using text-mining techniques, businesses can create sentiment dictionaries that help identify emotional tendencies in user comments, particularly in live e-commerce settings [27]. A recent study emphasizes the growing necessity of SA for businesses to extract insights from consumer reviews about their products [28]. This study shows that product reviews are a significant source of customer feedback, and analyzing the sentiments within these reviews can provide businesses with insights into product quality, customer preferences, and areas needing improvement. SA, or opinion mining, is a sub-field of natural language processing (NLP) that focuses on identifying and extracting subjective information from text data. It involves classifying text into various sentiment categories, such as positive, negative, or neutral, to understand the emotions and opinions expressed by individuals [29]. 
This technique has become increasingly important in various applications, including customer feedback analysis, social media monitoring, and market research [30]. Businesses and organizations can gain valuable insights into consumer opinions, preferences, and satisfaction levels by analyzing sentiments expressed in textual data [31,32]. However, the complexity of human language, with its nuances, contextual variations, and subtleties like sarcasm and irony, makes SA a challenging task, particularly when dealing with large-scale and diverse datasets like those found in consumer reviews on e-commerce platforms [33,34,35].
While traditional SA models have demonstrated utility, they exhibit significant limitations in addressing complex classification tasks, particularly in multi-class SA. These models frequently struggle with class imbalance, where specific sentiment categories are disproportionately represented, leading to biased outcomes [34]. Addressing these limitations necessitates more than conventional methods of review moderation or fundamental text analysis. There is an imperative for advanced SA techniques capable of accurately discerning the underlying sentiment in consumer reviews, even when confronted with complex linguistic features such as sarcasm, irony, or nuanced expression, which can distort the results of traditional methods [30,35]. By precisely classifying sentiment, these advanced techniques can identify discrepancies between the textual content of reviews and their associated star ratings, thereby enhancing the reliability of the information provided to consumers.
To address these challenges, this paper introduces a novel hybrid model, WDE-CNN-LSTM, designed to enhance sentiment classification in consumer reviews by integrating the strengths of Word Embeddings (WDE), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs). This model aims to mitigate the limitations associated with standalone deep learning (DL) models by incorporating WDE to capture semantic relationships effectively, CNNs to identify local textual patterns, and LSTMs to maintain temporal dependencies within the data [36]. The study builds on these established DL architectures to capture complex patterns in the data and improve sentiment classification accuracy. The proposed hybrid architecture facilitates more effective feature extraction from textual inputs and enables a more rigorous evaluation of consistency between the extracted sentiment and the accompanying star ratings. The model enhances classification accuracy across binary, three-class, and five-class SA tasks while checking the alignment between the sentiment expressed in the review and the assigned ratings [37].
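One way to read this combination is as two feature extractors over a shared embedding, joined before the classifier. The equations below are a hedged sketch under assumptions not stated in this section: the two branches are taken to read the same embedded sequence in parallel, the convolution uses a window of width k with max-over-time pooling, and the LSTM contributes its final hidden state. All symbols are introduced here for illustration only.

```latex
\begin{align}
  \mathbf{e}_t &= \mathbf{E}[x_t], \qquad t = 1, \dots, T
    && \text{(word embedding of token } x_t\text{)} \\
  \mathbf{c}_i &= \operatorname{ReLU}\!\left(\mathbf{W}_c\,[\mathbf{e}_i; \dots; \mathbf{e}_{i+k-1}] + \mathbf{b}_c\right)
    && \text{(local pattern over a window of } k \text{ tokens)} \\
  \mathbf{h}^{\mathrm{cnn}} &= \max_{1 \le i \le T-k+1} \mathbf{c}_i
    && \text{(element-wise max-over-time pooling)} \\
  \mathbf{h}_t &= \operatorname{LSTM}(\mathbf{e}_t, \mathbf{h}_{t-1}), \qquad \mathbf{h}^{\mathrm{lstm}} = \mathbf{h}_T
    && \text{(temporal dependencies)} \\
  \hat{\mathbf{y}} &= \operatorname{softmax}\!\left(\mathbf{W}_o\,[\mathbf{h}^{\mathrm{cnn}}; \mathbf{h}^{\mathrm{lstm}}] + \mathbf{b}_o\right)
    && \text{(2-, 3-, or 5-class output)}
\end{align}
```

Under this reading, the embedding carries word-level semantics, the pooled convolutional vector summarizes local phrases, and the LSTM state summarizes word order, so the classifier sees both kinds of evidence at once.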

1.1. Contributions

The main contributions of this study are as follows:
  • The proposal of a hybrid DL model, WDE-CNN-LSTM, developed specifically to improve sentiment classification in consumer reviews by incorporating WDE, CNNs, and LSTM networks.
  • An exhaustive evaluation of the WDE-CNN-LSTM model’s performance across multiple SCTs, including binary, three-class, and five-class scenarios, employing a diverse dataset of consumer reviews from e-commerce platforms.
  • The implementation of advanced data preprocessing methods, such as tokenization, padding, and the handling of class imbalances, to transform unstructured customer review data into a form appropriate for DL models.
  • The introduction of a consistency-driven approach to SA, focusing on aligning sentiment predictions with actual customer ratings to detect inconsistencies and enhance the trustworthiness of sentiment classification.
  • A comparative investigation of the proposed WDE-CNN-LSTM model against standalone DL techniques, such as CNNs, LSTMs, and WDE-LSTM, in which the proposed model shows the best precision, recall, F1-score, and overall accuracy.
  • The execution of comprehensive experiments to evaluate the effectiveness and robustness of the hybrid model, emphasizing its capability to address complex SCTs and produce accurate and consistent sentiment predictions.
  • A presentation of the proposed model’s practical applications for e-commerce platforms, improving automated customer sentiment interpretation and informing better business decisions by leveraging accurate sentiment classification and consistency analysis.
These contributions reflect the study’s innovative aspects, particularly developing and applying a hybrid deep learning model to improve sentiment analysis in consumer reviews.

1.2. Structure

The remainder of this paper is organized as follows: Section 2 provides a comprehensive review of the relevant literature in sentiment classification and customer review analysis. Section 3 details the proposed DL models, describing their components and methodology. Section 4 presents the experimental results obtained from the various models, including CNN, LSTM, WDE-LSTM, and WDE-CNN-LSTM, together with the statistical consistency analysis. Practical implications and recommendations for utilizing the proposed models in real-world applications are presented in Section 5. Finally, Section 6 presents the concluding remarks and discusses potential avenues for future research in enhancing sentiment classification models.

2. Literature Review

This section provides an exhaustive overview of current studies on SA methods applied to various domains, including healthcare, e-commerce, online services, and tourism. The studies reviewed use a range of ML and DL methods to classify customer sentiments expressed in online reviews, emphasizing the effectiveness and evolution of different techniques in addressing SCTs. From standard ML algorithms like SVM and Naive Bayes (NB) to advanced DL techniques such as LSTM and Bidirectional Encoder Representations from Transformers (BERT), these studies show the growing sophistication and accuracy of sentiment detection techniques. The reviewed research also recognizes common challenges, such as high computational requirements, data imbalance, and difficulty interpreting complex sentiments like mixed emotions and sarcasm. Furthermore, the studies highlight the demand for more advanced methods, such as hybrid techniques and explainable artificial intelligence (XAI), to improve SA interpretability and performance. Future research directions include optimizing existing techniques for efficiency, expanding their applicability to diverse datasets, and integrating multimodal data to capture a more comprehensive understanding of customer feedback and sentiment.
Hossain et al. [38] used ML techniques such as SVM, NB, Decision Trees, and Random Forests (RFs) to classify customer sentiments from insurance product reviews into negative, positive, and neutral categories. The dataset contained online consumer reviews of insurance products, which were preprocessed and labeled for SA. The study’s approach effectively handled large datasets and provided accurate sentiment predictions, aiding insurance companies in understanding customer feedback. However, limitations included potential dataset bias due to class imbalance and challenges in analyzing complex sentiments like sarcasm. Future research is suggested to incorporate advanced DL techniques, such as LSTM and CNN, to improve sentiment detection and extend the analysis to a broader range of insurance products.
Kaur et al. [39] proposed a DL-based model utilizing a hybrid feature extraction approach for consumer SA. The study utilized a combination of WDE, LSTM, and CNN architectures to enhance sentiment classification accuracy in consumer reviews. The dataset incorporated consumer reviews gathered from different e-commerce platforms, which underwent preprocessing to ensure quality input for the model. This hybrid model captured local textual patterns and temporal dependencies, achieving high accuracy across binary, three-class, and five-class SCTs. Despite its effectiveness, the proposed model faced limitations such as high computational complexity and the need for extensive training datasets. Further research is recommended to optimize the model’s computational efficiency and expand its application to more diverse datasets and domains.
Dieksona et al. [40] conducted an SA study on consumer reviews specifically for Traveloka, a popular travel and accommodation booking platform. They utilized different ML algorithms, including NB, SVM, and RF, to classify customer sentiments into negative and positive classes. The dataset employed in this research consisted of consumer reviews from the Traveloka platform, which were preprocessed to enhance model accuracy. The research showed the efficacy of these algorithms in sentiment classification, with RF achieving the highest accuracy among the tested techniques. However, the authors noted limitations in handling neutral sentiments and the need for more diverse data to improve model robustness. Future work suggested includes expanding the analysis to cover more complex sentiment categories and further exploring DL techniques to improve sentiment detection accuracy.
Huang et al. [28] reviewed current methods and future directions for SA on e-commerce platforms. They analyzed different methods, including ML and DL techniques, highlighting their strengths and limitations in handling diverse datasets and sentiment complexities. The review covered a range of datasets typically used in e-commerce SA, such as product reviews and customer feedback, emphasizing the need for high-quality preprocessing and feature extraction to enhance model performance. The advantages of the reviewed techniques include their ability to provide valuable insights into customer behavior and preferences, supporting business decision-making. However, the authors identified several limitations, such as difficulty analyzing complex sentiments like sarcasm and the high computational cost associated with DL techniques. The future directions suggested include developing more efficient algorithms, integrating multi-modal data, and improving the interpretability of SA techniques.
Using a language representation model, Patel et al. [41] performed an SA of customer feedback and reviews for airline services. The research employed advanced NLP methods, including the BERT model, to classify sentiments expressed in consumer reviews into negative, positive, and neutral categories. The dataset comprised consumer reviews of airline services gathered from multiple online platforms and was preprocessed to improve the model’s performance. The outcomes showed that the BERT model surpassed standard ML algorithms in accurately detecting sentiments, particularly in understanding context and handling complex expressions. However, the research noted challenges such as the need for extensive data for model training and high computational requirements. Further research is recommended to optimize the model for lower computational costs and examine its applicability to other fields, such as hotel and tourism services.
Wang et al. [42] explored the use of large language models (LLMs) for SA in e-commerce, focusing on customer feedback. The research utilized advanced LLMs, including GPT-3 and BERT, to analyze and classify sentiments in consumer reviews across multiple e-commerce platforms. The dataset included diverse customer feedback data from several online retail websites, which were preprocessed to improve the techniques’ effectiveness. The results indicated that LLMs excelled in accurately capturing the nuances of customer sentiment, outperforming standard ML techniques, particularly in handling context-rich and complex reviews. Despite these advantages, the research highlighted limitations such as high computational costs and the need for significant computational resources. Future work suggested includes optimizing LLMs for more efficient SA and expanding the scope to include multi-modal data to understand customer emotions and behaviors better.
Suhartono et al. [43] designed an SA model for drug product reviews using Deep Neural Networks (DNNs) and weighted WDE. The research applied DNNs integrated with techniques such as Word2Vec and GloVe to improve the sentiment classification of consumer reviews into negative, positive, and neutral sentiments. The dataset included drug product reviews collected from different online pharmaceutical platforms, which were preprocessed and transformed into weighted WDE to serve as input for the DL techniques. The proposed model improved performance over standard SA approaches, particularly in capturing the nuanced sentiments specific to drug reviews. However, the research encountered limitations such as the need for large training datasets and substantial computational resources. Future research is recommended to optimize the model’s efficiency and extend its application to other healthcare-related reviews to improve its generalizability.
Puh and Bagić Babac [44] concentrated on predicting the sentiment and rating of tourist reviews using ML methods. The research used SVM, RFs, and gradient-boosting techniques to classify tourist reviews and predict their associated ratings. The dataset consisted of tourist reviews from various travel websites, which were preprocessed and labeled for SA and rating prediction. The ML methods showed high accuracy in sentiment classification and rating prediction, providing beneficial insights into customer feedback for the tourism sector. However, limitations included the need for large, diverse datasets to improve model generalization and the difficulty of interpreting the results for more nuanced sentiments, such as mixed or neutral reviews. Future work suggested includes expanding the dataset to cover a broader range of tourist destinations and incorporating DL methods to improve predictive performance and robustness.
Taherdoost and Madanchian [45] conducted a comprehensive review of AI techniques and their application in SA, particularly in competitive research contexts. The study examined various AI methods, including ML and DL approaches, and their effectiveness in analyzing sentiments across different datasets, such as consumer reviews and social media content. The authors highlighted the advantages of using AI for SA, including enhanced accuracy, scalability, and the ability to handle large and complex datasets. However, the review also pointed out several limitations, such as the challenges in interpreting nuanced sentiments and the high computational costs associated with advanced techniques. Future research directions proposed by the authors include the development of more efficient AI algorithms, integrating real-time data processing, and exploring multi-language SA to broaden the scope and applicability of AI in competitive environments.
Vatambeti et al. [46] performed a Twitter SA on online food services using Elephant Herd Optimization (EHO) integrated with a hybrid DL approach. The research employed a dataset of tweets related to online food services, which were preprocessed to remove noise and standardize text for analysis. Their proposed approach combined EHO with DL methods such as LSTM and CNN to enhance sentiment classification accuracy by optimizing feature selection and model parameters. The outcomes showed that the hybrid approach significantly outperformed standard ML techniques and standalone DL methods in terms of accuracy and computational efficiency. However, limitations included the model’s complexity, which may require high computational resources, and difficulty handling sarcastic or highly nuanced sentiment in tweets. Future research should focus on reducing the model’s complexity and analyzing its application in other fields, such as e-commerce or healthcare.
Iqbal et al. [47] conducted an SA of consumer reviews, utilizing DL methods to enhance sentiment classification accuracy. The research utilized DL techniques, including LSTM, CNN, and BiLSTM, to investigate consumer reviews gathered from e-commerce platforms. The dataset underwent comprehensive preprocessing, including tokenization and text normalization, to guarantee quality inputs for the DL methods. The outcomes showed that DL methods, particularly BiLSTM, outperformed standard ML techniques in accurately classifying sentiments into positive, negative, and neutral categories. Despite these strengths, the authors noted challenges such as the high computational cost of training DL methods and the need for large annotated datasets. Further research should explore model optimization strategies to reduce computational overhead and expand the analysis to include real-time SA for dynamic consumer feedback monitoring.
Adak et al. [48] conducted a systematic review on the SA of consumer reviews of food delivery services using DL and XAI methods. The research examined DL techniques, including CNN, LSTM, and hybrid architectures, applied to customer feedback data from several food delivery platforms. They emphasized the significance of using XAI to interpret the predictions made by DL techniques, providing insights into model decision-making processes and improving transparency. The study concluded that DL techniques, especially when combined with XAI, show high accuracy in sentiment classification while delivering understandable outputs for end-users. However, the research also identified challenges related to the interpretability of complex techniques and the high computational costs associated with training them. Future research directions include improving DL techniques’ efficiency and interpretability and extending SA to multi-modal data from different customer feedback channels.
Alantari et al. [49] empirically compared different ML techniques for a text-based SA of online consumer reviews. The research assessed several ML techniques, including NB, SVM, RFs, and Gradient Boosting, and their effectiveness in classifying sentiments of consumer reviews gathered from different online platforms. The dataset contained diverse consumer reviews, which were preprocessed and labeled for SCTs. The outcomes showed that ensemble techniques like RFs and Gradient Boosting generally outperformed simpler methods such as NB and SVM, particularly regarding accuracy and robustness to noisy data. However, the research noted limitations, including challenges in interpreting model outputs and the need for comprehensive hyperparameter tuning. Further research is recommended to analyze the integration of DL techniques and to focus on model interpretability to better understand the factors influencing sentiment classification.
Marlina et al. [50] conducted an SA on consumer reviews of natural skincare products utilizing various SA techniques. The research employed ML methods, such as NB and SVM, and DL techniques to classify customer sentiments into negative, positive, and neutral categories. The dataset incorporated reviews of natural skincare products gathered from multiple e-commerce platforms, which were preprocessed to improve data quality and model accuracy. The results showed that DL techniques surpassed standard ML approaches in capturing nuanced sentiments specific to skincare products. However, the research identified limitations in handling ambiguous or mixed sentiments and the need for extensive computational resources for DL methods. Future work suggested includes expanding the analysis to a broader range of skincare products and enhancing model performance by incorporating more advanced DL methods and feature extraction techniques.
Alzahrani et al. [51] highlighted the evolution of SA techniques, including ML and DL techniques, and the challenges of fake reviews in e-commerce. Their procedure involved an exhaustive data preprocessing pipeline, including steps like punctuation removal, lowercase conversion, and part-of-speech tagging to prepare the review texts for analysis. The research used a CNN-LSTM model for sentiment classification, with WDE and dropout layers to enhance performance and mitigate overfitting. The CNN-LSTM model achieved a high accuracy of 96%, outperforming traditional methods, with detailed evaluation metrics such as precision, recall, and F1-score demonstrating its effectiveness. However, the research primarily used a dataset from the Amazon website, which may not represent the full spectrum of e-commerce reviews. This limitation could affect the generalizability of the results to other platforms or product categories.
Obiedat et al. [52] proposed a hybrid evolutionary SVM-based approach for SA of consumer reviews, specifically addressing the challenge of imbalanced data distribution. The research used a combination of SVM with evolutionary techniques to optimize model parameters and enhance sentiment classification accuracy on datasets with skewed class distributions. The dataset comprised consumer reviews from different e-commerce platforms, which were preprocessed and utilized to train the hybrid model. The outcomes showed that the hybrid SVM-based approach outperformed standard SVM and other ML methods in addressing imbalanced datasets, delivering better precision, recall, and F1-scores. However, the research also stated limitations related to the computational complexity of the evolutionary algorithms and the need for fine-tuning to achieve optimal outcomes. Future research directions include optimizing the evolutionary components to reduce computational costs and expanding the approach to multi-class SA tasks.
Table 1 summarizes the previous related work, comparing the implemented methods, datasets, insights, and limitations. It highlights the evolution of SA methods, with traditional ML models proving effective for smaller datasets and basic sentiment tasks. In contrast, DL methods demonstrate superior performance for complex datasets and nuanced sentiments. Hybrid approaches bridge gaps between traditional and advanced methods, improving performance in class imbalance scenarios. However, the computational demands of DL models remain a significant challenge, necessitating optimization and scalability for broader applications.
The reviewed studies underscore the significant advancements in SA methodologies that are driven by integrating ML and DL techniques. These approaches have demonstrated notable improvements in accurately classifying customer sentiments across various domains, such as insurance, e-commerce, tourism, and healthcare. While traditional ML algorithms like NB and SVM remain valuable for specific applications, the enhanced capabilities of DL techniques, including LSTM, CNN, and BERT, have shown superior performance in capturing complex sentiments and contextual nuances. However, challenges persist, particularly in managing computational demands, addressing class imbalances, and improving model interpretability. The studies highlight the need for continued innovation, recommending the exploration of hybrid techniques, optimization strategies, and the incorporation of XAI to enhance transparency and trust in SA outcomes.

3. Proposed Methodology

This section presents a comprehensive methodology involving four DL models, each tailored to handle different aspects of the text classification task, as illustrated in Figure 1. The methodology starts with data collection from consumer reviews accumulated in a dataset. The collected data undergo several preprocessing stages: cleaning apparent noise, tokenizing text into manageable units, padding sequences to a uniform input length, mapping scores to streamline the classification process, managing class imbalances to ensure even representation across classes, and encoding labels into numerical forms suitable for DL techniques. The preprocessed data are then divided into training and testing sets used to develop and assess the DL models. The first model uses a CNN to capture the text’s local dependencies and global patterns through convolutional operations. The second model is based on LSTM networks, which excel at handling sequential data by capturing temporal relationships within the text, making them well suited to tasks that require understanding word order and context. The third model integrates WDE with LSTM layers, capturing both the semantic meaning of words and the temporal dynamics of text sequences. The fourth model, a hybrid WDE-CNN-LSTM architecture, combines the strengths of CNNs and LSTMs by processing the input concurrently through convolutional and recurrent layers, extracting features that capture both local and temporal patterns before combining them for classification.
The section also incorporates an SA and consistency labeling technique that uses TextBlob to assess sentiment polarity and evaluate its consistency with predefined scores, providing insights for refining text-based predictions. The models are evaluated using accuracy, precision, recall, F1-score, and consistency across different classification tasks (2, 3, and 5 score categories). As illustrated in Figure 1, this integrated approach leverages DL techniques to improve sentiment classification accuracy and consistency, offering a practical toolkit for a wide range of NLP tasks and handling typical challenges in text classification.

3.1. Data Preprocessing

The customer-review dataset utilized in this study was sourced from Kaggle (https://www.kaggle.com/datasets/vivekprajapati2048/amazon-customer-reviews?resource=download) (accessed on 3 November 2024) and consists of Amazon product reviews with a total of 3.5 million entries. Each review includes fields such as ID, ProductId, UserId, ProfileName, HelpfulnessNumerator, HelpfulnessDenominator, Score, Time, Summary, and Text. The dataset used for analysis contains a subset of 568,454 records, segmented as detailed in Table 2.
Data preprocessing is essential in preparing the Amazon review dataset for DL techniques, including CNN, LSTM, and hybrid architectures like CNN-LSTM. Preprocessing aims to convert the raw text data into a clean, structured form suitable as input to these methods. The following steps were undertaken to accomplish this conversion.
  • Load the Dataset: Load the dataset from the defined path. This initializes the data pipeline and makes the data available for further processing.
  • Drop Unnecessary Columns: Streamline the dataset by removing features that are not needed for the analysis. This step reduces complexity and keeps the focus on the most relevant features.
  • Combine Text Fields: Combine the ‘Summary’ and ‘Text’ features into a single feature to give the model more context. Joining the two fields with a separating space keeps the combined text well structured and meaningful.
  • Tokenize Text Data: Convert the text data into sequences of integers using the Tokenizer class from the Keras library. This step transforms the textual information into a numerical form that neural networks can consume.
  • Pad Sequences: Ensure all input sequences have a uniform length by padding them. This step is essential for batch processing in DL techniques, which require consistent input dimensions.
  • Map Scores: For the three-class and two-class tasks, map the original scores to a smaller set of classes. This simplifies the classification task.
  • Handle Class Imbalance: Address class imbalance in the dataset using the RandomOverSampler from the imbalanced-learn library. This step prevents the model from being biased towards the majority class.
  • Encode Labels: Convert the labels into a categorical form appropriate for the categorical cross-entropy loss function used in many classification techniques.
  • Split Data: Divide the data into training and validation sets. This stage is essential for assessing the model’s performance on unseen data.
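The tokenization and padding steps above can be illustrated without any external library. The following is a minimal sketch, not the authors’ code: the helper names `fit_vocab`, `texts_to_sequences`, and `pad` are hypothetical, the whitespace tokenizer is a simplifying assumption, and the left-padding and left-truncation behaviour is chosen to mirror the defaults of the Keras `pad_sequences` utility.

```python
from collections import Counter

def fit_vocab(texts, num_words):
    """Map the most frequent words to integer ids starting at 1 (0 is reserved for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = [w for w, _ in counts.most_common(num_words - 1)]
    return {word: idx for idx, word in enumerate(most_common, start=1)}

def texts_to_sequences(texts, vocab):
    """Replace each known word by its id; out-of-vocabulary words are dropped."""
    return [[vocab[w] for w in text.lower().split() if w in vocab] for text in texts]

def pad(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen tokens of longer sequences."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

# Toy reviews standing in for the combined Summary + Text field
reviews = ["Great taste great price", "Not good"]
vocab = fit_vocab(reviews, num_words=10000)
sequences = texts_to_sequences(reviews, vocab)
padded = pad(sequences, maxlen=5)
```

Every row of `padded` has the same length, which is what allows the reviews to be stacked into a single batch for the network.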
Algorithm 1 outlines the implementation of the data preprocessing stages described above. It details the sequence of operations from loading the dataset to splitting it into training and validation sets, ensuring the data are prepared adequately for modeling.
Algorithm 1 Data Preprocessing for Amazon Review Dataset
Require: Dataset path
Ensure: Preprocessed data are ready for model input
Load the Dataset:
    df ← load_dataset('path/to/amazon_reviews.csv')
Drop Unnecessary Columns:
    df ← df.drop(columns=['column_name1', 'column_name2'])
Combine Text Fields:
    df['combined_text'] ← df['Summary'] + " " + df['Text']
Tokenize Text Data:
    tokenizer ← Tokenizer(num_words=10000)
    tokenizer.fit_on_texts(df['combined_text'])
    sequences ← tokenizer.texts_to_sequences(df['combined_text'])
Pad Sequences:
    padded_sequences ← pad_sequences(sequences, maxlen=500)
Map Scores (if applicable):
    df['score'] ← df['score'].map({5: 1, 4: 1, 3: 0, 2: 0, 1: 0})
Handle Class Imbalance:
    ros ← RandomOverSampler()
    X_resampled, y_resampled ← ros.fit_resample(padded_sequences, df['score'])
Encode Labels:
    y_categorical ← to_categorical(y_resampled, num_classes=2)
Split Data:
    X_train, X_val, y_train, y_val ← train_test_split(X_resampled, y_categorical, test_size=0.2)
The data preprocessing stages described above ensure that the Amazon review dataset is transformed into a clean and structured form suitable for DL methods. Systematically handling each preprocessing requirement improves the quality of the input data, which in turn supports better model performance and generalization.
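The score mapping, oversampling, and label encoding steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative stand-in for the imbalanced-learn `RandomOverSampler` and the Keras `to_categorical` utility, not a reproduction of them; the helper names `random_oversample` and `one_hot`, the fixed seed, and the toy scores are assumptions made for the example.

```python
import random
from collections import Counter

# Binary mapping from Algorithm 1: scores 4 and 5 become class 1, scores 1 to 3 become class 0.
SCORE_TO_CLASS = {5: 1, 4: 1, 3: 0, 2: 0, 1: 0}

def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows for categorical cross-entropy."""
    return [[1.0 if i == y else 0.0 for i in range(num_classes)] for y in labels]

scores = [5, 5, 4, 5, 1, 3]                     # toy review scores
features = [[i] for i in range(len(scores))]    # toy stand-ins for padded sequences
labels = [SCORE_TO_CLASS[s] for s in scores]
x_bal, y_bal = random_oversample(features, labels)
y_cat = one_hot(y_bal, num_classes=2)
```

Because Algorithm 1 resamples before splitting, duplicated reviews can land in both the training and validation sets; resampling only the training portion is a common alternative when a strictly unseen validation set is required.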

3.2. Convolutional Neural Network for Text Classification

The CNN model is tailored for text classification tasks and builds on the feature extraction capabilities of convolutional layers [53]. The model is designed to process sequences of words, capturing both local dependencies through convolutional operations and global relationships via fully connected layers [54]. First, the model converts words into dense vector representations that encapsulate semantic meaning using an embedding layer. These vectors are then fed into convolutional layers that progressively refine the feature maps, capturing increasingly complex patterns in the text. After the feature maps are flattened, fully connected layers process the data, culminating in a softmax layer that outputs a probability distribution over the possible classes. This model balances capturing intricate textual features with maintaining computational efficiency, making it well suited for various text classification applications [55]. Below is a detailed breakdown of each layer’s functionality and the corresponding mathematical equations.

3.2.1. Embedding Layer

The embedding layer is crucial in transforming the raw text input into a dense vector space where numerical vectors represent words. The input dimension is set to 10,000, corresponding to the vocabulary size—the number of unique words in the dataset. The output dimension is set to 128, determining the length of each word’s embedding vector. This vector captures semantic relationships between words in the dataset, allowing the model to better understand their meaning in context. The input sequences are of a maximum length of 300, meaning that each text input is truncated or padded to 300 words.
The functionality of the embedding layer is to map each word from its one-hot encoded representation into a dense vector space. The embedding matrix, denoted by $W_e$, is learned during training. This matrix converts the sparse one-hot encoded input, $X$, into a continuous vector representation that captures semantic information and word relationships. The following equation can mathematically describe this transformation:
$$E = X \cdot W_e$$
where $X$ represents the one-hot encoded input sequence, and $W_e$ is the embedding matrix of dimensions $10{,}000 \times 128$.
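To make the role of the product $X \cdot W_e$ concrete, the sketch below checks on a toy example that multiplying a one-hot row by the embedding matrix simply selects one row of that matrix. The 4-word vocabulary, the 3-dimensional embeddings, and the helper names are assumptions chosen for readability; the model described here uses a $10{,}000 \times 128$ matrix.

```python
def one_hot(index, size):
    """One-hot row vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matmul_row(row, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    cols = len(matrix[0])
    return [sum(row[i] * matrix[i][j] for i in range(len(row))) for j in range(cols)]

# Toy embedding matrix W_e: vocabulary of 4 words, embedding size 3
W_e = [
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
token_ids = [2, 1, 3]   # a tokenized three-word input

via_product = [matmul_row(one_hot(t, len(W_e)), W_e) for t in token_ids]
via_lookup = [W_e[t] for t in token_ids]
```

Both routes give the same matrix $E$, which is why embedding layers are usually implemented as a table lookup rather than as an explicit matrix product.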

3.2.2. Convolutional Layers

The first convolutional layer (Conv1D Layer 1) applies 128 filters with a kernel size of 5, making it suitable for detecting local patterns in the input text sequences. This layer performs a 1D convolution operation that scans the input embeddings and captures features like word n-grams. After the convolution operation, a Rectified Linear Unit (ReLU) activation function is applied to introduce non-linearity, which helps the network model more complex patterns in the data. The mathematical operation of this layer is given by:
$$C_1 = \mathrm{ReLU}(E * W_{c1} + b_{c1})$$
where $W_{c1}$ represents the convolutional filter of size $128 \times 5$, $b_{c1}$ is the bias term, and $*$ denotes the convolution operation.
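The sliding-window computation behind this equation can be sketched for a single filter as follows. This is a simplified illustration under stated assumptions: no padding and a stride of 1, a toy 2-dimensional embedding, and a hypothetical helper name `conv1d_relu`; a real layer applies many such filters in parallel.

```python
def conv1d_relu(sequence, kernel, bias):
    """Unpadded 1D convolution of one filter over a sequence of vectors, followed by ReLU.

    sequence: list of T vectors with d entries each
    kernel:   list of k vectors with d entries each (a single filter)
    Returns T - k + 1 activations.
    """
    k = len(kernel)
    outputs = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        total = bias + sum(
            x * w
            for vec, wvec in zip(window, kernel)
            for x, w in zip(vec, wvec)
        )
        outputs.append(max(0.0, total))   # ReLU clips negative responses to zero
    return outputs

# Toy input: 6 time steps with 2-dimensional embeddings
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]   # one filter with kernel size 3
C = conv1d_relu(E, W, bias=0.0)
```

Under the same assumptions, a 300-token input and a kernel size of 5 would yield 296 positions per filter.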
After Conv1D Layer 1, the MaxPooling1D layer is applied to reduce the dimensionality of the feature maps. It does this by taking the maximum value from non-overlapping regions of the input, thereby retaining the most essential features while reducing the computation and preventing overfitting. In this case, the pooling operation has a pool size of 2, meaning it reduces the output size by half. The equation for this operation is:
$$P_1 = \mathrm{MaxPool}(C_1, 2)$$
where the size of $C_1$ is reduced by a factor of 2 through this pooling process.
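A minimal sketch of non-overlapping max pooling over one feature map is given below. The helper name `max_pool1d` and the toy values are assumptions; dropping a trailing incomplete window is the behaviour assumed here for unpadded pooling.

```python
def max_pool1d(values, pool_size=2):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    n_windows = len(values) // pool_size
    return [max(values[i * pool_size:(i + 1) * pool_size]) for i in range(n_windows)]

C1 = [0.2, 1.5, 0.0, 0.7, 0.9, 0.3, 0.4]   # toy activations from one filter
P1 = max_pool1d(C1, pool_size=2)
```

Each output keeps only the strongest response in its window, so the sequence length is roughly halved while the most prominent activations survive.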
The second convolutional layer (Conv1D Layer 2) applies 64 filters with a smaller kernel size of 3. This layer is designed to capture more fine-grained details and patterns from the feature maps produced by the first layer. As in Conv1D Layer 1, a ReLU activation function is applied after the convolution operation to introduce non-linearity. The equation governing this operation is:
$$C_2 = \mathrm{ReLU}(P_1 * W_{c2} + b_{c2})$$
where $W_{c2}$ is the convolutional filter of size $64 \times 3$, and $b_{c2}$ is the bias term.
A second MaxPooling1D layer follows Conv1D Layer 2 to further reduce the feature maps’ dimensionality, making the data more manageable before passing them to the dense layers for classification. This layer again uses a pool size of 2 to reduce the size of the feature maps by half. The operation can be expressed as:
$$P_2 = \mathrm{MaxPool}(C_2, 2)$$
This ensures that the model retains the most salient features while minimizing the input size for the successive layers.

3.2.3. Flatten Layer

The flatten layer serves the critical role of transforming the 2D output of the convolutional layers into a 1D vector, which is necessary for feeding the data into fully connected layers for classification or regression tasks. This conversion is essential because fully connected layers expect input in a flattened format, meaning that all spatial dimensions of the feature maps are collapsed into a single vector. The transformation performed by the flatten layer can be mathematically represented as:
$$F = \mathrm{Flatten}(P_2)$$
where $P_2$ represents the output of the second MaxPooling layer, and $F$ is the resulting 1D vector that retains all the critical features extracted from the previous convolutional and pooling layers. This 1D vector $F$ is then passed to the subsequent dense layers for further processing.
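The length of $F$ follows from simple arithmetic on the layer sizes given above. The sketch below traces it, assuming unpadded convolutions with a stride of 1 and the 300-token input length stated in the embedding layer description; the helper names are hypothetical.

```python
def conv_out(length, kernel_size):
    """Output length of an unpadded, stride-1 convolution."""
    return length - kernel_size + 1

def pool_out(length, pool_size):
    """Output length of non-overlapping pooling."""
    return length // pool_size

length = 300                    # input sequence length
length = conv_out(length, 5)    # Conv1D Layer 1 -> 296
length = pool_out(length, 2)    # MaxPooling1D   -> 148
length = conv_out(length, 3)    # Conv1D Layer 2 -> 146
length = pool_out(length, 2)    # MaxPooling1D   -> 73
flattened_dim = length * 64     # 64 filters in Conv1D Layer 2
```

Under these assumptions the flattened vector has 73 × 64 = 4672 entries. With the 500-token padding shown in Algorithm 1, the same arithmetic gives 123 × 64 = 7872.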

3.2.4. Fully Connected Layers

The fully connected layers, also known as dense layers, play a key role in transforming the high-level features extracted by the convolutional and pooling layers into predictions. The first dense layer (Dense Layer 1) contains 128 units and uses the ReLU activation function. This layer applies a linear transformation to the flattened input and follows it with the ReLU activation, allowing the network to capture high-level abstract features. The operation performed by this layer can be represented mathematically as:
$$D_1 = \mathrm{ReLU}(F \cdot W_{d1} + b_{d1})$$
where $W_{d1}$ is the weight matrix of size $\text{Flattened Dimension} \times 128$, and $b_{d1}$ is the bias term. The output of this layer, $D_1$, represents the non-linear transformation of the input feature vector.
To prevent overfitting, a Dropout Layer is applied next. During training, this layer randomly sets units from Dense Layer 1 to zero with a probability of 50%, forcing the model to be less reliant on specific neurons and improving generalization. The dropout operation can be expressed as:
$$D_1 = \mathrm{Dropout}(D_1, 0.5)$$
where 0.5 represents the dropout rate, meaning that each unit is dropped with probability 0.5 at each training iteration, so about half of the units are dropped on average.
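The behaviour of dropout during training and evaluation can be sketched as follows. The example uses the inverted-dropout formulation, in which surviving units are rescaled by 1 / (1 − rate) so that the expected activation is unchanged; the helper name `dropout`, the seed, and the toy activations are assumptions.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` and rescale the survivors."""
    if not training:
        return list(values)               # dropout is inactive at evaluation time
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(42)
D1 = [1.0] * 10000                        # toy activations
D1_train = dropout(D1, rate=0.5, training=True, rng=rng)
D1_eval = dropout(D1, rate=0.5, training=False, rng=rng)
dropped_fraction = D1_train.count(0.0) / len(D1_train)
```

The fraction of zeroed units is close to, but not exactly, one half, because each unit is dropped independently.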
The final layer, Dense Layer 2, is responsible for mapping the learned features to the output classes. The number of units in this layer varies depending on the classification task: it can be 5, 3, or 2. This layer uses the Softmax activation function to produce a probability distribution over the possible output classes. The operation for Dense Layer 2 is defined as:
$$D_2 = \mathrm{Softmax}(D_1 \cdot W_{d2} + b_{d2})$$
where $W_{d2}$ is the weight matrix of size $128 \times \text{Number of Classes}$, and $b_{d2}$ is the bias term. The output, $D_2$, represents the probability distribution over the target classes, allowing the model to make final predictions.
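As a small worked illustration of the Softmax step, the sketch below converts toy pre-activation scores for a three-class task into probabilities and selects the most likely class. The scores are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability choice rather than something stated in the text.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]            # toy scores for a three-class task
D2 = softmax(logits)
predicted_class = D2.index(max(D2))  # index of the most probable class
```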

3.2.5. Model Compilation

The model is compiled with categorical cross-entropy as the loss function, Adam as the optimizer, and accuracy as the evaluation metric. This configuration is typical for multi-class classification tasks where the goal is to assign each input sequence to one of several classes.
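The categorical cross-entropy loss named above can be computed by hand for a small batch. The following sketch uses hypothetical one-hot targets and predicted probabilities; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum_k y_true[k] * log(y_pred[k])."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        losses.append(-sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row)))
    return sum(losses) / len(losses)

y_true = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # one-hot targets
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # softmax outputs
loss = categorical_cross_entropy(y_true, y_pred)
```

Only the probability assigned to the true class contributes to each term, so the loss shrinks as that probability approaches 1.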
This architecture is carefully designed to balance extracting meaningful features from the input text and the classification task. It combines convolutional layers for feature extraction and dense layers for decision-making. Figure 2 shows the architecture of the proposed CNN model.
Algorithm 2 outlines the implementation of a CNN model designed for text classification tasks. The model leverages WDE to transform textual data into dense vectors, which are then processed through convolutional and pooling layers to extract meaningful features. The output is flattened and passed through fully connected layers, culminating in a softmax layer that provides the final classification. The model is compiled, trained, and evaluated using standard DL procedures, including the Adam optimizer and categorical cross-entropy loss function. The following steps provide a high-level overview of the model’s structure and operations:
Algorithm 2 CNN Model for text classification
Initialize the model:
    model ← Sequential()
Add embedding layer:
    model.add(Embedding(input_dim=10000, output_dim=128, input_length=300))
Add convolutional and pooling layers:
    model.add(Conv1D(128, 5, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
Flatten the output:
    model.add(Flatten())
Add fully connected layers:
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
Compile and train the model:
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))
Evaluate the model:
    model.evaluate(X_test, y_test)
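As a rough sanity check on the size of this architecture, the trainable parameters of each layer can be counted from the hyperparameters in Algorithm 2. The sketch below is an estimate under stated assumptions, not a figure reported by the authors: it assumes a 300-token input, unpadded convolutions (so the flattened vector has 73 × 64 entries), and the five-class task; the helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def conv1d_params(kernel_size, in_channels, filters):
    # each filter spans kernel_size positions across all input channels, plus one bias
    return kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    return inputs * units + units

num_classes = 5              # 2, 3, or 5 depending on the task
flattened_dim = 73 * 64      # assumes a 300-token input and unpadded convolutions

layers = {
    "embedding": embedding_params(10000, 128),
    "conv1d_1": conv1d_params(5, 128, 128),
    "conv1d_2": conv1d_params(3, 128, 64),
    "dense_1": dense_params(flattened_dim, 128),
    "dense_2": dense_params(128, num_classes),
}
total_params = sum(layers.values())
```

Under these assumptions the embedding layer accounts for 1,280,000 of roughly 1.99 million parameters, with the first dense layer contributing most of the remainder.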

3.3. LSTM Model for Text Classification

This section outlines the architecture and implementation of an LSTM model designed for text classification tasks. The LSTM model is well suited for sequential data, such as text, where it effectively captures temporal dependencies and patterns over time [56]. The model begins with an embedding layer that converts words into dense vector representations, followed by multiple LSTM layers that process the sequences and capture the temporal relationships within the text [57]. The output is passed through fully connected layers to perform the final classification. Below is a detailed breakdown of each layer’s functionality and the corresponding mathematical equations.

3.3.1. Embedding Layer

The embedding layer serves the crucial role of converting input text sequences into dense vector representations, which capture the semantic meaning and relationships between words. In this case, the input dimension is set to 10,000, which corresponds to the size of the vocabulary, while the output dimension is 128, representing the size of each word’s embedding vector. The input length is 300, meaning the model processes sequences with a maximum of 300 words, truncating or padding as necessary.
The embedding layer maps each word in the input sequence from its sparse one-hot encoded form into a dense vector space, where similar words are closer together based on their context and usage in the training data. This transformation is performed by an embedding matrix, $W_e$, which is learned during the training process. The mathematical representation of the operation performed by the embedding layer is:
$$E = X \cdot W_e$$
where $X$ is the one-hot encoded input sequence and $W_e$ is the embedding matrix of dimensions $10{,}000 \times 128$. The output $E$ is the dense embedding of the input sequence, allowing the network to work with more informative and compact representations of the input data, enabling better learning of relationships between words.

3.3.2. LSTM Layer 1

The first LSTM layer contains 128 units and is designed to process the embedded input sequences, capturing the temporal dependencies between words across time steps. This layer is particularly effective in handling sequential data, such as text, where understanding the order of words is crucial. The Return Sequences parameter is set to True, meaning that the LSTM will output the entire sequence of hidden states rather than just the final state, allowing for the subsequent LSTM layers to continue processing the temporal relationships within the sequence. A dropout layer with a rate of 0.2 is applied to this LSTM layer to help prevent overfitting by randomly setting 20% of the units to zero during training.
The mathematical operation of the LSTM layer at each time step $t$ can be described as:
$$h_t, c_t = \mathrm{LSTM}(E, h_{t-1}, c_{t-1})$$
where $h_t$ is the hidden state, and $c_t$ is the cell state at time step $t$. These states are updated recursively based on the input embeddings $E$, allowing the model to maintain and propagate memory across the sequence, capturing both short- and long-term dependencies.
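The recursion hidden inside the LSTM operator can be unpacked with the standard gate equations. The sketch below is a deliberately reduced illustration: a single LSTM unit with a scalar input and hypothetical, untrained weights, whereas the layer described here has 128 units and 128-dimensional inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step for a single unit with a scalar input.

    W, U, b hold the input weight, recurrent weight, and bias for the
    input (i), forget (f), and output (o) gates and the candidate (g).
    """
    i = sigmoid(W["i"] * x_t + U["i"] * h_prev + b["i"])
    f = sigmoid(W["f"] * x_t + U["f"] * h_prev + b["f"])
    o = sigmoid(W["o"] * x_t + U["o"] * h_prev + b["o"])
    g = math.tanh(W["g"] * x_t + U["g"] * h_prev + b["g"])
    c_t = f * c_prev + i * g      # keep part of the old memory, add new content
    h_t = o * math.tanh(c_t)      # expose a gated view of the cell state
    return h_t, c_t

# Hypothetical toy weights; a trained layer would learn these
W = {"i": 0.5, "f": 0.5, "o": 0.5, "g": 1.0}
U = {"i": 0.1, "f": 0.1, "o": 0.1, "g": 0.2}
b = {"i": 0.0, "f": 1.0, "o": 0.0, "g": 0.0}

h, c = 0.0, 0.0
hidden_states = []
for x_t in [1.0, -0.5, 0.25]:     # toy one-dimensional "embeddings"
    h, c = lstm_step(x_t, h, c, W, U, b)
    hidden_states.append(h)

all_states = hidden_states        # what return_sequences=True passes on
final_state = hidden_states[-1]   # what return_sequences=False passes on
```

The last two lines show the difference between the two settings of the return-sequences option: the first layer forwards every hidden state, while a layer that returns only the final state summarizes the whole sequence in one vector.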

3.3.3. LSTM Layer 2

The second LSTM layer consists of 64 units and continues processing the sequence of hidden states generated by the first LSTM layer. Unlike the first layer, the Return Sequences parameter is set to False, meaning that this layer will only output the final hidden state at the last time step rather than the entire sequence. This design enables the model to capture high-level temporal features and compress the information from the sequence into a single output. Like the previous layer, a dropout layer with a rate of 0.2 is applied to mitigate overfitting by randomly setting 20% of the units to zero during training.
The operation of the second LSTM layer at the final time step $T$ is described by the following equation:
$$h_T, c_T = \mathrm{LSTM}(h_{T-1}, c_{T-1})$$
where $h_T$ and $c_T$ represent the final hidden state and cell state at the last time step $T$. By outputting only the final hidden state, this layer effectively summarizes the entire sequence, capturing the most relevant temporal features for the subsequent layers in the model.

3.3.4. Dense Output Layer

The final layer of the network is the Dense Output Layer, responsible for mapping the output of the second LSTM layer to the target classes for the classification task. The number of units in this layer can be 5, 3, or 2, depending on the specific classification problem being addressed (e.g., multi-class or binary classification). The Softmax activation function is applied in this layer, ensuring that the output is a probability distribution over the possible labels, where the sum of all probabilities is equal to 1. This allows the model to make predictions based on the likelihood of each class.
The mathematical operation of this dense layer is expressed as:
$$y = \mathrm{Softmax}(h_T \cdot W_d + b_d)$$
where $h_T$ is the final hidden state from the second LSTM layer, $W_d$ is the weight matrix, and $b_d$ is the bias term. The Softmax function transforms the raw scores from the dense layer into probabilities, providing the final classification output $y$.

3.3.5. Model Compilation

The model is compiled with categorical cross-entropy as the loss function, Adam as the optimizer, and accuracy as the evaluation metric. This configuration is typical for multi-class classification tasks where the goal is to assign each input sequence to one of several classes.
Algorithm 3 outlines the implementation steps for constructing an LSTM model for text classification tasks. The model is built sequentially, beginning with an embedding layer that transforms input text sequences into dense vector representations. This is followed by multiple LSTM layers that process these sequences, capturing the temporal dependencies inherent in the text data. Dropout layers are included to prevent overfitting by randomly deactivating neurons during training. The final dense layer applies a softmax activation to output a probability distribution over the possible classes, enabling the model to perform classification. The pseudocode provides a high-level representation of the model architecture, illustrating the essential operations and layers involved in the construction and training of the LSTM model.
Algorithm 3 LSTM Model for text classification
Initialize the model:
    model ← Sequential()
Add embedding layer:
    model.add(Embedding(input_dim=10000, output_dim=128, input_length=300))
Add LSTM layers:
    model.add(LSTM(128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(64, return_sequences=False))
    model.add(Dropout(0.2))
Add dense output layer:
    model.add(Dense(num_classes, activation='softmax'))
Compile the model:
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Train the model:
    model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))
Evaluate the model:
    model.evaluate(X_test, y_test)
Figure 3 shows the architecture of the proposed LSTM model.
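The size of this stacked LSTM model can be estimated from the layer settings in Algorithm 3. The sketch below assumes the standard LSTM parameterization (four gate blocks, each with input weights, recurrent weights, and a bias) and the three-class task; it is an illustrative calculation, not a figure reported by the authors, and the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # four gate blocks, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, units):
    return inputs * units + units

num_classes = 3              # 2, 3, or 5 depending on the task
layers = {
    "embedding": 10000 * 128,
    "lstm_1": lstm_params(128, 128),   # input is the 128-dimensional embedding
    "lstm_2": lstm_params(128, 64),    # input is the 128-dimensional hidden state of layer 1
    "dense": dense_params(64, num_classes),
}
total_params = sum(layers.values())
```

Dropout layers add no parameters, so under these assumptions the model has about 1.46 million trainable parameters, most of them in the embedding matrix.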

3.4. WDE-LSTM Model for Text Classification

This subsection details the architecture and implementation of a WDE and LSTM model designed for text classification tasks [6]. The model begins by utilizing WDE to transform input text sequences into dense vector representations, capturing the semantic relationships between words in a continuous vector space. These WDEs serve as the foundation for the subsequent LSTM layers, which process the sequences and capture temporal dependencies inherent in the text data. The output of the LSTM layers is then passed through fully connected dense layers to perform the final classification [58]. Below is a detailed breakdown of each layer’s functionality and the corresponding mathematical equations.

3.4.1. Embedding Layer

The embedding layer plays a fundamental role in the model by converting each word in the input sequences into dense vector representations that capture the semantic relationships between words. The input dimension is set to vocab_size, which corresponds to the size of the vocabulary, and the output dimension is 128, meaning each word is represented by a 128-dimensional embedding vector. The input length is 300, so the model processes sequences with a maximum length of 300 words, truncating or padding as needed.
This layer is trainable, meaning that the embedding vectors are updated during training, allowing the model to fine-tune these word representations to better fit the specific task of text classification. The embeddings not only capture individual word meanings but also the relationships between words based on their context, making it easier for the model to learn patterns and semantic connections.
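What it means for the embedding layer to be trainable can be illustrated with a single gradient step. The sketch below uses plain stochastic gradient descent for brevity, whereas the model described here is trained with Adam; the helper name `sgd_update_embeddings`, the toy matrix, and the gradients are hypothetical.

```python
def sgd_update_embeddings(W_e, token_ids, grads, lr):
    """Apply one plain-SGD step to the rows of W_e that were looked up.

    grads[k] is the gradient of the loss with respect to the embedding
    returned for token_ids[k]; repeated tokens accumulate their gradients.
    """
    updated = [row[:] for row in W_e]
    for tok, g in zip(token_ids, grads):
        updated[tok] = [w - lr * gi for w, gi in zip(updated[tok], g)]
    return updated

W_e = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # toy 4-word vocabulary
token_ids = [1, 3, 1]                                    # token 1 appears twice
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
W_new = sgd_update_embeddings(W_e, token_ids, grads, lr=0.5)
```

Only the rows belonging to words that occur in the batch change, which is how the representations of frequently seen words are gradually adapted to the classification task.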
The operation of the embedding layer is mathematically described by the following equation:
$$E = X \cdot W_e$$
where $X$ is the one-hot encoded input sequence, and $W_e$ is the embedding matrix of dimensions $\text{vocab\_size} \times 128$. The resulting matrix $E$ is a dense representation of the input sequence, allowing the network to work with meaningful vectorized representations of the words.

3.4.2. LSTM Layer 1

The first LSTM layer contains 128 units and is responsible for processing the embedded input sequences, capturing the temporal dependencies between words across time steps. By setting Return Sequences to True, the layer outputs the entire sequence of hidden states, which are then passed to the next LSTM layer. This allows the model to build upon the temporal patterns learned in this layer, enabling more comprehensive learning of the sequential structure in the text.
A dropout layer with a dropout rate of 0.2 is applied to this LSTM layer to prevent overfitting by randomly dropping 20% of the units during training. This regularization technique ensures the model generalizes well to unseen data, improving its robustness.
The operation of the LSTM layer at each time step $t$ is governed by the following equation:
$$h_t, c_t = \mathrm{LSTM}(E, h_{t-1}, c_{t-1})$$
where $h_t$ is the hidden state and $c_t$ is the cell state at time step $t$. These states are updated recursively, allowing the LSTM to maintain a memory of previous time steps while processing the current input, thus capturing both short-term and long-term dependencies in the sequence.

3.4.3. LSTM Layer 2

The second LSTM layer has 64 units and is designed to further process the sequence of hidden states passed from the first LSTM layer. Unlike the first LSTM layer, the Return Sequences parameter is set to False, meaning this layer outputs only the final hidden state at the last time step, T. This focus on the final hidden state allows the model to capture higher-level temporal features that summarize the entire sequence, reducing the dimensionality of the output and streamlining it for the subsequent dense layer.
Similar to the first LSTM layer, a dropout layer with a dropout rate of 0.2 is applied to prevent overfitting, ensuring that the model generalizes well to new data.
The operation of the LSTM layer at the final time step $T$ is described by the following equation:
$$h_T, c_T = \mathrm{LSTM}(h_{T-1}, c_{T-1})$$
where $h_T$ and $c_T$ represent the final hidden and cell states at time step $T$. By returning only the final hidden state, this layer effectively compresses the sequential information into a more compact representation for the subsequent layers.

3.4.4. Dense Output Layer

The dense output layer is the final component of the model and plays a crucial role in mapping the processed features from the LSTM layers to the target classes. The number of Units in this layer can be 5, 3, or 2, depending on the specific classification task. The output of this layer is a probability distribution over the possible classes, achieved through the use of the Softmax activation function. This transformation ensures that the model’s predictions sum to 1, making it suitable for multi-class classification problems where each prediction represents the probability of a specific class.
The operation of the dense output layer can be described by the following equation:
$$y = \mathrm{Softmax}(h_T \cdot W_d + b_d)$$
where $h_T$ is the final hidden state output from the last LSTM layer, $W_d$ is the weight matrix, and $b_d$ is the bias term for the dense layer. The Softmax function transforms the output into probabilities, enabling the model to make actionable predictions by selecting the class with the highest probability.

3.4.5. Model Compilation

The model is compiled using categorical cross-entropy as the loss function, Adam as the optimizer, and accuracy as the evaluation metric. This configuration is particularly suitable for multi-class classification tasks, which aim to predict the correct class for each input sequence.
Algorithm 4 provides a step-by-step guide to implementing the WDE and LSTM model for text classification tasks. This model architecture begins by embedding input text sequences into dense vector representations, which are then processed by sequential LSTM layers. The LSTM layers are designed to capture and retain temporal dependencies within the text, leveraging the power of recurrent neural networks to handle sequential data. Dropout layers are strategically placed to prevent overfitting by randomly deactivating neurons during training. The final dense layer applies a softmax activation to produce a probability distribution over the possible output classes, enabling the model to classify the input text. The pseudocode encapsulates the essential operations and decisions in constructing, compiling, training, and evaluating the WDE-LSTM model.
Algorithm 4 WDE-LSTM Model for text classification
Initialize the model:
    model ← Sequential()
Add embedding layer:
    model.add(Embedding(input_dim=vocab_size, output_dim=128, input_length=300))
Add LSTM layers:
    model.add(LSTM(128, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(64, return_sequences=False))
    model.add(Dropout(0.2))
Add dense output layer:
    model.add(Dense(num_classes, activation='softmax'))
Compile the model:
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
Train the model:
    model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))
Evaluate the model:
    model.evaluate(X_test, y_test)
Figure 4 shows the architecture of the proposed WDE-LSTM model.

3.5. WDE-CNN-LSTM Model for Text Classification

This section outlines the architecture and implementation of a model that combines WDE, CNN, and LSTM layers for text classification tasks [59]. The model begins by embedding the input text sequences into dense vector representations. These embeddings are then processed by two parallel branches: a CNN branch that extracts local features through convolutional operations and an LSTM branch that captures temporal dependencies within the text. The outputs of these branches are concatenated and passed through a final dense layer for classification [60]. This hybrid architecture leverages the strengths of both CNNs and LSTMs, enabling it to handle both local and sequential patterns in text data effectively [61]. Below is a detailed breakdown of each layer’s functionality and the corresponding mathematical equations.

3.5.1. Input Layer

The input layer is the first layer of the model and is responsible for accepting the raw text sequences that will be fed into the network. The Input Shape is defined by max_length, which represents the maximum length of the input sequences. If any input sequence is shorter than this maximum length, it will be padded; if it is longer, it will be truncated. This layer serves as the entry point for the model, preparing the text data to be processed by the subsequent embedding, CNN, and LSTM layers. The input layer does not apply transformations but defines the shape of the data entering the network.

3.5.2. Embedding Layer

The embedding layer is vital in converting the input text sequences into dense vector representations. The Embedding Dimension is set to 128, meaning each word in the input sequence is represented as a 128-dimensional vector. This layer captures the semantic meaning and relationships between words by mapping them into a continuous vector space, where words with similar meanings are positioned closer together. The embeddings created by this layer provide the foundational input to both the CNN and LSTM branches of the model, enabling these subsequent layers to process text data more effectively.
The mathematical operation of the embedding layer can be described as:
$$E = X \cdot W_e$$
where $X$ represents the input sequence and $W_e$ is the embedding matrix with dimensions vocab_size × 128. The resulting matrix $E$ is a dense representation of the input sequence, facilitating better learning in the downstream layers.
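As a rough illustration of the embedding equation, the sketch below treats the input as one-hot rows and shows that multiplying by the embedding matrix is equivalent to looking up one row per token. The toy vocabulary size (4), the embedding dimension (3 rather than 128), and the helper names are assumptions made only for this example.

```python
def one_hot(index, size):
    """Return a one-hot row vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m) using plain lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Toy embedding matrix W_e: vocab_size = 4, embedding dimension = 3 (assumed values)
W_e = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
token_ids = [2, 0, 3]  # an already padded/truncated sequence of word indices

X = [one_hot(t, len(W_e)) for t in token_ids]  # one-hot encoded input
E = matmul(X, W_e)                             # E = X . W_e
E_lookup = [W_e[t] for t in token_ids]         # equivalent row lookup

print(E == E_lookup)
```

In practice, embedding layers perform the row lookup directly rather than building one-hot matrices, since the result is the same and the lookup is far cheaper.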

3.5.3. CNN Branch

The CNN branch of the model begins with a Conv1D Layer, which applies 32 convolutional filters with a kernel size of 3 to the embedded sequences. This layer extracts local features, such as n-grams, by scanning the input text with the filters. The convolution operation is followed by a ReLU activation function to introduce non-linearity, allowing the model to capture more complex patterns in the data. The operation of the convolutional layer can be expressed as:
$$C = \mathrm{ReLU}(E \ast W_c + b_c)$$
where $W_c$ is the convolutional filter, $b_c$ is the bias term applied during the convolution, and $\ast$ denotes the convolution operation.
Following the convolutional layer, a MaxPooling1D Layer with a pool size of 2 is used to reduce the dimensionality of the feature maps. This pooling operation selects the maximum value from non-overlapping regions of the feature maps, thereby retaining the most prominent features while reducing the computational complexity. The following equation defines the pooling operation:
$$P = \mathrm{MaxPool}(C, 2)$$
Lastly, the output from the pooling layer is passed through a Flatten Layer, which reshapes the 2D feature maps into a 1D vector. This flattening is necessary to prepare the data for concatenation with the output from the LSTM branch, ensuring the model can effectively combine information from both the CNN and LSTM branches.
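The CNN branch can be traced by hand on a toy input. The sketch below is a minimal, pure-Python illustration with a single filter, assuming no padding and a stride of 1; the input values, the filter weights, and the function names are hypothetical. As in most deep learning frameworks, the "convolution" slides the filter without flipping it.

```python
def conv1d_relu(E, W_c, b_c):
    """Slide one filter over the time axis (no padding), then apply ReLU.

    E:   sequence of shape (T, d)
    W_c: one filter of shape (k, d)
    b_c: scalar bias
    Returns a feature map of length T - k + 1.
    """
    k = len(W_c)
    out = []
    for t in range(len(E) - k + 1):
        s = b_c
        for i in range(k):
            s += sum(e * w for e, w in zip(E[t + i], W_c[i]))
        out.append(max(0.0, s))
    return out

def max_pool1d(C, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(C[i:i + pool_size])
            for i in range(0, len(C) - pool_size + 1, pool_size)]

# Toy input: T = 6 time steps, d = 2 embedding dimensions (assumed values)
E = [[1, 0], [0, 1], [2, 1], [0, 0], [1, 3], [1, 1]]
W_c = [[1, 0], [0, 1], [-1, 0]]  # one filter with kernel size 3
b_c = 0.0

C = conv1d_relu(E, W_c, b_c)  # feature map of length 6 - 3 + 1 = 4
P = max_pool1d(C, 2)          # pooled map of length 2

print(C)  # [0.0, 1.0, 1.0, 2.0]
print(P)  # [1.0, 2.0]
```

With 32 filters instead of one, each time step of the pooled output would hold 32 values, and the Flatten Layer would lay them out as a single vector.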

3.5.4. LSTM Branch

The LSTM branch starts with an LSTM Layer, which consists of 128 units. This layer processes the embedded sequences and captures the temporal dependencies across the sequence. The LSTM’s ability to maintain information over long sequences makes it ideal for modeling dependencies in sequential data, such as text. The output from the LSTM is the final hidden state, $h_T$, at the last time step $T$, while the cell state $c_T$ is also updated during the process. The operation of the LSTM layer is represented as:
$$h_T, c_T = \mathrm{LSTM}(E, h_{T-1}, c_{T-1})$$
where $T$ is the final time step.
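The recurrence behind the final hidden state can be sketched with a single-unit LSTM cell. The code below uses the standard LSTM gate equations (input, forget, and output gates plus a candidate value) on scalar inputs. It is not taken from the paper, and the weights and input values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and a single hidden unit."""
    pre = {}
    for name in ("i", "f", "o", "g"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x_t + w_h * h_prev + b
    i = sigmoid(pre["i"])    # input gate
    f = sigmoid(pre["f"])    # forget gate
    o = sigmoid(pre["o"])    # output gate
    g = math.tanh(pre["g"])  # candidate cell value
    c_t = f * c_prev + i * g     # updated cell state
    h_t = o * math.tanh(c_t)     # updated hidden state
    return h_t, c_t

def lstm_final_state(xs, params):
    """Run the cell over the whole sequence and return (h_T, c_T)."""
    h, c = 0.0, 0.0
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c

# Hypothetical weights (w_x, w_h, b) for each gate
params = {name: (0.5, 0.1, 0.0) for name in ("i", "f", "o", "g")}
h_T, c_T = lstm_final_state([1.0, -0.5, 2.0], params)
print(h_T, c_T)
```

Only the last hidden state is kept, which matches the description above: the branch summarizes the whole sequence in one vector (here, one number) before concatenation.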
Next, the Concatenate Layer merges the outputs from both the CNN and LSTM branches. This layer takes the local features extracted by the CNN and combines them with the temporal dependencies captured by the LSTM, resulting in a comprehensive feature representation. The concatenated vector F contains information from both branches, which provides a more holistic view of the input data. The operation for concatenation is expressed as:
$$F = \mathrm{Concat}(P, h_T)$$
where $P$ is the output from the CNN branch, and $h_T$ is the final hidden state from the LSTM branch.
Finally, the Dense Output Layer takes the concatenated features and maps them to the target classes. The number of units in this layer depends on the classification task, with possible values being 5, 3, or 2, and a Softmax activation function is applied to generate a probability distribution over the possible labels. The output y is a vector representing the predicted probabilities for each class. The mathematical operation of the dense layer is described as:
$$y = \mathrm{Softmax}(F \cdot W_d + b_d)$$
where $W_d$ is the weight matrix, and $b_d$ is the bias term for the dense layer.
The model is compiled using categorical cross-entropy as the loss function, Adam as the optimizer, and accuracy as the evaluation metric. This configuration is tailored for multi-class classification tasks, aiming to accurately assign input sequences to one of the possible classes.
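To make the output layer and the loss concrete, the following sketch computes a softmax over a toy feature vector and the categorical cross-entropy against a one-hot target. The feature values, weights, and three-class setup are illustrative assumptions, not values from the experiments.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense_softmax(F, W_d, b_d):
    """y = Softmax(F . W_d + b_d) for a single feature vector F."""
    logits = [sum(f * W_d[i][j] for i, f in enumerate(F)) + b_d[j]
              for j in range(len(b_d))]
    return softmax(logits)

def categorical_cross_entropy(y_true, y_pred):
    """Loss for one example with a one-hot target vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

F = [1.0, 2.0]               # toy concatenated features (assumed)
W_d = [[0.5, -0.5, 0.0],     # shape (len(F), num_classes)
       [0.1, 0.3, -0.2]]
b_d = [0.0, 0.0, 0.0]

y = dense_softmax(F, W_d, b_d)
loss = categorical_cross_entropy([1, 0, 0], y)
print(y, loss)
```

The loss reduces to the negative log-probability assigned to the true class, so it shrinks toward zero as the model becomes more confident in the correct label.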
Algorithm 5 provides a step-by-step guide to implementing the WDE-CNN-LSTM model for text classification tasks. This model architecture leverages the strengths of WDE, CNN, and LSTM layers to process and classify textual data. The pseudocode begins by defining the input and embedding layers. It then outlines the construction of two parallel branches that both read the embedding output: a CNN branch that extracts local features through convolutional operations and an LSTM branch that captures temporal dependencies within the text sequences. Because the branches run in parallel, the layers are applied to tensors directly rather than stacked in a purely sequential model. The branch outputs are concatenated to create a comprehensive feature set, which is passed through a dense layer for final classification. The pseudocode encapsulates the critical operations required to build, compile, train, and evaluate the WDE-CNN-LSTM model.
Algorithm 5 WDE-CNN-LSTM Model for text classification
  1: Define the input layer:
  2:     inputs ← Input(shape=(max_length,))
  3: Add embedding layer:
  4:     embedded ← Embedding(vocab_size, 128)(inputs)
  5: Create CNN branch:
  6:     conv_branch ← Conv1D(32, 3, activation='relu')(embedded)
  7:     conv_branch ← MaxPooling1D(pool_size=2)(conv_branch)
  8:     conv_branch ← Flatten()(conv_branch)
  9: Create LSTM branch:
 10:     lstm_branch ← LSTM(128)(embedded)
 11: Concatenate branches:
 12:     combined ← Concatenate()([conv_branch, lstm_branch])
 13: Add dense output layer:
 14:     outputs ← Dense(num_classes, activation='softmax')(combined)
 15: Build the model:
 16:     model ← Model(inputs, outputs)
 17: Compile the model:
 18:     model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
 19: Train the model:
 20:     model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(X_val, y_val))
 21: Evaluate the model:
 22:     model.evaluate(X_test, y_test)
Figure 5 shows the architecture of the proposed WDE-CNN-LSTM model.
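To see how the tensor sizes evolve through the two branches, the sketch below computes per-layer output shapes and trainable parameter counts from the hyperparameters stated above (embedding dimension 128, 32 filters, kernel size 3, pool size 2, 128 LSTM units). It assumes no padding in the convolution and the usual four-gate LSTM parameterization. The values of max_length, vocab_size, and num_classes in the example call are hypothetical, since the text does not fix them here.

```python
def wde_cnn_lstm_summary(max_length, vocab_size, num_classes,
                         embed_dim=128, filters=32, kernel_size=3,
                         pool_size=2, lstm_units=128):
    """Return per-layer output sizes and trainable parameter counts."""
    conv_len = max_length - kernel_size + 1  # no padding assumed
    pool_len = conv_len // pool_size         # trailing partial window dropped
    flat_dim = pool_len * filters            # Flatten output
    concat_dim = flat_dim + lstm_units       # Concat(P, h_T)

    shapes = {
        "embedding": (max_length, embed_dim),
        "conv1d": (conv_len, filters),
        "max_pooling1d": (pool_len, filters),
        "flatten": (flat_dim,),
        "lstm": (lstm_units,),
        "concatenate": (concat_dim,),
        "dense": (num_classes,),
    }
    params = {
        "embedding": vocab_size * embed_dim,
        "conv1d": kernel_size * embed_dim * filters + filters,
        # four gates, each with input weights, recurrent weights, and a bias
        "lstm": 4 * ((embed_dim + lstm_units) * lstm_units + lstm_units),
        "dense": concat_dim * num_classes + num_classes,
    }
    return shapes, params

# Hypothetical settings: 100-token reviews, 10,000-word vocabulary, 5 classes
shapes, params = wde_cnn_lstm_summary(max_length=100, vocab_size=10_000,
                                      num_classes=5)
for name, shape in shapes.items():
    print(f"{name:14s} output={shape} params={params.get(name, 0)}")
```

Under these assumed settings, the flattened CNN output (1568 values) is much larger than the LSTM output (128 values), so most of the dense layer's weights connect to the CNN branch.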

3.6. Consistency Check

In recent years, SA has become a critical tool for understanding consumer behavior by analyzing consumer reviews. Consistency in these reviews is paramount, as inconsistencies can distort insights derived from SA, leading to unreliable conclusions. This section focuses on the application of SA to assess the consistency of consumer reviews in the utilized dataset, aiming to identify and address potential discrepancies. By analyzing the alignment between customer sentiments and their corresponding review ratings, the research seeks to enhance the reliability of sentiment classification techniques, thereby providing more accurate and actionable insights for businesses [41].
Algorithm 6 outlines the steps for performing SA on textual data using the TextBlob library, followed by consistency labeling based on the sentiment score and a predefined scoring metric [62]. The algorithm begins by calculating text data’s sentiment polarity, assigning sentiment scores to each entry, and then checking for consistency between the sentiment and an existing score. The final output includes the original data augmented with sentiment scores, consistency labels, and a calculated consistency percentage. This process is essential for assessing the alignment between sentiment and scoring in text data, providing valuable insights for various text analysis applications.
Algorithm 6 Sentiment analysis and consistency labeling
  1: Import the TextBlob library for sentiment analysis.
  2: Define a function get_sentiment that calculates sentiment polarity:
  3: function get_sentiment(text)
  4:     a. Create a sentiment object from the input text using TextBlob.
  5:     b. Obtain the polarity score (ranging from −1 to 1).
  6:     c. Return the polarity score.
  7: end function
  8: Apply the get_sentiment function to the 'Text' column of the DataFrame df and store the results in a new column 'sentiment_score':
  9: for all row in df do
 10:     Obtain the text from the current row.
 11:     Calculate the sentiment score using get_sentiment.
 12:     Store the sentiment score in the new column 'sentiment_score'.
 13: end for
 14: Define a function label_consistency to check for consistency between sentiment and score:
 15: function label_consistency(row)
 16:     a. If sentiment_score > 0.5 and Score < 3, or if sentiment_score < 0 and Score > 3, return 0 (inconsistent).
 17:     b. Otherwise, return 1 (consistent).
 18: end function
 19: Apply the label_consistency function to each row of the DataFrame df and store the results in a new column 'consistency_label':
 20: for all row in df do
 21:     Calculate the consistency label using label_consistency.
 22:     Store the consistency label in the new column 'consistency_label'.
 23: end for
 24: Display the DataFrame with the columns 'Text', 'Score', 'sentiment_score', and 'consistency_label'.
 25: Calculate the consistency percentage:
 26:     Take the mean of the 'consistency_label' column and multiply by 100.
 27: Output the consistency percentage.
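The labeling rule and the consistency percentage from Algorithm 6 can be sketched without any external dependencies. In the snippet below, the polarity values are hypothetical stand-ins. In the described pipeline they would come from TextBlob (for example, `TextBlob(text).sentiment.polarity`), and the rows would live in a DataFrame rather than a list of tuples.

```python
def label_consistency(sentiment_score, score):
    """Return 0 (inconsistent) or 1 (consistent), following Algorithm 6."""
    positive_text_low_rating = sentiment_score > 0.5 and score < 3
    negative_text_high_rating = sentiment_score < 0 and score > 3
    if positive_text_low_rating or negative_text_high_rating:
        return 0
    return 1

def consistency_percentage(rows):
    """Mean of the consistency labels, expressed as a percentage."""
    labels = [label_consistency(s, r) for s, r in rows]
    return 100.0 * sum(labels) / len(labels)

# Hypothetical (polarity, star rating) pairs, for illustration only
reviews = [
    (0.8, 5),   # positive text, high rating -> consistent
    (0.7, 1),   # positive text, low rating  -> inconsistent
    (-0.4, 4),  # negative text, high rating -> inconsistent
    (-0.6, 1),  # negative text, low rating  -> consistent
    (0.2, 3),   # mild text, neutral rating  -> consistent
]

pct = consistency_percentage(reviews)
print(f"Consistency: {pct:.2f}%")  # Consistency: 60.00%
```

Note that the rule is asymmetric: a positive review must exceed a polarity of 0.5 to conflict with a low rating, whereas any negative polarity conflicts with a high rating.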

4. Experimental Results and Analysis

This section presents the experimental results for the proposed models, including CNNs, LSTM, WDE-LSTM, and WDE-CNN-LSTM. The models were assessed using training, validation, and testing datasets. The conclusions were drawn based on the average values of the evaluation metrics. The working environment parameters are outlined in Section 4.1 and Section 4.2, while Section 4.3 details the performance metrics. Section 4.4 offers a comparative analysis of the models, and Section 4.5 presents the results of the statistical consistency analysis. Finally, Section 4.6 compares the proposed models with state-of-the-art techniques.

4.1. Working Environment

All experiments were carried out on the Google Colab platform, leveraging the computational power of an NVIDIA T4 GPU to ensure efficient processing and faster model training. The code environment was configured using Python version 3.10.12, the primary programming language. Keras version 3.3.3 was employed as the high-level neural network API, facilitating the implementation of DL models with simplicity and flexibility. TensorFlow version 2.15.0 provided the underlying framework for building and training the models, offering powerful tools for machine learning, including support for automatic differentiation, model optimization, and GPU acceleration. This setup allowed for the seamless integration and execution of various DL tasks, from model definition to training and evaluation.

4.2. Parameters Settings

Table 3 provides a summary of the hyperparameters utilized to implement the DL models. These parameters were fine-tuned before training to optimize model performance and behavior. The specified values were employed to conduct the experiments in Python as part of the research discussed in this paper.

4.3. Evaluation Metrics

The proposed models’ performance is assessed using various evaluation metrics, including recall, specificity, precision, accuracy, and the F1-score. These metrics are calculated based on common evaluation parameters for predictive models, such as True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) [63,64].
Recall [65] is defined as the ratio of TP to the sum of TP and FN, and is mathematically represented by Equation (24):
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{24}$$
Precision [66] is computed by dividing TP by the sum of TP and FP, as expressed in Equation (25):
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{25}$$
Accuracy is determined using Equation (26):
$$\mathrm{Accuracy} = \frac{TN + TP}{TP + FP + TN + FN} \times 100 \tag{26}$$
F1-score [65] is the harmonic mean of precision and recall, calculated as shown in Equation (27):
$$F_1\text{-score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{27}$$
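As a quick numerical check of Equations (24) to (27), the sketch below computes the four metrics from confusion-matrix counts and verifies that the count-based F1 formula agrees with the harmonic mean of precision and recall. The counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute recall, precision, accuracy (%), and F1 from raw counts."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tn + tp) / (tp + fp + tn + fn) * 100,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical confusion-matrix counts
m = classification_metrics(tp=90, fp=10, tn=85, fn=15)

# F1 expressed as the harmonic mean of precision and recall
harmonic = 2 * m["precision"] * m["recall"] / (m["precision"] + m["recall"])

print(m)
print(abs(m["f1"] - harmonic) < 1e-12)
```

For the multi-class tasks, these quantities are computed per class (treating each class in turn as the positive class) and then averaged, which is where the macro and weighted averages reported in the tables come from.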

4.4. Comparative Analysis

This paper proposes a system for analyzing customer reviews. For this purpose, different DL models were deployed, with the objective of obtaining a model with optimal detection performance, accuracy, and consistency. The proposed models are evaluated on binary and multi-class classification cases to validate them on both simple and more demanding problems. Each case is discussed in detail in the subsections that follow.

4.4.1. Performance Results of the Proposed CNN Model

The evaluation metrics for the CNN model, as shown in Table 4, demonstrate the model’s effectiveness across the 2-, 3-, and 5-class score settings. For the binary classification (Score 2), the CNN achieved near-perfect precision, recall, and F1-scores for both classes, indicating minimal errors in detection. In the multi-class classification cases (Scores 3 and 5), the model maintained strong performance across all classes, with slightly lower precision and recall in some categories, particularly for Classes 3 and 4 in Score 5. Despite this, the F1-scores remain high, suggesting a well-balanced performance for the CNN model across all tasks.
The confusion matrices in Figure 6 further clarify the CNN model’s classification accuracy. In Figure 6a, the binary classification confusion matrix reveals that the model has high accuracy, with very few misclassifications between Class 0 and Class 1. Similarly, Figure 6b shows the confusion matrix for the multi-class problem with three classes, where most predictions are correctly classified with only a few errors. Figure 6c expands the analysis to five classes, showing that while the model handles most predictions correctly, there are some noticeable confusions between closely related classes, particularly in the lower diagonal. These matrices highlight the CNN model’s robustness in binary and multi-class classification scenarios.

4.4.2. The Performance Results of the Proposed LSTM Model

Table 5 shows the evaluation metrics for the LSTM model with the 2-, 3-, and 5-class score settings. At Score 2, the model exhibits high performance, with precision, recall, and F1-scores consistently at 0.98–0.99 across all classes, reflecting balanced classification ability. Scores 3 and 5 reveal slightly lower yet still steady performance, with a decline in precision and F1-scores for specific classes (e.g., Class 3 in Score 5 with 0.87 precision and 0.88 F1-score). The overall accuracy is high across all scores, with the model achieving up to 98% accuracy for Scores 2 and 3 and a slightly reduced accuracy of 94% at Score 5.
Figure 7 presents three confusion matrices illustrating the performance of the LSTM model in the binary and multi-class classification tasks. Figure 7a represents the confusion matrix for the binary classification problem, where the model classifies the vast majority of instances correctly, with minor misclassifications between the two classes, as indicated by the non-zero off-diagonal values. Figure 7b extends this to the three-class task, where the model still performs well, although a slight increase in misclassifications is evident, particularly in Class 2, with some instances being confused with neighboring classes. Figure 7c shows the confusion matrix for the more complex five-class task. As expected with an increasing number of classes, the misclassification rate grows, notably with some confusion between Classes 3 and 4 and between Classes 4 and 5, as shown by the higher off-diagonal values in these regions. These matrices collectively show that while the LSTM model maintains strong classification capabilities, the complexity of the task introduces more opportunities for misclassification, as reflected in the less clear separation between some classes.

4.4.3. The Performance Results of the Proposed WDE-LSTM Model

Table 6 presents the evaluation metrics for the proposed WDE-LSTM model, covering precision, recall, and F1-scores across three classification tasks with 2, 3, and 5 classes. For the binary classification task (Score 2), the model consistently performs with precision, recall, and F1-scores at 0.97 to 1.00, leading to an overall accuracy of 98%. For the three-class classification task (Score 3), the model maintains similarly high performance across all metrics, with minimal macro, micro, and weighted-average variation, achieving 98% accuracy. However, for the more complex five-class classification task (Score 5), the model’s performance slightly declines, particularly in Class 4, where precision and F1-scores drop to 0.88, contributing to a lower overall accuracy of 94%.
Figure 8 demonstrates strong classification performance across binary, three-class, and five-class tasks. While achieving high accuracy in binary classification, slight misclassifications appear in the three-class scenario, and these increase in the five-class task, particularly between neighboring classes. Despite the growing complexity, the model maintains robust accuracy, showcasing its generalization capabilities across varying classification challenges.
Table 7 presents the evaluation metrics for the proposed WDE-CNN-LSTM model across binary, three-class, and five-class classification tasks. In the binary classification task (Score 2), the model demonstrates exceptional performance, with precision, recall, and F1-scores close to or at 1.00, resulting in an accuracy of 98%. For the three-class classification task (Score 3), the model maintains similarly high performance, with precision, recall, and F1-scores consistently at 0.98–0.99, achieving an overall accuracy of 98%. Performance remains strong in the five-class classification task (Score 5), where the model shows robust precision, recall, and F1-scores across all classes and achieves 95.21% accuracy. The small variation in the macro, micro, and weighted averages across all tasks suggests that the WDE-CNN-LSTM model exhibits strong generalization capabilities and effectively handles binary and multi-class classification scenarios. In conclusion, the WDE-CNN-LSTM model demonstrated the highest performance among the proposed models.
Figure 9 illustrates the visual results of the WDE-CNN-LSTM model across three classification tasks: Figure 9a illustrates the confusion matrix for the binary classification task, showing near-perfect accuracy with minimal misclassification; Figure 9b illustrates the confusion matrix for the three-class task, where the model maintains high accuracy but exhibits some minor misclassifications between neighboring classes; and Figure 9c illustrates the confusion matrix for the five-class task, reflecting a more significant number of misclassifications, particularly between Classes 4 and 5, though the majority of instances are still accurately classified. These figures collectively demonstrate the model’s robust performance across varying classification complexities.
Figure 10, Figure 11 and Figure 12 depict the training and validation accuracy and loss curves for the WDE-CNN-LSTM model across the binary, three-class, and five-class classification tasks. For the binary task (Score 2), Figure 10a shows that training accuracy increases steadily and converges near 1.0, while validation accuracy levels off at a slightly lower value, indicating good generalization with minimal overfitting. Figure 10b reveals that training loss decreases rapidly, whereas validation loss decreases initially and then stabilizes, indicating that the model has reached a balance between fitting the data and avoiding overfitting. For the three-class task (Score 3), Figure 11a follows a similar trend, with both training and validation accuracy converging at high values, while Figure 11b shows a steep decline in training loss and a more gradual stabilization of validation loss. Finally, for the five-class task (Score 5), Figure 12a shows that training accuracy continues to improve, but the gap between training and validation accuracy widens slightly, reflecting the increased difficulty of the task. In Figure 12b, the training loss declines sharply, whereas the validation loss decreases and then plateaus, suggesting that while the model learns effectively, the increased complexity of the classification task makes it harder to reduce validation loss further. These plots highlight the model’s effectiveness in training across the classification tasks while illustrating the trade-off between model complexity and generalization performance.
Table 8 compares the models (CNN, LSTM, WDE-LSTM, and WDE-CNN-LSTM) across Scores 2, 3, and 5. The settings with fewer classes (Scores 2 and 3) yield consistently high performance across all metrics, particularly for WDE-LSTM and WDE-CNN-LSTM, which maintain precision, recall, and F1-scores above 98%. At Score 2, CNN shows lower accuracy (91.19%) than the other models. At Score 3, all models improve in accuracy, with WDE-CNN-LSTM achieving the best results across all metrics (98.26%). However, at Score 5, performance declines across all models, with CNN experiencing the most significant drop, particularly in accuracy, while WDE-CNN-LSTM retains relatively stronger performance (around 95%), making it the most robust model across the score levels.

4.5. Statistical Consistency Analysis

This section assesses the reliability of the SA model when applied to consumer reviews by evaluating the alignment between sentiment polarity scores and the corresponding user-assigned ratings. The consistency was determined by analyzing how well the sentiment scores, derived from the text, matched the numerical ratings provided by the users. The analysis revealed that 96.00% of the reviews were classified as consistent, indicating a strong correlation between the calculated sentiment polarity and the user-assigned scores. This high level of consistency demonstrates that the SA model is highly effective in accurately capturing the sentiment conveyed in the reviews and aligning it with the users’ ratings.
The strong consistency rate of 96.00% highlights the model’s ability to reliably reflect the underlying sentiment of customer feedback, showing that the sentiment scores closely correspond to the assigned ratings.

4.6. Comparison with the State-of-the-Art Models

A concise comparison of the proposed models is conducted based on evaluation metrics to identify the optimal model. The metrics considered include precision, recall, F1-score, and accuracy, as shown in Table 9. It is evident that the proposed models, WDE-CNN-LSTM, WDE-LSTM, and LSTM, outperform the traditional machine learning and DL models across binary, three-class, and five-class classification tasks.
The WDE-CNN-LSTM model demonstrates superior performance, achieving 98% accuracy and an F1-score of 98.26% for binary and three-class tasks. Even in the more complex five-class task, the model maintains a high 95.21% accuracy and 95.20% F1-score. The WDE-LSTM model follows closely, with 98.18% accuracy for binary classification and 93.55% accuracy for five-class tasks, showing a minor drop in performance in more complex scenarios. The LSTM model within the proposed framework also performs well, achieving 98% accuracy for binary classification and 93.53% accuracy for five-class tasks, making it reliable for multi-class scenarios.
The proposed hybrid models show significant improvements compared to traditional models like CNNs and Text-CNN. CNNs achieve 95% accuracy in binary classification but decline to 93.35% accuracy in five-class tasks. Text-CNN, as shown in [67], consistently performs at 85% accuracy across tasks, highlighting the performance gap between traditional and hybrid approaches. Similarly, while performing well with 91% accuracy in binary classification, BERT struggles in more complex tasks, maintaining the same accuracy across five-class scenarios.
The Novel DL Model for Inconsistency Detection by [6] excels in five-class classification, achieving 99% accuracy and a 97% F1-score, setting a new benchmark for performance in complex multi-class classification tasks.
These results align with prior studies where traditional models, such as CNNs and LSTM, struggle with more complex class distributions, as highlighted by [68,69]. Integrating Word Embeddings (WDE) in the proposed models provides dense semantic representations of the input text, supporting strong performance across diverse class distributions. This is particularly evident in the multi-class tasks where hybrid models, such as WDE-CNN-LSTM, consistently outperform traditional approaches.
Table 9. Comparison with the state-of-the-art models (Classes 2, 3, and 5).
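| Method | Size | No. of Classes | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|
| WDE-CNN-LSTM (Proposed) | 568,454 | 2 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 3 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 5 | 0.95 | 0.95 | 0.95 | 0.95 |
| WDE-LSTM (Proposed) | 568,454 | 2 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 3 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 5 | 0.94 | 0.93 | 0.93 | 0.94 |
| LSTM (Proposed) | 568,454 | 2 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 3 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 5 | 0.94 | 0.93 | 0.93 | 0.93 |
| CNNs (Proposed) | 568,454 | 2 | 0.98 | 0.98 | 0.98 | 0.98 |
| | | 3 | 0.98 | 0.97 | 0.97 | 0.97 |
| | | 5 | 0.93 | 0.93 | 0.93 | 0.93 |
| Text-CNN [70] | 72,500 | 5 | 0.85 | - | - | 0.85 |
| Bi-LSTM [70] | 72,500 | 5 | 0.90 | - | - | 0.90 |
| BERT [70] | 72,500 | 5 | 0.91 | - | - | 0.89 |
| LSTM [71] | 29,163 | 2 | - | 0.86 | 0.90 | 0.88 |
| | | 3 | - | 0.76 | 0.58 | 0.56 |
| Novel DL Model [6] | 568,454 | 2 | 0.95 | 0.89 | 0.92 | 0.90 |
| | | 3 | 0.95 | 0.78 | 0.81 | 0.79 |
| | | 5 | 0.92 | 0.66 | 0.69 | 0.67 |
| WDE-CNN [72] | 11,754 | 2 | 0.88 | - | - | 0.88 |
| WDE-LSTM [72] | 11,754 | 2 | 0.88 | - | - | 0.88 |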

5. Discussion

Applying the WDE-CNN-LSTM model in SA for consumer reviews provides practical value in several key areas of business operations. By leveraging this advanced model, e-commerce platforms and service providers can automate classifying customer sentiment with greater accuracy. This enables businesses to quickly identify trends in customer satisfaction, dissatisfaction, and areas needing improvement, facilitating timely and informed decision-making. The model’s high performance, particularly in multi-class classification tasks, can directly support customer service departments by providing real-time product or service quality feedback. Moreover, the model’s consistent results across different sentiment classes help businesses ensure that their SA tools remain reliable even as the volume and complexity of customer feedback grow.
The following recommendations guide the practical implementation and optimization of the WDE-CNN-LSTM model for SA in consumer reviews. These points focus on maximizing the model’s efficiency and ensuring its adaptability to various operational environments:
  • Integrate with CRM Systems: Integrate the SA model with customer relationship management (CRM) tools to automate feedback processing and prioritize customer responses based on sentiment classification, improving operational efficiency.
  • Automate Sentiment-Driven Actions: Utilize the model to automate the identification of critical reviews, particularly those with negative sentiments, enabling proactive intervention and enhancing customer satisfaction.
  • Regular Model Updates: Continuously update the model with new customer data to account for changes in language patterns and sentiment trends, ensuring the model remains accurate and relevant over time.
  • Optimize Resource Allocation: Apply the SA results to optimize resource allocation in customer service by prioritizing immediate issues, streamlining workflow, and improving response times.
  • Enhance Decision-Making Processes: Leverage insights from the model to inform data-driven decision-making in product development, marketing strategies, and customer engagement initiatives, aligning business actions with customer preferences and sentiments.

6. Conclusions and Future Work

The paper presents a novel hybrid deep learning model, WDE-CNN-LSTM, that significantly enhances the sentiment classification of customer reviews. The proposed model achieved accuracy rates of 98% for binary and three-class classifications and 95.21% for five-class classifications, demonstrating its effectiveness in handling complex sentiment classification tasks (SCTs). The WDE-CNN-LSTM model outperformed standalone models in precision, recall, and F1-score, achieving an F1-score of up to 98.26% for three-class classification. This indicates a robust capability in accurately classifying sentiments. The model showed a high consistency rate of 96.00% between predicted sentiments and actual customer ratings, which is crucial for building trust in sentiment analysis systems. Furthermore, the findings suggest that the hybrid architecture can significantly improve sentiment analysis in customer review systems, leading to more reliable and accurate sentiment classification, which is essential for businesses aiming to understand customer feedback better. The research results are particularly beneficial for e-commerce platforms seeking to enhance customer feedback analysis, data scientists developing sentiment analysis models, and researchers exploring hybrid deep learning approaches in natural language processing.
While the proposed WDE-CNN-LSTM model demonstrates substantial improvements in sentiment classification, certain limitations warrant further investigation. One notable limitation is the computational complexity of the hybrid architecture, which may hinder its deployment in real-time applications or environments with constrained computational resources. This highlights the need for future research to explore optimization techniques, such as model compression or pruning, to reduce computational demands without compromising performance. Addressing these challenges would contribute to the broader adoption and practical implementation of the model.
Future studies could focus on optimizing the proposed hybrid model for efficiency, particularly in terms of computational requirements, to make it more accessible for real-time applications. Research should also address challenges related to class imbalances in datasets, which can affect model performance. Techniques such as data augmentation or synthetic data generation could be explored. Finally, applying the hybrid model to various domains beyond customer reviews, such as healthcare or tourism, could validate its versatility and effectiveness across different contexts. Future work could also investigate adapting the model for multilingual sentiment analysis to expand its usability across diverse markets.

Author Contributions

Conceptualization, S.E.S. and A.E.A.; methodology, S.E.S. and A.A.A.; software, A.A. and S.E.S.; validation, A.A. and A.E.-S.; formal analysis, S.E.S., A.E.-S. and A.E.A.; investigation, S.E.S.; resources, A.A.A.; data curation, S.E.S., A.A. and A.E.A.; writing—original draft preparation, S.E.S., A.A., A.E.-S. and A.E.A.; writing—review and editing, S.E.S., A.A., A.E.-S. and A.E.A.; visualization, S.E.S., A.A., A.E.-S. and A.E.A.; funding acquisition, S.E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, under Project Grant KFU242570.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chang, H.L.; Liu, Y.L.; Keng, C.J.; Jiang, H.L. Examining Discrepancies between Online Product Ratings and Sentiments Expressed in Review Contents. Manag. Anal. Soc. Insights 2024, 1, 129–144.
  2. Kalkha, H.; Khiat, A.; Bahnasse, A.; Ouajji, H. The rising trends of smart e-commerce logistics. IEEE Access 2023, 11, 33839–33857.
  3. Rejeb, A.; Simske, S.; Rejeb, K.; Treiblmaier, H.; Zailani, S. Internet of Things research in supply chain management and logistics: A bibliometric analysis. Internet Things 2020, 12, 100318.
  4. Hu, M.; Liu, B. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; pp. 168–177.
  5. Archak, N.; Ghose, A.; Ipeirotis, P.G. Deriving the pricing power of product features by mining consumer reviews. Manag. Sci. 2011, 57, 1485–1509.
  6. Kassem, M.A.; Abohany, A.A.; El-Mageed, A.A.A.; Hosny, K.M. A novel deep learning model for detection of inconsistency in e-commerce websites. Neural Comput. Appl. 2024, 36, 10339–10353.
  7. Chamekh, A.; Mahfoudh, M.; Forestier, G. Sentiment analysis based on deep learning in e-commerce. In Proceedings of the International Conference on Knowledge Science, Engineering and Management; Springer: Berlin/Heidelberg, Germany, 2022; pp. 498–507.
  8. Marwat, M.I.; Khan, J.A.; Alshehri, D.M.D.; Ali, M.A.; Ali, H.; Assam, M. Sentiment analysis of product reviews to identify deceptive rating information in social media: A sentideceptive approach. KSII Trans. Internet Inf. Syst. TIIS 2022, 16, 830–860.
  9. Zhang, W.; Xie, R.; Wang, Q.; Yang, Y.; Li, J. A novel approach for fraudulent reviewer detection based on weighted topic modelling and nearest neighbors with asymmetric Kullback–Leibler divergence. Decis. Support Syst. 2022, 157, 113765.
  10. Lamb, Y.; Cai, W.; McKenna, B. Exploring the complexity of the individualistic culture through social exchange in online reviews. Int. J. Inf. Manag. 2020, 54, 102198.
  11. Zhang, W.; Wang, Q.; Li, J.; Ma, Z.; Bhandari, G.; Peng, R. What makes deceptive online reviews? A linguistic analysis perspective. Humanit. Soc. Sci. Commun. 2023, 10, 1–14.
  12. Chevalier, J.A.; Mayzlin, D. The effect of word of mouth on sales: Online book reviews. J. Mark. Res. 2006, 43, 345–354.
  13. Liao, H.H.; Liu, S.C.S.; Pi, S.M.; Hu, D.C.H. Usability evaluation of an online role-playing game for teaching business ethics. J. Educ. Technol. Soc. 2013, 16, 14–26.
  14. Baek, H.; Ahn, J.; Choi, Y.K. The role of online product reviews in purchasing decisions. J. Electron. Commer. Res. 2012, 13, 289.
  15. Mezei, J.; Davoodi, L.; Nikou, S. Customer Review Analysis of Online E-commerce Platforms—A Configurational Approach. In Proceedings of the 57th Hawaii International Conference on System Sciences, Honolulu, HI, USA, 3–6 January 2024; pp. 1476–1485.
  16. Chen, Y.; Xie, J. Extracting diverse sentiment expressions with target-dependent polarity from reviews. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Stroudsburg, PA, USA, 2011; pp. 198–207.
  17. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52.
  18. Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. LLMs in e-commerce: A comparative analysis of GPT and LLaMA models in product review evaluation. Nat. Lang. Process. J. 2024, 6, 100056.
  19. Ni, P.; Li, Y.; Chang, V. Recommendation and sentiment analysis based on consumer review and rating. In Research Anthology on Implementing Sentiment Analysis Across Multiple Disciplines; IGI Global: Hershey, PA, USA, 2022; pp. 1633–1649.
  20. Jindal, N.; Liu, B. Opinion spam and analysis. In Proceedings of the 2008 International Conference on Web Search and Data Mining, Palo Alto, CA, USA, 11–12 February 2008; ACM: New York, NY, USA, 2008; pp. 219–230.
  21. Ott, M.; Choi, Y.; Cardie, C.; Hancock, J.T. Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA, 19–24 June 2011; pp. 309–319.
  22. Lim, E.P.; Nguyen, V.A.; Jindal, N.; Liu, B.; Lauw, H.W. Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, 26–30 October 2010; pp. 939–948.
  23. Ghosh, S.; Surachawala, T.; Lerman, K. Understanding and combating link farming in the Twitter social network. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; ACM: New York, NY, USA, 2012; pp. 61–70.
  24. Lu, X.; Moser, R. Excellent versus satisfied: What is the difference? J. Econ. Psychol. 2010, 31, 533–545.
  25. Lazić, A.; Milić, S.; Vukmirović, D. The Future of Electronic Commerce in the IoT Environment. J. Theor. Appl. Electron. Commer. Res. 2024, 19, 172–187.
  26. Loukili, M.; Messaoudi, F.; El Ghazi, M. Sentiment analysis of product reviews for e-commerce recommendation based on machine learning. Int. J. Adv. Soft Comput. Its Appl. 2023, 15, 1–13.
  27. Xiong, W.; Zuo, Y.; Zhang, M.; Zhang, C.; Guo, C. Research on Sentiment Analysis of E-commerce Live Comments based on Text Mining. Front. Comput. Intell. Syst. 2023, 6, 34–36.
  28. Huang, H.; Zavareh, A.A.; Mustafa, M.B. Sentiment analysis in e-commerce platforms: A review of current techniques and future directions. IEEE Access 2023, 11, 90367–90382.
  29. Liu, B. Sentiment Analysis and Opinion Mining; Morgan & Claypool Publishers: San Rafael, CA, USA, 2012.
  30. Pang, B.; Lee, L. Opinion Mining and Sentiment Analysis; Now Foundations and Trends: Delft, The Netherlands, 2008; Volume 2, pp. 1–135.
  31. Cambria, E.; Schuller, B.; Xia, Y.; Havasi, C. Affective computing and sentiment analysis. IEEE Intell. Syst. 2017, 31, 102–107.
  32. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
  33. Yao, L.; Xu, H.; Liu, Y. Recent advances in sentiment analysis: A review of research trends, approaches, and applications. IEEE Trans. Affect. Comput. 2019, 6, 100059.
  34. Zhang, L.; Wang, S.; Liu, B. Deep learning for sentiment analysis: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1253.
  35. Yang, Y.; Li, Y.; Jiang, H.; Shen, J. A survey on sentiment analysis and opinion mining for social multimedia: Methodologies, applications, and challenges. IEEE Access 2020, 8, 211640–211660.
  36. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  37. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2019.
  38. Hossain, M.S.; Rahman, M.F. Customer sentiment analysis and prediction of insurance products’ reviews using machine learning approaches. FIIB Bus. Rev. 2023, 12, 386–402.
  39. Kaur, G.; Sharma, A. A deep learning-based model using hybrid feature extraction approach for consumer sentiment analysis. J. Big Data 2023, 10, 5.
  40. Dieksona, Z.A.; Prakosoa, M.R.B.; Qalby, M.S.; Putraa, M.; Achmada, S.; Sutoyoa, R. Sentiment analysis for customer review: Case study of Traveloka. Procedia Comput. Sci. 2023, 216, 682–690.
  41. Patel, A.; Oza, P.; Agrawal, S. Sentiment analysis of customer feedback and reviews for airline services using language representation model. Procedia Comput. Sci. 2023, 218, 2459–2467.
  42. Wang, Z.; Zhu, Y.; He, S.; Yan, H.; Zhu, Z. LLM for sentiment analysis in e-commerce: A deep dive into customer feedback. Appl. Sci. Eng. J. Adv. Res. 2024, 3, 8–13.
  43. Suhartono, D.; Purwandari, K.; Jeremy, N.H.; Philip, S.; Arisaputra, P.; Parmonangan, I.H. Deep neural networks and weighted word embeddings for sentiment analysis of drug product reviews. Procedia Comput. Sci. 2023, 216, 664–671.
  44. Puh, K.; Bagić Babac, M. Predicting sentiment and rating of tourist reviews using machine learning. J. Hosp. Tour. Insights 2023, 6, 1188–1204.
  45. Taherdoost, H.; Madanchian, M. Artificial intelligence and sentiment analysis: A review in competitive research. Computers 2023, 12, 37.
  46. Vatambeti, R.; Mantena, S.V.; Kiran, K.; Manohar, M.; Manjunath, C. Twitter sentiment analysis on online food services based on elephant herd optimization with hybrid deep learning technique. Clust. Comput. 2024, 27, 655–671.
  47. Iqbal, A.; Amin, R.; Iqbal, J.; Alroobaea, R.; Binmahfoudh, A.; Hussain, M. Sentiment analysis of consumer reviews using deep learning. Sustainability 2022, 14, 10844.
  48. Adak, A.; Pradhan, B.; Shukla, N. Sentiment analysis of customer reviews of food delivery services using deep learning and explainable artificial intelligence: Systematic review. Foods 2022, 11, 1500.
  49. Alantari, H.J.; Currim, I.S.; Deng, Y.; Singh, S. An empirical comparison of machine learning methods for text-based sentiment analysis of online consumer reviews. Int. J. Res. Mark. 2022, 39, 1–19.
  50. Marlina, D.; Tri Basuki, K.; Mohd Zaki, Z.; Siti Farahnasihah, A. Sentiment Analysis on Natural Skincare Products. J. Data Sci. 2022, 2022, 1–17.
  51. Alzahrani, M.E.; Aldhyani, T.H.; Alsubari, S.N.; Althobaiti, M.M.; Fahad, A. Developing an Intelligent System with Deep Learning Algorithms for Sentiment Analysis of E-Commerce Product Reviews. Comput. Intell. Neurosci. 2022, 2022, 3840071.
  52. Obiedat, R.; Qaddoura, R.; Ala’M, A.Z.; Al-Qaisi, L.; Harfoushi, O.; Alrefai, M.; Faris, H. Sentiment analysis of customers’ reviews using a hybrid evolutionary SVM-based approach in an imbalanced data distribution. IEEE Access 2022, 10, 22260–22273.
  53. Umer, M.; Imtiaz, Z.; Ahmad, M.; Nappi, M.; Medaglia, C.; Choi, G.S.; Mehmood, A. Impact of convolutional neural network and FastText embedding on text classification. Multimed. Tools Appl. 2023, 82, 5569–5585.
  54. Soni, S.; Chouhan, S.S.; Rathore, S.S. TextConvoNet: A convolutional neural network based architecture for text classification. Appl. Intell. 2023, 53, 14249–14268.
  55. Magalhães, D.; Lima, R.H.; Pozo, A. Creating deep neural networks for text classification tasks using grammar genetic programming. Appl. Soft Comput. 2023, 135, 110009.
  56. Barik, K.; Misra, S.; Ray, A.K.; Bokolo, A. LSTM-DGWO-Based Sentiment Analysis Framework for Analyzing Online Customer Reviews. Comput. Intell. Neurosci. 2023, 2023, 6348831.
  57. Qayyum, H.; Ali, F.; Nawaz, M.; Nazir, T. FRD-LSTM: A novel technique for fake reviews detection using DCWR with the Bi-LSTM method. Multimed. Tools Appl. 2023, 82, 31505–31519.
  58. Sahoo, C.; Wankhade, M.; Singh, B.K. Sentiment analysis using deep learning techniques: A comprehensive review. Int. J. Multimed. Inf. Retr. 2023, 12, 41.
  59. Al-Qerem, A.; Raja, M.; Taqatqa, S.; Sara, M.R.A. Utilizing Deep Learning Models (RNN, LSTM, CNN-LSTM, and Bi-LSTM) for Arabic Text Classification. In Artificial Intelligence-Augmented Digital Twins: Transforming Industrial Operations for Innovation and Sustainability; Springer: Berlin/Heidelberg, Germany, 2024; pp. 287–301.
  60. Sendhilkumar, S. Developing a conceptual framework for short text categorization using hybrid CNN-LSTM based Caledonian crow optimization. Expert Syst. Appl. 2023, 212, 118517.
  61. Hasib, K.M.; Azam, S.; Karim, A.; Al Marouf, A.; Shamrat, F.J.M.; Montaha, S.; Yeo, K.C.; Jonkman, M.; Alhajj, R.; Rokne, J.G. MCNN-LSTM: Combining CNN and LSTM to classify multi-class text in imbalanced news data. IEEE Access 2023, 11, 93048–93063.
  62. Rodríguez-Ibánez, M.; Casánez-Ventura, A.; Castejón-Mateos, F.; Cuenca-Jiménez, P.M. A review on sentiment analysis from social media platforms. Expert Syst. Appl. 2023, 223, 119862.
  63. Sorour, S.E.; Mine, T.; Goda, K.; Hirokawa, S. A predictive model to evaluate student performance. J. Inf. Process. 2015, 23, 192–201.
  64. Sorour, S.E.; Mine, T.; Goda, K.; Hirokawa, S. Predicting students’ grades based on free style comments data by artificial neural network. In Proceedings of the 2014 IEEE Frontiers in Education Conference (FIE) Proceedings, Madrid, Spain, 22–25 October 2014; pp. 1–9.
  65. Amigó, E.; Gonzalo, J.; Artiles, J.; Verdejo, F. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 2009, 12, 461–486.
  66. De Medeiros, A.K.A.; Guzzo, A.; Greco, G.; Van der Aalst, W.M.; Weijters, A.; Van Dongen, B.F.; Saccà, D. Process mining based on clustering: A quest for precision. In Proceedings of the Business Process Management Workshops: BPM 2007 International Workshops, BPI, BPD, CBP, ProHealth, RefMod, Semantics4ws, Brisbane, Australia, 24 September 2007; Revised Selected Papers 5; Springer: Berlin/Heidelberg, Germany, 2008; pp. 17–29.
  67. Bhuvaneshwari, P.; Rao, A.N.; Robinson, Y.H.; Thippeswamy, M. Sentiment analysis for user reviews using Bi-LSTM self-attention based CNN model. Multimed. Tools Appl. 2022, 81, 12405–12419.
  68. Zhang, C.; Lin, D.; Cao, D.; Li, S. Grammar guided embedding based Chinese long text sentiment classification. Concurr. Comput. Pract. Exp. 2021, 33, e6439.
  69. Bhaskar, S.A.K.; Sharma, H.S. Fake reviews classification using deep learning. In Proceedings of the International Multi-disciplinary Conference in Emerging Research Trends (IMCERT), Karachi, Pakistan, 4–5 January 2023; Volume 1, pp. 1–8.
  70. Zou, H.; Wang, Z. A semi-supervised short text sentiment classification method based on improved BERT model from unlabelled data. J. Big Data 2023, 10, 35.
  71. Saumya, S.; Singh, J.P.; Kumar, A. A machine learning model for review rating inconsistency in e-commerce websites. In Proceedings of the Data Management, Analytics and Innovation: Proceedings of ICDMAI 2020; Springer: Berlin/Heidelberg, Germany, 2021; Volume 1, pp. 221–230.
  72. Zhao, W.; Guan, Z.; Chen, L.; He, X.; Cai, D.; Wang, B.; Wang, Q. Weakly-supervised deep embedding for product review sentiment analysis. IEEE Trans. Knowl. Data Eng. 2017, 30, 185–197.
Figure 1. The proposed hybrid deep learning framework for sentiment classification.
Figure 2. The proposed CNN model.
Figure 3. The proposed LSTM model.
Figure 4. The proposed WDE-LSTM model.
Figure 5. The proposed WDE-CNN-LSTM model.
Figure 6. Confusion matrices of the CNN model: (a) two-class (Score 2); (b) three-class (Score 3); (c) five-class (Score 5).
Figure 7. Confusion matrices of the LSTM model: (a) two-class (Score 2); (b) three-class (Score 3); (c) five-class (Score 5).
Figure 8. Confusion matrices of the WDE-LSTM model: (a) two-class (Score 2); (b) three-class (Score 3); (c) five-class (Score 5).
Figure 9. Confusion matrices of the WDE-CNN-LSTM model: (a) two-class (Score 2); (b) three-class (Score 3); (c) five-class (Score 5).
Figure 10. Training curves of the WDE-CNN-LSTM model for Score 2: (a) training and validation accuracy; (b) training and validation loss.
Figure 11. Training curves of the WDE-CNN-LSTM model for Score 3: (a) training and validation accuracy; (b) training and validation loss.
Figure 12. Training curves of the WDE-CNN-LSTM model for Score 5: (a) training and validation accuracy; (b) training and validation loss.
Table 1. Comparative analysis of sentiment analysis studies.
| Author | Methods | Dataset | Insights | Limitations |
| --- | --- | --- | --- | --- |
| Hossain et al. (2023) [38] | SVM, NB, Decision Trees, RF | Insurance product reviews | Effective for large datasets; high predictive accuracy | Class imbalance; difficulty with complex sentiments (e.g., sarcasm) |
| Kaur et al. (2023) [39] | WDE, LSTM, CNN | E-commerce reviews | High accuracy across multiple SCTs | High computational complexity; requires extensive training datasets |
| Dieksona et al. (2023) [40] | SVM, NB, RF | Traveloka reviews | RF achieved highest accuracy; effective for binary classification | Limited diversity; struggles with neutral sentiments |
| Huang et al. (2023) [28] | ML, DL methods | E-commerce platforms | Insights into customer behavior; valuable for decision-making | High computational costs; issues with nuanced sentiments |
| Patel et al. (2023) [41] | BERT | Airline service reviews | Effective in capturing complex sentiments; context-aware predictions | Requires large datasets; high computational resources |
| Wang et al. (2024) [42] | GPT-3, BERT | E-commerce reviews | Excels in nuanced sentiment detection | Resource-intensive; limited multimodal integration |
| Suhartono et al. (2023) [43] | DNN, Word2Vec, GloVe | Pharmaceutical reviews | Captures nuanced sentiments; enhanced classification performance | High computational resources; requires large datasets |
| Puh and Bagić Babac (2023) [44] | SVM, RF, Gradient Boosting | Tourist reviews | Accurate classification; rating prediction | Limited generalization; issues with mixed/neutral sentiments |
| Alzahrani et al. (2022) [51] | CNN-LSTM, WDE, Dropout | Amazon reviews | High accuracy (96%); robust classification | Dataset scope limits generalizability |
| Obiedat et al. (2022) [52] | Hybrid SVM with Evolutionary Techniques | E-commerce reviews | Handles data imbalance effectively; improved recall and precision | Computational complexity; requires fine-tuning |
Table 2. Description of the dataset fields.
| Item | Description |
| --- | --- |
| ID | Unique identifier for each review. |
| Product Id | Identifier for the product being reviewed. |
| User Id | Unique identifier for the user who provided the review. |
| Profile Name | The profile name of the user who provided the review. |
| HelpfulnessNumerator | The number of users who found the review helpful. |
| HelpfulnessDenominator | The number of users who voted on the review’s helpfulness. |
| Score | The rating given by the user, typically on a scale from 1 to 5, where: 1 = Very Dissatisfied (52,268 reviews); 2 = Dissatisfied (29,769 reviews); 3 = Neutral (42,640 reviews); 4 = Satisfied (80,655 reviews); 5 = Very Satisfied (363,122 reviews). |
| Time | The timestamp of the review, represented as seconds since epoch (1 January 1970 00:00:00 UTC). |
| Summary | A summary or headline of the review provided by the user. |
| Text | The full review text, where the user describes their experience with the product. |
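The per-score review counts in Table 2 show how unevenly the ratings are distributed, which helps when reading the per-class metrics reported later. The following minimal Python sketch only re-uses the counts listed in Table 2; the variable names are illustrative and are not taken from the authors' code.

```python
# Review counts per star rating, copied from Table 2.
score_counts = {1: 52_268, 2: 29_769, 3: 42_640, 4: 80_655, 5: 363_122}

total = sum(score_counts.values())

# Share of each rating in the whole dataset.
shares = {score: count / total for score, count in score_counts.items()}

# Ratio between the most and the least frequent rating.
imbalance_ratio = max(score_counts.values()) / min(score_counts.values())

for score, share in shares.items():
    print(f"score {score}: {score_counts[score]:>7,} reviews ({share:.1%})")
print(f"total reviews: {total:,}")
print(f"imbalance ratio (largest / smallest class): {imbalance_ratio:.1f}")
```

The counts sum to 568,454 reviews, and the 5-star class alone accounts for well over half of them, so macro-averaged metrics are generally more informative than accuracy alone for the minority ratings.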
Table 3. Parameter settings for the proposed models.
| Model | Epochs | Batch size | Total parameters | Trainable parameters | Non-trainable parameters | Optimizer | Activation functions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | Automatically set | 64 | 25,432,694 | 25,432,694 | 0 | Adam | ReLU (Conv Layers), Softmax |
| LSTM | Automatically set | 64 | 25,686,390 | 25,686,390 | 0 | Adam | ReLU (Dense Layer), Softmax |
| WDE-LSTM | Automatically set | 128 | 25,343,478 | 25,343,278 | 200 | Adam | ReLU (Dense Layer), Softmax |
| WDE-CNN-LSTM | Automatically set | 128 | 25,248,602 | 25,248,402 | 200 | Adam | ReLU (Conv Layers and Dense Layer), Softmax (Final Layer) |
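Table 4. Evaluation metrics of the CNN model.
| Score | Class | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| 2 | 0 | 0.97 | 1.00 | 0.98 |
| 2 | 1 | 1.00 | 0.97 | 0.98 |
| 2 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Accuracy | 0.98 | | |
| 3 | 0 | 0.98 | 0.98 | 0.98 |
| 3 | 1 | 0.97 | 0.99 | 0.98 |
| 3 | 2 | 0.99 | 0.96 | 0.98 |
| 3 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Accuracy | 0.98 | | |
| 5 | 0 | 0.97 | 0.98 | 0.98 |
| 5 | 1 | 0.98 | 0.97 | 0.98 |
| 5 | 2 | 0.94 | 0.97 | 0.96 |
| 5 | 3 | 0.87 | 0.89 | 0.88 |
| 5 | 4 | 0.91 | 0.85 | 0.88 |
| 5 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 5 | Macro Avg. | 0.93 | 0.93 | 0.93 |
| 5 | Weighted Avg. | 0.93 | 0.93 | 0.93 |
| 5 | Accuracy | 0.93 | | |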
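Table 5. Evaluation metrics of the LSTM model.
| Score | Class | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| 2 | 0 | 0.98 | 0.99 | 0.98 |
| 2 | 1 | 0.99 | 0.97 | 0.98 |
| 2 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Accuracy | 0.98 | | |
| 3 | 0 | 0.98 | 0.99 | 0.99 |
| 3 | 1 | 0.97 | 0.99 | 0.98 |
| 3 | 2 | 0.99 | 0.96 | 0.98 |
| 3 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Accuracy | 0.98 | | |
| 5 | 0 | 0.97 | 0.98 | 0.98 |
| 5 | 1 | 0.98 | 0.98 | 0.98 |
| 5 | 2 | 0.95 | 0.97 | 0.96 |
| 5 | 3 | 0.87 | 0.89 | 0.88 |
| 5 | 4 | 0.91 | 0.85 | 0.88 |
| 5 | Micro Avg. | 0.94 | 0.94 | 0.93 |
| 5 | Macro Avg. | 0.94 | 0.94 | 0.93 |
| 5 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 5 | Accuracy | 0.94 | | |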
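Table 6. Evaluation metrics for the proposed WDE-LSTM model.
| Score | Class | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| 2 | 0 | 0.97 | 1.00 | 0.98 |
| 2 | 1 | 0.99 | 0.97 | 0.98 |
| 2 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Accuracy | 0.98 | | |
| 3 | 0 | 0.98 | 0.99 | 0.99 |
| 3 | 1 | 0.98 | 0.99 | 0.98 |
| 3 | 2 | 0.98 | 0.97 | 0.97 |
| 3 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Accuracy | 0.98 | | |
| 5 | 0 | 0.98 | 0.97 | 0.98 |
| 5 | 1 | 0.97 | 0.99 | 0.98 |
| 5 | 2 | 0.96 | 0.97 | 0.96 |
| 5 | 3 | 0.89 | 0.86 | 0.88 |
| 5 | 4 | 0.88 | 0.89 | 0.88 |
| 5 | Micro Avg. | 0.94 | 0.94 | 0.94 |
| 5 | Macro Avg. | 0.94 | 0.94 | 0.94 |
| 5 | Weighted Avg. | 0.94 | 0.94 | 0.94 |
| 5 | Accuracy | 0.94 | | |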
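Table 7. Evaluation metrics for the proposed WDE-CNN-LSTM model.
| Score | Class | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- |
| 2 | 0 | 0.97 | 1.00 | 0.99 |
| 2 | 1 | 1.00 | 0.97 | 0.98 |
| 2 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 2 | Accuracy | 0.98 | | |
| 3 | 0 | 0.98 | 0.99 | 0.99 |
| 3 | 1 | 0.98 | 0.99 | 0.98 |
| 3 | 2 | 0.98 | 0.97 | 0.98 |
| 3 | Micro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Macro Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Weighted Avg. | 0.98 | 0.98 | 0.98 |
| 3 | Accuracy | 0.98 | | |
| 5 | 0 | 0.98 | 0.98 | 0.98 |
| 5 | 1 | 0.98 | 0.99 | 0.98 |
| 5 | 2 | 0.97 | 0.98 | 0.98 |
| 5 | 3 | 0.90 | 0.93 | 0.91 |
| 5 | 4 | 0.93 | 0.88 | 0.90 |
| 5 | Micro Avg. | 0.95 | 0.95 | 0.95 |
| 5 | Macro Avg. | 0.95 | 0.95 | 0.95 |
| 5 | Weighted Avg. | 0.95 | 0.95 | 0.95 |
| 5 | Accuracy | 0.95 | | |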
Table 8. Comparison of evaluation metrics by scores and models.
| Score | Model | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
| --- | --- | --- | --- | --- | --- |
| 2 | CNN | 98.22 | 98.19 | 98.19 | 91.19 |
| 2 | LSTM | 98.13 | 98.11 | 98.12 | 98.11 |
| 2 | WDE-LSTM | 98.21 | 98.18 | 98.18 | 98.18 |
| 2 | WDE-CNN-LSTM | 98.52 | 98.49 | 98.49 | 98.00 |
| 3 | CNN | 97.95 | 97.35 | 97.93 | 97.94 |
| 3 | LSTM | 98.08 | 98.04 | 98.04 | 98.04 |
| 3 | WDE-LSTM | 98.04 | 98.04 | 98.04 | 98.04 |
| 3 | WDE-CNN-LSTM | 98.26 | 98.26 | 98.26 | 98.26 |
| 5 | CNN | 93.33 | 93.33 | 93.32 | 93.35 |
| 5 | LSTM | 93.51 | 93.53 | 93.50 | 93.53 |
| 5 | WDE-LSTM | 93.50 | 93.48 | 93.48 | 93.55 |
| 5 | WDE-CNN-LSTM | 95.21 | 95.21 | 95.20 | 95.21 |
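Tables 4 to 7 report micro, macro, and weighted averages alongside per-class precision, recall, and F1-score. As a reading aid, the sketch below shows one common way these quantities are derived from a confusion matrix. It is not the authors' evaluation code: the function names and the toy matrix are hypothetical, and the numbers are made up for illustration rather than taken from Figures 6 to 9.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1, support) per class.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    rows = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, sum(cm[k])))
    return rows


def averages(cm):
    """Accuracy plus micro, macro, and support-weighted averages."""
    rows = per_class_metrics(cm)
    n = len(rows)
    total = sum(r[3] for r in rows)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    # Macro: unweighted mean over classes.
    macro = tuple(sum(r[i] for r in rows) / n for i in range(3))
    # Weighted: mean over classes, weighted by class support.
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total for i in range(3))
    # Micro: in single-label classification every error is one FP and one FN,
    # so micro precision, recall, and F1 all equal accuracy.
    micro = (accuracy, accuracy, accuracy)
    return {"accuracy": accuracy, "micro": micro, "macro": macro, "weighted": weighted}


# Hypothetical 3-class confusion matrix (illustrative values only).
toy_cm = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 180],
]
summary = averages(toy_cm)
for name in ("micro", "macro", "weighted"):
    p, r, f = summary[name]
    print(f"{name:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy: {summary['accuracy']:.3f}")
```

Two identities follow from these definitions and can serve as a sanity check when reading such tables: for single-label classification the micro average equals accuracy, and the support-weighted recall also equals accuracy.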
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sorour, S.E.; Alojail, A.; El-Shora, A.; Amin, A.E.; Abohany, A.A. A Hybrid Deep Learning Approach for Enhanced Sentiment Classification and Consistency Analysis in Customer Reviews. Mathematics 2024, 12, 3856. https://doi.org/10.3390/math12233856

AMA Style

Sorour SE, Alojail A, El-Shora A, Amin AE, Abohany AA. A Hybrid Deep Learning Approach for Enhanced Sentiment Classification and Consistency Analysis in Customer Reviews. Mathematics. 2024; 12(23):3856. https://doi.org/10.3390/math12233856

Chicago/Turabian Style

Sorour, Shaymaa E., Abdulrahman Alojail, Amr El-Shora, Ahmed E. Amin, and Amr A. Abohany. 2024. "A Hybrid Deep Learning Approach for Enhanced Sentiment Classification and Consistency Analysis in Customer Reviews" Mathematics 12, no. 23: 3856. https://doi.org/10.3390/math12233856

APA Style

Sorour, S. E., Alojail, A., El-Shora, A., Amin, A. E., & Abohany, A. A. (2024). A Hybrid Deep Learning Approach for Enhanced Sentiment Classification and Consistency Analysis in Customer Reviews. Mathematics, 12(23), 3856. https://doi.org/10.3390/math12233856
