1. Introduction
Because high-quality yet easy-to-use image-editing tools are widely available, digital photographs are frequently modified. Image forensics is needed to determine an image’s origin, processing history, and authenticity. There are numerous ways [1,2,3] to identify the device that captured an image. Since a fake image is typically constructed from two or more photographs, a mismatch between sources helps reveal image forgery. Most fake images also undergo several processing operations to appear genuine, so detecting operations such as median filtering [4,5], sharpening [6,7], and resizing [8,9,10] makes forged images easier to find. Various universal methods [11,12,13,14,15,16,17] can also identify several image processing operations at once. Some schemes [18,19,20] are designed to detect image forgery rather than a specific operator. Schemes [18,19] also trace external objects inserted into the fake image, and method [20] is suitable for both splicing and copy-move forgery detection.
Nonetheless, general-purpose techniques accurately identify only a single operation applied to an image, whereas in practice an image is often subjected to multiple operations. In this paper, a sequence of operations is identified precisely in order to reconstruct the image’s processing timeline. Only a few methods [21,22,23,24,25,26] can identify both the operations and their order. JPEG, the most widely used image format, plays a crucial role in forensic investigation, and although the effect varies with the operation, the performance of recent approaches degrades when JPEG compression is involved.
Convolutional neural networks (CNNs) have shown favorable results in many applications in the current deep learning era. Deep networks have been used to detect resizing, brightness changes, median filtering, general image manipulation, multiple JPEG compression, image forgery, and more. A deep network was first introduced in [11] to detect additive white noise, scaling, and median and Gaussian filtering. Notably, the first layer of this network is constrained. Experimental results are reported for 227 × 227 images, but no results are given for JPEG-compressed or small images. The constraint concept was further extended with an improved CNN model [12]. In this enhanced network, a constrained convolutional layer is followed by four groups of layers, each consisting of a convolutional layer, a batch normalization layer, a rectified linear unit layer, and a pooling layer. Multiple SoftMax classification layers are additionally employed, and a randomized tree classifier is used to categorize the results. The filters of the constrained convolutional layer learn prediction errors: each filter predicts the center pixel of its window from the neighboring pixels and subtracts the actual center value. The constraint is enforced after every training iteration. When two operations are applied to an image in succession, the performance of the CNN model generally suffers, even on high-resolution images. Boroumand and Fridrich [13] proposed a deep network combined with a multilayer perceptron to identify four operations: denoising, tone modification, low-pass filtering, and high-pass filtering. Their CNN has eight convolutional layers; moments are computed at the final stage of the network and used by the multilayer perceptron to classify the images. The approach is compared with hand-crafted feature extraction, and the experiments cover only 512 × 512 images. Li et al. [14] computed the out-of-bag error to select a few sub-models of the spatial rich model (SRM), a selection procedure that can significantly reduce the feature dimension. Their results cover eleven image operations, including spatial filtering, image enhancement, and JPEG compression, and the method is also reported to identify four anti-forensic operations covering median filtering, resampling, contrast enhancement, and JPEG compression. However, its performance suffers for small images. Singhal et al. [15] introduced a deep network with two convolutional layers to detect seven types of operations. The network takes the discrete cosine transform (DCT) coefficients of the median filter residual as its input array, and its convolutional layers use large kernels. Xue et al. [16] used a Siamese network, built on ResNet-18 and AlexNet, to identify manipulations such as adding text, a logo, or a black block to an image, as well as operators such as gamma correction, Gaussian noise, and resampling; their experiments consider only uncompressed images. Barni et al. [17] detected operations such as median filtering, image scaling, and histogram equalization. In their method, two neural networks [11,27] extract the features, robust features are chosen from the CNN outputs with a random feature selection strategy, and a support vector machine classifier then determines the type of attack.
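The constraint on the first layer of [11,12] can be illustrated as a projection applied to each filter after a weight update. This minimal sketch assumes the commonly described form of the constraint (center weight fixed to -1, remaining weights rescaled to sum to 1), so that the filter outputs a prediction error; the helper names `project_constrained` and `prediction_error` are hypothetical, and kernels are plain nested lists rather than framework tensors.

```python
def project_constrained(kernel):
    """Project an odd-sized square kernel (nested lists) onto the assumed
    constraint set: center weight = -1, off-center weights sum to 1."""
    k = len(kernel)
    c = k // 2
    # Sum of every weight except the center one
    off_center = sum(kernel[i][j] for i in range(k) for j in range(k) if (i, j) != (c, c))
    if off_center == 0:
        raise ValueError("off-center weights sum to zero; cannot rescale")
    projected = [[kernel[i][j] / off_center for j in range(k)] for i in range(k)]
    projected[c][c] = -1.0
    return projected


def prediction_error(image, kernel, row, col):
    """Correlate the kernel with the window centered at (row, col).
    For a constrained kernel this equals (weighted prediction of the center
    pixel from its neighbors) minus (the actual center pixel)."""
    k = len(kernel)
    c = k // 2
    return sum(kernel[i][j] * image[row - c + i][col - c + j]
               for i in range(k) for j in range(k))


# Example: an arbitrary kernel projected onto the constraint set
kernel = project_constrained([[1, 2, 1], [2, 9, 2], [1, 2, 1]])
flat = [[5.0] * 5 for _ in range(5)]  # constant, content-free region
print(kernel[1][1], prediction_error(flat, kernel, 2, 2))
```

Because all weights of a constrained kernel sum to zero, its response on a constant region is (numerically) zero: the layer suppresses image content and passes on residual traces, which is where operation fingerprints live. In a training loop, the projection would be applied to every first-layer filter after each update.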
Detecting the order of image operations is a crucial concern when comprehensively analyzing the processing history of an image. Various efforts [21,22,23,24,25,26] have been made to determine the sequence of operations applied to an image. These works aim to develop techniques that accurately and automatically identify the order in which processing operations were applied. Knowing this order gives researchers and practitioners valuable insight into the transformations and manipulations an image has undergone, which is particularly important in fields such as forensics, image analysis, and restoration.
These prior works [21,22,23,24,25,26] have contributed to the ongoing effort to solve this challenging problem, and they form the foundation on which further advances in operation order detection can build. An information-theoretic framework based on mutual information is proposed in [21,22] to analyze why the order of an operator series may go undetected; some operator series cannot be recognized by the algorithm at all, and the approach fails on JPEG-compressed images. Comesaña [23] discussed the theoretical feasibility of operator order detection. Bayar and Stamm [24] proposed a deep network containing a constrained convolutional layer to estimate the order of operations. Liao et al. [25] presented a dual-stream deep network to identify operators and their order. The approach is claimed to detect operators with unknown parameters by using transfer learning, although operator-specific preprocessing is required. Cho et al. [26] proposed a method for detecting operators and their order that does not require tailored preprocessing, although its detection performance could be improved by modifying the deep network.
In this paper, a scheme with improved performance on two-operator chain detection is proposed. The specific contributions of the proposed network are as follows:
The proposed scheme detects images processed by a two-operator chain and identifies the order of the operations. The operations detected include Gaussian blurring, median filtering, unsharp masking, and image upscaling;
Transposed convolutional layers are used instead of convolutional layers to reduce the classification error, and the proposed scheme is suitable for challenging scenarios such as small images;
The bottleneck strategy reduces the number of training parameters, which allows more transposed convolutional layers to be inserted into the convolutional neural network;
Pooling layers are not placed between the convolutional layers, so that as much statistical information as possible is preserved. Pooling could reduce the computational cost, but at the expense of the relevant inherited operation fingerprints;
Information fusion is applied to the features of networks trained with multiple optimizers, which substantially improves performance;
Without operator-specific preprocessing, the proposed method improves performance in demanding situations involving low-resolution compressed images and two-operator series manipulation.
The rest of the paper is organized as follows. Section 2 formulates the problem of two-operator manipulation detection in various contexts. Section 3 explains the proposed scheme. Section 4 presents a comprehensive experimental analysis along with a comparative analysis. Section 5 concludes the paper by emphasizing the benefits and limitations of the proposed scheme.
3. The Proposed Scheme
Deep networks have proven valuable in solving various problems, including image categorization, fake face identification, and image forgery detection. This paper addresses the detection of one and two operators in processed images with a resilient deep architecture and information fusion, and the proposed scheme performs well in both compressed and uncompressed scenarios. Unlike some earlier methods [15,25], the proposed CNN design does not require any preprocessing layer. Those methods needed operator-specific preprocessing, which is not practical and limits the network’s performance to specific operators. When two operations are performed in succession, the second operator may weaken the artifacts left by the first. To examine how operator pairs behave on images from the BOSSBase [28] database, several pairs of operators are considered, as shown in Figure 3. Four operators are taken into account: Gaussian blurring (GAB_1.0), median filtering with a 5 × 5 filter (MDF_5 × 5), unsharp masking (USM_3.0), and upscaling (USL_1.5). To visualize the behavior of the five classes of images, covariance plots of the image entropy are shown for uncompressed and compressed images in two scenarios. A covariance plot shows the power spectral density (power/frequency) of a discrete-time signal, here the sequence of entropy values, estimated with the covariance method: an autoregressive model is fitted to the signal, and the power spectral density is derived from that model. The power spectral density of each column is computed separately. The first row uses uncompressed images, as discussed in Section 2.1, and the second row uses JPEG-compressed images with QF1 = 75 and QF2 = 85, as discussed in Section 2.2. For operators u = GAB_1.0 and v = MDF_5 × 5, the curves lie closer together and overlap more for compressed images (first plot of the second row) than for uncompressed images (first plot of the first row). This is also reflected in the experimental analysis: the classification error is higher for compressed images than for uncompressed images. Operators u = USM_3.0 and v = USL_1.5 behave similarly. Therefore, a common solution can be proposed for both cases.
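The covariance-method estimate described above can be sketched in plain Python. This is an illustration under stated assumptions: the AR model order p is not given in the text, each image’s entropy is assumed to be the Shannon entropy of its grey-level histogram, and the PSD is returned up to whatever normalization constant a given toolbox (for example MATLAB’s `pcov`) applies. All function names are hypothetical.

```python
import cmath
import math
import random


def shannon_entropy(pixels):
    """Shannon entropy (bits) of the grey-level histogram of a flat pixel list."""
    counts = {}
    for value in pixels:
        counts[value] = counts.get(value, 0) + 1
    total = len(pixels)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_covariance(x, p):
    """Covariance-method AR(p) fit: least-squares forward prediction over
    n = p..N-1 with no windowing. Returns (a_1..a_p, noise variance) for
    A(z) = 1 + sum_k a_k z^-k. The order p is an assumption (not in the text)."""
    idx = range(p, len(x))
    lags = range(1, p + 1)
    # Normal equations: sum_n x[n-i] x[n-k] a_k = -sum_n x[n] x[n-i]
    r_mat = [[sum(x[n - i] * x[n - k] for n in idx) for k in lags] for i in lags]
    r_vec = [-sum(x[n] * x[n - i] for n in idx) for i in lags]
    a = solve(r_mat, r_vec)
    err = sum((x[n] + sum(a[k - 1] * x[n - k] for k in lags)) ** 2 for n in idx)
    return a, err / (len(x) - p)


def ar_psd(a, var, n_freq=129):
    """AR power spectral density var / |A(e^{jw})|^2 at n_freq points in [0, pi]."""
    psd = []
    for i in range(n_freq):
        w = math.pi * i / (n_freq - 1)
        a_w = 1 + sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a, start=1))
        psd.append(var / abs(a_w) ** 2)
    return psd


# Example: a synthetic AR(1) series stands in for one column of entropy values
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
a, var = ar_covariance(x, p=1)
psd = ar_psd(a, var)
```

In the setting of Figure 3, `x` would be the sequence of entropy values of one image class (one column), and one PSD curve would be plotted per class; overlapping curves then indicate classes that are hard to separate.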
The proposed architecture is designed to be robust to the operator series problem: the CNN uses a number of layers and kernels to separate the different classes of images. Figure 4 shows the framework of the proposed CNN. The feature map produced by a transposed convolution has a higher spatial dimensionality than its input feature map, whereas a standard convolution reduces the input dimension by sliding convolutional kernels over it. By flattening the input and output, the convolution can be written as $Z = MX + S$, where $X$ is the flattened input, $Z$ is the flattened output, $M$ is the convolution matrix, and $S$ is the bias vector; $M$ and $S$ are obtained from the layer’s weights and biases.
The transposed convolution, on the other hand, expands the input using sliding convolutional kernels. The input is padded on all edges with a padding size equal to the kernel edge size minus one, so that the operation upsamples rather than downsamples. When both the input and output are flattened, the transposed convolution can be written equivalently as $Z = M^{T}X + S$. Apart from the bias, this is the backward function (the gradient with respect to the input) of a conventional convolution layer.
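The relation between the convolution matrix M and the transposed convolution can be checked numerically. This 1-D sketch (stride one, no bias, hypothetical helper names) builds M for a "valid" convolution, applies its transpose, and compares the result with the direct form described above: pad by the kernel size minus one on each side and correlate with the flipped kernel.

```python
def conv_matrix(w, n_in):
    """Matrix M of a 1-D 'valid' cross-correlation with kernel w (stride 1), so z = M x."""
    k = len(w)
    n_out = n_in - k + 1
    m = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for j, wj in enumerate(w):
            m[i][i + j] = wj
    return m


def matvec(m, x):
    """Matrix-vector product for nested lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transposed_conv(w, y):
    """Direct form: zero-pad y by (k - 1) on both sides, then correlate with the flipped kernel."""
    k = len(w)
    padded = [0.0] * (k - 1) + list(y) + [0.0] * (k - 1)
    flipped = w[::-1]
    return [sum(flipped[j] * padded[i + j] for j in range(k))
            for i in range(len(padded) - k + 1)]


w = [1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
m = conv_matrix(w, len(x))
z = matvec(m, x)                    # convolution: 5 samples -> 3
up = matvec(transpose(m), z)        # transposed convolution via M^T: 3 -> 5
print(z)                            # [14.0, 20.0, 26.0]
print(up == transposed_conv(w, z))  # True
print(up == x)                      # False: the size is restored, the values are not
```

The last line mirrors the statement that a transposed convolution reverses a regular convolution only in dimension, not in value.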
The transposed convolutional (TConv) layer reverses the mapping of a typical convolutional layer while retaining its connection pattern. It therefore restores the spatial dimensions of the original input, in contrast to a typical convolutional layer; transposed convolution reverses a regular convolution only in dimension, not in value. In the layer’s parameterization, padding (cropping) is applied to the output, whereas in regular convolution padding is applied to the input. The proposed CNN has twelve transposed convolutional layers and follows the bottleneck approach. In some applications, such as image steganalysis [31], the bottleneck technique produces better results than the standard approach.
Figure 5 shows an abstract representation of the bottleneck technique. A batch normalization (Batch Norm) layer and a rectified linear unit (ReLU) layer follow each pair of consecutive TConv layers. In each pair, the first TConv layer performs a point-wise convolution and the second performs a depth-wise convolution. In steganalysis [32], 1 × 1 point-wise convolution combined with depth-wise convolution improves results. Applying a 1 × 1 filter followed by a 3 × 3 filter reduces the computational complexity of the TConv layers, and experiments also show a performance improvement: the 1 × 1 then 3 × 3 order yields a percentage detection error approximately 2% lower than the 3 × 3 then 1 × 1 order, and it also trains faster. The bottleneck technique therefore has two substantial advantages: lower computational cost and lower detection error.
The first and second transposed convolutional layers have eighty filters each, with filter sizes of 1 × 1 and 3 × 3, respectively. In each block, the 3 × 3 transposed convolutional layer has the same number of filters as the 1 × 1 layer. All transposed convolutional layers use a stride of one.
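Counting weights gives a rough picture of why the bottleneck lowers the number of training parameters. The sketch below assumes, for illustration only, that the 1 × 1 layer is a standard point-wise layer, that the 3 × 3 depth-wise layer uses one filter per channel (groups equal to the channel count), and that the block input also has eighty channels; biases and Batch Norm parameters are ignored, and the 160-channel input in the ordering comparison is hypothetical. A transposed convolution has the same number of weights as a convolution of matching shape.

```python
def conv_weights(in_ch, out_ch, k, groups=1):
    """Weights (biases excluded) of a k x k convolution or transposed convolution."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    return out_ch * (in_ch // groups) * k * k


C = 80  # filters per layer in the first block (from the text); input channels assumed equal

# Bottleneck pair: 1 x 1 point-wise layer, then 3 x 3 depth-wise layer (assumed groups = C)
bottleneck = conv_weights(C, C, 1) + conv_weights(C, C, 3, groups=C)
# Baseline for comparison: two standard 3 x 3 layers
two_standard = 2 * conv_weights(C, C, 3)
print(bottleneck, two_standard)  # 7120 115200

# Ordering with standard layers when the block input is wider (160 channels, hypothetical)
one_then_three = conv_weights(160, C, 1) + conv_weights(C, C, 3)
three_then_one = conv_weights(160, C, 3) + conv_weights(C, C, 1)
print(one_then_three, three_then_one)  # 70400 121600
```

Placing the cheap 1 × 1 layer first means the expensive 3 × 3 kernels only ever see the block width, which is one plausible reason for the lower cost of the 1 × 1 then 3 × 3 order.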
Several techniques are used to improve training. A Batch Norm layer reduces the network’s sensitivity to initialization, which stabilizes training, accelerates the training rate, and reduces internal covariate shift [33]. During training, the Batch Norm layer normalizes its input using the mean and variance of each mini-batch. Once training is complete, the final mean and variance values stored in the Batch Norm layer are used to normalize unseen data at prediction time.
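The difference between training-time and prediction-time behavior of Batch Norm can be sketched for a single feature. This is an illustration, not a framework layer: the stored statistics are assumed to be exponential moving averages with momentum 0.1 (frameworks differ; some recompute population statistics after training instead), and the learnable scale and shift are left at their initial values.

```python
import math


class BatchNorm1D:
    """Batch normalization of a single feature (illustrative sketch)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.gamma, self.beta = 1.0, 0.0  # learnable scale and shift (not trained here)
        self.running_mean, self.running_var = 0.0, 1.0
        self.momentum, self.eps = momentum, eps  # momentum-based statistics: an assumption

    def train_step(self, batch):
        """Normalize with mini-batch statistics and update the stored statistics."""
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * mean
        self.running_var = (1 - m) * self.running_var + m * var
        return [self.gamma * (v - mean) / math.sqrt(var + self.eps) + self.beta for v in batch]

    def infer(self, value):
        """Normalize unseen data with the stored (final) statistics."""
        return (self.gamma * (value - self.running_mean)
                / math.sqrt(self.running_var + self.eps) + self.beta)


bn = BatchNorm1D()
for _ in range(200):
    out = bn.train_step([1.0, 2.0, 3.0, 4.0])
print(round(bn.running_mean, 3), round(bn.running_var, 3))  # 2.5 1.25
print(bn.infer(2.5))
```

After training, `infer` no longer depends on the composition of a mini-batch, so a single test patch is normalized consistently.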
A ReLU layer [34] follows the Batch Norm layer. It replaces negative values with zeros, introducing the nonlinearity that helps the network learn and generalize.
The proposed network uses only one global average pooling (GAP) layer, because the internal statistical details are vital and the images are small. The GAP layer has been shown to increase accuracy in steganalysis [35,36]. GAP reduces each feature map to a single element. The activations following the GAP layer are extracted as features from each trained network, where each network is trained with a different optimizer, and the information fusion process combines these feature vectors, as illustrated in Figure 6.
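GAP and concatenation-based fusion can be sketched as follows. Feature maps are nested lists, and the three feature vectors stand for networks trained with Adam, RMSprop, and SGDM. Assuming the fused features are the 24-dimensional GAP outputs (the last TConv layer has 24 filters), concatenation yields 72 features per image. The 8 × 8 map size, the constant map values, and all helper names are hypothetical.

```python
def global_average_pool(feature_maps):
    """Reduce each H x W feature map (nested lists) to its mean: C maps -> C values."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def fuse(*feature_vectors):
    """Information fusion by concatenating the per-network feature vectors."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def toy_maps(offset, channels=24, size=8):
    """Hypothetical stand-in for the last TConv layer's output of one trained network."""
    return [[[offset + c] * size for _ in range(size)] for c in range(channels)]


f_adam = global_average_pool(toy_maps(0.0))
f_rmsprop = global_average_pool(toy_maps(100.0))
f_sgdm = global_average_pool(toy_maps(200.0))
tf_features = fuse(f_adam, f_rmsprop, f_sgdm)
print(len(f_adam), len(tf_features))  # 24 72
```

The fused vector keeps each network’s features in a fixed position, so a downstream classifier such as a linear SVM can weight the contributions of the three optimizers differently.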
In the feature extraction phase, three different optimizers are used to train the network: Optimizer 1 (Adam), Optimizer 2 (RMSprop), and Optimizer 3 (SGDM, stochastic gradient descent with momentum). Training runs for 50 epochs with a learning rate of 0.01, and the training data are shuffled before each epoch to avoid ordering bias. Each trained network is then used to extract features from the data.
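The three update rules can be compared on a toy problem. The sketch minimizes the one-dimensional quadratic f(w) = (w - 3)^2 with the learning rate of 0.01 stated above; the remaining hyperparameters (momentum 0.9, squared-gradient decay 0.9 for RMSprop, decay factors 0.9 and 0.999 for Adam, epsilon 1e-8) are assumed common defaults, not values reported in the paper.

```python
import math


def minimise(update, steps=5000, lr=0.01):
    """Run an update rule on the toy objective f(w) = (w - 3)^2 starting from w = 0."""
    w, state = 0.0, {}
    for t in range(1, steps + 1):
        g = 2.0 * (w - 3.0)  # gradient of the toy objective
        w = update(w, g, state, t, lr)
    return w


def sgdm(w, g, s, t, lr, momentum=0.9):
    """SGD with momentum: v <- momentum * v - lr * g; w <- w + v (assumed momentum)."""
    s["v"] = momentum * s.get("v", 0.0) - lr * g
    return w + s["v"]


def rmsprop(w, g, s, t, lr, rho=0.9, eps=1e-8):
    """RMSprop: divide the step by a running root-mean-square of the gradients."""
    s["sq"] = rho * s.get("sq", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(s["sq"]) + eps)


def adam(w, g, s, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates of the gradient."""
    s["m"] = b1 * s.get("m", 0.0) + (1 - b1) * g
    s["v"] = b2 * s.get("v", 0.0) + (1 - b2) * g * g
    m_hat = s["m"] / (1 - b1 ** t)
    v_hat = s["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


for rule in (adam, rmsprop, sgdm):
    print(rule.__name__, round(minimise(rule), 3))
```

All three rules reach the same minimum here, but they follow different trajectories; on a real network this yields differently trained weights, which is what makes the features of T1, T2, and T3 complementary candidates for fusion.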
The GAP layer processes the feature maps and is followed by SoftMax and classification layers. The experiments show that including the GAP layer reduces the percentage classification error by up to 2.7%. The GAP layer also preserves the operation fingerprints and mitigates overfitting [37]. Although configurations with multiple pooling layers were tested, only one GAP layer is used in the final experimental analysis. Because the last transposed convolutional layer of the proposed CNN has 24 filters, the GAP layer outputs 24 features.
The fully connected layer consolidates the knowledge acquired by the preceding layers, combining the extracted features to reach a decision. The SoftMax function converts the output of the fully connected layer into a probability for each class; these probabilities sum to 1 across all classes, so they form a valid probability distribution. The classification layer then uses a cross-entropy loss to assign each input to exactly one class.
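SoftMax and the cross-entropy loss can be sketched in a few lines. The five scores below are hypothetical outputs of the fully connected layer for classes C1 to C5 (indices 0 to 4); the helper names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy loss of one sample whose true class index is `target`."""
    return -math.log(probs[target])


def predict(probs):
    """Index of the single most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores of the fully connected layer for the five classes C1..C5
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
print(round(sum(probs), 6), predict(probs))  # 1.0 0
```

The loss is small when the true class already receives a high probability and grows as that probability shrinks, which is what drives training toward a single confident class per input.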
Weight initialization significantly affects the overall performance of a CNN. Initializing the network with arbitrary random values can lead to inconsistent performance across runs. To address this issue, Glorot and Bengio [38] introduced a weight initialization strategy that improves performance and speeds up convergence; it works particularly well for less dense networks like the proposed CNN. The weights are drawn with a scale determined by the numbers of input and output units of each layer, which promotes more stable training and better generalization. For classification, a support vector machine (SVM) classifier is employed, as illustrated in Figure 7; it takes the features extracted by the CNN and assigns the input data to specific classes.
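The Glorot (Xavier) uniform scheme draws each weight from U(-L, L) with L = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out). The sketch below applies it to a k × k layer; the 3 × 3 layer with eighty input and output channels is an illustrative choice, and the helper names are hypothetical.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    """Bound L of the Glorot uniform distribution U(-L, L)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_init_conv(k, in_ch, out_ch, seed=0):
    """Glorot-uniform weights for a k x k layer, shaped [out_ch][k][k][in_ch].
    fan_in = k*k*in_ch and fan_out = k*k*out_ch; their sum is the same whichever
    direction a transposed convolution treats as its input."""
    rng = random.Random(seed)
    limit = glorot_limit(k * k * in_ch, k * k * out_ch)
    weights = [[[[rng.uniform(-limit, limit) for _ in range(in_ch)]
                 for _ in range(k)] for _ in range(k)] for _ in range(out_ch)]
    return weights, limit


# Illustrative layer: 3 x 3 kernels, 80 input and 80 output channels
weights, limit = glorot_init_conv(k=3, in_ch=80, out_ch=80)
print(round(limit, 5))  # 0.06455
```

Because the bound shrinks as layers get wider, activations and gradients keep roughly the same scale from layer to layer, which is the source of the more consistent behavior across runs.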
4. Experimental Analysis
In this paper, a robust scheme is proposed to detect images processed by single operators and by two operators, together with their order. Numerous experiments are run to verify the resilience and adaptability of the proposed network. To build the experimental dataset, a total of twenty-six thousand images are taken in equal proportion from the BOSSBase [28], UCID [39], LIRMM [40], and never-compressed (NC) [41] image databases, which contain 10,000, 1338, 10,000, and 5150 uncompressed color images, respectively. First, the central 256 × 256 pixel block of each image is extracted and divided into sixteen non-overlapping 64 × 64 pixel blocks, producing 416,000 patches of 64 × 64 pixels in total. For each class, twenty-four thousand images are used for training and six thousand for validation, and tests are conducted on fifteen thousand image patches. Patches used in training are never reused in testing. Five operators are considered in the experiments: Gaussian blurring (GAB_X), median filtering with 3 × 3 and 5 × 5 filters (MDF_3 × 3, MDF_5 × 5), unsharp masking (USM_X), and upscaling (USL_X), with various parameters X. Each operator is applied with symmetric padding. To obtain unbiased findings, thirty thousand 64 × 64 image patches are chosen randomly for each operator. The experiments are conducted on an NVIDIA GTX2070 GPU with 32 GB of RAM. As discussed in Section 2.1, class C1 denotes the original images, C2 denotes images processed by operator u, C3 denotes images processed by operator v, C4 denotes images processed by u followed by v, and C5 denotes images processed by v followed by u. The classes for compressed images are defined likewise, as described in Section 2.2. The proposed scheme classifies two-operator processed images together with their order, with a lower classification error than existing schemes. T1, T2, and T3 denote the proposed CNN trained with the Adam, RMSprop, and SGDM optimizers, respectively, each using the SoftMax classifier. TF denotes the concatenation (fusion) of the T1, T2, and T3 features, which is classified with a linear-kernel SVM.
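The patch extraction and class definitions above can be sketched with plain Python helpers. Images are single-channel nested lists here; the operators `u` and `v` and the optional `jpeg(image, qf)` compressor are placeholders supplied by the caller (for example, real Gaussian blurring, median filtering, or a JPEG encoder), and compressing every class, including C1, with QF1 before and QF2 after the operators is an assumption about the compressed setting of Section 2.2. All names are hypothetical.

```python
def center_crop(img, size=256):
    """Central size x size block of an image given as a list of pixel rows."""
    top = (len(img) - size) // 2
    left = (len(img[0]) - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def tile(block, patch=64):
    """Non-overlapping patch x patch tiles of a square block, in row-major order."""
    n = len(block) // patch
    return [[row[c * patch:(c + 1) * patch] for row in block[r * patch:(r + 1) * patch]]
            for r in range(n) for c in range(n)]


def make_classes(x, u, v, jpeg=None, qf1=None, qf2=None):
    """Samples of classes C1..C5 for one patch x and operators u, v.
    'u followed by v' means v(u(x)). If jpeg is given (assumed setting), x is
    compressed with qf1 first and every class sample is compressed again with qf2."""
    if jpeg is not None:
        x = jpeg(x, qf1)
    classes = {"C1": x, "C2": u(x), "C3": v(x), "C4": v(u(x)), "C5": u(v(x))}
    if jpeg is not None:
        classes = {name: jpeg(img, qf2) for name, img in classes.items()}
    return classes


# Patch extraction: one synthetic 384 x 512 image gives 16 patches of 64 x 64
img = [[r * 512 + c for c in range(512)] for r in range(384)]
patches = tile(center_crop(img))
print(len(patches), 26_000 * len(patches))  # 16 416000

# Order matters: with toy scalar operators, C4 and C5 differ
print(make_classes(1, u=lambda x: x + 1, v=lambda x: 2 * x))
# {'C1': 1, 'C2': 2, 'C3': 2, 'C4': 4, 'C5': 3}
```

The toy operators do not commute, which is exactly why C4 and C5 are separate classes: the same two operations applied in a different order leave different results.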
Table 1 reports the percentage detection error for various operator pairs. The operators considered in Table 1 are Gaussian blurring with standard deviation 0.7 (GAB_0.7) and 1.0 (GAB_1.0), and median filtering with 3 × 3 (MDF_3 × 3) and 5 × 5 (MDF_5 × 5) filters. The mean percentage detection error (MPDE) and the standard deviation (STD) of the percentage detection error are also reported for T1, T2, T3, and TF. The classification errors are lower for GAB_1.0 than for GAB_0.7, because the weaker blurring of GAB_0.7 increases misclassification between classes. TF gives both the lowest MPDE and the lowest STD.
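The MPDE and STD can be computed from a confusion matrix such as the one in Figure 8. This sketch assumes that the STD is taken over the per-class errors of one confusion matrix and uses the population formula; the text does not state either choice, and the function names are hypothetical.

```python
import math


def per_class_error(confusion):
    """Percentage detection error of each class from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    return [100.0 * (sum(row) - row[i]) / sum(row) for i, row in enumerate(confusion)]


def mpde_std(confusion):
    """Mean percentage detection error (MPDE) and its standard deviation over classes."""
    errors = per_class_error(confusion)
    mean = sum(errors) / len(errors)
    # Population standard deviation across classes (an assumption; the text does not say)
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mean, std
```

With a five-class confusion matrix (C1 to C5), `per_class_error` gives the per-class errors and `mpde_std` summarizes them into the two figures reported for T1, T2, T3, and TF.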
Table 2 considers GAB_0.7, GAB_1.0, and unsharp masking with radius 2.0 (USM_2.0) and 3.0 (USM_3.0). Across all scenarios, the classification error is highest for class C5, which corresponds to USM followed by GAB. This is supported by the confusion matrix of TF (information fusion) for GAB_1.0 and USM_3.0, shown in Figure 8: 1290 images of class C5 are misclassified as class C1 (GAB_1.0), and 579 images of class C1 (GAB_1.0) are misclassified as class C5. Despite these errors, the information fusion technique (TF) still yields the best overall results.
Table 3 analyzes GAB_0.7, GAB_1.0, and upscaling with factors 1.2 (USL_1.2) and 1.5 (USL_1.5). In these scenarios, TF (information fusion) consistently outperforms T1, T2, and T3 and delivers the most stable results. The mean percentage detection errors (MPDEs) of TF are significantly lower than those of the operator series in Table 2. With GAB_1.0, upscaling by a factor of 1.5 yields a lower classification error than upscaling by 1.2. Upscaling introduces interpolation, which alters the intrinsic statistical evidence of the image and makes classification easier. We attribute the better result with USL_1.5 to high-factor upscaling disturbing the intrinsic statistical evidence less than low-factor upscaling (USL_1.2) does, which improves the classification accuracy for the combination of GAB_1.0 and USL_1.5.
Table 4 evaluates three operators: MDF, USM, and USL. As in Tables 1, 2, and 3, TF (information fusion) consistently outperforms T1, T2, and T3 across scenarios and provides more stable results. For u = MDF_3 × 3 and v = USM_2.0, TF achieves an MPDE of 3.36%. With the larger median kernel u = MDF_5 × 5 and the same v = USM_2.0, TF still performs well, with an MPDE of 5.03%. The combination u = MDF_5 × 5 and v = USL_1.2, however, gives worse results than the other manipulation types. In addition, the classification errors are exceptionally high for classes C2 and C5, indicating that these classes are particularly challenging for the evaluated operator combinations. Overall, Table 4 confirms that TF achieves the best performance and stability across diverse scenarios, and it shows that detection difficulty depends strongly on the operator combination.
JPEG is frequently the default format in real-world cameras because image quality remains acceptable after compression. Detecting an operator series in JPEG images is therefore modeled in three steps: in step 1, the image is compressed with quality factor QF1; in step 2, the operator series is applied to the compressed image; and in step 3, the result is compressed again with quality factor QF2. Section 2.2 provides a comprehensive description of JPEG compression. The proposed CNN performs worse on compressed images than on uncompressed images; however, its performance is acceptable given the small image size (64 × 64) and the low compression quality.
The results for compressed images are presented in Table 5. Several quality factor settings that arise in practice are considered: QF1 = QF2, QF1 < QF2, and QF1 > QF2, with the difference between QF1 and QF2 ranging from 5 to 20 when they differ. Compression weakens the operators’ artifacts; nevertheless, the average percentage detection error is below 9% in most cases. For u = GAB_1.0 and v = MDF_5 × 5, two cases are considered: QF1 = 75 and QF2 = 85 in the first, and QF1 = 85 and QF2 = 75 in the second. As shown in Table 5, the percentage detection error is 2.32% in the first case and 3.89% in the second. For the same operators, the percentage detection error of TF remains below 5% even when QF1 = 90 and QF2 = 70. Table 6 reports further results for TF only.
In the experiments above, the same operator parameters are used in both training and testing. In the real world, however, the same operator may be applied with different parameters. Experiments are therefore conducted to evaluate the robustness of the proposed method to differing operator parameters. Gaussian blurring with standard deviations 0.7, 0.8, 0.9, and 1.0 is used for training: the training set consists of sixty thousand images, fifteen thousand for each standard deviation. For testing, forty thousand images are blurred with 300 different standard deviations in the range 0.701 to 0.900. For the five-class classification of the two-operator series, 300,000 images are used for training and 200,000 for testing. The proposed CNN also performs well under these differing parameter specifications. Table 7 shows the results for operators u = GAU and v = UP in two scenarios: uncompressed images, and compressed images with QF1 = 80 and QF2 = 90.
The tests reported in Table 8 focus on detecting two-operator series and exclude single-operator detection. The images underwent various operations, including Gaussian blurring, median filtering, unsharp masking, and upscaling, and unaltered images formed a separate class. The MPDE of TF is 1.35% for uncompressed images and 3.19% for JPEG images with a QF of 85. These results indicate that the proposed scheme is effective in detecting two-operator series and well suited to accurately identifying single-operator processed images.
As the in-depth analysis above shows, the proposed CNN can classify two operators and their order for small uncompressed and JPEG images. Its results are now compared with several state-of-the-art methods. Unlike conventional models, the CNN model [12] introduces a constrained convolutional layer, and its convolutional layers use filters of different sizes, including 7 × 7, 5 × 5, and 3 × 3. According to our experimental results, small kernels are better suited to this task. Bayar and Stamm [24] used a constrained convolutional layer to extract image residuals for better results. This improves performance, but a gap remains because the network has fewer convolutional layers and larger filters. Of the two Bayar and Stamm methods [12,24], method [24] performs better and is therefore used for comparison. Liao et al. [25] proposed a two-stream CNN model that produced excellent results; operator series detection itself is another significant development in this research area. The two-stream model can identify operators with unknown parameters, but its computational cost is high because of its numerous layers and the specialized preprocessing it requires. Cho et al. [26] improved performance using the bottleneck approach, and their scheme does not need customized preprocessing. Our proposed scheme is more reliable owing to the transposed convolution and the bottleneck method. The bottleneck technique has two benefits: it reduces the number of learning parameters and allows the network depth to be increased. Information fusion is also performed to reduce detection errors. The results of several scenarios for uncompressed and compressed images are shown in Figure 9 and Figure 10, respectively.
In the comparative analysis, the scheme of Liao et al. [25] performs worse than our proposed CNN architecture. Liao’s CNN model includes multiple pooling layers, which discard critical statistical data during downsampling, and its large kernels further degrade performance. The scheme of Cho et al. [26] outperforms that of Liao et al.: it omits pooling layers, which prevents the loss of crucial statistical information and improves the results.
Our proposed CNN architecture builds on these observations to address the limitations of previous approaches. Instead of pooling layers, it uses transposed convolution, which retains as many inherited operation fingerprints as possible during upsampling; this is vital for preserving essential information and for accurate classification. Information fusion is also incorporated into the proposed CNN model to further enhance detection: combining the features of networks trained with different optimizers improves the accuracy and reliability of operator sequence detection.
As a result of these enhancements, the proposed method performs significantly better than the schemes of both Liao et al. and Cho et al. Transposed convolution coupled with information fusion is crucial for more precise and reliable operator sequence detection. These advancements make the proposed CNN model a promising solution to the challenges of image processing history analysis, with potential for various real-world applications.
The proposed CNN also performs well in all other respects, achieving a lower average classification error without operator-specific preprocessing.