A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Liu, Wenyang; Wang, Yi; Wu, Kejun; Yap, Kim-Hui; Chau, Lap-Pui

Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.06983 (cs)

[Submitted on 14 Apr 2023]

Title:A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Authors:Wenyang Liu, Yi Wang, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau

View PDF

Abstract:File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files whose symbols are represented as the variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence \& image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at this https URL.

Comments:	Accepted by AICAS 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)
Cite as:	arXiv:2304.06983 [cs.CV]
	(or arXiv:2304.06983v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.06983

Submission history

From: Wenyang Liu [view email]
[v1] Fri, 14 Apr 2023 08:06:52 UTC (882 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators