Robustness of Image-based Malware Detection against Adversarial Attacks
In this project, we assess the robustness of image-based malware detection against adversarial attacks. To that end, we design and build a lightweight CNN that classifies Windows PE malware by the family it belongs to, operating on grayscale image representations of the binaries. Notably, adversarial attacks that are straightforward to mount on natural images in the computer vision domain are extremely difficult to apply to image representations of malware, because perturbations risk breaking functionality when the image is converted back to a binary. We therefore select only attacks that preserve malware functionality and compare our approach against the state-of-the-art MalConv classifier under both white-box and black-box settings, performing four distinct adversarial attacks that maintain binary integrity.
Project Abstract: We present a reproducible framework that transforms Windows PE binaries into grayscale images, trains a compact CNN for family‑level malware classification, and systematically evaluates its resilience to four functionality‑preserving adversarial attacks, benchmarking against MalConv to demonstrate improved robustness with minimal overhead.
Project Description
Motivation
Machine and deep learning models have achieved outstanding performance in malware detection, but security-critical deployments demand trustworthiness in the face of adversarial manipulation. Prior work shows that carefully crafted perturbations can drastically degrade classifier accuracy, and because adversarial examples transfer across models, even black-box attacks can succeed. We are motivated to explore whether an image-based representation of PE binaries can intrinsically resist adversarial threats that preserve malware functionality.
Problem Scope
- Domain constraints: Byte perturbations must preserve PE functionality (≤10% size increase).
- Adversarial settings: Both white‑box (full model knowledge) and black‑box (query‑only) scenarios.
- Evaluation target: Compare resilience of our CNN image‑based classifier against MalConv under four attacks.
Key Contributions
- Design and implement a lightweight CNN for grayscale‑image malware classification with high accuracy and low overhead.
- Perform four functionality-preserving adversarial attacks (Random/Benign Byte Append and Random/Benign Byte FGSM) under both attack models.
- Benchmark robustness and performance against MalConv, showing that the image-based classifier suffers far lower evasion rates than MalConv under most attacks.
Background
PE File Format & Visualization: The sections of a Windows PE binary (.text, .rdata, .data, .rsrc) are read as a uint8 byte array and reshaped into a grayscale image for classification, revealing visual patterns that are consistent within malware families.
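A minimal sketch of this binary-to-image transformation, assuming NumPy and Pillow; the square reshape-and-resize layout is an assumption about the visualization scheme, and the 100×100 target matches the preprocessing step described later:

```python
import numpy as np
from PIL import Image

def pe_to_image(path: str, size: int = 100) -> np.ndarray:
    """Read a PE binary as raw bytes and render it as a grayscale image."""
    data = np.fromfile(path, dtype=np.uint8)           # raw bytes as uint8
    side = int(np.ceil(np.sqrt(data.size)))            # smallest square that fits
    padded = np.zeros(side * side, dtype=np.uint8)     # zero-pad the tail
    padded[:data.size] = data
    img = Image.fromarray(padded.reshape(side, side), mode="L")
    img = img.resize((size, size))                     # normalize to 100x100
    return np.asarray(img, dtype=np.float32) / 255.0   # scale pixels to [0, 1]
```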
Adversarial ML in Malware: Attacks must craft byte perturbations that leave functionality intact; prior image-based attacks often break executables because pixel-level changes translate into non-localized byte shifts.
Threat Model
An adversary aims to evade ML-based malware classifiers by adding byte perturbations of at most 10% of the file size to Windows PE samples, under two knowledge regimes:
- White‑box: Full access to model architecture, parameters, and training data.
- Black‑box: Only input‑output query access; no internal details.
Goal: Force misclassification of malicious samples as benign while ensuring binary functionality is preserved.
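For illustration, one perturbation that satisfies this threat model is appending bytes past the last section (the PE overlay), which the Windows loader does not map into memory, so execution is unaffected for typical PEs (self-checksumming or signed binaries are exceptions). A minimal sketch, with the helper name ours and the 10% cap taken from the threat model:

```python
import os

def append_overlay(src: str, dst: str, payload: bytes) -> None:
    """Append payload bytes to the end of a PE file (the overlay).
    Trailing bytes are not mapped by the loader, so functionality is
    generally preserved; the assert enforces the <=10% size budget."""
    size = os.path.getsize(src)
    assert len(payload) <= size // 10, "threat model caps perturbation at 10%"
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data + payload)
```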
Proposed Approach
- Data Preprocessing: Convert PE binaries to normalized 100×100 grayscale images.
- CNN Training: Train a 3-block convolutional network (16→32→64 filters, 3×3 kernels, 2×2 max-pooling) followed by two dense layers, for 100 epochs with batch size 32 (see the architecture sketch after this list).
- Adversarial Generation: Apply four attacks that preserve functionality (an FGSM variant restricted to appended bytes is sketched after this list):
- Brute‑Force Random Byte Append
- Brute‑Force Benign Byte Append
- Random Byte FGSM
- Benign Byte FGSM
- Robustness Evaluation: Measure evasion rates on our classifier and MalConv over a held-out 20% validation split.
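A minimal Keras sketch of the classifier described above: filter counts, kernel sizes, pooling, epochs, and batch size follow the list, while the dense-layer width (256) and the number of malware families (25) are placeholders we assume for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(num_classes: int = 25) -> tf.keras.Model:
    """3-block CNN: 16->32->64 filters, 3x3 kernels, 2x2 max-pooling."""
    model = models.Sequential([
        layers.Conv2D(16, (3, 3), activation="relu", padding="same",
                      input_shape=(100, 100, 1)),       # normalized grayscale input
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),           # dense width assumed
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training per the listed hyperparameters:
# model.fit(x_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
```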
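And a sketch of how an FGSM-style attack can be confined to the appended bytes in image space: the signed gradient perturbs only masked pixels, leaving the original program content untouched. The mask-based restriction is our illustrative reading of the attack, not a verbatim reproduction of it:

```python
import tensorflow as tf

def fgsm_append(model, image, label, append_mask, epsilon=0.1):
    """One FGSM step restricted to appended-byte pixels.
    image: (1, 100, 100, 1) tensor; append_mask: same shape, 1 where bytes
    were appended and 0 elsewhere, so original content is never modified."""
    image = tf.convert_to_tensor(image)
    with tf.GradientTape() as tape:
        tape.watch(image)
        pred = model(image, training=False)
        loss = tf.keras.losses.sparse_categorical_crossentropy(label, pred)
    grad = tape.gradient(loss, image)
    adv = image + epsilon * tf.sign(grad) * append_mask  # perturb appended region only
    return tf.clip_by_value(adv, 0.0, 1.0)               # keep valid pixel range
```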
Key Findings
Table 1. Classification accuracy and overhead (image-based CNN vs. MalConv):

| Metric | Image-based CNN | MalConv |
| --- | --- | --- |
| Classification accuracy | 96.30% | 95.29% |
| Training time | ~12 min 50 s | ~12 min 50 s |
| RAM usage | 36.8% | 43.6% |

Table 2. Evasion rates for the four adversarial attacks on both classifiers:

| Attack | MalConv | Image-based CNN |
| --- | --- | --- |
| Random Byte Append | 54.66% | 5.66% |
| Benign Byte Append | 44.22% | 5.11% |
| Random Byte FGSM | 55.18% | 100% |
| Benign Byte FGSM | 55.19% | 46.69% |

Our CNN matches MalConv's accuracy with comparable training time and lower RAM usage. Under both append attacks, the image-based classifier is evaded roughly an order of magnitude less often than MalConv; Random Byte FGSM, however, fully evades it, exposing a vulnerability to gradient-based perturbations.
Conclusion
Our lightweight CNN image-based classifier not only matches MalConv in detection performance but also dramatically reduces evasion rates under byte-append attacks. Gradient-based FGSM remains a challenge due to end-to-end differentiability in the embedding space. Future work will explore appending or perturbing bytes in non-terminal sections, as well as defense mechanisms tailored to image representations.
Publications:
- Yassine Mekdad, Faraz Naseem, Ahmet Aris, Harun Oz, Abbas Acar, Leonardo Babun, Selcuk Uluagac, Güliz Seray Tuncay, and Nasir Ghani. “On the robustness of image-based malware detection against adversarial attacks.” In Network Security Empowered by Artificial Intelligence, pp. 355-375. Cham: Springer Nature Switzerland, 2024.
Presentations and Talks:
- Harun Oz, Faraz Naseem, Ahmet Aris, Abbas Acar, Guliz Seray Tuncay, and A. Selcuk Uluagac. “Poster: Feasibility of malware visualization techniques against adversarial machine learning attacks.” In 43rd IEEE Symposium on Security and Privacy (S&P), 2022.