In today’s ever-changing threat landscape, artificial intelligence (AI) techniques have become a key technology for cybersecurity researchers and practitioners. Integrating AI into cybersecurity curricula is increasingly necessary to better prepare the future cybersecurity workforce. However, this is also a considerable challenge. AI and cybersecurity are both difficult areas of study, appeal to different types of students, and individually require significant commitments within a fixed number of credit hours. This project addresses two questions: how can AI be integrated into an already packed cybersecurity curriculum, and how can this be done without significantly increasing the student workload? The project will focus on educational modules designed to be highly adaptable, extending the project’s impact to any potential computer scientist who does not conform to the stereotypes and normative expectations of the field. The results of this study have the potential to expand and redefine who pursues cybersecurity, as well as how we integrate it into the curriculum.

This project will pursue three goals designed to address the difficult problem of integrating AI effectively into cybersecurity curricula while carefully attending to the effect of these integrations. First, the project team will design highly adaptable AI curricular modules that non-AI cybersecurity educators can easily use and insert into existing cybersecurity courses, with each module associated with a suite of potential insertion points. Second, these modules will include machine learning (ML) and natural language processing (NLP). Topics such as NLP have been shown to increase the appeal of computer science for diverse populations. Last, the project team will evaluate the effectiveness of these modules at the curriculum level within existing cybersecurity programs that serve a diverse population.

The specific research questions to be addressed include: (1) How do instructors perceive the content’s efficacy? (2) What factors influence an instructor’s participation in curricular change? (3) What are the obstacles or considerations for AI integration into cybersecurity curricula? (4) How do students perceive content efficacy? (5) What (if any) influence do the modules have on student interest and engagement? This project answers the call for advances in education research at the intersection of cybersecurity and AI through a fully interdependent and integrated approach that draws on the expertise of the team. It also leverages widely accepted theoretical frameworks and methods to evaluate and assess the effectiveness of the work to ensure high impact and potential for future scale-up.

This project is supported by a special initiative of the Secure and Trustworthy Cyberspace (SaTC) program to foster new, previously unexplored, collaborations between the fields of cybersecurity, artificial intelligence, and education. The SaTC program aligns with the Federal Cybersecurity Research and Development Strategic Plan and the National Privacy Research Strategy to protect and preserve the growing social and economic benefits of cyber systems while ensuring security and privacy.

Project Description

Enabling Artificial Intelligence into Cybersecurity Education: A Comprehensive Data-driven Approach

Motivations & Scope

Cybersecurity researchers and practitioners have determined that modern cybersecurity methods are increasingly using AI techniques. However, the cybersecurity curriculum has not been updated to integrate such topics or techniques. Therefore, we find it imperative to find relevant topics in AI that are most commonly used within cybersecurity in order to form an explicit AI module that integrates AI concepts into cybersecurity.

We ask the questions: What are the most common correlations between AI and cybersecurity topics? What are suitable AI topics to integrate in a module within the context of cybersecurity?

We follow a methodology that considers over 2,000 research papers from top-tier cybersecurity conferences and journals (e.g., NDSS, USENIX, ACM CCS, IEEE S&P, and IEEE TIFS). We extracted AI-related keywords and used Natural Language Processing (NLP) techniques to identify AI concepts. Furthermore, we extracted cybersecurity keywords from a cybersecurity pilot course, incorporating additional keywords from cybersecurity academic textbooks. We used the extracted keywords to create a co-occurrence matrix. Finally, we created a specific AI module using the co-occurrence matrix for a cybersecurity course within an academic institution.

Data Collection

For our data collection, we gathered over 2,000 cybersecurity research papers from top cybersecurity conferences and journals (e.g., NDSS, USENIX, ACM CCS, and IEEE S&P). We gathered papers concerning computer security topics. After collecting the raw data, we adjusted a pre-trained model to filter for the papers with a higher probability of using AI topics. We then divided the papers into three distinct categories: AI-positive, AI-neutral, and AI-negative. For our analysis and results, we considered only the AI-positive papers.
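The three-way split can be pictured as a thresholding step over the filter model's output. The following is only a minimal sketch, assuming the model yields a probability in [0, 1] that a paper uses AI. The function name `categorize_paper`, the cutoffs 0.7 and 0.3, and the example scores are hypothetical and are not taken from the project's scripts.

```python
def categorize_paper(ai_probability, positive_cutoff=0.7, negative_cutoff=0.3):
    """Map an AI-probability score to one of three categories.

    The cutoffs are illustrative assumptions, not the project's values.
    """
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be in [0, 1]")
    if ai_probability >= positive_cutoff:
        return "AI-positive"
    if ai_probability <= negative_cutoff:
        return "AI-negative"
    return "AI-neutral"


# Hypothetical scores for three papers (made-up values for illustration).
scores = {"paper_a.pdf": 0.92, "paper_b.pdf": 0.50, "paper_c.pdf": 0.08}
categories = {name: categorize_paper(p) for name, p in scores.items()}

# Only the AI-positive papers move on to the analysis stage.
ai_positive = [name for name, label in categories.items() if label == "AI-positive"]
print(ai_positive)
```

Keeping the cutoffs as parameters makes it easy to check how sensitive the AI-positive set is to the chosen thresholds.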

Data Processing Pipeline

We process our collection of raw data by passing it through the following pipeline. The scripts mentioned below are responsible for data cleaning, processing, and analysis. This pipeline can be used with a different collection of research papers than our own for analysis.

  • AI Parser: This script processes the PDF files into plaintext RAW files.
  • AI Cleaner: This script cleans the RAW files by removing stopwords and unnecessary data. It then exports CLEAN files.
  • AI Helper: This code contains helper methods used by the main and processor scripts.
  • AI Check: This code uses an API key to check if a text is related to AI.
  • AI Terms: This code contains several lists consisting of AI and cybersecurity terms.
  • Latent Dirichlet Allocation (LDA): This analysis reads the CLEAN files to check whether they contain AI terms. If a CLEAN file contains AI terms, the script counts the AI and cybersecurity keywords and exports the counts to an Excel file.

Once the analysis is complete, an Excel file named “Results.xlsx” is created. The file lists the documents that contain machine learning-related terms, along with the machine learning terms used in each paper.
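The cleaning and keyword-counting stages can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the stopword list, the two term sets, and the function names `clean_text` and `count_keywords` are assumptions, and the real term lists live in the AI Terms script. Writing the counts to an Excel file is omitted here.

```python
import re
from collections import Counter

# Tiny illustrative lists; the project's own lists are much larger.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "we", "for", "in"}
AI_TERMS = {"model", "classifier", "training", "neural"}
SECURITY_TERMS = {"malware", "attack", "evaluation", "detection"}


def clean_text(raw):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return [t for t in tokens if t not in STOPWORDS]


def count_keywords(tokens, vocabulary):
    """Count how often each vocabulary term appears in the token list."""
    return Counter(t for t in tokens if t in vocabulary)


raw = "We train a neural model for malware detection and the model evaluation."
tokens = clean_text(raw)
ai_counts = count_keywords(tokens, AI_TERMS)
security_counts = count_keywords(tokens, SECURITY_TERMS)

# A document is reported only if it contains at least one AI term.
if ai_counts:
    print(dict(ai_counts))
    print(dict(security_counts))
```

This sketch matches single-word terms only. Multi-word terms such as "machine learning" would need phrase matching on the cleaned text instead of per-token lookup.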

These Python scripts implement our methodology for analyzing how cybersecurity research papers use Artificial Intelligence content and applications. The figure below illustrates our complete methodology.

 

Overview of our Methodology

Figure: Diagram of our methodology. It contains 5 phases that are executed chronologically.

 

 

Designing and Evaluating Curricular Modules for Integration of AI into Cybersecurity Education

Motivations & Scope

Artificial Intelligence (AI) has become a fundamental tool for cybersecurity researchers and practitioners. It is frequently used to address major security problems such as supply chain attacks, ransomware threats, and social engineering. Yet, the current cybersecurity curriculum still suffers from the absence of AI resources, particularly the detailed understanding of the appropriate AI mechanisms.

To address this, we design an AI lecture module that can be integrated into any cybersecurity course. We then present the module in several cybersecurity courses at our institution and assess student performance before and after the lecture. Our AI lecture is composed of a pre-lecture survey, the AI module, live AI examples, and a post-lecture survey.

Data Collection

For our data collection, we distributed two surveys within a lecture titled “A Lecture on Artificial Intelligence, Machine Learning, and Deep Learning: From Theory to Practice”. We collected data using the Qualtrics platform and distributed the surveys via anonymous links or anonymous QR codes. The first survey contains questions about demographics, cybersecurity, AI models, and their corresponding performance metrics. The second survey contains questions regarding demographics, AI metrics, AI models, Deep Learning, and AI training.

Analysis Pipeline

In our data analysis, we use several text preprocessing and Natural Language Processing techniques to keep the analysis accurate and unbiased. To that end, we implement four steps to gather our data and convert the survey responses into quantifiable data. These steps include our data collection, lecture extraction, survey analysis, and feedback analysis.

  • Lecture Extraction: This code extracts all the text in a PDF file using PyPDF2. Convert our lecture (or your own AI module) into a PDF and rename the file to Final_AI_ML_Lecture.pdf.
  • Topic Extraction: This code extracts topics and displays them as a topic distribution. Rename all survey CSV files to match the ones used in the code.
  • Survey Analysis: This code analyzes and scores the student performance before and after the lecture. Rename all survey CSV files as necessary (including the CSV with correct answers). Remove any extra columns provided by Qualtrics, stopping at the column with participant IDs. Remove any rows that contain unnecessary data provided by Qualtrics, stopping at the question number row. Make sure to download the extracted lecture text file.
  • Feedback Analysis: This code analyzes student feedback using sentiment analysis models. Make sure to remove any unnecessary data provided by Qualtrics, as done in the previous step. Rename any CSV files as necessary.
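The grading part of the Survey Analysis step can be sketched as follows. This is a minimal illustration under stated assumptions: the answer keys, the responses, and the function name `score_responses` are made up, and in practice the rows would come from the cleaned Qualtrics CSV files. Because the surveys are anonymous, the sketch compares class averages instead of pairing individual students.

```python
from statistics import mean


def score_responses(responses, answer_key):
    """Return each participant's fraction of correctly answered questions."""
    scores = {}
    for participant, answers in responses.items():
        correct = sum(
            1
            for question, expected in answer_key.items()
            if answers.get(question, "").strip().lower() == expected.lower()
        )
        scores[participant] = correct / len(answer_key)
    return scores


# Made-up answer keys and responses, keyed by participant ID.
pre_key = {"Q1": "b", "Q2": "a"}
post_key = {"Q1": "c", "Q2": "d"}
pre_responses = {"P1": {"Q1": "b", "Q2": "c"}, "P2": {"Q1": "a", "Q2": "c"}}
post_responses = {"P1": {"Q1": "C", "Q2": "d"}, "P2": {"Q1": "c", "Q2": "a"}}

pre_scores = score_responses(pre_responses, pre_key)
post_scores = score_responses(post_responses, post_key)

# Change in the class average, in percentage points.
change = 100 * (mean(post_scores.values()) - mean(pre_scores.values()))
print(f"Average score change: {change:+.1f} points")
```

Unanswered questions count as incorrect here, and matching ignores case and surrounding whitespace. Both are design choices of this sketch.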

Through the implementation of these four steps and scripts, we analyze the reliability and efficacy of our Artificial Intelligence lecture module in delivering knowledge to cybersecurity students. Below we visualize our implemented methodology.


AI Lecture Analysis Methodology

Figure: Diagram of our implemented methodology. Note that while not included here, we perform a feedback analysis separately.


AI and Cybersecurity Analysis

Co-occurrence Matrix Visualization

We include a way to visualize the results from the “Results.xlsx” Excel file. This method uses a Jupyter notebook.

Run Concurrence.ipynb to visualize results from the Excel file.

These results are organized as “Computer Security Terms” and “Machine Learning Terms”.

 

Co-occurrence Matrix results

Figure: The visualization of our co-occurrence results as a matrix.

Insight: Our findings reveal the correlations between cybersecurity and AI keywords in recent research. We identify the most common correlations and organize them as a matrix of co-occurrences. From this matrix, we find that the most frequent co-occurrence is between the AI keyword “model” and the cybersecurity keyword “evaluation”. Using this matrix, we can create a dedicated lecture module that can be added to any cybersecurity course to enhance its quality and strengthen the students’ knowledge of AI.
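To make the co-occurrence idea concrete, the sketch below counts how many papers contain both an AI term and a cybersecurity term, then arranges the counts as a matrix. It is an illustration only: the per-paper keyword sets are made-up data and the function name `build_cooccurrence` is hypothetical, so the numbers do not reflect the project's results.

```python
from collections import Counter
from itertools import product

# Hypothetical per-paper keyword sets (made-up data for illustration).
papers = [
    {"ai": {"model", "training"}, "security": {"evaluation", "malware"}},
    {"ai": {"model"}, "security": {"evaluation"}},
    {"ai": {"classifier"}, "security": {"malware"}},
]


def build_cooccurrence(papers):
    """Count, for each (AI term, security term) pair, the papers containing both."""
    counts = Counter()
    for paper in papers:
        for pair in product(sorted(paper["ai"]), sorted(paper["security"])):
            counts[pair] += 1
    return counts


counts = build_cooccurrence(papers)
ai_terms = sorted({ai for ai, _ in counts})
security_terms = sorted({sec for _, sec in counts})

# Rows are security terms and columns are AI terms.
matrix = [[counts[(ai, sec)] for ai in ai_terms] for sec in security_terms]

print("columns:", ai_terms)
for sec, row in zip(security_terms, matrix):
    print(sec, row)
```

Each pair is counted once per paper, so a cell reports the number of papers in which both terms appear, not the number of times the words occur.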

Lecture Survey Analysis

Topic Distribution

These results are the topic distribution based on the pre-lecture and post-lecture surveys.

 

Topic distribution results

Figure: Our pre-lecture and post-lecture topic distribution results.
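One simple way to picture a topic distribution is as the share of keyword hits per topic across all survey responses. The sketch below uses plain keyword matching as a stand-in and is not the project's topic extraction method. The topic vocabulary, the example responses, and the function name `topic_distribution` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical topic vocabulary; the project's own topic lists are not shown here.
TOPIC_KEYWORDS = {
    "models": {"model", "classifier", "network"},
    "metrics": {"accuracy", "precision", "recall"},
    "security": {"attack", "malware", "phishing"},
}


def topic_distribution(responses):
    """Return each topic's share of all topic-keyword hits in the responses."""
    hits = Counter()
    for text in responses:
        for word in text.lower().split():
            word = word.strip(".,;:!?")
            for topic, keywords in TOPIC_KEYWORDS.items():
                if word in keywords:
                    hits[topic] += 1
    total = sum(hits.values())
    return {t: (hits[t] / total if total else 0.0) for t in TOPIC_KEYWORDS}


# Made-up free-text answers from before and after the lecture.
pre = ["Malware is an attack.", "I do not know."]
post = [
    "A classifier is a model.",
    "Precision and recall measure a model.",
    "Phishing attack.",
]

pre_dist = topic_distribution(pre)
post_dist = topic_distribution(post)
for topic in TOPIC_KEYWORDS:
    print(f"{topic}: pre={pre_dist[topic]:.2f} post={post_dist[topic]:.2f}")
```

Comparing the two distributions side by side shows whether responses shift toward AI-centric vocabulary after the lecture.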

 

Survey Grading

In our paper, we consider only the survey-to-lecture analysis of student performance. Here, however, we also show the scores that students achieved when their answers were graded for correctness.

  • Survey Results:

Survey Grading Results

Figure: Results of student performance based on correctness on the post-lecture survey.

 

Feedback Analysis

These results are based on the last questions present in the post-lecture survey regarding the lecture.

  • Feedback Results:

Sentiment Analysis Results

Figure: Our feedback results as sentiment analysis scores.

Insight: Our findings show the efficacy of our AI lecture module. Students began using more AI-centric keywords after the lecture module, and they achieved high scores in the survey grading. Further analysis also reveals that lecture-to-survey scores improved by up to 30% in the post-lecture results. Additionally, students received the lecture module well. In our feedback analysis, positive sentiment scores were high and negative scores were low. The relatively high neutral scores appear to be due to student comments about lecture engagement. We will weigh this feedback carefully to improve our lecture for future reproducibility.

Project Team Members

Selcuk Uluagac
Eminent Scholar Chaired Professor
Yassine Mekdad
Graduate Student Researcher
Alejandro Perez Pestaña
Undergraduate Student Researcher
Fernando Brito
Undergraduate Student Researcher
Abbas Acar
Post-Doctoral Associate
Mark Alan Finlayson
Affiliated Associate Professor
Monique Ross
Affiliated Associate Professor

Publications:

  • Yassine Mekdad, Alejandro Perez Pestaña, Abbas Acar, Mark A. Finlayson, Monique Ross, and A. Selcuk Uluagac. “Integrating Artificial Intelligence into Cybersecurity Education: A Comprehensive Data-Driven Approach.” TBD (2025) [pdf] [bibtex]
  • Fernando Brito, Yassine Mekdad, Monique Ross, Mark A. Finlayson, and A. Selcuk Uluagac. “Enhancing Cybersecurity Education with Artificial Intelligence Content.” Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1. (2025) [pdf] [bibtex]

Presentations and Talks:

  • TBD [poster]