Exploring Open Access Security Datasets for Machine Learning: Model Training and Explainability Techniques
Short Description: This thesis investigates the selection of open access datasets within the cybersecurity field to facilitate the exploration of machine learning (ML) models and the application of explainability methods. By identifying and utilizing relevant datasets, the research aims to improve model performance in detecting cyber threats while ensuring that the decision-making processes of these models are interpretable and transparent.
Objectives:
General Objective: To evaluate and select suitable open access datasets in cybersecurity for training machine learning models and applying explainability techniques.
Specific Objectives:
– To assess the quality and relevance of various open access cybersecurity datasets.
– To develop machine learning models using selected datasets and evaluate their performance in threat detection.
– To implement explainability methods to interpret model predictions and enhance user trust.
Methodology:
The research will adopt a systematic approach comprising the following steps:
– Dataset Selection: Conduct a comprehensive review of available open access datasets relevant to cybersecurity, focusing on their attributes, size, and applicability.
– Model Development: Utilize selected datasets to train various machine learning algorithms (e.g., decision trees, neural networks) tailored for cybersecurity applications.
– Performance Evaluation: Assess model performance using metrics such as accuracy, precision, recall, and F1-score to determine effectiveness in threat detection.
– Explainability Implementation: Apply explainability techniques like SHAP and LIME to analyze model outputs, providing insights into how decisions are made by the algorithms.
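– Analysis of Results: Compare performance across the trained models, and assess how the SHAP and LIME explanations affect user understanding and trust. Because these are post-hoc methods, they leave a model's predictions and performance metrics unchanged.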
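The evaluation and explainability steps above can be illustrated with a minimal sketch that uses only the Python standard library. It makes several assumptions not taken from this proposal: the labels, the `toy_score` function, and its three feature names are hypothetical, and `shapley_values` computes exact Shapley values by brute force, which is the idea that SHAP approximates, rather than calling the SHAP library itself. In the actual thesis, an established ML library and the real SHAP and LIME packages would likely take the place of these hand-written functions.

```python
from itertools import combinations
from math import factorial


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def shapley_values(score, x, baseline):
    """Exact Shapley attribution of score(x) - score(baseline) per feature.

    Features outside a coalition are set to their baseline value.
    The cost is exponential in the number of features, so this is
    only practical for tiny illustrative models.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return score(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical threat score; z = [failed_logins, mb_sent, is_night]."""
    return 0.5 * z[0] + 0.1 * z[1] + 2.0 * z[0] * z[2]


# Hypothetical ground truth and predictions for eight network events.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Attribute one event's score to its features, relative to an all-zero baseline.
event = [4, 10, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_score, event, baseline)
print(phi)
```

The attributions sum to the difference between the event's score and the baseline score, which is the property that makes Shapley-based explanations easy to read: each feature receives a share of the prediction. The explanation step only inspects `toy_score`; it does not alter the predictions that `binary_metrics` evaluates.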
Expected Outcomes for the Student:
– Research Skills Mastery: The student will develop advanced skills in identifying and evaluating open access datasets relevant to cybersecurity.
– Machine Learning Development: Practical experience in training machine learning models using selected datasets, enhancing understanding of algorithms and performance metrics.
– Explainability Techniques Application: Working knowledge of methods such as SHAP and LIME for interpreting model predictions, fostering transparency in AI systems.
– Critical Analysis and Problem-Solving: Enhanced ability to analyze complex cybersecurity problems and synthesize findings into actionable insights.
– Academic Contribution: Opportunities for publications or presentations, contributing original insights to the field of cybersecurity.
– Career Preparation: Skills gained will prepare the student for advanced roles in cybersecurity or further academic pursuits.
– Ethical Awareness: Understanding of ethical considerations in data usage and AI, including privacy and bias issues.