1st Edition

Data Mining Tools for Malware Detection

450 Pages 131 B/W Illustrations

by Auerbach Publications

Description

Although the use of data mining for security and malware detection is quickly on the rise, most books on the subject provide high-level theoretical discussions to the near exclusion of the practical aspects. Breaking the mold, Data Mining Tools for Malware Detection provides a step-by-step breakdown of how to develop data mining tools for malware detection. Integrating theory with practical techniques and experimental results, it focuses on malware detection applications for email worms, malicious code, remote exploits, and botnets.

The authors describe the systems they have designed and developed: email worm detection using data mining, a scalable multi-level feature extraction technique to detect malicious executables, detecting remote exploits using data mining, and flow-based identification of botnet traffic by mining multiple log files. For each of these tools, they detail the system architecture, algorithms, performance results, and limitations.

Discusses data mining for emerging applications, including adaptable malware detection, insider threat detection, firewall policy analysis, and real-time data mining
Includes four appendices that provide a firm foundation in data management, secure systems, and the semantic web
Describes the authors’ tools for stream data mining

From algorithms to experimental results, this is one of the few books that will be equally valuable to those in industry, government, and academia. Technologists will learn which tools to select for specific applications, managers will learn how to determine whether to proceed with a data mining project, and developers will find innovative alternative designs for a range of applications.

Introduction
Trends
Data Mining and Security Technologies
Data Mining for Email Worm Detection
Data Mining for Malicious Code Detection
Data Mining for Detecting Remote Exploits
Data Mining for Botnet Detection
Stream Data Mining
Emerging Data Mining Tools for Cyber Security Applications
Organization of This Book
Next Steps

Part I: DATA MINING AND SECURITY
Introduction to Part I: Data Mining and Security

Data Mining Techniques
Introduction
Overview of Data Mining Tasks and Techniques
Artificial Neural Network
Support Vector Machines
Markov Model
Association Rule Mining (ARM)
Multi-class Problem
2.7.1 One-VS-One
2.7.2 One-VS-All
Image Mining
2.8.1 Feature Selection
2.8.2 Automatic Image Annotation
2.8.3 Image Classification
Summary
References

Malware
Introduction
Viruses
Worms
Trojan Horses
Time and Logic Bombs
Botnet
Spyware
Summary
References

Data Mining for Security Applications
Overview
Data Mining for Cyber Security
4.2.1 Overview
4.2.2 Cyber-terrorism, Insider Threats, and External Attacks
4.2.3 Malicious Intrusions
4.2.4 Credit Card Fraud and Identity Theft
4.2.5 Attacks on Critical Infrastructures
4.2.6 Data Mining for Cyber Security
Current Research and Development
Summary
References

Design and Implementation of Data Mining Tools
Introduction
Intrusion Detection
Web Page Surfing Prediction
Image Classification
Summary and Directions
References

Conclusion to Part I

Part II: DATA MINING FOR EMAIL WORM DETECTION

Introduction to Part II

Email Worm Detection
Introduction
Architecture
Related Work
Overview of Our Approach
Summary
References

Design of the Data Mining Tool
Introduction
Architecture
Feature Description
7.3.1 Per-Email Features
7.3.2 Per-Window Features
Feature Reduction Techniques
7.4.1 Dimension Reduction
7.4.2 Two-Phase Feature Selection (TPS)
7.4.2.1 Phase I
7.4.2.2 Phase II
Classification Techniques
Summary
References

Evaluation and Results
Introduction
Dataset
Experimental Setup
Results
8.4.1 Results from Unreduced Data
8.4.2 Results from PCA-Reduced Data
8.4.3 Results from Two-Phase Selection
Summary
References

Conclusion to Part II

Part III: DATA MINING FOR DETECTING MALICIOUS EXECUTABLES
Introduction to Part III

Malicious Executables
Introduction
Architecture
Related Work
Hybrid Feature Retrieval (HFR) Model
Summary and Directions
References

Design of the Data Mining Tool
Introduction
Feature Extraction Using n-Gram Analysis
10.2.1 Binary n-Gram Feature
10.2.2 Feature Collection
10.2.3 Feature Selection
10.2.4 Assembly n-Gram Feature
10.2.5 DLL Function Call Feature
The Hybrid Feature Retrieval Model
10.3.1 Description of the Model
10.3.2 The Assembly Feature Retrieval (AFR) Algorithm
10.3.3 Feature Vector Computation and Classification
Summary and Directions
References

Evaluation and Results
Introduction
Experiments
Dataset
Experimental Setup
Results
11.5.1 Accuracy
11.5.1.1 Dataset1
11.5.1.2 Dataset2
11.5.1.3 Statistical Significance Test
11.5.1.4 DLL Call Feature
11.5.2 ROC Curves
11.5.3 False Positive and False Negative
11.5.4 Running Time
11.5.5 Training and Testing with Boosted J48
Example Run
Summary and Directions
References

Conclusion to Part III

Part IV: DATA MINING FOR DETECTING REMOTE EXPLOITS

Introduction to Part IV

Detecting Remote Exploits
Introduction
Architecture
Related Work
Overview of Our Approach
Summary and Directions
References

Design of the Data Mining Tool
Introduction
DExtor Architecture
Disassembly
Feature Extraction
13.4.1 Useful Instruction Count (UIC)
13.4.2 Instruction Usage Frequencies (IUF)
13.4.3 Code vs. Data Length (CDL)
Combining Features and Compute Combined Feature Vector
Classification
Summary and Directions
References

Evaluation and Results
Introduction
Dataset
Experimental Setup
14.3.1 Parameter Settings
14.3.2 Baseline Techniques
Results
14.4.1 Running Time
Analysis
Robustness and Limitations
14.6.1 Robustness against Obfuscations
14.6.2 Limitations
Summary and Directions
References

Conclusion to Part IV

Part V: DATA MINING FOR DETECTING BOTNETS

Introduction to Part V

Detecting Botnets
Introduction
Botnet Architecture
Related Work
Our Approach
Summary and Directions
References

Design of the Data Mining Tool
Introduction
Architecture
System Setup
Data Collection
Bot Command Categorization
Feature Extraction
16.6.1 Packet-level Features
16.6.2 Flow-level Features
Log File Correlation
Classification
Packet Filtering
Summary and Directions
References

Evaluation and Results
Introduction
17.1.1 Baseline Techniques
17.1.2 Classifiers
Performance on Different Datasets
Comparison with Other Techniques
Further Analysis
Summary and Directions
References

Conclusion to Part V

Part VI: STREAM MINING FOR SECURITY APPLICATIONS

Introduction to Part VI

Stream Mining
Introduction
Architecture
Related Work
Our Approach
Overview of the Novel Class Detection Algorithm
Classifiers Used
Security Applications
Summary
References

Design of the Data Mining Tool
Introduction
Definitions
Novel Class Detection
19.3.1 Saving the Inventory of Used Spaces during Training
19.3.1.1 Clustering
19.3.1.2 Storing the Cluster Summary Information
19.3.2 Outlier Detection and Filtering
19.3.2.1 Filtering
19.3.2.2 Detecting Novel Class
Security Applications
Summary and Directions
Reference

Evaluation and Results
Introduction
Datasets
20.2.1 Synthetic Data with Only Concept-Drift (SynC)
20.2.2 Synthetic Data with Concept-Drift and Novel Class (SynCN)
20.2.3 Real Data—KDDCup 99 Network Intrusion Detection
20.2.4 Real Data—Forest Cover (UCI Repository)
Experimental Setup
20.3.1 Baseline Method
Performance Study
20.4.1 Evaluation Approach
20.4.2 Results
20.4.3 Running Time
Summary and Directions
References

Conclusion for Part VI

Part VII: EMERGING APPLICATIONS

Introduction to Part VII

Data Mining for Active Defense
Introduction
Related Work
Architecture
A Data Mining–Based Malware Detection Model
21.4.1 Our Framework
21.4.2 Feature Extraction
21.4.2.1 Binary n-Gram Feature Extraction
21.4.2.2 Feature Selection
21.4.2.3 Feature Vector Computation
21.4.3 Training
21.4.4 Testing
Model-Reversing Obfuscations
21.5.1 Path Selection
21.5.2 Feature Insertion
21.5.3 Feature Removal
Experiments
Summary and Directions
References

Data Mining for Insider Threat Detection
Introduction
The Challenges, Related Work, and Our Approach
Data Mining for Insider Threat Detection
22.3.1 Our Solution Architecture
22.3.2 Feature Extraction and Compact Representation
22.3.3 RDF Repository Architecture
22.3.4 Data Storage
22.3.4.1 File Organization
22.3.4.2 Predicate Split (PS)
22.3.4.3 Predicate Object Split (POS)
22.3.5 Answering Queries Using Hadoop MapReduce
22.3.6 Data Mining Applications
Comprehensive Framework
Summary and Directions
References

Dependable Real-Time Data Mining
Introduction
Issues in Real-Time Data Mining
Real-Time Data Mining Techniques
Parallel, Distributed, Real-Time Data Mining
Dependable Data Mining
Mining Data Streams
Summary and Directions
References

Firewall Policy Analysis
Introduction
Related Work
Firewall Concepts
24.3.1 Representation of Rules
24.3.2 Relationship between Two Rules
24.3.3 Possible Anomalies between Two Rules
Anomaly Resolution Algorithms
24.4.1 Algorithms for Finding and Resolving Anomalies
24.4.1.1 Illustrative Example
24.4.2 Algorithms for Merging Rules
24.4.2.1 Illustrative Example of the Merge Algorithm
Summary and Directions
References

Conclusion to Part VII

Summary and Directions
Overview
Summary of This Book
Directions for Data Mining Tools for Malware Detection
Where Do We Go from Here?

Appendix A: Data Management Systems: Developments and Trends
Overview
Developments in Database Systems
Status, Vision, and Issues
Data Management Systems Framework
Building Information Systems from the Framework
Relationship between the Texts
Summary and Directions
References

Appendix B: Trustworthy Systems
Overview
Secure Systems
B.2.1 Overview
B.2.2 Access Control and Other Security Concepts
B.2.3 Types of Secure Systems
B.2.4 Secure Operating Systems
B.2.5 Secure Database Systems
B.2.6 Secure Networks
B.2.7 Emerging Trends
B.2.8 Impact of the Web
B.2.9 Steps to Building Secure Systems
Web Security
Building Trusted Systems from Untrusted Components
Dependable Systems
B.5.1 Overview
B.5.2 Trust Management
B.5.3 Digital Rights Management

Author(s)

Biography

Mehedy Masud is a postdoctoral fellow at the University of Texas at Dallas (UTD), where he earned his PhD in computer science in December 2009. He has published in premier journals and conferences, including IEEE Transactions on Knowledge and Data Engineering and the IEEE Data Mining Conference. He will be appointed as a research assistant professor at UTD in Fall 2012. Masud’s research projects include reactively adaptive malware; data mining for detecting malicious executables, botnets, and remote exploits; and cloud data mining. He has a patent pending on stream mining for novel class detection.

Latifur Khan is an associate professor in the computer science department at the University of Texas at Dallas, where he has been teaching and conducting research since September 2000. He received his PhD and MS degrees in computer science from the University of Southern California in August 2000 and December 1996, respectively. Khan is (or has been) supported by grants from NASA, the National Science Foundation (NSF), Air Force Office of Scientific Research (AFOSR), Raytheon, NGA, IARPA, Tektronix, Nokia Research Center, Alcatel, and the SUN academic equipment grant program. In addition, Khan is the director of the state-of-the-art DML@UTD, UTD Data Mining/Database Laboratory, which is the primary center of research related to data mining, semantic web, and image/video annotation at the University of Texas at Dallas. Khan has published more than 100 papers, including articles in several IEEE Transactions journals, the Journal of Web Semantics, and the VLDB Journal, and in conference proceedings such as IEEE ICDM and PKDD. He is a senior member of IEEE.

Bhavani Thuraisingham joined the University of Texas at Dallas (UTD) in October 2004 as a professor of computer science and director of the Cyber Security Research Center in the Erik Jonsson School of Engineering and Computer Science and is currently the Louis Beecherl Jr. Distinguished Professor. She is an elected Fellow of three professional organizations: the IEEE (Institute of Electrical and Electronics Engineers), the AAAS (American Association for the Advancement of Science), and the BCS (British Computer Society) for her work in data security. She received the IEEE Computer Society’s prestigious 1997 Technical Achievement Award for "outstanding and innovative contributions to secure data management." Prior to joining UTD, Thuraisingham worked for the MITRE Corporation for 16 years, which included an IPA (Intergovernmental Personnel Act) at the National Science Foundation as Program Director for Data and Applications Security. Her work in information security and information management has resulted in more than 100 journal articles, more than 200 refereed conference papers, more than 90 keynote addresses, and 3 U.S. patents. She is the author of ten books in data management, data mining, and data security.
