Signature Base Method Dataset Feature Reduction of Opcode Using Pre-Processing Approach

Mr. Bhushan P. Kinholkar

doi:10.17762/ijritcc.v3i12.5147

Published: Dec 31, 2015

Abstract

Malware can be defined as any type of malicious code that has the potential to harm a computer or network. To detect unknown malware families, the frequency of appearance of Opcode (Operation Code) sequences, obtained through dynamic analysis, is used. Opcode n-gram analysis is used to extract features from the inspected files, and the Opcode n-grams serve as features during the classification process, with the aim of identifying unknown malicious code. A support vector machine (SVM) is used to create a reference model, against which two methods of feature reduction are evaluated: area of intersect and subspace analysis. The SVM is configured to traverse the dataset, searching for Opcodes that have a positive impact on the classification of benign and malicious software. The dataset is constructed by representing each executable file as a set of Opcode density histograms. Classification tasks involve separating the dataset into training and test data. The training sets are classified into benign and malicious software. In area of intersect, the characteristics of benign and malicious Opcodes are plotted as normal distributions and grouped into density curves for a single Opcode; the key feature to note is the overlapping area of the two density curves. In subspace analysis, the importance of individual Opcodes is investigated through the eigenvalues and eigenvectors in the subspace; PCA is used for data compression and mapping. The Opcodes filtered by the eigenvectors coincide with the Opcodes chosen by the SVM.
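
The Opcode density histogram and the area of intersect described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`opcode_density`, `area_of_intersect`), the numerical integration scheme, and the toy density values are all assumptions made for illustration. The sketch assumes each class's densities for a single Opcode are modeled as a normal distribution fitted from the sample mean and standard deviation, and that the overlap is the area under the lower of the two curves.

```python
import math
from collections import Counter
from statistics import mean, stdev


def opcode_density(opcodes):
    """Return each opcode's share of the total opcode count in one file."""
    counts = Counter(opcodes)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_of_intersect(benign, malicious, steps=10_000):
    """Approximate the overlapping area of two fitted normal curves.

    benign / malicious: per-file densities of ONE opcode in each class.
    Uses a midpoint Riemann sum of min(pdf_benign, pdf_malicious)
    over a range of six standard deviations around both means.
    """
    mu_b, sd_b = mean(benign), stdev(benign)
    mu_m, sd_m = mean(malicious), stdev(malicious)
    lo = min(mu_b - 6 * sd_b, mu_m - 6 * sd_m)
    hi = max(mu_b + 6 * sd_b, mu_m + 6 * sd_m)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(normal_pdf(x, mu_b, sd_b), normal_pdf(x, mu_m, sd_m))
    return total * dx


# Hypothetical toy data: densities of a single opcode across a few files.
benign_mov = [0.30, 0.32, 0.31, 0.29]
malicious_mov = [0.20, 0.22, 0.19, 0.21]
overlap = area_of_intersect(benign_mov, malicious_mov)
```

Under these assumptions, an overlap close to 1 means the two classes use the Opcode almost identically, while an overlap close to 0 means the two density curves barely intersect, which would presumably make that Opcode a stronger candidate to keep as a feature.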

How to Cite

Kinholkar, B. P. (2015). Signature Base Method Dataset Feature Reduction of Opcode Using Pre-Processing Approach. International Journal on Recent and Innovation Trends in Computing and Communication, 3(12), 6813–6819. https://doi.org/10.17762/ijritcc.v3i12.5147