Automatic Labelling and Document Clustering for Forensic Analysis

Ms. Raksha K. Mundhe, Prof. Ankush Maind

Abstract

In computer forensic analysis, retrieved data typically consists of unstructured text, which is difficult for examiners to analyse manually. The proposed approach makes the analysis systematic: the unstructured data is given structure by applying well-known clustering algorithms together with an automatic cluster labelling method. Indexing is performed on txt, doc, and pdf files, and the number of clusters is estimated automatically, with a label assigned to each cluster. The approach uses the DBSCAN and K-means algorithms, which make it easier to retrieve the information most relevant to a forensic investigation. Automated methods of analysis are therefore of great interest; in particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. Two labelling methods are applied to the document clusters. The first uses a χ² test of significance to detect different word usage across categories in the hierarchy; this test is well suited to checking dependencies when count data is available. The second selects words that both occur frequently in a cluster and effectively discriminate that cluster from the others. Finally, we present and discuss several practical results that may be useful to researchers in forensic analysis.
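
The two labelling ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy clusters, the 2x2 document-count contingency table, and the scoring formula p(word | cluster) × p(word | cluster) / p(word) for the second method are all assumptions chosen for illustration, and the clusters are taken as already produced by some clustering step.

```python
from collections import Counter

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 count table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def label_by_chi_square(clusters, top_k=2):
    """Label each cluster with the terms most strongly associated with it.

    clusters: dict mapping a cluster name to a list of tokenized documents.
    """
    # Number of documents containing each term, per cluster and overall.
    doc_freq = {name: Counter(t for doc in docs for t in set(doc))
                for name, docs in clusters.items()}
    sizes = {name: len(docs) for name, docs in clusters.items()}
    total_docs = sum(sizes.values())
    total_df = Counter()
    for df in doc_freq.values():
        total_df.update(df)

    labels = {}
    for name, df in doc_freq.items():
        scores = {}
        for term, a in df.items():
            b = sizes[name] - a                  # in cluster, term absent
            c = total_df[term] - a               # elsewhere, term present
            d = total_docs - sizes[name] - c     # elsewhere, term absent
            if a * d > b * c:                    # keep over-represented terms only
                scores[term] = chi_square_2x2(a, b, c, d)
        labels[name] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return labels

def label_by_frequency_and_discrimination(clusters, top_k=2):
    """Label each cluster with terms that are frequent locally and rare globally.

    Assumed score: p(term | cluster) * p(term | cluster) / p(term).
    """
    term_freq = {name: Counter(t for doc in docs for t in doc)
                 for name, docs in clusters.items()}
    total_tf = Counter()
    for tf in term_freq.values():
        total_tf.update(tf)
    grand_total = sum(total_tf.values())

    labels = {}
    for name, tf in term_freq.items():
        cluster_total = sum(tf.values())
        scores = {}
        for term, count in tf.items():
            p_local = count / cluster_total
            p_global = total_tf[term] / grand_total
            scores[term] = p_local * (p_local / p_global)
        labels[name] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return labels

# Hypothetical toy clusters (already tokenized), for illustration only.
clusters = {
    "cluster_0": [["invoice", "payment", "bank"],
                  ["payment", "bank", "transfer"],
                  ["invoice", "payment", "report"]],
    "cluster_1": [["password", "login", "server"],
                  ["login", "server", "report"],
                  ["password", "login", "breach"]],
}

print(label_by_chi_square(clusters, top_k=1))
print(label_by_frequency_and_discrimination(clusters, top_k=1))
```

In this toy data the word "report" appears equally often in both clusters, so the χ² filter drops it and the second method gives it a low score. A real pipeline would also need tokenization, stop-word removal, and a significance threshold for the χ² statistic, none of which are shown here.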

How to Cite
Mundhe, R. K., & Maind, A. (2014). Automatic Labelling and Document Clustering for Forensic Analysis. International Journal on Recent and Innovation Trends in Computing and Communication, 2(9), 2934–2941. https://doi.org/10.17762/ijritcc.v2i9.3325