Adaptive N-Gram Classifier for Privacy Protection

Libin Babu, Deepa S.S

Abstract

We live in a world where information is worth more than gold, so protecting sensitive information has become a crucial task. When telephones gave way to smartphones, people began using them not just as communication tools but also to work on the go and to immerse themselves in social networks and other private communication services such as chat and SMS. Every endpoint connected to the Internet is a potential risk, and not long ago such endpoints were only PCs or laptops. Traditional protection methods deal with this risk severely, restricting usage and, to some extent, the convenience of the user. Users knowingly or unknowingly release sensitive information onto the web, where it is monitored or mined by third parties who use it for unlawful purposes. Existing techniques mostly use data fingerprinting, exact and partial document matching, and statistical methods to classify sensitive data. Keyword-based methods are used when the target documents are less diverse, and they ignore the context of the keyword; statistical methods, on the other hand, ignore the content of the analyzed text. In this paper we propose a dynamic N-gram analyzer that can be used as a document classifier. We investigate the relationship between the size and quality of N-grams and the effect of other feature choices, such as excluding common N-grams and grammatical words and varying the N-gram size. A further improvement is a dynamic N-gram updater, which changes the N-gram feature vectors dynamically. Our work shows that these techniques fairly outperform the traditional methods, even when the categories exhibit frequent similarities.
DOI: 10.17762/ijritcc2321-8169.1506148
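
As an illustration of the general idea described in the abstract, the following is a minimal sketch of a character N-gram classifier with common N-gram exclusion and incremental profile updates. It is not the authors' implementation: the class name NGramClassifier, the helper ngrams, the choice of character trigrams, and the overlap-based scoring are all assumptions made for this example.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return overlapping character n-grams of lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NGramClassifier:
    """Hypothetical sketch: one n-gram frequency profile per category."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {}  # category name -> Counter of n-gram counts

    def update(self, category, text):
        # Profiles can be extended at any time, so the feature
        # vectors change as new labeled text arrives.
        self.profiles.setdefault(category, Counter()).update(ngrams(text, self.n))

    def _common(self):
        # N-grams present in every category do not help tell them apart.
        if len(self.profiles) < 2:
            return set()
        return set.intersection(*(set(p) for p in self.profiles.values()))

    def classify(self, text):
        common = self._common()
        grams = [g for g in ngrams(text, self.n) if g not in common]
        scores = {}
        for category, profile in self.profiles.items():
            # Normalize by profile size so large categories do not dominate.
            total = sum(c for g, c in profile.items() if g not in common) or 1
            scores[category] = sum(profile[g] for g in grams) / total
        return max(scores, key=scores.get)


clf = NGramClassifier(n=3)
clf.update("sensitive", "my credit card number and password are secret")
clf.update("public", "the weather today is sunny with light wind")
label = clf.classify("please send the password and card number")
```

Calling update again with new text for an existing category changes its profile in place, which is one simple way to approximate the dynamic updating the abstract mentions.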

How to Cite
Libin Babu, Deepa S.S (2015). Adaptive N-Gram Classifier for Privacy Protection. International Journal on Recent and Innovation Trends in Computing and Communication, 3(6), 4218–4222. https://doi.org/10.17762/ijritcc.v3i6.4624