Cross-Layer Fragment Indexing based File Deduplication using Hyper Spectral Hash Duplicate Filter (HSHDF) for Optimized Cloud Storage

K. Geetha
A. Vijaya

Abstract

Cloud storage is a widely used service that maintains large volumes of data on centralized servers, letting users store and retrieve data under a pay-per-use service model. Storage consumption grows when duplicate copies of files accumulate across different scenarios, and the increased size leads to increased cost. To resolve this problem, we propose a Cross-Layer Fragment Indexing (CLFI) based file deduplication scheme using a Hyper Spectral Hash Duplicate Filter (HSHDF) for optimized cloud storage. Initially, file storage indexing is carried out with a Lexical Syntactic Parser (LSP), which splits the files into blocks. A comparative sector is then created based on chunk staking. Based on the file frequency weight, the relative indexing is verified through Cross-Layer Fragment Indexing (CLFI). The fragmented index is then grouped by the maximum relative threshold margin using Intra Subset Near-Duplicate Clusters (ISNDC). Hashing is applied to obtain comparative index points, based on a hyper correlation comparer, using the Hyper Spectral Hash Duplicate Filter (HSHDF). This filter identifies duplicates by comparing near-duplicate content according to differences in file content. The proposed system achieves higher performance and a higher precision rate than the other methods it is compared against, and it optimizes cloud storage.
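
The abstract does not give implementation details for LSP, CLFI, ISNDC, or HSHDF, so the following is only a minimal sketch of the general idea it builds on: split files into blocks, fingerprint each block with a hash, store each distinct block once, and compare files by the overlap of their fingerprints. Everything in it is an assumption made for illustration and not the authors' method: fixed-size chunking, SHA-256 fingerprints, Jaccard similarity as the near-duplicate measure, and the hypothetical names `ChunkStore`, `split_into_chunks`, `fingerprint`, and `CHUNK_SIZE`.

```python
import hashlib

# Hypothetical block size, kept tiny so the behaviour is easy to follow.
CHUNK_SIZE = 4


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Return a content hash that identifies a block."""
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Stores each distinct block once and keeps a per-file list of fingerprints."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}      # fingerprint -> block bytes
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints

    def put(self, name: str, data: bytes) -> int:
        """Store a file; return how many blocks were new to the store."""
        recipe = []
        new_blocks = 0
        for chunk in split_into_chunks(data):
            fp = fingerprint(chunk)
            if fp not in self.chunks:
                self.chunks[fp] = chunk  # first time this content is seen
                new_blocks += 1
            recipe.append(fp)
        self.files[name] = recipe
        return new_blocks

    def get(self, name: str) -> bytes:
        """Rebuild a file from its stored blocks."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def similarity(self, a: str, b: str) -> float:
        """Jaccard similarity of two files' fingerprint sets (1.0 = same blocks)."""
        sa, sb = set(self.files[a]), set(self.files[b])
        if not sa and not sb:
            return 1.0
        return len(sa & sb) / len(sa | sb)
```

With this sketch, storing `b"aaaabbbbcccc"` and then `b"aaaabbbbdddd"` adds three blocks for the first file and only one for the second, and `similarity` between the two files is 0.5. A near-duplicate filter could flag file pairs whose similarity exceeds a chosen threshold; the threshold and the way clusters are formed are left open here because the abstract does not specify them.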

How to Cite
Geetha, K., & Vijaya, A. (2023). Cross-Layer Fragment Indexing based File Deduplication using Hyper Spectral Hash Duplicate Filter (HSHDF) for Optimized Cloud Storage. International Journal on Recent and Innovation Trends in Computing and Communication, 11(8s), 565–575. https://doi.org/10.17762/ijritcc.v11i8s.7239
