Optimization Scheme for Storing and Accessing Huge Number of Small Files on HADOOP Distributed File System


L. Prasanna Kumar, Sampathirao Suneetha

Abstract

Hadoop is a distributed framework that uses a simple programming model to process huge datasets across a network of computers. Hadoop is used across multiple machines to store very large files, normally in the range of gigabytes to terabytes. HDFS provides high-throughput access for applications with huge datasets. In the Hadoop Distributed File System (HDFS), a small file is one that is smaller than 64 MB, the default HDFS block size. Hadoop performs better with a small number of large files than with a huge number of small files. Many organizations, such as financial firms, need to handle a large number of small files daily, and low performance and high resource consumption are the bottlenecks of the traditional approach. To reduce the processing time and memory required to handle a large set of small files, an efficient solution is needed that makes HDFS work better for large volumes of small files. Such a solution should combine many small files into a large file and treat each combined file as an individual file. It should also be able to store these large files in HDFS and retrieve any small file when needed.
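The abstract does not fix a particular implementation, but one common way to realize such a merge-and-retrieve scheme is to pack the small files into a Hadoop SequenceFile keyed by file name, so that HDFS stores a single large file while individual files remain addressable. The sketch below assumes the Hadoop 2.x SequenceFile API; the class name SmallFilePacker, the local file paths, and the linear-scan lookup are illustrative choices, not the paper's actual method.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {

    // Pack a set of local small files into one SequenceFile on HDFS,
    // keyed by file name so each small file can be looked up later.
    public static void pack(Configuration conf, Path merged, String... localFiles)
            throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(merged),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (String local : localFiles) {
                byte[] bytes = Files.readAllBytes(Paths.get(local));
                writer.append(new Text(local), new BytesWritable(bytes));
            }
        }
    }

    // Scan the merged file and return the contents of one small file by name.
    public static byte[] retrieve(Configuration conf, Path merged, String name)
            throws IOException {
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(merged))) {
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            while (reader.next(key, value)) {
                if (key.toString().equals(name)) {
                    return value.copyBytes();
                }
            }
        }
        return null; // requested small file not present in the merged file
    }
}

In practice the retrieval step could avoid a full scan by keeping an index of file names and offsets (for example, a MapFile instead of a plain SequenceFile), which is closer in spirit to the optimization the abstract calls for.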

Article Details

How to Cite
Kumar, L. P., & Suneetha, S. (2016). Optimization Scheme for Storing and Accessing Huge Number of Small Files on HADOOP Distributed File System. International Journal on Recent and Innovation Trends in Computing and Communication, 4(2), 315–319. https://doi.org/10.17762/ijritcc.v4i2.1816