Multi-Split HDFS Technique for Improving Data Confidentiality in Big Data Replication
Big data and security are major challenges for the performance of cloud storage systems. The Hadoop Distributed File System (HDFS) and other distributed file systems (DFS) are widely used to store big data. HDFS replicates data and saves multiple copies to achieve availability and reliability, but this replication increases storage and resource consumption. In previous work, the Redundant Independent Files (RIF) approach was used to reduce storage and resource consumption in big data replication. In this paper, the Secure Distributed Redundant Independent Files (SDRIF) approach addresses issues found with the RIF approach; in particular, it introduces the data confidentiality that RIF lacks. It works similarly to RIF, but the generated parity is not stored in one separate file; instead, the parity blocks are distributed among all four data parts. SDRIF is built over cloud providers (CP), and CPSDRIF is the model produced by combining SDRIF with CP. CPSDRIF provides data confidentiality and data security (integrity and availability) through a multi-split HDFS technique that distributes the parity blocks and reduces the SDRIF block size to the HDFS block size. Experimental results for the CPSDRIF system using the TeraGen benchmark show that data confidentiality with CPSDRIF is improved compared to CPRIF. In addition, CPSDRIF reduces storage space by 33.3% compared to the other models, and improves data writing by 34% and reading by about 31%.
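To make the distributed-parity idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract states only that parity blocks are spread across all four data parts rather than kept in one separate parity file; the XOR parity, the fixed block size, and the rotating (RAID-5-like) parity placement below are assumptions chosen for illustration.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together to form one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def split_with_distributed_parity(data, num_parts=4, block_size=4):
    """Split data into stripes of (num_parts - 1) blocks, compute one
    parity block per stripe, and rotate the parity position across the
    parts so no single part holds all the parity (an assumed scheme)."""
    parts = [[] for _ in range(num_parts)]
    stride = (num_parts - 1) * block_size
    if len(data) % stride:                      # pad to whole stripes
        data += b"\x00" * (stride - len(data) % stride)
    for s in range(0, len(data), stride):
        stripe = [data[s + i * block_size: s + (i + 1) * block_size]
                  for i in range(num_parts - 1)]
        p = xor_parity(stripe)
        pos = (s // stride) % num_parts          # rotating parity slot
        blocks = stripe[:pos] + [p] + stripe[pos:]
        for part, blk in zip(parts, blocks):
            part.append(blk)
    return parts

def recover_block(parts, stripe_idx, lost_part, num_parts=4):
    """Rebuild one lost block of a stripe by XOR-ing the survivors:
    since parity = XOR of the data blocks, XOR-ing any three of the
    four blocks reproduces the fourth."""
    survivors = [parts[i][stripe_idx]
                 for i in range(num_parts) if i != lost_part]
    return xor_parity(survivors)
```

In this sketch each of the four parts holds a mix of data and parity blocks, so losing any one part is recoverable (availability), while no single part exposes the plaintext stream in order, which is the intuition behind the confidentiality gain the abstract claims.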