Hadoop Configuration Tuning for Performance Optimization

  • Christian Christian, Swiss German University
  • Kho I Eng, Swiss German University
  • Heru Purnomo Ipung, Swiss German University
Keywords: Apache Hadoop, Computer Cluster, Configuration tuning, Terasort Benchmark

Abstract

Configuration parameter tuning is an essential part of implementing a Hadoop cluster. Each configuration parameter affects the overall performance of the cluster. We therefore need to learn the characteristics of each parameter and understand its impact on hardware utilization in order to arrive at an optimal configuration. In this paper, we conduct experiments in which we modify the configuration and run a benchmark to find out whether there is any performance gain. The benchmark is run with TeraSort; we measure the time needed to sort a data set and the CPU utilization during the benchmark. From our experiments, we conclude that tuning the configuration yields significant performance improvements. However, the results may vary between cluster configurations.
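
As an illustration of the kind of experiment described above, the following is a minimal sketch of timing a TeraSort run under a set of configuration overrides. It is not the authors' harness. The helper names, the jar path, the HDFS directories, and the choice of properties (`mapreduce.job.reduces`, `mapreduce.task.io.sort.mb`) are assumptions made for the example; the abstract does not say which parameters were tuned. CPU utilization is not measured here and would need a separate monitoring tool.

```python
import subprocess
import time


def build_terasort_command(examples_jar, input_dir, output_dir, overrides):
    """Build a `hadoop jar <jar> terasort` command line.

    `overrides` maps Hadoop property names to values; each one is passed
    as a -Dkey=value generic option ahead of the input and output paths.
    """
    cmd = ["hadoop", "jar", examples_jar, "terasort"]
    for key, value in sorted(overrides.items()):
        cmd.append(f"-D{key}={value}")
    cmd += [input_dir, output_dir]
    return cmd


def time_command(cmd):
    """Run a command and return (elapsed wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, check=False)
    return time.perf_counter() - start, result.returncode


# Hypothetical candidate settings to compare (assumed values, not from the paper).
CANDIDATES = [
    {"mapreduce.job.reduces": 4, "mapreduce.task.io.sort.mb": 100},
    {"mapreduce.job.reduces": 8, "mapreduce.task.io.sort.mb": 200},
]


def run_experiments(examples_jar, input_dir, output_prefix):
    """Time one TeraSort run per candidate; each run gets its own output dir."""
    timings = []
    for i, overrides in enumerate(CANDIDATES):
        cmd = build_terasort_command(
            examples_jar, input_dir, f"{output_prefix}-{i}", overrides
        )
        elapsed, code = time_command(cmd)
        timings.append((overrides, elapsed, code))
    return timings
```

Calling `run_experiments` requires a working Hadoop installation and input data previously generated with TeraGen. In practice each setting would be repeated several times, since a single run can be skewed by caching and cluster load.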

Published
2017-04-25
How to Cite
Christian, C., I Eng, K., & Ipung, H. P. (2017). Hadoop Configuration Tuning for Performance Optimization. Journal of Applied Information, Communication and Technology, 4(1), 31-40. https://doi.org/10.33555/ejaict.v4i1.81