2012 Red Hat Summit: Red Hat's Big Data Strategy Overview & Optimizing Apache Hadoop

Organizations continue to amass data at an exponential clip, and traditional data processing and analytics have failed to keep up. Hadoop is central to big data: Apache Hadoop is a popular implementation of the MapReduce framework for distributed computing on large data sets using computer clusters. However, Hadoop has several known limitations: its scheduler and scheduling policies handle multi-tenancy poorly, it partitions resources coarsely (via slots), and it struggles in heterogeneous environments. Furthermore, not all workloads are suitable for MapReduce, which introduces inefficiencies in job performance and prevents the use of alternate frameworks in the cluster. All of this significantly impacts job throughput, job performance, resource utilization, and resource flexibility. Red Hat Enterprise MRG Grid, and Condor, on which it is based, have a proven record of job and resource scalability, job throughput, and job performance in heterogeneous environments at some of the world's largest grids.

In this session, we'll start with an overview of Red Hat's approach to big data. We'll then make the case for using Red Hat Enterprise MRG Grid with Apache Hadoop to address some of Apache Hadoop's limitations. In addition, we'll discuss an architectural framework for implementing Red Hat Enterprise MRG Grid in an Apache Hadoop deployment and present some examples. We'll conclude with a summary of the benefits of using Red Hat Enterprise MRG Grid with Apache Hadoop and explore some future possibilities in this area.
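For readers unfamiliar with the MapReduce model the abstract refers to, here is a minimal word-count sketch in plain Python. This is purely illustrative and is not Hadoop code: Hadoop distributes the map, shuffle, and reduce phases across a cluster, while this sketch runs them in a single process.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # would do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "data processing"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 2, "data": 2, "clusters": 1, "processing": 1}
```

The slot-based resource partitioning the abstract criticizes refers to Hadoop (as of 2012) statically dividing each node into fixed map and reduce slots, regardless of how heavy the individual tasks actually are.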