By: infotechalive July 30, 2022
Data analysts are seeing an increasing number of inquiries about “big data” but not so many about “Hadoop”. This is hardly a surprise: Cloudera announced its Enterprise Data Hub in 2014, and it has now been 8 years since then. Marketing built around Hadoop has gradually moved on to embrace Spark, Kafka, NiFi and other open source projects.
The numbers do not tell us what the Hadoop inquiries are about, but that aspect has changed too. There has been emerging interest in SQL access to Hadoop, and in SQL access to object stores as well. Products have been launched that promise better performance than querying unoptimized data, along with acceleration technology built on open source.
Apache Hadoop Is Still Fighting On
Spark has taken over the majority of compute workloads. However, more than half of the inquiries are directed to analysts who work outside the data management team. The demarcation between compute and storage is showing up in inquiries that were earlier only about Hadoop. The issues being faced now are different, and so are the teams dealing with them. Compute enthusiasts expect the stores that data teams provide to them to be accessible through open APIs. They further expect the marketplace to offer storage-layer products that can easily accommodate these expectations.
Hadoop 3.3 was released in 2020, and there have been multiple updates since. The Apache site still presents the project in terms of MapReduce, HDFS and YARN, and it retains value and a significant installed base. The next steps are already visible.
MapReduce is no longer a preferred tool. HDFS faces multiple competitors at the storage layer. YARN is rarely found outside Hadoop. The other open source resource management tools are competing with each other in a declining on-prem landscape.