Cloudera and Hortonworks merger means Hadoop’s influence is declining

Nitin Naresh October 6, 2018

On Wednesday, Cloudera and Hortonworks announced a “merger of equals,” where Cloudera is acquiring Hortonworks with stock so that Cloudera shareholders end up with 60 percent of the combined company. The deal signifies that the Hadoop market could no longer sustain two big competitors. Hadoop has been synonymous with big data for years, but the market — and customer needs — have moved on. Several megatrends are driving this change:

The public cloud tide is rising

The first megatrend is the shift to public cloud. Companies of all sizes are increasing their adoption of AWS, Azure, and Google Cloud services at the expense of on-premises infrastructure and software. Enterprise server revenues reported by IDC and Gartner continue to decline. The top three cloud providers (90 percent of the market) offer their own managed Hadoop/Spark services, such as Amazon’s Elastic MapReduce (EMR). These are fully integrated offerings that have a lower cost of acquisition and are cheaper to scale. If you’re making the shift to cloud, it makes sense to look at alternative Hadoop offerings as part of that – it’s a natural decision point. Ironically, there has been no Cloud Era for Cloudera.

Crushing storage costs

The second megatrend? Cloud storage economics are crushing Hadoop storage costs. At introduction in 2005, the Hadoop Distributed File System (HDFS) was revolutionary: It took servers with ordinary hard drives and turned them into a distributed storage system capable of parallel I/O consumable by Java apps. There was nothing like it, and it was a crucial component that allowed large-scale data sets that didn’t fit onto a single machine to be processed in parallel. But that was 13 years ago. Today, there is a plethora of much cheaper alternatives, primarily object storage services like Amazon S3, Azure Blob Storage, and Google Cloud Storage. A terabyte of cloud object storage costs about $20 a month, compared to about $100 a month for HDFS (not including the cost to operate it). That is why Google’s HDFS service, for example, is merely a shim that translates HDFS operations into object storage operations – because that’s 5x cheaper.
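
The storage arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the two per-terabyte rates are the approximate figures quoted in the paragraph, while the function names and the 500 TB data set size are hypothetical and chosen purely for illustration.

```python
# Approximate monthly rates quoted in the article (US dollars per terabyte).
OBJECT_STORAGE_USD_PER_TB_MONTH = 20.0
HDFS_USD_PER_TB_MONTH = 100.0  # excludes the cost of operating the cluster


def monthly_storage_cost(terabytes: float, usd_per_tb_month: float) -> float:
    """Return the monthly storage bill in US dollars for a data set."""
    return terabytes * usd_per_tb_month


def cost_ratio(expensive_rate: float, cheap_rate: float) -> float:
    """Return how many times more expensive one rate is than the other."""
    return expensive_rate / cheap_rate


# Hypothetical data set size, used only to make the comparison concrete.
dataset_tb = 500.0

hdfs_bill = monthly_storage_cost(dataset_tb, HDFS_USD_PER_TB_MONTH)
object_bill = monthly_storage_cost(dataset_tb, OBJECT_STORAGE_USD_PER_TB_MONTH)
ratio = cost_ratio(HDFS_USD_PER_TB_MONTH, OBJECT_STORAGE_USD_PER_TB_MONTH)

print(f"HDFS:           ${hdfs_bill:,.0f} per month")
print(f"Object storage: ${object_bill:,.0f} per month")
print(f"HDFS is {ratio:.0f}x the cost of object storage")
```

The ratio does not depend on the data set size, which is why the 5x figure holds at any scale under these assumed rates.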

Faster, better, and cheaper cloud databases

Hadoop’s problems don’t end there, because it’s not just about direct competition from cloud-vendor Hadoop/Spark services and cheaper storage. The third megatrend is the advent of “serverless” cloud services that completely eliminate the need to run Hadoop or Spark at all. A common use case for Spark is to handle ad-hoc distributed SQL queries for users. Google was first to market with a revolutionary service called BigQuery in 2011 that solves the same problem in a completely different way. It lets you run ad-hoc queries on any amount of data stored in its object storage service (you don’t have to load it into special storage like HDFS). You just pay for the compute time: If you need 1,000 cores for 3.5 seconds to run your query, that’s all you pay for. There is no need to provision servers, install the OS, install software, configure everything, scale the cluster to 1,000 nodes, and feed and care for the cluster as you would with Hadoop/Spark. Google does all that, hence the moniker “serverless.” There are banks running 2,000-node Hadoop/Spark clusters operated and maintained by scores of IT people that can’t match BigQuery’s flexibility, speed, and scale. And they have to pay for all the hardware, software, and people to run and maintain Hadoop.

BigQuery is just one example. Other cloud database services are similarly massive-scale, highly flexible, globally distributed “pay for what you use” databases. Examples include the start-up Snowflake, Google Cloud Bigtable, Amazon Aurora, and Microsoft Azure Cosmos DB. They’re all much easier to use than a Hadoop/Spark install, and you can be up and running in 5 minutes for tens of dollars – no $500k purchase order and weeks of installation, configuration, and training required.
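
To make the pay-per-use example above concrete (1,000 cores for 3.5 seconds), the following sketch compares the compute consumed by a single query with the compute reserved by a cluster that stays on all month. The price per core-hour is a made-up placeholder, not a published rate, and real services bill in their own units, so treat the dollar amounts as illustrative only.

```python
# Hypothetical price, used for illustration only; not any vendor's real rate.
ASSUMED_USD_PER_CORE_HOUR = 0.05
HOURS_PER_MONTH = 730  # average number of hours in a month


def core_seconds(cores: int, seconds: float) -> float:
    """Total compute consumed, measured in core-seconds."""
    return cores * seconds


def usage_cost(cores: int, seconds: float, usd_per_core_hour: float) -> float:
    """Cost of paying only for the core-seconds actually used."""
    return core_seconds(cores, seconds) / 3600 * usd_per_core_hour


def always_on_cost(cores: int, hours: float, usd_per_core_hour: float) -> float:
    """Cost of keeping the same number of cores provisioned around the clock."""
    return cores * hours * usd_per_core_hour


query = usage_cost(1000, 3.5, ASSUMED_USD_PER_CORE_HOUR)
cluster = always_on_cost(1000, HOURS_PER_MONTH, ASSUMED_USD_PER_CORE_HOUR)

print(f"One query uses {core_seconds(1000, 3.5):,.0f} core-seconds")
print(f"Pay-per-use cost of that query: ${query:.4f}")
print(f"Always-on 1,000-core cluster:   ${cluster:,.2f} per month")
```

Under these assumptions the gap comes from utilization: a provisioned cluster is paid for whether or not queries are running, while the pay-per-use model charges only for the seconds consumed.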

Python and R data science running on containers and Kubernetes

The fourth megatrend is containers and Kubernetes. Hadoop/Spark is not just a storage environment but also a compute environment. Again, back in 2005, this was revolutionary – the MapReduce approach of Hadoop provided a framework for parallel computation of Java applications. But the Java-centric nature (Scala-centric for Spark) of Cloudera and Hortonworks infrastructure is at odds with today’s data scientists doing machine learning in Python and R. The need to constantly iterate and improve machine learning models and to have them learn on production data means native deployment of Python and R models is a necessity, not a “nice to have.”

As recently as this week, the big Hadoop vendors’ advice has been “translate Python/R code into Scala/Java,” which sounds like King Hadoop commanding the Python/R machine learning tide to go back out again. Containers and Kubernetes work just as well with Python and R as they do with Java and Scala, and provide a far more flexible and powerful framework for distributed computation. And it’s where software development teams are heading anyway – they’re not looking to distribute new microservice applications on top of Hadoop/Spark. That approach is too complicated and limiting.
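
For readers unfamiliar with the MapReduce model mentioned above, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is written in plain Python for illustration and does not use Hadoop or any Hadoop API; all function names are hypothetical. In a real cluster, the map and reduce stages would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between stages."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over a set of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data", "big clusters"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both stages can be spread across machines, which is the property that made the model attractive for data sets too large for one server.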

A shift in data gravity

The net result is that after a good 10 years of Cloudera and Hortonworks being the center of the Big Data universe, the center of gravity has moved elsewhere. The leading cloud companies don’t run large Hadoop/Spark clusters from Cloudera and Hortonworks – they run distributed cloud-scale databases and applications on top of container infrastructure. They do their machine learning in Python, R, and other languages that are not Java. Increasingly, enterprises are shifting to similar approaches because they want to reap the same speed and scale benefits. It’s time for the Hadoop and Spark world to move with the times.

Source: VentureBeat