Big Data a $50 Billion Market by 2017?

The world is buzzing about Big Data, which raises the question: “How big is the Big Data market?” Unable to find any market size information, Wikibon, an open-source-style community of industry analysts, kicked off a project to size and forecast the market and report on market shares.

The study, How Big is the Big Data Market?, written by Jeff Kelly with David Vellante and David Floyer, looks at who is who in the Big Data space today, who is innovating and which companies are jockeying for position.

Wikibon defines Big Data to include data sets whose size and type make them impractical to process and analyze with traditional database technologies and related tools. The Big Data market, therefore, includes those technologies, tools and services designed to address these shortcomings, including:

- Hadoop distributions, software, subprojects and related hardware;
- Next gen data warehouses and related hardware;
- Data integration tools and platforms as applied to Big Data;
- Big Data analytic platforms, applications and data visualization tools;
- Big Data support, training and professional services.

Highlights from the study show that the market leaders are IBM, Intel, and HP. These mega-vendors, the study said, will face increased competition from established enterprise suppliers as well as big data pure-plays, like Vertica, Splunk and Cloudera, that are developing big data technologies around Hadoop and the use cases that are driving the market.

While IT heavyweights IBM and Intel currently lead the big data market in overall revenue, this is mainly due to their breadth of offerings and entrenchment in many enterprise data centers, the study said.

Most of the “impactful” innovations are coming from the many small pure-play vendors. While not all will succeed in the long term, and some have yet to deliver any significant revenue, Wikibon said it expects many of these vendors to grow quickly. But as their offerings, support services, and sales channels mature, they will also become takeover targets, as was the case with Vertica (HP), Aster Data (Teradata), and Greenplum (EMC).

What follows is a listing of some of the bigger independent players and what they play with:

Hadoop distributions – Cloudera and Hortonworks are responsible for the majority of contributions to the Apache Hadoop project that are significantly improving the open source big data framework’s performance capabilities and enterprise-readiness.

Cloudera contributes significantly to Apache HBase, the Hadoop-based non-relational database that allows for quick, low-latency lookups. Hortonworks’ engineers are working on a next-generation MapReduce architecture that promises to increase the maximum Hadoop cluster size beyond its current practical limit of 4,000 nodes.

Next-gen data warehousing – The three leading and, until recently, independent next-generation data warehouse vendors are Vertica, Greenplum, and Aster Data.

Big Data analytics platforms and applications – A handful of up-and-coming vendors are developing applications and platforms that leverage the underlying Hadoop infrastructure to provide both data scientists and “regular” business users with easy-to-use tools for experimenting with big data.

These include Datameer, which has developed a Hadoop-based business intelligence platform with a familiar spreadsheet-like interface; Karmasphere, whose platform allows data scientists to perform ad hoc queries on Hadoop-based data via a SQL interface; and Digital Reasoning, whose Synthesis platform sits on top of Hadoop to analyze text-based communication.

Big Data-as-a-Service (BDaaS) for SMBs – BDaaS is developing rapidly thanks to vendors such as Tresata, 1010data and ClickFox. Tresata’s cloud-based platform, for example, leverages Hadoop to process and analyze large volumes of financial data and returns results via on-demand visualizations for banks, financial data companies, and other financial services companies.

1010data offers a cloud application that allows business users and analysts to manipulate data in the familiar spreadsheet format but at big data scale. And the ClickFox platform mines large volumes of customer touch-point data to map the total customer experience with visuals and analytics delivered on-demand.

Non-Hadoop Big Data platforms

Other non-Hadoop vendors contributing significant innovation to the big data landscape include:

Splunk, which specializes in processing and analyzing log file data so that administrators can monitor IT infrastructure performance and identify bottlenecks and other disruptions to service; HPCC Systems, a spin-off of LexisNexis, which offers a big data framework that competes with Hadoop and that its engineers built internally over the last ten years to help the company process and analyze large volumes of data for its clients in finance, utilities and government; and DataStax, which offers a commercial version of the open source Apache Cassandra NoSQL database, along with related support services, bundled with Hadoop.

Enterprises should keep a close eye on these and other big data pure-plays as they continue to develop innovative but practical big data platforms, applications and services.