Big Data Vendor Roundup

The latest hot trend is Big Data. But you knew that already, having heard the hype, oh, just about everywhere. So now you’re ready to take a look at the vendor lineup and, hopefully, pick a winner. Let’s break the contenders down, then: who they are, what they’ve got, and how they differ.

As is always the case with any tech, define what you have on hand and what you want to achieve before you go vendor shopping.

“Now if you reckon you have some decent quality data already in place, and a business outcome defined, this allows you to at least now focus on specific vendors who are specialized in this space,” advised Zane Moi, co-Founder and CEO of TreeCrunch, a Big Data analytics and conversation platform for brands.

Why is this important? Because Big Data, like the Cloud before it, is ill-defined, which makes it incredibly easy to buy a “pig in a poke” and still find yourself pitifully short on bacon.

“Big Data is a very broad term, which encompasses a broad series of, sometimes, mutually exclusive activities,” cautioned Moi. “So the first two things the CIO has to think about is, what are the business outcomes that they want to achieve and where are they currently, from a master data management perspective, to be able to actually achieve this business vision.”

Big Data marketscape takes shape

According to Wikibon, the Big Data market stands at a little over $5 billion and will grow beyond $50 billion in the next five years. Wikibon.org is a professional, open source community that works at solving technology and business problems through sharing free advisory knowledge. It was founded by former IDC, Meta and Gartner analysts.
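To put that forecast in perspective, the implied annual growth rate can be worked out with a few lines of arithmetic. The sketch below assumes the rounded figures of $5 billion today and $50 billion in five years (Wikibon's own numbers are "a little over" and "beyond" those values), and the function name `implied_cagr` is purely illustrative.

```python
def implied_cagr(start_value, end_value, years):
    """Return the compound annual growth rate implied by two values.

    Assumes smooth compounding: end = start * (1 + rate) ** years.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures from the article, in billions of dollars (an assumption).
rate = implied_cagr(5.0, 50.0, 5)
print(f"Implied annual growth: {rate:.1%}")  # prints "Implied annual growth: 58.5%"
```

In other words, a tenfold increase over five years means the market would have to grow by well over half its size every single year.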

A Wikibon report by Jeff Kelly with David Vellante and David Floyer says that only five percent of the current vendors are pure-plays; the usual big players in enterprise tech, such as IBM, Intel, HP, Oracle, Teradata, Fujitsu, and others, account for 95 percent of the overall revenue.

There are two things to immediately think about here. The mega vendors are going to be hard pressed to be as agile and innovative as the smaller pure-plays. But the pure-plays are facing plenty of pressure, too. “It is incumbent upon Hadoop-focused pure-plays, however, to establish a profitable business model for commercializing the open source framework and related software, which to date has been elusive,” said Kelly in the Wikibon report.

This creates the usual conundrum for tech buyers: go with the more innovative smaller players, who may crash and burn financially at some point, or go with the financially secure mega vendors, whose products may prove ineffectual and inefficient in the end. Because this market is booming, many new pure-plays will also enter the field and many mega vendors will be on the prowl to acquire the best of the lot. Indeed, mega vendors started buying independent pure-plays in big gulps last year and are continuing an M&A growth strategy. Thus turmoil will be the order of the day for a while yet, until the market matures and settles down.

“As Big Data becomes an increasingly big theme for many companies, the corresponding Big Data vendor landscape is developing in lock step,” said Dan Kearnan, senior director of Data Warehouse Marketing, business analytics at SAP. “However, navigating the Big Data vendor landscape can be an overwhelming exercise for those companies looking to solve Big Data challenges.”

Pure-play vendor breakdown

“When looking to solve a Big Data challenge in your company, it is first essential to clearly define the challenge faced, whether it be Big Data storage, Big Data analysis, or Big Data speed,” said Kearnan.

“This analysis will put you in a good position to then map the identified challenge to the growing number of Big Data vendors. Companies should diligently analyze the growing Big Data solution provider landscape to better understand players and their approaches to Big Data.”

That said, here are the general defining points of the four leading players: Vertica, Teradata Aster (formerly Aster Data), Greenplum, and Splunk.

The first three are “upending the traditional enterprise data warehouse market with massively parallel, columnar analytic databases that deliver lightning-fast data loading and near real-time query capabilities,” according to Kelly. The three were pioneering Big Data products long before Hadoop emerged as the mainstream Big Data play. Interestingly, all three offered Hadoop connectivity anyway.

Vertica, Aster Data and Greenplum were the three leading independent next generation data warehouse vendors up until recently.

Vertica is now owned by HP. The company defines itself as “high-speed, self-tuning column-oriented SQL database management software for data warehousing and business intelligence.” Among its more remarkable features, most evident in the latest iteration of the Vertica Analytic Platform, are “new elasticity capabilities to easily expand or contract deployments and a slew of new in-database analytic functions,” said Kelly.

Vertica 5.1 includes a revamped client framework for easier integration with third-party BI, ETL, analytics, and other ecosystem solutions such as Hadoop distributions based on Apache Commons Release 1.0.0, including Hortonworks Data Platform v1.

Teradata Aster (formerly Aster Data before being bought by Teradata) “has pioneered a novel SQL-MapReduce framework, combining the best of both data processing approaches,” said Kelly. Specifically, the Teradata Aster MapReduce Platform combines MapReduce, the language of Big Data analytics, with SQL, the language of business analytics.

This makes it easier to analyze large volumes of complex data such as Web logs, machine data, and text, while also making it easier to perform richer analysis than is possible with traditional SQL technology alone. Aster Database 5.0 offers greater development flexibility and includes pre-built MapReduce modules for behavioral click stream interpretation, marketing attribution, decision tree analysis, and other types of analysis.
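For readers unfamiliar with the two styles being combined, here is a rough illustration of the same click-stream question answered both ways: once as a map phase followed by a reduce phase, and once as a SQL query. This is a generic sketch in plain Python with the standard-library `sqlite3` module, not Aster's SQL-MapReduce syntax or API; the sample records, table name, and function names are all made up for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical click-stream records: (user, page visited).
CLICKS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/pricing"),
    ("alice", "/signup"),
    ("bob", "/pricing"),
]


def map_phase(records):
    """Map step: emit one (user, 1) pair per click."""
    for user, _page in records:
        yield user, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def clicks_per_user_mapreduce(records):
    """Count clicks per user in the MapReduce style."""
    return reduce_phase(map_phase(records))


def clicks_per_user_sql(records):
    """Count clicks per user with a SQL GROUP BY on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_name TEXT, page TEXT)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT user_name, COUNT(*) FROM clicks GROUP BY user_name"
    ).fetchall()
    conn.close()
    return dict(rows)


print(clicks_per_user_mapreduce(CLICKS))
print(clicks_per_user_sql(CLICKS))
```

Both functions return the same counts. The appeal of a combined approach is that analysts can keep the familiar declarative query while dropping into custom map and reduce logic for analysis that plain SQL expresses awkwardly.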

Greenplum is now owned by EMC. “Greenplum’s unique collaborative analytic platform, Chorus, provides a social environment for Data Scientists to experiment with Big Data,” said Kelly. Indeed, Chorus resembles Facebook in that it is designed specifically for social collaboration, except between data scientists rather than ordinary folks, but it differs from Facebook and industry competitors by pairing that collaboration with Big Data analytics and processes. Other products in the company’s lineup include Greenplum Unified Analytics Platform, Greenplum Data Computing Appliance, Greenplum Database, Greenplum Analytics Lab, and Greenplum HD.

Splunk, third in Wikibon’s ranking of these four vendors, “specializes in processing and analyzing log file data to allow administrators to monitor IT infrastructure performance and identify bottlenecks and other disruptions to service,” said Kelly. The company recently went public and immediately soared to a $3 billion market value. Erik Swan, the company’s CEO and co-founder, describes Splunk as “Google for machine data” in an interview with Dell. That sounds deceptively simple when you realize that Splunk identifies and tracks machine data ranging from under-the-hood and often automated machine workings to patterns in human use of these machines.
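To give a feel for what analyzing log file data to spot bottlenecks involves at its simplest, here is a small sketch in plain Python. It is not Splunk's search language or API. The log format, host names, field names, and the 500 ms threshold are all assumptions invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical log lines in a made-up key=value format.
LOG_LINES = [
    "2012-06-01T10:00:01 host=web1 status=200 latency_ms=120",
    "2012-06-01T10:00:02 host=web2 status=200 latency_ms=95",
    "2012-06-01T10:00:03 host=web1 status=500 latency_ms=2300",
    "2012-06-01T10:00:04 host=db1 status=200 latency_ms=40",
    "this line is malformed and is skipped",
]

PATTERN = re.compile(
    r"host=(?P<host>\S+) status=(?P<status>\d{3}) latency_ms=(?P<latency>\d+)"
)


def summarize(lines):
    """Aggregate request count, error count, and total latency per host."""
    stats = defaultdict(
        lambda: {"requests": 0, "errors": 0, "total_latency_ms": 0}
    )
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # ignore lines that do not fit the assumed format
        entry = stats[match["host"]]
        entry["requests"] += 1
        entry["total_latency_ms"] += int(match["latency"])
        if match["status"].startswith("5"):
            entry["errors"] += 1  # count server-side errors only
    return dict(stats)


def slow_hosts(stats, threshold_ms):
    """Return hosts whose average latency exceeds threshold_ms, sorted by name."""
    return sorted(
        host
        for host, s in stats.items()
        if s["total_latency_ms"] / s["requests"] > threshold_ms
    )


summary = summarize(LOG_LINES)
print(slow_hosts(summary, threshold_ms=500))  # prints ['web1']
```

A real deployment faces the same problem at vastly larger scale and across many inconsistent formats, which is where indexing and search tooling earns its keep.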

From these four vendors alone, one can easily see that the approaches to Big Data are as varied as the vendors doing the approaching. It is vital, then, to understand exactly where you want to end up and exactly how each vendor would get you to that destination (and whether that vendor is even the right vehicle for that particular trip) before you decide to commit to the journey.

And that is just on the data side. Networking vendors are jumping on this bandwagon in a big way because, without the underlying network architecture to support all of this data analysis and warehousing, what do you have? A big pile of Big Data that causes a big problem and little else.

A prolific and versatile writer, Pam Baker writes about technology, science, business, and finance for leading print and online publications including ReadWriteWeb, CIO and CIO.com, Institutional Investor, Fierce Markets Network, I Six Sigma magazine, CIO Update, E-Commerce Times, and many others. Her published credits include eight traditional books, a smattering of eBooks, and several analytical studies on various technologies for research firms on two continents. Among other awards, Baker won international acclaim for her documentary on the paper-making industry, and is a member of the National Press Club and the Internet Press Guild (IPG). She lives in Georgia, USA with her family and two dogs.