eBay has open sourced a database technology, called Kylin, that takes advantage of distributed processing and the HBase data store in order to return faster results for SQL queries over Hadoop data.
Online auction site eBay has open sourced a database technology called Kylin that the company says enables fast queries over even petabytes of data stored in Hadoop. eBay isn’t a big data user on par with companies like Google and Facebook, but it does run technologies such as Hadoop at a fairly large scale, and Kylin seems a good example of the type of innovation it’s doing on top of them.
eBay cites, among other features, Kylin’s REST APIs, ANSI-SQL compatibility, connections to the analysis tools Tableau and Excel, and sub-second latency on some queries. However, Kylin’s most distinctive features involve how it deals with scale. eBay says it can query billions of rows of data, on datasets more than 14 terabytes in size, at speeds much faster than the traditional Apache Hive tool.
The way Kylin works, at a high level, is to take data from Hive, pre-process large queries using MapReduce, and then store those results as key-value “cuboids” in HBase. When a user runs a Kylin query using a particular set of variables, the precomputed values are ready to go and do not need to be processed again. It’s not entirely dissimilar from the cubes that analytic databases have been utilizing for years, but Kylin’s cuboids are designed with HBase’s preferred data structure in mind.
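
To make the cuboid idea concrete, here is a minimal Python sketch of precomputed aggregation. It is an illustration under stated assumptions, not Kylin's actual code: a plain in-memory dictionary stands in for HBase, an ordinary loop stands in for the MapReduce jobs, and the table, the dimension names, and the functions `build_cuboids` and `query` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows; in Kylin the source would be a Hive table.
DIMENSIONS = ("site", "category", "year")

ROWS = [
    {"site": "US", "category": "toys", "year": 2014, "price": 10.0},
    {"site": "US", "category": "books", "year": 2014, "price": 5.0},
    {"site": "UK", "category": "toys", "year": 2013, "price": 7.5},
    {"site": "US", "category": "toys", "year": 2013, "price": 2.5},
]


def build_cuboids(rows, dimensions, measure):
    """Precompute the sum of `measure` for every combination of dimensions.

    Each entry is keyed by (dimension names, dimension values), which mimics
    a key-value layout. The dict is only a stand-in for a store such as HBase.
    """
    store = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)


def query(store, dimensions, **filters):
    """Answer an aggregate query with a single lookup, without rescanning rows."""
    # Keep the dimensions in their canonical order so the key matches the store.
    dims = tuple(d for d in dimensions if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return store.get(key, 0.0)


if __name__ == "__main__":
    cuboids = build_cuboids(ROWS, DIMENSIONS, "price")
    print(query(cuboids, DIMENSIONS, site="US", category="toys"))
```

The trade-off is the one the article describes: the work of scanning and aggregating is paid once, up front, so that each later query becomes a key lookup. The cost is storage, since the number of dimension combinations doubles with every dimension added.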

