
Exclusive Interview: Amid the Big Data Battle of Giants, Will Hadoop Stay Open Source?

Published: 2017-01-09 07:21:05  Category: Tutorials  Source: Pi Lihua
Summary: [Commentary] Hortonworks, born out of Yahoo, employs many outstanding Hadoop architects and source-code contributors, and the company has contributed more than 80% of the source code to the Apache Hadoop project. With so many Hadoop distributions now emerging, how can Hortonworks stand out while holding to a 100% open-source course? In this edition of the IT Hall of Fame, our guest is Jeff Markham.

  Jeff Markham: Those were very common terms associated with Hadoop and Big Data, the V words, right? Instead of trying to explain what Hadoop is all about through those particular words, I think we are going to see a shift toward simplifying the distributions, so that we can take data into the Hadoop ecosystem in a number of different ways, with technologies like Storm and technologies like Spark.

  When I was here this time last year, none of these technologies were key in anybody's presentation. Today they are not only key, they are virtually requirements in all the modern Hadoop architectures we see. So one of the things I do see is that the individual components and features of each distribution are only going to grow in number, and each component will grow in its own functionality and importance to that particular distribution. I also think you are going to see the conversation shift away from the details of these components and toward ease of use for the operations team, ease of use for the users and the analysts: how do I address my specific use cases? That is the conversation people are going to have going forward in 2015 when it comes to Hadoop.
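  As a rough illustration of the kind of Spark-based analysis he describes, the sketch below (assuming a Spark installation on the cluster; the HDFS path and column names are placeholders, not details from the interview) reads data already stored in HDFS and summarizes it:

    from pyspark.sql import SparkSession

    # Minimal sketch: pull data that already sits in HDFS into Spark for analysis.
    # The HDFS path and column names are hypothetical.
    spark = SparkSession.builder.appName("hdfs-quickstart").getOrCreate()

    events = spark.read.json("hdfs:///data/clickstream/2015/01/*.json")
    events.groupBy("page").count().orderBy("count", ascending=False).show(10)

    spark.stop()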

  PiPi: In my opinion, big data and Hadoop are used to turn raw data into US dollars or RMB, but that data is valuable and sensitive. How can we keep it safe while doing data mining?

  Jeff Markham: That's a great question. I think it relates to the issue I mentioned before: we have to get away from who runs a given query 10 seconds or 5 seconds faster and look at the entire distribution holistically, particularly in the area of security. Contrary to popular belief, security has always been a huge area of focus in the Hadoop community. What we have done at Hortonworks is acquire a company called XA Secure and put its technology into open source at the Apache Software Foundation as a project called Apache Ranger. What Apache Ranger does, in combination with the security features that are starting to appear in the core Hadoop projects themselves, is provide a comprehensive security suite for the Hadoop distribution. Instead of having siloed security for each component, instead of having security fragmented across each individual project, we have for the first time made available, in pure open source, a comprehensive security suite: no matter where your data is stored in your Hadoop cluster, whether in a Hive table, an HBase table, or HDFS itself, that data can be secured in a fully comprehensive manner using the Apache Ranger project. The Hortonworks Data Platform is the only distribution to feature this, and again, it is pure open source.
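  To make the Ranger approach concrete, here is a minimal sketch of defining a policy through Ranger's public REST API; the host, credentials, Hive service name, and table names are placeholders rather than details from the interview:

    import requests

    # Minimal sketch: create a Ranger policy that lets the "analysts" group
    # run SELECT on one Hive table. Host, credentials, service and table
    # names are hypothetical.
    RANGER_URL = "http://ranger.example.com:6080/service/public/v2/api/policy"

    policy = {
        "service": "hivedev",                      # Hive service as registered in Ranger
        "name": "analysts_select_sales",
        "resources": {
            "database": {"values": ["sales_db"]},
            "table": {"values": ["transactions"]},
            "column": {"values": ["*"]},
        },
        "policyItems": [
            {
                "groups": ["analysts"],
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }

    resp = requests.post(RANGER_URL, json=policy, auth=("admin", "admin"))
    resp.raise_for_status()
    print("Created policy id:", resp.json()["id"])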

  PiPi: Although Hadoop is so popular, few people are using the plain Apache distribution. We notice that several Hadoop distributions have emerged, including Cloudera, IBM, Microsoft, Hortonworks, and Amazon. Why are so many distributions emerging now? How do you see the distribution market shaping up?

  Jeff Markham: Well, first of all, let me answer the first question, about the different distributions and vendors. You say people are not using the plain, standard Apache distribution. In fact, if you are using the Hortonworks Data Platform, you are using pure Apache Software Foundation distributions.

  That’s pure open source. Hortonworks doesn’t believe in… in providing any proprietary software, providing any walk-in toward any customer that might want to use Hadoop. We believe is that open source gives you the best value; gives you the best innovation; gives you the best technology for your data center. So, what we do is we do all our work inside the Apache Software Foundation. We have zero code that we have is the proprietary. So, somebody is using the Hortonworks Data Platform, they are in fact using pure Apache Software Foundation projects. Thus that, secondly, what I would think about the other distributions… I have a lot of respect for these other distributions. They… they do a lot to advance the cost of Hadoop, but a lot of distributions have done besides Hortonworks is take some of the core open source projects and then add proprietary products around it. For example, in Cloudera, we see some products like the Cloudera Manager, Cloudera Navigator, things that are close-sourced proprietary products that are addressed in the open source world, the MBuy (10:00) project, the Apache Falcon project. These are projects that address the used cases in Cloudier Manager, Cloudier Navigator, and more, yet are pure open source. That’s our philosophy. Our philosophy is what we need to do… uh… to advance the Hadoop ecosystem, we need to do it in a pure open source. Otherwise, the distributions become fragmented; otherwise, we have a situation… uh… like we had with Unix. Well, we have many flavors and no one standard because there was no one company to enforce the pure open nature of that project. With Hadoop, that one company that enforces the pure open nature of the entire Hadoop eco-system is Hortonworks and Hortonworks only. There is no other company that ships 100% pure open source, only Hortonworks does that.

  PiPi: What would you say to Chinese CTOs who work on Hadoop and big data?

  Jeff Markham: My advice is this. As we work in the Apache Software Foundation and put all of our code out there, what we do every day as an engineering team is make sure that as we build core Hadoop, you can leverage your existing skill set and your existing investments in the products you already have in your data center. So if you have Oracle, Microsoft SQL Server, Teradata, SAS, Tableau, Spotfire, whatever it is, we want you to be able to use Hadoop, integrate it with the technologies you already own, and keep using the skills you already have. My advice to the CTO is to put Hadoop into your data center and integrate it with the products you already run, because we are open source and it is likely we have a partnership with whatever technology you are using today. We want the analytics software you use today to continue to be used. The only thing your analysts should notice is that they are analyzing more data and more different kinds of data. That is the perfect state for us: the end users may not even know they are actually using Hadoop; all they know is that they are working with even more data and more different types of data.
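  As an example of reusing existing SQL skills against Hadoop, the sketch below assumes a HiveServer2 endpoint and the third-party PyHive client; the host, port, and table names are placeholders, not details from the interview:

    from pyhive import hive   # third-party client for HiveServer2

    # Minimal sketch: an analyst reusing plain SQL against Hive.
    # Host, port, user, and table name are hypothetical.
    conn = hive.connect(host="hive.example.com", port=10000, username="analyst")
    cursor = conn.cursor()

    # The same kind of SQL an analyst might run against Oracle or SQL Server.
    cursor.execute("SELECT region, COUNT(*) FROM sales_db.transactions GROUP BY region")
    for region, cnt in cursor.fetchall():
        print(region, cnt)

    cursor.close()
    conn.close()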

  PiPi: Could you give some advice to people who want to start using Hadoop?

  Jeff Markham: For individuals, I'd say the best way to get started is to go to Hortonworks.com and download the Sandbox. The Sandbox is a single virtual machine that people can use for free on their desktop right away. They can run it with VMware, with VirtualBox, or with Hyper-V; they can use it on Windows or on Mac. Download the Sandbox and then follow along with the many tutorials: how to use Hive, how to use Pig, what MapReduce is, what Ranger is, how to configure Ranger to secure the entire ecosystem, and how to use Ambari to manage and monitor a cluster. All of these things you can do with the Hortonworks Sandbox, free of charge; download it today and you can start the free tutorials and become familiar with it right away. A lot of our partners also have tutorials available on our website. For example, if you are an application developer, we have a partnership with Cascading, so you can start using the Cascading framework to build your Hadoop-based application on top of that.
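  Once the Sandbox is running, a first look at Ambari can be as simple as calling its REST API. The sketch below assumes Ambari is reachable at localhost:8080 with the default admin credentials, which may differ on a given Sandbox version:

    import requests

    # Minimal sketch: list the services Ambari manages on a local Sandbox.
    # URL and credentials are assumptions about a default install.
    AMBARI = "http://localhost:8080/api/v1"
    AUTH = ("admin", "admin")
    HEADERS = {"X-Requested-By": "ambari"}

    clusters = requests.get(f"{AMBARI}/clusters", auth=AUTH, headers=HEADERS).json()
    cluster_name = clusters["items"][0]["Clusters"]["cluster_name"]

    services = requests.get(f"{AMBARI}/clusters/{cluster_name}/services",
                            auth=AUTH, headers=HEADERS).json()
    for svc in services["items"]:
        print(svc["ServiceInfo"]["service_name"])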

  PiPi: That’s all!Thanks very much!Thanks for my interview.

  Jeff Markham: Thank you.

    For more guest interviews, please follow our Hall of Fame column: http://www.itpub.net/star/


