News and Insights

Advice and updates for IT professionals and employers.

Microsoft Embraces Apache Hadoop

If you’re an advocate of open source or have experience with Big Data, you’ve probably heard of Apache Hadoop. In a very small nutshell, Apache Hadoop is a framework for storing and processing very large volumes of data. In a slightly larger nutshell, it’s “a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.” The open source project has gained a large following among developers, to the point that some of its major contributors have argued publicly over who has contributed more.
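The “simple programming model” in that description is MapReduce: a job supplies a map function that turns each input record into key/value pairs and a reduce function that combines all values sharing a key, while the framework handles distribution, grouping, and fault tolerance. The sketch below is a single-machine illustration in plain Python, not Hadoop’s actual API; the function names are hypothetical and the shuffle step only imitates what Hadoop does across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group every value by its key. On a real cluster the
    # framework does this between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all of the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    # Run map over every record, group the results, then reduce each group.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data on Azure", "Big data on Windows Server"]
    print(word_count(documents))
```

Because each map call and each reduce call works independently, the same two functions can be spread across many machines, which is what lets Hadoop scale to very large data sets.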

This sort of distributed framework, tailored to very large data sets, will become increasingly important as cloud solutions continue to develop and gain traction, and Microsoft is well aware of this. Yesterday Ted Kummert gave a keynote address at PASS Summit 2011, announcing that Microsoft will offer distributions based on Apache Hadoop for both Windows Server and Windows Azure.

As no keynote is complete without sweeping predictions, Ted concluded: “Imagine if everyone, regardless of what type of data frameworks or platforms they use, could achieve deep business insights by amassing and analyzing enormous amounts of data not just from their own organization, but from all over the world using a global data marketplace.”

Mary Jo Foley of ZDNet reports that a Community Technology Preview (CTP) build for Windows Azure is slated for delivery by the end of 2011, and that the Windows Server build will come “sometime in 2012.” Compatibility with major tools such as Excel, PowerPivot, and PowerView has been confirmed.

This commitment to the open source platform has many advocates excited, both inside and outside Microsoft. Not surprisingly, this includes Microsoft’s Senior Director for Open Source Communities, Gianugo Rabellino, who discusses a few more specifics of the project on the Port 25 blog: “The Hadoop based service for Windows Azure will allow any developer or user to submit and run standard Hadoop jobs directly on the Azure cloud with a simple user experience.”