Gradle Script for Hadoop Java Projects

After you have selected a Hadoop vendor for your organization (Cloudera is used in this example) and all the hardware is installed and ready to go, you, the developer, have to set up your environment before you can start working on your team's projects. To build and automate those projects in an easier, more readable way, you can use Gradle and add whichever Hadoop libraries you need.

I am assuming you know what Gradle is and that you use it regularly. The code here goes in the build.gradle file of your project. Also, Hadoop works better on Linux, so you should definitely have a Linux environment for your Hadoop projects. If you are using Windows, there are workarounds, but I will not cover them in this post.

This is an example that should get you started. You might have to add or remove dependencies according to what you need. Setups like this are not very well documented because of how new Hadoop is in the market.

Things to consider in this Gradle Hadoop Script

Java Version

The script targets Java 1.7. You could try 1.6, but it might not work properly.
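As a minimal sketch, the Java level can be pinned in build.gradle like this. It assumes the standard Gradle 'java' plugin and an installed JDK that can still compile for Java 7.

```groovy
// Apply the standard Java plugin so Gradle knows how to compile the project.
apply plugin: 'java'

// Compile sources as Java 1.7 and emit 1.7-compatible bytecode.
sourceCompatibility = 1.7
targetCompatibility = 1.7
```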

Cloudera Repositories

The Cloudera repository is located at ‘https://repository.cloudera.com/artifactory/repo/’
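A repositories block along these lines should let Gradle resolve the Cloudera jars. Adding mavenCentral() is my assumption, for artifacts that are not published in Cloudera's repository.

```groovy
repositories {
    // Assumption: Maven Central for libraries not hosted by Cloudera.
    mavenCentral()

    // Cloudera's repository, using the URL given above.
    maven {
        url 'https://repository.cloudera.com/artifactory/repo/'
    }
}
```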

Hadoop Version

In this example, the version used is cdh5.1.0. Versions change constantly, so I used one that worked at the time I wrote this.
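Because the version changes often, it can help to define it once and reuse it. The sketch below is one way to do that. The property names are hypothetical, and the upstream version prefixes (2.3.0 for Hadoop, 0.12.0 for Pig) are my assumption of what ships with cdh5.1.0, so check them against the repository before relying on them.

```groovy
// Hypothetical property names; only 'cdh5.1.0' comes from this post.
ext {
    cdhVersion    = 'cdh5.1.0'
    // Assumed upstream versions bundled with this CDH release.
    hadoopVersion = "2.3.0-${cdhVersion}"
    pigVersion    = "0.12.0-${cdhVersion}"
}
```

When you upgrade the cluster, only cdhVersion (and possibly the upstream prefixes) should need to change.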

Dependencies

You will see libraries for Pig, PigUnit, LinkedIn DataFu, PiggyBank, Apache Hadoop and its dependencies.
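A dependencies block covering those libraries might look roughly like the following. The artifact coordinates and version numbers are assumptions on my part rather than a verified list, so confirm each one in the repository. Versions are written out in full so the fragment stands alone, and the compile and testCompile configurations match Gradle releases of the Java 1.7 era (newer Gradle releases use implementation and testImplementation instead).

```groovy
dependencies {
    // Apache Hadoop client; pulls in the core Hadoop jars transitively.
    compile 'org.apache.hadoop:hadoop-client:2.3.0-cdh5.1.0'

    // Pig and the PiggyBank collection of user-defined functions.
    compile 'org.apache.pig:pig:0.12.0-cdh5.1.0'
    compile 'org.apache.pig:piggybank:0.12.0-cdh5.1.0'

    // LinkedIn DataFu (assumed to come from Maven Central).
    compile 'com.linkedin.datafu:datafu:1.2.0'

    // PigUnit is only needed when running tests.
    testCompile 'org.apache.pig:pigunit:0.12.0-cdh5.1.0'
}
```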

Gradle Hadoop script

The jars come mainly from Cloudera’s CDH release.