Apache Pig – PigUnit Test Example


PigUnit Example

It is always worth testing code before it goes into production. PigUnit is a very helpful tool for testing Pig scripts in local mode before they are promoted to production. It also lets you build a script incrementally and assert the output of every alias. When you test each piece of a Pig script this way, you see the intermediate output of every alias: the results of LOAD, GROUP and COUNT, the values returned by a UDF, and so on. This helps you understand what Pig (running MapReduce behind the scenes) is actually doing.

The file sampleFile.txt contains the following records:

123456,hadoop,Big Data,Data Science,09/10/2014
123457,hadoop,Big Data,Data Science,09/11/2014
123458,hadoop,Big Data,Data Science,09/12/2014
123459,hadoop,Big Data,Data Science,09/13/2014
123450,hadoop,Big Data,Data Science,15/09/2014
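PigUnit assertions such as assertOutput compare an alias against expected strings written in Pig's tuple notation. Suppose the script loads this file with PigStorage(',') and projects the date column into its own alias. The script is not shown here, so both steps are assumptions. Under those assumptions, the expected output for that alias would be one single-field tuple per record. The short Java sketch below derives those strings from the raw records. DateFieldProjection and projectDates are hypothetical names for illustration only and are not part of PigUnit.

```java
import java.util.ArrayList;
import java.util.List;

public class DateFieldProjection {
    // Assumes comma-delimited records (as PigStorage(',') would split them) and
    // renders the fifth field (index 4) as a single-field tuple string, e.g. "(09/10/2014)".
    static List<String> projectDates(List<String> records) {
        List<String> tuples = new ArrayList<>();
        for (String line : records) {
            String[] fields = line.split(",", -1);
            tuples.add("(" + fields[4] + ")");
        }
        return tuples;
    }

    public static void main(String[] args) {
        // The five records from sampleFile.txt.
        List<String> records = List.of(
                "123456,hadoop,Big Data,Data Science,09/10/2014",
                "123457,hadoop,Big Data,Data Science,09/11/2014",
                "123458,hadoop,Big Data,Data Science,09/12/2014",
                "123459,hadoop,Big Data,Data Science,09/13/2014",
                "123450,hadoop,Big Data,Data Science,15/09/2014");
        // Prints one tuple per line, from (09/10/2014) through (15/09/2014).
        projectDates(records).forEach(System.out::println);
    }
}
```

A projection does not validate anything, so 15/09/2014 still appears at this stage.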


Pig Script Example

This Pig script counts the number of valid dates in a date field of the file.

The user-defined function (UDF) “IsDateValid” is described in a separate post: Unit Testing Java UDFs and Pig Scripts
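To work out the count a test should expect, you can reproduce the validity check outside Pig. The following is a hypothetical plain-Java stand-in, not the IsDateValid UDF from that post. It makes two assumptions: the date is the fifth comma-separated field, and dates use the MM/dd/yyyy format that the first four records follow. Under those assumptions, 15/09/2014 is rejected because 15 is not a valid month, so four of the five records count as valid.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;

public class ValidDateCount {
    // Assumed field format: MM/dd/yyyy. The pattern uses "uuuu" (proleptic year)
    // because STRICT resolution cannot build a date from "yyyy" (year-of-era) without an era.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical stand-in for the IsDateValid UDF: true only for a real calendar date.
    static boolean isDateValid(String value) {
        try {
            LocalDate.parse(value, DATE_FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    // Counts records whose fifth comma-separated field (index 4) holds a valid date.
    static long countValidDates(List<String> records) {
        return records.stream()
                .map(line -> line.split(",", -1))
                .filter(fields -> fields.length > 4 && isDateValid(fields[4]))
                .count();
    }

    public static void main(String[] args) {
        // The five records from sampleFile.txt.
        List<String> records = List.of(
                "123456,hadoop,Big Data,Data Science,09/10/2014",
                "123457,hadoop,Big Data,Data Science,09/11/2014",
                "123458,hadoop,Big Data,Data Science,09/12/2014",
                "123459,hadoop,Big Data,Data Science,09/13/2014",
                "123450,hadoop,Big Data,Data Science,15/09/2014");
        System.out.println(countValidDates(records)); // prints 4
    }
}
```

Strict resolution is a deliberate choice here. With the default SMART style of DateTimeFormatter.ofPattern, an impossible day such as 02/30/2014 would be adjusted to the last valid day of February instead of being rejected.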