Java Pig UDF to Order a String separated by any delimiter


Let’s say you have a string (chararray) field in Apache Pig that contains several values separated by a bar delimiter “|”, such as “Red|Green|Blue”. There may be occasions when you need to sort those values so that the field becomes “Blue|Green|Red”. Here is a Java Pig UDF that sorts a delimited string.

Pig Script

This Pig script loads a file with two fields. The second field holds a delimited value like the one above (“Red|Green|Blue”). In alias “B”, the UDF transforms field2 into “Blue|Green|Red”.

Java Pig UDF to sort a delimited string

This UDF sorts a delimited string. It accepts two arguments from the Pig script. The first is the string to be sorted. The second is the delimiter that separates the values in that string.
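The sorting step described above can be sketched in plain Java, without the Pig wiring. This is only a minimal sketch: the class and method names (DelimitedStringSorter, sortDelimited) are hypothetical and not taken from the original UDF. In an actual UDF, this logic would typically sit inside the exec(Tuple) method of a class extending org.apache.pig.EvalFunc<String>. The sketch assumes a case-sensitive, natural-order sort, and that null or empty arguments should pass through unchanged.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; the name is illustrative, not from the original UDF.
final class DelimitedStringSorter {

    // Splits value on the literal delimiter, sorts the pieces in natural
    // (case-sensitive) order, and joins them back with the same delimiter.
    static String sortDelimited(String value, String delimiter) {
        // Assumption: null or empty inputs are returned unchanged.
        if (value == null || delimiter == null || delimiter.isEmpty()) {
            return value;
        }
        // String.split takes a regex, and "|" is a regex metacharacter,
        // so the delimiter is quoted to be matched literally.
        // The -1 limit keeps trailing empty values instead of dropping them.
        String[] parts = value.split(Pattern.quote(delimiter), -1);
        Arrays.sort(parts);
        return String.join(delimiter, parts);
    }

    public static void main(String[] args) {
        System.out.println(sortDelimited("Red|Green|Blue", "|")); // prints Blue|Green|Red
    }
}
```

Quoting the delimiter matters here because an unquoted “|” passed to String.split would be interpreted as regex alternation rather than a literal bar.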

We could have made this UDF smart enough to detect which delimiter is used, but for illustration purposes we simply pass the delimiter to the Java UDF as a second argument.

Once the data reaches the UDF, it is much easier to manipulate. You can use any available Java library and transform the data as you wish. This is one of the great advantages of using UDFs.

PigUnit Tests

Since this Pig UDF is for illustration purposes, it is only tested with two delimiters: commas and bars (“|”).
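The two cases mentioned above can also be checked in plain Java, outside PigUnit. This is a rough sketch rather than the original test suite: the class DelimiterChecks and its sortDelimited helper are hypothetical stand-ins for the UDF's sorting step, and the comma input is an assumed example mirroring the bar-delimited one.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical check harness; not the original PigUnit tests.
final class DelimiterChecks {

    // Stand-in for the UDF's sorting step (assumed behavior: literal
    // delimiter, natural-order sort, same delimiter on output).
    static String sortDelimited(String value, String delimiter) {
        String[] parts = value.split(Pattern.quote(delimiter), -1);
        Arrays.sort(parts);
        return String.join(delimiter, parts);
    }

    // Each row is {input, delimiter, expected output}.
    // Returns true only when every case produces its expected output.
    static boolean runChecks() {
        String[][] cases = {
            {"Red|Green|Blue", "|", "Blue|Green|Red"},
            {"Red,Green,Blue", ",", "Blue,Green,Red"},
        };
        for (String[] c : cases) {
            if (!sortDelimited(c[0], c[1]).equals(c[2])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runChecks() ? "all checks passed" : "a check failed");
    }
}
```

Keeping the cases in a small table makes it easy to add further delimiters later, such as tabs or semicolons, without changing the checking loop.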