[Apache Pig] SAMPLE records from output


After running any of your Pig scripts, you might want to select a sample of the output, since the full result can still be a large amount of data.

First, load the records:
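A minimal sketch of the load step, assuming the output is tab-delimited text (the PigStorage default) in the directory used in this post. The relation name `records` is an illustrative choice:

```pig
-- Load the existing MapReduce output from HDFS.
-- PigStorage() splits fields on tabs by default; 'records' is an assumed relation name.
records = LOAD '/user/crackinghadoop/file_directory' USING PigStorage();
```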

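Then select the fraction of records you want from the MapReduce output in ‘/user/crackinghadoop/file_directory’. Note that SAMPLE takes a fraction between 0 and 1 rather than a percentage (0.01 keeps roughly 1% of the records), and because the sampling is probabilistic, the exact number of rows returned can vary between runs.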
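As a sketch, sampling about 1% of the records could look like this. The relation names and the 0.01 fraction are assumptions for illustration, and the LOAD line is repeated so the snippet runs by itself:

```pig
-- Load the output (tab-delimited text assumed).
records = LOAD '/user/crackinghadoop/file_directory' USING PigStorage();

-- Keep each row with probability 0.01, i.e. roughly 1% of the input.
sample_records = SAMPLE records 0.01;
```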

Delete the existing output directory so you can replace it (STORE fails if the target directory already exists), then store the new data in the ‘sample_records’ directory.

Pig script: sample records from output
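Putting the steps above together, a complete script might look like the following sketch. It assumes tab-delimited input, a 1% sample, and a relative ‘sample_records’ path (which resolves against your HDFS home directory). The relation names are illustrative:

```pig
-- 1. Load the existing MapReduce output.
records = LOAD '/user/crackinghadoop/file_directory' USING PigStorage();

-- 2. Keep roughly 1% of the rows (fraction is an assumed value).
sampled = SAMPLE records 0.01;

-- 3. Remove any previous sample directory; rmf does not fail if it is missing.
rmf sample_records

-- 4. Write the sample back to HDFS as part files.
STORE sampled INTO 'sample_records' USING PigStorage();
```

Using `rmf` rather than `rm` keeps the script rerunnable, since `rm` would raise an error on the first run when the directory does not exist yet.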

EXTRA: if you want to concatenate all the sample part files into one file, run this command on your Linux Hadoop box:
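One way to do this is `hadoop fs -getmerge`, which concatenates every file under an HDFS directory into a single local file. The sketch below assumes the sample was stored in ‘sample_records’. The local file name is a hypothetical choice:

```shell
# Assumed names: adjust both to your setup.
SRC_DIR="sample_records"          # HDFS directory written by STORE
DEST_FILE="sample_records.txt"    # hypothetical local output file

# Merge all part files under SRC_DIR into one local file.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -getmerge "$SRC_DIR" "$DEST_FILE"
else
  echo "hadoop not found on PATH" >&2
fi
```

The merged file is written to the local filesystem of the box where you run the command, not back to HDFS.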