Create a Hive table in Hue in 5 steps!
In this post, you will learn how to create a Hive table from a file. Apache Hive is a data warehouse system built on top of Hadoop. It lets you create, manage, and query large datasets that reside in distributed storage such as HDFS.
Through a SQL-like language called HiveQL, Apache Hive lets you access data much as you would in a relational database. Another interesting feature of Hive is that it traditionally uses the MapReduce programming model behind the scenes to “Map” and “Reduce” the data being processed.
There are different ways to create a Hive table from a file. You can do it from the command line or, in a more user-friendly way, through Hue. In this post, you will learn how to create a Hive table from a file through Hue, a web UI for Hadoop.
This example assumes you have already created a database. In this case, we have a database named “sample_db”, and we will add one table to it from a file that is already on HDFS.
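For comparison, the command-line route mentioned above comes down to a couple of HiveQL statements. The following is a minimal sketch in Python that only assembles the statement text; it does not connect to Hive. The database name “sample_db” comes from this post, while the table name, column names, and HDFS path are hypothetical placeholders, and the helper functions are illustrative rather than part of any Hive or Hue API.

```python
# Sketch only: builds HiveQL text, it does not talk to a Hive server.
# "sample_db" is from the post; "customers", its columns and the HDFS
# path below are hypothetical placeholders.


def build_create_table(database, table, columns, delimiter=",", has_header=True):
    """Return a CREATE TABLE statement for a delimited text file."""
    column_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    statement = (
        f"CREATE TABLE {database}.{table} (\n  {column_defs}\n)\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE"
    )
    if has_header:
        # Ask Hive to skip the first line of each file when reading.
        statement += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return statement + ";"


def build_load_data(database, table, hdfs_path):
    """Return a LOAD DATA statement that moves an HDFS file into the table."""
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {database}.{table};"


if __name__ == "__main__":
    columns = [("id", "STRING"), ("name", "STRING")]
    print(build_create_table("sample_db", "customers", columns))
    print(build_load_data("sample_db", "customers", "/user/hue/customers.csv"))
```

The printed statements could then be pasted into a Hive command-line session. The rest of this post covers the same result through the Hue interface.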
Steps to create a Hive table from a file
- Go to Metastore Manager and click “Create a new table from a file” on the left side of the screen.
- Fill out the table information as shown in the illustration below.
- Select a file from a location on HDFS.
- Click “Next”.
The “Import data from file” checkbox is checked by default. You can uncheck it to create an empty Hive table.
- Hue will automatically detect which delimiter you are using in your file. However, you have the option to change it if you see any discrepancies in the table previews.
- Click “Next” if the table and data layout look correct.
- If your file has a header, ignore this and go to Step 3.
- Hue will automatically treat the first record of the file as the header. You can choose whether to keep this default.
- Also, Hue has a nice feature that allows you to “bulk edit” the column names if necessary.
- Hue will also try to infer each column's data type. You can change it depending on how you need to view or process the data.
- Click “Create Table”.
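If you want to preview locally what the delimiter and header detection in the steps above might decide, Python's standard `csv.Sniffer` performs a similar kind of guess. This is only a rough sketch: it is not how Hue implements its detection, the list of candidate delimiters is an assumption, and the sample data and the `inspect_sample` helper are hypothetical.

```python
import csv

# Candidate delimiters to consider; this list is an assumption, not Hue's.
CANDIDATE_DELIMITERS = ",\t;|"


def inspect_sample(sample_text):
    """Guess the delimiter, header presence and column names of a text sample."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample_text, delimiters=CANDIDATE_DELIMITERS)
    has_header = sniffer.has_header(sample_text)  # heuristic, can be wrong
    first_row = next(csv.reader(sample_text.splitlines(), dialect))
    if has_header:
        # Normalise header cells into names that are easy to use as columns.
        names = [cell.strip().lower().replace(" ", "_") for cell in first_row]
    else:
        # No header found: fall back to generic positional names.
        names = [f"col_{i}" for i in range(len(first_row))]
    return dialect.delimiter, has_header, names


if __name__ == "__main__":
    sample = "ID,Name,Amount\n1,Alice,10\n2,Bob,20\n3,Carol,30\n"
    print(inspect_sample(sample))
```

Both guesses are heuristics, so it is still worth checking the table preview in Hue before creating the table.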
Hue tends to be a little buggy when it deals with dates or integers. If this happens, make all data types “string” where possible and then change the schema from the command line.
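The schema change mentioned above can be expressed as one `ALTER TABLE ... CHANGE` statement per column. Here is a small sketch in Python that generates those statements as text. The table and column names are hypothetical, and the `build_type_changes` helper is illustrative only. Depending on your Hive version and metastore settings, some column type changes may be rejected, so treat this as a starting point rather than a guaranteed fix.

```python
# Sketch only: produces HiveQL text for retyping columns that were
# created as STRING. "customers" and its columns are hypothetical.


def build_type_changes(database, table, new_types):
    """Return a USE statement plus one ALTER TABLE ... CHANGE per column."""
    statements = [f"USE {database};"]
    for column, hive_type in new_types.items():
        # CHANGE takes the old name, the new name (unchanged here) and the type.
        statements.append(
            f"ALTER TABLE {table} CHANGE `{column}` `{column}` {hive_type};"
        )
    return statements


if __name__ == "__main__":
    changes = {"id": "INT", "signup_date": "DATE"}
    for statement in build_type_changes("sample_db", "customers", changes):
        print(statement)
```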
- Verify the table structure is correct.
- Go to the Hive Editor, refresh the database, and query the newly created table in the Query Editor.
In conclusion, creating a Hive table from a file in Hue was easier than anticipated. As long as you have a delimited text file, you can create a Hive table and query it for your data analysis. Please remember that this example uses a very small file. In a real business scenario, you would have much bigger files.
You can find a Hive manual here
To learn more about Hue, click here