Apache HBase is an open-source, distributed, versioned, sorted map datastore modeled after Google's Bigtable. It runs on top of the Hadoop Distributed File System (HDFS).
Since HBase is written in Java, it can be accessed through a native Java API (playing a role similar to JDBC for relational databases). To access HBase from another programming language, you may refer to Thrift. To access HBase through HTTP, you may refer to its RESTful access.
In this lab, we will teach you
Since HBase is implemented on top of HDFS, we need to start HDFS before launching HBase.
Open the terminal. Navigate to the home directory of Hadoop with the following command:
Start HDFS by the following command:
Open HDFS's WebUI address: http://localhost:50070 in the virtual machine to see if HDFS has started successfully.
Navigate to the home directory of HBase by the following command:
Start HBase by the following command:
Open HBase's WebUI address: http://localhost:16010 in the virtual machine to see if HBase has started successfully.
Connect to the running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install:
Use the create command to create a new table named Contacts with two column families, i.e., Personal and Office. Recall that only Table and Column Family names have to be pre-defined (columns within a column family can be added/deleted dynamically). Also notice that table names, rows, columns all must be enclosed in quote characters.
To insert data into the table, use the put command.
Use the scan command to scan the table. Here, we can see that different row keys can have different subsets of columns specified. For instance, column Personal:Residence_phone is not specified for row key 00002.
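To make the sparse, sorted layout more concrete, here is a minimal in-memory sketch in plain Java. It is not the HBase client API; it only mimics how rows are kept in row-key order and how each row stores just the columns that were actually written. The class name `SparseTableSketch`, the columns `Personal:Name` and `Office:Phone`, and all cell values are hypothetical and chosen purely for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for an HBase table: row key -> ("family:qualifier" -> value).
// NOT the HBase client API; it only imitates the sorted, sparse layout.
class SparseTableSketch {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Rough analogue of the shell's put: the row is created on first use.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Rough analogue of get: the columns stored for one row (empty if the row is absent).
    Map<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    // Rough analogue of scan: rows are returned in row-key order.
    Map<String, TreeMap<String, String>> scan() {
        return rows;
    }

    // Hypothetical sample data; row 00002 deliberately has no Residence_phone.
    static SparseTableSketch sample() {
        SparseTableSketch t = new SparseTableSketch();
        t.put("00002", "Personal:Name", "Bob");
        t.put("00001", "Personal:Name", "Alice");
        t.put("00001", "Personal:Residence_phone", "555-0100");
        t.put("00001", "Office:Phone", "555-0101");
        return t;
    }

    public static void main(String[] args) {
        sample().scan().forEach((rowKey, cols) -> System.out.println(rowKey + " -> " + cols));
    }
}
```

Although row 00002 is inserted first, the scan lists 00001 before it, and 00002 simply has no entry for `Personal:Residence_phone` rather than a null placeholder.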
To retrieve the data with respect to a single row key, use the get command.
To drop (delete) a table, we need to disable it first (using the disable command), then use the drop command.
HBase is written in Java, so it is no surprise that it has a native Java API. This API can do everything that the HBase shell can do, and more. It can be categorized as:
We use a Twitter application as an example to walk through these two kinds of APIs. Download the code, unzip it, and move the folder to the guest machine through the shared folder (see Lab 1). Assume the path of the folder is as follows:
The application administrator is responsible for the table design, which can be defined by answering the following questions in the context of a use case:
In our use case, we mainly use the twitter id to access the information of each twitter, so we set twitterId as the row key [answering question (i)]. Furthermore, we create two column families, general and user. The former contains information about the twitter itself (e.g., text, created time), and the latter contains its sender's information (e.g., name, registered state). This design reflects our access pattern: information belonging to different column families will rarely be accessed together [answering question (ii)]. The file HbaseTableCreator.java creates the twitter table as required.
Execute the following command to compile this code:
The string `~/Programs/hbase/bin/hbase classpath` executes the hbase classpath command to return all the Java libraries that are required to compile the code. Note that the symbol to quote this command is the backtick (`) instead of the single quote (').
Next, run this code as follows:
Log information will appear after running this code. To check whether the table twitter has been created successfully, open the hbase shell and look up the table information using the list and describe commands:
After the table twitter has been created, the application clients could access the table with CRUD (Create, Retrieve, Update, and Delete) operations. The file HbaseClientExample.java contains the complete code. Within this file, we created a simple Twitter class:
In this class, twitterId serves as the row key. The attributes (i.e., column qualifiers) text and createdAt belong to column family general, and name belongs to column family user. Note that in practice, the column qualifiers of each column family can be added dynamically. Here we fix them for ease of illustration.
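As a rough picture of what such a data holder might look like, here is a minimal sketch. It is a hypothetical reconstruction based only on the description above; the class shipped in HbaseClientExample.java may differ in field types, constructors, and method names. In particular, storing createdAt as a String is an assumption.

```java
// Hypothetical reconstruction of the Twitter data holder described in the text.
// Field names follow the text; types and methods are assumptions.
class Twitter {
    private final String twitterId;  // row key
    private final String text;       // column general:text
    private final String createdAt;  // column general:createdAt (null when not specified)
    private final String name;       // column user:name

    Twitter(String twitterId, String text, String createdAt, String name) {
        this.twitterId = twitterId;
        this.text = text;
        this.createdAt = createdAt;
        this.name = name;
    }

    String getTwitterId() { return twitterId; }
    String getText()      { return text; }
    String getCreatedAt() { return createdAt; }
    String getName()      { return name; }

    @Override
    public String toString() {
        return "Twitter{twitterId=" + twitterId + ", text=" + text
                + ", createdAt=" + createdAt + ", name=" + name + "}";
    }

    public static void main(String[] args) {
        // createdAt is left unspecified to mirror a sparse row.
        System.out.println(new Twitter("1", "hello", null, "alice"));
    }
}
```

Allowing createdAt to be null mirrors the fact that a row need not have a value for every column qualifier.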
Next, we walk through the example step-by-step. First of all, in the constructor of class HbaseClientExample, it connects to HBase and saves a reference to the HTableInterface as a class member variable (i.e., twitterTable), which handles any following access to the table twitter in HBase. Then, we illustrate four types of access: (1) create, (2) retrieve, (3) update, and (4) delete.
We create five twitters and insert them into the table. Note that an HBase table is sparse. For example, the value for the general:createdAt column is not specified for twitter2.
Here we illustrate two types of retrieve methods:
The output for this portion of code is as follows:
Update is simple. Just re-use the insertTwitter method to overwrite the target cell.
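The reason a second insert acts as an update is that HBase cells are versioned: writing to an existing cell stores a value under a newer timestamp, and a default read returns the most recent version. The sketch below imitates this for a single cell in plain Java. It is not the HBase client API; the class name `VersionedCellSketch`, the explicit timestamps, and the sample values are hypothetical. How many older versions a real table retains depends on the column family's configuration.

```java
import java.util.TreeMap;

// In-memory imitation of one versioned cell: timestamp -> value.
// NOT the HBase client API; timestamps are passed explicitly to keep the sketch deterministic.
class VersionedCellSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Writing with a newer timestamp shadows older values instead of editing them in place.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // A default read sees only the value with the highest timestamp (null if never written).
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Number of versions this sketch currently holds.
    int versionCount() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedCellSketch text = new VersionedCellSketch();
        text.put(1L, "first draft");   // initial insert
        text.put(2L, "edited text");   // "update" = another put to the same cell
        System.out.println(text.latest());
    }
}
```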
We can delete all values for a specific row key. Here we also perform a full table scan to verify that the deletion took effect.
The output for this portion of code is as follows (after the previous update and delete operations):
Finally, we illustrate how to search on cell values.
The output for this portion of code is as follows:
To see the output as above, please compile and run HbaseClientExample.java in the same way as HbaseTableCreator.java.
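Conceptually, the cell-value search mentioned above amounts to a scan that keeps only the rows whose cell in a given column matches a condition. The sketch below shows that idea on an in-memory map in plain Java. It is not the HBase client API and not the code in HbaseClientExample.java; the class name `ValueSearchSketch`, the row keys, and the cell values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of "scan, then keep rows whose cell equals a wanted value".
// NOT the HBase client API.
class ValueSearchSketch {
    // Returns the row keys, in the table's iteration order, whose column holds exactly `wanted`.
    // Rows that lack the column are skipped, as they would be in a sparse table.
    static List<String> rowsWhere(Map<String, Map<String, String>> table,
                                  String column, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (wanted.equals(row.getValue().get(column))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Hypothetical sample rows, kept in a TreeMap so iteration follows row-key order.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        t.put("twitter1", Map.of("user:name", "alice", "general:text", "hello"));
        t.put("twitter2", Map.of("user:name", "bob"));
        t.put("twitter3", Map.of("user:name", "alice"));
        return t;
    }

    public static void main(String[] args) {
        System.out.println(rowsWhere(sample(), "user:name", "alice"));
    }
}
```

Because the sketch inspects every row, it also hints at why value searches are generally costlier than lookups by row key.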