Hub2-Labeling

See Section 5.1.2 of the Quegel technical report for the algorithms of Hub2-Labeling. The algorithm consists of the following steps:

Step 1: Hub Selection. The first step is to select k vertices with the highest degree. One may use Terasort to sort vertices by degree, and here is the code for an undirected graph. For a directed graph, one may use this code, which contains three modes INDEGREE_ONLY, OUTDEGREE_ONLY and INOUT_DEGREE as described in our technical report. These programs should be compiled with Quegel system code and run like indicated here), and both programs dump sorted vertices to HDFS path "/sort".

After sorting vertices by degree, one may pick the first k vertices as hubs. One may run this Java code which reads HDFS path "/sort" and generates k hubs into a local file "top_vertices.txt". The easiest way to run the Java program is to create a Java project in Eclipse, and include the relevant .jar libraries into the project. For example, in Hadoop 1.2.1, you should include all .jar files under $HADOOP_HOME and $HADOOP_HOME/lib, while in Hadoop 2.6.0, you should include all .jar files under $HADOOP_HOME/share/hadoop/hdfs, $HADOOP_HOME/share/hadoop/hdfs/lib, $HADOOP_HOME/share/hadoop/common and $HADOOP_HOME/share/hadoop/common/lib.

Step 2: Indexing. The second step is to run k BFS-style queries using the k hubs selected, to compute the core-hub list of each vertex. For this goal, one should compile the server program using this code for undirected graphs or this code for directed graphs, where one should set the variable "deg_th" as the degree of the k-th hub (as can be looked up from the generated hub file "top_vertices.txt"). The client program should be the batch-file version, and the hub file "top_vertices.txt" should be the client input file. See here for how to run the server and client programs.

There are two outputs. One is the query output under HDFS path "/ol_out", where each subfolder refers to the results dumped for one querying hub. We need to postprocess it using this Java code and then this Java code. The other output is under HDFS path "/ol_index", which is the dumped graph where each vertex is associated with its core-hubs. Finally, we merge these two outputs so that each hub vertex is associated with its distance to all other hubs while each non-hub vertex is only associated with its core-hubs, using this code for undirected graphs or this code for directed graphs (compiled with Quegel system code and run like indicated here).

Step 3: Querying. Finally, one may perform shortest path distance querying over the indexed graph. Use this server-side code for undirected graphs or this server-side code for directed graphs. See here for how to run the server and client programs.