Writing Gtimer Application Programs

 

This webpage presents the concrete APIs for users when writing application codes. We assume that users are already familiar with the basic concepts of the system; if not, we recommend you to read the system architecture overview first.

 

Gtimer's High-level APIs

Gtimer provides a set of basic operators that are fundamental and critical for temporal graph analysis.

We list Gtimer's High-level APIs below.

string out_nb(int u, int t1, int t2);

//return u's out-neighbors within time interval [t1, t2]


string in_nb(int u, int t1, int t2);

//return u's in-neighbors within time interval [t1, t2]


string out_edge(int u, int t1, int t2);

//return u's out-edges within time interval [t1, t2]


string in_edge(int u, int t1, int t2);

//return u's in edges within time interval [t1, t2]


string reachability(int u, int v, int t1, int t2);

//return whether u can reach v within time interval [t1, t2] with online search


string reachability_topChain(int u, int v, int t1, int t2);

//return whether u can reach v within time interval [t1, t2] with TopChain labels pruning


string earliest_topChain(int u, int v, int t1, int t2);

//return earliest arrival time from u to v within time interval [t1, t2] with TopChain labels pruning


string earliest(int u, int t1, int t2);

//return earliest arrival time from vertex u within time interval [t1, t2]


string fastest(int u, int t1, int t2);

//return fastest time from vertex u within time interval [t1, t2]


string shortest(int u, int t1, int t2);

//return temporal shortest path distance from vertex u within time interval [t1, t2]


string latest(int u, int t1, int t2);

//return latest departure time to vertex u within time interval [t1, t2]


string topk_earliest(int u, int k, int t1, int t2);

//return top-k vertices with minimum earliest arrival time from vertex u within time interval [t1, t2]


string topk_fastest(int u, int k, int t1, int t2);

//return top-k vertices with minimum fastest time from vertex u within time interval [t1, t2]


string topk_shortest(int u, int k, int t1, int t2);

//return top-k vertices with minimum temporal shortest distance from vertex u within time interval [t1, t2]


string topk_latest(int u, int k, int t1, int t2);

//return top-k vertices with maximum latest departure time to vertex u within time interval [t1, t2]


string khop(int u, int k, int t1, int t2);

//return vertices that can be reached from vertex u in k steps within time interval [t1, t2]


string intersect(int u, int v, int t1, int t2);

//return intersect set of u's out-neighbors and v's in-neighbors within time interval [t1, t2]


string middle(int u, int v, int t1, int t2);

//return the set of middle vertex from u to v with 2 hops within time interval [t1, t2]


void addedge(int u, int v, int t1, int t2);

//add an edge (u,v,t1,t2)

To write a user program using these APIs, users simply need to write a cpp file in "frontend/" directory. First, users need to include the header "GtimerOperator.h". Then, user can create an GtimerOperator object with the output path as an argument.

GtimerOperator op("/output_path");

With this object, users can invoke GtimerOperator's member functions to analyse a temporal graph. Here is an example to get vertex 0's out neighbors.

string s = op.out_nb(0);

Note that all the high-level APIs will return a string type which specifies where the result is stored in HDFS.

Users can use resultVector type to define a new variable and load the result of the query executed before.

resultVector res = op.loadResult(s);

The result will be loaded and stored in res variable. Function printResult will print the result to console.

op.printResult(res);

Basically, resultVector is a vector <vector<int> > type. We use this design for the generality and expressiveness of our system. For example, for a query such as query out_nb, the result returned is a list of vertex id. However, for another query such as query earliest, the return value is a list of vertex id together with the earliest arrival time of that vertex. In this case, we need to return a two-columns array as the result. Therefore, we define resultVector as vector <vector<int> > type, users can handle the result according to his/her demands.

Below is a complete example to get vertex 0's out-neighbors.

#include "GtimerOperator.h"
using namespace std;
int main()
{
  GtimerOperator op("/output_path");
  string s1 = op.out_nb(0);
  resultVector res = op.loadResult(s1);
  op.printresult(res);
}

 

User-defined Operators

Besides the high-level operators natively provided by Gtimer, we also allow users to add user-defined vertex-centric APIs to our system. They only need to modify two files: tgs/Gtimer.cpp and ol/global_ol.h.

Take earliest-arrival query as an example in the following sections. Earliest-arrival query asks for the earliest arrival time from source vertex u to every other vertex. You can check our paper for details.

First, users need to define a constant integer variable in ol/global_ol.h for the new operator. Make sure that this integer is not occupied by other operators. We define EARLIEST_QUERY as 7.

#define EARLIEST_QUERY 7

Next, in tgs/Gtimer.cpp file, we need to modify four places.

1. We need to add code in void temporalWorker::task_init() function to specify which node should be activated initially when the query comes. In this case, we only need to activate the source vertex.

case EARLIEST_QUERY:
{
  int src = queryContainer[1];
  int pos = get_vpos(src);
  if (pos != -1) activate(pos);
}
break;

We also need to tell whether the operator will use the combiner. As default, the combiner is not used. To use the combiner, we add the following code.

case EARLIEST_QUERY:
{
  TaskT& task=*(TaskT*)query_entry();
  task.useCombiner = 1;
}
break;

2. For the new operator, we need to define the initial value for every vertex. We only need to add code to the void temporalVertex::init_value(vector& query) function which specifies the initial value for every vertex. The value can be accessed and modified by qvalue() function later.

For example, in EARLIEST_QUERY, we want each vertex to keep a value representing the earliest-arrival time from the source vertex. So, initially, when a vertex is visited, we will set its earliest-arrival time as inf(i.e., infinity). Later, the vertex-centric program will modify the value during the computation process and finally the value converges when the program terminates.

case EARLIEST_QUERY:
  ret.push_back(inf);
break;

3. The most important part is to decide the computing logic for the new operator. We need to modify the function void temporalVertex::compute(MessageContainer& messages) to include the computing logic of the new operator. The computing logic of earliest-arrival query is listed below.

case EARLIEST_QUERY:
{
  int t1 = 0; int t2 = inf;
  if (queryContainer.size() == 4) {t1 = queryContainer[2]; t2 = queryContainer[3];}
  if (superstep() == 1)
  {
    std::map::iterator it,it1,it2;
    it1 = mOut.lower_bound(t1);
    it2 = mOut.upper_bound(t2);
    int idx;
    vector send(1);
    for (it = it1; it != mOut.end() && it != it2; ++ it)
    {
      idx = it->second;
      for (int j = 0; j < VoutNeighbors[idx].size(); ++ j)
      {
        send[0] = Vout[idx] + VoutNeighbors[idx][j].w;
        send_message(VoutNeighbors[idx][j].v, send);
      }
    }
    qvalue()[0] = t1;
  }
  else
  {
    int mini = inf;
    int& lastT = qvalue()[0];
    for (int i = 0; i < messages.size(); ++ i)
    {
      if (messages[i][0] < mini) mini = messages[i][0];
    }
    if (mini < lastT)
    {
      std::map::iterator it;
      int start, end;
      it = mOut.lower_bound(mini);
      if (it != mOut.end()) {
        start = it->second;
        end = min(lastT, t2);
        vector send(1);
        for (int i = start; i < Vout.size() && Vout[i] < end; ++ i)
        {
          for (int j = 0; j < VoutNeighbors[i].size(); ++ j)
          {
            send[0] = Vout[i] + VoutNeighbors[i][j].w;
           send_message(VoutNeighbors[i][j].v, send);
          }
        }
      }
      lastT = mini;
    }
  }
  vote_to_halt();
  break;
}

4. Finally, we need to define the format of the query result. For earliest-arrival query, for every vertex visited, we output its vertex id and earliest-arrival time from the source vertex. The earliest-arrival time is stored in qvalue()[0] in the EARLIEST_QUERY.

case EARLIEST_QUERY:
{
  if (vertex->id == task.query[1]) sprintf(buf, "%d %d\n", vertex->id, 0);
  else sprintf(buf, "%d %d\n", vertex->id, vertex->qvalue()[0]);
  writer.write(buf);
  break;
}