Pregel+: A Distributed Graph Computing Framework with Effective Message Reduction

 

Pregel+ is not just another open-source Pregel implementation, but a substantially improved distributed graph computing system with effective message reduction. Compared with existing Pregel-like systems, Pregel+ provides a simpler programming interface yet achieves higher computational efficiency. We give ample examples and detailed tutorials to demonstrate how to program in Pregel+ and how to deploy it in a distributed environment. Pregel+ is also a better choice for researchers who want to modify the system to support new functionality, as its design is much simpler and more flexible than that of most other Pregel-like systems.

Pregel+ supports two effective message reduction techniques: (1) vertex mirroring and (2) a new request-respond paradigm. These techniques not only reduce the total number of messages exchanged through the network, but also bound the number of messages sent or received by any single vertex, which matters most when processing power-law graphs and (relatively) dense graphs.
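To make the effect of vertex mirroring concrete, the following is a minimal sketch that only counts network messages; it is not the Pregel+ API. It assumes a hypothetical modulo-based partitioning (`worker_of`) and assumes that, with mirroring, a high-degree vertex sends one message to each remote worker holding at least one of its neighbors, where a local mirror then forwards the value to those neighbors.

```python
def worker_of(vertex_id, num_workers):
    """Hypothetical partitioning: assign a vertex to a worker by its ID."""
    return vertex_id % num_workers


def messages_without_mirroring(neighbors, src_worker, num_workers):
    """One network message per neighbor stored on a different worker."""
    return sum(1 for n in neighbors if worker_of(n, num_workers) != src_worker)


def messages_with_mirroring(neighbors, src_worker, num_workers):
    """One network message per remote worker that holds at least one neighbor."""
    remote_workers = {worker_of(n, num_workers) for n in neighbors}
    remote_workers.discard(src_worker)
    return len(remote_workers)


# A hub vertex (ID 0) with 1,000 neighbors on an assumed 4-worker cluster.
hub = 0
neighbors = list(range(1, 1001))
num_workers = 4
src = worker_of(hub, num_workers)

plain = messages_without_mirroring(neighbors, src, num_workers)
mirrored = messages_with_mirroring(neighbors, src, num_workers)
print(plain, mirrored)
```

In this sketch the mirrored count can never exceed the number of workers minus one, no matter how large the vertex degree is, which is the sense in which the per-vertex message count is bounded.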

Performance Highlights

●  Running Hash-Min, an algorithm for computing connected components of a small-diameter graph, over the BTC dataset in a cluster with 1 master machine and 15 slave machines, each running 8 workers (here, "workers" refers to processes for Pregel+ and GPS, and to threads for GraphLab and Giraph):

Pregel+ | Pregel+ with Mirroring | Giraph | GraphLab Sync | GraphLab ASync | GPS | GPS LALP
27 seconds | 10 seconds* | 93 seconds | 83 seconds | 155 seconds | 38 seconds | 33 seconds

 

(Number of messages sent by each worker using Pregel+, blue bars — ordinary mode, red bars — mirroring)
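For readers unfamiliar with Hash-Min, here is a minimal single-process sketch of the idea as it is commonly described: every vertex starts with its own ID as its label, sends its label to its neighbors whenever the label changes, and keeps the smallest label it has seen. The function name `hash_min` and the dictionary-based graph are illustrative assumptions; this simulates supersteps in one loop and is not Pregel+ code.

```python
def hash_min(adjacency):
    """Label every vertex with the smallest vertex ID in its connected component.

    adjacency: dict mapping each vertex ID to a list of neighbor IDs
               (undirected graph, so each edge appears in both lists).
    Returns (labels, supersteps).
    """
    labels = {v: v for v in adjacency}
    active = set(adjacency)  # vertices that still need to broadcast their label
    supersteps = 0

    while active:
        supersteps += 1

        # Message passing: each active vertex sends its label to its neighbors.
        # Taking the minimum per receiver plays the role of a min-combiner.
        inbox = {}
        for v in active:
            for n in adjacency[v]:
                inbox[n] = min(inbox.get(n, labels[v]), labels[v])

        # Compute: a vertex adopts a smaller label and stays active;
        # otherwise it stays halted until a smaller label arrives.
        active = set()
        for v, smallest in inbox.items():
            if smallest < labels[v]:
                labels[v] = smallest
                active.add(v)

    return labels, supersteps


# Two components, {1, 2, 3} and {4, 5}, plus the isolated vertex 6.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
labels, supersteps = hash_min(graph)
print(labels)
```

Because a label can travel only one hop per superstep, the number of supersteps grows with the graph diameter, which is why Hash-Min suits small-diameter graphs such as the one used in this benchmark.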

 

●  Running S-V, an algorithm for computing connected components of a large-diameter graph, over the USA road network in a cluster with 1 master machine and 15 slave machines, each running 4 workers. GraphLab is inapplicable because it does not support communication between a vertex and a non-neighbor, which S-V requires:

Pregel+ | Pregel+ with Request-Respond Paradigm | Giraph | GraphLab | GPS
262 seconds | 138 seconds* | 690 seconds | inapplicable | 190 seconds

 

(Number of messages sent by each worker using Pregel+, blue bars — ordinary mode, red bars — request-respond paradigm)
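The saving from the request-respond paradigm can be illustrated with a small counting sketch. It assumes that, in the ordinary mode, each requester sends its own request message and receives its own response, while in the request-respond mode all requests from one worker for the same target vertex are merged into a single request and answered by a single response. The partitioning function `worker_of` and the function `count_messages` are hypothetical names used only for this illustration, not part of Pregel+.

```python
def worker_of(vertex_id, num_workers):
    """Hypothetical partitioning: assign a vertex to a worker by its ID."""
    return vertex_id % num_workers


def count_messages(requests, num_workers):
    """Count network messages (requests plus responses) in both modes.

    requests: list of (requester, target) vertex ID pairs.
    Returns (plain, merged).
    """
    plain = 0
    merged_pairs = set()  # distinct (requesting worker, target vertex) pairs
    for requester, target in requests:
        requester_worker = worker_of(requester, num_workers)
        target_worker = worker_of(target, num_workers)
        if requester_worker != target_worker:
            plain += 2  # one request and one response per requester
            merged_pairs.add((requester_worker, target))
    merged = 2 * len(merged_pairs)  # one request and one response per pair
    return plain, merged


# 100 vertices all ask vertex 0 for its value, for example children asking a
# shared parent during pointer jumping, on an assumed 4-worker cluster.
requests = [(v, 0) for v in range(1, 101)]
plain, merged = count_messages(requests, num_workers=4)
print(plain, merged)
```

Under these assumptions a popular target vertex handles at most one request per worker, regardless of how many vertices ask for its value.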