Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes

Ibrahim EL-Sanosi; Paul Ezhilchelvan

Research Article

Cite (BibTeX):

@ARTICLE{10.4108/eai.10-4-2018.154455,
    author={Ibrahim EL-Sanosi and Paul Ezhilchelvan},
    title={Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes},
    journal={EAI Endorsed Transactions on Energy Web and Information Technologies},
    volume={5},
    number={17},
    publisher={EAI},
    journal_a={EW},
    year={2018},
    month={4},
    keywords={Apache ZooKeeper, Atomic Broadcast, Crash-Tolerance, Server Replication, Protocol Latency, Throughput, Performance Evaluation},
    doi={10.4108/eai.10-4-2018.154455}
}

Ibrahim EL-Sanosi, Paul Ezhilchelvan (2018). Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes. EW, EAI. DOI: 10.4108/eai.10-4-2018.154455

Ibrahim EL-Sanosi^1,2,*^, Paul Ezhilchelvan^2^

1: Faculty of Information Technology, Sebha University, Sebha, Libya
2: School of Computing Science, Newcastle University, Newcastle upon Tyne, UK

*Contact email: i.elsanosi@sebhau.edu.ly

Abstract

Operating at the core of the highly available ZooKeeper system is the ZooKeeper atomic broadcast (Zab) protocol, which imposes a total order on service requests that seek to modify the replicated system state. Zab is designed with the weakest assumptions possible under the crash-recovery fault model; e.g., any number of servers, even all of them, can crash simultaneously, and the system will continue or resume its service provisioning when a server quorum remains operative or becomes operative again. Our aim is to explore ways of improving Zab performance without modifying its easy-to-implement structure. To this end, we first assume that server crashes are independent and that a server quorum remains operative at all times. Under these restrictive, yet practical, assumptions, we propose three variations of Zab and compare their performance. The first variation offers excellent performance but can be used only for 3-server systems; the other two do not have this limitation. One of them reduces the leader overhead further by conditioning the sending of acknowledgements on the outcomes of coin tosses. Owing to its superb performance, it is re-designed to operate under the least-restricted Zab fault assumptions. Further performance comparisons confirm the potential of coin-tossing to offer better performance than Zab, particularly at high workloads.

Keywords: Apache ZooKeeper, Atomic Broadcast, Crash-Tolerance, Server Replication, Protocol Latency, Throughput, Performance Evaluation

Received: 2017-12-15
Accepted: 2018-01-05
Published: 2018-04-10
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.10-4-2018.154455

Copyright © 2018 Ibrahim EL-Sanosi and Paul Ezhilchelvan, licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.