ew 18(17): e11

Research Article

Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes

  • @ARTICLE{10.4108/eai.10-4-2018.154455,
        author={Ibrahim EL-Sanosi and Paul Ezhilchelvan},
        title={Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes},
        journal={EAI Endorsed Transactions on Energy Web and Information Technologies},
        volume={5},
        number={17},
        publisher={EAI},
        journal_a={EW},
        year={2018},
        month={4},
        keywords={Apache ZooKeeper, Atomic Broadcast, Crash-Tolerance, Server Replication, Protocol Latency, Throughput, Performance Evaluation},
        doi={10.4108/eai.10-4-2018.154455}
    }
    
  • Ibrahim EL-Sanosi
    Paul Ezhilchelvan
    Year: 2018
    Improving ZooKeeper Atomic Broadcast Performance When a Server Quorum Never Crashes
    EW
    EAI
    DOI: 10.4108/eai.10-4-2018.154455
Ibrahim EL-Sanosi1,2,*, Paul Ezhilchelvan2
  • 1: Faculty of Information Technology, Sebha University, Sebha, Libya
  • 2: School of Computing Science, Newcastle University, Newcastle Upon Tyne, UK
*Contact email: i.elsanosi@sebhau.edu.ly

Abstract

At the core of the highly available ZooKeeper system is the ZooKeeper atomic broadcast (Zab) protocol, which imposes a total order on service requests that seek to modify the replicated system state. Zab is designed with the weakest assumptions possible under the crash-recovery fault model; e.g., any number of servers, even all of them, can crash simultaneously, and the system will continue or resume its service provisioning as long as a server quorum remains or again becomes operative. Our aim is to explore ways of improving Zab performance without modifying its easy-to-implement structure. To this end, we first assume that server crashes are independent and that a server quorum remains operative at all times. Under these restrictive, yet practical, assumptions, we propose three variations of Zab and compare their performance. The first variation offers excellent performance but can only be used for 3-server systems; the other two do not have this limitation. One of them reduces the leader overhead further by conditioning the sending of acknowledgements on the outcomes of coin tosses. Owing to its superb performance, it is re-designed to operate under the least-restricted Zab fault assumptions. Further performance comparisons confirm the potential of coin-tossing to offer better performance than Zab, particularly at high workloads.
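
As a rough illustration of two ideas mentioned in the abstract, the sketch below computes the majority quorum size for a server ensemble and simulates followers that send an acknowledgement only when a coin toss succeeds. It is not the authors' protocol: the function names, the acknowledgement probability `p_ack`, and the simulation itself are assumptions made for illustration, and the sketch does not model how the proposed variations still guarantee that requests are committed when fewer acknowledgements reach the leader.

```python
import random


def quorum_size(n_servers):
    """Smallest majority of an ensemble of n_servers (leader included)."""
    return n_servers // 2 + 1


def acks_sent(n_followers, p_ack, rng):
    """Count followers whose coin toss tells them to send an acknowledgement.

    p_ack is a hypothetical per-follower probability of acknowledging;
    the abstract does not state how the coin is biased.
    """
    return sum(1 for _ in range(n_followers) if rng.random() < p_ack)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same trace
    n_servers = 5
    n_followers = n_servers - 1
    print("quorum size:", quorum_size(n_servers))
    for p in (1.0, 0.5, 0.25):
        trials = [acks_sent(n_followers, p, rng) for _ in range(10_000)]
        mean_acks = sum(trials) / len(trials)
        print(f"p_ack={p}: mean acks reaching the leader per request = {mean_acks:.2f}")
```

With `p_ack = 1.0` every follower acknowledges every request, which mirrors the usual behaviour; smaller values show how the number of acknowledgements the leader must process per request shrinks on average.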