sesa 19(19): e4

Research Article

Differentially Private High-Dimensional Data Publication via Markov Network

  • @ARTICLE{10.4108/eai.29-7-2019.159626,
        author={Wei Zhang and Jingwen Zhao and Fengqiong Wei and Yunfang Chen},
        title={Differentially Private High-Dimensional Data Publication via Markov Network},
        journal={EAI Endorsed Transactions on Security and Safety},
        volume={6},
        number={19},
        publisher={EAI},
        journal_a={SESA},
        year={2019},
        month={1},
        keywords={Differential privacy, High-dimensional, Data publication, Markov network},
        doi={10.4108/eai.29-7-2019.159626}
    }
    
Wei Zhang1,2, Jingwen Zhao1, Fengqiong Wei1, Yunfang Chen1,*
  • 1: School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • 2: Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
*Contact email: chenyf@njupt.edu.cn

Abstract

Differentially private data publication has recently received considerable attention. However, publishing high-dimensional data under differential privacy faces several challenges, including complex attribute relationships, high computational complexity, and data sparsity. We therefore propose PrivMN, a novel method for publishing high-dimensional data with a differential privacy guarantee. We first use a Markov network to represent the mutual relationships between attributes, which addresses the problem that the direction of the relationship between variables often cannot be determined in practice. We then use approximate inference to compute the joint distribution of the high-dimensional data under differential privacy, avoiding the time and space cost of exact inference. Extensive experiments on real datasets demonstrate that our solution publishes high-dimensional synthetic datasets more efficiently while preserving the differential privacy guarantee.
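
To make the idea of computing a distribution over attributes under differential privacy concrete, the following is a minimal sketch of a noisy two-attribute marginal built with the standard Laplace mechanism. It is not PrivMN itself: the abstract does not specify the noise mechanism, the graph construction, or the inference procedure, so the function names (`laplace_noise`, `noisy_marginal`), the toy attributes, and the choice of add/remove-one-record neighbouring datasets (which gives a count sensitivity of 1) are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(records, attrs, domains, epsilon, rng):
    """Hypothetical helper: noisy, normalised joint distribution over `attrs`."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    cells = list(product(*(domains[a] for a in attrs)))
    # Assumption: neighbouring datasets differ by adding or removing one
    # record, so exactly one cell count changes by 1 (L1 sensitivity 1).
    scale = 1.0 / epsilon
    # Clamping and normalising are post-processing steps and do not
    # consume any additional privacy budget.
    noisy = {c: max(counts.get(c, 0) + laplace_noise(scale, rng), 0.0)
             for c in cells}
    total = sum(noisy.values())
    if total == 0:
        # Every cell was clamped to zero: fall back to a uniform distribution.
        return {c: 1.0 / len(cells) for c in cells}
    return {c: v / total for c, v in noisy.items()}


# Toy data with illustrative attribute names (not from the paper).
rng = random.Random(0)
domains = {"age": ["young", "old"], "smoker": ["yes", "no"]}
records = [
    {"age": "young", "smoker": "no"},
    {"age": "young", "smoker": "no"},
    {"age": "old", "smoker": "yes"},
    {"age": "old", "smoker": "no"},
]
dist = noisy_marginal(records, ("age", "smoker"), domains, epsilon=1.0, rng=rng)
```

In a graphical-model approach, low-dimensional noisy marginals of this kind would typically be computed only for attribute groups that the model links together, rather than for the full high-dimensional domain, which is what keeps the number of noisy cells manageable.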