Reconciling Schema Matching Networks Through Crowdsourcing

Nguyen Quoc Viet Hung; Nguyen Thanh Tam; Zoltán Miklós; Karl Aberer

Research Article


@ARTICLE{10.4108/cc.1.2.e2,
    author={Nguyen Quoc Viet Hung and Nguyen Thanh Tam and Zolt\'{a}n Mikl\'{o}s and Karl Aberer},
    title={Reconciling Schema Matching Networks Through Crowdsourcing},
    journal={EAI Endorsed Transactions on Collaborative Computing},
    volume={1},
    number={1},
    publisher={ICST},
    journal_a={CC},
    year={2014},
    month={10},
    keywords={data integration, schema matching, crowdsourcing, worker assessment, user effort},
    doi={10.4108/cc.1.2.e2}
}

Nguyen Quoc Viet Hung¹, Nguyen Thanh Tam¹, Zoltán Miklós², Karl Aberer¹

1: École Polytechnique Fédérale de Lausanne
2: Université de Rennes

Abstract

Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, human effort is usually required to validate the generated correspondences. This validation process is often costly, as it is performed by highly skilled experts. Our paper analyzes how to leverage crowdsourcing techniques so that a large group of non-experts can validate the generated correspondences. In our work we assume that attribute correspondences must be established not only between two schemas but across a network of schemas. We also assume that the matching is carried out in a pairwise fashion, in the presence of consistency expectations about the network of attribute correspondences. We demonstrate that formulating these expectations as integrity constraints can improve the reconciliation process. Because user input is unreliable in crowdsourcing, specific aggregation techniques are needed to obtain good-quality results. We demonstrate that consistency constraints not only improve the quality of aggregated answers, but also enable us to estimate the quality of individual workers' answers more reliably and to detect spammers. Moreover, these constraints make it possible to minimize the human effort needed for the same expected quality of results.
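To make the role of consistency constraints concrete, the following is a minimal sketch and not the authors' method. It assumes workers give yes/no answers on candidate correspondences, uses plain majority voting as a stand-in for the paper's aggregation technique, and checks a one-to-one constraint (an attribute should match at most one attribute of any other schema). The function names, the `schema.attribute` naming, and the sample answers are all hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Aggregate yes/no worker answers per correspondence by simple majority.

    `answers` maps a correspondence (a pair of "schema.attribute" strings)
    to a dict of worker id -> bool. This is an illustrative stand-in, not
    the aggregation technique of the paper.
    """
    result = {}
    for corr, votes in answers.items():
        counts = Counter(votes.values())
        result[corr] = counts[True] > counts[False]
    return result


def one_to_one_violations(accepted):
    """Find attributes matched to several attributes of the same other schema.

    Returns a dict keyed by (attribute, other schema) listing the
    conflicting attributes. The one-to-one constraint is an assumption
    made for this sketch.
    """
    targets = defaultdict(set)
    for a, b in accepted:
        for src, dst in ((a, b), (b, a)):
            dst_schema = dst.split(".")[0]
            targets[(src, dst_schema)].add(dst)
    return {key: sorted(v) for key, v in targets.items() if len(v) > 1}


# Hypothetical crowd answers for three candidate correspondences.
answers = {
    ("S1.date", "S2.releaseDate"): {"w1": True, "w2": True, "w3": False},
    ("S1.date", "S2.screenDate"): {"w1": True, "w2": False, "w3": True},
    ("S1.title", "S2.name"): {"w1": True, "w2": True, "w3": True},
}

aggregated = majority_vote(answers)
accepted = [corr for corr, ok in aggregated.items() if ok]
violations = one_to_one_violations(accepted)
```

In this toy input, majority voting accepts all three correspondences, yet two of them map `S1.date` to different attributes of `S2`. The constraint check flags that conflict, which shows how a constraint can tell us that at least one aggregated answer is wrong and where further validation effort might be spent.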

Keywords: data integration, schema matching, crowdsourcing, worker assessment, user effort

Received: 2014-07-11
Accepted: 2014-07-23
Published: 2014-10-20
Publisher: ICST

DOI: http://dx.doi.org/10.4108/cc.1.2.e2

Copyright © 2014 Nguyen Quoc Viet Hung et al., licensed to ICST. This is an open access article distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.