WorldCIST'15 - 3rd World Conference on Information Systems and Technologies


Towards Reusing Data Cleaning Knowledge

The demand from organizations to integrate several disparate data sources, together with an ever-increasing amount of data, is intensifying the occurrence of data quality problems. Current data cleaning approaches are tailored to data sources that have different schemas but share the same data model (e.g., the relational model), and they depend heavily on a domain expert to specify the data cleaning operations. This paper presents a novel, generic data cleaning methodology that assists the domain expert in specifying data cleaning operations by reusing knowledge previously expressed for other data sources, even if those sources have different data models and/or schemas. This is achieved by abstracting data source models and schemas to a level closer to human understanding and by using a vocabulary to describe the structure and semantics of data cleaning operations.

Author(s):

Ricardo Almeida    
ISEP-IPP
Portugal

Paulo Maio    
ISEP-IPP
Portugal

Paulo Oliveira    
ISEP-IPP
Portugal

João Barroso    
UTAD
Portugal

 
