WorldCIST'15 - 3rd World Conference on Information Systems and Technologies


A Data Preparation Methodology in Data Mining Applied to Mortality Population Databases

The purpose of data mining is to explore large databases in order to discover unknown patterns; however, the quality of these patterns depends on correct data preparation. Current data mining methodologies are general purpose and do not provide sufficient detail for direct application in data mining projects. Traditionally, generic methodologies such as CRISP-DM have been used, but they do not provide a layer that links them to specific application domains. As a consequence, the data mining process takes longer to develop. In particular, data preparation is the most time-consuming phase, taking between 50% and 70% of total project time. This paper proposes a data preparation methodology, based on the CRISP-DM process model, with a higher level of detail; it is applied to a data mining project in the epidemiological domain and could be partially reused in other domains. To validate the proposed methodology, official census databases from 2000, containing population mortality records, were used. Additionally, as a result of using a case study with real data, we obtained findings of potential interest to the institutions responsible for public health services in Mexico.

Author(s):

Joaquín Pérez
CENIDET
Mexico

Emmanuel Iturbide    
CENIDET
Mexico

Miguel Hidalgo    
CENIDET
Mexico

 
