Automating Complaints Processing in the Food and Economic Sector: A Classification Approach
Text categorization is a supervised learning task that assigns labels to documents using a classifier trained on a set of labelled documents. Applying text classification to label reports and complaints in the economic and health-related fields can greatly increase the speed at which they are processed and therefore reduce the time required to act on them. In this work, we aim to classify complaints into the four main economic activities defined by the Portuguese Economic and Food Safety Authority. We evaluate the classification performance of 9 algorithms (Complement Naïve Bayes, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, AdaBoost and Logistic Regression) at different levels of text pre-processing. Results show high accuracy, around 85%. The linear classifiers (Support Vector Machine and Logistic Regression) also achieved higher F1-measure values than the other classifiers, in addition to high accuracy. We conclude that these algorithms are better suited to the selected data and that text classification methods can ease the processing of complaints and reports, which in turn allows the authorities in charge to act more swiftly. Relying on text classification of reports and complaints can therefore have a positive influence on both economic crime prevention and public health, in this case by means of food-related inspections.
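The abstract reports both accuracy and F1-measure, and the two can diverge when classes are imbalanced. The following is a minimal sketch of how each could be computed from predicted labels. The abstract does not say how F1 was averaged across classes, so macro-averaging is an assumption here, and the label names and toy predictions are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example: a classifier that always predicts the majority class.
y_true = ["activity_1"] * 8 + ["activity_2"] * 2
y_pred = ["activity_1"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

In this toy case the classifier never identifies the minority class, so accuracy stays high while macro F1 drops sharply, which is why comparing classifiers on F1 as well as accuracy is informative.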