Full Program »
Filtering Users Accounts for Enhancing the Results of Social Media Mining Tasks
Filtering out the illegitimate Twitter accounts for online social media mining tasks reduces the noise and thus improves the quality of the outcomes of those tasks. Developing a supervised machine learning classifier requires a large annotated dataset. While building the annotation guidelines, the rules were found suitable to develop an unsupervised rule-based classifying program. However, despite its high accuracy, the performance of the rule-based program was not time efficient. So, we decided to use the unsupervised rule-based program to create a massive annotated dataset to build a supervised machine learning classifier, which was found to be fast and matched the unsupervised classifier performance with an F-Score of 92%. The impact of removing those illegitimate accounts on an influential users identification program developed by the authors, was investigated. There were slight improvements in the precision results but not statistically significant, which indicated that the influential user program didn’t identify erroneously spam accounts as influential.