Please use this identifier to cite or link to this item: http://hdl.handle.net/10071/25098
Author(s): Batista, F.; Carvalho, João P.
Editors: Adnan Yazici, Nikhil R. Pal, Uzay Kaymak
Date: 2015
Title: Text based classification of companies in CrunchBase
Event title: IEEE International Conference on Fuzzy Systems
ISSN: 1544-5615
ISBN: 978-1-4673-7428-6
DOI (Digital Object Identifier): 10.1109/FUZZ-IEEE.2015.7337892
Keywords: Text classification; Fuzzy fingerprints; Text mining; Crunchbase; Document classification
Abstract: This paper introduces two fuzzy-fingerprint-based text classification techniques that were successfully applied to automatically label companies from CrunchBase based purely on their unstructured textual descriptions. This is a real and very challenging problem, both because of the large set of possible labels (more than 40) and because the textual descriptions do not have to abide by any criteria and are therefore extremely heterogeneous. Fuzzy fingerprints are a recently introduced technique for fast classification. They perform well on unbalanced datasets and can cope with a very large number of classes. The paper compares the proposed approach against some of the best text classification techniques commonly used to address similar problems. On the CrunchBase dataset, the fuzzy-fingerprint-based approach outperformed the other techniques.
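The abstract does not spell out how a fuzzy fingerprint classifier works. As a rough illustration only, the sketch below follows the general idea: each class is summarised by its top-k most frequent words, each word receives a membership value that decreases with its rank, and a new description is assigned to the class whose fingerprint overlaps most with its own. The linear rank-based membership function, the min-based overlap score, the value of k, the toy descriptions, and the helper names (`fingerprint`, `similarity`, `train`, `classify`) are all assumptions for illustration. They are not taken from this paper, whose two variants may rank and weight words differently.

```python
from collections import Counter


def fingerprint(tokens, k):
    """Top-k most frequent tokens, each mapped to a membership in (0, 1].

    Assumption: membership falls linearly with rank (rank 0 -> 1.0,
    rank k-1 -> 1/k). Ties in frequency keep first-seen order.
    """
    top = [word for word, _ in Counter(tokens).most_common(k)]
    return {word: 1.0 - rank / k for rank, word in enumerate(top)}


def similarity(doc_fp, class_fp, k):
    """Sum of the smaller membership over shared words, normalised by k."""
    shared = doc_fp.keys() & class_fp.keys()
    return sum(min(doc_fp[w], class_fp[w]) for w in shared) / k


def train(labelled_docs, k):
    """Build one fingerprint per class from all tokens of that class."""
    tokens_by_class = {}
    for tokens, label in labelled_docs:
        tokens_by_class.setdefault(label, []).extend(tokens)
    return {label: fingerprint(toks, k) for label, toks in tokens_by_class.items()}


def classify(tokens, class_fps, k):
    """Return the label whose fingerprint best matches the document's."""
    doc_fp = fingerprint(tokens, k)
    return max(class_fps, key=lambda label: similarity(doc_fp, class_fps[label], k))


# Hypothetical toy data, not from the CrunchBase dataset.
docs = [
    ("mobile app for sharing photos with friends".split(), "mobile"),
    ("mobile games app for android and ios".split(), "mobile"),
    ("online payments platform for merchants".split(), "ecommerce"),
    ("marketplace to buy and sell products online".split(), "ecommerce"),
]
fps = train(docs, k=10)
print(classify("a photo app for mobile phones".split(), fps, k=10))  # prints "mobile"
```

Because the score only looks at words present in both top-k lists, each document-to-class comparison costs at most k lookups, independent of vocabulary size.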
Peer reviewed: yes
Access type: Open Access
Appears in collections: IT-CRI - Comunicações a conferências internacionais (International conference communications)

Files in this item:
conferenceobject_24671.pdf (Submitted version, 332.42 kB, Adobe PDF)
