Use this identifier to cite or link to this record:
http://hdl.handle.net/10071/20839
Full record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Marco Felgueiras | - |
dc.contributor.author | Batista, F. | - |
dc.contributor.author | João P. Carvalho | - |
dc.contributor.editor | Lesot, Marie-Jeanne and Vieira, Susana and Reformat, Marek Z. and Carvalho, João Paulo and Wilbik, Anna and Bouchon-Meunier, Bernadette and Yager, Ronald R. | - |
dc.date.accessioned | 2020-11-20T11:05:04Z | - |
dc.date.available | 2020-11-20T11:05:04Z | - |
dc.date.issued | 2020 | - |
dc.identifier.isbn | 978-3-030-50146-4 | - |
dc.identifier.uri | http://hdl.handle.net/10071/20839 | - |
dc.description.abstract | This paper compares different models for multilabel text classification, using information collected from Crunchbase, a large database that holds information about more than 600,000 companies. Each company is labeled with one or more categories, from a subset of 46 possible categories, and the proposed models predict the categories based solely on the company's textual description. A number of natural language processing strategies have been tested for feature extraction, including stemming, lemmatization, and part-of-speech tags. This is a highly unbalanced dataset, where the frequency of each category ranges from 0.7% to 28%. Our findings reveal that the description text of each company contains features that allow its area of activity, expressed by its corresponding categories, to be predicted with about 70% precision and 42% recall. In a second set of experiments, a multiclass problem that attempts to find the most probable category, we obtained about 67% accuracy using SVM and Fuzzy Fingerprints. The resulting models may constitute an important asset for automatic classification of texts, not only company descriptions but also other texts, such as web pages, text blogs, news pages, etc. | eng |
dc.language.iso | eng | - |
dc.publisher | Springer International Publishing | - |
dc.relation | UIDB/50021/2020 | - |
dc.rights | openAccess | - |
dc.title | Creating classification models from textual descriptions of companies using Crunchbase | eng |
dc.type | conferenceObject | - |
dc.event.title | IPMU 2020: Information Processing and Management of Uncertainty in Knowledge-Based Systems | - |
dc.event.type | Conference | pt |
dc.event.location | Lisboa | eng |
dc.event.date | 2020 | - |
dc.pagination | 695 - 707 | - |
dc.peerreviewed | yes | - |
dc.journal | Information Processing and Management of Uncertainty in Knowledge-Based Systems | - |
degois.publication.firstPage | 695 | - |
degois.publication.lastPage | 707 | - |
degois.publication.location | Lisboa | eng |
degois.publication.title | Creating classification models from textual descriptions of companies using Crunchbase | eng |
dc.date.updated | 2020-11-20T11:01:40Z | - |
dc.description.version | info:eu-repo/semantics/publishedVersion | - |
dc.identifier.doi | 10.1007/978-3-030-50146-4_51 | - |
dc.subject.fos | Field/Scientific Area::Natural Sciences::Computer and Information Sciences | por |
dc.subject.fos | Field/Scientific Area::Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering | por |
dc.subject.fos | Field/Scientific Area::Humanities::Languages and Literature | por |
iscte.identifier.ciencia | https://ciencia.iscte-iul.pt/id/ci-pub-72399 | - |
iscte.alternateIdentifiers.scopus | 2-s2.0-85086244630 | - |
Appears in collections: | IT-CRI - Comunicações a conferências internacionais |
Files in this record:
File | Description | Size | Format | |
---|---|---|---|---|
Felgueiras2020_Chapter_CreatingClassificationModelsFr.pdf | Publisher's version | 827.4 kB | Adobe PDF | View/Open |
All records in the repository are protected by copyright law, with all rights reserved.