Re-evaluating the need for Modelling Term-Dependence in Text Classification Problems

Banerjee, Sounak; Majumder, Prasenjit; Mitra, Mandar

arXiv:1710.09085 (cs)

[Submitted on 25 Oct 2017]

Abstract: A substantial amount of research has gone into developing machine learning algorithms that account for term dependence in text classification. These algorithms offer acceptable performance in most cases, but at a substantial cost: they require significantly greater resources to operate. This paper argues that, given their performance on text classification problems, the higher cost of these algorithms is not justified. To test this conjecture, the performance of one of the best dependence models is compared with that of several well-established text classification algorithms. A collection of datasets has been specifically designed to reflect the variation in the nature of text data found in real-world applications. The results show that even one of the best term-dependence models performs decently at best when compared with independence models. Coupled with their substantially greater hardware requirements, this makes term-dependence models an impractical choice for real-world scenarios.

Comments:	23 pages, 16 figures, 3 tables; some figures appear at the end of the document because of limitations of the LaTeX format
Subjects:	Information Retrieval (cs.IR); Computation and Language (cs.CL)
MSC classes:	68P20
Cite as:	arXiv:1710.09085 [cs.IR]
	(or arXiv:1710.09085v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1710.09085

Submission history

From: Prasenjit Majumder
[v1] Wed, 25 Oct 2017 06:26:28 UTC (1,679 KB)
