Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition

Cheng, Bowen; Xiao, Rong; Guo, Yandong; Hu, Yuxiao; Wang, Jianfeng; Zhang, Lei


arXiv:1809.06131 (cs)

[Submitted on 17 Sep 2018]



Abstract: In this paper we study how to initialize the parameters of multinomial logistic regression (a fully connected layer followed by softmax and cross-entropy loss), which is widely used in deep neural network (DNN) models for classification problems. Because logistic regression is known to have no closed-form solution, it is usually initialized randomly, which leads to several deficiencies, especially in transfer learning, where all layers except the last task-specific layer are initialized from a pre-trained model. The deficiencies include slow convergence, the possibility of getting stuck in a local minimum, and the risk of over-fitting. To address these deficiencies, we first study the properties of logistic regression and propose a closed-form approximate solution named regularized Gaussian classifier (RGC). We then adopt this approximate solution to initialize the task-specific linear layer and demonstrate superior performance over random initialization in terms of both accuracy and convergence speed on various tasks and datasets. For example, for image classification, our approach can reduce training time by a factor of 10 and achieve a 3.2% gain in accuracy on Flickr-style classification. For object detection, our approach can also train 10 times faster for the same accuracy, or achieve 5% better mAP on VOC 2007 with slightly longer training.
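The abstract does not spell out the RGC formulas, so the following is only a minimal sketch of the general idea: fitting a Gaussian classifier in closed form on pre-trained features and using the result as the initial weights and bias of the final linear layer. It assumes a covariance matrix shared by all classes plus a ridge term, which is the textbook linear discriminant form and may differ from the paper's actual RGC. The function name `gaussian_classifier_init` and the parameter `reg` are hypothetical, chosen for illustration.

```python
import numpy as np

def gaussian_classifier_init(features, labels, num_classes, reg=1e-3):
    """Closed-form weights and bias for a linear layer (illustrative sketch).

    Assumes every class has at least one sample and that all classes share
    one covariance matrix; `reg` is an assumed ridge term, not a value
    taken from the paper.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n, d = features.shape

    means = np.zeros((num_classes, d))
    priors = np.zeros(num_classes)
    cov = np.zeros((d, d))
    for k in range(num_classes):
        xk = features[labels == k]
        means[k] = xk.mean(axis=0)           # class mean
        priors[k] = len(xk) / n              # empirical class prior
        centered = xk - means[k]
        cov += centered.T @ centered         # pooled within-class scatter
    cov = cov / n + reg * np.eye(d)          # regularized shared covariance

    # Linear discriminant: logit_k(x) = w_k . x + b_k, with
    #   w_k = cov^{-1} mean_k
    #   b_k = -0.5 * mean_k . w_k + log(prior_k)
    weights = np.linalg.solve(cov, means.T).T        # shape (num_classes, d)
    bias = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return weights, bias
```

Under these assumptions, the returned arrays would be copied into the task-specific fully connected layer before fine-tuning, in place of random values; the logits for a feature matrix `x` are then `x @ weights.T + bias`.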

Comments:	tech report
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1809.06131 [cs.CV]
	(or arXiv:1809.06131v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1809.06131

Submission history

From: Bowen Cheng
[v1] Mon, 17 Sep 2018 11:23:33 UTC (1,158 KB)
