Localizing Moments in Video with Natural Language

Hendricks, Lisa Anne; Wang, Oliver; Shechtman, Eli; Sivic, Josef; Darrell, Trevor; Russell, Bryan

Computer Science > Computer Vision and Pattern Recognition

arXiv:1708.01641 (cs)

[Submitted on 4 Aug 2017]

Title:Localizing Moments in Video with Natural Language

Authors:Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan Russell

View PDF

Abstract:We consider retrieving a specific temporal segment, or moment, from a video given a natural language text description. Methods designed to retrieve whole video clips with natural language determine what occurs in a video but not when. To address this issue, we propose the Moment Context Network (MCN) which effectively localizes natural language queries in videos by integrating local and global video features over time. A key obstacle to training our MCN model is that current video datasets do not include pairs of localized video segments and referring expressions, or text descriptions which uniquely identify a corresponding moment. Therefore, we collect the Distinct Describable Moments (DiDeMo) dataset which consists of over 10,000 unedited, personal videos in diverse visual settings with pairs of localized video segments and referring expressions. We demonstrate that MCN outperforms several baseline methods and believe that our initial results together with the release of DiDeMo will inspire further research on localizing video moments with natural language.

Comments:	ICCV 2017
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1708.01641 [cs.CV]
	(or arXiv:1708.01641v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1708.01641

Submission history

From: Lisa Anne Hendricks [view email]
[v1] Fri, 4 Aug 2017 18:57:52 UTC (9,242 KB)

Full-text links:

Access Paper:

view license

Current browse context:

cs.CV

< prev | next >

new | recent | 2017-08

Change to browse by:

References & Citations

DBLP - CS Bibliography

listing | bibtex

Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell

…

export BibTeX citation

Computer Science > Computer Vision and Pattern Recognition

Title:Localizing Moments in Video with Natural Language

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Localizing Moments in Video with Natural Language

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators