research-article

Total Cluster: A person agnostic clustering method for broadcast videos

Authors: Makarand Tapaswi, Omkar M. Parkhi, Eric Sommerlade, Rainer Stiefelhagen, Andrew Zisserman

ICVGIP '14: Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing

Article No.: 7, Pages 1 - 8

https://doi.org/10.1145/2683483.2683490

Published: 14 December 2014

Abstract

The goal of this paper is unsupervised face clustering in edited video material – where face tracks arising from different people are assigned to separate clusters, with one cluster for each person. In particular, we explore the extent to which faces can be clustered automatically without making an error. This is a very challenging problem given the variation in pose, lighting and expression that can occur, and the similarities between different people.
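The requirement to cluster "without making an error" can be made measurable. The sketch below assumes every face track carries a ground-truth identity label and scores a clustering by weighted clustering purity: each cluster counts only the tracks that match its majority identity, and the total is divided by the number of tracks. The function name and data layout are hypothetical illustrations, not the paper's evaluation code.

```python
from collections import Counter


def weighted_clustering_purity(clusters, labels):
    """Fraction of tracks that agree with the majority identity of their cluster.

    clusters: list of clusters, each a list of track ids (hypothetical layout).
    labels:   dict mapping track id -> ground-truth person identity.
    Weighting each cluster's purity by its size reduces to
    sum(majority count) / total number of tracks.
    """
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 1.0  # nothing clustered, so nothing clustered wrongly
    correct = 0
    for cluster in clusters:
        counts = Counter(labels[track] for track in cluster)
        if counts:
            correct += counts.most_common(1)[0][1]
    return correct / total


if __name__ == "__main__":
    labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A"}
    # 4 of the 5 tracks agree with their cluster's majority identity
    print(weighted_clustering_purity([[0, 1], [2, 3, 4]], labels))  # prints 0.8
```

Purity alone is not enough: leaving every track in its own singleton cluster scores a perfect 1.0. It is therefore best read together with the number of clusters, with the aim of reducing that number while purity stays at (or near) 1.0.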

The novelty we bring is threefold: first, we show that a form of weak supervision is available from the editing structure of the material (the shots, threads and scenes that are standard in edited video); second, we show that by first clustering within scenes, the number of face tracks can be significantly reduced with almost no errors; third, we propose an extension of the clustering method to entire episodes using exemplar SVMs trained with negative data harvested automatically from the editing structure.
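The scene-level step can be illustrated with a small constrained agglomerative clustering. This is a minimal sketch under stated assumptions, not the authors' method: each track is assumed to have a fixed-length descriptor and a (start_frame, end_frame) span; the only weak supervision used is that two tracks visible in the same frame must be different people (a cannot-link constraint); merging uses average-linkage Euclidean distance with a hand-picked `threshold`. The constraints the paper derives from shots and threads, and its episode-level exemplar SVM step, are not reproduced here. All function names are hypothetical.

```python
import math
from itertools import combinations


def overlaps(a, b):
    """True if two inclusive (start_frame, end_frame) spans share a frame."""
    return a[0] <= b[1] and b[0] <= a[1]


def cannot_link_pairs(spans):
    """Hypothetical helper: tracks visible at the same time are different people."""
    return {(i, j) for i, j in combinations(range(len(spans)), 2)
            if overlaps(spans[i], spans[j])}


def constrained_agglomerative(features, spans, threshold):
    """Average-linkage clustering that never merges two clusters containing a
    cannot-link pair, and stops once the closest allowed pair of clusters is
    farther apart than `threshold`."""
    forbidden = cannot_link_pairs(spans)
    clusters = [[i] for i in range(len(features))]

    def linked(c1, c2):
        # Any forbidden pair across the two clusters blocks the merge.
        return any((min(i, j), max(i, j)) in forbidden for i in c1 for j in c2)

    def avg_dist(c1, c2):
        total = sum(math.dist(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            if linked(clusters[a], clusters[b]):
                continue
            d = avg_dist(clusters[a], clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    spans = [(0, 10), (20, 30), (0, 10), (40, 50)]  # tracks 0 and 2 co-occur
    print(constrained_agglomerative(feats, spans, threshold=1.0))  # prints [[0, 1], [2, 3]]
```

A conservative `threshold` leaves more clusters unmerged but keeps them pure, which matches the abstract's emphasis on reducing the number of tracks with almost no errors. The cannot-link check also blocks merges between near-identical descriptors when the two tracks appear on screen together.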

The method is demonstrated on multiple episodes from two very different TV series, Scrubs and Buffy. For both series we show progress towards this goal and outperform a number of baselines from previous work.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
        Video summarization
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis

Published In

ICVGIP '14 proceedings, December 2014, 692 pages. ISBN: 9781450330619. DOI: 10.1145/2683483

General Chairs: A. G. Ramakrishnan (IISc, Bangalore), Jitendra Malik (University of California, Berkeley)

Program Chairs: Alex Efros (UC Berkeley), C. V. Jawahar (IIIT Hyderabad), Manik Varma (Microsoft Research)

Copyright © 2014 ACM.


Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICVGIP '14: Indian Conference on Computer Vision Graphics and Image Processing, December 14-18, 2014, Bangalore, India

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%

Article Metrics (downloads reflected up to 31 Jan 2025)

24 total citations; 119 total downloads; 7 downloads in the last 12 months; 0 downloads in the last 6 weeks.
