
Machine learning-powered video library that returns accurate results given search queries based on the features: audio, onscreen text, and image objects


czarinagluna/ml-powered-video-library


Machine Learning-Powered Video Library

Video sharing applications today do not let users search videos by their content. As a solution, I developed a searchable video library that processes videos and returns exact matches to queries using machine learning and artificial intelligence, including speech recognition, optical character recognition, and object detection.

Link to Web Application

Business Problem

Video sharing and storage applications could improve the user experience by letting users search for videos by their content, such as specific words or objects in the video. One of the most popular video sharing apps right now is TikTok, where users can save the videos they like to their profile but cannot search through those liked videos.

Without that functionality, its millions of users must scroll through every video they have ever liked to find a single clip, again and again. To address this problem, I created a library of TikTok videos and built a search engine that breaks the videos down into several features and returns exact matches to any given query.

Data Understanding

A sample of 140 videos is provided in the videos folder of this repository to demonstrate the end-to-end process I performed. This sample was originally saved from my personal user account. In addition, I downloaded two datasets from Kaggle, each containing 1000 videos (found here and here). Altogether I analyzed over 2000 videos for the whole project, which I uploaded to Google Drive. You may download all the videos to explore the complete dataset.

Multimedia Data

A video is a complex data type that can be broken down in a lot of different ways. Through feature engineering, I turned the raw videos into multiple data features that I extracted using the following approaches:

  • Converting the video to audio and transcribing the speech
  • Breaking down the video as a sequence of images or frames
    • Recognizing on-screen text in the video frames
    • Detecting image objects in the video frames
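As a rough illustration of the frame breakdown described above (a sketch, not the code from the notebook), the snippet below picks frames at a fixed time interval and reads them with opencv-python. The function names `sample_frame_indices` and `iter_sampled_frames` and the one-second default interval are assumptions made for this example.

```python
def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Return the indices of frames spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # At least one frame per step, even for very small intervals.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def iter_sampled_frames(video_path, every_seconds=1.0):
    """Yield (frame_index, frame) pairs for the sampled frames of a video."""
    import cv2  # opencv-python, imported lazily so the helper above has no dependencies

    capture = cv2.VideoCapture(video_path)
    try:
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        fps = capture.get(cv2.CAP_PROP_FPS)
        for index in sample_frame_indices(total, fps, every_seconds):
            # Jump to the requested frame, then decode it.
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)
            ok, frame = capture.read()
            if not ok:
                break
            yield index, frame
    finally:
        capture.release()
```

Sampling one frame per second instead of every frame keeps text recognition and object detection affordable on short clips. The right interval depends on how quickly the on-screen content changes.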

Video Processing

  1. Audio processing using moviepy, pydub, and speech_recognition
  2. Optical character recognition using opencv-python, PIL, and pytesseract
  3. Object detection using opencv-python and the YOLOv3 algorithm
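The first two steps could look roughly like the sketch below. It is an illustration under stated assumptions, not the notebook's implementation: it uses the Google Web Speech recognizer that ships with speech_recognition (the recognizer used in the project is not stated here), it skips pydub, and the names `transcribe_speech`, `read_onscreen_text`, and `dedupe_consecutive` are made up for this example.

```python
def dedupe_consecutive(texts):
    """Normalize whitespace, then drop empty strings and immediate repeats.

    Neighboring frames usually show the same on-screen text, so this keeps
    one copy of each run.
    """
    result = []
    for text in texts:
        cleaned = " ".join(text.split())
        if cleaned and (not result or cleaned != result[-1]):
            result.append(cleaned)
    return result


def transcribe_speech(video_path, wav_path="audio.wav"):
    """Extract the audio track to a WAV file and transcribe it."""
    import speech_recognition as sr
    try:
        from moviepy.editor import VideoFileClip  # moviepy 1.x import path
    except ImportError:
        from moviepy import VideoFileClip  # moviepy 2.x import path

    clip = VideoFileClip(video_path)
    try:
        if clip.audio is None:  # silent video
            return ""
        clip.audio.write_audiofile(wav_path)
    finally:
        clip.close()

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        # Assumption: the free Google Web Speech API; needs network access.
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return ""


def read_onscreen_text(frames):
    """Run Tesseract OCR on BGR frames (as returned by OpenCV)."""
    import cv2
    import pytesseract
    from PIL import Image

    texts = []
    for frame in frames:
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV uses BGR, PIL expects RGB
        texts.append(pytesseract.image_to_string(Image.fromarray(rgb)))
    return dedupe_consecutive(texts)
```

Note that pytesseract is only a wrapper, so the Tesseract binary has to be installed separately for `read_onscreen_text` to work.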

YOLO Object Detection

The videos are processed with YOLO, a state-of-the-art object detection system. YOLO looks at each image only once, hence the name You Only Look Once.

  • The input image is divided into a grid of x by x cells.
  • For each cell, bounding boxes are predicted along with confidence scores.
  • Class probabilities are predicted and combined with the confidence scores of the bounding boxes.
  • The detected objects are displayed only if their score meets the set threshold.
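The last two steps can be sketched in plain Python. This is a simplified illustration, not the project's detection code: it assumes each prediction is a row laid out as `[cx, cy, w, h, objectness, p_class0, p_class1, ...]`, scores it as objectness times the best class probability (the class-specific confidence from the YOLO formulation), and leaves out non-maximum suppression. The function name `filter_detections` and the 0.5 default threshold are hypothetical.

```python
def filter_detections(rows, class_names, threshold=0.5):
    """Keep predictions whose class-specific confidence meets the threshold."""
    kept = []
    for row in rows:
        box, objectness, class_probs = row[:4], row[4], row[5:]
        # Index of the most likely class for this box.
        best = max(range(len(class_probs)), key=class_probs.__getitem__)
        score = objectness * class_probs[best]
        if score >= threshold:
            kept.append({
                "label": class_names[best],
                "score": score,
                "box": tuple(box),
            })
    return kept


# Two made-up predictions over two classes.
example_rows = [
    [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8],   # confident "dog"
    [0.1, 0.1, 0.1, 0.1, 0.4, 0.9, 0.05],  # weak "person", filtered out
]
example_hits = filter_detections(example_rows, ["person", "dog"])
```

The labels that survive the threshold are what a search index would store for each frame, so a higher threshold trades recall for fewer false matches.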

An example frame to illustrate:

Using the packages and models above, I extracted the features as text, then applied Natural Language Processing (NLP) to process that text and build a corpus of all the words to search through. Finally, I built the search engine using BM25 and deployed the full app via Streamlit.
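To show how BM25 ranks videos once their features are text, here is a minimal from-scratch sketch of Okapi BM25 using only the standard library. It is not the search engine from the notebook: the whitespace tokenizer, the class name `BM25Index`, the parameter defaults `k1=1.5` and `b=0.75`, and the sample documents are all assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would do more)."""
    return text.lower().split()


class BM25Index:
    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(doc) for doc in documents]
        self.doc_freqs = [Counter(tokens) for tokens in self.doc_tokens]
        self.doc_lens = [len(tokens) for tokens in self.doc_tokens]
        self.avg_len = sum(self.doc_lens) / len(self.doc_lens) if self.doc_lens else 0.0
        # Document frequency: in how many documents each term appears.
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(self.doc_tokens)
        # Smoothed IDF that never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query, index):
        """BM25 score of one document for the query."""
        freqs = self.doc_freqs[index]
        if self.avg_len:
            length_norm = 1 - self.b + self.b * self.doc_lens[index] / self.avg_len
        else:
            length_norm = 1.0
        total = 0.0
        for term in tokenize(query):
            tf = freqs.get(term, 0)
            if tf:
                total += self.idf[term] * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total

    def search(self, query, top_k=5):
        """Return (document_index, score) pairs, best first, for matching documents."""
        hits = [(i, self.score(query, i)) for i in range(len(self.doc_tokens))]
        hits = [hit for hit in hits if hit[1] > 0]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:top_k]


# Hypothetical per-video text built from transcript, OCR, and object labels.
video_texts = [
    "a dog runs in the park",
    "cooking pasta recipe",
    "cat and dog playing",
]
index = BM25Index(video_texts)
dog_hits = index.search("dog")
```

Here both videos containing "dog" are returned, and the shorter one ranks first because BM25 normalizes term frequency by document length.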

Search Results

demo.mov

Local Setup

To run the app locally, clone the repository and install the requirements with pip.

git clone https://github.com/czarinagluna/ml-powered-video-library/
cd ml-powered-video-library
pip install -r requirements.txt

Finally, start the app:

streamlit run streamlit-app.py

SOURCE CODE: Main Notebook

Contact

Feel free to contact me with any questions and connect with me on LinkedIn.
