An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and translate any audio file.
View Demo · Report Bug · Request Feature
Audioflare grew out of my side-project work at Smol AI, aimed specifically at exploring the capabilities of Cloudflare AI workers. The project demonstrates a practical use case by orchestrating a series of AI workers to process an audio file of up to 30 seconds. Here’s a walkthrough of the core functionality:
- Transcription:
  - First, the audio file is transcribed using Cloudflare's Speech to Text worker, which is built on OpenAI's `whisper` model.
- Summarization:
  - The transcribed text is then summarized using Cloudflare's LLM AI worker, based on Meta's `llama-2-7b-chat-int8` model. Note that this model struggles with lengthy prompts.
- Sentiment Analysis:
  - Sentiment analysis is performed on the transcribed text using Cloudflare's Text Classification AI worker, based on Hugging Face's `distilbert-sst-2-int8` model.
- Translation:
  - The transcribed text is translated into nine languages using Cloudflare's Translation AI workers, which use Meta's `m2m100-1.2b` model.
- Performance Metrics:
  - The time taken to process each request is measured and displayed, giving insight into how each worker performs.
- Observability and Monitoring:
  - The Cloudflare AI Gateway is used to add observability and monitoring to the AI workers, including analytics, logging, caching, and rate limiting.
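The walkthrough above can be sketched as a single orchestration function. This is a minimal sketch, not the app's actual code: `RunModel` is a hypothetical runner you would back with a real Cloudflare call, the `@cf/...` model identifiers and input shapes reflect my understanding of Cloudflare's model catalog and should be verified, and running the post-transcription steps concurrently is an assumption (the source only says the workers run as a series).

```typescript
// Hypothetical runner: sends `input` to a named model and resolves with the model's result payload.
type RunModel = (model: string, input: unknown) => Promise<any>;

interface Timed<T> {
  value: T;
  ms: number;
}

// Wraps one async step and records how long it took, in milliseconds.
async function timed<T>(step: () => Promise<T>): Promise<Timed<T>> {
  const start = Date.now();
  const value = await step();
  return { value, ms: Date.now() - start };
}

// Assumed model identifiers; check Cloudflare's catalog before relying on them.
const MODELS = {
  transcribe: "@cf/openai/whisper",
  summarize: "@cf/meta/llama-2-7b-chat-int8",
  sentiment: "@cf/huggingface/distilbert-sst-2-int8",
  translate: "@cf/meta/m2m100-1.2b",
};

async function processAudio(run: RunModel, audio: Uint8Array, targetLangs: string[]) {
  // Step 1: transcription has to finish first, because every later step needs its text.
  const transcript = await timed(() => run(MODELS.transcribe, { audio: Array.from(audio) }));
  const text: string = transcript.value.text; // assumed result shape: { text: string }

  // Steps 2-4 depend only on the transcript, so this sketch runs them concurrently.
  const [summary, sentiment, translations] = await Promise.all([
    timed(() => run(MODELS.summarize, { prompt: `Summarize the following text:\n${text}` })),
    timed(() => run(MODELS.sentiment, { text })),
    Promise.all(
      targetLangs.map((lang) =>
        timed(() => run(MODELS.translate, { text, source_lang: "english", target_lang: lang })),
      ),
    ),
  ]);

  // Each entry carries both the model output and the time that request took.
  return { transcript, summary, sentiment, translations };
}
```

Injecting the runner keeps the flow testable without network access; each returned entry pairs a model's output with its elapsed time, which is the kind of data the performance metrics step reports.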
The current setup has limitations: transcription is confined to the first 30 seconds of audio, and the LLM's summarization quality could be better.
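Because the summarization model struggles with lengthy prompts, one possible mitigation is to cap the transcript before building the prompt. The sketch below is an assumption about how that could be done, not something Audioflare is known to do; `MAX_TRANSCRIPT_CHARS` is a hypothetical budget, since the model's real limit is not stated here, and the prompt wording is illustrative.

```typescript
// Hypothetical character budget; the model's actual limit is not stated in this README.
const MAX_TRANSCRIPT_CHARS = 2000;

// Trims text to the budget, preferring to cut at the last sentence end that fits.
function capTranscript(text: string, maxChars: number = MAX_TRANSCRIPT_CHARS): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  const lastStop = Math.max(
    head.lastIndexOf(". "),
    head.lastIndexOf("? "),
    head.lastIndexOf("! "),
  );
  // Fall back to a hard cut when no sentence boundary is found.
  return lastStop > 0 ? head.slice(0, lastStop + 1) : head;
}

// Builds the summarization prompt from a capped transcript.
function buildSummaryPrompt(transcript: string): string {
  return `Summarize the following transcript in two sentences:\n\n${capTranscript(transcript)}`;
}
```

Cutting at a sentence boundary avoids handing the model a half-finished sentence, at the cost of dropping a little more text than a hard cut would.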
The underlying concept of Audioflare underscores the potential of Cloudflare AI workers: by standardizing the AI API request framework, they simplify multi-step AI workflows. Although the models in use have limitations and are marked as 'beta' by Cloudflare, there's a clear path toward enhancing this project as more models become available.
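To illustrate what a standardized request framework means in practice, here is a minimal sketch of a single request builder that serves every model. The endpoint shape reflects my understanding of Cloudflare's REST API and should be checked against the official documentation; `buildRunRequest` and its parameters are hypothetical names, not part of Audioflare.

```typescript
interface RunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the same kind of request for any model; only the model name and input change.
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: unknown,
): RunRequest {
  return {
    // Assumed endpoint shape; verify against Cloudflare's documentation.
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(input),
    },
  };
}

// Two different tasks, one request shape (placeholder credentials).
const sentimentRequest = buildRunRequest("ACCOUNT_ID", "API_TOKEN",
  "@cf/huggingface/distilbert-sst-2-int8", { text: "I love this demo" });
const translationRequest = buildRunRequest("ACCOUNT_ID", "API_TOKEN",
  "@cf/meta/m2m100-1.2b", { text: "Hello", source_lang: "english", target_lang: "french" });
```

Each result can be passed to `fetch(request.url, request.init)`. Since every step shares this shape, chaining several models is mostly a matter of swapping the model name and input.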
Your engagement is encouraged. Feel free to submit pull requests and issues as you experiment with Audioflare. This project is intended to serve as a template for learning and working with Cloudflare AI workers. It doesn’t currently include Cloudflare's Image Classification or Text Embedding workers because they aren't relevant to the audio use case, but it’s a step toward better understanding and using the Cloudflare AI ecosystem.
As Cloudflare broadens its model support, I look forward to refining Audioflare, making it a more robust and informative template for the developer community.
- Audio Processing:
  - Users can upload an audio file for processing.
    - Drag and drop a local audio file from their computer.
    - Alternatively, drag and drop one of three pre-provided audio files included on the main page and in this repo.
    - Audio files longer than 30 seconds are supported, but only the first 30 seconds will be transcribed.
    - Audio transcription is handled by Cloudflare's Speech to Text worker (based on OpenAI's Whisper model).
- Text Summarization:
  - Transcribed text is summarized using Cloudflare's LLM AI worker (based on Meta's `llama-2-7b-chat-int8` model).
- Sentiment Analysis:
  - Sentiment analysis is performed on the transcribed text using Cloudflare's Text Classification AI worker (based on Hugging Face's `distilbert-sst-2-int8` model).
- Translation:
  - Transcribed text is translated into nine different languages using Cloudflare's Translation AI workers (based on Meta's `m2m100-1.2b` model).
- Performance Metrics:
  - Time taken for each request to be processed is calculated and displayed.
- Observability and Monitoring:
  - Uses Cloudflare AI Gateway to add observability and monitoring to the AI workers:
    - Analytics: View metrics like the number of requests and tokens.
    - Logging: Monitor requests and errors.
    - Caching: Serve requests from Cloudflare’s cache for faster response and cost savings.
    - Rate Limiting: Control application scaling by limiting the number of received requests.
- Learning and Exploration:
  - Audioflare serves as a template for learning and working with Cloudflare AI workers.
  - Users can explore the different Cloudflare AI workers, except Image Classification and Text Embedding, which are not integrated because they aren't relevant to the audio use case.
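For the observability feature listed above, routing a request through AI Gateway mostly comes down to using a different base URL. The sketch below assumes the URL shapes I understand Cloudflare to document for the gateway and for direct access; verify both before use. `resolveEndpoint` and `gatewaySlug` are hypothetical names for illustration.

```typescript
interface EndpointOptions {
  accountId: string;
  model: string;
  // Hypothetical gateway name; when omitted, the request bypasses AI Gateway.
  gatewaySlug?: string;
}

// Returns the gateway URL when a gateway is configured, otherwise the direct REST URL.
function resolveEndpoint({ accountId, model, gatewaySlug }: EndpointOptions): string {
  if (gatewaySlug) {
    // Assumed AI Gateway URL shape for Workers AI requests.
    return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewaySlug}/workers-ai/${model}`;
  }
  // Assumed direct REST URL shape.
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
}

// Placeholder account and gateway values.
const viaGateway = resolveEndpoint({
  accountId: "ACCOUNT_ID",
  gatewaySlug: "my-gateway",
  model: "@cf/openai/whisper",
});
const direct = resolveEndpoint({ accountId: "ACCOUNT_ID", model: "@cf/openai/whisper" });
```

Keeping the choice in one function means analytics, logging, caching, and rate limiting can be switched on for every worker by setting a single gateway value.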
This project was built in 2023. See package.json for a full list of dependencies.
To get a local copy up and running follow these simple steps.
- Clone this repository

      git clone https://github.com/seanoliver/audioflare.git

- Install dependencies

      cd audioflare
      bun install

- Create a Cloudflare account
- Install Wrangler and login

      bun add wrangler --dev
      wrangler login

- Rename `.env.example` to `.env` and follow the instructions linked in the comments to find each of the required keys and values.
- Run the app

      bun dev

- Go to http://localhost:3000 to check it out
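If the app fails on startup, a missing `.env` value is a likely cause. As a minimal sketch, a small check like the one below could report unset keys early. The key names in the example are hypothetical, because the required keys are listed only in `.env.example`, and this helper is not part of the repository.

```typescript
type Env = Record<string, string | undefined>;

// Returns the required keys that are absent or blank in the given environment.
function missingEnvKeys(required: string[], env: Env): string[] {
  return required.filter((key) => (env[key] ?? "").trim() === "");
}

// Hypothetical key names; substitute the ones listed in .env.example.
const exampleMissing = missingEnvKeys(
  ["CLOUDFLARE_ACCOUNT_ID", "CLOUDFLARE_API_TOKEN"],
  { CLOUDFLARE_ACCOUNT_ID: "abc123", CLOUDFLARE_API_TOKEN: "  " },
);
```

In a Next.js API route, `process.env` could be passed as the second argument so the route fails with a clear message instead of an opaque upstream error.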
This is a great project for learning Cloudflare, AI Workers, and simple Next.js API Routes. Feel free to fork this repo and make it your own. If you have any questions or suggestions, please feel free to contact me!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Sean Oliver - @SeanOliver
Project Link: https://github.com/seanoliver/audioflare
Live Demo: https://audioflare.seanoliver.dev/