Add initial OpenAIEmbeddings support to Chonkie ✨ #46
This pull request introduces support for OpenAI embeddings within the `chonkie` project by adding a new `OpenAIEmbeddings` class and integrating it with the existing embeddings infrastructure. The most important changes include updating dependencies, adding the new embeddings class, registering it, and creating tests for the new functionality.

Integration of OpenAI Embeddings:
- `pyproject.toml`: Added `openai` to the optional dependencies to support OpenAI embeddings.
- `src/chonkie/embeddings/__init__.py`: Imported `OpenAIEmbeddings` and included it in the `__all__` list to make it available for use.
- `src/chonkie/embeddings/openai.py`: Added the `OpenAIEmbeddings` class, which implements embeddings through the OpenAI API. This class includes methods for embedding single texts and batches, counting tokens, and computing similarity.

Registration of OpenAI Embeddings:
- `src/chonkie/embeddings/registry.py`: Registered `OpenAIEmbeddings` with specific patterns in the `EmbeddingsRegistry` to enable its use within the system. [1] [2]

Testing for OpenAI Embeddings:
- `tests/chunker/test_semantic_chunker.py`: Added fixtures and tests to ensure that the `SemanticChunker` can be initialized with `OpenAIEmbeddings` and functions correctly. [1] [2] [3]
- `tests/embeddings/test_openai_embeddings.py`: Created comprehensive tests for the `OpenAIEmbeddings` class, including initialization, embedding single and batch texts, token counting, similarity computation, and availability checks.
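
To make the registration and similarity pieces described above more concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `StubEmbeddings`, `ToyRegistry`, their method names, and the `^text-embedding-` pattern are all hypothetical stand-ins, and the stub makes no network calls. It only illustrates how a pattern-based registry might map a model name to an embeddings class, and how cosine similarity between two vectors can be computed.

```python
import math
import re
from typing import Dict, List, Pattern, Type


class StubEmbeddings:
    """Hypothetical stand-in for an embeddings backend (no API calls)."""

    def __init__(self, model: str) -> None:
        self.model = model

    def embed(self, text: str) -> List[float]:
        # Toy deterministic features, NOT a real embedding:
        # [length, vowel count, space count].
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return [float(len(text)), float(vowels), float(text.count(" "))]

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        # One vector per input text, in input order.
        return [self.embed(t) for t in texts]

    def similarity(self, u: List[float], v: List[float]) -> float:
        # Cosine similarity: dot(u, v) / (|u| * |v|).
        # Returns 0.0 if either vector has zero length.
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0


class ToyRegistry:
    """Hypothetical pattern-based registry; not chonkie's EmbeddingsRegistry."""

    def __init__(self) -> None:
        self._patterns: Dict[Pattern[str], Type[StubEmbeddings]] = {}

    def register(self, pattern: str, cls: Type[StubEmbeddings]) -> None:
        # Store the compiled regex alongside the class it should resolve to.
        self._patterns[re.compile(pattern)] = cls

    def resolve(self, model: str) -> StubEmbeddings:
        # Return an instance of the first class whose pattern matches.
        for pattern, cls in self._patterns.items():
            if pattern.match(model):
                return cls(model)
        raise ValueError(f"No embeddings registered for model: {model!r}")


registry = ToyRegistry()
# Assumed pattern: OpenAI embedding model names start with "text-embedding-".
registry.register(r"^text-embedding-", StubEmbeddings)
emb = registry.resolve("text-embedding-3-small")
vectors = emb.embed_batch(["hello world", "hello there"])
score = emb.similarity(vectors[0], vectors[1])
```

With a registry of this shape, a caller could pass only a model name and let the pattern decide which backend to construct, which is one plausible reading of "registered with specific patterns". The actual registration call and patterns live in `src/chonkie/embeddings/registry.py`.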