Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSE-4358: BodoSQL support for S3 tables #126

Merged
merged 81 commits into from
Jan 10, 2025
Merged
Changes from 1 commit
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
e76ddfb
Start planner API movement
njriasan Jan 2, 2025
64812e8
Updated the entry point for parseQuery
njriasan Jan 2, 2025
b1bbc8a
Fixed the import
njriasan Jan 2, 2025
88e6673
Merge branch 'main' into nick/bodosql_python_interface
njriasan Jan 2, 2025
325486c
Merge branch 'main' into nick/bodosql_python_interface
njriasan Jan 2, 2025
85cbbad
Moved getOptimizedPlanString
njriasan Jan 3, 2025
777458d
Moved getPandasAndPlanString
njriasan Jan 3, 2025
293e3b8
Moved getPandasString
njriasan Jan 3, 2025
2a835ec
Moved getWriteType
njriasan Jan 3, 2025
f9d9611
Moved executeDDL
njriasan Jan 3, 2025
abab715
Moved the last RelationalAlgebra API
njriasan Jan 3, 2025
1ec8501
Added missing docstrings [run CI]
njriasan Jan 3, 2025
7f9d2e9
Fixed the import ordering [run CI]
njriasan Jan 3, 2025
7da882e
Move reset to package private [run CI]
njriasan Jan 3, 2025
6d02755
Updated fromTypeID API [run CI]
njriasan Jan 3, 2025
2a943b9
Updated WriteTargetEnum [run CI]
njriasan Jan 3, 2025
add7295
Updated the logger API [run CI]
njriasan Jan 3, 2025
d8665d5
Fixed save issue [run CI]
njriasan Jan 3, 2025
28cce7d
Fixed API copying issue [run CI]
njriasan Jan 3, 2025
7ac23dc
Merge branch 'nick/bodosql_python_interface' into nick/fix_BodoSQLCol…
njriasan Jan 3, 2025
af1a9cf
Added missing static declaration [run CI]
njriasan Jan 3, 2025
864ee4e
Merge branch 'nick/bodosql_python_interface' into nick/fix_BodoSQLCol…
njriasan Jan 3, 2025
48605d6
Added more missing static declarations [run CI]
njriasan Jan 3, 2025
1b909c9
Merge branch 'main' into nick/bodosql_python_interface [run CI]
njriasan Jan 3, 2025
04db869
Removed ArrayList
njriasan Jan 3, 2025
80fd441
Removed hash map interaction
njriasan Jan 3, 2025
98bafde
Added stack trace API
njriasan Jan 3, 2025
052ce93
Removed properties
njriasan Jan 3, 2025
c25928c
Added builders
njriasan Jan 3, 2025
2349f14
Add local table builder
njriasan Jan 3, 2025
142c45f
Update LocalSchema
njriasan Jan 3, 2025
7febfe5
Removed LocalSchemaClass [run CI]
njriasan Jan 3, 2025
f87b383
Removed more dead code [run CI]
njriasan Jan 3, 2025
bca4fdc
Fixed bugs [run CI]
njriasan Jan 3, 2025
3b36b27
Merge branch 'nick/bodosql_python_interface' into nick/fix_BodoSQLCol…
njriasan Jan 3, 2025
688f131
Merge branch 'nick/fix_BodoSQLColumn' into nick/remove_constructors_b…
njriasan Jan 3, 2025
6918eb9
Removed planner type
njriasan Jan 3, 2025
09f51b6
Removed DDLExecutionResult calls
njriasan Jan 3, 2025
7ebec08
Added get lowered globals
njriasan Jan 3, 2025
f484159
Refactored pair access
njriasan Jan 3, 2025
1c5d5be
Fixed the exception interface
njriasan Jan 3, 2025
b1cee44
Removed ColumnDataTypeClass
njriasan Jan 3, 2025
76a0c56
Removed ColumnDataTypeClass
njriasan Jan 3, 2025
1e0139e
Removed everything but the constructors
njriasan Jan 3, 2025
f63733d
Refactored RelationalAlgebraGenerator constructor
njriasan Jan 3, 2025
374e4bf
Removed RelationalAlgebraGeneratorClass [run CI]
njriasan Jan 3, 2025
db61339
Merge branch 'main' into nick/bodosql_python_interface [run CI]
njriasan Jan 4, 2025
4b7890e
Merge branch 'nick/bodosql_python_interface' into nick/fix_BodoSQLCol…
njriasan Jan 4, 2025
a9c46f3
Merge branch 'nick/fix_BodoSQLColumn' into nick/remove_constructors_b…
njriasan Jan 4, 2025
10952b6
Merge branch 'nick/remove_constructors_bodosql' into nick/remove_rema…
njriasan Jan 4, 2025
395d644
Fixed map APIs [run CI]
njriasan Jan 5, 2025
95052cb
Merge branch 'nick/remove_constructors_bodosql' into nick/remove_rema…
njriasan Jan 5, 2025
2ca5b36
Fixed create_java_dynamic_parameter_type_list [run CI]
njriasan Jan 6, 2025
db9edfa
Merge branch 'nick/remove_constructors_bodosql' into nick/remove_rema…
njriasan Jan 6, 2025
d066870
Fixed merge conflict [run CI]
njriasan Jan 6, 2025
c873e60
Merge branch 'nick/fix_BodoSQLColumn' into nick/remove_constructors_b…
njriasan Jan 6, 2025
2dfb822
Apply Isaac's feedback [run CI]
njriasan Jan 6, 2025
c66bafc
Merge branch 'nick/remove_constructors_bodosql' into nick/remove_rema…
njriasan Jan 6, 2025
fcebae7
Merged with main [run CI]
njriasan Jan 6, 2025
0fb2a07
Merge branch 'nick/remove_constructors_bodosql' into nick/remove_rema…
njriasan Jan 6, 2025
2fddacb
Fixed APIs
njriasan Jan 6, 2025
3164ada
moved definitions [run CI]
njriasan Jan 6, 2025
86947e4
Add S3TablesCatalog skeleton
IsaacWarren Jan 6, 2025
e91d009
Add implementations for bodosql catalog
IsaacWarren Jan 6, 2025
0f7b77c
Add BodoS3TablesCatalog to PythonEntryPoint
IsaacWarren Jan 7, 2025
550691b
Add s3_tables_catalog to bodosql
IsaacWarren Jan 7, 2025
0585d51
Add s3_tables_catalog fixture
IsaacWarren Jan 7, 2025
01fe6ac
Actually cleanup written table
IsaacWarren Jan 7, 2025
d712c80
Add basic s3 tables read test
IsaacWarren Jan 7, 2025
a7cd361
Add write test
IsaacWarren Jan 7, 2025
43844d7
Update docs
IsaacWarren Jan 7, 2025
e9372c4
Allow catalogs to not specify a default schema
IsaacWarren Jan 7, 2025
d5bb594
Don't give default schema
IsaacWarren Jan 7, 2025
c015ad2
[Run CI]
IsaacWarren Jan 7, 2025
43f60fc
Merge remote-tracking branch 'origin/main' into isaac/bodosql_s3_tables
IsaacWarren Jan 8, 2025
5a8c201
[Run CI]
IsaacWarren Jan 8, 2025
6e2cc69
Fix docstring
IsaacWarren Jan 9, 2025
21de7a1
Fix doc
IsaacWarren Jan 9, 2025
d4389f5
Fix comment
IsaacWarren Jan 9, 2025
06c9ed3
Add infra requirements to docstrings
IsaacWarren Jan 9, 2025
7095e2d
[Run CI]
IsaacWarren Jan 9, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Add implementations for bodosql catalog
  • Loading branch information
IsaacWarren committed Jan 7, 2025
commit e91d0099b4899a1d6102a0859bef6e550bc006db
Original file line number Diff line number Diff line change
Expand Up @@ -2,41 +2,89 @@ package com.bodosql.calcite.catalog

import com.bodosql.calcite.adapter.bodo.StreamingOptions
import com.bodosql.calcite.application.BodoCodeGenVisitor
import com.bodosql.calcite.application.write.IcebergWriteTarget
import com.bodosql.calcite.application.write.WriteTarget
import com.bodosql.calcite.ir.Expr
import com.bodosql.calcite.ir.Variable
import com.bodosql.calcite.sql.ddl.CreateTableMetadata
import com.bodosql.calcite.table.CatalogTable
import com.bodosql.calcite.table.IcebergCatalogTable
import com.google.common.collect.ImmutableList
import org.apache.calcite.sql.ddl.SqlCreateTable
import software.amazon.s3tables.iceberg.S3TablesCatalog

class BodoS3TablesCatalog(
private val warehouse: String,
) : IcebergCatalog<S3TablesCatalog>(createS3TablesCatalog(warehouse)) {
override fun getTableNames(schemaPath: ImmutableList<String>?): MutableSet<String> {
TODO("Not yet implemented")
/**
* Returns a set of all table names with the given schema name.
*
* @param schemaPath The list of schemas to traverse before finding the table.
* @return Set of table names.
*/
override fun getTableNames(schemaPath: ImmutableList<String>): MutableSet<String> {
val ns = schemaPathToNamespace(schemaPath)
return getIcebergConnection().listTables(ns).map { it.name() }.toMutableSet()
}

/**
* Returns a table with the given name and found in the given schema.
*
* @param schemaPath The list of schemas to traverse before finding the table.
* @param tableName Name of the table.
* @return The table object.
*/
override fun getTable(
schemaPath: ImmutableList<String>?,
tableName: String?,
schemaPath: ImmutableList<String>,
tableName: String,
): CatalogTable {
TODO("Not yet implemented")
val columns = getIcebergTableColumns(schemaPath, tableName)
return IcebergCatalogTable(tableName, schemaPath, columns, this)
}

override fun getSchemaNames(schemaPath: ImmutableList<String>?): MutableSet<String> {
TODO("Not yet implemented")
/**
* Get the available subSchema names for the given path.
*
* @param schemaPath The parent schema path to check.
* @return Set of available schema names.
*/
override fun getSchemaNames(schemaPath: ImmutableList<String>): MutableSet<String> {
val ns = schemaPathToNamespace(schemaPath)
return getIcebergConnection().listNamespaces(ns).map { it.level(it.length() - 1) }.toMutableSet()
}

override fun getDefaultSchema(depth: Int): MutableList<String> {
TODO("Not yet implemented")
/**
* Return the list of implicit/default schemas for the given catalog, in the order that they
* should be prioritized during table resolution. The provided depth gives the "level" at which to
* provide the default. Each entry in the list is a schema name at that level, not the path to reach
* that level.
*
* @param depth The depth at which to find the default.
* @return List of default Schema for this catalog.
*/
override fun getDefaultSchema(depth: Int): List<String> {
if (depth == 0) {
return listOf("default")
}
return listOf()
}

override fun numDefaultSchemaLevels(): Int {
TODO("Not yet implemented")
}
/**
* Return the number of levels at which a default schema may be found.
* S3 Table catalogs don't have subSchemas, so this method always returns 1.
* @return The number of levels a default schema can be found.
*/
override fun numDefaultSchemaLevels(): Int = 1

/**
* Generates the code necessary to produce an append write expression from the given catalog.
*
* @param visitor The PandasCodeGenVisitor used to lower globals.
* @param varName Name of the variable to write.
* @param tableName The path of schema used to reach the table from the root that includes the
* table.
* @return The generated code to produce the append write.
*/
override fun generateAppendWriteCode(
visitor: BodoCodeGenVisitor?,
varName: Variable?,
Expand All @@ -45,6 +93,18 @@ class BodoS3TablesCatalog(
TODO("Not yet implemented")
}

/**
* Generates the code necessary to produce a write expression from the given catalog.
*
* @param visitor The PandasCodeGenVisitor used to lower globals.
* @param varName Name of the variable to write.
* @param tableName The path of schema used to reach the table from the root that includes the
* table.
* @param ifExists Behavior to perform if the table already exists
* @param createTableType Type of table to create if it doesn't exist
* @param meta Expression containing the metadata information for init table information.
* @return The generated code to produce a write.
*/
override fun generateWriteCode(
visitor: BodoCodeGenVisitor?,
varName: Variable?,
Expand All @@ -56,6 +116,18 @@ class BodoS3TablesCatalog(
TODO("Not yet implemented")
}

/**
* Generates the code necessary to produce a read expression from the given catalog.
*
* @param useStreaming Should we generate code to read the table as streaming (currently only
* supported for snowflake tables)
* @param tableName The path of schema used to reach the table from the root that includes the
* table.
* @param useStreaming Should we generate code to read the table as streaming (currently only
* supported for snowflake tables)
* @param streamingOptions The options to use if streaming is enabled.
* @return The generated code to produce a read.
*/
override fun generateReadCode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have to add more of these it may be useful to have an abstract parent class that defines the limited behavior for certain catalogs. For example if append and remote queries don't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, most of this is identical to glue/tabular so we could have some kind of "normal" iceberg metclass, basically everything but snowflake

tableName: ImmutableList<String>?,
useStreaming: Boolean,
Expand All @@ -64,31 +136,73 @@ class BodoS3TablesCatalog(
TODO("Not yet implemented")
}

/**
* Generates the code necessary to submit the remote query to the catalog DB.
*
* @param query Query to submit.
* @return The generated code.
*/
override fun generateRemoteQuery(query: String?): Expr {
TODO("Not yet implemented")
}

override fun schemaDepthMayContainTables(depth: Int): Boolean {
TODO("Not yet implemented")
}
/**
* Returns if a schema with the given depth is allowed to contain tables. S3 Tables only allows Tables to be present in "databases", which maps to schemas when using iceberg. Schemas cannot be nested.
*
* @param depth The number of parent schemas that would need to be visited to reach the root.
* @return true if the depth is 1 and false otherwise.
*/
override fun schemaDepthMayContainTables(depth: Int): Boolean = depth == 1

override fun schemaDepthMayContainSubSchemas(depth: Int): Boolean {
TODO("Not yet implemented")
}
/**
* Returns if a schema with the given depth is allowed to contain subSchemas.
* S3 Tables catalogs do not support subSchemas, so this method always returns false for non-zero values.
*
* @param depth The number of parent schemas that would need to be visited to reach the root.
* @return true if the depth is 0 and false otherwise.
*/
override fun schemaDepthMayContainSubSchemas(depth: Int): Boolean = depth == 0

override fun generatePythonConnStr(schemaPath: ImmutableList<String>?): Expr {
TODO("Not yet implemented")
}
/**
* Generate a Python connection string used to read from or write to a Catalog in Bodo's SQL
* Python code.
*
*
* TODO: This method is needed for the XXXToPandasConverter nodes, but exposing
* this is a bad idea and this class likely needs to be refactored in a way that the connection
* information can be passed around more easily.
*
* @param schemaPath The schema component to define the connection not including the table name.
* @return The connection string
*/
override fun generatePythonConnStr(schemaPath: ImmutableList<String>?): Expr =
Expr.Call("bodosql.get_s3Tables_connection", Expr.StringLiteral(warehouse))

/**
* Return the desired WriteTarget for a create table operation.
* This catalog only supports Iceberg tables, so the WriteTarget will be an IcebergWriteTarget.
*
* @param schema The schemaPath to the table.
* @param tableName The name of the type that will be created.
* @param createTableType The createTable type. This is unused by the file system catalog.
* @param ifExistsBehavior The createTable behavior for if there is already a table defined.
* @param columnNamesGlobal Global Variable holding the output column names.
* @return The selected WriteTarget.
*/
override fun getCreateTableWriteTarget(
schema: ImmutableList<String>?,
tableName: String?,
createTableType: SqlCreateTable.CreateTableType?,
ifExistsBehavior: WriteTarget.IfExistsBehavior?,
columnNamesGlobal: Variable?,
): WriteTarget {
TODO("Not yet implemented")
}
schema: ImmutableList<String>,
tableName: String,
createTableType: SqlCreateTable.CreateTableType,
ifExistsBehavior: WriteTarget.IfExistsBehavior,
columnNamesGlobal: Variable,
): WriteTarget =
IcebergWriteTarget(
tableName,
schema,
ifExistsBehavior,
columnNamesGlobal,
generatePythonConnStr(schema),
)

companion object {
/**
Expand Down