I am attempting to manually add a table to the columnar engine in AlloyDB, but am running into an error indicating that I need to increase the population heap size parameter:
db=> select google_columnar_engine_add(relation => 'my_table_name');
WARNING: Error occurred: Columnar engine encountered out of memory during population. Set google_columnar_engine.population_heap_size_in_mb to a bigger value.
db=> show google_columnar_engine.population_heap_size_in_mb;
google_columnar_engine.population_heap_size_in_mb
---------------------------------------------------
250
db=> SET google_columnar_engine.population_heap_size_in_mb TO 1000;
ERROR: parameter "google_columnar_engine.population_heap_size_in_mb" cannot be changed now
As you can see above, I don't seem to be able to configure this parameter at runtime and I don't see an AlloyDB flag that would control it. The table size is ~4.6 GB which should fit within the available columnar engine memory capacity of 14000 MB (see below):
db=> \dt+ my_table_name;
List of relations
 Schema |     Name      | Type  |  Owner  | Persistence | Access method |  Size   | Description
--------+---------------+-------+---------+-------------+---------------+---------+-------------
 public | my_table_name | table | db_user | permanent   | heap          | 4663 MB |
db=> show google_columnar_engine.memory_size_in_mb;
google_columnar_engine.memory_size_in_mb
------------------------------------------
14000
(1 row)
db=> SELECT google_columnar_engine_memory_available();
google_columnar_engine_memory_available
-----------------------------------------
12242
(1 row)
So it seems the issue isn't that the table wouldn't fit in memory. Rather, the working heap that this internal parameter allocates for loading data into the columnar engine is being exhausted somehow.
Is there any way to control the `google_columnar_engine.population_heap_size_in_mb` parameter? I can't find any documentation for it, nor does it appear in `pg_settings`.
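For reference, this is the sort of check I mean when I say it doesn't show up in `pg_settings` (the LIKE filter is just illustrative):

-- Look for any columnar engine GUCs exposed through pg_settings.
SELECT name, setting, context
FROM pg_settings
WHERE name LIKE 'google_columnar_engine%'
ORDER BY name;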
---
Update:
I've tried adding specific columns one at a time and this succeeds up to a point. None of the columns is particularly large (most consume about 35MB when added to the columnar engine, as indicated in the g_columnar_columns table), but once I've added 16 columns to the engine in this way, attempting to add the 17th column throws the original heap size error.
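For context, the incremental approach looked roughly like this; the column name is a placeholder, and the exact columns exposed by the g_columnar_columns view may vary by AlloyDB version, so check \d g_columnar_columns first:

-- Add a single column ('col_a' is a placeholder); the columns argument
-- takes a comma-separated list of column names.
SELECT google_columnar_engine_add(relation => 'my_table_name', columns => 'col_a');

-- Inspect the per-column footprint reported by the engine.
SELECT relation_name, column_name, size_in_bytes
FROM g_columnar_columns
WHERE relation_name = 'my_table_name';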
The error message indicating an out-of-memory condition during population suggests that this per-population heap limit is being exceeded, even though the table fits comfortably within the overall columnar engine memory (google_columnar_engine.memory_size_in_mb). The fact that you can add columns incrementally until hitting a limit further supports this theory. Unfortunately, this parameter is not configurable at runtime or through any documented database flag.
Reaching out to Google Cloud Support is the best approach for a definitive solution. They can provide insights into the parameter's intended behavior, suggest workarounds, potentially adjust the parameter on the backend, and offer guidance on optimal data loading strategies for large tables.
If immediate adjustments are needed, consider workarounds such as the following alternative loading method, which populates the table one column at a time:
-- Populate the columnar engine one column at a time to stay under the
-- population heap limit.
CREATE OR REPLACE FUNCTION load_table_to_columnar(target_table text) RETURNS void AS $$
DECLARE
    col record;
BEGIN
    FOR col IN
        SELECT column_name
        FROM information_schema.columns
        WHERE table_schema = 'public'   -- assumes the table lives in the public schema
          AND table_name = target_table
    LOOP
        -- Pass a single column per call; the columns argument is a
        -- comma-separated list of column names.
        PERFORM google_columnar_engine_add(relation => target_table,
                                           columns  => col.column_name::text);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT load_table_to_columnar('my_table_name');
Memory consumption during columnar population can be significantly influenced by data types, with complex types like arrays or JSON requiring more heap space. Additionally, the distribution of data across columns can lead to uneven memory usage during loading, particularly with highly skewed data.
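If you want to see which columns are likely to be the heavy ones before populating, here is a rough sketch using planner statistics (it assumes the table has been ANALYZEd recently; the schema name is an assumption):

-- Per-column average width and distinct-value estimates from pg_stats;
-- wide or highly skewed columns are candidates to add last or on their own.
SELECT attname, avg_width, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'my_table_name'
ORDER BY avg_width DESC;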
Contacting Google Cloud Support offers the most direct path to resolving the heap size limitation; in the meantime, restructuring the table (for example, partitioning it) or employing alternative loading methods can serve as effective interim solutions.
Hi @ms4446, I have been going through AlloyDB recently too, since we are using it in a POC. Our source database uses partitions, and since you mention partitioning in your answer, it caught my interest. I have gone through the AlloyDB documentation but haven't come across anything about partitions. However, there is a supported extension called "pg_partman" that supports partitioning. I assume there is no managed partitioning service/strategy in AlloyDB (like in BigQuery, for example). That is, if we were to use this extension, we'd have to manage everything about the partitions ourselves. Is my understanding correct?
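To make the manual-management point concrete, here is a minimal sketch of plain declarative partitioning in PostgreSQL (table, column, and range values are placeholders); as I understand it, pg_partman automates creating and retiring such child partitions on a schedule, but it still runs inside the database rather than as a managed service like BigQuery's partitioning:

-- Parent table partitioned by range; every child partition below is
-- created by hand (placeholder names and ranges).
CREATE TABLE events (
    event_time timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');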