This post is about a particular challenge posted on the DuckDB Discord server which happened to interest me. So, here goes!!
Yesterday on the DuckDB Discord server, user @col_montium asked:
I've got a bit of a head scratcher. My json from https://api.census.gov/data/2000/dec/sf1/variables.json is a single struct with dynamic keys. I want to extract a table with columns `field_code`, `label`, `concept`, and `group`. I completed this with a small sample of the data but with the entire dataset the query uses up 24 gigabytes of RAM and then crashes. Here's my query:

```sql
WITH source AS (
  SELECT *
  FROM 'https://api.census.gov/data/2000/dec/sf1/variables.json'
)
, all_keys AS (
  SELECT unnest( json_keys( variables ) ) AS "key"
       , variables
  FROM source
)
, extracted_fields AS (
  SELECT "key" AS field_code
       , variables->key->>'$.label'         AS label
       , variables->key->>'$.concept'       AS concept
       , variables->key->>'$.predicateType' AS predicate_type
       , variables->key->>'$.group'         AS "group"
  FROM all_keys
  WHERE "key" NOT IN ('for', 'in', 'ucgid')
)
SELECT *
FROM extracted_fields
ORDER BY field_code
```

I tried to run this myself on my laptop using the DuckDB 1.1.3 (GA) command line interface, and it failed with a similar error as reported by @col_montium:
```
Run Time (s): real 252.335 user 197.031250 sys 9.500000
Out of Memory Error: failed to allocate data of size 16.0 MiB (24.9 GiB/25.0 GiB used)
```

I was able to run it successfully using a 1.1.4 nightly build, but it took about 3 minutes to complete:

```
Run Time (s): real 188.681 user 148.500000 sys 8.078125
```
Dataset: Size and Structure
Surprisingly, the source dataset is less than 2 MB (2,039,748 bytes)! The query also doesn't appear to be terribly complicated, and if it finishes successfully, it yields only 8,141 rows. So, clearly something interesting must be the matter with the structure of the dataset. As it is so modest in size, we can easily inspect it with a text editor:
{ "variables": { "for": { "label": "Census API FIPS 'for' clause", "concept": "Census API Geography Specification", "predicateType": "fips-for", "group": "N/A", "limit": 0, "predicateOnly": true }, "in" { "label": "Census API FIPS 'in' clause", "concept": "Census API Geography Specification", "predicateType": "fips-in", "group": "N/A", "limit": 0, "predicateOnly": true }, "ucgid" { "label": "Uniform Census Geography Identifier clause", "concept": "Census API Geography Specification", "predicateType": "ucgid", "group": "N/A", "limit": 0, "predicateOnly": true, "hasGeoCollectionSupport": true }, "P029009" { "label": "Total!!In households!!Related child!!Own child!!6 to 11 years", "concept": "RELATIONSHIP BY AGE FOR THE POPULATION UNDER 18 YEARS [46]", "predicateType": "int", "group": "P029", "limit": 0 }, ...many more variables... "PCT012H185" { "label": "Total!!Female!!78 years", "concept": "SEX BY AGE (HISPANIC OR LATINO) [209]", "predicateType": "int", "group": "PCT012H", "limit": 0 } } }The structure of the dataset is quite simple: a single object with a single
variables
property, which has an object value with many object-typed properties.
The object-type values of these properties contain a handful of recurring scalar properties like label
, concept
, predicateType
, group
and limit
.
Now that we've analyzed the structure, it should be clear what the query attempts to achieve. It wants to make a row for each property of the outermost `variables` object. The property name becomes the `field_code` column, and forms its natural primary key. The recurring properties of the innermost objects become its non-key columns `label`, `concept`, `predicate_type` and `group`.
Using `read_text()` and a DuckDB table column with the `JSON` data type
In an attempt to get the query working without running out of memory, I decided to store the dataset in a DuckDB table like so:
```sql
CREATE TABLE t_json AS
SELECT filename
     , content::JSON AS data
FROM read_text('https://api.census.gov/data/2000/dec/sf1/variables.json');
```

With this in place, a modified version of the initial query might be:
```sql
WITH variables AS (
  SELECT data->'variables' AS variables
       , unnest( json_keys( variables ) ) AS field_code
  FROM t_json
)
SELECT field_code
     , variables->field_code->>'label'         AS label
     , variables->field_code->>'concept'       AS concept
     , variables->field_code->>'predicateType' AS predicate_type
     , variables->field_code->>'group'         AS "group"
FROM variables
WHERE field_code NOT IN ('for', 'in', 'ucgid')
ORDER BY field_code
```

The results look like this:
```
┌────────────┬──────────────────────────────────┬─────────┬────────────────┬─────────┐
│ field_code │              label               │ concept │ predicate_type │  group  │
│  varchar   │             varchar              │ varchar │    varchar     │ varchar │
├────────────┼──────────────────────────────────┼─────────┼────────────────┼─────────┤
│ AIANHH     │ American Indian Area/Alaska Na…  │ NULL    │ NULL           │ N/A     │
│ AIHHTLI    │ American Indian Area (Off-Rese…  │ NULL    │ NULL           │ N/A     │
│ AITSCE     │ American Indian Tribal Subdivi…  │ NULL    │ NULL           │ N/A     │
│ ANRC       │ Alaska Native Regional Corpora…  │ NULL    │ NULL           │ N/A     │
│ BLKGRP     │ Census Block Group               │ NULL    │ NULL           │ N/A     │
│ BLOCK      │ Census Block                     │ NULL    │ NULL           │ N/A     │
│ CD106      │ Congressional District (106th)   │ NULL    │ string         │ N/A     │
│ CONCIT     │ Consolidated City                │ NULL    │ NULL           │ N/A     │
│     ·      │                ·                 │    ·    │       ·        │    ·    │
│     ·      │                ·                 │    ·    │       ·        │    ·    │
│     ·      │                ·                 │    ·    │       ·        │    ·    │
│ SUBMCD     │ Sub-Minor Civil Division (FIPS)  │ NULL    │ NULL           │ N/A     │
│ SUMLEVEL   │ Summary Level code               │ NULL    │ string         │ N/A     │
│ TRACT      │ Census Tract                     │ NULL    │ NULL           │ N/A     │
│ UA         │ Urban Area                       │ NULL    │ NULL           │ N/A     │
│ US         │ United States                    │ NULL    │ NULL           │ N/A     │
│ ZCTA3      │ ZIP Code Tabulation Area (Thre…  │ NULL    │ NULL           │ N/A     │
│ ZCTA5      │ Zip Code Tabulation Area (Five…  │ NULL    │ NULL           │ N/A     │
├────────────┴──────────────────────────────────┴─────────┴────────────────┴─────────┤
│ 8141 rows (15 shown)                                                      5 columns │
└─────────────────────────────────────────────────────────────────────────────────────┘
Run Time (s): real 11.682 user 8.015625 sys 2.406250
```

Less than 12 seconds. Not super fast, but still a pretty substantial improvement as compared to the initial query. What could explain this difference? Two things stand out as compared to the initial query:
- Using the `read_text()` function to ingest the data. In the original query, the data was ingested by the first common table expression, called `source`, which did a `SELECT` directly `FROM` the dataset's URL `'https://api.census.gov/data/2000/dec/sf1/variables.json'`. Clearly, with this syntax, DuckDB will do some magic to perform the appropriate action, which in this case is to invoke `read_json_auto('https://api.census.gov/data/2000/dec/sf1/variables.json')`.
- Explicitly casting the content of the data file to DuckDB's `JSON` data type. I expected the issue with the original query had something to do with the transformation of JSON data to a tabular result. Having the data in a `JSON`-typed value seemed like a good starting point for experimenting with different methods to extract data. The `read_text()` function returns the contents of the file as `VARCHAR`; it cannot and does not attempt to parse or process the data. That's why we need to cast its output explicitly to the `JSON` type ourselves.
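To make that last point concrete, here's a minimal sanity check. It assumes a locally downloaded copy of the file, named `variables.json` here, so it runs quickly:

```sql
-- read_text() hands the file content back as plain text;
-- only the explicit cast turns it into a JSON value
SELECT typeof( content )       AS raw_type   -- VARCHAR
     , typeof( content::JSON ) AS cast_type  -- JSON
FROM read_text( 'variables.json' );
```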
Note that storing the data in a table is not essential: we can also run the extraction on the output of `read_text()` directly:

```sql
WITH variables AS (
  SELECT json( content )->'variables' AS variables
       , unnest( json_keys( variables ) ) AS field_code
  FROM read_text( 'https://api.census.gov/data/2000/dec/sf1/variables.json' )
)
SELECT field_code
     , variables->field_code->>'label'         AS label
     , variables->field_code->>'concept'       AS concept
     , variables->field_code->>'predicateType' AS predicate_type
     , variables->field_code->>'group'         AS "group"
FROM variables
WHERE field_code NOT IN ('for', 'in', 'ucgid')
ORDER BY field_code
```

This query takes about 20 seconds; in other words, about the same time as it takes to run the previous query plus the time to fetch the dataset and load it into the table. For a serious comparison of query performance we should probably eliminate the download and use only local files, but we'll get to that later. For now it's enough to notice that this setup is a significant improvement, both in terms of query execution time and in stability/memory consumption.
You might also have noticed that this alternative query uses slightly different syntax to extract `label`, `concept`, `predicateType` and `group`. Also, the final data presentation query is combined with the field extraction logic. However, these changes are really a matter of style; they do not have any bearing on query performance - not for this dataset, anyway.
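As an aside, the two extraction syntaxes are interchangeable: the `->>` operator accepts both a bare key and a JSONPath expression. A tiny sketch with a made-up JSON literal:

```sql
-- Both expressions extract the same property value: 'example'
SELECT j->>'label'   AS bare_key
     , j->>'$.label' AS json_path
FROM ( SELECT '{"label": "example"}'::JSON AS j );
```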
Understanding the difference: the JSON-reader
With these results in mind, we have to conclude that the actual operations on the JSON data cannot really explain the difference. So, it must have something to do with the JSON-reader that was implicitly invoked by the initial query. Let's zoom in a bit on the difference between data extracted from a `JSON`-typed value and data extracted by the JSON-reader.
We know that in the second example, the data is of the `JSON` type, because we explicitly cast it to that type. However, that only tells us that the data is essentially text conforming to the JSON syntax. While it is of course very useful to know the encoding scheme used to represent values and structures, that still does not reveal anything about the content and structure of the data itself.
What data type does the JSON reader think the data has? Let's find out using this `DESCRIBE` statement:

```sql
DESCRIBE
SELECT *
FROM 'https://api.census.gov/data/2000/dec/sf1/variables.json'
```

Its result is:

```
┌─────────────┬─────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │         column_type         │  null   │   key   │ default │  extra  │
│   varchar   │           varchar           │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ variables   │ STRUCT("for" STRUCT("labe…  │ YES     │ NULL    │ NULL    │ NULL    │
└─────────────┴─────────────────────────────┴─────────┴─────────┴─────────┴─────────┘
Run Time (s): real 9.541 user 1.968750 sys 0.031250
```

As you can see, the `variables` property of the main object that makes up our dataset was detected and projected as a column. DuckDB has also inferred a pretty detailed data type for the column, expressed using DuckDB's native SQL data types.
In the example above it's truncated, but if we run it again using `.mode line`, we can see the entire type descriptor. It goes like this:

```sql
STRUCT(
  "for" STRUCT(
    "label" VARCHAR
  , concept VARCHAR
  , predicateType VARCHAR
  , "group" VARCHAR
  , "limit" BIGINT
  , predicateOnly BOOLEAN
  )
, "in" STRUCT(
    "label" VARCHAR
  , concept VARCHAR
  , predicateType VARCHAR
  , "group" VARCHAR
  , "limit" BIGINT
  , predicateOnly BOOLEAN
  )
, ucgid STRUCT(
    "label" VARCHAR
  , concept VARCHAR
  , predicateType VARCHAR
  , "group" VARCHAR
  , "limit" BIGINT
  , predicateOnly BOOLEAN
  , hasGeoCollectionSupport BOOLEAN
  )
, P029009 STRUCT(
    "label" VARCHAR
  , concept VARCHAR
  , predicateType VARCHAR
  , "group" VARCHAR
  , "limit" BIGINT
  )
...many more...
, PCT012H185 STRUCT(
    "label" VARCHAR
  , concept VARCHAR
  , predicateType VARCHAR
  , "group" VARCHAR
  , "limit" BIGINT
  )
)
```

Looks familiar? It should, as it corresponds closely to the structure we witnessed when analyzing the raw JSON data in a text editor.
So, this really is quite different from the `JSON`-typed column, as the JSON-reader must have done some work to explore the structure of the data. Not only did it detect the data type, it also uses that type to represent the data, which in turn determines what operators and functions the remainder of the query can use to work with the data.

Whether this is related to the difference in performance and to the out-of-memory errors is yet to be determined, but it does not seem unlikely. For starters, the data type closely resembles the data itself: the textual description of the type is about half the size of the data, which does not seem like a good thing!

We will investigate the JSON-reader's capabilities for data type detection, and the possibilities to control it, in the next couple of sections. But before we examine that in more detail, let's take a step back and consider what we just learned.
One lesson learned
So far, we've learned at least one important lesson: the JSON reader does in fact do exactly what its name implies - read JSON-encoded data. But while doing so, it tends to output data using DuckDB's native data types, and typically not the `JSON` data type.

In hindsight this certainly sounds perfectly sensible. Yet, both @col_montium and I apparently did not fully realize it, as the original query takes the data coming out of the JSON reader and processes it using functions like `json_keys()` and the extraction operators `->` and `->>`. These are designed for working with values of the `JSON` data type, and not for the `STRUCT` values that the JSON reader hands back to us.
You may wonder: how is it possible that these functions and operators work at all on the `STRUCT` value returned by the JSON reader? The answer is quite simple: when we apply these functions and operators, they first implicitly cast their arguments and operands to the `JSON` type. So, just as the JSON reader spent all its effort to read raw JSON-encoded data and convert it into neat native DuckDB `STRUCT`s, we immediately undid that effort by converting them right back to the `JSON` type. And initially we - or at least I - didn't even realize it.
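A minimal illustration of that implicit cast, using a `STRUCT` literal rather than the census data:

```sql
-- json_keys() expects a JSON argument; the STRUCT literal is implicitly
-- cast to JSON first - just like the JSON reader's output in the original query
SELECT json_keys( {'a': 1, 'b': 2} );  -- [a, b]
```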
Actually, it is a little bit different still, as the JSON-reader deals with reading JSON-encoded text, whereas the `JSON` data type is still a database type. But the point here is that the JSON-reader did the work to provide a precise description of the data as `STRUCT`s, and we didn't bother treating the data as such. Instead, we implicitly cast it to the much looser `JSON` type. So it feels like there might be an unused opportunity to benefit from the detected data type. We will examine this possibility in the next section.
Avoiding the implicit `JSON`-type cast?
Now that we have a better understanding of what we've done, could we rewrite the original query to avoid the implicit `JSON`-type cast, and benefit from the DuckDB native types returned by the JSON reader? Somewhat surprisingly, this is not quite as simple as one might think it would (or should) be.
First of all, we cannot simply use whatever the `STRUCT` equivalents of the `JSON`-type functions and operators are. For example, it is currently not possible to extract the keys from a `STRUCT`-typed value. So there's no straightforward equivalent to how the original query uses `json_keys()` and then `unnest()` to spread out the keys of the original `variables` object into separate rows. (In the DuckDB GitHub repository, @pkoppstein opened a discussion about creating a feature to extract the keys from `STRUCT`s, in case you want to chime in.)
Even if we somehow managed to extract the keys from the `STRUCT`, there is currently no way to use them to extract the corresponding values. That is, `STRUCT`s of course support extraction methods, but all of them require the key to be a constant value. This prohibits using a column value, such as what we would get if we'd use an unnested set of keys:

```sql
WITH field_codes AS (
  SELECT unnest( json_keys( variables ) ) AS field_code
       , variables
  FROM 'https://api.census.gov/data/2000/dec/sf1/variables.json'
)
SELECT field_code
     , variables[field_code] AS "variable"
FROM field_codes
WHERE field_code NOT IN ('for', 'in', 'ucgid')
```
variables
is the STRUCT
-typed value we receive from the JSON-reader.
The expression variables[field_code]
attempts to extract a value from it using the column value field_code
.
The statement fails because field_code
is not a constant:Run Time (s): real 8.345 user 1.531250 sys 0.031250 Binder Error: Key name for struct_extract needs to be a constant stringAs the error message suggests, the square bracket extraction syntax is just syntactic sugar for the
struct_extract()
function.
Using a function call instead yields the exact same error.
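The behavior is easy to reproduce with a `STRUCT` literal:

```sql
-- A constant key works fine:
SELECT struct_extract( {'a': 1, 'b': 2}, 'a' );  -- 1

-- ...but a key coming from a column value fails to bind:
SELECT struct_extract( {'a': 1, 'b': 2}, k )     -- Binder Error
FROM ( SELECT 'a' AS k );
```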
A solution based on `STRUCT`s
Eventually I came up with an approach that at least allows me to envision what a solution based on `STRUCT`s, without relying on the `JSON` type, might look like. Here's that attempt:

```sql
WITH variables AS (
  UNPIVOT (
    WITH variables AS (
      SELECT COLUMNS( variables.* EXCLUDE( "for", "in", "ucgid" ) )
      FROM read_json( 'https://api.census.gov/data/2000/dec/sf1/variables.json' )
    )
    SELECT struct_pack(
             label         := struct_extract( COLUMNS( * ), 'label' )
           , concept       := struct_extract( COLUMNS( * ), 'concept' )
           , predicateType := struct_extract( COLUMNS( * ), 'predicateType' )
           , "group"       := struct_extract( COLUMNS( * ), 'group' )
           )
    FROM variables
  )
  ON( * )
  INTO NAME field_code VALUE "variable"
)
SELECT field_code, "variable".*
FROM variables
ORDER BY field_code
```

The main idea here is to use `variables.*` to turn the keys of the `STRUCT` given to us by the JSON-reader into separate columns. The `UNPIVOT` statement can then turn those columns into rows. In the original query, this was achieved by combining `json_keys()` and `unnest()`.
Since we don't need the metadata-like objects `ucgid`, `for` and `in`, we also don't need to unpack them from the `variables` object. So, we wrapped `variables.*` in a `COLUMNS` "star expression", which lets us add an `EXCLUDE` clause to remove them. In the original query, those were removed in the `WHERE` clause.
The `UNPIVOT` statement not only creates rows from columns, but also lets us transform the column names into values. We use the `NAME` clause to collect them into a new `field_code` column. The `VALUE` clause can be used to name the column that receives the corresponding column values. In the example above, that value column is assigned the name `variable`.
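To see the `NAME`/`VALUE` mechanics in isolation, here's a toy `UNPIVOT` on a hypothetical two-column row:

```sql
-- The column names 'a' and 'b' become values in field_code;
-- their values 1 and 2 land in the val column
UNPIVOT ( SELECT 1 AS a, 2 AS b )
ON a, b
INTO NAME field_code VALUE val;
```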
Now, the values extracted from the `variables` object are themselves `STRUCT`s, and they do not all have the same set of keys. From the DuckDB perspective, they therefore have distinct types, and thus cannot all simply be lumped together into a single column. However, we're only interested in a particular set of properties: `'label'`, `'concept'`, `'predicateType'`, and `'group'`. If the type of their values is the same across all objects, then we can extract them and create new `STRUCT`s having only those keys. This is achieved using `struct_extract()` and `struct_pack()`.
Again, the `COLUMNS(*)` "star expression" proves to be a very useful tool! We use it here as the argument for `struct_extract()`. This way, we need to write the `STRUCT` extraction-and-assembly business only once; the star expression then applies it to all columns, without having to name them explicitly. The `STRUCT` value assembled by `struct_pack()` gets the name of the original column.
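A small sketch of that rewriting behavior, with two hypothetical `STRUCT` columns:

```sql
-- COLUMNS(*) replicates the surrounding expression once per column;
-- this expands to: struct_extract( a, 'x' ), struct_extract( b, 'x' )
SELECT struct_extract( COLUMNS( * ), 'x' )
FROM ( SELECT {'x': 1} AS a, {'x': 2} AS b );
```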
The STRUCT
-value assembled by struct_pack()
gets the name of the original column.
The final touch is to unwrap the
STRUCT
s we assembled into separate columns.
This is done in the terminal SELECT
, again using variable.*
syntax.
Nice! Pity it works only in theory
When we run it, we get:
```
Run Time (s): real 14.183 user 12.484375 sys 0.062500
Binder Error: Could not find key "concept" in struct
```

Apparently, not only do the objects in our dataset sometimes have more keys than we're interested in, some also don't have all the keys we're attempting to extract. There does not seem to be any way to detect whether some arbitrary `STRUCT` value has a specific key, so we can't simply work around it.
If we comment out the extraction of the `concept` key, we get the same error, but now for the `predicateType` key. If we comment out that extraction too, the query executes successfully. On my laptop, it takes about as long as the original query - that is to say, much worse than the alternative using the `JSON` type.
So even if we would somehow be able to overcome the issues with extracting the properties and creating the `STRUCT`s, we still can't seem to really benefit from the stricter typing. I guess the main take-away here is: we can STRUCTggle all we want, but our data simply isn't given to us in a way that allows it to work for us.
JSON-reader parameters
We mentioned earlier that `SELECT`-ing directly from the URL, as the original query did, causes DuckDB to invoke `read_json()` or `read_json_auto()`, which are synonyms of each other. If you like, you can convince yourself by running an equivalent `DESCRIBE` statement that explicitly invokes the reader:
```sql
DESCRIBE
SELECT *
FROM read_json( 'https://api.census.gov/data/2000/dec/sf1/variables.json' )
```

If you execute it, you'll notice that the inferred column name and data type are identical to those returned by the prior `DESCRIBE` statement.
But invoking the reader explicitly has a benefit, in that it offers us the possibility to pass parameters that control its behavior. Some that are relevant to the topic at hand:

- `auto_detect` (`BOOLEAN`): whether to auto-detect the schema at all. Contrary to what the current documentation states, the default value is `true` rather than `false`.
- `columns` (`STRUCT(name VARCHAR, type VARCHAR)`): if you set `auto_detect` to `false`, you are required to explicitly specify the columns and column types (see the sketch after this list).
- `maximum_depth` (`INTEGER`): the number of nesting levels that are considered when detecting the data type. The default is `-1`, which means there is no restriction on the depth of type detection. When set to a positive integer, any object-typed values at the maximum level of nesting will get "detected" as being of the `JSON` type.
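For example, here is a sketch (untested against this dataset) of how turning off detection and declaring the single top-level column ourselves might look:

```sql
-- With auto_detect disabled, we must declare every column and its type;
-- declaring variables as JSON skips structure detection entirely
SELECT *
FROM read_json(
  'https://api.census.gov/data/2000/dec/sf1/variables.json'
, auto_detect = false
, columns = { variables: 'JSON' }
);
```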
Controlling the depth of the JSON-reader's type detection
Now that we've learned about the `maximum_depth` parameter, let's apply it and experience its effect:
```sql
DESCRIBE
SELECT *
FROM read_json(
  'https://api.census.gov/data/2000/dec/sf1/variables.json'
, maximum_depth = 0
)
```

At `maximum_depth = 0`, there is no detection at all, and the entire dataset is just a single `JSON`-typed value:

```
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ json        │ JSON        │ YES     │ NULL    │ NULL    │ NULL    │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
Run Time (s): real 7.583 user 0.375000 sys 0.015625
```

This is on par with what we achieved using `read_text()` and explicitly casting its `content` column to the `JSON` type. Let's allow for one level:
```sql
DESCRIBE
SELECT *
FROM read_json(
  'https://api.census.gov/data/2000/dec/sf1/variables.json'
, maximum_depth = 1
)
```

At `maximum_depth = 1` we at least get extraction into separate columns. However, the column data type is still the generic `JSON` type:

```
┌─────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │  null   │   key   │ default │  extra  │
│   varchar   │   varchar   │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ variables   │ JSON        │ YES     │ NULL    │ NULL    │ NULL    │
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
Run Time (s): real 8.521 user 0.343750 sys 0.000000
```

We could allow another level:
```sql
DESCRIBE
SELECT *
FROM read_json(
  'https://api.census.gov/data/2000/dec/sf1/variables.json'
, maximum_depth = 2
)
```

At `maximum_depth = 2`, the type detected for the `variables` column is a bag of properties of the `JSON` type. In the DuckDB type system, this is represented as `MAP( VARCHAR, JSON )`:

```
┌─────────────┬────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │    column_type     │  null   │   key   │ default │  extra  │
│   varchar   │      varchar       │ varchar │ varchar │ varchar │ varchar │
├─────────────┼────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ variables   │ MAP(VARCHAR, JSON) │ YES     │ NULL    │ NULL    │ NULL    │
└─────────────┴────────────────────┴─────────┴─────────┴─────────┴─────────┘
Run Time (s): real 8.229 user 0.343750 sys 0.000000
```

This is interesting! We've seen `STRUCT`s so far, but not yet `MAP`s.
JSON objects as DuckDB MAPs
Let's attempt to rewrite the original query using the `MAP( VARCHAR, JSON )`-typed value we get from the JSON-reader when we pass `maximum_depth = 2`:

```sql
WITH variables AS (
  SELECT unnest( map_keys( variables ) )   AS field_code
       , unnest( map_values( variables ) ) AS "variable"
  FROM read_json(
    'https://api.census.gov/data/2000/dec/sf1/variables.json'
  , maximum_depth = 2
  )
)
SELECT field_code
     , "variable"->>'label'         AS label
     , "variable"->>'concept'       AS concept
     , "variable"->>'predicateType' AS predicate_type
     , "variable"->>'group'         AS "group"
FROM variables
WHERE field_code NOT IN ('for', 'in', 'ucgid')
ORDER BY field_code
```

The main difference as compared to using only the `JSON` type is that the `MAP` type lets us extract both the keys and the corresponding values, using `map_keys()` and `map_values()` respectively. This means the final `SELECT` does not have to extract each variable object explicitly by its key: instead, we can immediately extract the properties of interest from the variable objects.

Compared to the original query, the other notable difference is that the JSON-reader now hands the variable objects to us as `JSON`-typed values. The original query relied on implicit typecasting, which does not happen here.
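The zipping behavior of the two `unnest()` calls is worth a quick illustration with a small, made-up `MAP` value:

```sql
-- map_keys() and map_values() return aligned lists; unnesting both in
-- the same SELECT pairs them up row by row: (a, 1) and (b, 2)
SELECT unnest( map_keys( m ) )   AS field_code
     , unnest( map_values( m ) ) AS "value"
FROM ( SELECT MAP( ['a', 'b'], [1, 2] ) AS m );
```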
If we download the JSON file, store it on the local disk, and modify our queries to use that, the results are quite remarkable: the one using `read_text()` and the explicit cast to the `JSON` type takes about 12 seconds. However, that result is smashed completely by the last query, which now takes just half a second! A really nice improvement!
Conclusion
- DuckDB's feature to `SELECT` directly from a file or URL is convenient, but it pays off to examine the underlying reader invocation and its parameters.
- When using the JSON-reader, it's a good idea to examine whether the detected data type is suitable for the extraction you're attempting to perform. By default, the JSON-reader will exhaustively detect the type. If you find the detected type is large as compared to your dataset, and/or more precise or detailed than required for your purpose, try using the `maximum_depth` parameter to curtail the type detection.
- If you're using the JSON-reader and find that your extraction logic relies on `JSON`-type functions and operators, then beware of implicit casting of your extracted data to the `JSON` type. Implicit casting to the `JSON` type may point to an opportunity to limit the level of data type detection.
- When reading JSON data, you can try extracting from a `JSON`-typed value - either by loading your data into a table with a `JSON`-typed column, or by using a combination of `read_text()` and an explicit cast to the `JSON` type. Even if your goal is to use the JSON-reader, the `JSON`-typed value is a useful baseline: if you find that your JSON-reader-based query is slower than the one on the `JSON`-typed value, it means you have an opportunity to improve.
- If you're using `json_keys()` with the purpose of extracting objects or values from the JSON data, then consider using the JSON-reader and configuring it so that it returns a `MAP` type. The `MAP` functions `map_keys()` and `map_values()` are really fast, and may help you avoid an extra step of using the key to extract an object.
- It would be nice if `STRUCT`s were a little more flexible. For example, I'd like to be able to extract field values using non-constant expressions. It would also be useful to have a function that obtains a `STRUCT`'s keys, and one that checks whether it has a particular key. Another feature that would have helped is the ability to define `STRUCT` fields as optional, or as having a default value.