DML
DML stands for “Data Manipulation Language” and relates to inserting and modifying data in tables.
COPY
Copies the contents of a table or query to file(s). Supported file
formats are parquet, csv, json, and arrow.
COPY { table_name | query }
TO 'file_name'
[ STORED AS format ]
[ PARTITIONED BY column_name [, ...] ]
[ OPTIONS( option [, ... ] ) ]
STORED AS specifies the file format the COPY command will write. If this
clause is not specified, it will be inferred from the file extension if possible.
PARTITIONED BY specifies the columns to use for partitioning the output files into
separate hive-style directories. By default, columns used in PARTITIONED BY will be removed
from the output format. To keep the columns in the output, provide the option
execution.keep_partition_by_columns true, as shown in the sketch below. The
execution.keep_partition_by_columns flag can also be enabled through
ExecutionOptions within SessionConfig.
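For example, a minimal sketch of keeping the partition column in the written
files; the table and column names here are illustrative, and the option is
passed inline through the OPTIONS clause:
> COPY source_table TO 'dir_name' STORED AS PARQUET PARTITIONED BY (column1) OPTIONS (execution.keep_partition_by_columns true);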
The output format is determined by the first match of the following rules:

1. Value of STORED AS
2. Filename extension (e.g. foo.parquet implies PARQUET format)
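As an illustrative sketch of these rules (file names are hypothetical), both
statements below write CSV: the first infers the format from the .csv
extension, while the second overrides the .json extension with an explicit
STORED AS clause:
> COPY source_table TO 'data.csv';
> COPY source_table TO 'data.json' STORED AS CSV;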
For a detailed list of valid OPTIONS, see Write Options.
Examples
Copy the contents of source_table to file_name.json in JSON format:
> COPY source_table TO 'file_name.json';
+-------+
| count |
+-------+
| 2 |
+-------+
Copy the contents of source_table to one or more Parquet-formatted
files in the dir_name directory:
> COPY source_table TO 'dir_name' STORED AS PARQUET;
+-------+
| count |
+-------+
| 2 |
+-------+
Copy the contents of source_table to multiple directories
of hive-style partitioned parquet files:
> COPY source_table TO 'dir_name' STORED AS PARQUET PARTITIONED BY (column1, column2);
+-------+
| count |
+-------+
| 2 |
+-------+
If the data contains values of x and y in column1 and only a in
column2, output files will appear in the following directory structure:
dir_name/
column1=x/
column2=a/
<file>.parquet
<file>.parquet
...
column1=y/
column2=a/
<file>.parquet
<file>.parquet
...
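A hedged sketch of reading the partitioned output back with CREATE EXTERNAL
TABLE. The schema is an assumption for illustration (a single hypothetical
data column value; adjust names and types to your data), with the partition
columns declared in both the schema and the PARTITIONED BY clause:
> CREATE EXTERNAL TABLE partitioned_source (column1 VARCHAR, column2 VARCHAR, value BIGINT) STORED AS PARQUET PARTITIONED BY (column1, column2) LOCATION 'dir_name/';
> SELECT * FROM partitioned_source WHERE column1 = 'x';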
Run the query SELECT * from source ORDER BY time and write the
results (maintaining the order) to a parquet file named
output.parquet with a maximum of 10,000,000 rows per parquet row
group (MAX_ROW_GROUP_SIZE is a row count, not a byte size):
> COPY (SELECT * from source ORDER BY time) TO 'output.parquet' OPTIONS (MAX_ROW_GROUP_SIZE 10000000);
+-------+
| count |
+-------+
| 2 |
+-------+
INSERT
Examples
Insert literal values or the results of a query into a table.
INSERT INTO table_name { VALUES ( expression [, ...] ) [, ...] | query }
> INSERT INTO target_table VALUES (1, 'Foo'), (2, 'Bar');
+-------+
| count |
+-------+
| 2 |
+-------+
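The query form of INSERT copies rows from another relation. A minimal sketch,
assuming a hypothetical source_table whose columns match target_table:
> INSERT INTO target_table SELECT * FROM source_table;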