# Features ## General - [x] SQL Parser - [x] SQL Query Planner - [x] DataFrame API - [x] Parallel query execution - [x] Streaming Execution ## Optimizations - [x] Query Optimizer - [x] Constant folding - [x] Join Reordering - [x] Limit Pushdown - [x] Projection push down - [x] Predicate push down ## SQL Support - [x] Type coercion - [x] Projection (`SELECT`) - [x] Filter (`WHERE`) - [x] Filter post-aggregate (`HAVING`) - [x] Sorting (`ORDER BY`) - [x] Limit (`LIMIT`) - [x] Aggregate (`GROUP BY`) - [x] cast /try_cast - [x] [`VALUES` lists](https://www.postgresql.org/docs/current/queries-values.html) - [x] [String Functions](./sql/scalar_functions.md#string-functions) - [x] [Conditional Functions](./sql/scalar_functions.md#conditional-functions) - [x] [Time and Date Functions](./sql/scalar_functions.md#time-and-date-functions) - [x] [Math Functions](./sql/scalar_functions.md#math-functions) - [x] [Aggregate Functions](./sql/aggregate_functions.md) (`SUM`, `MEDIAN`, and many more) - [x] Schema Queries - [x] `SHOW TABLES` - [x] `SHOW COLUMNS FROM ` - [x] `SHOW CREATE TABLE ` - [x] Basic SQL [Information Schema](./sql/information_schema.md) (`TABLES`, `VIEWS`, `COLUMNS`) - [ ] Full SQL [Information Schema](./sql/information_schema.md) support - [x] Support for nested types (`ARRAY`/`LIST` and `STRUCT`. - [x] Read support - [x] Write support - [x] Field access (`col['field']` and [`col[1]`]) - [x] [Array Functions](./sql/scalar_functions.md#array-functions) - [x] [Struct Functions](./sql/scalar_functions.md#struct-functions) - [x] `struct` - [ ] [Postgres JSON operators](https://github.com/apache/datafusion/issues/6631) (`->`, `->>`, etc.) - [x] Subqueries - [x] Common Table Expressions (CTE) - [x] Set Operations (`UNION [ALL]`, `INTERSECT [ALL]`, `EXCEPT[ALL]`) - [x] Joins (`INNER`, `LEFT`, `RIGHT`, `FULL`, `CROSS`) - [x] Window Functions - [x] Empty (`OVER()`) - [x] Partitioning and ordering: (`OVER(PARTITION BY <..> ORDER BY <..>)`) - [x] Custom Window (`ORDER BY time ROWS BETWEEN 2 PRECEDING AND 0 FOLLOWING)`) - [x] User Defined Window and Aggregate Functions - [x] Catalogs - [x] Schemas (`CREATE / DROP SCHEMA`) - [x] Tables (`CREATE / DROP TABLE`, `CREATE TABLE AS SELECT`) - [x] Data Insert - [x] `INSERT INTO` - [x] `COPY .. INTO ..` - [x] CSV - [x] JSON - [x] Parquet - [ ] Avro ## Runtime - [x] Streaming Grouping - [x] Streaming Window Evaluation - [x] Memory limits enforced - [x] Spilling (to disk) Sort - [x] Spilling (to disk) Grouping - [x] Spilling (to disk) Sort Merge Join - [ ] Spilling (to disk) Hash Join ## Data Sources In addition to allowing arbitrary datasources via the [`TableProvider`] trait, DataFusion includes built in support for the following formats: - [x] CSV - [x] Parquet - [x] Primitive and Nested Types - [x] Row Group and Data Page pruning on min/max statistics - [x] Row Group pruning on Bloom Filters - [x] Predicate push down (late materialization) [not by default](https://github.com/apache/datafusion/issues/3463) - [x] JSON - [x] Avro - [x] Arrow [`tableprovider`]: https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html