HOWTOs
How to update the version of Rust used in CI tests
Make a PR to update the rust-toolchain file in the root of the repository:
How to add a new scalar function
Below is a checklist of what you need to do to add a new scalar function to DataFusion:
Add the actual implementation of the function to a new module file within:
  - New function modules (for example, a vector module) should use a Rust feature (for example, vector_expressions) to allow DataFusion users to enable or disable the new module as desired.
The implementation of the function is done by implementing the ScalarUDFImpl trait for the function struct.
  - See the advanced_udf.rs example for an example implementation.
Add tests for the new function.
To connect the implementation of the function, add to the mod.rs file:
  - a mod xyz; where xyz is the new module file
  - a call to make_udf_function!(..);
  - an item in export_functions!(..);
In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well-known data and returns the expected result.
  - Documentation for sqllogictest here
Add SQL reference documentation here
  - An example of this being done can be seen here
Run ./dev/update_function_docs.sh to update docs
How to add a new aggregate function
Below is a checklist of what you need to do to add a new aggregate function to DataFusion:
Add the actual implementation of an Accumulator and AggregateExpr:
In datafusion/expr/src, add:
  - a new variant to AggregateFunction
  - a new entry to FromStr with the name of the function as called by SQL
  - a new line in return_type with the expected return type of the function, given an incoming type
  - a new line in signature with the signature of the function (number and types of its arguments)
  - a new line in create_aggregate_expr mapping the built-in to the implementation
  - tests to the function.
In sqllogictest/test_files, add new sqllogictest integration tests where the function is called through SQL against well-known data and returns the expected result.
  - Documentation for sqllogictest here
Add SQL reference documentation here
  - An example of this being done can be seen here
Run ./dev/update_function_docs.sh to update docs
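The first three items of the datafusion/expr/src checklist (new variant, FromStr entry, return_type line) can be pictured with a small standalone sketch. Everything here is a hypothetical stand-in: ToyAggregate, ToyType, and the median function are illustrative names and are not DataFusion's actual AggregateFunction enum, Arrow's DataType, or their real signatures. Only std::str::FromStr is a real standard-library trait.

```rust
use std::str::FromStr;

/// Toy stand-in for a data type enum (hypothetical, not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int64,
    Float64,
}

/// Toy stand-in for the built-in aggregate enum (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyAggregate {
    Sum,
    Avg,
    // Checklist item: a new variant for the new function.
    Median,
}

impl FromStr for ToyAggregate {
    type Err = String;

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        // Case-insensitive matching is an assumption of this sketch.
        match name.to_ascii_lowercase().as_str() {
            "sum" => Ok(Self::Sum),
            "avg" => Ok(Self::Avg),
            // Checklist item: the name of the function as called by SQL.
            "median" => Ok(Self::Median),
            other => Err(format!("unknown aggregate function: {other}")),
        }
    }
}

/// Checklist item: the return type of each function, given an incoming type.
fn return_type(fun: ToyAggregate, input: ToyType) -> ToyType {
    match fun {
        ToyAggregate::Sum => input,
        ToyAggregate::Avg => ToyType::Float64,
        // The new line for the new function.
        ToyAggregate::Median => input,
    }
}

fn main() {
    let fun: ToyAggregate = "MEDIAN".parse().expect("known function name");
    println!("{:?} over Int64 returns {:?}", fun, return_type(fun, ToyType::Int64));
}
```

Because every match is exhaustive, adding the variant without the matching return_type line fails to compile, which is one reason a checklist like this tends to map onto one edit per match statement.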
How to display plans graphically
The query plans represented by LogicalPlan nodes can be graphically
rendered using Graphviz.
To do so, save the output of the display_graphviz function to a file:
use std::fs::File;
use std::io::Write;

// Create plan somehow...
let mut output = File::create("/tmp/plan.dot")?;
write!(output, "{}", plan.display_graphviz())?;
Then, use the dot command line tool to render it into a file that
can be displayed. For example, the following command creates a
/tmp/plan.pdf file:
dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
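The two steps above (write the DOT text, then run dot) can also be driven from a single Rust program. The following is a minimal sketch using only the standard library. The helper names write_dot, dot_args, and render_pdf are hypothetical, and the hard-coded DOT string is a stand-in for the text produced by plan.display_graphviz(). It uses dot's -o option to name the output file instead of shell redirection, and it assumes Graphviz is installed and on the PATH.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Write DOT text (for example, the formatted output of
/// `plan.display_graphviz()`) to `path`.
fn write_dot(path: &Path, dot_text: &str) -> io::Result<()> {
    fs::write(path, dot_text)
}

/// Build the arguments for `dot -Tpdf <input> -o <output>`.
fn dot_args(input: &Path, output: &Path) -> Vec<String> {
    vec![
        "-Tpdf".to_string(),
        input.display().to_string(),
        "-o".to_string(),
        output.display().to_string(),
    ]
}

/// Run Graphviz. Returns Ok(true) if `dot` exited successfully.
fn render_pdf(input: &Path, output: &Path) -> io::Result<bool> {
    let status = Command::new("dot").args(dot_args(input, output)).status()?;
    Ok(status.success())
}

fn main() -> io::Result<()> {
    // Hypothetical stand-in for `format!("{}", plan.display_graphviz())`.
    let dot_text = "digraph plan { scan -> filter; filter -> projection; }";

    let dir = std::env::temp_dir();
    let dot_path = dir.join("plan.dot");
    let pdf_path = dir.join("plan.pdf");

    write_dot(&dot_path, dot_text)?;
    match render_pdf(&dot_path, &pdf_path) {
        Ok(true) => println!("wrote {}", pdf_path.display()),
        Ok(false) => eprintln!("dot exited with an error"),
        Err(e) => eprintln!("could not run dot (is Graphviz installed?): {e}"),
    }
    Ok(())
}
```

Keeping the argument list in its own function makes the command easy to inspect or test without Graphviz being present.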
How to format .md documents
We are using prettier to format .md files.
You can either use npm i -g prettier to install it globally or use npx to run it as a standalone binary. Using npx requires a working node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).
$ prettier --version
2.3.0
After you’ve confirmed your prettier version, you can format all the .md files:
prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
How to format .toml files
We use taplo to format .toml files.
For Rust developers, you can install it via:
cargo install taplo-cli --locked
Refer to the Installation section for other ways to install it.
$ taplo --version
taplo 0.9.0
After you’ve confirmed your taplo version, you can format all the .toml files:
taplo fmt
How to update protobuf/gen dependencies
The prost/tonic code can be generated by running ./regen.sh, which in turn invokes the Rust binary located in ./gen.
This is necessary after modifying the protobuf definitions or altering the dependencies of ./gen, and requires a
valid installation of protoc (see installation instructions for details).
./regen.sh
How to add/edit documentation for UDFs
Documentation for UDFs is generated from code (related GitHub issue). To generate the markdown, run ./dev/update_function_docs.sh.
This is necessary after adding a new UDF implementation or modifying an existing one in a way that requires a documentation update.
./dev/update_function_docs.sh