1. Compilation of database queries to low-level code
Modern database query engines interpret both queries (using internal representations close to relational algebra) and database schemas. This enables database systems to deal with dynamically changing workloads. However, most database queries executed today are hardwired in applications. DBToaster removes the need to interpret queries and data at run time: it compiles SQL queries to low-level code, which eliminates the overhead of interpretation.
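To make the difference concrete, here is a minimal sketch contrasting a generic plan interpreter with a function specialized to one fixed query. The plan encoding, the function names, and the sample data are all hypothetical and chosen for illustration; this is not DBToaster's internal representation or its generated code.

```python
# Sample data: a tiny "sales" relation (illustrative only).
ROWS = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 7},
    {"region": "EU", "amount": 30},
]

# Hypothetical plan for: SELECT SUM(amount) FROM sales WHERE region = 'EU'
PLAN = ("sum", "amount", ("filter", "region", "EU", ("scan",)))


def interpret(plan, rows):
    """Generic interpreter: inspects the plan node on every call."""
    op = plan[0]
    if op == "scan":
        return rows
    if op == "filter":
        _, column, value, child = plan
        return [r for r in interpret(child, rows) if r[column] == value]
    if op == "sum":
        _, column, child = plan
        return sum(r[column] for r in interpret(child, rows))
    raise ValueError(f"unknown operator: {op}")


def run_compiled(rows):
    """Code specialized to the one query above: no plan, no dispatch."""
    total = 0
    for r in rows:
        if r["region"] == "EU":
            total += r["amount"]
    return total


interpreted_result = interpret(PLAN, ROWS)
compiled_result = run_compiled(ROWS)
```

Both functions compute the same answer; the specialized one simply has nothing left to look up about the query while it runs, which is the kind of overhead that compilation removes.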
2. Code generation for C++, Scala, and Java
DBToaster generates both C++ and Scala code, which can be integrated into applications written in these languages. Since Scala lives in the Java ecosystem and compiles to Java bytecode, code generated by DBToaster can also be linked into Java applications.
3. Embedded query engines
Code generated by DBToaster can be linked into applications. No separate runtime system is needed.
4. Online query processing
DBToaster generates code that maintains a query result as an in-memory materialized view which is kept fresh as a stream of updates to the base data (the virtual relational database over which the query is asked) arrives. DBToaster code can also be used in alternative ways. For example, one could start from an empty database and use the update stream to load the database. This will cause DBToaster to evaluate the query online, maintaining an accurate temporary query result, at all times, as the data is loaded.
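The idea of starting from an empty database and keeping the result current as updates arrive can be sketched as follows. This is a hand-written illustration for one assumed query (`SELECT region, SUM(amount) FROM sales GROUP BY region`); the class name, method names, and update format are hypothetical and do not reflect the interface of DBToaster-generated code.

```python
class RegionTotals:
    """In-memory view for: SELECT region, SUM(amount) FROM sales GROUP BY region."""

    def __init__(self):
        self.total = {}  # region -> running SUM(amount)
        self.count = {}  # region -> number of live rows, used to drop empty groups

    def on_insert(self, region, amount):
        self._apply(region, amount, +1)

    def on_delete(self, region, amount):
        self._apply(region, amount, -1)

    def _apply(self, region, amount, sign):
        self.total[region] = self.total.get(region, 0) + sign * amount
        self.count[region] = self.count.get(region, 0) + sign
        if self.count[region] == 0:
            # The group has no rows left, so it leaves the result.
            del self.total[region]
            del self.count[region]


# Start from an empty database and feed an update stream.
stream = [
    ("insert", "EU", 10),
    ("insert", "US", 7),
    ("insert", "EU", 30),
    ("delete", "US", 7),
]

view = RegionTotals()
snapshots = []
for kind, region, amount in stream:
    if kind == "insert":
        view.on_insert(region, amount)
    else:
        view.on_delete(region, amount)
    # The view is a valid query result after every single update.
    snapshots.append(dict(view.total))
```

Each entry of `snapshots` is the exact query result over the data loaded so far, which is what "evaluating the query online" means here.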
5. Support for standard SQL semantics
DBToaster supports traditional SQL semantics rather than window semantics. This is worth noting because its speed and its ability to process high-volume update streams make DBToaster a natural fit for data stream processing applications. Current data stream processing systems restrict themselves to window semantics in order to keep up with data streams. DBToaster does not have this restriction, so it can process update streams and efficiently combine data streams with historical data.
Note: DBToaster's handling of the domains of GROUP-BY aggregate groups does not fully comply with the SQL standard. The discrepancy is minor: DBToaster's result is always complete according to the SQL standard, but may contain additional null-value rows that an application can simply ignore. See the DBToaster documentation for details. This incompatibility will be fixed soon.
6. Materialized views of nested queries
DBToaster supports efficient materialized views of nested SQL queries. Many commercial database systems support materialized views / incremental view maintenance, but no other system does so for nested SQL queries, even though they are essential for complex analytics.
Nesting refers to the presence of select-statements (SQL queries) in the SELECT, FROM, or WHERE clauses of SQL queries.
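The three nesting positions can be illustrated with small queries. The sketch below runs them through Python's built-in `sqlite3` module purely so that the example is executable; the `orders` table and its contents are made up, and nothing here claims that DBToaster accepts these exact statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5);
    """
)

# Nested query in the SELECT clause: attach the grand total to every row.
in_select = sorted(
    conn.execute(
        "SELECT customer, amount, (SELECT SUM(amount) FROM orders) FROM orders"
    ).fetchall()
)

# Nested query in the FROM clause: aggregate over per-customer subtotals.
in_from = conn.execute(
    "SELECT SUM(t.subtotal) "
    "FROM (SELECT customer, SUM(amount) AS subtotal "
    "      FROM orders GROUP BY customer) AS t"
).fetchall()

# Nested query in the WHERE clause: keep orders above the average amount.
in_where = conn.execute(
    "SELECT customer, amount FROM orders "
    "WHERE amount > (SELECT AVG(amount) FROM orders)"
).fetchall()

conn.close()
```

In each case the inner select-statement is itself a full query, which is why maintaining a materialized view over the outer query also requires keeping the inner result up to date.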
7. The Viewlet Transform
Modern database management systems frequently support incremental view maintenance, a mechanism for efficiently refreshing a materialized view when the base data changes. Rather than re-evaluating the query that defines the view, the system executes an alternative query (the delta query), which determines what changes must be applied to the materialized view to bring it up to date. DBToaster deploys a unique mechanism, the viewlet transform, to shortcut this computation much more aggressively, dramatically reducing the amount of work necessary to refresh a view.
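As a rough illustration of the general idea (materializing the pieces that delta queries need, so that a refresh touches no base data), consider the assumed query `SELECT SUM(R.a * S.b) FROM R, S WHERE R.k = S.k`. The sketch below is hand-written and simplified; the class and attribute names are hypothetical, and it should not be read as the viewlet transform itself or as code DBToaster would emit.

```python
from collections import defaultdict


class JoinSumView:
    """Maintains SUM(R.a * S.b) over R join S on k without rescanning R or S."""

    def __init__(self):
        self.result = 0
        # Auxiliary views: each is exactly what the delta for the *other*
        # relation needs, and each is itself maintained incrementally.
        self.sum_a_by_key = defaultdict(int)  # k -> SUM(R.a)
        self.sum_b_by_key = defaultdict(int)  # k -> SUM(S.b)

    def update_r(self, k, a, mult=1):
        """Insert (mult=+1) or delete (mult=-1) the tuple R(k, a)."""
        # Delta of the result w.r.t. R(k, a) is a * SUM(S.b where S.k = k).
        self.result += mult * a * self.sum_b_by_key[k]
        self.sum_a_by_key[k] += mult * a

    def update_s(self, k, b, mult=1):
        """Insert (mult=+1) or delete (mult=-1) the tuple S(k, b)."""
        self.result += mult * self.sum_a_by_key[k] * b
        self.sum_b_by_key[k] += mult * b


def recompute(r_rows, s_rows):
    """Reference implementation: re-evaluate the query from scratch."""
    return sum(a * b for (k1, a) in r_rows for (k2, b) in s_rows if k1 == k2)


view = JoinSumView()
view.update_r(1, 2)
view.update_s(1, 5)
view.update_s(1, 3)
view.update_r(2, 7)
view.update_r(1, 4)
view.update_s(1, 5, mult=-1)  # delete S(1, 5)

expected = recompute([(1, 2), (2, 7), (1, 4)], [(1, 3)])
```

Every update costs a constant number of dictionary operations, whereas `recompute` has to join both relations again. The delta of each auxiliary view is trivial (add one value), which is what makes repeating the "take the delta, then materialize it" step pay off.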
8. Powerful optimizers
DBToaster is a powerful optimizing compiler that implements state-of-the-art optimizations from both the database and compiler research literature. Optimization is performed at multiple stages and levels of abstraction, from SQL and DBToaster's internal calculus to the backend functional and imperative code representations.
9. Much more to come
We have big plans for DBToaster, and some key goals are outlined below. In terms of performance alone, we expect to be able to speed up our generated code by another two orders of magnitude in the typical case, judging from our reverse-engineering of the code we currently produce. In addition, our generated C++ code sometimes outperforms the Scala code substantially, and in other cases the reverse is true, which suggests that we are not yet close to being as fast as we can be.
10. Feature Roadmap
| Milestone | Expected Date | Feature Summary |
10.1. SQL92 Support
DBToaster presently only supports the COUNT, COUNT DISTINCT, SUM, and AVG aggregates. Support for MIN and MAX is slated for Milestone 3.
DBToaster does not presently support the DISTINCT, UNION, LIMIT, ORDER BY, and HAVING clauses of SELECT statements. Support for DISTINCT, UNION, and HAVING is slated for Milestone 1. Support for LIMIT and ORDER BY is slated for Milestone 4.
DBToaster does not presently support SQL's NULL value semantics (including OUTER JOINs). We are investigating several potential solutions, and will commit to a milestone once more research has been performed.
All other unsupported features of SQL92 will be implemented in a future release.
DBToaster's internal aggregate calculus has several properties that make it extremely amenable to distribution. We are in the process of implementing a scalable distributed runtime for DBToaster, slated for release as Milestone 2.
10.3. Dynamic Runtimes
We are aware of demand for a platform for executing DBToaster-generated engines, where the query workload can be managed dynamically (i.e., queries can be added/removed at runtime). This feature is slated for release, but at present we do not have the resources to commit to a specific milestone.
10.4. On-Demand Template Execution
A powerful application of DBToaster is the evaluation of template-style queries. When an application is compiled, a DBToaster-generated engine could be produced to efficiently support evaluation of one or more queries with externally bound variables. Presently, a fragment of such queries can be implemented by rewriting the query to include the externally bound variables as output columns. We hope to have this feature implemented in an upcoming milestone release.
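The rewriting workaround can be sketched as follows. Assume a template such as `SELECT SUM(amount) FROM orders WHERE customer = ?`; moving the bound variable into the output gives `SELECT customer, SUM(amount) FROM orders GROUP BY customer`, and binding a value then becomes a lookup in the maintained result. The table, class, and method names below are hypothetical and only illustrate the pattern.

```python
class TotalByCustomer:
    """View for the rewritten query:
    SELECT customer, SUM(amount) FROM orders GROUP BY customer
    """

    def __init__(self):
        self.total_by_customer = {}

    def on_insert(self, customer, amount):
        current = self.total_by_customer.get(customer, 0)
        self.total_by_customer[customer] = current + amount

    def evaluate(self, customer):
        """Answer the original template for one bound value of `customer`.

        Simplification: an unknown customer yields 0 here, whereas standard
        SQL would return NULL for SUM over an empty set.
        """
        return self.total_by_customer.get(customer, 0)


view = TotalByCustomer()
for customer, amount in [("ann", 10), ("bob", 5), ("ann", 30)]:
    view.on_insert(customer, amount)

ann_total = view.evaluate("ann")
carol_total = view.evaluate("carol")
```

The trade-off is that the view holds a result for every possible binding, not only the ones the application actually asks for, which is one reason this rewriting covers just a fragment of template queries.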