Presto, the "preform SQL-on-anything" engine
slides, and the main url
tuning video from presto creator
common table expressions
https://chartio.com/resources/tutorials/using-common-table-expressions/
https://github.com/prestodb/presto/issues/3431
some query tuning references
https://gist.github.com/ian-whitestone/211d0a27687ac3daa3f49699b4606613
understanding the presto query engine and it's optimizations under the hood
what is presto
- distributed query engine for SQL queries
- uses ANSI sql as a query language, no matter what the underlying source is
- runs on prem, on cloud, query all sorts of underlying systems
- no ETL needed
- query data where it is, no need to copy data around (derived tables etc)
presto query execution lifecycle

- this is the important part to optimize
- 3 components
a look into the coordinator

- parsing
- validating input types and values
- operators have the right number of arguments
- sequence of SQL clauses are correct (FROM comes after SELECT etc)
- important to for presto to perform planning of the query execution

- analysis
- takes the tree and makes sure that the tree represents a query that is semantically valid
- e.g. meaning it makes sure the components of the values work with the given operators
- e.g. if we have a group by clause and a select clause, the expresisons in the select clause should satisfy the requirements of the group by
- it starts from the FROM clause
- presto looks for the table and gets its schema using presto's 'connectors'
- then it assigns types to all the nodes (ref above: orderstatus, totalprice, orderdate etc)
- then it looks at its parent operators (e.g. the larger than operator from example above) and resolves that operator from the registry of operators it has (a larger than function that works compares two DATE objects and returns a boolean)
- at the end of the process the analysis step ends up with a tree that's annotated with all the information that's required by the engine to plan it's query.

- planning
- turns a syntax tree into an intermediate representation (Plan IR)

- Optimization
- is the process of applying a set of semantic preserving transformations to the plan to produce a more optimal plan that can be executed
- semantic preserving transformations mean: it transforms a plan, and at every transformation step, it guarantees that the output of the transformed plan matches the output of the original plan
- the end result is a set of fragments (the dotted line boxes) that represent an abstract topology of the query that will be executed in the cluster
- e.g. the Scan operation and Filter operation gets turned into a Filtered Scan operation (which is faster)

- Scheduling and execution
- coordinator identifies workers that can do the work
- submits the fragments of work (logic)
- submits the data that the worker needs to work on (data)
- wires the fragment dependencies according to the linkage structure (N:M exchange, 1:M exchange in the optimization chart) to satisfy that topology.
explain the EXPLAIN

- given a query, how is the engine going to be structuring and optimizing the query?
- use the keyword EXPLAIN to see how presto is going to represent the nodes internally
- instead of the uery, it will output a series of fragments
cost based optimizer