Unordered arguments to join-append

Noticed that UnorderedAppend is currently unused. Both JOIN and APPEND assume their arguments are ordered, but do not check this. All existing resolvers comply with this constraint, and the beforeFirst() operation implies it, however it is not guarenteed by the Tuples interface.

Note:

  • Performance for unordered arguments to append is O(N) in the presence of a prefix as this requires a filter.
  • Performance for unordered arguments to join is O(XN) where X is the product of the size of arguments to the tuples left.
  • Cost to order an argument is O(NlogN)

It is worth noting that the extra cost of a single unordered (as opposed to ordered) argument in the leftmost position is 0 (This is not entirely true with XA2 as unordered iteration will have a - probably small - impact on disk cache performance).

Also remember that the cost of linear searches is effectively zero when the entire tuples is resident in memory; it is disk IO that is our primary performance limit, not linear complexity on in-memory structures.

Therefore I recommend that we address this by introducing a check in the append code that materialises any unordered arguments large enough to qualify, and filters any small enough to qualify as resident. It is also worth considering that those small enough to provide a small advantage to leaving unordered, are by definition small enough that the cost of sorting is negligible anyway, so consider combining all unordered arguments in a single sorted tuples.

It is however worth delaying this until the last possible moment (ie. a beforeFirst called with a prefix), as join unification may desire a different column order, in which case we will be wanting to ensure we support it.

original page by Andrae Muys