Jump to content

Algorithm Implementation/Sorting/Schwartzian transform

From Wikibooks, open books for an open world

Perl's compact list-manipulation functions can perform the entire transform in a single statement, although it must be written backwards:

@sorted_array = 
       map  { $_->[0] }              # extract original list elements
       sort { $a->[1] <=> $b->[1] }  # sort list by keys
       map  { [$_, -M $_] }          # pair up list elements with keys
       @files_array;

This is a complete Schwartzian Transform, showing a multi-stage sort based on memoized keys of size and modtime.

my @sorted_files =
  map $_->[0], # extract original name
  sort {
       $a->[1] <=> $b->[1] # sort first numerically by size (smallest first)
    or $b->[2] <=> $a->[2] # then numerically descending by modtime age (oldest first)
    or $a->[0] cmp $b->[0] # then stringwise by original name
  }
  map [$_, -s $_, -M $_], # compute tuples of name, size, modtime
  glob "*"; # of all files in the directory

Python

[edit | edit source]

Python programmers use the transform in sorts where the comparison operation may be expensive.

In this example, f(x) returns a key which is suitable for using for sorting. For instance, it might return the length of x, or it might do a database lookup based on the value of x.

# python2.4
new_list = sorted(old_list, key=f)

The key function can return a tuple of values if secondary and further sort keys are desired, taking advantage of the fact that tuples are ordered first by the first key, and then by the second key if the first ties, etc. If the default sort for the keys is not enough, a comparator function can be given with an additional keyword argument cmp, for example:

# python2.4; removed in python3.0
new_list = sorted(old_list, key=f, cmp=lambda a,b:cmp(a[0],b[0]) or cmp(b[1],a[1]))

Note that the key function f should do the expensive work as it's called only once for each key whereas the comparator function is called each time the sort algorithm has to compare two elements. A comparator function can no longer be used in Python 3.0; only a key. The function sorted is new in Python 2.4, before that only method sort was available and more than one statement was required. The key function is also new in Python 2.4 so in Python 2.3 one had to spell out the transform in a way similar to the following, taking advantage of the fact that the first element of a tuple will be used for comparison first:

# python2.3
new_list = [(f(x), x) for x in old_list]
new_list.sort()
new_list = [x[1] for x in new_list]

If desired, a comparator function can be given to the sort here as well (Python 2.4; removed in Python 3.0).

The following is a version that is similar to the Perl idiom (except for the extra parentheses at the end), again taking advantage of sorting of tuples by putting the function results first:

new_list = map(lambda x: x[1],
           sorted(
           map(lambda x: (f(x), x),
               old_list)))

The zip built-in can be used instead of the inline lambda function giving:

new_list = zip(*
           sorted(
           zip(map(f, old_list), 
               old_list)))[1]

Ruby programmers have their own shortcut.

new_list = old_list.sort_by {|file| file.mtime }

However, this is not the full Schwartzian Transform, because the comparison of the memoized keys is fixed. (Though there is a trick for getting around that.)

Full Schwartzian Transform

[edit | edit source]

A true Schwartzian Transform also spells out how the various memoized keys are compared (as strings, as numbers) including lazy logic that can avoid comparisons in secondary and tertiary keys if distinction in the primary keys suffice.

sorted_files = Dir["*"].                         # Get all files
                 collect{|f| [f, test(?s, f), test(?M, f)]}.   # compute tuples of name, size, modtime
                 sort {|a, b|                    # sort
                   a[1] <=> b[1] or              #   -- by increasing size
                   b[2] <=> a[2] or              #   -- by age descending
                   a[0] <=> b[0]                 #   -- by name
                 }.collect{|a| a[0]}             # extract original name

Note that since collect, sort, etc. are all method calls, that the actions read forwards, rather than backwards as in the original Perl.

And now for the trick to make the shortcut method do everything that the full transform can. The trick is that in Ruby objects know how to sort themselves, and arrays sort lexicographically (i.e. by comparing the first value, then second value, etc.). This means that you can construct values that will sort themselves by fairly complex logic.

# Here is the helper class.
class InReverse
  include Comparable
  
  attr :value
  
  def initialize (value)
    @value = value
  end
  
  def <=> (other)
    other.value <=> @value
  end
end

# And now we can do the complex transform using sort_by.
sorted_files = Dir["*"].sort_by{|f| [test(?s, f), InReverse.new(test(?M, f)), f]}

Haskell

[edit | edit source]

This is the idiomatic Haskell way of comparing by a transformed version of the elements:

import Data.List (sortBy)
import Data.Ord (comparing)

sortOn f = sortBy (comparing f)

Here the "comparing" function takes a function and returns a comparison function based on comparing the results of the function application. It is defined like this:

comparing f x y = compare (f x) (f y)

Here is a version that identically mimicks the traditional Perl idiom:

import Data.List (sortBy)
import Data.Ord (comparing)

sortOn' f = map fst . sortBy (comparing snd) . map (\x -> (x, f x))

However, we can take advantage of the fact that tuples are first ordered by their first element, then second, etc., by putting the function results as the first element, and then using the default comparator:

import Data.List (sort)

sortOn'' f = map snd . sort . map (\x -> (f x, x))

Since the tuple compare requires both element types to be instances of Ord, however, this last implementation has a subtly different type:

sortOn, sortOn' :: (Ord b) => (a -> b) -> [a] -> [a]
sortOn'' :: (Ord a, Ord b) => (a -> b) -> [a] -> [a]

OCaml

[edit | edit source]

This is probably the easiest way to do it in OCaml:

let comparing f x y = compare (f x) (f y)

let newList = List.sort (comparing f) oldList

Here is a version that identically mimicks the traditional Perl idiom (except for the extra parentheses):

let comparing f x y = compare (f x) (f y)

let newList = List.map fst (
              List.sort (comparing snd) (
              List.map (fun x -> x, f x)
                oldList))

However, we can take advantage of the fact that tuples are first ordered by their first element, then second, etc., by putting the function results as the first element, and then using the default comparator:

let newList = List.map snd (
              List.sort compare (
              List.map (fun x -> f x, x)
                oldList))

Javascript

[edit | edit source]

In modern versions of Javascript (Firefox 1.5, Internet Explorer 9, Opera 9.6, Chrome, Safari 3, etc.; standardized in ECMAScript edition 5), arrays have a map function in their prototype, allowing one to easily write a Schwartzian transform:

function sortOn(arr, fun) {
    return arr.map(function (x) {
	    return [ x, fun(x) ];
	}).sort(function (a, b) {
            // If fun(a) always returns a number, this can be replaced by: return a[1] - b[1];
	    return a[1] < b[1] ? -1 : a[1] > b[1] ? 1 : 0;
	}).map(function (x) {
	    return x[0];
	});
}

Example usage:

js> print(sortOn([ 'Charlie', 'alice', 'BOB' ], function (s) { return s.toUpperCase(); }));
alice,BOB,Charlie

For older browsers, a compatibility map prototype function can be used.