延遲排序

wiki

在快速排序內，有一棵二元樹等著被發現。

基本的快速排序演算法看起來像這樣

To quicksort(Set)
  If Set is empty, then return an empty vector
  Otherwise
    Select a value in Set, Pivot
    Make Left from all values in Set less than Pivot
    Make Right from all values in Set greater than Pivot
    Return the concatenation of Quicksort(Left), Pivot, Quicksort(Right)

如果在最後一個步驟中的串接替換為建立一個二元樹節點，那麼這棵二元樹就會變得非常明顯。然後可以透過壓平二元樹來建立最終的向量。

類似地，我們可以透過先建立一個從 1 到 n 的整數向量，然後將向量的元素相乘來計算階乘 (factorial)(n)。

在這兩種情況下，我們都有某種「虛擬」的中間結構，實際上我們會將它納入程式流程中以將其消滅。[1] 然而，有時揭露中間結構可能會有用。

例如，假設我們有一個非常大的家庭收入向量，比如一百萬個家庭。我們想要計算一些衡量所得不平等的指標；常用的指標是找出最高五分之一（也就是收入最高的 20% 人口）的平均收入與最低五分之一（也就是收入最低的 20% 人口）的平均收入之比。

要計算這個數值，我們需要對收入向量進行半排序。理想情況下，我們只想將其排序到足以讓我們萃取（無序）第一和最後五分之一。一種達成這項目的通用方法是執行延遲的快速排序演算法；首先，當我們到達對應於 20% 點的節點時停止；接著我們重新開始，在到達 80% 點時停止。為了（通常）做到這一點，我們需要讓至少部分隱含二元樹保持存在。

一種方法是變更快速排序，以便將遞迴替換為「承諾」；亦即，我們不實際進行遞迴，而是建立一個函式封閉，而它在被呼叫時將執行下一個遞迴步驟。

Lua 發行版包含 sort.lua，其中有一個快速排序的簡易實作；略作簡化後，核心如下

function qsort(vec, low, high)
  if low < high then
    local middle = partition(vec, low, high)
    qsort(vec, low, middle-1)
    qsort(vec, middle+1, high)
  end
end

我將 partition 函式分開，並將變數改為更有意義的名稱，以便讓演算法更清晰。

延遲版本看起來像這樣

-- Ensure that that the element at i is in the right position,
-- and return a closure which can be used for continuing the sort.
function quicksorter(i, vec, low, high)
  if low >= high then
    return quicksorter
  else -- low < high
    -- partition the vector and initialize the child closures
    local middle = partition(vec, low, high)
    local left, right = quicksorter

    -- Create the promise
    local function self(i, vec, low, high)
      if i < middle then
        left = left(i, vec, low, middle-1)
        return self
      elseif i > middle then
        right = right(i, vec, middle+1, high)
        return self
      end
    end
    
    -- Force the promise until i is in the right position
    return self(i, vec, low, high)
  end
end

function lazysort(vec, low, high)
  local sorter = quicksorter
  return function(i)
    sorter = sorter(i, vec, low, high)
    return vec[i]
  end
end

我並沒有實際測試上述程式碼，但我確實寫了一個延遲排序的真實版本；你可以從 [這裡] 下載。真實版本包含幾個效能：首先，它對小型範圍使用插入排序；其次，它合併二元樹以使其始終完整。

範例程式碼包含一個測試套件和基準，以及一些註解說明部分，可透過使用具有哨兵索引值遞迴封閉來檢視「虛擬」二元樹。我用它來製作以下說明，說明它如何運作

首先，讓我們測試它

function rangemean(v, low, high)
  local sum = v[low]
  for i = low+1, high do sum = sum + v[i] end
  return sum / (high - low + 1)
end

function disparity(v)
  local getter = lazysort(v)
  local bottom = math.ceil(#v / 5)
  getter(bottom)
  local top = math.floor(#v * 4 / 5)
  getter(top)
  return rangemean(v, top, #v) / rangemean(v, 1, bottom)
end

> math.randomseed(31415926)
> test = {}; for i = 1, 1e6 do test[i] = math.random(1e4) end
> now = os.clock(); print(disparity(test)); print(os.clock() - now)
8.9768793870336
0.6796875
>
> -- By comparison:
>
> math.randomseed(31415926)
> test = {}; for i = 1, 1e6 do test[i] = math.random(1e4) end
> now = os.clock(); table.sort(test); print (rangemean(test,8e5,1e6)/rangemean(test,1,2e5)); print(os.clock() - now)
8.9768793870336
1.8828125

所以，即使它是用 Lua 編寫，而 table.sort 是用 C 編寫，對於這個運算來說，它快了三倍。換句話說，它不是玩具。

它是這樣運作的。我們取一個較小的範例

> test = {}; for i = 1, 500 do test[i] = math.random(1e4) end
> getter = lazysort(test)
> =getter(100)
1954
>
> -- By uncommenting the viewing code, we can look at the tree structure:
> getter""
Root
+-Sorted [1, 1)
+-Node [1, 501)
| +-Node [1, 249)
| | +-Node [1, 169)
| | | +-Leaf [1, 47) Unsorted
| | | +-Sorted [47, 48)
| | | +-Node [48, 169)
| | |   +-Leaf [48, 83) Unsorted
| | |   +-Sorted [83, 103)
| | |   +-Leaf [103, 169) Unsorted
| | +-Sorted [169, 170)
| | +-Leaf [170, 249) Unsorted
| +-Sorted [249, 250)
| +-Leaf [250, 501) Unsorted
+-Sorted [501, 500)
> 
> -- "Partitioned" would be a better word than "Unsorted". 
> -- Now, let's ask for another element, some distance away
>
> =getter(400)
7877
> getter""
Root
+-Sorted [1, 1)
+-Node [1, 501)
| +-Node [1, 249)
| | +-Node [1, 169)
| | | +-Leaf [1, 47) Unsorted
| | | +-Sorted [47, 48)
| | | +-Node [48, 169)
| | |   +-Leaf [48, 83) Unsorted
| | |   +-Sorted [83, 103)
| | |   +-Leaf [103, 169) Unsorted
| | +-Sorted [169, 170)
| | +-Leaf [170, 249) Unsorted
| +-Sorted [249, 250)
| +-Node [250, 501)
|   +-Leaf [250, 382) Unsorted
|   +-Sorted [382, 383)
|   +-Node [383, 501)
|     +-Node [383, 441)
|     | +-Leaf [383, 391) Unsorted
|     | +-Sorted [391, 408)
|     | +-Leaf [408, 441) Unsorted
|     +-Sorted [441, 442)
|     +-Leaf [442, 501) Unsorted
+-Sorted [501, 500)
>
> -- Leaf [250, 501) has now been expanded down to element 400.
>
> -- Now, let's watch as it merges tree nodes.
>
> =getter(387)
7546
> getter""
Root
+-Sorted [1, 1)
+-Node [1, 501)
| +-Node [1, 249)
| | +-Node [1, 169)
| | | +-Leaf [1, 47) Unsorted
| | | +-Sorted [47, 48)
| | | +-Node [48, 169)
| | |   +-Leaf [48, 83) Unsorted
| | |   +-Sorted [83, 103)
| | |   +-Leaf [103, 169) Unsorted
| | +-Sorted [169, 170)
| | +-Leaf [170, 249) Unsorted
| +-Sorted [249, 250)
| +-Node [250, 501)
|   +-Leaf [250, 382) Unsorted
|   +-Sorted [382, 408)
|   +-Node [408, 501)
|     +-Leaf [408, 441) Unsorted
|     +-Sorted [441, 442)
|     +-Leaf [442, 501) Unsorted
+-Sorted [501, 500)
>
> Leaf [383, 391] has been fully sorted, so it can be merged into it's parent, Node [383, 441).
> That in turn creates another merge, leaving the tree as shown above.

[1]若要認識略為學術化（但仍可理解）的討論，請閱讀 Lex Augusteijn 關於[排序態射。]寫的有趣論文。

最近變更 · 環境設定
編輯 · 歷程
最後編輯於 2008 年 5 月 12 日下午 12:13 GMT (差異)