In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. It addresses the problem of turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding, we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.
Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.
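To make that concrete, here is a minimal sketch of the deletion-by-marking idea; the code is my illustration, not Okasaki’s, and the balancing strategy is an arbitrary choice:

data Tree a = Leaf | Node Bool (Tree a) a (Tree a)  -- Bool: is this node deleted?

delete :: Ord a => a -> Tree a -> Tree a
delete _ Leaf = Leaf
delete x (Node d l y r)
  | x < y     = Node d (delete x l) y r
  | x > y     = Node d l y (delete x r)
  | otherwise = Node True l y r               -- weak update: just mark the node

rebuild :: Tree a -> Tree a                   -- the expensive batched step
rebuild = fromSortedList . live
  where
    live Leaf = []
    live (Node d l y r) = live l ++ (if d then [] else [y]) ++ live r
    fromSortedList [] = Leaf
    fromSortedList xs = Node False (fromSortedList l) y (fromSortedList r)
      where (l, y:r) = splitAt (length xs `div` 2) xs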
The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.
More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.
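As a rough sketch of the moving parts just described (the names and representation here are mine, chosen purely for illustration):

-- Working copy, optional in-progress shadow copy, and the queue of updates
-- performed since the rebuild began. (Sketch only.)
data GlobalRebuild d u = GlobalRebuild
  { working :: d         -- services queries and updates as usual
  , shadow  :: Maybe d   -- being rebuilt incrementally, when present
  , pending :: [u]       -- updates to replay, incrementally, on the shadow copy
  }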
Okasaki mentions in passing that global rebuilding “can be usefully viewed as running the rebuilding transformation as a coroutine”. The situation described above is also quite reminiscent of garbage collection. There, the classic half-space stop-the-world copying collector corresponds naturally to the batched rebuilding version. More incremental collectors often have read or write barriers and break the garbage collection into incremental steps. Garbage collection is also often viewed as two processes coroutining.
The goal of this article is to derive global rebuilding-based data structures from
an expression of them as two coroutining processes. Ideally, we should be able to
take a data structure implemented via batched rebuilding and simply run the batch
rebuilding step as a coroutine. Modifying the data structure’s operations and the
rebuilding step should, in theory, just be a matter of inserting appropriate yield
statements. Of course, it won’t be that easy since the batched version of rebuilding
doesn’t need to worry about concurrent updates to the original data structure.
In theory, such a representation would be a perfectly effective way of articulating the global rebuilding version of the data structure. That said, I will be using the standard power move of CPS transforming and defunctionalizing to get a more data structure-like result.
I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with continuations. In that context, a “process” written in continuation-passing style “yields” to the scheduler by passing its continuation to a scheduling function. Normally, the scheduler would place that continuation at the end of a work queue, then pick up a continuation from the front of the work queue and invoke it, resuming the previously suspended “process”. In our case, we only have two “processes”, so our “work queue” can just be a single mutable cell. When one “process” yields, it swaps its own continuation into the cell, swapping the other “process’s” continuation out, and invokes the continuation it read.
Since the rebuilding process is always driven by the main process, the pattern is a bit more like generators. This has the benefit that only the rebuilding process needs to be written in continuation-passing style. The following is a very quick and dirty set of functions for this.
module Coroutine ( YieldFn, spawn ) where

import Control.Monad ( join )
import Data.IORef ( IORef, newIORef, readIORef, writeIORef )

type YieldFn = IO () -> IO ()

-- The IORef holds the suspended continuation of the other "process".
yield :: IORef (IO ()) -> IO () -> IO ()
yield = writeIORef

resume :: IORef (IO ()) -> IO ()
resume = join . readIORef

terminate :: IORef (IO ()) -> IO ()
terminate yieldRef = writeIORef yieldRef (ioError $ userError "Subprocess completed")

spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn process = do
  yieldRef <- newIORef undefined
  -- Store the entire subprocess as the initial continuation. Its final
  -- continuation, if ever invoked, makes any further resume an error.
  writeIORef yieldRef $ process (yield yieldRef) (terminate yieldRef)
  return (resume yieldRef)
A simple example of usage is:
process :: YieldFn -> Int -> IO () -> IO ()
process _ 0 k = k
process yield i k = do
  putStrLn $ "Subprocess: " ++ show i
  yield $ process yield (i-1) k

example :: IO ()
example = do
  resume <- spawn $ \yield -> process yield 10
  forM_ [(1 :: Int) .. 10] $ \i -> do
    putStrLn $ "Main process: " ++ show i
    resume
  putStrLn "Main process done"
with output:
Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done
I’ll use queues since they are very simple, and Purely Functional Data Structures describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global rebuilding. We’ll end up with something quite similar, and it could be made more similar still by changing the rebuilding code. Indeed, the differences are just an artifact of specific, easily changed details of the rebuilding coroutine, as we’ll see.
The examples I’ll present are mostly imperative, not purely functional. There
are two reasons for this. First, I’m not focused on purely functional data structures
and the technique works fine for imperative data structures. Second, it is arguably
more natural to talk about coroutines in an imperative context. In this case,
it’s easy to adapt the code to a purely functional version since it’s not much
more than a purely functional data structure stuck in an IORef.
For a more imperative structure with mutable linked structure and/or in-place array updates, it would be more challenging to produce a purely functional version. The techniques here could still be used, though there are more “concurrency” concerns. While I don’t include the code here, I did a similar exercise for a random-access stack (a fancy way of saying a growable array). There the “concurrency” concern is that the elements you are copying to the new array may be popped and potentially overwritten before you switch to the new array. In this case, it’s easy to solve, since if the head pointer of the live version reaches the source offset for copy, you can just switch to the new array immediately.
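To give the flavor of that exercise, here is a rough sketch of just the copying step for such a growable-array stack. This is my reconstruction of the idea described above, not the omitted code, and it only handles the copy-and-switch logic, not pushes during a rebuild:

import Data.Array.IO ( IOArray, readArray, writeArray )
import Data.IORef ( IORef, readIORef, writeIORef )

data Stack a = Stack
  { sizeRef :: IORef Int                            -- number of live elements
  , arrRef  :: IORef (IOArray Int a)                -- working copy
  , copyRef :: IORef (Maybe (IOArray Int a, Int))   -- (shadow copy, copy cursor)
  }

-- One incremental step of the rebuild, to be run during each operation:
-- copy one element from the old array to the new one, switching to the new
-- array as soon as the live head pointer reaches the copy cursor.
step :: Stack a -> IO ()
step s = do
  st <- readIORef (copyRef s)
  case st of
    Nothing -> return ()
    Just (new, i) -> do
      n <- readIORef (sizeRef s)
      if i >= n then do              -- everything live is already copied
          writeIORef (arrRef s) new
          writeIORef (copyRef s) Nothing
        else do
          old <- readIORef (arrRef s)
          readArray old i >>= writeArray new i
          writeIORef (copyRef s) (Just (new, i + 1))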
Nevertheless, I can easily imagine scenarios where it may be beneficial, if
not necessary, for the coroutines to communicate more and/or for there to be
multiple “rebuild” processes. The approach used here could be easily adapted
to that. It’s also worth mentioning that even in simpler cases, non-constant-time
operations will either need to invoke resume
multiple times or need more
coordination with the “rebuild” process to know when it can do more than a
constant amount of work. This could be accomplished by the “rebuild” process
simply recognizing this from the data structure state, or some state could
be explicitly set to indicate it, or the techniques described earlier
could be used, e.g. a different process for non-constant-time operations.
The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.
We start with the straightforward, amortized constant-time queues where we push to a stack representing the back of the queue and pop from a stack representing the front. When the front stack is empty, we need to expensively reverse the back stack to make a new front stack.
I intentionally separate out the reverse step as an explicit rebuild function.
module BatchedRebuildingQueue ( Queue, new, enqueue, dequeue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )

data Queue a = Queue {
    queueRef :: IORef ([a], [a])
  }

new :: IO (Queue a)
new = do
  queueRef <- newIORef ([], [])
  return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q@(Queue { .. }) = do
  (front, back) <- readIORef queueRef
  case front of
    (x:front') -> do
      writeIORef queueRef (front', back)
      return (Just x)
    [] -> case back of
      [] -> return Nothing
      _ -> rebuild q >> dequeue q

enqueue :: a -> Queue a -> IO ()
enqueue x (Queue { .. }) =
  modifyIORef queueRef (\(front, back) -> (front, x:back))

rebuild :: Queue a -> IO ()
rebuild (Queue { .. }) =
  modifyIORef queueRef (\([], back) -> (reverse back, []))
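A quick check of the interface (an illustrative snippet, not part of the original module):

demo :: IO ()
demo = do
  q <- new
  mapM_ (`enqueue` q) [1..3 :: Int]
  print =<< dequeue q   -- Just 1 (this dequeue triggers the batched rebuild)
  print =<< dequeue q   -- Just 2
  print =<< dequeue q   -- Just 3
  print =<< dequeue q   -- Nothing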
This step is where a modicum of thought is needed. We need to make the rebuild step from the batched version incremental. This is straightforward, if tedious, given the coroutine infrastructure. In this case, we incrementalize the reverse by reimplementing reverse in CPS with some yield calls inserted. Then we need to incrementalize append. Since we’re not waiting until front is empty, we’re actually computing front ++ reverse back. Incrementalizing append is hard, so we actually reverse front and then use an incremental reverseAppend (which is basically what the incremental reverse does anyway¹).
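To see the shape of the incrementalized reverse in isolation, here is a standalone toy version (my illustration, not the article’s code); each step does at most two conses of work before suspending, which is the same pacing the coroutine version below uses:

-- A result is either finished or a suspended rest-of-computation.
data Inc r = Done r | Suspend (Inc r)

incReverse :: [a] -> [a] -> Inc [a]
incReverse []       acc = Done acc
incReverse [x]      acc = Done (x:acc)
incReverse (x:y:xs) acc = Suspend (incReverse xs (y:x:acc))  -- two elements, then yield

-- Running it to completion recovers ordinary reverse:
run :: Inc r -> r
run (Done r)    = r
run (Suspend k) = run k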
The first thing to note about this code is that the actual operations are largely unchanged other than inserting calls to resume. In fact, dequeue is even simpler than in the batched version, as we can just assume that front is always populated when the queue is not empty. dequeue is freed from the responsibility of deciding when to trigger a rebuild. Most of the bulk of this code comes from reimplementing a reverseAppend function (twice).

The parts of this code that require some deeper thought are 1) knowing when a rebuild should begin, 2) knowing how “fast” the incremental operations should go² (e.g. incrementalReverse does two steps at a time and the Hood-Melville implementation has an explicit exec2 that does two steps at a time), and 3) dealing with “concurrent” changes.
For the last, Overmars describes a queue of deferred operations to perform
on the shadow copy once it finishes rebuilding. This kind of suggests a
situation where the “rebuild” process can reference some “snapshot” of
the data structure. In our case, that is the situation we’re in, since
our data structures are essentially immutable data structures in an IORef.
However, it can easily not be the case, e.g. the random-access stack.
Also, this operation queue approach can easily be inefficient and inelegant.
None of the implementations below will have this queue of deferred operations.
It is easier, more efficient, and more elegant to just not copy over parts of
the queue that have been dequeued, rather than have an extra phase of the
rebuilding that just pops off the elements of the front
stack that we just
pushed. A similar situation happens for the random-access stack.
The use of drop could probably be easily eliminated. (I’m not even sure it’s still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.
module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )

data Queue a = Queue {
    resume :: IO (),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
  }

new :: IO (Queue a)
new = do
  frontRef <- newIORef []
  backRef <- newIORef []
  frontCountRef <- newIORef 0
  backCountRef <- newIORef 0
  -- The rebuild process never terminates, so it ignores the terminate action.
  resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
  return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
  resume q
  front <- readIORef (frontRef q)
  case front of
    [] -> return Nothing
    (x:front') -> do
      modifyIORef' (frontCountRef q) pred
      writeIORef (frontRef q) front'
      return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
  modifyIORef (backRef q) (x:)
  modifyIORef' (backCountRef q) succ
  resume q

rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k where
  -- k restarts the loop, so the rebuild process runs forever, checking
  -- whether a rebuild is warranted each time it is resumed.
  go k = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse back [] $ \rback ->
          incrementalReverse front [] $ \rfront ->
            incrementalRevAppend rfront rback 0 backCount k
      else do
        yield k

  incrementalReverse [] acc k = k acc
  incrementalReverse [x] acc k = k (x:acc)
  incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k

  incrementalRevAppend [] front !movedCount backCount' k = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    yield k
  incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- Dequeues have consumed everything still left in rfront, so we can
        -- finish immediately, dropping the already-dequeued elements of acc.
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        yield k
      else if null rfront then
        incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
      else
        yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k
This step is completely mechanical.
There’s arguably no reason to defunctionalize. It produces a result that is more data-structure-like, but, unless you need the code to work in a first-order language, there’s nothing really gained by doing this. It does lead to a result that is more directly comparable to other implementations.
For some data structures, having the continuation be analyzable would provide a simple means for the coroutines to communicate. The main process could directly look at the continuation to determine its state, e.g. whether a rebuild is in progress at all. The main process could also directly manipulate the stored continuation to change the “rebuild” process’s behavior. That said, doing this would mean that we’re not deriving the implementation. Still, the opportunity for additional optimizations and simplifications is nice.
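For readers unfamiliar with the move, here is defunctionalization on a micro-example (a generic illustration of mine, unrelated to the queue): each function that can flow into the higher-order position becomes a constructor, and application becomes a first-order apply function.

-- Higher-order: apply a pipeline of Int -> Int functions.
applyAll :: [Int -> Int] -> Int -> Int
applyAll fs x = foldr ($) x fs

-- Defunctionalized: the function space is replaced by a data type plus apply.
data Fn = Add Int | Mul Int

apply :: Fn -> Int -> Int
apply (Add n) x = x + n
apply (Mul n) x = x * n

applyAll' :: [Fn] -> Int -> Int
applyAll' fs x = foldr apply x fs

-- applyAll' [Add 1, Mul 2] 3 == applyAll [(+ 1), (* 2)] 3 == 7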
As a minor aside, while it is, of course, obvious from looking at the previous version of the code, it’s neat how the Kont data type implies that the call stack is bounded and that most calls are tail calls. REVERSE_STEP is the only constructor that contains a Kont argument, but its type means that that argument can’t itself be a REVERSE_STEP. Again, I just find it neat how defunctionalization makes this concrete and explicit.
module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where

import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
  incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
  incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
  incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
  frontCount <- readIORef frontCountRef
  backCount <- readIORef backCountRef
  if backCount > frontCount then do
      back <- readIORef backRef
      front <- readIORef frontRef
      writeIORef backRef []
      writeIORef backCountRef 0
      incrementalReverse q back [] $ REVERSE_FRONT front backCount
    else do
      writeIORef resumeRef IDLE

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
  writeIORef frontRef front
  writeIORef frontCountRef $! movedCount + backCount'
  writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
  currentFrontCount <- readIORef frontCountRef
  if currentFrontCount <= movedCount then do
      -- This drop count should be bounded by a constant.
      writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
      writeIORef frontCountRef $! currentFrontCount + backCount'
      writeIORef resumeRef IDLE
    else if null rfront then
      incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
    else
      writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'

resume :: Queue a -> IO ()
resume q = do
  kont <- readIORef (resumeRef q)
  applyKont q kont ()

data Queue a = Queue {
    resumeRef :: IORef (Kont a ()),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
  }

new :: IO (Queue a)
new = do
  frontRef <- newIORef []
  backRef <- newIORef []
  frontCountRef <- newIORef 0
  backCountRef <- newIORef 0
  resumeRef <- newIORef IDLE
  return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
  resume q
  front <- readIORef (frontRef q)
  case front of
    [] -> return Nothing
    (x:front') -> do
      modifyIORef' (frontCountRef q) pred
      writeIORef (frontRef q) front'
      return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
  modifyIORef (backRef q) (x:)
  modifyIORef' (backCountRef q) succ
  resume q
This is just a straightforward reorganization of the previous code into purely functional code. This produces a persistent queue with worst-case constant time operations.
It is, of course, far uglier and more ad-hoc than Okasaki’s extremely elegant real-time queues, but the methodology used to derive it was simple-minded. The result is also quite similar to the Hood-Melville Queues even though I did not set out to achieve that. That said, I’m pretty confident you could derive pretty much exactly the Hood-Melville queues with just minor modifications to the Global Rebuilding Implementation.
module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
  incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
  incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
  incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
  if backCount > frontCount then
    let q' = q { back = [], backCount = 0 } in
      incrementalReverse q' back [] $ REVERSE_FRONT front backCount
  else
    q { resumeKont = IDLE }

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
  q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
  if frontCount q <= movedCount then
    -- This drop count should be bounded by a constant.
    let !front = drop (movedCount - frontCount q) acc in
      q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
  else if null rfront then
    incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
  else
    q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }

resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()

data Queue a = Queue {
    resumeKont :: !(Kont a ()),
    front :: [a],
    back :: [a],
    frontCount :: !Int,
    backCount :: !Int
  }

empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q =
  case front of
    [] -> (Nothing, q)
    (x:front') ->
      (Just x, q' { front = front', frontCount = frontCount - 1 })
  where q'@(Queue { .. }) = resume q

enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })
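A quick sanity check (illustrative, not part of the module): draining a queue built by successive enqueues yields the elements in FIFO order.

drain :: Queue a -> [a]
drain q = case dequeue q of
  (Nothing, _) -> []
  (Just x, q') -> x : drain q'

-- drain (foldl (flip enqueue) empty [1..10 :: Int]) == [1..10]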
This is just the Haskell code from Purely Functional Data Structures adapted to the interface of the other examples.
This code is mostly for comparison. The biggest difference, other than some code structuring differences, is that the front and back lists are reversed in parallel, while my code does them sequentially. As mentioned before, getting a structure like that would simply be a matter of defining a parallel incremental reverse back in the Global Rebuilding Implementation.
Again, Okasaki’s real-time queue, which can be seen as an application of the lazy rebuilding and scheduling techniques described in his thesis and book, is a better implementation than this in pretty much every way.
module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
  case exec (exec state) of
    Done newf -> Queue lenf newf Idle lenr r
    newstate -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
  if lenr <= lenf then exec2 lenf f state lenr r
  else let newstate = Reversing 0 f [] r []
       in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
  let !q' = check (lenf-1) f' (invalidate state) lenr r in
    (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)
Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is in the first clause of exec, and there we discard its elements. In other words, we only care about the length of s. s gets “decremented” each time we enqueue or dequeue until it’s empty, at which point we rotate r to f in the second clause of exec. The key thing is that f and s are initialized to the same value in that clause. That means each time we “decrement” s we are also forcing a bit of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an incremental version of xs ++ reverse ys ++ a (where we use the invariant length ys = 1 + length xs for the base case).
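To see the invariant at work, here is a small worked unfolding (my example, with the back list r = [3,2] stored newest first):

rotate [1] [3,2] []  ==  1 : rotate [] [2] [3]
rotate []  [2]  [3]  ==  2 : [3]
-- so rotate [1] [3,2] [] evaluates to [1,2,3] == [1] ++ reverse [3,2] ++ [],
-- and each forced cons performs only one step of the rotation.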
Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically force the thunks incrementally rather than all at once. Here s is a schedule telling us when to force parts of f. (As mentioned, s also serves as a counter telling us when to perform a rebuild.)
module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r [] = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s
It’s instructive to compare the above to the following implementation, which doesn’t use a schedule. This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple dequeues via rotate. The following implementation performs extremely well in my benchmark, but the operations are subtly not constant-time. Specifically, after a long series of enqueues, a dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f will be a nested series of rotate calls, one for every doubling of the length of the queue. Even if we change let f' to let !f', that will only make the first dequeue cheap. The second will still be expensive.
module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1)
I won’t reproduce the evaluation code as it’s not very sophisticated or interesting. It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce an enqueue over a dequeue so that the queues would grow. It measured the average time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.
The main thing I wanted to see was relatively stable average enqueue and dequeue times with only the batched implementation having a growing maximum dequeue time. This is indeed what I saw, though it took about 1,000,000 operations (or really a queue of a couple hundred thousand elements) for the numbers to stabilize.
The results were mostly unsurprising. In overall time, the batched implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s a good chance my measurement of its average enqueue time was largely a measurement of the timer’s resolution.) The operations’ average times were stable, illustrating their constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue time versus the average stabilized around 7000 to 1, except, of course, for the batched version, whose ratio grew linearly to millions to 1 at queue sizes of tens of millions of elements. This illustrates the worst-case time complexity of all the other implementations, and the merely amortized time complexity of the batched one.
While the batched version was best in overall time, the difference wasn’t that great; the worst implementations were still less than 1.4x slower. All the worst-case optimal implementations performed roughly the same, but there were still some clear winners and losers. Okasaki’s real-time queue is almost on par with the batched implementation in overall time and handily beats the other implementations in average enqueue and dequeue times. The main surprise for me was that the loser was the Hood-Melville queue. My guess is that this is due to invalidate, which seems like it would do more work and produce more garbage than the approach taken in my functional version.
The point of this article was to illustrate the process of deriving a deamortized data structure from an amortized one utilizing batched rebuilding by explicitly modeling global rebuilding as a coroutine.
The point wasn’t to produce the fastest queue implementation, though I am pretty happy with the results. While this is an extremely simple example, it was still nice that each step was very easy and natural. It’s especially nice that this derivation approach produced a better result than the Hood-Melville queue.
Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue with worst-case constant-time operations.
¹ This code could definitely be refactored to leverage this similarity to reduce code. Alternatively, one could refunctionalize the Hood-Melville implementation at the end.↩︎
² Going “too fast”, so long as it’s still a constant amount of work for each step, isn’t really an issue asymptotically, so you can just crank the knobs if you don’t want to think too hard about it. That said, going faster than you need to will likely give you worse worst-case constant factors. In some cases, going faster than necessary could reduce constant factors, e.g. by better utilizing caches and disk I/O buffers.↩︎