Saturday, November 22, 2014

Exception handling and cleanup

So there's been a bit of a slog recently on whether or not cleanup actions (e.g. stuff you'd put in the second argument to bracket or finally) should be run in an implicit uninterruptibleMask. I have argued that this is a desirable change. I'm writing this post mostly to lay out the logic I've used to reach (and support) this position, because it's a bit tricky (much like the issue itself).

A bit of background: currently, if an interruptible function is used within an exception handler, asynchronous exceptions may arrive and cause the handler to behave in unexpected ways. In particular, if an async exception arrives during an interruptible period in a cleanup action run by bracket[1], you can get resource leaks. This is noted in the documentation, but it can be surprisingly difficult to get right. Notably, this has resulted in several bugs in GHC itself (details in the thread).

Two arguments have thus far been presented against the proposal, one that the change is undesired and secondly that more caution should be exercised before making such a silent semantic change, especially as it's possible that currently-working (although likely resource-leaking) programs would deadlock. I reject the first argument (for reasons explained below) and have some sympathy for the second, although I believe the benefits outweigh the costs. I'll refer to the argument that this change is undesired as Undesired below.

Basically there are two cases of "complex" cleanup actions that have been under discussion. Here are naive (incorrect) examples of each:

cleanup1 (Structure h1 h2) = hClose h1 >> hClose h2
cleanup2 var = withMVar var >>= hClose

The first, cleanup1, is a handler that has to perform multiple, unrelated actions. The second, cleanup2, is a handler that has to perform a blocking call to access a resource, then close that resource.

Undesired argues that cleanup1 is better written hClose h1 `finally` hClose h2. I agree that is a better solution (although I tend to use nested brackets myself). That structure is nearly fully exception-safe (except hClose itself isn't async-safe), and it handles synchronous exceptions properly. In particular, uninterruptibleMask is neither necessary nor sufficient to write cleanup1 properly[2].

There is no way to safely write cleanup2 without using uninterruptibleMask[3]. It is possible to write a much more sophisticated handler that could eventually time out and possibly take an alternative action, but I don't believe I've ever seen an example of such code. Even then, it still wouldn't close the handle.

Undesired further argues that adding an uninterruptibleMask to bracket will tend to make people think that the naive handler in the first case is safe, even though it isn't, and therefore we shouldn't add it. Users should instead know to use finally appropriately in that case, and uninterruptibleMask appropriately in the second.

I reject this conclusion. I believe the second case is quite common, likely more common than the first. Both Handle and Process use this structure internally (and hence are both vulnerable to async-exception issues that can only be fixed with uninterruptibleMask, which their cleanup functions do not currently employ). I also think it's quite common in client code when there's a shared structure that requires cleanup. Such cleanups can only be handled safely via uninterruptibleMask, and in general use of that function should be discouraged.

If we want to be leak-free (in the second case, and with hClose), we need to call uninterruptibleMask somewhere. Where should we call it? The obvious choices are in the cleanup action (e.g. within hClose), within bracket, or manually by the user. I reject the last as being undesired and error-prone; we certainly don't want the common idiom to be .. `finally` (uninterruptibleMask_ (hClose h)). I believe within bracket is better than within hClose. First of all, we really only expect exception-safety in the context of a function like bracket. Having a bare hClose is already not regarded as a safe idiom, so having it be vulnerable to async exceptions in another manner should not be a problem. Given that, I believe it's better to change the semantics of bracket so that all such handlers will work properly, rather than requiring that Handle, Process, and others be fixed (and countless 3rd-party libraries audited. I have at least two of my own I need to audit.).

Some opponents of the proposal have presented various control-flow constructs built upon bracket that would lead to deadlock under the proposed semantics. If hClose et al used uninterruptibleMask internally, those control flows would not be expressible at all, unless a new function "interruptibleHClose" were added. Similarly for Process and 3rd-party code. If instead bracket is changed, then we'd need to preserve the current semantics as "interruptibleBracket". Anyone relying on this specific control flow would just need to change from bracket to interruptibleBracket to preserve the current semantics, instead of needing to change hClose, process termination functions, dbClose, etc.[4]

Changing bracket means we'll probably fix a bunch of currently-buggy code (anything like cleanup2 that doesn't use uninterruptibleMask, including Handle and Process cleanups), and it will be less work to adapt for anyone relying upon the current semantics.

As I mentioned above, IMHO there is merit to the argument that we should be cautious when making a silent change like this. However, I think the pain is unavoidable. The current situation is not async-safe, and cannot be made async-safe without changing semantics (i.e. adding uninterruptibleMask somewhere). Here are all the alternative possibilities that have thus far been discussed:

  1. Users need to be intimately familiar with exceptions and interruptible operations to write truly exception-safe code, including knowing when to wrap built-in functions in uninterruptibleMask (the status quo).
  2. Add uninterruptibleMask internally to hClose, certain process-related functions, and anywhere else similar "cleanup2". Anyone relying upon the current interruptible behavior must audit their code, and (if available) switch to interruptible variants. 3rd-party code must all be audited. Concerns about unkillable threads/deadlocks apply.[5]
  3. Add uninterruptibleMask internally to bracket, finally, etc. Anyone relying upon the current interruptible behavior must audit their code, and switch to interruptible variants. 3rd-party code will be async-safe, although if it's currently vulnerable to sync exceptions it will remain vulnerable. (the proposal). Concerns about unkillable threads/deadlocks apply.

Of these choices, I consider the first untenable. I simply do not believe we should expect this level of async exception sophistication from most users. I will gladly accept the possibility of unkillable threads in this context to achieve the greater exception safety of 2 or 3, of which I contend 3 is better. It places the least burden of knowledge on the greatest number of users, involves less code-change than the second option, and also leads to the greatest safety by default.

[1] From now on I'll just write "bracket", but will generally mean the whole cleanup-handler family of functions.

[2] Except uninterruptibleMask is the only way to make hClose fully async-exception-safe.

[3] Or its semantic equivalent. It's often possible to change from a blocking operation to looping over a non-blocking operation under mask, but this has the same effect of making the code unresponsive to async operations.

[4] Given the examples that have been presented thus far, I strongly believe that any user relying upon the interruptibility of cleanup operations understands the Haskell exception system intimately and therefore is in the best position to adapt to any semantic changes of exception-handling functions.

[5] I expect the extra risk from changing bracket to be uninterruptible to be minimal over changing hClose etc, as Handles are ubiquitous.

Thursday, October 4, 2012

runtime meta-programming in Haskell

In the spirit of Edward Yang, half-baked ideas lie ahead

There are many reasons programmers may want to use metaprogramming facilities, but one of the most common is for greater efficiency.  The premise is simple: if runtime data becomes available at different times, a metaprogramming environment may allow the programmer to generate specialized versions of code based upon certain data which is known sooner, leading to more efficient computations.

A running example in a great deal of literature is the power function.   Given pow a b = b ^ a, for certain values of a a much more specialized function can be created, e.g.

pow 1 b = b
pow 2 b = b*b
pow 4 b = let sq = b*b in sq*sq
pow n b = b ^ n

but writing out all possible cases like this is far from ideal.  It's tedious and error-prone, and it still won't result in the hoped-for performance gains.  Another problem is that it's generally not possible for the programmer to enumerate every possible case.  A better solution is desired.

Template Haskell is generally disparaged as a possible solution because it's a compile-time facility.  A meta-programming approach to the pow function would be this Template Haskell expression:

mkPow 0 = [| const 1 |]
mkPow n = [| \x -> x * $(unMeta $ mkPow (n-1)) x |]

while this works, Template Haskell splices may only be evaluated at compile-time.  For a wide range of interesting metaprogramming problems, template haskell simply isn't available.  However, by using the GHC API, we can make use of Template Haskell during runtime.  By using a package like plugins or hint, the code is remarkably simple.

I've placed a proof-of-concept implementation on github.  meta-expressions are created in the Meta newtype, which wraps Template Haskell's ExpQ type.  The Meta monad has an extra phantom type parameter to assist with type safety and inference.  Here's a simple example of Meta in use, with mkPow essentially as defined above:

{-# LANGUAGE TemplateHaskell #-}
{-# OPTIONS_GHC -Wall #-}

module Data.Meta.Example where

import Data.Meta.Meta
import Data.Meta.ExampleExpr

main :: IO ()
main = do
  n1 <- readLn
  let expr1 = mkPow n1 :: Meta (Int -> Int)
  n2 <- readLn
  let expr2 = mkPow n2 :: Meta (Int -> Int)
      expr3 = compose expr1 expr2
  result <- metaCompile expr3
  case result of
    Left err -> putStrLn "failed" >> putStrLn err
    Right fn -> do
      j <- readLn
      print $ fn j
      k <- readLn
      print $ fn k

Incidentally, this approach seems remarkably similar to the ideas in Geoffrey Mainland's presentation at ICFP 2012. The big differences are that he defines his own quasi-quoters instead of using Template Haskell, and he also uses a compiler other than GHC. Both of these are necessary if you want to metaprogram in a language other than Haskell, naturally.

Tuesday, June 12, 2012

Language-ObjC released

I have uploaded a new package to hackage, language-objc.  This package provides an AST, parser, and pretty-printer for Objective-C.  It's a fork of the excellent language-c package, so if you're familiar with that you should be able to pick up language-objc pretty quickly.

Source repository is on github.

If you "cabal unpack language-objc; cd language-objc-", there are a number of examples that can get you started.  For the truly impatient:

import Language.ObjC
import qualified Data.ByteString.Char8 as B

-- parsing bytestrings
objcTranslUnit = B.pack "int a=0;"

errOrAST = parseC objcTranslUnit nopos

-- parsing from a file
fileAST sourceFile = do
    result <- parseCFile (newGCC "gcc") Nothing [] sourceFile
    return $ either (error . show) id result
A few things to note:
  1. Input files must be preprocessed.  'parseCFile' will do this for you.  Non-preprocessed source can only be parsed in very limited situations (did you know that 'id' is a typedef?  And 'id' is also commonly used as an identifier to methods/functions, even in system libraries, so it can't be a keyword?).
  2. The objective-C extensions have only been lightly tested, but I think most of the core functionality works.  I am unaware of a full specification for the language, which makes it difficult to say if it's complete.
  3. Most of the code in Language.ObjC.Analysis is not yet compatible with Objective-C extensions.
  4. One difference from language-c is that certain GNU attributes can no longer be parsed due to grammar ambiguities.
  5. Attempting to compile at -O2 requires a lot of memory due to the giant Happy parser that's produced.

I am most grateful to Benedikt Huber, Manuel Chakravarty, Duncan Coutts, Bertram Felgenhauer, and all contributors to language-c, as they're responsible for most of the code.  Bugs are mine.  Comments and patches welcome.

Monday, June 11, 2012

Understandings of Iteratees

Although still relatively new, iteratee/enumerator style I/O has achieved a fairly large userbase in the Haskell community.  Like many young technologies, the design space is fairly broad, with several different packages available from Hackage.  Upon surveying the alternatives, I came to realize that there are so many alternatives in part because different developers have chosen to focus on different properties of iteratees.  Doubtless this is a contributing factor to much of the confusion surrounding iteratees; when a user's mental model is closely aligned to the developer's, the package is likely to be much easier for the user to understand.

This post isn't meant to explain how to use iteratees, rather to explore what I believe are factors that influenced various design decisions in different packages.  In some cases I have first- or second-hand knowledge of those decisions, but sometimes I'm extrapolating from the code.  If you find one or two packages to be sensible and can't understand why anyone would use something different, you may find enlightenment here.  Or not.  For those who've never used an iteratee package, I hope that these posts might help point you to a good starting option.  Wizards probably know this already.  Note that just because I think the designers have a particular viewpoint for their code doesn't mean that other elements won't be present; these are all just different ways of looking at iteratees.  All packages will have these properties to some extent.


Some definitions, to be perfectly clear (the adventurous can skip this)
  • iteratee: some sort of resource consumer.  The defining feature of an iteratee is that it provides a mechanism to pump it with data until it returns a result.  I may use "iteratee" generally, referring to code in an iteratee style.
  • enumerator: some sort of resource provider.  It stuffs data into an iteratee until it gets a result.
  • iteratee style, iteratee/enumerator style: writing code that makes use of iteratees, enumerators, and other associated functions
  • iteratee package: a library to provide functions used to write in iteratee style.
  • underlying monad: for iteratees that are monad transformers, the monad which is being transformed.  Often either IO or a monad stack based on IO.

1: Iteratees as Resource Managers

One of the design goals of Oleg Kiselyov's original iteratee work was improved resource management.  Lazy I/O can make it difficult to reason about resource management (half-closed?), while strict I/O traditionally does not compose well (compare to network programming in C).  With iteratees, these problems are much closer to being solved.  Input streams are opened on demand, evaluated strictly, and closed immediately, with no possibility of the handle or other resource leaking.(1)  And iteratees are quite composable (perhaps too much so!), solving the other issue as well.

These properties make iteratees extremely useful for network programming, as network sockets are a scarce resource.  However, this usage highlights a particular drawback: iteratees solve the resource management problem for input, but don't address it for output.

I believe that for this issue, all developers currently agree the best solution for output resource management is to use a monadic region, such as that provided by regions or ResourceT .  This is currently the only approach guaranteed to run cleanup code with complicated monad stacks in the presence of exceptions.(2)  In some cases Control.Exception.bracket can be used, however it's often not possible to call it at a convenient time, leaving resources open for longer than necessary.

Having established that a monadic region is necessary for output resource management, using the same mechanism for input resource management as well is a reasonable decision.  Developers taking this view may then write their package so that even input streams may cause resource leaks without the resource management layer, unlike the original design.(3)

2: Iteratees as Stream Processors

Iteratee packages with a stream processing bent focus on composibility as a design goal.  The libraries usually provide a multitude of stream processors. They are frequently designed with a UNIX-pipeline-style model in mind; data can be passed through one processor into the next and so on, until finally the last processor in the chain consumes the resulting data.  There may be clear demarcations between produces, consumers, and processors, or perhaps not.  There are frequently combinator functions to make these connections, sometimes with a syntax inspired by unix pipes.  Many of the functions will be similar to traditional Haskell list-processing functions, such as maps, accumulators, and filters.

People with this viewpoint tend to view stream processor composition, which Oleg refers to as an "Enumeratee", as the most important/useful means of composing iteratees.

Although I've presented stream processing and resource management as two different viewpoints, I addressed them together because packages generally can't exclude either element.  One approach is to divide responsibilities, with a dedicated resource manager and combinators that can use those resources.  Other packages may use the iteratee's own handling of input resource management, leaving output resource management to the user's discretion.


For the curious,  I would place conduit and iterIO squarely in the stream processor camp, with enumerator having strong leanings this way as well.  To some extent all iteratee packages provide similar stream-processing capability; I make these judgements based upon what I think the designers think iteratees are for.  If your favorite iteratee package hasn't shown up yet, maybe next time!


(1)  There is at least one way to force a resource to be open longer than intended.  Since iteratees are generally implemented as monad transformers, it is possible to craft an iteratee over ContT IO that generated a lazy stream from an input source, available within another iteratee.  The resource would be closed when the outer iteratee was evaluated, making the outer iteratee a type of monadic region.  I consider this an egregious abuse of Cont, however, and extremely unlikely to happen accidentally.

(2) I'm fairly outspoken in my dislike for exceptions.  They mostly shouldn't be used.  The main purpose of exceptions is to make it easy to reason about how to deal with various conditions, and they suck at this.  There are basically three kinds of exceptions:
  1. Stuff you can fix, like a file not existing.  This should be modeled by Either or Maybe or something similar.  It's much easier to reason about a Maybe than having to figure out where in IO some result will be evaluated.  This is by far the most common reason programmers use exceptions, and (in Haskell at least) it's the worst choice because Haskell provides a better mechanism for dealing with this problem.
  2. Stuff you can't fix, like out of memory conditions.  The only sensible thing to do is terminate the program and let the OS clean it up.  You shouldn't try to catch these, and if you do catch one you shouldn't try to do anything, because that might fail too.  I suppose it's okay to model these with an exception, but maybe they shouldn't be exposed to the program logic at all.
  3. Asynchronous exceptions.  I begrudgingly accept that these may need to be used in concurrent code.

(3)  The basic resource provided by an OS is generally a cursor, e.g. handle.  Oleg Kiselyov argues that cursors and enumerators are isomorphic, although enumerators have certain desirable properties that should make them preferred.  The traditional design of an iteratee package is to provide enumerators that wrap access to a cursor in a closure, preventing the cursor from leaking.  Packages like pipes or conduit that require a monadic region for safe access to input hold the cursor as data and don't provide enumerators-as-closures.  In essence they evolved from cursor to enumerator back to cursor.  Because of this I'm not entirely convinced they should be classed as "iteratee packages" at all, though they certainly occupy the same design space.

Sunday, April 29, 2012

Adding state support to BNFC

The BNF converter is pretty cool tool.  The idea is simple:  provide a grammer written in Labelled BNF, and it spits out a lexer, parser, pretty printer, and abstract syntax tree.  The very nice part is that the generated code can be in several formats: Alex/Happy/Haskell, flex/bison/C/C++, or JLex/CUP/Java.  It's especially convenient when you're developing a language, as you can make changes as you go and automatically keep your AST, parser, and printers in sync.

BNFC takes as input a file written in Labelled BNF, which is essentially a BNF grammar with a label attached to each production rule.  Brackets "[ ]" indicate lists rather than optional items, and there are few extensions from minimal BNF other than pragmas to define lexer tokens and positions.

Writing one grammar file instead of 4 separate modules is very attractive.  Unfortunately BNFC has a serious problem - only context-free grammars are supported.

LBNF is sufficient to represent context-free grammars, and within this domain the system works very well. But despite the claims of the LBNF Report, many interesting languages are not context-free, and they still need to be lexed and parsed.  The most notable example is probably C, in which a string can be lexed as an identifier or a type name, depending on previous typedefs and other things.  I present here a small set of extensions to enable support for passing state between the lexing and parsing stages, enabling a wider variety of parsers to be constructed from LBNF.

A darcs repository of BNFC, with state support as described here, is available from  These features are only supported in the Haskell/Alex backend (Alex-3 is required).

An initial examination

The approach I took is founded on two simple ideas.  First, the lexer needs to access to a state table to determine the correct token, and secondly, the parser needs to update the state table in response to certain rules.

Lexing rules

In LBNF, tokenizing rules are specified like this:
token Ident (alpha [alpha|digit|'_']*) ;
where "Ident" is the type of the token produced, and the remainder is a regular expression to determine valid "Ident" strings.

A stateful token rule is similar.  The implementation supports multiple sets of state, so it's necessary to specify which state set the tokenizer should check.
state token TypedefName typedefNameSet [alpha|digit|'_']+ ;
First the lexer matches the regular expression against the input string.  It then takes the match and checks if it's present in "typedefNameSet".  If so, a token of type "TypedefName" is produced.  Otherwise an "Ident" is produced.(1).

Parsing Rules

Standard production rules are extended with pragmas that specify state update annotations:
-- after this rule, the final Ident will be lexed as a TypedefName
Etypedef.  Declaration ::= "typedef" TypeName Ident <% state typedefNameSet add 3 %> ;
The number "3" specifies that a token string derived from the 3rd production (Ident) is to be added to the set "typedefNameSet".(2)

Similarly, names can be removed from a state set:
-- after this rule, the second token will be lexed as an Ident
UnTydeclarator. Direct_declarator ::= TypeName TypedefName <% state remove 2 %> ;
When removing a token from a set, it's unnecessary to specify the set the token is being removed from.

Extended support

Initial Values

Initial members of the state sets can be specified in the grammar via the pragma TODO.


The implementation maintains a stack of state values in order to support lexical scoping of changes.  Two additional pragmas, "state push" and "state pop", provide access to stack manipulations.  A common usage would be:
PushParen.  PushParen ::= "(" <% state push %>
PopParen.   PopParen  ::= ")" <% state pop %>
It is an error to pop the last frame off the state stack.

The implementation only supports one stack for all state sets.  It is not currently possible to represent multiple state sets with differing scoping rules.

A drawback of the current implementation is that production rules such as PushParen and PopParen are no longer terminals.  Therefore the implementation creates abstract representations for them, although no additional meaning is conveyed than if they were omitted completely.  I hope to address this in the future.

Future Work

Deriving the state string

A major question is, how to determine which token is added to the state set for a given production?  The above cases are simple.  In a production rule like:
Etypedef.  Declaration ::= "typedef" TypeName Ident <% state typedefNameSet add 3 %> ;
"Ident" is a token, so it's simple to derive the string that should be added to the set.  However, it's frequently desirable to modify state based upon a non-terminal rule, such as:
Rule.  SomeProd ::= Declaration <% state stateset add 1 %> ;
How should the token be determined from the Declaration?

The current implementation only supports state adjustments based upon tokens and lists of tokens, e.g. [Ident].  Code generated from rules like the above "Rule" currently results in a compile-time error due to missing instances.  The instances can be manually provided in a separate file, although creating this file can be tedious.  I plan to add an additional pragma to the LBNF grammar that would allow for the necessary instances to be automatically generated.


(1) - The astute reader will notice that the TypedefName regex matches strictly more identifiers than the built-in "Ident" regex.  Therefore this match provides a "backdoor" into Ident, allowing strings that don't match the "Ident" regex to actually be produced as "Ident"s.  I find this behavior useful, and in any case it can be mitigated simply by using the Ident regex for state tokenizing rules.  Another possibility would be to use a user-provided alternative in case the match fails.

(2) - An alternative notation would omit the numbers and place the pragma adjacent to the production.

Sunday, March 25, 2012

Pure delay lines

While working on my dsp-extras project, I needed a pure delay line.  Based on previous work, I used Data.Sequence.Seq, and was somewhat surprised to see a very large performance hit when compared to an undelayed signal.  Obviously there was more to creating pure buffers in Haskell than my previous microbenchmarks revealed.  I resolved to get the performance back.

I created a test script to measure the performance of different delay line implementations in a real-world application.  The application simply reads in a sound file, applies a fractional delay, and writes the output.  I measured the performance with buffer sizes ranging from 1 to 10000 samples.  Due to interpolation (I used cubic interpolation for all my tests), I couldn't actually measure a 1-sample buffer; the program actually creates a 4-sample buffer for the smallest possible delay.  All other buffer sizes are accurate though.

I tested the following buffer types:
  • Data.Sequence.Seq - cons elements onto the front and drop from the tail as each element is pushed
  • Data.Vector.Unboxed.Vector - a naive copying implementation
  • T10 - a 10 element, strict data type (e.g. data T10 a = T10 !a !a ...)
  • T25 - a 25 element, strict data type
  • Evil - based upon Data.Vector.Unboxed.Vector, the Evil buffer thaws the vector, mutates a single element, and returns the updated vector.  This buffer is closest to an imperative circular buffer implementation, although it violates referential transparency.  Hence the Evil moniker.
  • ComboBuffer - the ComboBuffer consists of two unboxed vectors and an update list.  After the update last grows large enough, it's frozen into a vector and the oldest data is dropped.  Writing elements is performed in amortized O(1) time.  The read behavior is more complex.
The results are summarized online here.  A few notes about interpreting the results:
  • Given times are the result of a single run, although they are fairly stable over multiple executions.
  • The chart shows times in a log scale to help differentiate times for small buffers.  Otherwise the O(n) behavior of Vector obliterates these differences.
  • Current is the behavior of an algorithm that adaptively selects a buffer type based upon size.
  • All the results (except one) are for reading off the end of the buffer.  For most buffer types this shouldn't matter. It's optimal for Seq.  For the ComboBuffer, reads from the last half of the buffer can be done in O(1), but reads from the first half are more complex, as a list lookup may be required.  The Current - first quarter data is the result of accessing points 0.25 of the way into the buffer.
So for reads from the end of the buffer, my ComboBuffer is pretty good.  It needs some tuning though, especially for reads from near the front of the buffer.

Faking it with higher rank (Existential Quantification pt. 2)

In my previous post I provided a short background on Existential Quantification.  In this post, I'll provide an alternative for some uses and suggest why it may be preferred.

Suppose we want to create a data store that uses lists for small amounts of data, but otherwise uses vectors.  We can easily use E.Q.

import Data.Vector (Vector)
import qualified Data.Vector as V
class Buffer c where
  type El c :: *
  init :: Int -> El c -> c
  (!)  :: c -> Int -> El c
  push :: c -> El c -> c
instance Buffer [a] where
  type El [a] = a
  init = replicate
  (!)  = (!!)
  push lst el = el : init lst
instance Buffer (Vector a) where
  type El (Vector a) = a
  init = V.replicate
  (!)  = (V.!) 
  push vec el = V.cons el $ V.unsafeInit vec
data EBuffer a = forall c. (Buffer c, El c ~ a) => EBuffer c
initBuffer :: Int -> EBuffer Double
initBuffer sz
  | sz < 10   = EBuffer (init sz 0 :: [Double])
  | otherwise = EBuffer (init sz 0 :: Vector Double)
and now we have our existentially-quantified buffer.

Unfortunately this code doesn't offer anywhere near the hoped-for performance (at least not for the problem that led to this post).  The problem is that now that the actual type of buffer is hidden in the EBuffer data, the compiler is no longer able to inline function calls to (!) or push.  Instead it needs to go through the dictionary methods, adding a lot of extra overhead.  Even worse, a lot of further optimization opportunities are lost.

It turns out that there is another approach that doesn't suffer from this problem.  The approach is based on higher-rank types.  Normal Haskell code uses rank-1 type variables, which means that any variables in a type signature are instantiated by the calling code.  However, with the RankNTypes extension, it's possible to write code of any rank desired.  Consider the function
hr1 :: (forall a. a -> a) -> (Char, Int)
hr1 myId = (myId 'a', myId 0)
The first parameter to hr1 is a rank-2 function.  It needs to be a function that can take an input of any type, and return a value with the same type.  The only total function that can do this is id, and calling hr1 id returns exactly what we would expect.

(What happens if you change the type signature to hr1 :: (a -> a) -> (Char, Int) ? )

Armed with higher-rank types, we can replace the existentially-quantified type in two steps:

1.  Refactor whichever code uses the data so that it's in the form someFunc :: Buffer buf => buf -> AResult.  This part usually isn't too difficult.

2.  Create another function like following:
  :: Int
  -> (forall buf. (Buffer buf, El buf ~ Double) => buf -> b)
  -> b
runWithBuffer sz fn
  | sz < 10   = let buf = init sz 0 :: [Double]
                in fn buf
  | otherwise = let buf = init sz 0 :: Vector Double
                in fn buf

With our new runWithBuffer function, we've successfully decoupled the buffer selection algorithm from the usage site without using existential quantification.  Plus, in many cases GHC generates better code with more opportunities for further optimization.

A slightly more extensive example is available in my dsp-extras codebase.  The class and various buffer types are in the Data.RingBuffer modules.  The buffers are used in Language.AuSL.CompileArr, with the function optimizeBufType performing the duties of runWithBuffer.