Sunday, January 31, 2016

Haskell Tower of Hanoi with 4 Pegs

For some continued Haskell practice, I've started making my way through the UPenn CIS 194 online Haskell course materials. If you are familiar with basic Haskell, you can skip the lecture notes for the most part and just work through the problem sets. This has already paid me an interesting dividend.

One of the problems asks us to work through the classic Tower of Hanoi problem with 3 pegs and then with 4 pegs. I was very pleased with myself for coming up with the Frame-Stewart algorithm on my own, but I failed to think of the way to determine the optimal k value. Instead, I first naively tried k = n - 2 and k = n `div` 2. These did not achieve the number of steps (129) that the assignment page said was possible. So after a lot of toiling, I started to look up what the solution might be.

I came across a blog post that is actually cited in the Wikipedia section on the Frame-Stewart algorithm. It's a cryptic post that doesn't offer any justification for the proposed optimal k value. I took that k value and placed it into my 4-peg Hanoi implementation below. This code is available on GitHub.

type Peg = String
type Move = (Peg, Peg)
hanoi :: Integer -> Peg -> Peg -> Peg -> [Move]
hanoi 0 _ _ _ = []
hanoi 1 a b c = [(a, b)]
hanoi n a b c = moveTopToC ++ moveBottom ++ moveTopToB
    where moveTopToC = hanoi (n-1) a c b
          moveBottom = hanoi 1 a b c
          moveTopToB = hanoi (n-1) c b a
hanoi4 :: Integer -> Peg -> Peg -> Peg -> Peg -> [Move]
hanoi4 n a b c d
    | n <= 2    = hanoi n a b c
    | otherwise = moveTopToD ++ moveBottom2 ++ moveTopToB
        where moveTopToD  = hanoi4 (n-2) a d b c
              moveBottom2 = hanoi4 2 a b c d
              moveTopToB  = hanoi4 (n-2) d b a c
hanoi4' :: Integer -> Peg -> Peg -> Peg -> Peg -> [Move]
hanoi4' n a b c d
    | n <= 2    = hanoi n a b c
    | otherwise = moveTopKToD ++ moveBottom ++ moveTopKToB
        where k           = n `div` 2
              moveTopKToD = hanoi4' k a d b c
              moveBottom  = hanoi  (n - k) a b c
              moveTopKToB = hanoi4' k d b a c
-- "optimal" choice of k
hanoi4'' :: Integer -> Peg -> Peg -> Peg -> Peg -> [Move]
hanoi4'' n a b c d
    | n <= 2    = hanoi n a b c
    | otherwise = moveTopKToD ++ moveBottom ++ moveTopKToB
        where k           = ((round . sqrt . fromIntegral) (2*n + 1)) - 1
              moveTopKToD = hanoi4'' k a d b c
              moveBottom  = hanoi (n - k) a b c
              moveTopKToB = hanoi4'' k d b a c

(Note that in this formulation, the goal is to move the disks from peg "a" to peg "b" with pegs "c" and "d" as the extra pegs available as storage space. This is just a different labeling than the classical formulation, in which the goal is to move from peg "a" to peg "c" with "b" as the storage space.)
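
To make the labeling concrete on a tiny case, here is the 3-peg solver restated in condensed form (behaviorally identical to the version above), together with a two-disk trace:

```haskell
type Peg = String
type Move = (Peg, Peg)

-- Condensed restatement of the 3-peg solver: move n disks from peg a to
-- peg b, using peg c as storage.
hanoi :: Integer -> Peg -> Peg -> Peg -> [Move]
hanoi 0 _ _ _ = []
hanoi n a b c = hanoi (n - 1) a c b ++ [(a, b)] ++ hanoi (n - 1) c b a

-- hanoi 2 "a" "b" "c" == [("a","c"),("a","b"),("c","b")]
-- The small disk parks on "c", the big disk moves to "b", and the small
-- disk follows. Three disks take 7 moves; n disks take 2^n - 1.
```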

Imagine my surprise when this actually resulted in more steps than either of my naive approaches:

ely@eschaton:~/programming/learnmeahaskell/penn_cis194/assignments/a1$ ghci
GHCi, version 7.6.3:  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Assignment1.hs
[1 of 1] Compiling Assignment1      ( Assignment1.hs, interpreted )
Ok, modules loaded: Assignment1.
*Assignment1> length $ hanoi4 15 "a" "b" "c" "d"
509
*Assignment1> length $ hanoi4' 15 "a" "b" "c" "d"
305
*Assignment1> length $ hanoi4'' 15 "a" "b" "c" "d"
1049

I started to think about it more. For this way of calculating k, it will be far smaller than the value n. In fact, k should be somewhere on the order of sqrt(n). So there will be n - k disks left over, which will be the majority of them. For example, if there were 25 disks, k would be somewhere around 5 and the leftovers would be about 20.

This immediately doesn't make sense. You don't want to perform the slower recursive call to the 3-peg case on the larger subset of disks. Clearly even a 50/50 split of the disks would be better than that. So my hunch was that you should calculate k' as it is described above, but then you should calculate the true k as k = n - k' -- so that the larger subset of disks is given the more efficient 4-peg treatment and only a much smaller set is moved with the 3-peg method.
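
One way to sanity-check that hunch is to brute-force the Frame-Stewart recurrence directly: for each possible split k, the top k disks cost two recursive 4-peg moves and the bottom n - k disks cost one 3-peg move of 2^(n-k) - 1 steps. The following sketch (the name `frameStewart` is mine, not part of the assignment code) just takes the minimum over every k:

```haskell
-- Brute-force minimization over all splits k. For each k, the top k disks
-- are moved twice with 4 pegs available, and the bottom n - k disks are
-- moved once with only 3 pegs, costing 2^(n-k) - 1 steps.
frameStewart :: Integer -> Integer
frameStewart 0 = 0
frameStewart 1 = 1
frameStewart n =
  minimum [2 * frameStewart k + 2 ^ (n - k) - 1 | k <- [1 .. n - 1]]
```

For n = 15 this evaluates to 129, and the minimizing split is k = 10: the larger subset of disks does indeed get the 4-peg treatment, leaving only 5 disks for the slow 3-peg move.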

When I made that change below and ran it in ghci, I got the expected minimal number of steps (129) for the 15-disk, 4-peg case. Then I edited Wikipedia to fix the mistake. So there you go: practicing a simple Haskell problem leads you to become a Wikipedia contributor.

-- Final method uses optimal choice of k. 
hanoi4'' :: Integer -> Peg -> Peg -> Peg -> Peg -> [Move]
hanoi4'' n a b c d
    | n <= 2    = hanoi n a b c
    | otherwise = moveTopKToD ++ moveBottom ++ moveTopKToB
        where k           = n - ((round . sqrt . fromIntegral) (2*n + 1)) + 1
              moveTopKToD = hanoi4'' k a d b c
              moveBottom  = hanoi (n - k) a b c
              moveTopKToB = hanoi4'' k d b a c

*Assignment1> :load Assignment1.hs
[1 of 1] Compiling Assignment1      ( Assignment1.hs, interpreted )
Ok, modules loaded: Assignment1.
*Assignment1> length $ hanoi4'' 15 "a" "b" "c" "d"
129

Friday, January 29, 2016

Solutions for Monad Challenges on GitHub

Doug Beardsley (a.k.a. "MightyByte" a.k.a. one of the original authors of the Haskell Snap web framework) was generous enough to create a very well-written and engaging set of tutorials for learning how to program with Monads in Haskell: The Monad Challenges.

The challenges walk you through three different real-world problems: managing the state of a random number generator; handling computations that can fail; and taking combinations of items from given sequences.

These tasks roughly correspond to the Haskell State Monad, Maybe Monad, and List Monad respectively. The goal is to write down naive solutions and then use common patterns that arise to end up with general, monadic functions. In the final set of exercises, you will actually create a Monad type class for yourself and provide the implementations for State, Maybe, and List.
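
To give a flavor of that final exercise, here is a minimal sketch (names are mine, chosen to avoid clashing with the Prelude; this is not Doug's code) of a hand-rolled monad-like class with a Maybe instance for chaining computations that can fail:

```haskell
-- A hand-rolled stand-in for the Monad class, with a Maybe instance
-- in the spirit of the challenges.
class Monad' m where
  return' :: a -> m a
  bind    :: m a -> (a -> m b) -> m b

instance Monad' Maybe where
  return' = Just
  bind Nothing  _ = Nothing   -- a failure short-circuits the chain
  bind (Just x) f = f x

-- A computation that can fail:
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)
```

With this, `bind (Just 10) (\x -> safeDiv x 2)` gives `Just 5`, while dividing by zero anywhere in the chain propagates `Nothing` without any explicit case analysis at the call site.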

I've been working on the challenges on and off for about the past week, and I've finally gotten my solutions into a (mostly) completed state. For the sake of there being some reference solutions available online, I've posted them to GitHub. I'll probably be tweaking them in the short term, so don't take them to be set in stone just yet.

Many thanks to Doug for doing this work. It's quite impressive to think that right off the bat the tutorial gets you up and running with the State Monad -- one of the most notoriously tricky Monads to understand.

Tuesday, January 26, 2016

More Adventures in Vegan Cooking: Spicy Thai Tofu Cakes and Garlic Mushrooms

Ever since my sister got me a tofu presser for the holidays, I've been on a kick to try many different ways of cooking with it. When I found a dusty old recipe book on my mom's shelf that involved grated tofu, I just had to try it. Out here in northeast Indiana (read: meatville) it can be difficult to find the necessary ingredients for a given dish, and that proved to be the case this time. But with some minor substitutions, I was able to make it work.

First I found a simple recipe for garlic mushrooms.


1 lb package of white mushrooms
3-5 tbsp of coarsely chopped arugula
2 green onion stalks, thinly sliced
Your choice of medium-to-high heat cooking oil (I used peanut oil).
A dash of salt and pepper.
A dash of Sriracha (optional)
2 or 3 cloves of garlic (more if not a vampire) finely chopped


1. Cut the stems from the mushrooms very close to the caps and discard them.

2. Heat the oil in a heavy pan (I used a cast iron skillet).

3. Add the garlic and heat it for 1-2 minutes until it begins to brown. Optionally add a squirt or two of Sriracha to make a spicier garlic base.

4. Add the mushroom caps and move them around a lot to absorb the oil and help prevent the garlic from burning.

5. Once almost all of the oil is gone, reduce heat to medium/low. Occasionally stir the mushrooms and wait for the liquid in the mushrooms to come out. Then raise the heat back to high and cook until the liquid is mostly boiled away, stirring frequently.

6. Move the hot mushrooms to a serving bowl and stir in the arugula and green onions. You could optionally also use parsley or cilantro.

The next step is to make a very simple Thai-style chili sauce.


4 tbsp white rice vinegar
1 red chili pepper, finely chopped
1/2 habanero pepper, finely chopped
2 tbsp cilantro, coarsely chopped
2 green onion stalks, thinly sliced
1 tsp of lemon pepper seasoning
A dash of lemon juice


1. Just mix everything together in a small serving bowl. This is a small portion of sauce that will be used to dress the tofu cakes at the end, so you can set it aside.

Finally, here is the recipe for the tofu cakes:


Peanut oil (or your choice of alternative)
1 14 oz package of extra firm tofu, pressed until it is very dry.
2 medium scallions, finely chopped
2 cloves of garlic, coarsely chopped
1 1-inch segment of fresh ginger, either grated or finely chopped
1 serrano pepper, finely chopped
1 red chili pepper, finely chopped
1/2 habanero pepper, finely chopped
2 tbsp Thai curry powder
1 tbsp lemon pepper seasoning
3-4 tbsp chickpea flour
White rice vinegar
Salt to taste

Optional: the recipe I found also called for adding coarsely chopped kaffir lime leaves along with finely chopped lemongrass stalk. I absolutely love these ingredients, but I simply couldn't find them in my area. You can substitute by adding some fresh cilantro or arugula along with a tiny amount of lemon juice.


1. Grate (yup, grate) the tofu. I did this with a standard cheese grater. I sliced the block of tofu into six long slabs, and then just grated them like cheese. This is why the tofu needs to be very dry -- if you go to grate wet tofu, you're gonna have a bad time.

2. Combine all of the ingredients in a large mixing bowl. Stir well to ensure seasoning is well distributed.

3. Add the chickpea flour 1 tbsp at a time. If the overall mixture is too dry when you stir in the flour, then add a little bit of the rice vinegar and/or lemon juice to even it out. In the end, you want to get a slightly goopy, paste-like mixture that has the consistency of ground beef prior to forming hamburger patties out of it. You might have to play a little at this step to get it right. It should still be on the dry side when you're done, but it should be malleable enough to form it into balls.

4. Set the mixture in the refrigerator for about an hour to let it thicken.

5. Heat about 4 tbsp of peanut oil in a heavy pan. Wait for it to get very hot before adding the tofu mixture.

6. Once it is hot, grab a small handful of the tofu mixture and form it into a ball. Place it onto the pan, then use a flat spatula to smash it down into a flatter patty shape. You will probably have room in the pan to cook about 4-5 patties at once. I was able to make 12 patties from the recipe's ingredients.

7. Let patties cook about 3-4 minutes per side, paying close attention not to let them burn. Don't try to flip the patties too soon or they might fall apart while you're cooking. Set aside on a plate or on paper towels to drain excess oil.

8. Heat more oil in the pan as needed and continue cooking the patties in batches until you've used all of the tofu mixture.

When you're all done, you can plate the dish by placing a couple of patties surrounded by the mushrooms, and then spooning the chili sauce over top. Voila:

Overall, the dish came out great. The tofu cakes have a texture similar to a quiche but with a crispy exterior. I love spicy food and this dish is very spicy. If you wanted a different take on it, you could omit the Sriracha, use a sweeter onion instead of the scallions, and replace the spicy peppers with other ingredients. You might even use peanut butter and/or soy sauce to give it more of a peanut flavor than a spicy kick.

The mushrooms, simple though they are, are also very impressive. It's the first time I've ever made mushrooms for myself and felt that they turned out the same way they are served in an Asian restaurant.

In the preface to his book How Music Works, David Byrne tells us:

You can’t touch music—it exists only at the moment it is being apprehended—and yet it can profoundly alter how we view the world and our place in it. Music can get us through difficult patches in our lives by changing not only how we feel about ourselves, but also how we feel about everything outside ourselves. It’s powerful stuff.

I firmly believe that the same idea applies to the concept of a meal. Not to mere food, which of course we can see and touch, but to the pan-sensory (rim-shot) ephemeral experience of a whole meal. It only exists as the confluence and superposition of other things. After it's gone, you merely have the remembrance of the thing, and some evidence of a well-made meal:

Thursday, January 21, 2016

Lazy Dynamic Programming Examples in Haskell

Dynamic programming is a problem solving strategy in which recursive function calls are replaced with recursive data lookups. When solving a problem recursively, one often expresses the solution in a top-down manner: you write a function capable of handling general cases and base cases alike, but then you actually execute the function on the general case of interest to you.

For example, a classic recursive problem is to compute the nth Fibonacci number. In Python, such a program might look like this:

def fib(n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib(n-1) + fib(n-2)

The function body handles the base cases, from which a concrete value is returned, and also handles the general case by appealing to recursive function calls.

The problem with writing recursive functions like this is that they tend to have exponential running time complexity because a single function call spawns multiple recursive calls, each of which further spawns more recursive calls, and so forth.

Dynamic programming stands in contrast to this. Dynamic programming seeks to define the result for the base cases concretely, and store them in an array that has enough room to hold all possible results up to and including the general result requested in the form of the function's arguments.

For the Fibonacci problem, it could look something like this:

def fib(n):
    fibs = [0] + [1]*n
    for i in range(2, n+1):
        fibs[i] = fibs[i-1] + fibs[i-2]
    return fibs[n]

In this case, we make an array `fibs` to store the intermediate results of computing Fibonacci numbers up through the nth value. The start of the array represents our base case for zero, just as in the naive recursive version. The next value represents our other base case for one. After that, the values are all 1 -- it's just a temporary fill value in order to preallocate the whole array we will need.

What follows is a straightforward loop rather than a cascade of recursive function calls. In the loop, we don't make any new function calls; the only operation performed is to add up the two previous elements. By the time we are done, the final result for the general case of n will be stored in the nth slot of the array, so we simply access the array and return the value there.

So we can see how for the extra cost of some memory to have the array of intermediate results, we got rid of the need to compute via recursive function calls.
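
As a preview of what's coming, Haskell can express the same idea without any explicit mutation at all. The standard idiom defines the list of all Fibonacci numbers lazily in terms of itself, and each entry is computed once, on demand:

```haskell
-- The lazy Fibonacci stream: element i is the sum of elements i-1 and
-- i-2 of the very list being defined. Laziness makes this legal, and
-- sharing ensures each entry is evaluated at most once.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib n = fibs !! n
```

Here `fib 10` is 55 and `take 6 fibs` is `[0,1,1,2,3,5]`, with no preallocated mutable array in sight.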

There are many classic problems that make use of this programming strategy and they often are given as whiteboard hazing riddles during needlessly combative programming job interviews. Typically, like in the Fibonacci case above, it amounts to fairly useless trivia that is better left to some internet reference to be looked up if you ever need it, rather than taking up space and cognitive effort if you commit it to memory.

But because many classic dynamic programming problems are not as easy to work through as the Fibonacci example, I became interested in creating at least a small set of reference implementations, just to give myself the experience of dealing with the classic riddles. But there's a twist. I didn't want to implement these dynamic programming algorithms in a boring old imperative language like Python or C. That's been done a thousand times already and you can look up the answers anywhere. Instead, I wanted to give it a try in a language with a very different execution paradigm, so I chose Haskell.

As a lazy functional language, Haskell does not make the same kinds of guarantees on the precise order of evaluation of the items you express while programming that an imperative language makes. Even in the context of a single data structure, like a Haskell List, some of the items of the list may have been evaluated, while other entries may be thunks that have not yet been needed, or may never be needed.

In Haskell, it's often the case that you want to write programs recursively, and with care to achieve tail call optimization this usually won't hurt the running time of the code. But dynamic programming problems inherently rule out tail call optimization, because by their very nature they require multiple recursive calls whose results must be combined after returning. A further complication arises because Haskell's data structures are immutable by default. So we can't use the same trick from an imperative language -- if we "pre-allocate" an array to store the results in Haskell, we have no way of actually placing the results into it, because we cannot mutate it!

It sounds like it may be impossible to do this kind of thing in a lazy, pure language. But it turns out not only can you do it, but it's pretty awesome how you do it. What you do is to make a helper function that contains the base cases and also indexes into your result-holding list. Then, separately, you make that list have a definition which is mutually recursive with your helper function.

Entries of the list are defined in terms of calls to the helper function -- which in turn is defined by accesses into the list or by base cases. As long as the construction of the list is allowed to be performed lazily, there won't be any issues and this mutual recursion will sort of "zipper" the intermediate results through a list where they can be referenced when later on cases need to use them.
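
Before the full edit-distance example, here is the pattern at its smallest, on Fibonacci (a sketch of mine, not from the course materials): the helper `f` holds the base cases and otherwise reads from the array `fs`, while `fs` is defined as the array of all calls to `f`.

```haskell
import Data.Array ((!), listArray, range)

-- Mutually recursive helper function and lazy array: `f` consults `fs`
-- for smaller results, and `fs` is built from calls to `f`. Laziness
-- "zippers" the values through, so each index is computed only once.
fibMemo :: Int -> Integer
fibMemo n = f n
  where
    f 0 = 0
    f 1 = 1
    f i = fs ! (i - 1) + fs ! (i - 2)
    fs  = listArray (0, n) [f i | i <- range (0, n)]
```

Unlike the naive recursive version, which makes an exponential number of calls, this runs in time linear in n.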

Best now is to show several examples. All of the code is available on GitHub.

The first example that got me started comes from a blog post called Lazy Dynamic Programming. The post offers a lazy dynamic programming solution to calculating the edit distance between two strings.

I made some slight edits and commentary for my own ease of reading. Note also that instead of using a plain Haskell List, I often use a Data.Array -- just as the post suggests. The reason is that indexing arbitrarily into a plain List is an O(n) operation -- you must traverse from the start -- whereas it's O(1) for an Array. Since the whole point of dynamic programming is to efficiently access a pre-computed answer, rather than slowly re-compute it via recursive calls, we don't want the operation of accessing the cached results itself to be inefficient.

Here's the code:

Edit Distance Problem (blog source) (github)

module EditDistance where
import Data.Array((!))
import qualified Data.Array as A

-- | This code is modified directly from the blog post ``Lazy Dynamic
-- Programming`` retrieved on January 2, 2016, from
-- < >.

-- | Slow recursive implementation of string edit distance. In this
-- algorithm, the helper function `d` is not memoized or cached, so
-- each recursive call must recompute results of `d` for earlier
-- substrings. This results in an exponential running time.
naiveEditDistance :: String -> String -> Int
naiveEditDistance a b = d (length a) (length b)
  where d i 0 = i
        d 0 j = j
        d i j
          | a !! (i - 1) == b !! (j - 1) = d (i - 1) (j - 1)
          | otherwise = minimum [ 1 + d (i - 1) j
                                , 1 + d i (j - 1)
                                , 1 + d (i - 1) (j - 1)
                                ]

-- | In this case, use a recursively defined Array as a means for
-- caching the results of intermediate calculations. Because of the way
-- Haskell deals with the evaluation of lazy computations, each
-- substring's edit distance will only be calculated one time.
basicEditDistance :: String -> String -> Int
basicEditDistance a b = d m n
  where (m, n) = (length a, length b)
        d i 0  = i
        d 0 j  = j
        d i j
          | a !! (i - 1) == b !! (j - 1) = ds ! (i - 1, j - 1)
          | otherwise = minimum [ 1 + ds ! (i - 1, j)
                                , 1 + ds ! (i, j - 1)
                                , 1 + ds ! (i - 1, j - 1)
                                ]
        bounds = ((0, 0), (m, n))
        ds = A.listArray bounds [d i j | (i, j) <- A.range bounds]

-- | In the final version, all use of List and (!!) index access is replaced
-- by more robust Array indexing (!). The arrays are also set to be 1-indexed,
-- which very slightly tweaks the formulas for reading from them.
betterEditDistance :: String -> String -> Int
betterEditDistance a b = d m n
  where (m, n) = (length a, length b)
        a'     = A.listArray (1, m) a
        b'     = A.listArray (1, n) b
        d i 0 = i
        d 0 j = j
        d i j
          | a' ! i == b' ! j = ds ! (i - 1, j - 1)
          | otherwise = minimum [ 1 + ds ! (i - 1, j)
                                , 1 + ds ! (i, j - 1)
                                , 1 + ds ! (i - 1, j - 1)
                                ]
        bounds = ((0, 0), (m, n))
        ds = A.listArray bounds [d i j | (i, j) <- A.range bounds]

The secret sauce is in the `where` clause definitions for the helper function `d` and the list of cached results `ds`. Basically, we are saying that the function computation `d i j` produces the solution for the problem for a case when the first string has length `i` and the second string has length `j`. But note how the helper function actually works inside. It has several base cases, and then in the general case both options work by simply accessing other entries in `ds`. But then, how is the array `ds` defined? It's defined as a lazy Array of all potential function calls to `d`!

If you're like me, you'll need to stare at the definition of `d` and the definition of `ds` several times. What is the benefit of this? How is it different than just plain recursive calls to the helper function `d`?

The difference is that `ds` as a container allows the results of calls to `d` to be "zippered" through -- evaluated from thunks into concrete values as they go. Once that happens, you've got a concrete, evaluated entry in `ds` that can simply be accessed rather than re-computed by some recursive call to `d`. So in the end, the mutual recursion between `ds` and `d` is exactly what allows you to re-use previously computed steps -- which is the whole point of dynamic programming.

For completeness, I will share three other code samples below for some other classic dynamic programming problems. They all follow more or less the same format as above, and I borrowed heavily from this first solution to work out how the others would go. The only one with a slightly different wrinkle is the Rod Cutting Problem, which requires not just recursion but recursion within a loop in an imperative language, so you have to restructure it a little even to write the naive recursive solution in a functional language.

0-1 Knapsack Problem (github)

module Knapsack where
import Data.Array((!))
import qualified Data.Array as A
type Value = Int
type Values = [Value]
type Weight = Int
type Weights = [Weight]

-- | Plain recursive implementation.
naiveKnapsack :: Weight -> Weights -> Values -> Value
naiveKnapsack 0 _ _  = 0
naiveKnapsack wt wts vals
    | length wts /= length vals = error "Mismatching lengths."
    | length wts == 0 = 0
    | last wts > wt = naiveKnapsack wt (init wts) (init vals)
    | otherwise = max includingFinalElement ignoringFinalElement
        where initVals              = init vals
              lastValue             = last vals
              newWeight             = wt - (last wts)
              initWeights           = init wts
              ignoringFinalElement  = naiveKnapsack wt initWeights initVals
              includingFinalElement = lastValue +
                  (naiveKnapsack newWeight initWeights initVals)

-- | Improve on naive solution by using a recursively defined array for
-- dynamic programming.
basicKnapsack :: Weight -> Weights -> Values -> Value
basicKnapsack wt wts vals
    | length wts /= length vals = error "Mismatching lengths."
    | otherwise = d n wt
        where n      = length vals
              wts'   = A.listArray (0, n-1) wts
              vals'  = A.listArray (0, n-1) vals
              bounds = ((0, 0), (n, wt))
              d 0 j = 0
              d i 0 = 0
              d i j
                  | j < wts' ! (i-1) = ds ! (i-1, j)
                  | otherwise = max a b
                      where a = vals' ! (i-1) + ds ! (i-1, (j - wts' ! (i-1)))
                            b = ds ! (i-1, j)
              ds :: A.Array (Int, Weight) Value
              ds = A.listArray bounds [d i j | (i, j) <- A.range bounds]

Subset Sum Problem (github)

module SubsetSum where
import Data.Array((!))
import qualified Data.Array as A

-- | This code is my own implementation of the subset-sum algorithms, but it
-- is based on the edit distance implementations found at the blog post
-- ``Lazy Dynamic Programming`` retrieved on January 2, 2016, from
-- < >. The blog post is a tutorial on the mutually recursive array and
-- function used for doing dynamic programming in a lazy functional language.

-- | Slow recursive implementation of subset sum, which doesn't store
-- any intermediate calculations, and hence is exponential.
naiveSubsetSum :: [Int] -> Int -> Bool
naiveSubsetSum _ 0 = True
naiveSubsetSum [] _ = False
naiveSubsetSum arr sum  
    | (last arr) > sum = naiveSubsetSum (init arr) sum
    | otherwise = ignoringLastElement || includingLastElement
        where front = init arr
              ignoringLastElement  = naiveSubsetSum front sum
              includingLastElement = naiveSubsetSum front (sum - (last arr))

-- | Improve on naive solution by using a recursively defined array.
basicSubsetSum :: [Int] -> Int -> Bool
basicSubsetSum arr sum = d sum n
    where n      = length arr
          arr'   = A.listArray (0, n-1) arr
          bounds = ((0, 0), (sum, n))
          d :: Int -> Int -> Bool
          d 0 j  = True
          d i 0  = False
          d i j
              | i < arr' ! (j-1) = ds ! (i, j-1)
              | otherwise = ignoringLastElement || includingLastElement
                  where ignoringLastElement  = ds ! (i, j-1)
                        includingLastElement = ds ! ((i - arr' ! (j-1)), (j-1))
          ds :: A.Array (Int, Int) Bool
          ds = A.listArray bounds [d i j | (i, j) <- A.range bounds]

Rod Cutting Problem (github)

module RodCutting where
import Data.Array((!))
import qualified Data.Array as A
type Price = Int
type Prices = [Price]
type RodLength = Int
-- Helper for eager evaluation in folds. In the imperative solutions to
-- the rod cutting problem, loops are used, which will be replaced with
-- folds. This implementation is taken from
--     < >
-- retrieved on January 3, 2016.
foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f z []     = z
foldl' f z (x:xs) = seq z' $ foldl' f z' xs
    where z' = z `f` x

-- | Plain recursive implementation. Each recursive call to naiveRodCutting
-- results in a fold that performs further calls to naiveRodCutting until
-- reaching the base case of a zero-length rod. Thus, its running time is
-- exponential.
naiveRodCutting :: Prices -> RodLength -> Price
naiveRodCutting prc rl = foldl' (naiveRodCutting' prc rl) 0 [0..rl-1]

-- | I use a helper function to store the state of "current price" and
-- "position" as part of the arguments. The only computation the helper
-- function does is to select the max between the current price and
-- the next calculated price, thus this can be used in a fold. However,
-- the next price in the where clause is where the recursion happens.
naiveRodCutting' :: Prices -> RodLength -> Price -> Int -> Price
naiveRodCutting' [] _ _ _ = 0
naiveRodCutting' prc rl curPrc pos = max curPrc newPrc
    where p      = prc !! pos
          newRl  = rl - pos - 1
          newPrc = p + naiveRodCutting prc newRl

-- | Improve on naive solution by using a recursively defined array for
-- dynamic programming. The overall solution is the function `v` evaluated
-- on the rod length. But v references entries in the array vs, which in
-- turn bottoms out with base cases of function v. A helper function allows
-- the same fold-styled computation, but note that instead of recursing,
-- the helper merely accesses previous calculations in vs.
basicRodCutting :: Prices -> RodLength -> Price
basicRodCutting prc rl = v rl
    where prc'         = A.listArray (0, rl-1) prc
          -- v carries out the work of the calculation by indexing into vs
          -- through the use of a helper function.
          v :: RodLength -> Price
          v 0          = 0
          v ii         = foldl' (helper ii) 0 [0..ii-1]
          -- Same as the outer loop bounds in an imperative implementation.
          bounds       = (0, rl)
          -- vs is the array recursively defined in terms of function v
          vs :: A.Array RodLength Price
          vs           = A.listArray bounds [v i | i <- A.range bounds]
          -- helper is used in a fold to compare the current max price with
          -- the price obtained via a recursive lookup into vs.
          helper i x j = max x (prc' ! j + vs ! (i - j - 1)) 

Monday, January 18, 2016

Paul Graham Might Be Wrong About Income Inequality

Paul Graham writes that income inequality, in and of itself, can sometimes be the result of good things. I believe he made this point much more eloquently in an older essay called Great Hackers, where he writes:

"I didn't say ... that variation in wealth was in itself a good thing. I said in some situations it might be a sign of good things. A throbbing headache is not a good thing, but it can be a sign of a good thing-- for example, that you're recovering consciousness after being hit on the head. 
Variation in wealth can be a sign of variation in productivity. (In a society of one, they're identical.) And that is almost certainly a good thing: if your society has no variation in productivity, it's probably not because everyone is Thomas Edison. It's probably because you have no Thomas Edisons. 
In a low-tech society you don't see much variation in productivity. If you have a tribe of nomads collecting sticks for a fire, how much more productive is the best stick gatherer going to be than the worst? A factor of two? Whereas when you hand people a complex tool like a computer, the variation in what they can do with it is enormous."

Variation of income, to Graham, is ideally about variation in productivity. Want more income? Ok, cool. Then be more productive. In this sense, we should not vilify Graham for taking the perspective that he does. Whether we agree with him or not, there is evidence from Graham's writing over a long time span that his primary interest in this topic is about stimulating greater productivity, and not defending unqualified amassed wealth, and certainly not defending people who have amassed wealth without providing society with compensating productivity.

When "productivity" is somehow tethered to "value for society at large," then, according to my understanding of Graham's position, this form of income inequality is a good thing. It means that society, en masse, has agreed to transfer some wealth to an especially productive individual or group, and in return society has gotten more than its money's worth of newly created value out of that deal. Society at large has gotten more value than the roughly $60 billion of combined net worth held by Google founders Larry Page and Sergey Brin. Yes, Larry and Sergey are very well off now, but, Graham would argue, society got the better end of that deal, and the fact that their income is vastly unequal to yours or mine is a good thing, because it represents the even greater cumulative value that society received.

There is an assumption underlying what Graham writes. The assumption is that people are mostly only productive in response to chances to win more and greater consumptive experiences.

This is not all that obvious so I want to say it again and provide some description around it.

The assumption is that people are mostly only productive in response to chances to win more and greater consumptive experiences.

By consumptive experiences I mean things like access to greater medical care, better quality food and nutrition, access to cosmetic surgery and professional services, like personal exercise training, that may improve health and longevity, and access to sexual opportunities like exclusive sex parties or sex with people who would otherwise be too genetically superior to select you as a mate; and also access through wealth to political power, membership in exclusive clubs, friendship with celebrities or other high-status individuals; the ability to secure a more comfortable future for your offspring, and to secure a more comfortable decline into death for yourself and others, and many other things. And the extremity of such consumptive experiences probably will only increase over time, and is likely itself a function of the degree of wealth inequality and affordances of technological progress.

What this means is that, in Graham's world, the action of being productive and creating value for others is not intrinsically rewarding, or at least any intrinsic reward for exhibiting productivity is not great enough to sufficiently incentivize people towards it.

Instead, in order for people to want to be productive and to execute on that wish and actually be productive, people must be incentivized by the opportunity to convert their productivity into more and greater consumptive experiences.

And so, if society wants the productivity that these folks can provide, society must be willing to impart more and greater consumptive experiences onto these individuals. And in the context of modern society, this generally means giving someone either an income vastly unequal to the income of most (giving them substantially more income) or giving them power to circumvent political and legal limitations, or both.

As a side note, I believe this also helps explain why sociopaths and psychopaths tend to orient themselves towards positions in high finance, entrepreneurship, and politics. These individuals are not encumbered by the same social norms that most humans have -- social norms which tell us to devalue purely consumptive experiences and to have a sense of what excess means and to not violate principles of excess. Most people incorporate these types of values into their way of life and choose to derive personal value from things that are not widely regarded as consumptive experiences, like the simple pleasures of life and family and friends. We might aspire to more and greater consumptive experiences to some degree, but we work hard to put a ceiling on that and have common norms about why going past that ceiling of consumptive experiences is ethically questionable. Sociopaths and psychopaths tend not to feel the same way. They tend to feel that materialistic consumptive experiences are the only verifiably utility-increasing activities that an entity can undertake, and so they attempt to live life in a manner more suited to optimizing for more and greater consumptive experiences, and they also espouse ideas that make our overall systems of living more amenable to their pursuit of more and greater consumptive experiences even when it might be destructive to others.

Now, the biggest question to me seems to be whether it really is true that people will only be productive in response to opportunities for more and greater consumptive experiences. Is that really true about people? I don't pretend to know the answer. But I do think it is not obvious what the answer is and that there is not one single answer that covers all states of affairs in the world.

Capitalists, generally speaking, probably want the answer to be: "yes, people are only productive in response to opportunities for more and greater consumptive experiences." That way, the activity of deploying capital is central to human productivity, and it glorifies the capitalist and solidifies them in a position of importance and power. And since capitalism is the prevailing way of economic life in most of the world, and it is often the only type of labor market that humans are exposed to, there is a powerful force of creature comfort at work. We are comfortable with capitalism, but the fact that some productivity has been driven by the deployment of capital (in the form of coaxing productivity out of people who want more and greater consumptive experiences) does not prove that this is the best or the only way to capture that productivity, and it leaves open the possibility that we are failing to capture sources of productivity that do not resonate with such an incentive scheme.

I should be careful to point out that this is not an argument against capitalism. All I am saying is that we live on the inside of a capitalist system, and we need an outside view of things, preferably with well-controlled experiments, to offer real information about what best incentivizes human productivity. We have some biased and limited evidence that capital markets do add value. But by and large the world hasn't tried any other system; we don't really know that other systems would add less value. And we also know with great certainty that capitalist systems do lead to many undesired outcomes as well, like regulatory capture, and the cognitive dissonance of a world in which there can simultaneously be a person in deep thought about which yacht to purchase and a person in deep thought about which dumpster they will search through for tonight's dinner.

In my way of thinking, there is also limited evidence that humans can be especially productive even when not incentivized by opportunities for more and greater consumptive experiences. For example, in the book Predictably Irrational by Dan Ariely, there are experiments showing that when people are asked to perform a task as a personal favor, they tend to work much harder than when they are told they will be compensated for their labor with money or given a gift. I understand that such a small experiment may not scale up to the larger issues of massive-scale productivity, but the point is that we don't really know, and at the very least it should give us a lot of pause before unblinkingly accepting the idea that variation in wealth is a necessary or good by-product of variation in productivity.

What would a world look like where we did have variation in productivity, but we did not have much variation in wealth? One possibility is that people would be freed from burdensome heteronomous goals and allowed to pursue autonomous goals instead.

People would be interested in being productive because doing so makes them happy for intrinsic reasons, and they would not expect to receive outward rewards greatly beyond their fellow man's for doing so.

I should not get the opportunity to attend exclusive sex parties in exchange for creating a globally transformative search engine technology. That seems like a reasonable sentence to me, not because there is something wrong with sex parties (if that's your cup of tea) but because it is unreasonable to think that the productivity of Edisons would only blossom in response to such "rewards."

Everyone, regardless of intrinsic aptitudes, would expect to have excellent opportunities for consumptive experiences, and everyone would celebrate the productivity of those people who are predisposed to being inventive and productive. The Edisons of the world have an internal desire to create and invent, mostly for purposes widely understood to be good, and they would do so regardless of whether or not they were rewarded with consumptive experiences that were vastly superior to their fellow man. The non-Edisons of the world would still do what they can, and would not have to live in the fear that their non-Edison-ness relegates them to a life of lessened consumptive opportunities (and, ideally, also not a life of lessened social status either).

On the other hand, when people state an explicit goal that they should have more and greater consumptive experiences than their fellow man (or when that goal is imputed from their actions), that is usually a strong signal that such a person is not an Edison. They don't have the itch to invent or to be productive; rather, they merely have the itch for greater consumptive experiences, and either they force themselves to be inventive or they dupe other Edison-like inventive people into cheaply providing the productivity on their behalf (the latter arrangement is usually referred to as a 'start-up').

As I mentioned before, sociopaths and psychopaths often exhibit this trait that their up-front primary concern is for obtaining more and better consumptive experiences. And in a world where variation in productivity doesn't yield a significant variation in wealth, we might see selection pressure getting rid of sociopathy or psychopathy, instead of rewarding it as we do now.

This is why I believe Paul Graham might be wrong. Graham is not wrong because "income inequality" as a vague concept is "bad." That is what most criticisms of him have said, and those criticisms are naive. Graham is correct to say that the vague notion of "income inequality" consists of many different sub-components, some of which are considered to be bad and some of which are considered to be good. One sub-component is that in our current system, capital is often deployed as a means to incentivize productivity, allowing people to obtain more and greater consumptive experiences by being productive. Graham believes this is the fundamental aspect of variation in wealth, and that fundamentally it is good and delivers overall value bargains to society. But that belief rests on the assumption examined above, that people are mostly only productive in response to opportunities for more and greater consumptive experiences, and that assumption is the reason Graham might be wrong.