Tuesday, June 23, 2015

And Now For Something Completely Different: Marzipaste! ... and Some Vegan Adventures

I rarely crave sweets and now that I've been experimenting with a vegan diet for a few months (eliminating dairy-based ice cream from contention) my drive for sweets has been even lower than normal. But one thing I do enjoy is marzipan (both the delicious candy that I will sort of explain how to make and also the lovable Homestar Runner character from my youth).

I've also been experimenting (partly out of necessity and partly out of nerdy love for optimization) with hyper-optimizing my monthly grocery budget. So when I noticed I had a bunch of leftover sliced almonds, I wondered what exactly it would take to use them for marzipan.

I consulted a number of easily Googleable recipes, but they all required almond flour and rose water, which are expensive, obscure, and not used in many other recipes, or else honey which I didn't have at the time and didn't want to break my optimization streak to buy. So then, like any sane engineer, I looked around my kitchen and I said, "hmm... what is sort of like honey?"

The glorious answer: St. Germain and chocolate chips! I mixed 1/4 cup of dairy-free chocolate chips with a dash of vanilla extract and about 1/6 cup of St. Germain, and simmered the mixture in 1/3 cup of water until the chocolate chips were melted, then I stirred in 1 cup of sugar and continued stirring the mixture while it simmered until the sugar was dissolved.

Separately I put 2 and 1/2 cups of sliced almonds into a small food processor and pulsed them until they were coarsely chopped. Then I added the not-quite-boiling sugar/chocolate/vanilla/St. Germain/water/Thor's blood mixture to the food processor and I processed on high for about 1 minute continuously. I stopped to scrape down the sides of the processor bowl and repeated additional 1-minute increments of processing until the mixture acquired a gooey almond paste consistency.

Warning! As you can see, I put far more stuff into this tiny food processor than I should have, and as a result I am pretty sure that I nearly burnt out the motor. Since then, I've invested in a larger food processor, but you could do just as well by making this recipe in smaller batches and combining them into the same storage container at the end.

I placed the mixture into a plastic-wrap-lined container and left it in the fridge overnight. It did not develop the same firmness that typical marzipan has, though it was slightly more firm than almond paste or almond butter. It was definitely spreadable and so I enjoyed it on some English muffins... voila:

Not too shabs for outright refusal to buy any ingredients I didn't happen to already have on hand. And I think the St. Germain/chocolate chip mixture was a delicious substitute for honey.

While I have been a strict vegetarian for almost 5 years now, this recent foray into vegan cooking has been most interesting. I was prompted to try it by a simple conversation with my brother (also a vegetarian/trial-vegan). He pointed out that pretty much the only source of animal products in my life was cheese. Particularly cheese on pizza. I'm fairly health-conscious about my diet and I love cooking new things, but pizza is one of those guilty pleasures that will just be a part of my life forever, and there's no sense in trying to force myself to be otherwise. Pizza is just going to happen.

So I said to myself that if I could find some kind of vegan substitute for cheese that worked well enough on a vegan pizza, then I would give a vegan diet a try. After all, I already prefer soy or almond milk to dairy milk, I drink coffee black, and I don't care much for traditional dessert foods (except ice cream, but I'll get to that in a minute).

I found such a head-slappingly simple vegan cheese alternative, vegan Parmesan cheese, that at first I thought it must be disgusting. That's never stopped me from eating things before, so I went ahead and made a pizza with some roasted veggies, baby spinach, and a tomato sauce that I tweaked with some paprika, curry and garlic powder, and some herbs.

I must say that the vegan Parmesan substitute was delicious. It offered me everything I was looking for on a pizza, and I'm certainly content to trade off whatever extra satisfaction the dairy fat from cheese might provide in order to get the calorie reduction and to avoid creating further economic demand for animal food products. Here's a photo using prepared pizza dough... I prefer rolling my own dough, but that requires a lot of flour and cornmeal to make sure I don't screw up the transfer to the baking stone -- a source of wasted supplies that I don't want to incur during my budget optimization experiments.

I mentioned ice cream above as well. That is one dairy-based dessert that seduces me. So I wanted to find a way to make a dairy-free alternative. I found this interesting recipe for Moscato ice cream and flipped my badass welder's helmet visor down, blasted a flame out of the badass welding torch that I just carry around, and said "let's do it."

For this, I researched baking substitutes for vegan cooking a bit and decided that either silken tofu or coconut cream was going to be my best bet. I opted for coconut cream, but it was somewhat of a challenge. First, I walked to not one but two eccentric health food stores nearby, plus a Whole Foods market, and exactly none of them had coconut cream in stock. They had plenty of coconut oil, coconut "manna", and cream of coconut, but these are all wrong, as I politely told the grocery store clerks who insisted they were what I needed.

Finally I had to order coconut cream from Amazon and wait two g.d. days before I could make the ice cream! I also substituted cherries for blackberries which created a lot of extra work (e.g. it's not a good idea to just throw un-pitted cherries into a food processor; and even when you get them pureed, it is more like a jam than a juice so you have to force it through a mesh strainer to harvest juice from it). But in the end, simply replacing dairy cream with coconut cream worked extremely well. I used an immersion blender instead of a hand mixer. Because of the slightly different fat composition of coconut cream, you won't get exactly the same fluffy, aerated texture you would from dairy cream, but the mixture still freezes well. When serving it, I recommend taking a little bit out of the freezer about 15 minutes in advance, letting it melt a little, and then spooning that on top of the fully frozen ice cream. It helps provide a slightly creamier texture.

The last vegan item that I want to highlight is tofu ricotta. Wait. Let me correct that. Tofu Badass Rambo Ricotta. This stuff is amazing and equally head-slappingly easy to make as the vegan Parmesan cheese. I used this recipe (the tofu ricotta section) and since I already had a bunch of prepared vegan Parmesan cheese, it was super easy. I did opt to use a food processor rather than hand mixing, and I used the optional vegan cream cheese. With the food processor this turns out much creamier and packs a lot of flavor. I can easily imagine that, seasoned differently, this could serve as the base for custard or pudding, or even as part of a pie filling. I used it to make a baked rigatoni recipe, and saved some leftovers to add as an extra ricotta "cheese" topping to the vegan pizza I mentioned before.

So far vegan cooking has been delightful, tasty, and at least as cost-effective as my previous vegetarian grocery shopping (probably more so). Yum!

Sunday, May 31, 2015

Search Engine Sexiness

Search engine optimization (SEO) and the more general problem of user experience optimization (UXO) are ragingly popular fields of study these days. The basic idea is to use qualitative and quantitative methods to identify any factor at a web service or content creator's disposal that appears to profitably steer the attention of users.

This problem is not new at all and has certainly been around since the beginning of monetized media. In the book Bias, Bernard Goldberg discusses the first ever Spring Break story aired on 48 Hours (in the early 1990s when 48 Hours was threatened with cancellation) and the dramatic increase in ratings this type of journalism brought to the show. This is clearly a form of user experience optimization.

Not all forms of SEO or UXO are bad things. One simple way of optimizing your content is to create good content, and presumably good content is good for people. If you want more people to visit your blog, write better articles (advice I really should pay more attention to). If you want more people to put their eyeballs on your mobile app, make a better app. In this sense, it's perfectly healthy and beneficial for businesses to be incentivized by seeking profit opportunities based on upping their own quality, and using qualitative or quantitative methods to figure out from customer preferences how best to up their own quality seems like a fine and smart thing to do.

But a lot of SEO focuses on clicks. Someone famous should say the phrase "you are what you click" if they haven't already.

There is a clear moral hazard: businesses are incentivized to cause consumers to click on things. The key issue is that they are not necessarily incentivized merely to provide the opportunity to choose to click on things, but to go further and figure out if they can reliably cause clicks to happen, through big data and frightening research into consumer psychology.

If you had a magic wand you could point at someone to make them click on whatever you wanted, every advertiser in the world would be competing for access to you. If, instead, you can merely promise that consumers will on average or probably click on things, then your ability to profit from those advertisers is a lot more speculative. Even worse, if you have any sort of ethical constraint -- say, you don't want to put clickbait in front of someone's eyeballs unless you're confident it actually aligns with their long-term, higher-level assessment of their own best interest -- then you're doomed, because the advertisers who might pay don't care at all about that, only about the precious, precious clicks (yes, this is how I imagine all advertisers).

To make matters worse, decisions to click on things are extremely short-term. The timescale of these decisions is on the order of time it takes for your eye to make saccade motions across a screen. This is quite a bit different even from television advertisement, where you have multiple seconds or more to digest an ad, and pretty much the fastest way you can interact with the ad is to follow some instructions to navigate to a website, open an app, send a text message to a number, or call someone. All of these actions require a huge chain reaction of conscious thought involving many seconds to whole minutes worth of time, giving your higher-level thinking faculties many opportunities to veto the decision by comparing it with the values, beliefs, and preferences you have.

But with millisecond timescales, the thinking faculties being engaged by click-driven optimization are far removed from conscious awareness of your personal priorities, values, beliefs, etc. They are driven far more by basal desires, cognitive biases, impulses, and short-term preferences. As a result, a huge component of work on SEO and UXO focuses on how to quantitatively manipulate short-term foibles and evolutionary biases in human thinking.

To be clear, this happens even if a company wants to be upstanding and does not overtly seek to monetize cognitive foibles. The problem is that they are incentivized to cause clicks, not to legitimately win clicks. So a company might sit back and say, "Hey, we don't mean any harm and we're not malicious; we're just doing what our big-data-driven engine told us to do in order to cause clicks, because that's how we as a business make money." It wouldn't matter. The outcome of that operation, even absent any explicit malicious intent, would be exploitative of consumers: data-mining techniques honed to exploit the short-term cognitive foibles that govern click choices.

On this topic I had an interesting encounter with someone from Stack Exchange, which I'll summarize below. The conversation took place in some comments on a question I posted to Meta Stack Exchange, a site about the policies of Stack Exchange. My question focused on a request to remove a certain way of ordering job listings on the Stack Exchange careers site.

At that site, job ads are ordered by an "interesting" property that attempts to quantify how interested you would be in a job. My personal experience is that this ordering surfaces jobs that are wildly uninteresting to me, and I would prefer it were easier to see jobs listed by posting date rather than by whether Stack Exchange's algorithm speculates that I am interested.

Here's the relevant portion of the discussion, I'll use "SO" as the initials for the user from Stack Exchange and "ES" for my comments:

SO: We have strong evidence that the interesting tab leads to more clicks and more applications (by up to 30%!) so, in the general case, users benefit from the interesting tab being the default tab.

ES: Hmm, that's an interesting claim. I'm not sure that I believe the effect of leading to more clicks or more applications is necessarily beneficial to the people making those clicks or applications. I'd have to think about it more. It's a bit like the moral hazard of data-driven ad systems. The mechanism by which they cause a consumer to make a click decision is not at all clearly in the best interest of the consumer. It could be the case that they were prompted by some other causal factor about the ad, one which they consciously would argue motivates them to not click.

SO: Your opinion and use case definitely deserves some consideration :) Here we're not talking about ads, only the sort order changes. It is not clear from the tab names, but "interesting" and "relevance" are two different things: "interesting" emphasizes jobs that are interesting to you, while "relevant" emphasizes jobs that have the best match with the keywords you entered.

ES: To make a shamelessly extreme example, you probably could get an even bigger increase in clicks and applications by creating a "sexiness" tab, and displaying pictures of attractive male and female models around the jobs (or letting the companies pay to do this). This would certainly motivate people to click, but probably on conscious reflection, they would say it's not really in their best interest to click to locate jobs this way. Yet it would be very hard to decouple the extremely short-term mental processes governing clicks from the longer term processes reasoning about appropriateness. Yes, yes, it's an extreme example, but I think the idea carries well to any sort of data-driven optimization tool which has as a goal to drive up the amount of short-term attention paid to a thing (whether it is an ad or a job description or whatever). There's a fine line between the service provided by locating appropriate opportunities that attract a click for real reasons versus exploiting cognitive heuristics and biases to win "empty" clicks. It's probably a really hard problem to quantify how this happens, which is why I don't trust click statistics as an unequivocal success metric.

Maybe a better suggestion would have been a "cat" tab with pictures of kitties surrounding the job ads.

At any rate it draws a suggestive line. We all know that merchants and service vendors use sex to sell things. That dead horse will never stop being beaten. But why isn't it common to drench advertisements for somewhat legitimate things (like jobs) with basal clickbait like sex or kitties? At one level we know better. We know that it makes no sense to sexualize our decision to pick a job, even though elite firms favor sexually attractive employees. But there is certainly a pressure to cause clicks, even if it means shamelessly appealing to basal desires that will take over in a millisecond-scale click decision time frame. So why don't we see it?

So it's something of a puzzle. I want to think about it more before speculating about explanations.

Saturday, May 30, 2015

Invoke Haskell Functions from Python!

Probably the only word to describe how I felt after meeting the Haskell programming language is smitten. Immediately I felt an impulsive and completely immature gush of infatuation: everyone should write everything in Haskell! This is nonsense of course. It would be somewhat like saying that every poem anybody ever writes should be a haiku. Functional programming (and Haskell in particular) has many warts and inconveniences. It's not always the right tool for the job, but it often is an excellent tool for a wide range of jobs, and it is certainly a vastly underappreciated tool.

At the end of the day you still have to live with real-world constraints and in the sphere of software development where I travel this means mostly sticking with Python and occasionally sprinkling in some C code for performance-intensive applications. Yet my affinity for Haskell is indefatigable. So, naturally, I want to write applications in Haskell where I get great benefits from excellent static typing, expressive type systems, functional purity, and many other tools. But then, I want to be able to use them ... so wouldn't it be nice if I could invoke Haskell functions from Python?

This is a short tutorial for doing that in one particular manner: using Cython to create a C-based Python extension module that wraps compiled Haskell code. The basic steps go like this:
  • Write some Haskell code (with some part of it written to use the C Foreign Function Interface) and a special C module to help with initialization of Haskell within a C program.
  • Compile the initializing C module and the Haskell module with GHC and necessary compiler flags for building a shared library.
  • Write a C header file that externally declares any Haskell functions you wish to call, and the function prototypes for any C-defined wrapper functions you will want to call later from Python (this will allow for Cython external definitions as discussed later).
  • Write a Cython source file (.pyx) that externally defines any functions desired from your compiled shared library, and then write Python-accessible functions (e.g. with def or cpdef in Cython) that wrap the externally defined functions.
  • Create a setup.py file to be used with distutils and Cython's build and compile options. Specify that the extension module is built against the C module and include any necessary library directories and compiler options.
  • Import the Cython-generated extension module and call a Haskell function from Python!
For the purposes of this tutorial, we'll do something outrageously simple: create a function that accepts an integer, adds 1 to it, and returns the result of that addition. Once we get this working, it will be clear how to modify things to create more advanced Haskell functions, but still with some limiting restrictions that I will describe in my concluding remarks.

In order to follow along, you will need the following (or whatever equivalents you are comfortable with in your chosen computing platform):
  • The Glasgow Haskell Compiler (GHC) and the Haskell dynamic libraries. Ubuntu (and perhaps others) requires an extra installation of the package ghc-dynamic. Without this, you may get an error about being unable to find Prelude or base when compiling to a shared library. I used version 7.6.3, and there is a particular compiler option needed that is sensitive to the GHC version, so watch out for this.
  • The GNU Compiler Collection's C compiler, commonly just GCC. I am using version 4.8.2, but did not notice any part of this exercise that was overly sensitive to the version. This will be used implicitly by Cython.
  • Python. If you clicked to read this, you probably have Python installed. If not, I recommend the Anaconda Python distribution from Continuum Analytics, and the related conda package management tool. I use Python 2.7.9, but did not notice anything that would be overly sensitive to the version. In particular, since I am not using NumPy, the example should also work without much effort for PyPy and Python 3.
  • Cython. Cython is a programming language that allows for the expression of native Python constructs using native Python syntax, the expression of pure C constructs using Python-like syntax with extra annotations, and special expression of constructs that interoperate between C and Python, such as extension data types, pre-existing C libraries, manipulation of the CPython API, or functions that automatically have both a C and Python implementation. I use Cython version 0.22 which can be easily installed with conda.
  • distutils, a Python library that assists in the building and distribution of a Python module or package and has special compiler capabilities that are used heavily by Cython (it's easy to find a discussion of Cython and distutils at almost any Cython tutorial).
And with all of that out of the way, let's write some code!
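For orientation, here are the files this tutorial will create, using the names that appear below:

```text
Test.hs       -- Haskell source containing the foreign-exported function
hswrapper.c   -- C helper that calls hs_init/hs_exit at library load/unload
Test.h        -- C header with extern prototypes for the exported functions
add_one.pyx   -- Cython source wrapping the C-level function for Python
setup.py      -- distutils/Cython build script for the extension module
libHSTest.so  -- shared library produced by GHC
hstest.so     -- final Python extension module produced by Cython and GCC
```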

Part One: Compile Your Haskell Code

First let us write an excruciatingly simple Haskell function that adds 1 to whatever integer it is given. Because we will want to call this function from C, and eventually Python, we will not use a Haskell integer for the data type. Instead we will use a C integer. This is a design choice which I will elaborate on later.

File: Test.hs
module Test where
import Foreign.C.Types
foreign export ccall "addOne" addOne :: CInt -> CInt
addOne = fromIntegral . (+ 1) . fromIntegral

Some things to note: we need to create this as a module or else GHC will complain that there is no main function to serve as an entry point. We also forego the standard Haskell type declaration syntax in favor of a foreign export statement that includes type information (here, our function accepts a CInt and returns a CInt, since we'll be using it from C).

Here there is a special step which is not at all obvious. The Haskell Foreign Function Interface has a C API (given from HsFFI.h). We are going to use this API to deal with handling the Haskell function in C code.

One of the conventions of the API is that it must be initialized before any Haskell-compiled structures can be used, and should also be freed prior to program termination. We'll need to do this within some C code, and then find a way to tack on that C code to our compiled Haskell module.

To do this, we create another file:

File: hswrapper.c
#include <HsFFI.h>

static void _enter(void) __attribute__((constructor));
static void _enter(void){
    static int argc = 1;
    static char *argv[] = {"libHSTest.so", 0};
    static char **argv_ = argv;
    hs_init(&argc, &argv_);
}

static void _exit(void) __attribute__((destructor));
static void _exit(void){
    hs_exit();
}
Let's talk about this. First we include the Haskell FFI's C API. Next we create a function called _enter that will represent the initialization of the Haskell runtime in our code. This typically receives the arguments that would be passed to a C main function, which is why I mock the names argc and argv here, but the names are not important. We pass into hs_init the name of the shared library that our Haskell code will be compiled into, and a count (trivially 1) of the number of arguments passed. As you might guess, hs_init does the heavy lifting for accessing compiled Haskell objects. Lastly we have a wrapper function _exit around a call to hs_exit, which cleans up anything necessary for the Haskell FFI.

You'll notice the funny __attribute__ annotations. This is a lesser-known feature permitted by GCC in C code to have the compiler enforce properties about a function. Declaring a function to be a constructor means that it should be called the moment the library that it is compiled into is loaded, and similarly the label of destructor means it should be called prior to when the library is unloaded, usually at program termination.

This is very nice because it means that when we compile our Haskell shared library, we will not need to write manual C code that calls hs_init or hs_exit in the right places; it will automatically be a part of loading the library, and the compiler will ensure it is created such that the functions are called at the appropriate times. Here are some other tips about __attribute__ constructors.

So with all of that out of the way, we can compile our Haskell code. First we prepare our helper C module, then compile and link the Haskell code. Note that the final compiler option is sensitive to the GHC version you are using. Also be sure to use proper naming conventions for the output .so file you generate.
ghc -O2 -fPIC -c hswrapper.c
ghc -O2 -dynamic -shared -fPIC -o libHSTest.so Test.hs hswrapper.o -lHSrts-ghc7.6.3
At this point, you will have generated the first of 2 shared libraries, libHSTest.so.

I learned the hard way that it is unpleasant if you forget to include the foreign export signature and instead just write a normal Haskell type signature. Here is what happens. When you compile such a module into a shared library, Haskell must do some name mangling to account for the fact that the addOne function will reside in the Test namespace, and to resolve conflicts with other functions potentially named addOne. So, in the resulting shared library file, there will be no symbol produced with the name "addOne" -- it will be variations on things like "Test_addOne" and you can directly inspect this by using the Unix command line tool nm.

I encourage you to try this. Rewrite the above Haskell function using a regular Haskell type signature, then follow the same steps to compile it. Use nm to inspect the resulting .so file and you will see that the name "addOne" doesn't appear. If you mistakenly believed that "addOne" would appear, and you externally defined it in the downstream C code, then you're likely to start getting mysterious "symbol not found" or "undefined symbol" errors. This happened to me and using nm to discover that in fact the symbol truly was undefined, and not merely missing due to some incorrect or forgotten compiler argument for the correct library path, helped me realize I had forgotten the necessary foreign export statement in the source Haskell code! This was actually quite instructive for helping me better understand multi-step linking with these compilers.

It's also worthwhile for us to take a short detour to visit the Python ctypes module, which is a part of the standard library that facilitates type marshaling to C types and interfacing with raw shared libraries. There is the convenient class ctypes.CDLL which can be given a path to our compiled Haskell library (since our function was exported as a C function) and can already access the function directly. Here's an example in IPython just from what we've already done:
In [1]: import ctypes

In [2]: foo = ctypes.CDLL("libHSTest.so")

In [3]: foo.addOne(1)
Out[3]: 2
So, we're like done, right? Well, no.

For one thing, if this was the only way we ever exposed our Haskell code, we'd never have any hope of interfacing Python with more advanced data structures in Haskell, or vice versa, because we'd never have access to an intermediate C representation where we could actually manipulate those things.

But more importantly this:
In [4]: foo.addOne("whoops")
Out[4]: 498650485
Since Python is dynamically typed, and nothing anywhere in this code has done any work to deal with Python exceptions based upon type errors, we're at the mercy of our C implementation as far as what this means. So it's certainly not safe, and that's practically half the reason we would have wanted to write something in Haskell in the first place!

So while it may be tempting to bypass what seems like an annoying amount of work explained below, there are good reasons why in general it's not just as simple as compiling some Haskell code into a shared library and using ctypes.

Part Two: Compile Cython Extension Code

Now that we have a Haskell shared library complete with the necessary library entry and exit functions, the goal is to expose it in Python using Cython. For this approach, we need a header file that provides the actual extern prototype for the Haskell function to be used from Python.

File: Test.h
extern int addOne(int);
That's it! Of course it would be more complicated with many more functions and if the C-level interface needs to do more than just wrap things.

Our final goal is to write a function that will be accessible from Python and which invokes the addOne function for our compiled Haskell module. This is a perfect task for Cython, shown below in our Cython implementation file:

File: add_one.pyx
cdef extern from "Test.h":
    int addOne(int)

def py_addOne(int i):
    return addOne(i)
All we need to do here is to let Cython know about the external definition of any of the C wrapper functions we want to use. After that, we are free to invoke such functions anywhere we'd like within the Cython module we are creating, even if it is inside the function definition of a pure Python function, such as in this case with py_addOne.

Though this example is contrived, it highlights the opportunity to add whatever kinds of data type marshaling between Python types and C types that is required to make use of the underlying shared library. This gives considerable flexibility in terms of the range of functions you could write in Haskell and make use of in Python without needing to fiddle with the C-level Haskell API.
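As a hypothetical sketch of that flexibility (add_one_to_all is an invented name, not part of this tutorial's build), the Cython layer could marshal a whole Python list through the C boundary:

```cython
cdef extern from "Test.h":
    int addOne(int)

def add_one_to_all(values):
    # Each element is coerced from a Python int to a C int on the way in,
    # and the C result is converted back to a Python int on the way out.
    return [addOne(v) for v in values]
```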

To turn our Cython implementation file into a Python extension module, we will use the build tool that Cython is designed for: distutils. We will place instructions about how Cython should build the extension module inside of a file called setup.py.

File: setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    name="hstest",
    cmdclass={"build_ext": build_ext},
    ext_modules=[
        Extension(
            "hstest",
            sources=["add_one.pyx"],
            libraries=["HSTest"],
            library_dirs=["."],
        )
    ],
)

This is a short script to tell Cython what kind of extension compilation to perform. We import setup which is the workhorse function of distutils: it will set up properties according to options we provide and it will also parse command line arguments when this setup script is executed with the Python runtime. Depending upon our configuration, the parsed command line arguments will cause our desired build process to occur.

At the bottom of the script we see the call to setup: an (arbitrary) name is given, something called build_ext from Cython is put into a dictionary as a value with a key of the same name, and the extension modules defined above are passed as the ext_modules argument.

The cmdclass argument allows for the keys of the dictionary to be discovered as command line arguments, upon which the provided function (in our case, Cython's build_ext callable) will be invoked.

From distutils we also use the Extension class. We give it a name "hstest" which will be the Python module's name when all is said and done. We tell it about our Cython source file, and we tell it to look in the current directory for a library named "HSTest" (it is assumed that the library name obeys standard conventions, so that a name of "HSTest" implies it will search for "libHSTest.so").

All that remains is to run this setup script through the Python runtime. I will use the --inplace option so that this build occurs in the current directory.
ely@computer:~$ python setup.py build_ext --inplace
running build_ext
cythoning add_one.pyx to add_one.c
building 'hstest' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ely/anaconda/include/python2.7 -c add_one.c -o build/temp.linux-x86_64-2.7/add_one.o
gcc -pthread -shared -L/home/ely/anaconda/lib -Wl,-rpath=/home/ely/anaconda/lib,--no-as-needed build/temp.linux-x86_64-2.7/add_one.o -L. -L/home/ely/anaconda/lib -lHSTest -lpython2.7 -o /home/ely/programming/c_examples/hstest.so
First you can see that Cython generates add_one.c on our behalf. This is an ugly file. It's over 1400 lines long! Most of that is spent on meticulous declarations for Python types and automatic type marshaling to C types. A lot of it is also spent on the necessary CPython API overhead required for things like a pure Python function call. For example, this is the generated prototype for our py_addOne function (which appears on or near line 533):
static PyObject *__pyx_pw_5hstest_1py_addOne(PyObject *__pyx_self, PyObject *__pyx_arg_i); /*proto*/ 
You can also see the GCC commands generated by Cython, including things like -L. (because we specified '.' in the library_dirs argument to our Extension definition in setup.py) and some various paths that are preconfigured because of my Anaconda Python installation.

When this step completes, you will have the final shared library, hstest.so (so named because we supplied the name "hstest" to our setup Extension object). Cython will have ensured that this module is ready to be imported from Python.

If you encounter issues on import, check a few things. First, ensure that the Haskell code includes the foreign export ccall declaration; without it, the shared library never defines the symbol our wrapper expects. Next, make sure your compiler options match your GHC version, and that you build and link with the hswrapper.c helper module that provides the required initialization and teardown functions. Lastly, make sure both shared libraries (hstest.so and libHSTest.so) live in directories where Python and the dynamic loader know to look, adding directories to LD_LIBRARY_PATH if needed.
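A few quick diagnostics can narrow down an import failure; these assume the file names used throughout this post (libHSTest.so and hstest.so sitting in the current directory):

```shell
# Does the Haskell library export the symbol the wrapper expects?
if [ -f libHSTest.so ]; then
    nm -D libHSTest.so | grep addOne
fi

# Can the dynamic loader resolve every dependency of the extension?
if [ -f hstest.so ]; then
    ldd hstest.so
fi

# If libHSTest.so shows up as "not found", point the loader at it:
export LD_LIBRARY_PATH="$(pwd):$LD_LIBRARY_PATH"
```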

Once we're set with hstest.so, we can give it a spin, as in IPython below:
In [1]: import hstest

In [2]: hstest.py_addOne(1)
Out[2]: 2

In [3]: hstest.py_addOne(-1)
Out[3]: 0

In [4]: hstest.py_addOne(0)
Out[4]: 1

In [5]: hstest.py_addOne(2**20000)
OverflowError                             Traceback (most recent call last)
<ipython-input-5-46452871be93> in <module>()
----> 1 hstest.py_addOne(2**20000)

ython.pyx in hstest.py_addOne (add_one.c:568)()

OverflowError: Python int too large to convert to C long

In [6]: hstest.py_addOne("whoops")
TypeError                                 Traceback (most recent call last)
<ipython-input-6-2a080d2f857b> in <module>()
----> 1 hstest.py_addOne("whoops")

ython.pyx in hstest.py_addOne (add_one.c:568)()

TypeError: an integer is required

Whew! Now we can scratch off from our bucket list the glorious occasion of using a Python extension module to call a Haskell function to verify that 1+1=2! In particular, notice how Cython's static typing capabilities automatically turn into TypeError exceptions in dynamic Python.
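That OverflowError marks the boundary of the C long that Cython converts the argument into. You can compute the boundary directly; this quick illustration uses ctypes and is not part of the original post's code:

```python
import ctypes

# The generated wrapper converts its Python int argument to a C long,
# so anything past LONG_MAX cannot be represented and raises OverflowError.
bits = 8 * ctypes.sizeof(ctypes.c_long)
long_max = 2 ** (bits - 1) - 1
print(long_max)  # 9223372036854775807 on a typical 64-bit Linux machine
```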

Part Three: What Might You Really Do With This?

What? You're not satisfied with just adding 1 to things? That's all that Kurt Gödel needed and look what he was able to do! But seriously, if all we can do is simple arithmetic, then what is the use of this, especially given the multi-step compilation, link, and build overhead we have to manage?

Well, even if we were forced to use only very primitive types in our Haskell code, like CInt, we would still get many of the benefits of writing in Haskell. We get type safety guarantees and an optimizing compiler that catches lots of silly bugs at compile time rather than at run time. We can organize our programs functionally and make purity guarantees. Another nifty trick is to write tail-call optimized recursive functions in Haskell, and then use them from Python when we want to write recursively but don't want to deal with Python's recursion depth limits.
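To see concretely why outsourcing recursion is attractive, here is the Python-side limitation in a minimal sketch (count_down is a stand-in for any deeply recursive function, not something from this post's code):

```python
import sys

def count_down(n):
    # Each call costs a CPython stack frame; there is no tail-call
    # elimination, so recursion depth is capped by the interpreter.
    if n == 0:
        return 0
    return count_down(n - 1)

print(sys.getrecursionlimit())  # usually 1000 by default

# Far beyond the limit, CPython raises rather than smash the C stack.
try:
    count_down(10 ** 6)
except RecursionError:
    print("hit Python's recursion limit")
```

A tail-recursive Haskell function compiled as in this post runs in constant stack space, so calling it through the extension sidesteps this cap entirely.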

On top of that, one simple observation is that we are very free to include whatever kinds of Haskell functions we want in our Haskell compiled shared library. They don't have to accept only lightweight FFI argument types. Instead, we can just write a select few functions, akin to a public API or interface, which we promise to write using only C types. Internally, those functions may then make use of whatever other Haskell functions we want, so long as by the end of the computation they are able to provide the C type needed as output. So this limitation is not as severe as it seems at first glance.

Earlier I had mentioned that it can be viewed somewhat as a design choice that C integers were chosen for our Haskell function. I say this because another option is to instead use regular Haskell data types and we can choose to deal with data type concerns at the C level instead of the Haskell level.

In this case, it might mean using a Haskell Int in the Haskell code, and then marshaling data from the Haskell API type HsInt to a C integer in the C code, effectively pushing the work of type marshaling down to our C-level shared library. A benefit is that we can be pretty much unrestricted in what we write in Haskell. A huge downside is that we have to be willing to write lots of C code to accommodate Haskell-specific data structures in our C wrapper code. Python doesn't provide a direct mapping to Haskell data types, so one way or another you have to move from Python data types to C data types, and from Haskell data types to C data types.

In such a situation, you might even involve an additional round of shared library compilation: creating an intermediate C source file that will have its own extern inclusions from the compiled Haskell library. Inside this C source code, you would have access to both the functions from precompiled Haskell and the Haskell foreign API, so you could write more complicated C wrappers that do a lot more work with Haskell type marshaling. At the end, after compiling that intermediate C library, you could then use Cython to generate an extension wrapper for the C library instead of directly for the Haskell library as we have done here. While this would involve more pain managing additional compilation steps, the opportunity to do intermediate work in C could add the flexibility required to interface with complicated Haskell code, ultimately allowing you to write more and more of the overall application in Haskell, which could save time in the long run.

Which way of doing this will be best for you? It likely depends on your application and your comfort level with very low level type marshaling in C. I suggest that a good default attitude is to try to write Haskell functions that accept and return simple C types, even if these functions internally call more sophisticated Haskell functions. Make the simple C-typed Haskell functions serve as your "public API" and if you run into a wall where this limitation prevents you from doing what you must do, then consider whether it will be better to deal with type marshaling in the C code.

But it is an interesting question, perhaps for a future post: how can more advanced Haskell structures, like polymorphic functions or data constructors, be meaningfully integrated with Python data types when calling from Python?

Friday, May 15, 2015

Why Hire? Underemployment, Autonomy, and Corporate Culture.

Following up on my last post about subordinating compromises, I will share some thoughts about what motivates people to make hiring choices. A lot of this is just my interpretation of experiences that I have had, coupled with some of the ideas from principled studies of bureaucracy, such as Moral Mazes. It comes from my perspective as a job candidate with a high aptitude in some hotly demanded skill areas as well as education and work experience that is generally considered "prestigious." At the end, I add an explanation that attempts to bring all of these alternatives (except for number 6) under a single umbrella.

So why hire?

1. You are merely a fan of my skill set. You see that in some potential future, such a skill set could possibly add value. You see that having this skill set is a mark of credential or possibly domain-specific trendiness and you want your team or organization to be viewed as "with it." Though you don't have any work for me to do that will exercise this skill set, you like thinking about me as a "latent" resource, waiting to spring forth with all sorts of innovative value creation at the moment that changing political tides or market conditions will allow it (which predictably never comes).

2. You have halo bias about all of the soft skills that this role will require. Because you are a fan of my skill set or otherwise view my credentials and interview performance as impressive, and maybe you even like me, you will make biased inferences about my simulated behaviors and reactions to certain aspects of the job. You will infer that I will not become frustrated. You will infer that, of course, I will "just do what I am told" with no regard for the way my aptitudes and goals match up with what I am told to do. You will infer that regardless of how wildly inappropriate a task might be compared with my skill set, I will happily just "find a way to get it done." You will infer that the rampant political issues won't bother me, either because you think I will happily accept "junior" status and will somehow lobotomize away any critical thinking skills as they might apply to the political situation I walk into, or because you think that I value wage more highly than dysfunctionality-avoidance, again, due to the halo effect.

3. You are desperate to fill a seat. You're fighting your own political battles and many of them are based on attrition and headcount. Perhaps some of your yearly compensation incentives are based on building a team and the clock is ticking. Pretty much anyone will do as long as they meet some bare bones requirements that make the employment offer appear defensible on paper. You haven't given any thought at all to the impedance mismatch between what I am capable of and what the job will actually require. Nor do you care. You need to say whatever it takes to get my ass in the chair.

4. You are not knowledgeable about the domain-specific requirements of the position. Mostly you evaluate "personal fit" and "culture" or "team player" attributes in a candidate. You look for signs of impressiveness and credential on a CV. You have no idea whether my skills will be used in the job, and it may even be a significant surprise to you that there exists variety between people in my skill area, that we aren't all fungible, and that we may resent being placed in a job that is fundamentally different than what we set out to accomplish. Any discontent I display after being hired will be a surprise to you. You won't be situated to evaluate the domain-based merit of my claims, so you will default to believing that it is a problem with me -- that I am "not a team player" or "not a good fit" or some other HR-approved catch-all buzzword escape valve that lets you continue living in a snow globe of misunderstanding about the makeup of a domain expert.

5. You are fully aware of what you're doing and have ulterior motives or just plain don't care. You plan to bait-and-switch me by selling me a job completely different from the real work I'll be asked to do. You hope you can gain some leverage on me in the meantime that requires me to stay in the job. You might even look for this in my personal characteristics: do I have children, student loans, or a mortgage that might imply financial needs, and thereby a need to endure workplace bullshit to service those needs? If I don't, you may very well not hire me because you don't forecast an ability to get leverage. You may try to see if I am motivated by prestige, by eventual high-level promotions, by level-grinding my way to a private office, by attending annual conferences, or whatever other carrots you might be able to dangle.

6. Rarest of all: you have an adequate understanding of the domain expertise demanded by the duties of the job and you are trying to locate a candidate with specifically the right skills and experiences to meet the demands. You're not looking to play games. You're not looking to task me with unrelated or menial work. You know what work needs to be done and you have a real plan for mapping that work onto a candidate's qualifications. You hope that I will be a good cultural fit and you are prepared to make sacrifices, change policies, or provide accommodations if necessary. But you also realize that because you are hiring for the purposes of matching up a domain-expertise need with a candidate's domain knowledge, you don't get to be picky about enforcing your fluffy HR-approved notions of "cultural fit" or "team player" -- you have to collaborate with me to determine if those things will work out. You don't get to dictate them and because you actually care about solving the domain-specific problem, you don't want to dictate them either.

I admit that items 1-5 are written with an angry tone, yet they are accurate depictions of motivations out in the wild. In any real hiring scenario, the motives are likely to be combinations of the various options above. Even in good scenarios, when it is mostly number 6 that dominates, there can still be elements of the other items, and it's not always bad or irrational that this is so.

Yet when items 1-5 dominate the picture, which from my own experience, from the testimony of others, and from academic studies of this sort of thing is by all accounts the overwhelmingly dominant case, it creates a very toxic environment -- and truly it's only survivable in the long-run if you are happy to engage in those subordinating compromises I mentioned earlier.

So it might be useful to try to understand the confluence of items 1-5 more systematically, and I believe that the concepts of overqualification and underemployment (specifically underutilization of skill) can help with exactly that.

Basically, if you strip away all of my loaded language and try to see it not as malicious, ignorant, or political, these kinds of hiring problems are at their root an issue of underemployment, except possibly the case when any employee will do to fill a seat, as in item 3. In the other cases, a hiring manager is seeking someone overqualified for the specific duties that await them in the role.

Why should this be a bad thing? In fact some argue it is not. Even just a cursory Google search for overqualification brought up a link to a prominent Harvard Business Review article, The Myth of the Overqualified Worker. The article is pretty weak, but illustrates a pervasive kind of rationalization that managers really want to make. Basing its conclusions on some cursory and poorly controlled research publications, the article says things like, "In addition to achieving higher performance, these cognitively overqualified employees were less likely than others to quit. The researchers point out that many overqualified workers stay put for lifestyle reasons, such as the hours or the company’s values."

It perpetuates the idea that managers want to hear: overqualified candidates will more assuredly produce the baseline amount of labor output necessary for the role. The worry, that they will become discontented with the lack of learning or growth opportunity in the role, is soothed away by arguing that these folks are motivated by other factors, exactly the subordinating compromises that I keep incessantly bringing up.

The HBR article goes on to give a perfunctory nod to a factor that I believe plays a huge role in this issue: autonomy. For instance, the article continues,

"Berrin Erdogan and Talya N. Bauer of Portland State University in Oregon found that overqualified workers’ feelings of dissatisfaction can be dissipated by giving them autonomy in decision making. At stores where employees didn’t feel empowered, “overeducated” workers expressed greater dissatisfaction than their colleagues did and were more likely to state an intention to quit. But that difference vanished where self-reported autonomy was high."

This is backed up by some heavier research too. Generally, this type of work has focused on studying heteronomous goals (goals expected of you from others) versus autonomous goals (goals you choose for yourself). One branch of this theory is called Self-determination Theory (SDT) and one research paper from this approach is On Happiness and Human Potentials: A Review of Research on Hedonic and Eudaimonic Well-Being, by Ryan and Deci, 2001.

Here are some select quotes (see the original paper for the citations; SWB stands for subjective well-being):

"Another actively researched issue concerns how autonomous one is in pursuing goals. SDT in particular has taken a strong stand on this by proposing that only self-endorsed goals will enhance well-being, so pursuit of heteronomous goals, even when done efficaciously, will not. The relative autonomy of personal goals has, accordingly, been shown repeatedly to be predictive of well-being outcomes controlling for goal efficacy at both between-person and within-person levels of analysis (Ryan & Deci 2000). Interestingly this pattern of findings has been supported in cross-cultural research, suggesting that the relative autonomy of one’s pursuits matters whether one is collectivistic or individualistic, male or female (e.g. V Chirkov & RM Ryan 2001; Hayamizu 1997, Vallerand 1997)."

"Sheldon & Elliot (1999) developed a self-concordance model of how autonomy relates to well-being. Self-concordant goals are those that fulfill basic needs and are aligned with one’s true self. These goals are well-internalized and therefore autonomous, and they emanate from intrinsic or identified motivations. Goals that are not self-concordant encompass external or introjected motivation, and are either unrelated or indirectly related to need fulfillment. Sheldon & Elliot found that, although goal attainment in itself was associated with greater well-being, this effect was significantly weaker when the attained goals were not self-concordant. People who attained more self-concordant goals had more need-satisfying experiences, and this greater need satisfaction was predictive of greater SWB. Similarly, Sheldon & Kasser (1998) studied progress toward goals in a longitudinal design, finding that goal progress was associated with enhanced SWB and lower symptoms of depression. However, the impact of goal progress was again moderated by goal concordance. Goals that were poorly integrated to the self, whose focus was not related to basic psychological needs, conveyed less SWB benefits, even when achieved."

Another research paper, If money does not make you happy, consider time, by Aaker, Rudd, and Mogilner, 2011, puts it like this:

"... having spare time and perceiving control over how to spend that time (i.e. discretionary time) has been shown to have a strong and consistent effect on life satisfaction and happiness, even controlling for the actual amount of free time one has (Eriksson, Rice, & Goodin, 2007; Goodin, Rice, Parpo, & Eriksson, 2008)."

"Therefore, increase your discretionary time, even if it requires monetary resources. And if you can't afford to, focus on the present moment, breathe more slowly, and spend the little time that you have in meaningful ways."

(Both of these are part of a much larger review article at LessWrong, covered in the section on the relationship between work and happiness. That whole article is highly worthwhile.)

This can be a disaster in highly specialized jobs, however, because such jobs tend to be extremely demanding of both personal time sacrifices and on-the-job autonomy sacrifices. My experiences have been in the technology and financial sectors and in these places, it's bad. It's arguably even worse in start-ups unless you are sitting at the top of the start-up and personally feel that all of the necessary tasks for growing the business are aligned with your autonomous goals. This is why some start-ups obsess over locating employees who deeply resonate with the company's ethos and purpose. It's not because they want to create a cult of their company (although that does happen), and it's not purely because they want to rip off unsuspecting employees who incorrectly forecast that their enjoyment of the company will compensate them for the reduced salary that the start-up will pay them. It's also because it would be death for the company if they hire a lot of people who are highly skilled, and who need autonomous goals or lots of personal time in order to be happy, and cannot provide them with either. Some start-ups have begun going in the other direction, and trying out things like unlimited (or even mandatory) vacation, since the supply of workers who just so happen to deeply resonate with a particular business idea is necessarily scarce. The success of these kinds of discretionary time approaches seems mixed.

In the end, this is why these underemployment traps are so debilitating and why they often entail above market wages, bonuses, or other compensation benefits: the company believes they are obtaining less volatile, surplus labor, but they have little freedom in allowing the worker to have autonomy, and the nature of the job requires long working hours without much personal time. The job itself often leaves an employee exhausted and without the necessary energy to use limited personal time to undertake the restorative autonomous goal achievement they need to be healthy.

Prolonged states of this surely lead to burnout.

Saddest of all is that, like many things, there is a blame-the-victim culture in this issue. Since not everyone is underemployed or overqualified, and some workers happen to have jobs which afford them adequate free time and energy to pursue autonomous goals outside of work, and higher-level decision makers in a firm often have the most freedom to pursue work-based autonomous goals, it creates a very dangerous in-group versus out-group mentality.

On one side, you have the higher-ups who can access freedom at work, and you have the workers who are happy making subordinating compromises to obey heteronomous goals while at work because they are satisfied with autonomous goals outside of work. Together this collection forms a large group of people that characterizes itself by "being able to get shit done" and "just doing what needs to be done" at work. They view their fortunate ability to not feel cognitively distressed by the lack of work autonomy as their own virtue, earned through their efforts to endure work, rather than considering whether it could just be a lucky coincidence that they have other ways of obtaining the autonomous goal achievement they need to be happy.

On the other side, you have overqualified / underemployed people who for whatever reasons are not able to engage in autonomous goals at work, whose jobs place such a strain on their discretionary time that they also cannot get autonomous goal satisfaction outside of work, and for whom any compensation increases paid for this arrangement don't provide enough replacement satisfaction to keep them cognitively healthy. Take me, for example. The autonomous goals that I want to achieve are all about writing quality-focused scientific software to solve worthwhile applied problems. If I have to write crappy software to solve worthless problems while at work, in a demanding and long-hour job, then I will not have the time, energy, or impetus to even try to pursue the necessary autonomous goals in my personal time. So there is nothing that any workplace can do for me to help with my cognitive health and job satisfaction except provide me with opportunities to write the sort of scientific software that my autonomous goals draw me towards. Raises, bonuses, promotions, lots of vacation, etc., all won't work. Which makes me a villain (or perhaps a whiny, entitled brat) in the eyes of most bureaucratic managers.

As with so many other majority/minority issues, especially when stigmas of cognitive health are involved, the maligned, minority group is used as a scapegoat and vilified for the suffering they must endure. The problem is offloaded from the majority group, so that they need not feel any stress about helping to find a solution, and HR codewords can be created, such as "not a good fit" or "not a team player" that let tightly-wound business managers wrap the issue up neatly in some foil and place it in the trash can like the Anal Retentive Chef :)

The introvert / extrovert spectrum is another great example of this divide, manifested in the prevalence of open-plan offices and vilification of naturally-introverted folks who cannot function normally in such offices. It's not enough to merely fail to provide reasonable accommodations, even productivity-boosting accommodations that are in the business's interests: the in-group has to go further and label the vocal minority as whiny, complacent, or entitled. It can often result in unhealthy workplace gaslighting where you are made to feel like you are the crazy, problem person for having a sane reaction to insane conditions.

I can't draw any useful conclusions other than to point out what a destructive long-term force this type of phenomenon is. Over time, it drives organizations to monoculture. People who express very natural and healthy tendencies, such as a desire to either work on autonomous goals while at work, or else to have enough discretionary time to feel satisfied with autonomous goals outside of work, or people who express natural inclinations, such as an introvert's natural inclination to be more productive in a highly private environment, are punished and weeded out over time. The corporate population converges to a large, dominant in-group made up of people who are willing to subordinate their own urges for the sake of the company, with all kinds of unpleasant side-effects regarding their career motivations, their aptitudes in the actual domain-specific business area, and the prevailing culture of the workplace.

That is the state of affairs in modern first-world employment. A hiring process that seeks to underemploy people tends to produce cultural environments where only those who are happy to find another way to satisfy autonomous needs, or who can compromise those needs away, can achieve the corporate, HR-approved definitions of success. If you are so arranged internally that you cannot get rid of your itch for autonomous goals, and if your job doesn't leave you with enough discretionary time or energy to do it outside of work, then you are a Bad Guy, a toxic, uncooperative whiner that the bureaucratic system will not attempt to accommodate. Your labor productivity, however great it may be, just doesn't matter next to your organizational fealty.