Sunday, October 9, 2016

Notes on Bumble

1. Very few people bother to write anything at all. Among those who do write something, most write vapid lists of locations to which they've traveled or ineffectual attempts to be cute/quirky with an indecipherable smattering of emoji. Among people who write an actual message, the phrases "vegetarians swipe left", "no hook-ups or pen pals", and some variant of "the only thing I value in life is pizza" are frustratingly common.

2. A huge fraction, possibly even a majority, of women post photos in exotic locations, either as some kind of poorly disguised wealth-based status signal, or as some kind of desperate look-how-cultured-I-am gasp. It almost always comes off as complacent and entitled. To similar effect they also post a lot of photos of extreme skiing, surfing, skydiving, appearing on fancy boats, photos with celebrities, rock climbing, and photos holding up large fish they've caught (not kidding). Not just regular versions of one or two of these, but six pro-quality photos that are each more extreme than the last. Many profiles don't even include a single regular down-to-earth photo of the person in any plausibly reasonable situation.

3. Based on what I can find online, the Bumble algorithm is still fundamentally driven by men. Men on these sites tend not to be as selective as women and so a given man on the site has probably swiped right much more than a given woman. So basically you can think of a given woman on the site as a large stack of already-existing right-swipes from men who have visited her before. Those people form a queue from which profiles will be selected for her to see when she uses Bumble. For an attractive female user, that queue probably has hundreds or even thousands of men already in it. Unless you pay for some kind of extra feature, like LinkedIn premium, that puts you closer to the front, you can forget it. Swiping right is every bit as ineffectual as sending a lazy "wuts up" message in OKCupid -- in fact they are functionally identical given that many female OKC users disable notifications and don't bother checking many messages. Basically, the point is that for a man, swiping right on someone who hasn't already been presented with your profile and made her swipe decision in advance is just a waste of time. Unless her queue of swipe-righters is small (it isn't, since perpetually swiping right in Bumble just replaces the equivalent previous inundation of women with bad messages in other apps) she won't be presented with your profile for such a long time that neither of you will care.

This is not just a complaint from the male perspective either. This is bad for women too. This is why I don't understand why more women don't write a substantive bio and use somewhat more realistic photos. If you present yourself as an airbrushed travel goddess with no bio, you're just creating a larger queue of always-swipe-right types that you'll have to nauseatingly swipe left through to get to any good matches.

Basically, Bumble is an exercise in proving that the general public, at least between the ages of about 25 and 40, is terrible at game theory and exceedingly unrealistic, vain, and entitled.

Wednesday, September 28, 2016

Yes yes

Friday, August 26, 2016

Mindspace and Geospace

One of the most foolish things I've ever done was to pour my heart out and declare strong emotions for a girl that I barely knew who lived far away. In one sense I feel proud of how unabashedly I said the most foolish things. It showed me a capacity for feeling that I didn't know I had. But at the same time, the poor girl had no choice but to let me down and ask me to back off.

One of the things she said to me, with all of the reasonableness in the world, is that she wanted to keep our relationship strictly a friendship, at least in part due to a great geographic distance between us.

Now this is eminently reasonable. No one could fault her for feeling this way. It's entirely human. But I began to wonder why I didn't feel the same way. In fact, I had been part of a long-term, long-distance relationship earlier in my life: living in Indiana while the girl I was madly in love with at the time lived in Colorado. That relationship lasted three years: the first 18 months between Colorado and Indiana and the final year often spent about an hour apart on the East Coast. The relationship had many wonderful parts, but ultimately didn't work out. Yet neither I nor my partner ever felt significantly bothered by the geographical distance.

Why is that? Are some people oriented such that long-distance relationships don't bother them? Am I such a person? I started to ask myself a lot of these questions and the thing I came up with is that really there are many ways to be distant from someone else. When I lived in Indiana, there was a large community of Amish people living nearby. But really they couldn't have been farther away. I would speed past their horse-drawn buggies blasting Arcade Fire over a bluetooth connection from a device in the palm of my hand that contained more sophisticated technology than the entirety of that particular Amish commune.

So that's cultural distance and of course we probably all agree that relationships don't tend to blossom between people who are culturally very distant. I think there is a bit more to it than culture. I came up with the label "mindspace" to represent the vast mental space where things like culture, sense of humor, personality style, and other mental characteristics live. The contrast would be "geospace" -- the space of physical bodies, oceans, miles of prairie, highways, sky, and physical distance.

What is bigger, mindspace or geospace? I argue that mindspace is so much bigger than geospace that it makes geospace almost entirely irrelevant -- as long as physical communication is still possible (so that two minds can exchange info). Geospace feels big to us because it's what we're adapted to care about. Geospace meant life and death on the savanna. Distance to the river. Distance back to the tribe for help. Distance to scramble up that tree. Deep in our ancient meat software we've got programs dedicated to the awe of geospace distances.

But we're far worse at understanding mindspace distances. We approximate them with memes, like copying popular trends or expressing allegiance to (or ironic disregard for) the prevailing political coalitions in our various communities. These are frankly small potatoes. There are so many variations on human mentality and so many different directions to engender that variety: openness to new experiences, sexuality, kindness, response to stressful situations, creativity, worldview, emphasis on faith, intuition. The distances between two people on even just one of these scales can be much larger than the visceral meaning of their physical distance.

What I realized is that in my mind, I automatically evaluate potential relationship partners almost totally in terms of mindspace distance. Do they live far away from me in terms of the aspects of their mental landscape? The feeling of love at first sight for me is a reflection of instantly recognizing characteristics that make someone close to me in mindspace. Their geospace distance basically doesn't even factor in at all. It's just too insignificant.

It has helped me to understand this about myself because not everyone feels this way. Even people whom I immediately recognize as very close to me in mindspace don't necessarily put mindspace distance above geospace distance. And in fact many people subscribe to prevailing wisdom that you should make something like a personal rule: don't even evaluate the mindspace distance between you and someone else unless that person already has a small geospace distance. Basically, imagine a geospace radius around yourself. Everyone inside that radius is allowed to be considered for mindspace suitability. Everyone outside: tough luck.

This is alien to me ... because out of the total distance (mindspace and geospace), geospace distance is perhaps the least meaningful part. To put it another way: if you see someone outside of your geospace radius who is nonetheless amazingly close to you in mindspace, their mindspace-closeness is such an amazing rarity that you should be falling all over yourself to establish a strong connection. Mindspace-closeness is so, so precious and rare that when you experience it in another person, it's not unreasonable to suddenly feel like you need to drop everything to secure it.

This last realization led to a further discovery: the weirder you are in mindspace, the more you will emphasize mindspace-closeness above geospace-closeness. If your mental characteristics differ from most of the people you meet, then by definition geospace-closeness isn't helpful to you for finding a partner. So naturally, you'll stop counting physical proximity so highly. On the other hand, if your mental characteristics are more or less the same as just about everyone you meet, then you won't believe that mindspace-closeness is as exceptional as it really is -- being mentally close to others will be a common experience for you.

This explains why the radius rule I described earlier is such a common belief. If most people are near some center of the distribution of mental traits among their community, then that large fraction of the population will be well-served by having a geographic closeness prefilter. And people on the fringes between average and weird will want to gravitate towards average to fit in, so they'll also endorse and adopt that sort of rule, and it will be venerated as relationship common sense or pragmatism.

Meanwhile, if your mental properties make you a weird enough human -- far enough away from the "normal" center such that you don't even fit in on the fringes -- you won't be helped much by the radius rule, and your passionate connections will reside more in mindspace than in geospace.

Like with many things, my way of looking at it (mindspace matters much, much more than geospace) seems uncommon, and is actually treated with some degree of hostility by folks who are pretty committed to valuing close geographic proximity for in-group reasons. So I guess weird folks have to fake all the talk and verbally reinforce the idea that geospace distance is a hugely important first filter while suffering that search for rare and precious mindspace neighbors.

Sunday, August 14, 2016

Your heart

Whenever your heart must tell you something you must take it and place it in scare quotes and set it in italics and paste it into a comment box and then type yeah right or lolwut and click post.

Tuesday, August 9, 2016

Garbage Can Regressions -- Back to Some Regularly Scheduled Programming

On the wonderful tech blog The Green Place there is a recent introductory article about linear regression. It's a great introduction to the topic and treats the different mathematical approaches to actually solving for a linear regression model fit in a well-rounded manner -- highly reminiscent of the famous lecture notes from Andrew Ng.

But one issue with this kind of popular treatment of linear regression is that it tacitly endorses linear regression as a sort of Swiss army knife modeling tool. There is an Indiana Jones vibe about linear regression: it just somehow gets the job done. Even if the situation is messy and complex and linearity is obviously just a convenient fiction, regression will supposedly crack its whip around that deus ex machina wooden post and swing you safely across the chasm before the rocks cave in on you.

I was delighted to see a commenter pick up on this in a forum where the regression post was being discussed:
"There is one conceptually simple issue often missing in such nicely presented write-ups (and which appears to be missing here): error in the abscissa ('x-axis') values. Things like time-series tend to dominate such analytics and in such data collections it's typically assumed that the time-stamp data is of suitable accuracy that any error there can be neglected. But there are many other data sources where there is notable error in both 'x' and 'y' data for which commonly employed linear regression doesn't allow, (quick example: my friend the hydrologist collects flow rates in rivers at sample transverse distances which are hard to be sure of as one is dangling above the raging waters). As a respectable starting point for regression which allows for error in both axes i'd recommend Deming regression: < https://en.wikipedia.org/wiki/Deming_regression >"

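To make the commenter's point concrete, here is a minimal sketch of Deming regression next to ordinary least squares. The simulated data, the noise levels, and the function names `ols_slope` and `deming_slope` are all assumptions made up for illustration; the slope formula is the standard closed form, with `delta` being the (assumed known) ratio of the y-error variance to the x-error variance.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (treats x as error-free)."""
    sxy = np.cov(x, y)[0, 1]
    return sxy / np.var(x, ddof=1)


def deming_slope(x, y, delta=1.0):
    """Deming slope; delta = var(y errors) / var(x errors), assumed known."""
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    diff = syy - delta * sxx
    return (diff + np.sqrt(diff**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)


# Hypothetical simulated data: a true line with slope 2, observed with
# equal-variance measurement error in BOTH x and y.
rng = np.random.default_rng(0)
n = 5000
true_slope = 2.0
x_true = rng.uniform(0.0, 10.0, size=n)
y_true = 1.0 + true_slope * x_true
x_obs = x_true + rng.normal(0.0, 1.5, size=n)
y_obs = y_true + rng.normal(0.0, 1.5, size=n)

b_ols = ols_slope(x_obs, y_obs)
b_dem = deming_slope(x_obs, y_obs, delta=1.0)
print(f"OLS slope:    {b_ols:.3f}")
print(f"Deming slope: {b_dem:.3f}")
```

In this setup the OLS slope is pulled toward zero by the noise in x, while the Deming estimate lands close to the true slope. The catch is that `delta` has to come from somewhere, and in practice it is often only a rough guess.
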
One special case of this general effect is non-linear coding error, especially in cases when it ends up being the level of a covariate (i.e. the log of the covariate) that matters for causal inference, or when some covariates are categorical or isotonic.

The paper Let's Put Garbage Can Regressions and Garbage Can Probits Where They Belong by Christopher Achen is a great discussion about some particular properties of this, and the tacit assumptions used to ignore it.

In that paper it's demonstrated that with just a tiny bit of coding error in the covariates you can end up with a fitted regression coefficient that is statistically significant and has the wrong sign -- even when there is no noise whatsoever in the target variable (i.e. you can set up a toy example in which the target variable is synthetically generated as a true linear function of two covariates with positive coefficients, then perform a slight non-linear distortion on one of the covariates, regress the synthetic target variable on the clean covariate and the distorted covariate, and get wildly incorrect coefficients that appear to be statistically significant).
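
A toy version of that setup can be sketched in a few lines. To be clear, this is my own construction in the spirit of the paper, not Achen's actual data set: the grid, the choice of `x2 = t**2`, the power-1.5 miscoding, and the helper `ols` are all assumptions picked for illustration.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and their classical t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))


# Two correlated covariates, both with POSITIVE true coefficients.
t = np.linspace(0.0, 1.0, 200)
x1 = t
x2 = t**2
y = 1.0 * x1 + 0.1 * x2        # exact linear function, no noise at all

# Assumed miscoding: x1 is recorded through a monotone non-linear distortion.
x1_coded = x1**1.5

# Regress the clean target on the distorted x1 and the clean x2.
X = np.column_stack([np.ones_like(t), x1_coded, x2])
beta, tstats = ols(X, y)

print("coefficients (const, x1_coded, x2):", np.round(beta, 3))
print("t-statistics (const, x1_coded, x2):", np.round(tstats, 1))
```

Here the fitted coefficient on `x2` comes out negative with a large t-statistic, even though its true coefficient is +0.1 and the target has no noise. The distorted `x1` can't reproduce the linear term on its own, so the regression borrows `x2` with the wrong sign to patch up the curvature.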

People seem to think these toy examples are some kind of alien phenomenon that could never happen with real-world data, but the paper is very explicit in the construction of the example data set. It's not harebrained or contrived, like Anscombe's Quartet or something. It's very much a plausible data set.

I think it's not hyperbolic at all to say that results like this more or less conclusively show that naive linear regression cannot be trusted. If you're careful with model validation, using randomized hold-out data, lots of diagnostic plotting and sanity checking, then regression is a fine tool. But if you do something shocking like take two different univariate models with the same target, fit their regression coefficients, and then select the model with a more favorable t-stat as "the winner" then you are committing an egregious statistical fallacy that often, in real-world situations, is giving you not just an inaccurate answer, but an answer pointing totally in the opposite direction of the truth.

What's frightening to me is that across many industries, even in places like high finance -- where "real money is on the line" -- it is extremely common to see huge business intelligence systems predicated entirely on this type of fallacious statistical approach with regression. Sadly, it's often because the regression approach was historically more tractable and the fallacies weren't as well known. And so as certain people gained more senior positions and sought to retain political control of the business tools that they oversaw, they grasped for convenient fictions like "interpretability" to justify their political choice to shun modern techniques.