
12 February 05. Apophenia

Just a little note that I've finally put up my library of stats functions for public consumption and modification.


Before you all lose interest entirely, the name means a tendency to see patterns in static, the fundamental human tendency that statistics aims to combat. That is, the intent of good statistics, as far as I'm concerned, is not to uncover facts that we as humans couldn't work out ourselves, but to invalidate all of those claims that we come up with all day long but which are just an overactive imagination at work. Using statistics to uncover patterns we hadn't seen before is called `data mining', which used to be a dirty word, an accusation you hurled at authors whose papers you didn't like, but management types have grown fond of it and now hire people to do it. I was once invited to a data mining conference.

But back to me. You may recall that a few months ago, before the software patent book, I'd written half a book about doing statistics in C. It received resistance because (1) I'm not a great writer and (2) it rejected the Universal Truth that all statistics must be done using a statistics package. One publisher, I recall, was rather explicit about (2).

But I still work in C, and I still do statistics, and there are other people out there just like me, I just know it. My desires are supremely simple: I just want a good, reliable toolbox. I'm not a visual person, and rarely do exploratory stuff in the way of just meandering through a data set. Usually, I have a specific question and want a nice, precise answer. That is, I want to apply a specific tool to the data.

The cute user interface of the typical stats package is also wasted on me---and it should be wasted on you too. If a reader asks nicely, you should be able to send him/her/it instructions for replicating your procedure from raw data to little stars after the t-statistic in your paper, and that means eschewing the clicky buttons for a written script.

From there, it's just a question of the language one wants to write the script in, and ya know, C is the language for me. I've already blathered about this, but here's the executive summary: I'm f.ing tired of learning new languages. I was feeling as though every time I started a new project, my former favorite language couldn't handle it. `Oh, you have limited dependent variables? Then use Limdep. Maximizing subject to constraints? That's what GAMS is for. You wanna do lots of matrix operations? Then switch back to Matlab. Except we don't have a license for it here.'

The library approach means never having to switch languages again: you'll need to find some C functions for the new trick you're trying to pull, but the ugly syntax is exactly the same, the environment in which you program doesn't change, and if you had some cool functions in the last program that you want to reuse, you can call them directly. All that language-specific knowledge about what's easy or hard, and where you need to be careful not to mis-state things, only builds from project to project. You just have to learn one sufficiently versatile language that can handle anything that may come forth, like C.

This project's contribution: a library of statistical functions at the same level as the stats packages. That is, a function that does OLS, a function that does factor analysis, et cetera. The lower-level stuff--shuffling matrices around, drawing from Gaussian distributions, querying the data--is handled by other libraries, so we don't have to worry about it.
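
To make this concrete, here is a minimal sketch of the kind of one-call estimation I have in mind. It leans on the GNU Scientific Library's single-regressor fit rather than on my library, and the data are made up, but the point stands: the statistical operation is one function call sitting in ordinary C.

    #include <stdio.h>
    #include <gsl/gsl_fit.h>    /* gsl_fit_linear: one-regressor OLS */

    int main(){
        /* Toy data: y is roughly 2 + 3x plus noise. */
        double x[] = {1, 2, 3, 4, 5, 6};
        double y[] = {5.1, 7.9, 11.2, 13.8, 17.1, 19.9};
        double b0, b1, cov00, cov01, cov11, sumsq;

        gsl_fit_linear(x, 1, y, 1, 6, &b0, &b1,
                        &cov00, &cov01, &cov11, &sumsq);
        printf("intercept: %g (variance %g)\n", b0, cov00);
        printf("slope:     %g (variance %g)\n", b1, cov11);
        return 0;
    }

Compile with something like gcc ols.c -lgsl -lgslcblas -lm; the same flags serve for any of the GSL-based sketches in these pages.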

So, dear reader, next time you find that you need to do a new statistical analysis, and your current language du jour doesn't work, give C a try. Maybe the library already has the function you need, and you're done. Maybe it doesn't, in which case you can exert the effort you would have taken to learn a new language and write the necessary function, but then you can contribute it for future use by the rest of us.

11 April 06. Anti-intellectual


Pundit is a term from Hindi meaning “wise and learned man”, but it is usually used sarcastically in modern parlance. But, y'know, I don't feel so sarcastic about it. You can decide the “wise” part for yourself, but having spent a couple of years studying the narrow topic of subject matter expansion in patent law, I am confident describing myself as an authority. It's been months since I've heard a new argument on either side of the debate, and the new facts I'm learning are increasingly fine details. I don't feel any hubris when I say that nobody is going to blindside me on the tiny, narrow bit of subject that I have chosen for myself.

And ya know, most of the arguments that I have presented in various media and to various bigwigs over the last few months are arguments that numerous non-experts have also made.

I often run into people who divide academic results into two categories: (1) things anybody could have come up with after a bit of thought, and (2) things that are too esoteric to be worth anything. Some exceptions are made for chemists and engineers, whose work the commonsense folk concede is esoteric but will somehow eventually lead to new toys or a cure for something; but everybody else--the mathematicians who study tensors in R^14, the biologists who study odd tropical flora, and most importantly, the anthropologists and sociologists and economists who study people, whom we all study every day--are wasting their time and our money.

Findings
Nor is the righteous `my common sense trumps their PhDs' attitude restricted to the stereotypical hick. The back page of Harper's magazine, the page most magazines reserve for the humorous finale, is the Findings section, which lists a series of out-of-context study results. From the March 2006 issue: “...It was discovered that guppies experience menopause and that toxic waste in the Arctic was turning polar bears into hermaphrodites. ...A survey found that Americans are becoming less repulsed by the sight of obese people. Scientists launched a study to determine what sorts of clothing make a woman's bottom look too big. A study found that Americans are more miserable today than they were in 1991, and British researchers discovered that many young girls enjoy mutilating their Barbie dolls.”

OK, what are we to make of this? What message is being sent? Mashing together the studies means that the findings do not add up to any real image of the world, even if the page does categorize the findings for some sense of flow. Readers can't drop these tidbits into cocktail party conversation, because they only have one piece of information and so aren't armed for even the simplest follow-up. Interested readers can't learn more, because there are no citations. More importantly, there is no context: we are not given the reason for studying guppy reproductive systems, so we don't know why a scientist would care to do such a thing.

Being the back page, we know that it's supposed to be humorous, and with everything taken out of context, it can be, the way that so many statements out of context or in a different context are funny. But there's also the sense of laughing at the scientists. The subject of every sentence (but the passive-voice ones) is a researcher or a study or a survey. If the editors just wanted to list facts, they'd say “Americans are becoming less repulsed...” but instead they waste ink pointing out that “A study found that Americans are becoming less repulsed...”.

If there were an American Association Against Science, it would probably reprint the Findings page verbatim. This other AAAS would ask, in big red letters, “Why are we spending money on this?” and the answer would be nowhere to be found.

But you know that I spend all day studying obscure features of people's behavior and reading math books, so it's no surprise that I'm anti-anti-intellectual. It's no secret that if I had an anti-intellectual in the room here, I'd tell him or her (reading from Harper's again) “New data suggested that Uranus is more chaotic than was previously thought.”

[See, statements in a different context are downright hilarious!]

But it goes further than my kind of academic. The anti-intellectual sentiment--the insistence that it's either common sense or it's not worth the trouble--is a belief that there is no such thing as an expert. It is the myopic belief that if I don't know it, then there's nothing to know. As such, the anti-intellectual sentiment is often aimed at targets far afield from intellectuals.

At the Baltimore Museum of Art, the same establishment that houses Picasso's Mother and Child, are such aggressively simple works of art as two silkscreen reprints of the Last Supper, and a curtain of blue and silver beads. Some readers will recognize the first as a work by Andy Warhol, and thus know the context: Mr. Warhol felt that the repetition and mutation of familiar images created new perspectives. For the second, as for a great deal of art that was clearly easy to execute, we don't know the context at all.[1] But even though we don't know it, there is a context. The guy went to art school, has had a few focal ideas that drove all his work, and has done years of pieces that led to this simple bead curtain.

So what is an expert to do? One approach is to always stick to things that are obscure and look hard. Make sure that every study, every work of art, every essay says fuc* you, I'm an expert and you can't do what I do. But we value people who make it look effortless, whether they're figure skating, producing a painting, or running regressions. We always value simplicity, so if all it takes to get across the message is a curtain of beads, then why overcomplicate things to remind the viewer that it took years of work to get there? Some of the best guitarists out there never really ventured past four chords, while the guys who can play intricate solos are often dubbed wankers.

I'm glad I wrote my PhD thesis, and I love the idea of a thesis in general, for high school seniors, BAs, or anywhere in between. A good thesis means that the author has become an expert in some tiny, irrelevant little corner of the world. Research ability by itself is valuable, and it's good practice for when the student needs to be an authority in something of more practical value, but it also gives the student an idea of what the other experts of the world have gone through to get to their simple ends. Remember that part in Zoo Story where the guy says that “sometimes it's necessary to go a long distance out of the way in order to come back a short distance correctly”? A student who has gone a long way in becoming an expert, and must then reduce that to the sort of ten second summaries that we all give to friends and family, will have a better understanding of the long distance that other experts have gone before they could string together simple words or beads or chords.


Footnotes

[1] Sorry, I can't help the art snobs in the audience with the guy's name. Enjoy being in the dark with me here.




14 May 06. The Web as human network


I'd like to discuss the question of how technology has changed personal relations. That'll come next time. For now, let's look at a specific, vaguely related question: does the link structure of the Net mirror the link structure of human networks?

Back when Alta Vista was the highest view in Internet search, a few IBM and Alta Vista researchers did a rather detailed study of the Web's structure (1). They, as with many others, found that the distribution of links on the Net looked a lot like the distribution of human links. There is a power law distribution where there are a few sites that are linked endlessly, and a long tail of sites that only have a few links.


Figure One: Junior high class photo. That's me on the far right.

To give an example of a power law, here is a graph based on data from junior high classes. The most popular student is at the far left of the X-axis (at X=0), and was nominated as a best friend by a mean of 9.75 other students (over the 88 classrooms in the sample). Over at the other end of the X-axis, the students ranked 25th through 35th in the classroom were each nominated as a best friend by a mean of less than one other student. So you've got a few very well-connected students and a lot of students who have no connections at all.

We see this pattern in social networks of all scales, and among Web pages. The nomination count graph is typically a little more curvy than this one, with even more of a steep slope down from the most popular members of the group and a longer tail at the other end.

It sounds like the WWW as interpersonal network metaphor is working OK, but two caveats: first, there is much debate as to whether the best fit for the link distribution of the Web is a Negative Exponential, a Gamma, a Zipf, or a variety of other distributions that all look identical to a non-expert. Unless you hope to study this stuff seriously, you don't have to care about this caveat and can just call it a power law. The best fit to the student data is a Gamma distribution, by the way.

Second, human networks are pretty symmetric, in that there are few face-to-face contacts where one party is ignorant of the other. The exception is celebrities, whom we know but who don't know us, but we can throw those out and still have a reasonably symmetric set of acquaintance links. The popular kids may not want to hang out with the unpopular ones, but they know them nonetheless. But with Web pages, it happens all the time that a page makes no indication of what other pages are linking to it.


Figure Two: The Insidious Bowtie of Nyrothænim, aka The Internet.

Broder et al found that this asymmetry occurs on a grand scale. They divide the Web into a giant Strongly Connected Component (SCC) comprising about a quarter of the Web; these are sites that interlink with each other. Then there's a quarter that only links in to the SCC but does not receive links. That would be blogs from losers like me. Then there's a quarter that is linked from the SCC but does not link to anything in particular, comprising corporate sites that just go in internal circles and things like online books and manual pages that are informative but not filled with links. The final quarter, they called `tendrils', indicating a trail of limited links that doesn't readily fall into the first three categories. Thus, because a web page is not a person, the symmetry of human networks does not map to web links.

Another important distinction is that the whole small world game, where we try to find a chain of people from a guy in Katmandu to a guy in Omaha, does not work for the Net, because if you start on the right side of the bowtie, you can not get to the left side. For humans, you can almost certainly find a chain, and it'll be well under ten people in almost all cases; for the Net, you only have about a 25% chance of being able to form a chain from any randomly selected site to any other randomly selected site (roughly: your starting site has to be in the SCC or the in-flowing quarter, your target has to be in the SCC or the out-flowing quarter, and a half times a half is a quarter). E.g., try getting from This haphazard site in Canada to this site here (hint: you can't). When you can form a chain, say from the in-feeding region to the SCC region, it can still be hundreds of nodes long if one element is well-buried in a subculture.

Now, with human networks, we can distinguish between acquaintance, which is almost by definition symmetric, and friendship, which is depressingly unidirectional, typically flowing from low-status to high-status. I don't believe this version of the metaphor has been studied much, but it doesn't work very well: the net receivers of links on the Net are not high-status pages, but pages that just provide information (corporate, technical, whatever).

But getting back to the part of the metaphor that does work, there are two characteristics to both networks. First, there's a cost to linking both socially and online, because you need to find the subject of your interest and know them. Second, there is a cost to searching for new links. An immediate corollary to expensive search is a principle that the rich get richer: the easiest way to find new links for your own personal address book is to ask others for their contacts, so well-linked people/sites are more likely to get more links.
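
Here's a toy version of that rich-get-richer story, since it's easier to see in code than in words: every new site gets one link of its own and then links to an existing site with probability proportional to the links that site already has. This is my own illustration, nothing from the Broder et al paper, using the GSL's random number generator.

    #include <stdio.h>
    #include <gsl/gsl_rng.h>

    #define SITES 10000

    int main(){
        gsl_rng *r = gsl_rng_alloc(gsl_rng_taus);
        int  links[SITES];
        long total, draw;
        int  site, target, i, j, max;

        links[0] = links[1] = 1;    /* seed the network with two linked sites */
        total = 2;

        for (site = 2; site < SITES; site++){
            /* pick an existing site with probability proportional to its links */
            draw   = gsl_rng_uniform_int(r, total);
            target = 0;
            while (draw >= links[target])
                draw -= links[target++];
            links[target]++;
            links[site] = 1;        /* the newcomer starts with one link of its own */
            total += 2;
        }

        /* print the ten best-linked sites; expect a heavily skewed list */
        for (i = 0; i < 10; i++){
            max = 0;
            for (j = 1; j < SITES; j++)
                if (links[j] > links[max]) max = j;
            printf("rank %2d: %d links\n", i+1, links[max]);
            links[max] = 0;
        }
        gsl_rng_free(r);
        return 0;
    }

Run it a few times and the top of the list dwarfs the bottom, which is the power law again.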

More on this next time.

(1) Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener, “Graph Structure in the Web,” Computer Networks 33 (2000), pp 309-320.




on Thursday, May 18th, L-San Diego said

when did you put the h for human box in? what's it mean? i'm very fascinated about this.

please explain.

Oh, got it.

I'm not a machine! I'm a man!!!!!



26 May 06. Invariants


This is about two technological revolutions that didn't happen, and aren't going to happen any time soon.

To some extent, this is also about a recent revolution in economics, where the study of how people interact has shown that there ain't nearly as much variation as we'd thought before: what we thought was wide variety is actually just a combination of invariants. More generally, it's a result of the computational progress that has allowed us to pay more attention to distributions that are not in the Gaussian family (binomial, Normal, t, F, chi-squared), like the exponential, Poisson, Zipf, &c.

The problem is that we humans have limits, and they have not in any way changed thanks to technology. The key limits are time and memory.

Who here bought R.E.M.'s Out of Time on vinyl or cassette?

The first result of these limits is the size of our comprehensible network. That is, how many people do I know well enough that I could hold a friendly conversation with them?

We can connect faster via cellular telephones, email, ntalk, or whatever point-and-talk technology has emerged since I wrote this, and so the time spent connecting is shorter, and we can cheaply connect to more distant people. But once the connection is made, we still have to resort to just talking or writing as before. This takes time, and the new toys don't speed this up at all.

Sure, you've got Friendster (or whatever the cool kids are using these days) allowing you to browse through photos of your pals, but back in the day, you had a paper address book, with scraps of everything hanging out of it, that let you do the same thing.

de Sola Pool and Kochen [C] made various attempts at estimating the number of acquaintances that a person has, and found that folks generally have about 1,500 immediate acquaintances whom they will see once or twice over the next two months and say hi to, and then about 4,500 less direct acquaintances, like the people from college whom they'll only see every few years. Perhaps our online networks have sort of blurred the lines on the close-by acquaintances and distant acquaintances, but how many hundreds of your high school pals have emailed you lately?

But that's all scale: what about structure? Are our social hierarchies flatter and more egalitarian now that we've got the Net? Again, no. We still see the same sort of pattern we saw in last episode: a few people who are very well connected and a lot of people who are minimally connected. The debate (about which I am no authority) is whether this is because some people have a higher capacity to maintain pals, due to more time dedicated to it or an innate name-and-face memory; or because of a rich-get-richer story that people find new pals via their old pals, so those who are well-networked will only wind up better-networked in the future. The true story is no doubt a bit of both.

Costly maintenance of links and costly search for new links have not changed for us humans. Generally, if you've got both of those characteristics, you're going to have a network that looks like standard social networks, and if those limits are set by the human brain and our 24 hour day, then the scale of those networks is set.

Content
Moving on from social networks, the second limit is in what we can produce. If you spent every minute of the next year typing away at your keyboard, your computer's hard drive would barely notice it all. [1 word = about 6 bytes. Given 6 bytes × 60 words per minute × 1,440 minutes per day, that's 518,400 bytes/day; in a year, about 180MB.] For most of us, everything we ever wrote would easily fit onto a single CD. That is, the technology of text processing has blown past the human ability to produce text.

For music and still pictures, we're in about the same place. The roadblock is not in storage and transmission, but in the process of finding artistic inspiration and the time and skill needed to execute it. Moving pictures are not far behind, and twenty years from now, downloading a movie won't take a moment's thought by anybody. Nobody will worry about the price of film stock, but the process of writing and producing a movie will still be a massive effort.

On the consumption side, it still takes 70 minutes to listen to Beethoven's Ninth, though you no longer have to get up and flip the disc in the middle. It still takes 90 minutes to watch a ninety-minute movie. The `read any day now' pile of articles on my hard drive has certainly grown, but the `articles I've read' pile grows at the slow, steady pace it always has, and the `articles I remember reading' pile continues to wither.

So scale is again set. As for structure, we find that there is again the same power-law type distribution in consumption. If we plot sales against Amazon sales rank on a log-log scale, we find that the relationship is linear. In other words, the top ten best-selling books sell ten times as much as the bottom of the top 100, and those sell ten times as many as the bottom of the top 1,000, and so on down into the millions. [Below the top sellers, by the way, the ranking is basically the order of last sale.] That is, content is another power law, and that structure doesn't change with onlineness: before millions of blogs only read by three people, there were `zines only read by three people, and before that, letters.
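
For the record, here is the arithmetic behind those factors of ten, assuming the slope of that log-log line is exactly minus one (the pure Zipf case; a different slope would give a different factor):

    $s(\mathrm{rank}) = C/\mathrm{rank} \quad\Longrightarrow\quad \frac{s(10)}{s(100)} = \frac{C/10}{C/100} = 10.$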

So the distribution of book popularity happens to match the distribution of people popularity, which is no surprise, because the same two problems--costly search and costly linking/consumption--are an issue in both cases.

Policy implications
We are all more-or-less as networked as we're going to be by maybe age sixteen [socially; sexual networks follow different patterns from social networks, and tend to take more of a rich-get-richer form.[S]]. When you meet somebody new, they're crowding out somebody else, as time spent cultivating your new pal is not time spent cultivating the old. The same works for entire networks: just as advertisers must compete for your few dollars, networks must compete for your limited networking resources. Similarly, having a wealth of new content available just means that we have a wealth of things that we'll never read because they're crowded out by the other things we're reading.

I don't mean to say that the Web as a whole is a stagnant waste or that our information processing abilities are irrelevant. But with regards to certain basic human desires, we arrived about fifteen years ago when everybody got a PC, and everything since then has just been adding more features, giving you one more place where you can start a blog and one more list of contacts to keep synced.

[C] Ithiel de Sola Pool and Manfred Kochen, “Contacts and Influence,” Social Networks 1(1) (1978/79), pp 5-51.

[S] Fredrik Liljeros, Christofer R. Edling, Luís A. Nunes Amaral, H. Eugene Stanley, and Yvonne Åberg, “The Web of Human Sexual Contacts,” Nature 411 (21 June 2001), pp 907-908.






10 September 06. The statistics style report


It may sound like an oxymoron, but there is such a thing as fashionable statistical analysis. Where did this come from? How is it that our tests for Truth, upon which all of science relies, can vacillate from season to season like hemlines?

Before answering that question, note that statistics as a whole is not arbitrary. The Central Limit Theorem is a mathematical theorem like any other, and if you believe the basic assumptions of mathematics, you have to believe the CLT. The CLT and developments therefrom were the basis of stats for a century or two there, from Gauss on up to the early 1900s when the whole system of distributions (Binomial, Bernoulli, Gaussian, t, chi-squared, Pareto) was pretty much tied up. Much of this, by the way, counts not as statistics but as probability.
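
For reference, here is the plain-vanilla independent-and-identically-distributed form of the theorem, stated loosely (any probability text has the fine print): given draws $x_1,\dots,x_n$ from any distribution with mean $\mu$ and finite variance $\sigma^2$,

    $\sqrt{n}\,\frac{\bar{x}_n - \mu}{\sigma} \;\longrightarrow\; \mathcal{N}(0,1) \quad\text{as } n \to \infty,$

meaning the sample mean is approximately Normally distributed no matter what the draws themselves look like.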

Next, there's the problem of using these objective truths to describe reality. That is, there's the problem of writing models. Models are a human invention to describe nature in a human-friendly manner, and so are at the mercy of human trends. Allow me to share with you my arbitrary, unsupported, citation-free personal observations.

Number crunching
The first thread of trendiness is technology-driven. In every generation, there's a line you've got to draw and say `everything after this is computationally out of reach, so we're assuming it away', and the assume-it-away line drifts into the distance over time. Here's a little something from a 1939 stats textbook on fitting time trends:

To fit a trend by the freehand method draw a line through a graph of the data in such a way as to describe what appears to the eye to be the long period movement. ...The drawing of this line need not be strictly freehand but may be accomplished with the aid of transparent straight edge or a “French” curve.

As you can imagine, this advice does not appear in more recent stats texts. In this respect, a stats text can actually become obsolete. However, true and honest approximations like this are relatively rare. Instead, more computing power allows new paradigms that were before just written off as impossible.

Computational ability has brought about two revolutions in statistics. The first is the linear projection (aka, regression). Running a regression requires inverting a matrix, with dimension equal to the number of variables in the regression. A two-by-two matrix is easy to invert (divide by ad - bc, remember?) but it gets significantly more computationally difficult as the number of variables rises. If you want to run a ten-variable regression using a hand calculator, you'll need to set aside a few days to do the matrix inversion. My laptop will do the work in 0.002 seconds. It's still under a second up to about 500 by 500, but 1,000 by 1,000 took 5.08 seconds. That includes the time it took to generate a million random numbers.
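
For the curious, here is roughly the little program behind those timings: fill an N-by-N matrix with uniform random draws and invert it via the GSL's LU decomposition. A sketch, not a careful benchmark.

    #include <stdio.h>
    #include <time.h>
    #include <gsl/gsl_linalg.h>
    #include <gsl/gsl_rng.h>

    #define N 1000

    int main(){
        gsl_rng         *r   = gsl_rng_alloc(gsl_rng_taus);
        gsl_matrix      *m   = gsl_matrix_alloc(N, N);
        gsl_matrix      *inv = gsl_matrix_alloc(N, N);
        gsl_permutation *p   = gsl_permutation_alloc(N);
        int     signum;
        size_t  i, j;
        clock_t start = clock();

        /* a million random numbers to fill the matrix... */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                gsl_matrix_set(m, i, j, gsl_rng_uniform(r));

        /* ...then invert via LU decomposition */
        gsl_linalg_LU_decomp(m, p, &signum);
        gsl_linalg_LU_invert(m, p, inv);

        printf("%i x %i inversion: %g seconds\n", N, N,
                (double)(clock() - start)/CLOCKS_PER_SEC);

        gsl_matrix_free(m); gsl_matrix_free(inv);
        gsl_permutation_free(p); gsl_rng_free(r);
        return 0;
    }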

So revolution number one, when computers first came out, was a shift from simple correlations and analysis of variance and covariance to linear regression. This was the dominant paradigm from when computers became common until a few years ago.

The second revolution was when computing power became adequate to do searches for optima. Say that you have a simple function to take in inputs and produce an output therefrom. Given your budget for inputs, what mix of inputs maximizes the output? If you have the function in a form that you can solve algebraically, then it's easy, but let us say that it is somehow too complex to solve via Lagrange multipliers or what-have-you, and you need to search for the optimal mix.

You've just walked in on one of the great unsolved problems of modern computing. All your computer can do is sample values from the function--if I try these inputs, then I'll get this output--and if it takes a long time to evaluate one of these samples, then the computer will want to use as few samples as possible. So what is the method of sampling that will find the optimum in as few samples as possible? There are many methods to choose from, and selecting the best depends on enough factors that we call it an art more than a science.

In the statistical context, the paradigm is to look at the set of input parameters that will maximize the likelihood of the observed outcome. To do this, you need to check the likelihood of every observation, given your chosen parameters. For a linear regression, the dimension of your task was equal to the number of regression parameters, maybe five or ten; for a maximum likelihood calculation, the dimension is related to the number of data points, maybe a thousand or a million. Executive summary: the problem of searching for a likelihood function's optimum is significantly more computationally intensive than running a linear regression.
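
To make that concrete, here is a bare-bones sketch of a maximum likelihood search: the negative log-likelihood of a Normal model over a handful of made-up data points, handed to the GSL's Nelder-Mead simplex minimizer. A real application would have a likelihood too messy to maximize algebraically, which is the whole point.

    #include <stdio.h>
    #include <math.h>
    #include <gsl/gsl_errno.h>
    #include <gsl/gsl_multimin.h>

    static double data[] = {4.8, 5.1, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1};
    static size_t datalen = sizeof(data)/sizeof(data[0]);

    /* negative log-likelihood of a Normal(mu, sigma) model, summed over the data */
    static double neg_loglike(const gsl_vector *params, void *ignore){
        double mu    = gsl_vector_get(params, 0);
        double sigma = gsl_vector_get(params, 1);
        double ll    = 0;
        size_t i;
        if (sigma <= 0) return 1e300;      /* keep the search away from nonsense */
        for (i = 0; i < datalen; i++){
            double z = (data[i] - mu)/sigma;
            ll += -log(sigma) - 0.5*z*z;   /* dropping the constant term */
        }
        return -ll;
    }

    int main(){
        gsl_multimin_function f = {neg_loglike, 2, NULL};
        gsl_vector *x    = gsl_vector_alloc(2);
        gsl_vector *step = gsl_vector_alloc(2);
        gsl_multimin_fminimizer *s =
            gsl_multimin_fminimizer_alloc(gsl_multimin_fminimizer_nmsimplex, 2);
        int status, iterations = 0;

        gsl_vector_set(x, 0, 0);       /* starting guess for mu    */
        gsl_vector_set(x, 1, 1);       /* starting guess for sigma */
        gsl_vector_set_all(step, 0.1);
        gsl_multimin_fminimizer_set(s, &f, x, step);

        do {
            iterations++;
            gsl_multimin_fminimizer_iterate(s);
            status = gsl_multimin_test_size(gsl_multimin_fminimizer_size(s), 1e-5);
        } while (status == GSL_CONTINUE && iterations < 1000);

        printf("ML estimates: mu = %g, sigma = %g\n",
                gsl_vector_get(s->x, 0), gsl_vector_get(s->x, 1));
        gsl_multimin_fminimizer_free(s);
        gsl_vector_free(x); gsl_vector_free(step);
        return 0;
    }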

So it is no surprise that in the last twenty years, we've seen the emergence of statistical models built on the process of finding an optimum for some complex function. Most of the stuff below is a variant on the search-the-space method. But why is the most likely parameter favored over all others? There's the Cramér-Rao Lower Bound and the Neyman-Pearson Lemma, but in the end it's just arbitrary. There is no theorem saying that this framework gives superior models relative to the linear projection, but it does make better use of computing technology.

Hemlines
The second thread of statistical fashion is whim-driven like any other sort of fashion. Golly, the population collectively thinks, everybody wore hideously bright clothing for so long that it'd be a nice change to have some understated tones for a change. Or: now that music engineers all have ProTools, everything is a wall of sound; it'd be great to just hear a guy with a guitar for a while. Then, a few years later, we collectively agree that we need more fun colors and big bands. Repeat the cycle until civilization ends.

Statistical modeling sees the same cycles, and the fluctuation here is between the parsimony of having models that have few moving parts and the descriptiveness of models that throw in parameters describing the kitchen sink. In the past, parsimony won out on statistical models because we had the technological constraint.

If you pick up a stats textbook from the 1950s, you'll see a huge number of methods for dissecting covariance. The modern textbook will have a few pages describing a Standard ANOVA (analysis of variance) Table, as if there's only one. This is a full cycle from simplicity to complexity and back again. Everybody was just too overwhelmed by all those methods, and lost interest in them when linear regression became cheap.

Along the linear projection thread, there's a new method introduced every year to handle another variant of the standard model. E.g., last season, all the cool kids were using the Arellano-Bond method on their time series so they could assume away endogeneity problems. The list of variants and tricks has filled many volumes. If somebody used every applicable trick on a data set, the final work would be supremely accurate--and a terrible model. The list of tricks balloons, while the list of tricks used remains small or constant. Maximum likelihood tricks are still legion, but I expect that the working list will soon find itself pared down to a small set as optimum finding becomes standardized.

In the search-for-optima world, the latest trend has been in `non-parametric' models. First, there has never been a term that deserved air-quotes more than this. A `non-parametric' model searches for a probability density that describes a data set. The set of all densities is of infinite dimension; if all you've got is a hundred data points, you ain't gonna pin down a unique element of that infinite-dimensional set. So instead, you specify a certain set of densities, like sums of Normal distributions, and then search for the member of that set that gives a nice fit to the data. You'll wind up with a set of what we would normally call parameters describing that derived distribution, such as the weights, means, and variances of the Normal distributions being summed.

But `non-parametric' models allow you to have an arbitrary number of parameters. Your best fit to a 100-point data set is a sum of 100 Normal distributions. If you fit 100 points with 100 parameters, everybody would laugh at you, but it's possible. In that respect, the `non-parametric' setup falls on the descriptive end of the descriptive-parsimonious scale. In my opinion.
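
In code, the resulting object is nothing exotic: the fitted density is just a weighted sum of Normal bells. The weights, means, and sigmas below are placeholders standing in for whatever a real fit would spit out.

    #include <stdio.h>
    #include <gsl/gsl_randist.h>    /* gsl_ran_gaussian_pdf */

    /* a fitted `non-parametric' density: a weighted sum of Normal bumps;
       these particular numbers are made up for illustration */
    static double weights[] = {0.5, 0.3, 0.2};
    static double means[]   = {-1.0, 0.5, 3.0};
    static double sigmas[]  = { 0.4, 0.8, 1.5};

    static double mixture_pdf(double x){
        double out = 0;
        int k;
        for (k = 0; k < 3; k++)
            out += weights[k] * gsl_ran_gaussian_pdf(x - means[k], sigmas[k]);
        return out;
    }

    int main(){
        double x;
        /* print the density on a grid, ready to pipe to a plotting program */
        for (x = -4; x <= 6; x += 0.1)
            printf("%g\t%g\n", x, mixture_pdf(x));
        return 0;
    }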

I don't want to sound mean about `non-parametric' methods, by the way. It's entirely valid to want to closely fit data, and I have used the method myself. But I really think the name is false advertising. How about distribution-fitting methods or optimal distribution estimation?

Bayesian methods are increasingly cool. There were the computational problems--if you want to assume something more interesting than Normal priors and likelihoods, then you need a computer--but those have been surmounted, leaving us with the philosophy issues. In the context here, those boil down to parsimony. Your posterior distribution may be even weirder than a multi-humped sum of Normals, and the only way to describe it may just be to draw the darn graph. Thus, Bayesian methods are also a shift to the description-over-parsimony side.
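
For what it's worth, the workhorse behind that surmounting is the random-walk Metropolis sampler, which needs nothing but the ability to evaluate the posterior up to a constant. Here is a bare-bones sketch; the two-humped target density is a stand-in, not any particular model's posterior.

    #include <stdio.h>
    #include <math.h>
    #include <gsl/gsl_rng.h>
    #include <gsl/gsl_randist.h>

    /* a stand-in posterior: an unnormalized two-humped density */
    static double posterior(double x){
        return exp(-0.5*(x+2)*(x+2)) + 0.6*exp(-0.5*(x-2)*(x-2));
    }

    int main(){
        gsl_rng *r = gsl_rng_alloc(gsl_rng_taus);
        double  x  = 0, proposal;
        int     i;

        for (i = 0; i < 100000; i++){
            proposal = x + gsl_ran_gaussian(r, 1.0);    /* random-walk step */
            /* accept with probability min(1, posterior ratio) */
            if (gsl_rng_uniform(r) < posterior(proposal)/posterior(x))
                x = proposal;
            if (i % 10 == 0) printf("%g\n", x);         /* thinned draws */
        }
        gsl_rng_free(r);
        return 0;
    }

Histogram the output and you have drawn the darn graph.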

Method of Moments estimators have also been hip lately. I frankly don't know where that's going, because I don't know them very well.

Also, this guy really wants multilevel modeling to be the Next Big Thing in the linear model world, and makes a decent argument for that.

You can see that the increasing computational ability invites shifting away from parsimony. Since PCs really hit the world of day-to-day stats recently, we're in the midst of a swing toward description. We can expect an eventual downtick toward simpler models, which will be helped by the people who write stats packages--as opposed to the researchers who caused the drift toward complexity--because they write simple routines that implement these methods in the simplest way possible.

So is your stats textbook obsolete? It's probably less obsolete than people will make it out to be. The basics of probability have not moved since the Central Limit Theorems were solidified. In the end, once you've picked your paradigm, there aren't really many methods out there for truly and honestly cutting corners; most novelties are just about doing detailed work regarding a certain type of data or set of assumptions. Further, those linear projection methods or correlation tables work pretty well for a lot of purposes.

But the fashionable models that are getting buzz shift every year, and last year's model is often considered to be naïve or too parsimonious or too cluttered or otherwise an indication that the author is not down with the cool kids--and this can affect peer review outcomes. A textbook that focuses on the sort of details that were pressing ten years ago, instead of just summarizing them in a few pages, will have to pass up on the detailed tricks the cool kids are coming up with this season--which will in turn affect peer reviews for papers written based on the textbook's advice. All this is entirely frustrating, because we like to think that our science is searching for some sort of true reflection of constant reality, yet the methods that are acceptable for seeking out constant reality depend a bit more on human whim than I'd really like.





on Monday, September 11th, Andy said

Interesting idea that methods as well as theories can go through paradigm shifts. But how do you know that this doesn't really represent progress? Regressions are more powerful than (i.e. a superset of) AN(C)OVA, and we ain't going back to those old days. So there is a competitive process of creative destruction, yadda yadda, until the best stats win. For example, take Huber-White robust standard errors. Or Dickens/Moulton style clustered standard errors. Nowadays people just use them without making a big deal about it, because they work. Maybe A-B will be in that same category ten years from now. Really, one problem I always have is figuring out how much the reader knows already and how much I should spell out.

on Monday, September 11th, Miss ALS of San Diego said

I think the truly interesting thing about shifts in methods is that until they are considered 'street wear' and not just haute couture, those using the latest thing have to examine (and, gasp, explain) their assumptions. People forget all about the requirements for OLS all_the_frickin_time...but people see OLS and they know how to interpret the results, so they don't bother figuring out whether OLS is appropriate. if you're using a Bayesian technique (which is also horribly named, i.m.o), you've got to convince people that your priors are reasonable, you've got to have a deeper understanding of the method because uber-human friendly programs like stata won't just chug it out for you.



26 October 06. Is Ruby halal?


The starting point here is last episode's essay on programming languages, and this here is basically an explanation and generalization of why I wrote it. For those who didn't read it (and I don't blame ya), here's a summary in the form of a description of my ideal girlfriend: she should be an Asian Jewess, around 172-174cm tall, gothy, sporty, significantly smarter than me, significantly cuter than me, significantly better socialized than me, willing to hang out with me, very well organized but endlessly spontaneous, enjoys walks along the beach, does intellectually challenging work that involves being outdoors, and plays guitar in a rock band. Yeah.

So: too bad half of those things contradict the other half, eh.

The first key difference between the problem of picking a programming language and the problem of picking a significant other is that the programming language doesn't have to like you back. The liking-you-back issue creates many volumes' worth of interesting stories, all of which I will ignore here, in favor of the other key difference: unlike many girl/boyfriends, programs are often shared among friends and coworkers, meaning that there are externalities in my arbitrary, personal-preference choice.

Personal preference plus externalities is the perfect recipe for never-ending, repetitive debate.

Debating the undebatable
Under Jewish law, one must never say the Name of God. In fact, there is none--it's sort of a mythical incantation, used to breathe life into Golems and otherwise tell monotheistic fairy tales. Under Islamic law, one must speak the Name of God when slaughtering an animal for the animal's flesh to be halal. My reading here is that there is therefore no way for meat to be both halal and kosher.

And let's note, by the way, that kosher and halal laws are not cast as rules about keeping clean for the sake of disease prevention. They're ethical laws, meaning that, like personal preference, they can't really be debated. It's not like somebody will finally find the correct answer and write it down for everybody to see. We can't even agree to basic axioms like `you should be nice to people' or `don't be wasteful'.

Do ethical laws induce externality problems? From the looks of it, yes they do, because so many people spend so much time trying to get other people to conform to their personal ethics. Ethics are an extreme form of that other personal preference, æsthetics, and seeing somebody commit what you consider to be an unethical act is often on par with watching somebody wearing a floppy brown sweater with spandex safety orange tights.

Fortunately, almost everybody understands that there is no point going up to Mr. Brown-and-orange and telling him he needs to change, because we all know exactly how the conversation will go: some variant of `I have my own personal preferences' or `who are you to impose your arbitrary choices upon me'. That is, it would be a boring argument, because there is fundamentally no right answer.

When does human life begin? I have no idea, and anybody who says otherwise is guilty of hubris.

Gee, that was a fun debate, wasn't it.

And the problem with that non-debate, as with this essay, is that it has no emotionally satisfying conclusion. The natural form of a debate is for one side to present its best arguments, the other side to present its own, and then both sides go home and think about it. But the form of debate that is emotionally satisfying has a resounding conclusion, where one side tearfully confesses to the other, `OK, I was wrong!' But with arguments of ethics or personal preference, this sort of resolution happens about once every never.

But there's a simple way to fix this problem: invent statistics.

After all, not all debates are mere issues of personal preference. A question like `will building this road or starting this war improve the economy' has a definite answer, though we're typically not smart enough to know it. There are valid grounds for debate there.

But for ethics and personal preference issues, we can still make it look like there are valid grounds for debate. Find out whether abortions decrease crime [the paper that claims they do, by Steven “Freakonomics” Levitt and another not-famous economist, has been shown to be based on erroneous calculations (PDF)], find out whether people commit more errors when commas are used as separators or terminators, run benchmarks, accuse the author of the file system you don't like of being a murderer. With enough haphazard facts, any debate about pure personal preference regarding simple trade-offs can be extended to years of tedium.

This turns debates that should be of the natural form (both sides state opinions, then go home) into the resounding form of debate, where both sides attempt to get the other side to tearfully confess the errors of its ways. But the sheen of facts doesn't change the fundamental nature of debates over ethics or personal preference, and because these are debates where nobody is actually wrong, nobody will ever be convinced to bring about an emotionally satisfying conclusion. We instead simply have a new variant on the recipe for tedious, never-ending debate.

Relevant previous entries:
The one three years ago when I advocated C, and came off sounding very reasonable, I think.
The one where I complain about an especially vehement set of proselytizers.
The one where I talk about the value of stable standards.





on Sunday, October 29th, rd said

-i think Levitt claims the mistakes weren't significant and dont alter his main conclusions

-your claim that ethical debates are fundamentally unresolvable might also be a matter of opinion. Eg, some people might think that human life starts at x and this can be proven axiomatically, we just havent figured out how to do it yet. it's possible (but highly unlikely?) that at some pt 'somebody will finally find the correct answer and write it down for everybody to see', no?

on Monday, October 30th, the author said

Yes, Donohue and Levitt have a response (PDF) to the claims. I have not had time to really look at the `metrics on any of the papers involved; if anybody has, let me know. But the key allegation is that the original abortion-prevents-crime paper used ln(arrests) as the dependent variable, and if you redo the regression with the much more sensible ln(arrests per capita) and jiggle the variables a bit, then the effect disappears. In the response linked above, Donohue and Levitt respond that if you do use ln(arrests per capita) and jiggle the variables more, then the effect reappears.

In short, we have a specification fight. I think we should throw out specifications of the regression with ln(arrests). The authors of the critique, Foote and Goetz, found a valid specification where abortions have no relation to crime, and then Donohue and Levitt found another specification where abortions reduce crime. For my part, all I can do is modestly and respectfully comment that I told you so.



04 April 07. Philosophizing from the bench


I have written elsewhere about how there is no economic justification for patents on software or business methods, and how the legal basis of such patents is based on a very specific (and in my opinion, false) reading of prior case law.

But there is one final question regarding such patents: are they ethical? To answer this question, we must answer another entirely unanswerable question: do people invent mathematical results or discover them? Are the symbols mathematicians write down a reflection of some innate structure of the universe, or just human symbols manipulated using human rules?

One could find people on all levels of the spectrum between math as pure invention and math as pure discovery, but this has not always been true: before about two hundred years ago, mathematical results were firmly a part of nature that humans stumbled upon. In such a context, the idea of granting a patent--an ownership right--in a mathematical algorithm would have been taken as simply absurd, an unethical and unenforceable handing over of a piece of nature to one person. The monopolies now granted for mathematical algorithms are thus the product of a few centuries' worth of development in mathematics and our attitude toward the subject.

Unfortunately, software patents do not represent the cutting edge in modern sensibilities regarding the nature of mathematical algorithms. Instead, they make sense only via a school of thought that was prevalent from the late 1800s until it was discredited in 1931. Thus, the courts have tried to keep the law up-to-date by revising the scope of patent law from where it stood in 1790, but they remain behind the times nonetheless.

The realists
The realist view originated with Pythagoras (about 582-507 BCE). Pythagoras observed various regularities, like how a plucked string made the most harmonious sound when played in concert with a string exactly half its length (what we now call an octave apart), and then with a string two-thirds its length (a fifth apart), et cetera.[1] He concluded from these pleasing regularities that all of the world is a reflection of a set of harmonious mathematical relationships--a music of the spheres.

Plato (born about 75 years after Pythagoras's death) picked up on the Pythagoreans' geometrical obsession. If you've ever taken a philosophy class, you are familiar with Plato's view that the forms we see are vague, secondary reflections of a perfect ideal--nature is a reflection of mathematics. Plato said that people remember mathematical results, because they are imprinted in our minds and we need only get the right signal to remind ourselves of the mathematical truth inside ourselves.

Around this time, it was a popular trick to try to write down as many theorems as possible from the basic axioms of geometry. The most famous such attempt is Euclid's Elements. This is an oft-told story, but here are the first five basic assumptions Euclid needed to derive all of geometry:

  1. A straight line segment can be drawn by joining any two points.
  2. A straight line segment can be extended indefinitely in a straight line.
  3. Given a straight line segment, a circle can be drawn using the segment as radius and one endpoint as center.
  4. All right angles are congruent (i.e., equal).
  5. If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough.

If you're like most people, you were nodding your head up until you got to that last one, which is something of an eyesore in its lack of simplicity; we'll get back to it shortly.

2000 years pass. The history books typically characterize those centuries as periods of religious fervor, but that doesn't mean they were periods of scientific inactivity. However, the realist viewpoint took a small twist: it's not that nature haphazardly reflects mathematical ideals, but that the Divine Creator used math to design everything. But people like Copernicus and Newton still saw themselves as just marvelling at how neatly the Divine Watchmaker designed the mathematical world around us, and still placed themselves in the role of observer rather than inventor.

The formalists appear
Back to that fifth assumption. Some tried to derive it from Euclid's other axioms. They would begin by assuming that the fifth assumption is false, and then search for a contradiction to the other axioms. But a funny thing happened: under some means of constructing a system where the fifth axiom is false, no contradictions turned up. One could construct a whole world that in many ways looks absolutely nothing like Euclid's. Gauss, the inventor/discoverer of Gaussian elimination, the Gaussian distribution, Gaussian quadrature, et cetera, was one of many, from the mid-1700s onward, to question that fifth postulate. What if you could have lines point toward each other but still never meet? We thus get non-Euclidian geometry, which caused something of an explosion.

This was the first chance for the formalist viewpoint to take hold. Many of these non-Euclidian geometries didn't describe anything we have seen here on Earth. The non-Euclidians got the last laugh a century or so later, when Einstein showed that non-Euclidian geometry sometimes did a better job of describing reality than Euclidian, but at the time there was the gnawing question that maybe these axioms and their derived results were just a set of amusing inventions--human-made symbols that reflect nature only by sheer luck, if that.

The project of mathematics became the problem of designing systems of symbols and their manipulation that are interesting in and of themselves. Of course, such a system should not self-contradict, and should reflect at least some of our intuitive beliefs, like how if A = B and A = C, then B = C.

Here in the present day, mathematical geometry courses build the subject in a series of steps. They start with defining sets, then establishing the characteristics of open and closed sets, then describing systems of those open sets (aka topologies), then adding in spaces built from well-behaved neighborhoods (aka manifolds), and only then can the concept of distance (metric spaces) come in. So what Euclid took to just be space, we take to rely on definitions of sets, algebras, topologies, manifolds, and metrics.

The larger project is to have a unifying set of symbols, beginning with sets, that would allow one to trace the most advanced mathematical ideas all the way back to basic manipulations of sets. Like the Euclidian craze of the Greek era, the late 1800s and early 1900s brought about a flurry of people writing derivations of as much as possible from basic axioms of sets. The stand-out attempt was Whitehead and Russell's Principia Mathematica, which went pretty darn far in starting with very simple symbols and building up basically everything.

It will be relevant below that Bertrand Russell, the paragon of the set-theoretic formalization of mathematics, was not very happy with his symbolic designs:

I wanted certainty in the kind of way in which people want religious faith. I thought that certainty is more likely to be found in mathematics than elsewhere. [...] But as the work proceeded, I was continually reminded of the fable about the elephant and the tortoise. Having constructed an elephant upon which the mathematical world could rest, I found the elephant tottering, and proceeded to construct a tortoise to keep the elephant from falling. But the tortoise was no more secure than the elephant, and after some twenty years of very arduous toil, I came to the conclusion that there was nothing more that I could do in the way of making mathematical knowledge indubitable. (Russell, 1956, pp 54-55)

Russell thus raises a natural question: how far can all this go? Kurt Gödel famously showed that it's not as far as one would hope, using the formalization of the following simple declaration: This sentence is false. If that sentence actually is false, then it is proclaiming a true statement--which means that the sentence is actually true--which means that it's false....

Gödel's version was the statement “This statement is not provable using the logical system L.” Name this statement S. If S were provable using the system L, then the statement would be false, meaning that L has proven a false statement. If S is not provable using L, then L is incomplete, in the sense that S is a true statement (it is indeed not provable using L, as promised), but L can not prove it.

This was a horrible blow to the formalists. However powerful their system, there would still be simple logical chains like the one in the last paragraph that prove things the logical system can not handle. The formalist movement basically lost credibility. The proof that there is something to mathematics that our symbolic systems can not handle clearly argues for the realist side of the spectrum.

The derivation of computing
At this point, the symbol-manipulators did not entirely give up, but instead rephrased the question: having accepted that for any logical system some expressions are not evaluable, what mathematical expressions are evaluable? Two people simultaneously provided means of determining what is evaluable, in 1936-7. The first, Alonzo Church, invented a means of writing expressions, claiming that his writing scheme covered all possible evaluable operations. Alan Turing took it in a slightly more imaginative direction: he described a machine with a tape (memory) and a head that moves along the tape and modifies the data written thereon; if Turing's machine can evaluate the expression in finite time, then it is evaluable.
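
As a toy illustration of the sort of machine Turing described (this is not his construction, just the smallest example I know; the transition table is the classic two-state `busy beaver'): a head walks along a tape of zeros and ones, rewriting the cell under it and changing state, until it halts.

    #include <stdio.h>
    #include <string.h>

    #define TAPELEN 32

    int main(){
        char tape[TAPELEN];
        int  head  = TAPELEN/2;
        int  state = 0;     /* 0 = A, 1 = B, 2 = halt */
        int  sym, i;

        /* transition tables, indexed by [state][symbol under the head] */
        int writes[2][2] = {{1, 1}, {1, 1}};     /* what to write          */
        int moves [2][2] = {{+1,-1}, {-1,+1}};   /* +1 = right, -1 = left  */
        int nexts [2][2] = {{1, 1}, {0, 2}};     /* next state             */

        memset(tape, 0, sizeof(tape));
        while (state != 2){
            sym        = tape[head];
            tape[head] = writes[state][sym];
            head      += moves [state][sym];
            state      = nexts [state][sym];
        }

        for (i = 0; i < TAPELEN; i++)
            putchar('0' + tape[i]);
        putchar('\n');
        return 0;
    }

Swap in a bigger table and a longer tape and you have, in principle, any computation a modern language can express.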

That is, Turing described a computer, and said that if something is evaluable via computer, then it is evaluable via the systems of set theory as well. In fact, every modern computer out there is equivalent to Turing's machine and Church's lambda calculus (which are themselves equivalent). Barring highly specialized languages, virtually every modern programming language is Turing equivalent, meaning that it is equivalent to a Turing machine, the lambda calculus, and all of the other Turing equivalent languages. That means that programs written in a modern programming language are equivalent to mathematical expressions using traditional mathematical notation.

Thus, modern computing has its roots in the set theoretic attempts to write a language that describes everything, which in turn has its roots in the formalist perspective that mathematical symbols need not reflect any inherent logic of the universe.

In computing, the bias toward formalism is still heavier, because programs look like human designs. Further, they are often describing systems built by humans. Geometry is Greek for “measuring the Earth,” because it was first used (by the Egyptians) for surveying land, but as the mathematical and computational edifice grows taller, it becomes increasingly difficult to see the ground below.

The final step in formalist philosophy
The birth of formalism laid the foundations for the software patent.

There is an understanding that laws of nature may not be patented, which persists to this day. The law of gravity, or a newly discovered element, are not human inventions, but discoveries regarding nature. Within the law of nature exception lay a sub-exception: mathematical algorithms may not be patented. In the terminology above, setting mathematics as a subset of nature is clearly and firmly a realist view. This is appropriate, because Thomas Jefferson wrote the first patent law in 1790, while Gauss had written his development of non-Euclidian geometry around 1820-1830. The realist school was thus the prevalent (and only) understanding of mathematics when the patent law was written. Legal scholars often ask what the “congressional intent” was behind a bill, and it is effectively impossible for the congressional intent to have been that mathematical results are not laws of nature.

Now let us skip forward to 1982 and the founding of the Court of Appeals for the Federal Circuit (CAFC), which consolidated patent appeals (and some other issues) into one specialist court. Several of the judges on the CAFC bench are assigned to hear only patent cases--cases about human inventions. Many of them are former prominent patent attorneys. Therefore, it is no surprise at all that, with regard to mathematics, they are formalists.

Let us open with a law review article from 1986, five years after the Supreme Court ruled for the third time that mathematical algorithms may not be patented: “A mathematical or other algorithm is neither a phenomenon of nature nor an abstract concept. [A mathematical] algorithm is very much a construction of the human mind. One cannot perceive an algorithm in nature. The algorithm does not describe natural phenomena (or natural relationships).” (Chisum, 1986) This passage is clearly a product of the towers of elephants and tortoises above. Russell's Principia Mathematica was published in 1913, and this law review passage arrived 73 years later. Given the speed at which attitudes toward mathematics move, this perspective is downright trendy.

In the courts, the origins of the software patent are typically traced to the ruling written by Judge Giles Rich in In re Alappat (33 F.3d 1526, 31 USPQ2d 1545, 1994), which split the mathematical algorithm exception off from the law of nature exception--and then denied the existence of the mathematical algorithm exception:

[T]he Supreme Court explained that there are three categories of subject matter for which one may not obtain patent protection, namely “laws of nature, natural phenomena, and abstract ideas.” ...the Supreme Court also has held that certain mathematical subject matter is not, standing alone, entitled to patent protection. ...A close analysis ... reveals that the Supreme Court never intended to create an overly broad, fourth category of subject matter excluded from Section 101.

Clearly, this discussion makes no sense if a mathematical algorithm falls into the categories of “law of nature, natural phenomena, and abstract ideas.”

Having split off mathematical algorithms as a separate category from things existing in nature, the Federal Circuit killed the category off by 1999. AT&T v Excel (172 F.3d 1352, 50 USPQ2d 1447, 1999) cited earlier CAFC rulings to determine that “the judicially-defined proscription against patenting of a `mathematical algorithm,' to the extent such a proscription still exists, is narrowly limited to mathematical algorithms in the abstract.” Such a narrow limitation is no limitation at all, because it is trivial to prepend “I claim a machine on which is loaded an algorithm to...” to any purely abstract algorithm. Indeed, patents granted based on such wording abound.

But the world is not formalist
As you can see, the CAFC has positioned itself at the formalist extreme of the formalist-realist spectrum. However, since Gödel, few practitioners of math and computer science have placed themselves at such an extreme.

Dedekind was a mathematician instrumental in the development of set theory, and was thus essential to the formalist camp. In the opening to his notes on differential calculus (Dedekind, 1901, pp 1-2), he cast himself as a formalist, complaining that resorting to intuition “...can make no claim to being scientific, no one will deny. For myself this feeling of dissatisfaction was so overpowering that I made the fixed resolve to keep meditating on the question till I should find a ...perfectly rigorous foundation for the principles of infinitesimal analysis [differential calculus].” But there his formalism toward differential calculus gives way to his realist belief, that he was working to “... discover its true origin in the elements of arithmetic and thus at the same time to secure a real definition of the essence of continuity.”

I have not yet mentioned the intuitionist movement, which Kline (1980) traces back to the early 1900s. The position of the intuitionist is closer to the realist: we all know what zero and one are, and we all have an idea of what addition and multiplication mean, so we should build from there. Causality is something about which we all have an intuitive grasp, but which is simply impossible to pin down using statistical tools. Judea Pearl, the author of the standard reference on causality (Pearl, 2000), is entirely unfazed by the fact that his chosen subject is completely ungrounded: “For me, the adequacy of a definition lies not in abstract argumentation but in whether the definition leads to useful ways of solving concrete problems. The definitions of causal concepts that I have used in my book have led to useful ways of doing things....”2

So while patent law has followed the single thread of formalism to the exclusion of all other threads, the typical person having ordinary skill in the art of computing and mathematics believes a mix of the realist, the intuitionist, the formalist, and perhaps even the theological. Therein lies the conflict: Judge Rich was philosophizing from the bench, and mandated that patent law shall take the formalist viewpoint that mathematics is the human manipulation of human symbols--but since the 1930s, mathematicians themselves have predominantly held the view that strict formalism is an inaccurate description of mathematics and computing. Practitioners thus see the patentability of software and mathematical results as based on a false--and even condescending--view of their chosen field.

@book{russel:portraits,
author="Bertrand Russel",
title= "Portraits from Memory, and Other Essays",
year= 1956,
publisher="Simon and Schuster"
}

@book{dedekind:essays,
author="Richard Dedekind",
title="Essays on the Theory of Numbers",
year= 1901,
publisher="Open Court Publishing"
}

@book{kline:certainty,
author="Morris Kline",
title="Mathematics: the Loss of Certainty",
year= 1980,
publisher="Oxford University Press"
}



Footnotes

... cetera.1
See Donald in Mathmagic Land.
... things....”2
http://www.mii.ucla.edu/causality/?p=33




on Sunday, April 8th, Miss ALS of San Diego said

Well this post was just delightful. thank you, mr. blair.

on Thursday, April 12th, GK said

I once heard about "NOA", a Natural Ontological Attitude, which scientists and technical people have. I think this means they generaly accept formulas and theories that seem to work as having reality, without pushing too hard on the philosophical definition of truth.



12 June 07. Your genetic data


[Or, The ethical implications of SQL.]

Our paper on the genetic causes of bipolar disorder finally came out last week. The lead author has repeatedly said things like `we really couldn't have done it without you,' though, to tell ya the truth, I have only a limited grasp of the paper's results, and have been unable to read it through, due to my lack of background in the world of genetics and biology in general. Fortunately, there have been press releases and a few articles to explain my paper to me.

Figure One: The tools of the data processing field known as Biology

The figure explains how this is all possible. It is what a genetics lab looks like. That's a work bench, like the ones upon which thousands of pipettes have squirted millions of liters of fluid in the past. But you can see that it is now taken up by a big blue box, which hooks up to a PC. Some of these big boxes use a parallel port (like an old printer) and some run via USB (like your ventilator or toothbrush). The researcher puts processed genetic material in on the side facing you in the photo, onto a tray that was clearly a CD-ROM drive in a past life. Then the internal LASER scans the material and outputs about half a million genetic markers to a plain text file on the PC.

I know I'm not the first to point this out, but the study of human health is increasingly a data processing problem. My complete ignorance regarding all things biological wasn't an issue, as long as I knew how to read a text file into a database and run statistical tests therefrom.
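For the curious, that step is about this involved--a minimal sketch using SQLite's C API, with the file name, table layout, and query all invented for illustration (a real program would use prepared statements rather than pasting strings into SQL):

    /* Minimal sketch: read a plain text file of markers into an SQLite table,
       then run a query against it. File name and table layout are invented. */
    #include <stdio.h>
    #include <sqlite3.h>

    static int print_row(void *ignore, int ncols, char **vals, char **names){
        for (int i=0; i<ncols; i++) printf("%s=%s  ", names[i], vals[i] ? vals[i] : "NULL");
        printf("\n");
        return 0;
    }

    int main(){
        sqlite3 *db;
        char *err = NULL;
        char line[1000], sql[1200];
        sqlite3_open("markers.db", &db);
        sqlite3_exec(db, "create table markers (snp text, subject text, call text);",
                     NULL, NULL, &err);

        FILE *f = fopen("markers.txt", "r");       /* the scanner's hypothetical output */
        if (!f){ fprintf(stderr, "no data file\n"); return 1; }
        char snp[100], subject[100], call[100];
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "%99s %99s %99s", snp, subject, call) == 3){
                snprintf(sql, sizeof sql,
                    "insert into markers values ('%s', '%s', '%s');", snp, subject, call);
                sqlite3_exec(db, sql, NULL, NULL, &err);
            }
        fclose(f);

        /* from here, the statistics is mostly queries like this one */
        sqlite3_exec(db, "select call, count(*) from markers group by call;",
                     print_row, NULL, &err);
        sqlite3_close(db);
        return 0;
    }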

Implication one: Research methods
We are in the midst of a jump in how research is done. Historically, the problem has been to find enough data to say something. One guy had to sail to the Galapagos Islands, others used to wait for somebody to die so they could do dissections, and endless clinical researchers today post ads on bulletin boards offering a few bucks if you'll swallow the blue pill.

But now we have exactly the opposite problem: I've got 18 million data points, and the research consists of paring that down to one confident statement. In a decade or so, we went from grasping at straws to having a haystack to sift through.

As I understand it, the technology is not quite there yet. There's a specific protocol for drawing blood that every nurse practitioner knows by heart, and another protocol for breaking that blood down to every little subpart. We have protocols for gathering genetic data, but don't yet have reliable and standardized schemes for extracting information from it.

When we do have such a protocol--and it's plausible that we soon will--that's when the party starts.

Implication two: Pathways
If you remember as much high school biology as I do, then you know that a gene is translated in human cells into a set of proteins that then go off and do some specific something (sometimes several specific somethings).

So if you know that a certain gene is linked to a certain disorder, then you know that there is an entire pathway linked to that disorder, and you now have several points where you could potentially break the chain. Or at least, that's how it'd work in theory. Again, there's no set protocol. There are many ways to discover the mechanism of a disorder, but the genetic root is the big fat hint that can make it all come together right quick.

Then the drug companies go off and develop a chemical that breaks that chain, and perhaps make a few million per year in the process.

Implication three: Free will versus determinism
One person I talked to about the search for genetic causes thought it was all a conspiracy. If there's a genetic cause for mental illness, then that means that it's not the sufferer's fault or responsibility. Instead of striving to improve themselves, they should just take a drug. And so, these genetic studies are elaborate drug-company advertising.

From my casual experience talking to folks about it, I find that this sort of attitude is especially common regarding psychological disorders. See, every organ in the human body is susceptible to misfiring and defects--except the brain, which is created in the image of the divine, and is always perfect.

Annoyed sarcasm aside, psychological disorders are hard to diagnose, and there's a history of truly appalling abuse, such as lobotomies for ill behavior, giving women hysterectomies to cure their hysteria, the sort of stories that made One Flew Over the Cuckoo's Nest plausible, &c. Further, there are many people who have no physiological defect in their brains, but still suffer depression or other mood disorders. They get some sun, do some yoga, and everything works out for them.

But none of that means that the brain cannot have defects, or that those defects cannot be treated.

The problem is that our ability to diagnose is falling behind our ability to cure. We know that certain depressives respond positively to lithium carbonate, Prozac, Lexapro, Wellbutrin, Ritalin, Synthroid, and I don't know today's chemical of the month. But we still don't have a system to determine which are the need-of-drugs depressives and which are the get-some-sun depressives.

Or to give a physical example, we don't know which obese individuals have problems because of genetic barriers and which just need to eat less and exercise. It's only harder because, like the brain, the metabolism is an adaptive system that can be conditioned for the better or for the worse, confounding diagnosis. Frequently, it's both behavior and genetics, albeit sometimes 90% behavior and sometimes 90% genes.

A genetic cause provides genetic tests. If we have a drug based on a genetic pathway, as opposed to a drug like Prozac that just seemed to perk people up, we can look for the presence or absence of that genetic configuration in a given individual. This ain't a silver bullet that will sort people perfectly (if that's possible at all), but having a partial test corresponding to each treatment is already well beyond the DSM checklists we're stuck with now.

Implication four: Eugenics
We can test for genetics not only among adults and children, but even among fœtuses. In one small survey, five out of 76 British ethics committee members (6.6%) “thought that screening for red hair and freckles (with a view to termination) was acceptable.” [citation]

Fœtal gene screens to determine Down syndrome or other life-changing conditions are common, and 92% of fœtuses that test positive for Down syndrome are aborted [1].

Biology has an embarrassing past in eugenics. And we're not just talking about the Nazis--the USA has a proud history of eugenics to go along with its proud history of hating immigrants (I mean recent immigrants, not the ones from fifty years ago, who are all swell). My above-mentioned lead author refers me to this article on eugenics, and having read it I too recommend the first 80%.

If I may resort to a dictionary definition, the OED tells us that eugenics is the science “pertaining or adapted to the production of fine offspring, esp. in the human race.” In the past, that meant killing parents who turned out badly in life or had big noses, but hi-tech now allows us to go straight to getting rid of the offspring before anybody has put in too heavy an investment.

Anyway, I won't go further with this, except to point out that what we'll do with all this fœtal genetic info is an open question--and a loaded one, since the only choices with a fœtus are basically carry to term or abort. The consensus seems to be that aborting due to Down syndrome is OK and aborting due to red hair is not, but there's a whole range in between. If you knew your child had a near-certain chance of getting Alzheimer's 80 years after birth, would you abort? This Congressional testimony approximately asks this question.

Implication five: the ethics of information aggregation
This is also well-trodden turf, so I'll be brief:

It is annoying and stupid that every time you show up at the doctor's office, the full-time paperwork person hands you a clipboard with eight papers, each of which asks your name, full address, and Social Security Number. By the seventh page, I sometimes write my address as “See previous pp” but they don't take kindly to that, because each page goes in a different filing cabinet.

You may recall Sebadoh's song on data and database management: “You can never be too pure/ or too connected.” If all of your information is in one place, either on your magical RF-enabled telephone or somewhere in the amorphousness of the web, then that's less time everybody wastes filling in papers and then re-filling them in when the bureaucrat mis-keys everything. I have a FOAF whose immigration paperwork was delayed for a week or two because somebody spelled her name wrong on a form.

Having all of your information in one place makes it easier for people to violate your privacy and security. As advertisers put it, it makes it easier to offer you goods and services better attuned to your lifestyle, which is the nice way of saying `violate your privacy'. It means more things they can do to you on routine traffic stops.

The data consolidation=efficiency side is directly opposed to the data disaggregation=privacy side. There is no solution to this one, and both sides have their arguments. A prior entry discussed how information aggregation can lead to disaster, but we should bear in mind that the same technology discussed there made the innocuous and essential U.S. Census possible. The current compromise is to consolidate more and put more locks on the data, but that doesn't work very well in practice, as one breach anywhere can ruin the privacy side of the system.

Back to genetics, when we have a few more snips of information about what all those genes do, your genetic info will certainly be in your medical records. This is a good thing because it means that those who need to will be able to diagnose you more quickly and efficiently; it is a bad thing because those who don't need to know may also find a way to find out personal information about you.

At the moment, you can rely on the anonymity of being a needle in a haystack, the way that some people who live at the top of high rise buildings are comfortable walking around naked and with the curtains open--who's gonna bother to look? But as the tools and filters and databases become more sophisticated, the haystack may provide less and less cover.

So we're going to have a haystack of data about you (and your fœtus) right soon. Unfortunately, we don't quite yet know how to analyze, protect, or act on that haystack. I guess we'll work it out eventually.

[1] @article{mansfield:downs,
    author = "Caroline Mansfield and Suellen Hopfer and Theresa M Marteau",
    title = "Termination Rates After Prenatal Diagnosis of {D}own Syndrome, Spina Bifida, Anencephaly, and {T}urner and {K}linefelter Syndromes: A Systematic Literature Review",
    journal = "Prenatal Diagnosis",
    volume = 19, number = 9, pages = "808-812",
    url = here
}





on Wednesday, June 13th, lead author said

Is this your best-referenced blog post ever?



14 June 07. Micronumerosity


Today's subject is two studies of money and its relation to medical efficacy. Both found no relation, but this non-finding may be suspect.

The first, found via this blog, explains that doctors aren't nearly as effective as one would hope. His key citation is this early `80s study by RAND Corp (PDF). I actually posted much of the below as a comment on the blog, but the guy has one of those `I will post only comments that agree with me' filters on his blog.

The second, found via the New York Times, is this study of PA cardiac surgeons (PDF).

The RAND study followed several thousand people over almost a decade. It paid for the health care of some of them, and those people naturally visited the doctor (statistically) significantly more than the control group who had to pay for their own health care.

However, the study did not find a statistically significant difference in many standard measures of health, such as death rate, between those who received paid health care and those who purchased their own. We conclude, therefore, that public subsidies for health care are stupid and people should fend for themselves.

The NY Times interpretation of the PA study went along the same lines: it looked at what insurance companies were paying for health care at each of two dozen hospitals, and the success rate of those hospitals. Again, it found little difference in the death rates from one hospital to the next.

Catastrophic events
Having spent a reasonable amount of time with both studies, I encourage the reader interested in the running gag that is US health care finance to look at both reports. That said, there's a problem with the statistics: The studies aren't very powerful.

Most major medical events, death included, are catastrophic events, in the formal sense that they happen very infrequently but are a big deal when they happen. Colloquial catastrophes, like earthquakes, are also catastrophic in the statistical sense. Doing stats on catastrophic events is difficult, due to the difficulty of gathering enough observations. Our guest blogger from a few episodes ago has often pointed out to me that the ocean is very clumpy, and is therefore nearly impossible to sample: either you'll get the 99.9% dead spots where nothing is going on, or the 0.1% where a huge menagerie of critters are collectively following the currents and feeding off of each other.

When I first arrived in DC, I was lounging in a coffee shop attempting to keep myself occupied, when a TV crew came in, looking for man-on-the-street interviews. The interviewer showed me a paper with a news release about terrorism futures, for about, oh, three seconds tops, which was long enough to read the single highlighted line about TERRORISM FUTURES. The videotape rolled, the mic was shoved in my face, and I was supposed to say, `I am appalled!!!'. But they got an unlucky draw; instead I said something like, `I just got my PhD from a major research institution--the department that had a hand in the development of markets like these, even--and these markets do a wonderful job of aggregating information. However, terrorist events themselves--not to be confused with ancillary events around terrorism--are catastrophic events, and so it's very difficult to aggregate information about them.' What else could I say? I didn't bother to see if I showed up on TV that night.

Table four in the RAND study states that the range in deaths over the various treatments was from 0.9% to 1.1%, for a total of forty deaths in the entire study.

So bear with me here (or if you're wimpy, skip ahead to the paragraph on power below): assume a binomial distribution, and a true rate of death for those paying for health care of 1%. In order for us to reject the null hypothesis at a p=0.01 level, the 2000-member test group would need to see under ten deaths, or a death rate of under 0.5%. A halving of the death rate is what most of us would call `miraculous'.

Now let's say we had 2 million people in the test group instead of two thousand. Then we could reject the null hypothesis at the 1% level if, instead of observing the 20,000 deaths expected, we observe only about 19,680--a death rate of 0.98%. A 0.02% drop in death rate is beginning to look like something that could actually happen, and it would be both socially and statistically significant at this scale.
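If you'd like to check those figures yourself, here is roughly the calculation, as a minimal sketch using the GSL's binomial CDF (compile with the usual -lgsl -lgslcblas; the counts are the ones from the two paragraphs above):

    /* Rough check of the power discussion above, using the GSL's binomial CDF.
       The null hypothesis is a true death rate of 1%. */
    #include <stdio.h>
    #include <gsl/gsl_cdf.h>

    int main(){
        /* RAND-sized group of 2,000: under ten deaths is what it takes for p < 0.01 */
        printf("P(deaths <= 9  | n=2,000, p=0.01) = %g\n", gsl_cdf_binomial_P(9, 0.01, 2000));
        printf("P(deaths <= 10 | n=2,000, p=0.01) = %g\n", gsl_cdf_binomial_P(10, 0.01, 2000));

        /* Hypothetical group of 2,000,000: the 1% cutoff lands around 19,670-19,680 deaths */
        printf("P(deaths <= 19,672 | n=2,000,000, p=0.01) = %g\n", gsl_cdf_binomial_P(19672, 0.01, 2000000));
        printf("P(deaths <= 19,680 | n=2,000,000, p=0.01) = %g\n", gsl_cdf_binomial_P(19680, 0.01, 2000000));
        return 0;
    }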

The difference is what statisticians call power. How much true difference does there need to be between one group and another before the test is able to actually detect that difference? To give a physical metaphor, some people have crappy vision and can't distinguish letters from a distance; others have powerful vision and can easily tell the difference. With catastrophic events, we're trying to read very faint lines, and so we need to gather lots of data to reasonably detect them and say that the control's line is definitely different from the case's line.

The RAND test is low-power because one rate has to be fully half the other before the test can state a difference with confidence. The PA data is low-power for the same basic reason. I leave the stats as an exercise to the reader, but the number of surgeries is around 20,000, and the rates of the various measures of success for various procedures range from 1.9% for death rate to 19% for readmission (reading from page one of the study). This means that the power will be better than RAND's, but still not incredibly good. The PA study includes no regressions or hypothesis tests, which one could argue was the right call.

Interpreting a low-power study
So what are we to make of a study that gathered data, but did not gather enough data to state anything with statistical significance? Well, we're back to eyeballing it and using our intuition. For more money you get a longer hospital stay; people who get free health care see the doctor more; people like to overcharge insurance companies. These things make sense, so when we see that there is no evidence against these intuitive statements, I suppose we can bolster our belief a bit. But then there are the conclusions that run counter to intuition, like the claim that health care is irrelevant to human health, or that additional services are completely useless; a low-power study shouldn't move our beliefs much there. Got cancer? Walk it off.

There seems to be a popular belief that studies are standalone events that conclusively prove or disprove; newspapers are happy to push this perception because `a recent study proved a sexy fact' sounds a lot better than `a recent study marginally raised our subjective belief regarding a certain sexy fact'. But that is all a single study does: it marginally raises or lowers our confidence in the truth of a statement. I've already discussed this extensively in the context of creationism. A study that has low power certainly contributes less to the debate than a powerful study would, but it is still data that we can scrutinize, in conjunction with all the other data that we have about the issue.




on Wednesday, June 20th, me said

Statistics. You ever read a paper finding that "throwing" money at shitty schools doesn't help test scores, so we shouldn't fund inner city edu more? Learning in a class of 13 is the same as 30. That's why all the rich school districts don't bother "throwing" money at the schools either. wait a second....

on Monday, July 2nd, Miss ALS of San Diego, of course said

Ah, power. Did they do a t-test? (hell, even excel can do this one, and it solves the problem of power in one shot, as n is taken into account).

Of course, i just love love love anecdotal data; see our discussion of maureen dowd for proof of this.

anyway, good entry.

i'd write more, but i need to walk off some cancer...



20 October 07. Definition of a crackpot


A crackpot is somebody who does not respect the prior literature.

The easy way to not respect the prior lit is to ignore it. For example, a few months ago, a pal of mine asked me to review a paper from a physicist about an exciting new statistical mechanics approach to economic decisionmaking. Several pages of symbols later, I realized that he was describing the logit model, which economists had written up somewhere around the 1960s. In fact, the author even cited the standard citation for the logit model (from the mid-70s), but failed to make the connection. He just assumed that what he had was new, and did only a cursory browse through the economics literature before proclaiming as much.

A great many blog comments, and comments made over booze, are of exactly this form. You've got what may be a generally smart person, commenting on somebody else's field. But when chatting with pals, it's OK to not have the existing literature on hand, because everybody in the room is aware that nobody has any authority, and that nobody's comments on global warming or what-have-you really make any difference. In that context, you're no fun unless you're at least a bit of a crackpot.

In a sense, every grad student is a crackpot because they just haven't had time to really read up. However, most but not all are able to recognize this and act accordingly. They inquire of others, `I have this nifty idea; how would you propose I fit it into the existing setup?' Those who say things like `I have this nifty idea, and it is new and wonderful' are readily (and most of the time, rightly) accused of hubris.

Root causes
Beyond just ignoring the literature, there are the self-proclaimed revolutionaries who go out of their way to disdain the prior literature (e.g., FT Marinetti). History talks a lot about people who brought about fundamental change, and doesn't say much about people who made incremental changes. But the biographies often fail to mention how much time the revolutionaries spent reading the literature. I may have mentioned these guys before, but Thomas Edison didn't invent the light bulb, and never claimed to; he just made (significant) improvements on the filament materials. Albert Einstein's theory of relativity was pretty darn original, but it was built on the Lorentz transformations, which are not named the Einstein transformations. Einstein was famously an outsider--he was a patent examiner, not a professor of physics--but he enlisted the help of other prominent physicists in hammering out his theories.

The individualistic mythos is allegedly a U.S.A. thing, but the world over has people who strive to be as self-sufficient as possible--meaning that they go out of their way to not read the literature. Further, we all have the tendency to think we're smarter than everybody else, which often translates to either not bothering to check the literature, `cause it's all dumb, or dismissing it quickly as not on track.

All of this borders on crackpot. Those individualists in the Upper Peninsula are like this, as are the people who write lengthy tracts criticizing the status quo. Every class of grad students has one or two of `em, who reject the literature out of hand. I think I used to be like that, once upon a time, when I was less cynical than I am now.

So the other means of being a crackpot is knowing that there has been prior work done on a subject, but just assuming that it's all stupid. Those other people just don't `get it' the way you do. I.e., everybody else is dumber than you are.

The root of such a belief is a massive failure of theory of mind--the ability of non-autistic and non-asshole people to develop a model of what is going on in other people's heads. Which is why crackpot is not a compliment.

Most patent holders are very level-headed types, but crackpots also flock to the patenting world, because the concept of a patent is built on the idea that the recipient of the patent is smarter than everybody else and deserves to be paid for being a revolutionary. Folks like that are why I stopped reading the comments on patent blogs.

The academic response
So you could read the literature review of the typical paper as the `prove you're not a crackpot' section. Indicate a modest familiarity with the literature and a respect for those who came before. Then, when you say crap that's completely off the wall, at least the reader knows that you are aware of the context in which you're saying it.

Academics have a crackpot sensor, and it is often very sensitive. But the requirement that you be at least modestly versed in the literature can easily create the sort of inner-circle feel that many academic organizations have.

It does go too far sometimes: some people like to characterize an academic journal as an `ongoing dialogue,' meaning that your contribution is irrelevant unless it centrally focuses on problems presented in the prior lit. This is where that sense of cliquishness starts to appear--especially when the lit you're supposed to be respecting is considered to consist of the writing of a handful of star academics.

All that talk about the value of interdisciplinary research goes out the window if the key criterion for credibility is being well-versed in an existing literature written by a handful of people. Being that I'm kinda multidisciplinary, what with articles in genetics journals and law reviews, I get the crackpot glare all the time, even though I know better than to claim that my ideas are nowhere in the lit.

Anyway, there's a balance to be struck. There are people who show up to economics seminars from their day jobs as know-it-alls, just to present their single piece of intuition as fact. They are annoying. But there's a long way between those guys and the people who have an active interest but are not entirely up on the inner circle's writing.

But the policy implication is easy for those of you dealing with academics on a regular basis, because it doesn't take much to avoid setting off their hypersensitive crackpot sensors: just respect the literature.





14 December 07. Academia doesn't scale


The current academic model is based on the academic societies of the 18th and 19th centuries, e.g., The Royal Society of London for Improving Natural Knowledge, est. 1660. These societies were everything you'd imagine: a bunch of wealthy white guys, all best of pals, generally brilliant, debating and experimenting. Many wore powdered wigs. Journals were often filled with cleaned-up letters or speeches given by members.

This is the root of the modern academic journal. Unfortunately, the model doesn't scale from a gentleman's club of up to a thousand members to the global mass that is modern academia. It reveals a specific worldview about where value comes from, which is not as advertised.

Ad hominem caveats
This little column was hard to write because I don't want you to think I'm ranting about how all academics are evil. No, the simple thesis is that the current peer-reviewed journal system is ill-suited to the modern world. It worked well for the small academic societies of the 1800s, and needs to be dropped now that the academic community is global and decentralized. After the initial statement of the problem, I'll get to some suggestions of how modern academia could evolve past the 1800s.

As for my own experience, I've got my share of rejections from academic journals on the one hand, and a couple of papers published (or on their way to being published) on the other. I've also written two peer-reviewed books, partly because the peer-review process for books overcomes some of the small-circle problems that journal peer-review suffers. So I don't comfortably float around the peer-reviewed world, but I'm not a total outsider either.

That said, we can get to the key conflict, which is basically the eternal fight between the small-circle meritocracy and egalitarian democracy, except sometimes those with the most merit aren't in the small circle.

The conversation
The academic literature is frequently described as an ongoing conversation. This is a fundamentally different concept from being a repository of the best current work. The ongoing conversation story neatly follows the tradition of the society journals, which were sometimes literally the record of ongoing discussions held by the individuals in person or via letters.

As we've all experienced at parties, it's hard to walk in on an ongoing conversation, especially a conversation among people who don't know you. The typical first response is not my, what an interesting new perspective but who the fuck are you? The same holds with a submission to an academic journal: the first question is who you are, where you're coming from, what perspective you have on the ongoing conversation, and then finally what you actually have to say. Law reviews, which are not really peer-reviewed in the traditional sense, take the direct approach and typically require a résumé with submission.

For a small society, the who-are-you stage of things goes pretty quickly, because if the society's members are willing to talk to you then you've already passed a crackpot test. But now that we're an egalitarian world and journals accept paper submissions from anybody, the editor needs to start with filtering crackpots, and then move on to evaluating the merits of the work.

Fear of crackpots
The `ongoing conversation' model is inherently conservative. After all, sometimes the topic of conversation really needs to change. To give a concrete example, there is a thread in the voter turnout literature over the claim that people turn out based on purely self-interested means: they find the likelihood that they'd be the pivotal voter, multiply by the expected personal gains from one candidate over another, and then choose to vote accordingly. The ongoing conversation is over reconciling this claim with the data, which contradicts it every step of the way. Gee, maybe the theory, which is countered by both intuition and empirical data, is just wrong. But it remains the baseline model, and if you want to write a paper about why people turn out to vote, you need to spend some portion of your time engaging in this ongoing conversation that really should have died a decade ago.

The conservativeness is rooted in a fear of crackpots, as I'd discussed a few columns ago. Frankly, the number of times that somebody walks in from the proverbial left field and says something earth-shattering really is rare. So I agree with the general academic consensus that a paper that has no (or minimal) literature review of any sort is certain to be uninformed. However, that doesn't mean that a paper that is trying a new direction or using new techniques is necessarily wrong. Nonetheless, a change of topic that doesn't fit into the ongoing conversation has minimal odds of getting through peer review.

Speaking a New Word
I submitted a model of network formation to Economics Letters, which gave me a one-paragraph rejection, the gist of which was: a model of network formation isn't Economics. I suppose this fact would be news to the editors of the Review of Network Economics. The intent of this example is to point out that the humans editing the journals evaluate the work based upon what they're comfortable with, not upon some sort of objective criterion--which they, as humans, aren't capable of applying anyway.

We can go back to my favorite question: where does value come from? It doesn't come from new knowledge about the world, but from the belief by other humans that such knowledge is important and relevant. That is, the peer review system establishes science as firmly subjectivist and relativist, no matter how much it pays lip-service to objectivity.

Or, we can go back to what I consider to be the fundamental rule of nonfiction writing: the work should make the reader feel smarter. The reader should know how to do something s/he didn't know how to do before, or learn facts and a means of structuring them that the reader hadn't known before, or otherwise make the reader feel more secure about his or her existing knowledge. In the political context, people are much more likely to read articles and books that agree with them than works that oppose them (and yes, this phenomenon has been extensively documented in peer-reviewed journals).

In the specific case of a peer-reviewed journal, it's not just anybody you're trying to make feel smarter, but the referee, who already has an established worldview and an established set of tools that he or she learned in graduate school. The process of presenting a new method, like presenting a network model to an old school economist, is exceptionally difficult, because new things are threatening and make the reader feel stupid until the reader has had time to absorb the full implications. Meanwhile, the rift between `useful' and `useful to the current members of the ongoing conversation' grows.

Let me again clarify that this isn't about being evil, it's about a well-known fact of human nature: we are more comfortable embracing the familiar than the foreign. Travis and Collins [1] explain that this is not cronyism or an `old boy' network, but the tendency to pick people from your intellectual school of thought over outsiders--which can often look a lot like cronyism and an old boy network.

Anonymity
Anonymous review may make sense in a small society, where you may have to reject your best pal. In cases like these, you always know who wrote the rejection anyway. But when the author and reviewer may be continents away, anonymity just produces low-quality reviews.

Anonymous peer review doesn't scale, but it is necessary to perpetuate the system as it exists today. Pretty much every academic has a story about a rejection they got that was rude to the point of humorous. My own rejections have just been boring and typically indicated that the reviewer didn't read the paper. E.g., I once received a rejection--after a year-and-a-half wait--based on how I numbered my theorems. A rejection I received after I started writing this essay chided me for failing to discuss the multiple possible modes in a probit model--except the probit likelihood function is globally concave, and therefore always has exactly one mode. But hey, there are no peer review reviewers to check the facts or merits of the anonymous review.1
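(For the record, here is that claim in symbols; it is a standard textbook result, nothing original to this post. The probit log likelihood for observations $(y_i, x_i)$ with $y_i \in \{0,1\}$ is
$$\ell(\beta) = \sum_i \left[ y_i \ln \Phi(x_i\beta) + (1-y_i)\ln\bigl(1-\Phi(x_i\beta)\bigr) \right],$$
and since $\ln\Phi$ and $\ln(1-\Phi)$ are both concave functions, each summand is concave in $\beta$, so $\ell$ is globally concave and can have only the one mode.)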

This one has at least a partial solution, because anonymity is an endogenous social norm: we can sign our reviews. I do, and make a point of never saying anything in a rejection letter that I wouldn't say to the author's face. The problem with this, and why it's not the norm, is that I can't be lazy. I can't reject based on ad hominem excuses or theorem numbering or a vague sense of dislike. In short, I can't pretend that I'm being egalitarian and working from the merits of the paper while actually working to maintain a closed society.

Competition
In both small society and the globe at large, there's always a conflict between striving for the greater good and for individual advancement. For a system where peers review your work, this is a central conflict, because the people most qualified to evaluate your work are your direct competitors.

In a small society, your direct competitors are probably also your pals. Any business organization is naturally something of a social organization as well, and apart from some famous disputes (e.g., Leibniz v Newton), we find that all those letters between famous colleagues were generally collegial. So when somebody accepted an article from a competitor, it was at least from a competing pal, who would be able to give a leg up next time in return.

In a global society, the author and the person reviewing a paper are just competitors. The one who is doing the review was selected because s/he is considered to be part of the established system, and has something to lose, like real live funding, by expanding the membership of the established system.

I went to a delightful conference the other day regarding a new paradigm for demography. When the question of getting funding from the NIH came up, a couple of people suggested just not bothering. The people who read the grant applications are the people who were most successful in the last decade, and therefore are the ones who most stand to lose from a change in paradigm, and are in some ways the least qualified to evaluate a change of topic from their own life's work.

The alternatives
Now and then, people tell me that the peer review system is a revered part of science. This is even political--the PRISM coalition is a group of academic publishers who oppose open access journals under the presumption that it is the private, traditional peer review system that ensures high quality in publishing. The claim as I stated it here implies that peer review has evolved into what it is based on centuries of refinement and improvement. Rather, it's a throwback: it's a system that primarily emerged among small academic societies and has entirely failed to adapt as modern academia stopped being about a few inner circles and became an open, global meritocracy.

So, what can we do? There are already many online repositories to be had. My favorites are the Arxiv for math/stat/physics and the self-descriptive Social Science Research Network. The SSRN is especially important because of the absolutely pathetic speed of peer review in the social sciences. That paper above that got rejected over nonexistent nonconvexities in the probit (it's December): I submitted that paper in January. Amusingly enough, the reviewer criticized my literature review for not citing papers published in May. So if we relied solely on the formal journals for social science research, the entire system would quickly grind down, as everybody would be a year behind all the time. Instead, we spend more time at places like the SSRN.

So the SSRN is already eating the journals' lunch with regards to the work of archiving and dissemination. But the SSRN lacks peer review, and an endorsement system is still valuable and important. It's still the case that 90% of everything is crap, and some papers are more important, better written, or otherwise of higher quality than others. We humans with limited time on this Earth need some sort of guidance toward what is worth our time.

Could Arxiv and the SSRN implement peer review? Sure, in a heartbeat. Especially if we give up on the gentleman's society rule of anonymity, a paper's web page could include endorsements or comments from others. Readers will then have more than enough to evaluate whether the paper is useful, accurate, and so on. Depending on how reviewers are assigned, authors who write about relatively new methods may be more likely to find another party who doesn't feel dumber when confronted with that specific method. The Arxiv already has a very weak endorsement system, but it doesn't yet provide as much information as users need.

We'd like authors to revise based upon comments and improve things accordingly, which means that there'd need to be some sort of revision control system in place. The author may have the right to publicly respond to public peer review, in which case the ongoing conversation would happen right there on the page. Since a peer review is now an invitation from the editor to publish a short article, colleagues now have a half-decent incentive to actually do peer review, beyond the vague sense of responsibility that is the sole, insufficient motivator now.

And yes, people suck, and there are bad academic apples who would say mean things and try to ruin the system for everyone. But this is still a system that restricts commenting access to named and identified peer reviewers, not YouTube. All of these details are a low-grade kind of problem which SSRN and Arxiv could easily surmount with a few days of coding and some vigilance on the part of the editors.

A system like this would also take the semi-sacred significance off of peer review, which is a good thing. The popular media often refer to peer-reviewed papers as if they are unquestionably valid, and not-peer-reviewed papers as necessarily pseudoscience, but with all the problems underlying the system above, the signal is not so clear. A public endorsement system would guide the reader toward good papers but not imply that the paper is the gospel truth--just that two or three knowledgeable but fallible humans found it to be of high quality.

I expect that the concept of a paper--a single unit of scholarship that others can read and refer to--will continue to exist, but its delivery and evaluation will have to change. Delivery has already changed: nobody goes to the library to pick up dusty bound volumes from the 1800s, since they're in PDF format on Jstor. Nobody in the current ongoing conversation of social science even bothers with new journals as they are mailed out parcel post, because they're just the archiving of research from a year or two ago. The archiving process will be online no matter what.

The endorsement system as it stands will live a lot longer, because it clearly benefits the incumbents, provides a means for the small inner circle to keep itself small, and provides a shield of anonymity that many reviewers continue to use as a crutch. Especially in social sciences, this is a sad state of affairs: we have a dozen journals devoted to mechanism design and making sure that people's incentives are aligned with our overall social goals, and yet we still base decisions on anonymous comments from those who are most likely to lose funding and relevance to something new.

[1] @article{travis:collins,
    author = "Travis, G D L and Collins, H M",
    title = "New Light on Old Boys: Cognitive and Institutional Particularism in the Peer Review System",
    journal = "Science, Technology, & Human Values",
    volume = 16, number = 3, pages = "322-341",
    year = 1991,
    url = Jstor link
}



Footnotes

... review.1
A not-anonymous peer points out that the editor should be checking the reviews for quality and acting accordingly. However, part of the referee's job is to save time and effort for the editor, and editors are generally inclined to trust the referee, so human laziness and trust generally prevent editors from overturning all but the truly worst reviews. I did once have an editor who told me that a referee report was so bad that it indicated more that the referee was a crackpot than that there were problems with the submission, so the editors are not entirely asleep.





02 July 08. Still just parametrized models


You will recognize Wired as a heavy-paper glossy magazine, owned by the same company (Condé Nast) that owns Ars Technica, Glamour, Modern Bride, Teen Vogue, and Bon Appétit.

A few months ago, it ran an issue whose cover story claimed that everything you knew about environmentalism is wrong, and you should do contrarian things like buying a rugged SUV instead of a hybrid, live somewhere where you'll run your air conditioning 24/7, and so on. A great many people commented on how ill-founded the recommendations were.

However, there were at least a few points that were kinda true: it is generally ecologically cheaper to run the air conditioning than the heater, but there are many places and many houses where you don't have to run either. My brother lives in San Diego and just runs a fan from time to time. San Diego sprawls, but he shouldn't buy a hybrid to get around--he should buy a bike.

This month's issue vends paper via the same revolutionary formula as the antienvironmentalist issue, but gets none of it right at all. In fact, its own examples sometimes support the opposite conclusion. I'm writing a response, despite the don't feed the trolls rule, because the authors are actually making very common mistakes, which have been made repeatedly over the last several decades, so this is a springboard to discuss a few other scientific revolutions that didn't happen.

This month's declaration: The End of Science(!!). The guy who wrote about the End of History was maybe right for a year or two--The Lull in History(!!). But Wired's scientific revolution doesn't have half as much going for it. Frankly, the other Condé Nast publications better maintain credibility by just offering 15 NEW HAIR AND SEX TIPS every month.

The basic claim is that having petabytes of data is fundamentally different from having smaller amounts, to the point that the traditional method of developing and testing a model is somehow no longer relevant. In the words of the lead essay, “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”

Rather than reading Wired directly, I recommend that you instead read KK's response to Wired, which is much more coherent, although still a bit hyperbolic. It uses the word pioneer.

Some examples
Back to Wired, it presents a long series of examples where people develop and test models using large data sets. I could write a paragraph about how every last one misses the mark, but I'll just give you three that should give you a sense of how inquiry, data, and models interact and how that differs from the End of Science story.

What they're trying to say above is that we can program our computers to just suck in data and spit out correlations, and that this has meaning while sitting outside of any model. More or less: we give the computer data, it thinks for us, and it draws conclusions that are true but beyond our comprehension.

We'll start with an example to show you what we're not talking about:

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

You can ask Wikipedia about shotgun sequencing, and it'll tell you that it is based on the same genetic model everybody else is using, but uses a novel random component to gather a lot more data a lot faster than prior methods could. That is, this is the type of science we call gathering data. Where and how to look has always been advised by some human-sensical story, and observers have always striven to let the machinery work and not judge the data while gathering it.

No naturalist feels the need to prove the causal mechanism underlying a new frog before declaring the new frog's existence, so this says nothing about the rise or fall of causal mechanisms. But it is certainly true that methods like these are giving us an order of magnitude more data. We want less restrictive models that will let these reams of data speak for themselves as much as possible.

So let's move on to what happens after the data is gathered: taking action or deriving meaning from data. Any action by Google is taken by tech enthusiasts as divine, so it is naturally a key example (or three):

Google's founding philosophy is that we don't know why this page is better than that one: If [sic] the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. [...]

This makes two statements: Google doesn't primarily rank quality via content analysis (i.e., computer-reading the page and evaluating the relevance or quality of the words on the page), but it does use a simple, human-comprehensible model relating page relevance to incoming links. That is, it doesn't use a literal content model, but it does use another, link-based model. We'll see this pattern of passing on one model for a `looser' model a few times more below. Google's page ranking model even has an underlying causal story: that high quality content causes people to link to the page.
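To show how much model is hiding in `just count the incoming links', here is a toy sketch of one such link-based ranking--my invention for illustration, not Google's actual system: each page's score is the damped sum of the scores of the pages linking to it, found by iterating until the scores settle.

    /* Toy link-based relevance model (not Google's production system):
       a page's score is (1-d)/N plus d times the scores of its in-links,
       each divided by the linking page's out-degree. Link data invented. */
    #include <stdio.h>

    #define N     4       /* four hypothetical pages */
    #define DAMP  0.85
    #define ITERS 50

    /* links[i][j] = 1 means page i links to page j */
    int links[N][N] = {{0,1,1,0},
                       {0,0,1,0},
                       {1,0,0,1},
                       {0,0,1,0}};

    int main(){
        double rank[N], next[N];
        for (int i=0; i<N; i++) rank[i] = 1.0/N;
        for (int t=0; t<ITERS; t++){
            for (int j=0; j<N; j++) next[j] = (1-DAMP)/N;
            for (int i=0; i<N; i++){
                int out = 0;
                for (int j=0; j<N; j++) out += links[i][j];
                if (!out) continue;                 /* dangling page: skipped in this toy */
                for (int j=0; j<N; j++)
                    if (links[i][j]) next[j] += DAMP * rank[i]/out;
            }
            for (int j=0; j<N; j++) rank[j] = next[j];
        }
        for (int j=0; j<N; j++) printf("page %i: relevance %g\n", j, rank[j]);
        return 0;
    }

Every line of that is a modeling assumption--the damping constant, the equal split across out-links, even the decision that a link is a vote--which is exactly the point: the model didn't disappear, it just moved into the scoring rule.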

One last example, from this page, about political micro-targeting. I conclude with this one because it's the only one Wired gets right.

As databases grow, fed by more than 450 commercially and privately available data layers as well as firsthand info collected by the campaigns, candidates are able to target voters from ever-smaller niches. Not just blue-collar white males, but married, home-owning white males with a high school diploma and a gun in the household. Not just Indian Americans, but Indian Americans earning more than $80,000 who recently registered to vote.

This is known as “data mining,” and has its formal origins probably thirty years ago, or so. Being from the pre-Wired dark ages, it is very model-dependent: claiming that elements of the data set are correlated to each other is a model, and searching for the best correlation is a series of model tests, often done using the traditional hypothesis tests they forced you to learn in undergrad stats. Your typical data mining textbook includes lots of other models, including overlapping categories, trees, separating hyperplanes, and other very structured forms.

To clarify, think of what a model-free search would really consist of. Think of all the ways that our politicos could handle this data: they could look for the political preferences of people with blood type AB positive, or the product of age cubed times the cube root of sushi consumption for women or the 3.2nd root of sushi consumption for men, or the preferences of people whose house number starts with a 4. Rest assured, they ain't wasting any processor time testing the 3.2nd root of anything, even though it would be as valid to the computer as any other list of numbers. Instead, they set a framework such as a hierarchy of characteristics, and set the computer to find the best such hierarchy.
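To make `set a framework and let the computer fill it in' concrete, here is a toy sketch of the smallest possible such search--a one-level hierarchy over a few pre-chosen traits, with invented survey data and invented trait names:

    /* Toy one-level "hierarchy of characteristics" search: among a few
       pre-chosen binary traits, find the split with the biggest gap in
       support for the candidate. All data invented. */
    #include <stdio.h>
    #include <math.h>

    #define NOBS    8
    #define NTRAITS 3

    int traits[NOBS][NTRAITS] = {      /* each row is a voter, each column a trait */
        {1,0,1},{1,1,0},{0,0,1},{0,1,1},
        {1,0,0},{0,1,0},{1,1,1},{0,0,0}};
    int supports[NOBS] = {1,1,0,0,1,0,1,0};    /* 1 = supports the candidate */
    const char *names[NTRAITS] = {"owns a gun", "married", "college degree"};

    int main(){
        int best = -1; double best_gap = 0;
        for (int t=0; t<NTRAITS; t++){
            double yes=0, no=0; int nyes=0, nno=0;
            for (int i=0; i<NOBS; i++){
                if (traits[i][t]) { yes += supports[i]; nyes++; }
                else              { no  += supports[i]; nno++; }
            }
            double gap = fabs(yes/nyes - no/nno);
            printf("%s: support %.2f vs %.2f\n", names[t], yes/nyes, no/nno);
            if (gap > best_gap) { best_gap = gap; best = t; }
        }
        printf("Best single split: %s (gap %.2f)\n", names[best], best_gap);
        return 0;
    }

The computer only ever considers splits the human framework allows; it never dreams up the 3.2nd root of anything on its own.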

Wired gives an example of data mining airline prices. There was a period where airlines sent signals for the purposes of colluding on prices using the cents part of a price, so there really was a pattern among tickets whose price has a four in the dimes place (or whatever). I have no idea how that pattern was spotted. U.S. ticket prices are now in whole dollar amounts by law.

The pattern in the data
People marketing products, whether toothpaste or politicians, love data mining. After all, it's a results-oriented system that only asks how many people purchase the product, not why. Thus, it's perfect for non-causally oriented analysis, and has been for decades before Wired declared it to be a paradigm shift. But every data mining textbook is heavy on causal models. This is not a contradiction: the model is just not where you expect it to be.

Or, consider the field of `nonparametric statistics'. By that, we mean writing down models that aren't the one- or two-parameter models you get in the back of stats textbooks (Normal, Poisson, Binomial, ...). Instead, a typical procedure defines a bar chart with maybe a hundred segments, and then estimates the heights of all 100 bars. Great, so this `nonparametric' method has a hundred parameters to fit instead of two.
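
To show how literal that count is, here is a minimal sketch in C of the hundred-parameter version: a histogram-style density estimate, where the only modeling decision is a hundred bins over a fixed range, and the data supplies the hundred heights. The range and the made-up sample are assumptions for the sake of the example.

#include <stdio.h>
#include <stdlib.h>

#define BINS 100
#define N    5000

int main(void){
    /* Made-up sample: N draws with a roughly triangular shape on [0, 1). */
    double x[N];
    srand(42);
    for (int i = 0; i < N; i++)
        x[i] = (rand()/(double)RAND_MAX + rand()/(double)RAND_MAX) / 2.0;

    /* The `nonparametric' model: 100 bins over [0, 1); each bin height is one
       parameter, estimated by counting and then normalizing. */
    double width = 1.0 / BINS, height[BINS] = {0};
    for (int i = 0; i < N; i++){
        int b = (int)(x[i] / width);
        if (b >= BINS) b = BINS - 1;     /* guard the right edge */
        height[b]++;
    }
    for (int b = 0; b < BINS; b++)
        height[b] /= N * width;          /* so the estimate integrates to one */

    for (int b = 0; b < BINS; b += 10)   /* print a sample of the 100 parameters */
        printf("bin %3d: estimated density %.2f\n", b, height[b]);
    return 0;
}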

All of the examples here have a similar flavor: we don't specify a tight model with only a few parameters, but instead a loose model which may need a million little parts to be specified: instead of a broad regression on a few variables, calculate a different value from scratch for every web page, or every combination of age/race/gender. But that doesn't mean there's no model, or that you've somehow escaped the paradigm of describing a human-sensible model and then asking the computer to fill in parameters from the data. Also, it doesn't save you from the problem that you can fit a loose enough model to any garbage--more on this next time.

Or consider the case of agent-based modeling, which is a hair's breadth from simple simulation. This was trendy in the 1980s because of the new study of chaos and all the resultant poster and calendar sales. The rules of the agents in the simulation, or the steps in your chaos model, are all very simple and easy to understand, but there's no way to know what it will do in the end except to follow the iterations of the model to their computer-calculated conclusion. We can now repeat everything stated above: the outcome is beyond small-scale parametrization, and causal mechanisms are hard to come by (otherwise we wouldn't need to follow the simulation along, but could predict the outcome). But on another level, it's still just a model: simple rules and typically a set of parameters that can be tweaked as desired.
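
The classroom-standard example (mine, not Wired's) is the logistic map: a one-parameter rule that takes one line of C per step, yet whose long-run behavior in the chaotic regime you can only get by grinding through the iterations.

#include <stdio.h>

int main(void){
    /* x <- r*x*(1-x): trivial to state, but with r = 3.9 (the chaotic regime)
       there is no shortcut to step 1000 except computing steps 1 through 999. */
    double r = 3.9, x = 0.3;
    for (int t = 1; t <= 1000; t++){
        x = r * x * (1 - x);
        if (t % 200 == 0) printf("step %4d: x = %f\n", t, x);
    }
    return 0;
}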

For all of agent-based modeling, chaos theory, nonparametric stats, and whatever the fuck Wired is talking about, some proponents trumpet how their new method is outside of the parametrized small-scale model that we had been contending with since the advent of modern science. But upon further inspection, we find a framework of assumptions that is really just a model pushed behind the curtain, and we find that the final goals of the data search are a set of numbers or relations, which is another way of saying parameters.

One retort is that the framework underlying the computer's search for parameters for a regression model is a model, but the framework underlying the computer's search for parameters in these complicated systems is a meta-model, or just a set of rules, or a constraint set, or some other means of avoiding the word model. But this is semantics. When we sit down to the computer to fit the old-school models to data or fit the broad meta-heuristic-constraints to data, we do exactly the same thing, albeit with more or less typing.

I was heavily involved in the writing of a book on computational methods for models like these, aimed at treating a range of models as broad as that described here. It opens by declaring that its goal is to estimate the parameters of a model with data. Save perhaps for the pure data-gathering exercise, that phrase describes every example here. In every case, we're assigning a human structure with a finite number of levers, and relegating the computer to finding how to best position the levers. You can ask (and may benefit from asking) the same questions of any study: what is the underlying framework, and its underlying causal story? What parameters are being tweaked and/or output? How do you know when the parameters are good so you can stop searching?

From a philosophy of science perspective, nothing seriously new is happening, save for an increasing trust when the machine gives us a million parameters instead of two. From a practical perspective, the engineering advances are clearly incremental: a question of distributing computation among PCs, managing databases, and finding ways for us humans to comprehend and take action on all that computer output.




[link][a comment]

on Saturday, August 9th, Angelique said

Woo wee. I wish I was smart enough to know what the heck you're saying here but I'm pretty sure I agree! I threw up my hands at the last flipping issue of Wired. Well I cursed a bit first, then threw up my hands and threatened (to myself) not to renew my subscription. The same ridiculousness has infiltrated my off-and-on profession, marketing. The we-don't-know-why-people-do-what-they-do-but-they-do-it way of marketing is short-sighted. Sure, maybe the data overwhelms the need for theory but if human brains don't bother trying to make sense of the data, you're just chasing your tail because the target is always moving. Most companies, though, seem dee-lighted to run around in circles chasing data. I say let them, the rest of us who know better will prevail.



06 August 08. The two sides of the statistical war


There is a little war in the statistical world. Like other little wars--Mac vs PC, Ford versus Chevy, Protestant versus Catholic--everybody who isn't on one of the teams has no idea how to differentiate between the two sides. Also, there is no resolution to the central question.

John Tukey (1977, pp 1-2) gives a metaphor of the detective and the judge. The detective gathers all the evidence he can, regardless of whether the evidence will be admissible in court or whether it proves guilt or innocence. He just compiles as thick a notebook as possible and worries about sorting it out later. The judge does the sorting. She is bound by law to ignore some evidence, and is comfortable ignoring most of the detective's notebook as irrelevant to the final, narrow question before the court.

Data-oriented inquiry has a very similar division, of descriptive modeling and hypothesis testing. The descriptive modeling step simply gathers information and puts it into a human-comprehensible format. The hypothesis test uses the strict laws that you forgot from statistics class to make a more objective statistical claim.

In the last episode, we saw many examples of descriptive modeling: take in all the airline prices, and list all the patterns you--or a computer--can find. Find the smallest demographic/marketing subgroup who all want to vote for Obama. Observe that pesticide use has been going up with time, and cancer rates have gone up with time.

There are two steps to take from there, one of which (developing a causal link) I won't talk about until next time. The main step is the hypothesis test, wherein you come up with some means of verifying the claim that the relationship you just found is what you claimed it was.

We need those extra steps because correlations could be sheer coincidence, meaning that they may reflect a true statement about the data at hand, but we shouldn't rely on them next week, or claim that there is some causal story that made that correlation happen. Stupid coincidences happen all the time and are easy to manufacture.

The problem with all our wonderful technology, however, is that as the power of your relation-searching machinery goes up, the power of your hypothesis testing diminishes. Here are two questions:

Randomly draw a person from the U.S. population. What are the odds that that person makes more than $1m?

Randomly draw 350 million people from the U.S. population. What are the odds that the wealthiest person in your list makes more than $1m?

The odds in the second case will be much higher, because we took pains in that one to pick the wealthiest person we could. That is, the first is a hypothesis about just data, the second is a hypothesis about an order statistic of data.
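
To put a formula on it, under the simplifying assumption of independent draws: if the chance that one randomly drawn person clears $1m is some small p, then the chance that the richest of N draws clears it is 1 - (1-p)^N. With, say, p around one in ten thousand and N in the hundreds of millions, that second probability is indistinguishable from one. Same data, different question, wildly different odds.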

Now say that you have a list of variables before you.

Claim, based on intuition, that A is correlated with B1. What are the odds that a hypothesis test will back your claim at the 95% confidence level?

Write down the best correlation between A and B1, B2, ..., B1,000,000. What are the odds that a hypothesis test will back that best correlation at the 95% confidence level?

With a big enough list of variables, you are guaranteed to find a correlation (or any other model) that passes any hypothesis test you want.
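
Here is a minimal sketch in C of that guarantee, with sizes chosen only for brevity (a hundred observations, a thousand candidate variables, everything uniform noise): compute the correlation of A with each B, keep the best, and compare it to the usual back-of-the-envelope rule that |r| > 2/sqrt(n) is `significant' at roughly the 5% level. The winner clears the cutoff essentially every time, despite there being nothing in the data but noise.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 100      /* observations */
#define M 1000     /* candidate variables, all pure noise */

/* Plain Pearson correlation between two length-n vectors. */
static double corr(const double *a, const double *b, int n){
    double ma = 0, mb = 0, saa = 0, sbb = 0, sab = 0;
    for (int i = 0; i < n; i++){ ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;
    for (int i = 0; i < n; i++){
        saa += (a[i]-ma)*(a[i]-ma);
        sbb += (b[i]-mb)*(b[i]-mb);
        sab += (a[i]-ma)*(b[i]-mb);
    }
    return sab / sqrt(saa * sbb);
}

int main(void){
    double A[N], B[N], best = 0;
    int best_j = -1;
    srand(7);
    for (int i = 0; i < N; i++) A[i] = rand()/(double)RAND_MAX;

    for (int j = 0; j < M; j++){
        for (int i = 0; i < N; i++) B[i] = rand()/(double)RAND_MAX;
        double r = corr(A, B, N);
        if (fabs(r) > fabs(best)){ best = r; best_j = j; }
    }
    printf("best |r| = %.3f (variable %d); nominal 5%% cutoff is about %.3f\n",
           fabs(best), best_j, 2.0/sqrt((double)N));
    return 0;
}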

You've read stories like this before: researcher inspects the data very carefully, eventually stumbles upon a relationship that works, thinks about how it makes sense that those two variables are related, and then publishes. With luck, it's something quirky enough to get into the NYT, Economist, or any other pop science outlet that happily reports one-off, unreplicated studies about how a crazy and unexpected variable has an important effect on the things we care about.

And that's the core of the conflict. The descriptive camp points out that it can develop badass means of testing a thousand hypotheses, and the hypothesis testing camp points out that once they do that and pick the best correlation out of a thousand, all the hypothesis tests are basically invalid until modifications are made that the descriptive kids won't bother to make.

There are a few ways by which we can have too many hypotheses. The simplest is to just have a systematic list of a few million possibilities in need of testing. If we can get a million genetic markers from a drop of blood, which we can do, then we need to correct for that as we run a million hypothesis tests. People usually do the corrections in this case.
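
The simplest such correction (Bonferroni) is blunt but easy to state: to keep the chance of even one false alarm across m tests at 5%, run each individual test at the 0.05/m level. For a million markers that works out to 0.05/1,000,000 = 5×10^-8, which is in fact the conventional genome-wide significance threshold.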

Before moving on to the real disasters, let me note that some people reject the discussion to this point. If variables A and B2891 are truly and honestly correlated, then that fact is true no matter whether we ran exactly one test or ran a million. There is no Heisenberg weirdness here: observing the correlations does not change them.

However, our tests and how we interpret them are changing. A hypothesis test makes sense only in a given environment, and that environment has to include the data, how the data was gathered, cleaned, and pre-inspected, and what other tests are being run at the same time. In the cookbook-format manual, none of this gets mentioned: the recipe calls for a list of numbers, mashed into a certain statistic, compared to a certain table, and you're done. But once a human observer comes along, you're already out of the textbook.

But the people who don't quite get the concept of the multiple testing problem don't get much cred. It's subtle and easy to get wrong, but people eventually work it out. If you write a loop to run every regression of a list of twenty variables against some outcome (usually GDP or some overall productivity number), then you are guaranteed to find an excellent fit to your data, and you will have no proof that what you found is any good, and nobody will respect you.

No, that's not where the debate lies.

Eyeballing multiple testing
Here's another way to get too many hypotheses: given a list of twenty variables, you can produce what is called a Trellis™ or lattice plot, which gives a 2-D dot plot of every variable against every other. It's not hard to put plots for twenty variables on a screen, and then scan to find the pair whose line is sharpest and shows the best correlation. Congratulations, you've just run 20×19 = 380 hypothesis tests. When tested more formally, the correlation you just spotted is almost guaranteed to hold, even if your data is pure noise. Or you can try any of a multitude of other visualizations that will similarly allow you to see hundreds of relations at once.
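
To put numbers on it: even counting each pair only once, twenty variables give 20×19/2 = 190 distinct correlations, and at a nominal 5% level pure noise should hand you about 190 × 0.05 ≈ nine or ten `significant'-looking pairs. Whichever of those your eye settles on will then sail through the formal test.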

The DataViz field is trendy right now. There are a few icons of the field who are working hard on self-promotion, such as Edward Tufte, whose books show how graphs can be decluttered, chartjunk eliminated, and grainy black-and-white fliers from the 1970s redone as finely detailed illustrations in full color. John Tukey's Exploratory Data Analysis (cited above) is aggressively quirky, and encourages disdain for the hypothesis testing school.

These guys, and their followers, are right that we could do a whole lot better with our data visualizations, and that the stuff based on facilitating fitting the line with a straightedge should have been purged at least twenty years ago.

The underlying philosophy, however, is humanist to a fault. The claim is that the human brain is the best data-processor out there, and our computers still can't see a relationship among a blob of dots as quickly as our eye/brain combo can. This is true, and a fine justification for better graphical data presentation. And hey, we humans would all rather look at plots than at tables of numbers.

Figure One: A slice from plate nine of the Rorschach series of inkblots. If you don't see faces, you're crazy. Oh, and there's a penis and vagina in every inkblot too.

But the argument forgets that humans are so good at seeing relationships among blobs of dots that we often see patterns in static (there's a word for this tendency: apophenia). We look at clouds and see bunnies, or read the horoscope and think that it's talking directly to us, or listen to a Beatles song about playground equipment and think it's telling us to kill people. Given ten scatterplots, you will find a pattern--in fact, if a psychologist were to show you a series of ten seemingly random inkblots and you didn't see a reasonable number of patterns in them, the psychologist might consider you to be mentally unhealthy in any of a number of ways.

Better data visualization doesn't address the problem of apophenia. In fact, following Tukey's lead, the people who focus on clean testing are characterized as not seeing the value of all these full-color diagrams. They're wearing blinders for the sake of being good Boy Scouts and not seeing the trees and grass and chirping birds around them. Conversely, the testing people generally see little value in all these full-color plots and want to go back to inferring things.

So this is the current battleground in the descriptive-versus-testing war. No side can win--there are no tests for overtesting, so this is all just intuition and opinion. We can write down in a cookbook that if your data-analysis model includes a series of ten tests, you need to make such-and-such an order-statistic correction. But how do you write into a textbook model framework that you surfed charts of the data for forty-five minutes, including eight 3D plots and two Trellis™ diagrams?

Further, both sides are necessary, and both sides have valid points. So this is a perfect recipe for sniping back and forth forever.

But overcharting (and defining what that means) is not where the true problem lies.

The looming problem
After all, there is a middle ground, where a person comes in with some idea of what the data will say, rather than waiting for the scatterplot of Delphi to reveal it. Then the researcher refines the original idea in dialog with the data. The closer something fits prior human beliefs, the more we are inclined to accept it, so the researcher is not on a pure fishing expedition, but is not wearing blinders to what the data has to say.

So one researcher could be reasonable--but what happens when there are thousands of reasonable researchers? When a relevant and expensive data set has been released, a large number of people will look at it. I've been to an annual conference, attended by about a hundred people, built entirely around a single data set--and who knows how many more weren't able to fly out. With so many humans looking at the same set of numbers, every reasonable hypothesis will be tested. Even if every person maintains the discipline of balancing data exploration against testing, we as a collective do not.

Every person was careful not to test every option, so the order statistic problem seemed to be dodged. But the environment is not just one researcher at a computer; it is thousands across the country, and collectively a thousand hypothesis tests were run, and journals are heavily inclined to publish only those that scored highly on the tests. So it's the multiple testing problem all over again, but in the context of the hundreds or thousands of researchers around the planet studying the same topic. Try putting that into a cookbook description of a test's environment.

There's no short-term solution to this one.

In the next episode, I'll take this a little further.

@book{tukey:eda,
author = "John W Tukey",
title = "Exploratory Data Analysis",
publisher = "Addison-Wesley",
year = 1977
}




[link][no comments]



26 August 08. Statistics as unbearable longing


Introductory logistics: This is part two; see the beginning here. Also, I just finished writing a textbook on statistical computing--about three hours ago as I write this, and I'm half relieved about getting my time back, half anxious about how long it will take before I get my first complaint letter about how the betas on page 317 should be in boldface, and half amazed that I managed to write something like that. I suppose this is a bit of catharsis after writing hundreds of pages of math and data technique.

Statistics--by which I mean all of mathematical inquiry aimed at explaining the real world, and sometimes even plain measurement--has fundamental failings for its intended purpose of allowing us humans to better understand the world.

Picking up from last time, statistics can never prove. The real world is uncertain and messy, but mathematics is pure and certain and unwavering. Mix the two together, and what do you get? An uncertain mess.

Our language is inclined toward our desire to accept things as true. Statistical language makes at least a half-assed effort--maybe even a three-quarter-assed effort--to retain skepticism at all times. A good and pedantic hypothesis test comes up with two outcomes: reject or fail to reject. This seems appropriately skeptical, and it means that if somebody is snooping around in the data beforehand, and inappropriately failed to reject, then it's no big deal: we just learned nothing from that experiment (in formal stats language, the test had insufficient power).

But this fine point breaks down at every opportunity, because we long for certainty, and statistics just won't give it to us.

The first breakdown (non-math-geeks are welcome to skip to the next paragraph) is that the system is asymmetric regarding what should be symmetric hypotheses. Given two variables, we typically wind up with H0: the variables are equal, and H1: the variables are not equal. The above reject/fail-to-reject language typically refers to H0. Failing to reject H0 is appropriately indefinite, but rejecting H0 is definite, not-squishy language stating that we know the variables are not equal, because we reject the claim of equality. Since the researcher is probably trying to show that the variables are different, the language is slightly skewed in favor of the researcher. In an ideal world, perhaps we'd say that the test fails to reject and fails to accept. Then when we fail to reject one hypothesis, we're failing to accept the alternate, which carries the same level of confidence on both sides.

The second problem is that even that little bit of legalese, fail to reject, is hard to keep in place for long--it turns into accept even in many stats textbooks, especially the ones with a `tude that tries to make statistics fun. And don't expect the phrase to ever appear in the newspaper: my brief search of the NYT turns up one op-ed making exactly the point I'm making here, one correct use of the phrase, and assorted cruft. The longing for certainty is just too strong to let weak language stand.

But there are benefits to accepting the weakness of statistics. If we bear in mind that statistics cannot prove, then my lament last time about how all our published positive results are doomed to be too confident is not so bad. An article with a solid result from a statistical test should simply raise our confidence a little in whatever it found. If the research was especially carefully conducted, then it will raise our confidence a lot. Perhaps another article will come by next year that bolsters our belief or cuts it down a bit.

So after incredibly tedious and careful mathematical contortions, the best result we can get is that the human reader believes the claim a little bit more.

Some people are disappointed by the inability of mathematics to touch the core of what we as humans want, and just reject the entire project. Forget all those studies: they either tell us what we already know or are a pile of sophistry that will be contradicted next week. That's extreme. Our measurements are never perfect, but we make them. We're surrounded by black boxes that we'll never be able to crack open, and situations where we know any measurement will be imprecise. Despite knowing that we'll never be able to fully and truly understand anything, we still try.

Correlation is not causation, but neither is anything else
But to really make our model good, we need to tell a story, almost invariably of the form A causes B. Unfortunately, statistics has no concept of causality.

This is one of those philosophy of science things that you could expound on forever, though I won't go into it too deeply here. But the concept of causation happens only inside the human brain. It's not something we can measure, perhaps with a causality ruler (or a more portable causality tape), and then write down that A causes B with 3.2 causal units, but C causes B with 8.714 causal units. There are intuitive ways to measure a causal claim, like saying that if A always comes before B, then A causes B; and correspondingly, there are easy ways to break such a simple measure, like how Christmas card sales cause Christmas.

But people like stories. As kids, we're taught how the world works via causal stories, which are not just a list of incidents but a chain of events. Because granny was ill, Ms Hood took her basket of food and went walking over the river and through the woods; because the wolf was evil, he conspired to eat Ms Hood; because Ms Hood was virtuous, she was saved. A story where a bunch of unconnected, seemingly random things happen is just not satisfying, and correlation without causation is dissatisfying in exactly the same way.

You could take the basic intuition about how causality works and build machinery to draw causal flowcharts, which give a wealth of means to reject the flowchart; look up structural equation modeling or read Pearl (2000). But apply the above rule that statistics can never prove a model of the world: statistics can never prove a causal model of the world--and this case is only worse because we're not even entirely certain about how to measure or even identify causality. As with any model, stats can bolster or cut down our confidence in the causal claim, but that's where it ends.

Of course, people fake it all the time. You will rarely if ever find a newspaper article declaring a correlation without strongly implying (if not directly stating) that the statistical model showed a causal link. Get your favorite researcher drunk and he or she will stop talking about correlations and start talking about causation, even though everybody in the room knows that it's just a mathematical mirage.

There's so much that we want to understand about our world and those around us that we'll never come close to. We're just guessing at reality based on our sadly limited information, and nothing makes that more evident and visceral than statistics.

Relevant previous entries:
The one about how people often reject academic studies without consideration.

@book{pearl:causality,
author = "Judea Pearl",
title = "Causality",
publisher = "Cambridge University Press",
year = 2000, month = mar
}




[link][2 comments]

on Thursday, August 28th, SueDoc said

I failed to reject your mom's null, if you know what I mean.

on Saturday, December 20th, h is indeed for humans, don't you agree? said

This is poor, and by that, I mean, very, very poor. As in, not worth posting. I did not fail to etc, if etc, nor do I want to, but this is making your work too easy. Not only do you exclude the best of the best in the area you care about (Neymann...etc, although that makes MY work too easy), but you ignore even the basic premise of Bayesian analysis. Wtf? No modern theorist can afford to do that. Do you agree? Or are you too confused about your prior to answer that question? O, little Dorritt, do answer!

TS Eliot (noted bad mathema-whateverer, really only a rough philosoph) disdained causality, and argued that accumulated "causes" "become" the (true) cause. Comment?



24 September 08. Causality and ethics


There's a platitude that it is ethics that distinguishes humans from the rest of the natural world. In prior posts, you've seen me say that humans are distinguished by their ability and tendency to perceive causal relationships. These two statements are closely related: without causality, there can be no ethics.

Some causal chains are obvious, even to young children: if I drop a plate, then it breaks. If I kick the dog, the dog will bite me. For those that are not so obvious, you can help your child by laying it all out line by line. Here is Joe. Joe committed a misdeed. As a result, Joe's misdeed came back to him and he suffered. Here is Jane. She committed a virtuous act, and as a result, she was rewarded for it. The end.

`Person does good, is rewarded; person does bad, is punished' may sound simplistic, but it is the canonical format used by most of the stories we hear or see or read. The modern version of Little Red Riding Hood (as alluded to last time), all of Aesop's Fables, the one about Snow White and the vainglorious queen, any romantic comedy--they all tie reward to the virtuous and punishment to the misbehaving. We'll get to the stories that don't riff on that theme below.

These stories help us to move up the ladder of causal subtlety from mechanical misdeeds like kicking the dog to societal issues like littering. Thus, causal stories of the form virtue → reward and ill behavior → punishment are really central to building a society.

It so happens that religious stories directly fit into the same structure: the omnipotent overseer makes certain that good → reward and bad → punishment. Where no simple causal mechanism exists, the omnipotent overseer defines one.

The lit
I think it's so completely obvious that morality is taught through causal chains that I don't feel much compulsion to provide a host of references, but let me give you one or two so you know I'm not entirely making this up.

First, we can point to Jean Piaget, an oft-cited pioneer in the academic study of child development. Among much else, he wrote many books on how children develop cause-and-effect relationships, and one entitled The Moral Judgment of the Child (which has almost no discussion of causality). So this could be traced back to Piaget's writings circa 1930 if you were so inclined.

The intro pages to Karniol (1980) give a nice summary of the modern interpretation of Piaget's moral stories, and examples of how kids sometimes take the causal story to what we consider an absurd extreme (e.g., the boy stole the bike → the bridge collapsed). She also ran experiments on about 150 elementary school children. They were read skeletal stories of the form Joe stole money. Later, Joe fell down the stairs. or Jane lied. Later, Jane fell in a puddle. There was a range of types of causality, including immanent causality (the result is because of something inside the person), asyndetic1 and/or mediated causality (it was the person's action, but mediated via another force), or chance causality (which is delightfully not jargon). Chance causality explanations were basically the least popular, ranging in use among the five grades from 16 to 34 percent; mediated causality ranged from 58 to 86 percent; immanent causality ranged from 23 to 47 percent.

That's the first experiment; the final experiment, using only kids who'd given a mediated causality response in the first experiments, and a story in which the kid in the story gets struck by lightning, was able to induce a greater recourse to chance causality among the listeners (70%). But the first two experiments (and another story in the third experiment where the boy breaks his leg) still show that if there is no causal story spelled out, the brain of the listener will probably invent one. If you want more, Karniol gives a dozen or so other papers that come to similar conclusions: even the youngest kids will see a link between a person's actions and the eventual outcome when there is a relationship to be had, and will invent one when there isn't.

Variants of the story
Now that the canonical story is ingrained in us, hard, there are all sorts of variants that turn our causal expectations around. Some just make for a better story, but others begin to show flaws in the system.

The ending to Moby Dick was so gut-wrenching because it was so outside of the entire framework. I'm a bit amazed that it got published and sold well enough that we've heard of it, given how much it bucks convention.

Adult fiction is filled with what we call moral ambiguity, by which we mean that the virtuous aren't rewarded and the evil aren't punished. This is not to be confused with stories that create tension by allowing the bad guy to win halfway through, getting the princess or the thousand pounds of gold bullion (both props play the same rôle in the typical story). In those half-win stories, tension comes from our knowledge that the inevitable downfall will only be worse after the temporary victory.

Many bookshelves have been filled with Dark Knight-type stories about characters of ambiguous virtue. But we humans have an easy solution for these stories: if we are firmly wired to see virtue → reward, then we eventually start to see reward → virtue. In logic class, it'd be a blatant error to conclude the second relation from the first, but we're not talking about logic, we're talking about how people think.

If you're an Objectivist, you learn that whatever it takes to gain reward is by definition virtuous. If you follow other sorts of commerce-oriented ethical systems, then you follow a similar but looser line. And as the cliché goes, might makes right. In the other direction, I've heard more than enough people give me a line like `it's not illegal, so it's not unethical', which in this context means no punishment → not evil.2

Or, it's easy for both kids and adults to miss just which cause led to the final outcome. It's downright cliché that the protagonist is attractive and the antagonist ugly, from which we are taught that attractive → reward; ugly → punishment. Add this to the last paragraph, and we find that unattractive = evil, which I find really is how a lot of people think.

If the virtuous are always rewarded and the evil always punished, then anybody who is being punished must be doing something wrong. If we see a person, or a group of people (grouped by language, size of nose, or genitalia), and find that they are doing worse than others, our brains work overtime to fill in the blank in the relation ______ → punishment. E.g., if they hadn't eaten from the Tree of Knowledge of Good and Evil, they wouldn't be worse off.

Now, all those stories are really just practice for what happens here in reality, where we write our own stories. The non-fiction evening news is making a huge effort to fulfill our expectations: the evil have to be punished, and (from time to time) the virtuous have to be rewarded. As viewers, our expectations about how the world should be are very high. If the assailant doesn't go to jail, then we're left with the frustration of a story cut short just before the resolution. If their country is evil, and our country is virtuous, then there is tension until we find a way to bring about some sort of punishment for them, preferably in a manner that brings rewards to our contractors.

And so we see a great deal of our legislative and interpersonal effort put into making sure that rewards and punishments are eventually paid out, even though the only real benefit may be the sense of resolution that comes from making the world fit the stories we were told as kids.

We all have these virtue → reward and evil → punishment relations tattooed to the inside of our foreheads. Our parents made sure of it, by teaching us ethical causal stories at the same time that we were learning more mechanistic causal stories. If they didn't present us such stories, we'd just make up our own. But the mechanical relationships like I drop the plate → the plate breaks are much more robust than the relationship between nice behavior and reward, to the point that we can easily invent unverifiable relationships, like how a pretty face and big muscles implies virtue, or spilling one's seed is evil, or that whatever person we've never met before is getting exactly what he or she deserves. The ability to develop and understand causal stories, which makes us human, gives us ethical beliefs, and allows us to construct a society, is exactly the same force that lets us dress up self-interested behavior as virtue, makes us pine for retribution against perceived slights, and nudges us to wish ill upon those who look or behave differently from our ideal.

@article{karniol:immanent,
title = "A Conceptual Analysis of Immanent Justice Responses in Children",
author = "Karniol, Rachel",
journal = "Child Development",
volume = 51, number = 1,
pages = "118-130",
publisher = "Blackwell Publishing on behalf of the Society for Research in Child Development",
year = "1980"
}



Footnotes

1. Syndetic: Serving to unite or connect; connective, copulative.
2. It's not as if I know what the True and Correct ethical system is, but an ethical system that directly equates individual benefit with ethics is really just the state of nature calling itself ethics, and a rejection of the idea that we humans can develop beyond biology.



[link][2 comments]

on Wednesday, September 24th, Spoofy said

Nice post. What happens if an attractive person spills his seeds? Is this person still evil? OR does this go back to the ambiguity thing= only ugly people that spill seeds are truly evil.

on Saturday, December 20th, yeah...still human said

Moby-Dick? Seriously? The most conventional (now) of morality tales? C'mon -- Queequeg is a completely exonerated non-Christian. What more can you possibly ask for? If you want to integrate non-analytical morality tales into your plot, why exclude the most 'Christian' plot available?
