Patterns in static

Apophenia






12 February 05.

Just a little note that I've finally put up my library of stats functions for public consumption and modification.


Before you all lose interest entirely: the name, Apophenia, means a tendency to see patterns in static, which is the fundamental human tendency that statistics aims to combat. That is, the intent of good statistics, as far as I'm concerned, is not to uncover facts that we as humans couldn't work out ourselves, but to invalidate all of those claims that we come up with all day long but which are just an overactive imagination at work. Using statistics to uncover patterns we hadn't seen before is called `data mining', which used to be a dirty word, used to accuse those whose papers you didn't like, but management types have grown fond of it and now hire people to do it. I was once invited to a data mining conference.

But back to me. You may recall that a few months ago, before the software patent book, I'd written half a book about doing statistics in C. It received resistance because (1) I'm not a great writer and (2) it rejected the Universal Truth that all statistics must be done using a statistics package. One publisher, I recall, was rather explicit about (2).

But I still work in C, and I still do statistics, and there are other people out there just like me, I just know it. My desires are supremely simple: I just want a good, reliable toolbox. I'm not a visual person, and rarely do exploratory stuff in the way of just meandering through a data set. Usually, I have a specific question and want a nice, precise answer. That is, I want to apply a specific tool to the data.

The cute user interface of the typical stats package is also wasted on me---and it should be wasted on you too. If a reader asks nicely, you should be able to send him/her/it instructions for replicating your procedure from raw data to little stars after the t-statistic in your paper, and that means eschewing the clicky buttons for a written script.

From there, it's just a question of the language one wants to write the script in, and ya know, C is the language for me. I've already blathered about this, but here's the executive summary: I'm f.ing tired of learning new languages. I was feeling as though every time I started a new project, my former favorite language couldn't handle it. `Oh, you have limited dependent variables? Then use Limdep. Maximizing subject to constraints? That's what GAMS is for. You wanna do lots of matrix operations? Then switch back to Matlab. Except we don't have a license for it here.'

The library approach means never having to switch languages again: you'll need to find some C functions for the new trick you're trying to pull, but the ugly syntax is exactly the same, the environment in which you program doesn't change, and if you had some cool functions in the last program that you want to reuse, you can call them directly. All that language-specific knowledge about what's easy or hard, and where you need to be careful not to mis-state things, only builds from project to project. You just have to learn a sufficiently versatile language that can handle anything that may come forth, like C.

This project's contribution: a library of statistical functions at the same level as the stats packages. That is, a function which does OLS, a function which does factor analysis, et cetera. The lower-level stuff, about shifting matrices about and drawing from Gaussian distributions and querying the data, is handled by other libraries, so we don't have to worry about that.
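For the sake of concreteness, here is a minimal sketch of what `a function which does OLS' could look like at that level. It is not the library's actual interface: the names `simple_ols` and `ols_result` are made up for illustration, and it handles only a single regressor in plain C so that it stands on its own, where a real version would presumably hand the matrix work to those other libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for illustration; not the library's API. */
typedef struct {
    double intercept;  /* estimated a in y = a + b*x */
    double slope;      /* estimated b in y = a + b*x */
    double ssr;        /* sum of squared residuals at the fitted line */
} ols_result;

/*
 * Fit y = a + b*x by ordinary least squares with one regressor.
 * Programmer errors (NULL pointers) trip an assert; unusable data
 * (fewer than two points, or no variation in x) returns -1.
 * Returns 0 on success and fills in *out.
 */
int simple_ols(const double *x, const double *y, size_t n, ols_result *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    if (n < 2)
        return -1;

    /* Sample means of x and y. */
    double mean_x = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Centered sums of squares and cross products. */
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (y[i] - mean_y);
    }
    if (sxx == 0.0)
        return -1;  /* x is constant, so the slope is not identified */

    out->slope = sxy / sxx;
    out->intercept = mean_y - out->slope * mean_x;

    /* Residual sum of squares, the raw material for standard errors. */
    out->ssr = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = y[i] - out->intercept - out->slope * x[i];
        out->ssr += e * e;
    }
    return 0;
}
```

The calling script then reads the way a stats package command would: load the data into two arrays, call `simple_ols(x, y, n, &result)`, check the return value, and report `result.slope`. That call, not the loops inside it, is the level the library is aiming for.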

So, dear reader, next time you find that you need to do a new statistical analysis, and your current language du jour doesn't work, give C a try. Maybe the library already has the function you need, and you're done. Maybe it doesn't, in which case you can take the effort you would have spent learning a new language and put it into writing the necessary function, which you can then contribute for future use by the rest of us.
