Stack Overflow: Simple Data Dump Download Archive
Is 7-Zip or BitTorrent an inconvenience for you? This page is for you!
The folks over at Stack Overflow publish the cc-wiki licensed data dumps as a .7z file via BitTorrent. For some, this poses a small logistical problem. Thanks to xtendx's large pipe, I've elected to host these data dump archives on one of our servers; the downloads won't even be noticeable for us.
Because the data dumps are snapshots and do not contain anything like a change log or history, this page will collect all the dumps over time as they are released. So far there are only two, but the intent seems to be one a month.
The data dump files have also been "trans-compressed" from 7-Zip archives to bzip2-compressed tar files. Folks on a *nix system will probably appreciate this.
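For anyone who wants to do the same repacking themselves, here is a minimal sketch in POSIX shell. It assumes p7zip's `7z` command and a `tar` with bzip2 support (`-j`) are installed; `tarball_name` and `transcompress` are hypothetical helper names, not part of any tool.

```shell
# Hypothetical helpers: repack a 7-Zip data dump as a bzip2-compressed tarball.
# Assumes p7zip's `7z` and a tar that supports -j (bzip2) are on the PATH.

# Derive the output name: foo.7z -> foo.tar.bz2
tarball_name() {
    printf '%s\n' "${1%.7z}.tar.bz2"
}

# Extract the .7z into a scratch directory, then re-archive it with tar + bzip2.
# Prints the new file name on success; returns non-zero on any failure.
transcompress() {
    src=$1
    dest=$(tarball_name "$src")
    workdir=$(mktemp -d) || return 1
    if 7z x -o"$workdir" "$src" > /dev/null &&
       tar -cjf "$dest" -C "$workdir" .; then
        rm -rf "$workdir"
        printf '%s\n' "$dest"
    else
        rm -rf "$workdir"
        return 1
    fi
}
```

On the receiving end, unpacking the result is a single `tar xjf` on the `.tar.bz2` file, with no 7-Zip needed.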
I hope this makes someone's life a little bit easier. If so, let me know!
My first Erlang program: Hello World!
Also, 'Hello Erlang!' and 'Hello YAWS!'
Sigh... It's not every day that a programmer tries out a new language, but that is exactly what I did this evening! A few months ago some of the hype around Erlang caught my eye. Specifically, Erlang's approach to parallel processing and multi-threading interests me because 1) multiple cores are becoming the norm and increasing in number, 2) the existing approaches of popular platforms and architectures, including my beloved Java, may not scale out as well with more cores, and 3) streaming media servers, which butter my bread, seem like a great application for Erlang.
The first thing I did was download the Erlang source, compile it, and install it. Tim Dysinger has a wonderfully concise blog post (Compiling Erlang on Mac OS X Leopard from Scratch), which I have reproduced below, slightly altered to my way of doing things:
```shell
cd /tmp
wget http://www.erlang.org/download/otp_src_R12B-5.tar.gz
tar xzf otp_src_R12B-5.tar.gz
cd otp_src_R12B-5
./configure --enable-hipe --enable-smp-support --enable-threads
make
sudo make install
cd ~
```
And with that, Erlang is ready to go. Next I found a great "Hello World!" post on Edward Garson's blog: The *Real* Erlang "Hello, World!". Garson's opinion is that a learner's first Erlang program should use the Actor Model, a core approach of the language. So, the code:
```erlang
-module(hello).
-export([start/0]).

%% Spawn a new process running loop/0 and return its Pid.
start() ->
    spawn(fun() -> loop() end).

%% Wait for messages: print on 'hello' and keep looping; stop on 'goodbye'.
loop() ->
    receive
        hello ->
            io:format("Hello, World!~n"),
            loop();
        goodbye ->
            ok
    end.
```
With a quick copy from Garson's blog into a new text file, hello.erl, in my home directory, I was ready to compile and run it in the shell:
```
manoa:~ stu$ erl
Erlang (BEAM) emulator version 5.6.5 [source] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.6.5  (abort with ^G)
1> c(hello).
{ok,hello}
2>
2>
2> Pid = hello:start().
<0.37.0>
3> Pid ! hello.
Hello, World!
hello
4> Pid ! goodbye.
goodbye
5> halt()
5> .
manoa:~ stu$
```
Fantastic! Afterward, I went through the official "Getting Started with Erlang" page at erlang.org, which touches lightly on deeper topics such as the Erlang shell, job control, distributed Erlang, and debugging.
Also, while googling this evening I found criticisms of Erlang, which are healthy to see in a community. One example is "What Sucks About Erlang" by Damien Katz (creator of CouchDB, which is written in Erlang). It was good to see the ugly side of Erlang up front, before I become too enamored.
Next steps? I need to go through the Hello World and Getting Started pages again, write some little programs, and then try to do something with YAWS, a web server written in Erlang. I already have a project in mind, which on one hand is very motivating, but on the other probably means I will skip or miss some basic concepts and learn them the hard way.
What is the project, you ask? Well, I'd like to write an FLV pseudo-streaming module for YAWS and potentially an accompanying FLV indexing tool. I've written one from scratch in Java for xtendx, and there are plenty of other implementations in PHP, C, etc. for other web servers.
Hello Erlang!
Our DL380 G5 Servers Turn Two Years Old
Happy Birthday, My Babies!
It was two years ago that the team at xtendx set up a pair of new servers at a 'proper' hosting company, Aspectra, with quality infrastructure, engineers, and network peering to support new, demanding customer requirements. (Our previous hosting company was second-rate at best, and our entry-level servers were getting old.) After shopping for a new hosting company and selecting Aspectra, we began analysis and negotiations regarding the hardware. Our requirements boiled down to these critical items:
- The hardware vendor must be an industry leader and well supported by the hosting company
- Support from the hardware vendor for a modern flavor of Linux
- The hardware itself should have a solid degree of internal redundancy
- Expandable/upgradable in the future
- There should be a 'hot backup' server
- A large, shared, performant, fully backed-up file system
The resulting configuration:
- (2x) HP DL380 G5 - purchased
  - A single E5320 "Clovertown" 4-core 1.86GHz Intel Xeon CPU (the other CPU socket is open)
  - 4GB of PC2-5300 DDR2 RAM in 2 modules (6 slots free)
  - A pair of 10k RPM 72GB 2.5" disks in a RAID 1 (mirrored) configuration
  - Red Hat Enterprise Linux 4
- (1x) HP MSA 1500 - shared, leased
  - (4x) 300GB 3.5" drives in a RAID 1+0 (mirrored + striped) configuration
  - Fiber Optic SAS connection
Front View
Note that on both server chassis one hard disk drive's handle is ajar. This picture was taken during initial setup and configuration; the drive is disconnected to test that a) the RAID 1 array does not fail, and b) the monitoring system reports the failure to the engineers.
Rear View
Again, note that one of the power supplies is disconnected and shows no green power light. This photograph was also taken while the servers were undergoing initial setup and configuration; the power cable is disconnected to test that a) the server continues to function with only one power supply, and b) the monitoring system reports the failure to the administrators.
(Original) Storage Array
Four of these 300GB drives were dedicated to our servers, giving us ~575GB of usable space. Since the system was built, we have expanded our storage twice; the storage now lives in another chassis and we have ~1.5TB to work with.
We made a solid choice with these servers and they should last us another two or three years, even with current customer growth and additional features accounted for. While we have not upgraded the memory or CPUs yet, that is sure to happen in the next year. That said, the current platform's limitations have forced me to continuously tune my application in multiple dimensions: reduce memory consumption, reduce CPU load, and keep response times low. This has been a good thing for both my application and my programming skills.
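The capacity arithmetic behind that figure: in RAID 1+0 every block is stored on two disks, so only half of the raw capacity of the n drives (each of size s) is usable.

```latex
C_{\text{usable}} = \frac{n}{2}\, s = \frac{4}{2} \times 300\,\text{GB} = 600\,\text{GB}
```

The ~575GB quoted above is slightly below this raw figure, presumably because formatted capacity always comes out a little lower than the nominal drive size.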
Anyway, "Happy Birthday", my babies!
The Personalities of Stack Overflow by the Numbers
A look for patterns in reputation, posts and profile views of the 'personalities' on Stack Overflow
After reading some commentary on meta.stackoverflow.com about the famous users of Stack Overflow, and the 'star power' of the two founders, I decided to have a look at the one metric that can easily be interpreted as a measure of name recognition: Profile Views. The first resulting graph is below and contains the Top 1000 users by reputation score.
Reputation (x-axis) vs. Votes (y-axis) vs. Profile Views (z-axis)

While much of the graph meets my expectations (Jon Skeet has by far the most profile views, the top 10 users generally have larger bubbles), there are some interesting data points:
- Rich B, the most prolific downvoter and a bit of a rabble-rouser among SOpedians both on and off the site, has a huge bubble for someone not in the Top 10 or even the Top 100 users
- Two other Top 1000 users who are not in the Top 10 or Top 100, Darron and Ates Goral, also have relatively large Profile View bubbles. I am not familiar with them, although they are apparently popular by this measure.
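For anyone wanting to pull a similar Top 1000 list out of the data dump, here is a minimal shell/awk sketch. It assumes the users file stores one `<row .../>` element per line with `Reputation`, `Views` and `DisplayName` attributes, as later Stack Overflow dumps do; `top_users` is a hypothetical helper name.

```shell
# Hypothetical sketch: print "reputation<TAB>profile views<TAB>name" for the
# top N users by reputation (default N = 1000).
# Assumes one <row .../> element per line with Reputation, Views and
# DisplayName attributes; attribute order within a row does not matter.
top_users() {
    awk '
        # Return the value of attribute `name` on this line, or "" if absent.
        function attr(line, name) {
            if (match(line, " " name "=\"[^\"]*\""))
                return substr(line, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
            return ""
        }
        /<row / {
            printf "%s\t%s\t%s\n", attr($0, "Reputation"), attr($0, "Views"), attr($0, "DisplayName")
        }
    ' "$1" | sort -k1,1nr | head -n "${2:-1000}"
}
```

The line-oriented approach keeps memory use flat even on a large users file, at the cost of relying on the one-row-per-line layout; a real XML parser would be the safer choice if that layout is not guaranteed.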
Reputation (x-axis) vs. Posts (y-axis) vs. Profile Views (z-axis)

This graph turns out to be much more of a surprise. While some users' data points are predictably prominent (Atwood, Spolsky, Skeet, Gravell, de Lizard, and Rich B), others strike me as unusual:
- geocoin, Reputation Score: 425 (!)
- Oliver Green, Reputation Score: 1413
- AgentConundrum, Reputation Score: 1300
Beyond these three users, the correlation between a) reputation and profile views and b) posts and profile views is weaker than I expected. Note the density of Top 100 user data points (red) with fewer than 500 posts and reputation scores well under 10,000. Interesting...
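To put a number on "weaker than I expected", one option is the Pearson correlation coefficient r. Below is a minimal awk-based sketch, assuming whitespace-separated input with one user per line; the `pearson` helper and the column layout are hypothetical.

```shell
# Hypothetical sketch: Pearson correlation coefficient r between two numeric
# columns (defaults: columns 1 and 2) of whitespace-separated input on stdin.
pearson() {
    awk -v cx="${1:-1}" -v cy="${2:-2}" '
        {
            x = $cx; y = $cy
            n++; sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
            vx = n * sxx - sx * sx
            vy = n * syy - sy * sy
            if (n < 2 || vx <= 0 || vy <= 0) { print "undefined"; exit }
            printf "%.3f\n", (n * sxy - sx * sy) / sqrt(vx * vy)
        }'
}
```

For example, with a hypothetical file holding reputation, post count and profile views per user, `pearson 1 3 < users.txt` correlates reputation with profile views. Pearson's r is strongly influenced by extreme points (a single Jon Skeet-sized bubble can dominate it), so a rank-based measure such as Spearman's rho may be a fairer test for data this skewed.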
