Stack Overflow: Up and Down Voting Pattern Analysis

SOpedians are getting nicer as time goes on, except for the occasional flare-up

The kids at stackoverflow.com, most prominently Jeff Atwood, recently released the Creative Commons licensed data behind Stack Overflow via BitTorrent. I eagerly downloaded the database dump and imported it into MySQL for some analysis of voting patterns.

Since the beta, I have always been a fan of the down vote. Many SOpedians find down votes hostile and mean, to the point of getting their knickers all bunched up. My belief is that they have a cleansing effect on the questions and answers. Down votes are in the spirit of (what I have interpreted the founders' goals to be for) Stack Overflow.

After whipping up an embarrassingly crude Python script to import the voting data into MySQL, I ran a simple query that gave me the up and down vote totals for each day. Then I graphed it all in Excel and added three new series: the up-to-down ratio, a 9-day average of the up-to-down ratio, and an up-to-down trend line.
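For anyone who wants to reproduce the three series without Excel, here is a rough sketch of the same steps. It is not the script or query used for this post. It swaps in SQLite (via Python's built-in sqlite3 module) for MySQL so that it runs on its own, and it assumes a hypothetical votes table with post_id, vote_type_id and creation_date columns, where vote type 2 is an up vote and 3 is a down vote. The trailing 9-day window and the least-squares trend line are also assumptions about how the averages and trend were drawn.

```python
import sqlite3

# Assumed vote type codes from the data dump: 2 = up vote, 3 = down vote.
UP, DOWN = 2, 3


def daily_totals(conn):
    """Return a list of (day, up_votes, down_votes) tuples, oldest day first."""
    return conn.execute(
        """
        SELECT date(creation_date) AS day,
               SUM(CASE WHEN vote_type_id = ? THEN 1 ELSE 0 END) AS ups,
               SUM(CASE WHEN vote_type_id = ? THEN 1 ELSE 0 END) AS downs
        FROM votes
        WHERE vote_type_id IN (?, ?)
        GROUP BY day
        ORDER BY day
        """,
        (UP, DOWN, UP, DOWN),
    ).fetchall()


def up_down_ratios(totals):
    """Up-to-down ratio per day, skipping days with no down votes."""
    return [(day, ups / downs) for day, ups, downs in totals if downs > 0]


def moving_average(values, window=9):
    """Trailing average over the last `window` values (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...: (slope, intercept).

    Needs at least two values.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Feeding the output of daily_totals into up_down_ratios, and the resulting ratios into moving_average and linear_trend, gives the same kinds of series described above. A positive slope from linear_trend would correspond to a rising up-to-down ratio.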


My interpretations:

  • Stack Overflow went live in mid-September, hence the huge jump in votes then. No surprise there.
  • The humps are weekdays, the troughs are weekends, and the winter holidays are clearly visible.
  • For every down vote there are 10 to 12 up votes.
  • The up vote to down vote ratio is increasing over time. My gut tells me that this is related to the introduction and expansion of post closing, deleting and moderator warning functionality. Or maybe “Down Voting Fatigue” sets in with many users? The ideas that maybe SOpedians are posting less junk or that they are just being nicer as time goes on are ridiculous!
  • There is a huge spike in down votes and/or a corresponding drop in up votes on 21 February. There does not seem to be any one post that sparked this. Interesting.
  • The single most down voted post is in response to "What is the most spectacular way to shoot yourself in the foot with C++?" with (as of 6 June 2009) 39 down votes! (This does not show in the graph…just an ad hoc ‘I wonder…’ query.)
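An ad hoc "most down voted post" query like the one in the last bullet could look something like the sketch below. It is not the query behind the 39-vote figure. It uses SQLite through Python's sqlite3 module in place of MySQL, and it assumes a hypothetical votes table with post_id and vote_type_id columns in which vote type 3 means a down vote.

```python
import sqlite3

# Assumed vote type code from the data dump: 3 = down vote.
DOWN = 3


def most_downvoted(conn, limit=1):
    """Return up to `limit` (post_id, down_votes) rows, most down voted first."""
    return conn.execute(
        """
        SELECT post_id, COUNT(*) AS downs
        FROM votes
        WHERE vote_type_id = ?
        GROUP BY post_id
        ORDER BY downs DESC, post_id
        LIMIT ?
        """,
        (DOWN, limit),
    ).fetchall()
```

Raising limit gives a quick leaderboard of the most down voted posts. Ties are broken by post id only so that the output is stable.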

Interesting stuff. It will be entertaining to comb over the Stack Overflow data in more detail in the future.