ArticleRank

One thing I seem to love doing is making rather large projects that take me months to complete. Recently I came up with an idea called Artic.ly, a sort-of mashup of Digg and Readability. I've now ditched that project, but one part of it, an implementation of a feature I had in mind, has stuck with me. That feature was a list of the most popular news articles, and the implementation was an algorithm I'd termed ArticleRank.

ArticleRank has been on my mind for a few days now. Although I no longer have a use for it, I hope that putting it on the Internet will both help other people and help me improve it.

The problem lies with traditional news websites: they rank their articles either by what was published most recently or by which articles have the most views. ArticleRank obviously isn't perfect, but there has to be a better solution than what's currently in use.

If an article is really poorly done, but someone with a large following posts the link, that article will increase in "popularity" even though it shouldn't. Social media is now so ingrained into most news sites that you can tweet an article or post it as a Facebook status update straight from the page. Why is this important? Because if you're sharing an article, it's generally because you found it interesting, so chances are someone else will too. The next logical step is to put this social media data at the core of how visible an article is to others.

ArticleRank revolves around a number of variables that are then combined to produce the final value. The higher the end value of ar, the more popular an article is. It will probably become obvious once you've seen the algorithm that I'm not the world's best mathematician, but my goal is to provide the variables for others to mix into their own algorithms, rather than taking mine and attempting to implement it.

The variables are:
ar - the end result
v - the number of hits, or visits, the article has received since it was published
s - the number of times the article has been shared through social networking sites
d - decay, how old the article is in seconds (the best way to get this would probably be to subtract the published UNIX timestamp from the current UNIX timestamp)
e - exaggeration, a value that could be used to separate the ArticleRank results, your "special sauce", the only value that you can influence. Should default to 1.
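
As a rough sketch of how d could be worked out from two UNIX timestamps, something like the following might do. The function name decay_seconds and the clamp to zero are my own assumptions for illustration, not part of ArticleRank itself.

```python
import time


def decay_seconds(published_ts, now_ts=None):
    """Return d, the article's age in seconds.

    published_ts and now_ts are UNIX timestamps. If now_ts is omitted,
    the current time is used.
    """
    if now_ts is None:
        now_ts = time.time()
    # Assumption: clamp at zero so a publish time slightly in the future
    # (clock drift, scheduled posts) never produces a negative age.
    return max(0, now_ts - published_ts)


# An article published at timestamp 1000 and checked at 4000 is 3000 seconds old.
age = decay_seconds(1000, now_ts=4000)
```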

Now for a quick mashup of these variables into an algorithm:
ar = (((v / 40) * s) - (d / 2000)) * e
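
Here is a minimal Python sketch of the formula above, used to sort a handful of articles. The function name article_rank and the sample articles are made up for illustration; only the formula itself comes from this post.

```python
def article_rank(v, s, d, e=1.0):
    """Compute ar from visits (v), shares (s), age in seconds (d)
    and exaggeration (e), exactly as in the formula above."""
    return (((v / 40) * s) - (d / 2000)) * e


# Hypothetical sample data, not real measurements.
articles = [
    {"title": "Old but shared", "v": 80, "s": 5, "d": 200000},
    {"title": "Viewed, never shared", "v": 2000, "s": 0, "d": 4000},
    {"title": "Fresh and shared", "v": 400, "s": 10, "d": 4000},
]

# Highest ar first, since a higher value means a more popular article.
ranked = sorted(
    articles,
    key=lambda a: article_rank(a["v"], a["s"], a["d"]),
    reverse=True,
)

for a in ranked:
    print(a["title"], article_rank(a["v"], a["s"], a["d"]))
```

One thing this makes visible: because v is multiplied by s, an article with zero shares gets nothing from its visits at all and is scored on decay alone. Whether that's desirable is up to you.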

As you can see, the algorithm is far from perfect, but I think it's still a big improvement on ranking by publish date or view count alone. If you do end up using something here, I'd love to hear about it!