Tuesday, May 20, 2008

Netflix Submission in 2 lines of awk

Netflix are offering a prize to anyone who can develop an algorithm that improves the accuracy of their recommendation system by 10%: http://www.netflixprize.com

Some people have developed elaborate schemes based on cross-correlation, clustering and the like. I submitted my first results using two lines of awk:

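NF==2 {print $0}   # two-field line (presumably a movie header): print it unchanged
NF==1 {print 3.4}  # one-field line: predict a rating of 3.4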

which just predicts a rating of 3.4 for every movie and gives an RMSE of 1.16.
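For anyone unfamiliar with the metric, RMSE is the square root of the mean squared difference between predicted and actual ratings. Here is a minimal Python sketch of scoring a constant guess. The ratings are invented for illustration and are not Netflix data, and the function name rmse is my own:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy ratings, made up for this example (not Netflix data).
actual = [1, 3, 4, 5, 2]

# The constant-guess baseline: predict 3.4 for everything.
constant = [3.4] * len(actual)
score = rmse(constant, actual)
print(score)
```

Lower is better, and a score of 0 would mean every prediction was exactly right.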

Predicting each movie's own average rating instead of a single number for everything gives an RMSE of 1.0533.
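The per-movie average idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (movie_id, rating) pairs are invented, the helper names movie_means and predict are hypothetical, and falling back to 3.4 for a movie with no ratings is my own choice rather than anything the contest prescribes:

```python
from collections import defaultdict

def movie_means(ratings):
    """Map movie_id -> mean rating, given (movie_id, rating) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for movie_id, rating in ratings:
        totals[movie_id] += rating
        counts[movie_id] += 1
    return {m: totals[m] / counts[m] for m in totals}

def predict(movie_id, means, fallback=3.4):
    """Predict a movie's average rating; use the fallback for unseen movies."""
    return means.get(movie_id, fallback)

# Toy training data, made up for this example (not Netflix data).
training = [(1, 4), (1, 2), (2, 5)]
means = movie_means(training)

print(predict(1, means))   # movie 1 has ratings 4 and 2
print(predict(99, means))  # movie 99 was never rated, so the fallback is used
```

Every customer gets the same prediction for a given movie, so this is still a very crude model, just one number per movie rather than one number overall.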

Of course, this has all been done before, and I could have just searched the web and found it.

There is also a Python library called pyflix which lets you skip building infrastructure and get on to the fun part: the algorithm.