Friday, April 22, 2011

Using a Different Mahout Algorithm from the taste-getting-started Example

We successfully ran the taste-getting-started example, as described here: http://teachitshem.blogspot.com/2011/04/simple-java-wicket-using-mahout.html. However, Mahout offers other recommendation algorithms that are also worth studying. One example is PearsonCorrelationSimilarity, whose parameters are similar to those of the EuclideanDistanceSimilarity used in the original taste-getting-started example.
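To get a feel for how the two measures differ, here is a minimal, self-contained Java sketch. It does not use Mahout's classes and is not Mahout's actual implementation; the class name SimilarityDemo, the method names, and the ratings are made up for illustration, and the distance-to-similarity mapping 1/(1+d) is just one common convention assumed here.

```java
public class SimilarityDemo {

    // Pearson correlation between two equally long rating vectors.
    // Returns NaN when either vector has no variance.
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        double denom = Math.sqrt(varX * varY);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    // Euclidean distance turned into a similarity in (0, 1].
    // Assumption: the 1 / (1 + d) mapping, chosen only for illustration.
    public static double euclideanSimilarity(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    public static void main(String[] args) {
        // Hypothetical ratings by two users for the same three movies.
        double[] userA = {5, 4, 3};
        double[] userB = {3, 2, 1};
        System.out.println("Pearson:   " + pearson(userA, userB));
        System.out.println("Euclidean: " + euclideanSimilarity(userA, userB));
    }
}
```

Here the second user rates every movie exactly two stars lower, so the Pearson correlation reports a perfect match (1.0), while the distance-based score stays low (about 0.22) because the absolute ratings differ.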

How then can we do this?

First, we duplicate the taste-getting-started folder, say to tgs, by issuing cp -R taste-getting-started tgs on the command line.

Second, we change the node <artifactId>taste-getting-started</artifactId> to <artifactId>tgs</artifactId> in the pom.xml file.

Third, open the recommeder-context.xml file in the src/main/webapp/WEB-INF folder and replace euclideanDistanceRecommeder with pearsonCorrelationSimilarityRecommender, euclideanDistanceSimilarity with pearsonDistanceSimilarity, and similarity.EuclideanDistanceSimilarity with similarity.PearsonCorrelationSimilarity.

Fourth, run mvn package against tgs.

Fifth, in order to compare results with the original taste-getting-started, we have to run both the original taste-getting-started and the new tgs web applications on different ports. For taste-getting-started we can use the usual mvn jetty:run-war, which uses port 8080, but for tgs we must use something like mvn -Djetty.port=9090 jetty:run-war, which serves the application at http://localhost:9090/tgs/movies.

Simple, eh?

Monday, April 18, 2011

Simple Java Wicket using Mahout

While continuing the study of Mahout, it is inevitable to write code that uses the various algorithms that ship with it. So I asked a colleague to help out, and one option was to start with a Java Wicket application found here: http://blog.jteam.nl/wp-content/uploads/2010/04/taste-getting-started.zip

Using it is quite straightforward. Download the zip file. Unzip it to some folder (say taste-getting-started on the Mahout machine). Using Ubuntu’s Terminal application, go to that folder and run mvn package. After packaging is complete, run mvn jetty:run-war. Once the Jetty server is up, you can use the application by browsing to http://localhost:8080/taste-getting-started/movies.

Although using the application is straightforward, setting it up requires some effort, such as:

  • Downloading the MovieLens 100K Ratings Data Set: http://grouplens.org/system/files/ml-data_0.zip
  • Unzipping it to some other folder.
  • Copying u.data to taste-getting-started/src/main/resources/grouplens/100K/ratings
  • Copying u.item to taste-getting-started/src/main/resources/grouplens/100K/data
  • Executing initialize_movielens_db.sql from the taste-getting-started/src/main/resources/sql folder with the following command: mysql --user=mysqluser --password=mysqlpassword < initialize_movielens_db.sql

Oh, and if you do not have MySQL yet, install it by issuing the following command: sudo apt-get install mysql-server mysql-client.
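For orientation, the u.data file copied above stores one rating per line as four tab-separated fields: user id, item id, rating, and timestamp. The sketch below parses one such line in plain Java; the class RatingLineDemo, its field names, and the sample line are hypothetical and are not taken from the taste-getting-started code.

```java
public class RatingLineDemo {

    // One parsed rating; the field names are hypothetical.
    public static final class Rating {
        public final long userId;
        public final long itemId;
        public final int rating;
        public final long timestamp;

        public Rating(long userId, long itemId, int rating, long timestamp) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.timestamp = timestamp;
        }
    }

    // Parses a line of the form "userId<TAB>itemId<TAB>rating<TAB>timestamp".
    public static Rating parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 tab-separated fields: " + line);
        }
        return new Rating(
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim()),
                Integer.parseInt(fields[2].trim()),
                Long.parseLong(fields[3].trim()));
    }

    public static void main(String[] args) {
        // Made-up sample line, not copied from the real data set.
        Rating r = parse("1\t10\t4\t880000000");
        System.out.println("user " + r.userId + " rated item " + r.itemId + " with " + r.rating);
    }
}
```

A check like this can help confirm that the file ended up in the right folder and was not mangled while unzipping.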