org.apache.mahout.clustering.minhash
Class LastfmDataConverter
java.lang.Object
org.apache.mahout.clustering.minhash.LastfmDataConverter
public final class LastfmDataConverter
- extends Object
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
convertToItemFeatures
public static Map<String,List<Integer>> convertToItemFeatures(String inputFile,
org.apache.mahout.clustering.minhash.LastfmDataConverter.Lastfm dataSet)
throws IOException
- Reads the Last.fm dataset and constructs a Map of (item, features). For the
360K users dataset, item = artist and feature = user; for the 1K users
dataset, item = user and feature = artist.
- Parameters:
inputFile - Last.fm dataset file on the local file system.
dataSet - Type of dataset: 360K users or 1K users.
- Returns:
the Map of (item, features) built from the input file
- Throws:
IOException
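The page does not describe the input file layout, so the following is only a sketch of how an (item, features) map of this shape can be built, assuming a hypothetical tab-separated layout in which the caller chooses which column holds the item and which holds the feature. `ItemFeaturesSketch` and `buildItemFeatures` are hypothetical names, not Mahout API, and the code does not reproduce Mahout's own parsing. Each distinct feature string is assigned a dense integer ID in order of first appearance, which is one plausible way to obtain the `List<Integer>` values of the returned map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ItemFeaturesSketch {

  private ItemFeaturesSketch() {}

  /**
   * Hypothetical helper (not Mahout API): groups tab-separated records into an
   * (item, features) map, assigning each distinct feature string a dense integer ID.
   */
  public static Map<String, List<Integer>> buildItemFeatures(
      List<String> lines, int itemColumn, int featureColumn) {
    Map<String, Integer> featureIds = new HashMap<>();
    Map<String, List<Integer>> itemFeatures = new HashMap<>();
    for (String line : lines) {
      String[] fields = line.split("\t");
      if (fields.length <= Math.max(itemColumn, featureColumn)) {
        continue; // skip malformed or short lines
      }
      String item = fields[itemColumn];
      String feature = fields[featureColumn];
      // Reuse the ID of a known feature, otherwise assign the next free one.
      Integer id = featureIds.get(feature);
      if (id == null) {
        id = featureIds.size();
        featureIds.put(feature, id);
      }
      List<Integer> features = itemFeatures.computeIfAbsent(item, k -> new ArrayList<>());
      if (!features.contains(id)) {
        features.add(id); // keep each feature at most once per item
      }
    }
    return itemFeatures;
  }

  public static void main(String[] args) {
    List<String> lines = List.of("u1\tartistA", "u2\tartistA", "u1\tartistB");
    // 360K-style view: item = artist (column 1), feature = user (column 0)
    System.out.println(buildItemFeatures(lines, 1, 0).get("artistA")); // [0, 1]
    // 1K-style view: item = user (column 0), feature = artist (column 1)
    System.out.println(buildItemFeatures(lines, 0, 1).get("u1")); // [0, 1]
  }
}
```

Swapping the two column arguments is how this sketch switches between the 360K (artist as item) and 1K (user as item) orientations. The `contains` check is linear per insertion; for large inputs a `Set<Integer>` per item would scale better.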
writeToSequenceFile
public static boolean writeToSequenceFile(Map<String,List<Integer>> itemFeaturesMap,
org.apache.hadoop.fs.Path outputPath)
throws IOException
- Converts each record in the (item, features) map into Mahout vector format
and writes it to a SequenceFile for MinHash clustering.
- Throws:
IOException
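To illustrate why the output is organized as one feature list per item, here is a plain-Java sketch of what MinHash computes from such a list: a short signature whose agreeing positions estimate the Jaccard similarity of two items' feature sets. This is a generic textbook MinHash using the hash family h(x) = (a·x + b) mod (2^31 − 1) with seeded random `a` and `b`; it is not Mahout's MinHash clustering job, and `MinHashSketch`, `signature`, and `estimateJaccard` are hypothetical names. It assumes feature IDs are non-negative ints.

```java
import java.util.List;
import java.util.Random;

public final class MinHashSketch {

  private MinHashSketch() {}

  /** Mersenne prime 2^31 - 1, used as the modulus of the hash family. */
  private static final long PRIME = 2147483647L;

  /**
   * Entry h of the signature is the minimum of (a_h * f + b_h) mod PRIME over all
   * feature IDs f. Assumes non-negative feature IDs.
   */
  public static int[] signature(List<Integer> features, int numHashes, long seed) {
    Random rnd = new Random(seed); // same seed => same hash functions for every item
    int[] sig = new int[numHashes];
    for (int h = 0; h < numHashes; h++) {
      long a = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // a in [1, PRIME - 1]
      long b = rnd.nextInt(Integer.MAX_VALUE);         // b in [0, PRIME - 1]
      int min = Integer.MAX_VALUE; // stays as a sentinel for an empty feature list
      for (int f : features) {
        int hash = (int) ((a * f + b) % PRIME); // a * f + b < 2^63, so no overflow
        min = Math.min(min, hash);
      }
      sig[h] = min;
    }
    return sig;
  }

  /** Fraction of agreeing signature positions, an estimate of Jaccard similarity. */
  public static double estimateJaccard(int[] s1, int[] s2) {
    int agree = 0;
    for (int i = 0; i < s1.length; i++) {
      if (s1[i] == s2[i]) {
        agree++;
      }
    }
    return (double) agree / s1.length;
  }

  public static void main(String[] args) {
    int[] x = signature(List.of(0, 1, 2, 3), 128, 7L);
    int[] y = signature(List.of(0, 1, 2, 4), 128, 7L);
    // True Jaccard similarity is 3/5; the estimate should land near 0.6.
    System.out.println(estimateJaccard(x, y));
  }
}
```

Because every item must be hashed with the same functions, the sketch derives them from a shared seed; items whose feature lists overlap heavily then tend to share signature entries, which is what lets MinHash group similar items cheaply.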
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.