java.lang.Object
  extended by org.apache.mahout.cf.taste.impl.model.file.FileDataModel
public class FileDataModel
A DataModel backed by a comma-delimited file. This class typically expects a file where each line contains a user ID, followed by an item ID, followed by a preference value, separated by commas. Tabs may also be used as the delimiter.
The preference value is assumed to be parseable as a double. The user IDs and item IDs are parsed as longs.
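The line format described above can be illustrated with a small parser. This is a minimal sketch, not Mahout's implementation: ParsedLine and PreferenceLineParser are hypothetical names, the delimiter is passed in rather than detected, and a missing or blank third field is represented as a null value.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for one parsed line: user ID, item ID, and an optional preference value. */
final class ParsedLine {
    final long userID;
    final long itemID;
    final Double value; // null when the line carries no (or a blank) preference value

    ParsedLine(long userID, long itemID, Double value) {
        this.userID = userID;
        this.itemID = itemID;
        this.value = value;
    }
}

/** Hypothetical parser for the "userID,itemID[,value[,ignored...]]" line format. */
final class PreferenceLineParser {
    static ParsedLine parse(String line, char delimiter) {
        // Split on the literal delimiter; limit -1 keeps trailing empty fields such as in "123,456,".
        String[] tokens = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected at least a user ID and an item ID: " + line);
        }
        long userID = Long.parseLong(tokens[0].trim());
        long itemID = Long.parseLong(tokens[1].trim());
        // The third field, if present and non-blank, is the preference value; later fields are ignored.
        Double value = null;
        if (tokens.length >= 3 && !tokens[2].trim().isEmpty()) {
            value = Double.parseDouble(tokens[2].trim());
        }
        return new ParsedLine(userID, itemID, value);
    }
}
```

For example, parse("123,456,3.5", ',') yields user 123, item 456 and value 3.5, while parse("123,456", ',') yields a null value.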
This class will reload data from the data file when refresh(Collection) is called, unless the file has been reloaded very recently already.
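One way to implement "unless the file has been reloaded very recently" is a minimum-interval throttle. The sketch below assumes a fixed interval chosen by the caller; ReloadThrottle is a hypothetical class, and this page does not say how FileDataModel itself defines "very recently".

```java
/**
 * Hypothetical throttle: permits a reload only if at least minIntervalMillis
 * have passed since the last permitted reload.
 */
final class ReloadThrottle {
    private final long minIntervalMillis;
    private long lastReloadMillis;
    private boolean reloadedOnce;

    ReloadThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true, and records nowMillis, if a reload should happen now; false if it is too soon. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (reloadedOnce && nowMillis - lastReloadMillis < minIntervalMillis) {
            return false; // reloaded too recently; skip this refresh
        }
        reloadedOnce = true;
        lastReloadMillis = nowMillis;
        return true;
    }
}
```

Passing the current time in as an argument (for example System.currentTimeMillis()) keeps the throttle deterministic and easy to test.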
This class will also look for update "delta" files in the same directory, with file names that start the same way (up to the first period). These files should have the same format, and provide updated data that supersedes what is in the main data file. This is a mechanism that allows an application to push updates to FileDataModel without re-copying the entire data file.
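The file-name rule for delta files can be sketched as a pure function over names. DeltaFileFinder is hypothetical; the sketch matches names that start with the main file's prefix (up to its first period), excludes the main file itself, and sorts the result by name, which is an assumption of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper that picks out "delta" update files for a main data file. */
final class DeltaFileFinder {
    /** Returns the part of a file name before its first period (the whole name if there is none). */
    static String prefixOf(String fileName) {
        int dot = fileName.indexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    /** Returns sibling names that start with the main file's prefix, excluding the main file, sorted by name. */
    static List<String> findDeltaFiles(String mainFileName, List<String> siblingNames) {
        String prefix = prefixOf(mainFileName);
        List<String> deltas = new ArrayList<>();
        for (String name : siblingNames) {
            if (!name.equals(mainFileName) && name.startsWith(prefix)) {
                deltas.add(name);
            }
        }
        Collections.sort(deltas); // ordering by name is an assumption of this sketch
        return deltas;
    }
}
```

A caller could obtain the sibling names from the data file's directory, for example via dataFile.getParentFile().list().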
A line may contain a blank preference value (e.g. "123,456,"). This is interpreted to mean "delete preference", and is only useful in the context of an update delta file (see above). Note that if a line is empty or begins with '#', it is ignored as a comment.
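The effect of a delta line on data already in memory can be sketched with plain java.util maps. DeltaApplier is hypothetical and handles only files that carry preference values: a value replaces any earlier one, a blank value deletes the preference, and empty or '#' lines are skipped. Dropping a user whose last preference was deleted is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store showing how delta lines could update preferences already loaded. */
final class DeltaApplier {
    // user ID -> (item ID -> preference value)
    final Map<Long, Map<Long, Double>> prefs = new HashMap<>();

    void applyLine(String line) {
        if (line.isEmpty() || line.charAt(0) == '#') {
            return; // empty lines and '#' comments are ignored
        }
        // Limit -1 keeps the trailing empty field in lines such as "123,456,".
        String[] tokens = line.split(",", -1);
        if (tokens.length < 3) {
            throw new IllegalArgumentException("This sketch expects userID,itemID,value: " + line);
        }
        long userID = Long.parseLong(tokens[0].trim());
        long itemID = Long.parseLong(tokens[1].trim());
        String valueToken = tokens[2].trim();
        if (valueToken.isEmpty()) {
            // Blank value means "delete preference".
            Map<Long, Double> userPrefs = prefs.get(userID);
            if (userPrefs != null) {
                userPrefs.remove(itemID);
                if (userPrefs.isEmpty()) {
                    prefs.remove(userID); // sketch choice: drop users with no remaining preferences
                }
            }
        } else {
            // A value in the delta supersedes any earlier value for the same user and item.
            prefs.computeIfAbsent(userID, id -> new HashMap<>()).put(itemID, Double.parseDouble(valueToken));
        }
    }
}
```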
It is also acceptable for the lines to contain additional fields. Fields beyond the third will be ignored.
Finally, for applications that have no notion of a preference value (that is, the user simply expresses a preference for an item, but no degree of preference), the caller can simply omit the third token in each line altogether -- for example, "123,456".
Note that it's all-or-nothing: either every line in the file includes a preference value, or none does. The two cannot be mixed. Put another way, every line of the file must contain the same number of delimiters.
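The all-or-nothing rule amounts to requiring the same delimiter count on every data line. A hypothetical check (DelimiterConsistency is not part of Mahout), assuming comments and empty lines are skipped, that also reports which mode the file is in:

```java
import java.util.List;

/** Hypothetical check for the all-or-nothing rule on preference values. */
final class DelimiterConsistency {
    static int countDelimiters(String line, char delimiter) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns true if the data lines carry preference values (two or more delimiters),
     * false if they are ID pairs only (one delimiter). Throws if lines disagree.
     */
    static boolean hasPreferenceValues(List<String> lines, char delimiter) {
        Integer expected = null;
        for (String line : lines) {
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue; // comments and empty lines do not count
            }
            int count = countDelimiters(line, delimiter);
            if (expected == null) {
                expected = count; // the first data line sets the expectation
            } else if (count != expected) {
                throw new IllegalArgumentException("Inconsistent delimiter count in line: " + line);
            }
        }
        if (expected == null) {
            throw new IllegalArgumentException("No data lines");
        }
        return expected >= 2;
    }
}
```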
This class is not intended for use with very large amounts of data (over, say, tens of millions of rows). For that, a JDBC-backed DataModel and a database are more appropriate.
It is possible, and likely useful, to subclass this class and customize its behavior to accommodate application-specific needs and input formats. See processLine(String, FastByIDMap, boolean) and processLineWithoutID(String, FastByIDMap).
Constructor Summary | |
---|---|
FileDataModel(java.io.File dataFile) | |
FileDataModel(java.io.File dataFile, boolean transpose) | |
Method Summary | | |
---|---|---|
protected DataModel | buildModel() | |
static char | determineDelimiter(java.lang.String line, int maxDelimiters) | |
java.io.File | getDataFile() | |
char | getDelimiter() | |
LongPrimitiveIterator | getItemIDs() | |
FastIDSet | getItemIDsFromUser(long userID) | |
int | getNumItems() | |
int | getNumUsers() | |
int | getNumUsersWithPreferenceFor(long... itemIDs) | |
PreferenceArray | getPreferencesForItem(long itemID) | |
PreferenceArray | getPreferencesFromUser(long userID) | |
java.lang.Float | getPreferenceValue(long userID, long itemID) | Retrieves the preference value for a single user and item. |
LongPrimitiveIterator | getUserIDs() | |
boolean | hasPreferenceValues() | |
protected void | processFile(FileLineIterator dataOrUpdateFileIterator, FastByIDMap<?> data, boolean fromPriorData) | |
protected void | processFileWithoutID(FileLineIterator dataOrUpdateFileIterator, FastByIDMap<FastIDSet> data) | |
protected void | processLine(java.lang.String line, FastByIDMap<?> data, boolean fromPriorData) | Reads one line from the input file and adds the data to a Map data structure which maps user IDs to preferences. |
protected void | processLineWithoutID(java.lang.String line, FastByIDMap<FastIDSet> data) | |
protected long | readItemIDFromString(java.lang.String value) | Subclasses may wish to override this if ID values in the file are not numeric. |
protected long | readUserIDFromString(java.lang.String value) | Subclasses may wish to override this if ID values in the file are not numeric. |
void | refresh(java.util.Collection<Refreshable> alreadyRefreshed) | Triggers "refresh" -- whatever that means -- of the implementation. |
protected void | reload() | |
void | removePreference(long userID, long itemID) | See the warning at setPreference(long, long, float). |
void | setPreference(long userID, long itemID, float value) | Note that this method only updates the in-memory preference data that this FileDataModel maintains; it does not modify any data on disk. |
java.lang.String | toString() | |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public FileDataModel(java.io.File dataFile) throws java.io.IOException
Parameters:
dataFile - file containing preferences data. If the file is compressed (and its name ends in .gz or .zip accordingly), it will be decompressed as it is read.
Throws:
java.io.FileNotFoundException - if dataFile does not exist
java.io.IOException - if the file can't be read
public FileDataModel(java.io.File dataFile, boolean transpose) throws java.io.IOException
Parameters:
transpose - transposes user IDs and item IDs -- convenient for 'flipping' the data model this way
Throws:
java.io.IOException
See Also:
FileDataModel(File)
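Reading a possibly compressed data file can be sketched with the JDK's java.util.zip streams. DataFileOpener is hypothetical, not Mahout's own file-reading code; it assumes UTF-8 text and, for .zip archives, that the first entry holds the data.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: opens a data file, decompressing it if its name ends in .gz or .zip. */
final class DataFileOpener {
    static BufferedReader open(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        String name = file.getName();
        try {
            if (name.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            } else if (name.endsWith(".zip")) {
                ZipInputStream zip = new ZipInputStream(in);
                in = zip;
                // Assumption: the archive's first entry holds the preference data.
                if (zip.getNextEntry() == null) {
                    throw new IOException("Empty zip archive: " + file);
                }
            }
        } catch (IOException e) {
            in.close(); // do not leak the underlying file handle
            throw e;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```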
Method Detail |
---|
public java.io.File getDataFile()
public char getDelimiter()
protected void reload()
protected DataModel buildModel() throws java.io.IOException
Throws:
java.io.IOException
public static char determineDelimiter(java.lang.String line, int maxDelimiters)
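Detection of the delimiter from a sample line can be sketched as follows, assuming a comma is preferred when present and a tab is the fallback. DelimiterDetector is hypothetical, and the sketch omits maxDelimiters, whose exact role this page does not document.

```java
/** Hypothetical delimiter detection: comma if the line contains one, otherwise tab. */
final class DelimiterDetector {
    static char detect(String line) {
        if (line.indexOf(',') >= 0) {
            return ',';
        }
        if (line.indexOf('\t') >= 0) {
            return '\t';
        }
        // Neither supported delimiter appears, so the line cannot be split into fields.
        throw new IllegalArgumentException("No comma or tab found in line: " + line);
    }
}
```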
protected void processFile(FileLineIterator dataOrUpdateFileIterator, FastByIDMap<?> data, boolean fromPriorData)
protected void processLine(java.lang.String line, FastByIDMap<?> data, boolean fromPriorData)
Reads one line from the input file and adds the data to a Map data structure which maps user IDs to preferences. This assumes that each line of the input file corresponds to one preference. After reading a line and determining which user and item the preference pertains to, the method should look to see if the data contains a mapping for the user ID already, and if not, add an empty List of Preferences to the data.
Note that if the line is empty or begins with '#' it will be ignored as a comment.
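The grouping step described above (create an empty list for a user the first time it is seen, then append) can be sketched with java.util types standing in for FastByIDMap and Preference. SimplePref and LineGrouper are hypothetical, and the sketch assumes comma-delimited lines that include a preference value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Preference: one (user, item, value) triple. */
final class SimplePref {
    final long userID;
    final long itemID;
    final double value;

    SimplePref(long userID, long itemID, double value) {
        this.userID = userID;
        this.itemID = itemID;
        this.value = value;
    }
}

/** Sketch of the grouping step, using a java.util Map instead of FastByIDMap. */
final class LineGrouper {
    static void processLine(String line, Map<Long, List<SimplePref>> data) {
        if (line.isEmpty() || line.charAt(0) == '#') {
            return; // empty line or comment
        }
        String[] tokens = line.split(",");
        long userID = Long.parseLong(tokens[0].trim());
        long itemID = Long.parseLong(tokens[1].trim());
        double value = Double.parseDouble(tokens[2].trim());
        // If there is no mapping for this user yet, add an empty list, then append the preference.
        data.computeIfAbsent(userID, id -> new ArrayList<>()).add(new SimplePref(userID, itemID, value));
    }
}
```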
Parameters:
line - line from input data file
data - all data read so far, as a mapping from user IDs to preferences
fromPriorData - an implementation detail -- if true, data will map IDs to PreferenceArray since the framework is attempting to read and update raw data that is already in memory. Otherwise it maps to Collections of Preferences, since it's reading fresh data. Subclasses must be prepared to handle this wrinkle.
protected void processFileWithoutID(FileLineIterator dataOrUpdateFileIterator, FastByIDMap<FastIDSet> data)
protected void processLineWithoutID(java.lang.String line, FastByIDMap<FastIDSet> data)
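For data without preference values, processLineWithoutID takes a FastByIDMap<FastIDSet>, presumably mapping each user ID to the set of item IDs that user expressed a preference for. The sketch below shows that idea with java.util types; BooleanLineReader is hypothetical and assumes comma-delimited "userID,itemID" lines.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: reads "userID,itemID" lines into a user-ID -> item-ID-set map (java.util in place of FastByIDMap/FastIDSet). */
final class BooleanLineReader {
    static void processLine(String line, Map<Long, Set<Long>> data) {
        if (line.isEmpty() || line.charAt(0) == '#') {
            return; // empty line or comment
        }
        String[] tokens = line.split(",");
        long userID = Long.parseLong(tokens[0].trim());
        long itemID = Long.parseLong(tokens[1].trim());
        // A set ignores repeated (user, item) pairs, since there is no value to update.
        data.computeIfAbsent(userID, id -> new HashSet<>()).add(itemID);
    }
}
```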
protected long readUserIDFromString(java.lang.String value)
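Subclasses may wish to override this if ID values in the file are not numeric. This provides a hook by which subclasses can inject an IDMigrator to perform translation.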
protected long readItemIDFromString(java.lang.String value)
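Subclasses may wish to override this if ID values in the file are not numeric. This provides a hook by which subclasses can inject an IDMigrator to perform translation.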
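If IDs in the file are strings, an override of readUserIDFromString or readItemIDFromString has to map each one to a long. Hashing is one common approach; the sketch below takes the first 8 bytes of an MD5 digest. StringIDHasher and the choice of MD5 are assumptions of this sketch, not a description of what an IDMigrator necessarily does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical translation of a string ID to a long: first 8 bytes of its MD5 digest. */
final class StringIDHasher {
    static long toLongID(String stringID) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(stringID.getBytes(StandardCharsets.UTF_8));
            long id = 0L;
            for (int i = 0; i < 8; i++) {
                id = (id << 8) | (digest[i] & 0xFF); // pack bytes big-endian into a long
            }
            return id;
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform must provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Hashing is one-way and can in principle collide, so an application that needs to translate IDs back must keep its own reverse mapping.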
public LongPrimitiveIterator getUserIDs() throws TasteException
Specified by:
getUserIDs in interface DataModel
Throws:
TasteException - if an error occurs while accessing the data
public PreferenceArray getPreferencesFromUser(long userID) throws TasteException
Specified by:
getPreferencesFromUser in interface DataModel
Parameters:
userID - ID of user to get prefs for
Throws:
NoSuchUserException - if the user does not exist
TasteException - if an error occurs while accessing the data
public FastIDSet getItemIDsFromUser(long userID) throws TasteException
Specified by:
getItemIDsFromUser in interface DataModel
Parameters:
userID - ID of user to get prefs for
Throws:
NoSuchUserException - if the user does not exist
TasteException - if an error occurs while accessing the data
public LongPrimitiveIterator getItemIDs() throws TasteException
Specified by:
getItemIDs in interface DataModel
Returns:
an iterator over all item IDs in the model, in order
Throws:
TasteException - if an error occurs while accessing the data
public PreferenceArray getPreferencesForItem(long itemID) throws TasteException
Specified by:
getPreferencesForItem in interface DataModel
Parameters:
itemID - item ID
Returns:
Preferences expressed for that item, ordered by user ID, as an array
Throws:
NoSuchItemException - if the item does not exist
TasteException - if an error occurs while accessing the data
public java.lang.Float getPreferenceValue(long userID, long itemID) throws TasteException
Description copied from interface: DataModel
Retrieves the preference value for a single user and item.
Specified by:
getPreferenceValue in interface DataModel
Parameters:
userID - user ID to get pref value from
itemID - item ID to get pref value for
Throws:
NoSuchUserException - if the user does not exist
TasteException - if an error occurs while accessing the data
public int getNumItems() throws TasteException
Specified by:
getNumItems in interface DataModel
Throws:
TasteException - if an error occurs while accessing the data
public int getNumUsers() throws TasteException
Specified by:
getNumUsers in interface DataModel
Throws:
TasteException - if an error occurs while accessing the data
public int getNumUsersWithPreferenceFor(long... itemIDs) throws TasteException
Specified by:
getNumUsersWithPreferenceFor in interface DataModel
Parameters:
itemIDs - item IDs to check for
Throws:
TasteException - if an error occurs while accessing the data
NoSuchItemException - if an item does not exist
public void setPreference(long userID, long itemID, float value) throws TasteException
Note that this method only updates the in-memory preference data that this FileDataModel maintains; it does not modify any data on disk.
Specified by:
setPreference in interface DataModel
Parameters:
userID - user to set preference for
itemID - item to set preference for
value - preference value
Throws:
NoSuchItemException - if the item does not exist
NoSuchUserException - if the user does not exist
TasteException - if an error occurs while accessing the data
public void removePreference(long userID, long itemID) throws TasteException
See the warning at setPreference(long, long, float).
Specified by:
removePreference in interface DataModel
Parameters:
userID - user from which to remove preference
itemID - item to remove preference for
Throws:
NoSuchItemException - if the item does not exist
NoSuchUserException - if the user does not exist
TasteException - if an error occurs while accessing the data
public void refresh(java.util.Collection<Refreshable> alreadyRefreshed)
Description copied from interface: Refreshable
Triggers "refresh" -- whatever that means -- of the implementation. The general contract is that any Refreshable should always leave itself in a consistent, operational state, and that the refresh atomically updates internal state from old to new.
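The "atomically updates internal state from old to new" contract is commonly met by building the new state off to the side and publishing it with a single reference swap. A minimal sketch, assuming a hypothetical loader that re-reads the data source; AtomicallyRefreshedStore is not part of Mahout.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Sketch: readers always see either the complete old state or the complete new state. */
final class AtomicallyRefreshedStore {
    private final Supplier<Map<Long, Double>> loader; // hypothetical: re-reads the data source
    private final AtomicReference<Map<Long, Double>> current;

    AtomicallyRefreshedStore(Supplier<Map<Long, Double>> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(snapshot(loader.get()));
    }

    private static Map<Long, Double> snapshot(Map<Long, Double> source) {
        // Copy and freeze, so the published state cannot change after the swap.
        return Collections.unmodifiableMap(new HashMap<Long, Double>(source));
    }

    Double get(long key) {
        return current.get().get(key);
    }

    void refresh() {
        // Build the replacement first; if loading throws, the old state stays in place.
        Map<Long, Double> fresh = snapshot(loader.get());
        current.set(fresh); // single atomic publish
    }
}
```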
Specified by:
refresh in interface Refreshable
Parameters:
alreadyRefreshed - Refreshables that are known to have already been refreshed as a result of an initial call to a method on some object. This ensures that objects in a refresh dependency graph aren't refreshed twice needlessly.
public boolean hasPreferenceValues()
Specified by:
hasPreferenceValues in interface DataModel
public java.lang.String toString()
Overrides:
toString in class java.lang.Object