org.apache.pig.piggybank.evaluation.util.apachelogparser
Class SearchTermExtractor
java.lang.Object
org.apache.pig.EvalFunc<String>
org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor
public class SearchTermExtractor
extends EvalFunc<String>
SearchTermExtractor takes a URL string and extracts the search terms from it. For example, given the URL
http://www.google.com/search?hl=en&safe=active&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search
the search terms
purpose of life
would be extracted.
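A minimal sketch of the kind of extraction described above, assuming the search engine carries the terms in a single query parameter (`q` in the Google example). The class `QueryParamSketch` and its method `extractParam` are hypothetical names used for illustration only; this is not the SearchTermExtractor source. It relies only on the JDK's `java.net.URI` and `java.net.URLDecoder` (the `Charset` overload of `decode` needs Java 10 or later).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the SearchTermExtractor implementation.
public class QueryParamSketch {

    // Returns the URL-decoded value of the named query parameter,
    // or null if the URL cannot be parsed, has no query, or lacks the parameter.
    public static String extractParam(String url, String param) {
        String rawQuery;
        try {
            rawQuery = new URI(url).getRawQuery();
        } catch (URISyntaxException e) {
            return null;
        }
        if (rawQuery == null) {
            return null;
        }
        // Query pairs are separated by '&'; each pair is "key=value" (value may be absent).
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            if (key.equals(param)) {
                String value = eq >= 0 ? pair.substring(eq + 1) : "";
                // URLDecoder turns '+' into a space and resolves %XX escapes.
                return URLDecoder.decode(value, StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/search?hl=en&safe=active"
                + "&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search";
        System.out.println(extractParam(url, "q")); // purpose of life
    }
}
```

Decoding with `URLDecoder` is what turns `purpose+of+life` into `purpose of life`; a real extractor would also need to know which parameter each engine uses.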
From Pig Latin, usage looks something like this:
searchTerm = FOREACH row GENERATE
org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchTermExtractor(referer);
Supported search engines include alltheweb.com, altavista.com, aolsearch.aol.com, arianna.libero.it,
as.starware.com, ask.com, blogs.icerocket.com, blueyonder.co.uk, busca.orange.es, buscador.lycos.es,
buscador.terra.es, buscar.ozu.es, categorico.it, cerca.lycos.it, cuil.com, excite.it, godado.com,
godado.it, gps.virgin.net, hotbot.com, ilmotore.com, it.altavista.com, ithaki.net, libero.it, lycos.es,
lycos.it, mamma.com, megasearching.net, mirago.co.uk, netscape.com, ozu.es, ricerca.alice.it,
search.aol.co.uk, search.bbc.co.uk, search.conduit.com, search.icq.com, search.live.com,
search.lycos.co.uk, search.lycos.com, search.msn.co.uk, search.msn.com, search.myway.com,
search.mywebsearch.com, search.ntlworld.com, search.orange.co.uk, search.sweetim.com,
search.virginmedia.com, simpatico.ws, soso.com, suche.fireball.de, suche.web.de, terra.es, tesco.net,
thespider.it, tiscali.co.uk, uk.altavista.com, uk.ask.com
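One way to implement per-engine lookups is a table keyed on the referring host. The sketch below assumes that design: `EngineLookupSketch`, `paramFor`, and every parameter name in its table are hypothetical and are not taken from SearchTermExtractor or URI::ParseSearchString. A host matches an entry if it equals the key or is a subdomain of it, so `www.ask.com` resolves through the `ask.com` entry.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of a host-keyed lookup table.
public class EngineLookupSketch {

    // HYPOTHETICAL table: referring host -> query parameter holding the search terms.
    // Hosts are drawn from the supported list; the parameter names are assumptions.
    // Map.of returns an immutable map, so no mutable state is shared between calls.
    private static final Map<String, String> PARAM_BY_HOST = Map.of(
            "ask.com", "q",
            "search.live.com", "q",
            "search.msn.com", "q",
            "search.aol.co.uk", "query");

    // Returns the parameter name for the host, or null if the host is not in the table.
    public static String paramFor(String host) {
        if (host == null) {
            return null;
        }
        String h = host.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : PARAM_BY_HOST.entrySet()) {
            String key = entry.getKey();
            // Exact match, or a subdomain such as "www.ask.com" for the key "ask.com".
            if (h.equals(key) || h.endsWith("." + key)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(paramFor("www.ask.com")); // q
        System.out.println(paramFor("example.org")); // null
    }
}
```

In a full extractor, the parameter name found this way would then be read from the URL's query string and URL-decoded.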
Thanks to Spiros Denaxas for his URI::ParseSearchString, which is the basis for the lookups.
Methods inherited from class org.apache.pig.EvalFunc
finish, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SearchTermExtractor
public SearchTermExtractor()
exec
public String exec(Tuple input)
throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. It is the method that will be invoked on every Tuple of a given dataset. Since the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.
Specified by:
exec in class EvalFunc<String>
Parameters:
input - the Tuple to be processed.
Returns:
result, of type T (here, String).
Throws:
IOException
getArgToFuncMapping
public List<FuncSpec> getArgToFuncMapping()
throws FrontendException
Overrides:
getArgToFuncMapping in class EvalFunc<String>
Returns:
A List of FuncSpec objects, each naming the function class that can handle inputs matching the input schema contained in that FuncSpec.
Throws:
FrontendException
Copyright © ${year} The Apache Software Foundation