org.apache.pig.piggybank.evaluation.util.apachelogparser
Class SearchEngineExtractor
java.lang.Object
org.apache.pig.EvalFunc<String>
org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchEngineExtractor
public class SearchEngineExtractor
extends EvalFunc<String>
SearchEngineExtractor takes a URL string, such as an HTTP referrer, and extracts the name of the search engine. For example, given
http://www.google.com/search?hl=en&safe=active&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search
the string
Google
would be extracted.
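One way to picture this lookup is as two steps: pull the host out of the URL, then match it against known search-engine domains. The Java sketch that follows assumes that design and runs without Pig. The class name `HostLookupSketch` and its one-entry `ENGINES` table are hypothetical, and only the pairing of google.com with Google comes from the example above. It is not a reproduction of how SearchEngineExtractor is actually implemented.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: host extraction followed by a table lookup.
class HostLookupSketch {
    // Hypothetical host-to-name table; only google.com -> "Google"
    // is taken from the documentation's example.
    private static final Map<String, String> ENGINES = Map.of(
            "google.com", "Google");

    // Returns the lower-cased host of a URL, or null if it cannot be parsed
    // or has no host component.
    static String host(String url) {
        try {
            String h = new URI(url).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Strips a leading "www." and looks the remaining host up in ENGINES.
    static String engineFor(String url) {
        String h = host(url);
        if (h == null) {
            return null;
        }
        if (h.startsWith("www.")) {
            h = h.substring(4);
        }
        return ENGINES.get(h);
    }

    public static void main(String[] args) {
        System.out.println(engineFor("http://www.google.com/search?hl=en&safe=active"
                + "&rls=GGLG,GGLG:2005-24,GGLG:en&q=purpose+of+life&btnG=Search"));
    }
}
```

Returning null for unparseable or unknown URLs is a choice made for this sketch only.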
From Pig Latin, usage looks like this:
searchEngine = FOREACH row GENERATE
org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchEngineExtractor(referer);
Supported search engines include abacho.com, alice.it, alltheweb.com, altavista.com, aolsearch.aol.com,
as.starware.com, ask.com, blogs.icerocket.com, blogsearch.google.com, blueyonder.co.uk, busca.orange.es,
buscador.lycos.es, buscador.terra.es, buscar.ozu.es, categorico.it, cuil.com, excite.com, excite.it,
fastweb.it, feedster.com, godado.com, godado.it, google.ad, google.ae, google.af, google.ag, google.am,
google.as, google.at, google.az, google.ba, google.be, google.bg, google.bi, google.biz, google.bo,
google.bs, google.bz, google.ca, google.cc, google.cd, google.cg, google.ch, google.ci, google.cl,
google.cn, google.co.at, google.co.bi, google.co.bw, google.co.ci, google.co.ck, google.co.cr,
google.co.gg, google.co.gl, google.co.gy, google.co.hu, google.co.id, google.co.il, google.co.im,
google.co.in, google.co.it, google.co.je, google.co.jp, google.co.ke, google.co.kr, google.co.ls,
google.co.ma, google.co.mu, google.co.mw, google.co.nz, google.co.pn, google.co.th, google.co.tt,
google.co.ug, google.co.uk, google.co.uz, google.co.ve, google.co.vi, google.co.za, google.co.zm,
google.co.zw, google.com, google.com.af, google.com.ag, google.com.ai, google.com.ar, google.com.au,
google.com.az, google.com.bd, google.com.bh, google.com.bi, google.com.bn, google.com.bo, google.com.br,
google.com.bs, google.com.bz, google.com.cn, google.com.co, google.com.cu, google.com.do, google.com.ec,
google.com.eg, google.com.et, google.com.fj, google.com.ge, google.com.gh, google.com.gi, google.com.gl,
google.com.gp, google.com.gr, google.com.gt, google.com.gy, google.com.hk, google.com.hn, google.com.hr,
google.com.jm, google.com.jo, google.com.kg, google.com.kh, google.com.ki, google.com.kz, google.com.lk,
google.com.lv, google.com.ly, google.com.mt, google.com.mu, google.com.mw, google.com.mx, google.com.my,
google.com.na, google.com.nf, google.com.ng, google.com.ni, google.com.np, google.com.nr, google.com.om,
google.com.pa, google.com.pe, google.com.ph, google.com.pk, google.com.pl, google.com.pr, google.com.pt,
google.com.py, google.com.qa, google.com.ru, google.com.sa, google.com.sb, google.com.sc, google.com.sg,
google.com.sv, google.com.tj, google.com.tr, google.com.tt, google.com.tw, google.com.uy, google.com.uz,
google.com.ve, google.com.vi, google.com.vn, google.com.ws, google.cz, google.de, google.dj, google.dk,
google.dm, google.ec, google.ee, google.es, google.fi, google.fm, google.fr, google.gd, google.ge,
google.gf, google.gg, google.gl, google.gm, google.gp, google.gr, google.gy, google.hk, google.hn,
google.hr, google.ht, google.hu, google.ie, google.im, google.in, google.info, google.is, google.it,
google.je, google.jo, google.jobs, google.jp, google.kg, google.ki, google.kz, google.la, google.li,
google.lk, google.lt, google.lu, google.lv, google.ma, google.md, google.mn, google.mobi, google.ms,
google.mu, google.mv, google.mw, google.net, google.nf, google.nl, google.no, google.nr, google.nu,
google.off.ai, google.ph, google.pk, google.pl, google.pn, google.pr, google.pt, google.ro, google.ru,
google.rw, google.sc, google.se, google.sg, google.sh, google.si, google.sk, google.sm, google.sn,
google.sr, google.st, google.tk, google.tm, google.to, google.tp, google.tt, google.tv, google.tw,
google.ug, google.us, google.uz, google.vg, google.vn, google.vu, google.ws, gps.virgin.net, hotbot.com,
ilmotore.com, ithaki.net, kataweb.it, libero.it, lycos.it, mamma.com, megasearching.net, mirago.co.uk,
netscape.com, search.aol.co.uk, search.arabia.msn.com, search.bbc.co.uk, search.conduit.com,
search.icq.com, search.live.com, search.lycos.co.uk, search.lycos.com, search.msn.co.uk, search.msn.com,
search.myway.com, search.mywebsearch.com, search.ntlworld.com, search.orange.co.uk, search.prodigy.msn.com,
search.sweetim.com, search.virginmedia.com, search.yahoo.co.jp, search.yahoo.com, search.yahoo.jp,
simpatico.ws, soso.com, suche.fireball.de, suche.t-online.de, suche.web.de, technorati.com, tesco.net,
thespider.it, tiscali.co.uk, uk.altavista.com, uk.ask.com, uk.search.yahoo.com
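The list mixes bare domains with subdomains of them (ask.com and uk.ask.com, search.yahoo.com and uk.search.yahoo.com), so one plausible way to match a referrer host is a longest dot-suffix match. The sketch that follows assumes that strategy and uses a hand-picked excerpt of the list. Whether SearchEngineExtractor matches by exact host, by suffix, or some other way is not stated here, and `DomainMatchSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: longest dot-suffix match against supported domains.
class DomainMatchSketch {
    // A small excerpt of the supported-domain list.
    static final List<String> SUPPORTED = List.of(
            "ask.com", "uk.ask.com", "google.co.uk", "google.com.au",
            "search.yahoo.com", "uk.search.yahoo.com");

    // Returns the longest supported domain that equals the host or is a
    // dot-separated suffix of it, or null if none matches.
    static String match(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        String best = null;
        for (String d : SUPPORTED) {
            boolean hit = h.equals(d) || h.endsWith("." + d);
            if (hit && (best == null || d.length() > best.length())) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(match("www.google.co.uk"));    // google.co.uk
        System.out.println(match("uk.search.yahoo.com")); // uk.search.yahoo.com
        System.out.println(match("notgoogle.co.uk"));     // null
    }
}
```

Requiring a leading dot before the suffix keeps a host such as notgoogle.co.uk from matching google.co.uk, and preferring the longest match lets uk.search.yahoo.com win over search.yahoo.com.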
Thanks to Spiros Denaxas for his URI::ParseSearchString, which is the basis for the lookups.
Methods inherited from class org.apache.pig.EvalFunc
finish, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SearchEngineExtractor
public SearchEngineExtractor()
exec
public String exec(Tuple input)
throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. It is invoked on every Tuple of a given dataset. Because the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state maintained between invocations of this method.
Specified by:
exec in class EvalFunc<String>
Parameters:
input - the Tuple to be processed.
Returns:
result, of type T (String for this class).
Throws:
IOException
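The contract above (one call per Tuple, no state carried between calls) can be illustrated without Pig on the classpath. In this hedged sketch a `List<Object>` stands in for Pig's `Tuple`, the placeholder `lookup` recognizes only www.google.com, and returning null for an empty input or a non-string first field is an assumed convention, not documented behavior of SearchEngineExtractor. `ExecContractSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

// Hypothetical sketch of a stateless, per-row exec-style function.
class ExecContractSketch {
    // Placeholder lookup for illustration only: recognizes one host.
    static String lookup(String url) {
        try {
            return "www.google.com".equals(new URI(url).getHost()) ? "Google" : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Mirrors the shape of exec(Tuple); List<Object> stands in for Tuple.
    // No fields are read or written, so each row is handled independently.
    static String exec(List<Object> input) {
        if (input == null || input.isEmpty()) {
            return null; // assumed convention for empty input
        }
        Object field = input.get(0);
        if (!(field instanceof String)) {
            return null; // assumed convention for a non-string field
        }
        return lookup((String) field);
    }

    public static void main(String[] args) {
        System.out.println(exec(List.<Object>of("http://www.google.com/search?q=pig")));
        System.out.println(exec(List.of()));
    }
}
```

Because `exec` depends only on its argument, the same row gives the same result no matter how the dataset is split across tasks.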
getArgToFuncMapping
public List<FuncSpec> getArgToFuncMapping()
throws FrontendException
Overrides:
getArgToFuncMapping in class EvalFunc<String>
Returns:
A List of FuncSpec objects, each naming the function class that can handle inputs matching the schema it carries.
Throws:
FrontendException
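As a rough model of what this mapping is for, Pig consults the returned list to decide which function class handles a given argument schema. The sketch replaces FuncSpec with a hypothetical `Spec` class holding a class name and Pig type names, and it assumes a single mapping from one chararray argument to SearchEngineExtractor itself. Pig's real resolution is more involved (it can consider implicit casts, for instance); this sketch only does exact matching, and `ArgMappingSketch` is a hypothetical name.

```java
import java.util.List;

// Hypothetical model of argument-schema-to-function resolution.
class ArgMappingSketch {
    // Stand-in for Pig's FuncSpec: a class name plus the argument
    // types (as Pig type names) that class accepts.
    static final class Spec {
        final String className;
        final List<String> argTypes;

        Spec(String className, List<String> argTypes) {
            this.className = className;
            this.argTypes = argTypes;
        }
    }

    // Assumed mapping: a single chararray argument is handled by
    // SearchEngineExtractor itself.
    static List<Spec> argToFuncMapping() {
        return List.of(new Spec(
                "org.apache.pig.piggybank.evaluation.util.apachelogparser.SearchEngineExtractor",
                List.of("chararray")));
    }

    // Returns the class name of the first spec whose declared argument
    // types equal the actual ones, or null if none matches exactly.
    static String resolve(List<String> actualTypes) {
        for (Spec s : argToFuncMapping()) {
            if (s.argTypes.equals(actualTypes)) {
                return s.className;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("chararray")));
        System.out.println(resolve(List.of("int")));
    }
}
```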
Copyright © The Apache Software Foundation