public class ComputeResponseTool
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
Each query run consists of three MR jobs:

(1) Map: The initialization mapper reads data using an extension of BaseInputFormat (or from Elasticsearch) and, per the QueryInfo object, extracts the selector from each dataElement according to the QueryType, hashes the selector, and outputs a (hash(selector), dataElement) pair.
Reduce: Calculates the encrypted row values for each selector and its corresponding data elements, striping across columns, and outputs each row entry by column position as a (colNum, colVal) pair.

(2) Map: Pass-through mapper that aggregates by column number.
Reduce: Combines the encrypted values for each column number into a single value per column.

(3) Map: Pass-through mapper that moves all final columns to one reducer.
Reduce: Creates the Response object.
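The per-column combination in job (2) can be sketched as follows. This is an illustrative sketch, not the actual implementation: it assumes a Paillier-style scheme in which the encrypted row values emitted for one column are combined by multiplication modulo N^2 (the class and method names here are hypothetical).

```java
import java.math.BigInteger;
import java.util.List;

// Hypothetical sketch of job (2)'s reduce step: combine all encrypted
// values emitted for a single column position into one ciphertext.
// Under a Paillier-style scheme, multiplying ciphertexts mod N^2
// homomorphically adds the underlying plaintexts.
public class ColumnCombineSketch {
    public static BigInteger combineColumn(List<BigInteger> colVals, BigInteger nSquared) {
        BigInteger product = BigInteger.ONE;
        for (BigInteger v : colVals) {
            product = product.multiply(v).mod(nSquared);
        }
        return product;
    }
}
```

Because the combination is per-column and associative, it maps naturally onto a reduce keyed by column number.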
NOTE: If useHDFSExpLookupTable in the QueryInfo object is true, then the expLookupTable for the watchlist must be generated if it does not already exist in HDFS.
TODO:
- Currently processes one query at a time; could be changed to process multiple queries at once (under the same time interval and with the same query parameters), using MultipleOutputs for extensibility to multiple queries per job later.
- Could place Query objects in the DistributedCache (instead of a direct file-based pull in task setup).
- Redesign the exp lookup table to be smart and fully distributed/partitioned.
| Constructor and Description |
| --- |
| `ComputeResponseTool()` |
| Modifier and Type | Method and Description |
| --- | --- |
| `int` | `run(java.lang.String[] arg0)` |
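Since the class implements org.apache.hadoop.util.Tool, it is normally invoked through ToolRunner, which parses the generic Hadoop options (-D, -files, etc.) from the argument list before delegating to run(String[]). A minimal driver sketch (the surrounding ResponderDriver class is hypothetical; it requires the Hadoop client libraries on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: ToolRunner strips the generic Hadoop options
// from args and passes the remainder to ComputeResponseTool.run(String[]),
// whose return value becomes the process exit code.
public class ResponderDriver {
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(), new ComputeResponseTool(), args);
        System.exit(exitCode);
    }
}
```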
public ComputeResponseTool() throws java.io.IOException, PIRException

Throws:
`java.io.IOException`
`PIRException`