merge {SparkR}    R Documentation

Merges two data frames

Description

Merges two SparkDataFrames by joining them on the specified columns.

Usage

## S4 method for signature 'SparkDataFrame,SparkDataFrame'
merge(x, y, by = intersect(names(x),
  names(y)), by.x = by, by.y = by, all = FALSE, all.x = all,
  all.y = all, sort = TRUE, suffixes = c("_x", "_y"), ...)

merge(x, y, ...)

Arguments

x

the first data frame to be joined

y

the second data frame to be joined

by

a character vector specifying the join columns. If by is not specified, the common column names in x and y will be used.

by.x

a character vector specifying the joining columns for x.

by.y

a character vector specifying the joining columns for y.

all.x

a boolean value indicating whether all the rows in x should be included in the join

all.y

a boolean value indicating whether all the rows in y should be included in the join

sort

a logical argument indicating whether the resulting columns should be sorted
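The by, by.x and by.y arguments can be tried without a Spark session, because base R's merge for local data.frames takes the same arguments. The sketch below is an analogy under that assumption, not SparkR output: the data frames orders and customers and their columns are hypothetical, and the columns returned for a SparkDataFrame may differ from what base R returns.

```r
# Local base R data.frames with hypothetical sample data (not SparkDataFrames).
orders    <- data.frame(customer_id = c(10, 20, 20), amount = c(5, 7, 9))
customers <- data.frame(id = c(10, 20, 30), name = c("Ann", "Bo", "Cy"))

# The default for 'by' is the set of column names shared by both inputs.
# These two inputs share none, so the default is an empty character vector.
default_by <- intersect(names(orders), names(customers))

# With no join columns, base R's merge returns the Cartesian product.
cross <- merge(orders, customers)

# by.x and by.y name the key column on each side when the names differ.
joined <- merge(orders, customers, by.x = "customer_id", by.y = "id")
```

Here joined keeps only the orders whose customer_id matches an id in customers, while cross pairs every row of orders with every row of customers.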

Details

If all.x and all.y are set to FALSE, a natural join will be returned. If all.x is set to TRUE and all.y is set to FALSE, a left outer join will be returned. If all.x is set to FALSE and all.y is set to TRUE, a right outer join will be returned. If all.x and all.y are set to TRUE, a full outer join will be returned.
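The four combinations of all.x and all.y described above can be sketched with base R's merge on local data.frames, which accepts the same flags. This is an illustration under that assumption rather than SparkR output; left and right are hypothetical sample data, and only the set of rows kept is meant to carry over.

```r
# Local base R data.frames with hypothetical sample data (not SparkDataFrames).
left  <- data.frame(id = c(1, 2, 3), a = c("a1", "a2", "a3"))
right <- data.frame(id = c(2, 3, 4), b = c("b2", "b3", "b4"))

# all.x = FALSE, all.y = FALSE: only ids present on both sides are kept.
inner_rows <- merge(left, right, by = "id")

# all.x = TRUE, all.y = FALSE: every row of 'left' is kept (left outer join).
left_rows <- merge(left, right, by = "id", all.x = TRUE)

# all.x = FALSE, all.y = TRUE: every row of 'right' is kept (right outer join).
right_rows <- merge(left, right, by = "id", all.y = TRUE)

# all.x = TRUE, all.y = TRUE: every row of both inputs is kept (full outer join).
full_rows <- merge(left, right, by = "id", all.x = TRUE, all.y = TRUE)
```

Rows without a match on the other side are filled with NA in the columns that come from that side.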

See Also

join

Other SparkDataFrame functions: $, $<-, select, select, select,SparkDataFrame,Column-method, select,SparkDataFrame,list-method, selectExpr; SparkDataFrame-class, dataFrame; [, [[, subset; agg, agg, count,GroupedData-method, summarize, summarize; arrange, arrange, arrange, orderBy, orderBy, orderBy, orderBy; as.data.frame, as.data.frame,SparkDataFrame-method; attach, attach,SparkDataFrame-method; cache; collect; colnames, colnames, colnames<-, colnames<-, columns, names, names<-; coltypes, coltypes, coltypes<-, coltypes<-; columns, dtypes, printSchema, schema, schema; count, nrow; dapply, dapply, dapplyCollect, dapplyCollect; describe, describe, describe, summary, summary, summary,AFTSurvivalRegressionModel-method, summary,GeneralizedLinearRegressionModel-method, summary,KMeansModel-method, summary,NaiveBayesModel-method; dim; distinct, unique; dropDuplicates, dropDuplicates; dropna, dropna, fillna, fillna, na.omit, na.omit; drop, drop; dtypes; except, except; explain, explain; filter, filter, where, where; first, first; groupBy, groupBy, group_by, group_by; head; histogram; insertInto, insertInto; intersect, intersect; isLocal, isLocal; join; limit, limit; mutate, mutate, transform, transform; ncol; persist; printSchema; rbind, rbind, unionAll, unionAll; registerTempTable, registerTempTable; rename, rename, withColumnRenamed, withColumnRenamed; repartition; sample, sample, sample_frac, sample_frac; saveAsParquetFile, saveAsParquetFile, write.parquet, write.parquet; saveAsTable, saveAsTable; saveDF, saveDF, write.df, write.df, write.df; selectExpr; showDF, showDF; show, show, show,GroupedData-method, show,WindowSpec-method; str; take; unpersist; withColumn, withColumn; write.jdbc, write.jdbc; write.json, write.json; write.text, write.text

Examples

## Not run: 
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
df1 <- read.json(sqlContext, path)
df2 <- read.json(sqlContext, path2)
merge(df1, df2) # Joins on the columns common to df1 and df2 (Cartesian product if there are none)
merge(df1, df2, by = "col1") # Performs an inner join on col1
merge(df1, df2, by.x = "col1", by.y = "col2", all.y = TRUE) # Right outer join
merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE) # Left outer join
merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE, all.y = TRUE) # Full outer join
merge(df1, df2, by.x = "col1", by.y = "col2", all = TRUE, sort = FALSE) # Full outer join, unsorted
merge(df1, df2, by = "col1", all = TRUE, suffixes = c("-X", "-Y")) # Full outer join with custom suffixes

## End(Not run)

[Package SparkR version 2.0.0 Index]