Hello! I am a newbie trying to insert a Spark dataset into a Redis server. Below is my code.
List<Row> rows = Arrays.asList(
    RowFactory.create(1, "A", 121.44, true),
    RowFactory.create(1, "B", 300.01, false),
    RowFactory.create(1, "C", 10.99, null),
    RowFactory.create(2, "A", 33.87, true),
    RowFactory.create(2, "B", 77.04, null),
    RowFactory.create(2, "C", 121.67, true)
);
StructType schema = DataTypes.createStructType(
    new StructField[] {
        DataTypes.createStructField("id", DataTypes.IntegerType, false),
        DataTypes.createStructField("category", DataTypes.StringType, false),
        DataTypes.createStructField("value", DataTypes.DoubleType, false),
        DataTypes.createStructField("truth", DataTypes.BooleanType, false)
    }
);
SparkSession spark = SparkSession.builder().appName("Spark SQL Test Java")
    .master("local[*]").getOrCreate();
Dataset<Row> df = spark.createDataFrame(rows, schema);
df.printSchema();
df.show();
df.write().format("org.apache.spark.sql.redis")
    .option("spark.redis.host", "localhost")
    .option("spark.redis.port", "6379")
    .option("table", "tbl_test")
    .option("key.column", "category") // this line is my issue
    .mode(SaveMode.Append)
    .save();
spark.close();
Below is the console output.
root
|-- id: integer (nullable = false)
|-- category: string (nullable = false)
|-- value: double (nullable = false)
|-- truth: boolean (nullable = false)
outputs
+---+--------+------+-----+
| id|category| value|truth|
+---+--------+------+-----+
|  1|       A|121.44| true|
|  1|       B|300.01|false|
|  1|       C| 10.99|false|
|  2|       A| 33.87| true|
|  2|       B| 77.04|false|
|  2|       C|121.67| true|
+---+--------+------+-----+
As you can see, the Redis table name is “tbl_test” and key.column is “category”, but the resulting Redis hashes do not contain all the rows of the Spark dataset. Only three hashes exist for the six rows:
> hgetall tbl_test:A
1) "id"
2) "2"
3) "value"
4) "33.87"
5) "truth"
6) "true"
> hgetall tbl_test:B
1) "id"
2) "1"
3) "value"
4) "300.01"
5) "truth"
6) "false"
> hgetall tbl_test:C
1) "id"
2) "2"
3) "value"
4) "121.67"
5) "truth"
6) "true"
I think this happens because the key of the Spark dataset is not the “category” column alone but the composite key (“id”, “category”), so rows that share a category overwrite each other in Redis. So I think the “key.column” option has to be something like “id:category”, but I have no idea how to do that. Could you please tell me how to set the key of the Redis hashes to the composite key id:category?
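To make the suspected cause concrete, here is a minimal sketch in plain Java with no Spark or Redis involved. The class KeyCollisionDemo, the Item class, and both helper methods are hypothetical names made up for this illustration. It assumes that each row is stored under a key of the form "table:key", which is what the hgetall output above suggests. Which row survives in a real write depends on the write order, so the sketch only counts keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyCollisionDemo {

    // Hypothetical stand-in for one dataset row (not a Spark Row).
    static final class Item {
        final int id;
        final String category;
        final double value;

        Item(int id, String category, double value) {
            this.id = id;
            this.category = category;
            this.value = value;
        }
    }

    // Same (id, category, value) triples as in the dataset above.
    static final List<Item> ROWS = Arrays.asList(
        new Item(1, "A", 121.44),
        new Item(1, "B", 300.01),
        new Item(1, "C", 10.99),
        new Item(2, "A", 33.87),
        new Item(2, "B", 77.04),
        new Item(2, "C", 121.67)
    );

    // Keys built from category only: rows sharing a category
    // map to the same key, so a later put replaces an earlier one.
    static Map<String, Item> keyByCategory(List<Item> rows) {
        Map<String, Item> hashes = new LinkedHashMap<>();
        for (Item r : rows) {
            hashes.put("tbl_test:" + r.category, r);
        }
        return hashes;
    }

    // Keys built from id and category joined with ':':
    // every row gets its own key, so nothing is replaced.
    static Map<String, Item> keyByIdAndCategory(List<Item> rows) {
        Map<String, Item> hashes = new LinkedHashMap<>();
        for (Item r : rows) {
            hashes.put("tbl_test:" + r.id + ":" + r.category, r);
        }
        return hashes;
    }

    public static void main(String[] args) {
        System.out.println("category only: " + keyByCategory(ROWS).size() + " keys");
        System.out.println("id:category:   " + keyByIdAndCategory(ROWS).size() + " keys");
    }
}
```

With a category-only key the six rows collapse into three entries, matching the three hashes seen in Redis, while the id:category key keeps all six. One possible direction, which I have not verified, would be to add a derived string column to the dataset (for example with concat_ws from org.apache.spark.sql.functions over id and category) and pass that column name as key.column.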