How to save spark dataframe into redis server with composite keys?

aupres · April 22, 2024, 10:30am

Hello! I am a newbie trying to insert a Spark dataset into a Redis server. My code is below.

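import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

List<Row> rows = Arrays.asList(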
    RowFactory.create(1, "A", 121.44, true),
    RowFactory.create(1, "B", 300.01, false),
    RowFactory.create(1, "C", 10.99, null),
    RowFactory.create(2, "A", 33.87, true),
    RowFactory.create(2, "B", 77.04, null),
    RowFactory.create(2, "C", 121.67, true)
);
        
StructType schema = DataTypes.createStructType(
    new StructField[] { 
        DataTypes.createStructField("id", DataTypes.IntegerType, false),
        DataTypes.createStructField("category", DataTypes.StringType, false),
        DataTypes.createStructField("value", DataTypes.DoubleType, false),
        DataTypes.createStructField("truth", DataTypes.BooleanType, false)
    }
);
        
SparkSession spark = SparkSession.builder().appName("Spark SQL Test Java")
        .master("local[*]").getOrCreate();
Dataset<Row> df = spark.createDataFrame(rows, schema);
        
df.printSchema(); 
df.show();

df.write().format("org.apache.spark.sql.redis")
        .option("spark.redis.host", "localhost")
        .option("spark.redis.port", "6379")
        .option("table", "tbl_test")
        .option("key.column", "category")  // this line is my issue
        .mode(SaveMode.Append)
        .save();

spark.close();

The console output is below.

root
 |-- id: integer (nullable = false)
 |-- category: string (nullable = false)
 |-- value: double (nullable = false)
 |-- truth: boolean (nullable = false) 


+---+--------+------+-----+
| id|category| value|truth|
+---+--------+------+-----+
|  1|       A|121.44| true|
|  1|       B|300.01|false|
|  1|       C| 10.99|false|
|  2|       A| 33.87| true|
|  2|       B| 77.04|false|
|  2|       C|121.67| true|
+---+--------+------+-----+

As you can see, the Redis table name is “tbl_test” and key.column is “category”, but the resulting Redis hashes do not contain all the rows of the Spark dataset: six rows were written, yet only three hashes exist.

> hgetall tbl_test:A
1) "id"
2) "2"
3) "value"
4) "33.87"
5) "truth"
6) "true"
> hgetall tbl_test:B
1) "id"
2) "1"
3) "value"
4) "300.01"
5) "truth"
6) "false"
> hgetall tbl_test:C
1) "id"
2) "2"
3) "value"
4) "121.67"
5) "truth"
6) "true"
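The three hashes above are consistent with every row being stored under a key of the form table:category, so rows that share a category land on the same hash. Here is a minimal plain-Java sketch of that behaviour. It does not use Spark or Redis; the class name, the in-memory map standing in for Redis, and the key layout are my assumptions, inferred only from the hgetall output. The rows are written in listing order here, whereas Spark does not guarantee which row is written last.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the write; a Map stands in for the Redis server.
class KeyCollisionDemo {

    // Values as printed by df.show() above: id, category, value, truth.
    static final Object[][] ROWS = {
        {1, "A", 121.44, true},
        {1, "B", 300.01, false},
        {1, "C", 10.99, false},
        {2, "A", 33.87, true},
        {2, "B", 77.04, false},
        {2, "C", 121.67, true}
    };

    // Assumed key layout, inferred from the hgetall output: "<table>:<key column value>".
    static String redisKey(String table, String keyValue) {
        return table + ":" + keyValue;
    }

    // Writes every row into the in-memory stand-in and returns it.
    static Map<String, Map<String, String>> writeAll(Object[][] rows) {
        Map<String, Map<String, String>> redis = new LinkedHashMap<>();
        for (Object[] r : rows) {
            Map<String, String> hash = new LinkedHashMap<>();
            hash.put("id", String.valueOf(r[0]));
            hash.put("value", String.valueOf(r[2]));
            hash.put("truth", String.valueOf(r[3]));
            // Same key and same fields: a later row overwrites an earlier one.
            redis.put(redisKey("tbl_test", (String) r[1]), hash);
        }
        return redis;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> redis = writeAll(ROWS);
        // prints "3 hashes: [tbl_test:A, tbl_test:B, tbl_test:C]"
        System.out.println(redis.size() + " hashes: " + redis.keySet());
    }
}
```

Six rows go in and three hashes come out, which matches what I see in Redis.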

This happens because the key of the Spark dataset is not the “category” column alone but the composite key (“id”, “category”). So I think the “key.column” option needs to be something like “id:category”, but I have no idea how to do that. Could you tell me how to set the key of the Redis hashes to the composite key id:category?
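To make the goal concrete, here is a minimal plain-Java sketch of the keys I would like to end up with. The helper name compositeKey and the class name are hypothetical, and the "table:key" layout is an assumption taken from the hgetall output; nothing here calls Spark or Redis.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CompositeKeyDemo {

    static final int[] IDS = {1, 1, 1, 2, 2, 2};
    static final String[] CATEGORIES = {"A", "B", "C", "A", "B", "C"};

    // Hypothetical helper: joins both key parts into one value such as "1:A".
    static String compositeKey(int id, String category) {
        return id + ":" + category;
    }

    // Builds the Redis key for each (id, category) pair, assuming the
    // "<table>:<key column value>" layout seen in the hgetall output.
    static List<String> redisKeys(String table, int[] ids, String[] categories) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            keys.add(table + ":" + compositeKey(ids[i], categories[i]));
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<String> distinct = new LinkedHashSet<>(redisKeys("tbl_test", IDS, CATEGORIES));
        // prints "6 distinct keys: [tbl_test:1:A, tbl_test:1:B, tbl_test:1:C, tbl_test:2:A, tbl_test:2:B, tbl_test:2:C]"
        System.out.println(distinct.size() + " distinct keys: " + distinct);
    }
}
```

One possible way to get there in Spark, which I have not verified against spark-redis, would be to derive an extra string column (for example a hypothetical id_category column built with functions.concat_ws(":", col("id").cast("string"), col("category"))) and pass that column name as key.column.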
