Explode Function in Spark Scala

Parmanand
2 min read · Oct 28, 2020


We often deal with JSON, which is a very common source of data. In some cases, though, the data we receive is not in a simple, flat format: it can be complex, involving Struct and Array types as well. In such cases you need to flatten your JSON data using Spark functions.

In this article, I will talk about the explode function in Spark Scala, which deals with the Array type. If you want to know about the Struct type, please read the article linked below.

Let’s see an example:

{
  "total_record": 3,
  "desc": "all students records",
  "students": [
    {"name": "Ram ", "roll": "21", "phone": 965483633},
    {"name": "shayam ", "roll": "22", "phone": 905483683},
    {"name": "rahul ", "roll": "28", "phone": 985483677}
  ]
}

In the above example, the students key holds three student records. The students key is an array type.
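For reference, here is a sketch of how this shape maps onto Spark’s type system (expectedSchema is a name I am introducing for illustration; Spark infers JSON integers as LongType and orders inferred fields alphabetically):

import org.apache.spark.sql.types._

// Sketch of the schema Spark should infer for the JSON above:
// students is an ArrayType whose elements are StructType records
val expectedSchema = StructType(Seq(
  StructField("desc", StringType),
  StructField("students", ArrayType(StructType(Seq(
    StructField("name", StringType),
    StructField("phone", LongType),
    StructField("roll", StringType)
  )))),
  StructField("total_record", LongType)
))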

Let’s get started!

Read the JSON data using Spark, then apply the explode function to flatten it.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// sparkSession is an existing SparkSession
val rawDF: DataFrame = sparkSession.read.option("multiline", "true").json("data/fcst_catg/data.json")
rawDF.printSchema()
rawDF.show(false)

Output:
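Based on the JSON above, rawDF.printSchema() should print roughly the following (a sketch, assuming default schema inference):

root
 |-- desc: string (nullable = true)
 |-- students: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- phone: long (nullable = true)
 |    |    |-- roll: string (nullable = true)
 |-- total_record: long (nullable = true)

rawDF.show(false) displays a single row, with all three student records nested inside one array cell.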

As you can see in the schema, the students column is of array type.

Let’s apply the explode function to this column.

The explode function takes a column that contains arrays and creates one row per element of the array.

val tempDF: DataFrame = rawDF.select(explode(col("students")).as("students"))
tempDF.printSchema()
tempDF.show()

Output:
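After the explode, tempDF.printSchema() should show roughly:

root
 |-- students: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- phone: long (nullable = true)
 |    |-- roll: string (nullable = true)

and tempDF.show() now returns three rows, one struct per student (the exact struct rendering varies slightly across Spark versions).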

The schema above shows that students is now a struct type. We can simply use dot notation to select all of its columns. If you don’t know about the Struct type, please read the article linked above.

val finalDF: DataFrame = tempDF.select(col("students.*"))
finalDF.printSchema()
finalDF.show()

Finally, we have selected all the columns of students as top-level columns.

Output:
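finalDF.show() should print roughly:

+-------+---------+----+
|   name|    phone|roll|
+-------+---------+----+
|   Ram |965483633|  21|
|shayam |905483683|  22|
| rahul |985483677|  28|
+-------+---------+----+

The two selects can also be chained into a single expression; a minimal sketch (flatDF is a name I am introducing here):

// Explode the array, then flatten the resulting struct in one go
val flatDF: DataFrame = rawDF
  .select(explode(col("students")).as("s"))
  .select(col("s.*"))

One caveat worth knowing: explode drops rows whose array is null or empty. If you need to keep such rows, use explode_outer, which takes the same argument but produces a row with nulls instead.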

Related: How to read a multi-character delimited file using Spark?

Thanks for reading!

Please share the article if you liked it. Any comments or suggestions are welcome.
