Handling Multi-Character Delimiters in CSV Files Using Spark

Parmanand
1 min read · Sep 1, 2020


In our day-to-day work we deal with CSV files quite often, since they are a common source of data.

Using multiple characters as a delimiter was not allowed in Spark versions below 3.0, but the Spark 3.0 release lets us use more than one character as a delimiter.

For example, we will try to read a file that uses || as its delimiter.
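The file itself isn't reproduced in this text, so here is a hypothetical data/data.csv matching that description (the column names and values are illustrative, not from the original post):

name||age||city
John||30||New York
Priya||25||Mumbai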

Let's see an example:

Spark version: 2.3

import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
// Fails on Spark 2.3: the CSV reader accepts only a single-character delimiter.
val rawData: DataFrame = sparkSession.read.option("header", "true").option("delimiter", "||").csv("data/data.csv")
rawData.show()
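Running the snippet above on Spark 2.3 fails at read time with java.lang.IllegalArgumentException: Delimiter cannot be more than one character: ||, because the CSV reader only accepts a single-character delimiter. Until you can upgrade, a common workaround is to read the file as plain text and split each line yourself. This is a minimal sketch assuming the hypothetical three-column sample shown earlier; it is not code from the original post:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
import spark.implicits._

// Read each record as a raw string and split on the literal "||".
// The pipes are escaped as "\\|\\|" because | is a regex metacharacter.
val lines = spark.read.textFile("data/data.csv")
val headerLine = lines.first()
val columns = headerLine.split("\\|\\|")

val parsed = lines
  .filter(_ != headerLine)        // drop the header row
  .map(_.split("\\|\\|"))
  .map(a => (a(0), a(1), a(2)))   // assumes exactly three columns
  .toDF(columns: _*)

parsed.show()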

Now let's run the same code on Spark 3.0:

Scala version: 2.12.10 & Spark version: 3.0.0

import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
// Works on Spark 3.0+: the delimiter option now accepts multiple characters.
val rawData: DataFrame = sparkSession.read.option("header", "true").option("delimiter", "||").csv("data/data.csv")
rawData.show()
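On Spark 3.0 this runs without error. With the hypothetical sample file above, rawData.show() would print something like:

+-----+---+--------+
| name|age|    city|
+-----+---+--------+
| John| 30|New York|
|Priya| 25|  Mumbai|
+-----+---+--------+

The same option can also be written as sep, which the CSV reader accepts as an equivalent of delimiter.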
