Handling Multi-Character Delimiters in CSV Files Using Spark

Parmanand
1 min read · Sep 1, 2020


In our day-to-day work we deal with CSV files pretty often, because they are one of the most common sources of data.

Spark versions below 3.0 do not allow a delimiter longer than one character. The Spark 3.0 release lifts this restriction and lets us use multi-character delimiters.

For example, we will try to read a file that uses || as its delimiter (data/data.csv in the snippets below).

Let’s see an example:

Using Spark 2.3:

import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
val rawData: DataFrame = sparkSession.read.option("header", "true").option("delimiter", "||").csv("data/data.csv")
rawData.show()

On Spark 2.3 this read fails with:

java.lang.IllegalArgumentException: Delimiter cannot be more than one character: ||
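If we are stuck on Spark 2.x, a common workaround is to read the file as plain text and split each line ourselves. Here is a minimal sketch of that idea. It assumes the file has a header row, exactly three columns, and no quoted fields containing the delimiter; note that || consists of regex metacharacters, so it must be escaped for String.split.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
import spark.implicits._

// Read the file as raw lines instead of CSV.
val lines = spark.read.textFile("data/data.csv")

// Capture the header on the driver and derive the column names.
// "\\|\\|" escapes the pipes, which are regex metacharacters.
val headerLine = lines.first()
val columns = headerLine.split("\\|\\|")

// Drop the header row, split the remaining lines on the escaped
// delimiter, and rebuild a DataFrame (assumes exactly 3 columns).
val parsed = lines
  .filter(_ != headerLine)
  .map(_.split("\\|\\|"))
  .map(a => (a(0), a(1), a(2)))
  .toDF(columns: _*)

parsed.show()
```

This is considerably more fragile than a real CSV parser (no quoting, no escaping), which is exactly why the native multi-character delimiter support in Spark 3.0 is welcome.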

Let’s see another example:

Scala version: 2.12.10 & Spark version: 3.0.0

import org.apache.spark.sql.{DataFrame, SparkSession}

val sparkSession = SparkSession.builder.appName("TestAPP").master("local[2]").getOrCreate()
val rawData: DataFrame = sparkSession.read.option("header", "true").option("delimiter", "||").csv("data/data.csv")
rawData.show()

On Spark 3.0 the very same code runs successfully, and rawData.show() prints the parsed columns.
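As a side note, Spark also accepts sep as an alias for the delimiter option, so the same read can be written as follows (the file path and app name are just the ones used above):

```scala
val rawData = sparkSession.read
  .option("header", "true")
  .option("sep", "||")          // "sep" is an alias of "delimiter"
  .csv("data/data.csv")
```

Either spelling works; sep matches the option name used in Spark's own CSV documentation.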
