UDFs in Spark Scala with examples

val tempDF = sparkSession.createDataFrame(Seq(
  ("rahul sharma", 32, "Patna", 20000, "Store Manager"),
  ("Joy don", 30, "NY", 23455, "Developer"),
  ("Steve boy", 42, "Delhi", 294884, "Developer")
)).toDF("name", "age", "city", "salary", "designation")
// Returns the first space-separated token of the name.
val getFistName = (name: String) => {
  val temp: Array[String] = name.split(" ")
  temp(0)
}
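One thing worth noting about the function above: Spark passes a null column value straight into a String => String UDF, so name.split would throw a NullPointerException on a null name. The sample data here has no nulls, but if the column could contain them, a null-safe variant might look like the sketch below. The name getFistNameSafe is hypothetical and not part of the original post.

```scala
// Hypothetical null-safe variant of getFistName (an illustration, not the post's code).
// Option(name) turns a null input into None instead of throwing.
val getFistNameSafe: String => String = (name: String) =>
  Option(name)
    .map(_.trim.split(" "))   // split on a single space, as in the original
    .flatMap(_.headOption)    // first token, if there is one
    .filter(_.nonEmpty)       // treat an empty token as "no first name"
    .orNull                   // a null return value becomes SQL NULL in Spark
```

It can be registered exactly like getFistName, since it has the same String => String shape.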
val getFistNameUDF = sparkSession.udf.register("fist_name", getFistName)
A UDF registered this way can also be used in a string expression, for example:
tempDF.selectExpr("fist_name(name)").show(2)
Another way to create a UDF:
import org.apache.spark.sql.functions.udf
val getFistNameUDF = udf(getFistName)
A UDF created with udf() can only be used as a DataFrame function. It can't be used within a string expression, because it is not registered with the session under a SQL name.
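To make the difference between the two styles concrete, here is a minimal, self-contained sketch. It assumes a throwaway local session; the names spark, people, firstWord and first_word are made up for illustration and do not appear in the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session for experimentation; the post uses an existing sparkSession.
val spark = SparkSession.builder().master("local[*]").appName("udf-sketch").getOrCreate()
import spark.implicits._

val people = Seq("rahul sharma", "Joy don").toDF("name")
val firstWord = (name: String) => name.split(" ")(0)

// 1) Registered under a SQL name: usable from SQL strings and string expressions.
//    The returned value can also be applied to columns in the DataFrame API.
val registered = spark.udf.register("first_word", firstWord)
people.createOrReplaceTempView("people")
val viaSql = spark.sql("SELECT first_word(name) AS first FROM people")
val viaExpr = people.selectExpr("first_word(name) AS first")
val viaRegisteredColumn = people.select(registered(col("name")).as("first"))

// 2) Wrapped with udf(): no SQL name exists, so only the DataFrame API can call it.
val wrapped = udf(firstWord)
val viaColumn = people.select(wrapped(col("name")).as("first"))
```

All four DataFrames should contain the same values. The only difference is where the function can be referenced: by name inside a string, or as a Scala value applied to a Column.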
// Returns "Yes" if the designation contains "Manager", otherwise "No".
val isManager = (designation: String) => {
  if (designation.contains("Manager"))
    "Yes"
  else
    "No"
}
import org.apache.spark.sql.functions.col
val isManagerUDF = sparkSession.udf.register("is_manager", isManager)
val finalDF = tempDF.withColumn("first name", getFistNameUDF(col("name")))
  .withColumn("is_manager", isManagerUDF(col("designation")))

Data Engineering | Machine Learning | Front-end | NIT Trichy

Parmanand kumar
