Typesafe Config file for Spark applications

It is best practice to keep pipeline configuration for your Scala Spark applications in a config file. There are many options available, but Typesafe Config is one of the popular configuration libraries for JVM-based languages. We are going to use the SBT build tool to manage dependencies.
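As a rough illustration of the idea, the sketch below parses a small HOCON snippet with Typesafe Config and reads typed values from it. The object name, the keys under pipeline, and the dependency version in the comment are assumptions made for this example, not taken from the article; in a real project the settings would normally live in src/main/resources/application.conf and be loaded with ConfigFactory.load().

```scala
// Assumed build.sbt line (the version is illustrative):
//   libraryDependencies += "com.typesafe" % "config" % "1.4.3"
import com.typesafe.config.{Config, ConfigFactory}

object PipelineConfigExample {
  // Hypothetical pipeline settings, inlined so the sketch is self-contained.
  val hocon: String =
    """
      |pipeline {
      |  input-path = "/data/in"
      |  output-path = "/data/out"
      |  shuffle-partitions = 8
      |}
      |""".stripMargin

  // parseString reads HOCON from a string; ConfigFactory.load() would
  // read application.conf from the classpath instead.
  val config: Config = ConfigFactory.parseString(hocon)

  def main(args: Array[String]): Unit = {
    val pipeline = config.getConfig("pipeline")
    println(pipeline.getString("input-path"))
    println(pipeline.getInt("shuffle-partitions"))
  }
}
```

Keeping paths and tuning values in one config block like this means they can change between environments without recompiling the job.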


Case When statement in SQL

In the SQL world, we very often write CASE WHEN statements to deal with conditions. Spark also provides a “when” function to deal with multiple conditions.

In this article, we will talk about the following:

  1. when
  2. when otherwise
  3. when with multiple conditions
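
The three forms listed above can be sketched roughly as follows. The DataFrame, its name and age columns, and the object name are hypothetical and chosen only for illustration; this is not the example from the original article.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WhenExample {
  // Expects a DataFrame with a numeric "age" column (an assumption of this sketch).
  def addColumns(df: DataFrame): DataFrame =
    df
      // 1. when alone: rows that do not match the condition get null
      .withColumn("is_adult", when(col("age") >= 18, "yes"))
      // 2. when with otherwise: unmatched rows get the fallback value
      .withColumn("adult_or_minor",
        when(col("age") >= 18, "adult").otherwise("minor"))
      // 3. chained when calls, the first with a combined condition
      .withColumn("bucket",
        when(col("age") >= 13 && col("age") < 18, "teen")
          .when(col("age") < 13, "child")
          .otherwise("adult"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("when-example")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()
    import spark.implicits._

    val people = Seq(("Asha", 10), ("Ravi", 15), ("Meena", 30)).toDF("name", "age")
    addColumns(people).show()
    spark.stop()
  }
}
```

Chained when calls are evaluated in order, so the first matching condition decides the value, much like the branches of a SQL CASE WHEN.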

Let’s get started!

Let’s consider an example. Below is a Spark…

Every Spark application must have a SparkSession, which acts as the entry point for the application. It was added in Spark 2.0; before that, SparkContext was the entry point of any Spark application. It allows you to control a Spark application through a driver process called the SparkSession.
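
A minimal sketch of creating a SparkSession with the builder API is shown below. The application name, the local master setting, and the object name are assumptions for illustration; on a cluster the master is usually supplied by spark-submit rather than set in code.

```scala
import org.apache.spark.sql.SparkSession

object SessionExample {
  // Builds a new session, or returns the one already active in this JVM.
  def buildSession(appName: String): SparkSession =
    SparkSession.builder()
      .appName(appName)
      .master("local[*]") // assumption: local mode, for experimenting
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession("session-example")

    // The pre-2.0 entry point is still reachable from the session.
    val sc = spark.sparkContext
    println(s"Spark version: ${spark.version}")
    println(s"Application name: ${sc.appName}")

    spark.stop()
  }
}
```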

Let’s get started…

Spark is a powerful data processing framework. It offers many functions to handle null values in a Spark DataFrame in different ways, including functions that replace null values. These functions are available through the DataFrame’s na field.
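
As a rough sketch of how the na functions are used, the example below fills and drops null values. The sample data, column names, and replacement values are hypothetical and not taken from the article.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullHandlingExample {
  // Replaces nulls column by column: a default string for "name", 0 for "age".
  def fillDefaults(df: DataFrame): DataFrame =
    df.na.fill("unknown", Seq("name"))
      .na.fill(0L, Seq("age"))

  // Drops every row that contains at least one null value.
  def dropIncomplete(df: DataFrame): DataFrame =
    df.na.drop()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("null-handling-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Option values become nullable columns; None becomes null.
    val people = Seq(
      (Some("Asha"), Some(10)),
      (None, Some(15)),
      (Some("Meena"), None)
    ).toDF("name", "age")

    fillDefaults(people).show()
    dropIncomplete(people).show()
    spark.stop()
  }
}
```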

In this article…

Spark is interesting, and one of the most important things you can do with it is define your own functions, called user-defined functions (UDFs). These allow us to write our own transformations in Scala, Python, or Java.

In this article, I will be discussing Spark UDF…
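
A minimal sketch of a Scala UDF follows, assuming a hypothetical DataFrame with a name column. The function, object, and column names are illustrative only and are not the article’s own example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object UdfExample {
  // Plain Scala function; Option(...) guards against null input values.
  def capitalizeName(s: String): String =
    Option(s).map(_.capitalize).orNull

  // Wrap the function so it can be applied to DataFrame columns.
  val capitalizeUdf: UserDefinedFunction = udf(capitalizeName _)

  def withCapitalizedName(df: DataFrame): DataFrame =
    df.withColumn("name_capitalized", capitalizeUdf(col("name")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val people = Seq("asha", "ravi").toDF("name")
    withCapitalizedName(people).show()

    // Registering the UDF also makes it callable from Spark SQL.
    spark.udf.register("capitalize_name", capitalizeUdf)
    people.createOrReplaceTempView("people")
    spark.sql("SELECT capitalize_name(name) AS name_capitalized FROM people").show()

    spark.stop()
  }
}
```

Spark cannot optimize the body of a UDF, so a built-in function is usually the better choice when one exists for the job.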

Parmanand kumar

Data Engineering | Machine Learning | Front-end | NIT Trichy
