This post is merely a quick reference for setting up your Scala environment for Apache Spark development.
Simplest Thing
Your build.sbt should look like this:
name := "My Project"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"
Your Entry.scala:
import org.apache.log4j.{Level, LogManager}
import org.apache.spark.sql.SparkSession

object Entry {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local")
      .getOrCreate()

    LogManager.getRootLogger.setLevel(Level.ERROR)

    // use the spark variable here to write your programs
  }
}
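In place of that comment you could, for example, build a small in-memory DataFrame and run a trivial aggregation. The sample data below is purely illustrative and relies on the spark value defined just above:

    import spark.implicits._

    // illustrative only: a tiny in-memory DataFrame
    val sales = Seq(
      ("books", 12.5),
      ("books", 7.0),
      ("games", 30.0)
    ).toDF("category", "amount")

    // sum amounts per category and print the result to the console
    sales.groupBy("category").sum("amount").show()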
Integrating with Azure Pipelines
Azure Pipelines has built-in support for sbt, so you can build and package with the following task (simplest version):
- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean
      sbt update
      sbt compile
      sbt package
    workingDirectory: 'project-dir'
To pass a version number, you can use a variable from your pipeline. Say it's called projectVersion; the pipeline task then becomes:
- task: CmdLine@2
  displayName: "sbt"
  inputs:
    script: |
      sbt clean
      sbt update
      sbt compile
      sbt package
    workingDirectory: 'project-dir'
  env:
    v: $(projectVersion)
This merely creates an environment variable called v for the sbt task. To pick it up, just modify the version line in build.sbt:
version := sys.env.getOrElse("v", "0.1")
You can create an uber JAR, but they get relatively large (a 70 KB package JAR grows to over 100 MB), so I'd try to avoid it.
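If you do need one, a common route (not covered above, so treat this as a sketch) is the sbt-assembly plugin, with Spark marked as provided so the cluster's copy is used instead of bundling it:

// project/plugins.sbt — the plugin version here is only an example
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt — keep Spark out of the fat JAR; the cluster provides it at runtime
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6" % "provided"

Running sbt assembly then produces the fat JAR under target/scala-2.11/. Keep in mind that provided dependencies are not on the runtime classpath of sbt run, so local runs need the dependency back in compile scope.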
This article was originally published on my blog.