AWS recently released EMR Serverless, which was previewed at re:Invent 2021 and became generally available in June 2022. If you are a data engineer or big data developer working with AWS data services, one obvious question comes to mind: why EMR Serverless, when AWS Glue is already on the service list and does almost the same job?
Introduction to EMR
Before we get to EMR Serverless, it helps to have a brief overview of EMR itself.
EMR (Elastic MapReduce) is a managed Hadoop cluster platform on AWS for storing, processing, and analyzing big data. It packages the MapReduce-style processing that data engineers used to run on local machines or on their own clusters.
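To make the MapReduce idea concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It only illustrates the map, shuffle, and reduce phases; the function names are made up for this example, and on EMR the same phases would be handled by Hadoop or Spark across many nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on EMR", "big clusters process big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes, and the shuffle moves data between them over the network.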
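To store data and intermediate results, EMR offers HDFS, EMRFS, and the local file system (instance store or EBS volumes). HDFS here is the same Hadoop Distributed File System that ships with Hadoop and that Spark can read from and write to, while EMRFS is EMR's Hadoop-compatible file system backed by Amazon S3.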
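In practice, the storage layer a job touches is usually visible from the URI scheme of the path it reads or writes. The helper below is a small illustrative sketch; the function name, the mapping table, and the example paths are assumptions made for this post, not part of any EMR API.

```python
from urllib.parse import urlparse

# Illustrative mapping from URI scheme to the storage layer on an EMR cluster.
STORAGE_BY_SCHEME = {
    "hdfs": "HDFS",
    "s3": "EMRFS (Amazon S3)",
    "file": "local file system (instance store or EBS)",
}


def storage_layer(uri):
    """Return the storage layer implied by the URI's scheme, or 'unknown'."""
    scheme = urlparse(uri).scheme
    return STORAGE_BY_SCHEME.get(scheme, "unknown")


if __name__ == "__main__":
    # Hypothetical paths, used only to show the three schemes.
    for path in ["hdfs:///tmp/stage1", "s3://my-bucket/output/", "file:///mnt/scratch"]:
        print(path, "->", storage_layer(path))
```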
EMR supports around 50 applications that you can install on the cluster you spin up for your daily jobs and tasks, e.g. Spark, Hive, HBase, Hue, Pig, and JupyterLab.
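When a cluster is created through boto3, the applications to install are passed as the `Applications` parameter of `run_job_flow`. The sketch below only builds the request dictionary so it can be inspected without an AWS account. The helper name, cluster name, release label, instance types, and instance count are placeholder assumptions, and the role names are the EMR defaults, which may differ in your account.

```python
def build_cluster_request(name, release_label, applications):
    """Build keyword arguments in the shape boto3's EMR run_job_flow accepts."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # One {"Name": ...} entry per application to install on the cluster.
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder instance type
            "SlaveInstanceType": "m5.xlarge",   # placeholder instance type
            "InstanceCount": 3,                 # placeholder cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile name
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role name
    }


request = build_cluster_request(
    "demo-cluster", "emr-6.9.0", ["Spark", "Hive", "HBase", "Hue", "Pig"]
)

if __name__ == "__main__":
    print(request["Applications"])
    # To actually launch the cluster (needs AWS credentials and incurs cost):
    # import boto3
    # boto3.client("emr").run_job_flow(**request)
```

Keeping the request as a plain dictionary makes it easy to review or version the application list before anything is launched.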