Big Data Analytics using Spark


July 1, 2019
5 Days
PES Participants: Rs. 5,000
Non-PES Participants: Rs. 10,000


Thenmozhi S
Associate Professor


Crucible of Continuing Education (CCE)
PES University Campus
100 Feet Ring Road, BSK III Stage
Bengaluru – 560 085

About the Course

It’s an open secret that we live in the world of Big Data, and Spark is one of the hottest Big Data technologies. The amount of data generated today by devices, applications and users is exploding, so there is a critical need for tools that can analyze large-scale data and unlock value from it. Spark is a powerful technology that meets that need: an open-source, fast, general-purpose cluster computing framework for large-scale data analysis. It is growing rapidly and is replacing Hadoop MapReduce as the technology of choice for big data analytics. Spark is used at Facebook, Pinterest, Netflix, Conviva and TripAdvisor for Big Data and Machine Learning applications. It is one of the most in-demand skills right now, and this course helps you learn it quickly and easily!

Course Objectives (What you will learn)

The course is designed to help you:

  • Understand the limitations of MapReduce and the role of Spark in overcoming them
  • Install Spark as a standalone cluster
  • Explore the most common, as well as some complex, use cases for large-scale data analysis with Spark
  • Master and describe the features of Spark ML programming
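
As background for the first objective, the MapReduce model that the course contrasts with Spark can be sketched in plain Python: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is only an illustration, not course material; the word-count task, the function names and the input lines are assumptions made up for the example. A real Hadoop MapReduce job writes intermediate results to disk between stages, which is the kind of overhead Spark reduces by keeping data in memory.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group of values into a single count per key."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up input lines, used only for illustration.
lines = ["spark is fast", "hadoop is reliable", "spark is general"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts)
```

Here the three phases run in one process; on a cluster each phase would run in parallel across many machines.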

Who should attend

Anyone who already knows how to program and is interested in learning Big Data technologies.

Prerequisite: Basic math skills are desirable.

Outstation students and candidates must make their own arrangements for accommodation and boarding.

Course Outline and Schedule


Introduction, Hadoop vs. Spark, Spark programming model (1 Hr Theory)

Working with Spark in the Python shell, loading and saving data (1 Hr Hands-on)

Working with DataFrames (2 Hrs Hands-on)

Resilient Distributed Datasets (30 Min)

Working with RDDs (30 Min Hands-on)

Submitting applications to a cluster (1 Hr Hands-on)


Tricky Statistics with Spark (2 Hrs Hands-on): variable identification, sampling, splitting, slicing, sorting, filtering and grouping DataFrames

Data Analysis with Spark (3 Hrs Theory + Hands-on): univariate analysis, bivariate analysis, outlier detection, missing value treatment
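
To give a flavour of two of the topics above, missing value treatment and outlier detection, here is a minimal sketch in plain Python rather than Spark, so that it runs without a cluster. The choice of median imputation and the 1.5 × IQR rule are assumptions for illustration; the course may use other techniques. The trip durations are made-up values, not taken from the Uber dataset.

```python
from statistics import median, quantiles


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical trip durations in minutes; None marks a missing reading.
trip_minutes = [12, 15, None, 14, 13, 95, 16, None, 11]

cleaned = fill_missing_with_median(trip_minutes)
outliers = iqr_outliers(cleaned)
print(cleaned)
print(outliers)
```

With this sample, the two missing readings are filled with the median of 14 and the 95-minute trip is the only value flagged as an outlier.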

Use case: Uber dataset – ungraded assignment – discussion (30 Min)


Introduction to Machine Learning (1 Hour Theory)

Machine Learning with Spark (30 Min Theory)

Spark’s MLlib for Machine Learning (30 Min Hands-on)

Regression with Spark (3 Hours Theory + Hands-on)

Practice Assignment Discussion – Uber open dataset (30 Min)


Classification with Spark (3 Hours Theory + Hands-on)

Clustering with Spark (3 Hours Theory + Hands-on)

Practice Assignment Discussion – MovieLens dataset


PCA with Spark (2 Hours Theory + Hands-on)

Model Evaluation (2 Hours Hands-on)

Spark Recommender System Implementation (2 Hours Hands-on)

Evaluation and Assessment

Quiz – 30% (30 Min)
Big Data Analysis for a Business Use-Case – 70% (2 Hours)
