11. Can an RDD be shared between SparkContexts?

Ans: No. When an RDD is created, it belongs to and is completely owned by the SparkContext it originated from. RDDs can’t be shared between SparkContexts.

12. In Spark-Shell, which all contexts are available by default?

Ans: SparkContext and SQLContext
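
Both objects are created for you when the shell starts. The snippet below is a minimal sketch of how they are used in a Spark 1.x spark-shell session (Spark 2.x additionally pre-binds a SparkSession named spark):

    scala> sc.version                  // SparkContext is pre-bound as sc
    scala> sqlContext.sql("SELECT 1")  // SQLContext is pre-bound as sqlContext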

13. Give a few examples of how an RDD can be created using SparkContext.

Ans: SparkContext allows you to create many different RDDs from input sources such as the following (a short sketch appears after the list):

· Scala collections, e.g. sc.parallelize(0 to 100)

· Local or remote file systems, e.g. sc.textFile("README.md")

· Any Hadoop InputFormat, using sc.newAPIHadoopFile
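
Putting the bullets above together, a minimal sketch of each creation path; README.md and the LongWritable/Text/TextInputFormat types passed to newAPIHadoopFile are just illustrative choices:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // From a local Scala collection
    val numbers = sc.parallelize(0 to 100)

    // From a local or remote file system (local path, HDFS, S3, ...)
    val readme = sc.textFile("README.md")

    // From any Hadoop InputFormat via the new MapReduce API
    val hadoopRdd = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("README.md")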

14. How would you broadcast a collection of values to the Spark executors?

Ans: Use SparkContext.broadcast, e.g. sc.broadcast("hello")

15. What is the advantage of broadcasting values across the Spark cluster?

Ans: Spark transfers the value to the executors only once, and tasks can then share it without incurring repeated network transfers when it is used multiple times.
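
A minimal sketch of the pattern behind questions 14 and 15, using a small lookup map as the broadcast value (the countryNames map and the sample data are made up for illustration):

    // Broadcast a small lookup table once; every task reads the same copy
    val countryNames = Map("IN" -> "India", "US" -> "United States")
    val bcNames = sc.broadcast(countryNames)

    val codes = sc.parallelize(Seq("IN", "US", "IN"))
    val resolved = codes.map(code => bcNames.value.getOrElse(code, "unknown"))
    resolved.collect().foreach(println)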

16. Can we broadcast an RDD?

Ans: Technically yes, but you should not broadcast an RDD to use in tasks; Spark will warn you if you do, although it will not stop you. The usual approach is to collect the RDD’s data on the driver and broadcast that result instead.
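
A hedged sketch of the collect-then-broadcast alternative mentioned above; smallRdd here is a hypothetical data set small enough to hold on the driver:

    // Don't: sc.broadcast(smallRdd) -- Spark will complain
    // Do: materialise the data on the driver, then broadcast the plain collection
    val smallRdd = sc.parallelize(Seq(1, 2, 3))
    val bcValues = sc.broadcast(smallRdd.collect())

    val bigRdd = sc.parallelize(1 to 1000000)
    val filtered = bigRdd.filter(x => bcValues.value.contains(x))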

17. How can we distribute JARs to workers?

Ans: The JAR you specify with SparkContext.addJar will be copied to all the worker nodes.
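
A minimal sketch (the JAR path is hypothetical); the file is shipped to every worker so that tasks can load classes from it:

    // Make a dependency available to executors at task-execution time
    sc.addJar("/path/to/extra-library.jar")

JARs can also be listed up front with the --jars option of spark-submit or via the spark.jars configuration property.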

18. How can you stop a SparkContext, and what is the impact of stopping it?

Ans: You can stop a SparkContext with the SparkContext.stop() method. Stopping it stops the Spark Runtime Environment and effectively shuts down the entire Spark application.
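
A common driver pattern is to stop the context in a finally block so the application shuts down cleanly even if a job fails; the job body below is just a placeholder:

    try {
      val result = sc.parallelize(1 to 10).sum()   // placeholder job
      println(result)
    } finally {
      sc.stop()   // releases the SparkContext; no further jobs can be submitted afterwards
    }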

19. Which scheduler is used by SparkContext by default?

Ans: By default, SparkContext uses DAGScheduler, but you can develop your own custom DAGScheduler implementation.

20. How would you set the amount of memory to allocate to each executor?

Ans: SPARK_EXECUTOR_MEMORY sets the amount of memory to allocate to each executor.
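
SPARK_EXECUTOR_MEMORY is normally exported in conf/spark-env.sh. A hedged sketch of the programmatic route for a standalone driver is to set spark.executor.memory on the SparkConf before creating the context; the 4g value and the application name are arbitrary:

    import org.apache.spark.{SparkConf, SparkContext}

    // Roughly equivalent to exporting SPARK_EXECUTOR_MEMORY=4g before launching the app
    val conf = new SparkConf()
      .setAppName("memory-demo")
      .set("spark.executor.memory", "4g")

    val sc = new SparkContext(conf)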
