11. Can an RDD be shared between SparkContexts?

Ans: No. When an RDD is created, it belongs to and is completely owned by the SparkContext it originated from, so RDDs can’t be shared between SparkContexts.

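12. In spark-shell, which contexts are available by default?

Ans: SparkContext (available as sc) and SQLContext (available as sqlContext). From Spark 2.0 onward, the shell exposes a SparkSession as spark, alongside sc.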

13. Give a few examples of how an RDD can be created using SparkContext.

Ans: SparkContext lets you create RDDs from many different input sources, for example:

· Scala collections: e.g. sc.parallelize(0 to 100)

· Local or remote filesystems: sc.textFile("README.md")

· Any Hadoop InputFormat: using sc.newAPIHadoopFile
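
The three sources listed above can be sketched in one small program. This is a minimal illustration, assuming a local master (`local[*]`) and a temporary file standing in for README.md; the object name `RddCreationSketch` and the file contents are made up for the example. In spark-shell the `sc` value already exists, so only the body of `run` would be needed.

```scala
import java.nio.file.Files

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object RddCreationSketch {
  // Returns the element counts of the three RDDs created below.
  def run(): (Long, Long, Long) = {
    // Assumption: run Spark inside this JVM; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("rdd-creation-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // 1. From a Scala collection.
      val numbers = sc.parallelize(0 to 100)

      // 2. From a text file (a temporary two-line file stands in for README.md).
      val path = Files.createTempFile("sample", ".txt")
      Files.write(path, "first line\nsecond line\n".getBytes("UTF-8"))
      val lines = sc.textFile(path.toString)

      // 3. From a Hadoop InputFormat: records arrive as (byte offset, line) pairs.
      val records = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](path.toString)
      val viaHadoop = records.map { case (_, text) => text.toString }

      (numbers.count(), lines.count(), viaHadoop.count())
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```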

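14. How would you broadcast a collection of values to the Spark executors?

Ans: Use SparkContext.broadcast, e.g. sc.broadcast("hello"). It returns a Broadcast handle, and tasks read the data through its value field.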

15. What is the advantage of broadcasting values across a Spark cluster?

Ans: Spark transfers the value to each executor once, and tasks can share it without incurring repeated network transmissions when it is requested multiple times.
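
As a rough sketch of how a broadcast value is typically used, the example below ships a small lookup table to the executors and reads it inside a map. The table contents, the object name `BroadcastSketch`, and the local master are assumptions made for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def run(): Array[String] = {
    // Assumption: local master for illustration only.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical lookup table that every task needs.
      val countryNames = Map("IN" -> "India", "US" -> "United States")

      // Broadcast it once; tasks then read it through lookup.value.
      val lookup = sc.broadcast(countryNames)

      val codes = sc.parallelize(Seq("IN", "US", "IN"))
      codes.map(code => lookup.value.getOrElse(code, "unknown")).collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

Without the broadcast, the map would be serialized into the closure of every task that uses it.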

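16. Can we broadcast an RDD?

Ans: You should not broadcast an RDD directly for use in tasks. Older Spark releases log a warning but do not stop you; newer releases reject the call outright. Instead, call collect() on the RDD and broadcast the result.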
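
Rather than broadcasting the RDD itself, a common pattern is to collect a small RDD to the driver and broadcast the resulting local collection. The sketch below assumes the RDD is small enough to fit in driver memory; the data, the object name `CollectThenBroadcastSketch`, and the local master are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectThenBroadcastSketch {
  def run(): Array[String] = {
    // Assumption: local master for illustration only.
    val conf = new SparkConf().setAppName("collect-then-broadcast").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // A small pair RDD we would like every task to be able to look up.
      val smallRdd = sc.parallelize(Seq(1 -> "one", 2 -> "two"))

      // Bring its contents to the driver as a local map, then broadcast that map.
      val localMap = smallRdd.collectAsMap()
      val names = sc.broadcast(localMap)

      val keys = sc.parallelize(Seq(1, 2, 3))
      keys.map(k => names.value.getOrElse(k, "?")).collect()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```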

17. How can we distribute JARs to workers?

Ans: The jar you specify with SparkContext.addJar will be copied to all the worker nodes.

18. How can you stop SparkContext and what is the impact if stopped?

Ans: You can stop a SparkContext by calling its SparkContext.stop() method. Stopping the context stops the Spark runtime environment and effectively shuts down the entire Spark application.
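
A minimal sketch of stopping a context safely, assuming a local master and a trivial job: wrapping the work in try/finally means stop() runs even if the job throws. The object name `StopSketch` is made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StopSketch {
  // Returns the job result and whether the context reports itself as stopped afterwards.
  def run(): (Int, Boolean) = {
    // Assumption: local master for illustration only.
    val conf = new SparkConf().setAppName("stop-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val total =
      try {
        sc.parallelize(1 to 10).reduce(_ + _)
      } finally {
        // Shuts down the application; this context cannot run further jobs.
        sc.stop()
      }
    (total, sc.isStopped)
  }

  def main(args: Array[String]): Unit = println(run())
}
```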

19. Which scheduler is used by SparkContext by default?

Ans: By default, SparkContext uses DAGScheduler, but you can develop your own custom DAGScheduler implementation.

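20. How would you set the amount of memory to allocate to each executor?

Ans: The SPARK_EXECUTOR_MEMORY environment variable sets the amount of memory to allocate to each executor. The same setting is available as the spark.executor.memory configuration property and as the --executor-memory option of spark-submit.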
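
The same setting can also be supplied programmatically through the spark.executor.memory property on a SparkConf, before the SparkContext is created. The sketch below only builds the configuration; the "2g" value and the object name `ExecutorMemorySketch` are arbitrary choices for illustration.

```scala
import org.apache.spark.SparkConf

object ExecutorMemorySketch {
  // Illustrative value: request 2 GB per executor.
  val conf: SparkConf = new SparkConf()
    .setAppName("executor-memory-sketch")
    .set("spark.executor.memory", "2g")

  def main(args: Array[String]): Unit = {
    // Read the setting back to confirm what the application will request.
    println(conf.get("spark.executor.memory"))
  }
}
```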
