Why Spark's Broadcast Should Use the Singleton Pattern

Published: 2019-06-13 06:07:47 | Category: Tutorials | Source: 浪院长

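Summary: Many people who use Spark Streaming have probably used broadcast, and in most cases the broadcast variable is declared as a singleton. Have you ever wondered why? 浪尖 walks through it here; there are several reasons. Broadcast variables usually do not change, so using the singleton pattern can reduce, for each Spark Streaming jo…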

When a GenerateJobs event is received, the generateJobs method is executed; this is the code in which jobs are generated and scheduled.

/** Generate jobs and perform checkpointing for the given `time`.  */ 
  private def generateJobs(time: Time) { 
    // Checkpoint all RDDs marked for checkpointing to ensure their lineages are 
    // truncated periodically. Otherwise, we may run into stack overflows (SPARK-6847). 
    ssc.sparkContext.setLocalProperty(RDD.CHECKPOINT_ALL_MARKED_ANCESTORS, "true") 
    Try { 
      jobScheduler.receiverTracker.allocateBlocksToBatch(time) // allocate received blocks to batch 
      graph.generateJobs(time) // generate jobs using allocated block 
    } match { 
      case Success(jobs) => 
        val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time) 
        jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos)) 
      case Failure(e) => 
        jobScheduler.reportError("Error generating jobs for time " + time, e) 
        PythonDStream.stopStreamingContextIfPythonProcessIsDead(e) 
    } 
    eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false)) 
  }
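
Because generateJobs runs once per batch interval, driver-side code tied to a batch runs again for every batch. The following is a minimal sketch of a lazily initialized singleton holder, modeled on the pattern shown in the Spark Streaming programming guide. It is not taken from this article: the object name `BlacklistBroadcast` and the word list are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Hypothetical holder object; the name and the word list are illustrative only.
object BlacklistBroadcast {

  // @volatile so the unsynchronized first check sees a fully published reference.
  @volatile private var instance: Broadcast[Seq[String]] = null

  // Returns the single broadcast for this driver JVM, creating it on first use.
  def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
    if (instance == null) {
      synchronized {
        // Second check: another thread may have created it while we waited for the lock.
        if (instance == null) {
          instance = sc.broadcast(Seq("a", "b", "c"))
        }
      }
    }
    instance
  }
}
```

Inside a per-batch function such as `foreachRDD`, calling `BlacklistBroadcast.getInstance(rdd.sparkContext)` would then return the same `Broadcast` object on every batch instead of broadcasting the data again. Creating the broadcast lazily also matters with checkpointing enabled: the Spark Streaming guide notes that broadcast variables cannot be recovered from a checkpoint, so they have to be re-created on first use after a driver restart.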

(Editor: 西安站长网)
