Here are the steps to write a simple network word counter using Spark Streaming.
Refer to my previous post and create a Scala project.
Modify App.scala as follows. For a description of the code, refer to the Spark Streaming Programming Guide.
package com.nxtify

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary in Spark 1.3+

/**
 * @author ${user.name}
 */
object App {
  def foo(x: Array[String]) = x.foldLeft("")((a, b) => a + b)

  def main(args: Array[String]) {
    // Create a local StreamingContext with two working threads and a batch interval of 1 second.
    // The master requires 2 cores to prevent a starvation scenario.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))

    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
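The per-batch word-count logic (flatMap, map, reduceByKey) follows ordinary Scala collection semantics. As a rough sketch, the hypothetical helper below mimics what the pipeline does to one batch of lines using plain collections, so you can reason about it without running Spark:

```scala
object WordCountSketch {
  // Mimics the DStream pipeline on a single batch of input lines.
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(_.split(" "))    // like lines.flatMap(_.split(" "))
    val pairs = words.map(word => (word, 1))   // like words.map(word => (word, 1))
    // Local-collection equivalent of reduceByKey(_ + _): group by word, sum the counts
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    println(countWords(Seq("hello world", "hello spark")))
  }
}
```

The difference in the real app is that Spark applies this logic to each one-second batch arriving on the socket, distributed across executors.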
Use the following command to build and run:
mvn package exec:java -Dexec.mainClass=com.nxtify.App
Use the following command in another terminal to start the feed using netcat:
nc -lk 9999 // assuming you are on Linux or macOS
Enter text at the netcat prompt and the word counts will show up in the other terminal. You now have a basic Spark Streaming app up and running.
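One subtlety in the word splitting above: `split(" ")` keeps empty strings between consecutive spaces, so a line with a double space would count "" as a word. A quick check, along with a whitespace-regex alternative you may prefer:

```scala
object SplitCheck {
  def main(args: Array[String]): Unit = {
    // Double space: plain split(" ") yields an empty token in the middle.
    println("hello  world".split(" ").toList)    // List(hello, , world)
    // Splitting on runs of whitespace avoids the empty token.
    println("hello  world".split("\\s+").toList) // List(hello, world)
  }
}
```

Swapping `_.split(" ")` for `_.split("\\s+")` in the flatMap makes the counter robust to irregular spacing in the input.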