Spark Streaming – Simple Network Word Count

Here are the steps to write a simple network word counter using Spark Streaming.

Refer to my previous post and create a Scala project.

Modify App.scala as follows. For a description of the code, refer to the Spark Streaming Programming Guide.

package com.nxtify

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._ // not necessary in Spark 1.3+

/**
 * @author ${user.name}
 */
object App {

  def foo(x : Array[String]) = x.foldLeft("")((a,b) => a + b)

  def main(args : Array[String]) {

    // Create a local StreamingContext with two worker threads and a batch interval of 1 second.
    // The master needs at least 2 cores to avoid starvation: one thread receives data, the other processes it.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))

    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start()            // Start the computation
    ssc.awaitTermination() // Wait for the computation to terminate
  }
}
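
To see what the flatMap, map and reduceByKey steps compute for a single batch, here is a minimal sketch that applies the same logic to plain Scala collections, with no Spark involved. The names WordCountSketch and countWords are made up for this illustration, and groupBy plus a sum stands in for reduceByKey; it is only meant to show the shape of the result, not how Spark executes it.

```scala
object WordCountSketch {

  // Mirrors the DStream pipeline on one batch of lines, using plain Scala collections.
  // (Hypothetical helper for illustration; not part of the Spark API.)
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(_.split(" "))   // split each line into words
    val pairs = words.map(word => (word, 1))  // pair each word with a count of 1
    // Group the pairs by word and sum the counts: a local stand-in for reduceByKey(_ + _)
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    val batch = Seq("hello world", "hello spark")
    // Sort by word so the printed order is stable
    countWords(batch).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In the streaming app, the same computation is applied to whatever lines arrive during each one-second batch.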

Use the following command to build and run:

mvn package exec:java -Dexec.mainClass=com.nxtify.App

Use the following command in another terminal to start the feed using netcat:

nc -lk 9999   # assuming you are on Linux or Mac

Enter text at the netcat prompt and the word counts will show up in the other terminal. Now you have a basic Spark app up and running.
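
If netcat is not available (for example on Windows), a small line server can play the same role. The sketch below is an assumption-laden stand-in, not a replacement for nc: the names LineFeed and serve are made up for this illustration, it uses only java.net.ServerSocket from the JDK, and unlike nc -lk it serves a single client and then exits. Start it first, then start the word counter so that it connects to port 9999.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object LineFeed {

  // Accepts one client on the given server socket and sends it every line.
  // (Hypothetical helper for illustration; handles a single connection only.)
  def serve(server: ServerSocket, lines: Iterator[String]): Unit = {
    val client = server.accept()
    val out = new PrintWriter(client.getOutputStream, true) // autoflush on println
    try lines.foreach(line => out.println(line))
    finally { out.close(); client.close() }
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(9999) // same port the word counter connects to
    println("Waiting for a client on port 9999; type lines once it has connected.")
    try serve(server, scala.io.Source.stdin.getLines())
    finally server.close()
  }
}
```

Lines typed into this program's console are forwarded to the connected Spark app, one per line, in the same way as text typed at the netcat prompt.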

Scala using Maven

To create a new Scala project using Maven, you can use the following archetype:

mvn archetype:generate -B   -DarchetypeGroupId=net.alchim31.maven -DarchetypeArtifactId=scala-archetype-simple -DarchetypeVersion=1.5   -DgroupId=com.company -DartifactId=project -Dversion=0.1-SNAPSHOT -Dpackage=com.nxtify

This generates the Maven directory structure and the following sample app:

package com.nxtify

/**
 * @author ${user.name}
 */
object App {

  def foo(x : Array[String]) = x.foldLeft("")((a,b) => a + b)

  def main(args : Array[String]) {
    println( "Hello World!" )
    println("concat arguments = " + foo(args))
  }

}

To build the project, use the following command:

mvn install

To run the project, use:

mvn package exec:java -Dexec.mainClass=com.nxtify.App -Dexec.args="Manish"

Learning R

A great place to start learning R is Code School. Check it out: http://tryr.codeschool.com/levels/1/challenges/1

BTW, R is a free software environment for statistical computing and graphics.

How we train the people who join us…

Most of the people who join us at nxtify have some background in Java and Servlet Programming.

We ask our trainees to go through the introductory HTML, CSS and JavaScript courses at Codecademy. This is more of a refresher to ensure they are grounded in these basic skills.

Then we get them onto Spring Roo. The second chapter gives a great tutorial, PizzaShop. We use Roo extensively in our work, and most of our team members are well versed in it. We then ask them to go through the code generated by Roo. This is a great way to learn how to write good code and to understand concepts and tools such as MVC, AOP, Hibernate and Maven. We don't focus much on JSP. Instead, we move them on to exposing services over REST/JSON using Roo. By this time, they are comfortable working with Roo and with tools like Git and Maven.

Then we get them to use AngularJS to build single-page applications. Angular Seed is a great starter. Along the way we help them understand the MVC implementation on the client side, declarative programming, debugging with the browser developer tools, and command-line tools like curl.

A few of them also train on D3.js. A great place to start is Mike Bostock’s “Let’s Make a Bar Chart”.

nxtify – New Experiences. New Insights.

Every time humans have discovered a new way to exchange information, it has transformed society in a multitude of ways. Language allowed us to organise in small bands. Writing enabled us to send information to far-off places using horses and pigeons. Printing accelerated that further, and books transformed society. The telegraph and then the telephone ushered us into the information age. Every time information could be exchanged in new ways, new organisations and new marketplaces emerged and the rules of the game changed. The internet changes the way we exchange information in a dramatic manner. We have seen that in the last two decades it has altered a lot of things, but this is just the beginning.

We are experiencing the next information explosion, powered by social networks and sensor networks. This new information is quite different from what we have been used to until now in terms of volume, velocity and variety. To make sense of this new information, new systems are evolving with new ways to store, compute and visualise information. These new systems are also intelligent: they don’t just keep doing the same thing over and over again, but learn and improve as they churn through this new information. As these intelligent systems become pervasive, we are entering what many call the Age of Intelligent Machines.

How fast an organisation grows is a function of how fast it processes information and learns. The new competition is about how organisations can build machines that consume this new information and provide new insights to identify new needs, develop new products and experiences, identify new inefficiencies, and enable new ways to experiment and innovate rapidly.

Organisations that can build unique customer experiences can attract customers whose interactions will provide them with new information. This new information can then provide new insights that help them deliver a still better experience. The result is a feedback loop that can enable sustainable differentiation for the organisation. For example, the more people search on Google, the better its search results become.

At nxtify, we are building capabilities to enable this new feedback loop. We build systems that deliver New Experiences coupled with systems that deliver New Insights.

Computer Mediated Transactions – How can they alter your business?

Here is a nice paper by Hal Varian, who is the chief economist at Google. The paper describes how computer mediated transactions are changing economic transactions. He classifies their effects into four categories:

  1. Facilitates new forms of contract
  2. Facilitates data extraction and analysis
  3. Facilitates controlled experiments
  4. Facilitates personalisation and customisation

He gives some great examples: how monitoring technologies, like the bell in the cash drawer, have led to new forms of contracts; how the existence of airline reservation systems enabled sophisticated differential pricing; how Google ran 6000 experiments in Web Search that led to 450-500 changes, which he links to the notion of “kaizen” or continuous improvement; and how recommender systems are enabling deep personalisation based on past transactions.

Any organisation can use these insights to transform itself. A good way to start is to turn the insights into questions that the organisation can ask about its business. You can formulate your own questions, but here is a set that I have formulated.

1. List all the economic transactions the organisation is involved in. Ideate on how these transactions could be transformed if they were computer mediated. What new kinds of contracts can be formulated?

2. List the data the organisation has in its systems. What insights can be extracted from this data that will enable new offerings, e.g. differential pricing?

3. What ideas are waiting in line? How can controlled experimentation accelerate innovation and continuous improvement?

4. Where can deep personalisation transform economic transactions?

Beyond – Reimagining Conversations

With Beyond, we are attempting to reimagine computer mediated conversations between two people on the internet. It is still under development, and unless somebody else is online and available, you may not be able to use it. But go ahead and try it out here. Let us know what you think. Keep checking back, as it is evolving rapidly.

With Beyond, we are also trying out Meteor, an open source framework for developing real-time web applications. It propagates the changes made by one user to all others who have subscribed to the same data, without the developer writing synchronisation code. So far, we are lovin’ it.