Yann's blog

Montag, 24. März 2014

Neue URL

I'll write now on http://yanns.github.io/

Dienstag, 28. Mai 2013

DevOpsDays Berlin 2013

direct from DevOpsDay Berlin 2013, my notices:

Program
Videos

Day #1

From the presentation from Immobilienscout - Marcel Wolf, Felix Sperling

Dev AM rotation: exchange between Dev and Ops
Self service VM
unique configuration server, accessible to anyone

DevTools team at Etsy - Daniel Schauenberg

Slides
deployinator
Each new employee should deploy on her first day.
LXC container for tests
Statistics with statsd, logster and graphite
Log streamer with supergrep
Open Source projects

Day #2:

How the QA team got Prezi ready for DevOps - Peter Neumark

presentation
Like the error handling: "When blame inevitably arises, the most senior people in the room should repeat this mantra: if a mistake happens, shame on us for making it so easy to make that mistake". Very similar to risk management culture.

Podularity FTW! - Tim Lossen

slides
team = autonomous cell (even technological stack, product...)
The organization is a supercell, bindings autonomous cells together.
Lunch roulette
Interesting question from the audience about the business continuity: if each team can choose its technological stack, is not it a problem then the team change, and when the new members do not know the new stack he is working with? Tim answered that it was indeed a problem the organization though of, but in practice, it never happened.
I like this approach that I could try to summarize like this: do not spend your time trying to avoid problems, but solve real problem that exist.

Island Life: How we built and deployed the Honshū way - Wes Mason

from one monolith app to several islands like components

Other notices:

robot for chat room to post more information
secrets are hard to deploy in a secure way
distributed file system: ceph
more info: Ceph: A scalable, high-performance distributed file system (2006)
doker.io to manage linux containers. The demo was impressive, deploying one version and then another one in a few minutes.
Discussion with gutefrage.net who use Scala / Finagle with Thrift.
Storage of statistics with OpenTSDB
LiveRebel: ZeroTurnaround made a presentation of LiveRebel
LiveRebel contains some versions of the application.
These versions can be uploaded, manually or automatically (maven plugin, command line tool...)

Then LiveRebel can deploy a specific version on production (or staging...)
For this, an agent is running on each server.

LiveRebel can deploy to one server, check it with some configured smoke test.
If the test is successful, the server is activated on the cluster, and the deployment process continues with the next server.

Liquibase is used to deploy a version to a database.

Montag, 20. August 2012

Why having opened a blog, while I regularly post on G+?

I like posting on G+, it fells spontaneous.

But I felt frustrated about the formatting possibilities on G+, especially for code.
I've also opened this blog to experiment a new possibility.
It is possible that I move this blog to another platform if I am not happy with it.

To reduce the gap with G+, I compiled some of my posts:

Sonntag, 12. August 2012

Handling data streams with Play2 and Server-Send Events

Handling data streams
As the version 2 of Play! Framework was published, I was very interested in its new capabilities to handle data streams reactively.
As a technical proof of concept, I wrote a parser that works with chunks of data instead of loading the whole content in memory.
My source was a file containing the geographical coordinates of Wikipedia articles.
(This file is the result of an experience of Triposo, showing how Wikipedia has spread over the planet since the start of the Wikipedia project. Do not forget to watch the other labs from Triposo, they are great!)

Play2 architecture is based on event, and gives us some tools to work with streams of data:

Enumerators produce chunk of data
Enumeratees transform these chunks
Iteratees consumes these chunks

(For more information, you can read:

Is socket.push(bytes) all you need to program Realtime Web apps? from Sadek Drobi, CTO Zenexity.
Zound, a PlayFramework 2 audio streaming experiment using Iteratees from Gaetan Renaudeau
If you understand french, you can read Realtime Web Application, un exemple avec Play2 from Nicolas Martignole)

The production of data is an Enumerator, sending line after line of the input file:

def fileLineStream(file: File) : Enumerator[String] = {
  val source = scala.io.Source.fromFile(file)
  val lines = source.getLines()

  Enumerator.fromCallback[String] (() => {
    val line = if (lines.hasNext) {
      Some(lines.next())
    } else {
      None
    }
    Promise.pure(line)
  }, source.close)
}

With an Enumeratee, each line can be possibly parsed into a Coordinate class:

val lineParser: Enumeratee[String, Option[Coordinate]] = Enumeratee.map[String] { line =>
  line.split("\t") match {
    case Array(_, IsDouble(latitude), IsDouble(longitude)) => Some(Coordinate(latitude, longitude))
    case _ => None
  }
}

The Enumerator can be composed with an Enumeratee with:

fileLineStream(file) &> lineParser

I know, the method's name "&>" can make some of you go away. Please stay! This sign is like the pipe in bash. It is very easy to understand:

fileLineStream(file) &> lineParser

is the same as (removing infix notation)

fileLineStream(file).&>(lineParser)

which is the same as (method alias)

fileLineStream(file).through(lineParser)

Use the last form if the first one is not your taste... :)

With a last Enumeratee to produce JSON, I can send the stream directly to the browser with Server Send Events.

Ok.feed(fileLineStream(file) &> lineParser &> validCoordinate &> asJson ><> EventSource()).as("text/event-stream")

What to notice:

Only chunks of data are in memory. The whole content of the source file is never loaded completely.
Each step of the process is isolated in an Enumertor or Enumeratee, making it very easy to modify, to re-use, to combine in a different way.
The Enumerator is reading a file, but you can imagine it could read data from a web service, of from a database.

Server-Send Events
When we want to send events in "real time" to the browser, what technologies are available?

polling: the browser pools the server every x milliseconds to check if there is a new message. This method is not very efficient, because a lot of requests are necessary to give the illusion to update the application in real time.
long-polling (or Comet): the browser opens a connection to the server (for example in a iframe), and the server keeps the connection opened. When the server wants to push data to the client, it sends this data with the opened connection. The client receives the data, and opens a connection again for further messages. With this method, the browser is always showing that it is waiting for data. This technology does not scale on threaded system, as each opened connection uses a thread. In JEE environment, we need an asynchronous servlet 3.1 not to make the server exploding.
Server-Send Events (SSE) are quite similar to Comet. The main difference is that the browser manages this connection. For example, it opens the connection again if it falls.
WebSockets provide a bi-directional, full-duplex communications channels. It is a different protocol than HTTP.

I choose to use Server-Send Events instead of WebSockets because of the following reasons:

I've already played with WebSockets and wanted to try something new.
WebSockets are great and can communicate in both directions. But this technology is a new protocol, sometimes difficult to integrate in an existing infrastructure (Proxy, Load-Balancer, Firewall...) Server-Send Events, on the other hand, use the HTTP protocol. The PaaS Heroku does not support WebSockets yet, but support SSE. When pushing data from the server to clients is all what you need, SSE can be what is the most appropriate and is well supported (except in IE for the moment)

Server-Send Events API
The Javascript API is very simple:

feed = new EventSource('/stream');

// receive message
feed.addEventListener('message', function(e) {
    var data = JSON.parse(e.data);
    // do something with data
}, false);

Visualizing the results
As the stream is sending coordinates, my first attempt was to display them on a earth in 3D. For this, I used three.js, which was very simple. The first results were promising, but sadly, the browser could not display so much information in 3D. I had to found an alternative.
My second attempt was to display these coordinates on a 2D canvas, and that worked well, although less impressive that a 3D map... :)
You can see the result on Heroku: http://wiki-growth.herokuapp.com/
The code source is available on github: https://github.com/yanns/play2-wiki-growth-sse
You can run it by yourself with Play 2.0.3 and let heroku sleeping.