Posted by: lbroudoux (@lbroudoux) | January 20, 2015

Manage your Elasticsearch rivers with Sluice

As you may have noticed, I am an addicted user of Elasticsearch and have already written some river plugins for indexing different data sources such as Amazon S3 buckets or Google Drive.

As a river plugin developer, I often find myself having to test many combinations of Elasticsearch versions and configurations, and each time I have to reinstall fresh copies of my plugins. Also, as a river user, I have to create many rivers, and it’s often a trial-and-error process to find the correct configuration with regard to content to index, data freshness, index and request performance, mapping issues and so on…

So this led me to the point where I got tired of the long CURL commands needed to configure all this stuff and decided to react ! Today I’m introducing a new Elasticsearch plugin I’ve written to make my life easier (and maybe yours if you’re using rivers too ;-)): Sluice !


So what’s Sluice ? As stated above, Sluice is also an Elasticsearch plugin whose goal is to help you manage your rivers : it simplifies installation of the required plugins and also helps you set up and tune your rivers. The idea with Sluice is that you no longer have CURL commands to type : just install the Sluice plugin and then use its simple user interface.

Sluice is hosted on GitHub and installs as a regular plugin by typing the following command in a shell :

$ bin/plugin --install com.github.lbroudoux.elasticsearch/sluice/0.0.1

Now restart your Elasticsearch instance and point your browser to http://localhost:9200/_plugins/sluice. You should see the following dashboard appear.


You can see here that Sluice checks which of the supported river plugins are installed and offers a simple way to install the missing ones. Just click the Install link and it takes care of retrieving and setting up the Amazon S3 river plugin, for example. You then just need to restart your ES node.

In the dedicated section, you also get the list of all the river instances created in your ES cluster. For now, you can only edit and modify existing rivers, not remove them.


Finally, it offers a convenient way to add a new river. Configuration attributes of the river are grouped together, each with a clear explanation of its meaning and supported format.


Easy, no ? For the moment, supported River plugins are :

  • Amazon S3 River plugin,
  • Google Drive River plugin


Sluice has only a first release named 0.0.1 and it’s far from being feature complete !
The current limitations are :

  • Only works with local development instances (yep ! http://localhost:9200 is hard-coded… so ugly ! :-()
  • No way of removing rivers,
  • No way to start/stop rivers,
  • ES reboot is required after plugin installation

Future plans

Many useful features come to mind (the order of the list does not reflect priority) :

  • Configuration of Elasticsearch cluster endpoint,
  • Ability to remove or duplicate rivers,
  • Support of other river plugins such as the excellent FSRiver or the TwitterRiver,
  • Ability to start/stop or force a refresh of river settings while running,
  • Ability to get the CURL command of a river for recreating it later (useful when tuning has been done in Dev or QA and river creation should be scripted in production),
  • Rivers indexing statistics on dashboard !

Do not hesitate to give me feedback and share your feature ideas for future releases of Sluice !

Posted by: lbroudoux (@lbroudoux) | May 5, 2014

Use Elasticsearch as a data store for your Spring Roo app

It has been a long time since my last post, but rest assured I have not been devoid of thoughts since then (have you seen the blog title ? ;-)). Just a lack of time and energy to write things down…

I resume blogging today with a Spring Roo plugin I finished last week. For those who don’t know Spring Roo : it’s a productivity tool that helps you bootstrap a Spring application within seconds. And although the excitement seems to be more around Spring Boot these days, I find Roo to be a valuable tool in a developer’s toolbox… Anyway, Roo comes with many plugins allowing you to choose your persistence layer and APIs : typically JPA-based or MongoDB-based.

I started this plugin some months ago to provide a persistence layer based on Elasticsearch. The idea is to have your domain objects persisted directly into an Elasticsearch index and, thanks to the conventions of Roo, to quickly get a CRUD service layer and scaffolded web screens generated for you. After a little contribution to Spring Data for Elasticsearch (here), the plugin was on its way and is now hosted here on GitHub.

Twitter example development

The plugin is not yet released to the official Spring Roo repository, so installation is a bit tedious… The project documentation on GitHub explains how to do that, so I won’t delve into this part. Instead, I propose to walk through in more detail the Twitter example that is used to illustrate the plugin commands.

In order to complete this tutorial, you’ll need :

  • Spring Roo with plugin installed (I’ve used 1.2.2.RELEASE),
  • Maven installation (I’ve used 3.0.4),
  • Elasticsearch installation running on ports 9200 and 9300 (I’ve used 1.1.1).

So let’s start with a brand new project. In a new directory, start a Roo shell and create a new project with this command :

project --topLevelPackage

Project initialization

This produces a bunch of configuration files, as shown in the screenshot above. The next thing to do is to activate the Elasticsearch layer plugin for Roo and set it up to use an ES node that is not local to the JVM and is hosted on localhost:9300. You do this with this line :

elasticsearch setup --local false --clusterNodes localhost:9300

Elasticsearch setup

Configuration files are generated for you, dependencies (on spring-data-elasticsearch) are added, and the Spring version is updated to the required one. The following step is to tell Roo you want a Tweet domain object backed by Elasticsearch. This is done through this new variation of the entity command available in Roo :

entity elasticsearch --class ~.domain.Tweet

Domain creation

The Tweet domain Java class is generated, along with its AspectJ ITD. You can now enrich your domain class with fields such as author and content, the latter being limited to 140 characters. This is done with the following basic commands in Roo :

field string --fieldName author
field string --fieldName content --sizeMax 140

Fields addition

Nothing more to say here : the Tweet class is modified. The next step is more interesting : here you ask the plugin to generate a Spring Data repository layer for persisting Tweets into ES. This is done by :

repository elasticsearch --interface ~.repository.TweetRepository --entity ~.domain.Tweet

Repository creation

You can see that a new TweetRepository interface has been generated and that an ITD that triggers an Elasticsearch implementation proxy is also present. Now we have to create a CRUD service layer for our repository, and it’s done simply using this command :

service --interface ~.service.TweetService --entity ~.domain.Tweet

CRUD service creation

The TweetService interface and its implementation are generated so that they use the repository we generated earlier to persist and retrieve Tweet instances. Finally, in order to easily test and check the resulting application, we have to set up a web layer and generate scaffolded screens for our domain objects. This is done by running these 2 commands in sequence :

web mvc setup
web mvc all --package ~.web

Web scaffolding

And a bunch of web resources, controllers and configuration files are now present in our application. Development is done !

Twitter example execution

We now want to execute all of this in order to properly test our app (Yes : Roo offers many ways to unit and integration test your app, but a screen is more expressive, at least for a blog post ;-)).

First, in a terminal, start your Elasticsearch node on localhost. Default command will do the job, you don’t need extra configuration :


Then, from the terminal where you were working with the Roo shell : exit the shell and launch the Tomcat plugin to run your app. This is done with this Maven command :

mvn tomcat:run

After Tomcat has started up, you can open a browser to http://localhost:8080/es. You’ll get this screen, which is the default home page of the application.

Home screen

From there, you can access a page allowing you to create new Tweets with the fields we have added to our domain class.

Tweet creation

Persistence works fine, and you’ll see from the icons that all the services are there for showing, updating, finding and deleting Tweets.

First tweet

Twitter example validation

Then you might tell me : “Ok, ok… Stuff is persisted, but how do you know that it is persisted into the Elasticsearch node ?”. A simple thing to do is to check on ES using the Marvel monitoring solution (I highly recommend installing it if not already done !). So open a new browser tab to http://localhost:9200/_plugin/marvel/ and check the “Cluster Overview” dashboard.

Marvel indices

You can see that a new index called tweets containing 1 document is now present. In order to check its content, you can go to the “Sense” dashboard, which offers an online querying tool for your indices : http://localhost:9200/_plugin/marvel/sense/index.html.

Validation query

Now you can see that our first tweet has really been persisted into Elasticsearch !
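For readers who prefer a script to the Sense console, the same kind of check can be sketched in Python. This is only an illustration : the `search_request` helper is a name I made up, the author value is hypothetical, and the index name `tweets` and the `author` field are the ones used in this tutorial. The sketch only builds the HTTP requests ; nothing is sent.

```python
import json
from urllib.request import Request

def search_request(index, query, es_url="http://localhost:9200"):
    """Build (without sending) a search request against an index."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{es_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Everything stored in the index created by the Roo application.
all_tweets = search_request("tweets", {"match_all": {}})

# Tweets from one author; "lbroudoux" is just an example value.
by_author = search_request("tweets", {"match": {"author": "lbroudoux"}})

print(all_tweets.get_method(), all_tweets.full_url)
```

Passing one of these requests to `urllib.request.urlopen` while the node is running should return the matching documents as JSON.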


So I have shown you how to write a full-blown Spring application that :

  • Persists and retrieves its domain objects in Elasticsearch,
  • Is correctly architected with a repository layer and a service layer,
  • Presents a basic administrative web frontend,

in no more than 9 lines of Roo commands ! Wow !

Far beyond this basic persistence stuff, we are able, as developers, to easily build cool apps using the powerful indexing and querying features of Elasticsearch. Just consider this tutorial as a quick start and think about : full-text search, geo queries, analytics and aggregations on various fields of your Tweets… it is close at hand !

Posted by: lbroudoux (@lbroudoux) | June 13, 2013

Plugin isolation support in Elasticsearch

As I blogged yesterday, I recently discovered a limitation in the Elasticsearch architecture regarding the isolation of plugins. The fact is that every plugin and its libraries are added to the same Java ClassLoader during startup, and thus all the plugins share resources and class definitions.


I encountered this while developing and testing 2 plugins : one for indexing documents stored on Google Drive ; the other for indexing documents stored on Amazon S3. Unfortunately, each one pulls in Apache httpclient through its Maven dependencies : version 4.0.1 is used by the Google SDK and version 4.1 is used by the Amazon SDK.

So when you start Elasticsearch with both, you end up with a beautiful exception as follows :

laurent@ponyo:~/dev/elasticsearch-1.0.0.Beta1-SNAPSHOT$ bin/elasticsearch -f
[2013-06-13 22:25:29,044][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: initializing ...
[2013-06-13 22:25:29,144][INFO ][plugins                  ] [Brother Tode] loaded [river-twitter, river-google-drive, mapper-attachments, river-amazon-s3], sites [head]
[2013-06-13 22:25:31,989][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: initialized
[2013-06-13 22:25:31,989][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: starting ...
[2013-06-13 22:25:32,131][INFO ][transport                ] [Brother Tode] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
[2013-06-13 22:25:35,187][INFO ][cluster.service          ] [Brother Tode] new_master [Brother Tode][LSvX2bRIRCWsQGcqvvvC7Q][inet[/]], reason: zen-disco-join (elected_as_master)
[2013-06-13 22:25:35,233][INFO ][discovery                ] [Brother Tode] elasticsearch/LSvX2bRIRCWsQGcqvvvC7Q
[2013-06-13 22:25:35,304][INFO ][http                     ] [Brother Tode] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
[2013-06-13 22:25:35,305][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: started
[2013-06-13 22:25:36,339][INFO ][gateway                  ] [Brother Tode] recovered [3] indices into cluster_state
[2013-06-13 22:25:38,429][WARN ][river                    ] [Brother Tode] failed to create river [amazon-s3][s3docs]
org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
  at com.github.lbroudoux.elasticsearch.river.s3.river.S3River.<init>(Unknown Source)
  while locating com.github.lbroudoux.elasticsearch.river.s3.river.S3River
  while locating org.elasticsearch.river.River

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(
	at org.elasticsearch.river.RiversService.createRiver(
	at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(
	at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
	at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
	at com.amazonaws.http.ConnectionManagerFactory.createThreadSafeClientConnManager(
	at com.amazonaws.http.HttpClientFactory.createHttpClient(
	at com.amazonaws.http.AmazonHttpClient.<init>(
	at com.amazonaws.AmazonWebServiceClient.<init>(
	at com.github.lbroudoux.elasticsearch.river.s3.connector.S3Connector.connectUserBucket(
	at com.github.lbroudoux.elasticsearch.river.s3.river.S3River.<init>(
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
	at java.lang.reflect.Constructor.newInstance(
	at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(
	at org.elasticsearch.common.inject.ConstructorInjector.construct(
	at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(
	at org.elasticsearch.common.inject.FactoryProxy.get(
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(
	at org.elasticsearch.common.inject.Scopes$1$1.get(
	at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(
	at org.elasticsearch.common.inject.InjectorBuilder$
	at org.elasticsearch.common.inject.InjectorBuilder$
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(
	at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(
	... 10 more
[2013-06-13 22:25:38,489][INFO ][] Establishing connection to Google Drive
^C[2013-06-13 22:25:39,214][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: stopping ...
[2013-06-13 22:25:39,502][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: stopped
[2013-06-13 22:25:39,502][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: closing ...

What happens here ? Both plugins are loaded and the Google Drive river seems to be loaded first. As you can see here, its libraries are added to the ClassLoader first. So the 4.0.1 definition of org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager comes first and is the one later resolved by every class referencing it. During its init phase, the Amazon plugin tries to use this class but needs the 4.1 definition, which holds the new no-argument constructor (the <init>()V method reported in the error) !
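To make the failure mode concrete, here is a toy model in Python of a single shared class path where the first definition of a class name wins. It is not Elasticsearch code : the `load_shared` helper and the way constructors are listed are illustrative assumptions, simplified to whether the no-argument constructor exists or not.

```python
# Toy model of one shared class path: the first plugin that defines a class wins.
# Not Elasticsearch code; the data below is a simplified illustration.

CONN_MANAGER = "org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager"

# Plugins in load order, each with the httpclient class it ships.
# "ctors" only models whether the no-argument constructor "()" is available.
PLUGIN_LIBS = [
    ("river-google-drive", {CONN_MANAGER: {"version": "4.0.1", "ctors": {"(params, registry)"}}}),
    ("river-amazon-s3", {CONN_MANAGER: {"version": "4.1", "ctors": {"()", "(params, registry)"}}}),
]

def load_shared(plugin_libs):
    """Merge all plugin classes into one namespace; the first definition is kept."""
    shared = {}
    for _plugin, classes in plugin_libs:
        for name, definition in classes.items():
            shared.setdefault(name, definition)
    return shared

shared = load_shared(PLUGIN_LIBS)
resolved = shared[CONN_MANAGER]

print(resolved["version"])        # 4.0.1
print("()" in resolved["ctors"])  # False: the S3 river cannot find its constructor
```

Swapping the load order in `PLUGIN_LIBS` makes the 4.1 definition win instead, which is why this kind of conflict depends on which plugin happens to be loaded first.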


As an enhancement proposal, I’ve forked the Elasticsearch repository here and reworked the classloading scheme of plugins. You now have the possibility to force the loading of plugins into dedicated and isolated classloaders that try to resolve requested classes using the plugin libraries first and then the main classloader.

Although I’ve tested with some other plugins (twitter, head, attachment, fsriver) and seen no regression, I thought it would be safer to add a feature toggle to activate this. Plugin isolation is thus only done if the plugin.isolate settings flag is set to true (either from the YAML configuration file or from the command line).
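The lookup order described above (plugin libraries first, then the main classloader) can be sketched with another toy model in Python. Again, this is not the code of the fork : `IsolatedLoader` and `isolation_enabled` are hypothetical names, and the class maps are stand-ins for real jars. Only the `plugin.isolate` setting name comes from the proposal itself.

```python
# Toy model of per-plugin isolation: look in the plugin's own libraries first,
# then fall back to the main class loader. Illustrative only.

class IsolatedLoader:
    def __init__(self, own_classes, parent_classes):
        self.own_classes = own_classes        # classes shipped in the plugin's jars
        self.parent_classes = parent_classes  # classes visible to the main loader

    def resolve(self, name):
        if name in self.own_classes:          # 1. plugin libraries first
            return self.own_classes[name]
        if name in self.parent_classes:       # 2. then the main class loader
            return self.parent_classes[name]
        raise LookupError(f"class not found: {name}")

def isolation_enabled(settings):
    """Feature toggle: isolation is off unless plugin.isolate is set to true."""
    return str(settings.get("plugin.isolate", "false")).lower() == "true"

HTTP = "org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager"
main = {"org.elasticsearch.river.River": "elasticsearch core"}

google = IsolatedLoader({HTTP: "httpclient 4.0.1"}, main)
amazon = IsolatedLoader({HTTP: "httpclient 4.1"}, main)

print(google.resolve(HTTP))   # httpclient 4.0.1
print(amazon.resolve(HTTP))   # httpclient 4.1
print(amazon.resolve("org.elasticsearch.river.River"))  # elasticsearch core
```

Each plugin now sees its own httpclient version, while classes it does not ship (such as the river API) still come from the main classloader.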

The result is shown below : when started with the -Des.plugin.isolate=true property, dedicated classloaders are used, making the use of conflicting plugins a breeze :

laurent@ponyo:~/dev/elasticsearch-1.0.0.Beta1-SNAPSHOT$ bin/elasticsearch -f -Des.plugin.isolate=true
[2013-06-13 22:39:59,905][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: initializing ...
[2013-06-13 22:39:59,908][INFO ][plugins                  ] [Commando] Plugin isolation set to true, loading each plugin in a dedicated ClassLoader
[2013-06-13 22:39:59,948][INFO ][plugins                  ] [Commando] loaded [river-twitter, mapper-attachments, google-drive-river, amazon-s3-river], sites [head]
[2013-06-13 22:40:02,801][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: initialized
[2013-06-13 22:40:02,801][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: starting ...
[2013-06-13 22:40:02,941][INFO ][transport                ] [Commando] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
[2013-06-13 22:40:05,990][INFO ][cluster.service          ] [Commando] new_master [Commando][2Xp9SsHsQ_SmFqiDGZUhzg][inet[/]], reason: zen-disco-join (elected_as_master)
[2013-06-13 22:40:06,037][INFO ][discovery                ] [Commando] elasticsearch/2Xp9SsHsQ_SmFqiDGZUhzg
[2013-06-13 22:40:06,097][INFO ][http                     ] [Commando] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
[2013-06-13 22:40:06,098][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: started
[2013-06-13 22:40:07,274][INFO ][gateway                  ] [Commando] recovered [3] indices into cluster_state
[2013-06-13 22:40:11,166][INFO ][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Starting amazon s3 river scanning
[2013-06-13 22:40:11,190][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] lastScanTimeField: 1371154754606
[2013-06-13 22:40:11,190][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Starting scanning of bucket famillebroudoux since 1371154754606
[2013-06-13 22:40:11,985][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Amazon S3 river is going to sleep for 36000 ms
[2013-06-13 22:40:12,182][INFO ][] Connection established.
[2013-06-13 22:40:12,182][INFO ][] Retrieving scanned subfolders under folder Travail, this may take a while...

I am in the process of suggesting this enhancement to Elasticsearch through a pull request. What is your opinion on it ? Will it be useful ? As usual, do not hesitate to send me your comments.

Posted by: lbroudoux (@lbroudoux) | June 13, 2013

Indexing your Amazon S3 bucket with Elasticsearch

I continue my Elasticsearch journey with a new plugin release today…

So, your company uses Amazon S3 as a storage backend for internal documentation ? Or you’re running a Web application where users can upload and share files and content backed by S3 ? Now you want/have/need to have the whole stuff indexed and searchable using a “mind-blowing search engine” (say Elasticsearch ;-)) ? Well, the solution might be the Amazon S3 River plugin for ES released today.

Main features

So what does this plugin do ? Here are the features for this first release :

  • Connect to your S3 bucket using AWS Credentials,
  • Scan only changes from last scan for better efficiency,
  • Filter documents based on folder path (no restriction on the depth level, you can use such path as Work/Archives/2012/Project1/docs/),
  • Filter documents to include using wildcard expressions, such as *.doc or *.pdf,
  • Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are computed first),
  • Indexes document content and document metadata (because it is based on the Attachment plugin),
  • Support MS Office, OpenOffice, Google documents and many other formats (full list here),
  • Support scan frequency configuration,
  • Support bulk indexing for optimization
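To illustrate how the folder, include and exclude filters combine, here is a minimal Python sketch. It is not the plugin’s code (the plugin is written in Java) : the `should_index` function and its parameters are hypothetical, and treating an empty include list as “index everything” is my assumption. What it does take from the list above is that exclusions are evaluated before inclusions.

```python
from fnmatch import fnmatchcase

def should_index(key, folder="", includes=(), excludes=()):
    """Decide whether an object key such as 'Work/docs/a.pdf' gets indexed."""
    if folder and not key.startswith(folder):
        return False                   # outside the configured folder path
    name = key.rsplit("/", 1)[-1]      # patterns are matched on the file name
    if any(fnmatchcase(name, pattern) for pattern in excludes):
        return False                   # exclusions are computed first
    if not includes:
        return True                    # assumption: no include list means everything
    return any(fnmatchcase(name, pattern) for pattern in includes)

folder = "Work/Archives/2012/Project1/docs/"
includes = ["*.doc", "*.pdf"]
excludes = ["*.avi", "*.zip", "draft*"]

print(should_index(folder + "spec.pdf", folder, includes, excludes))      # True
print(should_index(folder + "draft-v1.pdf", folder, includes, excludes))  # False
print(should_index("Other/spec.pdf", folder, includes, excludes))         # False
```

The second call shows why the order matters : the file matches an include pattern, but the exclusion is checked first and wins.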

The project

The project is naturally hosted on GitHub here. The plugin is installable as a standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be available on the project front page.


As a disclaimer : when developing this plugin, I discovered an Elasticsearch limitation : loaded plugins are not isolated from each other and share the same resources (this is because plugin libraries are added to the main ClassLoader, as you can see here). As a consequence, using this new plugin in conjunction with the previously released Google Drive River plugin is not possible (the Amazon and Google libraries use conflicting versions of Apache http-client). I’ll tackle this subject in the coming days if I find enough time.

As usual, do not hesitate to give me your feedback through comments on this post, issues on the GitHub project or tweets (@lbroudoux) !

Posted by: lbroudoux (@lbroudoux) | May 15, 2013

A river plugin for Elasticsearch that indexes Google Drive

Hi there,

I blogged some weeks ago about a test run I did with Elasticsearch and Kibana3 (now just Kibana, the ‘3’ has been dropped since ;-)). And the fact is that it was so much fun and so pleasant to work with them that I wanted to go further and start digging into Elasticsearch.

After a few days scratching my head and looking around the plugin ecosystem of ES, I got the idea of writing a Google Drive river to actually learn from the trenches. So I am happy to announce the 1st release of this Elasticsearch plugin that allows you to index the content of a Google Drive with ES !

Main features

So what does this plugin do ? Here are the features for this first release :

  • Connect to Google Drive in ‘offline’ mode (no need to be connected to your Google account, just to authorize the plugin to do so) using OAuth 2,
  • Scan only changes from last scan for better efficiency,
  • Filter documents based on folder path (only 1 level for the moment),
  • Filter documents to include using wildcard expressions, such as *.doc or *.pdf,
  • Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are computed first),
  • Indexes document content and document metadata (because it is based on the Attachment plugin),
  • Support MS Office, OpenOffice, Google documents and many other formats (full list here),
  • Support scan frequency configuration,
  • Support bulk indexing for optimization
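Two of the features above, scanning only what changed since the last scan and indexing in bulk, can be sketched in a few lines of Python. This is a simplified illustration and not the plugin’s implementation : the function names, the `(name, modified_ms)` tuples and the idea of comparing modification timestamps are assumptions made for the example.

```python
def changed_since(files, last_scan_time):
    """Keep only the files modified after the previous scan.

    files: iterable of (name, modified_ms) tuples.
    """
    return [name for name, modified in files if modified > last_scan_time]

def batches(items, size):
    """Split items into lists of at most `size` elements (one bulk request each)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def scan(files, last_scan_time, now, bulk_size=2):
    """Return the bulk batches to index and the new last-scan timestamp."""
    to_index = changed_since(files, last_scan_time)
    return batches(to_index, bulk_size), now

files = [("a.doc", 100), ("b.pdf", 250), ("c.doc", 300), ("d.pdf", 350)]
bulk_batches, new_last_scan = scan(files, last_scan_time=200, now=400)

print(bulk_batches)    # [['b.pdf', 'c.doc'], ['d.pdf']]
print(new_last_scan)   # 400
```

Storing `new_last_scan` after each run is what lets the next scan skip everything that has not changed in the meantime.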

The project

The project is naturally hosted on GitHub here. The plugin is installable as a standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be available on the project front page.

Some features are still missing and some may be improved, but the basics should work well and fast. Want to give it a try ? Or help with some ideas, tests or contributions ? Do not hesitate to give me your feedback. I’ll keep on digging and investigating Elasticsearch in the coming weeks, months… who knows !?

Posted by: lbroudoux (@lbroudoux) | May 5, 2013

The cabin is finished !

At last !

This long weekend has borne fruit and I can finally announce : “That’s it : the cabin is finished !”. So for all those who have been hearing about it for months, here is the result :


I also wanted to take this opportunity to thank everyone who contributed ideas, a helping hand or a loan of tools to this project. In chronological order of contributions, thanks to :

  • Yann, for his rotary hammer drill,
  • Vincent, for his expert handling of the socket wrench and for his arms,
  • Nicolas, for his long arms (useful for the high beams !),
  • Christophe, for his screws and his 13 mm drill bit,
  • Jean-Luc, for fitting the terrace railings,
  • My darling, for playing the “safety committee” throughout the whole build !

And above all, a big thank you to Alain, who was there throughout the whole build with informed opinions, advice and many helping hands. I even think that what started as my project finally became ours, the 2 of us ! Thanks again.

A short guided tour

A few viewpoints to give you a more precise idea of the result…


The terrace in front of the decorated door :


Even the Twitter bird has joined in !


The children started on the interior furnishing as soon as the last nail was driven in :


A bit of history…

For those who have followed the story from the beginning, a few flashbacks in photos below.

The ideas, thinking and studies took place throughout the 1st half of 2012 (I had received a beautiful book about cabins for Christmas ;-)), but I have no photos of me scratching my head…

The first load-bearing beams were put up in early August 2012 :


Then the holidays came, and in early September the platform had barely taken shape, but there was a ladder ! :


The beginnings of the house frame were there on September 23 :


On September 30, the whole framework and roof structure were assembled :


The cladding was finally done during the long (and not very productive ;-) ) winter months… Here is the state of the cabin at the end of November.



Spring then returned just in time to give the final touch of motivation needed to finish the build. All that remains now is to enjoy it : we are still waiting for temperatures to rise a little before the 1st night perched up there !

Posted by: lbroudoux (@lbroudoux) | April 30, 2013

Real time analytics with Elasticsearch and Kibana3

Last month, I attended a great talk at the French edition of Devoxx on “Migrating an application from SQL to NoSQL”. The talk title was pretty well chosen, but it was mainly a presentation of the features of 2 products : Couchbase and Elasticsearch.

Beyond the relevance of the speakers and the products, an Elasticsearch extension called Kibana3 was briefly introduced and, although marked as an alpha release, it totally astonished me ! Kibana3 is an extension designed for real-time analytics of data stored in Elasticsearch. It allows full customization of dashboards and is so easy to use that it can almost be put into the hands of business people…

Some weeks later I found some time for a test run, and although things went well, I thought it would be useful to write a kind of “How to” or “Quickstart” for Kibana3. Here it is.

The setup

Install and run Elasticsearch

Download Elasticsearch (as I rechecked everything while writing this post, I chose the 0.90.0 release, which wasn’t out when I first tested this… so everything should also run fine on the 0.20.6 release I picked previously). Just extract the archive into a target directory and simply run the following :

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/elasticsearch -f
[2013-04-30 00:13:14,312][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: initializing ...
[2013-04-30 00:13:14,321][INFO ][plugins                  ] [Dominic Fortune] loaded [], sites []
[2013-04-30 00:13:17,045][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: initialized
[2013-04-30 00:13:17,046][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: starting ...
[2013-04-30 00:13:17,225][INFO ][transport                ] [Dominic Fortune] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/]}
[2013-04-30 00:13:20,306][INFO ][cluster.service          ] [Dominic Fortune] new_master [Dominic Fortune][evQbXTeASNmADq4h-Q847A][inet[/]], reason: zen-disco-join (elected_as_master)
[2013-04-30 00:13:20,353][INFO ][discovery                ] [Dominic Fortune] elasticsearch/evQbXTeASNmADq4h-Q847A
[2013-04-30 00:13:20,376][INFO ][http                     ] [Dominic Fortune] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/]}
[2013-04-30 00:13:20,376][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: started
[2013-04-30 00:13:20,489][INFO ][gateway                  ] [Dominic Fortune] recovered [0] indices into cluster_state

Congratulations ! You are now running an Elasticsearch cluster with one node ! That is basically everything you need for a basic setup, because every interaction with the node, from administration to the client APIs, is done through REST APIs over HTTP. That means a simple CURL command does the job.
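Since everything goes through HTTP, any HTTP client works just as well as CURL. As a small illustration, here is how such calls could be prepared in Python with the standard library only. The `es_request` helper is a name I made up for this sketch ; the address is the default HTTP port 9200 shown in the log above, and the sketch builds the requests without sending them.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # default HTTP address of the local node

def es_request(method, path, body=None):
    """Build (without sending) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    url = f"{ES_URL}/{path.lstrip('/')}"
    return Request(url, data=data, headers=headers, method=method)

# The node's welcome document and the cluster health endpoint.
root = es_request("GET", "/")
health = es_request("GET", "/_cluster/health")

print(root.get_method(), root.full_url)
print(health.get_method(), health.full_url)
```

With the node running, passing `root` to `urllib.request.urlopen` and reading the response should give back the same JSON you would get from CURL.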

Anyway, before going further, we’d like to add an administration console to our cluster (because having some GUI doesn’t hurt after all) and we need to feed our node with data. For that, we are going to install 2 plugins :

  • elasticsearch-head, a web administration frontend,
  • elasticsearch-river-twitter, a river that feeds the node with tweets.

Plugins are simply installed using the bin/plugin command as follows.

For elasticsearch-head :

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/plugin -install mobz/elasticsearch-head
-> Installing mobz/elasticsearch-head...
Trying (assuming site plugin)
Downloading ............DONE
Identified as a _site plugin, moving to _site structure ...
Installed head

For elasticsearch-river-twitter :

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.2.0
-> Installing elasticsearch/elasticsearch-river-twitter/1.2.0...
Downloading ...............................................................................................................................................................................................................................................DONE
Installed river-twitter

Now just restart your node by killing the running elasticsearch process and launching another one, then point your browser to http://localhost:9200/_plugin/head/ ; you should now have access to the web frontend.

Install and run Kibana3

As said in the introduction, Kibana3 is an Elasticsearch plugin hosted by the Elasticsearch project itself and dedicated to analytics : it provides the means to dynamically build any dashboard on top of an ES index (the data store). The best way to retrieve the product is to clone the GitHub repository like this :

laurent@ponyo:~/dev/github$ git clone
Cloning into 'kibana3'...
remote: Counting objects: 2148, done.
remote: Compressing objects: 100% (892/892), done.
remote: Total 2148 (delta 1305), reused 2060 (delta 1226)
Receiving objects: 100% (2148/2148), 11.47 MiB | 273 KiB/s, done.
Resolving deltas: 100% (1305/1305), done.

As the Kibana3 documentation states, it’s ‘just’ a bunch of static HTML and JavaScript resources that can be put on any reachable web server. For testing convenience, Kibana3 embeds a little Node.js server that can be run if you’re lazy like me :

laurent@ponyo:~/dev/github/kibana3$ node scripts/server.js 
Http Server running at http://localhost:8000/

You can now open http://localhost:8000/index.html in your web browser and should see a default dashboard appear with a bunch of red panels announcing errors… We’re going to fix that in the next section.

The dashboard creation

Before starting to acutally create a dashboard, we need data ! Remember, we have installed the Twitter river plugin : we are going to connect Twitter public stream to retrieve such data. In order to complete following step, you need a valid Twitter account.

The following command creates a Twitter connection that tracks some trendy keywords ;-) Just substitute the placeholders with your Twitter account name and password and you’re done.

laurent@ponyo:~$ curl -X PUT 'localhost:9200/_river/twitter-river/_meta' -d '{ "type" : "twitter", "twitter" : { "user" : "<twitter_user>", "password" : "<twitter_password>", "filter" : { "tracks" : "java,nosql,node.js,elasticsearch,eclipse,couchdb,hadoop,mongodb" } }, "index" : { "index" : "tweets", "type" : "status", "bulk_size" : 5 } }'

By browsing to http://localhost:9200/_plugin/head/, you should see the number of documents in the “tweets” index grow fast.

Let’s now go back to the default Kibana3 dashboard in your web browser. We are going to change some parameters to make it a decent dashboard. The first thing to change is the “Timepicker” widget, which is used to define the data store the dashboard is based on.

14-may update

For the lazy ones (;-)) who only want to see the result without building the dashboard, I’ve posted the JSON export here as a Gist : it’s easily importable into Kibana.

Edit this widget’s settings and change the time field as follows :


and then the index patterns as follows :


You should already have a decent dashboard as below (I’ve also changed the dashboard title and the time resolution to see many green bars on the histogram).


You can experiment with “Zoom In” and “Zoom Out” on the histogram and see their effect on the timepicker widget. You can also draw a rectangular zone on the histogram in order to zoom into that time period. Typing keywords into the Query input field also dynamically updates the matched records and the histogram.

When moving down the page, you see a table widget that still has errors. Its goal is to display excerpts of the records found. Edit this widget’s parameters as follows to configure it to correctly display your tweets :


You see that we reference here the different fields found in a Twitter message coming from the public stream (information on available fields can be found through the Head web frontend when browsing indexes and looking at stored documents).

Note that we can also modify the layout of widgets by editing row parameters. For example, we’re switching the table and fields widgets to suit our preferences. The fields widget is indeed very convenient for adding new fields to the table view. The screenshot below shows the result obtained after such a switch.


The last thing I’ll show you here is the addition of a new Kibana3 widget to your dashboard. We are now going to display a map showing the location of our Twitter users in the “Events” row. Open this row’s settings editor and select “map” in the new panel dropdown list. Then you’ll have to tell which field is used to get this information ; in the case of tweets the field is “place.country_code”. The setting is shown below :


Don’t forget to click on the “Create Panel” button before closing the editor ! The map now displays on your row. Finally, after having evenly distributed the widgets on the row, you may reach the following result :


The map widget is also clickable and can be used to drill down into the data previously selected using the query filter and/or the timepicker filter. Quite impressive !


If my demonstration succeeded, you have seen that using Kibana3 is quite easy once you understand the basic customization steps. Kibana3 looks like a very promising tool in this new area of big data, data scientists and data miners that has emerged over the last few years.

Some features might still be missing (like a complete integration with Elasticsearch index or document type catalogs, security around data consultation or dashboard sharing, etc…) to allow deployment in the enterprise world. However the premises are already there with the ability to store Kibana3 dashboards into Elasticsearch itself and the recent posts on how to secure an Elasticsearch cluster (see for French readers).

I think that Kibana3 being hosted under the Elasticsearch umbrella may be a guarantee of seeing this extension developed and enhanced in the near future. In my humble opinion, this can be a big asset on Elasticsearch’s business card.

Posted by: lbroudoux (@lbroudoux) | April 21, 2013

Launching Acceleo generation from Maven – take 3

Last year, I wrote a post on Launching Acceleo generation from Maven. This was actually a second post on this topic, introducing the build of projects made of multiple Acceleo generators.

Things ran well while writing that post, but after some weeks and some tests by colleagues, I realized that Acceleo had some limitations that made this build setup hard to port. To summarize : when it comes to referencing modules coming from other projects, Acceleo uses several forms of paths : relative paths when built dynamically by the IDE, platform:/ paths when exported as a plugin and absolute jar paths when built via Maven (our case).

The limitations

To illustrate, here is an excerpt of the entityFile.emtl module you may find in my sample project. The reference to the location of my own local Maven repository is what makes it hard to port !

 <extends href="jar:file:/home/laurent/dev/repository/com/github/lbroudoux/acceleo/!/com/github/lbroudoux/acceleo/uml/java/files/classFile.emtl#/0"/>
 <imports href="jar:file:/home/laurent/dev/repository/com/github/lbroudoux/acceleo/!/com/github/lbroudoux/acceleo/uml/java/files/commonFile.emtl#/0"/>

When it came to deployment onto our projects (in my company, for my day job), these limitations did not really bother us because the development and CI machine setups were standardized and we were sure that every local Maven repository had the same location. I finally put this problem behind me and forgot about it …

… until Dave’s comments !

Last week, Dave commented on this blog post (see his comments), reminding me that this issue was left unsolved but still deserves some interest … While Dave is pursuing a pure Java solution, I’m showing in this new post a pure Maven workaround, so let’s go.

A Maven workaround

The principle of this workaround is the following : since referencing other jar archives in the EMTL files makes the build non-portable, stop using multiple jar archives and use only one uber jar in which referenced paths are relative !

I know that this sounds weird as Maven promotes fine-grained and atomic artifacts with transitive dependency resolution and so on … but it also opens the way to other forms of artifacts when running/deploying into a constrained environment, through the notion of assembly. That is exactly our situation : we’ve got a constrained running environment so we’re going to use an assembly.

The explanation takes place in 3 steps.

Replacing references into EMTL files

The first step is to deal with the referenced jar paths placed into EMTL files by the Acceleo compiler. The goal is to replace them with relative paths. For this, we can use the Replacer plugin in the build of the Acceleo module referencing other modules.

In my sample project, this means modifying the pom.xml file of the module as follows :


This configuration basically tells Maven to activate the plugin on the prepare-package phase and to process every emtl file, replacing the given regular expression denoting an absolute jar path with the given relative path.

On the excerpt we took above, this leads to the following result :

 <extends href="../../../../../../../../com/github/lbroudoux/acceleo/uml/java/files/classFile.emtl#/0"/>
 <imports href="../../../../../../../../com/github/lbroudoux/acceleo/uml/java/files/commonFile.emtl#/0"/>

Please, be sure to note 2 important things :

  • the value given for replacement depends on the package you use for the files of the current Acceleo module ( in this case),
  • the value is the same for every EMTL file because the sample project follows the Acceleo best practice in terms of package naming : each package containing a generator is at the same depth from the root (not following that best practice makes this workaround non-applicable as is : the configuration of the replacer might be trickier !)
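
To make the substitution concrete, here is a minimal sketch in plain Java of what the replacement step does to one line of an EMTL file. It is not the Replacer plugin configuration : the regular expression, the class name EmtlPathRewriter and the sample jar path are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmtlPathRewriter {

    // Assumed pattern: everything from "jar:file:" up to the "!/" separator
    // that ends the archive part of the URL.
    private static final Pattern ABSOLUTE_JAR_PREFIX = Pattern.compile("jar:file:[^!]*!/");

    // Builds "../" repeated once per package level of the current module.
    static String relativePrefix(int packageDepth) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < packageDepth; i++) {
            prefix.append("../");
        }
        return prefix.toString();
    }

    // Replaces every absolute jar prefix found in the line with the relative prefix.
    static String rewrite(String emtlLine, int packageDepth) {
        String replacement = Matcher.quoteReplacement(relativePrefix(packageDepth));
        return ABSOLUTE_JAR_PREFIX.matcher(emtlLine).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Hypothetical jar location and package, three levels deep.
        String line = "<extends href=\"jar:file:/tmp/repo/generators.jar!/com/example/files/classFile.emtl#/0\"/>";
        System.out.println(rewrite(line, 3));
    }
}
```

The packageDepth argument plays the role of the fixed replacement value : it only works as a single constant because every generator package sits at the same depth from the root.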

Creating a flattened uber assembly

Next step is now to create an archive that will contain :

  • the EMTL and class files of our current Acceleo module (the one reworked during step 1),
  • the EMTL and class files of the generators we depend on (they come from Maven dependencies)

All these resources should be flattened : put together into a single package hierarchy, within a single jar file that is still usable as a library.

In order to do that, we start by declaring a configuration for the Maven assembly plugin in the pom.xml of the Acceleo module referencing other modules (check sample project) :


This configuration tells Maven to activate the assembly during the package phase (so after prepare-package) and to refer to the descriptor present in the assembly.xml file. This is a new file that you just have to create in the project root folder. Its content is the following :

<assembly xsi:schemaLocation="">

The important part here is to specify that our assembly will use an uber qualifier/classifier for its result archive and that both the module’s own artifact and its dependency artifacts should be specified in the inclusions.

From now on, when running mvn install on this Acceleo module, Maven should produce 2 artifacts : the main one that we already had and the new uber one holding all the EMTL resources, flattened and with relative paths. That new artifact is attached as a secondary artifact to your build process.

Using this new archive for generation

The last step is to modify the application that integrates Acceleo generators during its own build process : we should now tell it to use the uber jar we produced in the previous step. This modification is simply done by editing the pom.xml of your application and adding a classifier.

When looking at my sample application file, it represents a single new line highlighted below :

  <!-- Configuration details here -->

The uber jar with relative references is now used and should make your build portable ! You can check how this workaround applies to my sample project by looking at the GitHub commit.

As always, feedback and comments are greatly appreciated !

Posted by: lbroudoux (@lbroudoux) | February 12, 2013

Generating SOA contracts with Obeo SOADesigner

I have spent the last few weeks evaluating SOADesigner as a complementary solution to the traditional Enterprise Architecture Management suite we are using at my day job. One of our goals when deciding to use this suite was to minimize the gap between architecture analysis and realization by generating and managing SOA assets such as WSDL and XSD artifacts. Obviously we did not succeed, so we evaluated another way to get the job done…

SOADesigner is based on Eclipse tooling and involves many Eclipse Modeling technologies. It provides a bunch of EMF metamodels related to information system management in general and SOA in particular, so that models built on top of them can be used by tools like Acceleo to generate text artifacts.

The purpose of this blog post is to introduce the Acceleo generators I have written for producing WSDL and XSD artifacts from SOADesigner models. The generators – still a work in progress – have been open sourced and put on GitHub. You can find them here and I’ll explain later how to use them.

As an introduction and to set the scene, here are some screenshots of the kind of diagrams and concepts you may work with in SOADesigner.

Exchange model design

This first one covers the design of the exchange model that will be used for service interface specification. The elements of such a model are called DTOs (for Data Transfer Object) and may be initialized from Entity elements. DTOs are organized into Categories – a Category is roughly the same notion as a package – within a DTO Registry.


Service model design

This second diagram deals with the specification of Services within a Component. A Service may hold many operations through its interface, each of which can be detailed in terms of input and output specifications. You see here that we’re quite close to the SOA / WebServices terminology, apart from the missing fault specification (but there’s a feature request on its way ;-)).


Generators specifications

The design specifications of the generators are the following :

  • generate 1 XSD artifact per Category or sub-Category holding DTOs,
  • use the parent system name, category name and version to produce distinct file name,
  • generate 1 WSDL artifact per Service holding Operations,
  • make the WSDL artifact hold only the service related datatypes and reference reusable one from XSD,
  • use the service name and version to produce distinct file name

As an example, on the model that is embedded into the test modules of the Git repository, we get the following results in terms of generated artifacts :


Generators features and usage

The currently supported features of the generators are as follows :

  • usage of descriptions put into models to annotate artifacts with documentation,
  • usage of multiplicity information to generate the corresponding XSD occurrence specifications,
  • correct import of an XSD within another XSD or a WSDL,
  • correct usage of different namespaces during inclusions and reuse,
  • support of inheritance between DTOs,
  • support of composition and references between DTOs

If you would like to give them a try, you’ll have for now to git clone the repository (I have not yet released them as a plugin) and import the plugins/com.github.lbroudoux.acceleo.soa.contracts project into your Eclipse workspace. Then you’ll have to create a new Acceleo launcher referencing a fresh model and the com.github.lbroudoux.acceleo.soa.contracts.main.GenerateAll class as the Acceleo generator class.

Obviously, this assumes you have previously installed SOADesigner as mentioned here onto an Eclipse setup, so that you have the designers but also a complete Acceleo environment set up.

As always, feedback and comments are greatly appreciated !

Posted by: lbroudoux (@lbroudoux) | December 19, 2012

Some thoughts on testing Acceleo Java generators

I’ve tried over the last few weeks to find the best way to write automated tests for my Acceleo generators and have come to some thoughts and findings that may be interesting.

The first resource I found on the subject was this Tumblr post from Stephane Begaudeau about how to unit test Acceleo templates and queries. Even though Stephane’s work was close to complete, there was still work to be done on the Acceleo API.

Also I realized that my needs were much closer to integration testing : follow a “black box” approach and have the ability to automatically launch non regression tests on a blueprint model when my generators change. Thanks to my previous findings on launching Acceleo from Maven, I would have the base infrastructure to do this.

All the code samples of this post and more details can be found on GitHub in the repository, in the folder.

3 levels of integration testing

So the use-case I’ll try to cover is the following : I have Acceleo generators that produce a bunch of Java and Xml files from a model and I’m going to write tests to check that the produced files are correct. I quickly realized that these checks may be done at 3 different levels.

Byte code level

The first level is the byte code one and is available for the Java part of my use-case (you can of course extend this to any compiled language). This first level is interesting because it can be quickly achieved : just use the compiler and the reflection APIs provided by the JDK to check that methods are generated with the correct parameters, that fields are present and so on …

However, this method has some limitations :

  • the code you produce has to compile ! This may seem odd but in large projects you may have many generators producing only parts of the whole puzzle and it may be difficult to make every generated unit compile without pulling a lot of dependencies. Also sometimes you may want your generated code not to compile in order to force the developer to write something clever ;-)
  • compilation is a destructive process ! Some elements found in the sources disappear when transformed into byte code … How do you retrieve parameter names, check javadoc presence or ensure that annotations are all there ?
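
The reflection half of this byte code level can be sketched as below. This is only an illustration, assuming the generated sources have already been compiled and loaded : the nested GeneratedStuff class is a hypothetical stand-in for generator output, and the helper names are mine.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class ByteCodeLevelCheck {

    // Hypothetical stand-in for a class produced by a generator and then compiled.
    static class GeneratedStuff {
        public String foo;
        public int bar;

        public void rename(String newName) {
            this.foo = newName;
        }
    }

    // True when the class declares a public field with the given name.
    static boolean hasPublicField(Class<?> type, String name) {
        try {
            Field field = type.getDeclaredField(name);
            return Modifier.isPublic(field.getModifiers());
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // True when the class declares a method with this name and these parameter types.
    static boolean hasMethod(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            type.getDeclaredMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicField(GeneratedStuff.class, "foo"));
        System.out.println(hasMethod(GeneratedStuff.class, "rename", String.class));
        System.out.println(hasMethod(GeneratedStuff.class, "rename", int.class));
    }
}
```

Note that the checks can only look at parameter types : nothing here lets you assert on the Javadoc of rename, which is exactly the second limitation listed above.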

Syntax tree level

The second level answers these limitations by providing a representation of the tree of directives found in the source files. This is generally called an “Abstract Syntax Tree” (AST) that may be produced and visited using APIs. The most famous one is the DOM API that represents the AST of an Xml document, but analogous tools exist for Java and other compiled languages (we’ll see 2 of them in the next section).

Using this AST allows us to check details that disappear after the compilation step. It also allows us to verify that some statements (assignments, loops, switches, …) are written in accordance with the coding rules/conventions. Sadly enough, an AST is “Abstract”, which means that some details are still missing depending on the capabilities of the available tooling. Sometimes, it is necessary to go deeper to the third level…
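
For the Xml part of the use-case, a syntax tree check can be sketched with the DOM API shipped in the JDK. This is a minimal sketch : the XmlTreeCheck class, its helper and the sample mapping document are hypothetical and do not come from the sample project.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlTreeCheck {

    // Parses the generated XML text into a DOM tree; fails if it is not well-formed.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Generated XML is not well-formed", e);
        }
    }

    // True when some element with this tag name carries the attribute with this value.
    static boolean hasElementWithAttribute(String xml, String tagName, String attribute, String value) {
        NodeList nodes = parse(xml).getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            if (value.equals(element.getAttribute(attribute))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical generated document.
        String xml = "<mapping><entity name=\"Stuff\"/><entity name=\"Other\"/></mapping>";
        System.out.println(hasElementWithAttribute(xml, "entity", "name", "Stuff"));
    }
}
```

Working on the tree rather than on the text makes the check insensitive to indentation and attribute order, which tend to change when a template is reworked.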

String level

This last level is the obvious but tedious one : work at the string level (because after all everything we produce is text !). The kind of tools employed is more primitive : pattern and regexp matchers, string comparison and file readers. Every language offers its shortcuts for that (personally, I love Groovy grep, file and =~ operator a lot ;-))
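
As an illustration of this string level in Java, the sketch below looks for a protected area in a generated source using a regular expression. The “Start of user code” / “End of user code” markers are an assumption about what the templates emit, and the class name is hypothetical : adapt the pattern to the markers you actually produce.

```java
import java.util.regex.Pattern;

class StringLevelCheck {

    // Assumed marker format: a line comment opening the area, any content
    // (possibly spanning lines), then a line comment closing it.
    private static final Pattern PROTECTED_AREA = Pattern.compile(
            "//\\s*Start of user code.*?//\\s*End of user code", Pattern.DOTALL);

    // True when the source text contains at least one complete protected area.
    static boolean hasProtectedArea(String source) {
        return PROTECTED_AREA.matcher(source).find();
    }

    public static void main(String[] args) {
        String generated = "public class Stuff {\n"
                + "   // Start of user code body\n"
                + "   // End of user code\n"
                + "}\n";
        System.out.println(hasProtectedArea(generated));
    }
}
```

Such a check sees comments and layout that the two previous levels lose, at the price of being tied to the exact text the templates write.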

Possible solutions for Java

The syntax tree level is maybe the most ignored one, so I’m going to detail it a bit more … After a quick Google search, 2 solutions seem available.


Javaparser is a project that seems to have been quite inactive for some time. However, it does most of the job right.

The usage principle is the following : you just have to parse an input stream in order to obtain a compilation unit AST.

CompilationUnit stuffCu = null;
FileInputStream in = new FileInputStream("../");
try {
   // Parse the file.
   stuffCu = JavaParser.parse(in);
} finally {
   in.close();
}

Then, in order to traverse/browse the produced compilation unit, you’ll have to write a visitor implementation like this :

public class ClassSummaryVisitor extends VoidVisitorAdapter<Object>{
   // [..]
   @Override
   public void visit(ClassOrInterfaceDeclaration n, Object arg){
      // Keep a summary of the visited class, then go on with its members.
      summary = new ClassSummary(n);
      super.visit(n, arg);
   }
}

You finally have to call your visitor and, for example, later retrieve the produced representation :

// Use a visitor to build a summary.
ClassSummaryVisitor csVisitor = new ClassSummaryVisitor();
csVisitor.visit(stuffCu, null);
cs = csVisitor.getSummary();

Eclipse JDT

JDT stands for Java Development Tools and it’s the module responsible for all the Java related things in the Eclipse platform. It is used by the project management module, editors, code style checker, etc in the IDE.

The main class in JDT that has to be used for our purpose is ASTVisitor. Its usage is quite analogous to Javaparser except for the initialization part, which is a little bit trickier :

// Creates an input stream for the file to be parsed.
File javaFile = new File("../");
BufferedReader in = new BufferedReader(new FileReader(javaFile));
final StringBuffer buffer = new StringBuffer();
String line = null;
while (null != (line = in.readLine())) {
   buffer.append(line).append("\n");
}
in.close();
// Parse and get the compilation unit.
ASTParser parser = ASTParser.newParser(AST.JLS3);
parser.setKind(ASTParser.K_COMPILATION_UNIT);
parser.setSource(buffer.toString().toCharArray());
CompilationUnit stuffCu = (CompilationUnit)parser.createAST(null);

For the rest of the usage, I recommend checking the GitHub project.

Maven to the rescue

We’ve got our base tooling and thanks to Maven and JUnit we can automate all of this.

Test process

First, be sure to place the parsing code above into a JUnit test case class initialization block and then write some test methods with assertions like below :

public void testAnnotationPresence(){
   // Check that JPA annotation is present.
   // [..]
}

public void testFieldsPresence(){
   // Check that two fields have been produced.
   assertEquals(ModifierSet.PUBLIC, cs.getField("foo").getModifiers());
   assertEquals(ModifierSet.PUBLIC, cs.getField("bar").getModifiers());
}

public void testFieldsJavadoc(){
   // Check that the Javadoc carries the description of the property.
   assertTrue(cs.getField("foo").getJavadoc().toString().contains("Description of the property foo."));
}

Then, build yourself a blueprint model including all the generation cases you want to support and follow my previous posts in order to automatically run the generation from this model using Maven.

Be sure that Maven launches the Acceleo generation before the execution of your JUnit tests. For that, I recommend binding the generation to the generate-test-resources phase of your build lifecycle.

Sample project

I mentioned above the sample project I’ve committed to GitHub. In this project, you will find :

  • A blueprint Uml model testModel.uml at project root that uses the generators from my previous posts,
  • A Maven pom.xml file that generates files into the /target/test-resources directory
  • A JUnit test case using Javaparser in /src/test/java/com/github/lbroudoux/acceleo/uml/java/jpa/files
  • A JUnit test case using Eclipse JDT in /src/test/java/com/github/lbroudoux/acceleo/uml/java/jpa/files
  • Sample visitors for Javaparser and JDT in /src/main/java/com/github/lbroudoux/japa|jdt


Javaparser and Eclipse JDT are great tools for going into the details of the source code and allow the checking of many things that disappear with compilation. However, I have found limitations in both regarding the support of line and block comments that are not Javadoc :

  • JDT fully ignores them and gives only information on the start and end lines of blocks (useless !),
  • Javaparser tries to handle them but a bug in its parser makes it lose the starting point of line comments if they are not followed by a Java instruction (a ‘;’ character is enough)

This feature would have been useful for checking – for example – that my Acceleo templates were actually providing some protected areas for code to be inserted !

The whole tool chain (AST + JUnit + Maven + Acceleo) makes the non regression checks on generators a breeze, especially if you plug them into a Continuous Integration server (such as Jenkins) in order to have the checks triggered by a modification of the Maven module containing your generators !

Let me know if it helps some of you … Do not hesitate to send me feedback or other ideas on generator test automation !
