Posted by: lbroudoux (@lbroudoux) | June 25, 2015

EIP to bridge the gap between EA and Development?

This post introduces the EIP Designer project I have been working on for some months now (see https://github.com/lbroudoux/eip-designer and http://www.enterpriseintegrationpatterns.com for an introduction to EIP).

The full content has been removed for now because I am in the process of submitting the article to an online publication. Rest assured I'll post back either a link to the published article if the proposal is accepted, or the full content if not.

Check back for it soon!

Posted by: lbroudoux (@lbroudoux) | January 20, 2015

Manage your Elasticsearch rivers with Sluice

As you may have noticed, I am an avid user of Elasticsearch and have already written some river plugins for indexing different data sources such as Amazon S3 buckets or Google Drive.

As a river plugin developer, I usually find myself having to test many Elasticsearch version and configuration combos – which means reinstalling fresh copies of my plugins each time. Also, as a river user, I have to create many rivers – and it's often a trial and error process to find the correct configuration with regard to the content to index, data freshness, indexing and request performance, mapping issues and so on…

All this led me to the point where I got tired of the long curl commands needed to configure all this stuff and decided to react! Today I'm introducing a new Elasticsearch plugin I've written to make my life easier (and maybe yours if you're using rivers too ;-)): Sluice!
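
To give an idea of what this looks like, creating an Amazon S3 river by hand means PUTting a JSON configuration document under the _river index, something like the sketch below (the _river/<name>/_meta pattern and the amazon-s3 type are the real ones, but the configuration keys shown here are illustrative and may differ from the plugin README):

# note: the configuration keys below are illustrative – check the river plugin README for the exact ones
curl -XPUT 'http://localhost:9200/_river/s3docs/_meta' -d '{
  "type": "amazon-s3",
  "amazon-s3": {
    "accessKey": "<aws_access_key>",
    "secretKey": "<aws_secret_key>",
    "bucket": "my-docs-bucket",
    "pathPrefix": "Work/Archives/",
    "update_rate": 900000,
    "includes": "*.doc,*.pdf",
    "excludes": "*.zip,*.avi"
  },
  "index": {
    "index": "docs",
    "type": "doc",
    "bulk_size": 50
  }
}'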

Features

So what's Sluice? As stated above, Sluice is itself an Elasticsearch plugin, and its goal is to help you manage your rivers: it simplifies the installation of the required river plugins and also helps you set up and tune your rivers. The idea with Sluice is that there are no more curl commands to type: just install the Sluice plugin and then use its simple user interface.

Sluice is hosted on GitHub and installs as a regular plugin by typing the following command in a shell:

$ bin/plugin --install com.github.lbroudoux.elasticsearch/sluice/0.0.1

Now restart your Elasticsearch instance and point your browser to http://localhost:9200/_plugin/sluice. You should see the following dashboard appear.

sluice-dashboard

You can see here that Sluice checks which of the supported river plugins are installed and offers a simple way to install the ones that are not. Just click the Install link and it takes care of retrieving and setting up the Amazon S3 river plugin, for example. You then just need to restart your ES node.

In the dedicated section, you can also list all the river instances created in your ES cluster. For now, you can only edit and modify existing rivers – not remove them.

sluice-rivers

Finally, it offers a convenient way to add a new river. The configuration attributes of the river are grouped together with a clear explanation of their meaning and supported format.

sluice-river-edit

Easy, no? For the moment, the supported river plugins are:

  • Amazon S3 River plugin,
  • Google Drive River plugin

Limitations

Sluice is only at its first release, 0.0.1, and it's far from feature complete!
The current limitations are:

  • Only works with local development instances (yep! http://localhost:9200 is hard-coded… so ugly! :-(),
  • No way of removing rivers,
  • No way to start/stop rivers,
  • An ES restart is required after plugin installation

Future plans

Many useful features come to mind – the order of the list bears no relation to priority:

  • Configuration of Elasticsearch cluster endpoint,
  • Ability to remove or duplicate rivers,
  • Support of other river plugins such as the excellent FSRiver or the TwitterRiver,
  • Ability to start/stop or force a refresh of river settings while running,
  • Ability to get the curl command for a river so it can be recreated later (useful when tuning has been done in dev or QA and river creation needs to be scripted for production),
  • River indexing statistics on the dashboard!

Do not hesitate to give me feedback and share your feature ideas for future releases of Sluice!

Posted by: lbroudoux (@lbroudoux) | May 5, 2014

Use Elasticsearch as a data store for your Spring Roo app

It's been a long time since my last post, but rest assured I have not been devoid of thoughts since then (have you seen the blog title? ;-)). Just a lack of time and energy to write things down…

I resume blogging today with a Spring Roo plugin I finished last week. For those who don't know about Spring Roo: it's a productivity tool that helps you bootstrap a Spring application within seconds. And although the excitement seems to be more around Spring Boot these days, I find Roo to be a valuable tool in a developer's toolbox… Anyway, Roo comes with many plugins allowing you to choose your persistence layer and APIs: typically JPA based or MongoDB based.

Some months ago I started this plugin, which gives you a persistence layer based on Elasticsearch. The idea is to have your domain objects persisted directly into an Elasticsearch index and – thanks to the conventions of Roo – quickly get a CRUD service layer and scaffolded web screens generated for us. After a little contribution to Spring Data Elasticsearch (here), the plugin was on its way and is now hosted here on GitHub.

Twitter example development

The plugin is not yet released to the official Spring Roo repository, so installation is a bit tedious… The README.md on GitHub explains how to do it, so I won't delve into that part. Instead, I propose to walk in more detail through the Twitter example that is used to illustrate the plugin commands.

In order to complete this tutorial, you'll need:

  • Spring Roo with the plugin installed (I've used 1.2.2.RELEASE),
  • A Maven installation (I've used 3.0.4),
  • An Elasticsearch installation running on ports 9200 and 9300 (I've used 1.1.1).

So let's start with a brand new project. In a new directory, start a Roo shell and create a new project with this command:

project --topLevelPackage com.github.lbroudoux.es

Project initialization

This produces a bunch of configuration files, as shown by the screenshot above. The next thing to do is to activate the Elasticsearch layer plugin for Roo and set it up to use an ES node that is not local to the JVM and is hosted on localhost:9300. You do this with this line:

elasticsearch setup --local false --clusterNodes localhost:9300

Elasticsearch setup

Configuration files are generated for you, dependencies (on spring-data-elasticsearch) are added, and the Spring version is updated to the required one. The next step is to tell Roo you want a Tweet domain object backed by Elasticsearch. This is done through this new variation of the entity command available in Roo:

entity elasticsearch --class ~.domain.Tweet

Domain creation

The Tweet domain Java class is generated along with its AspectJ ITD. You can now embellish your domain class with fields such as author and content, the latter limited to 140 characters in length. This is done with the following standard commands in Roo:

field string --fieldName author
field string --fieldName content --sizeMax 140

Fields addition

Nothing more to say here: the Tweet class is modified. The next step is more interesting: this is where you ask the plugin to generate a Spring Data repository layer for persisting Tweets into ES. This is done by:

repository elasticsearch --interface ~.repository.TweetRepository --entity ~.domain.Tweet

Repository creation

You can see that a new TweetRepository interface has been generated, along with an ITD that triggers an Elasticsearch implementation proxy. Now we have to create a CRUD service layer on top of our repository, and it's done simply using this command:

service --interface ~.service.TweetService --entity ~.domain.Tweet

CRUD service creation

The TweetService interface and its implementation are generated so that they use the repository we generated earlier to persist and retrieve Tweet instances. Finally, in order to easily test and check the resulting application, we have to set up a web layer and generate scaffolded screens for our domain objects. This is done by running these 2 commands in sequence:

web mvc setup
web mvc all --package ~.web

Web scaffolding

And a bunch of web resources, controllers and configuration files are now present in our application. Development is done!

Twitter example execution

We now want to run all of this in order to properly test our app (yes, Roo offers many ways to unit and integration test your app, but a screenshot is more expressive, at least for a blog post ;-)).

First, in a terminal, start your Elasticsearch node on localhost. The default command will do the job; you don't need any extra configuration:

bin/elasticsearch

Then, from the terminal where you were working with the Roo shell, exit the shell and launch the Tomcat plugin to run your app. This is done with this Maven command:

mvn tomcat:run

After Tomcat has started up, you can open a browser to http://localhost:8080/es. You'll get this screen, which is the default home page for the application.

Home screen

From there, you can access a page allowing you to create new Tweets with the fields we have added to our domain class.

Tweet creation

Persistence works fine, and by checking the icons you'll see that all the services are there for showing, updating, finding and deleting Tweets.

First tweet

Twitter example validation

Now you might tell me: “Ok, ok… Stuff is persisted, but how do you know it's persisted into the Elasticsearch node?”. A simple thing to do is to check ES using the Marvel monitoring solution (I highly recommend installing it if you haven't already!). So open a new browser tab to http://localhost:9200/_plugin/marvel/ and check the “Cluster Overview” dashboard.

Marvel indices

You can see that a new index called tweets containing 1 document is now present. In order to check its content, you can go to the “Sense” dashboard that offers an online query tool for your indices: http://localhost:9200/_plugin/marvel/sense/index.html.

Validation query

Now you can see that our first tweet has really been persisted into Elasticsearch!
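
If you do not have Marvel at hand, the same check can be done from a terminal with the regular search API (nothing plugin-specific here; tweets is the index name we just saw in Marvel):

curl -XGET 'http://localhost:9200/tweets/_search?pretty'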

Conclusion

So I have demonstrated how to write a full-blown Spring application that:

  • Persists and retrieves its domain objects into Elasticsearch,
  • Is correctly architected with a repository layer and a service layer,
  • Presents a basic administrative web frontend,

in no more than 9 lines of Roo commands! Wow!

Far beyond this basic persistence stuff, we're able – as developers – to easily build cool apps using the powerful indexing and querying features of Elasticsearch. Just consider this tutorial a quick-starter and think about full-text search, geo queries, analytics and aggregations on the various fields of your Tweets… it's all within reach!
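
To make that last point a bit more concrete, here is the kind of request you could already run directly against the index with the plain Elasticsearch 1.x query DSL – a rough sketch, independent of the generated service layer, combining a full-text match on content with a terms aggregation on author (exact results depend on your field mappings):

curl -XPOST 'http://localhost:9200/tweets/_search?pretty' -d '{
  "query": { "match": { "content": "roo" } },
  "aggs": { "top_authors": { "terms": { "field": "author" } } }
}'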

Posted by: lbroudoux (@lbroudoux) | June 13, 2013

Plugin isolation support in Elasticsearch

As I blogged yesterday, I recently discovered a limitation in the Elasticsearch architecture regarding the isolation of plugins. The fact is that every plugin and its libraries are added to the same Java ClassLoader during startup, and thus all the plugins share resources and class definitions.

Observation

I encountered this while developing and testing 2 plugins: one for indexing documents stored on Google Drive, the other for indexing documents stored on Amazon S3. Unfortunately, each one pulls in Apache HttpClient through its Maven dependencies: version 4.0.1 is used by the Google SDK and version 4.1 by the Amazon SDK.

So when you start Elasticsearch with both, you end up with a beautiful exception as follows:

laurent@ponyo:~/dev/elasticsearch-1.0.0.Beta1-SNAPSHOT$ bin/elasticsearch -f
[2013-06-13 22:25:29,044][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: initializing ...
[2013-06-13 22:25:29,144][INFO ][plugins                  ] [Brother Tode] loaded [river-twitter, river-google-drive, mapper-attachments, river-amazon-s3], sites [head]
[2013-06-13 22:25:31,989][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: initialized
[2013-06-13 22:25:31,989][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: starting ...
[2013-06-13 22:25:32,131][INFO ][transport                ] [Brother Tode] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.80:9300]}
[2013-06-13 22:25:35,187][INFO ][cluster.service          ] [Brother Tode] new_master [Brother Tode][LSvX2bRIRCWsQGcqvvvC7Q][inet[/192.168.1.80:9300]], reason: zen-disco-join (elected_as_master)
[2013-06-13 22:25:35,233][INFO ][discovery                ] [Brother Tode] elasticsearch/LSvX2bRIRCWsQGcqvvvC7Q
[2013-06-13 22:25:35,304][INFO ][http                     ] [Brother Tode] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.80:9200]}
[2013-06-13 22:25:35,305][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: started
[2013-06-13 22:25:36,339][INFO ][gateway                  ] [Brother Tode] recovered [3] indices into cluster_state
[2013-06-13 22:25:38,429][WARN ][river                    ] [Brother Tode] failed to create river [amazon-s3][s3docs]
org.elasticsearch.common.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
  at com.github.lbroudoux.elasticsearch.river.s3.river.S3River.<init>(Unknown Source)
  while locating com.github.lbroudoux.elasticsearch.river.s3.river.S3River
  while locating org.elasticsearch.river.River

1 error
	at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:344)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:178)
	at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
	at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:132)
	at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:66)
	at org.elasticsearch.river.RiversService.createRiver(RiversService.java:138)
	at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:270)
	at org.elasticsearch.river.RiversService$ApplyRivers$2.onResponse(RiversService.java:1)
	at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$1.run(TransportAction.java:87)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
	at com.amazonaws.http.ConnectionManagerFactory.createThreadSafeClientConnManager(ConnectionManagerFactory.java:26)
	at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:95)
	at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:118)
	at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:65)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:298)
	at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:280)
	at com.github.lbroudoux.elasticsearch.river.s3.connector.S3Connector.connectUserBucket(S3Connector.java:66)
	at com.github.lbroudoux.elasticsearch.river.s3.river.S3River.<init>(S3River.java:131)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:54)
	at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
	at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:98)
	at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:52)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:45)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:819)
	at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:42)
	at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:57)
	at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:45)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:200)
	at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:1)
	at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:812)
	at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
	at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
	... 10 more
[2013-06-13 22:25:38,489][INFO ][com.github.lbroudoux.elasticsearch.river.drive.connector.DriveConnector] Establishing connection to Google Drive
^C[2013-06-13 22:25:39,214][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: stopping ...
[2013-06-13 22:25:39,502][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: stopped
[2013-06-13 22:25:39,502][INFO ][node                     ] [Brother Tode] {1.0.0.Beta1-SNAPSHOT}[6098]: closing ...

What happens here? Both plugins are loaded, and the Google Drive river seems to be loaded first. As you can see here, its libraries are added to the ClassLoader first. So the 4.0.1 definition of org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager comes first and is what later gets resolved by classes referencing it. During its init phase, the Amazon plugin tries to use this class but needs the 4.1 definition that holds the <init>()V constructor!
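
If you want to see the conflict for yourself, listing the libraries shipped by each plugin is enough – a quick sketch, assuming the plugin directories are named after the plugin names reported in the log above:

ls plugins/river-google-drive/ plugins/river-amazon-s3/ | grep httpclient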

Enhancement

As an enhancement proposal, I've forked the Elasticsearch repository here and reworked the classloading scheme of plugins. You now have the option of forcing plugins to load into dedicated, isolated classloaders that try to resolve requested classes using the plugin libraries first and the main classloader second.

Although I've tested with some other plugins (twitter, head, attachment, fsriver) and saw no regression, I thought it would be safer to add a feature toggle to activate this. Plugin isolation is then only applied if the plugin.isolate settings flag is set to true (either from the YAML configuration file or from the command line).
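
For the YAML option, the flag goes into the node configuration file like any other setting (the es. prefix is only needed on the command line), for example:

echo "plugin.isolate: true" >> config/elasticsearch.yml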

The result is shown below: when started with the -Des.plugin.isolate=true property, dedicated classloaders are used, making the use of conflicting plugins a breeze:

laurent@ponyo:~/dev/elasticsearch-1.0.0.Beta1-SNAPSHOT$ bin/elasticsearch -f -Des.plugin.isolate=true
[2013-06-13 22:39:59,905][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: initializing ...
[2013-06-13 22:39:59,908][INFO ][plugins                  ] [Commando] Plugin isolation set to true, loading each plugin in a dedicated ClassLoader
[2013-06-13 22:39:59,948][INFO ][plugins                  ] [Commando] loaded [river-twitter, mapper-attachments, google-drive-river, amazon-s3-river], sites [head]
[2013-06-13 22:40:02,801][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: initialized
[2013-06-13 22:40:02,801][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: starting ...
[2013-06-13 22:40:02,941][INFO ][transport                ] [Commando] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.80:9300]}
[2013-06-13 22:40:05,990][INFO ][cluster.service          ] [Commando] new_master [Commando][2Xp9SsHsQ_SmFqiDGZUhzg][inet[/192.168.1.80:9300]], reason: zen-disco-join (elected_as_master)
[2013-06-13 22:40:06,037][INFO ][discovery                ] [Commando] elasticsearch/2Xp9SsHsQ_SmFqiDGZUhzg
[2013-06-13 22:40:06,097][INFO ][http                     ] [Commando] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.80:9200]}
[2013-06-13 22:40:06,098][INFO ][node                     ] [Commando] {1.0.0.Beta1-SNAPSHOT}[6253]: started
[2013-06-13 22:40:07,274][INFO ][gateway                  ] [Commando] recovered [3] indices into cluster_state
[2013-06-13 22:40:11,166][INFO ][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Starting amazon s3 river scanning
[2013-06-13 22:40:11,190][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] lastScanTimeField: 1371154754606
[2013-06-13 22:40:11,190][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Starting scanning of bucket famillebroudoux since 1371154754606
...
[2013-06-13 22:40:11,985][DEBUG][com.github.lbroudoux.elasticsearch.river.s3.river.S3River] [Commando] [amazon-s3][s3docs] Amazon S3 river is going to sleep for 36000 ms
[2013-06-13 22:40:12,182][INFO ][com.github.lbroudoux.elasticsearch.river.drive.connector.DriveConnector] Connection established.
[2013-06-13 22:40:12,182][INFO ][com.github.lbroudoux.elasticsearch.river.drive.connector.DriveConnector] Retrieving scanned subfolders under folder Travail, this may take a while...
...

I am in the process of suggesting this enhancement to Elasticsearch through a pull request. What is your opinion on it? Would it be useful to you? As usual, do not hesitate to send me your comments.

Posted by: lbroudoux (@lbroudoux) | June 13, 2013

Indexing your Amazon S3 bucket with Elasticsearch

I continue my Elasticsearch journey with a new plugin release today…

So, your company uses Amazon S3 as a storage backend for internal documentation? Or you're running a web application where users can upload and share files and content backed by S3? And now you want/have/need the whole thing indexed and searchable using a “mind-blowing search engine” (say, Elasticsearch ;-))? Well, the solution might be the Amazon S3 River plugin for ES released today.

Main features

So what does this plugin do? Here are the features of this first release:

  • Connect to your S3 bucket using AWS credentials,
  • Scan only changes since the last scan for better efficiency,
  • Filter documents based on folder path (no restriction on depth level; you can use a path such as Work/Archives/2012/Project1/docs/),
  • Filter documents to include using wildcard expressions, such as *.doc or *.pdf,
  • Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are evaluated first),
  • Index document content and document metadata (since it is based on the Attachment plugin),
  • Support for MS Office, OpenOffice, Google Documents and many other formats (full list here),
  • Support for scan frequency configuration,
  • Support for bulk indexing as an optimization

The project

The project is naturally hosted on GitHub here: https://github.com/lbroudoux/es-amazon-s3-river. The plugin installs as a standard Elasticsearch plugin using the bin/plugin -install command (see the example below). Everything you need for installation and configuration should be present on the project front page.
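
For instance, installation should boil down to something like the following – a sketch only: the exact Maven coordinates and version are the ones given on the project front page, the values below just follow the same naming scheme as my other plugins:

# check the project README for the exact coordinates and latest version
bin/plugin -install com.github.lbroudoux.elasticsearch/amazon-s3-river/<version>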

Restriction

As a disclaimer: while developing this plugin, I discovered an Elasticsearch limitation in that loaded plugins are not isolated from each other and share the same resources (this is because plugin libraries are added to the main ClassLoader, as you can see here). As a consequence, using this new plugin in conjunction with the Google Drive River plugin previously released is not possible (the Amazon and Google libraries use conflicting versions of Apache HttpClient). I'll tackle this subject in the forthcoming days if I have enough time.

As usual, do not hesitate to give me your feedback through comments on this post, issues on the GitHub project or tweets (@lbroudoux)!

Posted by: lbroudoux (@lbroudoux) | May 15, 2013

A river plugin for Elasticsearch that indexes Google Drive

Hi there,

I blogged some weeks ago about a test run I did with Elasticsearch and Kibana3 (now just Kibana, the ‘3’ has since been dropped ;-)). The fact is that it was so much fun and so pleasant to work with them that I wanted to go further and start digging into Elasticsearch.

A few days of scratching my head and looking around the ES plugin ecosystem gave me the idea of writing a Google Drive river to actually learn from the trenches. So I am happy to announce the 1st release of this Elasticsearch plugin that allows you to index the content of a Google Drive with ES!

Main features

So what does this plugin do? Here are the features of this first release:

  • Connect to Google Drive in ‘offline’ mode using OAuth 2 (no need to stay connected to your Google account, you just authorize the plugin once),
  • Scan only changes since the last scan for better efficiency,
  • Filter documents based on folder path (only 1 level for the moment),
  • Filter documents to include using wildcard expressions, such as *.doc or *.pdf,
  • Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are evaluated first),
  • Index document content and document metadata (since it is based on the Attachment plugin),
  • Support for MS Office, OpenOffice, Google Documents and many other formats (full list here),
  • Support for scan frequency configuration,
  • Support for bulk indexing as an optimization

The project

The project is naturally hosted on GitHub here: https://github.com/lbroudoux/es-google-drive-river. The plugin installs as a standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be present on the project front page.

Some features are still missing and some could be improved, but the basic stuff should work well and fast. Want to give it a try? Or help with ideas, tests or contributions? Do not hesitate to give me your feedback. I'll keep on digging into and investigating Elasticsearch in the forthcoming weeks, months… who knows!?

Posted by: lbroudoux (@lbroudoux) | May 5, 2013

The cabin is finished!

At last!

This long weekend has borne fruit and I can finally announce: “That's it: the cabin is finished!”. So for all those who have been hearing about it for months, here is the result:

IMG_20130503_171800.resized

I also wanted to take this moment to thank everyone who contributed to this project with ideas, a helping hand or loaned tools. In chronological order of contributions, thanks to:

  • Yann, for his hammer drill,
  • Vincent, for his expert touch with a socket wrench and his arms,
  • Nicolas, for his long arms (useful for the beams up high!),
  • Christophe, for his screws and his 13 mm drill bit,
  • Jean-Luc, for installing the deck railings,
  • My sweetheart, for playing the “safety committee” for the whole duration of the project!

And above all, a big thank you to Alain, who was there for the whole duration of the project with informed opinions, advice and many helping hands. I even think that my initial project ultimately became a project for the both of us! Thanks again.

A little guided tour

A few viewpoints to give you a more precise idea of the result…

IMG_20130503_171816.resized

The deck in front of the decorated door:

IMG_20130503_171858.resized

Even the Twitter bird is part of it!

IMG_20130503_171932.resized

The kids started on the interior furnishing as soon as the last nail was driven in:

IMG_20130503_172106.resized

A little history…

For those who have followed the story from the beginning, a few photo flashbacks below.

The ideas, musings and studies took place throughout the whole first half of 2012 (I had been given a beautiful book about cabins for Christmas ;-)) – but I have no photos of myself scratching my head there…

The first load-bearing beams were put up in early August 2012:

IMG_20120816_210147.resized

Then the holidays came, and at the beginning of September the platform had barely taken shape – but there was a ladder! :

IMG_20120905_203649.resized

The beginnings of the house's frame were there on September 23rd:

IMG_20120923_190604.resized

By September 30th, the whole frame and roof structure were assembled:

IMG_20120930_180721.resized

The cladding was finally done during the long (and not very productive ;-) ) winter months… Here is the state of the cabin at the end of November.

IMG_20121125_164317.resized

IMG_20121125_164435.resized

Spring then returned right on time to provide the last bit of motivation needed to finish the job. All that's left now is to enjoy it: we're still waiting for temperatures to rise a little before the first night spent up there!

Posted by: lbroudoux (@lbroudoux) | April 30, 2013

Real time analytics with Elasticsearch and Kibana3

Last month, I attended a great talk at the Devoxx French edition (see http://www.devoxx.com/display/FR13/Accueil) on “Migrating an application from SQL to NoSQL”. The talk title was pretty well chosen, but it was mainly a presentation of the features of 2 products: Couchbase and Elasticsearch.

Beyond the relevance of the speakers and the products, an Elasticsearch extension called Kibana3 was briefly introduced and – although marked as an alpha release – it totally astonished me! Kibana3 is an extension designed for real-time analytics of data stored in Elasticsearch. It allows full customization of dashboards and is so easy to use that it can almost be put into the hands of business people…

Some weeks later I found some time for a test run, and since things went well, I thought it would be useful to write a kind of “How to” or “Quickstart” for Kibana3. Here it is.

The setup

Install and run Elasticsearch

Download Elasticsearch from http://www.elasticsearch.org (while rechecking everything for this post, I chose the 0.90.0 release, which wasn't out when I first tested this… so everything should also run fine on the 0.20.6 release I had picked previously). Just extract the archive into a target directory and simply run the following:

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/elasticsearch -f
[2013-04-30 00:13:14,312][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: initializing ...
[2013-04-30 00:13:14,321][INFO ][plugins                  ] [Dominic Fortune] loaded [], sites []
[2013-04-30 00:13:17,045][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: initialized
[2013-04-30 00:13:17,046][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: starting ...
[2013-04-30 00:13:17,225][INFO ][transport                ] [Dominic Fortune] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.80:9300]}
[2013-04-30 00:13:20,306][INFO ][cluster.service          ] [Dominic Fortune] new_master [Dominic Fortune][evQbXTeASNmADq4h-Q847A][inet[/192.168.1.80:9300]], reason: zen-disco-join (elected_as_master)
[2013-04-30 00:13:20,353][INFO ][discovery                ] [Dominic Fortune] elasticsearch/evQbXTeASNmADq4h-Q847A
[2013-04-30 00:13:20,376][INFO ][http                     ] [Dominic Fortune] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.80:9200]}
[2013-04-30 00:13:20,376][INFO ][node                     ] [Dominic Fortune] {0.90.0}[4013]: started
[2013-04-30 00:13:20,489][INFO ][gateway                  ] [Dominic Fortune] recovered [0] indices into cluster_state

Congratulations! You are now running an Elasticsearch cluster with one node! That is basically all you need for a basic setup, because every interaction with the node – from administration to the client APIs – is done through REST APIs over HTTP. That means a simple curl command does the job.
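
For instance, hitting the root of the node returns its basic status and version information:

curl -XGET 'http://localhost:9200/'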

Anyway, before going further, we'd like to add an administration console to our cluster (because having a GUI doesn't hurt after all) and we need to feed our node with data. For that, we are going to install 2 plugins: elasticsearch-head (the administration console) and elasticsearch-river-twitter (to pull in data from Twitter).

Plugins install simply using the bin/plugin command, as follows.

For elasticsearch-head :

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/plugin -install mobz/elasticsearch-head
-> Installing mobz/elasticsearch-head...
Trying https://github.com/mobz/elasticsearch-head/zipball/master... (assuming site plugin)
Downloading ............DONE
Identified as a _site plugin, moving to _site structure ...
Installed head

For elasticsearch-river-twitter :

laurent@ponyo:~/dev/elasticsearch-0.90.0$ bin/plugin -install elasticsearch/elasticsearch-river-twitter/1.2.0
-> Installing elasticsearch/elasticsearch-river-twitter/1.2.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-river-twitter/elasticsearch-river-twitter-1.2.0.zip...
Downloading ...............................................................................................................................................................................................................................................DONE
Installed river-twitter

Now just restart your node by killing the running elasticsearch process and launching a new one, then point your browser to http://localhost:9200/_plugin/head/; you should now have access to the web frontend.

Install and run Kibana3

As said in the introduction, Kibana3 is an Elasticsearch extension hosted by Elasticsearch itself and dedicated to analytics, providing the means to dynamically build any dashboard on top of an ES index (the data store). The best way to retrieve the product is to clone the GitHub repository like this:

laurent@ponyo:~/dev/github$ git clone https://github.com/elasticsearch/kibana3.git
Cloning into 'kibana3'...
remote: Counting objects: 2148, done.
remote: Compressing objects: 100% (892/892), done.
remote: Total 2148 (delta 1305), reused 2060 (delta 1226)
Receiving objects: 100% (2148/2148), 11.47 MiB | 273 KiB/s, done.
Resolving deltas: 100% (1305/1305), done.

As the Kibana3 documentation states, it's ‘just’ a bunch of static HTML and JavaScript resources that can be put on any reachable web server. For testing convenience, Kibana3 embeds a little Node.js server that can be run if you're lazy like me:

laurent@ponyo:~/dev/github/kibana3$ node scripts/server.js 
Http Server running at http://localhost:8000/

You can now open http://localhost:8000/index.html in your web browser and should see a default dashboard appear with a bunch of red panels announcing errors… We're going to fix that in the next section.

The dashboard creation

Before actually starting to create a dashboard, we need data! Remember, we have installed the Twitter river plugin: we are going to connect to the Twitter public stream to retrieve that data. In order to complete the following step, you need a valid Twitter account.

The following command helps us create a Twitter river, specifying some trendy keywords ;-) Just substitute the placeholders with your Twitter account name and password and you're done.

laurent@ponyo:~$ curl -X PUT 'localhost:9200/_river/twitter-river/_meta' -d '{ "type" : "twitter", "twitter" : { "user" : "<twitter_user>", "password" : "<twitter_password>", "filter" : { "tracks" : "java,nosql,node.js,elasticsearch,eclipse,couchdb,hadoop,mongodb" } }, "index" : { "index" : "tweets", "type" : "status", "bulk_size" : 5 } }'
{"ok":true,"_index":"_river","_type":"twitter-river","_id":"_meta","_version":1}

By browsing to http://localhost:9200/_plugin/head/, you should see the document count of the “tweets” index grow fast.
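
If you prefer a terminal to the Head frontend, a quick count gives the same information (standard count API):

curl -XGET 'http://localhost:9200/tweets/_count?pretty'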

Let's now go back to the default Kibana3 dashboard in your web browser. We're going to change some parameters to make it a decent dashboard. The first thing to change is the “Timepicker” widget, which is used to define the data store the dashboard is based on.

May 14 update

For the lazy ones (;-)) who just want to see the result without building the dashboard, I've posted the JSON export as a Gist here: https://gist.github.com/lbroudoux/5579650. It's easily importable into Kibana.

Edit this widget's settings and change the time field as follows:

kibana3-timepicker-1

and then the index patterns as follows:

kibana3-timepicker-2

You should already have a decent dashboard, as shown below (I've also changed the dashboard title and the time resolution to see many green bars on the histogram).

kibana3-1st-result

You can experiment with “Zoom In” and “Zoom Out” on the histogram and see their effect on the timepicker widget. You can also draw a rectangular zone on the histogram in order to zoom in on that time period. Typing keywords into the Query input field also has a dynamic effect on the matched records and the histogram.

Moving down the page, you see a table widget that still has errors. Its goal is to display excerpts of the matched records. Edit this widget's parameters as follows to configure it to correctly display your tweets:

kibana3-table

You can see that we reference here the different fields found in a Twitter message coming from the public stream (such information on the available fields can be found through the Head web frontend when browsing indices and looking at stored documents).

Note that we can also modify the layout of the widgets by editing the row parameters. For example, we're swapping the table and fields widgets to suit our preferences. The fields widget is indeed very convenient for adding new fields to the table view. The screenshot below shows the result obtained after such a swap.

kibana3-2nd-result

The last thing I'll show you here is the addition of a new Kibana3 widget to your dashboard. We are now going to display a map showing the location of our Twitter users in the “Events” row. Open this row's settings editor and select “map” in the new panel dropdown list. Then you'll have to say which field is used to get this information; in the case of tweets the field is “place.country_code”. The setting is shown below:

kibana3-map

Don't forget to click on the “Create Panel” button before closing the editor! The map now displays in your row. Finally, after having evenly distributed the widgets across the row, you may reach the following result:

kibana3-final-result

The map widget is also clickable and can be used to drill down into the data previously selected using the query filter and/or the timepicker filter. Quite impressive!

Conclusion

If my demonstration succeeded, you have seen that using Kibana3 can be really easy once you understand the basic customization steps. Kibana3 looks like a very promising tool in this new area of big data, data scientists and data miners that has appeared in recent years.

Some features might still be missing (like complete integration with the Elasticsearch index or document type catalogs, security around data access or dashboard sharing, etc.) to ensure deployment in the enterprise world. However, the first signs are already there, with the ability to store Kibana3 dashboards in Elasticsearch itself and the recent posts on how to secure an Elasticsearch cluster (see http://dev.david.pilato.fr/?p=241 for French readers).

I think that Kibana3 being hosted under the Elasticsearch umbrella is a good guarantee that this extension will keep being developed and enhanced in the near future. In my humble opinion, it can be a big selling point for Elasticsearch.

Posted by: lbroudoux (@lbroudoux) | April 21, 2013

Launching Acceleo generation from Maven – take 3

Last year, I wrote a post on Launching Acceleo generation from Maven. It was actually the second post on this topic – the second post introducing a multi Acceleo generator project build.

Things went well while writing that post, but after some weeks and some tests by colleagues, I realized that Acceleo had limitations that made this build setup hard to make portable. To summarize: when it comes to referencing modules coming from other projects, Acceleo uses multiple forms of path references: relative paths when built dynamically by the IDE, platform:/ paths when exported as a plugin, and absolute jar paths when built via Maven (our case).

The limitations

To illustrate, here is an excerpt of the entityFile.emtl module you may find in my sample project. The reference to my own local Maven repository location makes it anything but portable!

 <extends href="jar:file:/home/laurent/dev/repository/com/github/lbroudoux/acceleo/com.github.lbroudoux.acceleo.uml.java/0.0.1.qualifier-SNAPSHOT/com.github.lbroudoux.acceleo.uml.java-0.0.1.qualifier-SNAPSHOT.jar!/com/github/lbroudoux/acceleo/uml/java/files/classFile.emtl#/0"/>
 <imports href="jar:file:/home/laurent/dev/repository/com/github/lbroudoux/acceleo/com.github.lbroudoux.acceleo.uml.java/0.0.1.qualifier-SNAPSHOT/com.github.lbroudoux.acceleo.uml.java-0.0.1.qualifier-SNAPSHOT.jar!/com/github/lbroudoux/acceleo/uml/java/files/commonFile.emtl#/0"/>

When it comes to deployment on our projects (at my company, for my day job), these limitations do not really bother us because the development and CI machine setups are standardized and we are sure that every local Maven repository has the same location. So I put this problem over my shoulder and forgot about it…

… until Dave's comments!

Last week, Dave commented on this blog post (see his comments), reminding me that this issue was left unsolved but still deserves some attention… While Dave is pursuing a pure Java solution, in this new post I'm showing a pure Maven workaround, so let's go.

A Maven workaround

The principle of this workaround is the following: since referencing other jar archives in the EMTL files makes the build non-portable, stop using multiple jar archives and use a single uber jar in which the referenced paths are relative!

I know this sounds weird, as Maven promotes fine-grained, atomic artifacts with transitive dependency resolution and so on… but it also opens the way to different forms of artifacts when running/deploying into a constrained environment, through the notion of an assembly. That is exactly our situation: we've got a constrained running environment, so we're going to use an assembly.

The explanation takes place in 3 steps.

Replacing references into EMTL files

The first step is to deal with the referenced jar paths placed into the EMTL files by the Acceleo compiler. The goal is to replace them with relative paths. For this, we can use the Replacer plugin in the build of the Acceleo module that references other modules.

In my sample project, this means modifying the pom.xml file of the com.github.lbroudoux.acceleo.uml.java.jpa module as follows:

<plugin>
  <groupId>com.google.code.maven-replacer-plugin</groupId>
  <artifactId>maven-replacer-plugin</artifactId>
  <version>1.3.1</version>
  <executions>
    <execution>
      <phase>prepare-package</phase>
      <goals>
        <goal>replace</goal>
      </goals>
      <configuration>
        <includes>
          <include>target/**/*.emtl</include>
        </includes>
        <regex>true</regex>
        <token>jar:file:.*.jar!</token>
        <value>../../../../../../../..</value>
      </configuration>
    </execution>
  </executions>
</plugin>

This configuration basically activates the plugin in the prepare-package phase and processes every EMTL file to replace the given regular expression, which matches an absolute jar path, with the relative path value.

On the excerpt we took above, this leads to the following result:

 <extends href="../../../../../../../../com/github/lbroudoux/acceleo/uml/java/files/classFile.emtl#/0"/>
 <imports href="../../../../../../../../com/github/lbroudoux/acceleo/uml/java/files/commonFile.emtl#/0"/>

Please be sure to note 2 important things:

  • the value given for the replacement depends on the package used for the current Acceleo module's files (com.github.lbroudoux.acceleo.uml.java.jpa in this case),
  • the value is the same for every EMTL file because the sample project follows the Acceleo best practice in terms of package naming: each package containing a generator is at the same depth from the root (if you do not follow that best practice, this workaround does not apply as-is – the replacer configuration might be trickier!)

Creating a flattened uber assembly

The next step is to create an archive that will contain:

  • the EMTL and class files of our current Acceleo module (the one reworked during step 1),
  • the EMTL and class files of the generators we depend on (they come from Maven dependencies)

All these resources should be flattened: put together into a single package hierarchy, in a single jar file, so the result is still usable as a library.

In order to do that, we start by declaring a configuration for the Maven assembly plugin in the pom.xml of the Acceleo module referencing other modules (check the sample project):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.4</version>
  <executions>
    <execution>
      <id>make-assembly</id>
      <phase>package</phase>
      <goals>
        <goal>single</goal>
      </goals>
      <configuration>
        <descriptors>
          <descriptor>assembly.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>

This configuration activates the assembly during the package phase (so after prepare-package) and refers to the descriptor in the assembly.xml file. This is a new file and you just have to create it in the project root folder. Its content is the following:

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
  <id>uber</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <includes>
        <include>com.github.lbroudoux.acceleo:com.github.lbroudoux.acceleo.uml.java</include>
        <include>com.github.lbroudoux.acceleo:com.github.lbroudoux.acceleo.uml.java.jpa</include>
      </includes>
      <unpack>true</unpack>
      <useTransitiveDependencies>false</useTransitiveDependencies>
    </dependencySet>
  </dependencySets>
</assembly>

The important parts here are specifying that our assembly will use an uber qualifier/classifier for its resulting archive, and that both the module's own artifact and the dependency artifact are listed in the inclusions.

From now on, when doing a mvn install in this Acceleo module, Maven should produce 2 artifacts: the main one we already had and the new uber one holding every EMTL resource, flattened and with relative paths. That new artifact is attached as a secondary artifact of your build.
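
Concretely, a run in the module directory should look like this (a sketch: the artifact names simply follow from the module coordinates and the uber classifier defined above):

cd com.github.lbroudoux.acceleo.uml.java.jpa
mvn clean install
# two artifacts should now be installed into the local repository:
#   com.github.lbroudoux.acceleo.uml.java.jpa-0.0.1.qualifier-SNAPSHOT.jar       (regular jar)
#   com.github.lbroudoux.acceleo.uml.java.jpa-0.0.1.qualifier-SNAPSHOT-uber.jar  (flattened uber assembly)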

Using this new archive for generation

The last step is to modify our application that integrates the Acceleo generators into its own build process: we now have to tell it to use the uber jar we produced in the previous step. This modification is done simply by editing the pom.xml of your application and adding the classifier information.

Looking at my sample application file, it amounts to a single new line, highlighted below:

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.2.1</version>
  <!-- Configuration details here -->
  <dependencies>
    <dependency>
      <groupId>com.github.lbroudoux.acceleo</groupId>
      <artifactId>com.github.lbroudoux.acceleo.uml.java.jpa</artifactId>
      <version>0.0.1.qualifier-SNAPSHOT</version>
      <classifier>uber</classifier>
    </dependency>
  </dependencies>
</plugin>

The uber jar with relative references is now used and should make your build portable! You can check the application of this workaround to my sample project by looking at the GitHub commit.

As always, feedback and comments are greatly appreciated!

Posted by: lbroudoux (@lbroudoux) | February 12, 2013

Generating SOA contracts with Obeo SOADesigner

In recent weeks I have been evaluating SOADesigner (see http://marketplace.obeonetwork.com/module/soa) as a complementary solution to a traditional Enterprise Architecture Management suite we use at my day job. One of our goals when deciding to use this suite was to minimize the gap between architecture analysis and realization by generating and managing SOA assets such as WSDL and XSD artifacts. Obviously we did not succeed, and so we evaluated another way to get the job done…

SOADesigner is based on Eclipse tooling and involves many Eclipse Modeling initiative technologies. It provides a bunch of EMF metamodels related to information system management in general and SOA in particular, so that models produced on top of it can be used by tools like Acceleo to generate text artifacts.

The purpose of this blog post is to introduce the Acceleo generators I have written for producing WSDL and XSD artifacts from SOADesigner models. The generators – still a work in progress – have been open sourced and put on GitHub. You can find them at https://github.com/lbroudoux/InformationSystem-generators and I'll explain later how to use them.

As an introduction and to set the scene, here are some screenshots of the kind of diagrams and concepts you may work with in SOADesigner.

Exchange model design

This first one covers the design of the exchange model that will be used for service interface specifications. The elements of such a model are called DTOs (for Data Transfer Object) and may be initialized from Entity elements. DTOs are organized into Categories – roughly the same notion as a package – within a DTO Registry.

soadesigner-01

Service model design

This second diagram deals with the specification of Services within a Component. A Service may hold many operations through its interface, which can be detailed in terms of input and output specifications. You can see here that we're quite close to the SOA / web services terminology, apart from the missing fault specification (but there's a feature request on its way ;-)).

soadesigner-02

Generators specifications

The design specifications for the generators are the following:

  • generate 1 XSD artifact per Category or sub-Category holding DTOs,
  • use the parent system name, category name and version to produce a distinct file name,
  • generate 1 WSDL artifact per Service holding Operations,
  • make the WSDL artifact hold only the service-related datatypes and reference reusable ones from the XSDs,
  • use the service name and version to produce a distinct file name

As an example, on the nonRegressionModel.is model that is embedded in the test modules of the Git repository, we achieve the following results in terms of artifact generation:

soadesigner-03

Generators features and usage

The currently supported features of the generators are as follows:

  • use of the descriptions put into models to annotate artifacts with documentation,
  • use of multiplicity information to generate the corresponding XSD occurrence specifications,
  • correct import of an XSD within another XSD or a WSDL,
  • correct use of different namespaces during inclusion and reuse,
  • support for inheritance between DTOs,
  • support for composition and references between DTOs

If you would like to give them a try, for now you'll have to git clone the repository (I have not yet released them as a plugin) and import plugins/com.github.lbroudoux.acceleo.soa.contracts into your Eclipse workspace. Then you'll have to create a new Acceleo launcher referencing a fresh model and the com.github.lbroudoux.acceleo.soa.contracts.main.GenerateAll class as the Acceleo generator class.
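
Concretely, grabbing the generators is a one-liner before switching to Eclipse for the import (repository URL as given above):

git clone https://github.com/lbroudoux/InformationSystem-generators.git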

Obviously, we assume you have previously installed SOADesigner, as mentioned here, on top of an Eclipse setup – so that you have the designers but also a complete Acceleo environment set up.

As always, feedback and comments are greatly appreciated!
