A few weeks ago I blogged about a test run I did with Elasticsearch and Kibana3 (now just Kibana, the ‘3’ has since been dropped ;-)). It was so much fun and so pleasant to work with them that I wanted to go further and start digging into Elasticsearch.
After a few days of scratching my head and looking around the ES plugin ecosystem, I got the idea of writing a Google Drive river to learn from the trenches. So I am happy to announce the first release of this Elasticsearch plugin, which lets you index the content of a Google Drive with ES!
So what does this plugin do? Here are the features of this first release:
- Connect to Google Drive in ‘offline’ mode using OAuth 2 (you do not need to be connected to your Google account, you only have to authorize the plugin to access it),
- Scan only the changes since the last scan for better efficiency,
- Filter documents based on folder path (only 1 level for the moment),
- Filter documents to include using wildcard expressions, such as *.doc or *.pdf,
- Filter documents to exclude, also using wildcard expressions, such as *.avi or *.zip (of course, exclusions are computed first),
- Index document content and document metadata (because it is based on the Attachment plugin),
- Support MS Office, OpenOffice, Google Documents and many other formats (full list here),
- Support scan frequency configuration,
- Support bulk indexing for optimization
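
To make the include/exclude behaviour more concrete, here is a minimal sketch in Python of how such wildcard filtering could work, with exclusions checked first. This is not the plugin's actual code: the function name should_index and the rule that an empty include list accepts everything are assumptions made for the illustration.

```python
from fnmatch import fnmatchcase


def should_index(name, includes, excludes):
    """Return True if a file name passes the exclude-then-include filters.

    Hypothetical helper, written only to illustrate the filtering order.
    """
    # Exclusions are evaluated first: any match rejects the file outright.
    if any(fnmatchcase(name, pattern) for pattern in excludes):
        return False
    # Assumption: an empty include list means "accept everything".
    if not includes:
        return True
    # Otherwise the name has to match at least one include pattern.
    return any(fnmatchcase(name, pattern) for pattern in includes)


includes = ["*.doc", "*.pdf"]
excludes = ["*.avi", "*.zip"]
names = ["report.pdf", "holiday.avi", "notes.txt", "archive.pdf.zip"]

# Keep only the names that survive both filters.
selected = [n for n in names if should_index(n, includes, excludes)]
print(selected)
```

Checking exclusions first means a file such as archive.pdf.zip is rejected by *.zip before the include patterns are even looked at.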
The project is naturally hosted on GitHub: https://github.com/lbroudoux/es-google-drive-river. The plugin can be installed like any standard Elasticsearch plugin using the bin/plugin -install command. Everything you need for installation and configuration should be available on the project front page.
Some features are still missing and some could be improved, but the basics should work well and fast. Want to give it a try? Or help with ideas, tests or contributions? Do not hesitate to send me your feedback. I’ll keep digging into Elasticsearch over the coming weeks, months… who knows!?