User:TJones (WMF)/Notes/extra-analysis Elasticsearch Plugin

February/March 2018 — See TJones_(WMF)/Notes for other projects. See also T183015.

Background
I briefly mentioned the search/extra-analysis plugin in my notes on Serbian analysis, but thought I should flesh it out a bit more after a question came up on Gerrit.

The search/extra-analysis plugin is available on Gerrit and GitHub, with the GitHub docs being the easiest to read.

A Plan: Serbian, et al. and search/extra
Right now, the "extra-analysis" plugin is effectively the "Serbian-analysis" plugin, but we didn't name it to reflect that because the hope is that other stemmers and analysis tools will join the Serbian stemmer in either the extra-analysis plugin, or its companion, the search/extra plugin.

The search/extra plugin is where we keep a collection of small-but-useful Elasticsearch tools that we've built over the years. When we decided to add an Elasticsearch plugin wrapper around an open-source Serbian stemmer (keeping in mind plans to do the same for other open-source morphological software in the future), we decided that the first draft could be housed in the search/extra plugin. This decreases our ongoing maintenance burden compared to a separate Serbian plugin, and anyone who really wants to use the stemmer still can, at the cost of a little extra overhead from the other tools in the plugin.

A New Plan: Licenses, licenses, and more licenses
Unfortunately, plans often do not survive contact with reality, and as the Serbian stemmer is licensed under GPLv3, it most likely could not be bundled into the search/extra plugin (licensed under Apache 2.0) without converting everything to GPLv3, which we didn't want to do.

So, we created a new plugin repository, search/extra-analysis, which is licensed under GPLv3, and which can incorporate future compatibly licensed morphological analysis libraries we want to build on.

This time we did end up with a new plugin repo to support, but going forward we shouldn't need another project/repo for every new Elasticsearch plugin we want to build for search. Anything we build ourselves will be licensed under Apache 2.0 and can go into search/extra. Any open-source work we build on that has a permissive license compatible with Apache 2.0 (MIT, BSD, etc.) will also probably go in search/extra. Open-source works with GPLv3-compatible licenses (GPLv3, GPLv2+, LGPL) will go into search/extra-analysis.

I am not a lawyer, but it seems to be common wisdom that GPLv2 (as opposed to GPLv2+) is incompatible with GPLv3, so that may be a problem in the future. But, as they say, we can burn that bridge when we get to it.
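
The routing rule described above can be sketched as a small lookup. This is only an illustration, not anything that exists in either repo: the function name `target_repo` and the SPDX-style license identifiers are assumptions made for the example, the license sets are deliberately not exhaustive, and none of it is legal advice.

```python
# Hypothetical sketch of the "which repo does it go in?" rule.
# The identifiers below are SPDX-style names chosen for illustration;
# the sets are examples, not a complete or authoritative list.

# Our own code (Apache 2.0) and permissively licensed upstream work.
PERMISSIVE = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause"}

# Upstream work that can be combined under GPLv3.
GPLV3_COMPATIBLE = {
    "GPL-3.0-only",
    "GPL-3.0-or-later",
    "GPL-2.0-or-later",   # "GPLv2+"
    "LGPL-2.1-or-later",
    "LGPL-3.0-or-later",
}


def target_repo(license_id):
    """Return the plugin repo for a given upstream license, or None if undecided."""
    if license_id in PERMISSIVE:
        return "search/extra"
    if license_id in GPLV3_COMPATIBLE:
        return "search/extra-analysis"
    # Anything else (for example GPLv2-only) has no home yet.
    return None


if __name__ == "__main__":
    for lic in ("MIT", "GPL-3.0-only", "GPL-2.0-only"):
        print(lic, "->", target_repo(lic))
```

A `None` result is the "burn that bridge when we get to it" case: a license such as GPLv2-only that fits neither repo and would need a separate decision.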

Future Plans
If a lot of people start using the Serbian stemmer (or some other future analysis tool we incorporate into search/extra or search/extra-analysis), we might consider spinning it off into its own project and trying to do a proper job maintaining it, with releases for every version of Elasticsearch, etc. But for now, that would be time and effort I'd rather spend working on other small-but-useful plugins (doing something helpful for Khmer or Chinese, for example) while minimizing the work needed to support those efforts, so this is our compromise.