Flow/Architecture/Search

Making search work involves three big parts:

  • Manage ES config: getting the Elasticsearch (ES) configuration right (e.g. how to interpret datatypes: word stemming, highlighter config, ...) and managing the ES indices (validate, reindex, ...)
  • Index & search Flow data: index Flow data in Elasticsearch and make it searchable
  • Search front-end: how we'll present the search functionality to users

The last part is mostly blocked on finalizing the mockups. Once we're happy with those, we can start building it.

Manage ES config

Patch: https://gerrit.wikimedia.org/r/#/c/161251/

Make CirrusSearch updateOneSearchIndexConfig.php reusable

There's been a bunch of refactoring in CirrusSearch so that we can reuse most of its code in Flow. For a list of those patches, see the Phabricator task.

Make ES configuration management maintenance script

How to use (steps 1-4 are handled by enabling the 'cirrussearch' role in MediaWiki-Vagrant). We should probably include all of this in MediaWiki-Vagrant, either by default as part of Flow or as an optional role (flow-search?).

  1. Install Elasticsearch, version >= 1.4 (if your MediaWiki-Vagrant doesn't have it yet, see the update instructions in Matt's comment on PS12 here: https://gerrit.wikimedia.org/r/#/c/184404/)
  2. Install Extension:Elastica
  3. Install Extension:CirrusSearch
  4. Configure connection to ES (if different from the default 'localhost'): $wgFlowSearchServers = array( 'searchserver' );
  5. Flow & ES should now be in touch
  6. In CLI, run: php maintenance/FlowSearchConfig.php. This will prepare the search index. If you are using MediaWiki-Vagrant, use vagrant ssh, go to the /vagrant/mediawiki/extensions/Flow folder, and run the script within that shell.
  7. (You can add any of the script's many options if you want to try out a particular piece)
  8. Should you, for some reason, need to quickly rebuild your index from scratch, delete it with curl -XDELETE http://localhost:9200/\*_flow\* (adjust the URL as needed) and re-run these steps
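The index reset in step 8 can also be scripted. The following is a minimal sketch, not part of Flow: the function names are hypothetical, and it assumes the default local Elasticsearch address and the \*_flow\* index pattern from the curl command above. Building the request is kept separate from sending it, so the target URL can be checked before anything is deleted.

```python
import urllib.request


def build_index_delete_request(base_url="http://localhost:9200", pattern="*_flow*"):
    """Build (but do not send) the DELETE request for all matching indices.

    Both defaults mirror the curl example; adjust them for your setup.
    """
    url = base_url.rstrip("/") + "/" + pattern
    return urllib.request.Request(url, method="DELETE")


def delete_flow_indices(base_url="http://localhost:9200"):
    """Send the DELETE request and return the HTTP status code.

    Destructive: only call this when you really want to wipe the index.
    """
    request = build_index_delete_request(base_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

After calling delete_flow_indices(), re-run php maintenance/FlowSearchConfig.php to recreate the index.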

Figure out how to deploy Flow search

Index & search Flow data

Patch: https://gerrit.wikimedia.org/r/#/c/126996/

Index Flow data in ES

How to use

See #Make_ES_configuration_management_maintenance_script first; it has more detailed instructions, including how to properly configure the search index.

  1. Do steps from #Make_ES_configuration_management_maintenance_script
  2. In CLI, run: php maintenance/FlowFixWorkflowLastUpdateTimestamp.php (to ensure workflow_last_update_timestamps are correct; may not be needed)
  3. In CLI, run: php maintenance/FlowForceSearchIndex.php
  4. Flow data should be indexing, hopefully
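If you run steps 2 and 3 often, they can be wrapped in a small helper. This is only a sketch under assumptions: run_indexing and its runner parameter are hypothetical names, php is assumed to be on the PATH, and flow_dir must point at your Flow extension folder (for example /vagrant/mediawiki/extensions/Flow inside MediaWiki-Vagrant).

```python
import subprocess

# The two maintenance scripts from the steps above, in the order they should run.
INDEXING_SCRIPTS = [
    "maintenance/FlowFixWorkflowLastUpdateTimestamp.php",
    "maintenance/FlowForceSearchIndex.php",
]


def run_indexing(flow_dir, runner=subprocess.run):
    """Run each maintenance script from within the Flow extension folder.

    `runner` defaults to subprocess.run; passing a stub makes a dry run possible.
    With check=True, a failing script raises and the remaining ones are skipped.
    """
    for script in INDEXING_SCRIPTS:
        runner(["php", script], cwd=flow_dir, check=True)
```

Passing a custom runner (for example one that only prints the command) lets you confirm what would be executed before touching the index.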

Search indexed Flow data

How to use

  1. See below; the API endpoint is already in place ;)

Search API endpoint

How to use

  1. Do steps from #Index_Flow_data_in_ES
  2. Set $wgFlowSearchEnabled = true;
  3. Add 'script.disable_dynamic: false' to your elasticsearch.yml (we use dynamic scripting to figure out the total number of matching terms)
  4. Do an API call, e.g.: http://mediawiki.dev/api.php?page=Main_Page&action=flow&submodule=search&qterm=test
  5. See search results!
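The API call in step 4 can be assembled programmatically so that the search term is URL-encoded correctly. A minimal sketch: build_flow_search_url is a hypothetical helper, and the host and parameters are taken from the example URL above, so adjust them to match your wiki.

```python
from urllib.parse import urlencode


def build_flow_search_url(term, page="Main_Page", api_url="http://mediawiki.dev/api.php"):
    """Return the Flow search API URL for `term`, using the example's parameters."""
    params = {
        "page": page,
        "action": "flow",
        "submodule": "search",
        "qterm": term,  # urlencode escapes spaces and special characters
    }
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_flow_search_url("test"))
```

The printed URL can be opened in a browser or fetched with any HTTP client once $wgFlowSearchEnabled is set.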

Search front-end

For mockups, see the Phabricator task.

There is a patch with a very barebones GUI; it's linked from the Phabricator task.