User:Stefahn/Solr Docu

My own docu about Solr and SolrStore.

Indexing and updating

 * You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP.
 * You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
 * schema.xml can specify a "uniqueKey" field (called "id" in the example schema). Whenever you POST instructions to Solr to add a document with the same uniqueKey value as an existing document, Solr automatically replaces the existing document.
 * Index changes are not visible until they are committed and a new searcher is opened.
 * Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end.
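The batch-then-commit pattern described above can be sketched in Python. This is only a rough sketch: the update URL (host, port 8080, core name "core0") is an assumption and must be adapted, and the function names are made up for illustration.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed update endpoint; host, port and core name depend on your setup.
UPDATE_URL = "http://localhost:8080/solr/core0/update"


def build_add_xml(docs):
    """Build one <add> message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr() adds the surrounding quotes and escapes the name.
            parts.append("<field name=%s>%s</field>"
                         % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=UPDATE_URL):
    """POST one XML update message to Solr and return the raw response."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def index_batch(docs, url=UPDATE_URL):
    """Send all documents in a single <add>, then commit once at the end."""
    post_xml(build_add_xml(docs), url)
    post_xml("<commit/>", url)
```

Calling index_batch([{"id": "1", "title": "Hello"}]) would send one add message and one commit. A document with an "id" that already exists in the index is replaced.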

Query

 * You query it via HTTP GET and receive XML, JSON, CSV or binary results.
 * Basics: http://lucene.apache.org/solr/api-3_6_2/doc-files/tutorial.html#Querying+Data
 * Test and debug queries within your Solr: http://localhost:8080/solr/core0/admin/form.jsp
 * Example search UI: http://localhost:8983/solr/browse
 * http://wiki.apache.org/solr/SolrQuerySyntax
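A query is just an HTTP GET with URL parameters. The following sketch builds such a URL in Python; the select URL (port 8080, core "core0") and the function names are assumptions, while q, rows and wt are standard Solr query parameters.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed select endpoint; adapt host, port and core name to your setup.
SELECT_URL = "http://localhost:8080/solr/core0/select"


def build_query_url(query, rows=10, fmt="json", base=SELECT_URL):
    """Return the GET URL for a query; 'wt' selects the response format."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return base + "?" + urlencode(params)


def run_query(query, rows=10):
    """Run the query against a live Solr and return the parsed JSON result."""
    with urllib.request.urlopen(build_query_url(query, rows)) as response:
        return json.loads(response.read().decode("utf-8"))
```

build_query_url("title:solr", rows=5) only builds the URL and can be pasted into a browser; run_query needs a running Solr.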

Installation

 * Extension:SolrStore/Install_Solr
 * http://www.icuriousmedia.com/blog/how-to-install-apache-solr-on-windows-xp-1439.php
 * The folder solr in tomcat/webapps is generated automatically. One doesn't need to copy it from other locations.

Restarting Solr
Do the following as root (or with sudo):
  cd /opt
  ./tomcat/bin/shutdown.sh
  ./tomcat/bin/startup.sh

Attention: the plain command "shutdown" (without the ./tomcat/bin/ path and the .sh ending) turns off the whole server, not just Tomcat!

schema.xml
Example: in a field definition with name "subject" and type "text_general", "subject" is the field and "text_general" is the field type (analyzer) that is applied to the field called "subject".
 * Info: http://wiki.apache.org/solr/SchemaXml
 * located in:
   * SolrStore: solr/core0/conf/
   * Solr example: solr/example/solr/conf
 * Defines the field types and fields of documents.
 * The schema defines the fields in the index and what type of analysis (field types) is applied to them.
 * The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.
 * Attention: an XML comment nested inside another comment makes schema.xml invalid and leads to an error when Solr loads it.
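To check a schema file before restarting Solr, something like the following Python sketch may help. The sample schema is a made-up fragment (only the "subject" / "text_general" pair comes from the example above), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up schema fragment for illustration only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""

# A comment inside a comment is not allowed in XML.
NESTED_COMMENT = "<schema><!-- outer <!-- inner --> still outer --></schema>"


def list_fields(schema_xml):
    """Return (field name, field type) pairs for all <field> elements."""
    root = ET.fromstring(schema_xml)
    return [(f.get("name"), f.get("type")) for f in root.iter("field")]


def is_well_formed(schema_xml):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(schema_xml)
        return True
    except ET.ParseError:
        return False
```

list_fields(SAMPLE_SCHEMA) shows which field type is applied to which field, and is_well_formed(NESTED_COMMENT) reports the nested comment as a parse error. This only checks XML syntax, not whether Solr accepts the schema.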

Tips and tricks

 * If you want to sort by an attribute with values like "1 - rookie", "2 - advanced", "3 - expert", don't choose "text_general" as field type, but "string" for example. If you choose text_general, results are sorted in this way: advanced, expert, rookie (because "1 -" is skipped/tokenized somehow).

SolrStore

 * You don't need to define the SMW attributes as fields in your schema.xml. You only need to define fields if you want to do one of the following:
   * You want to sort results by an attribute.
   * You want to have a search input that searches in more than one attribute (for example search in wikitext and pagetitle at the same time).

multivalued
Trick to use multivalued fields for sorting: use copy fields (copyField in schema.xml) to copy the content of one or several fields into another field that is not multivalued.
 * multiValued = this field may contain multiple values per document, i.e. if it can appear multiple times in a document
 * With SolrStore one can sort by every field of the Solr System - only requirement: the field must not be multivalued. Usually all the fields that SolrStore generates out of the wiki are multivalued.
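To find out which fields can be used for sorting, one could scan schema.xml for fields that are not multivalued. The Python sketch below does this under two assumptions: a missing multiValued attribute means "single-valued" (the default in newer schema versions), and defaults set on the field type are ignored. The field names "category" and "category_sort" are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented example: "category" is multivalued, "category_sort" is the
# single-valued copy used for sorting.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="category_sort" type="string" indexed="true" stored="false"/>
  </fields>
  <copyField source="category" dest="category_sort"/>
</schema>
"""


def sortable_fields(schema_xml):
    """Return the names of all <field> elements not marked multiValued."""
    root = ET.fromstring(schema_xml)
    names = []
    for field in root.iter("field"):
        # Assumption: a missing attribute means the field is single-valued.
        if field.get("multiValued", "false").lower() != "true":
            names.append(field.get("name"))
    return names


def copy_field_targets(schema_xml):
    """Return (source, dest) pairs of all <copyField> rules."""
    root = ET.fromstring(schema_xml)
    return [(c.get("source"), c.get("dest")) for c in root.iter("copyField")]
```

For the sample above, only "category_sort" is reported as sortable, and the copyField rule shows where its content comes from.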

Changing and reindexing
When you change the schema.xml, you not only have to restart Solr, but also have to rebuild the index.

Way to go:
 * 1) Stop your application server
 * 2) Change your schema.xml file
 * 3) Delete the index directory in your data directory (Stefan: in the core directory)
 * 4) Start your application server (Solr will detect that there is no existing index and make a new one)
 * 5) Re-Index your data
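Step 3 can be scripted. The following Python sketch assumes the default layout where the index lives in data/index below the core directory; check your solrconfig.xml (dataDir) before using anything like this, and only run it while the application server is stopped. The function name is made up.

```python
import shutil
from pathlib import Path


def delete_index(core_dir):
    """Delete <core_dir>/data/index; return True if something was removed."""
    # Assumption: default Solr layout, index stored under data/index.
    index_dir = Path(core_dir) / "data" / "index"
    if index_dir.is_dir():
        shutil.rmtree(index_dir)
        return True
    return False
```

A call could look like delete_index("/opt/solr/core0"), where the path is only a placeholder for your own core directory.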

Ways to reindex:
 * For SMW: Use the following two commands on a shell:
  php SMW_refreshData.php -ftpv
  php SMW_refreshData.php -v
 * See for more info.


 * Script (I don't know how to use it up to now; Simon: doesn't work with SMW): http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/
 * Modify articles and save afterwards

Misc:
 * There seems to be no problem if one quits XAMPP: the data is still there the next time one launches XAMPP (reason: the index is saved on disk)
 * In general, you need to be very careful when you change the schema without reindexing - see
 * Alternative to stopping application server: use multi-core - see

Multicore

 * Multicore means one has more than one Solr core
 * Purpose: you can have a single Solr instance with separate configurations and indexes - while having the convenience of unified administration. More info: http://wiki.apache.org/solr/CoreAdmin
 * Cores are defined in solr.xml
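With multicore, single cores can be managed over HTTP through the CoreAdmin handler (see the CoreAdmin link above), for example to reload one core after a configuration change. The Python sketch below builds such URLs; the admin path on port 8080 and the core name "core0" are assumptions about the setup, and the function name is made up. STATUS and RELOAD are actions documented on the CoreAdmin wiki page.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed CoreAdmin endpoint; the real path is configured in solr.xml.
ADMIN_CORES_URL = "http://localhost:8080/solr/admin/cores"


def core_admin_url(action, core, base=ADMIN_CORES_URL):
    """Return the CoreAdmin URL for an action such as STATUS or RELOAD."""
    return base + "?" + urlencode({"action": action, "core": core})


def reload_core(core):
    """Ask a running Solr to reload one core; returns the raw response."""
    with urllib.request.urlopen(core_admin_url("RELOAD", core)) as response:
        return response.read()
```

core_admin_url("STATUS", "core0") can be opened in a browser to check that the core exists before reloading it.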

Links

 * http://php-solr-lucene.blogspot.com/