Extension:GeoData

From MediaWiki.org
GeoData

Release status: experimental

Implementation: API, parser function
Description: Adds geographical coordinates storage and retrieval functionality
Author(s): Max Semenik (MaxSem)
MediaWiki: 1.19+
Database changes: yes
License: WTFPL 2.0
Hooks used: LoadExtensionSchemaUpdates, ParserFirstCallInit, UnitTestsList, ArticleDeleteComplete, LinksUpdate


The GeoData extension allows articles to specify their geographical coordinates and publishes these coordinates via the HTTP API.

Usage

This extension adds a new parser function, {{#coordinates:}}, which saves coordinates to the database. Its input format is designed to be as compatible as possible with GeoHack.

Glossary

  • Coordinates - the latitude and longitude of a point on a globe.
  • Globe - the celestial body on which the coordinates lie. By default, Earth is assumed. Internally, globes are represented as lowercase strings. The following globes are supported: earth, mercury, venus, moon, mars, phobos, deimos, ganymede, callisto, io, europa, mimas, enceladus, tethys, dione, rhea, titan, hyperion, iapetus, phoebe, miranda, ariel, umbriel, titania, oberon, triton and pluto. Globes not in this list are assumed to have generic characteristics: a longitude range of 0-360°, with Eastern longitudes positive. The longitude sign for known globes follows the IAU conventions.
  • dim - the approximate size of an object. GeoData uses it to restrict searches, and GeoHack uses it to determine an appropriate map zoom. The default unit is metres, but the km suffix may be appended to indicate kilometres.
  • Primary vs. secondary coordinates: primary coordinates define the location of the article's subject, while secondary coordinates are other coordinates mentioned in the article. An article can have only one primary coordinate, but as many secondary ones as needed, subject to technical limits.

Parser function

Function format:

{{#coordinates:latitude|longitude|[primary|][GeoHack parameters|][extra parameters]}}
Empty parameters (e.g. || or | |) are always ignored.
  • latitude and longitude can be specified in several formats:
    • Direct signed input in degrees, e.g. 37.786971|-122.399677, which corresponds to 37° 47′ 13.1″ N, 122° 23′ 58.84″ W.
    • As a number formatted in the content language; use {{formatnum:}} to format a number or the result of an expression.
    • Degrees/minutes or degrees/minutes/seconds, e.g. 37|47.2183|-122|23.9807 or 37|47|13.1|-122|23|58.84.
    • Either of the above, but with sign specified by N/E/S/W letters:
      37.786971|N|122.399677|W
      37|47.2183|N|122|23.9807|W
      37|47|13.1|N|122|23|58.84|W
Use either a negative sign or N/E/S/W letters, but not both.
  • The primary keyword marks these coordinates as primary (see Glossary).
  • Extra parameters are any combination of the following named parameters:
    • dim: approximate size of the object.
    • scale: scale of the map display for this object; e.g. a scale of 300 means 1:300. It is converted internally into dim using the formula dim = scale / 10. If both scale and dim are set, dim takes precedence.
    • globe: see Glossary.
    • name: name of this point, up to 255 bytes (UTF-8).
    • region: ISO 3166-1 alpha-2 country code (e.g. US or RU) or ISO 3166-2 region code (e.g. US-FL or RU-MOS). This parameter is always capitalised internally.
    • type: type of the object at these coordinates; one of the following: country, satellite, state, adm1st, adm2nd, adm3rd, city, isle, mountain, river, waterbody, event, forest, glacier, airport, railwaystation, edu, pass, camera, landmark. The default dim for each type is:
      type | Description | Default dim (metres)
      country | countries (e.g. "type:country") | 1,000,000
      satellite | geostationary satellites | 1,000,000
      adm1st | administrative unit of a country, 1st level (province, state), e.g. U.S. states | 1,000,000
      adm2nd | administrative unit of a country, 2nd level, e.g. U.S. counties | 30,000
      adm3rd | administrative unit of a country, 3rd level | 10,000
      city(pop) | cities, towns, villages, hamlets, suburbs, subdivisions, neighborhoods, and other human settlements (including unincorporated and/or abandoned ones) with known population; the optional population in parentheses is ignored | 10,000
      airport | airports and airbases | 3,000
      mountain | peaks, mountain ranges, hills, submerged reefs, and seamounts | 10,000
      isle | islands and isles | 10,000
      waterbody | bays, fjords, lakes, reservoirs, ponds, lochs, loughs, meres, lagoons, estuaries, inland seas, and waterfalls | 10,000
      forest | forests and woodlands | 5,000
      river | rivers, canals, creeks, brooks, and streams, including intermittent ones | 10,000
      glacier | glaciers and icecaps | 5,000
      event | one-time or regular events and incidents that occurred at a specific location, including battles, earthquakes, festivals, and shipwrecks | 5,000
      edu | schools, colleges, and universities | 1,000
      pass | mountain passes | 1,000
      railwaystation | stations, stops, and maintenance areas of railways and trains, including railroad, metro, rapid transit, underground, subway, elevated railway, etc. | 1,000
      landmark | buildings (including churches, factories, museums, theatres, and power plants but excluding schools and railway stations), caves, cemeteries, cultural landmarks, geologic faults, headlands, intersections, mines, ranches, roads, structures (including antennas, bridges, castles, dams, lighthouses, monuments, and stadiums), tourist attractions, valleys, and other points of interest | 1,000
      (default) | no type given, or a type unknown to this extension | 1,000
  • GeoHack parameters: one or more parameter:value pairs delimited by underscores (_) or spaces (e.g. dim:1000_type:city). No spaces are allowed between the parameter and the colon or between the colon and the value. The parameters are the same as the extra parameters above. If a parameter appears both among the GeoHack parameters and among the extra parameters, the extra parameter always takes precedence. This input is needed only for compatibility with preexisting {{coord}} templates; if your wiki is only now designing a geographical coordinates template, it is best not to use raw GeoHack parameters at all.
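Every latitude/longitude format listed above reduces to signed decimal degrees. The Python sketch below illustrates that conversion together with the rule against combining a negative sign with a hemisphere letter. It is not GeoData's own (PHP) parser; the function name to_decimal and its error messages are assumptions made for illustration.

```python
def to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere=None):
    """Convert degrees[/minutes[/seconds]] to signed decimal degrees.

    Illustrative sketch, not GeoData's code. `hemisphere` is one of
    'N', 'E', 'S', 'W' or None.
    """
    # Minutes and seconds must be proper sexagesimal parts
    # (e.g. 76|61|03 is out of range because of the 61 minutes).
    for part in (minutes, seconds):
        if not 0 <= part < 60:
            raise ValueError("minutes and seconds must be in [0, 60)")
    sign = -1 if degrees < 0 else 1  # note: a numeric -0 reads as positive here
    if hemisphere is not None:
        letter = hemisphere.upper()
        if letter not in ("N", "E", "S", "W"):
            raise ValueError(f"unknown hemisphere letter: {hemisphere!r}")
        # A negative sign and a hemisphere letter must not be combined.
        if degrees < 0:
            raise ValueError("use either a negative sign or N/E/S/W, not both")
        sign = -1 if letter in ("S", "W") else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


# 37|47|13.1|N|122|23|58.84|W from the list above:
lat = to_decimal(37, 47, 13.1, "N")
lon = to_decimal(122, 23, 58.84, "W")
print(round(lat, 6), round(lon, 6))  # 37.786972 -122.399678
```

The tiny difference from 37.786971|-122.399677 comes from the seconds having been rounded to one or two decimals in the degree/minute/second form.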

Examples

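Note how the two kinds of parameters are specified: GeoHack parameters as parameter:value pairs joined by underscores, and extra parameters in the parameter=value form: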

{{#coordinates:primary|40.775114|-73.968802|type:landmark_region:US-NY|name=Loeb Central Park Boathouse}}
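In this example, type:landmark_region:US-NY is a GeoHack parameter string and name=... is an extra parameter. The Python sketch below shows how such input could be merged and turned into a dim value, following the rules stated above: extra parameters override GeoHack parameters, dim overrides scale, scale is converted with dim = scale / 10, and otherwise the default dim for the type from the table applies, falling back to 1,000. The names TYPE_TO_DIM, parse_geohack, parse_dim and resolve_dim are illustrative assumptions, not GeoData's API or settings.

```python
import re

# Default dim per type in metres, copied from the type table above.
# TYPE_TO_DIM and DEFAULT_DIM are illustrative names, not GeoData settings.
TYPE_TO_DIM = {
    "country": 1_000_000, "satellite": 1_000_000, "adm1st": 1_000_000,
    "adm2nd": 30_000, "adm3rd": 10_000, "city": 10_000, "airport": 3_000,
    "mountain": 10_000, "isle": 10_000, "waterbody": 10_000,
    "forest": 5_000, "river": 10_000, "glacier": 5_000, "event": 5_000,
    "edu": 1_000, "pass": 1_000, "railwaystation": 1_000, "landmark": 1_000,
}
DEFAULT_DIM = 1_000


def parse_geohack(text):
    """Turn 'type:landmark_region:US-NY' into a dict of parameters.

    Pairs are delimited by underscores or spaces; pieces without a
    'parameter:value' shape are skipped.
    """
    params = {}
    for pair in re.split(r"[_ ]+", text.strip()):
        key, sep, value = pair.partition(":")
        if sep and key and value:
            params[key] = value
    return params


def parse_dim(text):
    """Parse a dim value in metres; a 'km' suffix means kilometres."""
    text = text.strip().lower()
    if text.endswith("km"):
        return float(text[:-2]) * 1000
    return float(text)


def resolve_dim(geohack="", **extra):
    """Return dim in metres, applying the precedence rules above."""
    params = parse_geohack(geohack)
    params.update(extra)  # extra parameters win over GeoHack parameters
    if "dim" in params:  # an explicit dim beats scale
        return parse_dim(str(params["dim"]))
    if "scale" in params:  # dim = scale / 10
        return float(params["scale"]) / 10
    # 'city(123000)' -> 'city'; the population is ignored
    type_name = params.get("type", "").split("(")[0].lower()
    return float(TYPE_TO_DIM.get(type_name, DEFAULT_DIM))


print(resolve_dim("type:landmark_region:US-NY",
                  name="Loeb Central Park Boathouse"))  # 1000.0
```

For the boathouse example, no dim or scale is given, so the landmark type decides the result.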

Embedding in templates

Error conditions

GeoData checks the data it receives for a number of error conditions.

The following conditions cause the coordinates to be rejected outright and the page to be added to a tracking category (its name is defined by MediaWiki:Geodata-broken-tags-category):

  • Coordinates out of range:
{{#coordinates:56|04|N|190|00|E}}
{{#coordinates:76|61|03|N|37|25|30|W}} 
  • Mixing coordinate signs and hemisphere letters:
{{#coordinates:primary|-26|04|N|178|46|E}}
{{#coordinates:primary|26.16|N|-178.76|E}} 
  • More than one primary coordinate on page:
{{#coordinates:primary|26|04|N|178|46|E}}{{#coordinates:primary|26|04|N|178|46|E}}
  • Too many coordinates on a page: more than 500 by default ($wgMaxCoordinatesPerPage), 2000 on Wikimedia sites.
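The page-level checks above can be modelled as a validation pass over the coordinates found on one page. This Python sketch is illustrative, not GeoData's code: it assumes Earth's ranges (latitude -90 to 90, longitude -180 to 180), uses a hypothetical Coord record, and takes the default limit of 500 from $wgMaxCoordinatesPerPage. The sign-versus-letter check is left out because it applies while each individual value is parsed.

```python
from dataclasses import dataclass


@dataclass
class Coord:
    """One parsed {{#coordinates:}} tag (hypothetical record type)."""
    lat: float
    lon: float
    primary: bool = False


def page_errors(coords, max_per_page=500):
    """Return messages for the fatal conditions listed above.

    Assumes Earth's ranges: latitude -90..90, longitude -180..180.
    max_per_page mirrors $wgMaxCoordinatesPerPage; -1 means no limit.
    """
    errors = []
    for c in coords:
        if not (-90 <= c.lat <= 90 and -180 <= c.lon <= 180):
            errors.append(f"coordinates out of range: {c.lat}|{c.lon}")
    if sum(c.primary for c in coords) > 1:
        errors.append("more than one primary coordinate on page")
    if max_per_page != -1 and len(coords) > max_per_page:
        errors.append(f"too many coordinates on page: {len(coords)}")
    return errors


# 56|04|N|190|00|E: longitude 190 is outside -180..180
print(page_errors([Coord(56 + 4 / 60, 190.0)]))
```

Collecting all messages instead of stopping at the first one lets a single pass report every problem on the page.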

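The following errors are non-fatal by default (with the default $wgGeoDataWarningLevel, an unknown type only adds the page to a tracking category):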

  • Unrecognised coordinate type:
{{#coordinates:primary|26|04|N|178|46|E|type=New York}}
{{#coordinates:primary|26|04|N|178|46|E|type:village}}

API

GeoData provides two API modules: one searches for pages around a given point, and the other returns the coordinates of the given page(s).

list=geosearch

Searches for articles around the given point (determined either by coordinates or by article name).

Parameters:

gscoord
Coordinate around which to search: two floating-point values (latitude and longitude) separated by a pipe (|)
gspage
Title of page around which to search
gsradius
Search radius in meters (10-10000). This parameter is required.
gsmaxdim
Restrict search to objects no larger than this, in meters
gslimit
Maximum number of pages to return. No more than 500 (5000 for bots) allowed. Default: 10. This parameter will not work as expected until an internal indexing problem is fixed (Bugzilla 49893). Until then, GeoData returns a variable number of results that is always less than gslimit (and can be as little as 10% of gslimit); request more results than you need and trim the list in subsequent processing.
gsglobe
Globe to search on (by default earth).
gsnamespace
Namespace(s) to search. Default: main namespace.
gsprop
What additional coordinate properties to return. Values (separate with '|'): type, name, country, region.
gsprimary
Whether to return only primary coordinates (yes), secondary (no) or both (yes|no). Default: yes.

Example:

<?xml version="1.0"?>
<api>
  <query>
    <geosearch>
      <gs pageid="286442" ns="0" title="Wikimedia Foundation" lat="37.787" lon="-122.4" dist="0.3" primary="" />
      <gs pageid="283855" ns="0" title="Chinatown, San Francisco" lat="37.7947" lon="-122.407" dist="1087.3" primary="" />
      <gs pageid="167267" ns="0" title="City Lights Bookstore" lat="37.7976" lon="-122.407" dist="1331" primary="" />
      <gs pageid="258568" ns="0" title="Asian Art Museum of San Francisco" lat="37.7803" lon="-122.417" dist="1661.5" primary="" />
      <gs pageid="67167" ns="0" title="North Beach, San Francisco" lat="37.8003" lon="-122.41" dist="1745.3" primary="" />
      <gs pageid="250933" ns="0" title="Coit Tower" lat="37.8024" lon="-122.406" dist="1797.9" primary="" />
      <gs pageid="219250" ns="0" title="Roman Catholic Archdiocese of San Francisco" lat="37.7856" lon="-122.424" dist="2157.6" primary="" />
      <gs pageid="268254" ns="0" title="San Francisco LGBT Community Center" lat="37.7718" lon="-122.424" dist="2729.9" primary="" />
      <gs pageid="130498" ns="0" title="USS Pampanito (SS-383)" lat="37.8099" lon="-122.416" dist="2942.4" primary="" />
      <gs pageid="263996" ns="0" title="The Fillmore" lat="37.7841" lon="-122.433" dist="2956.9" primary="" />
    </geosearch>
  </query>
</api>
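A client for list=geosearch only needs to send the parameters listed above and read the gs elements of the response. The Python sketch below uses only the standard library. The endpoint URL, the helper names geosearch_url and nearest_titles, and the over-fetch factor of 10 are assumptions; because of the gslimit problem described above, the sketch requests more results than needed and trims them locally.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint; substitute your own wiki's api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def geosearch_url(lat, lon, radius, wanted, overfetch=10):
    """Build a list=geosearch request URL.

    Because gslimit is unreliable (see above), ask for `overfetch`
    times more results than wanted, capped at 500.
    """
    if not 10 <= radius <= 10000:
        raise ValueError("gsradius must be between 10 and 10000 metres")
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": radius,
        "gslimit": min(wanted * overfetch, 500),
        "format": "xml",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def nearest_titles(xml_text, wanted):
    """Return the titles of the `wanted` closest <gs> results."""
    root = ET.fromstring(xml_text)
    hits = sorted(
        (float(gs.get("dist")), gs.get("title")) for gs in root.iter("gs")
    )
    return [title for _, title in hits[:wanted]]


print(geosearch_url(37.786971, -122.399677, 10000, 3))
# https://en.wikipedia.org/w/api.php?action=query&list=geosearch&gscoord=37.786971%7C-122.399677&gsradius=10000&gslimit=30&format=xml
```

Fetching the URL (for example with urllib.request.urlopen) is left out so that the sketch runs offline; nearest_titles can be applied to a response body such as the example output above. Sorting by the dist attribute locally keeps the result correct even if the server returns fewer or unordered entries.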

prop=coordinates

Returns coordinates of the given page(s)

Parameters:

colimit
How many coordinates to return.
cocontinue
When more results are available, use this to continue.
coprop
What additional coordinate properties to return. Values (separate with '|'): type, name, dim, country, region.
coprimary
Whether to return only primary coordinates (primary), secondary (secondary) or both (all). Default: primary.

Example:

<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="286442" ns="0" title="Wikimedia Foundation">
        <coordinates>
          <co lat="37.787" lon="-122.4" primary="" />
        </coordinates>
      </page>
    </pages>
  </query>
</api>
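When the requested pages carry more coordinates than colimit allows, the client has to follow continuation (cocontinue). The Python sketch below assumes the JSON continuation style of newer MediaWiki versions, where a top-level continue object is merged into the next request (older versions used query-continue instead). fetch is a hypothetical callable that performs one API request and returns the decoded JSON; coordinates_batches is likewise an illustrative name.

```python
from typing import Callable, Dict, Iterator


def coordinates_batches(
    fetch: Callable[[Dict[str, str]], dict], titles: str
) -> Iterator[dict]:
    """Yield the 'pages' object of each prop=coordinates response.

    `fetch` is a hypothetical callable that sends one API request with the
    given parameters and returns the decoded JSON response. Continuation
    follows the newer MediaWiki style: the top-level 'continue' object
    (which carries cocontinue) is merged into the next request.
    """
    params = {
        "action": "query",
        "prop": "coordinates",
        "titles": titles,
        "coprimary": "all",  # primary and secondary coordinates
        "colimit": "max",
        "format": "json",
    }
    cont: Dict[str, str] = {}
    while True:
        result = fetch({**params, **cont})
        yield result.get("query", {}).get("pages", {})
        if "continue" not in result:
            break  # no more results
        cont = result["continue"]
```

The same page can appear in several batches, each carrying part of its coordinates, so a caller should merge the coordinates lists by page ID rather than overwrite them.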

Enumerating pages with or without coordinates

This functionality is not enabled on Wikimedia sites yet

GeoData extends two core API modules, list=allpages and list=categorymembers. The extended modules are called geopages and geopagesincategory. It adds two mutually exclusive parameters, withcoordinates and withoutcoordinates.

Installation

Currently, only MySQL 5 or later is supported.
  • Download and extract the files in a directory called GeoData in your extensions/ folder. If you're a developer and this extension is in a Git repository, then instead you should clone the repository.
  • Add the following code at the bottom of your LocalSettings.php:
require_once( "$IP/extensions/GeoData/GeoData.php" );
  • Run the update script (php maintenance/update.php), which automatically creates the database tables this extension needs.
  • Done! Navigate to "Special:Version" on your wiki to verify that the extension is successfully installed.

Running GeoData with Solr

  1. Install Extension:Solarium, which contains a shared Solr client library. The order in which the two extensions are loaded in LocalSettings.php does not matter.
  2. Install Solr. GeoData has been tested only with Solr 3.6.0, although 3.3 or later, or even 4.0, might also work.
  3. Copy solr/schema.xml to the configuration directory of the desired Solr core. If you have only one core and don't use it for anything else, just use collection1; with Debian-packaged Solr, its configuration directory is /etc/solr/conf.
  4. Set $wgGeoDataBackend = 'solr'; in LocalSettings.php.
  5. Set $wgGeoDataSolrHosts and $wgGeoDataSolrMaster in LocalSettings.php to your server hostname(s)/IP(s).
  6. Decide how you will run Solr updates:
    • Run php solrupdate.php from a cron job. This is perhaps the most reliable and resource-efficient option for small wikis, but updates reach the index more slowly.
    • Update via the job queue. This is more resource-intensive, but it can update the Solr index almost instantly. Set $wgGeoDataUpdatesViaJob = true; in LocalSettings.php.
  7. Set up a cron job to periodically purge the killlist: php solrupdate.php --clear-killist <number of days>. Once per day should be enough. <number of days> is the minimum age of the killlist entries to be purged; choose a value large enough that no data is lost if, for example, cron keeps running during a temporary Solr outage.
  8. If it's not a fresh GeoData installation, run php solrupdate.php to make an initial import.

Configuration

$wgMaxGeoSearchRadius
int, default 10000. Maximum radius for geospatial searches, in meters. Reducing this value reduces server load.
$wgDefaultGlobe
string, default 'earth'. Default globe if none is specified in {{#coordinates:}}.
$wgMaxCoordinatesPerPage
int, default 500. Maximum number of coordinates per page; -1 means no limit.
$wgTypeToDim
array, default: long array, see the sources. Conversion table from type to dim.
$wgDefaultDim
int, default 1000. Default value of dim if it is unknown.
$wgGlobes
array, default: long array, see the sources. Defines the parameters of every globe.
$wgGeoDataWarningLevel
array, default:
array(
	'unknown type' => 'track',
	'unknown globe' => 'none',
	'invalid region' => 'track',
)
Controls what GeoData does when it encounters a problem. Reaction types:
  • track - add a tracking category
  • fail - consider the tag invalid, display a message and add a tracking category
  • none - do nothing
$wgGeoDataIndexGranularity
int, default 10. How many integer units per degree to use with database-only search; influences performance. Run updateIndexGranularity.php after changing this setting.
$wgGeoDataBackend
string, default 'db'. Which backend spatial searches should use: 'db' or 'solr'. If you plan to change it, do so before creating the database tables.
$wgGeoDataSphinxIndex
string, default 'geodata'. Sphinx index name.
$wgGeoDataSolrOptions
array, default:
array(
	'adapteroptions' => array(
		//'host' => '127.0.0.1',
		'port' => 8983,
		'path' => '/solr/',
	),
)
Generic Solr connection options; see the Solarium documentation. Note: the host must be set in $wgGeoDataSolrHosts rather than here, to allow load balancing.
$wgGeoDataSolrHosts
string or array, default 'localhost'. List of Solr hosts: a string if there is only one server, otherwise array( 'host1' => weight1, 'host2' => weight2, ... ).
$wgGeoDataSolrMaster
string, default 'localhost'. Solr master used for updates.
$wgGeoDataUpdatesViaJob
bool, default false. Whether the search index should be updated via jobs. Supported only for Solr.