Extension:GWToolset/Technical Design

From MediaWiki.org

Abstract

This section of the document answers questions about the project. What is the project? What is its purpose? What are the requirements?

GWToolset (or GLAMWikiToolset) is a Special Page extension. Its main goal is to give GLAMs (galleries, libraries, archives, and museums) the ability to mass-upload content (pictures, videos, and sounds) to Wikimedia Commons based on the accompanying metadata (XML); the intent is to support a wide variety of XML schemas. The extension guides the user through several steps, presented as HTML forms, to set up a batch upload process. That process uploads the content and metadata to the wiki, creating an individual mediafile page for each item uploaded.

The project, co-funded by Europeana and a few Wikimedia chapters[1], is under heavy development.

Further information can be found on the project page. Your feedback and questions are welcome; feel free to contact us.

Rationale

This section explains why we think the project is valuable and how it fits into the bigger picture.

Process

This section, often split into multiple subsections, describes how the feature is intended to work.

The current steps within the upload process are:

  1. Metadata detection
  2. Metadata mapping
  3. Batch preview
  4. Batch job creation
[Figure: GWToolset upload process]

Metadata detection

  1. indicate which element within the metadata file represents a mediafile record.
    • a mediafile record contains metadata about the digital item, such as author, date created, and a URL to the mediafile.
  2. select a MediaWiki template that will display the mediafile metadata on the mediafile page.
  3. optionally select a previously saved metadata mapping that maps the metadata fields within the metadata file to the fields in the MediaWiki template.
  4. select the metadata file stored on your local hard drive.
  5. upload the metadata file.
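To illustrate what "indicate which element represents a mediafile record" means in practice, here is a minimal Python sketch, assuming a simple XML layout in which each record's child elements hold its metadata. The function name `extract_records` and the sample XML are hypothetical; the extension itself is written in PHP and its actual parsing may differ.

```python
import xml.etree.ElementTree as ET


def extract_records(xml_text: str, record_element: str) -> list[dict[str, str]]:
    """Return one dict per mediafile record, mapping child element names to their text."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() walks the whole tree, so record elements may sit at any depth
    for element in root.iter(record_element):
        # in this simplified sketch a repeated child tag keeps only its last value
        fields = {child.tag: (child.text or "").strip() for child in element}
        records.append(fields)
    return records


sample = """<collection>
  <record>
    <author>Jan Steen</author>
    <date>1663</date>
    <url>https://example.org/images/1.jpg</url>
  </record>
  <record>
    <author>Rembrandt</author>
    <date>1642</date>
    <url>https://example.org/images/2.jpg</url>
  </record>
</collection>"""

records = extract_records(sample, "record")
print(len(records), records[0]["author"])  # → 2 Jan Steen
```

Because the user names the record element, the same code can handle different XML schemas without hard-coding their structure.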

The metadata file is uploaded to a FileBackend store; a relative reference to the stored file is placed in the subsequent HTML forms so that the extension can retrieve it as needed.
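The idea of storing the file once and passing only a relative reference between forms can be sketched as follows. This Python version substitutes a plain local directory for MediaWiki's FileBackend, so `store_metadata_file`, `load_metadata_file`, and the `metadata/` prefix are hypothetical stand-ins, not the extension's API.

```python
import uuid
from pathlib import Path


def store_metadata_file(root: Path, data: bytes, filename: str) -> str:
    """Save an uploaded metadata file under `root` and return a path relative to it."""
    # a random prefix avoids collisions; .name drops any directories in the client's filename
    relative = Path("metadata") / f"{uuid.uuid4().hex}-{Path(filename).name}"
    target = root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return relative.as_posix()  # this string is what a later form would carry


def load_metadata_file(root: Path, relative: str) -> bytes:
    """Retrieve the metadata file in a later step from its relative reference."""
    target = (root / relative).resolve()
    # the reference comes back from the browser, so refuse anything outside the storage root
    if root.resolve() not in target.parents:
        raise ValueError("reference points outside the storage root")
    return target.read_bytes()
```

Keeping only a relative reference in the forms means the file is not re-uploaded at every step, but the reference must be validated because it round-trips through the client.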

Metadata mapping

Summary

  • a summary of the information provided in the Metadata detection step.
  • a listing of all of the MediaWiki fields in the template selected in the Metadata detection step.
  • a drop-down menu next to each of those fields, listing all of the metadata elements found in the metadata file.
  • a sample mediafile record along with its metadata.

Create a mapping

  1. create a mapping of the MediaWiki template fields to the metadata record elements by selecting the corresponding metadata record element from the drop-down next to the appropriate MediaWiki template field.
    • more than one metadata record element can be related to a MediaWiki template field.
    • a metadata record element can be related to many MediaWiki template fields.
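The many-to-many relationship described above can be shown with a small Python sketch: each template field maps to a list of metadata elements, and the same element may appear under several fields. The function names, the sample record, and the choice to join multiple values with a single space are assumptions for illustration; `Artwork` is used only as an example template name.

```python
def apply_mapping(record: dict[str, str], mapping: dict[str, list[str]]) -> dict[str, str]:
    """Fill each template field from one or more metadata elements of a record."""
    filled = {}
    for template_field, elements in mapping.items():
        # several elements may feed one field; non-empty values are joined with a space (an assumption)
        values = [record[e] for e in elements if record.get(e)]
        filled[template_field] = " ".join(values)
    return filled


def to_wikitext(template: str, fields: dict[str, str]) -> str:
    """Render the filled fields as a MediaWiki template call."""
    lines = [f"{{{{{template}"]  # produces "{{" followed by the template name
    lines += [f"| {name} = {value}" for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


record = {"creator": "Jan Steen", "title": "The Merry Family", "date": "1668"}
mapping = {
    "artist": ["creator"],
    "title": ["title"],
    "description": ["title", "date"],  # two elements feed one field
    "date": ["date"],                   # "date" also feeds "description"
}
print(to_wikitext("Artwork", apply_mapping(record, mapping)))
```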

Global categories

  1. optionally add global categories to the upload
    • global categories are applied to all mediafile records in the metadata file
    • more than one global category can be applied
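Since global categories are identical for every record, they can be appended to each generated page unchanged. The sketch below uses standard MediaWiki category-link syntax (`[[Category:Name]]`); the function name is hypothetical and the extension's actual placement of the links may differ.

```python
def add_global_categories(wikitext: str, categories: list[str]) -> str:
    """Append the same category links to a page's wikitext for every record."""
    # blank entries (e.g. empty form inputs) are ignored
    links = [f"[[Category:{name.strip()}]]" for name in categories if name.strip()]
    return wikitext + ("\n" + "\n".join(links) if links else "")
```

For example, `add_global_categories("{{Artwork}}", ["Paintings", "Rijksmuseum"])` appends two category lines to the page text.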

Item-specific categories

  1. optionally add item-specific categories to the upload
    • these are applied to each mediafile record, but use item-specific information. For example, if the author field is selected in the drop-down, each record's own author value is used.
    • the phrase field lets you prefix the metadata value with text such as “created by”, which could be paired with the drop-down field author to give each record a category like “created by <author value>”.
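Item-specific categories combine an optional phrase with a per-record metadata value. A minimal Python sketch, assuming the phrase and value are joined by a single space and that records lacking a value simply get no category (both assumptions; `item_categories` is a hypothetical name):

```python
def item_categories(record: dict[str, str], rules: list[tuple[str, str]]) -> list[str]:
    """Build per-record category links from (phrase, metadata field) pairs."""
    links = []
    for phrase, field in rules:
        value = record.get(field, "").strip()
        if not value:
            continue  # skip records that lack a value for this field
        name = f"{phrase} {value}".strip()  # an empty phrase yields just the field value
        links.append(f"[[Category:{name}]]")
    return links
```

With the rule `("Created by", "author")`, a record whose author is Jan Steen yields `[[Category:Created by Jan Steen]]`, while each other record gets its own author's category.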

Summary

  1. optionally provide a summary message that gives an overview of why you are uploading this metadata file and all of its records.

Batch preview

Uploads the first 3 mediafiles described in the metadata file and creates their mediafile pages, so that the mapping can be checked before the full batch runs.

  1. you can preview the results of the mapping
  2. you can go back to the mapping step and make any necessary changes.
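Selecting the preview records only requires the first few entries, so it need not read the whole metadata file. A small Python sketch using `itertools.islice`; the function and constant names are hypothetical, and the count of 3 comes from the description above.

```python
from itertools import islice

PREVIEW_COUNT = 3  # the batch preview uses the first 3 records


def preview_records(records, count=PREVIEW_COUNT):
    """Return at most `count` records from the start of any iterable of records."""
    # islice stops after `count` items, so a lazy record iterator is not exhausted
    return list(islice(records, count))
```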

Batch job creation

If the batch preview looks good, go ahead and create the batch job. This step creates the following background jobs:

UploadMetadataJob

The UploadMetadataJob cycles through the records in the metadata file uploaded in Step 1: Metadata Detection and creates an individual UploadMediafileJob for each record. Depending on the throttles and limits described below, the UploadMetadataJob re-creates itself as needed in order to process all of the metadata records.
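The self-re-creating behaviour can be pictured as processing the records in chunks and recording where the next instance should resume. This Python sketch assumes an offset-based resume point and a fixed number of records per run; `run_metadata_job`, `MetadataJobResult`, `record_index`, and `per_run` are hypothetical names, and how the real job tracks its position is not stated in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataJobResult:
    mediafile_jobs: list[dict]   # one payload per record handled in this run
    next_offset: Optional[int]   # where a re-created job should resume, or None when done


def run_metadata_job(records: list[dict], offset: int, per_run: int) -> MetadataJobResult:
    """Handle one chunk of records starting at `offset`, then report where to resume."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")
    chunk = records[offset:offset + per_run]
    jobs = [{"record_index": offset + i, "record": r} for i, r in enumerate(chunk)]
    end = offset + len(chunk)
    return MetadataJobResult(jobs, end if end < len(records) else None)


# simulate successive UploadMetadataJob instances until every record is handled
records = [{"title": f"Item {n}"} for n in range(25)]
offset, runs = 0, 0
while offset is not None:
    result = run_metadata_job(records, offset, per_run=10)
    runs += 1
    offset = result.next_offset
print(runs)  # → 3
```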

Throttles, limits, and delays
  • a metadata job delay.
    • the intent of this delay is to space out UploadMetadataJob runs when possible.
    • this delay only works when UploadMetadataJobs are added to a job queue that honours delays, e.g. JobQueueRedis; the “regular” JobQueue does not currently honour delays.
    • GWToolset\Config::$metadata_job_delay, default is 1 minute.
  • the total number of UploadMediafileJobs added to the job queue during an UploadMetadataJob run.
    • the intent of this throttle is to limit the number of mediafile requests against a given mediafile server.
    • this can also be set by the user in Step 1: Metadata Detection.
    • GWToolset\Config::$mediafile_job_throttle_default, default is 10.
    • GWToolset\Config::$mediafile_job_throttle_min, default is 1.
    • GWToolset\Config::$mediafile_job_throttle_max, default is 20.
  • the total number of UploadMediafileJobs allowed in the job queue.
    • the intent of this throttle is to make sure the extension does not flood the job queue.
    • GWToolset\Config::$mediafile_job_queue_max, default is 1000.
    • if that limit is reached, the UploadMetadataJob will create another instance of itself and attempt to add the UploadMediafileJobs when that new instance is called.
      • if delayedJobsEnabled() is true, the new instance will also contain a jobReleaseTimestamp determined by GWToolset\Config::$metadata_job_attempt_delay, default is 5 minutes.
      • re-creating the UploadMetadataJob because GWToolset\Config::$mediafile_job_queue_max has been reached is attempted only a limited number of times, set by GWToolset\Config::$metadata_job_max_attempts (default is 10). If this limit is exceeded, the extension gives up on adding the UploadMediafileJobs and throws an Exception.
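The throttle and retry rules above can be condensed into two small decisions. In this Python sketch the defaults mirror the GWToolset\Config values listed above (converted to seconds), but `ThrottleConfig`, `effective_throttle`, and `plan_attempt` are hypothetical, `RuntimeError` stands in for the extension's Exception, and treating `attempt` as the number of re-creations made so far is an assumption about how the limit is counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThrottleConfig:
    # defaults mirror the GWToolset\Config values described above
    metadata_job_delay: int = 60            # seconds
    mediafile_job_throttle_default: int = 10
    mediafile_job_throttle_min: int = 1
    mediafile_job_throttle_max: int = 20
    mediafile_job_queue_max: int = 1000
    metadata_job_attempt_delay: int = 300   # seconds
    metadata_job_max_attempts: int = 10


def effective_throttle(user_value: Optional[int], cfg: ThrottleConfig) -> int:
    """Clamp the user's throttle from Step 1 into [min, max], falling back to the default."""
    value = cfg.mediafile_job_throttle_default if user_value is None else user_value
    return max(cfg.mediafile_job_throttle_min, min(value, cfg.mediafile_job_throttle_max))


def plan_attempt(queued: int, attempt: int, delayed_jobs: bool, now: int,
                 cfg: ThrottleConfig) -> dict:
    """Decide whether to add UploadMediafileJobs now or re-create the metadata job later."""
    if queued < cfg.mediafile_job_queue_max:
        return {"action": "add_jobs"}
    if attempt >= cfg.metadata_job_max_attempts:
        raise RuntimeError("job queue still full after the maximum number of attempts")
    plan = {"action": "retry", "attempt": attempt + 1}
    if delayed_jobs:
        # only queues that honour delays respect a release timestamp
        plan["release_timestamp"] = now + cfg.metadata_job_attempt_delay
    return plan
```

Clamping the user value keeps a single batch from issuing more than the configured maximum of requests against one mediafile server per run, whatever was entered in the form.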

UploadMediafileJob

The UploadMediafileJobs contain all of the information entered in Step 2: Metadata Mapping:

  • the MediaWiki template to map to
  • the metadata mapping
  • specific record information
  • any global and item-specific categories that may have been added
  • a summary message if entered
  • whether or not to re-upload the mediafile
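The list above can be read as the parameter set of a single job. A hypothetical Python data structure holding those items, serialised to a plain dict as a job queue would typically require; the class and field names and the default of not re-uploading are assumptions, not the extension's actual job parameters.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class UploadMediafileJobParams:
    """Hypothetical shape of the data carried by one UploadMediafileJob."""
    template: str                              # MediaWiki template to map to
    mapping: dict[str, list[str]]              # template field -> metadata elements
    record: dict[str, str]                     # this record's metadata values
    global_categories: list[str] = field(default_factory=list)
    item_categories: list[tuple[str, str]] = field(default_factory=list)  # (phrase, metadata field)
    summary: str = ""                          # optional upload summary message
    reupload: bool = False                     # whether to re-upload the mediafile


params = UploadMediafileJobParams(
    template="Artwork",
    mapping={"artist": ["creator"]},
    record={"creator": "Jan Steen"},
    global_categories=["Rijksmuseum"],
)
payload = asdict(params)  # plain dict suitable for storing with a queued job
```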

FileBackendCleanupJob

The UploadMetadataJob will continue to create another instance of itself as long as there are more metadata records to process. When it finishes cycling through all of the metadata records and has created the last UploadMediafileJob, it will create a FileBackendCleanupJob that will delete the FileBackend metadata file that was originally uploaded in Step 1: Metadata Detection.
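At the end of each run the UploadMetadataJob either re-creates itself or, once all records are handled, schedules the cleanup. A minimal Python sketch of that choice, assuming the resume point is tracked as an offset that becomes `None` when no records remain; `follow_up_job` and the dict-shaped job descriptions are hypothetical.

```python
from typing import Optional


def follow_up_job(next_offset: Optional[int], metadata_file_ref: str) -> dict:
    """After one UploadMetadataJob run, choose the job that should be queued next."""
    if next_offset is not None:
        # records remain: re-create the metadata job so it resumes where this run stopped
        return {"type": "UploadMetadataJob", "offset": next_offset, "file": metadata_file_ref}
    # all records processed: delete the uploaded metadata file from the FileBackend store
    return {"type": "FileBackendCleanupJob", "file": metadata_file_ref}
```

Scheduling the cleanup only after the last UploadMediafileJob has been created ensures that no pending metadata job still needs the stored file.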

Gallery and Assets

These are images that are essential to understand the project. Mockups, screenshots, and icons fall into this category.

See also

References

  1. Wikimedia UK, Wikimedia Netherlands, Wikimedia France, Wikimedia Switzerland