User:Khorn (WMF)/Plantdata.io

Page for coordinating the project for the Vienna Hackathon 2017, and beyond.

Main Idea
This project would allow people with any amount of gardening experience to find plants that would happily grow in whatever space they have available. The user would place a small sensor package in their growing space and let it collect environmental data for at least a day. After that, the user would plug the sensor into their computer, and the application would use the collected data to find a selection of plants (ideally using Wikidata) that would happily grow where the sensor was placed.

At the Vienna hackathon, we focused on the Wikidata aspects of this project. Most of our time was occupied by research.

Determine what data we are likely to be able to collect with the sensor package
We will not be building these at the hackathon. However, we need an idea of the inputs that will be going into the initial queries.

Initial inputs:
 * Temperature ranges (though this is not as simple as it may seem)
   * This could be in the form of seed germination temperature, or low/high tolerance.
   * USDA hardiness zones (which we are generally familiar with) are based on the severity of winters (the average annual extreme minimum temperature) rather than other factors, but they are still essentially temperature data that we could use. However, we would prefer not to make blanket geographical assumptions, because microclimates, urban heat islands, and other anomalies exist everywhere. Also, we want this to stay useful without re-doing anything if the ice caps melt and everything gets wild. Plants will still like what they like, so let's base the system on that.
 * Sunlight requirements.
   * For food plants, this is frequently expressed in hours of direct sunlight. Beyond that, it gets annoying, because requirements tend to be expressed qualitatively, as in "full/part sun", "full/part shade", "afternoon sun" or some such. We may have to put numbers around these common categories.
 * Humidity
   * Soil humidity:
     * Watering requirements
     * Drainage requirements
   * Air humidity
 * pH
 * Soil type:
   * Sandy, clay, whatever?
   * Maybe offer a few soil types in a dropdown, then present the plants that will just work in that soil, followed by everything else along with the required amendments.
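Since sunlight requirements are mostly published as categories, one option is to map each category to a range of measured hours of direct sun per day, so sensor data can be compared against them. A minimal sketch, assuming the common gardening rule of thumb of 6+ hours for full sun, 3 to 6 hours for part sun/part shade, and under 3 hours for full shade; the thresholds, category names, and `classify_sunlight` are all assumptions, not settled definitions.

```python
# Assumed rule-of-thumb mapping from qualitative sunlight categories to
# hours of direct sunlight per day, as (minimum inclusive, maximum exclusive).
SUNLIGHT_CATEGORIES = {
    "full shade": (0.0, 3.0),
    "part shade / part sun": (3.0, 6.0),
    "full sun": (6.0, 24.0),
}


def classify_sunlight(direct_sun_hours: float) -> str:
    """Return the category whose hour range contains the measured value."""
    if not 0.0 <= direct_sun_hours <= 24.0:
        raise ValueError("hours of direct sunlight must be between 0 and 24")
    for category, (low, high) in SUNLIGHT_CATEGORIES.items():
        if low <= direct_sun_hours < high:
            return category
    # Only exactly 24 hours falls through the half-open ranges above.
    return "full sun"
```

For example, a spot that logged 4.5 hours of direct sun would be classified as "part shade / part sun". Keeping the thresholds in one table makes it easy to swap them out if the community settles on different numbers.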

Determine what horticultural data currently exists within Wikidata
Can you currently look up plant items by environmental attributes?

...probably not.

Wikidata query: Instance of any subclass of plant with taxon rank species.

On the first day of the hackathon, there were apparently 104 results that were filled out enough to make this initial cut.

None of the items returned appear to contain environmental data, growth habits, or anything else we need.

However, it does give us a list of plant species that have related Unicode characters.

Perhaps more seriously: a visual display of plants with associated Commons images. Select Display -> Image Grid at the top of the results pane for maximum effect.

BUT WAIT! We were searching wrong!
Apparently we need to use P171 (parent taxon) to get all the plants. So, start with plant and go down recursively...
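A sketch of the recursive approach, run against the public Wikidata SPARQL endpoint. It assumes plant is Q756 and that species-rank items carry taxon rank (P105) = species (Q7432); the `wdt:P171*` property path follows parent taxon links zero or more steps up to plant. `#defaultView:ImageGrid` is the query service's way of selecting the image grid by default. Because the plant tree is very large, the recursive path may still hit the query service timeout; the `LIMIT` is there to keep it manageable. The function names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Species-rank taxa whose chain of parent taxon (P171) links reaches
# plant (Q756), with a Commons image (P18) where one exists.
PLANT_SPECIES_QUERY = """#defaultView:ImageGrid
SELECT ?taxon ?taxonLabel ?image WHERE {
  ?taxon wdt:P171* wd:Q756 ;
         wdt:P105 wd:Q7432 .
  OPTIONAL { ?taxon wdt:P18 ?image . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata SPARQL endpoint."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # The query service asks clients to identify themselves; placeholder value.
        "User-Agent": "plantdata-sketch/0.1 (hypothetical contact)",
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query: str) -> list:
    """Run the query and return its result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Usage: `for row in run_query(PLANT_SPECIES_QUERY): print(row["taxonLabel"]["value"])`. Pasting the query string alone into the query service UI should also work.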

Existing open data sources
Are there additional data sources that we could pull in to make this happen, with the idea that Wikidata might contain this data in the future and simplify the queries? Obviously we would have to double-check that all the data is licensed properly.

USDA plant data
USDA PLANTS Advanced Search

This data is made available by the USDA. Other than their images, they say the data is not copyrighted, but they request to be cited when the data is used.

OpenFarm.cc
Found via FarmBot (an open farming robot project). Their software docs indicate that their underlying data was opened up into a standalone food-growing database, now located at OpenFarm.cc. Their data is CC0, and they have an API. Since they started off primarily concerned with growing food from seed, this would be perfect for our needs if it weren't so thin.

OpenFarm Crop Database List
OpenFarm Similar Projects

Questions for/about the Wikidata community
What are the rules for citing non-free sources? It is our understanding that you can't do it at all.

What's the process for getting community buy-in to add properties, and apply them to existing data?

We want to add:
 * Minimum and maximum temperature
 * Minimum and maximum pH
 * Moisture requirements
 * Sunlight requirements
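If minimum/maximum properties like these existed, matching a growing space against a plant would mostly be an interval check. A minimal sketch; `ToleranceRange` and `site_fits` are hypothetical stand-ins for properties that have not been proposed yet, and `None` stands for a value that is unset on the item, so missing data is reported rather than treated as a failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToleranceRange:
    """A plant's tolerated range for one measurement; None means unset."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def site_fits(observed_low: float, observed_high: float,
              tolerance: ToleranceRange) -> Optional[bool]:
    """Compare an observed site range with a plant's tolerance.

    Returns True if the observed range lies within every bound that is set,
    False if it exceeds a set bound, and None if the plant has no data at all.
    """
    if tolerance.minimum is None and tolerance.maximum is None:
        return None
    if tolerance.minimum is not None and observed_low < tolerance.minimum:
        return False
    if tolerance.maximum is not None and observed_high > tolerance.maximum:
        return False
    return True
```

With made-up numbers, `site_fits(12.0, 30.0, ToleranceRange(10.0, 35.0))` returns `True`, while a site that drops to 5.0 returns `False`. The same check would work for temperature and pH alike.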

How does one go about either importing larger datasets, or federating them?

Wikidata Action Items

 * Fill out the form for requesting new properties
 * Discussion will happen for two weeks before being approved
 * Import data with pywikibot or some other tool, unless there are just a couple hundred records, in which case just do it by hand

Working List

 * Open, end-to-end. Everything.
 * Target (micro)climate data must be enterable manually, as well as retrievable from any sensor package or system anyone might create, through the same API.
 * When the user makes selections in our plant search / query builder, if they require a specific value for a plant property (such as sunlight requirements = medium), we should not exclude plants whose items leave that property unset. Instead, include those results in a special "unknown" section of the results, and encourage users to become familiar enough with Wikidata to make their own edits when they find answers, if they care enough to look them up.
 * Do not make assumptions based on the user's geography! While it *may* be beneficial to make some assumptions about, say, seasonal variations based on latitude / longitude when it is indicated that the plant is meant to go outdoors, the system should work consistently to place appropriate plants in all of the following use cases:
   * Outside, in the ground
   * Outside, with some limited shelter or overhang
   * Outside, with heat leakage from nearby structures
   * Indoors
   * Inside an outdoor greenhouse, with or without some level of environmental control available
   * Outside, in a local heat island or other region that defies regional weather data
 * If you break the previous rule, do not assume that any regional climate data is stable.
 * Fully translatable from the start, with i18n in place. (Can we use translatewiki.net? How do we set that up for non-wiki projects? A: Yes, and use GitHub and something not insane to write it all in.)
 * Encourage users to interact with the projects that make this idea fundamentally possible. It should go without saying, but users should be properly instructed to abide by the rules of the contributing communities when giving back to them.
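One way to read the "same API" item in the working list is that manual entry and every sensor package produce one common record, which is all the plant search ever sees. A sketch under that assumption: `Reading`, `ClimateProfile`, the field set, and the evenly spaced sampling model are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class Reading:
    """One sample from a (hypothetical) sensor package."""
    temperature_c: float
    air_humidity_pct: float
    direct_sun: bool  # whether this sample registered direct sun


@dataclass(frozen=True)
class ClimateProfile:
    """The single shape the plant search accepts, however the data was gathered."""
    min_temperature_c: float
    max_temperature_c: float
    mean_air_humidity_pct: float
    direct_sun_hours: float  # average hours of direct sun per day


def profile_from_readings(readings: Iterable[Reading], minutes_per_sample: float,
                          days: float = 1.0) -> ClimateProfile:
    """Summarize evenly spaced sensor samples covering `days` days."""
    samples = list(readings)
    if not samples:
        raise ValueError("need at least one reading")
    temperatures = [r.temperature_c for r in samples]
    sunny_samples = sum(1 for r in samples if r.direct_sun)
    return ClimateProfile(
        min_temperature_c=min(temperatures),
        max_temperature_c=max(temperatures),
        mean_air_humidity_pct=mean(r.air_humidity_pct for r in samples),
        direct_sun_hours=sunny_samples * minutes_per_sample / 60.0 / days,
    )
```

Manual entry would construct a `ClimateProfile` directly, e.g. `ClimateProfile(5.0, 25.0, 60.0, 6.0)`, so the search code never needs to know where the numbers came from.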
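The "unknown" rule from the working list amounts to a three-way split instead of a filter. A minimal sketch, assuming query results have already been turned into plain dicts; the `partition_by_property` name, the `"sunlight"` key, and its values are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def partition_by_property(plants: Iterable[dict], prop: str,
                          required: str) -> Dict[str, List[dict]]:
    """Split plant records into matches, non-matches, and unknowns.

    Plants whose value for `prop` is missing or None go into "unknown"
    instead of being dropped, so users can see where data is missing.
    """
    groups: Dict[str, List[dict]] = {"match": [], "unknown": [], "no_match": []}
    for plant in plants:
        value: Optional[str] = plant.get(prop)
        if value is None:
            groups["unknown"].append(plant)
        elif value == required:
            groups["match"].append(plant)
        else:
            groups["no_match"].append(plant)
    return groups
```

The "unknown" group is also a natural place to link each item to its Wikidata page, so users who find the answer can add it.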

Things we are not currently addressing, but which will need to be addressed somehow
Things that may go horribly wrong:
 * Need plant toxicology data (human and animal)
 * Growth habit: any of these could be mildly to horribly invasive. In fact, this may be used in reverse to ID people's overgrown weed problems.
 * Illegal plants!

Things that may eventually be cool:
 * Using latitude to infer some things about the yearly swing in natural sunlight, IF the space is relying on natural sunlight.
 * Fully aquatic plants / other non-dirt things
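For the idea of using latitude to infer the yearly swing in natural sunlight, the usual astronomical approximation gives day length from latitude and day of year. A sketch, using Cooper's approximation for solar declination and the sunrise hour-angle formula; it only estimates the hours the sun is above the horizon (ignoring refraction and terrain), not the direct sun a particular spot actually receives, so sensor data would still be needed. `day_length_hours` is a hypothetical name.

```python
import math


def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Approximate hours between sunrise and sunset at a given latitude.

    Declination uses Cooper's approximation; polar day and polar night
    are clamped to 24 and 0 hours respectively.
    """
    declination_deg = 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    cos_hour_angle = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(declination_deg))
    # Outside [-1, 1] the sun never sets (polar day) or never rises (polar night).
    cos_hour_angle = max(-1.0, min(1.0, cos_hour_angle))
    hour_angle_deg = math.degrees(math.acos(cos_hour_angle))
    # The Earth turns 15 degrees per hour; sunrise to sunset spans twice the hour angle.
    return 2.0 * hour_angle_deg / 15.0
```

At the equator this gives 12 hours year-round; at Vienna's latitude (about 48.2° N) it gives roughly 15.9 hours around the June solstice and roughly 8.1 hours around the December solstice.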

Make some basic decisions about the structure of the project
What framework do we want to use?

Database schema.

Coooooode.
Wikidata SPARQL query Scratchpad for plant data.

Wikidata SPARQL endpoint info