Topic on Talk:Requests for comment/JSON validation

RobLa-WMF (talkcontribs)

My understanding is that JSON schema is a bit dormant right now. Not dead, but floundered a bit in the IETF process.

I previously tried to keep my parser up with the latest drafts of the JSON schema spec.

None of the current JSON schema specs seem to be on a track toward getting through an IETF working group and getting published as a Proposed Standard or an Informational RFC. Since it's listed as "Intended status: Informational", that suggests that it hasn't been through the level of scrutiny that Proposed Standards go through. See IETF RFC 1796 ("Not All RFCs are Standards"). A particular note from that document: "The RFC series includes some documents which are informational by nature and other documents which describe experiences. A problem of perception occurs when such a document "looks like" an official protocol specification. Misguided vendors may claim conformance to it, and misguided clients may actually believe that they are buying an Internet standard."

Getting through the IETF publication process is not a requirement for a document to be useful. In fact, it's neat that there are live upstream implementations.

Now that there are two other viable PHP implementations of the Zyp drafts for our use (let alone whatever other language implementations there are), this seems ripe for reevaluation. I'd caution against racing to draft04, though; it's just a draft, and there's no guarantee that Kris Zyp won't publish draft05 completely breaking backwards compatibility with whatever draft04 implementation we migrate to.

Before making a draft03->draft04 migration, I'd prefer we figure out what the upstream's stability strategy is for the specification. Are they still attempting IETF publication? Are they going someplace else like W3C, WHAT-WG, OASIS, or indieweb.org? Are they attempting to create a new consortium around json-schema.org?

If the upstream specification is stable and trusted, great! If not, then the fallback is trusting that the upstream implementation has a good format stability strategy. We don't want to move to draft04, then have upstream make big changes for draft05, leaving us to maintain our own draft04 implementation if we can't migrate with them.

Mobrovac-WMF (talkcontribs)

I agree. Also, it is not at all clear to me why we need draft4 compliance. As you say, @RobLa-WMF, adhering strictly to a draft version is not future-proof for a project of this size.

ATDT (talkcontribs)

The EventLogging stack contains three schema validators: one in JavaScript, one in PHP, and one in Python. The common denominator is support for a subset of the draft3 specification. The draft4 specification mandates support for JSON Pointer, which complicates implementations substantially. The draft specification expired on August 4, 2013, meaning it is not currently on track to becoming a full-fledged standard. I suggest we stick with version 3, or live with two implementations.

Mobrovac-WMF (talkcontribs)

Yup, I think so too. If the only reason for going with draft4 is the usage of `anyOf` (or a similar construct), then we might want to re-evaluate the need for it and hopefully find replacements.
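One possible replacement, sketched here only as an illustration: draft03 allows the "type" attribute to be an array whose members are simple type names or full schemas, which covers much of what draft04's `anyOf` does. The helper below is hypothetical (the name `anyof_to_union_type` is not from any library), it only rewrites the top level of a schema, and it assumes a validator that actually implements draft03 union types with schema members.

```python
def anyof_to_union_type(schema):
    """Rewrite a top-level draft04 `anyOf` as a draft03 union `type`.

    Hypothetical helper for illustration; nested schemas are left untouched.
    """
    if "anyOf" not in schema:
        return dict(schema)
    if "type" in schema:
        # Combining an existing `type` with `anyOf` needs real intersection
        # logic, which this sketch does not attempt.
        raise ValueError("schema has both 'type' and 'anyOf'")

    def collapse(sub):
        # {"type": "string"} can be written as the simple type name "string".
        if isinstance(sub, dict) and list(sub) == ["type"] and isinstance(sub["type"], str):
            return sub["type"]
        return sub

    out = {key: value for key, value in schema.items() if key != "anyOf"}
    out["type"] = [collapse(sub) for sub in schema["anyOf"]]
    return out


draft04_schema = {
    "description": "a name or a structured reference",
    "anyOf": [
        {"type": "string"},
        {"type": "object", "properties": {"id": {"type": "integer"}}},
    ],
}
draft03_schema = anyof_to_union_type(draft04_schema)
```

Note that draft03 expects a union to list two or more members, so a single-element `anyOf` would be better flattened by hand.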

Harej (talkcontribs)

I am also having issues with "maxItems" and specifying multiple types, i.e. [ 'string', 'boolean' ]. From a cursory glance, such functionality is not implemented in JsonSchema.php.

GWicke (talkcontribs)

In practice most v4 implementations seem to strike a pragmatic compromise in their v4 interpretation. For example, most support quantification like `oneOf`, `anyOf` and JSON Pointers, but few support remote schemas, and if so only behind flags. We use those features in quite a few schemas: oneOf, anyOf, including some external ones like GeoJSON.

A quick google search brings up implementations of v4 for most environments, including PHP, Python and JS. Is there an actual concern about library availability or quality, or is the concern about changing libraries?

RobLa-WMF (talkcontribs)

"A quick google search brings up implementations of v4 for most environments, including PHP, Python and JS. Is there an actual concern about library availability or quality, or is the concern about changing libraries?"

I think the concerns are:

  • Changing the format of our existing draft03 content (and documentation/etc)
  • Disruption of changing libraries without understanding the benefit, and potentially losing some of the features we have

As @Ori points out, our EventLogging stack has three different implementations, and all of them would need to be upgraded off of draft03. draft04 is not backwards compatible with draft03, which in turn was not compatible with draft02. As I recall, draft02 had an "optional" attribute. This was changed in draft03 to a "required" attribute, and it was obnoxious to migrate to (but not that difficult). When draft04 came out and had an even more complicated "required" attribute (see draft-fge-json-schema-validation-00#section-5.4.3), it became difficult to justify the work to figure that one out.
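To make the "required" change concrete, here is a minimal sketch of what a schema migration would have to do: draft03 puts a boolean "required" on each property, while draft04 puts an array of property names on the enclosing object. The function names are hypothetical, and the sketch only walks "properties" and single-schema "items"; a real migration would need to cover every keyword that can hold a subschema.

```python
import copy


def hoist_required(schema):
    """Return a copy of a draft03-style schema with boolean `required`
    flags rewritten as draft04-style `required` arrays."""
    out = copy.deepcopy(schema)
    _hoist(out)
    return out


def _hoist(node):
    if not isinstance(node, dict):
        return
    # A boolean `required` is the draft03 form; the parent object has
    # already read it by the time we get here, so it can be dropped.
    if isinstance(node.get("required"), bool):
        del node["required"]
    props = node.get("properties")
    if isinstance(props, dict):
        names = sorted(
            name for name, sub in props.items()
            if isinstance(sub, dict) and sub.get("required") is True
        )
        for sub in props.values():
            _hoist(sub)
        if names:
            # draft04 wants a non-empty array, so omit it when nothing is required.
            node["required"] = names
    if isinstance(node.get("items"), dict):
        _hoist(node["items"])


draft03_schema = {
    "type": "object",
    "properties": {
        "event": {"type": "string", "required": True},
        "wiki": {"type": "string"},
        "meta": {
            "type": "object",
            "required": False,
            "properties": {"id": {"type": "string", "required": True}},
        },
    },
}
draft04_schema = hoist_required(draft03_schema)
```

The schema rewrite itself is mechanical; the larger cost is that all three validators would also have to change how they read the attribute.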

As for the "features we have", one important feature: it's working, and seems to be stable. There doesn't seem to be a great justification for rushing off and implementing an outdated IETF Internet draft. Those of us that are interested in this stuff should get involved with json-schema.org and/or whatever consensus-based spec group we choose to support.

RobLa-WMF (talkcontribs)

It seems conceivable that we could iterate on draft03. I might even become convinced to get my head around the code to implement both maxItems and multiple type support. It might be best to have separate feature request tickets for those (or perhaps break each of those into separate Flow threads on this Talk page). I notice that maxItems is a draft03 feature, and I vaguely recall starting to implement that one, but I may never have completed work on that. "Union types" are a draft03 feature too. As I recall, the current implementation only implements "simple types", and "any" is used in any place where a union type might otherwise be used.
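For a sense of scale, the two missing draft03 checks could look roughly like the sketch below. This is Python for illustration only, not the JsonSchema.php code, and the function names are made up. It handles unions of simple type names such as [ 'string', 'boolean' ]; draft03 also allows full schemas inside a union, which this sketch does not cover.

```python
def matches_simple_type(value, name):
    """Check a decoded JSON value against one draft03 simple type name."""
    if name == "any":
        return True
    if name == "null":
        return value is None
    if name == "boolean":
        return isinstance(value, bool)
    if name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "string":
        return isinstance(value, str)
    if name == "array":
        return isinstance(value, list)
    if name == "object":
        return isinstance(value, dict)
    raise ValueError("unknown simple type: %r" % (name,))


def matches_type(value, declared):
    """Accept either a single type name or a union (list) of type names."""
    names = declared if isinstance(declared, list) else [declared]
    return any(matches_simple_type(value, name) for name in names)


def within_max_items(value, schema):
    """Apply `maxItems`; it only constrains arrays."""
    if not isinstance(value, list) or "maxItems" not in schema:
        return True
    return len(value) <= schema["maxItems"]
```

With something like this in place, a property declared as [ 'string', 'boolean' ] would no longer need to fall back to "any".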

Are there features of draft04 that we could implement in "draft03-2016" of our own? @GWicke suggests that draft04 is important for Swagger APIs & EventBus compatibility. Is it?

I'm very, very easy to convince that interoperability is important, since accepted standards often have unexpected benefits. Implementing a non-standard version of something because of "technical superiority" often leads to unexpected obstacles. On the surface, draft04 does appear to have momentum. That said, draft04 has a lot of superfluous requirements that haven't gotten traction, and there doesn't (yet) seem to be a clear stability strategy around the spec, or a clear way to prune poorly conceived ideas out of it.

Perhaps we can get someone involved with json-schema.org to respond on this point. I'm really encouraged to see Austin Wright's recent contributions to json-schema.org (like b5c9671). Perhaps they're getting ready to submit draft05...

Harej (talkcontribs)

We should probably go with an accepted standard, or draft standard. Even if Draft 4 is not an officially ratified standard, it is still a codified set of expectations. If we do our own novel implementation then we end up with a situation like we have now where not even Draft 3 features work despite the expectation that they would. We could definitely avoid things like remote schemas (they're not worth it in my opinion and I am not sure how much usage they get anyway) but I think we should try to be consistent with what everyone else is doing.

And if there are other libraries doing the work for us, I say we just copy their code and go with it. As best I can tell, the only major issue with other people's libraries is the lack of support for message localization, but that seems easier to address than trying to update our homegrown implementation of the JSON Schema standard.

RobLa-WMF (talkcontribs)

Migrating from draft03 to draft04 is a significant amount of work. Making our draft03 implementation work better might solve the same problems more cheaply and effectively. Furthermore, it may be that a draft05 comes out that's so awesome and well-accepted that we'll want to abandon draft04. It may be that draft03->draft05 migration is easier than draft04->draft05 migration.

If draft04 is truly what "everyone else is doing", then one of us should tell the IETF that, instead of trying to convince ourselves that that's what "everyone else is doing". Publishing a draft05 that is basically "draft04 without requiring the stuff we don't want" is easy enough if we know for sure what we don't want.

When I say "publishing an IETF draft" is "easy enough", I really mean it. I'm happy to mentor someone on the process if we need to do it, though judging from the upstream activity, this may be well underway: https://github.com/json-schema-org/json-schema-spec/issues. Anyone here who is interested in pushing this along may just want to offer to help them.

Reply to "JSON schema drafts"