Talk:Core Platform Team/Initiatives/Core REST API in MediaWiki


Search Enhancement: Transliteration, Wrong Keyboard, and DWIM

TJones (WMF)

(Last one for now... probably!)

In Epic 1.5, item (5) talks about transliteration and DWIM, but those are two very different things.

DWIM, which is currently installed on Russian and Hebrew Wikipedias (and maybe other projects), catches wrong-keyboard mistakes. If I switch to the Russian keyboard and type "dwim", I get "вцшь"; on the Hebrew keyboard it's "ג'ןצ". These are not transliterations, and the output is usually gibberish (but recoverable).
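To make the "one-to-one and exact" point concrete, here is a minimal sketch of a wrong-keyboard converter. It assumes a US QWERTY layout and the standard Russian ЙЦУКЕН layout, maps only the 26 letter keys (the Russian letters that sit on punctuation keys, such as х, ж, э, б, ю, are left out), and is not the code DWIM actually uses; the function names are hypothetical. Because the table pairs keys one-to-one, it can be inverted exactly.

```python
# Hypothetical wrong-keyboard converter, assuming US QWERTY and Russian ЙЦУКЕН.
# Each character in QWERTY sits on the same physical key as the character
# at the same position in JCUKEN (letter keys only).
QWERTY = "qwertyuiop" + "asdfghjkl" + "zxcvbnm"
JCUKEN = "йцукенгшщз" + "фывапролд" + "ячсмить"

LATIN_TO_RU = str.maketrans(QWERTY, JCUKEN)
RU_TO_LATIN = str.maketrans(JCUKEN, QWERTY)  # exact inverse: the mapping is a bijection


def typed_on_russian(text: str) -> str:
    """What Latin text turns into if typed with the Russian layout active."""
    return text.translate(LATIN_TO_RU)


def typed_on_latin(text: str) -> str:
    """Recover the intended Latin text from Russian-layout gibberish."""
    return text.translate(RU_TO_LATIN)


print(typed_on_russian("dwim"))  # вцшь
print(typed_on_latin("вцшь"))    # dwim
```

Nothing in the table depends on what language the user meant to write; it only depends on the two keyboards, which is exactly what transliteration lacks.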

Transliteration is much more difficult. The wrong-keyboard mapping is one-to-one and exact (as long as you commit to a particular pair of keyboards), but transliteration depends not just on the scripts but also on the languages you are transliterating from and to.

For example, Щедрин is transliterated as Shchedrin in English, Sxedrín in Catalan, Ščedrin in Czech, Sjtjedrin in Danish, Schtschedrin in German, Chtchedrine in French, etc. The same can be true for any name with Щ in it. Чайковский, on the other hand, is Tchaikovsky in English, instead of the expected Chaikovsky, because we adopted the French spelling for... uh... "historical reasons".
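A minimal sketch of why one table is not enough: the same Cyrillic letter needs a different spelling per target language, and some names need a whole-word exception that no letter rule produces. The tables below are hypothetical and deliberately tiny (just enough letters for Щедрин); the French and Catalan forms above are left out because they also change other parts of the word (the final -e in Chtchedrine, the accent in Sxedrín), which a per-letter table cannot express.

```python
# Hypothetical, deliberately tiny tables: just enough letters to spell Щедрин.
COMMON = {"е": "e", "д": "d", "р": "r", "и": "i", "н": "n"}

# The same letter, Щ, gets a different spelling in each target language.
SHCH = {"en": "Shch", "cs": "Šč", "da": "Sjtj", "de": "Schtsch"}

# Conventional spellings that no letter-by-letter rule would produce.
EXCEPTIONS = {("en", "Чайковский"): "Tchaikovsky"}


def transliterate(word: str, lang: str) -> str:
    """Whole-word exceptions first, then letter-by-letter substitution."""
    if (lang, word) in EXCEPTIONS:
        return EXCEPTIONS[(lang, word)]
    table = {**COMMON, "Щ": SHCH[lang]}
    # Characters missing from the table pass through unchanged.
    return "".join(table.get(ch, ch) for ch in word)


for lang in ("en", "cs", "da", "de"):
    print(lang, transliterate("Щедрин", lang))
# en Shchedrin
# cs Ščedrin
# da Sjtjedrin
# de Schtschedrin
```

Unlike the keyboard table, this mapping is keyed by target language and still needs an exception list, so it cannot simply be inverted.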

Crimean Tatar transliteration is word-specific (it depends in part on which language the word was borrowed from) and full of exception cases. This code is based on the same source as the Crimean Tatar transliteration used on crh.wikipedia.org.

I'm less familiar with the Indic languages (@Santhosh.thottingal knows a ton about them), but I believe transliteration between them is usually/often/sometimes? straightforward. I worry, however, that transliteration into English or other languages using the Latin alphabet may be as variable as it is for Cyrillic.

Anyway, it would be good to decide which use case you are supporting (maybe both!)—just don't conflate the two!


Search Enhancement: Thumbnails and Descriptions

TJones (WMF)

In Epic 1.5, items (1) and (2) are about some nice improvements to the search suggestions. Is there any plan to bring those changes to the "big" search box on the Special:Search page?

I discussed this with @JDrewniak (WMF), and suggested/requested it. However, there is now some very early discussion (just brainstorming at this point) about adding search suggestions to the big search box (i.e., you type "wikimedia" and it suggests the next word might be "commons" or "foundation"). It seems really complicated if not impossible to have a clean UI that supports both at once (though if someone could conquer that design challenge that would be awesome!). It might also be better not to add images and descriptions to the big search box only to take them away later, so we (Core and Search Platform) should probably coordinate on that before either moves forward with changes to the big search box.

OTOH, if it's not on your radar at the moment, then it's no harm, no foul.


Search Enhancement: Caching

TJones (WMF)

In Epic 1.5, item (3) is about "Briefly cacheable search results". Is that caching on the API side, or in the browser?

Also, has there been any discussion about how to determine the number of items to cache and the time to cache them? (This seems relevant to a browser-side cache.) It should be possible to capture some of the logs for people typing in the search box to get a sense of how far they tend to back up and how long it takes—though not every keystroke is in the log because some people type too darn fast.
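If the cache is browser-side, the two questions above become two parameters: how many entries to keep and how long each stays valid. A minimal sketch, assuming results are keyed by the exact text in the search box so that backspacing to an earlier prefix can be answered locally; it is written in Python for brevity (a real client would be JavaScript), and the class name and parameter values are hypothetical placeholders rather than recommendations.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class SuggestionCache:
    """LRU cache of search suggestions whose entries also expire after `ttl` seconds."""

    def __init__(self, max_entries: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[str, tuple[float, list[str]]]" = OrderedDict()

    def get(self, query: str) -> Optional[list[str]]:
        entry = self._data.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[query]  # too old: drop it and report a miss
            return None
        self._data.move_to_end(query)  # mark as most recently used
        return results

    def put(self, query: str, results: list[str]) -> None:
        self._data[query] = (self.clock(), results)
        self._data.move_to_end(query)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry


now = [0.0]
cache = SuggestionCache(max_entries=3, ttl=30.0, clock=lambda: now[0])
cache.put("wik", ["Wikipedia", "Wiktionary"])
print(cache.get("wik"))  # ['Wikipedia', 'Wiktionary']
now[0] = 31.0
print(cache.get("wik"))  # None (expired)
```

The search-box logs mentioned above could inform both numbers: how many keystrokes people typically back up bounds `max_entries`, and how long that takes bounds `ttl`.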


Questions about page editing endpoints

DKinzler (WMF)

For the record, here are some of the open questions about the editing endpoints that have come up today. They all boil down to the question of what functionality of API:Edit will be supported, and how, and when. Most importantly:

  • How do we detect edit conflicts?
  • Do we need CSRF tokens, or is requiring OAuth Authorization headers sufficient?
    • Do we need review from the security team?
  • Will the implementation be based on EditPage?
    • if yes, is there anything we need to change about it (like CSRF checks)?
    • if yes, do we plan to address the debt associated with that?
    • if no, is it clear what permission checks and rate limits need to be applied?
EProdromou (WMF)

For the first one, I think including a previous revision ID in the PUT request would help with automated merges, correct? If the automated merge doesn't work, we can provide a specific error code.

I've started asking the Security team to review the CSRF token question. I don't think OAuth itself is what helps; what helps is not supporting session cookies for auth.
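The reasoning behind "not supporting session cookies" can be sketched as a policy check. CSRF works because the browser attaches cookies to a forged cross-site request automatically, whereas an Authorization header (such as an OAuth bearer token) is only sent when client code sets it, and a cross-origin script cannot set it without a CORS preflight the server can refuse. The function names and the treatment of the cases below are hypothetical; this is a sketch of the argument, not a decision for the Security team. Anonymous writes are deliberately left out.

```python
from typing import Mapping


def classify_credentials(headers: Mapping[str, str]) -> str:
    """How a request is authenticated (header names assumed lower-cased)."""
    if "authorization" in headers:
        # Set explicitly by client code. If a cookie is also present, a real
        # implementation must make sure the cookie session is ignored.
        return "header"
    if "cookie" in headers:
        return "cookie"  # attached automatically by the browser, even cross-site
    return "anonymous"


def accept_write(headers: Mapping[str, str], csrf_token_valid: bool,
                 allow_cookie_auth: bool) -> bool:
    """Hypothetical policy for a write request such as a page edit."""
    mode = classify_credentials(headers)
    if mode == "header":
        return True  # a forged cross-site request cannot carry this header
    if mode == "cookie":
        # Either refuse cookie auth for writes outright, or demand a CSRF token.
        return allow_cookie_auth and csrf_token_valid
    return False  # anonymous writes are outside the scope of this sketch
```

With `allow_cookie_auth=False`, the CSRF token never matters, which is the simplification suggested above.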


83.38.157.81

> For the first one, I think including a previous revision ID in the PUT request would help with automated merges, correct?


Yes, the previous revision ID should be supplied for updates. I think it should even be required: overwriting concurrent edits is not acceptable. But how will it be submitted? As a URL parameter? The request body is JSON, right?


Btw, it should be clarified whether auto-merge is a hard requirement or just nice to have. I think *detecting* conflicts is a requirement, but just failing without attempting a merge would be acceptable, at least initially. Auto-merge is best-effort anyway.
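The "detect, don't merge" minimum can be sketched as an optimistic-concurrency check: the client sends the ID of the revision it started from along with the new content (for instance as a field in the JSON body; the name `base_revision_id` is hypothetical), and the server refuses the save if the page has moved on. Everything here is an in-memory stand-in: revision IDs are just list indexes, and the exception would map to an HTTP 409 Conflict.

```python
from dataclasses import dataclass, field


class EditConflict(Exception):
    """Raised instead of silently overwriting a concurrent edit (e.g. HTTP 409)."""


@dataclass
class Page:
    # Hypothetical stand-in: the list index doubles as the revision ID.
    revisions: list[str] = field(default_factory=list)

    @property
    def latest_id(self) -> int:
        return len(self.revisions) - 1


def put_page(page: Page, base_revision_id: int, new_text: str) -> int:
    """Save new_text only if the client edited the latest revision."""
    if base_revision_id != page.latest_id:
        # Someone saved in between. A real endpoint might try a three-way merge
        # here; this sketch just refuses, which is the minimum requirement.
        raise EditConflict(f"based on r{base_revision_id}, latest is r{page.latest_id}")
    page.revisions.append(new_text)
    return page.latest_id


page = Page(["Hello"])
put_page(page, base_revision_id=0, new_text="Hello, world")  # returns 1
try:
    put_page(page, base_revision_id=0, new_text="Hi")  # stale base revision
except EditConflict as e:
    print("409 Conflict:", e)  # 409 Conflict: based on r0, latest is r1
```

An auto-merge step would slot in where the exception is raised, falling back to the error when the merge fails.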

EProdromou (WMF)

I think we resolved this one on the appropriate ticket.


Questions about HTML retrieval endpoints

DKinzler (WMF)

Per today's discussion, we will base the HTML retrieval endpoints on Parsoid output, not the PHP parser. This raises the question of how the equivalent of ParserCache will work for that. For Parsoid/JS, we are using Cassandra via RESTBase for caching. For Parsoid/PHP, we currently have no ParserCache equivalent.

We could think about doing this without internal caching, solely relying on the web cache. But this raises the question of purging. MediaWiki uses active purging for cached URLs. For this, MediaWiki needs to enumerate the URLs to purge, which is currently hardcoded in Title::getCdnUrls. Bucketing would probably make this a lot saner. We need a plan...
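The enumeration problem can be sketched as follows: instead of one hardcoded list (as in Title::getCdnUrls), each cached endpoint registers the URL template(s) it serves, and a purge expands every template for the changed title. This only illustrates the enumeration step, not the bucketing idea; the registry, function names, and the REST path are assumptions, and title encoding is simplified (real MediaWiki URLs keep characters such as ":" and "/" unescaped).

```python
from urllib.parse import quote

# Hypothetical registry: each web-cached endpoint registers its URL template(s).
PURGE_TEMPLATES: list[str] = []


def register_cached_route(template: str) -> None:
    PURGE_TEMPLATES.append(template)


def urls_to_purge(base: str, title: str) -> list[str]:
    """Every cached URL that must be purged when `title` changes."""
    encoded = quote(title.replace(" ", "_"), safe="")  # simplified title encoding
    return [base + t.format(title=encoded) for t in PURGE_TEMPLATES]


register_cached_route("/wiki/{title}")
register_cached_route("/w/rest.php/v1/page/{title}/html")  # assumed REST path
print(urls_to_purge("https://en.wikipedia.org", "Main Page"))
# ['https://en.wikipedia.org/wiki/Main_Page', 'https://en.wikipedia.org/w/rest.php/v1/page/Main_Page/html']
```

Every new cached endpoint, or every extra variant such as a different output flavor, adds URLs that must be purged on each edit, which is why the list grows hard to manage.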


Are Product stakeholders?

Phuedx (WMF)

To put it another way: Are Product Infrastructure ambassadors for all teams in Product?

EProdromou (WMF)

So, we discussed this on the Product-Platform sync today briefly. I have a lot of thoughts on the subject.

The purpose of this initiative is to expose the core functionality of MediaWiki through a RESTful API. The main users we are targeting are:

- WMF client software developers (~Product)

- Community client software developers (bots, gadgets, etc.)

- Third-party developers integrating WMF content into their software (a home exchange app adding Wikivoyage guides to their iOS app)

- Large scale content syndicators (voice assistants, search engines, ...)

The functionality in Epics 1 and 2 is going to be very important for third-party developers. Our official clients already have this functionality through other APIs (Action API, RESTBase), so it's probably not as valuable to them.

The functionality in Epics 3, 4 and 5 is more oriented towards advanced end-users (the Curator and Administrator personas). Some of it will probably never get into official clients; other parts are just being added now, or will be added in the future. We plucked some of the user stories from Epic 3 to make an Epic 0.5 specifically for iOS.

So, what is the role of Product Infrastructure in this initiative? I have a few ideas, but would probably stand to learn rather than instruct in this regard. Here are my ideas:

- Using the REST API infrastructure to make extension APIs. The current method of exposing RESTful APIs is to build Action API extensions and then proxy them through RESTBase to make them RESTful. This infrastructure should make exposing RESTful APIs from extensions easier.

- Advocating for use of this API within Product. I think that's your "ambassador" role.

- Letting CPT know about client work that would benefit from re-prioritizing user stories in these epics. As we did for iOS, we could also do for other teams.

- Executing some of these user stories. It may make sense for Product Infrastructure to implement a user story here if CPT can't or won't get to it in time for client app schedule needs.

Again, I'm just feeling around. I think we might want to have a direct call to consider it.


Phuedx (WMF)

Thanks, @EProdromou (WMF). Given their body of work and the roles that you've laid out, it makes sense for Product Infrastructure to be mentioned. AIUI they're already fulfilling some of those roles.


We can always revisit this if our team structure and/or needs change in the future.
