Topic on Talk:Parsoid/Language conversion/Preprocessor fixups

IKhitron (talkcontribs)

Hi. I tried to find the problems in hewiki. There are 336 possible cases of transclusion, fair enough. But I can't check the URLs: CirrusSearch fails. If I exclude the File namespace it's OK, 0 results. But in files there are too many image descriptions on Commons to scan. Is there a way to search in the local wiki only? Thank you.
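
For illustration, the same kind of CirrusSearch query can also be sent through the MediaWiki search API, which makes it easy to pin the namespace. A minimal sketch, assuming a placeholder `-{` pattern rather than the exact query that failed:

```python
# Minimal sketch (not the exact failing query): send a CirrusSearch insource
# regex to hewiki's API, restricted to the File namespace. The pattern below
# is a placeholder; swap in the real preprocessor-fixup regex.
import requests

API = "https://he.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": r"insource:/-\{/",   # placeholder pattern
    "srnamespace": "6",              # 6 = File namespace
    "srlimit": "50",
    "format": "json",
}

data = requests.get(API, params=params).json()
for hit in data["query"]["search"]:
    print(hit["title"])
```

If I remember correctly, CirrusSearch also has a local: keyword that keeps results to pages hosted on the local wiki, which might be a way to skip the Commons file descriptions, but I have not verified that here.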

Amire80 (talkcontribs)

More simply, can anybody please run this for all projects, rather than only the wikis with more than 1,000,000 articles?

Cscott (talkcontribs)

@Amire80 Sure. There will be false positives in wikis with LanguageConverter turned on, of course. Give me a bit of time to download the latest dumps and re-run the grep.

Elitre (WMF) (talkcontribs)

Will this also help IKhitron though?

Amire80 (talkcontribs)

Yes, I think it will help, but @IKhitron can correct me if I'm wrong.

Wikipedia's search box is better than ever for finding information useful for readers, but I don't really expect it to be useful for finding precise strings of exotic wiki syntax, which is what's needed in this case. If @Cscott can run a comprehensive grep in all namespaces in all wikis, it will be exactly what we all need.

IKhitron (talkcontribs)

Of course, thanks a lot.

Elitre (WMF) (talkcontribs)

When you say all wikis, I think you mean Wikipedias - Cscott, were other sites checked/should they be? Meta, Commons, ...

IKhitron (talkcontribs)

Think about the possibility of not creating 902 subpages, but instead putting the results inside the wikis themselves, at a predictable address, for example "(ns:4):preprocessor fixups-May 2017".

Amire80 (talkcontribs)
Elitre (WMF) (talkcontribs)

I don't know about that. It's certainly not common practice, and people come here to look for documentation and such. I also don't know how long it would take to get all of this done.

DePiep (talkcontribs)

I suggest listing all cases in all language Wikipedias and all sister projects, wikilinked, on one or two pages (two = languages and sisters split). Page names could be systematic and simplified, like :lang/sistercode:ns:pagename; an example is sketched below. Actual red-marking of the offending code, as was done last time, is not needed. All of this aims to keep the script run simple and to reduce post-processing. It would introduce false positives, but that is acceptable; the alternative would be to exclude situations (e.g. in regex strings...). The task is then to manually/visually check whether each edit is needed (that is, AWB-style rather than bot-style editing).
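
To make the proposed format concrete, a hedged sketch of how one result line might be rendered as a plain wikilinked title; the ":he:4:..." prefix and the namespace encoding are assumptions used only to illustrate the suggested :lang/sistercode:ns:pagename naming:

```python
# Hypothetical formatter for one listing line; the scheme below only
# illustrates the suggested ":lang/sistercode:ns:pagename" naming.
def format_hit(wiki_prefix: str, namespace: str, title: str) -> str:
    """Render one hit as a plain wikilinked title, with no red-marking of the code."""
    return f"* [[:{wiki_prefix}:{namespace}:{title}]]"

print(format_hit("he", "4", "Some page"))   # -> * [[:he:4:Some page]]
```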

Cscott (talkcontribs)

I suspect we'll run up against the page size limit in some cases, certainly if we include the wikis where LanguageConverter is actually enabled, like zhwiki. I'm currently downloading the 20170501 dump for all 739 non-private, non-closed wikis. I'll try to tweak the munge script so it dumps raw wikilinked titles onto a small number of pages... patches welcome, of course.
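
For reference, a rough sketch of what such a munge pass could look like. It assumes a pages-articles XML dump, a placeholder -{ pattern, and an arbitrary ~1.5 MB cap per output page so each listing stays under the page size limit; the real script, regex, and dump schema version may differ:

```python
# Rough sketch of a dump "munge" pass: stream a pages-articles XML dump,
# grep each page's wikitext for a placeholder pattern, and write wikilinked
# titles split across chunks to stay under a page size budget.
import bz2
import re
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # schema version may differ per dump
PATTERN = re.compile(r"-\{")       # placeholder for the real fixup regex
MAX_BYTES = 1_500_000              # approximate per-page output budget

def write_part(out_prefix, part, lines):
    with open(f"{out_prefix}-{part}.wiki", "w", encoding="utf-8") as f:
        f.writelines(lines)

def scan(dump_path, wiki_prefix, out_prefix):
    chunk, size, part = [], 0, 1
    for _, elem in ET.iterparse(bz2.open(dump_path), events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            if PATTERN.search(text):
                line = f"* [[:{wiki_prefix}:{title}]]\n"
                if size + len(line) > MAX_BYTES:
                    write_part(out_prefix, part, chunk)
                    chunk, size, part = [], 0, part + 1
                chunk.append(line)
                size += len(line)
            elem.clear()           # keep memory flat while streaming
    if chunk:
        write_part(out_prefix, part, chunk)
```

Usage would be one call per dump, e.g. scan("hewiki-20170501-pages-articles.xml.bz2", "he", "hewiki-fixups"), with the resulting parts uploaded by hand or by bot.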

DePiep (talkcontribs)

Sounds good. Completeness over most wikis would be great, and I wanted to reduce the post-run processing for you. With this, the number of listing pages is not an issue.

Cscott (talkcontribs)
IKhitron (talkcontribs)

Thanks a lot!