Structured Data Across Wikimedia/id

Structured Data Across Wikimedia (SDAW) is a project to build machine-recognizable structured data from wikitext. Its goal is to make it easier to read, edit, and search for content on Wikipedia projects and across the internet.

In addition, the project can connect the varied content found across Wikimedia projects, help readers explore the Wikimedia knowledge ecosystem, and distribute information to all Wikimedia projects efficiently, much as Wikidata does. Within this project, we also make room to experiment with computer-aided editing tools, to make editing more accessible for all contributors.

Background
This project continues earlier work already delivered on Wikimedia Commons, namely Structured Data on Commons (SDC), and will be partially funded for three years by the Sloan Foundation. While working on SDC, we realized that structured data is needed for all content across Wikimedia projects, not only for Wikimedia Commons.

To make this project a success, we are targeting the following three things:
 * 1) Using machine learning to recognize Wikimedia content and suggest relevant connections to other Wikimedia content. We have already run a trial through the image suggestions project.
 * 2) Designing the structure of articles and pages to enable new content formats; for example, content (here, articles and pages) could be presented in a simple format so that it is easy to access and share with a wide audience.
 * 3) Making it easier for Wikimedia contributors to search for content more efficiently. We are always looking for new ways to improve search across all Wikimedia websites using structured data.

Projects
The end goal of this project is to design and build a new system that makes it easy to access all the metadata we may need in the future.

As part of our work, we identified three main projects that we will develop:
 * 1) Image suggestion, a feature for experienced users to help illustrate Wikipedia articles;
 * 2) Sectional metadata, also known as Section topics, in order to describe what a section of a Wikipedia article is about;
 * 3) Search improvements, which will use structured content to give users a more inviting and efficient way to search and find content on the Wikipedias.

Image suggestion
The Image Suggestion UI aims at developing systems for structured data across all Wikimedia projects.

This work will build on the work already begun as part of the “Add an image” structured task project. However, its focus will be shifted towards improving the processes for experienced contributors. In particular, we will target users who have edited or watched a particular article or set of articles, since they are likely to be experts in the topic and to have interest in seeing that article(s) improve.

Section topics
The Section Topics project will identify sections in an article and create topics accordingly for those sections, drawing on several elements:
 * an algorithm that detects Wikidata items based on the section’s blue links (which will be developed in partnership with the Structured Data, Research, Machine Learning, and Data Platform teams);
 * the ability to automatically identify sections in an article (which will be developed in partnership with the Structured Data and Data Platform teams);
 * section-level image suggestions, which will use the blue-links algorithm and section identification infrastructure above, and be delivered both via the newcomer experience and via notifications for experienced contributors.

This last point will build upon the prior image suggestions work and will be developed in partnership with the Structured Data, Machine Learning, Data Platform, Research, Search, and Growth teams.

These elements will neither change nor affect the current editing experience for users. All of these activities will be automatic and will not depend on any action from users who edit an article. This project is still in the investigation phase, and some aspects may require further investigation and/or feedback from users.
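The two building blocks listed above (identifying sections, then collecting each section's blue links as candidate topics) can be illustrated with a minimal sketch. This is not the project's actual algorithm: the function names, the regular expressions, and the `title_to_qid` lookup table are all assumptions made for illustration, and a real system would resolve page titles to Wikidata items through a proper service and not a hand-supplied dictionary.

```python
import re

# Wikitext section headings look like "== Title ==" (2 to 6 equals signs).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# Internal links look like [[Target]], [[Target|label]] or [[Target#Anchor]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Namespaces skipped in this sketch because they are not article links.
SKIPPED_PREFIXES = ("File:", "Image:", "Category:")


def split_sections(wikitext):
    """Map each section heading to its text; text before the first
    heading is stored under the hypothetical key "(lead)"."""
    sections = {"(lead)": []}
    current = "(lead)"
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = match.group(2)
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {heading: "\n".join(lines) for heading, lines in sections.items()}


def section_links(wikitext):
    """Return, per section, the distinct link targets in order of appearance."""
    result = {}
    for heading, text in split_sections(wikitext).items():
        titles = []
        for raw in LINK_RE.findall(text):
            title = raw.strip().replace("_", " ")
            if title.startswith(SKIPPED_PREFIXES) or title in titles:
                continue
            titles.append(title)
        result[heading] = titles
    return result


def section_topics(wikitext, title_to_qid):
    """Turn each section's links into Wikidata item IDs using a
    caller-supplied lookup (an assumption of this sketch)."""
    return {
        heading: [title_to_qid[t] for t in titles if t in title_to_qid]
        for heading, titles in section_links(wikitext).items()
    }
```

Calling `section_topics(wikitext, lookup)` returns a dictionary keyed by section heading, with the item IDs found for that section's links. Because the sketch only reads the wikitext, it is consistent with the point that editors do not need to take any action.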

Based on the viability of those options, the project also aims at:
 * using section topics to improve our SEO reach with outside search engines (in partnership with the Web team)
 * establishing partnerships with the larger Wikimedia community to show the impact of our new tools (in partnership with the Structured Data, Growth, and Community Programs (GLAM/Campaigns) teams).

Search improvements
The Search Improvements project will use structured content to give users a more inviting and more efficient way to search and find content on the Wikipedias.

We aim to identify and define incremental “special search” improvements that use structured content, to assist users in finding the content they are looking for, especially in those language wikis that have fewer articles.

This project is also still in the investigation phase, and some aspects may require further investigation and/or feedback from users.

What do we not want to do?

 * 1) Leave users out of the process
 * 2) Overwhelm users with too much new content to moderate
 * 3) Add any additional bias to Wikimedia projects
 * 4) Add additional vectors for vandalism
 * 5) Introduce too much complexity into our systems

2022

 * Project pages updated to reflect the current status of the initiative and the three main projects to be developed.

2022

 * Establishing contact with the Portuguese and Russian Wikipedia communities as the first tester communities for Image Suggestions.

2021

 * The project is moving to a first test stage: experimenting with the use of notifications to alert users to potentially useful images for Wikipedia articles.

May-August 2021

 * Looking for feedback about the Image Suggestions project, through individual invitations and a month-long RfC specifically targeted to 4 Wikipedias + Commons

2021

 * Looking for feedback about these ideas.
 * Working on rough wireframes and mockups to help explore these ideas.
 * Exploring infrastructure to support this work via the Technical Decision Making Forum process.

Second half of 2020

 * Building MediaSearch on Commons.
 * MediaSearch A/B test - conducted between 10 and 17 September 2020.

Feedback
Project feedback is and will always be welcome. We are especially interested in your ideas about the extent to which you want to keep the “human-in-the-loop” throughout the topical metadata creation process. We are looking forward to hearing from you about the following open questions:
 * 1) Your expectations about the project
   * What do users expect from this project? What are the necessary actions to be addressed?
   * How do you envision this metadata being used? Can you think of ways it would aid in your workflows?
 * 2) Metadata moderation
   * Is moderation necessary to avoid vandalism and/or bias?
   * If moderation is necessary, how can it be effectively managed?
 * 3) Adding and confirming metadata
   * Do users want to be able to approve or reject metadata suggested by the automated system?
   * Do users want to be able to add additional metadata beyond what is suggested by the automated system?
   * Do you think it may just be sufficient for users to have the opportunity to send feedback with suggestions on how to improve the machine-generated metadata, when necessary?
 * 4) Privileges for visualising and editing
   * Do we want metadata to be visible for all users or only for certain classes of users?
   * Do we want metadata to be editable for all users or only for certain classes of users?

Also, more specific feedback about related projects can generally be left on the projects' talk pages:
 * MediaSearch on Commons
 * Image Suggestions

Funding
Partial funding for this work is provided by a grant from the Alfred P. Sloan Foundation, to further the work done under the first round of funding to develop Structured Data on Commons.