Wikibase/Indexing/Updater performance analysis

From mediawiki.org

This page analyzes the performance of the Updater tool running on the WDQS Beta deployment. The Updater synchronizes the Query Service with Wikidata, updating the service's storage to reflect recent changes and additions to Wikidata. This keeps the results the Query Service returns as up-to-date as possible.

Running configuration

5 threads, 500 change/rc records per batch (-t 5 -b 500)

31 batches in the analyzed sample.

Batch statistics

Average 315 updated entities per batch

9766 changes overall

Average processing time: 77.264 s per batch (about 0.245 s per entity, or roughly 4 updated entities per second). Distribution of batch processing time, in ms:

Min.   : 54711
1st Qu.: 62598
Median : 67162
Mean   : 77264
3rd Qu.: 84636
Max.   :116319
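
The per-entity figures above follow from the batch count, the total number of changes, and the mean batch time. A minimal sketch of that arithmetic, assuming each of the 9766 changes counts as one updated entity (the variable names are illustrative, not taken from the Updater):

```python
# Figures reported above for the analyzed sample.
batches = 31
changes = 9766            # assumed to equal the number of updated entities
mean_batch_ms = 77264     # mean batch processing time, in ms

entities_per_batch = changes / batches
seconds_per_entity = (mean_batch_ms / 1000) / entities_per_batch
entities_per_second = 1 / seconds_per_entity

print(f"{entities_per_batch:.0f} entities per batch")
print(f"{seconds_per_entity:.3f} s per entity")
print(f"{entities_per_second:.2f} entities per second")
```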

Wikidata update stream advancement: about 110 s per batch (mean 110.7 s). Distribution, in seconds:

 Min.   : 87.0  
 1st Qu.:100.2  
 Median :112.0  
 Mean   :110.7  
 3rd Qu.:120.0  
 Max.   :149.0
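
One way to read these two figures together is as a catch-up ratio: how many seconds of the Wikidata update stream are consumed per second of processing. The sketch below assumes that "stream advancement" is the span of Wikidata change time covered by one batch and that changes keep arriving at a steady rate; `time_to_catch_up` is a hypothetical helper, not part of the Updater.

```python
mean_stream_advance_s = 110.7      # stream time covered per batch
mean_batch_processing_s = 77.264   # wall-clock time spent per batch

# A ratio above 1 means the Updater consumes the stream faster than real time.
catch_up_ratio = mean_stream_advance_s / mean_batch_processing_s
print(f"catch-up ratio: {catch_up_ratio:.2f}")


def time_to_catch_up(lag_s: float, ratio: float) -> float:
    """Wall-clock seconds needed to clear a backlog of lag_s seconds of stream time.

    Each wall-clock second consumes `ratio` seconds of stream while one new
    second of changes arrives, so the backlog shrinks by (ratio - 1) per second.
    """
    if ratio <= 1:
        raise ValueError("updater is not faster than the incoming stream")
    return lag_s / (ratio - 1)


# Example: estimated time to clear a one-hour backlog at the sampled rate.
print(f"{time_to_catch_up(3600, catch_up_ratio) / 3600:.1f} h")
```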

Updates statistics

Revision requests

9766 revision requests, 9007 unique ones, 759 repeated (~8% repeats)

1015 old revs (~10%), 8751 new revs
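
The counts and shares above can be rechecked from three of the reported numbers. This is only an arithmetic check, with variable names chosen for illustration:

```python
total_requests = 9766
unique_requests = 9007
old_revs = 1015

repeated = total_requests - unique_requests   # requests seen more than once
new_revs = total_requests - old_revs          # requests that carried a new revision

repeat_share = repeated / total_requests
old_share = old_revs / total_requests

print(f"repeated: {repeated} ({repeat_share:.1%})")
print(f"old revs: {old_revs} ({old_share:.1%}), new revs: {new_revs}")
```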

Updates statistics

Average entity update time: 1214 ms. Distribution, in ms:

 Min.   :    3  
 1st Qu.:  738  
 Median :  964  
 Mean   : 1214  
 3rd Qu.: 1479  
 Max.   :54586 
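
The mean entity update time can be related to the batch-level figure of about a quarter of a second per entity reported in the batch statistics. The sketch below assumes the 5 threads work fully in parallel and that batch time is dominated by entity updates; both are simplifications, not measured facts.

```python
mean_entity_update_ms = 1214   # mean time one thread spends on one entity
threads = 5                    # from the running configuration (-t 5)

# With ideal parallelism, one entity completes every (mean / threads) ms.
effective_ms_per_entity = mean_entity_update_ms / threads
print(f"{effective_ms_per_entity:.1f} ms per entity across {threads} threads")
```

The result is close to the per-entity time derived from the batch statistics, which suggests the threads were kept busy during the sample.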

Query statistics

       time          type             query size        
 Min.   :    2.0   refs   :8751   Min.   :   7.00  
 1st Qu.:   26.0   update :8751   1st Qu.:   7.00  
 Median :   36.0   values :8751   Median :   8.00  
 Mean   :  298.1   version:9766   Mean   :  43.66  
 3rd Qu.:  222.0                  3rd Qu.:  13.00  
 Max.   :45208.0                  Max.   :1618.00

Updates

      time          type        query size        
 Min.   :   70   update:8751   Min.   :  52.0  
 1st Qu.:  611                 1st Qu.: 128.0  
 Median :  830                 Median : 138.0  
 Mean   : 1103                 Mean   : 150.9  
 3rd Qu.: 1326                 3rd Qu.: 177.0  
 Max.   :45208                 Max.   :1618.0  

Version queries

     time          type           query size        
 Min.   :   2.00   version:9766   Min.   :7  
 1st Qu.:  29.00                  1st Qu.:7  
 Median :  33.00                  Median :7  
 Mean   :  34.62                  Mean   :7  
 3rd Qu.:  39.00                  3rd Qu.:7  
 Max.   :1003.00                  Max.   :7  

References/values retrieval

      time          type           query size    
 Min.   :    7.0   values:8751   Min.   :13  
 1st Qu.:   26.0                 1st Qu.:13  
 Median :   37.0                 Median :13  
 Mean   :   59.2                 Mean   :13  
 3rd Qu.:   58.0                 3rd Qu.:13  
 Max.   :12844.0                 Max.   :16  

      time         type       query size  
 Min.   :  3.00   refs:8751   Min.   :8  
 1st Qu.: 17.00               1st Qu.:8  
 Median : 24.00               Median :8  
 Mean   : 25.54               Mean   :8  
 3rd Qu.: 30.00               3rd Qu.:8  
 Max.   :330.00               Max.   :8  
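
As a consistency check, the combined means in the "Query statistics" table should be the count-weighted averages of the per-type means listed in the tables above. A minimal sketch of that check, using the rounded means as printed (so small deviations are expected):

```python
# (count, mean time in ms, mean query size) per query type, as reported above.
per_type = {
    "version": (9766, 34.62, 7),
    "refs":    (8751, 25.54, 8),
    "values":  (8751, 59.2, 13),
    "update":  (8751, 1103, 150.9),
}

total_queries = sum(count for count, _, _ in per_type.values())
combined_mean_time = sum(c * t for c, t, _ in per_type.values()) / total_queries
combined_mean_size = sum(c * s for c, _, s in per_type.values()) / total_queries

print(f"queries: {total_queries}")
print(f"combined mean time: {combined_mean_time:.1f} ms (reported: 298.1)")
print(f"combined mean size: {combined_mean_size:.2f} (reported: 43.66)")
```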

Query timing

For each updated entity, we run the following queries (mean times from the tables above):

version query (35 ms) + refs select (26 ms) + values select (59 ms) + update (1103 ms) = 1223 ms
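
The per-entity total, and how much of it each step accounts for, can be sketched from the rounded means above. The dictionary keys are just labels for this illustration:

```python
# Mean time per query type, in ms, rounded as in the line above.
step_ms = {
    "version query": 35,
    "refs select": 26,
    "values select": 59,
    "update": 1103,
}

total_ms = sum(step_ms.values())
update_share = step_ms["update"] / total_ms

for name, ms in step_ms.items():
    print(f"{name:14s} {ms:5d} ms  {ms / total_ms:6.1%}")
print(f"{'total':14s} {total_ms:5d} ms")
```

On these numbers the update query accounts for roughly nine tenths of the per-entity query time, so it is the natural first target for optimization.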