Projekt:LSH-tillgängliggörande 2016/Updating descriptions

To update the image descriptions, a basic script was written: maintenance/replace_descriptions.py.

The implemented logic

A new description text (new_text) is generated for the image. This is then compared to the text of the first upload (first_text) and to the current revision of the page (last_text).

The logic used is as follows:

  • If new_text is the same as first_text:
    • No need to update
  • Else if first_text is the same as last_text:
    • No (un-reverted) changes to the page. Just overwrite with new_text.
  • Else if first_text and last_text differ only in their categories (and <!-- --> comments):
    • Only changes to the categories need to be kept. Overwrite with new_text but use the union of (new_cats - (first_cats - last_cats)) and (last_cats - first_cats) instead of just the categories in new_text. This ensures the following:
      • We replace the initial categories with the final set of categories
      • We do not re-add any categories removed since the first version
      • We keep any categories added since the first version
  • Else:
    • Skip the page
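The decision rules above can be sketched in Python as follows. This is a minimal illustration, not the code in maintenance/replace_descriptions.py: the helper names (get_categories, normalise, merge_categories, decide), the category regex and the choice to append the merged categories, sorted, at the end of the text are all assumptions.

```python
import re

# Assumption: category links look like [[Category:Name]] or [[Category:Name|sortkey]]
CATEGORY_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def get_categories(text):
    """Return the set of category names linked from the wikitext."""
    return set(CATEGORY_RE.findall(text))


def remove_categories(text):
    """Remove all category links and surrounding whitespace."""
    return CATEGORY_RE.sub("", text).strip()


def normalise(text):
    """Drop comments, categories and blank lines so texts can be compared."""
    text = remove_categories(COMMENT_RE.sub("", text))
    return "\n".join(line.rstrip() for line in text.splitlines() if line.strip())


def merge_categories(new_cats, first_cats, last_cats):
    """(new_cats - (first_cats - last_cats)) | (last_cats - first_cats)"""
    removed_since_first = first_cats - last_cats
    added_since_first = last_cats - first_cats
    return (new_cats - removed_since_first) | added_since_first


def decide(new_text, first_text, last_text):
    """Return (action, text_to_save) following the rules above."""
    if new_text == first_text:
        return "no_update", None
    if first_text == last_text:
        return "overwrite", new_text
    if normalise(first_text) == normalise(last_text):
        cats = merge_categories(get_categories(new_text),
                                get_categories(first_text),
                                get_categories(last_text))
        body = remove_categories(new_text)
        cat_lines = "\n".join("[[Category:%s]]" % c for c in sorted(cats))
        return "only_categories", body + "\n" + cat_lines
    return "skip", None
```

For example, if category B was removed and C added after the first upload, and new_text contains A, B and D, the merged set is (A, B, D minus B) plus C, i.e. A, C and D.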

Known issues

  • The above logic fails to recognise when a page has already been updated, and skips it instead (since last_text need not be identical to new_text). One way around this would be to add logic checking whether the same bot made the latest edit, but this would give a false positive if yet another update were run.
  • The logic is not aware of the pipe trick, so it might try to save pages only for Commons to report that there would be no actual change. (This is also one of the reasons it fails to recognise when no changes are needed.)
  • There were some issues with the new data: specifically, some of the description fields now also contain text in English.
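One way to reduce the pipe-trick problem would be to expand pipe-trick links in new_text before comparing it with the saved page text. The sketch below is a simplified, hypothetical approximation of MediaWiki's pre-save transform: it strips one leading namespace or interwiki prefix, then a trailing parenthetical, and otherwise anything after the first comma. It does not cover every case MediaWiki handles (for example full-width parentheses), and the function names are illustrative only.

```python
import re

# Matches [[Target|]] links, i.e. links using the pipe trick
PIPE_TRICK_RE = re.compile(r"\[\[([^\[\]|]+)\|\]\]")


def pipe_trick_label(target):
    """Approximate the label MediaWiki generates for [[target|]]."""
    label = target.lstrip(":")
    # Drop one leading namespace/interwiki prefix, e.g. "Category:"
    if ":" in label:
        label = label.split(":", 1)[1]
    # Drop a trailing parenthetical, e.g. "Ming (dynasty)" -> "Ming"
    stripped = re.sub(r"\s*\([^()]*\)\s*$", "", label)
    if stripped != label:
        return stripped
    # Otherwise drop everything after the first comma, e.g. "Paris, Texas" -> "Paris"
    return label.split(",", 1)[0]


def expand_pipe_trick(text):
    """Replace every [[target|]] with [[target|label]]."""
    return PIPE_TRICK_RE.sub(
        lambda m: "[[%s|%s]]" % (m.group(1), pipe_trick_label(m.group(1))), text)
```

Comparing expand_pipe_trick(new_text) rather than new_text against last_text would let already-updated pages be recognised in the simple cases.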

Results

An example: the change to File:Östasiatisk keramik. Rökelsebrännare - Hallwylska museet - 95635.tif

Per log

  • Updated: 711
    • Only categories: 711
  • Skipped: 26
    • Unresolved: 26

Per Quarry

  • Pages: 707
  • Bytes: 2146079

Left to do

  • maintenance/replace_descriptions.py should be generalised and moved into the BatchUploadTools repo.
  • 26 images could not be resolved using the above logic. These could be investigated and then force-updated using the information below.
import maintenance.replace_descriptions as repl
skips = [
	u"File:Burk med lock. Ding yao. Songdynastin - Hallwylska museet - 96217.tif",
	u"File:Kinesisk porslinsflaska från 1645-1655 - Hallwylska museet - 95598.tif",
	u"File:Kinesisk sexsidig penselburk i porslin med blå dekor, gjord cirka 1662-1722 - Hallwylska museet - 95603.tif",
	u"File:Barmhärtighetens gudinna Guanyin sittande i en grotta. Qingbai-gods, Yuandynastin, cirka 1280-1330 - Hallwylska museet - 107695.tif",
	u"File:Guanyin (barmhärtighetens gudinna) sittande i en grotta, från Jingdezhen, Kina - Hallwylska museet - 96183.tif",
	u"File:Baksidan - Hallwylska museet - 96184.tif",
	u"File:Rökelsebrännare i form av hund - Hallwylska museet - 95971.tif",
	u"File:Östasiatisk keramik. Vas från Shunzhi-perioden under Qing-dynastin - Hallwylska museet - 95966.tif",
	u"File:Kinesisk urna med lock gjord av porslin, 1700-tal - Hallwylska museet - 95606.tif",
	u"File:Guanyin barmhärtighetens gudinna gjord av porslin i Kina på 1800-talet - Hallwylska museet - 95595.tif",
	u"File:Kinesiskt porslin från 1735-1795 - Hallwylska museet - 95873.tif",
	u"File:Kanna från 618-906 - Hallwylska museet - 96178.tif",
	u"File:Porslinsfigur, quingbaiporslin, 1280-1330, Yuandynastin - Hallwylska museet - 100909.tif",
	u"File:Baksidan av gudinnefigur, Quingbaiporslin - Hallwylska museet - 100910.tif",
	u"File:Kruka från Kina gjord cirka 960-1279 Jun yao - Hallwylska museet - 96226.tif",
	u"File:Grön vietnamesisk skål från 1400-talet - Hallwylska museet - 96228.tif",
	u"File:Kinesisk signatur och museets föremålsnummer över - Hallwylska museet - 96012.tif",
	u"File:Kinesiskt fat från Ming-dynastin 1368-1644 - Hallwylska museet - 95659.tif",
	u"File:Kinesiska koppar med drake (lung), från 1899 - Hallwylska museet - 95625.tif",
	u"File:Guanyin barmhärtighetens gudinna gjord av porslin i Kina på 1700-talet - Hallwylska museet - 95578.tif",
	u"File:Guanyin barmhärtighetens gudinna gjord av porslin i Kina på 1700-talet - Hallwylska museet - 95549.tif",
	u"File:Ask med lock Korai-dynastin - Hallwylska museet - 96212.tif",
	u"File:Vas. Mei ping. Zhangzhou-typ. Ming dynastin - Hallwylska museet - 96219.tif",
	u"File:Kinesiska mingkoppar med innerskålar av silver, från 1650 eller tidigare - Hallwylska museet - 95627.tif",
	u"File:Östasiatisk keramik. Ask med lock och fotställ - Hallwylska museet - 95630.tif",
	u"File:Bålskål tillverkad i Kina, cirka 1770 - Hallwylska museet - 99454.tif"
]
repl.skipped_info("2014-11", skips, view='first-last')  # show the changes made since the first upload, which may explain why the logic failed
# then upload the new text and manually re-add any overwritten changes
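The exact output of repl.skipped_info is not documented here. As a rough, hypothetical stand-in for a 'first-last' view, the standard library's difflib can show what changed between the first upload and the current revision, assuming both texts (first_text and last_text) have already been fetched. The function name first_last_diff is illustrative only.

```python
import difflib


def first_last_diff(first_text, last_text, title):
    """Return a unified diff of the first uploaded text against the current text."""
    return "\n".join(difflib.unified_diff(
        first_text.splitlines(),
        last_text.splitlines(),
        fromfile=title + " (first)",
        tofile=title + " (last)",
        lineterm="",
    ))
```

An empty string means the two revisions are identical; otherwise lines prefixed with "-" and "+" show what was removed and added since the first upload.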