Commons:Batch uploading/Kieler Stadtarchiv

Computer operator, Kiel 1971
  • Source to upload from:
    • City archive of Kiel
    • Do the media URLs follow a pattern?
      • Don't know.
    • Does the site have an API?
      • Don't know
    • What else could ease uploading? (is the site valid XHTML, do they use a WCM…?)
      • Don't know
    • Did you contact the site owner?
      • Yes
  • Describe the works to be uploaded in detail (audio files, images by …):
    • Old pictures of Kiel
  • Which license tag(s) should be applied?
    • from CC-Zero to cc-by-sa-3.0
  • Is there a template that could be used on the file description pages? Do you think a special template should be created?
    • {{Institution:Stadtarchiv Kiel}}

Habitator terrae (talk) 17:18, 20 May 2018 (UTC)[reply]

Opinions


Here are some details and additions to the questions above:

  • Gallery page with 1048 images: [1]
  • URL pattern: http://fotoarchiv-stadtarchiv.kiel.de/zvimg.FAU?sid=12FDA2EB&dm=1&qpos=12915&rpos=fotos.jpg&ipos=3&erg=a&hst=1&npos=1 where qpos= seems to be the individual image ID number but all other entries are static.
  • Many if not most images seem to be licensed under {{Cc-by-sa-3.0-de}}.
  • {{Institution:Stadtarchiv Kiel}} should be used in the source field in addition to the individual source url.
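
As a rough illustration of the URL pattern above, here is a minimal Python sketch that builds an image URL from a qpos value and reads the qpos back out of a URL. The function names `image_url` and `image_id` are hypothetical, and the sketch assumes that only sid and qpos vary while the other parameters stay exactly as in the example URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base address taken from the example URL above.
BASE_URL = "http://fotoarchiv-stadtarchiv.kiel.de/zvimg.FAU"


def image_url(sid: str, qpos: int) -> str:
    """Build an image URL (hypothetical helper).

    Assumption: every parameter except sid and qpos is static and can be
    copied from the example URL.
    """
    params = {
        "sid": sid,
        "dm": "1",
        "qpos": str(qpos),
        "rpos": "fotos.jpg",
        "ipos": "3",
        "erg": "a",
        "hst": "1",
        "npos": "1",
    }
    return BASE_URL + "?" + urlencode(params)


def image_id(url: str) -> int:
    """Read the qpos value (assumed to be the image ID) back out of a URL."""
    return int(parse_qs(urlparse(url).query)["qpos"][0])
```

Calling `image_url("12FDA2EB", 12915)` reproduces the example URL above character for character.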

There is a database in the background that has the following fields for each image (my translation and notes in square brackets). There doesn't seem to be an API though.

  • Archivtitel [archived title]
  • Beschreibung [description in German]
  • Datierung [date, may contain MM.YYYY or DD.MM.YYYY]
  • Fotograf [photographer: surname, given name]
  • Nutzungsrechte [attribution: e.g. Gesellschaft für Kieler Stadtgeschichte]
  • Rechtsstatus [licence string and link: e.g. CC BY-SA 3.0 DE]

De728631 (talk) 20:35, 20 May 2018 (UTC)[reply]
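
To illustrate how the Datierung and Fotograf fields listed above might be normalised before upload, here is a small sketch. The helper names are hypothetical, the ISO output format is my own choice, and the optional life dates after the photographer's name are an assumption based on strings such as "Magnussen, Friedrich (1914-1987)" seen in the collection.

```python
import re


def datierung_to_iso(value: str) -> str:
    """Convert 'DD.MM.YYYY' or 'MM.YYYY' to 'YYYY-MM-DD' or 'YYYY-MM'.

    Any other string (e.g. a bare year) is returned unchanged.
    """
    value = value.strip()
    match = re.fullmatch(r"(?:(\d{1,2})\.)?(\d{1,2})\.(\d{4})", value)
    if not match:
        return value
    day, month, year = match.groups()
    iso = f"{year}-{int(month):02d}"
    if day:
        iso += f"-{int(day):02d}"
    return iso


def fotograf_to_name(value: str) -> str:
    """Turn 'surname, given name' into 'given name surname'.

    Assumption: a trailing parenthesis such as '(1914-1987)' holds life
    dates and is dropped. Strings without a comma are returned stripped.
    """
    match = re.fullmatch(r"\s*([^,]+),\s*([^()]+?)\s*(\(.*\))?\s*", value)
    if not match:
        return value.strip()
    surname, given, _dates = match.groups()
    return f"{given} {surname.strip()}"
```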

sid is a session ID and varies like a cookie. For the batch upload it can be "inherited" by following the website navigation like any normal user, though the sid may expire within a few minutes.
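
A minimal sketch of how a script could cope with the short-lived sid: pull it out of a freshly fetched page and fetch a new one once it is older than some limit. Everything here is an assumption for illustration, including the `SidCache` class, the 120 second default and the idea that the start page contains links carrying `sid=`; the actual fetching is left to a caller-supplied function.

```python
import re
import time
from typing import Callable, Optional

# Assumption: the sid is an alphanumeric token such as 12FDA2EB.
SID_RE = re.compile(r"\bsid=([0-9A-Za-z]+)")


def extract_sid(html: str) -> Optional[str]:
    """Return the first sid found in any link of the page, or None."""
    match = SID_RE.search(html)
    return match.group(1) if match else None


class SidCache:
    """Hand out a sid and refresh it once it is older than max_age seconds."""

    def __init__(
        self,
        fetch_start_page: Callable[[], str],
        max_age: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_start_page  # returns the start page HTML
        self._max_age = max_age
        self._clock = clock
        self._sid: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._sid is None or now - self._fetched_at > self._max_age:
            sid = extract_sid(self._fetch())
            if sid is None:
                raise RuntimeError("no sid found in start page")
            self._sid, self._fetched_at = sid, now
        return self._sid
```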
The collection is broken down into these topics, and initial sub-categorization is triggered by the first word:
  1. Chronik → Category:History images from Stadtarchiv Kiel
  2. Kieler → Category:Kiel Week images from Stadtarchiv Kiel
  3. Topographie → Category:Kiel images from Stadtarchiv Kiel
  4. Gebäude → Category:Buildings images from Stadtarchiv Kiel
  5. Wirtschaft → Category:Economy images from Stadtarchiv Kiel
  6. Verkehr → Category:Transport images from Stadtarchiv Kiel
  7. Politik → Category:Politics images from Stadtarchiv Kiel
  8. Bildung → Category:Education images from Stadtarchiv Kiel
  9. Kultur → Category:Culture images from Stadtarchiv Kiel
  10. Sport → Category:Sport images from Stadtarchiv Kiel
  11. Alltagskultur → Category:Everyday life images from Stadtarchiv Kiel
  12. Vereine → Category:Club and association images from Stadtarchiv Kiel
  13. Schifffahrt → Category:Shipping images from Stadtarchiv Kiel
  14. Marine → Category:Navy images from Stadtarchiv Kiel
  15. Militär → Category:Military images from Stadtarchiv Kiel
  16. Personen → Category:Portrait images from Stadtarchiv Kiel
  17. Schleswig-Holstein → Category:Schleswig-Holstein images from Stadtarchiv Kiel
  18. Deutschland → None
  19. Archivintern → None
  20. Rätselbilder → None
-- (talk) 08:50, 21 May 2018 (UTC)[reply]
Thank you for taking the assignment. I had already been thinking about your bot. De728631 (talk) 21:02, 21 May 2018 (UTC)[reply]
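
For reference, the first-word rule from the topic list above could be expressed roughly as follows. This is only a sketch using the category names as originally proposed in that list; the function name is hypothetical, and the full topic title "Kieler Woche" used in the usage note is an assumed example.

```python
from typing import Optional

SUFFIX = " images from Stadtarchiv Kiel"

# First word of the archive topic -> category prefix, as listed above.
TOPIC_PREFIXES = {
    "Chronik": "History",
    "Kieler": "Kiel Week",
    "Topographie": "Kiel",
    "Gebäude": "Buildings",
    "Wirtschaft": "Economy",
    "Verkehr": "Transport",
    "Politik": "Politics",
    "Bildung": "Education",
    "Kultur": "Culture",
    "Sport": "Sport",
    "Alltagskultur": "Everyday life",
    "Vereine": "Club and association",
    "Schifffahrt": "Shipping",
    "Marine": "Navy",
    "Militär": "Military",
    "Personen": "Portrait",
    "Schleswig-Holstein": "Schleswig-Holstein",
}


def category_for_topic(topic: str) -> Optional[str]:
    """Return the sub-category triggered by the first word of the topic.

    Topics without a mapping (Deutschland, Archivintern, Rätselbilder)
    yield None, i.e. no sub-category.
    """
    words = topic.split()
    if not words:
        return None
    prefix = TOPIC_PREFIXES.get(words[0])
    return f"Category:{prefix}{SUFFIX}" if prefix else None
```

With this, `category_for_topic("Kieler Woche")` gives "Category:Kiel Week images from Stadtarchiv Kiel", while `category_for_topic("Deutschland")` gives `None`.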

Useful searches

Ehrenpforte für Kaiser Wilhelm II, 1888

Example searches which are easy to adapt for most queries on date or author:

Why doesn't the incategory search work on my computer? Habitator terrae (talk) 13:40, 7 June 2018 (UTC)[reply]

Cat-a-lot will work on search result pages.

Category

I am unclear what you wish to mass move. The hierarchy of "Kiel Week" → "Kiel Week images from Stadtarchiv Kiel" is okay, as is a category having multiple parents. The "bucket" categories could all be merged up into the parent category, but some people are sensitive to flooding a main category, so it is debatable.
Uploading will be on hold from later this morning until Monday night, so it's a good time to rethink default categorization versus housekeeping categories post-upload. -- (talk) 03:54, 24 May 2018 (UTC)[reply]
@: Could you make this Category:Naval images from Stadtarchiv Kiel instead of "Navy..." to go along with Category:Naval ships and the like? De728631 (talk) 12:57, 24 May 2018 (UTC)[reply]
Will do, these are slightly arbitrary translations. The best time to reorganize the categories would be after the collection is fully uploaded, so that searches plus Cat-a-lot will work for all possible photographs without having to repeat the housekeeping. Travelling, so this is paused until Monday or Tuesday anyway. (talk) 13:17, 24 May 2018 (UTC)[reply]
It turned out that the images were all duplicated in other categories, so this one would be redundant anyway. -- (talk) 16:36, 1 June 2018 (UTC)[reply]
Ok, naval photos should be lumped into Category:Military images from Stadtarchiv Kiel then. De728631 (talk) 17:45, 1 June 2018 (UTC)[reply]

Creators

Friedrich Magnussen

Suggestion: A lot of images are evidently attributed to Friedrich Magnussen, so
| author = Magnussen, Friedrich (1914-1987)
should be replaced with
| author = {{Creator:Friedrich Magnussen}} (example edit).
Variants like
| author = Magnussen, Friedrich (without dates), or
| author = Friedrich Magnussen (different order)
might also exist. Due to the number of files, this has to be done by a bot. And a category like Category:Photographs by Friedrich Magnussen could be created/added accordingly. --Te750iv (talk) 14:39, 7 June 2018 (UTC)[reply]
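
A bot run for this could be built around a replacement like the following sketch. It is only an illustration: the function name is hypothetical, it covers just the three variants listed above, and it assumes the author parameter sits on a line of its own in the file description.

```python
import re

# Matches the author line in the variants listed above:
#   Magnussen, Friedrich (1914-1987) / Magnussen, Friedrich / Friedrich Magnussen
AUTHOR_RE = re.compile(
    r"^(\|[ \t]*author[ \t]*=[ \t]*)"
    r"(?:Magnussen,[ \t]*Friedrich|Friedrich[ \t]+Magnussen)"
    r"(?:[ \t]*\(1914-1987\))?[ \t]*$",
    re.MULTILINE,
)


def use_creator_template(wikitext: str) -> str:
    """Replace plain-text Magnussen credits with the Creator template.

    Lines crediting anyone else are left untouched.
    """
    return AUTHOR_RE.sub(r"\g<1>{{Creator:Friedrich Magnussen}}", wikitext)
```

Other spellings would still need to be found by search and checked by hand before widening the pattern.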

Licenses

The most common license used for rechtsstatus appears to be CC BY-SA 3.0 DE, which is mapped to {{Cc-by-sa-3.0-de}}.

Where "Gemeinfrei (Public Domain Mark 1.0)" is used, it is presumed that Gemeinfrei is based on a valid assessment of being out of copyright by age, and is mapped to {{PD-old}}. Search

I think it would also be good if we added {{CC-Zero}}, because then it is clear that the image is also in the public domain in the USA, since it is published under this licence. Habitator terrae (talk) 09:53, 27 May 2018 (UTC)[reply]
If there are explicit CC-Zero licences, we should use them, but please don't mix up CC-Zero and Public Domain Mark 1.0. The latter is not a licence but just a statement that the image is in the public domain in Germany for some reason. De728631 (talk) 13:07, 27 May 2018 (UTC)[reply]
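
The mapping described in this section can be summarised in a short sketch. The function name is hypothetical and only the two Rechtsstatus strings quoted above are covered. Anything else returns None so that it can be checked by hand, and Public Domain Mark is deliberately not mapped to {{CC-Zero}}.

```python
from typing import Optional

# Rechtsstatus string from the archive -> Commons licence template.
# Only the two values quoted in this section are included.
LICENSE_TEMPLATES = {
    "CC BY-SA 3.0 DE": "{{Cc-by-sa-3.0-de}}",
    "Gemeinfrei (Public Domain Mark 1.0)": "{{PD-old}}",
}


def license_template(rechtsstatus: str) -> Optional[str]:
    """Return the licence template for a Rechtsstatus string, or None.

    None means "unknown status, review manually" rather than guessing.
    """
    return LICENSE_TEMPLATES.get(rechtsstatus.strip())
```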

Housekeeping

A very fast way to send photographs: a "Bildempfangsgerät", 1963

Initial uploads use BeautifulSoup for web scraping, which is limited to the 'static' HTML code. The 'active' additional fields are discovered later by a slower and more fragile housekeeping task that uses Selenium with an automated browser instance.

The following fields, listed with their English information box equivalents, are optional for each entry. For example, People and Societies are used in only a small minority of photographs.

['bestand','Collection'], 
['person','People'], 
['vereine', 'Societies'], 
[u'ort (au\xdferhalb kiels)', 'Place'],
[u'geb\xe4ude', 'Building'],
['verweise','References'],
['klassifikation','Classification'],
['nutzungsrechte','Rights holder']
FWIW: u'ort (au\xdferhalb kiels)' = Ort (außerhalb Kiels) means "place outside of Kiel", so this is not Kiel proper. De728631 (talk) 18:54, 2 June 2018 (UTC)[reply]
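
To show how such a label table might be applied, here is a minimal Python 3 sketch that turns scraped German field labels into English information box labels and skips empty fields. The function name and the sample values in the usage note are hypothetical, and the housekeeping task itself may well work differently.

```python
from typing import Dict, List, Tuple

# German field label (lower case) -> English information box label,
# taken from the list above.
FIELD_LABELS = [
    ("bestand", "Collection"),
    ("person", "People"),
    ("vereine", "Societies"),
    ("ort (außerhalb kiels)", "Place"),
    ("gebäude", "Building"),
    ("verweise", "References"),
    ("klassifikation", "Classification"),
    ("nutzungsrechte", "Rights holder"),
]


def translate_fields(scraped: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (English label, value) pairs for the optional fields present.

    Labels are compared case-insensitively; missing or empty fields are
    skipped because every field is optional.
    """
    lowered = {key.strip().lower(): value for key, value in scraped.items()}
    result = []
    for german, english in FIELD_LABELS:
        value = lowered.get(german, "").strip()
        if value:
            result.append((english, value))
    return result
```

For a made-up record such as `{"Ort (außerhalb Kiels)": "Lübeck", "Person": ""}` this yields a single ("Place", "Lübeck") pair.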

Known errors

  • HTML is inconsistently coded, so header metadata is different to the main body
  • Due to transient use of sid, it is apparently not possible to give permalinks to either source images or the gallery pages. Without "live" generation of new sid values, any deep url would redirect to the main homepage
  • Detailed metadata is written to the webpage via Ajax, so it was skipped during the initial upload for simplicity. The additional fields like Nutzungsrechte are nice to have and are added post-upload via a housekeeping task

Progress

  • Assigned to:
  • Progress:
    • 2018-05-21
      • Initial tests. URL not whitelisted, so local download/upload will be needed
    • 2018-05-22
    • 2018-05-23
      • Full run, may be interrupted by travel
    • 2018-05-30
      • Housekeeping to add additional (Ajax only) fields running. This runs separately as an automated instance of Chrome is used and each page has to have the existing wikitext examined. Example
    • 2018-04-03
      • Uploads completed. Housekeeping will take a few days to finish.
  • Bot name: NA
  • Category: Category:Images from Stadtarchiv Kiel

New Files


@ and De728631: There are now around 628 new sport pictures from 1969 to 1972: Pressefotos Sport 1969, Pressefotos Sport 1970, Pressefotos Sport 1971, Pressefotos Sport 1972

Habitator terrae (talk) 14:53, 9 July 2018 (UTC)[reply]

The links don't work any more, but you can click the links with the same names on the start page. Habitator terrae (talk) 17:16, 9 July 2018 (UTC)[reply]

@ and De728631: Thanks for your work. I am missing some files: 45.903, 82.795, 82.803 and 82.797. I do not know why they are missing, perhaps because the last three are marked "public domain mark"? --Juliabackhausen (talk) 19:53, 14 September 2018 (UTC)[reply]

@Juliabackhausen: Hi Julia. Yes, Public Domain Mark is most likely the reason. PDM files should not be uploaded automatically because we always need to know the reason why they are in the public domain. I can't seem to find a search function for the signature numbers at the archive's homepage, but e.g. 45.903 can be seen here. It is attributed to Fabritz, Gerhardt (1919-1977) and has a CC licence, so I don't know why it would now have a public domain mark at the original collection's page. De728631 (talk) 22:16, 15 September 2018 (UTC)[reply]

By now there are a few thousand new images available. Could they be imported again? -- Discostu (talk) 09:14, 28 June 2021 (UTC)[reply]

@: -- Discostu (talk) 09:48, 28 June 2021 (UTC)[reply]