Commons:Bots/Requests/YouTubeReviewBot

From Wikimedia Commons, the free media repository

Operator: Eatcha (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought:
Review files from YouTube and Vimeo; see Category:License_review_needed_(video). A list of the 13K backlog files is available here.
I will only mark files that pass review and will not mark failed ones for deletion. Failed reviews can be handled by humans. This also prevents accidental mass deletion requests if YouTube changes its site. I don't use the YouTube Data API because it's not working: "Toollabs IPs are banned due to mass downloading using Video2commons". Instead, I scrape the website to review files.
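Note: the pass-only behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the marker text `Creative Commons Attribution license (reuse allowed)` and the function name `review_html` are assumptions, and the real page markup may differ or change.

```python
import re

# Assumed marker text for a CC-licensed video page; the real markup may differ.
CC_MARKER = re.compile(
    r"Creative Commons Attribution licen[cs]e \(reuse allowed\)",
    re.IGNORECASE,
)


def review_html(html: str) -> str:
    """Return "PASS" if the assumed CC marker is found, else "UNDECIDED".

    A missing marker is never treated as grounds for deletion: the file
    is simply left for a human reviewer.
    """
    return "PASS" if CC_MARKER.search(html) else "UNDECIDED"
```

Returning "UNDECIDED" instead of a hard failure mirrors the stated design choice: if the site layout changes and the marker disappears, nothing gets nominated for deletion.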

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Continuous, daily

Maximum edit rate (e.g. edits per minute): not more than 6 per minute
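Note: a limit of 6 edits per minute corresponds to at least 10 seconds between edits. A minimal sketch of such a throttle is shown below; the class name `EditThrottle` and its parameters are hypothetical and are not taken from the bot's source.

```python
import time


class EditThrottle:
    """Enforce a minimum interval between edits (hypothetical helper)."""

    def __init__(self, max_per_minute=6, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute  # 10 s for 6 edits/minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, if any

    def wait(self):
        """Block until enough time has passed since the previous edit."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before every save keeps the bot at or below the requested rate; the clock and sleep functions are injectable only to make the helper easy to test.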

Bot flag requested: (Y/N): Yes

Programming language(s): Python3

Eatcha (talk) 14:13, 1 December 2019 (UTC)[reply]

Discussion

The bot needs license reviewer rights to review files! There is an abuse filter to prevent license reviews by non-license reviewers. Thanks -- Eatcha (talk) 14:19, 1 December 2019 (UTC)[reply]

Bot's LR request available at Commons:License_review/Requests#YouTubeReviewBot -- Eatcha (talk) 14:40, 1 December 2019 (UTC)[reply]

Are you implying that the bot will run on Toolforge? And is the source code available? Masum Reza📞 14:51, 1 December 2019 (UTC)[reply]
Masumrezarock YES, Source code is available at tools:ytrb ON FORGE. -- Eatcha (talk) 14:58, 1 December 2019 (UTC)[reply]
Sounds like a good plan to me. Where should I  Support this BRFA? Masum Reza📞 15:02, 1 December 2019 (UTC)[reply]
Masumrezarock Thanks, but it's a discussion; supports don't count. Best, -- Eatcha (talk) 15:05, 1 December 2019 (UTC)[reply]
The license check is great, but I think human review is still needed because of Commons:Derivative works, so we should split the review process into bot-assisted and human parts. --EugeneZelenko (talk) 15:14, 1 December 2019 (UTC)[reply]
Maybe it's similar to User:FlickreviewR 2, examples, where the bot passed Derivative works 1, 2, 3 and many others. We don't have an Internet Archive bot; the review should just act as proof that the video had a Creative Commons tag if the license is later changed by the uploader. It's impossible (as of now) for the bot to detect derivative works in videos; it's possible for images, but hard AFAIK. Best -- Eatcha (talk) 15:39, 1 December 2019 (UTC)[reply]
I didn't suggest detection of Commons:Derivative works, my point was to split review task between bot and humans. --EugeneZelenko (talk) 15:08, 2 December 2019 (UTC)[reply]
EugeneZelenko How should I (or anyone else) split the review task between bot and humans? Can you please give one example as a hint? Thanks -- Eatcha (talk) 15:20, 2 December 2019 (UTC)[reply]
I think we should expand template to allow two reviews: one for license by bot and other for humans for other issues. --EugeneZelenko (talk) 15:24, 2 December 2019 (UTC)[reply]
EugeneZelenko What if I add a new template that would state "This file had a CC-BY-SA-3.0 tag on MM-DD-YYYY, which was confirmed by YouTubeReviewBot. This file should not be deleted if the license has changed in the meantime, but if it's a derivative work or copyright violation it should be deleted." -- Eatcha (talk) 15:34, 2 December 2019 (UTC)[reply]
Template should clearly state that bot checks only license and further human review was needed or performed by somebody. --EugeneZelenko (talk) 15:37, 2 December 2019 (UTC)[reply]
YouTube logo This file, which was originally posted to YouTube, was reviewed on 3 December 2019 by the automatic software YouTubeReviewBot, which confirmed that this video was available there under the stated Creative Commons license on that date. This file should not be deleted if the license has changed in the meantime. The Creative Commons license is irrevocable.

The bot only checks for the license, human review is still required to check if the video is a derivative work, has freedom of panorama related issues and other copyright problems that might be present in the video. Visit licensing for more information. If you are a license reviewer, you can review this file by manually appending |reviewer={{subst:REVISIONUSER}} to this template.

Creative Commons logo

After human review |reviewer=Eatcha was appended


YouTube logo This file, which was originally posted to YouTube, was reviewed on 3 December 2019 by the automatic software YouTubeReviewBot, which confirmed that this video was available there under the stated Creative Commons license on that date. This file should not be deleted if the license has changed in the meantime. The Creative Commons license is irrevocable.

This file was manually reviewed by reviewer Eatcha, who confirmed that this file is allowed on Wikimedia Commons.

Creative Commons logo

Eatcha (talk) 03:29, 3 December 2019 (UTC)[reply]
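Note: the manual step of appending `|reviewer=` to the template could be expressed as a small text transformation. The sketch below is only an illustration; the function name `add_reviewer` is hypothetical, the `date=` parameter in the usage line is an assumed parameter name, and nested templates inside {{YouTubeReview}} are deliberately not handled.

```python
import re

# Matches a {{YouTubeReview ...}} call that contains no nested templates.
TEMPLATE = re.compile(r"\{\{\s*YouTubeReview\b([^{}]*)\}\}")


def add_reviewer(wikitext: str, username: str) -> str:
    """Append |reviewer=<username> to the first {{YouTubeReview}} call.

    The text is returned unchanged if the template is absent or already
    carries a reviewer parameter.
    """
    def repl(match):
        params = match.group(1)
        if re.search(r"\|\s*reviewer\s*=", params):
            return match.group(0)  # already human-reviewed
        return "{{YouTubeReview" + params.rstrip() + "|reviewer=" + username + "}}"

    return TEMPLATE.sub(repl, wikitext, count=1)
```

For example, `add_reviewer("{{YouTubeReview|date=2019-12-03}}", "Eatcha")` yields `{{YouTubeReview|date=2019-12-03|reviewer=Eatcha}}`.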

It would be reasonable to use links to Commons:Derivative works, Commons:Licensing, Commons:Freedom of panorama, etc. --EugeneZelenko (talk) 15:40, 3 December 2019 (UTC)[reply]
Mind collapsing all the blank spaces a bit? The margin around the CC logo is huge --Zhuyifei1999 (talk) 15:47, 3 December 2019 (UTC)[reply]
EugeneZelenko & Zhuyifei1999, is it OK now? -- Eatcha (talk) 16:43, 3 December 2019 (UTC)[reply]
Heh, I like Special:Diff/378903780. The old version was too green and had too little margin above the logo and beside the text (on my screen the text was touching the borders). --Zhuyifei1999 (talk) 17:28, 3 December 2019 (UTC)[reply]
Are colors same as in other review templates? --EugeneZelenko (talk) 16:04, 4 December 2019 (UTC)[reply]
This image was originally posted to Flickr by Österreichisches Außenministerium at https://flickr.com/photos/88775815@N04/40647907102. It was reviewed on 5 December 2019 by FlickreviewR 2 and was confirmed to be licensed under the terms of the cc-by-2.0.

Imo the colors of the Flickr reviews and the YouTube reviews are the same. --Eatcha (talk) 05:10, 5 December 2019 (UTC)[reply]

It actually uses Template:LicenseReview/styles.css, so, yeah.. - Alexis Jazz ping plz 06:00, 5 December 2019 (UTC)[reply]
@Eatcha: I messed with your comment again. Can you change it so {{ISOdate}} works properly? "December 3 2019" is nonsense in Dutch. (we say 3 december 2019) - Alexis Jazz ping plz 18:25, 3 December 2019 (UTC)[reply]
Alexis Jazz will do that, unfortunately I was doing something else and the bot made more than 300 edits with US date format. It will require a clean up, I guess. -- Eatcha (talk) 18:44, 3 December 2019 (UTC)[reply]
I fixed the date on all processed files, if anybody has a suggestion please tell me now, cleaning up 13 thousand edits is enough to kill my brain. --Eatcha (talk) 19:08, 3 December 2019 (UTC)[reply]
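Note: the date clean-up discussed above amounts to converting a US-style date such as "December 3 2019" into the ISO form that {{ISOdate}} can localise. A minimal sketch, assuming English month names and the exact input format quoted in this thread (the function name `to_iso` is hypothetical):

```python
from datetime import datetime


def to_iso(us_date: str) -> str:
    """Convert e.g. "December 3 2019" to "2019-12-03".

    Assumes English month names; strptime raises ValueError on any
    other format, which makes bad input visible instead of guessing.
    """
    return datetime.strptime(us_date, "%B %d %Y").date().isoformat()
```

Storing the ISO form and letting the template render it avoids baking one language's date order into the wikitext.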
@Eatcha: that's odd, I thought I fixed them? - Alexis Jazz ping plz 19:15, 3 December 2019 (UTC)[reply]
Oh, suggestions: YouTube video title. Template accepts title= now. Also beware I changed human=/user= to reviewer=. - Alexis Jazz ping plz 19:16, 3 December 2019 (UTC)[reply]
This file should not be deleted if the license has changed in the meantime. Do we need this bit in the template? This is true for any license that has been subject to a license review, whether by bot or human. Other review templates don't include such a statement. I'm also wondering if the template should address videos that are free but are licensed under the Standard YouTube License, like {{PD-FLGov}} works (File:Mayor Dyer on the Pulse Site Purchase.webm, for example). ƏXPLICIT 03:49, 4 December 2019 (UTC)[reply]
I don't think you need to go through the process of {{YouTubeReview}} if it was {{PD-FLGov}}. The purpose of {{YouTubeReview}} is to have a record that the licence marked on YouTube was once CC-BY --Zhuyifei1999 (talk) 03:56, 4 December 2019 (UTC)[reply]
  • I think we need a page like User:FlickreviewR/bad-authors for Youtube too. Hanooz 06:12, 6 December 2019 (UTC)[reply]
    This bot literally only checks whether the license on YouTube is CC-BY. It doesn't 'pass' the review all the way through like FlickreviewR does, and human review is still needed, so potentially the human reviewer might have a list of bad-authors in mind. I guess a list for the bot could be added (depends on whether Eatcha is willing to ;) ), but it probably won't be as useful as the Flickr one (for reference, neither the Picasa one nor the Panoramio had such a list, even though they were all-the-way pass).
    That said, do you have a list of YouTubers to pre-populate the page? --Zhuyifei1999 (talk) 07:32, 6 December 2019 (UTC)[reply]
    Oh, I see. No, not now. Hanooz 08:51, 6 December 2019 (UTC)[reply]
    @Zhuyifei1999:
Some may have some original content, but all these really require a human review. - Alexis Jazz ping plz 17:07, 6 December 2019 (UTC)[reply]
As you three users (Hanooz, Zhuyifei, Alexis Jazz) are involved, should I create a block list? Please answer "[Yy]es" or "[Nn]o". If I see more yes votes, I will create one. -- Eatcha (talk) 03:56, 7 December 2019 (UTC)[reply]
I don't mind either way. --Zhuyifei1999 (talk) 03:59, 7 December 2019 (UTC)[reply]
If it's not a huge undertaking, yes please. But it's "nice to have", not absolutely essential. - Alexis Jazz ping plz 04:05, 7 December 2019 (UTC)[reply]
+1 Hanooz 05:31, 7 December 2019 (UTC)[reply]
Working -- Eatcha (talk) 05:59, 7 December 2019 (UTC)[reply]
✓ Done Hanooz, Alexis Jazz: add as many IDs as you want at User:YouTubeReviewBot/bad-authors. Small demo at https://repl.it/repls/RubberyNotableAnimatronics -- Eatcha (talk) 14:46, 7 December 2019 (UTC)[reply]
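Note: a block-list check of this kind could look roughly like the sketch below. It is not the linked demo. The page format is an assumption: the code simply extracts anything that looks like a YouTube channel ID ("UC" followed by 22 URL-safe characters) from the page text, and the function names are hypothetical.

```python
import re

# A YouTube channel ID: "UC" plus 22 URL-safe characters, not embedded
# in a longer run of such characters.
CHANNEL_ID = re.compile(r"(?<![0-9A-Za-z_-])UC[0-9A-Za-z_-]{22}(?![0-9A-Za-z_-])")


def parse_bad_authors(page_text: str) -> set:
    """Collect every channel-ID-looking token from the block-list page."""
    return set(CHANNEL_ID.findall(page_text))


def is_blocked(channel_id: str, bad_authors: set) -> bool:
    """True if the uploader's channel is on the block list."""
    return channel_id in bad_authors
```

A file whose source channel is blocked would be skipped by the bot and left for human review instead of being passed.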
(above comment moved for clarity)
@Eatcha: this could happen for various reasons. You may not be giving the answer Google wants. (captcha difficulty varies depending on some things, the more difficult captcha are nearly impossible for humans) You may not be saving the cookie for the captcha properly. Your browser may not be processing the captcha properly. Or something else. - Alexis Jazz ping plz 06:56, 10 December 2019 (UTC)[reply]

Flow :

  • Oldest-archive ---> Real-time video --> all archives --> Get Result (PASS or FAIL)
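Note: one possible reading of this flow is a fallback chain in which each source is tried in order and the first one showing a free license gives PASS. The sketch below assumes that reading; the names `run_flow`, `sources` and `has_license` are hypothetical, and the fetchers are passed in as plain callables so that no particular archive or YouTube API is implied.

```python
from typing import Callable, Iterable, Optional


def run_flow(
    sources: Iterable[Callable[[], Optional[str]]],
    has_license: Callable[[str], bool],
) -> str:
    """Try each source in order (oldest archive, live page, other archives).

    Each source returns the page HTML, or None if it is unavailable.
    The first page showing a free license yields "PASS"; if no source
    does, the result is "FAIL".
    """
    for fetch in sources:
        html = fetch()
        if html is not None and has_license(html):
            return "PASS"
    return "FAIL"
```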


Thanks -- Eatcha (talk) 10:47, 15 December 2019 (UTC)[reply]

@EugeneZelenko: I will be very busy till January 10. If everything looks okay, you may flag the bot; I will resume editing after January 10. If it's not okay, please keep the request open till then. Thanks -- Eatcha (talk) 07:24, 16 December 2019 (UTC)[reply]

Files should be sorted as follows:

Category:License review needed (video)
Files passed by YouTubeReviewBot pending human reviews

as well as

Category:License reviewed by YouTubeReviewBot
Files passed by YouTubeReviewBot pending human reviews
Files passed by YouTubeReviewBot but human review is impossible (for those sources that go missing and have no archived records)

Files first passed by YouTubeReviewBot and subsequently passed by a human should be categorised under Category:Files from external sources with reviewed licenses.--Roy17 (talk) 01:40, 25 December 2019 (UTC)[reply]

Files passed by YouTubeReviewBot pending human reviews will be another backlog: therefore I  Oppose. All links are archived. There is no need for another backlog. We never had one for other websites, so why YouTube? YouTube is far more efficient at removing copyrighted material. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)[reply]
Files passed by YouTubeReviewBot but human review is impossible (for those sources that go missing and have no archived records) :  Impossible human review implies Impossible bot-review
Files first passed by YouTubeReviewBot and subsequently passed by a human should be categorised under Category:Files from external sources with reviewed licenses :  Support, this should be done using the template, appending |reviewer={{subst:REVISIONUSER}} should add the required category. I don't know how to edit the template for this change. -- Eatcha (talk) 15:23, 28 December 2019 (UTC)[reply]
Your bot's job is not to pass LR but merely to verify whether the link given has a Commons-compatible licence. All files have to be passed by humans. The only other bot-reviewed site is Flickr. It does not require humans because the bot can check whether the upload is identical to the source. In case it's not, it gets sorted into Flickr files needing human review. You know your bot is not designed for this.
Files verified by your bot but disappeared/made private before human review would be problematic. If the source is unreliable it's most probably gonna be deleted. But let's say the video appears alright and the source is reliable, like a public institution: it's up to the community whether such files should be hosted. Maybe the community would reject all of them, maybe not. Before that consensus is formed, it's better to preempt the situation.--Roy17 (talk) 02:35, 29 December 2019 (UTC)[reply]
Files passed by YouTubeReviewBot pending human reviews : will be another backlog I would quote this discussion in the future if backlog keeps increasing. I am now  Neutral.
I am not able to verify the video because the public IP of Toolforge is banned by YouTube; I cannot download the videos to compare. Comparing images is easier than comparing videos; videos are in fact re-encoded and are much bigger files. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)[reply]


Files verified by your bot but disappeared/made private before human review would be problematic. :   Not possible due to limitations. I will not fetch the live YouTube page; we are blocked. See the discussion page of Video2Commons. Alexis Jazz and Zhuyifei1999 asked me not to force a new archive every time. According to both of them, "YouTube will block IA if I do that". When the bot passes a file, I retrieve the page via the Wayback Machine because of this block by YouTube. -- Eatcha (talk) 05:42, 29 December 2019 (UTC)[reply]
Again, you don't seem to understand the limitations of your own bot. Archived pages do not archive the video. Your bot only checks whether a link has a good licence, but chances are the upload does not match the link or it contains extra material like another soundtrack remixed.
In case you still don't understand: your bot passed File:Mama Cax at Chromat AW19 Climatic.webm. I cut this footage from the source. Now if the source disappears before a human comes to it, all that's left is the archived page, which says nothing about whether the upload was indeed part or the whole of the video.--Roy17 (talk) 14:24, 29 December 2019 (UTC)[reply]
I asked for thoughts on it at Commons:Village_pump#Need_some_opinions_for_YouTubeReviewBot. Un-free music from YouTube is a non-issue; it's beating humans at the moment. Unfortunately, that's not true for Vimeo. I can increase accuracy by checking the video length, but that would fail cropped/trimmed videos, which is undesirable IMHO. -- Eatcha (talk) 06:05, 30 December 2019 (UTC)[reply]
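Note: the trade-off of a video-length check can be illustrated with a small sketch. The function name `compare_duration`, the one-second tolerance and the result labels are all assumptions chosen for illustration. Separating a shorter upload (possibly a legitimate trim) from a longer one avoids outright failing trimmed files, at the cost of leaving them for a human.

```python
def compare_duration(upload_seconds: float, source_seconds: float,
                     tolerance: float = 1.0) -> str:
    """Compare the length of the uploaded file with the source video.

    "match"            - lengths agree within the tolerance
    "possibly trimmed" - upload is shorter; could be a legitimate cut
    "mismatch"         - upload is longer than the source
    """
    diff = source_seconds - upload_seconds
    if abs(diff) <= tolerance:
        return "match"
    if diff > 0:
        return "possibly trimmed"
    return "mismatch"
```

A length match does not prove the content is identical; it only rules out some obvious mismatches.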

Does the bot review the captures (i.e. screenshots) of YouTube videos? – Kwj2772 (talk) 07:06, 31 December 2019 (UTC)[reply]

Kwj No, that would be risky. From the archive and the length of the video, we can deduce whether two videos are the same without even watching the full video. But it's not possible to tell whether a screenshot is from a particular video. Straight answer: no, it doesn't review screenshots from a video. -- Eatcha (talk) 07:47, 31 December 2019 (UTC)[reply]

tl;dr: What is the current state of the discussion? Is this ready to be approved, or which issues are open? --Krd 15:51, 31 December 2019 (UTC)[reply]

Krd No issue with the bot, but Roy17 has asked for the creation of a category ("Category:Files passed by YouTubeReviewBot pending human reviews"). The purpose of the category: all files reviewed by this bot should be reviewed by humans again, because the bot checks only the license and has only a channel blacklist as a preventative measure. The bot doesn't checksum videos, because it's impossible for bigger files like videos: YouTube stores video and audio separately, and after transcoding the checksum doesn't match. Also, Toolforge's public IP is banned, as Google believes we violated their TOS by mass downloading thousands of videos. In my opinion we don't need that category, because it would definitely be another backlog: the bot reviewed some files which hadn't been reviewed for more than 3 years. Some links were dead; they were reviewed using the Wayback Machine. The Wayback Machine doesn't save the video, just images and HTML/JS/styles. In Roy17's opinion, why should we believe that the video uploaded on Commons is the same video that was once available under a link which is now dead, when we are left with an archive on the Wayback Machine that doesn't archive the video, "just the webpage (text+images)"? We can compare the video length IMHO, and if a file was imported using video2commons directly from the URL then it's a non-issue, as we don't get to choose the upload summary. I am neutral on this, but maybe input from others is necessary before creating a category that is destined to be a backlog at today's review rate. BTW: I am not against a second human review, if we have enough interested humans. {{YouTubeReview}} states: "If you are a license reviewer, you can review this file by manually appending |reviewer={{subst:REVISIONUSER}} to this template." -- Eatcha (talk) 17:49, 31 December 2019 (UTC)[reply]
In a nutshell, the bot's job is confirming <URL> has a <licence> as of <date>. Human review is necessary to check (1) that the video/audio does come from the <URL> and (2) that the <URL> is published by a genuine account instead of a fake one. Technically this problem might not concern the current workflow of the bot, but I believe this is an utmost consideration which has not been raised, and how this goal is achieved is best adapted to the bot's design. There's also a suggestion that YouTubeReview should be merged with LicenseReview, which I definitely support.
Btw, YouTube doesn't always strike down videos using unauthorised music but often only labels them as containing it in the attributions field. It doesn't recognise all music either, only the versions supplied by music companies, not to mention music not distributed by mainstream labels. Fake channels pirating news videos are also rampant, e.g. Commons:Deletion requests/Files in Category:2019 Koreas–United States DMZ summit.--Roy17 (talk) 01:09, 1 January 2020 (UTC)[reply]

Approved. --Krd 06:33, 25 January 2020 (UTC)[reply]