Commons talk:Structured data/Get involved/Feedback requests/First licensing consultation
- The following discussion is archived. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Thank you all for participating. There will be more in-depth conversations about this in the future, with prototypes informed by this discussion. Keegan (WMF) (talk) 17:20, 2 April 2018 (UTC)[reply]
Contents
- 1 Discussion
- 2 Models used by other institutions
- 3 Previous work
- 4 De minimis
- 5 Interoperability & intelligent semantics – let's take advantage of what we're building!
- 6 Examples
- 7 Read EXIF and IPTC?
- 8 Legal bases
- 9 Discussion so far
- 10 "Automated" licencing
- 11 Different focus than a lot of other systems are likely to have
- 12 Do we need some way to track conflicting or dubious copyright claims?
- 13 Copyright modeling in structured data
- 14 Category tree
- 15 Further comments and feedback after April 2, 2018
- 16 Please visit d:Wikidata:Property proposal
- Try to accommodate both; but the current level of detail at Commons is necessary, and it *must* remain possible to continue to provide this.
- It must remain possible to advise users of the full range of different copyright and licensing situations and their nuances currently captured by Commons templates.
- Ideally the information would be encoded systematically in the structured system, to be accessible through arbitrary queries, in addition to its visibility on the file page and through categories as at present.
- But each of the templates evolved for a reason, and the information they convey to users is important, especially the explanations as to why a particular copyright assessment is being made. The presumption should be that the distinctive information of each different template, and of each different value in use for each different possible parameter, should be treated a-priori as significant, unless proved otherwise. The structured system needs to be able to distinguish all these cases, and trigger the correct relevant templated message(s) as appropriate. (And the messages must be templated, so they can be discussed and adjusted if necessary). Unfortunately, there is a degree of diversity and complication and variation from jurisdiction to jurisdiction which is inherent in the system and cannot be streamlined. Anything that "dumbs down" or limits Commons current reflection of the full range of identifiable different cases and legal situations (and adaptability to new ones, or improved understanding of existing ones), will not be acceptable.
- Within that requirement, it may be possible to structure the information in such a way that summary information can be easily extracted -- in a way that ideally may be a close match to the "headline messages" of the 'streamlined' conclusion/summary information that the external institutions are trying to standardise. If this can be done, it would be good. And it may well be that we can better organise our information, and re-factor how we present it between different templates.
- But (perhaps unlike those institutions) the legal basis that the Commons assessment is based on is also required and must also be exposed -- so that users can read it, understand it, and if necessary disagree with it and bring it up for review in Commons forums. That requires a recording of the full complexity of which interpretation of which provision of which law in which country our copyright analysis is resting on. Jheald (talk) 16:56, 15 March 2018 (UTC)[reply]
- A couple more things to consider:
- Firstly, our short tags like PD-US-1923 are very convenient for experienced Commons users. It sums up the whole (US) copyright status and legal underpinning for a wide class of images in just nine characters, in a single unit, that is easily referred to in discussion, and can be applied simply by typing {{PD-US-1923}} on a file page. It's nice to be able to do that directly, rather than being forced to go through drop-down menus. Similarly, while one might object to something like {{PD-Art-YorckProject}} for muddling together provenance and copyright and licensing, in practice when those are all strongly related for images of a particular class, it is very useful to have a single object that can be applied to all the images from that source, and contains all that relevant information in one place once.
- So for all the apparent joy of squeaky-cleanness of re-factoring what we store in some orthogonal way, I think that would be a mistake. For users (helping eg ease of editing, ease of recognition, maintenance of consistency) preserving a level of indirection like this can be quite useful. So I would suggest whatever new system we come up with should preserve the possibility of being able to attach things like PD-Art-YorckProject to an image as a single Q-number, that in turn would have statements giving RightsStatements-compatible copyright status, licensing, provenance, jurisdictions etc; rather than all of these having to be separately applied to the image atomistically. For a query or a template, it should be no problem to follow the indirection to obtain the copyright status; but for users IMO this indirection that we are accustomed to is helpful, by bundling relevant things together in a memorable way with a name for whole classes of cases; as well as mapping much more closely to where we currently start from.
- Secondly, there is the question of migration and transition. While it's all very well to think how one might start with a clean slate, in practice there will be a transition period of parallel running with our existing templates, that may be extended or even indefinite. That's going to need thinking about; but is probably easier the more the level of congruence there is between the structured data and the templates we currently have. Jheald (talk) 18:20, 15 March 2018 (UTC)[reply]
- A problem with a template like {{PD-US-1923}} is that it only tells you that the file is public domain in the USA, and nothing about any other country. I wonder if it would be possible to store the underlying facts and derive the public domain status (in any particular country) from that. --ghouston (talk) 22:46, 15 March 2018 (UTC)[reply]
- That may be what you just suggested, in more detail. --ghouston (talk) 22:54, 15 March 2018 (UTC)[reply]
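The indirection described above, where a file carries a single named tag and a query follows it to the underlying statements, could be sketched roughly as follows. This is only an illustration in Python: the `TagItem` structure, the `TAGS` table, and the field values are hypothetical stand-ins, not an existing Wikibase or Commons API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagItem:
    """Hypothetical stand-in for an item such as PD-US-1923."""
    name: str
    status: str            # headline copyright status
    jurisdictions: tuple   # where the status is asserted
    justification: str     # reason shown to the reader


# Illustrative entry only; a real item would carry far more detail.
TAGS = {
    "PD-US-1923": TagItem(
        name="PD-US-1923",
        status="public domain",
        jurisdictions=("US",),
        justification="published before 1923",
    ),
}


def status_in(file_tags, country):
    """Follow the tag indirection and return the status asserted for `country`.

    Returns None when no attached tag says anything about that country.
    """
    for tag_name in file_tags:
        tag = TAGS[tag_name]
        if country in tag.jurisdictions:
            return tag.status
    return None
```

With only PD-US-1923 attached, `status_in(["PD-US-1923"], "FR")` returns None, which is the gap raised above: the tag alone says nothing about other countries.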
@Keegan (WMF): for those of us who are not familiar with GLAM organisations, please could you describe or link to descriptions of the kind of models you are referring to. How do they "streamline people’s understanding of free licensing ... describing terms and conditions associated with licenses, rather than the license itself"? What differences from Commons' current system might be necessary to promote interoperability? BethNaught (talk) 17:06, 15 March 2018 (UTC)[reply]
- I'm also a bit confused by "based on the system used by other institutions" - can you give examples of this?
- I do appreciate the vast complexity of options for uploaders, from the limited selection (5) in the Special:UploadWizard, to the limited selection (33) in the old Special:Upload form, to the overwhelming selection (~700) at Commons:Copyright tags.
- I'm not very familiar with this area beyond https://opensource.org/licenses (the "sorted by category" system linked there might be relevant) and https://spdx.org/licenses/
- Whilst briefly searching I found RightsStatements_IPWG%20Guidance.pdf which has an interesting flowchart on pages 8-9, which might also be relevant. Quiddity (talk) 17:30, 15 March 2018 (UTC)[reply]
- @BethNaught: There should be some example links posted shortly. Sorry about that, it's hard for me to pull those together when I'm not a GLAM person. Help is on the way. Keegan (WMF) (talk) 17:47, 15 March 2018 (UTC)[reply]
- Pending those links, the ones that Quiddity has provided are helpful. Essentially, the way Commons currently operates is to bundle the license and the terms and conditions of the license all together into one template. The workflow for Commons is then "Choose your license..." This can make it very hard for contributing users and institutions to match licenses to work they already have. The idea is that matching terms and conditions to works will lead to the appropriate license, easing donation hurdles, as one example of interoperability. There are other examples that I hope some with more expertise than I can provide. Keegan (WMF) (talk) 17:57, 15 March 2018 (UTC)[reply]
- Quiddity gives good examples. The PDF he links to, refers to http://www.rightsstatements.org: a scheme developed by Europeana and DPLA (the Digital Public Library of America) to categorize 'groups' of copyright statuses that online cultural works can fall into. Otherwise, I am also (kind of) familiar with http://outofcopyright.eu/ which is a system of public domain calculators - not strictly licensing metadata, but a set of (very complex) logic that perhaps might be useful for Commons as well. Pinging @Martsniez: he is involved in that initiative, and he might also have some great input here. SandraF (WMF) (talk) 18:13, 15 March 2018 (UTC)[reply]
- yes, please use the europeana and dpla examples to simplify. the propensity on commons is to create custom templates by collection, i.e. template:PD-Bain which clarifies when the institution has a rights statement by collection, but is not much help to uploaders. (and do not mention hybrid licensing) Slowking4 § Sander.v.Ginkel's revenge 01:30, 16 March 2018 (UTC)[reply]
- @Keegan (WMF): Thank you for explanation, but I still cannot imagine it. Would it be possible to display what you are saying by drawing/image?--Juandev (talk) 08:03, 18 March 2018 (UTC)[reply]
- The problem with the RightsStatements markers is that, essentially, none of them are acceptable principal copyright statements for Commons. Almost all of them are statements that content would not be acceptable to include on Commons.
- Okay, NoC-OKLR/1.0/ (Other known legal restrictions) might have a place as a warning, but I think we do better to attach such subsidiary issues (eg: Trademark, Personality Rights) as a separate statement, analogous to the way we currently do at Commons with an additional template, rather than building in another dimension of complexity to an already complicated copyright statement. People need to have a general awareness that there are other rights out there besides copyright, because we're not going to tag every right every time.
- The only other statement that wouldn't get content deleted here on sight is NoC-US. It may make sense, as a headline copyright statement, to distinguish "PD-everywhere", "PD-almost-everywhere", "PD-only-US", "PD-except-US", rather than just "PD", so that people can tell just from the "truthy" (wdt:) form of the statement further details they need to be aware of, specified using our more detailed copyright reasons -- though equally that might be an unnecessary addition to complication.
- NoC-US might be acceptable as a headline statement, where the copyright status elsewhere was unclear or undetermined, and Commons was okay with that, (and subject to some further information asserting a more detailed justification); though I'm still not 100% convinced.
- NKC/1.0/ (No known copyright) I think is not strong enough for Commons.
- As noted per my comments in the initial section above, the real problem of these is that we want a statement of reasons, *as asserted by the uploader*, as to why the copyright and licensing status is acceptable for Commons. These badges don't provide that. Jheald (talk) 11:05, 28 March 2018 (UTC)[reply]
We should probably use these slides as a starting point. Multichill (talk) 21:07, 16 March 2018 (UTC)[reply]
- Thanks for that. Taking that as a starting point, are there any thoughts or questions that come to mind as people look through the slides? These were created four years ago, and can certainly be as valid today as they were then. It'd be interesting to hear if anything has changed for people since 2014. Keegan (WMF) (talk) 21:37, 16 March 2018 (UTC)[reply]
- This deck has a lot of good ideas. One thing it doesn't seem to touch on, however, is jurisdiction. Many files have multiple license/status statements to deal with different laws in different jurisdictions. Perhaps that could be handled with a "jurisdiction" modifier to the license. One other thing to note is that we still need short-hand ways of entering this information. For example, "PD-EU-anonymous" is short-hand for 3 pieces of metadata: "Public domain", "jurisdiction: European Union", and "justification: author is anonymous". Are we going to expect people to enter all that information separately? Kaldari (talk) 23:10, 16 March 2018 (UTC)[reply]
- Agree, my bot just uploaded this file. To indicate the copyright status of the underlying work I use {{PD-Art|PD-old-auto-1923|deathyear=1919}} to indicate it's in the public domain in both the US and the source country. Wikidata already has applies to jurisdiction (P1001) and that's already used to try to model these kind of things. See d:Wikidata talk:WikiProject sum of all paintings#Shall we introduce properties indicating copyright status. We probably need to figure out how to model the cases that happen a lot and make shorthands for these cases for users. Multichill (talk) 10:25, 17 March 2018 (UTC)[reply]
- A discussion point is how to connect licenses and templates to the copyright situation. Is a license a qualifier for a copyright status, or is a license/template a subclass of a copyright status? For example, the status public domain (Q19652) can have as subclass public domain due to expiry of copyright (Q50552069), which could have as subclass, or a relation to, Template:PD-old-70 (Q6535634). In the end we should be able to query items based on certain usage rights. In the Licensing model framework discussion, the proposal is to have the property 'usage rights' with as qualifiers (or properties?) 'obligations' and 'justification'. Justification we can probably read as legal base. The idea to have obligations is interesting. Would it also be the case that we would have multiple usage rights, and if so, should we have obligations for each usage right? --Hannolans (talk) 12:39, 20 March 2018 (UTC)[reply]
- I like the basic idea of this very much, with a headline copyright status supported by an assertion of a detailed justification in a qualifier.
- However, I think there is a serious problem, in that what one really wants to do is to qualify the justification, to indicate that the justification (with linked licensing obligations) applies (or does not apply) in a particular territory; or to a particular part of the work; or to a particular contribution towards the overall work. But there is a critical problem here, because it is not possible in Wikibase to qualify a qualifier, only a main statement.
- This means that each different part or contribution or territorial case is going to need a different main statement, so that the item will start proliferating copyright "headline" statements, in just the way that the packaging-up of information as proposed here was trying to avoid. I don't see an easy way round that, within the limits of the Wikibase system. But it's not so helpful if an item ends up with three statements saying PD and another couple saying "Copyright but open licensed", and a simple query is trying to work out what's going on.
- However, this may be what one has to accept, if one has different contributions one needs to be able to distinguish. Jheald (talk) 11:39, 28 March 2018 (UTC)[reply]
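To make the trade-off in this thread concrete, here is a rough Python sketch of how a short-hand such as "PD-EU-anonymous" might expand into a full main statement, and how an item that needs one main statement per territory ends up repeating its headline value. The `Statement` shape and the `SHORTHANDS` table are assumptions made for illustration; they are not part of Wikibase.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One main copyright statement with its qualifiers (hypothetical shape)."""
    status: str          # headline value, e.g. "public domain"
    jurisdiction: str    # qualifier: where the statement applies
    justification: str   # qualifier: why the status holds


# The short-hand from the discussion bundles three pieces of metadata.
SHORTHANDS = {
    "PD-EU-anonymous": Statement(
        status="public domain",
        jurisdiction="European Union",
        justification="author is anonymous",
    ),
}


def expand(shorthands):
    """Turn the short-hands typed by a user into full statements."""
    return [SHORTHANDS[name] for name in shorthands]


def headline_summary(statements):
    """Count headline values, as a naive query over the item would see them."""
    return Counter(s.status for s in statements)
```

An item holding one such statement per territory reports "public domain" once for each of them in `headline_summary`, which is the proliferation of headline statements described above.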
Hi, One additional restriction is when a copyrighted item is part of a larger picture, typically File:Louvre at dusk.JPG. It would not be allowed to crop the image to show the pyramid only. This is shown with {{De minimis |1=pyramid|reason=No FP in France}}. Regards, Yann (talk) 04:05, 17 March 2018 (UTC)[reply]
A key component of Wikimedia copyright expression has been the adoption of Creative Commons licensing. I think it should continue as the basis for future work. Works that are not open cultural works should not be included in Wikimedia Commons, although their metadata can be included in Wikimedia projects. For example, rightsstatements.org focuses on describing in-copyright works and is not a license, and thus those systems can only complement what widely adopted open licences can offer.
Secondly, it will be possible to record the copyright conditions, under which a certain copyright status emerges. We can state the year of death of the author and use that to calculate the expiration of copyright in different jurisdictions. Or explicitly define all the layers of copyright of an image: the underlying artwork(s), the photograph and the digital reproduction of the photograph. Although all of the information would not be needed in the US, the legislation in other parts of the world may recognise copyright for faithful reproductions, for example.
Thirdly, I would be interested to see any of this encoded to the files at time of download, so that all enriched copyright data could be read automatically by other systems (when such systems would be developed).
And sorry if I repeat any previously mentioned approaches, I did not read every detail of it yet. – Susanna Ånäs (Susannaanas) (talk) 11:35, 17 March 2018 (UTC)[reply]
What about some examples, so that we can see what stands behind these models?--Juandev (talk) 07:50, 18 March 2018 (UTC)[reply]
You should provide examples on the description page, ideally screenshots, or at least links. That's for those of us who are not familiar with the GLAM licensing model, for those of us who do not have English as a mother tongue, and for those of us who don't understand this slightly IT language.--Juandev (talk) 07:56, 18 March 2018 (UTC)[reply]
- @Juandev: We will try to put some sort of examples together. The reason we have not is that nothing has been built yet, we are still talking about things and do not really have a place to start yet. But we will try for something simple this week, with the feedback we have received already. Keegan (WMF) (talk) 16:44, 19 March 2018 (UTC)[reply]
Would it be possible to read the EXIF and IPTC information in images for this? It's a long shot because both do not have a good field to put this information in yet, but this could be determined by
- simply choosing a field to put this information in and instructing uploaders (especially photographers) to add their license information in it
- reaching out to the IPTC and EXIF people to see if this can be included in future versions — Preceding unsigned comment added by Ter-burg (talk • contribs) 14:46, 19 March 2018 (UTC)[reply]
Ter-burg (talk) 14:47, 19 March 2018 (UTC)[reply]
Structured data opens the opportunity to add the exact legal base on which a work is either public domain or copyrighted. If we connect the legal base to templates and/or the works itself we can claim it is public domain based on this and this legislation in that country. We already made some progress to add the legal base to the public domain date (P3893) for several artworks in wikidata. For example for the painting Q29910839 we have added laws applied (P3014) as article 37 Dutch copyright law from 1995 (Q29862345) with determination method or standard (P459) as 70 years or more after author(s) death (Q29870196). Likewise, we can say 100 years or more after author(s) death (Q29940705) or 70 years or less since publication (Q29870735), in connection to copyright law. --Hannolans (talk) 12:23, 20 March 2018 (UTC)[reply]
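As a rough illustration of the statement described here, the sketch below models a public domain date claim with its two qualifiers and renders the legal base as a sentence a reader could dispute. The property and item IDs are the ones quoted above; the `PublicDomainClaim` class, the `explain` helper, and any year passed in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicDomainClaim:
    """A public domain date (P3893) statement with its two qualifiers."""
    work: str          # item the statement is on
    pd_year: int       # value of P3893 (year only, for simplicity)
    law_applied: str   # qualifier P3014
    method: str        # qualifier P459


# IDs and labels as quoted in the comment above.
LABELS = {
    "Q29862345": "article 37 Dutch copyright law from 1995",
    "Q29870196": "70 years or more after author(s) death",
}


def explain(claim):
    """Render the legal base of a claim as a readable sentence."""
    return (
        f"{claim.work}: public domain from {claim.pd_year}, "
        f"under {LABELS[claim.law_applied]} ({claim.law_applied}), "
        f"determined as {LABELS[claim.method]} ({claim.method})"
    )
```

Calling `explain(PublicDomainClaim("Q29910839", 2000, "Q29862345", "Q29870196"))` yields one line naming the law and the determination method; the year 2000 is a placeholder, since the actual date is not given in this discussion.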
I read through the above discussion and it seems to me each person describes some aspect of the final system. I will try to characterize my take on each person's vision and add my own. I apologize ahead of time if I misunderstood and/or mischaracterized someone's statements. I feel like we are all blind men describing an elephant of a license system.
- Jheald:
- Capture all the nuances and details of the current system, including jurisdictions
- License templates to be shown based on info in structured data
- Convenience of short copyright tags (like "PD-US-1923"), as opposed to setting multiple statements
- transition period (possibly indefinite) from templates only to Structured data (+ templates?)
- Ghouston
- per jurisdiction statements
- Multichill
- see old discussions
- I see many parallels between the framework outlined on Slide 5 and this discussion. Let's dive into details once we agree on some macro characteristics.
- Kaldari
- Allow to specify license for each jurisdiction
- Yann
- need to specify copyright info and restrictions to pieces of the depicted artwork, like the Louvre pyramid
- Susanna Ånäs
- No uploads of material that does not meet current copyright policies, but the system should be able to capture copyright details of such works
- calculate copyright tags based on other properties, like date of death
- add copyright info to the files we download from Commons
- User:Hannolans
- Wikidata already has some properties related to copyrights
I will add a few of my own wish-list macro properties of the new system, or reiterate points already made by others:
- I agree with Jheald that the new system needs to capture all the nuances currently handled by the templates. I hope that it will allow us to be much more precise.
- The new system will need to capture copyright restrictions of different parts of the image (like the Louvre pyramid mentioned by Yann), but also capture copyrights related to:
- original work
- derivative works
- digitization process
- some works like films or recorded music might have multiple copyright holders, like composer and performer
- Many people mentioned capturing copyrights per jurisdiction. I would also add per different time periods, including the future.
- I would like to add one requirement of my own for the new system: that as much information as possible be kept on the page related to the file, so that any changes to the licensing would be reflected in the file history. That is consistent with the current practice, where we ask people to add copyright tags directly to a file and not through some personalized user copyright template.
Possible Solutions:
- In order to balance the ability to capture great detail of copyright nuance (if needed), directly in the item associated with each file, with the current ease of a single string (like "PD-US-1923"), maybe we should employ some system of items that would function like temporary user-defined templates which are then substituted into the page. For example, I grab some sandbox item, which is unlikely to be touched or changed by others, set it up the way I want, and ask the system to copy some set of properties from that item to a newly created file-item. An alternative approach might be the ability to easily copy some properties from already existing file-items to file-items related to new uploads.
- We could also have some items that behave like templates. We set a bunch of properties on, let's say, a PD-US-1923 item (which is protected), and all file-items calling that item get some properties, like jurisdiction, automatically included.
--Jarekt (talk) 15:17, 20 March 2018 (UTC)[reply]
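The substitution idea in the first of these solutions might look something like the Python sketch below: properties are copied from a template-like item into a new file-item at creation time, so that later changes live in the file-item's own history. The `TEMPLATE_ITEMS` table, the property names, and `new_file_item` are all hypothetical and only illustrate the approach.

```python
import copy

# Hypothetical protected template item; property names are illustrative.
TEMPLATE_ITEMS = {
    "PD-US-1923": {
        "copyright status": "public domain",
        "applies to jurisdiction": ["United States"],
    },
}


def new_file_item(title, template_name, overrides=None):
    """Create a file-item by copying the template's properties into it.

    The copy is deep, so later edits to the file-item never leak back
    into the shared template item.
    """
    item = {"title": title}
    item.update(copy.deepcopy(TEMPLATE_ITEMS[template_name]))
    item.update(overrides or {})
    return item
```

Unlike the second solution, where file-items keep pointing at the template item, a copied file-item stands alone: a later change to the template would not reach it unless something re-applies it.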
- Thanks for the distillation Jarekt! I would like to clarify, however, that I don't think we should specify the license for each jurisdiction, as that's practically impossible, but we should be able to specify it for more than 1 jurisdiction, typically the source country and the U.S., per Commons policy. Feel free to tweak the wording further. Kaldari (talk) 17:01, 20 March 2018 (UTC)[reply]
- Kaldari, that is how I understood it. Each copyright tag needs to have info if it is world-wide or valid in some specific country. Hopefully, files are in PD or under free license in country of origin and in the US. --Jarekt (talk) 02:33, 21 March 2018 (UTC)[reply]
- Yes, thanks for the summary Jarekt. While it is a challenge to describe something we've never seen before, I think we make progress by putting ideas out there that we can build off of, both literally and figuratively. Keegan (WMF) (talk) 20:00, 20 March 2018 (UTC)[reply]
Hi, I don't know if this has been discussed before, or if this is even on topic, but could we imagine some form of "automated" licencing? If an image contains enough metadata (e.g. "composition", "creator is Bela Bartok", "created in 1940", "first performed 1941" etc.), that the proper licencing tag(s) would be added automatically (i.e. "in the public domain in France")? Thanks, --Gnom (talk) 21:40, 26 March 2018 (UTC)[reply]
- In Wikidata we started with auto-calculating public domain dates for some paintings based on the metadata. In the long run we could do two kinds of calculations: assumption-based and evidence-based. For evidence-based calculations we need to know all kinds of metadata, like publication date and type of author (government, pseudonym, etc.). It might be hard to have all that information. For assumption-based calculations we need to have the copyright term for every jurisdiction in Wikidata, and we can calculate the public domain date for every work assuming it is based on the default pma rule. Besides the copyright terms for several countries, for evidence-based calculations of the US copyright situation we should have a solution to have the copyright registration entries for all works as an external identifier. For example, we should be able to link to https://vcc.copyright.gov/browse?selected_timeframe=19711977&drawer_id=&drawer_number=&window_scroll_pos=150 (and we should also have a way to say 'this work was not registered'). --Hannolans (talk) 09:12, 27 March 2018 (UTC)[reply]
- It may seem like a lot of detail is required, but it's only the same information that you'd need to work out manually whether the work is in the public domain. --ghouston (talk) 09:28, 27 March 2018 (UTC)[reply]
- Well, that's true, but at the moment a lot of assumptions are made, for example that the date of publication is the date of creation and that the publication was in the country where the artist worked. If we create an automatic copyright status detection during upload we should therefore opt for an assumption-based model, based on the most common situation, and provide people with recommended templates or warnings. Matching authors during upload will help, but if someone takes pictures in a museum as own work and wants to upload them, we should have advanced technology to detect that it is in fact the work of an artist. Probably the best use case at the moment is to do bot calculations for works that are currently in Wikidata (paintings, movies, songs, books, buildings, sculptures) and to do calculations in Commons after the work is uploaded and categorised etc.? --Hannolans (talk) 10:19, 27 March 2018 (UTC)[reply]
- It would be feasible anyway to store both: properties for the underlying facts such as creation date, date and country of first publication, authors' death date(s), US renewals etc., as well as a property for public domain date (by country). There could be a tool that reads the facts and verifies or sets the public domain dates. However, the problem with storing public domain dates is that there are a lot of countries, and many of them have identical rules. At present, Commons generally only bothers with the US and the country of origin if different, so continuing that policy would be an option. --ghouston (talk) 02:14, 28 March 2018 (UTC)[reply]
- Yes, but we don't have that information during the upload, and it is very hard to find the 'first' publication and so to find the exact country of origin. We can strive towards having all those properties filled in later on. The country of origin might vary depending on the jurisdiction, and as such a work can have multiple countries of origin. In Commons so far we haven't defined the country of origin of a work; we are making assumptions based on the image and the country of the creator. It would be interesting to work towards it. About having the public domain date for each country: yes, we could start with those two countries, and probably group countries based on the copyright term. But it is an interesting question how we would detect the country of origin. I would say we start with using the property of the citizenship of the creator? --Hannolans (talk) 07:59, 28 March 2018 (UTC)[reply]
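The assumption-based calculation discussed in this thread, together with a tool that verifies a stored public domain date against the underlying facts, could be sketched as below. This is a minimal Python illustration, not an existing tool: the `PMA_TERM_YEARS` table holds a single assumed default term, the function names are made up, and the sketch assumes protection runs to the end of the calendar year in which the term expires.

```python
# Assumed copyright terms in years after the author's death (pma).
# Illustrative default only; real rules have many exceptions.
PMA_TERM_YEARS = {
    "NL": 70,
}


def assumed_pd_year(death_year, country, terms=PMA_TERM_YEARS):
    """First calendar year in which the work is assumed public domain.

    Assumes the default pma rule and that protection runs to the end of
    the calendar year, so the work enters the public domain on 1 January
    of the following year.
    """
    return death_year + terms[country] + 1


def check_stored_year(stored_year, death_year, country):
    """Compare a stored public domain year with the assumption-based one."""
    expected = assumed_pd_year(death_year, country)
    return "ok" if stored_year == expected else f"expected {expected}"
```

For an author who died in 1919, `assumed_pd_year(1919, "NL")` gives 1990. An evidence-based calculation would need the extra facts mentioned above, such as publication date, type of author, and US registration and renewal records, which this sketch deliberately leaves out.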
- It would be feasible anyway to store both: properties for the underlying facts such as creation date, date and country of first publication, authors death date(s), US renewals etc., as well as a property for public domain date (by country). There could be a tool that reads the facts and verifies or sets the public domain dates. However, the problem with storing public domain dates are that there are a lot of countries, and many of them have identical rules. At present, Commons generally only bothers with the US and the country of origin if different, so continuing that policy would be an option. --ghouston (talk) 02:14, 28 March 2018 (UTC)[reply]
- Well, that's true, but at the moment a lot of assumptions are made, for example that the date of publication is the date of creation and that the publication was in the country where the artist worked. If we create an automatic copyright status detection during upload we should therefore opt for an assumption-based model, based on the most common situation, and provide people with recommended templates or warnings. Matching authors during upload will help, but if someone takes pictures in a museum as own work and wants to upload them, we would need advanced technology to detect that it is in fact the work of an artist. Probably the best use case at the moment is to do bot calculations for works that are currently in Wikidata (paintings, movies, songs, books, buildings, sculptures) and to do calculations in Commons after the work is uploaded and categorised etc? --Hannolans (talk) 10:19, 27 March 2018 (UTC)[reply]
- It may seem like a lot of detail is required, but it's only the same information that you'd need to work out manually whether the work is in the public domain. --ghouston (talk) 09:28, 27 March 2018 (UTC)[reply]
- I would be against automated licensing, and I think the Commons community would have a hard time with it too.
- At the moment, an essential part of our system is an assurance by a particular named individual that an image is acceptable, together with the assertion of reasoning as to why the image is acceptable. The fact that the assertion is made by the uploader rather than the system may be important legally. The obligation also forces our uploaders to learn enough about the copyright background to know whether they can make that assurance.
- I can see automatic checks like WD constraint checks to look at whether there is underlying information to support the assertion -- is the creator identified, do we know the type of creator (individual person vs. anonymous employee etc), places of publication, creator's date of death etc etc. But I would not make the constraints too hard -- better to have edge cases identified as anomalies and investigated by human beings, than people falsifying data to get around automated filters.
- But the ultimate responsibility for the copyright assertions has to remain with the uploader. By all means make calculators and support gadgets available; but at the end of the day this is something that they must personally sign off. Jheald (talk) 14:42, 28 March 2018 (UTC)[reply]
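The soft, constraint-style checks described in this thread can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `UploadFacts` record, its field names and the warning texts are hypothetical and do not correspond to any existing Commons or Wikidata tool. The function only reports gaps for human review; it never blocks an upload and never sets a licence itself, so the sign-off stays with the uploader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UploadFacts:
    """Hypothetical record of facts supplied at upload (illustrative field names)."""
    creator: Optional[str] = None
    creator_type: Optional[str] = None          # e.g. "individual", "anonymous employee"
    place_of_publication: Optional[str] = None
    creator_death_year: Optional[int] = None
    uploader_signed_off: bool = False           # the personal assurance by the uploader


def soft_checks(facts: UploadFacts) -> List[str]:
    """Return warnings for human review; an empty list means nothing was flagged."""
    warnings = []
    if facts.creator is None:
        warnings.append("creator not identified")
    if facts.creator_type is None:
        warnings.append("type of creator unknown")
    if facts.place_of_publication is None:
        warnings.append("place of publication unknown")
    # A death date only makes sense for an identified individual.
    if facts.creator_type == "individual" and facts.creator_death_year is None:
        warnings.append("creator's date of death unknown")
    if not facts.uploader_signed_off:
        warnings.append("uploader has not signed off the copyright assertion")
    return warnings
```

Returning a list of warnings instead of raising an error reflects the preference voiced above for flagging anomalies rather than enforcing hard filters that people might work around with falsified data.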
Except for things that have been straightforwardly licensed by their copyright holders, we have a different focus than a lot of other systems are likely to have: we are very concerned with documenting why a particular work is in the public domain; most systems are more likely to be focused on who holds copyright on things that are currently copyrighted, and are likely to "lump" public domain together as a single thing.
Obviously, we would do well to exploit frameworks of existing systems if one or more comes close to our needs, but I'm almost certain that we have concerns that few or no existing systems track.
Also: we have a running controversy over how old an image of unknown authorship & publication date has to be before we presume it to be PD. On such things, it is more crucial that we have the raw data about an image, on which a policy can be imposed, than that we record a conclusion that is subject to change. - Jmabel ! talk 22:42, 27 March 2018 (UTC)[reply]
- I am actually not sure about "exploit[ing] frameworks of existing systems". As we switch to a new system we should clean up as many issues and inconsistencies as possible. We should align our new system with better organized systems if we can identify them; the way we adopted Commons:Hirtle chart, we could adopt similar charts from Europeana. But the end result should be equivalent. --Jarekt (talk) 12:08, 28 March 2018 (UTC)[reply]
- @Keegan (WMF): I think that Jmabel is addressing an important matter. We don't just say "this work is PD"; we say where and why it is PD. This, I think, greatly helps users of Commons content to make sure that the content they want to use is PD also in their jurisdiction. The WMF is also, as their reports show, receiving very few copyright-related legal complaints for such a large project. A reason for that certainly is the diligent handling of copyright matters by the Commons community. I'm all for structured data, but not at the cost of dumbing down our licensing info or hiding relevant information. Everything that is currently recorded must be preserved, and it must be possible to view the information with one click. Sometimes, the reason for something being in the public domain is so complicated that it can't be expressed in a simple template and we have to resort to {{PD-because}}, i.e. reasoning/explanation in free text. All that needs to remain possible with structured data. Gestumblindi (talk) 00:14, 2 April 2018 (UTC)[reply]
- All right, thank you for emphasizing the point. Keegan (WMF) (talk) 17:18, 2 April 2018 (UTC)[reply]
Do we need some way to track conflicting or dubious copyright claims? E.g.:
- Getty Images claims a copyright and issues a restrictive license on something we believe to be in the public domain.
- A museum claims a copyright on a scan of a 2D public domain work.
- A library claims copyright on a 19th-century photo, with no indication either of why the work would still be in copyright or of how they came to hold such a copyright.
- Jmabel ! talk 22:45, 27 March 2018 (UTC)[reply]
- In Het achterhuis (Q14624856) and Triumph of the Will (Q156497) we added statement disputed by (P1310) for conflicting copyright claims (copyright holder (P3931) and public domain date (P3893)). Specifically for 2D and semi-2D photos, we should have an option to qualify the threshold of originality (Q707401)? --Hannolans (talk) 23:50, 27 March 2018 (UTC)[reply]
- Yes, we should, the same way we track conflicting or dubious dates of birth or other statements on Wikidata. I would not catalog every time Getty Images claims copyright on an old PD image, but if there are valid conflicting claims, that should be reflected in the data. Hopefully with preferred, normal and depreciated rank. --Jarekt (talk) 12:00, 28 March 2018 (UTC)[reply]
- depreciated => deprecated? - Jmabel ! talk 16:08, 28 March 2018 (UTC)[reply]
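To illustrate how ranked, possibly disputed claims could coexist in the data, here is a minimal Python sketch. It is not an existing API: the `Claim` class and `best_claims` function are hypothetical, and the selection rule (use preferred claims if any exist, otherwise normal ones, and never deprecated ones) is assumed here to mirror how Wikidata ranks are commonly consumed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Deprecated claims stay in the data for documentation but are never "best".
RANK_ORDER = {"preferred": 0, "normal": 1}


@dataclass
class Claim:
    """Hypothetical copyright claim with a Wikidata-style rank."""
    value: str                          # e.g. "public domain", "in copyright"
    rank: str = "normal"                # "preferred", "normal" or "deprecated"
    disputed_by: Optional[str] = None   # mirrors "statement disputed by (P1310)"


def best_claims(claims: List[Claim]) -> List[Claim]:
    """Return the claims with the best available rank, ignoring deprecated ones."""
    usable = [c for c in claims if c.rank in RANK_ORDER]
    if not usable:
        return []
    top = min(RANK_ORDER[c.rank] for c in usable)
    return [c for c in usable if RANK_ORDER[c.rank] == top]
```

With this shape, a dubious third-party claim can be recorded as deprecated and marked as disputed without affecting what a reuser sees by default.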
Looking at the proposal in File:Licensing model framework discussion.pdf and the discussions at WikiProject_sum_of_all_paintings and d:Property_talk:P275, I think we need each file to have one or more groups of statements, each implemented as a statement and several qualifiers. We have slightly different needs for copyrighted and PD works, but we should reuse as many of the properties used to model them as possible.
Property | explanation for PD works | explanation for copyrighted works |
---|---|---|
copyright status | with possible values: In Copyright or public domain | |
copyright license (P275) | does not apply | Examples: GNU General Public License, version 2.0, Creative Commons Attribution-ShareAlike 4.0 International. Each license item should specify: obligations and have URL to the full license text |
copyright holder (P3931) | does not apply | |
author (P50) | specified as qualifier for the copyright statement if there are multiple authors and we are tracking copyrights of each one separately | if different than copyright owner |
start time (P580) | rarely needed for copyrighted works | |
public domain date (P3893) | alternative to start time (P580) (?) | does not apply |
end time (P582) | usually not needed for PD works, except for rare cases where PD works become copyrighted again | |
applies to jurisdiction (P1001) | stating a country or countries, where copyright applies, with default for copyrighted works being worldwide. Possible values: United States, worldwide | |
attribution text | if custom attribution is required | |
justification | with values like Template:PD-old-100 | "own work", "Published at external website under stated license", "published at Flickr under stated license", "permission sent to OTRS" |
reference URL (P854) | link to external website with stating that work is in PD | link to external website stating the license for the file |
applies to part (P518) | allows us to specify copyrights of works the current work was derived from, objects in the photograph, sound and video parts of a movie, composer and performers, etc. | |
comment | Free text explanation of whatever cannot be explained through other means. Should not be used much. | |
Indirect Property | property of item linked through | comment |
---|---|---|
laws applied (P3014) | justification | maps a Commons template to the legislation on which it is based |
obligations | copyright license (P275) | attribution, share-alike, distribution with the full copy of the license, etc. |
license URL | copyright license (P275) | https://creativecommons.org/licenses/by-sa/2.0/deed.en, https://creativecommons.org/licenses/by-sa/2.0/legalcode |
We could have Copyright status as the statement and all the other properties as qualifiers, so they stay grouped together. A file (or artwork item on Wikidata) could have one or more such statement groups allowing us to specify copyright status for:
- different periods of time including past and future
- different jurisdictions
- different creators of different parts of the work
- changing copyright owners
Specifying all those properties might be excessive for most files, but we should be able to get to that level of detail. We should also allow people to predefine some combinations in personal sandbox items, from where they would be copied at image upload time. --Jarekt (talk) 13:48, 28 March 2018 (UTC)[reply]
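The statement-group idea above, with copyright status as the main value and everything else as qualifiers, can be sketched in Python as follows. This is only an illustration under stated assumptions: `CopyrightStatement` and `applicable` are hypothetical names, qualifiers are keyed by the property labels used in the table, years are plain integers rather than full dates, and a missing jurisdiction qualifier is treated as "worldwide" for every statement (the proposal only states that default for copyrighted works).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CopyrightStatement:
    """Hypothetical statement group: a status plus its qualifiers."""
    status: str                                   # "in copyright" or "public domain"
    qualifiers: Dict[str, object] = field(default_factory=dict)


def applicable(statements: List[CopyrightStatement],
               jurisdiction: str, year: int) -> List[CopyrightStatement]:
    """Select the statement groups that apply in a jurisdiction in a given year."""
    result = []
    for s in statements:
        q = s.qualifiers
        # Assumption: no jurisdiction qualifier means the statement is worldwide.
        where = q.get("applies to jurisdiction (P1001)", "worldwide")
        start = q.get("start time (P580)")
        end = q.get("end time (P582)")
        if where not in ("worldwide", jurisdiction):
            continue
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        result.append(s)
    return result
```

Keeping the qualifiers grouped under one status value is what lets a file carry several such groups side by side, one per period, jurisdiction or contributor, while still allowing a simple query like the one above.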
- @Jarekt: As I said above, in my comment on Multichill's thread, this model is attractive. But I do see a problem with works that are the result of a complicated chain of contributions, multiple parts, different statuses in different jurisdictions, etc, because (as far as I can see) each different combination of those factors is going to require a different statement, so a work may end up with quite a confusing wall of statements -- potentially including all three of PD, Copyright-Licensed, and Copyright-Restricted -- in a way that may result in something quite hard even for a machine to make sense of.
- The other thing I worry about with this model is the qualifier to express the aspect of the image or the nature of the contribution that the statement refers to. Will a single item value be expressive enough to cover all the possibilities? And is this the same place or a different place as to where one would specify 'Created by X after an engraving by Y after a drawing by Z' ? Jheald (talk) 15:07, 28 March 2018 (UTC)[reply]
- Jheald, those are all very good points. About the complexity issue, I agree, it will be potentially confusing if you have statements for different creators with time periods and jurisdictions. It is even more confusing right now. One thing that might make it easier (or more crazy) is the case where I take a photo of a PD sculpture that also has an item. Then the sculpture's copyright will be specified in the sculpture item, so different photographs of the same sculpture have the same copyright statements regarding the sculpture. The same goes for different performances of the same composition, etc. Maybe we can have a property "Inherit copyright statements from" with an applies to part (P518) qualifier. Relationships like 'Created by X after an engraving by Y after a drawing by Z' could be handled with an applies to part (P518) qualifier with values like "sculpture", "engraving", "drawing", etc.--Jarekt (talk) 15:38, 28 March 2018 (UTC)[reply]
- Indeed, that could help or (and!) make things more crazy if the earlier object has an item (either on Wikidata or on CommonsData). Not all will, however - or might only exist as part of a larger entity, such as a book. Jheald (talk) 15:56, 28 March 2018 (UTC)[reply]
- In the overview I do miss laws applied (P3014). With this property we can link to the specific article in a national copyright law that provides the legal basis for claiming public domain status. We could use this property both to map a Commons template to the legislation on which it is based, or directly as justification. Also, I would like to see if we can use public domain date (P3893) instead of start time (P580) --Hannolans (talk) 20:32, 28 March 2018 (UTC)[reply]
- Hannolans, laws applied (P3014) should be the same for each justification, so it should be a property of that object and not set for each file individually. But you are right, we need it too. Similarly, we will need items linked by copyright license (P275) to have statements about obligations (attribution, share-alike, distribution with the full copy of the license, etc.) and a URL to the license page, as was pointed out in File:Licensing model framework discussion.pdf. I created another table of indirect or inherited properties. As for public domain date (P3893), it could be an alternative to start time (P580). It seems better suited as a stand-alone property and not a qualifier. In case you have multiple PD licenses that apply (for different jurisdictions for example or when multiple laws apply to a single file) and each has a different start day. --Jarekt (talk) 12:03, 29 March 2018 (UTC)[reply]
- One way to deal with a forest of different copyright statuses for different contributions might be to have an additional "copyright status summation" statement, giving the overall status for the work as a whole, similar to the quite limited level of detail presented by Flickr (Commons:Flickr files). Of course, this would have to be kept in sync with all the "copyright status details", which would be a pain, and an annoyance to data orthogonality purists; but it might be quite useful. Jheald (talk) 15:33, 30 March 2018 (UTC)[reply]
- Yes, we could have some summary statement to clarify a large number of copyright statements with narrow scopes. It could state the copyright status in the US at the present time, for example, with an option to make it worldwide. --Jarekt (talk) 12:17, 2 April 2018 (UTC)[reply]
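One way to limit the synchronisation burden of a "copyright status summation" is to derive it from the detailed statements and only compare it against any stored value. The Python sketch below illustrates that idea under a loudly stated assumption: the work as a whole is taken to be as restricted as its most restricted part, using the three statuses named in this thread. The ordering, the function names and the example parts are all hypothetical.

```python
from typing import Dict

# Assumed ordering, from least to most restrictive (illustrative only).
RESTRICTIVENESS = {"PD": 0, "Copyright-Licensed": 1, "Copyright-Restricted": 2}


def summarize(part_statuses: Dict[str, str]) -> str:
    """Overall status of a work: the most restrictive status among its parts."""
    if not part_statuses:
        raise ValueError("no copyright statements to summarize")
    return max(part_statuses.values(), key=lambda s: RESTRICTIVENESS[s])


def summary_in_sync(stored_summary: str, part_statuses: Dict[str, str]) -> bool:
    """Check a stored summary statement against the detailed statements."""
    return stored_summary == summarize(part_statuses)
```

A bot could run a check like `summary_in_sync` periodically and list mismatches for human review instead of silently overwriting either side.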
It'll be great to have constraints for works that enter the public domain through copyright term expiration, based on the copyright term and the death of the author(s). --EugeneZelenko (talk) 14:23, 29 March 2018 (UTC)[reply]
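A constraint of this kind could be sketched in Python as below. The rule encoded here is an assumption, not a statement of any particular law: the term is counted from the death of the last surviving author and runs to the end of that calendar year, so the work enters the public domain on 1 January of the following year. The term length is passed in as a parameter because it differs by jurisdiction, and both function names are hypothetical.

```python
from typing import Iterable, Optional


def public_domain_year(author_death_years: Iterable[int], term_years: int) -> int:
    """First calendar year in which the work is PD under the assumed rule."""
    deaths = list(author_death_years)
    if not deaths:
        raise ValueError("at least one author death year is required")
    # Term runs from the last author's death to the end of that year + term.
    return max(deaths) + term_years + 1


def check_stated_pd_year(stated_year: int,
                         author_death_years: Iterable[int],
                         term_years: int) -> Optional[str]:
    """Return a warning if the stated PD year is earlier than the computed one."""
    expected = public_domain_year(author_death_years, term_years)
    if stated_year < expected:
        return (f"stated public domain year {stated_year} "
                f"is earlier than computed {expected}")
    return None
```

For example, with hypothetical co-authors who died in 1930 and 1947 and a 70-year term, the computed year is 2018, and a stated public domain date of 2001 would be flagged for review.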
Is the category tree ( =item tree) structured data? What is licensed here: the whole tree, its individual branches, parts from the root (1/2, 1/3, ... 1/100000(0) ), different scales, only the supercategory, something else? --Fractaler (talk) 06:42, 29 March 2018 (UTC)[reply]
- What is licensed is the source code that produced the tree. It is structured data, but not the Structured Data on Wikimedia Commons we are discussing here, which will use Wikidata-like software. --Jarekt (talk) 12:06, 29 March 2018 (UTC)[reply]
- Why "source code that produced the tree"? The knowledge base (category tree) is the graph: a directed, connected and acyclic graph. Vertices and edges are added by users. --Fractaler (talk) 12:20, 29 March 2018 (UTC)[reply]
- The mw:Extension:CategoryTree is software licensed under GNU GPL 2.0. It operates on data distributed under the CC-Zero license, so the graphs are derivatives of a CC-Zero work. My guess would be that they are CC-Zero as well. --Jarekt (talk) 12:26, 2 April 2018 (UTC)[reply]
- I.e., we must have time to make such a graph (taxonomy) as large as possible. Otherwise, there will be a paid version of a larger graph that does not allow the existence of a free graph. --Fractaler (talk) 13:06, 2 April 2018 (UTC)[reply]
- I do not see how other organizations creating copyrighted versions of such a graph would have any influence on the copyright of a Wikidata-based graph. --Jarekt (talk) 14:52, 2 April 2018 (UTC)[reply]
- The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.
In response to: Models used by other institutions
- I will be summarizing this for the Rightstatements.org Technical Working Group members and will bring our discussions back here. Are there any specific questions you would like the group to address? Musebrarian (talk)
In response to: Previous work
- Multichill, is there a study/source of the numbers for how many PD/Commons templates/categories are applied in the Commons? I see on the main Structured Data page some other information about information templates, but not about licenses, etc. Musebrarian (talk) — Preceding unsigned comment added by SandraF (WMF) (talk • contribs) 06:45, 18 April 2018 (UTC)[reply]
CC REL, the Creative Commons Rights Expression Language
I was alerted to the fact that Creative Commons has been working on an RDF representation of its licenses - CC REL, The Creative Commons Rights Expression Language. I think it can be very inspirational for our own modelling on Wikidata.
- General info: https://creativecommons.org/ns
- For more extensive background/rationale, see chapter 10 of this book: https://www.communia-association.org/wp-content/uploads/the_digital_public_domain.pdf
Cheers, SandraF (WMF) (talk) 18:37, 18 May 2018 (UTC)[reply]
Please visit Wikidata:Property proposal/OTRS ticket number 2 and other proposals in d:Wikidata:Property_proposal/Sister_projects#Wikimedia_Commons for discussion on new properties related to Commons. --Jarekt (talk) 13:39, 18 December 2018 (UTC)[reply]