Wikidata:Property proposal/start and end time in video

"start time in media" and "end time in media"

Originally proposed at Wikidata:Property proposal/Sister projects

   Not done
Description: qualifiers for stating which part of a media file a statement applies to. These qualifiers (i.e. "start time in media" and "end time in media") should only be used on media files (i.e. audio and video) of Wikimedia Commons.
Data type: Quantity
Allowed values: (\d{1,2}:)?(\d{2}):(\d{2})(\.\d{3})? (the same notation that FFmpeg uses for specifying times; documentation can be found here)
Example 1 (video file): commons:File:WikidataCon 2017 - Languages in Wikidata.webm
main subject
  •  Wikidata label
     start time in media 00:02:36
     end time in media 00:03:33
  •  Wikidata editor
     start time in media 00:09:02
     end time in media 00:12:00
Example 2 (video file): commons:File:35c3 WikipakaWG - Wikidata-introduction Wikidata Query Service introduction (eng).webm
main subject
  •  Wikidata item
     start time in media 00:16:34
     end time in media 00:22:46
  •  Help:Interlanguage links
     start time in media 00:22:46
     end time in media 00:24:20
  •  Commons category
     start time in media 00:38:12
     end time in media 00:39:09
  •  Structured Data on Wikimedia Commons
     start time in media 00:39:09
     end time in media 00:40:04
  •  depicts
     start time in media 00:40:04
     end time in media 00:40:24
  •  Commons category
     start time in media 00:40:24
     end time in media 00:40:54
Example 3 (video file): commons:File:WIKITONGUES- Diego speaking Portuguese, English, Spanish, French, Italian, and German.webm
language of work or name
  •  French
     start time in media 00:00
     end time in media 00:52
  •  English
     start time in media 04:16
     end time in media 06:28
  •  Spanish
     start time in media 06:31
     end time in media 07:30
Example 4 (video file): commons:File:Ex1402-dive02.webm
depicts
  •  crab
     start time in media 00:03:44.992
     end time in media 00:03:55.869
Example 5 (audio file): commons:File:Debate-presidencial-peru2006-parte1.ogg
speaker
  •  Alan García
     start time in media 00:04:46.353
     end time in media 00:07:50.131
  •  Ollanta Humala
     start time in media 00:08:40.531
     end time in media 00:11:39.293
Planned use: I mainly plan to use it for conference presentations (e.g. WikidataCon 2017), since those are the media files that I mostly watch, but I'm sure people will be able to use it in the structured data of media files of any kind.
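
As a rough illustration of the "Allowed values" notation, the following Python sketch checks whether a string matches the pattern in full. The function name is_valid_timestamp is hypothetical, and the dot before the milliseconds is written escaped here on the assumption that only a literal dot is intended.

```python
import re

# Pattern taken from the "Allowed values" field. The dot before the
# milliseconds is escaped (assumption: a literal "." is intended).
TIMESTAMP_RE = re.compile(r"(\d{1,2}:)?(\d{2}):(\d{2})(\.\d{3})?")


def is_valid_timestamp(text: str) -> bool:
    """Return True if the whole string matches [H]H:MM:SS.mmm or MM:SS.mmm
    (hours and milliseconds are optional)."""
    return TIMESTAMP_RE.fullmatch(text) is not None


if __name__ == "__main__":
    # Values taken from the examples in this proposal, plus two invalid ones.
    for value in ["00:02:36", "04:16", "00:03:44.992", "4:16", "00:03:44.9"]:
        print(value, is_valid_timestamp(value))
```

Note that the pattern only checks the shape of the value; it does not reject out-of-range parts such as 99 minutes.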

Motivation

In Wikimedia Commons, there are long videos (00:57:31, 00:52:50, 00:48:37, 00:36:36) and long audio files (01:15:57, 00:45:31, 00:27:51, 00:18:51). Statements in SDC sometimes apply only to some parts of a media file. For example, an indigenous language may be spoken in a short part of a documentary, the main subject during one part of a lecture may be a specific topic, or a person may be the speaker at multiple points of a conference presentation. For this reason, we need qualifiers for stating which segment of the media file a statement applies to. I'll list some more examples.

In File:WIKITONGUES- Pau speaking French, the languages used are French, Lithuanian, Italian, English, Spanish, and Catalan. A user who finds this video might find it difficult to know the parts in which each language is spoken. These qualifiers can solve this issue when used as follows.

language of work or name
  •  French
     start time in media 00:00
     end time in media 00:52
  •  English
     start time in media 04:16
     end time in media 06:28
  •  Spanish
     start time in media 06:31
     end time in media 07:30

Another example: consider File:35c3 WikipakaWG - AI in Wikipedia (eng).webm. Its duration is 37 min 13 s. Because of the title, we can say the main subjects are artificial intelligence and Wikipedia, but during the presentation other topics are discussed. These qualifiers can help specify those topics and the segments in which they are discussed.

main subject
  •  ORES
     start time in media 02:27
     end time in media 03:10
  •  MediaWiki
     start time in media 04:57
     end time in media 05:54
  •  ethics of artificial intelligence
     start time in media 19:09
     end time in media 19:51

Final example: Consider File:Documentary - The Fourth Industrial Revolution.webm. Having support for milliseconds will provide users the opportunity to be as granular as they need.

depicts
  •  firework
     start time in media 00:02:12.603
     end time in media 00:02:15.523
  •  firework
     start time in media 00:02:22.923
     end time in media 00:02:26.203


Rdrg109 (talk) 08:04, 21 December 2021 (UTC)

Discussion

  •  Comment We already have time index (P4895) for start time, but lack a property for end time. --- Jura 12:59, 21 December 2021 (UTC)
  •  Comment Someone in the Telegram group of Wikimedia Commons mentioned that these qualifiers could also be used in audio files. I've updated the proposal. The names of the qualifiers are now "start time in media" and "end time in media" instead of "start time in video" and "end time in video" so that they can also be used in audio files. --- Rdrg109 (talk) 15:16, 21 December 2021 (UTC)
  •  Support This will be especially useful on longer media files. Ainali (talk) 22:42, 27 December 2021 (UTC)
  •  Oppose per above. Use existing property for start time. Datatype for end time shouldn't be string, but the same as time index (P4895). --- Jura 09:54, 28 December 2021 (UTC)
    @Jura1: The problem with the quantity data type is that it only allows a single unit. This means that if someone wants to specify a timestamp, they have to express the whole value in the smallest time unit that the timestamp needs. For example, to store 01:00:00, they would use 1 hour. To store 01:32:00, they would use 92 minutes. To store 01:12:37, they would use 4357 seconds. To store 01:13:18.322, they would use 4398322 milliseconds. Having to convert the timestamp to different units makes it more difficult for the user to contribute, which might cause the property not to be used at all.
    Because the data type of time index (P4895) is "quantity", I think that the proposed properties are more suitable.
    Rdrg109 (talk) 04:45, 2 January 2022 (UTC)
    • You can enter 18.322 seconds as 18.322 second. No need to input 18322 millisecond.
    It's clear that it requires some thought about unit selection and value entry (but even date entry requires date precision to be determined). If you need help with that, please ask on project chat.
    It may be easier to add "01:32", but a user couldn't really know what it means. Is it 1 hour and 32 minutes, or 1 minute and 32 seconds?
    Quantity datatype does provide an automatic conversion to seconds on query service, so it is fairly easy to figure out the values, even without checking units. --- Jura 14:40, 2 January 2022 (UTC)
    @Jura1: We can avoid the confusion of taking "01" in "01:32" as hour or minute by using the same logic that FFmpeg uses. You can find information about that in this link. Here's the notation: [-][<HH>:]<MM>:<SS>[.<m>...] (omit the leading hyphen). As you can see, the hours are only considered when there are three parts in the time. --- Rdrg109 (talk) 15:48, 12 January 2022 (UTC)
    Seems possible. Still, I think it's better to use the available datatype that has conversion built in. --- Jura 12:20, 18 January 2022 (UTC)
  •  Oppose start time is a duplicate of time index (P4895) and data type should be quantity. --Dipsacus fullonum (talk) 09:21, 1 January 2022 (UTC)
    time index (P4895) is the relative equivalent of point in time (P585), not start time (P580). I agree that the datatype should be quantity though. - Nikki (talk) 03:41, 2 January 2022 (UTC)
    @Nikki: @Dipsacus fullonum: I've shared my thoughts on why I think we shouldn't use the quantity datatype for this use case in my reply to Jura1 in this property proposal. Please let me know your thoughts. --- Rdrg109 (talk) 04:50, 2 January 2022 (UTC)
    @Nikki: Two out of three uses of Wikidata property example (P1855) for time index (P4895) (for Blade Runner (Q184843) and Stairway to Heaven (Q192023)) are similar to the examples given in this proposal except the end time isn't indicated. I see no reason why media files at Wikimedia Commons should use other properties for relative times than media files in general. --Dipsacus fullonum (talk) 05:09, 2 January 2022 (UTC)
    To me that is a clear argument for these properties. "Blade Runner depicts a unicorn at 72 minutes" is true as long as there is a unicorn visible at that time, whether it only just appeared or not. If you agree with that, then we're missing a way to enter the start time. If you disagree with it, then the property is ambiguous and we're missing a way to distinguish start time uses from point in time uses. - Nikki (talk) 09:52, 4 January 2022 (UTC)
  •  Comment A more appropriate data type for these properties would be Duration, but because, apparently, it hasn't been fully integrated into Wikibase, I think that if these properties were to be created, they could be handled with String until the Duration data type is fully implemented. --- Rdrg109 (talk) 04:58, 2 January 2022 (UTC)
    I wrote that ticket, but it doesn't really seem to be acted upon.
    One of the suggestions in the ticket is to display quantity differently, so entering it as quantity already would allow values to be converted easily later.
    BTW, speaking of "duration" (different sense), instead of specifying end time, one could just use duration (P2047) and time index (P4895). --- Jura 08:25, 11 January 2022 (UTC)
    @Jura1: In regards to the ticket: Hopefully, it'll be implemented some day. If not, we can use String for the time being.
    In regards to using duration (P2047): I think the same problem arises: more friction for the user. The user would need to compute the difference between point 1 and point 2 in the video and then use that value for duration (P2047). With these properties, they would just need to use the time that is shown in their media player; most players, if not all, show the time in the format [<HH>:]<MM>:<SS>[.<m>...], which is the one that would be used in these properties. --- Rdrg109 (talk) 16:15, 12 January 2022 (UTC)
    I think the time format shown in one's player depends on the player's localization. I strongly disagree with the datatype string with a format used in English. While we wait for the duration datatype, quantity is much better as it includes the unit, so people don't have to guess to interpret the string, and it will be usable by all, independent of language, script system and culture. It will also be easy to make tools to convert to and from whatever format different people prefer. Wikidata is primarily intended to be accessed by computers, and datatypes should be intended for easy computer handling. --Dipsacus fullonum (talk) 18:41, 12 January 2022 (UTC)
  • I have boldly changed the datatype of the proposal from string to quantity.--GZWDer (talk) 18:45, 16 February 2022 (UTC)
  •  Support very useful!--2le2im-bdc (talk) 19:59, 20 February 2022 (UTC)
  •  Comment Why not speak of "start validity in media" and "end validity in media"? It would also be possible to use it for a text. E.g.: a text depicts something from page 2 to page 5. --2le2im-bdc (talk) 20:14, 20 February 2022 (UTC)
    We use page(s) (P304) for that. I don't think it would be a good idea to use this for both timestamps and page numbers, because they are not the same thing. Page numbers are also not strictly numeric or sequential (you can have both page v and page 5 in a book). - Nikki (talk) 17:24, 18 November 2022 (UTC)
  •  Conditional support I support adding these but only if the datatype is quantity. I realise the interface is not ideal for that, but Wikidata is about storing structured data and the interface can theoretically be improved. If someone wants to enter it as plain text and try to enforce a consistent format for it, they can already do that in wikitext. It should be possible to make a script to both parse and display HH:MM:SS timestamps in Wikidata, but unfortunately I have no idea how to do that in Commons - most of the hooks that we use for scripts aren't currently used there. - Nikki (talk) 17:39, 18 November 2022 (UTC)

@Rdrg109: the examples are not in the proposed datatype. Please either edit the examples to the proposed datatype or edit the status of the property to "on hold" to wait for a suitable datatype. ChristianKl 18:03, 4 December 2022 (UTC)
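
The conversions discussed in this thread can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, a two-part value is read as MM:SS and a three-part value as HH:MM:SS (the FFmpeg-style rule quoted above), the leading hyphen is not supported, and fractions finer than a millisecond are truncated when formatting.

```python
from decimal import Decimal


def timestamp_to_seconds(text: str) -> Decimal:
    """Convert [HH:]MM:SS[.mmm] to a number of seconds.

    Assumption: two parts mean MM:SS, three parts mean HH:MM:SS.
    """
    parts = text.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected [HH:]MM:SS[.mmm], got {text!r}")
    seconds = Decimal(parts[-1])  # Decimal avoids binary float rounding
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_timestamp(total) -> str:
    """Format a non-negative number of seconds as HH:MM:SS[.mmm]."""
    total = Decimal(total)
    whole = int(total)
    fraction = total - whole
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    stamp = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    if fraction:
        # Keep millisecond precision only (truncating, not rounding).
        stamp += f".{int(fraction * 1000):03d}"
    return stamp


if __name__ == "__main__":
    start = timestamp_to_seconds("00:03:44.992")
    end = timestamp_to_seconds("00:03:55.869")
    print(start, end, end - start)  # the difference is the segment's duration
    print(seconds_to_timestamp(Decimal("4398.322")))
```

With these definitions, timestamp_to_seconds("01:13:18.322") gives Decimal("4398.322") and timestamp_to_seconds("01:32") gives 92, which matches the figures quoted in the discussion. The subtraction in the usage part corresponds to the duration (P2047) alternative mentioned above.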