Wikidata:Property proposal/Natural science

This page is for the proposal of new properties.

Before proposing a property

  1. Search if the property already exists.
  2. Search if the property has already been proposed.
  3. Check if you can give a similar label and definition as an existing Wikipedia infobox parameter, or if it can be matched to an infobox, to or from which data can be transferred automatically.
  4. Select the right datatype for the property.
  5. Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
  6. Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.

Creating the property

  1. Once consensus is reached, change status=ready on the template, to attract the attention of a property creator.
  2. Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
  3. See property creation policy.



Physics/astronomy

‎Maximum beam energy

   Under discussion
Description: Maximum beam energy of a particle accelerator
Represents: particle accelerator (Q130825)
Data type: Quantity
Template parameter: energy in en:template:infobox particle accelerator
Allowed units: megaelectronvolt (Q72081071), gigaelectronvolt (Q12789864), teraelectronvolt (Q3984193)
Example 1: Large Hadron Collider (Q40605) → 6800 GeV
Example 2: Large Electron–Positron Collider (Q659029) → 209 GeV
Example 3: Tevatron (Q944533) → 1000 GeV
Planned use: Many particle accelerators are present in Wikidata, but important properties such as beam energy cannot be added.

Motivation

Particle accelerators (particle accelerator (Q130825)) have many properties. One of the most important is the maximum beam energy, and values are tabulated in many reviews. This is one of several properties that should be introduced.  – The preceding unsigned comment was added by Wiso (talk • contribs) at 09:10, 24 December 2024 (UTC).

Discussion

I hope to receive some feedback on the correctness of my first property proposal. In particular, I am wondering if it is correct to associate this property with a particle accelerator (Q130825) instance. One potential issue: for Large Hadron Collider (Q40605), for example, the value 6800 GeV refers to the maximum beam energy when accelerating protons, which is its main use. The LHC can also accelerate heavy ions; in that case the value depends on the kind of ion and, in addition, the energy is expressed as "energy (typically GeV) per nucleon". I am wondering whether this is an important ambiguity that needs more thought. On the other hand, many reviews use this kind of property to refer only to the main use (protons for the LHC), for example the Particle Data Group reviews (https://pdg.lbl.gov/2024/download/db2024.pdf page 240) and the Wikipedia infobox en:template:infobox particle accelerator.

Alternatively, the property could be associated with a "Run". For example, in Run 1 the LHC accelerated protons at 3500 or 4000 GeV per beam (depending on the year). I think that the adjective "maximum" resolves this ambiguity.

An alternative, but probably more complicated, proposal would be to use a more general energy property (which I think doesn't exist) and to associate it with the "beam part".  – The preceding unsigned comment was added by Wiso (talk • contribs) at 09:38, 24 December 2024 (UTC).
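
The point that the adjective "maximum" removes the per-run ambiguity can be illustrated with a small sketch. It assumes the three allowed units are normalized to GeV before comparison; the period labels ("Run 1a", "Run 1b", "later") are hypothetical, while the energies are the ones quoted in the discussion above.

```python
# Conversion factors to GeV for the three units listed under "Allowed units".
TO_GEV = {"MeV": 1e-3, "GeV": 1.0, "TeV": 1e3}


def to_gev(amount, unit):
    """Convert a beam energy given in MeV, GeV or TeV to GeV."""
    return amount * TO_GEV[unit]


def maximum_beam_energy(per_period):
    """Return the largest beam energy, in GeV, over all operating periods."""
    return max(to_gev(amount, unit) for amount, unit in per_period.values())


# Hypothetical per-period proton beam records for the LHC; only the
# numerical values come from the discussion, the labels are illustrative.
lhc = {
    "Run 1a": (3500, "GeV"),
    "Run 1b": (4, "TeV"),
    "later": (6800, "GeV"),
}

print(maximum_beam_energy(lhc))
```

A single statement holding this maximum stays valid however many runs are recorded, which is the simplification the proposal relies on; per-run or per-ion values would need qualifiers or separate items.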

‎SIMBAD catalog properties (used more than 1 million times)

Gaia Data Release 2 ID

   Under discussion
Description: identifier for an astronomical object in Gaia Data Release 2
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{18}
Example 1: BS Cnc (Q2889194) → 661284024235415808
Example 2: Gliese 450 (Q5880899) → 4031586157514097024
Example 3: TYC 3645-2080-1 (Q75838267) → 1943381923013901440
Source: Gaia Data Release 2 (Q51905050)
Planned use: migrate all P528 values qualified with P972 Q51905050 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR2%20$1

2MASS ID

   Under discussion
Description: identifier for an astronomical object in the Two Micron All Sky Survey
Data type: External identifier
Domain: astronomical objects
Allowed values: J[0-9]{8}[+-][0-9]{7}
Example 1: BS Cnc (Q2889194) → J08390909+1935327
Example 2: Gliese 450 (Q5880899) → J11510737+3516188
Example 3: TYC 3645-2080-1 (Q75838267) → J23350993+4851114
Source: 2MASS (Q1454942)
Planned use: migrate all P528 values qualified with P972 Q1454942 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1

Tycho-2 Catalogue ID

   Under discussion
Description: identifier for an astronomical object in the Tycho-2 Catalogue
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{1,4}-[0-9]{1,4}-1
Example 1: BS Cnc (Q2889194) → 1395-2445-1
Example 2: Gliese 450 (Q5880899) → 2526-2357-1
Example 3: TYC 3645-2080-1 (Q75838267) → 3645-2080-1
Source: The Tycho-2 catalogue of the 2.5 million brightest stars (Q2725928)
Planned use: migrate all P528 values qualified with P972 Q2725928 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TYC%20$1

Gaia Data Release 1 ID

   Under discussion
Description: identifier for an astronomical object in Gaia Data Release 1
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{18}
Example 1: BS Cnc (Q2889194) → 661284019938140032
Example 2: Gliese 450 (Q5880899) → 4031586157514097024
Example 3: TYC 3645-2080-1 (Q75838267) → 1943381923012780160
Source: Gaia Data Release 1 (Q37859523)
Planned use: migrate all P528 values qualified with P972 Q37859523 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR1%20$1

SDSS object ID

   Under discussion
Description: identifier for an astronomical object in the Sloan Digital Sky Survey
Data type: External identifier
Domain: astronomical objects
Allowed values: J[0-9]{6}\.[0-9]{2}[+-][0-9]{7}\.[0-9]
Example 1: BS Cnc (Q2889194) → J083909.03+193532.4
Example 2: Gliese 450 (Q5880899) → J115106.57+351627.2
Example 3: TYC 3645-2080-1 (Q75838267) → J233509.93+485111.4
Source: Sloan Digital Sky Survey (Q840332)
Planned use: migrate all P528 values qualified with P972 Q840332 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=SDSS%20$1

OGLE-III object ID

   Under discussion
Description: identifier for an astronomical object in the Optical Gravitational Lensing Experiment
Data type: External identifier
Domain: astronomical objects
Example 1: R99 (Q22087000) → BRIGHT-LMC-MISC-429
Example 2: R85 (Q28406638) → BRIGHT-LMC-MISC-9
Example 3: SV* HV 2827 (Q74703824) → LMC-CEP-4689
Source: The Optical Gravitational Lensing Experiment. The OGLE-III catalog of variable stars. I. Classical Cepheids in the Large Magellanic Cloud (Q67054966)
Planned use: migrate all P528 values qualified with P972 Q67054966 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=OGLE%20$1
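
A quick way to sanity-check the "Allowed values" patterns in the proposals above is to run each listed example through a full-string match. The sketch below assumes the whole value must match the pattern (as a format constraint would require); the patterns and example values are copied from the proposals, and the helper name `non_matching_examples` is hypothetical.

```python
import re

# catalogue -> (allowed-values pattern, example values), copied from above.
PROPOSALS = {
    "Gaia DR2": (r"[0-9]{18}", [
        "661284024235415808", "4031586157514097024", "1943381923013901440"]),
    "2MASS": (r"J[0-9]{8}[+-][0-9]{7}", [
        "J08390909+1935327", "J11510737+3516188", "J23350993+4851114"]),
    "Tycho-2": (r"[0-9]{1,4}-[0-9]{1,4}-1", [
        "1395-2445-1", "2526-2357-1", "3645-2080-1"]),
    "Gaia DR1": (r"[0-9]{18}", [
        "661284019938140032", "4031586157514097024", "1943381923012780160"]),
    "SDSS": (r"J[0-9]{6}\.[0-9]{2}[+-][0-9]{7}\.[0-9]", [
        "J083909.03+193532.4", "J115106.57+351627.2", "J233509.93+485111.4"]),
}


def non_matching_examples(proposals):
    """Map each catalogue to the examples that fail a full-string match."""
    failures = {}
    for name, (pattern, examples) in proposals.items():
        bad = [value for value in examples if re.fullmatch(pattern, value) is None]
        if bad:
            failures[name] = bad
    return failures


for name, bad in non_matching_examples(PROPOSALS).items():
    print(name, "->", bad)
```

Run as written, this reports the 19-digit Gaia DR1 and DR2 values and all three SDSS examples as non-matching (each SDSS example has six digits, not seven, between the sign and the final decimal point), which suggests those patterns may need adjusting before they are used as constraints. The 2MASS and Tycho-2 examples all match.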

Motivation

The specific combination of catalog code (P528) qualified by catalog (P972) is used in 24 million statements, the vast majority of which are for astronomical objects. About 14 million of these statements come from six catalogues, so migrating those statements to use these properties would remove the 14 million triples taken up by the P972 qualifiers. (Another 18 catalogues have more statements than the number of statements for inventory number (P217) with qualifier collection (P195) The Palace Museum (Q2047427)—127545 as of 6 August 2024.)

(This migration would be similar to the migration that took place after the properties proposed at Wikidata:Property proposal/proper motion components were created. While this page intends to handle only the six largest catalogues, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment.) Mahir256 (talk) 21:56, 6 August 2024 (UTC)
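
The migration described in this motivation can be sketched on a simplified statement model. Everything structural here is an assumption made for illustration: statements are plain dicts, and "P_GAIA_DR2" and "P_2MASS" are placeholder names for properties that do not exist yet. Only the catalogue item IDs and example values come from the proposals above.

```python
# Placeholder mapping from catalogue item to its future dedicated property.
DEDICATED = {"Q51905050": "P_GAIA_DR2", "Q1454942": "P_2MASS"}


def migrate(statements):
    """Move P528 statements qualified with a known P972 catalogue.

    Returns the rewritten statements and the number of P972 qualifiers
    that are no longer needed.
    """
    migrated, dropped = [], 0
    for st in statements:
        catalog = st["qualifiers"].get("P972")
        if st["property"] == "P528" and catalog in DEDICATED:
            # Keep every qualifier except the catalogue one.
            rest = {k: v for k, v in st["qualifiers"].items() if k != "P972"}
            migrated.append({"property": DEDICATED[catalog],
                             "value": st["value"],
                             "qualifiers": rest})
            dropped += 1
        else:
            migrated.append(st)
    return migrated, dropped


statements = [
    {"property": "P528", "value": "661284024235415808",
     "qualifiers": {"P972": "Q51905050"}},
    {"property": "P528", "value": "J08390909+1935327",
     "qualifiers": {"P972": "Q1454942"}},
    # Catalogue without a dedicated property in this sketch: left untouched.
    {"property": "P528", "value": "J083909.03+193532.4",
     "qualifiers": {"P972": "Q840332"}},
]

new_statements, dropped = migrate(statements)
print(dropped, [st["property"] for st in new_statements])
```

Each migrated statement drops exactly one qualifier, which is where the estimate of roughly 14 million removed triples for the six catalogues comes from.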

Discussion

@Mahir256 Is there any specific reason why we want to reduce number of P528 statements? Ghuron (talk) 00:03, 7 August 2024 (UTC)[reply]
@Ghuron: We have dedicated external identifier properties rather than lumping them all in a single property and qualifying them, just as we have dedicated website account properties rather than always using website account on (P553) qualified with website username or ID (P554). This proposal is intended as a logical parallel of both of those decisions. Mahir256 (talk) 17:18, 12 August 2024 (UTC)[reply]
@Mahir256: Let me rephrase how I understood your rationale: if p:P528/pq:P972 wd:Q51905050 occurs more than a million times, then that is both a necessary and sufficient condition for creating a new property, since it reduces the number of triples and thus reduces the risk of Blazegraph crashing. Is that a correct summary? Ghuron (talk) 22:44, 12 August 2024 (UTC)
@Ghuron: I would not phrase it quite so absolutely, but I do want to see the number of triples reduced and believe this is a way to do it; an extremely high number of identically structured uses of a generic identification property like catalog code (P528) with the same qualifiers suggests that a more specialized identifier property is worth introducing to streamline things, just as has been done multiple times before. Mahir256 (talk) 16:50, 13 August 2024 (UTC)[reply]
As stated by Ghuron, is there any reason why we need to reduce the number of P528 statements? In the first place there are millions of Gaia IDs because of the import of the Simbad database (I am NOT against this import btw).
Also, I wonder why only some catalogues would have their own properties. This will create a weird in-between state, with some catalogues in P528 and others having their own properties. This makes no sense imo.
Romuald 2 (talk) 15:31, 8 August 2024 (UTC)
  • There is nothing wrong with having separate external id properties for most used identifiers with the correct "url formatter".
    But I have 2 major objections:
  1. I don't see any reason to use https://simbad.u-strasbg.fr/simbad/sim-id?Ident= as the URL. For items that are in SIMBAD, we already have Property:P3083 with the link to SIMBAD. For the rare items that are not in SIMBAD, this link will result in a 404.
  2. With (1) in mind, it would make sense to link to genuinely useful external databases that are only partially synchronized with SIMBAD (like HyperLEDA or Gaia Archive). That leads to questions about the proposed set of properties:
    1. Why did we choose Gaia DR2, whose IDs are only temporary, when the permanent ones are Gaia DR3?
    2. Why did we choose Tycho-2, which is pretty much 100% imported into SIMBAD?
Ghuron (talk) 12:52, 9 August 2024 (UTC)
  • @Romuald 2: Reducing the number of RDF triples that Wikidata consists of is generally a good thing, as there is a lot of discussion going on about the health of the Query Service and how reducing the number of triples that a single running Blazegraph instance holds is generally a good thing. Also I had noted that there were 18 other catalogs with more entries than the most frequent inventory number source; I only didn't add them to this page because it would have got too long. If these six go through, then I will promptly propose properties for those 18 (and as I stated in the motivation above, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment). Mahir256 (talk) 17:18, 12 August 2024 (UTC)[reply]
    @Ghuron: The reason I selected the SIMBAD formatter URL is that the external IDs I tried with that URL all seemed to resolve to the right objects; if there are in fact objects for which this resolution doesn't work, it would be great if you could name some. The caveat "(used more than 1 million times)" in the title of this property proposal page is important; because your imports did not yield more than 1 million Gaia DR3 identifiers, I did not think to propose a property for it here, though I'd gladly support one for Gaia DR3 if you think it would be useful. I don't know who "we" is as regards either Gaia DR2 or Tycho-2; you're the one who mass-imported the objects, so I'm working with the catalog codes I see on those objects. Mahir256 (talk) 17:18, 12 August 2024 (UTC)[reply]
    @Ghuron and @GZWDer, would you like to give your opinions? Regards, ZI Jony (Talk) 18:37, 16 September 2024 (UTC)[reply]
    I view external identifiers somewhat differently than @Mahir256. In my understanding, a new external identifier is needed when it provides a link to a new, previously unrelated external data source. In the proposed cases, we are getting a connection to the same SIMBAD that we are already connected to via Property:P3083. Personally, the proposed identifiers will not bring me any value (nor will they cause any harm).
    I understand the idea that this will reduce the number of triples, but I think that the measly few million that we are discussing here are a drop in the ocean. Our goal is to upload data to Wikidata, not to optimize it in a way that makes life easier for the foundation's engineers. Let them do their job and we will do ours. Ghuron (talk) 19:00, 16 September 2024 (UTC)
  • It seems this proposal has lost its momentum. I have created another proposal for one of the identifiers, in which I have tried to take into account all the comments above. Perhaps in this form it will be more acceptable to the community? Ghuron (talk) 15:23, 21 December 2024 (UTC)[reply]

Gaia ID

   Under discussion
Description: Object ID in the Gaia catalog (Data Release 3 unless otherwise stated)
Data type: External identifier
Domain: instances and subclasses of star (Q523)
Allowed values: [0-9]{18}
Example 1: BS Cnc (Q2889194) → 661284024235415808
Example 2: Gliese 450 (Q5880899) → 4031586157514097024
Example 3: TYC 3645-2080-1 (Q75838267) → 1943381927312402304
Planned use: migrate all catalog code (P528) values qualified with catalog (P972) Gaia Data Release 3 (Q66061041) to this property
Formatter URL: https://vizier.cds.unistra.fr/viz-bin/VizieR-S?Gaia%20DR3%20$1
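
How a formatter URL turns an identifier into a link can be shown in a few lines. This is a minimal sketch, not a description of how Wikidata itself renders links: it assumes the `$1` placeholder is replaced by the percent-encoded identifier, and the helper name `expand` is hypothetical. The two formatter URLs are the VizieR one from this proposal and the SIMBAD one from the 2MASS proposal.

```python
from urllib.parse import quote


def expand(formatter_url, identifier):
    """Replace the $1 placeholder with the percent-encoded identifier."""
    return formatter_url.replace("$1", quote(identifier, safe=""))


GAIA_FORMATTER = "https://vizier.cds.unistra.fr/viz-bin/VizieR-S?Gaia%20DR3%20$1"
TWOMASS_FORMATTER = "https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1"

print(expand(GAIA_FORMATTER, "661284024235415808"))
print(expand(TWOMASS_FORMATTER, "J08390909+1935327"))
```

Encoding matters for identifiers such as 2MASS designations: a literal "+" in a query string is commonly decoded as a space, so the sketch emits "%2B" instead.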

Motivation

This proposal is inspired by Wikidata:Property_proposal/SIMBAD_catalog_properties_(used_more_than_1_million_times) by User:Mahir256. In 2024 (and probably for a few years to come), Gaia DR3 ID is considered the most reliable star identifier and provides access to the most accurate astrometric information.

Discussion

Arlo Barnes Athulvis Buller1 cdo256 Cekli829 Harlock81 Jc3s5h JenlovesbigD J. N. Squire Jura1 Kepler-1229b LiMr Manlleus AyberkKZ Meodudlye, with only limited amount of time to spend in the foreseeable future. Mike Peel mu301 (mikeu) Paperoastro Path slopu Ptolusque Romuald 2 Sarilho1 Shisma Simon Villeneuve SM5POR Tom.Reding VIGNERON Wallacegromit1 - generally like to add ground and space observatory instrument data Ysogo Jck1337

  Notified participants of WikiProject Astronomy. Samoasambia 05:32, 27 December 2024 (UTC)

Biology

Please visit Wikidata:WikiProject Taxonomy for more information. To notify participants use {{Ping project|Taxonomy}}
Please visit Wikidata:WikiProject Biology for more information. To notify participants use {{Ping project|Biology}}

‎mode of reproduction

   Ready Create
Description: ways for living organisms to propagate or produce their offspring
Data type: Item
Domain: taxon (Q16521) or organisms known by a particular common name (Q55983715)
Allowed values: item
Example 1: mammal (Q7377) → sexual reproduction (Q182353)
Example 2: bacteria (Q10876) → cell division (Q188909)
Example 3: plant (Q756) → asexual reproduction (Q173432)
Example 4: plant (Q756) → sexual reproduction (Q182353)
Planned use: Would like to enable specifying mode(s) of reproduction for any organism or taxon via this property, preferably with references.
Expected completeness: always incomplete (Q21873886)

Motivation

Currently, for the hundreds of thousands of Wikidata records related to taxa or organisms, there is no easy way to specify the mode of reproduction. This proposed property is intended to fill a gap. --Zhenqinli (talk) 04:37, 30 August 2024 (UTC)[reply]

Discussion

  Notified participants of WikiProject Biology. –Samoasambia 09:33, 30 August 2024 (UTC)[reply]

Agreed that there is no need to specify this property for every species. For some, specification at the highest level of taxa would suffice. However, there is a great deal of diversity and variability in the biological world. Even just for vertebrates, the mode of reproduction could be: oviparity (Q212306), viviparity (Q120446), and ovoviviparity (Q192805). In short, this property would provide an option for clarification when more explicit explanations are needed. --Zhenqinli (talk) 13:28, 30 August 2024 (UTC)
Thanks for the feedback. Indeed, having has characteristic (P1552) with any subclass of mode of biological reproduction (Q130077803) is better than having no information regarding an organism's mode(s) of reproduction in Wikidata. Currently there are almost 300 taxon-related properties. Many of them could have been implemented in similar ways as suggested. In my personal opinion, though, having a roundabout way to state a key feature of an organism is not ideal. --Zhenqinli (talk) 21:46, 1 September 2024 (UTC)
P.S. The description of has characteristic (P1552) does mention: "Use a more specific property when possible". This property is currently used in more than 200,000 statements, without constraints on subject (organism or taxon) or value (mode of reproduction) as this proposal would prefer. These facts will likely discourage systematic input of useful data and eventual WDQS query of mode of reproduction information using this property in Wikidata. --Zhenqinli (talk) 02:25, 2 September 2024 (UTC)[reply]
  •   Support; Zhenqinli makes a strong case against using has characteristic (P1552). However, the proposal should be revised to reflect Andy's note – it's standard practice to apply statements only at the highest class (or taxon) at which they are universally true (and sometimes even higher, with qualification like nature of statement (P5102)=often (Q28962312)), a principle that Example 1 (at least) violates. [Edit: fixed 18:17, 12 September 2024 (UTC)] It doesn't seem like this property carries any special encouragement to violate that principle, but if it does, that could be addressed in a property usage note. Swpb (talk) 17:56, 9 September 2024 (UTC)[reply]
Agree that in the first example, Homo sapiens (Q15978631) should probably be replaced by mammal (Q7377). As parent taxon (P171) is a subproperty of subclass of (P279), statements describing organisms at higher taxon ranks do not need to be re-stated at lower ranks of the class, so there will be no redundancy issue. --Zhenqinli (talk) 18:49, 9 September 2024 (UTC)[reply]
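
The idea that a value stated at a higher taxon rank need not be re-stated at lower ranks can be sketched as a lookup that walks up the parent-taxon chain. The chain below is a toy (intermediate ranks are omitted, and the dictionaries stand in for parent taxon (P171) and the proposed property); it is an illustration of the principle, not of how any Wikidata tool resolves inheritance.

```python
# Toy parent-taxon chain; real chains have many intermediate ranks.
PARENT = {"Homo sapiens": "mammal", "mammal": "animal"}

# Values stated once, at the highest taxon for which they hold.
MODE_OF_REPRODUCTION = {"mammal": ["sexual reproduction"]}


def modes_of_reproduction(taxon):
    """Return the nearest stated value, walking up the parent-taxon chain."""
    seen = set()
    while taxon is not None and taxon not in seen:
        if taxon in MODE_OF_REPRODUCTION:
            return MODE_OF_REPRODUCTION[taxon]
        seen.add(taxon)  # guard against accidental cycles in the data
        taxon = PARENT.get(taxon)
    return []


print(modes_of_reproduction("Homo sapiens"))
print(modes_of_reproduction("animal"))
```

A consumer that resolves values this way finds the mammal-level statement for Homo sapiens without any redundant species-level statement, and finds nothing for a taxon above the level where the statement was made.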
  • I hope anyone who still has reservations about this proposal could help clarify whether there are remaining open issues or alternatives to be discussed further. While diel cycle (P9566) does have more than 284,000 statements for animals, I believe this proposed property for all living organisms should require far fewer statements, since mode of reproduction is typically better defined biologically and more commonly stated at higher taxon ranks than diel cycle (diel cycle could also be modified by domestication). --Zhenqinli (talk) 18:09, 12 September 2024 (UTC)
  •   Weak support Infoboxes on Wikipedia might want to include the mode of reproduction, and thus it's good to have it on its own property that's separate from has characteristic (P1552).
Currently, the problem is that the examples of the property are bad. It's not true that all plants have both sexual and asexual reproduction, and thus it would be bad to make the statement for plants. ChristianKl 12:35, 1 October 2024 (UTC)
Such a statement for plants could be qualified by nature of statement (P5102)=often (Q28962312), but I agree that an unqualified always-true statement would make a better example. Anything wrong with examples 1 and 2? Swpb (talk) 14:01, 1 October 2024 (UTC)[reply]
Thanks for supporting the proposal. I, too, would like to see better examples. But I also think more examples could be introduced, improved or updated later. I believe the mode of reproduction is well-documented scientifically and systematically. Once introduced to Wikidata, this property can have comparable or better data quality and utilization compared with similar taxon-related properties such as is pollinated by (P1703), seed dispersal (P3741), longest observed lifespan (P4214), and diel cycle (P9566). --Zhenqinli (talk) 07:00, 18 October 2024 (UTC)[reply]
This is a relatively complex field. Human (and mouse) parthenogenesis has been achieved, on an embryonic level. Gynogenesis is present in vertebrates, as is hybridogenesis. I imagine the viral reproduction we are familiar with is called lysogenesis, but I also imagine that there's more to viruses than they are letting on, and certainly there can be gene mixing (indeed there can be inter-species and even inter-kingdom gene mixing). So I suppose we would want a list with custom values allowed. Would we also allow the use of this property on things that reproduce but are not normally considered living? All the best: Rich Farmbrough 13:28, 19 November 2024 (UTC).
Thanks for the informative comments. Indeed, this is an important and broad concept that is currently missing among existing Wikidata properties. Personally, I hope to see a new simple property to serve as a common denominator applicable to all taxa and organisms. The complexity of reproduction in the biological world could still be captured with combinations of value items and qualifiers, on an as-needed basis. For example, the fact that sheep can be reproduced via cloning can be expressed in the following statement: sheep (Q7368) → cloning (Q120877), with qualifiers observed in (P6531)=cloned mammal (Q57813806) and model item (P5869)=Dolly the Sheep (Q171433). --Zhenqinli (talk) 00:20, 20 November 2024 (UTC)

‎nomenclatural type of

   Under discussion
Description: taxon item of which this item is the taxonomic type
Represents: type (Q3707858)
Data type: Item
Example 1: specimen of Salenidia gibba (Q123369633) → nomenclatural type of → Salenidia gibba (Q122966621), with qualifier subject has role (P2868)=historic holotype (Q122811034)
Example 2: FOS.294 (Q106511046) → nomenclatural type of → Ceratodus philippsii (Q106493544), with qualifier subject has role (P2868)=holotype (Q1061403)
Example 3: NHMD90101 (Q116051154) → nomenclatural type of → Q120685567, with qualifier subject has role (P2868)=syntype (Q719822)
Planned use: replacement of the values stored with of (P642) in the results of this query
See also: taxonomic type (P427)
Wikidata project: WikiProject Taxonomy (Q8503033)

Motivation

Proposal following this discussion. The new property will make it possible to store the values currently stored with of (P642), e.g. for all the results of this query. The new property can be used for type specimens or for taxa, because taxa can also be the types of other taxa (e.g. a species can be the type of a genus). The qualifier subject has role (P2868) will be used to indicate the kind of type it is. Putting that information in "instance of", as we sometimes currently see, is not a good idea at all (such items should be instances of "type specimen"): in biology the same specimen can very well be the syntype of one taxon and the holotype of another. The new property will be the inverse of taxonomic type (P427). Christian Ferrer (talk) 16:37, 15 December 2024 (UTC)
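
Why the kind of type belongs on the statement rather than on the specimen item can be seen in a small sketch. The first two triples follow the examples in the proposal; "specimen X", "taxon A" and "taxon B" are hypothetical, and the tuple model is only an illustration of statements with a subject has role (P2868) qualifier.

```python
from collections import defaultdict

# (type item, taxon, role) triples: one per "nomenclatural type of" statement.
STATEMENTS = [
    ("Q123369633", "Q122966621", "historic holotype"),
    ("Q106511046", "Q106493544", "holotype"),
    ("specimen X", "taxon A", "syntype"),   # hypothetical
    ("specimen X", "taxon B", "holotype"),  # hypothetical
]


def roles_by_type_item(statements):
    """Group roles per type item; the role is a property of each statement."""
    grouped = defaultdict(dict)
    for type_item, taxon, role in statements:
        grouped[type_item][taxon] = role
    return dict(grouped)


def taxon_to_types(statements):
    """Derive the opposite direction (taxon -> its types), as in P427."""
    inverse = defaultdict(list)
    for type_item, taxon, _role in statements:
        inverse[taxon].append(type_item)
    return dict(inverse)


print(roles_by_type_item(STATEMENTS)["specimen X"])
print(taxon_to_types(STATEMENTS)["taxon B"])
```

A single "instance of holotype" claim on "specimen X" could not express both roles at once, whereas one qualified statement per taxon can.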

Discussion

Biochemistry/molecular biology

Please visit Wikidata:WikiProject Molecular biology for more information. To notify participants use {{Ping project|Molecular biology}}

Chemistry

Please visit Wikidata:WikiProject Chemistry for more information. To notify participants use {{Ping project|Chemistry}}

‎molecular formula

  Notified participants of WikiProject Chemistry

Motivation

This proposal addresses the need for improved data structure and maintenance within Wikidata’s chemical compound data. Currently, the Wikidata:WikiProject Chemistry manages approximately 1 million chemical items, with many of them linked to chemical formula (P274) and mass (P2067). The main issues are:

Redundancy in Data: With about 300,000 unique chemical formula strings in use, redundancy is a significant problem. Some strings are associated with over 1,000 items, which complicates data management (see https://w.wiki/B2ax).

Efficiency and Maintenance: Transitioning from string-based formulas to item-based ones will simplify maintenance, reduce redundancy, and optimize query performance, especially for SPARQL queries involving formulas or masses.

Data Optimization: Moving mass (P2067) statements to the newly created formula items will reduce the number of triples and make data management more efficient. Additionally, this change will facilitate the use of different units for masses and allow for better structured data.

Improved Modeling: Switching to item-based formulas could eliminate the need for overly complex has part(s) (P527) statements on chemicals, allowing cleaner, more precise data models (e.g., identifying all chemical formulas containing more than five oxygen atoms).

This change is expected to bring numerous benefits, including reduced redundancy, improved query efficiency, and better data maintenance. The potential downside of increased label editing can be managed, and the overall gain for Wikidata’s chemical data justifies this proposal. If approved, I am prepared to create the necessary items and migrate existing data.
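
The redundancy and query arguments above can be illustrated with a toy data set. The compound names are hypothetical and the formulas are plain ASCII strings taken from this page; grouping compounds under one formula key stands in for linking them to one formula item, and the atom-count filter mirrors the "more than five oxygen atoms" example.

```python
import re
from collections import defaultdict


def atom_counts(formula):
    """Parse a flat formula such as 'C15H20O4' into {'C': 15, 'H': 20, 'O': 4}."""
    counts = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


# Hypothetical compound items and their formula strings.
COMPOUNDS = {
    "compound A": "C15H20O4",
    "compound B": "C15H20O4",
    "compound C": "C30H40F2N8O9",
}


def group_by_formula(compounds):
    """One entry per distinct formula, listing the compounds that share it."""
    groups = defaultdict(list)
    for item, formula in compounds.items():
        groups[formula].append(item)
    return dict(groups)


def formulas_with_more_than(groups, element, threshold):
    """Formulas whose count of the given element exceeds the threshold."""
    return [f for f in groups if atom_counts(f).get(element, 0) > threshold]


groups = group_by_formula(COMPOUNDS)
print(groups)
print(formulas_with_more_than(groups, "O", 5))
```

With formula items, the atom counts would be stored once per formula rather than once per compound, which is the saving the proposal is after.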

Any further input to refine this proposal is more than welcome!

P.S.: I have no strong opinions if current chemical formula (P274) should be deleted or used on the new items as "Chemical Formula String"  – The preceding unsigned comment was added by AdrianoRutz (talk • contribs) at 15:00, August 28, 2024‎ (UTC).

discussion

  •   Support sounds great! Egon Willighagen (talk) 15:25, 28 August 2024 (UTC)[reply]
      Comment Last night on the boat between Finland and Sweden I thought of another aspect where this would help model the chemistry in Wikidata better. If chemical formulas are items (and thanks to GZWDer for showing that various Wikipedias decided it was useful too), then they can also subclass each other. We can have an isotope-agnostic chemical formula (the common case) and subclasses for chemical formulas with isotopes. As such it does much more than being something technical (e.g. just about scalability) and actually improves how we talk about the chemistry. Egon Willighagen (talk) 07:07, 29 August 2024 (UTC)
  • Some comments:
  1. I will oppose "Additionally, this change will facilitate the use of different units for masses and allow for better structured data." - For consistency and machine-readability we should stick to one unit. I instead propose Wikidata:Property proposal/formula weight.
  2. Many wikis have pages like C15H20O4 (Q1250089). Some wikis treat them as disambiguation pages, some as set indices; we need to discuss how to handle such existing items. GZWDer (talk) 21:10, 28 August 2024 (UTC)
  • I looked at the English Wikipedia sitelinked page, and that actually looks exactly like a page about a chemical formula. To be honest, this actually sounds like an argument in favor of this proposal, and suggests that C15H20O4 (Q1250089) should be of type chemical formula (Q83147). The same goes for the French WP page; neither says it is a disambiguation page, and both are far more like a category of things with the same property. Just like this proposal, no? Egon Willighagen (talk) 06:58, 29 August 2024 (UTC)
I was only partially able to follow your reasoning here. In your proposal you mention this property "if created", so would you support it? I believe the discussion about mass (P2067) (and units) or other properties is an interesting one that this proposal would allow us to better discuss and implement. What I mentioned about these, and what is currently on the example item, are just ideas; if this new property allows these things to improve as well, even better! AdrianoRutz (talk) 08:51, 30 August 2024 (UTC)
  •   Weak oppose I cannot question arguments raised here about efficiency, but I don't see this as a proper way forward. This proposal completely fails to take into account the fact that for a given chemical entity there may be many – equally correct – chemical formulae (simple example in Q27260276#P274). Moving chemical formulae to another item will not help at all with the most important purpose for which WD exists – using this data. I would see the new property as being created only to assist with specific activities – but not to replace existing properties – and with appropriate disclaimers in the name and constraints that it is a strictly technical property only. Wostr (talk) 22:21, 28 August 2024 (UTC)[reply]
    I think this proposal has no problems with alternative formula notations, e.g. like CHAgO₃ (Q130044611). Or? Egon Willighagen (talk) 06:51, 29 August 2024 (UTC)[reply]
    CHAgO₃ and AgHCO₃ are not the same chemical formula. Just as e.g. XeF4O and XeOF4 which would require two different items for the same compound. In fact, for some compounds several new items would need to be created. For some chemical species we would have formulae that have different number of atoms of elements: C30H40F2N8O9, C15H17FN4O3·1,5H2O and C30H34F2N8O6·3H2O are correct formulae for the same compound, but I don't see a way for this to be reflected correctly by the current proposal. Everything looks fine if you consider only simple organic compounds and their formulae in Hill notation, but it's not that simple especially if we consider some inorganic compounds which are not molecules. Wostr (talk) 12:34, 29 August 2024 (UTC)[reply]
    Thank you for this important point! I removed the single value constraint, thus allowing for what you mention. AdrianoRutz (talk) 08:47, 30 August 2024 (UTC)[reply]
    Good point about non-molecular substances. I think the chemical concept we are trying to capture is that of isomerism: chemical entities are isomers when they have the same molecular formula (Q188009) or (non-structural) formula unit (Q1437643), enabling one molecule/ion/unit of the first chemical entity to be rearranged into one molecule/ion/unit of the second chemical entity by moving atoms/bonds around.
    • For example, the ionic compounds with structural formulas [CrCl(H₂O)₅]Cl₂•H₂O and [Cr(H₂O)₆]Cl₃ are (hydration) isomers, which we can recognise by assigning them the same formula H₁₂Cl₃CrO₆. This shows that all species in the crystal lattice of a compound should be combined together into a single entity when determining the formula. In the example you give above, the correct formula would be C₃₀H₄₀F₂N₈O₉, derived from combining together 2C₁₅H₁₇FN₄O₃•3H₂O, the smallest formula unit with integer multiples of all species.
    • Likewise, the molecular substance CO(NH₂)₂ and ionic compound NH₄OCN are considered isomers, which we can recognise by assigning them the same formula CH₄N₂O. This is the molecular formula of urea and the formula unit of ammonium cyanate, showing how molecular and non-molecular substances can be isomeric.
    • For ions, fulminate(1−) (Q27110286) (with structural formula CNO-) and cyanate anion (Q55503523) (with structural formula OCN-) are isomers, which we can recognise by assigning them the same formula CNO-.
    • Clathrates are similar to coordination compounds. E.g. methane clathrate (Q389036) has structural formula 4CH₄•23H₂O, yielding the formula C₄H₆₂O₂₃. Likewise, the endohedral fullerene CH₄@C₆₀ should have formula C₆₁H₄.
    • Compounds should not usually map to multiple formulas: if C links to two different formulas, one the same as A (from reference 1) and one the same as B (from reference 2), this implies C is isomeric with A, and C is isomeric with B, but A is not isomeric with B. This only makes sense if 1 and 2 disagree as to what the correct formula of C ought to be.
    • When references disagree, we may need to support multiple formulas. Historically, w:en:copper monosulfide was thought to have structure [Cu²⁺][S²⁻], corresponding to the formula CuS. It has now been assigned the structure [Cu⁺]₃[S²⁻][S₂⁻], which would correspond to Cu₃S₃. However, PubChem still has the old formula. We might want to update Wikidata to the new formula while also keeping the PubChem-referenced formula (with a note that it's not the correct formula).
    • Non-stoichiometric compounds, alloys, and mixtures of indeterminate composition are more complicated to support. E.g. pyrrhotite (Q421944) has formula Fe₁₋ₓS (x = 0 to 0.125). Rather than trying to support formula units with atom counts that are algebraic expressions (e.g. 1 - x), I think it would be easier if we could list the formulas of the endpoints: Fe₇S₈ and FeS. Similarly, superconducting yttrium barium copper oxide (Q414015) has formula YBa₂Cu₃O₇₋ₓ (x = 0 to 0.65), with endpoint formulas YBa₂Cu₃O₆.₃₅ (i.e. Y₂₀Ba₄₀Cu₆₀O₁₂₇) and YBa₂Cu₃O₇. I think it's hard to come up with a perfect solution though. InChI (P234) has similar issues for non-stoichiometric compounds: https://doi.org/10.1186/s13321-015-0068-4#Sec45.
    Preimage (talk) 17:47, 31 August 2024 (UTC)
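The combination rule described in the bullets above (merge every species of a formula unit into one set of atom counts, then write it in Hill order) can be sketched as follows. This is only an illustration: the function names `combine_units` and `hill_formula` are hypothetical, and the input is assumed to be already split into (multiplier, atom counts) pairs rather than parsed from a formula string.

```python
from collections import Counter

def combine_units(units):
    """Sum atom counts over (multiplier, counts) pairs of one formula unit."""
    total = Counter()
    for multiplier, counts in units:
        for element, n in counts.items():
            total[element] += multiplier * n
    return total

def hill_formula(counts):
    """Hill order: C first, then H, then the rest alphabetically.
    Without carbon, every element (H included) is sorted alphabetically."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)

# 2 C15H17FN4O3 + 3 H2O, the hydrate discussed above
hydrate = combine_units([
    (2, {"C": 15, "H": 17, "F": 1, "N": 4, "O": 3}),
    (3, {"H": 2, "O": 1}),
])
# 4 CH4 + 23 H2O, methane clathrate
clathrate = combine_units([(4, {"C": 1, "H": 4}), (23, {"H": 2, "O": 1})])
# CH4 inside C60
fullerene = combine_units([(1, {"C": 1, "H": 4}), (1, {"C": 60})])

print(hill_formula(hydrate), hill_formula(clathrate), hill_formula(fullerene))
```

Non-stoichiometric compounds are out of scope for this sketch, since their atom counts are not integers.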
  •   Support I also see more benefits than downsides. Wostr, I am not sure I understand how this would be a problem, even for entities which could be described using different MF sequences of atoms, like Q27260276#P274. Indeed, the has part(s) (P527) and quantity (P1114) statements of the MF entity (see C₁₅H₂₀O₄ (Q129998552)) would allow efficient retrieval of such compounds represented in different MF notation systems. What exactly would be the drawback in this particular case? GrndStt (talk) 06:22, 29 August 2024 (UTC)
  •   Support, conditional on change of representation to molecular formula (Q188009). As noted in w:en:chemical formula#Types, chemical formula (Q83147) has four separate meanings: empirical formula (e.g. formaldehyde and glucose both have empirical formula CH₂O), molecular formula (e.g. urea and ammonium cyanate both have molecular formula CH₄N₂O in Hill notation, indicating they are isomers), structural formula (a graphical representation of the structure, not so relevant here), and condensed (or semi-structural) formula (e.g. urea has condensed formula CO(NH₂)₂ whereas ammonium cyanate has condensed formula [NH₄][OCN]). Molecular formulas "indicate the simple numbers of each type of atom in a molecule, with no information on structure", which is what we need for mass calculations. They also avoid the issue raised by Wostr regarding non-uniqueness of chemical formulas (e.g. NH₄NO₃ and H₄N₂O₃ are both valid formulas for ammonium nitrate), as each chemical should have a single canonical molecular formula in Hill notation (with the exception of rare cases where there is disagreement regarding structure, e.g. w:en:copper monosulfide). One last potential issue: molecular formulas are often defined as not including isotopes, e.g. PubChem lists both deuterated chloroform and chloroform as having molecular formula CHCl₃. Egon Willighagen's suggestion to have a subclass of [molecular] formulas with isotopic information would resolve this issue though, I think. Preimage (talk) 12:22, 29 August 2024 (UTC)
    Just revised the naming to change to molecular formula (Q188009), as suggested. 👍🏼 AdrianoRutz (talk) 07:16, 24 September 2024 (UTC)
  •   Oppose A chemical formula is an abstract entity and not one that has a mass.
It's worth noting that Unicode can't capture every chemical formula, and a mathematical expression could express more. ChristianKl 16:29, 25 September 2024 (UTC)
You're wrong about that. Each chemical formula has a defined number of atoms of a defined number of elements. Although each element has multiple isotopes, every element with stable isotopes has a standard atomic weight, which is the mass found in a typical sample. So the molecular weight of a particular chemical formula very much can be expressed. David Newton (talk) 09:58, 27 September 2024 (UTC)
Currently, in Wikidata a chemical formula is a notation. Notations don't have inherent mass. The NCI description of a chemical formula is "representation of a substance using symbols for its constituent elements". It's not the object that it's describing. While the object that a formula describes can have mass, the formula itself doesn't. It's a Document in NCI's ontology. In PROCO it's a quality and also not something that has mass. Instances of material entity (Q53617407) have mass, and molecular formula (Q188009) isn't one. ChristianKl 12:47, 9 October 2024 (UTC)
The proposed items for formulas could make sense if we interpret the items as representing classes of those chemical entities that consist of the specified number of each element, regardless of bonding. Those underlying chemical entities do have a particular mass (up to some tiny difference due to mass-energy equivalence). 73.223.72.200 05:23, 16 December 2024 (UTC)
Indeed. The form of "mass" we are trying to capture is w:en:Mass (mass spectrometry)#Average mass, within which formulas do have inherent mass and isomers have exactly identical masses. Preimage (talk) 14:37, 16 December 2024 (UTC)
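To illustrate the average-mass argument: under that definition the mass depends only on the atom counts, so any two isomers get the same value. The sketch below assumes abridged standard atomic weights for four elements only; the table `ATOMIC_WEIGHT` and the function `average_mass` are hypothetical names, not part of the proposal.

```python
# Abridged standard atomic weights (assumption: rounded to three decimals).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def average_mass(counts):
    """Average mass from atom counts alone; bonding plays no role."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())

# Urea, CO(NH2)2, and ammonium cyanate, NH4OCN, share the counts of CH4N2O.
urea = {"C": 1, "H": 4, "N": 2, "O": 1}
ammonium_cyanate = {"N": 2, "H": 4, "O": 1, "C": 1}

print(round(average_mass(urea), 3))
print(average_mass(urea) == average_mass(ammonium_cyanate))
```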
  •   Comment - This proposal strikes me as a hack to work around Wikibase's lack of support for computing properties. It seems more straightforward to update the software to compute the mass. As proposed, in the fairly common case that there's currently only one notable compound with a given formula, would we be creating an additional item just to hold properties like mass? That seems counterproductive wrt efficiency and redundancy. 73.223.72.200 05:23, 16 December 2024 (UTC)
    Understood. But we have been waiting over eight years for Wikibase to support computed properties.
    In the absence of computed properties, we should use a formula representation that makes downstream processing easier. E.g. suppose I want to set up property constraints to identify chemicals with molecular masses inconsistent with their molecular formulas. At present, chemical formula (P274) uses a string representation, which makes such processing more difficult than it ought to be. Switching to a molecular formula representation that links to elements using the has part(s) (P527) and quantity (P1114) properties would solve this problem.
    Another example: organic compounds with F mol% (excluding H) >= 30% are considered to be per- and polyfluoroalkyl substances (Q648037) under the EPA's PFASSTRUCTv5 definition. At present, to test this for a given organic compound, we need to (1) split the chemical formula (P274) string up into element-specific chunks, (2) parse each chunk, (3) combine chunks matching the same element, (4) compute the number of atoms excluding H, (5) compute the number of F atoms, and (6) compute the F mol% (excluding H). The proposal we are discussing here would allow us to skip steps 1–3. Preimage (talk) 13:00, 17 December 2024 (UTC)
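The six steps listed above can be sketched as follows. This is a minimal illustration, assuming a flat formula string with optional Unicode subscript digits and no parentheses, hydrates or charges; the names `parse_formula` and `fluorine_mol_percent_excluding_h` are hypothetical. With atom counts available directly from has part(s) (P527) and quantity (P1114) statements, only the second function would be needed.

```python
import re
from collections import Counter

SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(formula):
    """Steps 1-3: split the string, parse each chunk, merge per element."""
    counts = Counter()
    for symbol, number in TOKEN.findall(formula.translate(SUBSCRIPTS)):
        counts[symbol] += int(number) if number else 1
    return counts

def fluorine_mol_percent_excluding_h(counts):
    """Steps 4-6: F atoms as a percentage of all non-hydrogen atoms."""
    heavy_atoms = sum(n for element, n in counts.items() if element != "H")
    if heavy_atoms == 0:
        return 0.0
    return 100.0 * counts.get("F", 0) / heavy_atoms

# Illustrative input: perfluorooctanoic acid, C8HF15O2.
counts = parse_formula("C₈HF₁₅O₂")
percent = fluorine_mol_percent_excluding_h(counts)
print(percent, percent >= 30)
```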

Medicine

Please visit Wikidata:WikiProject Medicine for more information. To notify participants use {{Ping project|Medicine}}

Mineralogy

Please visit Wikidata:WikiProject Mineralogy for more information. To notify participants use {{Ping project|Mineralogy}}

Computer science

Please visit Wikidata:WikiProject Informatics for more information. To notify participants use {{Ping project|Informatics}}

Geology

Please visit Wikidata:WikiProject Geology for more information.

Geography

‎SOIUSA code

   Under discussion
DescriptionIdentifier of mountains, summits, mountain groups, etc. according to the International Standardized Mountain Subdivision of the Alps (SOIUSA)
RepresentsSOIUSA code (Q1628678)
Data typeExternal identifier
Template parametercodice in it:template:Montagna
Domainmountain, summits, mountain groups: summit (Q207326), mountain (Q8502), ridge (Q740445), ridge section (Q131521567), arête (Q1334383), back of a mountain (Q820144), mountain shoulder (Q15787792), ranks of the SOIUSA taxonomy: alpine main part (Q131311255), alpine major sector (Q3775635), alpine section (Q3958626), sector of alpine section (Q3958438), alpine subsection (Q3965305), sector of alpine subsection (Q3958440), alpine supergroup (Q3977906), sector of alpine supergroup (Q3958437), alpine group (Q3777462), sector of alpine group (Q131604769), alpine subgroup (Q514999), sector of alpine subgroup (Q3958436)
Allowed values
(I|II)(/[A-C](-([1-9]|[12][0-9]|3[0-6])(/[A-B])?(\.(I|II|III|IV|V|VI|VII|VIII)(/[A-B])?(-[A-F](/[a-z])?(\.([1-9]|1[0-9]|2[0-2])(/[a-z])?(\.[a-z](/[a-z])?)?)?)?)?)?)?
Example 1Punta Sommeiller (Q2279001) → I/A-4.III-B.6.b
Example 2Hochgrat (Q459121) → II/B-22.II-B.5.b
Example 3Winterstaude (Q2585140) → II/B-22.I-B.6.b
Example 4Übelhorn (Q130718862) → II/B-22.II-D.12.a/b
Sourcehttps://it.wikipedia.org/wiki/Suddivisione_Orografica_Internazionale_Unificata_del_Sistema_Alpino
Planned useIdentification of where different geographic features of the Alps (mountains, summits, groups on different levels, sections, sectors, parts) belong from an orographic perspective
Number of IDs in source3498 (2 main parts + 5 major sectors + 36 sections + 132 subsections + 333 supergroups + 870 groups + 1625 subgroups + 31 sectors of sections + 30 sectors of subsections + 18 sectors of supergroups + 7 sectors of groups + 409 sectors of subgroups)
Expected completenesseventually complete (Q21873974)
Single-value constraintyes
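
A sketch of how the Allowed values pattern could be checked against the four examples. It assumes the Roman numerals are meant as alternatives and the numeric levels as the ranges 1 to 36 and 1 to 22, and it requires the whole string to match; the names `SOIUSA_CODE` and `is_valid_soiusa` are hypothetical.

```python
import re

ROMAN = r"(?:I|II|III|IV|V|VI|VII|VIII)"

# Each level is optional, but only if all coarser levels are present.
SOIUSA_CODE = re.compile(
    r"(?:I|II)"                                    # main part
    r"(?:/[A-C]"                                   # major sector
    r"(?:-(?:[1-9]|[12][0-9]|3[0-6])(?:/[A-B])?"   # section 1-36, optional sector
    r"(?:\." + ROMAN + r"(?:/[A-B])?"              # subsection, optional sector
    r"(?:-[A-F](?:/[a-z])?"                        # supergroup, optional sector
    r"(?:\.(?:[1-9]|1[0-9]|2[0-2])(?:/[a-z])?"     # group 1-22, optional sector
    r"(?:\.[a-z](?:/[a-z])?)?"                     # subgroup, optional sector
    r")?)?)?)?)?"
)

def is_valid_soiusa(code):
    """True if the whole string matches the assumed SOIUSA code pattern."""
    return SOIUSA_CODE.fullmatch(code) is not None

examples = [
    "I/A-4.III-B.6.b",
    "II/B-22.II-B.5.b",
    "II/B-22.I-B.6.b",
    "II/B-22.II-D.12.a/b",
]
for code in examples:
    print(code, is_valid_soiusa(code))
```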

Motivation

The IDs are already used to a certain extent by different Wikimedia projects in articles on mountain ranges and mountains of the Alps. They are not only useful to locate a summit or mountain, but also to more easily identify duplicates as well as to distinguish different summits/mountains that have the same name.

-- Harald Hetzner

Discussion

Linguistics

Please visit Wikidata:WikiProject Linguistics for more information. To notify participants use {{Ping project|Linguistics}}

Mathematics

Please visit Wikidata:WikiProject Mathematics for more information. To notify participants use {{Ping project|Mathematics}}

Material

Please visit Wikidata:WikiProject Materials for more information. To notify participants use {{Ping project|Materials}}

Meteorology

Glaciology

Nutrition
