Support for Chemical Markup Language
Open, Low, Public, Feature

Assigned To
None
Authored By
bzimport
Nov 29 2008, 1:34 AM
Referenced Files
F56203759: image.png
Jul 3 2024, 5:06 PM
F56203756: image.png
Jul 3 2024, 5:06 PM
F56132905: image.png
Jul 1 2024, 5:02 PM
F56132888: image.png
Jul 1 2024, 5:02 PM

Description

Author: Eugene.Zelenko

Description:
Please allow uploading of files in Chemical Markup Language format (http://cml.sourceforge.net).

There is also a free, Java-based viewer for CML, Jmol (http://jmol.sourceforge.net), and a MediaWiki extension for it is already implemented (http://wiki.jmol.org/index.php/MediaWiki). The extension was implemented for 1.12, and some security concerns exist.


Version: unspecified
Severity: enhancement

Details

Reference
bz16491

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:24 PM
bzimport set Reference to bz16491.
bzimport added a subscriber: Unknown Object (MLST).

At a quick glance, it looks like the extensions may be exposing a bit too much of the nuts and bolts. IMHO, the ideal syntax for Wikimedia use would be similar to the <math> or <timeline> tags: just "<cml>CML MARKUP GOES HERE</cml>". There's no need to expose details like whether to use a signed or unsigned applet: such things are for the server admins to configure.

A CML/MDL media handler might be more convenient than a parser tag extension, anyway. Then you'd just upload the file and use it as if it were an image. From what I've seen, CML markup doesn't really look like something most people would want to edit by hand, at least no more so than SVG is.

According to the docs, JavaScript embedding is a feature.
I think it should be rewritten as a media handler.

I'd tend to agree on the media handler thing. The extension syntax seems much more verbose and complicated than it needs to be, and seems to be bloated up with random features like extra UI scripting.

That said I'd generally recommend against too many application-specific formats like this which will make it more difficult for third-party users of Wikipedia material to support stuff.

Please also consider issues with printing and non-Java fallback displays.

The docs indicate it should be possible to run Jmol on the server side for static thumbnail generation. Alternatively, OpenRasMol might also be usable as a server side rasterizer.

In general, I think this should be treated a lot like the Cortado media player: users with Java get a nice little inline viewer, while the rest get a thumbnail and the ability to view the files using a browser plugin (like Chime) or download them for local viewing.

All that said, it does occur to me that there's one important difference between video clips and chemical structure data. Video files are "WYSIWYG"; they really only have one "normal" rendering, which the file format unambiguously specifies. Chemical structure data formats (like CML, MDL, PDB, etc.), on the other hand, generally just specify an abstract set of spatial data that can be rendered in a number of different ways. For RasMol-style renderers (which includes Jmol), this extra presentational information is effectively supplied via a scripting language that controls the viewer.

One possibility, if we want to go the media handler way, might be to add an extra "script=" parameter to the image link syntax, supporting a limited subset of the RasMol scripting language. Of course, we'd presumably have to parse the code and validate it, especially if we were to use it for server side rasterization. Even so, I'd prefer to use something based on RasMol script syntax rather than inventing our own, if only because people are likely to be familiar with it, and because there are probably tools out there that generate it. It's also pretty easy to read and edit, at least as long as you don't get too clever with it (which is about what one could say of wiki markup, too).
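To make the "parse the code and validate it" step concrete, here is a minimal allowlist sketch in Python. The command names are ordinary RasMol commands, but the particular subset, the argument patterns, the 20-command limit and the `validate_script` name are all assumptions made for illustration; nothing here is an agreed design or part of any existing extension.

```python
import re

# Hypothetical allowlist: command name -> pattern its arguments must match.
# The subset and the argument limits are illustrative assumptions only.
ALLOWED_COMMANDS = {
    "wireframe": re.compile(r"(on|off|\d{1,3})?"),
    "spacefill": re.compile(r"(on|off|\d{1,3})?"),
    "cartoon": re.compile(r"(on|off)?"),
    "color": re.compile(r"[a-z]{1,20}"),
    "rotate": re.compile(r"[xyz] -?\d{1,3}"),
    "zoom": re.compile(r"\d{1,3}"),
}

MAX_COMMANDS = 20  # arbitrary cap so a script= value cannot grow unbounded


def validate_script(script: str) -> list[str]:
    """Return normalized commands; raise ValueError on anything not allowlisted."""
    commands = []
    # Assumption: statements are separated by ';' or by newlines.
    for raw in re.split(r"[;\n]", script):
        parts = raw.strip().lower().split()
        if not parts:
            continue  # skip empty statements, e.g. after a trailing ';'
        name, args = parts[0], " ".join(parts[1:])
        pattern = ALLOWED_COMMANDS.get(name)
        if pattern is None or not pattern.fullmatch(args):
            raise ValueError(f"disallowed command: {raw.strip()!r}")
        commands.append(f"{name} {args}".strip())
    if len(commands) > MAX_COMMANDS:
        raise ValueError("script has too many commands")
    return commands
```

With this sketch, `validate_script("spacefill on; rotate x 30")` returns the two normalized commands, while anything outside the table (for example a command that loads another file) raises `ValueError` before it could reach a server side rasterizer.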

That said, both thumbnailing and scripting are really second-step features. The first step would be to get the media handler working at all.

Eugene.Zelenko wrote:

Maybe CML support will complicate life for MediaWiki developers, but on the other hand it will definitely simplify life for those who maintain chemistry-related topics on Commons. Separate files for different representations of the same molecule, or for views from different angles, could be avoided (see http://commons.wikimedia.org/wiki/Category:Ethanol or http://commons.wikimedia.org/wiki/Category:Benzene for examples).

Of course, chemistry is a relatively small area compared with Harry Potter, Pokemon, etc. :-) but it is definitely important enough to justify support for a special file format.

Gerard.meijssen wrote:

(In reply to comment #2)

Anyway, it's definitely not ready for prime time, security-wise:
http://wiki.jmol.org/index.php/User:Ilmari_Karonen/JS_injection_demo

This problem has been fixed. Thanks, GerardM

nicove wrote:

I have started working again on the Jmol extension.

I agree that the media handler would be really nice, and a lot easier to use than the existing jmol tag.
Creating a media handler requires quite some work, as well as an understanding of how several parts of the MediaWiki code work internally. I decided to start with an easier solution, and to work on the media handler later.

I have worked on a tag that is much easier to use than the existing jmol tag. For example, you can now use <jmolFile>Chair.cml</jmolFile> to get a popup window containing a Jmol applet that displays the molecular file uploaded as File:Chair.cml.
A working example can be seen at http://wiki.jmol.org/index.php/MediaWiki/Basic_Example

What do you think of this?

I still want to add a few things to this new tag: the ability to use a preview image instead of a link (either generated on the server, if I manage to do this, or using a static image), maybe allowing a Jmol script to set up the display options, ...

Should we think of this feature as a VisualEditor plugin?

Eugene.Zelenko wrote:

If this will help to finally add visualization, you could do this :-)

But really this is data visualization, not editing. Maybe it would be better to consider CML support as part of Media Viewer development?

CCing Fabrice to weigh in ref Media Viewer.

The Facebook Open Academy Program [1] is interested in this project, but we would need a technical mentor.

[1] http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073226.html

(In reply to comment #12)
I could help with reviewing JavaScript (if you permit me), and I also have some understanding of the requirements (chemical markup, Wikipedia, Commons), but I lack PHP skills as well as general practice in MW development. Let me know if I can help!

Thank you for the offer, Rainer.

We still need to confirm the best approach to provide this feature. Currently this report is still sitting in MediaWiki/Unknown. CCing Multimedia devs to get more feedback.

vladjohn2013 wrote:

Hi, this project is still listed at https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Accessibility_for_the_colour-blind

Should this project still be listed on that page? If not, please remove it. If it still makes sense, then it could be moved to the "Featured projects" section if it has community support and mentors.

[[:mw:User:Rillke/Chemical Markup support for Wikimedia Commons]]

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

I guess security review (T66548) isn't carried out by a GSoC student?

A recent comment on mw.org brought up the request to support the .pdb file format for rendering chemical files. Mentioning it here for consideration.

https://www.mediawiki.org/wiki/Topic:Tpo0eyh0y9lnmd1c

If I say something wrong, correct me.
I do not know where the project currently stands; I see that there have not been many comments since 2015. I was wondering whether you could take inspiration from the already existing extension [[ https://www.mediawiki.org/wiki/Extension:Jmol | Extension:Jmol]] in order to allow the support for Chemical Markup Language to proceed?

Developers could just integrate the extension into Wikimedia wikis, but this needs to be discussed with WMF first I think?

Yes, sorry, my suggestion was not made out of malice or to copy the project in a petty way. I tried to contact them and I am attaching the discussion here, hoping you will be able to work together to bring the project forward and make it available.

Talk: https://www.mediawiki.org/wiki/Extension_talk:Jmol

The fact is that, as far as I understand, an extension must pass certain checks anyway, and maybe if people more expert than me in these things got in touch, they could make the extension available or collaborate with this project to get something more.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:01 AM

Actually, I'm closing this.

Hi @Kizule why are you closing this? It's an unencumbered format.

While MolHandler may not be a current way to render CML, we should support a future viewer for it, and allow uploading CML files in the meantime.

Yup, we need CML support in Wikipedia, something I hope to pick up after Thumbor gets upgraded.

Sorry, I was working on archiving the MolHandler and StlHandler extensions, as per the checklist available in each of the tasks (T299158: Archive the MolHandler extension and T299161: Archive the StlHandler extension).

And I got confused, but thanks for checking this and reopening.

Ok, thanks -- and thanks for the essential archiving work

We have 1,362,174 uses of InChI according to Wikidata, https://www.wikidata.org/wiki/Special:WhatLinksHere/Property:P234

image.png (1×1 px, 298 KB)

So I was thinking: what if we alternatively go for InChI, in a manner similar to <math> and Mathoid? It doesn't support an interactive 3D view, at least right now, but it can be a starting point until we can do something nicer with it over time. InChI seems to be backed by IUPAC https://iupac.org/who-we-are/divisions/division-details/inchi/ and other governmental entities https://iupac-inchi.github.io/InChI-Web-Demo/, and its textual representation is preferable to XML, which is subject to known headaches such as the XML bomb https://en.wikipedia.org/wiki/Billion_laughs_attack. The Jmol or JSmol implementations don't seem like things we can easily move forward with right now, whereas InChI https://github.com/IUPAC-InChI/InChI is implemented in C, is fuzzed by oss-fuzz, and seems like the kind of implementation we can move forward with. So I'm suggesting either pivoting the idea, or keeping .cml as a wish while exploring the possibility of representing InChI in Wikipedia and Wikidata.
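For comparison, if CML uploads were pursued anyway, the XML bomb concern is usually handled by refusing any document that carries a DOCTYPE, since entity definitions can only appear there. The following is a minimal Python sketch of that idea using the standard library's expat bindings; the `check_cml_upload` and `ForbiddenXML` names are hypothetical, and a real upload check would need more than this (size limits, schema validation, and so on).

```python
import xml.parsers.expat


class ForbiddenXML(ValueError):
    """Raised when an upload is rejected (hypothetical error type)."""


def check_cml_upload(data: bytes) -> None:
    """Raise ForbiddenXML if data has a DOCTYPE or is not well-formed XML."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as the parser sees '<!DOCTYPE', i.e. before any
        # entity declared inside it could be expanded.
        raise ForbiddenXML("DOCTYPE declarations are not accepted")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = reject_doctype
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        raise ForbiddenXML(f"not well-formed XML: {exc}") from exc
```

A plain `<molecule>` document passes silently, while a "billion laughs" style payload is rejected at its first `<!DOCTYPE`. This only shows that the XML risk can be mitigated; it does not address the maintainability points made above.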

image.png (799×1 px, 146 KB)

CDK https://github.com/cdk/cdk supports several formats, so I've deployed it on Toolforge; see https://chemistoid.toolforge.org/InChI=1S/C6H6/c1-2-4-6-5-3-1 and its source https://github.com/ebraminio/Chemistoid. I'm planning to create a PoC MediaWiki extension that understands a syntax like <InChI>1S/C6H6/c1-2-4-6-5-3-1</InChI>, similar to the math syntax. Interactivity can be provided later by JSmol, if anyone is interested in this work.

And here is a very initial and very basic MediaWiki extension to use Chemistoid, https://github.com/ebraminio/InChI

image.png (1×1 px, 107 KB)

It even works in dark mode, but CDK generates an extra white background, which gets turned black. I think I can improve that upstream if there is any interest.

image.png (998×1 px, 105 KB)

The syntax is {{#InChI:1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3}} just in case you want to test the extension somewhere.
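For anyone who wants to experiment with the two syntaxes mentioned in this thread outside MediaWiki, here is a small Python sketch that pulls the identifiers out of a piece of wikitext. It is not how the extension itself works; the function name and the deliberately loose shape check are assumptions made for illustration, and the check does not verify that an identifier is chemically valid.

```python
import re

# Matches either <InChI>...</InChI> or {{#InChI:...}}, in document order.
INCHI_MARKUP_RE = re.compile(
    r"<InChI>(.*?)</InChI>|\{\{#InChI:(.*?)\}\}", re.DOTALL
)

# Loose shape check only: a version prefix, a formula layer, then optional
# '/'-separated layers. This is an illustrative assumption, not the InChI spec.
INCHI_BODY_RE = re.compile(r"1S?/[A-Za-z0-9.]+(/[A-Za-z0-9+\-(),.;*?]+)*")


def extract_inchi(wikitext: str) -> list[str]:
    """Return 'InChI=...' strings for every well-shaped occurrence."""
    found = []
    for match in INCHI_MARKUP_RE.finditer(wikitext):
        body = (match.group(1) or match.group(2) or "").strip()
        if INCHI_BODY_RE.fullmatch(body):
            found.append("InChI=" + body)
    return found
```

Occurrences whose content fails the shape check are skipped rather than passed on, so a renderer would never be asked to handle arbitrary text.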

@Physikerwelt Hi! This proposal (T18491#9950713) is really similar to how math extension works. I wonder if you have some insights about this.

Not really. The math extension defines a grammar and converts the input string to HTML5 (MathML). It also provides image fallback for browsers that don't support MathML. This works within MediaWiki and Wikidata. In addition, the Math extension supports chemical sum formulae, e.g. H2O. If there is a way to convert this syntax <InChI>1S/C6H6/c1-2-4-6-5-3-1</InChI> to HTML with a PHP script, I think this could also be integrated into the Math extension. However, using external libraries seems to be a dead end to me in terms of maintainability.
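As a rough illustration of how little can be done without an external library, the sketch below turns only the formula layer of an InChI string into HTML with subscripts, comparable to a chemical sum formula. It is written in Python for brevity rather than PHP, the function name is hypothetical, and it says nothing about drawing the structure itself, which is the part that currently relies on CDK.

```python
import re

ELEMENT_RE = re.compile(r"([A-Z][a-z]?)(\d*)")
# Only simple single-component formulas such as C6H6 are accepted here;
# multi-component formulas (with '.' or leading multipliers) are rejected.
SIMPLE_FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def formula_layer_to_html(inchi_body: str) -> str:
    """Render the formula layer of e.g. '1S/C6H6/...' as HTML with <sub> tags."""
    layers = inchi_body.split("/")
    if len(layers) < 2 or not SIMPLE_FORMULA_RE.fullmatch(layers[1]):
        raise ValueError("missing or unsupported formula layer")
    parts = []
    for symbol, count in ELEMENT_RE.findall(layers[1]):
        parts.append(symbol + (f"<sub>{count}</sub>" if count else ""))
    return "".join(parts)
```

For the benzene example above this yields `C<sub>6</sub>H<sub>6</sub>`. Because the input is restricted to letters and digits before any HTML is produced, no escaping issues arise in this narrow case.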