Simplify endian conversion functions in std.bitmanip. #8822

jmdavis · 2023-09-24T09:32:04Z

They're overly complex (e.g. testing for the native endianness of the machine when that's actually completely unnecessary), and they use unions in a manner that is undefined behavior in C/C++ (since they write to one field and then read from another). I don't see any mention of whether that behavior is defined in D in D's spec, but it's probably undefined in D given that it's undefined in C/C++. Either way, it's an overly complex solution.

This solution gets rid of all of the endian version checks in std.bitmanip, and it allows the implementations of the endian conversion functions to be the same between CTFE and runtime.

@atilaneves Assuming that I didn't screw up the implementation somehow, this should be much more to your liking than what's been here. Either way, it results in a lot less code to be screwed up. :)

dlang-bot · 2023-09-24T09:32:07Z

Thanks for your pull request, @jmdavis!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8822"

jmdavis · 2023-09-24T09:39:22Z

If I screwed up anything, it would probably be the floating point stuff, but it passes the tests (at least on my machine).

jmdavis · 2023-09-24T10:28:15Z

I find it somewhat suspicious that I had to change @safe to @trusted on the endian conversion functions so that the compilation for vibe-d in buildkite would stop failing due to complaining due to taking the address of a local variable given that the same instantiation succeeded on my machine, but whatever. At one point time, that definitely would have been @system, so it's probably some difference in compilation flags related to DIP 1000.

n8sh · 2023-09-27T14:15:58Z

I dislike that this PR makes DMD's output much worse without giving any user-visible benefit elsewhere. Prior to this PR DMD does the same thing as LDC.

assume  CS:.text._D3std8bitmanip__T20nativeToLittleEndianTmZQzFNaNbNiNfxmZG8h
        push    RBP
        mov     RBP,RSP
        mov     RAX,RDI
        pop     RBP
        ret

assume  CS:.text._D3std8bitmanip__T17nativeToBigEndianTmZQwFNaNbNiNfxmZG8h
        push    RBP
        mov     RBP,RSP
        mov     RAX,RDI
        bswap   EAX
        pop     RBP
        ret

Post this PR it does not.

assume  CS:.text._D3std8bitmanip__T20nativeToLittleEndianTmZQzFNaNbNiNexmZG8h
        push    RBP
        mov     RBP,RSP
        sub     RSP,020h
        mov     -018h[RBP],RBX
        mov     -8[RBP],RDI
        mov     qword ptr -010h[RBP],0
        mov     -010h[RBP],DIL
        shr     RDI,8
        mov     -0Fh[RBP],DIL
        mov     RAX,-8[RBP]
        shr     RAX,010h
        mov     -0Eh[RBP],AL
        mov     RCX,-8[RBP]
        shr     RCX,018h
        mov     -0Dh[RBP],CL
        mov     DL,-4[RBP]
        mov     -0Ch[RBP],DL
        mov     RBX,-8[RBP]
        shr     RBX,028h
        mov     -0Bh[RBP],BL
        mov     RSI,-8[RBP]
        shr     RSI,030h
        mov     -0Ah[RBP],SIL
        mov     RAX,-8[RBP]
        shr     RAX,038h
        mov     -9[RBP],AL
        mov     RAX,-010h[RBP]
        mov     RBX,-018h[RBP]
        mov     RSP,RBP
        pop     RBP
        ret

assume  CS:.text._D3std8bitmanip__T17nativeToBigEndianTmZQwFNaNbNiNexmZG8h
        push    RBP
        mov     RBP,RSP
        sub     RSP,020h
        mov     -018h[RBP],RBX
        mov     -8[RBP],RDI
        mov     qword ptr -010h[RBP],0
        shr     RDI,038h
        mov     -010h[RBP],DIL
        mov     RAX,-8[RBP]
        shr     RAX,030h
        mov     -0Fh[RBP],AL
        mov     RCX,-8[RBP]
        shr     RCX,028h
        mov     -0Eh[RBP],CL
        mov     DL,-4[RBP]
        mov     -0Dh[RBP],DL
        mov     RBX,-8[RBP]
        shr     RBX,018h
        mov     -0Ch[RBP],BL
        mov     RSI,-8[RBP]
        shr     RSI,010h
        mov     -0Bh[RBP],SIL
        mov     RAX,-8[RBP]
        shr     RAX,8
        mov     -0Ah[RBP],AL
        mov     CL,-8[RBP]
        mov     -9[RBP],CL
        mov     RAX,-010h[RBP]
        mov     RBX,-018h[RBP]
        mov     RSP,RBP
        pop     RBP
        ret

I'm only half-joking when I say mentioning this might increase the chance this PR is accepted, but so long as DMD is available for download on the front page of dlang.org I don't think that's the right attitude. This is going to harm anyone who is currently using DMD and std.bitmanip under the assumption—which until this moment has been valid—that it does something reasonable.

std/bitmanip.d

jmdavis · 2023-10-02T03:20:19Z

I dislike that this PR makes DMD's output much worse without giving any user-visible benefit elsewhere. Prior to this PR DMD does the same thing as LDC.

Well, the union needs to go, because while it happens to work, it's undefined behavior, and the compiler should be able to optimize bitshifts such that there is no performance change here. If I had to guess, the issue is that each bitshift is done individually (because that was easier to do generically) rather than as a single expression, though dmd's optimizer may just be bad at optimizing this particular operation.

I guess that I'll have to try it with a string mixin instead so that I can generate a single expression, much as the code is uglier that way.

jmdavis · 2023-10-02T03:42:39Z

Actually, the disassembly that you were showing was for the nativeTo*Endian functions, which need to assign to each byte individually (whereas the *EndianToNative functions could do the assignment from the bytes in one statement), so I don't think that that can be turned into a single statement with bitshifts, which would mean that the issue here is that dmd's optimizer isn't smart enough.

pbackus · 2023-10-15T00:00:09Z

Type punning with a union is not undefined behavior in C. See this footnote from the C11 standard:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ''type punning'').

They're overly complex (e.g. testing for the native endianness of the machine when that's actually completely unnecessary), and they use unions in a manner that is undefined behavior in C/C++ (since they write to one field and then read from another). I don't see any mention of whether that behavior is defined in D in D's spec, but it's probably undefined in D given that it's undefined in C/C++. Either way, it's an overly complex solution. This solution gets rid of all of the endian version checks in std.bitmanip, and it allows the implementations of the endian conversion functions to be the same between CTFE and runtime.

jmdavis requested a review from atilaneves September 24, 2023 09:32

jmdavis requested a review from andralex as a code owner September 24, 2023 09:32

jmdavis force-pushed the simplify_endian branch 4 times, most recently from 4707c4a to dcb6d97 Compare September 24, 2023 10:13

n8sh reviewed Sep 27, 2023

View reviewed changes

std/bitmanip.d Show resolved Hide resolved

thewilsonator added the Needs Rebase label Oct 18, 2024

jmdavis added 2 commits October 27, 2024 01:44

Add additional tests for nativeToLittleEndian.

c8d085c

jmdavis force-pushed the simplify_endian branch from dcb6d97 to c8d085c Compare October 27, 2024 08:00

dlang-bot removed the Needs Rebase label Oct 27, 2024

LightBender added the auto-merge-squash label Oct 27, 2024

LightBender approved these changes Oct 27, 2024

View reviewed changes

LightBender merged commit b6245b4 into dlang:master Oct 27, 2024
8 of 10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify endian conversion functions in std.bitmanip. #8822

Simplify endian conversion functions in std.bitmanip. #8822

jmdavis commented Sep 24, 2023

dlang-bot commented Sep 24, 2023

jmdavis commented Sep 24, 2023

jmdavis commented Sep 24, 2023

n8sh commented Sep 27, 2023

jmdavis commented Oct 2, 2023

jmdavis commented Oct 2, 2023

pbackus commented Oct 15, 2023

Simplify endian conversion functions in std.bitmanip. #8822

Simplify endian conversion functions in std.bitmanip. #8822

Conversation

jmdavis commented Sep 24, 2023

dlang-bot commented Sep 24, 2023

Bugzilla references

Testing this PR locally

jmdavis commented Sep 24, 2023

jmdavis commented Sep 24, 2023

n8sh commented Sep 27, 2023

jmdavis commented Oct 2, 2023

jmdavis commented Oct 2, 2023

pbackus commented Oct 15, 2023