Add 'std.format.read.formattedRead' overloads to return a Tuple with values read #8647

iK4tsu · 2022-12-10T15:14:37Z

This enhancement was mentioned in a Discord discussion as a potentially better alternative to the existing std.file.slurp. Although std.file.slurp does a similar thing, it is restricted to reading from files. This feature could be extended to any input range and replace the currently implemented in std.file since that one "combines multiple things together that would be more useful if they were separate and composable" (quoted from @pbackus).

dlang-bot · 2022-12-10T15:14:39Z

Thanks for your pull request and interest in making D better, @iK4tsu! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8647"

iK4tsu · 2022-12-10T15:57:52Z

It seems the documentation is not style-compliant. How do I document the template and the function correctly? I tried to look for existing examples but couldn't find any.

pbackus · 2022-12-10T16:39:08Z

You need to document both the outer template and the inner function.

iK4tsu · 2022-12-10T19:07:47Z

update: autosquashed commits since CI passed

ljmf00

Interesting approach. I'm basically in favor of this. I think it's worth adding a changelog entry for this. Can you also please add a unittest for a CTFE formattedRead?

std/format/read.d

atilaneves · 2022-12-12T13:47:29Z

std/format/read.d

+    import std.typecons : tuple;
+
+    auto expected = tuple("hello", 124, 34.5);
+    auto result = "hello!124:34.5".formattedRead!("%s!%s:%s", string, int, double);


If the format string can be checked at compile-time, why would one need to pass in types?

The format string is not responsible for unformatting into the types the user wants, it's only to know the places in the input string that need to be unformatted and to tell the formatted the restrictions of the characters it's reading. Using %f can still be valid for both float and double for example, and it asserts statically when the types are not format-compatible.

float value; "123".formattedRead!"%d"(value); // Error: static assert: "incompatible format character for floating point argument: %d"

It also asserts on orphan format specifiers or if the number of arguments is higher than the required by the format string. This is useful for generic code when we don't have total control over the argument quantity or the format string. The format string might be generated during compile time causing it to assert if some malformation occurs, or the variadic template arguments might be calculated at compile time asserting for the same reasons.

I actually think I found a bug in checkFormatException for the %f format.

int i; "123".formattedRead!"%f"(i); // should static assert but instead throws at run time with: // std.format.FormatException@/dlang/dmd/linux/bin64/../../src/phobos/std/format/internal/read.d(104): Wrong unformat specifier '%f' for int

I actually think I found a bug in checkFormatException for the %f format.

int i; "123".formattedRead!"%f"(i); // should static assert but instead throws at run time with: // std.format.FormatException@/dlang/dmd/linux/bin64/../../src/phobos/std/format/internal/read.d(104): Wrong unformat specifier '%f' for int

Could you please file a bug report for this?

pbackus · 2022-12-12T15:44:22Z

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:

auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference

We need some other way of signalling an incomplete read in situations like this.

iK4tsu · 2022-12-12T16:08:11Z

We could extend the type to return the number of arguments read as the first tuple argument. But that would defeat the practicality of this, and kinda force the user to handle the value, even if it means mapping it out.

We also could flip the signature and add an optional out parameter for the number of arguments read. But for that, I think the user might as well just use the already existing overload that receives a tuple since they would have to handle the value separately anyway.

Tuple!(int, float) values;
if (!"1 2.125".formattedRead("%s %s", values) == 2) ...
...

Or perhaps adding a sentinel template argument for that: countReadArgs or something like that. But I think this other way to distinguish these cases should be something that does not take away the practicality of the implementation and should not get in the way of the user/be forced upon them.

pbackus · 2022-12-12T16:45:33Z

Another option is to throw an exception—iirc this is what slurp does.

iK4tsu · 2022-12-12T17:06:34Z

Not really what slurp does. It does throw but only when the format is malformed (a format exception) or if there are remaining characters in the current line being parsed.
This throws:

writeFile("deleteme", "123 ");
scope(exit) removeFile("deleteme");
"deleteme".slurp!(int)("%d");
// object.Exception@/dlang/dmd/linux/bin64/../../src/phobos/std/file.d(5191): Trailing characters at the end of line: ` '

This is runs fine:

writeFile("deleteme", "123");
scope(exit) removeFile("deleteme");
assert("deleteme".slurp!(int, string)("%d %s").equal([tuple(123, "")]));

ljmf00 · 2022-12-16T22:01:22Z

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:
auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference
We need some other way of signalling an incomplete read in situations like this.

I think the way is to throw an exception, but I would say to also provide an overload to allow the user to specify a tuple to avoid throwing and allocating on these cases.

pbackus · 2022-12-16T22:04:32Z

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

ljmf00 · 2022-12-17T16:22:04Z

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

Ok cool. Then I think if the user wants to use this prettier version, my suggestion is to raise an exception. If the user wants to use it for control flow they have alternatives and exceptions are zero cost if not raised, so it's fine to do it.

iK4tsu · 2022-12-19T18:09:24Z

There's also the possibility of making it an optional choice too. Instead of always raising an exception we can have a flag to disable it (this would match the current slurp behavior).

formattedRead(Flag!"exhaustive" exhaustive = Yes.exhaustive, ...)
{
    ...
    auto argsRead = .formattedRead(...);
    static if (exhaustive) enforce(argsRead == Args.length)
    ...
}

This would make the transition to deprecate slurp in favor of this easier I think.

iK4tsu · 2022-12-19T18:13:29Z

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:

Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"

The non-templated version executes just fine.

iK4tsu · 2022-12-20T10:37:16Z

With only an exception being thrown there's no real way to handle failures. The best I could think of is:

only("123 num", "123")
    .formattedRead!(int, string)("%d %s")
    .map!nullable
    .handle!(<ExceptionName>, RangePrimitive.front, (e, r) => typeof(r.front).init );
    // [Tuple!(int, string)(123, "num"), Nullable.null]

But if the exception also stored info on how many arguments were read and the tuple result then the user could handle it better if they wanted to. There may be a high possibility the user would like to keep the read arguments.

ljmf00 · 2022-12-23T22:59:40Z

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:
Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"
The non-templated version executes just fine.

File a bug report.

ljmf00 · 2022-12-23T23:02:02Z

@atilaneves we need to decide an acceptable design for this feature. What's your thoughts about the ideas we discussed above?

iK4tsu · 2023-01-04T21:53:24Z

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:
Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"
The non-templated version executes just fine.
File a bug report.

done: #8661

atilaneves · 2023-01-22T14:31:25Z

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

ljmf00 · 2023-01-23T11:10:41Z

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

I think it makes sense too. So you are in favor of the overall change with the particularity of using an exception, on these cases?

@iK4tsu can you add a changelog entry so we can get this in?

ljmf00 · 2023-01-23T11:13:25Z

Also, use an exception, as suggested by @atilaneves

iK4tsu · 2023-03-22T21:21:50Z

Sorry for the delay on this. Now both functions optionally throw a FormatException if the format range ends early and not all arguments are filled. This optional behavior is activated by default but can be ignored by passing No.exhaustive as an argument. Is this a good compromise, or is it best if the exception is always enforced?

iK4tsu · 2023-03-22T21:24:34Z

can you add a changelog entry so we can get this in?

I will add a changelog entry when the final design gets the green light. That way I don't have to update it every time I change the design here.

…ent 'fmt' Signed-off-by: João Lourenço <[email protected]>

…lues Signed-off-by: João Lourenço <[email protected]>

Signed-off-by: João Lourenço <[email protected]>

iK4tsu · 2023-03-22T21:26:31Z

update: rebased with master

pbackus · 2023-03-22T22:10:24Z

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

iK4tsu · 2023-03-23T12:38:48Z

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

The main usage is to comply with the current slurp implementation. Currently slurp does nothing if the range ends early and arguments are not filled. By following slurp's implementation, it can be replaced by this version of formattedRead without breaking changes.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call. With the overload, they would ignore the result but they would be forced to declare all default initialized variables beforehand. With the forced exception, they would be forced to use std.exception.handle.

By being optional it still follows the current design. The "error" value can be ignored, and all unfilled arguments remain unchanged, in this case, with their default initializers, while providing a compatible way to replace the alternative/s.

One issue remains IMO if the exception is indeed thrown and caught. The user knows something happened with the format, but not in specific. If it failed due to unfilled arguments, the user won't know it - by the FormatException - nor how many were filled. Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception. Just throwing an exception doesn't really match what can be done with the overload.

pbackus · 2023-03-23T14:23:32Z

The main usage is to comply with the current slurp implementation.

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call.

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception.

I think the plain FormatException is fine. The purpose of this new overload is to make things more convenient in the common case, not to cover every possible use case. Most of the time, the outcomes you care about are "all arguments read successfully" and "something went wrong." If a user wants more detailed information, they can use the existing overloads.

iK4tsu · 2023-03-27T10:35:05Z

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

or any slurp usage.

But once again, this was more to facilitate the transition from one to another and to keep the existing nonthrowing logic of the other overloads.

pbackus · 2023-03-27T17:32:55Z

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

As far as I know there is no plan to deprecate or remove slurp.

If you mean, would it make it harder to refactor slurp to use the new formattedRead overloads internally—yes, it would. But I would rather have a good API for future users of the new formattedRead overloads and force slurp to use a workaround than have a good API for slurp and force future users of formattedRead to avoid a footgun (which is what No.exhaustive would be).

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

In other words: situations where users write buggy code because the API makes it easy to do the wrong thing. :)

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

iK4tsu · 2023-03-28T10:09:29Z

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

Fair enough. I will change it and create a changelog entry when I have a bit of free time.

ljmf00

Looks good. Fix the style issues reported by buildkite

LightBender · 2024-10-27T07:06:47Z

@iK4tsu if you can resolve the style issues found by buildkite, we'll put the auto-merge tag on this.

iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 5b7cb0c to 1870b04 Compare December 10, 2022 19:07

ljmf00 reviewed Dec 10, 2022

View reviewed changes

ljmf00 added Needs Changelog A changelog entry needs to be added to /changelog Needs Approval Atila Neves labels Dec 10, 2022

iK4tsu commented Dec 11, 2022

View reviewed changes

std/format/read.d Outdated Show resolved Hide resolved

atilaneves reviewed Dec 12, 2022

View reviewed changes

ljmf00 requested a review from atilaneves January 5, 2023 18:50

iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 1870b04 to 4489cd8 Compare March 22, 2023 21:17

iK4tsu added 3 commits March 22, 2023 21:26

refactor(formattedRead): add not is type validation to template argum…

7bcb86b

…ent 'fmt' Signed-off-by: João Lourenço <[email protected]>

feat(formattedRead): add overloads to return a tuple with the read va…

76fc185

…lues Signed-off-by: João Lourenço <[email protected]>

test(formattedRead): add unnittests for the tuple return type overloads

71f05e6

Signed-off-by: João Lourenço <[email protected]>

iK4tsu force-pushed the feat-formattedRead-tuple-return branch from 4489cd8 to 71f05e6 Compare March 22, 2023 21:26

ljmf00 approved these changes Apr 10, 2023

View reviewed changes

LightBender added Salvage This PR needs work, but we want to keep it and it can be salvaged. and removed Atila Neves labels Oct 27, 2024

Add 'std.format.read.formattedRead' overloads to return a Tuple with values read #8647

Are you sure you want to change the base?

Add 'std.format.read.formattedRead' overloads to return a Tuple with values read #8647

Conversation

iK4tsu commented Dec 10, 2022

dlang-bot commented Dec 10, 2022

Bugzilla references

Testing this PR locally

iK4tsu commented Dec 10, 2022

pbackus commented Dec 10, 2022

iK4tsu commented Dec 10, 2022

ljmf00 left a comment

Choose a reason for hiding this comment

atilaneves Dec 12, 2022

Choose a reason for hiding this comment

iK4tsu Dec 12, 2022

Choose a reason for hiding this comment

iK4tsu Dec 12, 2022 • edited Loading

Choose a reason for hiding this comment

ljmf00 Dec 16, 2022

Choose a reason for hiding this comment

pbackus commented Dec 12, 2022

iK4tsu commented Dec 12, 2022 • edited Loading

pbackus commented Dec 12, 2022

iK4tsu commented Dec 12, 2022 • edited Loading

ljmf00 commented Dec 16, 2022

pbackus commented Dec 16, 2022

ljmf00 commented Dec 17, 2022

iK4tsu commented Dec 19, 2022 • edited Loading

iK4tsu commented Dec 19, 2022 • edited Loading

iK4tsu commented Dec 20, 2022

ljmf00 commented Dec 23, 2022

ljmf00 commented Dec 23, 2022

iK4tsu commented Jan 4, 2023

atilaneves commented Jan 22, 2023

ljmf00 commented Jan 23, 2023

ljmf00 commented Jan 23, 2023

iK4tsu commented Mar 22, 2023

iK4tsu commented Mar 22, 2023

iK4tsu commented Mar 22, 2023

pbackus commented Mar 22, 2023

iK4tsu commented Mar 23, 2023

pbackus commented Mar 23, 2023

iK4tsu commented Mar 27, 2023

pbackus commented Mar 27, 2023

iK4tsu commented Mar 28, 2023

ljmf00 left a comment

Choose a reason for hiding this comment

LightBender commented Oct 27, 2024 • edited Loading

iK4tsu Dec 12, 2022 •

edited

Loading

iK4tsu commented Dec 12, 2022 •

edited

Loading

iK4tsu commented Dec 12, 2022 •

edited

Loading

iK4tsu commented Dec 19, 2022 •

edited

Loading

iK4tsu commented Dec 19, 2022 •

edited

Loading

LightBender commented Oct 27, 2024 •

edited

Loading