Open
Bug 976414
(LSan)
Opened 11 years ago
Updated 3 months ago
[meta] Run LeakSanitizer
Categories
(Core :: Sanitizers, defect)
Core
Sanitizers
Tracking
()
NEW
People
(Reporter: mccr8, Unassigned)
References
(Depends on 27 open bugs, )
Details
(Keywords: meta, Whiteboard: [MemShrink:meta])
Attachments
(2 files, 4 obsolete files)
Apparently ASAN has this LeakSanitizer thing, which I'm pretty sure we don't run.
https://code.google.com/p/address-sanitizer/wiki/LeakSanitizer
It is described as "experimental", which does not fill me with great hope, but it sounds like the overhead is entirely at the end of the run, so if it works we might be able to turn it on. But anyways, just running it and seeing what happens might produce some useful result.
Comment 1•11 years ago
|
||
This is great! Thanks for doing this.
Reporter | ||
Comment 2•11 years ago
|
||
Looks like Chrome runs it on their equivalent of TBPL:
http://www.chromium.org/developers/testing/leaksanitizer
Comment 3•11 years ago
|
||
I can help with integrating this if needed. In bug 957865 I'm currently working on upgrading the Clang we use for ASan so we can also do TSan with it. We can probably also use it for LSan then.
Comment 4•11 years ago
|
||
This is going to give us a ton more leak testing coverage! Definitely worth trying.
Whiteboard: [MemShrink] → [MemShrink:P1]
Reporter | ||
Comment 5•11 years ago
|
||
I'll try to look into this a bit, but if somebody else is excited about working on it let me know! :)
Alias: LSAN
Assignee: nobody → continuation
Reporter | ||
Comment 6•11 years ago
|
||
As far as I can see the instructions for using LSAN are:
1) Build a normal ASAN build.
2) Run as you would ASAN, except also set ASAN_OPTIONS="detect_leaks=1".
So let's see what happens: https://tbpl.mozilla.org/?tree=Try&rev=9dbbdc04ca04
Reporter | ||
Comment 7•11 years ago
|
||
The mochitests all seemed to fail after about 4 minutes with this:
18:27:23 INFO - INFO | runtests.py | ASan running in low-memory configuration
18:27:23 INFO - ==2416==Could not attach to thread 2413 (errno 1).
18:27:23 INFO - =================================================================
18:27:23 INFO - ==2413==ERROR: LeakSanitizer: detected memory leaks
18:27:23 INFO - Direct leak of 24 byte(s) in 1 object(s) allocated from:
18:27:23 INFO - #0 0x44a235 in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:74
18:27:23 INFO - #1 0x7ff6e3f44cab in PORT_Alloc_Util /builds/slave/try-l64-asan-00000000000000000/build/security/nss/lib/util/secport.c:86
18:27:23 INFO - #2 0x7ff6e072f0f7 (+0x5550f7)
18:27:23 INFO - #3 0x7ff6e072ee60 (+0x554e60)
18:27:23 INFO - #4 0x7ff6e07321f3 (+0x5581f3)
18:27:23 INFO - SUMMARY: LeakSanitizer: 24 byte(s) leaked in 1 allocation(s).
18:27:23 WARNING - TEST-UNEXPECTED-FAIL | runtests.py | Certificate integration failed
I have no idea what that means. It looks like maybe the MochitestServer failed to launch?
Comment 8•11 years ago
|
||
The error is coming from <http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#837>
which means certutil failed to launch (most likely), see line 1405 same file.
Reporter | ||
Comment 9•11 years ago
|
||
Crashtests did seem to produce some kind of useful result:
SUMMARY: LeakSanitizer: 11176265 byte(s) leaked in 59174 allocation(s).
CPP also produced a lot of things, but I assume we don't do any leak checking there right now, so maybe this is just bogus stuff that needs be cleaned up under whatever this is. I see a lot of stacks with things like nsComponentManagerImpl::RegisterModule.
Note that these are going to be a little high, because I should really also set MOZ_CC_RUN_DURING_SHUTDOWN so we run the GC and CC at shutdown.
Reporter | ||
Comment 10•11 years ago
|
||
With shutdown CCs, hopefully crash tests will have less leaks:
https://tbpl.mozilla.org/?tree=Try&rev=18b2e1d52c09
Reporter | ||
Comment 11•11 years ago
|
||
My wild guess is that the certutil thing runs, then leaks, so LSAN makes it return a failure, which the harness detects, then it bails out. It would be nice if the stack was a little more symbolicated. That line in PORT_Alloc is indeed an allocation. But maybe as a hack I can have it not do LSAN for certutil...
Comment 12•11 years ago
|
||
(In reply to comment #11)
> My wild guess is that the certutil thing runs, then leaks, so LSAN makes it
> return a failure, which the harness detects, then it bails out. It would be
> nice if the stack was a little more symbolicated. That line in PORT_Alloc is
> indeed an allocation. But maybe as a hack I can have it not do LSAN for
> certutil...
Yeah, actually I think at least on the beginning we should only set the env var for firefox, and not even the xpcshell which runs the HTTP server. That's really what matters the most.
Reporter | ||
Comment 13•11 years ago
|
||
Yeah, I didn't intend to set it for the server xpcshell, but I may have.
Depends on: 979721
Reporter | ||
Comment 14•11 years ago
|
||
I went through the log for crash tests. I didn't find anything particularly interesting, except that I'd say about 95% of the leaks it found are from WebRTC. I'll probably have to add a suppression for anything with webrtc in the path.
Comment 15•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #14)
> I went through the log for crash tests. I didn't find anything particularly
> interesting, except that I'd say about 95% of the leaks it found are from
> WebRTC. I'll probably have to add a suppression for anything with webrtc in
> the path.
That doesn't sound very encouraging. Can you file bugs for the leaks so we can fix them, or are there simple too many?
Given the number it's not clear that filing a bug for every reported leak is useful. The best path forward is probably for someone with (or willing to acquire) a passing familiarity with webrtc to go through the list and figure out what to do with them.
Comment 17•11 years ago
|
||
Maybe Randell? :)
Reporter | ||
Comment 18•11 years ago
|
||
Here's the LSAN report from crash tests, with the timing gunk removed and the paths shortened a bit for improved legibility.
Comment 19•11 years ago
|
||
> Here's the LSAN report from crash tests
My kingdom for some whitespace!
Comment 20•11 years ago
|
||
More seriously, this is good news! Because we can fix many of these.
mccr8, do you think you could go through them and put them in logical groups, in a form that's a little easier to read? Then you could maybe do one follow-up bug per group. Emailing dev-platform would also be a good idea, to get more eyes on the leaks.
And I don't understand the nsThread::ProcessNextEvent() leaks, but I'd like to. They're big, and seem bad, but maybe inlining is making it hard to see exactly what's going on?
Comment 21•11 years ago
|
||
Also, it might be worthwhile suppressing these right now so we can get LSAN running all the time, and then get the important ones fixed at a more leisurely pace.
Reporter | ||
Comment 22•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #20)
> mccr8, do you think you could go through them and put them in logical
> groups, in a form that's a little easier to read? Then you could maybe do
> one follow-up bug per group. Emailing dev-platform would also be a good
> idea, to get more eyes on the leaks.
It is still a little early to ask for broader participation. We need to get suppression figured out, and if there's any way to deal with the inlining issue, so people can actually figure out what is happening. I'd also like to get mochitests working first, so there's a little more meat to chew on. I've been going through the logs and filing things that look actionable that I've seen so far.
> And I don't understand the nsThread::ProcessNextEvent() leaks, but I'd like
> to. They're big, and seem bad, but maybe inlining is making it hard to see
> exactly what's going on?
Yeah, inlining is a big problem for understanding these stacks.
(In reply to Nicholas Nethercote [:njn] from comment #21)
> Also, it might be worthwhile suppressing these right now so we can get LSAN
> running all the time, and then get the important ones fixed at a more
> leisurely pace.
Yes, once I figure out how to modify the test harness to do that. You have to stick the path to the file in some environment variable, and I need to figure out what exactly that path should be, given a particular file in the tree.
There's also an issue of getting the test harness to produce slightly more useful errors on TBPL than a crash with a weird error code.
Comment 23•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #22)
>
> Yes, once I figure out how to modify the test harness to do that. You have
> to stick the path to the file in some environment variable, and I need to
> figure out what exactly that path should be, given a particular file in the
> tree.
For TSan we basically need to solve the same problem. Passing the environment variable is easy (see where we pass ASAN_SYMBOLIZER or ASAN_OPTIONS for example). However, even if the file is part of the tree, it won't be available on the testers, since they don't have the source tree. I guess we would have to package the file to make it available on the testers.
As for the location, we already have build/sanitizers/ for such files, it currently contains the TSan suppression list.
Comment 24•11 years ago
|
||
And build/valgrind/*.sup are the Valgrind suppression files, in case that's helpful.
Reporter | ||
Comment 25•11 years ago
|
||
Reporter | ||
Comment 26•11 years ago
|
||
Reporter | ||
Comment 27•11 years ago
|
||
The cert DB patch seemed to work locally for running Mochitests so let's see what we get.
https://tbpl.mozilla.org/?tree=Try&rev=8f7fc216e81a
Reporter | ||
Comment 28•11 years ago
|
||
Reporter | ||
Comment 29•11 years ago
|
||
With this suppression list, and the patches in the blocking bugs, M1 only reports the following leaks:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x440995 in malloc (/home/amccreight/ff-dbg-asan/dist/bin/plugin-container+0x440995)
#1 0x7f023901ee08 in moz_xmalloc /home/amccreight/mc/memory/mozalloc/mozalloc.cpp:52
SUMMARY: LeakSanitizer: 24 byte(s) leaked in 1 allocation(s).
Indirect leak of 3297 byte(s) in 230 object(s) allocated from:
#0 0x440f25 in malloc (/home/amccreight/ff-dbg-asan/dist/bin/firefox+0x440f25)
#1 0x7fb4f75959b9 in __GI___strdup (/usr/lib/libc.so.6+0x819b9)
SUMMARY: LeakSanitizer: 3297 byte(s) leaked in 230 allocation(s).
Only one of the suppressions, NS_InvokeByIndex, is cheating.
Reporter | ||
Comment 30•11 years ago
|
||
I wrote a python script that extracts LSAN info from a local Mochitest run:
https://github.com/amccreight/moz-lsan-tools
Comment 31•11 years ago
|
||
> With this suppression list, and the patches in the blocking bugs, M1 only
> reports the following leaks:
>
> Direct leak of 24 byte(s) in 1 object(s) allocated from:
> #0 0x440995 in malloc
> (/home/amccreight/ff-dbg-asan/dist/bin/plugin-container+0x440995)
> #1 0x7f023901ee08 in moz_xmalloc
> /home/amccreight/mc/memory/mozalloc/mozalloc.cpp:52
> SUMMARY: LeakSanitizer: 24 byte(s) leaked in 1 allocation(s).
>
> Indirect leak of 3297 byte(s) in 230 object(s) allocated from:
> #0 0x440f25 in malloc
> (/home/amccreight/ff-dbg-asan/dist/bin/firefox+0x440f25)
> #1 0x7fb4f75959b9 in __GI___strdup (/usr/lib/libc.so.6+0x819b9)
> SUMMARY: LeakSanitizer: 3297 byte(s) leaked in 230 allocation(s).
Are those the full stacks? If so, that's not very helpful :(
Looking at the suppressions file, does a leak get suppressed if any of its stack entries match any of the lines in the file? If so, that's crude enough to make me uncomfortable -- Valgrind's suppressions can match against entire stack traces or sub-traces, so the likelihood of suppressing the wrong thing is much smaller.
Having said that, why is __GI___strdup in the second stack when it's also in the suppressions file?
Reporter | ||
Comment 32•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #31)
> Are those the full stacks? If so, that's not very helpful :(
Yes. My guess is that these are calls from external libraries that aren't built with ASAN so there are no symbols. Decoder may have some idea of how to extract more information.
> If so, that's crude enough to make me uncomfortable
Yes, it is not good. They could be made a little better, but judging from Chrome's suppression file, there's nothing better than per-frame matching.
http://src.chromium.org/viewvc/chrome/trunk/src/tools/lsan/suppressions.txt
The documentation is extremely thin, though, so I'll have to read through the ASAN source code to see if there's anything better. Maybe we can write some patches to make it better in LSAN.
> Having said that, why is __GI___strdup in the second stack when it's also in
> the suppressions file?
I added it to the suppressions file because it was showing up, but LSAN fails to actually suppress it. Though you'll note in the file it is actually commented out. ;) But it wasn't when I did the run.
Reporter | ||
Comment 33•11 years ago
|
||
try run with unlanded fixes for bug 979928, bug 980625, bug 981167 from various other people:
https://tbpl.mozilla.org/?tree=Try&rev=a9a80c84b8b5
Thanks for all of the quick fixes!
Reporter | ||
Comment 34•11 years ago
|
||
According to the design document ( https://code.google.com/p/address-sanitizer/wiki/LeakSanitizerDesignDocument ) LSAN works by doing some kind of conservative scanning of memory, rather than just tallying blocks that have been allocated but not freed: "LSan scans this memory for byte patterns that look like pointers to heap blocks (interior pointers are treated the same as pointers to the beginning of a block). Those blocks are considered reachable and their contents are also treated as live memory. In this manner we discover all blocks reachable from the root set. We mark such blocks by setting a flag in the block's metadata. We then iterate over all existing heap blocks and report unreachable blocks as leaks, together with stack traces of corresponding allocations. Instead of reporting each allocation individually, it might be more useful to aggregate them by the last k frames in the stack trace."
Reporter | ||
Comment 35•11 years ago
|
||
New push with a couple of leaks fixed, frame pointers enabled (for better stacks), and using the patch I pushed to TBPL to make LSAN failures a little snazzier looking:
https://tbpl-dev.allizom.org/?tree=Try&rev=413a1ce105d7
In my local M1 run, this seemed to fix all leaks aside from the stuff I have suppressed. From reading Chrome's documentation, installing and using debug versions of these libraries should give more insight into what is happening with those leaks.
Reporter | ||
Updated•11 years ago
|
Whiteboard: [MemShrink:P1] → [MemShrink:meta]
Reporter | ||
Comment 36•11 years ago
|
||
Here's what I have so far. I still haven't gotten the suppressions file set up properly for CJR Cpp or X, but it seems to be working for Mochitests.
Mochitests are still orange because I haven't given up on figuring out how to get better stacks for these clipped stacks that call into system libraries.
Reporter | ||
Comment 37•11 years ago
|
||
Oops, I meant to post that in bug 988041.
Reporter | ||
Updated•11 years ago
|
Attachment #8386927 -
Attachment is obsolete: true
Reporter | ||
Updated•11 years ago
|
Attachment #8386928 -
Attachment is obsolete: true
Reporter | ||
Updated•11 years ago
|
Attachment #8396813 -
Attachment is obsolete: true
Reporter | ||
Updated•11 years ago
|
Attachment #8387718 -
Attachment is obsolete: true
Comment 38•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #32)
> (In reply to Nicholas Nethercote [:njn] from comment #31)
> > Are those the full stacks? If so, that's not very helpful :(
> Yes. My guess is that these are calls from external libraries that aren't
> built with ASAN so there are no symbols. Decoder may have some idea of how
> to extract more information.
Usually the last frame of the stack trace is the library which doesn't have frame pointers (not symbols). In that __GI_strdup stack trace it's libc.
You can try running with ASAN_OPTIONS=fast_unwind_on_malloc=0 to get complete stack traces. It'll help you see where those particular leaks are coming from. It's not a good idea to always run in this mode because the slowdown will be huge.
On an unrelated note, we have a rudimentary API in <sanitizer/lsan_interface.h> which you might find useful.
> > Having said that, why is __GI___strdup in the second stack when it's also in
> > the suppressions file?
>
> I added it to the suppressions file because it was showing up, but LSAN
> fails to actually suppress it.
That's not supposed to happen. Can you follow up with a reproducer?
Reporter | ||
Comment 39•11 years ago
|
||
(In reply to Sergey Matveev from comment #38)
> Usually the last frame of the stack trace is the library which doesn't have
> frame pointers (not symbols). In that __GI_strdup stack trace it's libc.
> You can try running with ASAN_OPTIONS=fast_unwind_on_malloc=0 to get
> complete stack traces. It'll help you see where those particular leaks are
> coming from. It's not a good idea to always run in this mode because the
> slowdown will be huge.
Thanks, I'll try that.
> That's not supposed to happen. Can you follow up with a reproducer?
Sure. I worked around it somehow, but I don't remember what I did. Hopefully I'll get a chance to start looking at LSAN again in the next week or two.
Reporter | ||
Comment 40•11 years ago
|
||
I wrote a little blurb on LeakSanitizer on the MDN AddressSanitizer page.
Reporter | ||
Comment 41•11 years ago
|
||
I've also written a few little Python scripts for extracting LSan results from Mochitest logs, parsing a suppression file, and checking what the effect of the suppression file is on a given set of LSan results.
https://github.com/amccreight/moz-lsan-tools
Reporter | ||
Updated•11 years ago
|
Alias: LSAN → LSan
Reporter | ||
Updated•9 years ago
|
Depends on: asan-clang3.7
Comment 42•8 years ago
|
||
Bug 1246801 might not need to block this — it's for LSan support in NSS's own build/testsuite/CI/etc. (When building NSS as part of Firefox, the sanitizer build flags carry over without needing any special support from NSS.) There might be some overlap as far as tuning the list of suppressions, but most of it would be independent, I'd think.
Reporter | ||
Comment 43•8 years ago
|
||
(In reply to Jed Davis [:jld] [⏰MDT; UTC-6] from comment #42)
> Bug 1246801 might not need to block this — it's for LSan support in NSS's
> own build/testsuite/CI/etc. (When building NSS as part of Firefox, the
> sanitizer build flags carry over without needing any special support from
> NSS.) There might be some overlap as far as tuning the list of
> suppressions, but most of it would be independent, I'd think.
The meta bug is just a combination of things related to running LSan on Firefox-y code.
Depends on: 1499202
Depends on: 1499206
Updated•6 years ago
|
Updated•5 years ago
|
Type: task → defect
Updated•5 years ago
|
Summary: [meta] Try out LeakSanitizer → [meta] Run LeakSanitizer
Updated•5 years ago
|
Assignee: continuation → nobody
Component: General → Sanitizers
Updated•5 years ago
|
Updated•3 years ago
|
Depends on: 1686948, 1532189, 1719493, 1753351, 1718974, 1720148, 1719945, 1718714, 1630917, 1743092, 1685499, 1572295, 1709043, 1712191, 1708828, 1711640, 1639768, 1753705, 1708028, 1599480, 1753338, 1756652, 1668918, 1634641, 1735085, 1687248, 1602685, 1750417, 1509373, 1728821, 1717657, 1731116, 1755767, 1602689, 1703952, 1479987, 1753183, 1750062, 1753368, 1600756, 1718811, 1747635, 1740390, 1740240, 1755129, 1626420, 1728820, 1754191, 1331152, 1513936, 1738540, 1401144, 1725085
Updated•2 years ago
|
Severity: normal → S3
Updated•2 years ago
|
Reporter | ||
Comment 44•1 year ago
|
||
I've removed all of the intermittent-failures blocking this bug. It looks like bugbot won't close intermittent-failures that haven't happened in a long time, so there were dozens of bugs that hadn't happened in years that were still open. Intermittent LSan failures are easy enough to find by searching for "Intermittent LeakSanitizer", and generally aren't very interesting bugs, so I think it is okay for them to not be blocking this meta bug.
Sorry for the noise. I guess I should have removed the blocking bugs manually in a single change to this bug rather than used the Bugzilla query mass change on the bugs that block this one.
You need to log in
before you can comment on or make changes to this bug.
Description
•