Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8342818: Implement CPU Time Profiling for JFR #20752

Open
wants to merge 126 commits into
base: master
Choose a base branch
from

Conversation

parttimenerd
Copy link
Contributor

@parttimenerd parttimenerd commented Aug 28, 2024

This is the code for the [JEP draft: CPU Time based profiling for JFR].


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342818: Implement CPU Time Profiling for JFR (Enhancement - P4)

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20752/head:pull/20752
$ git checkout pull/20752

Update a local copy of the PR:
$ git checkout pull/20752
$ git pull https://git.openjdk.org/jdk.git pull/20752/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20752

View PR using the GUI difftool:
$ git pr show -t 20752

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20752.diff

Using Webrev

Link to Webrev Comment

@parttimenerd
Copy link
Contributor Author

Please make sure you test this also on non-Linux platforms:

I usually do test it on Mac, I probably just forgot it. I'm fixing the bug tomorrow.

*/

#include "precompiled.hpp"
#include "jfr/recorder/service/jfrOptionSet.hpp"
Copy link

@mgronlun mgronlun Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All includes except:

#include "precompiled.hpp"
#include "jfr/periodic/sampling/jfrCPUTimeThreadSampler.hpp"
#include "utilities/debug.hpp"

Can be moved below #if defined(Linux) (to reduce compilation time on non-Linux platforms)

#include "runtime/osThread.hpp"

#if defined(LINUX)
#include "signals_posix.hpp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will also need to include <sys/types.h> and change the type from OSThread::thread_id_t to pid_t, else you can get the following error:

"error: 'typedef pid_t OSThread::thread_id_t' is private within this context"

{}

bool JfrAsyncStackTrace::record_async(JavaThread* jt, const frame& frame) {
NoHandleMark nhm;
Copy link

@mgronlun mgronlun Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stack: [0x00007f02ff2f4000,0x00007f02ff3f4000], sp=0x00007f02ff3ed8e0, free space=998k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xdf5649] HandleArea::allocate_null_handle()+0x179 (handles.cpp:46)
V [libjvm.so+0xc5259a] RegisterMap::RegisterMap(JavaThread*, RegisterMap::UpdateMap, RegisterMap::ProcessFrames, RegisterMap::WalkContinuation)+0xaa (frame.cpp:75)
V [libjvm.so+0xf9ffb9] JfrVframeStream::JfrVframeStream(JavaThread*, frame const&, bool, bool)+0x59 (jfrStackTrace.cpp:160)
V [libjvm.so+0xf0940c] JfrAsyncStackTrace::record_async(JavaThread*, frame const&)+0xac (jfrAsyncStackTrace.cpp:58)
V [libjvm.so+0xf0be3d] JfrCPUTimeThreadSampler::handle_timer_signal(void*)+0x4fd (jfrCPUTimeThreadSampler.cpp:231)
V [libjvm.so+0xf0bf7f] handle_timer_signal(int, siginfo*, void*)+0x4f (jfrCPUTimeThreadSampler.cpp:723)

NoHandleMark will not work with with continuations. Please see how we do it in JfrStackTrace::record()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps even more interesting is that I got this when running out-of-the-box with:

XX:StartFlightRecording:settings=profile.jfc

// profile.jfc

<event name="jdk.CPUTimeSample">
  <setting name="enabled" control="method-sampling-enabled">false</setting>
  <setting name="throttle">10ms</setting>
</event>

This indicates that the event is turned on, despite "enabled" being false. Is there a bug in the control / enable setting mechanism?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is both interesting (and worrying). I'm going to write a test for the latter issue.


private static void testThrottleSettingsPeriod() throws Exception {
float rate = countEvents(1000, "1ms").rate();
Asserts.assertTrue(rate > 950 && rate < 1050, "Expected around 1000 events per second, got " + rate);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails on linux-aarch64-debug builds:

----------System.out:(0/0)----------
----------System.err:(14/1061)----------
java.lang.RuntimeException: Expected around 1000 events per second, got 245.15503: expected true, was false
at jdk.test.lib.Asserts.fail(Asserts.java:691)
at jdk.test.lib.Asserts.assertTrue(Asserts.java:543)
at jdk.jfr.event.profiling.TestCPUTimeEventThrottling.testThrottleSettingsPeriod(TestCPUTimeEventThrottling.java:65)
at jdk.jfr.event.profiling.TestCPUTimeEventThrottling.main(TestCPUTimeEventThrottling.java:49)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:573)
at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
at java.base/java.lang.Thread.run(Thread.java:1576)

JavaTest Message: Test threw exception: java.lang.RuntimeException: Expected around 1000 events per second, got 245.15503: expected true, was false
JavaTest Message: shutting down test

@mgronlun
Copy link

Test: jdk/jfr/jcmd/TestJcmdStartWithOptions.java seems to be timing out when the CPU Time Profiler is turned on.

@mgronlun
Copy link

mgronlun commented Oct 31, 2024

Test: TestActiveSettingEvent also has problems on Linux, not only on non-Linux platforms.

Testing configuration default
----------System.err:(15/1230)----------
java.lang.RuntimeException: Incorrect settings value for jdk.CPUTimeSample#throttle was 0/s, expected 500/s expected: 500/s but was: 0/s
at jdk.test.lib.Asserts.fail(Asserts.java:691)
at jdk.test.lib.Asserts.assertEquals(Asserts.java:204)
at jdk.jfr.event.runtime.TestActiveSettingEvent.testSettingConfiguration(TestActiveSettingEvent.java:308)
at jdk.jfr.event.runtime.TestActiveSettingEvent.testDefaultSettings(TestActiveSettingEvent.java:79)
at jdk.jfr.event.runtime.TestActiveSettingEvent.main(TestActiveSettingEvent.java:65)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:573)
at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
at java.base/java.lang.Thread.run(Thread.java:1576)

@mgronlun
Copy link

mgronlun commented Oct 31, 2024

Test: jdk/jfr/event/profiling/TestNative.java times out on Linux. Could perhaps be an indication that event control/settings is getting mixed up between the two samplers?

jdk.jfr.event.profiling.TestNative
result: Error. Agent error: java.lang.Exception: Agent 21 timed out with a timeout of 480 seconds; check console log for any additional details

@mgronlun
Copy link

mgronlun commented Oct 31, 2024

Linux AArch64-debug:

Are you enqueuing entries that have already been unloaded?

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x0000ffff898316e4, pid=458651, tid=458700

JRE version: Java(TM) SE Runtime Environment (24.0) (fastdebug build 24-internal-2024-10-30-2131063.markus.gronlund.jdkcopy)
Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 24-internal-2024-10-30-2131063.markus.gronlund.jdkcopy, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
Problematic frame:
V [libjvm.so+0xe1b6e4] module_id(PackageEntry const*, bool)+0x34

If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp

--------------- T H R E A D ---------------

Current thread (0x0000fffef803f480): JavaThread "JFR Recorder Thread" daemon [_thread_in_vm, id=458700, stack(0x0000ffff05772000,0x0000ffff05970000) (2040K)]

Stack: [0x0000ffff05772000,0x0000ffff05970000], sp=0x0000ffff0596e100, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xe1b6e4] module_id(PackageEntry const*, bool)+0x34 (moduleEntry.hpp:111)
V [libjvm.so+0xe1b938] package_id(Klass const*, bool)+0x48 (jfrTypeSet.cpp:219)
V [libjvm.so+0xe1c6f4] write_klass(JfrCheckpointWriter*, Klass const*, bool, int&)+0xf0 (jfrTypeSet.cpp:350)
V [libjvm.so+0xe22c2c] JfrArtifactCallbackHost<Klass const*, CompositeFunctor<Klass const*, JfrTypeWriterHost<JfrPredicatedTypeWriterImplHost<Klass const*, SerializePredicate<Klass const*>, &(write__klass(JfrCheckpointWriter*, void const*))>, 183u>, KlassArtifactRegistrator> >::do_artifact(void const*)+0xcc (jfrTypeSet.cpp:398)
V [libjvm.so+0xe14fe0] EpochDispatchOp<JfrEpochQueue::ElementDispatch >::dispatch(bool, unsigned char const*, unsigned long)+0xd0 (jfrTraceIdKlassQueue.hpp:42)
V [libjvm.so+0xe15574] void JfrEpochQueue::iterate(KlassFunctor&, bool)+0x324 (jfrStorageUtils.inline.hpp:160)
V [libjvm.so+0xe13ad0] JfrTraceIdKlassQueue::iterate(void ()(Klass), bool)+0x40 (jfrTraceIdKlassQueue.cpp:269)
V [libjvm.so+0xe220bc] JfrTypeSet::serialize(JfrCheckpointWriter*, JfrCheckpointWriter*, bool, bool)+0xf28 (jfrTypeSet.cpp:559)
V [libjvm.so+0xd64b98] flush_type_set(Thread*)+0x98 (jfrCheckpointManager.cpp:640)
V [libjvm.so+0xd6a8e0] JfrCheckpointManager::flush_type_set()+0x880 (jfrCheckpointManager.cpp:650)
V [libjvm.so+0xde0d68] JfrRecorderService::flush()+0xc8 (jfrRecorderService.cpp:158)
V [libjvm.so+0xde1660] JfrRecorderService::invoke_flush()+0xd0 (jfrRecorderService.cpp:158)
V [libjvm.so+0xde25e0] JfrRecorderService::flushpoint()+0x16c (jfrRecorderService.cpp:680)
V [libjvm.so+0xde3348] recorderthread_entry(JavaThread*, JavaThread*)+0x2c8 (jfrRecorderThreadLoop.cpp:83)
V [libjvm.so+0xd50400] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759)
V [libjvm.so+0x15e59e0] Thread::call_run()+0xac (thread.cpp:234)
V [libjvm.so+0x131fae4] thread_native_entry(Thread*)+0x130 (os_linux.cpp:858)
C [libpthread.so.0+0x7950] start_thread+0x190

@parttimenerd
Copy link
Contributor Author

Are you enqueuing entries that have already been unloaded?

Into the JFR buffers? It is possible that we access the methods of unloaded classes, when they are unloaded between the signal handler enqueue and the capture. I thought that this would crash in the method resolve, which would then be caught via the used ThreadCrashProtection. But apparently not. I'm unsure how to solve this. Maybe I could check whether the methods class is unloaded?

@tstuefe
Copy link
Member

tstuefe commented Oct 31, 2024

Is there any form design document? The description in the JEP is extremely sparse, and this PR carries no information at all.

The ability to accurately and precisely measure CPU consumption was added to Linux in 2.16.12 through a timer that emits signals at regular intervals of time spent on the CPU (as opposed to elapsed, or "wall-clock" time). Most profilers on Linux utilize this mechanism to produce CPU time profiles.

<snip>

This Linux capability should be employed by JFR to safely produce CPU time profilers on the JDK. This functionality will allow the wide audience of developers deploying Java applications on Linux to make them more efficient.

That is all I could find. Is there more?

I am mostly curious - and I admit, nervous - about reentrant safety. I see, again, use of setjmp/longjmp semantics here in combination with a mechanism that I believe stops the JVM at arbitrary(?) times, which to me is a warning sign.

@@ -40,14 +40,14 @@
*/
public class TestNative {

final static String NATIVE_EVENT = EventNames.NativeMethodSample;
static String nativeEvent = EventNames.NativeMethodSample;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed?

@parttimenerd
Copy link
Contributor Author

I am mostly curious - and I admit, nervous - about reentrant safety. I see, again, use of setjmp/longjmp semantics here in combination with a mechanism that I believe stops the JVM at arbitrary(?) times, which to me is a warning sign.

ThreadCrashProtection is not used inside the signal handler but in a separate worker thread.

@mgronlun
Copy link

SIGILL on AArch64

Current thread (0x0000ffff783aa220): JfrCPUTimeThreadSampler "JFR CPU Time Thread Sampler" [id=45622, stack(0x0000fffb01f6c000,0x0000fffb0216a000) (2040K)]

Stack: [0x0000fffb01f6c000,0x0000fffb0216a000], sp=0x0000fffb021683a0, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8439 c2 jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(Ljava/lang/String;)I java.base@24-internal (98 bytes) @ 0x0000ffff68300b4c [0x0000ffff68300d00+0xfffffffffffffe4c]
V [libjvm.so+0x816bd8] JFRRecordSampledThreadCallback::call()+0x18 (jfrCPUTimeThreadSampler.cpp:550)
V [libjvm.so+0xd2e598] ThreadCrashProtection::call(CrashProtectionCallback&)+0x68 (threadCrashProtection_posix.cpp:49)
V [libjvm.so+0x815238] JfrCPUTimeThreadSampler::process_trace_queue()+0x278 (jfrCPUTimeThreadSampler.cpp:590)
V [libjvm.so+0x8154e4] JfrCPUTimeThreadSampler::run()+0x110 (jfrCPUTimeThreadSampler.cpp:530)
V [libjvm.so+0xd2e208] Thread::call_run()+0xa8 (thread.cpp:234)
V [libjvm.so+0xb9ce90] thread_native_entry(Thread*)+0xdc (os_linux.cpp:858)
C [libc.so.6+0x806b8] start_thread+0x2d8

siginfo: si_signo: 4 (SIGILL), si_code: 1 (ILL_ILLOPC), si_addr: 0x0000ffff68300b4c

@mgronlun
Copy link

Linux x64-debug:

jfr/recorder/checkpoint/types/traceid/jfrTraceIdKlassQueue.cpp:190

assert(id > 0) failed: invariant

--------------- T H R E A D ---------------

Current thread (0x00007efe60f3f280): JavaThread "JFR Recorder Thread" daemon [_thread_in_vm, id=3499556, stack(0x00007efeb13f5000,0x00007efeb14f5000) (1024K)]

Stack: [0x00007efeb13f5000,0x00007efeb14f5000], sp=0x00007efeb14f37d0, free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xfcd274] EpochDispatchOp<JfrEpochQueue::ElementDispatch >::dispatch(bool, unsigned char const*, unsigned long)+0x194 (jfrTraceIdKlassQueue.cpp:190)
V [libjvm.so+0xfcd75e] void JfrEpochQueue::iterate(KlassFunctor&, bool)+0x37e (jfrStorageUtils.inline.hpp:160)
V [libjvm.so+0xfcbc88] JfrTraceIdKlassQueue::iterate(void ()(Klass), bool)+0x38 (jfrTraceIdKlassQueue.cpp:269)
V [libjvm.so+0xfda2f4] JfrTypeSet::serialize(JfrCheckpointWriter*, JfrCheckpointWriter*, bool, bool)+0xfb4 (jfrTypeSet.cpp:559)
V [libjvm.so+0xf11b6d] flush_type_set(Thread*)+0x8d (jfrCheckpointManager.cpp:640)
V [libjvm.so+0xf16a36] JfrCheckpointManager::flush_type_set()+0x116 (jfrCheckpointManager.cpp:650)
V [libjvm.so+0xf944a6] JfrRecorderService::flush()+0xe6 (jfrRecorderService.cpp:158)
V [libjvm.so+0xf94e2a] JfrRecorderService::invoke_flush()+0xba (jfrRecorderService.cpp:158)
V [libjvm.so+0xf95dfa] JfrRecorderService::flushpoint()+0x10a (jfrRecorderService.cpp:680)
V [libjvm.so+0xf96b38] recorderthread_entry(JavaThread*, JavaThread*)+0x178 (jfrRecorderThreadLoop.cpp:83)
V [libjvm.so+0xefc16c] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759)
V [libjvm.so+0x1836106] Thread::call_run()+0xb6 (thread.cpp:234)
V [libjvm.so+0x15144a8] thread_native_entry(Thread*)+0x128 (os_linux.cpp:858)

@parttimenerd
Copy link
Contributor Author

@mgronlun what is your reproducer? I didn't see these issues before in my own testing

@mgronlun
Copy link

@mgronlun what is your reproducer? I didn't see these issues before in my own testing

Internal stress testing.

@mgronlun
Copy link

linux-aarch64-debug

--------------- T H R E A D ---------------

Current thread (0x0000ffffac4222b0): JavaThread "JFR Recorder Thread" daemon [_thread_in_vm, id=144094, stack(0x0000ffff5e17a000,0x0000ffff5e378000) (2040K)]

Stack: [0x0000ffff5e17a000,0x0000ffff5e378000], sp=0x0000ffff5e376110, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xe19e54] get_cld(Klass const*)+0x34 (klass.hpp:684)
V [libjvm.so+0xe1c674] write_klass(JfrCheckpointWriter*, Klass const*, bool, int&)+0x70 (jfrTypeSet.cpp:370)
V [libjvm.so+0xe22d40] JfrArtifactCallbackHost<Klass const*, CompositeFunctor<Klass const*, CompositeFunctor<Klass const*, JfrTypeWriterHost<JfrPredicatedTypeWriterImplHost<Klass const*, LeakPredicate<Klass const*>, &(write__klass__leakp(JfrCheckpointWriter*, void const*))>, 183u>, JfrTypeWriterHost<JfrPredicatedTypeWriterImplHost<Klass const*, SerializePredicate<Klass const*>, &(write__klass(JfrCheckpointWriter*, void const*))>, 183u> >, KlassArtifactRegistrator> >::do_artifact(void const*)+0x90 (jfrTypeSet.cpp:398)
V [libjvm.so+0xe14fe0] EpochDispatchOp<JfrEpochQueue::ElementDispatch >::dispatch(bool, unsigned char const*, unsigned long)+0xd0 (jfrTraceIdKlassQueue.hpp:42)
V [libjvm.so+0xe15328] void JfrEpochQueue::iterate(KlassFunctor&, bool)+0xd8 (jfrStorageUtils.inline.hpp:160)
V [libjvm.so+0xe13ad0] JfrTraceIdKlassQueue::iterate(void ()(Klass), bool)+0x40 (jfrTraceIdKlassQueue.cpp:269)
V [libjvm.so+0xe21358] JfrTypeSet::serialize(JfrCheckpointWriter*, JfrCheckpointWriter*, bool, bool)+0x1c4 (jfrTypeSet.cpp:565)
V [libjvm.so+0xd6baf4] JfrCheckpointManager::write_type_set()+0x1a0 (jfrCheckpointManager.cpp:617)
V [libjvm.so+0xddfa54] JfrRecorderService::post_safepoint_write()+0xb0 (jfrRecorderService.cpp:613)
V [libjvm.so+0xde0bdc] JfrRecorderService::chunk_rotation()+0x3c (jfrRecorderService.cpp:573)
V [libjvm.so+0xde2104] JfrRecorderService::rotate(int)+0x100 (jfrRecorderService.cpp:539)
V [libjvm.so+0xde332c] recorderthread_entry(JavaThread*, JavaThread*)+0x2ac (jfrRecorderThreadLoop.cpp:81)
V [libjvm.so+0xd50400] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:759)
V [libjvm.so+0x15e59e0] Thread::call_run()+0xac (thread.cpp:234)
V [libjvm.so+0x131fae4] thread_native_entry(Thread*)+0x130 (os_linux.cpp:858)
C [libc.so.6+0x806b8] start_thread+0x2d8

@parttimenerd
Copy link
Contributor Author

I see a pattern in the last two segfaults, it seems to conform to what I wrote before. So I probably need a test case that unloads a lot of classes.
Do you share my intuition?

Comment on lines +210 to +221
frame topframe;
if (!jt->pd_get_top_frame_for_signal_handler(&topframe, ucontext, false)) {
_error = ERROR_NO_TOPFRAME;
return;
}
frame first_java_frame;
Method* method = nullptr;
JfrGetCallTrace gct(false, jt);
if (!gct.find_top_frame(topframe, &method, first_java_frame)) {
_error = ERROR_NO_TOPFRAME;
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we end up with the same top frame if we started at the anchor frame instead? In fact, the frame we want should always be the immediate caller of the anchor frame. But I guess we need to check that the anchor frame is walkable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not always the case, as far as I know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not always the case, as far as I know.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be ok to start from the anchor frame.

Please see JfrNativeSamplerCallback::call().

#if defined(LINUX)
#include "signals_posix.hpp"

const int64_t AUTOADAPT_INTERVAL_MS = 100;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

}
}

const int SIG = SIGPROF;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

}


int64_t compute_sampling_period(double rate) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static


static bool _showed_warning = false;

void warn() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

// none of it is safe
return false;
}
const traceid mid = JfrTraceId::load(frame._method);
Copy link

@mgronlun mgronlun Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is JfrTraceId::load() barrier code:

inline traceid JfrTraceIdLoadBarrier::load(const Klass* klass, const Method* method) {
assert(klass != nullptr, "invariant");
assert(method != nullptr, "invariant");
if (should_tag(method)) {
SET_METHOD_AND_CLASS_USED_THIS_EPOCH(klass);
SET_METHOD_FLAG_USED_THIS_EPOCH(method);
assert(METHOD_AND_CLASS_USED_THIS_EPOCH(klass), "invariant");
assert(METHOD_FLAG_USED_THIS_EPOCH(method), "invariant");
enqueue(klass);
JfrTraceIdEpoch::set_changed_tag_state();
}
return (METHOD_ID(klass, method));
}

What happens if the sampler crashes at one of these routines?

  1. Sets bits in the Klass, but not in the Method, and does not enqueue?
  2. Set bits in the Klass and Method, but does not enqueue?
  3. Sets bits in Klass, Method and enqueues, but does not notify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really good point.

FYI: Today is a bank holiday in Germany, so I don't work on the PR today, but it'll be my main focus in the following weeks

@jbachorik
Copy link

Stack: [0x0000fffb01f6c000,0x0000fffb0216a000], sp=0x0000fffb021683a0, free space=2032k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J 8439 c2 jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(Ljava/lang/String;)I java.base@24-internal (98 bytes) @ 0x0000ffff68300b4c [0x0000ffff68300d00+0xfffffffffffffe4c]
V [libjvm.so+0x816bd8] JFRRecordSampledThreadCallback::call()+0x18 (jfrCPUTimeThreadSampler.cpp:550)

Am I reading this right that the callback does upcall to Java where it modifies bytecode via ASM?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot [email protected] rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

8 participants