
windows precompiled version not working #894

Closed
vipcxj opened this issue Sep 30, 2024 · 5 comments

@vipcxj

vipcxj commented Sep 30, 2024

I downloaded windows-0.11.1.zip, which contains several executables. I double-clicked tracy-profiler.exe and nothing happened; I ran several of the other programs and still nothing happened. Running tracy-profiler.exe from the command line returned immediately with no output.
My Windows version is Win10 22H2 (19045.4894).

@wolfpld
Owner

wolfpld commented Sep 30, 2024

#887

@wolfpld wolfpld closed this as completed Sep 30, 2024
@vipcxj
Author

vipcxj commented Sep 30, 2024

@wolfpld You are right. After installing the Visual C++ Redistributable, it works. By the way, does Tracy support C++20 coroutines? I use coroutines with Asio, but I can't find any examples of Asio with coroutines, only one for fibers. I don't use fibers, and I don't even see the co_await and co_return keywords in that example. I tried using ZoneScoped directly, but it reports that ZoneEnd executed twice. The documentation says ZoneScoped is based on the RAII mechanism, which as I recall also works in coroutines, so I don't know what happened.

@wolfpld
Owner

wolfpld commented Sep 30, 2024

Tracy "fibers" provide a general mechanism for async tasks. If you can push instrumentation around the C++20 facilities, things should work.

@vipcxj
Author

vipcxj commented Sep 30, 2024

Here is my code that starts the asio io_context tasks:

            auto m_io_pool = std::make_shared<BS::thread_pool>();
            for (size_t i = 0; i < m_io_pool->get_thread_count(); i++)
            {
                m_io_pool->detach_task([self = shared_from_this()]() {
                    auto guard = asio::make_work_guard(self->m_io_ctx.get_executor());
                    try
                    {
                        self->m_io_ctx.run();
                    }
                    catch(...)
                    {
                        self->m_logger->error(cfgo::what());
                    }
                    CFGO_SELF_DEBUG("io ctx completed in thread {}", std::this_thread::get_id());
                });
            }

How can I make Tracy work with this? Currently I'm hoping to use Tracy to measure the execution times of my various functions to find out where exactly I'm stuck.

@vipcxj
Author

vipcxj commented Sep 30, 2024

Here is my code:

        auto Device::_ready_loop(const cfgo::close_chan & closer) -> asio::awaitable<void>
        {
            ...
            do
            {
                {
                    ZoneNamed(wait_not_full, true);
                    do
                    {
                        auto ch = m_ready_maybe_not_full_notifier.make_notfiy_receiver();
                        {
                            std::lock_guard lk(m_ready_mutex);
                            if (!_ready_full())
                            {
                                break;
                            }
                        }
                        co_await cfgo::chan_read_or_throw<void>(ch, closer);
                    } while (true);
                }
                // only the ready loop can make ready full, so since ready is not full here, it will stay not full until the end of the loop.
                {
                    ZoneNamed(lock_blocks, true);
                    co_await prom::measure_time<void>(
                        cfgo::fix_async_lambda([self, closer]() -> asio::awaitable<void> {
                            return self->m_block_manager.lock(std::move(closer));
                        }),
                        [prom_enabled](prom::duration_t time) {
                            if (prom_enabled)
                            {
                                auto & metrics = sr::Manager::instance().metrics();
                                metrics.m_infer_task_block_time_hist.Observe(std::chrono::duration_cast<std::chrono::milliseconds>(time).count());
                            }
                        }
                    );
                }
                DEFER({
                    m_block_manager.unlock();
                });
                {
                    ZoneNamed(copy_ready_data, true);
                    m_block_manager.collect_locked_blocker(locked_blockers);
                    int batches = locked_blockers.size();
                    assert(batches <= conf->ai_target_batch());
                    m_logger->trace("{} sample locked, target batch: {}", batches, conf->ai_target_batch());
                    if (prom_enabled)
                    {
                        auto & metrics = sr::Manager::instance().metrics();
                        metrics.m_ai_infer_batches_hist.Observe(batches);
                    }
                    if (locked_blockers.empty())
                    {
                        continue;
                    }
                    assert(!_ready_full());
                    std::uint32_t write_slot = m_ready_tail_offset;
                    m_ready_metas[write_slot] = Batch {};
                    auto & batch = m_ready_metas[write_slot];

                    TIMED_CUDA_DECLARE;
                    START_CUDA_CONTEXT(m_context);
                    TIMED_CUDA_START(m_ready_stream);

                    for (int i = 0; i < batches; ++i)
                    {
                        auto & blocker = locked_blockers[i];
                        auto meta = std::static_pointer_cast<sr::DataPrepareTask::Meta>(blocker.get_pointer_user_data());
                        batch->add_meta(meta);
                        sr::cuda::copy_ai_input(
                            conf->ai_frame_width(), conf->ai_frame_height(), batches,
                            m_prepare_areas, meta->slot(), meta->offset(),
                            m_ready_areas, i, write_slot,
                            m_ready_stream
                        );
                    }

                    TIMED_CUDA_END(m_ready_stream);
                    SYNC_CUDA_STREAM_ASYNC(m_ready_stream, Device, m_ready_sync_ch);
                    END_CUDA_CONTEXT;
                    CUresult cu_res = co_await chan_read_or_throw<CUresult>(m_ready_sync_ch, closer);
                    checkCuda(cu_res);
                    m_logger->trace("Copying batch frames to ready areas use {} ms", TIMED_CUDA_GET());

                    {
                        std::lock_guard lk(m_ready_mutex);
                        assert(write_slot == m_ready_tail_offset);
                        assert(!_ready_full());
                        m_ready_tail_offset = (m_ready_tail_offset + 1) % m_ready_metas.size();
                        if (m_ready_tail_offset == m_ready_head_offset)
                        {
                            m_ready_full = true;
                        }
                    }
                    m_ready_maybe_not_empty_notifier.notify();
                }
                FrameMarkNamed(m_device_name.c_str());
            } while (true);
        }

If there is only the wait_not_full zone, it works. However, when I add the lock_blocks and copy_ready_data zones, the profiler shows "Zone is ended twice". I think every scope has only one zone, so it should end only once.
