
windows precompiled version not working #894

Closed
vipcxj opened this issue Sep 30, 2024 · 5 comments

@vipcxj

vipcxj commented Sep 30, 2024

I downloaded windows-0.11.1.zip, which contains several executables. I double-clicked tracy-profiler.exe and nothing happened; I ran several of the other programs and still nothing happened. Running tracy-profiler.exe from the command line returned immediately with no output.
My Windows version is Win10 22H2 (19045.4894).

@wolfpld
Owner

wolfpld commented Sep 30, 2024

#887

@wolfpld wolfpld closed this as completed Sep 30, 2024
@vipcxj
Author

vipcxj commented Sep 30, 2024

@wolfpld You are right. After installing the Visual C++ Redistributable, it works. By the way, does Tracy support C++20 coroutines? I use coroutines with Asio, but I can't find any examples of Asio with coroutines, only one for fibers. I don't use fibers, and I don't even see the co_await and co_return keywords in that example. I tried using ZoneScoped directly, but it reports that ZoneEnd executed twice. The documentation says ZoneScoped is based on the RAII mechanism, which as I recall also works in coroutines, so I don't know what happened.

@wolfpld
Owner

wolfpld commented Sep 30, 2024

Tracy "fibers" provide a general mechanism for async tasks. If you can push instrumentation around the C++20 facilities, things should work.

@vipcxj
Author

vipcxj commented Sep 30, 2024

Here is my code that starts the asio io_context tasks:

            auto m_io_pool = std::make_shared<BS::thread_pool>();
            for (size_t i = 0; i < m_io_pool->get_thread_count(); i++)
            {
                m_io_pool->detach_task([self = shared_from_this()]() {
                    auto guard = asio::make_work_guard(self->m_io_ctx.get_executor());
                    try
                    {
                        self->m_io_ctx.run();
                    }
                    catch(...)
                    {
                        self->m_logger->error(cfgo::what());
                    }
                    CFGO_SELF_DEBUG("io ctx completed in thread {}", std::this_thread::get_id());
                });
            }

How can I make Tracy work with this? Currently I'm hoping to use Tracy to measure the execution times of my various functions to find out where exactly I'm stuck.

@vipcxj
Author

vipcxj commented Sep 30, 2024

Here is my code:

        auto Device::_ready_loop(const cfgo::close_chan & closer) -> asio::awaitable<void>
        {
            ...
            do
            {
                {
                    ZoneNamed(wait_not_full, true);
                    do
                    {
                        auto ch = m_ready_maybe_not_full_notifier.make_notfiy_receiver();
                        {
                            std::lock_guard lk(m_ready_mutex);
                            if (!_ready_full())
                            {
                                break;
                            }
                        }
                        co_await cfgo::chan_read_or_throw<void>(ch, closer);
                    } while (true);
                }
                // only the ready loop can make ready full, so since ready is not full here, it will stay not full until the end of the loop.
                {
                    ZoneNamed(lock_blocks, true);
                    co_await prom::measure_time<void>(
                        cfgo::fix_async_lambda([self, closer]() -> asio::awaitable<void> {
                            return self->m_block_manager.lock(std::move(closer));
                        }),
                        [prom_enabled](prom::duration_t time) {
                            if (prom_enabled)
                            {
                                auto & metrics = sr::Manager::instance().metrics();
                                metrics.m_infer_task_block_time_hist.Observe(std::chrono::duration_cast<std::chrono::milliseconds>(time).count());
                            }
                        }
                    );
                }
                DEFER({
                    m_block_manager.unlock();
                });
                {
                    ZoneNamed(copy_ready_data, true);
                    m_block_manager.collect_locked_blocker(locked_blockers);
                    int batches = locked_blockers.size();
                    assert(batches <= conf->ai_target_batch());
                    m_logger->trace("{} sample locked, target batch: {}", batches, conf->ai_target_batch());
                    if (prom_enabled)
                    {
                        auto & metrics = sr::Manager::instance().metrics();
                        metrics.m_ai_infer_batches_hist.Observe(batches);
                    }
                    if (locked_blockers.empty())
                    {
                        continue;
                    }
                    assert(!_ready_full());
                    std::uint32_t write_slot = m_ready_tail_offset;
                    m_ready_metas[write_slot] = Batch {};
                    auto & batch = m_ready_metas[write_slot];

                    TIMED_CUDA_DECLARE;
                    START_CUDA_CONTEXT(m_context);
                    TIMED_CUDA_START(m_ready_stream);

                    for (int i = 0; i < batches; ++i)
                    {
                        auto & blocker = locked_blockers[i];
                        auto meta = std::static_pointer_cast<sr::DataPrepareTask::Meta>(blocker.get_pointer_user_data());
                        batch->add_meta(meta);
                        sr::cuda::copy_ai_input(
                            conf->ai_frame_width(), conf->ai_frame_height(), batches,
                            m_prepare_areas, meta->slot(), meta->offset(),
                            m_ready_areas, i, write_slot,
                            m_ready_stream
                        );
                    }

                    TIMED_CUDA_END(m_ready_stream);
                    SYNC_CUDA_STREAM_ASYNC(m_ready_stream, Device, m_ready_sync_ch);
                    END_CUDA_CONTEXT;
                    CUresult cu_res = co_await chan_read_or_throw<CUresult>(m_ready_sync_ch, closer);
                    checkCuda(cu_res);
                    m_logger->trace("Copying batch frames to ready areas use {} ms", TIMED_CUDA_GET());

                    {
                        std::lock_guard lk(m_ready_mutex);
                        assert(write_slot == m_ready_tail_offset);
                        assert(!_ready_full());
                        m_ready_tail_offset = (m_ready_tail_offset + 1) % m_ready_metas.size();
                        if (m_ready_tail_offset == m_ready_head_offset)
                        {
                            m_ready_full = true;
                        }
                    }
                    m_ready_maybe_not_empty_notifier.notify();
                }
                FrameMarkNamed(m_device_name.c_str());
            } while (true);
        }

If there is only the wait_not_full zone, it works. However, when I add the lock_blocks and copy_ready_data zones, the profiler shows "Zone is ended twice". I think every scope has only one zone, so it should end only once.
