⚡ Is your application squeezing the most out of your CPUs? Or are you leaving performance and reliability gains on the table? Learn some new tips and tricks. Join the live recording of our podcast with Denis Bakhvalov, performance ninja at Intel Corporation 🥷🏻 and author of Performance Analysis and Tuning on Modern CPUs. Register here: https://lu.ma/pmp14hq7 👈
Hosted by Tony Meehan, co-founder of Prequel.dev, the session will include live Q&A and dive into:
- Emerging CPU trends and their impact on application performance
- The state of compilers in optimizing software for modern hardware
- Biggest performance engineering misconceptions
- Denis's journey into performance engineering
- Best practices for benchmarking
- Navigating tradeoffs between performance and scalability
#reliability #performance #sre #infrastructure #cpu #observability
Detect.sh - Reliability Community
Technology, Information and Internet
The Open Problem Detection (and Resolution) Community
About us
The only community focused on the art and science of problem detection and troubleshooting in modern software applications.
- Website
- http://detect.sh
- Industry
- Technology, Information and Internet
- Company size
- 51-200 employees
- Type
- Privately Held
Updates
-
🤩 Lorenzo Fontana, co-author of Linux Observability with BPF, reveals the problem that led him to eBPF. Don't miss this and other great stories in the full detect podcast episode. Link in the comments: 👇 #ebpf #kubernetes #linux @fntlnz Hosted by: @bugstony @influxdata
-
🚨 September's newsletter is live! 🚨 Packed with real-world debugging stories, major incidents at OpenAI, Anthropic, HubSpot, and Google, a hidden bug in Kafka, and more. 📖 Real Problem Detection & Troubleshooting Stories: Sharpen your technical skills with these deep dives. 🛠️ eBPF: Learn how engineers are using it for Linux introspection and for faster, more efficient problem detection across distributed applications. ⚠️ Major Incidents from OpenAI, Anthropic, HubSpot, and Google: These case studies shed light on what happens when things go wrong and the innovative approaches these teams are taking to ensure reliability and resilience. 💡 Whether you're dealing with distributed systems, observability, or incident management, this edition has something valuable for you. Link in the comments 👇 Shoutout to the following experts for being featured in this month's issue! Dan Slimmon Lorenzo Fontana Guillaume Mallet #srecon Bert Schiettecatte Swizec Teller Andrea Bergia Dominik Czarnota Denis Isaev Tomer Aberbach Rain, Rachel Kroll, Amnon Cohen, Alex Ewerlöf, Jade Rubick, John Allspaw Ben Linders Ajinkya Ghadge Uğur Erdem Seyfi Jean-Mark Wright Richard Artoul Katerina Petrova Ian Hoffman Michael Demmer Matheus Lichtnow Madan Thangavelu Steef-Jan Wiggers Catherine "Kyren" West Arthur O'Dwyer Tony Meehan #sre #softwareengineering #engineering #observability #platformengineering #problemdetection #troubleshooting #reliability #scalability
-
😫 Overwhelmed by errors and alerts? 🚨 Error backlog putting your SLOs at risk? 👇 We know the pain of watching your errors grow while trying to keep up with everything else. But there’s hope! Our latest blog post, contributed by community member Dan Slimmon (formerly of Etsy and HashiCorp), provides practical tips to help you regain control. Don’t let those errors weigh you down any longer. Find out how to tackle them head-on. Read more here: https://lnkd.in/eX4ggsdr #sre #sitereliabilityengineering #platformengineering #softwareengineering #incidentmanagment #incident #slo #observability
-
Detect.sh - Reliability Community reposted this
❓Curious about eBPF and why so many engineers are learning it? 🤔 Wondering how to get started? Join us for a conversation with Lorenzo Fontana, Co-author of O'Reilly's Linux Observability with BPF. Hosted by detect community member Tony Meehan. 📅 September 4th at 12 PM ET Register here 👉 https://lnkd.in/e9Zgb9hB Presented by Prequel.dev Shout out to a few of our newest community members: Olivier Mwanza Tshibemba Maya Li Dan Slimmon Joseph Hardeman Tudor Golubenco Robert Austin Ellora P. #linux #observability #eBPF #CNCF #cloudnative #kubernetes #Webinar #cloud #security #sre #softwareengineering #engineering #reliability
-
🚨 Our latest newsletter is out (https://lnkd.in/ejaq6vth). We’re diving deep into: - The hidden bug of the month 🐛 - Root cause of CrowdStrike’s $5B outage 💥 - Incidents at Cloudflare and GitHub 🔍 Plus, insights on memory profiling, Kafka monitoring, and more! 💡 Stay ahead of the curve and arm yourself with the knowledge to tackle reliability challenges. 👉 Read more: https://lnkd.in/ejaq6vth #sre #sitereliabilityengineering #platformengineering #softwareengineering #incidentmanagment #incident #slo #observability
-
eBPF is a game changer. Join us for our next webinar featuring Lorenzo F., co-author of O'Reilly's Linux Observability with BPF. 👉 Register here: https://lnkd.in/e9TRG_VP
Hosted by CTO (and detect member) Tony Meehan, the webinar will dive into:
📌 eBPF's origin story and relationship to BPF
📌 Lorenzo's experience with eBPF as the former maintainer of Cloud Native Computing Foundation (CNCF)'s Falco and IOVisor's kubectl-trace
📌 Lessons learned applying eBPF to observability and security
📌 Maintaining open source projects
📌 Building and troubleshooting reliable applications in 2024
📅 September 4th at 12 PM ET
🎙️ About the Speaker: Lorenzo F. is a renowned expert in Linux observability and security, bringing a wealth of knowledge and practical insights. Don't miss this opportunity to learn from one of the best in the field. Register now and secure your spot! Register here: https://lnkd.in/e9TRG_VP Presented by Prequel.dev. #linux #observability #eBPF #CNCF #cloudnative #kubernetes #Webinar #cloud #security #sre #softwareengineering #engineering #reliability
Revealing Linux's Secrets with EBPF · Zoom · Luma
-
🔥🧯 For SREs, knowing "What" is broken can be easy, but figuring out "Why" is often hard. Community member Amin Astaneh, formerly of Meta (currently at Certo Modo), breaks down the distinction. #sre #sitereliabilityengineering #platformengineering #softwareengineering #incidentmanagment #incident
-
🚨 Our July Newsletter is out. 🔥 Incidents at Google, Cloudflare, GitHub, and OpenAI; 🛠️ getting a handle on flaky alerts; and more in our latest newsletter.
News 📰
• Retired Engineer Discovers 55-Year-Old Bug in Lunar Lander Computer Game Code - An incredible story of persistence and discovery in an old classic game.
• Intel Raptor Lake Crash Fix - An important update for those working with Intel's Raptor Lake, addressing a critical crash issue.
Link in comments. 👇
Blogs 📝
• Fine-Tuning MySQL Performance - Tips and techniques for resolving MySQL issues and improving performance.
• Elixir Code Anti-Patterns - Improve your Elixir code by avoiding common pitfalls and following best practices.
• Kubernetes Tip: What Happens to Pods on Unreachable Nodes? - Essential tips for managing Kubernetes pods on unreachable nodes.
• TempleOS Reverse Engineering Part I - A deep dive into the reverse engineering of TempleOS.
Link in comments. 👇
Notable Incidents 🔥
• LastPass Incident - Analysis of a recent security incident at LastPass.
• Cloudflare Incident - 3x increase in p99 latency. Detailed breakdown of Cloudflare's incident and how it was resolved.
• GitHub Incident - Examination of a recent outage at GitHub and the root cause.
• EntryWan Incident Postmortem - In-depth analysis of the recent incident at EntryWan.
• Google Search Bug with Indexing - Insights into a Google Search bug affecting indexing.
• OpenAI ChatGPT Outage - Details on the recent ChatGPT outage and what was learned.
• Bitbucket Incident - Information on the recent service disruption at Bitbucket affecting pipelines.
Link in comments. 👇
Architecture 📐👷‍♀️
• Improving Push Processing on GitHub - How GitHub enhanced their push processing system.
• Stripe's Zero Downtime Data Migrations - Learn about Stripe's approach to maintaining uptime during data migrations.
• Flaky Alerts Are Saying Something - Understanding what flaky alerts indicate about your system.
• How eBPF is Shaping the Future of Linux and Platform Engineering - Discover the impact of eBPF on Linux and platform engineering.
Link in comments. 👇
Tools 🛠️
• Postgres-BPFTrace - A useful tool for tracing PostgreSQL using BPF.
• Enhanced Debugging with Console.log - Beginner to advanced tips for making the most out of console.log in your debugging process.
Link in comments. 👇