Monitoring - We've identified the root cause of the issue and have mitigated it. We have also kicked off a backfill that will run over the weekend, and we are aiming to have all events back in order and up to date by Monday morning. Expect updates on the progress of the backfill of missing events over the weekend. Thank you all for your patience, and we hope you enjoy the rest of your Friday and the weekend!
Jul 19, 2024 - 22:53 UTC
Update - We are continuing to investigate and are close to understanding the cause of the event ingestion problem. It seems the root cause is not in the Kafka table engine but in our write path to the distributed tables (a rough sketch of that path is included below for context).

Event ingestion has resumed, but it is running slowly to avoid events disappearing, so there will be ingestion lag for a few hours. We are working on pushing another patch to fix the lag.

After that is solved, we'll start the event backfill for the missing dates.
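
For context, a typical ClickHouse ingestion pipeline of this shape has a Kafka-engine table feeding, via a materialized view, a Distributed table that fans writes out to sharded local tables. The sketch below shows that general shape using the clickhouse-driver Python package; all table, topic, and cluster names are placeholders for illustration, not our actual schema.

    from clickhouse_driver import Client  # third-party package: clickhouse-driver

    client = Client(host="localhost")  # placeholder host

    # Kafka engine table: ClickHouse consumes raw events directly from the topic.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_queue (
            uuid String, event String, properties String, timestamp DateTime64(6)
        ) ENGINE = Kafka
        SETTINGS kafka_broker_list = 'kafka:9092', kafka_topic_list = 'events',
                 kafka_group_name = 'clickhouse-events', kafka_format = 'JSONEachRow'
    """)

    # Sharded local storage table.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_local (
            uuid String, event String, properties String, timestamp DateTime64(6)
        ) ENGINE = MergeTree ORDER BY (event, timestamp)
    """)

    # Distributed table that fans writes out to the shards; the "write path to the
    # distributed tables" is the hop into a table like this.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_distributed AS events_local
        ENGINE = Distributed('events_cluster', currentDatabase(), 'events_local', rand())
    """)

    # Materialized view that moves rows from the Kafka table into the distributed table.
    client.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS events_mv TO events_distributed AS
        SELECT uuid, event, properties, timestamp FROM events_queue
    """)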

Jul 19, 2024 - 11:36 UTC
Update - We are investigating an issue with our Kafka table engines and have purposely induced lag in our pipeline. All events are safe and will show up after this investigation, but for the moment we will fall behind on processing events and you will notice the last few hours missing from your reporting.
Jul 18, 2024 - 16:51 UTC
Update - We have started event recovery.

Data may be missing since 2024-07-17 at 21:00 UTC. The missing events will eventually be available for querying.

We are now working on pushing a fix to avoid this happening again.

Jul 18, 2024 - 13:29 UTC
Investigating - We've spotted that ingested event volumes are lower than expected. We are identifying the root cause of the issue.

No data has been lost, and we are already drawing up a plan to recover it and identifying the impacted dates.

Jul 18, 2024 - 12:34 UTC
PostHog.com: Operational (100.0% uptime over the past 90 days)
US Cloud 🇺🇸: Operational (99.96% uptime over the past 90 days)
  App: Operational (99.92% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.97% uptime)
EU Cloud 🇪🇺: Operational (99.98% uptime over the past 90 days)
  App: Operational (99.97% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.99% uptime)
Support APIs: Operational (100.0% uptime over the past 90 days)
  Update Service: Operational (100.0% uptime)
  License Server: Operational (100.0% uptime)
AWS US 🇺🇸: Operational
  AWS ec2-us-east-1: Operational
  AWS elb-us-east-1: Operational
  AWS rds-us-east-1: Operational
  AWS elasticache-us-east-1: Operational
  AWS kafka-us-east-1: Operational
AWS EU 🇪🇺: Operational
  AWS elb-eu-central-1: Operational
  AWS elasticache-eu-central-1: Operational
  AWS rds-eu-central-1: Operational
  AWS ec2-eu-central-1: Operational
  AWS kafka-eu-central-1: Operational
System metrics: US Ingestion End to End Time, US Decide Endpoint Response Time, US App Response Time, US Event/Data Ingestion Response Time, EU Ingestion End to End Time, EU App Response Time, EU Decide Endpoint Response Time, EU Event/Data Ingestion Endpoint Response Time.
Past Incidents
Jul 20, 2024

No incidents reported today.

Jul 19, 2024
Resolved - Ingestion lag has recovered.
Jul 19, 03:44 UTC
Monitoring - The issue has been fixed and the ingestion lag is recovering.
Jul 19, 02:06 UTC
Update - We've identified the issue and we are currently fixing it to re-enable ingestion.
Jul 18, 22:36 UTC
Update - We're monitoring the recovery of Postgres now. Some tables are very large and this might take several hours; we're investigating whether we can work some magic to speed this up.

Current impact is still that event ingestion is delayed.

Note that this means person updates aren't being processed, so any experiments or flags that rely on changes to person profiles won't see those changes until the event lag is resolved.

Jul 18, 18:19 UTC
Identified - We've spotted an issue with our Postgres infrastructure and we're working to resolve it right now.

You'll experience ingestion lag since we're delaying event ingestion to reduce load while we fix this. No events have been lost.

We'll update with an expected time to recovery as soon as we have one. Sorry for the interruption.

Jul 18, 17:25 UTC
Jul 18, 2024
Jul 17, 2024

No incidents reported.

Jul 16, 2024
Resolved - This incident has been resolved.
Jul 16, 14:07 UTC
Monitoring - Recovery is continuing well and we expect to be caught up within an hour.

Sorry for the interruption!

Jul 16, 13:02 UTC
Identified - We've restarted our recordings ingestion infrastructure and ingestion is recovering. Folks will be experiencing between 40 and 90 minutes of delay, but that's already recovering quickly.

We're still looking into the root cause.

Jul 16, 12:38 UTC
Investigating - We've spotted that recordings ingestion is delayed. We're investigating to identify why.
Jul 16, 12:21 UTC
Jul 15, 2024

No incidents reported.

Jul 14, 2024
Resolved - Workers are processing queries as they arrive. All systems nominal.
Jul 14, 19:53 UTC
Monitoring - We have restarted the failed workers - queries are back to normal now.
Jul 14, 17:28 UTC
Identified - Async queries are failing - we are restarting the workers now.
Jul 14, 17:15 UTC
Investigating - Queries are timing out in the EU region; we are looking into what's going on.
Jul 14, 16:51 UTC
Jul 13, 2024

No incidents reported.

Jul 12, 2024

No incidents reported.

Jul 11, 2024
Resolved - This incident has been resolved.
Jul 11, 14:36 UTC
Update - We've downgraded this incident and marked ingestion as operational now that we have duplicate ingestion infrastructure.

Replay is working normally and we are continuing to process the delayed recordings.

Jul 10, 13:23 UTC
Update - We've duplicated our ingestion infrastructure so that we can protect current recordings from the delay.

You should no longer see delay on the ingestion of current recordings.

We'll continue to ingest the delayed recordings in the background.

Jul 10, 11:19 UTC
Update - We're continuing to work on increasing ingestion throughput.
Sorry for the continued interruption.

Jul 10, 09:25 UTC
Update - We're continuing to slowly catch up with ingestion. We're being a little cautious, as we don't want to overwhelm Kafka while we're making solid progress.

We appreciate that delays like this are super frustrating, and we're really grateful for your patience 🙏

Jul 9, 14:00 UTC
Update - We've continued to monitor ingestion overnight. Some Kafka partitions are completely caught up, so some people won't experience any delay (per-partition lag is sketched below).

Unfortunately, others are still lagging, so you will still see delayed availability of recordings.

Really sorry for the continued interruption!
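
For those curious, per-partition consumer lag is just the difference between a partition's latest offset and the consumer group's committed offset for it, which is why some partitions can be fully caught up while others are behind. Below is a minimal sketch of that check using the kafka-python package; the broker, topic, and consumer group names are placeholders, not our production configuration.

    from kafka import KafkaConsumer, TopicPartition  # third-party package: kafka-python

    # Placeholder broker/topic/group names, for illustration only.
    consumer = KafkaConsumer(
        bootstrap_servers="kafka:9092",
        group_id="recordings-ingester",
        enable_auto_commit=False,
    )

    topic = "session_recordings"
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)  # latest offset per partition

    for tp in sorted(partitions, key=lambda p: p.partition):
        committed = consumer.committed(tp) or 0  # group's committed offset (None if never committed)
        lag = end_offsets[tp] - committed
        print(f"partition {tp.partition}: lag={lag}")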

Jul 9, 05:56 UTC
Update - We're continuing to monitor recovery. Apologies for the delay!
Jul 8, 18:13 UTC
Monitoring - We've confirmed that the config rollback has resolved the problem, but we've kept ingestion throttled to ensure systems can recover.

We're slowly increasing the ingestion rate to allow recovery and will keep monitoring.

Sorry for the interruption.

Jul 8, 14:05 UTC
Identified - A recent config change has unexpectedly impacted processing speed during the ingestion of recordings.

The change has been rolled back and we're monitoring for recovery.

Jul 8, 11:31 UTC
Jul 10, 2024
Jul 9, 2024
Resolved - Planned maintenance is complete.
Jul 9, 21:56 UTC
Identified - We are doing an upgrade of our data processing infrastructure in the EU region. There will be temporary processing delays. No data has been lost and the system should be caught up shortly.
Jul 9, 20:10 UTC
Jul 8, 2024
Jul 7, 2024

No incidents reported.

Jul 6, 2024

No incidents reported.