Monitoring - We've identified the root cause of the issue and have mitigated it. We have also kicked off a backfill that will run over the weekend, and we are aiming to have all events back in order and up to date by Monday morning. Expect updates on the progress of the backfill of missing events over the weekend. Thank you all for your patience, and we hope you enjoy the rest of your Friday and the weekend!
Jul 19, 2024 - 22:53 UTC
Update - We are continuing to investigate and are close to understanding the cause of the event ingestion problem. It seems the root cause is not in the Kafka table engine but in our write path to the distributed tables (a rough sketch of that path is included below for context).

Event ingestion has resumed, but it is running slowly to avoid events disappearing, so there will be ingestion lag for a few hours. We are working on pushing another patch to fix the lag.

After that is solved, we'll start the event backfill for the missing dates.
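
For context, a typical ClickHouse ingestion pipeline of this shape has a Kafka-engine table feeding, via a materialized view, a Distributed table that fans writes out to sharded local tables. The sketch below shows that general shape using the clickhouse-driver Python package; all table, topic, and cluster names are placeholders for illustration, not our actual schema.

    from clickhouse_driver import Client  # third-party package: clickhouse-driver

    client = Client(host="localhost")  # placeholder host

    # Kafka engine table: ClickHouse consumes raw events directly from the topic.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_queue (
            uuid String, event String, properties String, timestamp DateTime64(6)
        ) ENGINE = Kafka
        SETTINGS kafka_broker_list = 'kafka:9092', kafka_topic_list = 'events',
                 kafka_group_name = 'clickhouse-events', kafka_format = 'JSONEachRow'
    """)

    # Sharded local storage table.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_local (
            uuid String, event String, properties String, timestamp DateTime64(6)
        ) ENGINE = MergeTree ORDER BY (event, timestamp)
    """)

    # Distributed table that fans writes out to the shards; the "write path to the
    # distributed tables" is the hop into a table like this.
    client.execute("""
        CREATE TABLE IF NOT EXISTS events_distributed AS events_local
        ENGINE = Distributed('events_cluster', currentDatabase(), 'events_local', rand())
    """)

    # Materialized view that moves rows from the Kafka table into the distributed table.
    client.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS events_mv TO events_distributed AS
        SELECT uuid, event, properties, timestamp FROM events_queue
    """)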

Jul 19, 2024 - 11:36 UTC
Update - We are investigating an issue with our Kafka table engines and have purposely induced lag in our pipeline. All events are safe and will show up after this investigation, but for the moment we will fall behind on processing events and you will notice the last few hours missing from your reporting.
Jul 18, 2024 - 16:51 UTC
Update - We have started event recovery.

Data may be missing since 2024-07-17 at 21:00 UTC. The missing events will eventually be available for querying.

We are now working on pushing a fix to avoid this happening again.

Jul 18, 2024 - 13:29 UTC
Investigating - We've spotted that ingested event volumes are lower than expected. We are identifying the root cause of the issue.

No data has been lost, and we are already drawing up a plan to recover it and identifying the impacted dates.

Jul 18, 2024 - 12:34 UTC
PostHog.com: Operational (100.0% uptime over the past 90 days)
US Cloud 🇺🇸: Operational (99.96% uptime over the past 90 days)
  App: Operational (99.92% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.97% uptime)
EU Cloud 🇪🇺: Operational (99.98% uptime over the past 90 days)
  App: Operational (99.97% uptime)
  Event and Data Ingestion: Operational (100.0% uptime)
  Feature Flags and Experiments: Operational (99.99% uptime)
Support APIs: Operational (100.0% uptime over the past 90 days)
  Update Service: Operational (100.0% uptime)
  License Server: Operational (100.0% uptime)
AWS US 🇺🇸: Operational
  AWS ec2-us-east-1: Operational
  AWS elb-us-east-1: Operational
  AWS rds-us-east-1: Operational
  AWS elasticache-us-east-1: Operational
  AWS kafka-us-east-1: Operational
AWS EU 🇪🇺: Operational
  AWS elb-eu-central-1: Operational
  AWS elasticache-eu-central-1: Operational
  AWS rds-eu-central-1: Operational
  AWS ec2-eu-central-1: Operational
  AWS kafka-eu-central-1: Operational
System metrics: US Ingestion End to End Time, US Decide Endpoint Response Time, US App Response Time, US Event/Data Ingestion Response Time, EU Ingestion End to End Time, EU App Response Time, EU Decide Endpoint Response Time, EU Event/Data Ingestion Endpoint Response Time.
Past Incidents
Jul 20, 2024

No incidents reported today.

Jul 19, 2024
Resolved - Ingestion lag has recovered.
Jul 19, 03:44 UTC
Monitoring - The issue has been fixed and the ingestion lag is recovering.
Jul 19, 02:06 UTC
Update - We've identified the issue and we are currently fixing it to re-enable ingestion.
Jul 18, 22:36 UTC
Update - We're monitoring the recovery of Postgres now. Some tables are very large and this might take several hours; we're investigating whether we can work some magic to speed this up.

Current impact is still that event ingestion is delayed.

Note that this means person updates aren't being processed, so any experiments or flags that rely on changes to person profiles won't see those changes until the event lag is resolved.

Jul 18, 18:19 UTC
Identified - We've spotted an issue with our Postgres infrastructure and we're working to resolve it right now.

You'll experience ingestion lag since we're delaying event ingestion to reduce load while we fix this. No events have been lost.

We'll update with an expected time to recovery as soon as we have one. Sorry for the interruption.

Jul 18, 17:25 UTC
Jul 18, 2024
Jul 17, 2024

No incidents reported.

Jul 16, 2024
Resolved - This incident has been resolved.
Jul 16, 14:07 UTC
Monitoring - Recovery is continuing well and we expect to be caught up within an hour.

Sorry for the interruption!

Jul 16, 13:02 UTC
Identified - We've restarted our recordings ingestion infrastructure and ingestion is recovering. Folks will be experiencing between 40 and 90 minutes of delay, but that's already recovering quickly.

We're still looking into the root cause.

Jul 16, 12:38 UTC
Investigating - We've spotted that recordings ingestion is delayed. We're investigating to identify why.
Jul 16, 12:21 UTC
Jul 15, 2024

No incidents reported.

Jul 14, 2024
Resolved - Workers are processing queries as they arrive. All systems nominal.
Jul 14, 19:53 UTC
Monitoring - We have restarted the failed workers - queries are back to normal now.
Jul 14, 17:28 UTC
Identified - Async queries are failing - we are restarting the workers now.
Jul 14, 17:15 UTC
Investigating - Queries are timing out in the EU region; we are looking into what's going on.
Jul 14, 16:51 UTC
Jul 13, 2024

No incidents reported.

Jul 12, 2024

No incidents reported.

Jul 11, 2024
Resolved - This incident has been resolved.
Jul 11, 14:36 UTC
Update - We've downgraded this incident and marked ingestion as operational now that we have duplicate ingestion infrastructure.

Replay is working normally and we are continuing to process the delayed recordings.

Jul 10, 13:23 UTC
Update - We've duplicated our ingestion infrastructure so that we can protect current recordings from the delay.

You should no longer see delay on the ingestion of current recordings.

We'll continue to ingest the delayed recordings in the background.

Jul 10, 11:19 UTC
Update - We're continuing to work on increasing ingestion throughput.
Sorry for the continued interruption.

Jul 10, 09:25 UTC
Update - We're continuing to slowly catch up with ingestion. We're being a little cautious, as we don't want to overwhelm Kafka while we're making solid progress.

We appreciate that delays like this are super frustrating, and we're really grateful for your patience 🙏

Jul 9, 14:00 UTC
Update - We've continued to monitor ingestion overnight. Some Kafka partitions are completely caught up, so some people won't experience any delay (per-partition lag is sketched below).

Unfortunately, others are still lagging, so you will still see delayed availability of recordings.

Really sorry for the continued interruption!
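
For those curious, per-partition consumer lag is just the difference between a partition's latest offset and the consumer group's committed offset for it, which is why some partitions can be fully caught up while others are behind. Below is a minimal sketch of that check using the kafka-python package; the broker, topic, and consumer group names are placeholders, not our production configuration.

    from kafka import KafkaConsumer, TopicPartition  # third-party package: kafka-python

    # Placeholder broker/topic/group names, for illustration only.
    consumer = KafkaConsumer(
        bootstrap_servers="kafka:9092",
        group_id="recordings-ingester",
        enable_auto_commit=False,
    )

    topic = "session_recordings"
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)  # latest offset per partition

    for tp in sorted(partitions, key=lambda p: p.partition):
        committed = consumer.committed(tp) or 0  # group's committed offset (None if never committed)
        lag = end_offsets[tp] - committed
        print(f"partition {tp.partition}: lag={lag}")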

Jul 9, 05:56 UTC
Update - We're continuing to monitor recovery. Apologies for the delay!
Jul 8, 18:13 UTC
Monitoring - We've confirmed that the config rollback has resolved the problem, but we've kept ingestion throttled to ensure systems can recover.

We're slowly increasing the ingestion rate to allow recovery and will keep monitoring.

Sorry for the interruption.

Jul 8, 14:05 UTC
Identified - A recent config change has unexpectedly impacted processing speed during the ingestion of recordings.

The change has been rolled back and we're monitoring for recovery.

Jul 8, 11:31 UTC
Jul 10, 2024
Jul 9, 2024
Resolved - Planned maintenance is complete.
Jul 9, 21:56 UTC
Identified - We are doing an upgrade of our data processing infrastructure in the EU region. There will be temporary processing delays. No data has been lost and the system should be caught up shortly.
Jul 9, 20:10 UTC
Jul 8, 2024
Jul 7, 2024

No incidents reported.

Jul 6, 2024

No incidents reported.