The best Hacker News stories from All, from the past day

Latest posts:

Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?

A eulogy for Dark Sky, a data visualization masterpiece (2023)

Translating All C to Rust (TRACTOR)

LG and Samsung are making TV screens disappear

Our audit of Homebrew

FastHTML – Modern web applications in pure Python

Children should be allowed to get bored (2013)

ps aux written in bash without forking

SVG Triangle of Compromise

tolower() with AVX-512

CrowdStrike's impact on aviation

SAM 2: Segment Anything in Images and Videos

One-man SaaS, 9 Years In

How simultaneous multithreading works under the hood

Show HN: I built an open-source tool to make on-call suck less

Hey HN,<p>I am building an open source platform to make on-call better and less stressful for engineers. We are building a tool that can silence alerts and help with debugging and root cause analysis. We also want to automate tedious parts of being on-call (running runbooks manually, answering questions on Slack, dealing with Pagerduty). Here is a quick video of how it works: <a href="https://youtu.be/m_K9Dq1kZDw" rel="nofollow">https://youtu.be/m_K9Dq1kZDw</a><p>I hated being on-call for a couple of reasons:<p>* Alert volume: The number of alerts kept increasing over time. It was hard to maintain existing alerts. This would lead to a lot of noisy and unactionable alerts. I have lost count of the number of times I got woken up by alert that auto-resolved 5 minutes later.<p>* Debugging: Debugging an alert or a customer support ticket would need me to gain context on a service that I might not have worked on before. These companies used many observability tools that would make debugging challenging. There are always a time pressure to resolve issues quickly.<p>There were some more tangential issues that used to take up a lot of on-call time<p>* Support: Answering questions from other teams. A lot of times these questions were repetitive and have been answered before.<p>* Dealing with PagerDuty: These tools are hard to use. e.g. It was hard to schedule an override in PD or do holiday schedules.<p>I am building an on-call tool that is Slack-native since that has become the de-facto tool for on-call engineers.<p>We heard from a lot of engineers that maintaining good alert hygiene is a challenge.<p>To start off, Opslane integrates with Datadog and can classify alerts as actionable or noisy.<p>We analyze your alert history across various signals:<p>1. Alert frequency<p>2. How quickly the alerts have resolved in the past<p>3. Alert priority<p>4. Alert response history<p>Our classification is conservative and it can be tuned as teams get more confidence in the predictions. We want to make sure that you aren't accidentally missing a critical alert.<p>Additionally, we generate a weekly report based on all your alerts to give you a picture of your overall alert hygiene.<p>What’s next?<p>1. Building more integrations (Prometheus, Splunk, Sentry, PagerDuty) to continue making on-call quality of life better<p>2. Help make debugging and root cause analysis easier.<p>3. Runbook automation<p>We’re still pretty early in development and we want to make on-call quality of life better. Any feedback would be much appreciated!

< 1 2 3 ... 170 171 172 173 174 ... 826 827 828 >