When a website goes down, the failure often feels like a black box. Visitors see a spinning wheel, a cryptic error code, or a blank page. For the people responsible for keeping that site online, the first question is always the same: what broke?
The truth is that there is no single way a website “goes down.” Instead, a request from a browser passes through multiple steps—DNS resolution, TCP connection, TLS negotiation, and HTTP response. Each step depends on the ones before it. And at each step, different things can fail.
That’s why smart uptime monitoring doesn’t just tell you that the site is “down.” It tells you where in the chain the failure occurred. DNS errors point one way. TCP errors another. TLS/SSL errors indicate a different root cause than HTTP 5xxs. If you know which layer failed, you know which team or provider to contact, and you can shorten resolution time dramatically.
This article walks through each error type in the order a browser actually loads a site: DNS, TCP, TLS, and HTTP. For each, we’ll explain what the step does, what can go wrong, and how monitoring can catch issues before your customers do.
DNS Errors
DNS is where every web request begins. When a user types your domain into a browser, the first thing that happens is a lookup to resolve that domain into an IP address. If that step fails, nothing else matters—no connection can be made, no certificate can be checked, and no HTTP response will ever arrive. That’s why DNS errors are often the earliest and most critical signals of an outage.
Common DNS Errors
Below are some common DNS failures:
- NXDOMAIN — This means the domain name simply doesn’t exist. In practice, it usually comes from expired registrations, misconfigured zones, or typos in record entries. An expired domain can take your entire site offline instantly, while a fat-fingered record might only impact a single subdomain or service.
- SERVFAIL — A server error that indicates the authoritative DNS server could not process the request. This often points to broken zone files, missing glue records, or DNSSEC validation problems. SERVFAILs tend to appear suddenly after configuration changes, making them a useful early-warning sign of bad deployments.
- Timeouts — When no response comes back within expected limits, the client eventually gives up. Timeouts are often caused by overloaded nameservers, network outages, or DDoS attacks saturating the resolver. Because DNS lookups happen before caching kicks in, even small latency spikes here can ripple into slower page loads across your user base.
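To make the distinction concrete, here is a minimal sketch of how a probe might classify a lookup into these buckets. It assumes the third-party dnspython package (2.x, installed with pip install dnspython) and uses example.com as a placeholder domain; a real check would also handle cases like an empty answer.

```python
# Classify a DNS lookup as OK, NXDOMAIN, SERVFAIL-like, or timeout.
# Assumes dnspython 2.x: pip install dnspython
import dns.resolver
import dns.exception

def classify_dns(domain: str, record_type: str = "A") -> str:
    resolver = dns.resolver.Resolver()
    resolver.lifetime = 5  # give up after 5 seconds, like an impatient client
    try:
        answer = resolver.resolve(domain, record_type)
        return f"OK: {[r.to_text() for r in answer]}"
    except dns.resolver.NXDOMAIN:
        return "NXDOMAIN: the domain does not exist (expired registration or typo?)"
    except dns.resolver.NoNameservers:
        return "SERVFAIL-like: no nameserver gave a usable answer (broken zone or DNSSEC?)"
    except dns.exception.Timeout:
        return "TIMEOUT: no response within the lifetime (overloaded or unreachable resolver?)"

if __name__ == "__main__":
    print(classify_dns("example.com"))
```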
How to Monitor DNS
Monitoring DNS health goes beyond checking if your domain resolves once. It requires testing resolution paths the way real users experience them:
- Global checks: Synthetic monitoring agents should run DNS queries from multiple geographies and networks. A record might resolve cleanly from your office but fail in Asia or South America because of anycast routing issues or regional outages at your provider.
- TTL awareness: Every record carries a time-to-live (TTL) value that controls caching. Long TTLs make normal browsing faster but can delay propagation after changes. Monitoring should validate that new values are actually reflected in live queries and that stale cache isn’t lingering.
- Alerting on anomalies: The most actionable signals come from trends. A sudden surge in NXDOMAIN or SERVFAIL responses, or a spike in resolution latency, is often the first clue that something is wrong—even before customers begin complaining.
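A true global check needs synthetic agents in multiple regions, but as a rough local approximation, the sketch below (again assuming dnspython; the resolver IPs are well-known public services and the domain is a placeholder) queries the same record through several resolvers and reports each answer with its remaining TTL, which helps confirm that a change has actually propagated.

```python
# Compare the answers and TTLs a record returns from several public resolvers.
# Assumes dnspython: pip install dnspython
import dns.resolver

PUBLIC_RESOLVERS = {
    "Google": "8.8.8.8",
    "Cloudflare": "1.1.1.1",
    "Quad9": "9.9.9.9",
}

def check_propagation(domain: str, record_type: str = "A") -> None:
    for name, ip in PUBLIC_RESOLVERS.items():
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ip]
        resolver.lifetime = 5
        try:
            answer = resolver.resolve(domain, record_type)
            values = sorted(r.to_text() for r in answer)
            # rrset.ttl is the TTL the resolver reported, i.e. remaining cache time
            print(f"{name:10s} -> {values} (TTL {answer.rrset.ttl}s)")
        except Exception as exc:  # a sketch; real code would classify the failure
            print(f"{name:10s} -> ERROR: {exc}")

if __name__ == "__main__":
    check_propagation("example.com")
```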
When DNS monitoring catches a failure, it also tells you what isn’t the problem. If lookups don’t resolve, then TCP, TLS, and HTTP checks were never attempted. That narrows triage quickly. In most cases, fixes involve your DNS hosting provider, registrar, or whoever manages the zone file. Mature teams build relationships and escalation paths with those vendors so issues can be raised and resolved quickly.
TCP Connection Failures
Once DNS has resolved an IP address, the next step is the TCP handshake. This is the digital equivalent of shaking hands: the client sends a SYN, the server replies with SYN-ACK, and the client acknowledges back with ACK. Only after this exchange is a communication channel established.
If TCP fails, the browser knows where the server should be but can’t actually talk to it. The result feels like a black hole—pages hang, sockets never open, and users see endless spinning wheels. Unlike DNS errors, which are usually quick and obvious, TCP failures often create confusing partial outages where the site is up for some people but not others.
Common TCP Errors
- Connection refused — The client reached the host, but nothing was listening on the expected port. This often happens when services crash, containers die, or load balancers are misconfigured. A webserver that forgot to bind to port 443 is invisible even if the machine itself is fine.
- Connection timed out — Packets are being dropped somewhere along the path. This could be a firewall silently blocking traffic, a routing misconfiguration, or upstream congestion. Timeouts are especially frustrating because they provide no feedback—just silence until the client gives up.
- Connection reset — Here the handshake completes but is torn down almost immediately. Resets usually point to overloaded proxies, aggressive idle timeouts, or middleboxes (like WAFs) terminating what they see as suspicious sessions.
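As an illustration, the following sketch uses only Python’s standard library to attempt a connection and report which of these failure modes it hit (the host and port are placeholders).

```python
# Attempt a TCP connection and report which failure mode occurred, if any.
import socket

def check_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    try:
        # create_connection performs the full SYN / SYN-ACK / ACK handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "OK: handshake completed, something is listening"
    except ConnectionRefusedError:
        return "REFUSED: host reachable but nothing listening on this port"
    except ConnectionResetError:
        return "RESET: connection torn down immediately after the handshake"
    except socket.timeout:
        return "TIMEOUT: no reply at all (firewall, routing, or congestion?)"
    except OSError as exc:
        return f"OTHER: {exc}"

if __name__ == "__main__":
    print(check_tcp("example.com", 443))
```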
How to Monitor TCP
Basic uptime checks aren’t enough here. ICMP pings can succeed while TCP handshakes fail, giving a false sense of health. Proper TCP monitoring focuses on connection behavior:
- Handshake validation: Tools should explicitly attempt a SYN/SYN-ACK/ACK exchange on the actual service port. This ensures the listener is both reachable and responding.
- Path analysis: Traceroutes or MTRs from different regions can reveal where connections are stalling—whether inside your data center, at a CDN edge, or in an upstream ISP.
- Protocol parity: If you support both IPv4 and IPv6, monitor both. Many real-world incidents affect only one, creating customer-visible problems that slip through if you test only the other.
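For the protocol-parity point, here is a minimal sketch (standard library only; host and port are placeholders) that attempts the handshake separately over IPv4 and IPv6, so an incident affecting just one family does not slip through.

```python
# Try the TCP handshake over IPv4 and IPv6 separately and report each result.
import socket

def check_family(host: str, port: int, family: int, timeout: float = 5.0) -> str:
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"no address for this family ({exc})"
    addr = infos[0][4]  # (ip, port) for IPv4, (ip, port, flow, scope) for IPv6
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        return f"OK via {addr[0]}"
    except OSError as exc:
        return f"FAILED via {addr[0]}: {exc}"
    finally:
        sock.close()

if __name__ == "__main__":
    for label, fam in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        print(f"{label}: {check_family('example.com', 443, fam)}")
```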
TCP monitoring provides confidence that servers are not just alive but ready to accept traffic. And it narrows triage: if TCP fails, DNS resolution already worked, so the problem lies with the host or network path. That clarity keeps teams from chasing red herrings at the application layer when the real issue is a firewall rule or a load balancer pool that silently dropped its last healthy node.
TLS/SSL Errors
Today, nearly every site runs on HTTPS; a decade or two ago, SSL-secured websites were far less common. That means after the TCP handshake, the browser and server need to negotiate a TLS (Transport Layer Security) session. TLS does two jobs at once: it encrypts the data in transit, and it proves the server is who it claims to be via digital certificates.
That trust comes with complexity. If certificates expire, don’t match the hostname, or can’t be validated, users will see browser warnings—or the page will refuse to load entirely. In practice, TLS errors are some of the most visible and embarrassing incidents a site can have, because they stop users at the front door with an alert they cannot bypass safely.
Common TLS/SSL Errors
- Expired certificate — The certificate’s validity window has lapsed. This is one of the most common causes of TLS outages, usually because renewal automation isn’t in place or a renewed certificate didn’t propagate everywhere.
- Hostname mismatch — The cert was issued for www.example.com, but the user visited api.example.com. This often happens after adding new subdomains or moving services behind a CDN.
- Untrusted certificate authority (CA) — The browser doesn’t recognize the issuing CA, usually because the cert was self-signed or chained to a private root not installed on client devices.
- Handshake failure — The cryptographic negotiation itself fails. Causes range from unsupported cipher suites and deprecated protocol versions to a corrupted certificate chain.
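As an illustration, the sketch below uses Python’s standard library to attempt a verified, hostname-checked TLS handshake and report which class of failure it ran into. The badssl.com hostname is a public test service for expired certificates; swap in your own host.

```python
# Attempt a verified TLS handshake and report which kind of failure occurred.
import socket
import ssl

def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"OK: negotiated {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # Covers expired certs, hostname mismatches, and untrusted CAs;
        # verify_message describes which check failed.
        return f"CERT VERIFICATION FAILED: {exc.verify_message}"
    except ssl.SSLError as exc:
        return f"HANDSHAKE FAILED: {exc}"
    except OSError as exc:
        return f"SOCKET ERROR BEFORE OR DURING TLS: {exc}"

if __name__ == "__main__":
    print(check_tls("expired.badssl.com"))  # public test host with an expired cert
```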
How to Monitor TLS
TLS monitoring needs to be proactive and continuous. Certificates don’t fail gracefully—they work one day and block access the next. Good monitoring should:
- Track certificate validity and raise alarms well before expiry—ideally with multiple thresholds (30 days, 7 days, 1 day).
- Validate the full certificate chain from multiple regions, since missing intermediates or regional CA issues can break trust differently around the world.
- Check protocol and cipher support, ensuring the site remains compatible as browsers steadily deprecate older versions like TLS 1.0 and 1.1.
- Watch for handshake error spikes, which often coincide with load balancer misconfigurations or CDN rollouts.
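For the expiry thresholds specifically, a minimal sketch might look like this (standard library only; the host and the 30/7/1-day tiers are placeholders to tune for your own escalation policy):

```python
# Report how many days remain on a host's certificate and flag threshold breaches.
import socket
import ssl
import time

THRESHOLD_DAYS = (30, 7, 1)  # alert tiers; adjust to your policy

def days_until_expiry(host: str, port: int = 443, timeout: float = 5.0) -> float:
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert()  # parsed dict from the verified certificate
    expires = ssl.cert_time_to_seconds(cert["notAfter"])  # epoch seconds
    return (expires - time.time()) / 86400

if __name__ == "__main__":
    remaining = days_until_expiry("example.com")
    print(f"certificate expires in {remaining:.1f} days")
    for threshold in THRESHOLD_DAYS:
        if remaining <= threshold:
            print(f"ALERT: below the {threshold}-day threshold")
```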
When TLS failures show up in monitoring, they also provide context: DNS resolution succeeded, TCP connectivity was fine, but the secure channel couldn’t be established. That narrows troubleshooting immediately. The fix is usually in the realm of certificate renewal, load balancer configuration, or edge termination, not in the application code.
For many teams, the operational lesson is simple: treat certificates like code. Automate issuance and renewal, monitor expiration as aggressively as you monitor disk space, and rehearse rotations so that expiring certs never turn into serious, public outages.
HTTP Errors
Finally, after DNS, TCP, and TLS succeed, the browser sends an HTTP request. The server responds with an HTTP status code—200 if all is well, or an error code if not.
Monitoring HTTP is what most people think of when they think of “uptime monitoring.” But without context from the earlier steps, HTTP errors only tell part of the story.
Common HTTP Errors
- 404 Not Found – The resource doesn’t exist. This can be a broken link, deleted page, or misrouted request.
- 500 Internal Server Error – The server encountered an unexpected condition. Usually code or configuration bugs.
- 502 Bad Gateway – A proxy or load balancer couldn’t get a valid response from an upstream server.
- 503 Service Unavailable – The server is overloaded or down for maintenance.
- 504 Gateway Timeout – An upstream service took too long to provide a response.
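As a rough triage aid, the sketch below maps these codes to the component usually worth checking first; the groupings are illustrative, not definitive.

```python
# Map common HTTP error codes to the component most likely worth checking first.
LIKELY_CULPRIT = {
    404: "content or routing: broken link, deleted page, or misrouted request",
    500: "application: unhandled exception or configuration bug",
    502: "proxy/load balancer: upstream returned an invalid response",
    503: "capacity or maintenance: service overloaded or intentionally offline",
    504: "upstream latency: a backend took too long to respond",
}

def triage(status: int) -> str:
    return LIKELY_CULPRIT.get(status, "unexpected status; check application logs")

if __name__ == "__main__":
    for code in (404, 500, 502, 503, 504):
        print(code, "->", triage(code))
```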
How to Monitor HTTP
- Run synthetic GET requests from global agents to verify responses.
- Capture response codes and alert on anything outside the 200–299 range.
- Monitor transaction workflows, not just single pages (login, then add to cart, then checkout).
- Set thresholds for response time, not just availability.
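Here is a minimal sketch of a single synthetic GET (Python standard library; the URL and thresholds are placeholders) that records both the status code and the response time, and flags anything outside the 2xx range or slower than expected.

```python
# Perform one synthetic GET and flag non-2xx status codes or slow responses.
import time
import urllib.error
import urllib.request

def check_http(url: str, timeout: float = 10.0, slow_after: float = 2.0) -> str:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # 4xx/5xx responses still arrive as HTTPError
    except urllib.error.URLError as exc:
        return f"FAILED before an HTTP response: {exc.reason}"
    elapsed = time.monotonic() - start
    if not 200 <= status <= 299:
        return f"ALERT: status {status} after {elapsed:.2f}s"
    if elapsed > slow_after:
        return f"ALERT: status {status} but slow ({elapsed:.2f}s > {slow_after}s)"
    return f"OK: status {status} in {elapsed:.2f}s"

if __name__ == "__main__":
    print(check_http("https://example.com/"))
```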
HTTP monitoring tells you when the application layer is broken. Unlike DNS/TCP/TLS issues, HTTP errors are often in the hands of developers or operations teams, not external providers.
Putting It Together: A Layered Error Monitoring Strategy
The value of breaking errors into types is clarity. Every failure happens in sequence. If DNS fails, nothing else happens. If TCP fails, DNS was fine. If TLS fails, DNS and TCP worked. If HTTP fails, everything up to that point worked.
A layered monitoring approach mirrors this sequence:
- Start with DNS checks.
- Add TCP connection monitoring.
- Layer TLS certificate monitoring.
- Finish with HTTP response monitoring.
This layered model allows you to pinpoint root causes quickly:
- DNS error? Call your DNS provider.
- TCP error? Engage your hosting or ISP.
- TLS error? Fix your certificate or edge config.
- HTTP error? Talk to your web team.
Instead of a vague “site is down” alert, you get a precise map of what’s broken and who should fix it. That reduces mean time to resolution (MTTR) and avoids finger-pointing between teams.
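To make the sequencing concrete, here is a minimal sketch of a layered probe (Python standard library only; example.com is a placeholder) that walks the layers in order, stops at the first failure, and names the layer to escalate.

```python
# Walk the layers in order (DNS -> TCP -> TLS -> HTTP) and report the first failure.
import socket
import ssl
import urllib.error
import urllib.request

def layered_check(host: str, timeout: float = 5.0) -> str:
    # 1. DNS: can we resolve the name at all?
    try:
        socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"DNS failed: {exc} -> escalate to DNS provider / registrar"

    # 2. TCP: does something answer the handshake on port 443?
    try:
        raw = socket.create_connection((host, 443), timeout=timeout)
    except OSError as exc:
        return f"TCP failed: {exc} -> escalate to hosting / network provider"

    # 3. TLS: can we negotiate a verified, hostname-checked session?
    context = ssl.create_default_context()
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
        tls.close()
    except (ssl.SSLError, OSError) as exc:
        raw.close()
        return f"TLS failed: {exc} -> fix certificate or edge configuration"

    # 4. HTTP: does the application answer with a 2xx?
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code
    except urllib.error.URLError as exc:
        return f"HTTP failed before a response: {exc.reason} -> check app and edge"
    if not 200 <= status <= 299:
        return f"HTTP failed: status {status} -> escalate to the web/application team"
    return f"All layers OK (HTTP {status})"

if __name__ == "__main__":
    print(layered_check("example.com"))
```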
Conclusion
Websites don’t fail in a single way—they fail at layers. DNS, TCP, TLS, and HTTP each introduce their own risks and their own error signatures. Monitoring by error type turns that complexity into clarity.
With the right monitoring strategy (and a tool like Dotcom-Monitor), you don’t just know the site is down—you know why it’s down. You know whether to escalate to your DNS host, network provider, security team, or developers. And you get that insight fast, without waiting for a support ticket or a customer complaint.
In the end, error-type monitoring is not just about uptime. It’s about accountability and speed. The next time your site fails, don’t settle for “something broke.” Know exactly what layer failed, what it means, and how to fix it.
