{TITLE}

One morning a ticket came in from a customer in Singapore: the website was not working. The monitoring dashboard, which ran from a single server in Frankfurt, showed everything was fine. Every check was passing. Response times were normal. The site was up. Except it was not, at least not for anyone whose traffic was routed over certain Asian network paths. The problem turned out to be a regional routing issue at an upstream provider, which affected traffic from Southeast Asia while European and North American access remained completely untouched. The monitoring system, dutifully checking from its single vantage point in Germany, had no way to detect a problem it could not see from where it stood.

This incident, and several similar ones over the following year, exposed a fundamental limitation of single location monitoring that seems obvious in hindsight but is surprisingly easy to overlook. The internet is not a uniform network where every path leads to the same destination over the same infrastructure. It is a mesh of interconnected autonomous systems, peering agreements, CDN edge nodes, and DNS resolvers that produce different experiences for users in different geographic regions. A website can be perfectly reachable from Europe while unreachable from parts of Asia, fully functional from North America while suffering packet loss from South America, and fast from one city while slow from another city in the same country.

The approach uptime.yeb.to takes is simultaneous monitoring from six geographic locations spread across multiple continents. Every check runs from all six locations within the same time window, and the results are compared to determine whether a problem is global or regional. When all six locations report a failure, the site really is down everywhere. When one or two locations report a failure while the rest succeed, the problem is regional, and the failing locations immediately narrow down where it lies. This geographic triangulation turns monitoring from a binary "up or down" signal into a nuanced availability map that reflects how the internet actually works.
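As a minimal sketch of that comparison, the function below takes one pass/fail result per location and labels the round as up, regional, or global. The location names are hypothetical placeholders, not uptime.yeb.to's actual probe sites, and treating every partial failure as regional (including, say, four failures out of six) is an assumption; the text above only describes the one-or-two case.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    location: str  # hypothetical probe label, e.g. "asia-singapore"
    ok: bool       # True if the check passed from this location


def classify(results: list[ProbeResult]) -> tuple[str, list[str]]:
    """Label one check round as "up", "regional", or "global".

    Assumption: any partial failure counts as regional; a real system
    might treat "five of six failing" differently.
    """
    failing = [r.location for r in results if not r.ok]
    if not failing:
        return "up", []
    if len(failing) == len(results):
        return "global", failing
    return "regional", failing


# Hypothetical six-location round where only Singapore fails.
LOCATIONS = ["us-east", "us-west", "eu-frankfurt", "eu-london",
             "asia-singapore", "asia-tokyo"]
round_results = [ProbeResult(loc, loc != "asia-singapore") for loc in LOCATIONS]
print(classify(round_results))  # ('regional', ['asia-singapore'])
```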

Why Single Location Monitoring Creates Dangerous Blind Spots

Most uptime monitoring services, including many well-known ones, default to checking from a single location or let users select one primary monitoring region. This approach works perfectly for detecting complete outages, where the origin server is down and no one anywhere can access the site. For these catastrophic failures a single probe is sufficient, because the problem is universal. But complete server failure is only one category of outage, and increasingly it is not even the most common one. Modern web infrastructure, with its layers of CDNs, load balancers, DNS failover, and edge caching, has made total outages rarer while making partial, regional, and intermittent failures more frequent.

CDN-related issues are among the most common sources of regional discrepancies. Content delivery networks cache content at edge servers distributed around the world, and each edge server serves the visitors closest to it in network terms, which usually, but not always, means geographically closest. When a CDN edge node in a specific region runs into trouble, whether hardware failure, misconfiguration, or capacity overload, visitors routed to that node see degraded performance or complete unavailability, while visitors routed to healthy nodes see no issue. A single location monitor that happens to be routed to a healthy edge node will report everything as normal while an entire region's worth of visitors is affected.

DNS propagation issues create another class of regional failures. When DNS records are updated, the changes propagate through the global DNS infrastructure at different speeds depending on TTL values, resolver caching behavior, and the specific resolution path each region follows. During the propagation window, some regions may resolve the domain to the old IP address while others resolve to the new one. If the old IP is no longer serving traffic, the regions still pointing to it experience an outage that the regions already pointed to the new IP will never see. A multi region monitoring setup detects this immediately because some probes will fail while others succeed, creating a pattern that is characteristic of DNS propagation issues and distinct from server level problems.
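One way to spot that pattern, sketched here under assumptions, is to have each probe record the address its local resolver returned and compare the answers. The location labels and IP addresses are hypothetical (the addresses come from the documentation-reserved ranges), each answer is simplified to a single A record, and the operator is assumed to know which addresses are currently serving traffic.

```python
def propagation_report(answers: dict[str, str], live_ips: set[str]) -> dict:
    """Compare the A record each probe's resolver returned.

    answers:  probe location -> IP address its resolver returned (hypothetical data)
    live_ips: addresses the operator knows are currently serving traffic
    """
    distinct = set(answers.values())
    stale = sorted(loc for loc, ip in answers.items() if ip not in live_ips)
    return {
        "split": len(distinct) > 1,  # regions disagree: propagation likely in progress
        "stale": stale,              # locations still resolving to a retired address
    }


answers = {
    "us-east": "198.51.100.7", "us-west": "198.51.100.7",
    "eu-frankfurt": "198.51.100.7", "eu-london": "198.51.100.7",
    "asia-singapore": "192.0.2.10", "asia-tokyo": "192.0.2.10",  # old address
}
print(propagation_report(answers, {"198.51.100.7"}))
# {'split': True, 'stale': ['asia-singapore', 'asia-tokyo']}
```

A split with no stale locations would point to a deliberate multi-address setup rather than a propagation problem, which is why the two signals are reported separately.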

Six Probes and What Each Failure Pattern Reveals

The power of six simultaneous probes lies not just in detecting failures but in diagnosing them. Different failure patterns correspond to different categories of problems, and an experienced operator can often identify the likely root cause from the monitoring pattern alone before even opening a terminal window. When all six probes fail simultaneously with connection timeout errors, the origin server or its network is likely unreachable, suggesting a server crash, hosting provider outage, or network level issue at the data center. When all six probes fail with HTTP error responses like 502 or 503, the server is reachable but the application is broken, suggesting a deployment error, database failure, or application level crash.
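A sketch of that first-pass triage, assuming each failed probe reports either a timeout or the HTTP status it received. The `Failure` type and the category labels are hypothetical, and widening "502 or 503" to any 5xx status is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failure:
    location: str
    timed_out: bool                    # no connection or response within the timeout
    http_status: Optional[int] = None  # status code if the server did answer


def diagnose_all_failed(failures: list[Failure]) -> str:
    """Map a round where every probe failed to a likely failure category."""
    if not failures:
        raise ValueError("expected at least one failed probe")
    if all(f.timed_out for f in failures):
        # Nothing answered anywhere: origin host or its network is unreachable.
        return "network"
    if all(f.http_status is not None and 500 <= f.http_status < 600 for f in failures):
        # Every probe got a 5xx: server reachable, application broken.
        return "application"
    return "mixed"  # inconsistent symptoms: inspect each probe individually


six_502s = [Failure(f"probe-{i}", False, 502) for i in range(6)]
print(diagnose_all_failed(six_502s))  # application
```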

When one or two probes fail while the others succeed, the pattern tells a regional story. If the failing probes are both in Asia while the European and North American probes succeed, the issue is almost certainly in the network path between Asia and the origin server, whether at a CDN edge, a transit provider, or a regional DNS resolver. If the failing probe is in the same region as the origin server while distant probes succeed, the problem might be at the hosting provider's local network level, with distant probes being served from a CDN cache that is masking the origin failure. Each pattern narrows the diagnostic field and accelerates the time to resolution.
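The regional reasoning can be sketched as a small lookup, assuming each probe is tagged with the continent it runs in and the operator knows which continent hosts the origin. The probe labels and the continent mapping are hypothetical, and the returned strings are hints for where to look, not definitive diagnoses.

```python
# Hypothetical probe labels mapped to the continent each one sits in.
REGION_OF = {
    "us-east": "North America", "us-west": "North America",
    "eu-frankfurt": "Europe", "eu-london": "Europe",
    "asia-singapore": "Asia", "asia-tokyo": "Asia",
}


def localize(failing: list[str], origin_region: str) -> str:
    """Suggest where a partial failure most likely lives."""
    regions = {REGION_OF[loc] for loc in failing}
    if not regions:
        return "no failure"
    if len(regions) > 1:
        return "multiple regions affected"
    (region,) = regions
    if region == origin_region:
        # Nearby probes fail while distant ones may be served from CDN cache.
        return f"origin-side network in {region} (CDN cache may mask it elsewhere)"
    return f"network path from {region} to origin (CDN edge, transit, or regional DNS)"


print(localize(["asia-singapore", "asia-tokyo"], origin_region="Europe"))
# network path from Asia to origin (CDN edge, transit, or regional DNS)
```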

Response time variations across probes provide a subtler but equally valuable signal. If all six probes show successful responses but one region's response time has doubled compared to its historical baseline, that region is experiencing degradation that has not yet progressed to a full failure. Catching degradation before it becomes an outage is one of the most valuable capabilities of multi region monitoring, because it gives the operator a window of time to investigate and intervene before users in that region start submitting support tickets. The monitoring dashboard displays response times for all six locations on a single timeline, making regional degradation patterns visible at a glance.
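A minimal sketch of that baseline comparison follows. Using the median of recent response times as each location's baseline, and a factor of 2.0 as the threshold, are assumptions chosen to match the "doubled" example; the location names and numbers are illustrative only.

```python
from statistics import median


def degraded_locations(current_ms: dict[str, float],
                       history_ms: dict[str, list[float]],
                       factor: float = 2.0) -> list[str]:
    """Return locations whose latest response time is >= factor x their baseline."""
    flagged = []
    for location, now in current_ms.items():
        past = history_ms.get(location)
        if not past:
            continue  # no baseline yet for this location
        if now >= factor * median(past):
            flagged.append(location)
    return sorted(flagged)


history = {"asia-tokyo": [200, 210, 190], "eu-london": [40, 42, 38]}
current = {"asia-tokyo": 450, "eu-london": 41}
print(degraded_locations(current, history))  # ['asia-tokyo']
```

A median is less sensitive than a mean to a single slow outlier in the history, which keeps one bad sample from inflating the baseline.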

Geographic Routing and the Problems It Hides

Modern internet infrastructure uses geographic routing extensively, directing users to the nearest available server or CDN edge based on their location. This routing is generally beneficial because it reduces latency and improves performance for the majority of users. But it also means that the path a request takes from point A to point B varies dramatically depending on where point A is. A monitoring probe in New York and a monitoring probe in Tokyo will take entirely different network paths to reach the same website, passing through different ISPs, different peering exchanges, and different CDN edges. An obstruction anywhere along one path can be invisible from the other.

Anycast routing, used by most major CDNs and DNS providers, adds another layer of complexity. With anycast, the same IP address is announced from multiple geographic locations, and the internet's routing infrastructure directs each request to the announcing location that is nearest in routing terms, which is usually, though not always, the geographically nearest one. In practice, a DNS query or CDN request from Europe typically reaches a European server while the same request from Asia reaches an Asian server, even though the IP address in both cases is identical. If the Asian anycast node has a problem, Asian probes detect it while European probes cannot, because their requests never reach the same physical server.

Border Gateway Protocol (BGP) routing changes can cause temporary or prolonged reachability issues for specific regions. When a BGP route is withdrawn or altered, traffic that previously flowed through a direct path may be rerouted through longer, potentially congested paths, increasing latency and sometimes causing packet loss. Route changes like these happen continuously across the global routing table, and their impact is inherently regional. A multi region monitoring system observes the effect of each event through its distributed probes, measuring the impact on every region independently rather than relying on a single vantage point that may or may not be affected.

From Detection to Action and Knowing What to Fix

Detection without actionable information is just an alarm that makes noise without pointing toward a solution. The value of multi region monitoring extends beyond telling you that something is wrong. It tells you where it is wrong and, through the failure pattern, suggests what kind of wrong it is. This diagnostic context transforms the incident response process from a frantic search through logs and dashboards to a targeted investigation that starts with a strong hypothesis about the root cause.

When the monitoring alerts show that a single region has failed while others remain healthy, the operator can immediately focus their investigation on that region's network path. Is the CDN edge in that region reporting issues? Is there an active BGP incident affecting transit providers in that area? Has the DNS resolver for that region cached a stale or incorrect record? Each of these questions can be answered quickly, and the answers lead to specific remediation actions: purge the CDN cache for that region, contact the transit provider, or correct the record and request a cache flush from resolver operators that offer one. Without the geographic context provided by multi region monitoring, the operator would be investigating blindly, checking every possible failure point rather than the ones most likely to be responsible.

The uptime monitoring platform pairs the multi region check results with historical data that adds temporal context to spatial context. If the same region has experienced failures at the same time of day on previous occasions, that suggests a recurring issue like a scheduled maintenance window at a transit provider or a predictable traffic pattern that causes capacity problems during peak hours. If the failure is a first occurrence with no historical precedent, it is more likely an acute incident that requires immediate attention. The combination of geographic and temporal context gives operators the fullest possible picture of what is happening, where it is happening, and whether it has happened before.
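As a rough illustration of the temporal check, the function below asks whether a region has failed in the same clock hour on at least two earlier days. The hour-bucket granularity, the two-day threshold, and the data shapes are all assumptions, not the platform's documented logic, and every timestamp is assumed to be in the same time zone.

```python
from datetime import datetime


def is_recurring(region: str, failed_at: datetime,
                 history: list[tuple[str, datetime]],
                 min_previous_days: int = 2) -> bool:
    """True if `region` failed in the same clock hour on enough earlier days.

    Assumes all timestamps share one time zone (e.g. UTC).
    """
    earlier_days = {
        t.date()
        for r, t in history
        if r == region and t.hour == failed_at.hour and t.date() < failed_at.date()
    }
    return len(earlier_days) >= min_previous_days


history = [
    ("Asia", datetime(2024, 5, 1, 3, 15)),
    ("Asia", datetime(2024, 5, 2, 3, 40)),
    ("Europe", datetime(2024, 5, 2, 3, 10)),
]
now = datetime(2024, 5, 3, 3, 5)
print(is_recurring("Asia", now, history))    # True
print(is_recurring("Europe", now, history))  # False
```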

Frequently Asked Questions

Which six locations are used for monitoring

The monitoring platform uses probe locations distributed across North America, Europe, and Asia to provide global coverage. The specific locations are chosen to represent the major internet routing hubs where the majority of global web traffic flows.

What happens when only one location detects a failure

A single location failure triggers an alert indicating a regional issue rather than a global outage. The alert includes the specific location that failed and the response details, helping the operator determine whether the issue is at a CDN edge, a transit provider, or a DNS resolver serving that region.

Can multi region monitoring detect slow performance before a full outage

Yes. Response time monitoring across all six locations reveals degradation in specific regions even when the site remains technically accessible. A response time that has doubled from its baseline in one region while remaining stable in others is an early warning signal that allows the operator to investigate before users experience a complete failure.

How often do the checks run from each location

Check frequency is configurable depending on the monitoring plan. Each check interval triggers simultaneous probes from all six locations, ensuring that every check provides a complete geographic snapshot rather than a single point observation.
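To illustrate what "simultaneous" means in practice, here is a sketch that fans one check out to all six locations concurrently with `asyncio.gather`, so the whole round takes roughly as long as the slowest probe rather than the sum of all six. The `probe` coroutine is a hypothetical placeholder that only sleeps; a real system would dispatch the request to an agent running in each region.

```python
import asyncio
import time

# Hypothetical probe labels; real probes would run on hosts in each region.
LOCATIONS = ["us-east", "us-west", "eu-frankfurt", "eu-london",
             "asia-singapore", "asia-tokyo"]


async def probe(location: str, url: str) -> tuple[str, bool]:
    """Placeholder probe: a real one would ask the agent in `location` to fetch `url`."""
    await asyncio.sleep(0.1)  # stands in for network latency
    return location, True


async def run_check(url: str) -> tuple[dict[str, bool], float]:
    """Fire all six probes concurrently so they observe the same moment."""
    started = time.monotonic()
    results = await asyncio.gather(*(probe(loc, url) for loc in LOCATIONS))
    return dict(results), time.monotonic() - started


results, elapsed = asyncio.run(run_check("https://example.com/"))
# Concurrent: total time is close to one probe's latency, not six of them.
print(len(results), f"{elapsed:.2f}s")
```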

Does multi region monitoring work with sites behind Cloudflare or other CDNs

Yes, and CDN fronted sites are actually where multi region monitoring provides the most value. CDN edge issues are inherently regional, and only multi region monitoring can detect when a specific CDN edge is degraded while others remain healthy.

Is this useful for sites with traffic from only one country

Even sites with geographically concentrated traffic benefit from multi region monitoring, because network path issues can affect any route. Search engine crawlers also do not necessarily come from the same country as the site's visitors; Google, for example, documents that Googlebot crawls mostly from US-based IP addresses. A regional outage that blocks Googlebot from crawling can therefore hurt SEO even if human visitors in the primary market are unaffected.