We identify the performance characteristics of today's high-speed enterprise-scale stub (i.e., non-transit) networks. POSTECH's campus network serves as a canonical example. What is meant by "performance characteristics"? Packet loss is one among several metrics.

It is generally believed that, due to bandwidth overprovisioning in enterprise networks, there is no performance bottleneck. At large time scales, for example as measured by MRTG, this is true: utilization is in the 20-30% range (or lower), with transient peaks at the minute time scale. But large ISPs also overprovision (the 50% utilization rule), and very little performance degradation is attributed to their transit networks. All the available evidence points to that claim being reasonably accurate.

Our primary analysis tool is measured TCP connection behavior at key observation points, from which we draw scientifically grounded inferences about the factors that degrade network performance. Because TCP is a feedback-controlled protocol, its behavior leaves a mark of what it thinks is happening over an end-to-end path. TCP headers carry sequence numbers, the receiver-advertised window size, etc., which, in conjunction with the time dynamics of the connection profile, allow us to infer whether a connection is in slow start, congestion avoidance, etc. Just as X-rays, MRIs, blood tests, etc. allow doctors to diagnose many (but not all) ills, TCP connection dynamics, along with other basic traffic metrics, can allow us to diagnose the cause of most (if not all) problems that degrade network performance.

We studied the characteristics of microcongestion in the following steps.

(1) First, we need to learn a bit more about POSTECH's network. We don't need to become best friends, but we need to know it fairly well.
This means capturing and representing the main connectivity: the POSTECH backbone (very simple), the Internet gateways, the primary access switches connecting to the backbone switches, and the secondary access switches connecting to the primary access switches.

(2) Second, and in some respects preceding (1), we need to check the soundness of the SNMP and DAG measurement systems. There cannot be any bugs or unknown problems. In the SNMP case, for the coarse measurement with a probing interval T (the default is ~30 min, but the details are more involved since there are many devices and MIB variables to be probed), we must be assured that GET requests elicit timely responses. At this coarse time scale that should hold, but it has to be verified separately before any weather map data is collected in earnest.

(3) Third, we are ready for the main task: select target switches to measure, and perform both DAG measurements on specific links and SNMP measurements on all ports. With (1) and (2) in place, (3) should be straightforward.
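
To make the coarse SNMP measurement concrete, the following is a minimal Python sketch of how two successive readings of an interface octet counter could be turned into a link utilization figure, and of how a polled response could be checked for timeliness relative to the probing interval T. It is an illustration under stated assumptions, not the measurement system used here: the names `Sample`, `counter_delta`, `utilization`, and `is_timely`, the 1% timeliness threshold, and the choice of a 64-bit counter such as `ifHCInOctets` are all hypothetical. No SNMP library is called; the samples are assumed to have been collected already.

```python
from dataclasses import dataclass

# Counter widths defined by SNMP: Counter32 and Counter64 wrap at these values.
COUNTER32_MOD = 2 ** 32
COUNTER64_MOD = 2 ** 64


@dataclass(frozen=True)
class Sample:
    """One polled reading of an interface octet counter (hypothetical type)."""
    sent_at: float      # time the GET request was issued, in seconds
    received_at: float  # time the response arrived, in seconds
    octets: int         # counter value carried in the response


def counter_delta(old: int, new: int, modulus: int = COUNTER64_MOD) -> int:
    """Octets counted between two readings, allowing for at most one wrap."""
    return (new - old) % modulus


def utilization(prev: Sample, curr: Sample, link_bps: float,
                modulus: int = COUNTER64_MOD) -> float:
    """Average link utilization (0.0 to 1.0) between two samples."""
    elapsed = curr.received_at - prev.received_at
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    bits = 8 * counter_delta(prev.octets, curr.octets, modulus)
    return bits / (elapsed * link_bps)


def is_timely(sample: Sample, interval_s: float,
              max_fraction: float = 0.01) -> bool:
    """True if the response delay is a small fraction of the probing interval.

    The 1% default is an assumed threshold chosen for illustration only.
    """
    return (sample.received_at - sample.sent_at) <= max_fraction * interval_s
```

For example, with T = 1800 s on a 1 Gb/s link, two samples whose counters differ by 56,250,000,000 octets correspond to an average utilization of 0.25. An average over so long an interval says nothing about transient peaks inside it, which is why the coarse SNMP view is paired with DAG measurements on specific links.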