Building a resilient IP network is not just about having
redundant links and turning on the SSO features in all routers. In fact, the
most difficult aspect of achieving IP resiliency is establishing an overall
resiliency strategy.
With an overall resiliency strategy, you can predict what needs
to be done to the network in the next few phases. You should be able to map out
the logical design, knowing how it will eventually grow. From the logical
design, the corresponding physical design is mapped out. Finally, both the
logical and physical designs determine what sort of hardware is required and
what capacity and features that hardware must provide.
The strategy has to remain consistent. Many times, the
resiliency of a network is compromised because of inconsistency in strategy.
Shortcuts are taken or different hardware is selected for a certain task,
perhaps because of a shortage of funds or a change in
decision-making personnel. Problems such as these ultimately cause outages
later on.
Redundancy Strategy
Part of the overall resiliency strategy is how to achieve
redundancy in both the logical and physical networks. Many network managers find
this task challenging. For example, you might achieve physical redundancy, but
because of a lack of logical redundancy, the network still experiences
failure.
Logical Resiliency
Logical redundancy mainly protects the following important parts of the
network from failing:
- Network paths
- Functional entities
As mentioned previously, a network path is the route that
traffic traverses between a source and a destination. It is a logical entity
because network paths usually arise from some route calculation (for example, a
shortest-path algorithm). The routing protocol determines the path, and the
results are stored in the various routers within the
network. As events occur within a network (for example, when a physical link
fails), the network path for a source and destination pair may
change. This change might result in an alternative path, or it might result in a
broken connection between the same source and destination. The task then is to
make sure there is always a redundant network path to an important resource
within the network (to a server, for example).
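One way to check for a redundant path is to remove each link in turn and test whether the destination is still reachable. The following is a minimal Python sketch of that idea. The topology, the node names, and both function names are hypothetical, and the check models only link failures, not router failures or routing-protocol behavior.

```python
from collections import deque

def has_path(links, src, dst):
    """Return True if dst is reachable from src over the undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src, dst):
    """List every link whose loss alone disconnects src from dst."""
    critical = []
    for i, link in enumerate(links):
        remaining = links[:i] + links[i + 1:]  # topology with one link failed
        if not has_path(remaining, src, dst):
            critical.append(link)
    return critical

# Hypothetical topology: the host reaches the server through R1,
# and from R1 through either R2 or R3.
links = [
    ("host", "R1"),
    ("R1", "R2"),
    ("R1", "R3"),
    ("R2", "server"),
    ("R3", "server"),
]
print(single_points_of_failure(links, "host", "server"))  # [('host', 'R1')]
```

In this example the core has two paths, but the single access link between the host and R1 has no alternative, so it is the only link reported.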
A functional entity is a
logical function performed by the routers (for example, a default
gateway function or a multicast routing function). A host such as a personal
computer usually needs a default gateway to help it send traffic to the rest of
the world. If there is only one default gateway and it fails, the personal
computer cannot contact any hosts other than those on the same
subnet. Another example is the Area Border Router (ABR) function in an OSPF
network. Having multiple ABRs prevents a nonbackbone OSPF area from being
disconnected from area 0 when one ABR fails.
Ensuring redundancy for these logical functions is critical,
because it ensures a backup in the event of a failure. Information on logical
resiliency in routing is usually found in the design guides for routing
protocols such as OSPF and BGP:
- OSPF Network Design Solutions, 2nd Edition, by Tom M. Thomas (Cisco Press,
  2003. ISBN: 1587050323)
- BGP Design and Implementation, by Randy Zhang and Micah Bartell (Cisco
  Press, 2003. ISBN: 1587051095)
Physical Resiliency
You might find the task of ensuring physical redundancy easier
because it is a more visual exercise. You should look at several areas with
regard to physical redundancy:
- Device: For device-level resiliency, look into areas such as power supplies
  and route processors. For example, if redundant power supplies are used,
  they should be connected to different power sources. You also need to know
  how a particular device behaves under physical conditions such as heat and
  humidity. This is where certification such as Network Equipment Building
  System (NEBS) proves helpful.
- Link: For link-level redundancy, look into areas such as the number of links
  required and how they map to the logical design. For example, you might
  choose to have multiple Ethernet links between two routers. If you choose to
  implement EtherChannel technology, these links appear as one logical
  interface in the logical network. On the other hand, if these links are used
  individually, there will be multiple logical links in the logical design.
  Having multiple logical links is not always advantageous for link
  redundancy: the cost might be prohibitive, as with WAN links, and some
  protocols limit the number of links they can support.
- Site: With device-level and link-level resiliency addressed, the next thing
  to look into is whether the entire site needs to be protected from disaster.
  This usually applies to data centers; for disaster recovery purposes, a
  remote site may be required.
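The advice to connect redundant power supplies to different power sources can be illustrated with some simple availability arithmetic. The Python sketch below uses the standard formula for components in parallel, where the system stays up as long as any one component is up. It assumes that failures are independent, which holds only if the components do not share a dependency such as a single power source. The 99 percent figure and the function name are illustrative, not measured values.

```python
def parallel_availability(component_availability, count):
    """Availability of `count` redundant components when any one suffices.

    Assumes independent failures. Components that share a dependency
    (for example, one power source) do not satisfy this assumption.
    """
    return 1 - (1 - component_availability) ** count

# Illustrative figure: each component is available 99% of the time.
print(f"{parallel_availability(0.99, 1):.4f}")  # 0.9900
print(f"{parallel_availability(0.99, 2):.4f}")  # 0.9999
```

The improvement from the second component disappears if both fail together, which is why the shared dependencies matter as much as the component count.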
Scaling Strategy
Some people might find it strange that a scaling strategy
affects the resiliency of a network. For one, you might not be able to tear down
everything in the network just to do improvement work on a congested link. With
such high expectations of network uptime (remember the five-nines challenge), it
is almost impossible to do
maintenance work without affecting network services. Therefore, many things have
to be "preprovisioned" so as to avoid downtime.
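To put the five-nines target in perspective, the short Python sketch below converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and counts all downtime, planned or unplanned, against the budget. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def allowed_downtime_minutes(availability_percent):
    """Yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes per year")
# 99.9% -> 525.60 minutes per year
# 99.99% -> 52.56 minutes per year
# 99.999% -> 5.26 minutes per year
```

At five nines, the whole year's budget is a little over five minutes, which leaves almost no room for a maintenance window that interrupts service.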
As with a redundancy strategy, a scaling strategy also involves
logical and physical aspects. To scale a network logically, consider aspects
such as the IP addressing scheme, subnet size, and the number of subnets
available within a network. You also need to look at how the routing design
scales. For example, consider how many routers should be within an OSPF area,
how many subnets should belong to a specific area, how many areas the network
should have, and how many ABRs your network needs.
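Questions about subnet size and subnet count can be explored with Python's standard `ipaddress` module. The sketch below assumes a hypothetical 10.1.0.0/16 block set aside for one OSPF area and a /24 subnet size; both values are placeholders for whatever your addressing plan uses.

```python
import ipaddress

# Hypothetical address block reserved for one OSPF area.
area_block = ipaddress.ip_network("10.1.0.0/16")

# Carve the block into /24 subnets and count how many are available.
subnets = list(area_block.subnets(new_prefix=24))
print(len(subnets))             # 256
print(subnets[0], subnets[-1])  # 10.1.0.0/24 10.1.255.0/24

# Usable host addresses per /24 (network and broadcast addresses excluded).
print(subnets[0].num_addresses - 2)  # 254
```

Running the same calculation with the subnet sizes and growth you expect shows early on whether the block reserved for an area is large enough.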
In the physical aspect, look into areas such as scaling a link
speed. For example, you must decide whether to scale a 1-Gbps backbone link by
adding another 1-Gbps link or by upgrading to an OC-48 link. The first option is
called scaling horizontally; the second is called scaling vertically. The
correct choice depends on resource availability.
You might also look at things such as interface capacity, or
so-called real estate, and router performance. In a chassis-based router, the
number of slots, and thus the number of ports that it can support, dictates how
large a network it can connect to in terms of number of links. In addition, the
performance of the same router, in both switching capacity and forwarding
capability, affects how much traffic it can carry at any one time.
By relying on features such as online insertion and removal (OIR), you can keep
adding interfaces to a router and grow the network without affecting the rest of
it. However, you can do so only if the router has enough slots in the first
place. Therefore, a capacity-planning exercise is important, and the hardware
has to be right-sized. You do not want a router so large that it costs too much
money, nor do you want to run out of slots on a router.
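A capacity-planning exercise of this kind can start as simple arithmetic. The following Python sketch estimates how many line-card slots a growing number of links will need and flags the year in which a chassis runs out. All figures (8 slots, 8 ports per card, 40 links today, 12 new links per year) and the function name are hypothetical, and the sketch ignores slots taken by route processors or other non-interface cards.

```python
import math

def slots_needed(link_count, ports_per_card):
    """Line-card slots required to terminate link_count links."""
    return math.ceil(link_count / ports_per_card)

# Hypothetical planning inputs.
chassis_slots = 8
ports_per_card = 8
links_today = 40
new_links_per_year = 12

for year in range(4):
    links = links_today + new_links_per_year * year
    needed = slots_needed(links, ports_per_card)
    status = "fits" if needed <= chassis_slots else "out of slots"
    print(f"year {year}: {links} links, {needed} slots, {status}")
# year 0: 40 links, 5 slots, fits
# year 1: 52 links, 7 slots, fits
# year 2: 64 links, 8 slots, fits
# year 3: 76 links, 10 slots, out of slots
```

With these assumed numbers the chassis is full after two years, so the decision about a larger chassis or a second router has to be made before then.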
Failure in this area usually results in network congestion and
costly downtime.