Planning and Design Service Provider Network
  • "There are lots of peer technical CCIE tracks, but CCDE was the only one that matched my strategic design role. It is a unique, vendor-neutral course, based on universal design principles and technologies that can be applied to solving comprehensive business needs."
    Tony Brown, Enterprise Systems Architect - Verizon

    • VRRP:
      Transparent default gateway redundancy
      Virtual IP address can also be a real address
      IETF standard, so use VRRP if you need multivendor interoperability
      Preempt is enabled by default
      Default Hello timer: 1 second

    • VRRP uses one virtual IP address and one virtual MAC address for gateway functionality.
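
The virtual MAC address is derived from the VRRP group number (VRID) using the well-known prefix reserved for VRRP. A minimal Python sketch of that mapping follows; the function name `vrrp_virtual_mac` is illustrative and not part of any vendor tool.

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Return the well-known VRRP virtual MAC for a virtual router ID (1-255)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # 00-00-5E-00-01-xx is used for IPv4 groups, 00-00-5E-00-02-xx for IPv6 groups;
    # the last octet is the VRID itself.
    family_octet = 0x02 if ipv6 else 0x01
    return f"00:00:5e:00:{family_octet:02x}:{vrid:02x}"


print(vrrp_virtual_mac(10))  # 00:00:5e:00:01:0a
```

Because the MAC depends only on the VRID, hosts keep the same gateway MAC in their ARP cache no matter which physical router is currently master.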

    Business Drives Technology:
    • Cost
      • Capital (CAPEX)
      • Operational (OPEX)
    • Flexibility/Agility
      • Changes in Business
      • Manageable
    • Security

    • Modularity
      • The first key concept in network design
      • Can be horizontal or vertical
    • Resilience
      • Decouples devices in different modules
      • Decreases MTTR
    • Manageability
      • Repeatability

    Security Observation Points:
    • Data Exfiltration
    • Unusual Traffic Patterns
    • Failed Sign-ins, Other Signs...

    • Observe
      • Observe what's going on
      • Know what's normal
    • Orient
      • Understand the context
      • Understand the intent
    • Decide
      • Evaluate possible courses of action
      • Decide what action to take
    • Act
      • Execute

    Enabling OODA with Design:

    • "Chokepoints" between modules...
    • Make great observation points
      • To understand "normal"
      • To orient to real targets/goals
    • Make great decision points
      • To limit which services will be impacted by actions
      • To determine where to act and how
    • Make great action points
      • To pre-stage policy
      • To focus policy at specific, identifiable points

    Applications and Availability:
    • Traffic Engineering
      • Ample Bandwidth
      • Minimal Delay
    • Low Jitter
      • Consistent Path
      • Fast Convergence

    Key Resilience Terms:
    • MTBF: Mean Time Between Failures
    • MTTR: Mean Time to Repair
      • Time until traffic is flowing
      • Time until network is "as designed"
    • Reliability
      • "9's of availability"

    • a = uptime / (uptime + downtime (as measured))

      • 525,600 minutes in a year
        4 minutes downtime
        525,596 minutes uptime
        a = 525,596 / (525,596 + 4)
        a = .99999239...
        a = 99.999%

      • 99.999% ("five nines"):
        Downtime/year: 5.26 minutes
        Downtime/month: 25.9 seconds
        Downtime/week: 6.05 seconds

      • 99.9999% ("six nines"):
        Downtime/year: 31.5 seconds
        Downtime/month: 2.59 seconds
        Downtime/week: 604.8 milliseconds

      • 99.99999% ("seven nines"):
        Downtime/year: 3.15 seconds
        Downtime/month: 259.2 milliseconds
        Downtime/week: 60.48 milliseconds
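
As a rough check on the availability arithmetic above, the following Python sketch computes measured availability and the downtime budget for a given number of nines. It assumes a 365-day year (525,600 minutes, as in the notes), a 30-day month, and a 7-day week; the function and constant names are illustrative.

```python
SECONDS_PER_YEAR = 525_600 * 60      # 365-day year, as in the notes
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
SECONDS_PER_WEEK = 7 * 24 * 3600


def availability(uptime: float, downtime: float) -> float:
    """a = uptime / (uptime + downtime), both in the same unit."""
    return uptime / (uptime + downtime)


def downtime_budget(avail: float, period: float) -> float:
    """Allowed downtime for a target availability, in the unit of `period`."""
    return (1.0 - avail) * period


# Worked example from the notes: 4 minutes of downtime in one year.
a = availability(uptime=525_596, downtime=4)
print(f"{a:.8f}")  # 0.99999239

# Downtime budgets in seconds for five, six, and seven nines.
for nines in (5, 6, 7):
    target = 1.0 - 10.0 ** -nines
    print(
        nines,
        round(downtime_budget(target, SECONDS_PER_YEAR), 3),
        round(downtime_budget(target, SECONDS_PER_MONTH), 3),
        round(downtime_budget(target, SECONDS_PER_WEEK), 4),
    )
```

Published tables sometimes use a 365.25-day year or an average-length month, so figures can differ slightly in the last digit depending on the convention.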

    • a(proj) = time period / (time period + downtime (proj))
    • downtime(proj) = (time period / MTBF) * MTTR
    • MTTR = downtime (as measured) / number of failures
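
The three projection formulas above can be sketched in Python as follows. The input figures in the example (one year of 8,760 hours, an MTBF of 4,380 hours, two failures totalling one hour of downtime) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def mttr(measured_downtime: float, failures: int) -> float:
    """MTTR = downtime (as measured) / number of failures."""
    return measured_downtime / failures


def projected_downtime(period: float, mtbf: float, mttr_value: float) -> float:
    """downtime(proj) = (time period / MTBF) * MTTR."""
    return (period / mtbf) * mttr_value


def projected_availability(period: float, mtbf: float, mttr_value: float) -> float:
    """a(proj) = time period / (time period + downtime(proj))."""
    downtime = projected_downtime(period, mtbf, mttr_value)
    return period / (period + downtime)


# Hypothetical inputs, all in hours.
repair = mttr(measured_downtime=1.0, failures=2)        # 0.5 hours per failure
down = projected_downtime(8760, 4380, repair)           # 2 failures * 0.5 h = 1.0
print(repair, down)
print(f"{projected_availability(8760, 4380, repair):.6f}")
```

Halving MTTR has the same effect on projected downtime as doubling MTBF, which is why the notes treat faster repair and convergence as a design goal in its own right.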

    • Discover:
      • How long does it take to discover the failure?
      • Protocol neighbor liveness
    • Report:
      • How long does it take to spread the news?
    • Calculate:
      • How long does it take to find a new path?
    • Install:
      • How long does it take to change to the new path?
      • Protocol interaction with the RIB/FIB
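
One way to reason about the four phases above is to treat convergence time as their sum and look for the dominant term. The Python sketch below does that; the class name and the millisecond values are hypothetical and only illustrate the bookkeeping, not measurements from any real network.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Time spent in each convergence phase, in milliseconds."""
    discover_ms: float
    report_ms: float
    calculate_ms: float
    install_ms: float

    def total_ms(self) -> float:
        return self.discover_ms + self.report_ms + self.calculate_ms + self.install_ms

    def dominant_phase(self) -> str:
        phases = {
            "discover": self.discover_ms,
            "report": self.report_ms,
            "calculate": self.calculate_ms,
            "install": self.install_ms,
        }
        return max(phases, key=phases.get)


# Hypothetical numbers: slow failure detection versus faster detection.
slow_detect = ConvergenceBudget(discover_ms=3000, report_ms=50, calculate_ms=30, install_ms=20)
fast_detect = ConvergenceBudget(discover_ms=150, report_ms=50, calculate_ms=30, install_ms=20)

print(slow_detect.total_ms(), slow_detect.dominant_phase())
print(fast_detect.total_ms(), fast_detect.dominant_phase())
```

In this made-up example, detection dominates in both cases, which is the usual argument for tuning the discovery phase before the others.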

    • Protocol Hellos => Protocol Process => Fast
    • BFD Hellos => BFD Process => Protocol Process => Faster
    • ? => Interface Phy/Processor => Forwarding Plane/RIB => Protocol Process => Fastest

    Failure Domains:
    • Modularity
      • The first key concept in network design
      • Can be horizontal or vertical
    • Resilience
      • Failure Domains = Broadcast Segment
      • Decouples devices in different modules
      • Decreases MTTR

    Hiding Information:
    Five ways to hide information in the control plane
    • Aggregation
    • Summarization
    • Filtering
    • Virtualization
    • Caching
    • There are others, but they are not widely deployed

    Leaky Abstraction:
    • IP Address carried in the packet payload
    • IP address as a host identifier
    • Dropped IP packets in a TCP stream
    • Tunnel failure on underlay failure
    • Jitter on control plane path change

    Design Patterns:

    Put each in a single layer:
    • Forwarding:
      Carry traffic between modules, topological areas, geographical regions, etc.
    • Aggregation:
      Combine lots of smaller links into a smaller number of larger links; provide paths for engineering and virtualization
    • Policy:
      Engineer traffic and control access
      • Aggregation
      • Traffic Engineering
      • Source-based Routing
      • Service Chaining
      • Load Balancing
      • What do all of these have in common?
      • They each have the potential to increase stretch

      Control plane policy is anything that modifies the forwarding path (potentially moving traffic off the shortest path) to achieve a specific goal

    • Admittance:
      Attach users, control access, classify traffic, terminate virtual overlays
      • Attach users
      • Control access
      • Classify traffic
      • Terminate virtual overlays
      • Includes any edge security policies
      • Includes any edge QoS policies
      • Towards a host, for instance...
        • Unicast RPF filtering
        • MAC Address filtering
        • AAA controls
        • Quality of service marking
      • Towards an external network
        • Bogon filtering
        • Unicast RPF (if possible)

    • Core => Forwarding
    • Distribution => Aggregation & Policy
    • Access => Admittance

    • Core => Forwarding & Policy
    • Aggregation => Aggregation & Admittance

    • Core => Forwarding
    • Aggregation => Policy & Aggregation & Admittance
      • Core => Policy & Aggregation
      • Aggregation => Admittance

    Common Topologies:
    • Degree of Connection
    • Regularity
    • Path Characteristics
      • Longest Path
      • Shortest Path
    • Convergence Characteristics
    • Troubleshooting Characteristics
    • Flexibility

    Hybrid Device Model:

    • A hybrid of protocol operation and network device operation models:
      • DARPA four-layer model with the network layer broken apart
      • Adds in network device (router) pieces
    • Includes the control plane:
      • Control plane protocols normally fall "outside" layered models

    • Application:
      • Uses information transported across the network
      • Presents data through formatting and marshalling
      • Can provide all four services "over the top"

      • Primary consumer and producer of data

    • Transport: TCP/UDP, etc.
      • Flow control
      • Error correction
      • Application multiplexing

      • Quality of service
      • Helping fairness in the transport (WRED)

    • Network: IP
      • End-to-end transport (including addressing)
      • Transport multiplexing

    • Link: FIB
      • Single hop transport (including addressing)
      • Media access flow control
      • Media access framing and marshalling
      • Per-hop (media specific) error correction

      • Fast forwarding in soft real-time
      • Consuming and producing packets

    • Control Plane: RIB
      • Discovering end-to-end reachability
      • Providing forwarding information to data plane (link level) and virtualization
      • Consuming reachability information provided by the data plane (virtualization and link)
      • Consuming addressing and other information provided by the network
      • Providing mapping between locations and identities to the transport

      • Routing protocols
      • Intersection of naming and addressing (DNS)

    • SDN adds a relationship between the application and the control plane

    • Virtualization:
      • Providing tunnel headers to the data plane (link)
      • Providing virtual topology state to the control plane
      • Consuming link-state information from the data plane (link)
      • Consuming reachability information from the control plane

      • Tunnel/overlay headends and tailends

    • ...complexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts

      In this model, the thin waist of the hourglass is envisioned as the (minimalist) IP layer, and any additional complexity is added above the IP layer. In short, the complexity of the Internet belongs at the edges, and the IP layer of the Internet should remain as simple as possible.

      It is always possible to add another level of indirection.

    Understanding Complexity Tradeoffs:
    • State:
      • How often does state change?
      • How rapidly does state change?
      • How much state is there?
      • How often the set of reachable destinations changes?
      • How often the state of a link changes?
      • How quickly is a link state change flooded?
      • How quickly is a failed link detected?
      • How many routes are in the routing table?
      • How many policies are configured on a device?
    • Optimization:
      • Network utilization
      • Optimal path
      • Application support
      • What is the excess capacity across the network?
      • What is the utilization of this link?
      • What is the stretch?
      • What is the jitter?
      • What is the delay?
      • How long does it take to converge?
    • Surface:
      • How deep?
      • How broad?
      • How many redistribution points?
      • How many places is the same policy configured?
      • One link failure impacts many virtual topologies (fate sharing)

    • No Aggregation - Optimal routing - Optimization:
      • More control plane state - State
      • Changes often - State
      • No configuration - Surface
    • Aggregation - Suboptimal routing - Optimization:
      • Less control plane state - State
      • Changes less often - State
      • Configured aggregation - Surface
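
The state side of this tradeoff can be illustrated with Python's standard `ipaddress` module. In this sketch, four contiguous /24 prefixes (hypothetical addresses chosen for the example) collapse into a single /22, so the rest of the network carries one route instead of four and no longer sees changes to the individual /24s.

```python
import ipaddress


def aggregate_routes(prefixes):
    """Collapse a list of prefix strings into the smallest covering set."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(networks))


# Hypothetical more-specific routes behind one aggregation point.
specifics = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
aggregates = aggregate_routes(specifics)

print(len(specifics), "routes without aggregation")
print(len(aggregates), "route with aggregation:", aggregates[0])  # 10.1.0.0/22
```

The cost is on the other two axes: the aggregate has to be configured (surface), and traffic follows the aggregate even when a better path to one of the /24s exists (optimization).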

    If you haven't found the tradeoffs, you haven't looked hard enough.

    Link State Review:

    Dijkstra's Algorithm example
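
A minimal Python sketch of Dijkstra's shortest path first (SPF) calculation, using a priority queue from the standard `heapq` module. The four-node topology and its link costs are hypothetical; a real link state implementation would build this graph from the link state database.

```python
import heapq


def dijkstra(graph, source):
    """Return (cost, predecessor) maps for shortest paths from `source`.

    `graph` maps each node to a dict of {neighbor: link_cost}.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                prev[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return dist, prev


# Hypothetical topology with symmetric link costs.
topology = {
    "A": {"B": 10, "C": 5},
    "B": {"A": 10, "C": 3, "D": 1},
    "C": {"A": 5, "B": 3, "D": 9},
    "D": {"B": 1, "C": 9},
}

dist, prev = dijkstra(topology, "A")
print(dist)  # {'A': 0, 'B': 8, 'C': 5, 'D': 9}
print(prev)  # {'B': 'C', 'C': 'A', 'D': 'B'}
```

The predecessor map is the shortest path tree rooted at A: here A reaches D via C and then B, even though A has a direct link to B.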

    Splitting Flooding Domains:

    • What technical reasons are there for splitting a flooding domain?
    • Essentially... None
    • This doesn't mean you shouldn't split flooding domains...
      • But technical reasons are generally not the deciding factor
    • Still - what technical reasons are there?

    • SPF runtime
      • Solid workload indicator
      • 100ms - rule of thumb
    • Incremental/Partial SPF
    • Exponential Backoff
    • Reduce the size of the link state database
    • If all else fails - split the flooding domain
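
The exponential backoff item above can be sketched generically in Python: the first SPF run after a quiet period waits a short initial delay, and each further trigger during instability waits twice as long as the last, up to a ceiling. The parameter names and the millisecond values are hypothetical and do not correspond to any particular vendor's timers.

```python
def spf_backoff_delays(initial_ms: int, hold_ms: int, max_ms: int, triggers: int):
    """Return the wait before each of `triggers` consecutive SPF runs."""
    delays = []
    next_hold = hold_ms
    for i in range(triggers):
        if i == 0:
            # First event after a quiet period: react quickly.
            delays.append(initial_ms)
        else:
            # Repeated events: wait the current hold time, then double it,
            # never exceeding the configured maximum.
            delays.append(next_hold)
            next_hold = min(next_hold * 2, max_ms)
    return delays


print(spf_backoff_delays(initial_ms=50, hold_ms=200, max_ms=5000, triggers=7))
# [50, 200, 400, 800, 1600, 3200, 5000]
```

The design intent is fast reaction to a single failure while protecting the processor when a flapping link generates a burst of changes.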

    • Non-technical Reasons:
      • Providing good measurement points
        • Troubleshooting, security, etc.
      • Break up failure domains
        • Prevent massive failures in one part of the network from impacting the rest of the network
      • Provide policy choke points
      • Policy and management should drive splitting flooding domains
        • Not "because it's too big"
        • Be intentional and thoughtful here

    Optimizing Link State Convergence:
    • Why?:
      • Reduce convergence time
      • Allow for larger flooding domains
    • Optimization:
      • Optimize timers
      • Reduce scale of SPF runs

    Evolving Technologies Study Guide: B-)