
solaris8 - Day 7

Goals

In this session you will:
  • tune your network

  1. ensure that your box is exporting (sharing) at least one filesystem
  2. set a 120-second timeout on your automounts for this POST so that you can watch them fall off after a short period of non-use.
  3. use a direct map to automount the instructor's day8post export onto an arbitrary mountpoint on your box.
  4. use an indirect map to automount the exports of your peers into a single, organized directory
  5. after you check out the filesystems you just mounted, see if they show up in the output of mount
  6. retreat from the filesystems to neutral ground (i.e., your homedir or root dir, anywhere but the mounted FSes) and watch them fall off the mount list.

  7. what kind of delay did you experience on automounting? Was it long enough that users might complain?

  8. what is the default time after which an automounted fs will unmount?
  9. what factors will influence the value you choose for that time interval?
  10. can you use a different time setting for each automounted FS, or is it global?

network tuning

  • automount
    • use failover (list the alternate servers in one map entry, continued onto a second line with a trailing \)
    • increase the automagic unmount time if users keep files open longer than the timeout.
    • also increase it if recurring jobs (cron?) run with a period longer than the timeout
    • note that longer mount times are more efficient but increase the risk of a hang.
    • the length of a user's task will affect how the mount delay is perceived
    • be wary of a collection of direct-mapped FSs in one directory; an ls there can cause a mount storm
    • subdirectory:fields help with overmapped indirects (warning, can cause strange-looking paths)
  • tcp
    • ifconfig: ensure a consistent, vendor-approved broadcast address syntax on every host (i.e., all 1s or all 0s)
    • arp -a to see the IP-MAC mapping on the network
    • ping -s (solaris) to see if a host is accessible, to learn about host/IP inconsistencies. Various NFS services ping the various hosts. Random timeouts or peaks can be a termination or cable problem.
    • spray to test the network, NICs, etc. use various -l lengths. Writes tend to be large. Reads and attributes are tiny. Try passing a -d delay of 1 microsecond to see if it can keep up then. Note the bandwidth usage.
    • traceroute to see the route to a host
    • netstat -i will help ID NIC problems. Input errors can be caused by a dying NIC elsewhere on the network or by packets malformed by electrical interference.
    • collisions: a network with only two NICs can soak up 90% of the bandwidth; more hosts lessen the usable bandwidth.
    • carrier sense and the collision: a host that collides waits an increasing, random delay and then retries
    • collision rate = Collis / Opkts (output packets). 5-10% or higher may be reason to partition.
    • 35%-45% utilization will usually cause a LAN bottleneck. At 50%, the collision artifacts consume the entire bandwidth.
    • if only some clients report collision rates, check for specific problems. If all clients report high collisions, partition the network with a switch/bridge (partitioning) or router (inter-netting)
    • protocol filtering will reduce traffic across partitions.
    • remember bridge effects on NIS!
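
The collision-rate rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the interface names and counter values are made up, the helper names are hypothetical, and the 5% cutoff is simply the low end of the 5-10% range given in the notes. On a real host you would copy the output-packet and collision counters from netstat -i.

```python
def collision_rate(collisions, output_packets):
    """Return collisions as a percentage of output packets."""
    if output_packets <= 0:
        return 0.0
    return 100.0 * collisions / output_packets


def assess(rate_percent, cutoff=5.0):
    """Apply the rule of thumb; cutoff is an assumed value (low end of 5-10%)."""
    if rate_percent >= cutoff:
        return "consider partitioning"
    return "ok"


# Hypothetical counters: interface -> (output packets, collisions)
sample = {"hme0": (120000, 9000), "hme1": (80000, 800)}

for nic, (opkts, collis) in sample.items():
    rate = collision_rate(collis, opkts)
    print(f"{nic}: {rate:.1f}% collisions, {assess(rate)}")
```

Run the same check on several clients: if only some of them cross the cutoff, look for a host-specific problem before partitioning the network.
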
  • NFS
    • bursty, random traffic: requests arrive in bursts of unrelated operations
    • "NFS rpc mixture" and overall traffic load will affect perceived speed
    • response time increases gently with load until you hit the wall, then timeouts/retrans take over
    • performance tuning involves moving the "knee" out
    • cachefs may help move the knee out by unloading the server; the object is not to directly speed up the client.
    • reads, writes, and symlink resolutions may require actual disk use. Other requests may be satisfied with no disk access (attributes, directory caches)
    • "threshold of pain" = 50-70ms, or twice the baseline

    • df and friends can be inaccurate in heterogeneous environments.
  • stress one component at a time: read, write, etc.
  • nfsstone
  • common bottlenecks
    1. client NIC: check that client's stats v. the others on the network
    2. network bandwidth: similar results between peers. Bridges can slow the RPC time.
    3. server NIC
    4. server cpu: check with w, top, vmstat, etc.
    5. server memory: NFS servers are helped greatly by more mem for caching (do not run other apps on it)
    6. server disk I/O: writes are particularly rough, as they bypass the normal write caching.
    7. sloppy configs (mount bursts, etc)

  • metrics:
  • nfsstat -s/-c
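    • calls: total NFS calls
    • badcalls: RPCs rejected before the NFS handoff
    • nullrecv: an nfsd ran with nothing to do; try to decrease these.
    • badcalls: damaged or poorly formed packet
    • retrans: no response arrived, so the client asked again
    • timeout: the call timed out; the client hangs or fails (hard/soft mount)
    • badxid: a reply arrived that matched no outstanding request, usually a late answer to a call that had already been retransmitted; the server is receiving the retrans but answering slowly
    • wait: a call had to wait for a free client handle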
  • server problems:
    1. badcalls > 0: rejected RPC requests; check authentication (too many groups, root, secure rpc)
    2. nullrecv > 0: too many nfsd
    3. symlink > 10: too many symlink lookups; reorganize on the server
    4. getattr > 60: funky noac option on the mount?
    5. null > 1: too short an automount interval (autofs looking for failover server)
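
The server checklist above could be turned into a small script along these lines. Treat it as a sketch under assumptions: the notes do not say what units the symlink, getattr, and null thresholds use, so the code assumes they are percentages of all NFS calls; the dictionary keys follow the labels in the list rather than exact nfsstat column names; and the function name and sample numbers are hypothetical.

```python
def server_findings(rpc, calls_by_proc):
    """Return the checklist items that the given server counters trip.

    rpc           -- RPC-level counters, e.g. {"badcalls": 0, "nullrecv": 12}
    calls_by_proc -- NFS call counts keyed by the labels used in the notes
    """
    findings = []
    if rpc.get("badcalls", 0) > 0:
        findings.append("badcalls > 0: rejected RPC requests (check authentication)")
    if rpc.get("nullrecv", 0) > 0:
        findings.append("nullrecv > 0: too many nfsd")

    total = sum(calls_by_proc.values())
    if total > 0:
        # Assumption: thresholds are percentages of all NFS calls.
        limits = [
            ("symlink", 10.0, "too many symlink lookups; reorganize on the server"),
            ("getattr", 60.0, "check for a noac option on the mount"),
            ("null", 1.0, "automount interval may be too short"),
        ]
        for name, limit, advice in limits:
            share = 100.0 * calls_by_proc.get(name, 0) / total
            if share > limit:
                findings.append(f"{name} > {limit:g}%: {advice}")
    return findings


# Hypothetical counters copied by hand from nfsstat -s
rpc = {"badcalls": 0, "nullrecv": 12}
calls = {"getattr": 7000, "lookup": 2000, "read": 800, "symlink": 150, "null": 50}

for line in server_findings(rpc, calls):
    print(line)
```
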
  • client problems:
    1. timeout > 5%: requests are not getting through; the badxid count will say whether it's the server or the network (see below).
    2. badxid ~ timeout: the server is slow; increase the timeo mount option to allow for more time.
    3. badxid ~ 0: the network is dropping packets
    4. badcalls > 0: crashed server?
  • some math
    1. timeout / calls = retransmission rate
    2. timeout == badxid: the network is OK; the server can't keep up and is working from a backlog
    3. timeout > badxid (badxid near zero): the network (or NICs!) is hosed; the server isn't receiving the RPCs

    4. number of nfsd running on the box (the integer argument to nfsd): with too few, netstat -s shows socket overflows; with too many, nullrecv climbs.
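
The client-side arithmetic above can be sketched as follows. The notes only say "roughly equal" and "near zero", so the 0.8 and 0.2 ratios used here are assumed cutoffs, not values from the course; the function name and sample counters are hypothetical as well. Feed it the calls, timeout, and badxid counters from nfsstat -c.

```python
def client_diagnosis(calls, timeout, badxid, limit_percent=5.0):
    """Classify client RPC counters using the timeout/badxid rules of thumb."""
    if calls <= 0:
        return "no calls recorded"
    rate = 100.0 * timeout / calls  # retransmission rate as a percentage
    if rate <= limit_percent:
        return "ok"
    # Assumed cutoffs for "roughly equal" and "near zero".
    if badxid >= 0.8 * timeout:
        return "server is slow: consider a larger timeo"
    if badxid <= 0.2 * timeout:
        return "network (or NIC) is dropping packets"
    return "mixed: check both server and network"


# Hypothetical counters: (calls, timeout, badxid)
samples = [(10000, 100, 50), (10000, 800, 780), (10000, 800, 10)]
for calls, timeout, badxid in samples:
    print(calls, timeout, badxid, "->", client_diagnosis(calls, timeout, badxid))
```
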
  • tuning Cachefs
    • turn on logging

    time service

    Clock differences between hosts can affect NIS polls, makefiles, secure RPC, etc. You may want to define a timehost.
    • rdate host
    • accurate to within a second or so
    • setting a hostname for timehost
    • public timeservers
    • scheduling the time service: regularity and load-balancing
    • ntp for finer granularity


    http://www.mousetrap.net/syllabus/solaris8/day8.html
    $Id: day8.orb,v 1.4 2002/11/22 17:19:47 mouse Exp $

    © 1994-2002 jason carr.
    distributed under the terms of the GNU Free Documentation License.


    Reminders

    • Classroom temperature can be wildly variable. Dress lightly and bring layers.
    • your username is based on the class title and the last two digits of your workstation's hostname.
    • remember to take your work with you.