[ A copy of a column I wrote for the SAGE-AU newsletter ]
This month, I’m going to talk about getting the most out of the hardware and software that your organisation has paid for. Even if you don’t use NT, you may find the first part of my column useful: it contains a few suggestions for tweaking network performance, and that advice works regardless of platform.
Many of you will have noticed that Microsoft funded a Mindcraft white paper purporting to show that NT is considerably faster than Linux on the same hardware for file serving (2.5x) and web serving (3.7x). I’m not going to defend that white paper, as I feel it is massively flawed.
However, I am going to point out that Mindcraft have not only done a favour to the Linux, Apache and Samba developer communities (by pointing out easily rectified flaws), they’ve done a massive favour to NT administrators as well. How? By documenting in the one place exactly which settings you need to tune NT for the fastest possible speed. Until now, this information was spread across TechNet, the resource kits and, to a certain extent, third-party publications like Windows NT Magazine.
Infrastructure First
The tweaks contained in the Mindcraft document will not help at all if you don’t have the infrastructure to support your network. There’s no point in squeezing an extra 5% from an individual server/client combination if your network or network services suck. In many cases, paying attention to your network will gain you more than any amount of performance tweaking on your servers.
First off, try to determine overall network utilisation, particularly in server segments. 0-10% is good, 10-30% is average, 30-72% requires some thought about partitioning traffic or switching, and over 72% requires help from a professional.
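If you want to put numbers on it, the arithmetic is simple. Here’s a minimal Python sketch: the thresholds are the ones above, and the octet counts are assumed to come from whatever sniffer or SNMP tool you already use.

    # Classify segment utilisation from two readings of an interface's
    # octet counter, taken interval_s seconds apart.
    def utilisation(octets_start, octets_end, interval_s, link_bps):
        bits = (octets_end - octets_start) * 8
        return 100.0 * bits / (interval_s * link_bps)

    def verdict(percent):
        if percent <= 10:
            return "good"
        if percent <= 30:
            return "average"
        if percent <= 72:
            return "think about partitioning traffic or switching"
        return "get help from a professional"

    # Example: 45 MB observed in 60 seconds on a 10 Mb/s segment.
    u = utilisation(0, 45_000_000, 60, 10_000_000)
    print(f"{u:.1f}% utilisation - {verdict(u)}")   # 60.0% - partition/switch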
Fix the low-level problems first. Use a network sniffer to see whether any of your subnets are suffering from excessive jabber or other technical faults. Don’t daisy-chain hubs. Don’t exceed Ethernet cable distances (approximately 100 m for Cat 5 10/100 twisted pair). Try to avoid putting more than 24 nodes into the same collision domain if you are still using dumb hubs. Use good punchdown patch panels (like Krone) and short, good-quality patch leads.
Subnetting is the forgotten friend of network administrators, but don’t be profligate with address space. Just because your switch vendor says you can have a 65,000-node subnet by using switching doesn’t mean you should: a single subnet with that many nodes would suffer massive broadcast and multicast packet storms, switch or no switch.

At one site I worked at, they consumed the entire 10.0.0.0/8 network just because they had a System™. When they wanted to connect to the outside world, they suddenly found that Telstra had already used part of that address space, so NAT was necessary to reach the Internet via the Telstra-managed firewall, limiting their options. Be realistic about expectations for growth. If you’re setting up an outlying office with 3 workstations, it doesn’t need a 65,000-node network (or three, as this site used: one for the router, one for the three workstations and one for the printer, wasting approximately 196,600 IP addresses at each of their 54 regional offices). By being parsimonious with subnetting, you can really reduce traffic and make network management that much easier. You might need a few more router interfaces, but routers are getting cheaper, particularly ones with multiple 10/100 Mb/s interfaces.
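To see how cheap right-sizing is, here’s a quick sketch using Python’s ipaddress module. The addresses are made up, but the waste figure is the one quoted above.

    import ipaddress

    # A three-workstation office plus router and printer needs five
    # host addresses; a /29 (six usable hosts) covers it with room spare.
    office = ipaddress.ip_network("10.1.2.0/29")
    print(office.num_addresses - 2, "usable hosts")   # 6 usable hosts

    # Compare with three 65,000-node (/16) networks per office,
    # as at the site described above:
    wasted = 3 * (ipaddress.ip_network("10.0.0.0/16").num_addresses - 2) - 5
    print(f"{wasted:,} addresses wasted per office")  # 196,597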
Check that you don’t have excessive broadcasts. Properly configured, NetBIOS over TCP/IP (which can be noisy when badly configured) should produce broadcasts amounting to no more than 5% of all packets. If you see more than 5%, look at your WINS configuration. Don’t have WINS but do have more than one subnet? Shame on you! Make sure you set the DHCP global scope options to specify WINS node type 0x8, which is hybrid. Hybrid almost completely avoids the broadcast overhead and is far preferable to the M and P node types. As a bonus, hybrid is almost always faster than no WINS at all.
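If you want to measure your broadcast percentage rather than guess, a sketch along these lines works against a capture file. It assumes the scapy library and a hypothetical segment.pcap saved from your sniffer.

    # Count Ethernet broadcasts in a saved capture and report the percentage.
    from scapy.all import rdpcap, Ether

    packets = rdpcap("segment.pcap")          # hypothetical capture file
    broadcasts = sum(1 for p in packets
                     if p.haslayer(Ether) and p.dst == "ff:ff:ff:ff:ff:ff")
    percent = 100.0 * broadcasts / len(packets)
    print(f"{broadcasts}/{len(packets)} packets are broadcasts ({percent:.1f}%)")
    if percent > 5:
        print("Over 5% - check your WINS configuration")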
If you are managing more than 3 nodes, you would be crazy not to use DHCP. It’s easy to configure, and with the MS DNS server you get zero-maintenance DNS reverse lookups. To make DHCP work across internetworks, you need RFC 1542-compliant routers. For those of you with really big networks: prior to Service Pack 4, DHCP scaled to about 1200 scopes per DHCP server; SP4 fixed that.

I suggest that each physical site have two DHCP servers. Configure server A to look after 50% of the subnets, with a pool covering 75% of each of those subnets, and configure server B the same way for the other 50% of the subnets. Then configure for fault tolerance: on server A, add scopes covering the remaining 25% of server B’s subnets, and vice versa (a worked split appears below). This way, if one of the servers goes down (either for maintenance or for more sinister reasons), the few clients that need a new lease during that downtime can still be serviced. I find that 3-day leases are a good compromise for most networks. 7 days or longer is too long – if you need to renumber your network, seven days won’t cut the mustard – and 1 day or less will cause a massive DHCP packet barrage around 9 AM each day. Service Pack 4’s DHCP and WINS servers cope much better with these transient loads than prior NT releases. Get there as soon as possible if you haven’t already.
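Here’s a rough sketch of that 75%/25% split for a single subnet; the subnet address is illustrative only.

    import ipaddress

    # Work out the 75%/25% pool split for one subnet.
    subnet = ipaddress.ip_network("192.168.10.0/24")
    hosts = list(subnet.hosts())        # .1 through .254
    cut = int(len(hosts) * 0.75)

    primary_pool = hosts[:cut]          # scope on this subnet's main DHCP server
    backup_pool = hosts[cut:]           # matching scope on the other server
    print("primary:", primary_pool[0], "-", primary_pool[-1])   # .1 - .190
    print("backup: ", backup_pool[0], "-", backup_pool[-1])     # .191 - .254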
If you use Exchange, always make sure both your servers and your clients have properly configured DNS. Missing DNS will slow Exchange down and in many cases make it unusable – for example, the X.400 connector over TCP/IP will often fail to work. Since Windows 2000’s directory is based upon DDNS, you should consider some form of DNS server today if you don’t already have one at your site.
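A quick way to sanity-check a server’s DNS is to confirm that its forward and reverse lookups agree. The sketch below does just that; mail1.example.com is a placeholder, not a name from anywhere in particular.

    import socket

    # Verify that forward and reverse DNS agree for a server.
    name = "mail1.example.com"    # placeholder for your Exchange server
    try:
        addr = socket.gethostbyname(name)
        rev_name, _, _ = socket.gethostbyaddr(addr)
        ok = rev_name.rstrip(".").lower() == name.lower()
        print(f"{name} -> {addr} -> {rev_name}: {'OK' if ok else 'MISMATCH'}")
    except OSError as e:
        print(f"DNS lookup failed for {name}: {e}")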
Do a traceroute from a random sampling of clients, and try to ensure there are no more than two router hops to commonly used servers. For example, if user000 through user999 at site A require access to server349, make sure that all those clients can reach the server through no more than two routers, and preferably one or none. Latency should be no more than 15 ms to give users a seemingly fast response to their actions, particularly if they use Active Desktop or IE 4.0 or later. The pipe to their servers should be faster than 2 Mb/s to ensure they don’t bitch and moan about the server being slow. If you can’t provide that sort of speed locally, figure out some way to get a file synchroniser or backup program to look after a local file server for them.
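A script can do the sampling for you. This sketch shells out to the system traceroute and counts hops; the output parsing assumes a Unix-style traceroute (on NT you’d adapt it for tracert) and is indicative only.

    import subprocess

    # Count router hops to a host by running the system traceroute and
    # counting output lines: one header line, then one line per hop,
    # with the destination itself as the final hop.
    def router_hops(host):
        lines = subprocess.run(["traceroute", "-n", host],
                               capture_output=True, text=True).stdout.splitlines()
        return len(lines) - 2   # minus the header and the destination itself

    for server in ["server349"]:    # server name from the example above
        hops = router_hops(server)
        print(server, hops, "router hops", "(too many!)" if hops > 2 else "")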
100 Mb/s Ethernet can be a minefield. Check that the “Auto” speed/duplex setting is actually delivering an improvement at 100 Mb/s. I’ve seen servers set to 100 Mb/s full duplex crawl – 39 minutes to copy a 72 MB file instead of 15-20 seconds. I’ve found that falling back to half duplex gives almost the same speed as full duplex, particularly with large packets. If neither duplex setting helps, fall back to 10 Mb/s and again test for maximum speed at half and full duplex. Ensure that all devices on a switch or hub have the same duplex setting.
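The easiest test is a timed file copy. Something like the sketch below reports throughput in MB/s; the paths are placeholders, and the 72 MB in 15-20 seconds quoted above works out to roughly 4 MB/s, so anything far below that on a 100 Mb/s link should make you suspect a duplex mismatch.

    import os
    import shutil
    import time

    # Time a file copy to a server share and report throughput.
    src = "testfile.bin"                      # a test file of ~72 MB
    dst = r"\\server349\share\testfile.bin"   # hypothetical UNC path
    start = time.time()
    shutil.copyfile(src, dst)
    elapsed = time.time() - start
    mb = os.path.getsize(src) / 1_000_000
    print(f"{mb:.0f} MB in {elapsed:.1f} s = {mb / elapsed:.1f} MB/s")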
Finally, it’s important that servers can talk to each other through as big a pipe as you can afford. Careful analysis of your network will quickly show that a $2000 8-port switch, or a $4000 24-port switch, can massively boost inter-server bandwidth, reduce collisions and reduce router load. With Unix servers there’s not much point in putting servers on a switch unless they’re heavily NFS cross-mounted, but putting NT servers on one can really help. Some common BackOffice components and poorly designed COM/COM+ objects call the security provider all the time, meaning that a good percentage of your servers will be sending a constant stream of packets to your domain controllers.
Win32 programs, almost without exception, print through the GDI print model. The NT print processor then checks whether the printer driver can deliver results close to the original intent using the printer’s native PDL, and if not, it fakes it by rasterising the necessary areas. If you have non-PostScript printers, you’d be surprised at the size of even modest business documents. PostScript is one of the most capable PDLs, and thus needs the least rasterisation of GDI calls. In the field, the difference can be quite staggering: a simple PowerPoint job can take 100 kB as PostScript instead of 5 MB on a PCL printer. If you care about network bandwidth, do not buy non-PostScript printers. If you have printers that can do both PCL and PostScript, do your network a favour by choosing the PostScript driver. Your users will get more options, jobs will print quicker, and the network won’t be bogged down so often. It’s also extremely worthwhile to place printers and the print server on the same switch. This will do more to speed your network up than almost any other trick here, as printer traffic (which can get quite intense) will not traverse user segments, switches and routers.
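A back-of-envelope calculation shows what’s at stake. The job sizes are the ones quoted above; the user and job counts are invented for illustration.

    # Rough daily saving from choosing PostScript over PCL drivers,
    # using the job sizes quoted above. User and job counts are made up.
    users, jobs_per_day = 50, 10
    pcl_mb, ps_mb = 5.0, 0.1
    saved = users * jobs_per_day * (pcl_mb - ps_mb)
    print(f"{saved:,.0f} MB/day kept off the network")   # 2,450 MB/day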
Last printing tip: don’t use any form of AppleTalk print server. They all suck, and they all at least double your network traffic. If you need Macs to print, buy EtherTalk-capable printers, such as the HP 4000N, and let the printer do the talking.
Tweaks
Now that we’ve saved a good percentage of your network traffic, it’s time to look at getting the most out of the hardware you’ve already purchased.
Windows NT and all the BackOffice components supply a rich set of performance counters, which make it easy to work out what your servers are doing and whether they’re coping. It’s a good idea to take PerfMon logs of every major counter for a week or so and work out baseline performance for your servers. Then, about once every three months or during known busy times, do it again and compare against previous baselines. The comparison will tell you whether your servers are coping with the load or need upgrading. There are heaps of performance-tuning references, including the resource kits from Microsoft, so I won’t go into much detail here. These baselines make it easy to justify purchasing upgrades or new servers; without them, you may as well wet your finger, stick it in the air and see which way the wind is blowing.
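As a sketch of what the comparison might look like in script form: the file names and the simple counter,value CSV layout are my assumptions (you’d export or massage your PerfMon data into that shape first), and the 20% threshold is just a starting point to tune for your own site.

    import csv

    # Compare a current set of counter averages against a stored baseline
    # and flag anything that has moved by more than 20%.
    def load(path):
        with open(path, newline="") as f:
            return {row["counter"]: float(row["value"])
                    for row in csv.DictReader(f)}

    baseline = load("baseline.csv")   # assumed two-column CSV: counter,value
    current = load("current.csv")
    for counter, old in sorted(baseline.items()):
        new = current.get(counter)
        if new is not None and old and abs(new - old) / old > 0.20:
            print(f"{counter}: {old:.1f} -> {new:.1f}")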
Conclusion
It’s important to set up your environment to best cope with the operating system you’re going to use most. If that happens to be Windows NT, then a few small tweaks and some generic good advice will make the difference between a marginal existence and a trouble-free, fast environment.