
  • Windows NT 4.0 Manageability

    [ Copies of some of my older work for a SAGE-AU column ]

    A short column this month as I’m pretty pressed for time and working against a tight deadline, which I’ve definitely abused this time (sorry Donna!). This month, I’ll be dealing with the remote management blues. You’ll need a copy of the NT Server Resource Kit, 3rd Edition, and the NT Workstation Resource Kit, and if you’d like to get the full screen stuff happening, I suggest buying one of VNC, Control It! (nee pcAnywhere), or Timbuktu.

    Step Zero

    The biggest mistake I see with many naïve Windows NT installations is that the administrator installs every service and its dog on the off chance that it’ll be needed later. Don’t do this – you can always install it later. As with all production systems, you install and run only those services that you actually require. By installing less, there’s more RAM for the real application or service to use, NT loads faster, and there are probably fewer bugs or holes to exploit.

    After any installation of Windows NT, it’s important to sort out any warnings or errors in the Event Log. These warnings are harbingers of doom for your social life if you let them lie.

    Another nice little trick for an easy performance boost is to set the page file to twice physical RAM for both the minimum and maximum settings just after installation. This stops Windows NT resizing the page file on the fly, which under stress can cause a completely unresponsive server. By setting the page file just after installation, you get a fairly contiguous page file, which can help performance.
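
    As a minimal sketch of the sizing rule above, the following Python helper works out the two values to type into the virtual memory dialog. The function name and the sample RAM sizes are made up for illustration; the only rule taken from the column is "twice physical RAM, with the same value for both settings".

```python
def page_file_settings_mb(physical_ram_mb: int, multiplier: int = 2) -> tuple:
    """Return (initial_mb, maximum_mb) for a fixed-size page file."""
    if physical_ram_mb <= 0:
        raise ValueError("physical RAM must be a positive number of megabytes")
    size_mb = physical_ram_mb * multiplier
    # Identical initial and maximum sizes leave NT nothing to resize at run time.
    return size_mb, size_mb


# Hypothetical RAM sizes, purely as examples.
for ram_mb in (64, 128, 256):
    initial_mb, maximum_mb = page_file_settings_mb(ram_mb)
    print(f"{ram_mb} MB RAM -> initial {initial_mb} MB, maximum {maximum_mb} MB")
```

    The point of returning the same number twice is simply to make the "no resizing on the fly" intent explicit.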

    Event Viewer

    The first stop when diagnosing any problem with Windows NT is the Event Viewer. If you only get one piece of information from this article, the key to successful problem resolution is the Event Viewer.

    If the server has blue screened, take good note of the exception and which driver or application killed itself, and reboot. Then hit the Event Viewer to see if there was something in the lead up to the blue screen that might have triggered the BSOD. Since BSOD’s are rare (I’ve seen fewer than five in the last twelve months), most times the only entrails of the problem will be in the Event Viewer. Check all three logs, and see if you can replicate the problem. At least you have some event codes to plug into TechNet to see what turns up.

    Always set the log policy that suits your organization. If you’re not interested in the log contents, bump all three logs to 4096 KB, and overwrite as necessary. Leaving the logs at the default settings is asking for sudden unexplained application failure, as NT will simply stop writing to a log once it fills up. Always check critical servers every morning, and other servers once per week. If this means you only have time to check logs, it’s time for a log management helper, like NetIQ or similar.

    Freebies

    Don’t ignore the command line processor. The command shell has hidden talents, such as command history, scrollable windows and expanded batch functionality, including conditional operations (&&), command grouping and serialization. Try using the function keys in the command shell. F1 is a character by character version of ye olde F3. F2 allows you to copy part of the command history up to a specific character (sort of like yank line with a search in vi). F3 displays the last command. F4 allows you to delete from the insertion point to a specific character in the command history. F7 allows you to browse previous commands. F8 does the last command with the insert point at the beginning of the line. F9 allows the selection of a specific command from the history buffer (equivalent to !5 in tcsh, which repeats command 5). You can make the command processor much easier to cut and paste from by turning on QuickEdit mode. I like to use 43 or 50 lines, and a smaller font with blue background and white text, but that’s just me.

    The command line is useful for remote troubleshooting under Windows NT because all the good stuff can be done using command line tools, particularly net.exe. In W2K, the command line becomes even more useful. Microsoft have committed to making everything (and I mean everything) doable from the command line. So far I count more than 400 executables in the Windows 2000 system directory. That’s more than double the total in Windows NT 4.0, even though all the graphical administration utilities are now Microsoft Management Console (MMC) snap-ins (which have a .msc extension). I think this bodes well.

    For bonus points, besides posix.exe, what is the only other POSIX subsystem application that is delivered with Windows NT? Sorry, no prizes for this one.

    Net.exe – Nifty Tool of the Week

    Windows NT and Windows 9x both ship with a program called net.exe. Net allows you to do the vast majority of your remote administration. The first thing you need to know about is a little known feature called mailslots. Mailslots are an old OS/2 LanMan RPC holdover, one of six different IPC methods that are available to Windows NT. Mailslots allow you to impersonate a user by connecting first to the IPC$ share.

    To invoke a new impersonated mailslot from the command line type the following:

    Net use \\server\ipc$ * /user:domain\account

    Remember to substitute the username, domain name and server to make it work for you. The asterisk means you don’t have to enter the password in the clear on the command line – important if you keep a command history or there are busybodies hovering over your shoulder, say at work or a conference.

    NT 4.0 and later allow you to use DNS names and IP addresses as well as hosts that WINS can find for you. For example, if you have no WINS replication or resolution to a PC (say your PC at home), you can connect to it like this:

    Net use \\192.168.1.1\ipc$ * /user:domain\account

    Obviously, you’d substitute the necessary IP address for 192.168.1.1, along with the domain and account details. You could also connect via a DNS name, like \\hackbox.greebo.net\ipc$. There are bugs in 4.0 prior to SP4 regarding the use of hexadecimal or octal representations for the IP address. Upgrade to SP4 to avoid this.
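
    If you find yourself typing these lines a lot, a small wrapper can assemble them for you. The sketch below is only an illustration in Python: the two function names are made up, and the only net.exe syntax it relies on is what appears in this column (the IPC$ path, the asterisk for a password prompt, /user: and /del). It works the same way for a NetBIOS name, a DNS name or an IP address.

```python
def ipc_connect_command(server: str, domain: str, account: str) -> list:
    """Build the 'net use' argument list that opens an IPC$ connection.

    The '*' argument makes net.exe prompt for the password, so it never
    appears on the command line or in the command history.
    """
    return ["net", "use", rf"\\{server}\ipc$", "*", f"/user:{domain}\\{account}"]


def ipc_disconnect_command(server: str) -> list:
    """Build the matching 'net use ... /del' argument list for cleanup."""
    return ["net", "use", rf"\\{server}\ipc$", "/del"]


# Placeholder names, as in the column: substitute your own server and account.
print(" ".join(ipc_connect_command("server", "domain", "account")))
print(" ".join(ipc_disconnect_command("192.168.1.1")))
```

    On a Windows machine the returned lists could be handed to subprocess.run(), which passes each argument through without any extra quoting; the password prompt still appears on the console.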

    Why connect via a mailslot? Well, when you have a valid and active mailslot running, you can browse the machine, administrate it using the normal NT utilities, and use net commands against it, like net user or net statistics.

    This is the hidden su-like interface to Windows NT. The cool thing is that you can use any account, and you still get through as long as physical communication is possible (ie you can ping the remote machine and ports 137-139 are not blocked in the middle). Make sure you do a

    Net use \\server\ipc$ /del

    when you’re finished.
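
    Before blaming the account or the password, it can help to confirm that the session port is reachable at all. The following is a rough Python sketch, not part of any NT tool: the function name is made up, and it only tests a TCP connection to port 139 (the NetBIOS session service). Ports 137 and 138 are UDP, so a simple connect test like this cannot check them.

```python
import socket


def netbios_session_reachable(host: str, port: int = 139, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

    A False result from a machine you can ping suggests that something in the middle is filtering the port, which is exactly the case where the net use connection will not get through.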

    Tip of the week: net stop/net start can avoid some reboots if you know what you’re doing. Many applications will ask for a reboot when all that’s really required is for the service(s) to be stopped and started. Practice before you trust this advice, but it can avoid downtime, so it’s worth a try. Sometimes logging out is all that’s required as well. If availability is important to you, do try this. Otherwise, just reboot. It’s the NT way.

    Windows NT Diagnostics

    Run winmsd.exe from the Start menu or start it from Administrative Tools, and you get a handy little tool that can connect remotely. WINMSD can tell you what sort of processors you have and how much RAM, what sort of disks, etc, in one handy little utility. I’ve used this with some success when I needed to tell the difference between a PII/400 and a Xeon/400 at a site some 265 km from where I was sitting just this week. It works.

    Resource Kits – Don’t Leave Home Without Them™

    If you support NT for a living or just dabble with NT because you’re the only “computer” person working in your company or department, the resource kits are essential parts of the administrator’s toolkit. Right now, the NT Server 4.0 Resource Kit can be had for less than $300, but it’ll have a very short life span, so it’s not good value.

    The workstation and server resource kit CDs are available via TechNet (aka DogNet). TechNet costs about $800 per year, and is well worth the price. You have to order through Microsoft directly, rather than ordering through a dealer. The resource kit utilities are partially available via Microsoft’s ftp site. You can buy books of the resource kits for about $400, but they typically don’t have the latest versions of the CDs, and the paper text will be out of date within six months.

    The resource kits contain many useful utilities, not the least being a telnet daemon, and the more useful “rconsole” (rconnect.exe). Both utilities give you access to a command prompt running on a remote NT server. Rconsole gives you full command shell functionality, and allows for most console programs to run (with the exception of things that change video modes, like Ghost 5.0 or games). Now that you know how to connect using mailslots above, you can do this inside an rconnect window as well. Layer upon layer upon layer…

    I treat Resource Kits like dictionaries – they are deep, and you don’t have to know every nook and cranny, but if you spend a little bit of time every week getting to know new tools, it’ll pay off in the end, or when you have a tight deadline.

    Tip of the week: Check out the password filter in the resource kit. It does a great job of allowing you to define what sort of passwords your users can use. The downside? It needs to be on all workstations.

    Quickies

    NT has some nice functionality for managing remote sites, but sometimes the functionality is somewhat hidden. For example, adding a printer on a remote server used to be a doddle in NT 3.x, but it’s sort of hidden in NT 4.0. The trick? Browse the server and dive into the Printers folder. The Add New Printer wizard is now available. You can’t easily create LPR or JetDirect ports, but if the ports already exist, then you can set up and manage printers remotely again.

    To tone down some of the more unnecessary NetBIOS broadcasts, you can turn off the Computer Browser service on NT Workstations and member servers. This stops these machines participating in Browser elections. If you have WAN sites with asynchronous or single channel ISDN connections, you might want to have a look at WINS replication intervals (every 30 minutes might be too often). Also look at the replication governor (look it up in TechNet), and possibly revisit your WAN infrastructure to minimize WAN traffic by placing an NT Server at the other end.

    NT Services for Unix have been released as an actual product. This has a number of the MKS shell tools and an NFS server and client. It’s not the complete MKS tool kit, but it’s better than nothing. Internet Explorer 5.0 and Office 2000 are due on March 18th. One is free, the other will cost more :-)

    Conclusion

    Windows NT may not be the most manageable or serviceable operating system without some additional third party helpers, but judicious use of the available tools coupled with a methodical approach can help look after most technical support issues. As with most operating systems, proper production management techniques will boost reliability and availability.

  • Windows NT Serviceability

    A few years ago, I owned a lovely Beetle 1300 that only let me down about twenty or so times in the two years I owned it. As a result, I owned a great owner’s repair guide, written by an old hippie. It was a great read in its own right, and I used the book extensively. One of the things that stuck with me is that the author told of a time that he took apart and serviced a Buick auto transmission using the instructions for a Beetle auto transmission. It worked, and he learnt a lot during the process. In the same way, I am hoping that you’ll stick with me, think outside the square for a few minutes, and see if you can take an idea or two from my article and apply it to your own situation.

    Serviceability

    On October 20, Microsoft released Service Pack 4. This is the service pack that all NT shops have been dying for. After a considerable wait, it’s finally here, and it looks as if Microsoft has finally taken comments from the system administration community seriously. One of the bigger problems with software development, especially on a code base of the complexity of Windows NT, Solaris or Linux, is that it’s hard to separate new functionality from fixes. Microsoft has provided three different levels of update for SP4, based upon feedback garnered over the last few years.

    The smallest update is just the fixes. In the minimal update, 641 new fixes (plus all the old ones) are provided in a single 260 KB file. That’s fine if you don’t need to tick the y2k box and don’t want any of the new features.

    The intermediate update, 32 MB in size, not only fixes all known NT problems, but provides a lot of extra fixes and some new functionality asked for specifically by NT security gurus, like new versions of PPTP and NTLMv2 security. In many cases, to get really secure you need to ditch Windows 9x from your network. For the dubious still reading, 32 MB is very comparable to the 41.9 MB in recommended patches for Solaris 2.5.1 or 20 MB for Solaris 2.6, which has not been out as long as NT 4.0 has.

    NT 4.0 had two y2k bugs and about four or five cosmetic y2k bugs. Microsoft provides the 76 MB y2k fix to get as many customers as possible to the same supportable configuration. This huge meta-update contains IE 4.01 SP1, SP4, a data connector update, and some BackOffice fixes you need to make NT 4.0 y2k compliant.

    SP4 is one of the easiest service packs to apply in a long time. Click a couple of boxes, and it munges away. The downtime window is very small – the time it takes your server to shut down and restart (mostly less than five minutes on the Intel and Alpha servers I’ve updated so far). But as always, prepare for the worst. Make an emergency repair disk (rdisk.exe), do a full backup, ensure that you understand your own disaster recovery plan (DISPLAN), and make sure you have your NT CD (and if you need them, the three boot disks) handy. The best case down time window will be the same, but you’ll be in a much better position in case something goes wrong.

    My success ratio with SP4 is good – it fixed one seriously ill server that was cruising for a bruising with the NT install CD. The only thing stopping me was that it was our primary domain controller. It wouldn’t give up being the primary domain controller and the bandwidth to the box was approximately 9600 bps over a 100 Mbps full duplex Ethernet connection. However, NT is a very stable OS, and even though it was very sick, it stayed up for months on end, and it reliably serviced over 200,000 DNS and 142,000 WINS queries per week. Applying SP4 fixed both the promotion/demotion and the bandwidth issues, so it’s back to normal.

    There was one “server” that didn’t take SP4 too well. It comes down to what we class as servers. This unit was an old HP Vectra VL 5/200 with 48 MB of RAM. It was servicing the Cisco 5200’s TACACS+ needs for the place I used to work at. I’m no great fan of using desktop PC’s as servers. My basic requirement for a server is that if it’s important enough to dedicate a machine to, it’s important enough to do it right. This means providing the necessary infrastructure and support for a server level operating system: things like a CD-ROM drive, some way of backing up and restoring the server commensurate with its importance in the enterprise, and a tier one vendor that will support you when you have problems.

    HP, like most tier one vendors (such as Sun, IBM, Compaq, Compaq nee Digital, Apple, and others) have two or more separate product lines – a desktop line and a server line. My personal opinion is that sometimes the distinctions can just be marketing, but HP provide support for NT Server, SCO Unix, Solaris x86, OS/2 Warp Server and NetWare only on their NetServers. They do not support these OS’s on their desktop PC’s. For any corporation, the data or service is of far more worth to the organisation than the hardware. That’s why I baulk at installing server level OS’s on desktop PC’s unless those PC’s are going to be used by a single user under test conditions – and even then a desktop PC is no predictor of success when translated to the real thing. In a bad taste analogy, it’s like clinical tests on mice – some drugs are fatal to mice that are benign to humans and vice versa.

    If you’re not buying servers from tier one vendors, I’m sorry but that’s not such a good idea. I know friends who have rolled their own servers, but let me relate to you what happened at my last site with a roll your own. The machine was massively built – it was a full tower with an Asus mainboard, a DPT caching RAID card, heaps of RAM, the works. The problem is that the drive cage was painted with non-conductive paint. After a year of heavy service, the insulation wore through from the vibration and the drives started to earth their circuitry to the cage and died. First one drive died, and no one noticed because the box didn’t have any monitoring software loaded nor did the RAID card have a $2 piezoelectric bleeper like the HP NetRAID cards do. So the DPT controller made up the difference using parity. Then the next drive died, and the server stopped. There were no backups of the box for a month because the DDS tape drive could not read its own tapes (which is why you verify). The excrement hit the fan and someone got the arse. The server cost only $2000 less than an equivalent HP server, which also had vendor support (ie if a component dies, they courier out a replacement), and it had true hot swap rather than just the cold swap of the roll your own. Is your job worth $2000? The month’s lost data was worth far more than $2000 (mid-six figures, actually). If you’re wondering, NT was not the NOS running this box, but it’s irrelevant to this recounting.

    Server Availability Tips

    • Do not install any protocols, services or products that are not going to be used as part of the server. For example, do not install IPX/SPX on an Oracle DBMS as clients will not use this protocol to communicate with the server. Never install Simple TCP/IP services.
    • Always have a CD-ROM drive on your servers. They’re only about $100, and can save you hours of repair time. I’m not too fussy about ATAPI vs SCSI CD-ROMs these days, just make sure that your OS can read it without additional drivers. Panasonic 32x SCSI CD-ROMs are less than $300, so if you can afford the SCSI alternative, go for it.
    • Take emergency repair and disk partition disks on a regular basis. I do ERD’s once a week, and disk partition disks about once a month, and I rotate the disks so I have more than one ERD per server. The reason is that floppies are terribly unreliable, and if you’re trusting a six or twelve month old floppy, you’re kidding yourself.
    • Try to avoid using the console at all. Domain Admin users are able to crash the server (just as in Unix, root can cd / ; rm -rf * or kill -9 -1). There are some unavoidable reasons to use the console, so schedule this as part of your regular maintenance window.
    • Make sure you have a regular maintenance window. Never promise 100% uptime, as you’ll be setting unrealistic expectations. The aim is to have 100% availability for core hours. I worked in the hospital system, and we had the aim of 100% availability, but if we needed to, we could take some time from 4 am – 6 am on Sunday morning or longer if arranged beforehand. As it was, we were in the high 99.994% uptime (less than 30 minutes of unscheduled down time per year) for the vast majority of our servers (NT, Novell and Digital Unix). If anyone says that these operating systems are unreliable, I have a bone to pick with them based upon real life experience in the mission critical, health care enterprise arena.
    • With Windows NT, as in many OS’s, it’s worthwhile to separate the data from system files. This means at least two partitions on production servers. I have my own preference for partitioning, but to cut it short, you need about 1 GB for NT’s system partition (to have the OS, a copy of the installation files, the page file, and drivers), and the rest can be partitioned for user files. If you’re doing a print server (in my book, a server servicing more than 50 or so printers, or you’re doing PostScript RIP stuff), move the spool to the data partition, as otherwise spooled user files can fill the system partition. The Q article in the knowledge base is Q123747.
    • Practice your disaster recovery plans. If you don’t have a test server that’s exactly like your production servers, allocate some budget, and buy it. It’ll pay for itself the first time you have a crash. Learn (and document) how to restore your systems as quickly and as reliably as possible. Practice, practice, practice. Don’t have a DISPLAN? Write one today or seek advice on getting one written. They’re living documents, so keep them up to date.
    • If you don’t have a TechNet subscription, get it. It’s about $800 per year, and worth every cent. If you have even one developer in house, get the MSDN Universal subscription (about $3500 per year at today’s prices). It comes with lots of goodies, including MSDN Library (some of the best answers to your problems are in MSDN) and you get pretty much all the MS products including betas.
    • NT Magazine is a must have subscription – don’t waste your time with the emasculated Australian edition – pay the extra fifty bucks and get the US one airmailed to you.
    • There are various NT resources all over the Net. My favourites include http://ntsecurity.ntadvice.com and http://ntbugtraq.ntadvice.com, both run by Russ Cooper, a featured 1997 SAGE-AU conference speaker.
    • Avoid letting staff with a little knowledge administrate NT. It’s a recipe for disaster. Teach them a few things every month and bring their knowledge up, rather than let them just go for it. Management will dislike you because you’re “reducing productivity” or looking like a control freak (management speak: “You are not being a team player”), but the alternative is massive amounts of down time. Make sure that they are interested in boosting their knowledge levels by making them go for the MCSE exams. The exams are $135 a pop and easy to pass as long as you actually use and understand the product (the instructor led courses can help, but they’re not mandatory). Under no circumstances give out Domain Admin privilege to those who do not need it.
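
    To put the uptime figures from the maintenance window tip in perspective, here is a small Python sketch of the arithmetic. The function names are made up, and it assumes availability is measured around the clock over a 365 day year, which is my simplification rather than anything stated above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime in minutes implied by an availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


def availability_from_downtime(downtime_minutes: float) -> float:
    """Availability percentage implied by a yearly downtime in minutes."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


print(f"{downtime_minutes_per_year(99.994):.1f} minutes per year at 99.994%")
print(f"{availability_from_downtime(30):.4f}% availability for 30 minutes per year")
```

    Under that assumption, 99.994% works out to roughly half an hour of down time per year, which is why every extra decimal place in an uptime promise matters so much.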

    In the next newsletter, I’ll explain how to use the resource kit utilities to administrate NT from a user level account (with access to a domain admin account, of course!).

    Slagging Microsoft

    Like many of you, I read Slashdot, although I am beginning to wonder why. Originally, Slashdot was a fun site that had many cool stories and lots of nifty Linux/Open Source articles. However, more and more often it has descended to outright MS bashing. Now I am not going to defend Microsoft for everything they do, because I personally find their marketing and monopolistic practices loathsome.

    What’s the relevance? The problem is that SAGE-AU’s mailing list has descended to the lowest levels of Slashdot of late. The Executive will be making some announcements soon on measures to curtail the level of vendor bashing on the lists. This is because we are putting off people who might not ask questions that are necessary for them to get their job done. For example, I haven’t seen a NetWare-specific question on the list this year. Is it because we have no NetWare people on the list, or is it because the NetWare people are fearful of being slagged by both the MS and Unix weenies? This is not professional behaviour, dudes!

    Whatever the reason, the SAGE-AU Executive have decided to take some action to curtail advocacy or just plain emotive slagging. There’s no point in voicing the opinion that OS x is not stable or is unsuitable to a particular task, particularly when the admin asking the question might already be using OS x in that situation quite happily. They may only have a small problem that would make their life easier if someone else on the list has already solved it.