Server Clustering





Okay, so you're a systems manager responsible for a mid- to large-range client-server system that is increasingly mission critical: it needs to run on a 7x24 basis with minimum downtime. You want to ensure that your information, housed on multiple servers, can be manipulated reliably, but you're not quite sure how big your network is going to become, so it must also be scalable. You know you want more processing power and an SMP infrastructure just doesn't cut it any more. Or, tediously, you may have budget constraints which prevent you from implementing a network of choice and force you to implement a network of basic needs.

A possible solution, and one which is getting a lot of attention thanks to the behemoth which is Microsoft, is clustering. Clustering has been around for centuries in IT time but until now has existed mainly at the high end on proprietary Unix architectures with offerings from companies such as Digital, Hewlett Packard and Tandem. So Microsoft didn't invent the concept. But it may well be the architect of its reinvention with a project centred around Windows NT Server - known as the Wolfpack Consortium - which is designed to proliferate in mainstream enterprise systems. Broad industry support virtually guarantees that Wolfpack (final name not yet released) will evolve into an industry standard.

More than 100 vendors contributed to the Wolfpack Open Process in 1995, which assisted Microsoft in developing new interfaces. More than 300 vendors and customers are testing the product in beta. As Jonathan Eunice, an analyst with US technology research firm Illuminata, said last year, 'In five years every serious server will be clustered - it's a simple matter of not wanting to put all your processors in the one basket.' Yet there are some vendors willing to go down the hardware path of scaling up - witness the development of ccNUMA by players such as Data General, an architecture which comprises multiple Standard High Volume motherboards with four Intel Pentium Pro processors, 512KB of cache per processor, 3GB of memory per node and dual PCI I/O channels. But more on that later.

Clustering

Clustering, according to a Hewlett Packard white paper released in June 1996, is the connection and tight integration of multiple servers for the purpose of increasing server availability (or reliability) and performance. Physically it may comprise two or more servers linked by a cable system and/or a shared disk subsystem enclosure connected to each server and running enabling software. Clustering can be implemented at a number of levels within a system, including the single storage subsystem, the device level (NetFrame) and the operating system (DEC Unix, Unix vendors, Microsoft, Novell).

At the subsystem level both Oracle and Lotus have developed product, and at the middleware layer NetFrame has made Novell's Network Directory Servers cluster-aware with its ClusterData product. The net effect is that the workload is distributed over the clustered servers. When one fails, the load is redistributed without noticeable disruption to the user. One server can be used as the primary server to provide files and applications to users, with the remaining server available only for taking over in the event of a failure by the primary server.

Another approach is enabling the second server to do application processing and also enabling it to take over from the primary server in the event of failure. In this case both servers can be entirely independent with their own disks (meaning that data needs to be continually copied from one to the other), they can be cabled to the same disk (although retaining their own disks, eliminating data copy) or multiple servers can concurrently share the same disks. Both of the last two options would utilise redundant mirrored disks or RAID technology to ensure data availability.
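The difference between a standby server and an active secondary server can be sketched in a few lines of Python. This is only an illustration of the idea described above, not any vendor's implementation: the `Server` class, the `fail_over` function and the workload names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical cluster member holding a list of named workloads."""
    name: str
    healthy: bool = True
    workloads: list = field(default_factory=list)


def fail_over(servers, failed_name):
    """Mark one server as failed and hand its workloads to the survivors."""
    failed = next(s for s in servers if s.name == failed_name)
    failed.healthy = False
    survivors = [s for s in servers if s.healthy]
    if not survivors:
        raise RuntimeError("no healthy server left to take over")
    orphaned, failed.workloads = failed.workloads, []
    for job in orphaned:
        # Give each orphaned workload to the least loaded survivor.
        target = min(survivors, key=lambda s: len(s.workloads))
        target.workloads.append(job)
    return survivors


# Standby approach: the second server is idle until the primary fails.
primary = Server("primary", workloads=["files", "mail"])
standby = Server("standby")
fail_over([primary, standby], "primary")

# Active secondary approach: both servers do useful work before the failure.
node_a = Server("node_a", workloads=["files"])
node_b = Server("node_b", workloads=["database"])
fail_over([node_a, node_b], "node_a")
```

In the standby case the survivor starts empty and inherits everything; in the active case it carries its own work plus the failed server's, which is why sizing matters more in that configuration.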

For users who could potentially lose significant revenue due to failure, the mirrored server clustering approach enables servers to run the identical code and operations, and retain the same data in separate mass storage subsystems. But it does require application modification. Server clustering has a number of advantages, not least of which is availability or the ability to continue operations when a server crashes by 'failing over' to another server. The cluster software is then able to recover and redistribute data with unfilled user requests being restarted and completed by the failover server.
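As a rough sketch of what 'failing over' means for requests in flight, the Python below simulates a primary server that crashes part-way through a queue of requests, with the unfilled ones restarted and completed on the failover server. The class and function names, and the `crash_after` parameter, are assumptions made purely for illustration.

```python
from collections import deque


class ClusterNode:
    """Hypothetical server that records the requests it has completed."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.completed = []

    def process(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.completed.append(request)


def run_with_failover(requests, primary, backup, crash_after):
    """Serve requests on the primary; after a simulated crash, restart the
    unfilled requests on the backup. Returns the node that ended up active."""
    pending = deque(requests)
    active = primary
    while pending:
        if active is primary and len(primary.completed) == crash_after:
            primary.alive = False  # simulated crash of the primary
        request = pending[0]       # stays queued until it really completes
        try:
            active.process(request)
            pending.popleft()
        except ConnectionError:
            if active is backup:
                raise              # nothing left to fail over to
            active = backup        # fail over and retry the same request
    return active


primary = ClusterNode("primary")
backup = ClusterNode("backup")
final = run_with_failover(["r1", "r2", "r3", "r4"], primary, backup, crash_after=2)
```

The key point is that a request is only removed from the queue once a server has actually completed it, so nothing is lost when the primary goes down.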

Performance benefits, depending on the configuration of the cluster, also accrue from having multiple servers act against an application or database. Scalability is another advantage, as the spread of the mission-critical network into the less esoteric business world has meant an increase in the use of PC LAN servers which contain the information crucial to the running of the enterprise.

Applications, users and data are added daily to this networking environment, additions which tick over every so often into the requirement for additional server capabilities. The advent of the Wolfpack Consortium raises the possibility of clustering solutions for the PC LAN enterprise, solutions which will be standards-based and non-proprietary.

The Wolfpack Consortium

The Wolfpack Consortium consists of six vendors: Microsoft, Tandem, Digital Equipment Corporation (DEC), Hewlett Packard, NCR and Compaq. The players have been instructed to say publicly that each is contributing equally to the development of Wolfpack. DEC and Tandem have software cross-licensing agreements in place. Microsoft has paid Tandem $US35 million to port its ServerWare middleware (including database, transaction processing and messaging software) to the NT platform. 'This is something that Tandem has been doing for 20 years in a proprietary environment,' said Peter Neuhold, Tandem's Windows NT Product Manager. 'Microsoft understands that if they want to drive NT as an enterprise operating system, something that is really going to tackle the high end in terms of the middleware, database and transaction processing capabilities, they have to have the high end covered,' Neuhold added.

Neuhold admits that Microsoft will probably build its own software to cover this area (although not in time for the scheduled release of Wolfpack) and that Tandem faces a challenge to continue to provide differentiated product which is superior to Microsoft's. 'They would like their product to have the capabilities of our product,' Neuhold said. 'The day they inherit the capability of our product is the day that our product goes out of business.'

Neuhold predicts that Windows NT will quickly gain market acceptance due to the mindshare it has in the industry and the amount of resource that Microsoft can throw behind it. While NT may not be a Unix-killer, it will take a lot of the high-end market share from Unix. The general market acceptance that Windows NT does not scale well past eight processors in an SMP environment is another factor which has led Microsoft to pursue the clustering concept.

The genesis of the partnership with Tandem lies in needs from both sides. Microsoft needs Tandem's experience in clustering proprietary systems, using its Non Stop Kernel architecture, and Tandem, under the leadership of CEO Roel Pieper, is driving its technology to become standard in an open platform environment. In addition Pieper has initiated a culture change within Tandem: 'Previously we would have innovated and developed technology and then kept it as a differentiator,' Neuhold said. 'Now we are innovating and developing technology and licensing it to other vendors.'

Developments most relevant to the cluster debate include the licensing of the ServerWare middleware platform and the ServerNet interconnection infrastructure - both of which will be incorporated in Wolfpack. In addition ServerNet has been licensed to Compaq, Dell, NEC and Siemens Nixdorf/Pyramid, while software houses such as Veritas, Unisys and Computer Associates have announced they will write enterprise applications based on or ported to ServerWare.

ServerWare is an ANSI-standard product suite, including SQL database, Tandem and Tuxedo transaction processing environments and an API transaction processing environment. It sits on top of Microsoft's Wolfpack automatic restart package which provides automatic failover. ServerWare leverages scalability and reliability onto that platform, redistributing the workload in the case of a server failure.

ServerNet on the other hand is the interconnect mechanism, which has been designed as a series of routers on a board which enables the connection of devices within a computer (also known as a Systems Area Network or SAN). 'This was conceived as an intrasystem connectivity method - a method of connecting components in a computer to overcome the problems of bus-based architecture,' Neuhold said. But, under the Wolfpack scheme ServerNet will be used to connect systems together in an NT environment, providing the benefits of scalability, reliability and the ability to do direct any-to-any input/output. Neuhold said Tandem, along with other vendors implementing NT clustering solutions, faces the decision of whether to retain their own product (Cluster Availability Solution) over Wolfpack or drop it in favour of what is expected to become the de facto industry standard.



Wolfpack itself will comprise a cluster alias and IP cluster address, multiple failover objects (disks, databases, applications, NTFS file services), automatic failback, manual failover of objects, reconfiguration without reboot, a GUI for NT integrated administration and automatic Windows client reconnect.
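How a cluster alias, failover objects, manual failover and automatic failback fit together can be sketched as below. This is emphatically not the Wolfpack API, which is not described in this article; `ClusterAlias`, its methods and the node and resource names are all invented for illustration.

```python
class ClusterAlias:
    """Hypothetical model: clients address one alias, and the alias tracks
    which node currently owns each failover object (disk, database, ...)."""

    def __init__(self, alias, nodes):
        self.alias = alias
        self.up = {node: True for node in nodes}
        self.preferred = {}  # failover object -> preferred owner node
        self.owner = {}      # failover object -> current owner node

    def add_resource(self, resource, preferred_node):
        self.preferred[resource] = preferred_node
        self.owner[resource] = preferred_node

    def move(self, resource, node):
        """Manual failover of a single object to a chosen node."""
        if not self.up[node]:
            raise ValueError(f"{node} is not available")
        self.owner[resource] = node

    def node_down(self, node):
        """Automatic failover: objects on the failed node move to a survivor."""
        self.up[node] = False
        survivors = [n for n, ok in self.up.items() if ok]
        if not survivors:
            raise RuntimeError("no surviving node in the cluster")
        for resource, current in self.owner.items():
            if current == node:
                self.owner[resource] = survivors[0]

    def node_up(self, node):
        """Automatic failback: objects return to their preferred owner."""
        self.up[node] = True
        for resource, preferred in self.preferred.items():
            if preferred == node:
                self.owner[resource] = node

    def resolve(self, resource):
        """What a client effectively asks the alias: who serves this now?"""
        return self.owner[resource]


cluster = ClusterAlias("CLUSTER1", ["node-a", "node-b"])
cluster.add_resource("sql-database", "node-a")
cluster.add_resource("file-share", "node-b")

cluster.node_down("node-a")
after_failover = cluster.resolve("sql-database")

cluster.node_up("node-a")
after_failback = cluster.resolve("sql-database")
```

Because clients only ever address the alias, they need not know which physical node is serving them, which is what makes automatic client reconnect possible in principle.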

It was first announced in October 1995 and is expected to be available for two-node systems in either Q2 or Q3 this year, depending on who you talk to. Beta testing is being carried out on cluster configurations from the six Wolfpack Consortium members. Primary benefits will be fault tolerance and ease of administration, particularly for 7x24 environments. N-node systems will not be available until mid to late 1998 and will add scalability.

Microsoft's Terry Clancy said NT has evolved to the point where it is mature, scalable and beginning to attract the interest of users. 'The design is standard and nonvariable with a wide array of software written to it and for the platform, so there is choice and it is a competitive market place with good value for money,' Clancy said.

Intel and Digital Equipment Corporation are also working on a common chipset for an eight-processor NT machine, after which 'the value for money will improve dramatically and further squeeze what is left of the high end computer market in Unix and mainframe environments'. Compaq's part of the bargain is to develop the hardware server platform as well as the fibre channel shared storage device. In return, it hopes to advance its vision of becoming one of the top three computer companies in the world.

'To do that we need to move into a broader market appeal and one of the markets is this high availability segment,' said Tony Bill, Systems Product Marketing Manager, Compaq. Until now external storage devices have been connected by SCSI cable, but Compaq will demonstrate the next generation, fibre channel, at its worldwide conference in Houston in April. 'When we get into a cluster, SCSI will not have the bandwidth whereas the fibre channel can pump terabytes of data through (as opposed to megabytes with SCSI cable),' Bill said. 'Wolfpack is going to give customers standards-based clustering and a cost effective upgrade path - if a customer can buy servers incrementally and cluster them together then he is not spending money on equipment he may not use.'

What else is out there?

In addition to being a member of the Wolfpack Consortium, HP is a cluster vendor in its own right and will seek to leverage both the Windows NT and Unix platforms. HP's 9000 series servers are marketed as enterprise clusters maintaining compliance with open systems, enabling the use of unmodified applications and standards-based hardware such as Ethernet and SCSI cable. The company also manufactures a number of accompanying products, including MC/ServiceGuard (specific to the HP 9000 Unix server), HP ClusterView (monitoring and management), the HP Enterprise Switch Model 266 and fibre channel fabric (interconnection).

MC/ServiceGuard and Wolfpack provide the focus for HP in the current server clustering climate with MC/ServiceGuard being ported to Windows NT for HP NetServers. MC/ServiceGuard is an application which provides high availability to cluster servers, monitors the health of system components and provides restart or reroute capabilities in the event of a server failure, via the active secondary server approach.

Hewlett Packard will also support Oracle's Parallel Server (OPS), a competitor to Tandem's ServerWare product and a share-everything environment. OPS enables each cluster node to run part of the Oracle database code and to access a shared database, ensuring data integrity and consistency, and enabling additional performance against the database. The database can be accessed via another server should a node fail.

But the bigger picture for HP is its belief that clustering should be deployed only in situations where it is adequately managed. This fits firmly with HP's pushing of its OpenView/ClusterView network monitoring and management platform from the high end Unix environment down to the PC LAN server environment - an environment it sees as providing the same levels of functionality as Unix within a matter of months.

'Although large Unix and mainframe vendors offer clustering products superior to those now available for PC LAN servers, the gap between them is closing rapidly,' said an HP white paper on clustering. IBM is another vendor with a foot in both camps. IBM is actively involved in the distribution of Wolfpack to its PC server customer base. At the same time it is porting its Phoenix range of enterprise clustering solutions to Windows NT, which will enable advanced systems measurement and network health monitoring.

Microsoft and IBM will carry out certification of IBM PC server configurations to run the Wolfpack software. They are also negotiating the details of the distribution agreement.

Novell?

For the moment NetFrame is staying put on the Novell side of the fence. But Pat Ryan, Pre-Sales System Engineer, said that the company intends to run Wolfpack on its 9000 systems. While Novell's SFT III and SMP strategies provide cluster-like capabilities, these have not been pushed as clustering as such. However, NetFrame's push to have Novell rewrite its Client 32 software interface between a Windows 95 client and the server is 'probably getting them to look at it a little bit earlier than they had thought'.

'To make our clustering work correctly we have had to get Novell to make some changes to Client 32 which may become the standard for all networks,' Ryan said. 'We have always been involved in clustering where we have had multiple processors in one machine, not where one fails another takes over.' NetFrame's implementation of clustering is focussed around its ClusterData and ClusterSystem 9000, following on from its 1988 release of multi-processor server architecture (MPSA) cluster servers.

ClusterData and ClusterSystem 9000 will be released during Q2 of this year, with Ryan attributing the delay to Novell rewriting its code. NetFrame's commitment is to extend cluster servers into the mainstream PC LAN environment using ClusterData which is an N-way (each server has own users and workload) and an N+1-way (standby system backs up primary system) clustering solution. It will be implemented in both the ClusterSystem 9000 and the ClusterServer 8500 series.
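The distinction between N-way and N+1-way clustering, as defined above, can be shown with a small Python sketch. The function names and the sample server and user names are hypothetical, and the round-robin redistribution is only one plausible policy, not NetFrame's documented behaviour.

```python
def n_way_takeover(assignments, failed):
    """N-way: every server has its own users, so the survivors share out
    the failed server's users between them (round-robin here)."""
    survivors = {s: list(u) for s, u in assignments.items() if s != failed}
    if not survivors:
        raise RuntimeError("no surviving server")
    names = sorted(survivors)
    for i, user in enumerate(assignments[failed]):
        survivors[names[i % len(names)]].append(user)
    return survivors


def n_plus_one_takeover(assignments, standby, failed):
    """N+1-way: an idle standby system takes over the failed server's users,
    leaving the other primaries untouched."""
    result = {s: list(u) for s, u in assignments.items() if s != failed}
    result[standby] = list(assignments[failed])
    return result


assignments = {"s1": ["ann", "bob"], "s2": ["cy"], "s3": ["di"]}

n_way = n_way_takeover(assignments, "s1")
n_plus_one = n_plus_one_takeover(assignments, "spare", "s1")
```

N-way uses all the hardware all the time but loads the survivors more heavily after a failure; N+1 keeps performance steady at the cost of one machine that normally sits idle.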



Throw Hardware at a Software Problem

Another approach to the conundrum of scaling Windows NT for the high end in an SMP environment is the one taken by Data General and co (NCR, Sequent, Tricord). They have employed a concept called Non Uniform Memory Access (NUMA) to scale NT past the four or eight processors that Microsoft says is the maximum. Symmetrical multiprocessing, where several microprocessors are connected for the purposes of sharing memory and access to data, has been the preferred infrastructure for enterprise computing to date. But, as mentioned before, it does have scalability problems in certain environments.

ccNUMA is designed to overcome these problems. The architecture comprises a shared memory system, using hierarchical memories, intelligent resource management and multiple input/output paths. This means that the processor-memory interaction is transparent to the user. Because it doesn't matter where the memory is stored, vast numbers of processors can be connected - effectively constructing a massively parallel system. The only drawback is availability, which NUMA does not have in spades. This is what Neuhold calls the 'throw hardware at a software problem' approach.
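The 'non uniform' part of NUMA can be illustrated with a toy model: software sees one flat address space, but the cost of an access depends on whether the memory lives on the requesting processor's own node. The `NumaMachine` class is hypothetical, and the cost figures of 1 and 3 are arbitrary illustrative units, not measurements of any real system.

```python
class NumaMachine:
    """Toy model of a shared address space spread across several nodes."""

    def __init__(self, nodes, words_per_node, local_cost=1, remote_cost=3):
        self.words_per_node = words_per_node
        self.memory = [[0] * words_per_node for _ in range(nodes)]
        self.local_cost = local_cost    # arbitrary units, assumed
        self.remote_cost = remote_cost  # arbitrary units, assumed

    def home_node(self, address):
        """Which node physically holds this address."""
        return address // self.words_per_node

    def write(self, address, value):
        self.memory[self.home_node(address)][address % self.words_per_node] = value

    def read(self, cpu_node, address):
        """Same call whatever the location; only the cost differs."""
        home = self.home_node(address)
        value = self.memory[home][address % self.words_per_node]
        cost = self.local_cost if home == cpu_node else self.remote_cost
        return value, cost


machine = NumaMachine(nodes=2, words_per_node=4)
machine.write(5, 42)                  # address 5 lives on node 1

local_read = machine.read(1, 5)       # processor on node 1: local access
remote_read = machine.read(0, 5)      # processor on node 0: remote access
```

Both reads return the same value through the same interface, which is the transparency the architecture promises; the difference shows up only in the access cost.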

But a number of vendors, particularly Unix vendors, see the NUMA approach as valid and have product in the pipeline, if not already available, including SCO, Hewlett Packard, Silicon Graphics, DEC, Fujitsu and NCR. For example, NCR apparently can run 32-way systems (predominantly for Unix but also for NT). Users will not need to alter their existing software base because NUMA will run existing applications. Data General has, despite its investment in NUMA (with the NUMALiiNE range expected to be available first quarter), recently released its NT Cluster-in-a-Box solution. This includes two Intel-based AViiON servers, a fault tolerant CLARiiON RAID technology storage system and its own NTAlert problem detection functionality combined with Veritas Software Corporation's Firstwatch for NT.

Stay tuned for the jury's verdict: the prosecution (Wolfpack) or the defendant (proprietary Unix clustering). It may be a hung jury, considering NUMA and Novell. Whatever the outcome, it should be an interesting verdict and one that will very possibly decide the future of high-end enterprise computing.


Last revised: Saturday, 12 April 1997