US20050125557A1 - Transaction transfer during a failover of a cluster controller - Google Patents

Transaction transfer during a failover of a cluster controller

Info

Publication number
US20050125557A1
Authority
US
United States
Prior art keywords
server
cluster
node
failure
servers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/730,349
Inventor
Bharath Vasudevan
Nam Nguyen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dell Products LP
Original Assignee
Dell Products LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dell Products LP
Priority to US10/730,349
Assigned to DELL PRODUCTS L.P.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: NGUYEN, NAM, VASUDEVAN, BHARATH
Publication of US20050125557A1
Assigned to BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS FIRST LIEN COLLATERAL AGENT: PATENT SECURITY AGREEMENT (NOTES). Assignors: APPASSURE SOFTWARE, INC., ASAP SOFTWARE EXPRESS, INC., BOOMI, INC., COMPELLENT TECHNOLOGIES, INC., CREDANT TECHNOLOGIES, INC., DELL INC., DELL MARKETING L.P., DELL PRODUCTS L.P., DELL SOFTWARE INC., DELL USA L.P., FORCE10 NETWORKS, INC., GALE TECHNOLOGIES, INC., PEROT SYSTEMS CORPORATION, SECUREWORKS, INC., WYSE TECHNOLOGY L.L.C.
Assigned to BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT: PATENT SECURITY AGREEMENT (ABL). Assignors: APPASSURE SOFTWARE, INC., ASAP SOFTWARE EXPRESS, INC., BOOMI, INC., COMPELLENT TECHNOLOGIES, INC., CREDANT TECHNOLOGIES, INC., DELL INC., DELL MARKETING L.P., DELL PRODUCTS L.P., DELL SOFTWARE INC., DELL USA L.P., FORCE10 NETWORKS, INC., GALE TECHNOLOGIES, INC., PEROT SYSTEMS CORPORATION, SECUREWORKS, INC., WYSE TECHNOLOGY L.L.C.
Assigned to BANK OF AMERICA, N.A., AS COLLATERAL AGENT: PATENT SECURITY AGREEMENT (TERM LOAN). Assignors: APPASSURE SOFTWARE, INC., ASAP SOFTWARE EXPRESS, INC., BOOMI, INC., COMPELLENT TECHNOLOGIES, INC., CREDANT TECHNOLOGIES, INC., DELL INC., DELL MARKETING L.P., DELL PRODUCTS L.P., DELL SOFTWARE INC., DELL USA L.P., FORCE10 NETWORKS, INC., GALE TECHNOLOGIES, INC., PEROT SYSTEMS CORPORATION, SECUREWORKS, INC., WYSE TECHNOLOGY L.L.C.
Assigned to DELL INC., WYSE TECHNOLOGY L.L.C., DELL USA L.P., FORCE10 NETWORKS, INC., CREDANT TECHNOLOGIES, INC., ASAP SOFTWARE EXPRESS, INC., SECUREWORKS, INC., APPASSURE SOFTWARE, INC., DELL MARKETING L.P., DELL SOFTWARE INC., COMPELLANT TECHNOLOGIES, INC., DELL PRODUCTS L.P., PEROT SYSTEMS CORPORATION: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT
Assigned to DELL INC., PEROT SYSTEMS CORPORATION, WYSE TECHNOLOGY L.L.C., DELL PRODUCTS L.P., CREDANT TECHNOLOGIES, INC., COMPELLENT TECHNOLOGIES, INC., ASAP SOFTWARE EXPRESS, INC., SECUREWORKS, INC., DELL SOFTWARE INC., APPASSURE SOFTWARE, INC., FORCE10 NETWORKS, INC., DELL MARKETING L.P., DELL USA L.P.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: BANK OF AMERICA, N.A., AS COLLATERAL AGENT
Assigned to CREDANT TECHNOLOGIES, INC., APPASSURE SOFTWARE, INC., DELL SOFTWARE INC., FORCE10 NETWORKS, INC., WYSE TECHNOLOGY L.L.C., ASAP SOFTWARE EXPRESS, INC., DELL USA L.P., DELL MARKETING L.P., DELL PRODUCTS L.P., SECUREWORKS, INC., PEROT SYSTEMS CORPORATION, COMPELLENT TECHNOLOGIES, INC., DELL INC.: RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare

Definitions

  • the present disclosure relates, in general, to the field of information handling systems and, more particularly, to computer clusters having a failover mechanism.
  • An information handling system generally processes, compiles, stores and/or communicates information or data for business, personal or other purposes, thereby allowing users to take advantage of the value of the information.
  • information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, as well as how quickly and efficiently the information may be processed, stored, or communicated.
  • the variations in information handling systems allow for information handling systems to be general, or to be configured for a specific user or a specific use, such as financial transaction processing, airline reservations, enterprise data storage, or global communications.
  • information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information, and may include one or more computer systems, data storage systems, and networking systems, e.g., computer, personal computer workstation, portable computer, computer server, print server, network router, network hub, network switch, storage area network disk array, redundant array of independent disks (“RAID”) system and telecommunications switch.
  • a server cluster is a group of independent servers that is managed as a single system for higher availability, manageability, and scalability.
  • a server cluster is composed of two or more servers that are connected by a network.
  • the cluster must have a method for each server to access the other server's disk data.
  • some software application is needed to manage the cluster.
  • One such management tool is the Microsoft Cluster Server® (“MSCS”), which is produced by the Microsoft Corporation of Redmond, Wash.
  • Clustering typically involves the configuring of a group of independent servers so that the servers appear on a network as a single machine.
  • clusters are managed as a single system, share a common namespace, and are designed specifically to tolerate component failures and to support the addition or subtraction of components in a transparent manner.
  • cluster configurations with several active nodes are possible.
  • An active node in a high-availability (“HA”) cluster hosts some application, and a passive node waits for an active node to fail so that the passive node can host the failed node's application.
  • Cluster applications have their data on shared storage area network (“SAN”) attached disks that are accessible by all of the nodes.
  • the node that hosts an application can own the application's shared disks.
  • the applications remain spread across different nodes of the cluster, there arises a requirement to have a cluster backup solution that is completely SAN based, using a shared tape library that is accessible by all of the nodes of the cluster.
  • there is also a need for the solution to the problem to be failover aware because the applications may reside on different (failover or backup) nodes at different points in time during the backup cycle.
  • the surviving server accesses the failed server's disk data via one of three techniques that computer clusters use to make disk data available to more than one server: shared disks, mirrored disks, and simply not sharing information.
  • a flexible alternative to shared disks is to let each server have its own disks, and to run software that “mirrors” every write from one server to a copy of the data on at least one other server. This useful technique keeps data at a disaster recovery site in sync with a primary server.
  • a large number of disk-mirroring solutions is available today. Many of the mirroring vendors also offer cluster-like HA extensions that can switch workload over to a different server using a mirrored copy of data. However, mirrored-disk failover solutions cannot deliver the scalability benefits of clusters.
  • mirrored-disk failover solutions can never deliver as high a level of availability and manageability as the shared-disk clustering solutions since there is always a finite amount of time during the mirroring operation in which the data at both servers is not one hundred percent (100%) identical.
  • modern cluster solutions employ a “shared nothing” architecture in which each server owns its own disk resources (that is, the servers share “nothing” at any point in time).
  • a shared-nothing cluster has software that can transfer ownership of a disk from one server to another. This provides the same high level of availability as shared-disk clusters, and potentially higher scalability since it does not have the inherent bottleneck of a DLM. Best of all, a shared-nothing cluster works with standard applications since there are no special disk access requirements.
  • MSCS clusters provide high availability to customers by providing a server failover capability. If a server goes down due to either a hardware or a software failure, the remaining nodes within the cluster will assume the load that was being handled by the failed server and will resume operation to the failed server's clients. In order to increase uptime, other techniques to improve fault tolerance within a server, such as hot plug components, redundant adapters and multiple network interfaces, are also implemented in customer environments.
  • When a cluster node receives a request, the node processes that request and returns a result.
  • resources will fail over to the remaining nodes only after a series of retries has failed. While the retries are failing, any requests that are resident (queued) in the now-failed cluster node will either time out or return to the client with an error message. These timeouts or bad returns happen because of the failure of the node. If the client issued the request from a cluster-aware application, the client will have to retry the request after the timeout.
  • the client request will fail, and the client will need to resend (retry) the request manually.
  • the timeout or failure is needless because another node in the cluster should have serviced the failed node's requests.
  • a system and method are provided for transferring the transaction queue of a first server within a cluster to one or more other servers within the same cluster when the first server is unable to perform the transactions. All servers within the cluster are provided with a heartbeat mechanism that is monitored by one or more other servers within the cluster. If a process on a server becomes unstable, or if a problem with the infrastructure of the cluster prevents that server from servicing a transaction request, then all or part of the transaction queue from the first server can be transferred to one or more other servers within the cluster so that the client requests (transactions) can be serviced.
  • a method for managing a cluster that enables the servicing of requests even when a node and/or section of the cluster is inoperative.
  • a method for employing a heartbeat mechanism between nodes of the cluster enables the detection of problems so that failover operations can be conducted in the event of a failure.
  • a transaction queue from one server can be moved to another server, or to another set of servers.
  • a copy of the transaction queues for each of the servers within the cluster can be stored in a shared source so that, if one server fails completely and is unable to transfer its transaction queue to another server, the copy of the transaction queue that is stored in the shared data source can be transferred to one or more servers so that the failed server's transactions can be completed by another server.
  • the system may also include a plurality of computing platforms communicatively coupled to the first node. These computing platforms may be, for example, a collection of networked personal computers and/or a set of server computers.
  • the system may also include a Fibre Channel (“FC”) switch communicatively coupled to the first node and to a plurality of storage resources.
  • The FC switch may, in some embodiments, include a central processing unit operable to execute a resource management engine.
  • a system and method incorporating teachings of the present disclosure may provide significant improvements over conventional cluster resource backup/failover solutions.
  • teachings of the present disclosure may facilitate other ways to reallocate workload among servers and/or nodes within a cluster in case of a failure of any node or portion of the cluster infrastructure.
  • Other technical advantages should be apparent to one of ordinary skill in the art in view of the specification, claims, and drawings.
  • FIG. 1 is a block diagram illustrating an information handling system capable of implementing all or part of the present disclosure
  • FIG. 2 is a block diagram illustrating a two-node failover cluster
  • FIG. 3 is a block diagram illustrating a two-node failover cluster with a failed storage path
  • FIG. 4 is a block diagram illustrating a two-node failover cluster implementing an embodiment of the present disclosure.
  • the present disclosure provides a cluster with a set of nodes, each node being capable of transferring its outstanding transaction queue to the surviving nodes using the cluster heartbeat.
  • the cluster heartbeat is a dedicated link between the cluster nodes which tells every other node that the node is active and operating properly. If a failure is detected within a cluster node (e.g., network, hardware, storage, interconnects, etc.), then a failover will be initiated.
  • the present disclosure relates to all conditions where the cluster heartbeat is still intact and the failing node is still able to communicate to other node(s) in the cluster. Examples of such a failure are failure of a path to the storage system, and failure of an application.
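  • As a rough illustration of the distinction drawn above, the following Python sketch separates a node whose heartbeat has gone silent from a node that is still beating but reports a fault, such as a lost storage path. The class name `HeartbeatMonitor`, the three status strings, and the 3-second threshold are assumptions made for this example; the disclosure does not specify them.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node (hypothetical helper)."""
    timeout_s: float = 3.0  # assumed threshold; the disclosure gives no value
    last_seen: dict = field(default_factory=dict)
    reported_faults: dict = field(default_factory=dict)

    def beat(self, node, fault=None, now=None):
        """Record a heartbeat; a node that is still alive may attach a fault report."""
        now = time.monotonic() if now is None else now
        self.last_seen[node] = now
        if fault is None:
            self.reported_faults.pop(node, None)
        else:
            self.reported_faults[node] = fault

    def status(self, node, now=None):
        """Return 'silent', 'degraded' (beating but faulty), or 'healthy'."""
        now = time.monotonic() if now is None else now
        seen = self.last_seen.get(node)
        if seen is None or now - seen > self.timeout_s:
            return "silent"
        if node in self.reported_faults:
            return "degraded"
        return "healthy"
```

  • In this sketch, the "degraded" status corresponds to the case the disclosure addresses: the heartbeat is intact, so the failing node can still hand its work to a peer.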
  • the surviving nodes can serve outstanding client requests after assuming the load from the failed node, without waiting for the requests to time out.
  • the present disclosure helps non-cluster-aware clients survive a cluster node failure. Instead of becoming disconnected because of a failed request, the client can switch the connection to the new node.
  • Elements of the present disclosure can be implemented on a computer system as illustrated in FIG. 1 .
  • the information handling system is a computer system.
  • the information handling system generally referenced by the numeral 100 , comprises processors 110 and associated voltage regulator modules (“VRMs”) 112 configured as processor nodes 108 .
  • a north bridge 140 which may also be referred to as a “memory controller hub” or a “memory controller,” is coupled to a main system memory 150 .
  • the north bridge 140 is coupled to the processors 110 via the host bus 120 .
  • the north bridge 140 is generally considered an application specific chip set that provides connectivity to various buses, and integrates other system functions such as memory interface.
  • an INTEL® 820E and/or INTEL® 815E chip set available from the Intel Corporation of Santa Clara, Calif., provides at least a portion of the north bridge 140 .
  • the chip set may also be packaged as an application specific integrated circuit (“ASIC”).
  • the north bridge 140 typically includes functionality to couple the main system memory 150 to other devices within the information handling system 100 .
  • memory controller functions such as main memory control functions typically reside in the north bridge 140 .
  • the north bridge 140 provides bus control to handle transfers between the host bus 120 and a second bus(es), e.g., PCI bus 170 and AGP bus 171 , the AGP bus 171 being coupled to the AGP video 172 and/or the video display 174 .
  • the second bus may also comprise other industry standard buses or proprietary buses, e.g., ISA, SCSI, USB buses 168 through a south bridge (bus interface) 162 .
  • These secondary buses 168 may have their own interfaces and controllers, e.g., RAID storage system 160 and input/output interface(s) 164 .
  • BIOS 180 is operative with the information handling system 100 , as illustrated in FIG. 1 .
  • the information handling system 100 can be combined with other like systems to form larger systems.
  • the information handling system 100 can be combined with other elements, such as networking elements, to form even larger and more complex information handling systems.
  • FIG. 2 illustrates a two-node failover cluster.
  • the shared storage unit can be a storage area network (“SAN”) or other device that can store information emanating from the server A (node) 204 and server B (node) 206 .
  • the servers 204 and 206 are connected to the shared data 202 via storage interconnections 203 a and 203 b, as illustrated in FIG. 2 .
  • the server nodes 204 and 206 are connected to a virtual cluster internet protocol (“IP”) address 211 via, for example, Ethernet interconnections 207 a and 207 b.
  • Clients 210 that issue requests to and receive responses from the cluster 200 are connected to the virtual cluster IP address 211 by, for example, the local area network 212 . Between the servers is the cluster heartbeat signal channel 205 . Each of the servers of the cluster has at least one heartbeat line 205 that is monitored by at least one other server.
  • the heartbeat 205 is used to determine, for example, some type of application failure, or some type of failure of the paths emanating from the server in question, such as the path 203 to the shared data 202 , or the path 207 to the virtual cluster IP address 211 . For example, if the path 207 a had failed, the server A 204 would still be operative, but it would not be able to send the results back to the clients 210 because the path to the virtual cluster IP address 211 was blocked.
  • the heartbeat from the server A 204 would still be active (as determined by the server B 206 via the heartbeat connection 205 ) so that the server B 206 could either take over the workload of the server A 204 , or server A 204 could merely be commanded to convey the results of its work through server B 206 and its operative connection 207 b to the client 210 before timeout of the request.
  • server A 204 would not be able to store its results, but could, via the heartbeat connection 205 , convey its connection problems to server B 206 , which could then service the storage request of server A 204 .
  • a copy of the request queue of each of the various servers of the cluster 200 is stored on the shared data 202 .
  • another server could also take over in the event of failure of one of the server nodes.
  • each path and element within the cluster 200 is designated to be in one of three modes: active (operative and in use); passive (operative but not in use); and failed (inoperative). Failed paths are routed around. Other elements are tasked with the work of a failed element. In other words, when a node within the cluster 200 cannot complete its task (either because the node is inoperative, or its connections to its clients are inoperative) then the outstanding transaction queue of the failed node is transferred to one or more surviving nodes using the cluster heartbeat 205 .
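  • The three modes and the queue hand-off described above might be modeled as follows. This is a minimal sketch rather than the disclosed implementation: the `Node` class, the `transfer_queue` function, and the round-robin distribution across survivors are assumptions chosen for illustration.

```python
from collections import deque
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"    # operative and in use
    PASSIVE = "passive"  # operative but not in use
    FAILED = "failed"    # inoperative


class Node:
    def __init__(self, name, mode=Mode.ACTIVE):
        self.name = name
        self.mode = mode
        self.queue = deque()  # outstanding transactions, oldest first


def transfer_queue(failed, survivors):
    """Move every outstanding transaction from `failed` to operative survivors.

    Transactions are dealt out round-robin (an assumption; the disclosure
    only says "one or more surviving nodes").
    """
    targets = [n for n in survivors if n.mode is not Mode.FAILED]
    if not targets:
        raise RuntimeError("no surviving node can accept the queue")
    failed.mode = Mode.FAILED
    i = 0
    while failed.queue:
        target = targets[i % len(targets)]
        target.queue.append(failed.queue.popleft())
        target.mode = Mode.ACTIVE  # a passive node becomes active once tasked
        i += 1
    return targets
```

  • Dealing transactions out in order keeps the oldest requests at the front of each survivor's queue, which matters when the goal is to answer before the client's time-out.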
  • the heartbeat 205 detects whether the path 203 to the storage system is operative and/or if an application on a server node is responsive, and if not, the outstanding requests for that application, or for the node as a whole, can be transferred to one of the other nodes on the cluster 200 .
  • FIGS. 3 and 4 are but illustrative examples of the myriad ways in which the present disclosure can accommodate a failure of one or more elements of the cluster 200 .
  • the example of FIG. 3 does not have the benefit of the present disclosure. Instead, the example of FIG. 3 illustrates what happens when the present disclosure is not implemented.
  • the path 203 a is inoperative, precluding access to the shared data 202 from server A 204 .
  • the server A 204 could not service its request, and the requests tasked to server A 204 would eventually time out (expire).
  • In step 304, the time out of the transaction would be detected and the transaction rejected by the cluster.
  • In step 306, the transactions that were tasked to server A 204 are re-requested by the one or more clients that initiated the first set of requests.
  • the second set of requests is serviced by a second server B 206 .
  • In step 308, the input/output (“I/O”) transaction is received/sent to the common storage (shared data) 202 .
  • the second server B 206 successfully services the requests in step 310 , and the example method ends generally at step 312 .
  • the transaction queue of the first server A 204 is transferred to a second server B 206 within the cluster 200 in step 406 .
  • the I/O transaction is sent to the common (shared data) storage 202 .
  • the second server B 206 can service the transaction queue of the first server A 204 successfully in step 410 and the method ends generally at step 412 .
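  • The difference between the two flows can be made concrete with a small timing model. The sketch below is an illustrative simplification, not a measurement: the function names and all parameter values are hypothetical, and the FIG. 3 branch ignores any delay before the client re-requests.

```python
def completion_time(timeout_s, detect_s, service_s, transfer_queue):
    """Seconds from the original request until server B returns a result.

    All parameters are illustrative: `timeout_s` is the client time-out,
    `detect_s` is how long the cluster takes to notice that server A cannot
    reach storage, and `service_s` is server B's service time.
    """
    if transfer_queue:
        # FIG. 4 style: A's queue moves to B as soon as the fault is detected.
        return detect_s + service_s
    # FIG. 3 style: the request must time out, then the client re-requests.
    return timeout_s + service_s


def client_sees_error(timeout_s, detect_s, service_s, transfer_queue):
    """True when the original request times out before a result arrives."""
    if not transfer_queue:
        return True
    return completion_time(timeout_s, detect_s, service_s, True) > timeout_s
```

  • With an assumed 30-second time-out, 2-second detection, and 1-second service time, the model gives 31 seconds without queue transfer and 3 seconds with it. The transfer only helps if detection plus service finishes inside the time-out window.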
  • the transaction queues of the various servers 204 , 206 , etc. are copied to the shared data storage 202 . While such duplexing of the transaction queues marginally increases network traffic, it can prove useful if one of the servers 204 , 206 fails completely.
  • the server itself was still functional. Because the server itself was still functional, it was able to transfer its transaction queue to another server via, for example, the heartbeat connection 205 . In this example, however, the server itself is inoperative, and is unable to transfer its transaction queue to another server.
  • a copy of each server's transaction queue is (routinely) stored on the shared data storage 202 .
  • the copy of the transaction queue that resides on the shared data storage 202 is then transferred to the second server B 206 to service the transactions, preferably before the time out on the requesting client device.
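  • One way to picture the duplexed queue is sketched below. The `SharedStore` class is an in-memory stand-in for the shared data storage 202, and the method names and the save-on-every-change policy are assumptions for this example; the disclosure says only that the copy is stored routinely.

```python
class SharedStore:
    """Stands in for the shared data storage 202 (an in-memory dict here)."""

    def __init__(self):
        self.queues = {}

    def save(self, server_name, transactions):
        self.queues[server_name] = list(transactions)  # store a copy

    def take(self, server_name):
        """Remove and return the saved queue, or an empty list if none exists."""
        return self.queues.pop(server_name, [])


class Server:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.queue = []

    def enqueue(self, txn):
        self.queue.append(txn)
        self.store.save(self.name, self.queue)  # duplex every change (assumed policy)

    def complete(self, txn):
        self.queue.remove(txn)
        self.store.save(self.name, self.queue)

    def adopt_queue_of(self, dead_name):
        """Pull the dead server's last saved queue and append it to our own."""
        recovered = self.store.take(dead_name)
        for txn in recovered:
            self.enqueue(txn)
        return recovered
```

  • Because the survivor reads the copy from shared storage, the hand-off works even when the failed server cannot send anything itself.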
  • the device that determines whether or not a server can perform a transaction can be another server, or a specialized device, or a cluster management service that is running on another server or node within the cluster.
  • another server within the cluster detects (through the heartbeat mechanism 205 ) that a problem with one server exists, and that server attempts to handle the failed server's transactions.
  • the second server can receive the failed server's transaction queue via the heartbeat mechanism 205 , or through another route on the network within the cluster 200 , or by obtaining a copy of the transaction queue from the shared data source 202 .
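  • The three routes listed above could be tried in sequence, as in the following sketch. The function name `obtain_queue`, the use of `ConnectionError` to signal an unavailable route, and the idea of a fixed preference order are all assumptions for illustration; the disclosure does not rank the routes.

```python
def obtain_queue(routes):
    """Return (route_name, queue) from the first route that yields a queue.

    `routes` is an ordered list of (name, fetch) pairs, where `fetch` is a
    zero-argument callable that returns a list of transactions or raises
    ConnectionError when that route is unavailable.
    """
    errors = {}
    for name, fetch in routes:
        try:
            return name, fetch()
        except ConnectionError as exc:
            errors[name] = str(exc)
    raise RuntimeError(f"no route to the transaction queue: {errors}")
```

  • A caller might list the heartbeat link first, another cluster network path second, and the copy on the shared data source last, so that the stored copy serves as the fallback when the failed server cannot respond at all.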

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

An apparatus, system, and method are provided for causing a failed node in a cluster to transfer its outstanding transaction queue to one or more surviving nodes. The heartbeat of the various cluster nodes is used to monitor the nodes within the cluster as the heartbeat is a dedicated link between the cluster nodes. If a failure is detected anywhere within the cluster node (such as a network section, hardware failure, storage device, or interconnections) then a failover procedure will be initiated. The failover procedure includes transferring the transaction queue from the failed node to one or more other nodes within the cluster so that the transactions can be serviced, preferably before a time out period, so that clients are not prompted to re-request the transaction.

Description

    TECHNICAL FIELD OF THE DISCLOSURE
  • The present disclosure relates, in general, to the field of information handling systems and, more particularly, to computer clusters having a failover mechanism.
    BACKGROUND OF THE RELATED ART
  • As the value and the use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores and/or communicates information or data for business, personal or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, as well as how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general, or to be configured for a specific user or a specific use, such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information, and may include one or more computer systems, data storage systems, and networking systems, e.g., computer, personal computer workstation, portable computer, computer server, print server, network router, network hub, network switch, storage area network disk array, redundant array of independent disks (“RAID”) system and telecommunications switch.
  • Computers, such as servers or workstations, are often grouped in clusters in order to perform specific tasks. A server cluster is a group of independent servers that is managed as a single system for higher availability, manageability, and scalability. As a minimum, a server cluster is composed of two or more servers that are connected by a network. In addition, the cluster must have a method for each server to access the other server's disk data. Finally, some software application is needed to manage the cluster. One such management tool is the Microsoft Cluster Server® (“MSCS”), which is produced by the Microsoft Corporation of Redmond, Wash. Clustering typically involves the configuring of a group of independent servers so that the servers appear on a network as a single machine. Often, clusters are managed as a single system, share a common namespace, and are designed specifically to tolerate component failures and to support the addition or subtraction of components in a transparent manner.
  • With the advent of eight-node clusters, cluster configurations with several active nodes (up to eight active nodes) are possible. An active node in a high-availability (“HA”) cluster hosts some application, and a passive node waits for an active node to fail so that the passive node can host the failed node's application. Cluster applications have their data on shared storage area network (“SAN”) attached disks that are accessible by all of the nodes. At any point in time, only the node that hosts an application can own the application's shared disks. In this scenario, where the applications remain spread across different nodes of the cluster, there arises a requirement to have a cluster backup solution that is completely SAN based, using a shared tape library that is accessible by all of the nodes of the cluster. Moreover, there is also a need for the solution to the problem to be failover aware because the applications may reside on different (failover or backup) nodes at different points in time during the backup cycle.
  • When a cluster is recovering from a server failure, the surviving server accesses the failed server's disk data via one of three techniques that computer clusters use to make disk data available to more than one server: shared disks, mirrored disks, and simply not sharing information.
  • The earliest server clusters permitted every server to access every disk. This originally required expensive cabling and switches, plus specialized software and applications. (The specialized software that mediates access to shared disks is generally called a Distributed Lock Manager (“DLM”).) Today, standards like SCSI have eliminated the requirement for expensive cabling and switches. However, shared-disk clustering still requires specially modified applications. This means it is not broadly useful for the variety of applications deployed on the millions of servers sold each year. Shared-disk clustering also has inherent limits on scalability since DLM contention grows exponentially as servers are added to the cluster. Examples of shared-disk clustering solutions include Digital VAX Clusters available from Hewlett-Packard Company, of Palo Alto, Calif., and Oracle Parallel Server available from Oracle Corporation of Redwood Shores, Calif.
  • A flexible alternative to shared disks is to let each server have its own disks, and to run software that “mirrors” every write from one server to a copy of the data on at least one other server. This useful technique keeps data at a disaster recovery site in sync with a primary server. A large number of disk-mirroring solutions is available today. Many of the mirroring vendors also offer cluster-like HA extensions that can switch workload over to a different server using a mirrored copy of data. However, mirrored-disk failover solutions cannot deliver the scalability benefits of clusters. It is also arguable that mirrored-disk failover solutions can never deliver as high a level of availability and manageability as the shared-disk clustering solutions since there is always a finite amount of time during the mirroring operation in which the data at both servers is not one hundred percent (100%) identical.
  • In response to the limitations of shared-disk clustering, modern cluster solutions employ a “shared nothing” architecture in which each server owns its own disk resources (that is, the servers share “nothing” at any point in time). In case of a server failure, a shared-nothing cluster has software that can transfer ownership of a disk from one server to another. This provides the same high level of availability as shared-disk clusters, and potentially higher scalability since it does not have the inherent bottleneck of a DLM. Best of all, a shared-nothing cluster works with standard applications since there are no special disk access requirements.
  • MSCS clusters provide high availability to customers by providing a server failover capability. If a server goes down due to either a hardware or a software failure, the remaining nodes within the cluster will assume the load that was being handled by the failed server and will resume operation to the failed server's clients. In order to increase uptime, other techniques to improve fault tolerance within a server, such as hot plug components, redundant adapters and multiple network interfaces, are also implemented in customer environments.
  • When a cluster node receives a request, the node processes that request and returns a result. Within the current MSCS implementation, after the failure of a node, resources will fail over to the remaining nodes only after a series of retries has failed. While the retries are failing, any requests that are resident (queued) in the now-failed cluster node will either time out or return to the client with an error message. These timeouts or bad returns happen because of the failure of the node. If the client issued the request from a cluster-aware application, the client will have to retry the request after the timeout. However, if the client did not issue the request from a cluster-aware application, the client request will fail, and the client will need to resend the request manually. In either case, however, the timeout or failure is needless because another node in the cluster should have serviced the failed node's requests. There is, therefore, a need in the art for a failover system that will not allow workable requests to be neglected until the timeout period, and there is a further need to relieve the client from retrying a request in case of a node failure.
  • SUMMARY OF THE INVENTION
  • In accordance with the present disclosure, a system and method are provided for transferring the transaction queue of a first server within a cluster to one or more other servers within the same cluster when the first server is unable to perform the transactions. All servers within the cluster are provided with a heartbeat mechanism that is monitored by one or more other servers within the cluster. If a process on a server becomes unstable, or if a problem with the infrastructure of the cluster prevents that server from servicing a transaction request, then all or part of the transaction queue from the first server can be transferred to one or more other servers within the cluster so that the client requests (transactions) can be serviced.
  • According to one aspect of the present disclosure, a method for managing a cluster is provided that enables the servicing of requests even when a node and/or section of the cluster is inoperative. According to another aspect of the present disclosure, a method for employing a heartbeat mechanism between nodes of the cluster enables the detection of problems so that failover operations can be conducted in the event of a failure. According to another aspect of the present disclosure, during a failover operation, a transaction queue from one server can be moved to another server, or to another set of servers. Similarly, a copy of the transaction queues for each of the servers within the cluster can be stored in a shared source so that, if one server fails completely and is unable to transfer its transaction queue to another server, the copy of the transaction queue that is stored in the shared data source can be transferred to one or more servers so that the failed server's transactions can be completed by another server.
  • In one embodiment, the system may also include a plurality of computing platforms communicatively coupled to the first node. These computing platforms may be, for example, a collection of networked personal computers and/or a set of server computers. The system may also include a Fibre Channel (“FC”) switch communicatively coupled to the first node and to a plurality of storage resources. The FC switch may, in some embodiments, include a central processing unit operable to execute a resource management engine. A system and method incorporating teachings of the present disclosure may provide significant improvements over conventional cluster resource backup/failover solutions. In addition, the teachings of the present disclosure may facilitate other ways to reallocate workload among servers and/or nodes within a cluster in case of a failure of any node or portion of the cluster infrastructure. Other technical advantages should be apparent to one of ordinary skill in the art in view of the specification, claims, and drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • A more complete understanding of the present disclosure and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
  • FIG. 1 is a block diagram illustrating an information handling system capable of implementing all or part of the present disclosure;
  • FIG. 2 is a block diagram illustrating a two-node failover cluster;
  • FIG. 3 is a block diagram illustrating a two-node failover cluster with a failed storage path; and
  • FIG. 4 is a block diagram illustrating a two-node failover cluster implementing an embodiment of the present disclosure.
  • The present disclosure may be susceptible to various modifications and alternative forms. Specific exemplary embodiments thereof are shown by way of example in the drawing and are described herein in detail. It should be understood, however, that the description set forth herein of specific embodiments is not intended to limit the present disclosure to the particular forms disclosed. Rather, all modifications, alternatives, and equivalents falling within the spirit and scope of the invention as defined by the appended claims are intended to be covered.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The present disclosure provides a cluster with a set of nodes, each node being capable of transferring its outstanding transaction queue to the surviving nodes using the cluster heartbeat. The cluster heartbeat is a dedicated link between the cluster nodes which tells every other node that the node is active and operating properly. If a failure is detected within a cluster node (e.g., network, hardware, storage, interconnects, etc.), then a failover will be initiated. The present disclosure relates to all conditions where the cluster heartbeat is still intact and the failing node is still able to communicate to other node(s) in the cluster. Examples of such a failure are failure of a path to the storage system, and failure of an application. With the present disclosure, the surviving nodes can serve outstanding client requests after assuming the load from the failed node without waiting until after the requests time out. Thus, the present disclosure helps make non-cluster-aware clients survive a cluster node failure. Instead of becoming disconnected because of a failed request, the client can switch the connection to the new node. Elements of the present disclosure can be implemented on a computer system as illustrated in FIG. 1.
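The liveness role of the heartbeat described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the class name `HeartbeatMonitor`, its methods, and the timeout value are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic. The sketch covers only the basic "is the node still reporting" check; the partial failures that the disclosure focuses on leave the heartbeat intact and are not modeled here.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor that tracks when each peer last sent a heartbeat."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s            # assumed silence threshold, in seconds
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat received from `node` at time `now`."""
        self.last_seen[node] = now

    def silent_nodes(self, now: float) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat("A", now=0.0)
monitor.beat("B", now=0.0)
monitor.beat("B", now=4.0)   # B keeps reporting, A has gone quiet
print(monitor.silent_nodes(now=5.0))
```

In this sketch a node that stops reporting is flagged once its silence exceeds the assumed threshold, which is the point at which a failover could be initiated.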
  • Referring to FIG. 1, depicted is an information handling system having electronic components mounted on at least one printed circuit board (“PCB”) (not shown) and communicating data and control signals therebetween over signal buses. In one embodiment, the information handling system is a computer system. The information handling system, generally referenced by the numeral 100, comprises processors 110 and associated voltage regulator modules (“VRMs”) 112 configured as processor nodes 108. There may be one or more processors 110, VRMs 112 and processor nodes 108, illustrated in FIG. 1 by nodes 110 a and 110 b, 112 a and 112 b and 108 a and 108 b, respectively. A north bridge 140, which may also be referred to as a “memory controller hub” or a “memory controller,” is coupled to a main system memory 150. The north bridge 140 is coupled to the processors 110 via the host bus 120. The north bridge 140 is generally considered an application specific chip set that provides connectivity to various buses, and integrates other system functions such as memory interface. For example, an INTEL® 820E and/or INTEL® 815E chip set, available from the Intel Corporation of Santa Clara, Calif., provides at least a portion of the north bridge 140. The chip set may also be packaged as an application specific integrated circuit (“ASIC”). The north bridge 140 typically includes functionality to couple the main system memory 150 to other devices within the information handling system 100. Thus, memory controller functions such as main memory control functions typically reside in the north bridge 140. In addition, the north bridge 140 provides bus control to handle transfers between the host bus 120 and a second bus(es), e.g., PCI bus 170 and AGP bus 171, the AGP bus 171 being coupled to the AGP video 172 and/or the video display 174. 
The second bus may also comprise other industry standard buses or proprietary buses, e.g., ISA, SCSI, USB buses 168 through a south bridge (bus interface) 162. These secondary buses 168 may have their own interfaces and controllers, e.g., RAID storage system 160 and input/output interface(s) 164. Finally, a BIOS 180 is operative with the information handling system 100, as illustrated in FIG. 1. The information handling system 100 can be combined with other like systems to form larger systems. Moreover, the information handling system 100 can be combined with other elements, such as networking elements, to form even larger and more complex information handling systems.
  • FIG. 2 illustrates a two-node failover cluster. At the base of the cluster 200 is the shared storage unit 202. The shared storage unit can be a storage area network (“SAN”) or other device that can store information emanating from the server A (node) 204 and server B (node) 206. The servers 204 and 206 are connected to the shared data 202 via storage interconnections 203 a and 203 b, as illustrated in FIG. 2. The server nodes 204 and 206 are connected to a virtual cluster internet protocol (“IP”) address 211 via, for example, Ethernet interconnections 207 a and 207 b. Clients 210 that issue requests to and receive responses from the cluster 200 are connected to the virtual cluster IP address 211 by, for example, the local area network 212. Between the servers is the cluster heartbeat signal channel 205. Each of the servers of the cluster has at least one heartbeat line 205 that is monitored by at least one other server.
  • In one embodiment of the present disclosure, the heartbeat 205 is used to determine, for example, some type of application failure, or some type of failure of the paths emanating from the server in question, such as the path 203 to the shared data 202, or the path 207 to the virtual cluster IP address 211. For example, if the path 207 a had failed, the server A 204 would still be operative, but it would not be able to send the results back to the clients 210 because the path to the virtual cluster IP address 211 was blocked. In those cases, the heartbeat from the server A 204 would still be active (as determined by the server B 206 via the heartbeat connection 205) so that the server B 206 could either take over the workload of the server A 204, or server A 204 could merely be commanded to convey the results of its work through server B 206 and its operative connection 207 b to the client 210 before timeout of the request. Similarly, if the connection 203 a between the server A 204 and the shared data 202 becomes inoperative, server A 204 would not be able to store its results, but could, via the heartbeat connection 205, convey its connection problems to server B 206, which could then service the storage request of server A 204.
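One way to picture the cases in the preceding paragraph is a status report sent over the heartbeat connection, from which the monitoring server chooses a response. The Python sketch below is an illustration under stated assumptions: the `Heartbeat` fields, the `Action` names, and the ordering of the checks in `decide` are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "no action"
    RELAY_RESULTS = "relay results to clients through the peer"
    SERVICE_STORAGE = "peer services the storage requests"
    TAKE_OVER = "peer takes over the workload"


@dataclass
class Heartbeat:
    """Hypothetical status report carried over the heartbeat connection (205)."""
    node: str
    client_path_ok: bool    # e.g. path 207a to the virtual cluster IP address
    storage_path_ok: bool   # e.g. path 203a to the shared data
    application_ok: bool


def decide(hb: Heartbeat) -> Action:
    """Choose a response on the monitoring server; the policy is an assumption."""
    if not hb.application_ok:
        return Action.TAKE_OVER
    if not hb.storage_path_ok:
        return Action.SERVICE_STORAGE
    if not hb.client_path_ok:
        return Action.RELAY_RESULTS
    return Action.NONE


# Server A is running, but its path to the virtual cluster IP has failed.
print(decide(Heartbeat("A", client_path_ok=False,
                       storage_path_ok=True, application_ok=True)))
```

The key point the sketch tries to capture is that server A remains able to report its own condition, so server B can act before the client request times out.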
  • In another embodiment of the present disclosure, a copy of the request queue of each of the various servers of the cluster 200 is stored on the shared data 202. In that case, not only can another server take over for the failed server in case of interruption of the return path (to the virtual cluster IP address 211) or the storage interconnect (to the shared data 202), but another server could also take over in the event of failure of one of the server nodes.
  • In operation, each path and element (e.g., a node) within the cluster 200 is designated to be in one of three modes: active (operative and in use); passive (operative but not in use); and failed (inoperative). Failed paths are routed around. Other elements are tasked with the work of a failed element. In other words, when a node within the cluster 200 cannot complete its task (either because the node is inoperative, or its connections to its clients are inoperative) then the outstanding transaction queue of the failed node is transferred to one or more surviving nodes using the cluster heartbeat 205. The heartbeat 205 detects whether the path 203 to the storage system is operative and/or if an application on a server node is responsive, and if not, the outstanding requests for that application, or for the node as a whole, can be transferred to one of the other nodes on the cluster 200.
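The three modes and the queue transfer described above can be sketched in Python as follows. This is a simplified model, not the disclosed implementation: the names `Mode`, `pick_survivor`, and `fail_over` are hypothetical, and the preference for a passive node over an active one is an assumed policy.

```python
from collections import deque
from enum import Enum
from typing import Deque, Dict, Optional


class Mode(Enum):
    ACTIVE = "active"    # operative and in use
    PASSIVE = "passive"  # operative but not in use
    FAILED = "failed"    # inoperative


def pick_survivor(modes: Dict[str, Mode], failed: str) -> Optional[str]:
    """Prefer a passive node, then an active one (an assumed policy)."""
    for wanted in (Mode.PASSIVE, Mode.ACTIVE):
        for name, mode in modes.items():
            if name != failed and mode is wanted:
                return name
    return None


def fail_over(modes: Dict[str, Mode],
              queues: Dict[str, Deque[str]],
              failed: str) -> Optional[str]:
    """Mark a node failed and move its outstanding queue to a survivor."""
    modes[failed] = Mode.FAILED
    target = pick_survivor(modes, failed)
    if target is None:
        return None                      # no operative node remains
    while queues[failed]:
        # Preserve request order while draining the failed node's queue.
        queues[target].append(queues[failed].popleft())
    modes[target] = Mode.ACTIVE          # the survivor is now in use
    return target


modes = {"A": Mode.ACTIVE, "B": Mode.PASSIVE}
queues = {"A": deque(["req-1", "req-2"]), "B": deque()}
print(fail_over(modes, queues, "A"), list(queues["B"]))
```

Draining with `popleft` and `append` keeps the requests in their original order, which is a design choice of this sketch rather than a requirement stated in the text.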
  • The operation of the present disclosure is better understood with the aid of the flowcharts of FIGS. 3 and 4. It should be noted that the flowcharts of FIGS. 3 and 4 are but illustrative examples of the myriad ways in which the present disclosure can accommodate a failure of one or more elements of the cluster 200. The example of FIG. 3 does not have the benefit of the present disclosure. Instead, the example of FIG. 3 illustrates what happens when the present disclosure is not implemented. In this example scenario 300, the path 203 a is inoperative, precluding access to the shared data 202 from server A 204. In that scenario, the server A 204 could not service its request, and the requests tasked to server A 204 would eventually time out (expire). Thus, in step 304, the time out of the transaction would be detected and the transaction rejected by the cluster. In step 306, the transactions that were tasked to server A 204 are re-requested by the one or more clients that initiated the first set of requests. However, unlike the first set of requests, the second set of requests is serviced by a second server B 206. In step 308, the input/output (“I/O”) transaction is received/sent to the common storage (shared data) 202. Finally, the second server B 206 successfully services the requests in step 310, and the example method ends generally at step 312.
  • The example of FIG. 4 illustrates a different result to the same scenario of FIG. 3 when the present disclosure is implemented. The method 400 begins generally at step 402. As before, the path 203 a to the shared data 202 is inoperative, preventing access to the shared data 202 by the first server A 204. As the first server A 204 cannot access the shared data 202, it cannot service its requests. It will be understood that other failures, such as failure of an application on the first server A 204, or failure of the server A 204 itself, can cause similar problems. In any case, the heartbeat mechanism 205 will detect the problem with the first server 204 in step 404 of FIG. 4. Once the problem has been detected, the transaction queue of the first server A 204 is transferred to a second server B 206 within the cluster 200 in step 406. In step 408, the I/O transaction is sent to the common (shared data) storage 202. With access to the shared data storage 202, the second server B 206 can service the transaction queue of the first server A 204 successfully in step 410 and the method ends generally at step 412. It will be understood that the ability to detect problems in one server, and then to transfer transaction queues of the affected server to one or more other servers within the cluster, can overcome a myriad of failures and other problems besides the ones described herein.
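The difference between the two scenarios can be expressed as simple arithmetic on the time a client waits. The Python sketch below is illustrative only: the function names are hypothetical, and every duration is an assumed value chosen for the example, not a figure from the disclosure.

```python
def latency_without_transfer(timeout_ms: int, service_ms: int) -> int:
    """FIG. 3 scenario: the request times out, the client re-requests,
    and server B then services the second request."""
    return timeout_ms + service_ms


def latency_with_transfer(detect_ms: int, transfer_ms: int, service_ms: int) -> int:
    """FIG. 4 scenario: the problem is detected, the queue is transferred,
    and server B services the original request."""
    return detect_ms + transfer_ms + service_ms


# Assumed, illustrative durations in milliseconds.
TIMEOUT_MS, DETECT_MS, TRANSFER_MS, SERVICE_MS = 30_000, 1_000, 20, 50

print(latency_without_transfer(TIMEOUT_MS, SERVICE_MS))            # 30050
print(latency_with_transfer(DETECT_MS, TRANSFER_MS, SERVICE_MS))   # 1070
```

Under these assumed values the client waits far less when the queue is transferred, and it never has to reissue the request; the actual benefit depends on how the detection interval compares with the client timeout.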
  • In another embodiment of the present disclosure, the transaction queues of the various servers 204, 206, etc., are copied to the shared data storage 202. While such duplexing of the transaction queues marginally increases network traffic, it can prove useful if one of the servers 204, 206 fails completely. In the scenarios described above, although a process on the server, or a portion of the cluster infrastructure servicing the server, was unstable or inoperative, the server itself was still functional. Because the server itself was still functional, it was able to transfer its transaction queue to another server via, for example, the heartbeat connection 205. In this example, however, the server itself is inoperative, and is unable to transfer its transaction queue to another server. To recover from such a failure, a copy of each server's transaction queue is (routinely) stored on the shared data storage 202. In case the first server A 204 fails completely (i.e., in a way that it is unable to transfer the transaction queue to another server) as detected by another server via the heartbeat mechanism 205, the copy of the transaction queue that resides on the shared data storage 202 is then transferred to the second server B 206 to service the transactions, preferably before the time out on the requesting client device. As with the other examples noted above, the device that determines whether or not a server can perform a transaction (or trigger a failover event) can be another server, or a specialized device, or a cluster management service that is running on another server or node within the cluster. In the examples above, another server within the cluster detects (through the heartbeat mechanism 205) that a problem with one server exists, and that server attempts to handle the failed server's transactions. 
The second server can receive the failed server's transaction queue via the heartbeat mechanism 205, or through another route on the network within the cluster 200, or by obtaining a copy of the transaction queue from the shared data source 202.
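The shared-storage embodiment can be sketched in Python as follows. This is a minimal model under stated assumptions: `SharedStore` is an in-memory stand-in for the shared data storage 202, and the `Server` class with its `enqueue` and `recover_for` methods is hypothetical. Clearing the stored copy after adoption is a design choice of the sketch, intended to keep a second survivor from replaying the same transactions.

```python
from typing import Dict, List


class SharedStore:
    """In-memory stand-in for the shared data storage (202)."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[str]] = {}

    def save(self, node: str, queue: List[str]) -> None:
        self._queues[node] = list(queue)   # store a copy, not a reference

    def load(self, node: str) -> List[str]:
        return list(self._queues.get(node, []))


class Server:
    """Hypothetical server that duplexes its transaction queue to the store."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store
        self.queue: List[str] = []

    def enqueue(self, request: str) -> None:
        self.queue.append(request)
        self.store.save(self.name, self.queue)   # copy the queue on every change

    def recover_for(self, failed: str) -> List[str]:
        """Adopt the failed server's stored queue and service it."""
        adopted = self.store.load(failed)
        self.store.save(failed, [])              # assumed: prevent a second replay
        return [f"{self.name} serviced {request}" for request in adopted]


store = SharedStore()
server_a, server_b = Server("A", store), Server("B", store)
server_a.enqueue("txn-1")
server_a.enqueue("txn-2")
# Server A fails completely; server B reads A's queue from the shared store.
print(server_b.recover_for("A"))
```

Because server B reads the queue from the store rather than from server A, the recovery in this sketch does not depend on server A being able to communicate at all.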
  • The invention, therefore, is well adapted to carry out the objects and to attain the ends and advantages mentioned, as well as others inherent therein. While the invention has been depicted, described, and is defined by reference to exemplary embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts and having the benefit of this disclosure. The depicted and described embodiments of the invention are exemplary only, and are not exhaustive of the scope of the invention. Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.

Claims (11)

1. A method for failover in a cluster having two or more servers, the two or more servers operative with each other by a heartbeat mechanism, comprising:
detecting a failure of a first server of the two or more servers;
transferring a transaction queue from the first server to a second server of the two or more servers; and
servicing the transactions of the transaction queue of the first server by the second server.
2. The method of claim 1, wherein detecting comprises detecting a failure via the heartbeat mechanism.
3. The method of claim 2, wherein the failure is an unstable application.
4. The method of claim 2, wherein the failure is a data path.
5. The method of claim 1, wherein transferring comprises:
forwarding the transaction queue from the first server to the second server via the heartbeat mechanism.
6. The method of claim 1, wherein transferring comprises:
forwarding the transaction queue from the first server to the second server via a network of the cluster.
7. A method for failover of a server in a cluster having two or more servers, the two or more servers operative with each other by a heartbeat mechanism, comprising:
copying a transaction queue from a first of the two or more servers to a shared storage device;
detecting a failure of the first server;
transferring the transaction queue from the shared storage device to a second server of the two or more servers; and
servicing the transactions of the transaction queue of the first server by the second server.
8. The method of claim 7, wherein detecting comprises detecting a failure via the heartbeat mechanism.
9. The method of claim 8, wherein the failure is an unstable application.
10. The method of claim 8, wherein the failure is a data path.
11. The method of claim 7, wherein transferring comprises:
forwarding the transaction queue from the shared data source to the second server via a network of the cluster.
US10/730,349 2003-12-08 2003-12-08 Transaction transfer during a failover of a cluster controller Abandoned US20050125557A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/730,349 US20050125557A1 (en) 2003-12-08 2003-12-08 Transaction transfer during a failover of a cluster controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/730,349 US20050125557A1 (en) 2003-12-08 2003-12-08 Transaction transfer during a failover of a cluster controller

Publications (1)

Publication Number Publication Date
US20050125557A1 true US20050125557A1 (en) 2005-06-09

Family

ID=34634142

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/730,349 Abandoned US20050125557A1 (en) 2003-12-08 2003-12-08 Transaction transfer during a failover of a cluster controller

Country Status (1)

Country Link
US (1) US20050125557A1 (en)

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138462A1 (en) * 2003-12-23 2005-06-23 Nokia, Inc. System and method for managing protocol network failures in a cluster system
US20050262143A1 (en) * 2004-05-21 2005-11-24 Rao Sudhir G Lock acquisition among nodes of divided cluster
US20050283636A1 (en) * 2004-05-14 2005-12-22 Dell Products L.P. System and method for failure recovery in a cluster network
US20050289540A1 (en) * 2004-06-24 2005-12-29 Lu Nguyen Providing on-demand capabilities using virtual machines and clustering processes
GB2415857A (en) * 2004-06-30 2006-01-04 Zarlink Semiconductor Inc Rapid end-to-end failover in a packet switched network
US20060041681A1 (en) * 2000-12-18 2006-02-23 Shaw Parsing, Llc Techniques for delivering personalized content with a real-time routing network
US20060075279A1 (en) * 2004-08-17 2006-04-06 Shaw Parsing, Llc Techniques for upstream failure detection and failure recovery
US20060117318A1 (en) * 2004-08-17 2006-06-01 Shaw Parsing, Llc Modular event-driven processing
US7114094B2 (en) 2004-01-09 2006-09-26 Hitachi, Ltd. Information processing system for judging if backup at secondary site is necessary upon failover
US20070050519A1 (en) * 2000-12-18 2007-03-01 Cano Charles E Storing state in a dynamic content routing network
US20070174723A1 (en) * 2006-01-18 2007-07-26 Omar Cardona Sub-second, zero-packet loss adapter failover
US20070239822A1 (en) * 2000-12-18 2007-10-11 Timothy Tuttle Asynchronous messaging using a node specialization architecture in the dynamic routing network
US20070294596A1 (en) * 2006-05-22 2007-12-20 Gissel Thomas R Inter-tier failure detection using central aggregation point
US20080215909A1 (en) * 2004-04-14 2008-09-04 International Business Machines Corporation Apparatus, system, and method for transactional peer recovery in a data sharing clustering computer system
US20080250266A1 (en) * 2007-04-06 2008-10-09 Cisco Technology, Inc. Logical partitioning of a physical device
US20090158082A1 (en) * 2007-12-18 2009-06-18 Vinit Jain Failover in a host concurrently supporting multiple virtual ip addresses across multiple adapters
US20100042869A1 (en) * 2008-08-18 2010-02-18 F5 Networks, Inc. Upgrading network traffic management devices while maintaining availability
US20100125557A1 (en) * 2008-11-17 2010-05-20 Microsoft Corporation Origination based conflict detection in peer-to-peer replication
US7730489B1 (en) * 2003-12-10 2010-06-01 Oracle America, Inc. Horizontally scalable and reliable distributed transaction management in a clustered application server environment
US7760719B2 (en) 2004-06-30 2010-07-20 Conexant Systems, Inc. Combined pipelined classification and address search method and apparatus for switching environments
US20100235488A1 (en) * 2004-11-08 2010-09-16 Cisco Technology, Inc. High availability for intelligent applications in storage networks
US20100251237A1 (en) * 2009-03-31 2010-09-30 International Business Machines Corporation Managing orphaned requests in a multi-server environment
US7814364B2 (en) 2006-08-31 2010-10-12 Dell Products, Lp On-demand provisioning of computer resources in physical/virtual cluster environments
US20110131448A1 (en) * 2009-11-30 2011-06-02 Iron Mountain, Incorporated Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
US20120124431A1 (en) * 2010-11-17 2012-05-17 Alcatel-Lucent Usa Inc. Method and system for client recovery strategy in a redundant server configuration
US20120159241A1 (en) * 2010-12-16 2012-06-21 Hitachi, Ltd. Information processing system
US20120166639A1 (en) * 2005-10-25 2012-06-28 Oracle International Corporation Multipath Routing Process
US20120173919A1 (en) * 2010-01-06 2012-07-05 Burzin Patel System and method for creating and maintaining secondary server sites
US8261286B1 (en) 2008-06-18 2012-09-04 Amazon Technologies, Inc. Fast sequential message store
US20130268495A1 (en) * 2012-04-09 2013-10-10 Microsoft Corporation Split brain protection in computer clusters
US8583840B1 (en) 2012-04-25 2013-11-12 Lsi Corporation Methods and structure for determining mapping information inconsistencies in I/O requests generated for fast path circuits of a storage controller
US20130332507A1 (en) * 2012-06-06 2013-12-12 International Business Machines Corporation Highly available servers
US8621603B2 (en) 2011-09-09 2013-12-31 Lsi Corporation Methods and structure for managing visibility of devices in a clustered storage system
US20140115176A1 (en) * 2012-10-22 2014-04-24 Cassidian Communications, Inc. Clustered session management
US20140164485A1 (en) * 2005-04-29 2014-06-12 Netapp, Inc. Caching of data requests in session-based environment
US9154367B1 (en) * 2011-12-27 2015-10-06 Google Inc. Load balancing and content preservation
US20160149801A1 (en) * 2013-06-13 2016-05-26 Tsx Inc. Apparatus and method for failover of device interconnect using remote memory access with segmented queue
US9395923B1 (en) * 2013-09-27 2016-07-19 Emc Corporation Method and system for recovering from embedded errors from writing data to streaming media
US9509842B2 (en) 2011-06-17 2016-11-29 Airbus Ds Communications, Inc. Collaborative and distributed emergency multimedia data management
US9800481B1 (en) * 2016-10-20 2017-10-24 International Business Machines Corporation Communicating health status when a management console is unavailable for a server in a mirror storage environment
US20180052747A1 (en) * 2016-08-19 2018-02-22 Bank Of America Corporation System for increasing intra-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US20180157429A1 (en) * 2016-12-06 2018-06-07 Dell Products L.P. Seamless data migration in a clustered environment
US20180367618A1 (en) * 2017-06-19 2018-12-20 Sap Se Event processing in background services
US10180881B2 (en) 2016-08-19 2019-01-15 Bank Of America Corporation System for increasing inter-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US10270654B2 (en) 2016-08-19 2019-04-23 Bank Of America Corporation System for increasing computing efficiency of communication between applications running on networked machines
US10362131B1 (en) * 2008-06-18 2019-07-23 Amazon Technologies, Inc. Fault tolerant message delivery
US10382380B1 (en) 2016-11-17 2019-08-13 Amazon Technologies, Inc. Workload management service for first-in first-out queues for network-accessible queuing and messaging services
US10691513B1 (en) * 2010-02-03 2020-06-23 Twitter, Inc. Distributed message queue with best consumer discovery and area preference
US10970177B2 (en) * 2017-08-18 2021-04-06 Brian J. Bulkowski Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS
US10996993B2 (en) 2019-06-20 2021-05-04 Western Digital Technologies, Inc. Adaptive work distribution in distributed systems
CN113434345A (en) * 2021-06-15 2021-09-24 浙江大华技术股份有限公司 Method, cluster, equipment, platform and storage medium for hardware cluster failure management
US20220129357A1 (en) * 2020-10-27 2022-04-28 Hitachi, Ltd. Cluster system and fail-over control method of cluster system
US11416354B2 (en) * 2019-09-05 2022-08-16 EMC IP Holding Company LLC Techniques for providing intersite high availability of data nodes in a virtual cluster

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5906658A (en) * 1996-03-19 1999-05-25 Emc Corporation Message queuing on a data storage system utilizing message queuing in intended recipient's queue
US6246666B1 (en) * 1998-04-09 2001-06-12 Compaq Computer Corporation Method and apparatus for controlling an input/output subsystem in a failed network server
US6360331B2 (en) * 1998-04-17 2002-03-19 Microsoft Corporation Method and system for transparently failing over application configuration information in a server cluster
US6393485B1 (en) * 1998-10-27 2002-05-21 International Business Machines Corporation Method and apparatus for managing clustered computer systems
US6438705B1 (en) * 1999-01-29 2002-08-20 International Business Machines Corporation Method and apparatus for building and managing multi-clustered computer systems
US20020129146A1 (en) * 2001-02-06 2002-09-12 Eyal Aronoff Highly available database clusters that move client connections between hosts
US6484276B1 (en) * 1999-10-25 2002-11-19 Lucent Technologies Inc. Method and apparatus for providing extensible object-oriented fault injection
US6490610B1 (en) * 1997-05-30 2002-12-03 Oracle Corporation Automatic failover for clients accessing a resource through a server
US20020188711A1 (en) * 2001-02-13 2002-12-12 Confluence Networks, Inc. Failover processing in a storage system
US6539494B1 (en) * 1999-06-17 2003-03-25 Art Technology Group, Inc. Internet server session backup apparatus
US20030061537A1 (en) * 2001-07-16 2003-03-27 Cha Sang K. Parallelized redo-only logging and recovery for highly available main memory database systems
US6609213B1 (en) * 2000-08-10 2003-08-19 Dell Products, L.P. Cluster-based system and method of recovery from server failures
US20030237018A1 (en) * 2002-06-25 2003-12-25 Hitachi, Ltd. Server takeover system and method
US6718383B1 (en) * 2000-06-02 2004-04-06 Sun Microsystems, Inc. High availability networking with virtual IP address failover
US6760859B1 (en) * 2000-05-23 2004-07-06 International Business Machines Corporation Fault tolerant local area network connectivity
US6763479B1 (en) * 2000-06-02 2004-07-13 Sun Microsystems, Inc. High availability networking with alternate pathing failover
US6789213B2 (en) * 2000-01-10 2004-09-07 Sun Microsystems, Inc. Controlled take over of services by remaining nodes of clustered computing system
US20040225915A1 (en) * 2003-05-09 2004-11-11 Hewlett-Packard Development Company, L.P. Minimum latency reinstatement of database transaction locks
US6832298B2 (en) * 2001-10-24 2004-12-14 Hitachi, Ltd. Server system operation control method
US6862613B1 (en) * 2000-01-10 2005-03-01 Sun Microsystems, Inc. Method and apparatus for managing operations of clustered computer systems
US6915445B2 (en) * 2002-05-08 2005-07-05 Pluris, Inc. Fault-protection mechanism for protecting multi-protocol-label switching (MPLS) capability within a distributed processor router operating in an MPLS network
US6922791B2 (en) * 2001-08-09 2005-07-26 Dell Products L.P. Failover system and method for cluster environment
US6934875B2 (en) * 2000-12-29 2005-08-23 International Business Machines Corporation Connection cache for highly available TCP systems with fail over connections
US7055053B2 (en) * 2004-03-12 2006-05-30 Hitachi, Ltd. System and method for failover
US7124320B1 (en) * 2002-08-06 2006-10-17 Novell, Inc. Cluster failover via distributed configuration repository
US7231391B2 (en) * 2001-02-06 2007-06-12 Quest Software, Inc. Loosely coupled database clusters with client connection fail-over
US7302607B2 (en) * 2003-08-29 2007-11-27 International Business Machines Corporation Two node virtual shared disk cluster recovery

Patent Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5906658A (en) * 1996-03-19 1999-05-25 Emc Corporation Message queuing on a data storage system utilizing message queuing in intended recipient's queue
US6490610B1 (en) * 1997-05-30 2002-12-03 Oracle Corporation Automatic failover for clients accessing a resource through a server
US6246666B1 (en) * 1998-04-09 2001-06-12 Compaq Computer Corporation Method and apparatus for controlling an input/output subsystem in a failed network server
US6360331B2 (en) * 1998-04-17 2002-03-19 Microsoft Corporation Method and system for transparently failing over application configuration information in a server cluster
US6393485B1 (en) * 1998-10-27 2002-05-21 International Business Machines Corporation Method and apparatus for managing clustered computer systems
US6438705B1 (en) * 1999-01-29 2002-08-20 International Business Machines Corporation Method and apparatus for building and managing multi-clustered computer systems
US6539494B1 (en) * 1999-06-17 2003-03-25 Art Technology Group, Inc. Internet server session backup apparatus
US6484276B1 (en) * 1999-10-25 2002-11-19 Lucent Technologies Inc. Method and apparatus for providing extensible object-oriented fault injection
US6862613B1 (en) * 2000-01-10 2005-03-01 Sun Microsystems, Inc. Method and apparatus for managing operations of clustered computer systems
US6789213B2 (en) * 2000-01-10 2004-09-07 Sun Microsystems, Inc. Controlled take over of services by remaining nodes of clustered computing system
US6760859B1 (en) * 2000-05-23 2004-07-06 International Business Machines Corporation Fault tolerant local area network connectivity
US6763479B1 (en) * 2000-06-02 2004-07-13 Sun Microsystems, Inc. High availability networking with alternate pathing failover
US6718383B1 (en) * 2000-06-02 2004-04-06 Sun Microsystems, Inc. High availability networking with virtual IP address failover
US6609213B1 (en) * 2000-08-10 2003-08-19 Dell Products, L.P. Cluster-based system and method of recovery from server failures
US6934875B2 (en) * 2000-12-29 2005-08-23 International Business Machines Corporation Connection cache for highly available TCP systems with fail over connections
US20020129146A1 (en) * 2001-02-06 2002-09-12 Eyal Aronoff Highly available database clusters that move client connections between hosts
US7231391B2 (en) * 2001-02-06 2007-06-12 Quest Software, Inc. Loosely coupled database clusters with client connection fail-over
US7039827B2 (en) * 2001-02-13 2006-05-02 Network Appliance, Inc. Failover processing in a storage system
US20020188711A1 (en) * 2001-02-13 2002-12-12 Confluence Networks, Inc. Failover processing in a storage system
US20030061537A1 (en) * 2001-07-16 2003-03-27 Cha Sang K. Parallelized redo-only logging and recovery for highly available main memory database systems
US6922791B2 (en) * 2001-08-09 2005-07-26 Dell Products L.P. Failover system and method for cluster environment
US6832298B2 (en) * 2001-10-24 2004-12-14 Hitachi, Ltd. Server system operation control method
US6915445B2 (en) * 2002-05-08 2005-07-05 Pluris, Inc. Fault-protection mechanism for protecting multi-protocol-label switching (MPLS) capability within a distributed processor router operating in an MPLS network
US20030237018A1 (en) * 2002-06-25 2003-12-25 Hitachi, Ltd. Server takeover system and method
US7124320B1 (en) * 2002-08-06 2006-10-17 Novell, Inc. Cluster failover via distributed configuration repository
US7100076B2 (en) * 2003-05-09 2006-08-29 Hewlett-Packard Development Company, L.P. Minimum latency reinstatement of database transaction locks
US20040225915A1 (en) * 2003-05-09 2004-11-11 Hewlett-Packard Development Company, L.P. Minimum latency reinstatement of database transaction locks
US7302607B2 (en) * 2003-08-29 2007-11-27 International Business Machines Corporation Two node virtual shared disk cluster recovery
US7055053B2 (en) * 2004-03-12 2006-05-30 Hitachi, Ltd. System and method for failover

Cited By (106)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110161458A1 (en) * 2000-12-18 2011-06-30 Shaw Parsing, Llc Techniques For Delivering Personalized Content With A Real-Time Routing Network
US20060041681A1 (en) * 2000-12-18 2006-02-23 Shaw Parsing, Llc Techniques for delivering personalized content with a real-time routing network
US7814225B2 (en) 2000-12-18 2010-10-12 Rumelhart Karl E Techniques for delivering personalized content with a real-time routing network
US9613076B2 (en) 2000-12-18 2017-04-04 Zarbaña Digital Fund Llc Storing state in a dynamic content routing network
US20070239822A1 (en) * 2000-12-18 2007-10-11 Timothy Tuttle Asynchronous messaging using a node specialization architecture in the dynamic routing network
US8505024B2 (en) 2000-12-18 2013-08-06 Shaw Parsing Llc Storing state in a dynamic content routing network
US9071648B2 (en) 2000-12-18 2015-06-30 Shaw Parsing L.L.C. Asynchronous messaging using a node specialization architecture in the dynamic routing network
US8407722B2 (en) 2000-12-18 2013-03-26 Shaw Parsing L.L.C. Asynchronous messaging using a node specialization architecture in the dynamic routing network
US10860567B2 (en) 2000-12-18 2020-12-08 Zarbaña Digital Fund Llc Storing state in a dynamic content routing network
US7930362B2 (en) 2000-12-18 2011-04-19 Shaw Parsing, Llc Techniques for delivering personalized content with a real-time routing network
US20070050519A1 (en) * 2000-12-18 2007-03-01 Cano Charles E Storing state in a dynamic content routing network
US20070033293A1 (en) * 2000-12-18 2007-02-08 Shaw Parsing, L.L.C. Techniques for delivering personalized content with a real-time routing network
US7730489B1 (en) * 2003-12-10 2010-06-01 Oracle America, Inc. Horizontally scalable and reliable distributed transaction management in a clustered application server environment
US20050138462A1 (en) * 2003-12-23 2005-06-23 Nokia, Inc. System and method for managing protocol network failures in a cluster system
US7257731B2 (en) * 2003-12-23 2007-08-14 Nokia Inc. System and method for managing protocol network failures in a cluster system
US7114094B2 (en) 2004-01-09 2006-09-26 Hitachi, Ltd. Information processing system for judging if backup at secondary site is necessary upon failover
US7870426B2 (en) * 2004-04-14 2011-01-11 International Business Machines Corporation Apparatus, system, and method for transactional peer recovery in a data sharing clustering computer system
US20080215909A1 (en) * 2004-04-14 2008-09-04 International Business Machines Corporation Apparatus, system, and method for transactional peer recovery in a data sharing clustering computer system
US20050283636A1 (en) * 2004-05-14 2005-12-22 Dell Products L.P. System and method for failure recovery in a cluster network
US20050262143A1 (en) * 2004-05-21 2005-11-24 Rao Sudhir G Lock acquisition among nodes of divided cluster
US20050289540A1 (en) * 2004-06-24 2005-12-29 Lu Nguyen Providing on-demand capabilities using virtual machines and clustering processes
US7577959B2 (en) * 2004-06-24 2009-08-18 International Business Machines Corporation Providing on-demand capabilities using virtual machines and clustering processes
US7813263B2 (en) 2004-06-30 2010-10-12 Conexant Systems, Inc. Method and apparatus providing rapid end-to-end failover in a packet switched communications network
US20060002292A1 (en) * 2004-06-30 2006-01-05 Zarlink Semiconductor Inc. Method and apparatus providing rapid end-to-end failover in a packet switched communications network
GB2415857B (en) * 2004-06-30 2006-09-20 Zarlink Semiconductor Inc Methods and apparatus providing rapid end-to-end failover in a packet switched communications network
US7760719B2 (en) 2004-06-30 2010-07-20 Conexant Systems, Inc. Combined pipelined classification and address search method and apparatus for switching environments
GB2415857A (en) * 2004-06-30 2006-01-04 Zarlink Semiconductor Inc Rapid end-to-end failover in a packet switched network
US9043635B2 (en) * 2004-08-17 2015-05-26 Shaw Parsing, Llc Techniques for upstream failure detection and failure recovery
US8397237B2 (en) 2004-08-17 2013-03-12 Shaw Parsing, L.L.C. Dynamically allocating threads from a thread pool to thread boundaries configured to perform a service for an event
US8356305B2 (en) 2004-08-17 2013-01-15 Shaw Parsing, L.L.C. Thread boundaries comprising functionalities for an event by a single thread and tasks associated with the thread boundaries configured in a defined relationship
US20070061811A1 (en) * 2004-08-17 2007-03-15 Shaw Parsing, L.L.C. Modular Event-Driven Processing
US20060117318A1 (en) * 2004-08-17 2006-06-01 Shaw Parsing, Llc Modular event-driven processing
US20060075279A1 (en) * 2004-08-17 2006-04-06 Shaw Parsing, Llc Techniques for upstream failure detection and failure recovery
US20100235488A1 (en) * 2004-11-08 2010-09-16 Cisco Technology, Inc. High availability for intelligent applications in storage networks
US8332501B2 (en) * 2004-11-08 2012-12-11 Cisco Technology, Inc. High availability for intelligent applications in storage networks
US20140164485A1 (en) * 2005-04-29 2014-06-12 Netapp, Inc. Caching of data requests in session-based environment
US8706906B2 (en) * 2005-10-25 2014-04-22 Oracle International Corporation Multipath routing process
US20120166639A1 (en) * 2005-10-25 2012-06-28 Oracle International Corporation Multipath Routing Process
US20070174723A1 (en) * 2006-01-18 2007-07-26 Omar Cardona Sub-second, zero-packet loss adapter failover
US20070294596A1 (en) * 2006-05-22 2007-12-20 Gissel Thomas R Inter-tier failure detection using central aggregation point
US7814364B2 (en) 2006-08-31 2010-10-12 Dell Products, Lp On-demand provisioning of computer resources in physical/virtual cluster environments
US8949662B2 (en) 2007-04-06 2015-02-03 Cisco Technology, Inc. Logical partitioning of a physical device
US20080250266A1 (en) * 2007-04-06 2008-10-09 Cisco Technology, Inc. Logical partitioning of a physical device
US8225134B2 (en) * 2007-04-06 2012-07-17 Cisco Technology, Inc. Logical partitioning of a physical device
US20090158082A1 (en) * 2007-12-18 2009-06-18 Vinit Jain Failover in a host concurrently supporting multiple virtual ip addresses across multiple adapters
US7913106B2 (en) * 2007-12-18 2011-03-22 International Business Machines Corporation Failover in a host concurrently supporting multiple virtual IP addresses across multiple adapters
US10362131B1 (en) * 2008-06-18 2019-07-23 Amazon Technologies, Inc. Fault tolerant message delivery
US8261286B1 (en) 2008-06-18 2012-09-04 Amazon Technologies, Inc. Fast sequential message store
US9485324B2 (en) 2008-06-18 2016-11-01 Amazon Technologies, Inc. Fast sequential message store
US8763013B2 (en) 2008-06-18 2014-06-24 Amazon Technologies, Inc. Fast sequential message store
US8209403B2 (en) 2008-08-18 2012-06-26 F5 Networks, Inc. Upgrading network traffic management devices while maintaining availability
US8438253B2 (en) 2008-08-18 2013-05-07 F5 Networks, Inc. Upgrading network traffic management devices while maintaining availability
US20100042869A1 (en) * 2008-08-18 2010-02-18 F5 Networks, Inc. Upgrading network traffic management devices while maintaining availability
US20100125557A1 (en) * 2008-11-17 2010-05-20 Microsoft Corporation Origination based conflict detection in peer-to-peer replication
US20100251237A1 (en) * 2009-03-31 2010-09-30 International Business Machines Corporation Managing orphaned requests in a multi-server environment
US8312100B2 (en) * 2009-03-31 2012-11-13 International Business Machines Corporation Managing orphaned requests in a multi-server environment
US8549536B2 (en) 2009-11-30 2013-10-01 Autonomy, Inc. Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
US20110131448A1 (en) * 2009-11-30 2011-06-02 Iron Mountain, Incorporated Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
EP2357559A1 (en) * 2009-11-30 2011-08-17 Iron Mountain Incorporated Performing a workflow having a set of dependancy-related predefined activities on a plurality of task servers
US20120173919A1 (en) * 2010-01-06 2012-07-05 Burzin Patel System and method for creating and maintaining secondary server sites
US9372809B2 (en) 2010-01-06 2016-06-21 Storsimple, Inc. System and method for storing data off site
US9189421B2 (en) 2010-01-06 2015-11-17 Storsimple, Inc. System and method for implementing a hierarchical data storage system
US9110837B2 (en) * 2010-01-06 2015-08-18 Storsimple, Inc. System and method for creating and maintaining secondary server sites
US10691513B1 (en) * 2010-02-03 2020-06-23 Twitter, Inc. Distributed message queue with best consumer discovery and area preference
US20120124431A1 (en) * 2010-11-17 2012-05-17 Alcatel-Lucent Usa Inc. Method and system for client recovery strategy in a redundant server configuration
US20120159241A1 (en) * 2010-12-16 2012-06-21 Hitachi, Ltd. Information processing system
US9509842B2 (en) 2011-06-17 2016-11-29 Airbus Ds Communications, Inc. Collaborative and distributed emergency multimedia data management
US8793443B2 (en) 2011-09-09 2014-07-29 Lsi Corporation Methods and structure for improved buffer allocation in a storage controller
US9052829B2 (en) 2011-09-09 2015-06-09 Avago Technologies General IP (Singapore) Pte Ltd Methods and structure for improved I/O shipping in a clustered storage system
US8898385B2 (en) 2011-09-09 2014-11-25 Lsi Corporation Methods and structure for load balancing of background tasks between storage controllers in a clustered storage environment
US8839030B2 (en) 2011-09-09 2014-09-16 Lsi Corporation Methods and structure for resuming background tasks in a clustered storage environment
US9134913B2 (en) 2011-09-09 2015-09-15 Avago Technologies General Ip (Singapore) Pte Ltd Methods and structure for improved processing of I/O requests in fast path circuits of a storage controller in a clustered storage system
US8806124B2 (en) 2011-09-09 2014-08-12 Lsi Corporation Methods and structure for transferring ownership of a logical volume by transfer of native-format metadata in a clustered storage environment
US8621603B2 (en) 2011-09-09 2013-12-31 Lsi Corporation Methods and structure for managing visibility of devices in a clustered storage system
US8751741B2 (en) 2011-09-09 2014-06-10 Lsi Corporation Methods and structure for implementing logical device consistency in a clustered storage system
US8984222B2 (en) 2011-09-09 2015-03-17 Lsi Corporation Methods and structure for task management in storage controllers of a clustered storage system
US9154367B1 (en) * 2011-12-27 2015-10-06 Google Inc. Load balancing and content preservation
US9146705B2 (en) * 2012-04-09 2015-09-29 Microsoft Technology Licensing, LLC Split brain protection in computer clusters
US20130268495A1 (en) * 2012-04-09 2013-10-10 Microsoft Corporation Split brain protection in computer clusters
US8583840B1 (en) 2012-04-25 2013-11-12 Lsi Corporation Methods and structure for determining mapping information inconsistencies in I/O requests generated for fast path circuits of a storage controller
US9742676B2 (en) * 2012-06-06 2017-08-22 International Business Machines Corporation Highly available servers
US10819641B2 (en) 2012-06-06 2020-10-27 International Business Machines Corporation Highly available servers
US20130332507A1 (en) * 2012-06-06 2013-12-12 International Business Machines Corporation Highly available servers
US20140115176A1 (en) * 2012-10-22 2014-04-24 Cassidian Communications, Inc. Clustered session management
US9948545B2 (en) * 2013-06-13 2018-04-17 Tsx Inc. Apparatus and method for failover of device interconnect using remote memory access with segmented queue
US20160149801A1 (en) * 2013-06-13 2016-05-26 Tsx Inc. Apparatus and method for failover of device interconnect using remote memory access with segmented queue
US9395923B1 (en) * 2013-09-27 2016-07-19 Emc Corporation Method and system for recovering from embedded errors from writing data to streaming media
US20180052747A1 (en) * 2016-08-19 2018-02-22 Bank Of America Corporation System for increasing intra-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US11106553B2 (en) * 2016-08-19 2021-08-31 Bank Of America Corporation System for increasing intra-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US10180881B2 (en) 2016-08-19 2019-01-15 Bank Of America Corporation System for increasing inter-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US10270654B2 (en) 2016-08-19 2019-04-23 Bank Of America Corporation System for increasing computing efficiency of communication between applications running on networked machines
US10459811B2 (en) * 2016-08-19 2019-10-29 Bank Of America Corporation System for increasing intra-application processing efficiency by transmitting failed processing work over a processing recovery network for resolution
US11025518B2 (en) 2016-10-20 2021-06-01 International Business Machines Corporation Communicating health status when a management console is unavailable
US9800481B1 (en) * 2016-10-20 2017-10-24 International Business Machines Corporation Communicating health status when a management console is unavailable for a server in a mirror storage environment
US10397078B2 (en) 2016-10-20 2019-08-27 International Business Machines Corporation Communicating health status when a management console is unavailable for a server in a mirror storage environment
US10382380B1 (en) 2016-11-17 2019-08-13 Amazon Technologies, Inc. Workload management service for first-in first-out queues for network-accessible queuing and messaging services
US20180157429A1 (en) * 2016-12-06 2018-06-07 Dell Products L.P. Seamless data migration in a clustered environment
US10353640B2 (en) * 2016-12-06 2019-07-16 Dell Products L.P. Seamless data migration in a clustered environment
US10652338B2 (en) * 2017-06-19 2020-05-12 Sap Se Event processing in background services
US20180367618A1 (en) * 2017-06-19 2018-12-20 Sap Se Event processing in background services
US10970177B2 (en) * 2017-08-18 2021-04-06 Brian J. Bulkowski Methods and systems of managing consistency and availability tradeoffs in a real-time operational DBMS
US10996993B2 (en) 2019-06-20 2021-05-04 Western Digital Technologies, Inc. Adaptive work distribution in distributed systems
US11416354B2 (en) * 2019-09-05 2022-08-16 EMC IP Holding Company LLC Techniques for providing intersite high availability of data nodes in a virtual cluster
US20220129357A1 (en) * 2020-10-27 2022-04-28 Hitachi, Ltd. Cluster system and fail-over control method of cluster system
US11734133B2 (en) * 2020-10-27 2023-08-22 Hitachi, Ltd. Cluster system and fail-over control method of cluster system
CN113434345A (en) * 2021-06-15 2021-09-24 浙江大华技术股份有限公司 Method, cluster, equipment, platform and storage medium for hardware cluster failure management

Similar Documents

Publication Publication Date Title
US20050125557A1 (en) Transaction transfer during a failover of a cluster controller
US7234075B2 (en) Distributed failover aware storage area network backup of application data in an active-N high availability cluster
US6389555B2 (en) System and method for fail-over data transport
US8443232B1 (en) Automatic clusterwide fail-back
US7275100B2 (en) Failure notification method and system using remote mirroring for clustering systems
US6571354B1 (en) Method and apparatus for storage unit replacement according to array priority
US6134673A (en) Method for clustering software applications
US20050108593A1 (en) Cluster failover from physical node to virtual node
US6704812B2 (en) Transparent and dynamic management of redundant physical paths to peripheral devices
US6363497B1 (en) System for clustering software applications
US6609213B1 (en) Cluster-based system and method of recovery from server failures
US7272674B1 (en) System and method for storage device active path coordination among hosts
US6766470B1 (en) Enhancing reliability and robustness of a cluster
US20110214007A1 (en) Flexible failover policies in high availability computing systems
US6070251A (en) Method and apparatus for high availability and caching data storage devices
US20070055797A1 (en) Computer system, management computer, method of managing access path
WO2013192017A1 (en) Virtual shared storage in a cluster
US7246261B2 (en) Join protocol for a primary-backup group with backup resources in clustered computer system
JP2008107896A (en) Physical resource control management system, physical resource control management method and physical resource control management program
US7711978B1 (en) Proactive utilization of fabric events in a network virtualization environment
US8683258B2 (en) Fast I/O failure detection and cluster wide failover
US7797394B2 (en) System and method for processing commands in a storage enclosure
US7590811B1 (en) Methods and system for improving data and application availability in clusters
JPH0934852A (en) Cluster system

Legal Events

Date Code Title Description
AS Assignment

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VASUDEVAN, BHARATH;NGUYEN, NAM;REEL/FRAME:014779/0230

Effective date: 20031205

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT, TEXAS

Free format text: PATENT SECURITY AGREEMENT (ABL);ASSIGNORS:DELL INC.;APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS, INC.;AND OTHERS;REEL/FRAME:031898/0001

Effective date: 20131029

Owner name: BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS FIRST LIEN COLLATERAL AGENT, TEXAS

Free format text: PATENT SECURITY AGREEMENT (NOTES);ASSIGNORS:APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS, INC.;BOOMI, INC.;AND OTHERS;REEL/FRAME:031897/0348

Effective date: 20131029

Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH CAROLINA

Free format text: PATENT SECURITY AGREEMENT (TERM LOAN);ASSIGNORS:DELL INC.;APPASSURE SOFTWARE, INC.;ASAP SOFTWARE EXPRESS, INC.;AND OTHERS;REEL/FRAME:031899/0261

Effective date: 20131029

AS Assignment

Owner name: COMPELLENT TECHNOLOGIES, INC., MINNESOTA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: WYSE TECHNOLOGY L.L.C., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: APPASSURE SOFTWARE, INC., VIRGINIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: DELL SOFTWARE INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: ASAP SOFTWARE EXPRESS, INC., ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: DELL USA L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: SECUREWORKS, INC., GEORGIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: DELL MARKETING L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: DELL INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: CREDANT TECHNOLOGIES, INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: PEROT SYSTEMS CORPORATION, TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

Owner name: FORCE10 NETWORKS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS ADMINISTRATIVE AGENT;REEL/FRAME:040065/0216

Effective date: 20160907

AS Assignment

Owner name: COMPELLENT TECHNOLOGIES, INC., MINNESOTA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL USA L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: CREDANT TECHNOLOGIES, INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: FORCE10 NETWORKS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: WYSE TECHNOLOGY L.L.C., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: APPASSURE SOFTWARE, INC., VIRGINIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL MARKETING L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: ASAP SOFTWARE EXPRESS, INC., ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: PEROT SYSTEMS CORPORATION, TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: SECUREWORKS, INC., GEORGIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL SOFTWARE INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A., AS COLLATERAL AGENT;REEL/FRAME:040040/0001

Effective date: 20160907

Owner name: DELL PRODUCTS L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: FORCE10 NETWORKS, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: COMPELLENT TECHNOLOGIES, INC., MINNESOTA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: DELL SOFTWARE INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: SECUREWORKS, INC., GEORGIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: DELL USA L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: PEROT SYSTEMS CORPORATION, TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: APPASSURE SOFTWARE, INC., VIRGINIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: CREDANT TECHNOLOGIES, INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: DELL INC., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: ASAP SOFTWARE EXPRESS, INC., ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: WYSE TECHNOLOGY L.L.C., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907

Owner name: DELL MARKETING L.P., TEXAS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF NEW YORK MELLON TRUST COMPANY, N.A., AS COLLATERAL AGENT;REEL/FRAME:040065/0618

Effective date: 20160907