CN103763130A

CN103763130A - Method, device and system for managing large-scale cluster

Info

Publication number: CN103763130A
Application number: CN201310752189.5A
Authority: CN
Inventors: 王黎; 吴晓明
Original assignee: Huawei Digital Technologies Suzhou Co Ltd
Current assignee: Huawei Digital Technologies Suzhou Co Ltd
Priority date: 2013-12-31
Filing date: 2013-12-31
Publication date: 2014-04-30
Anticipated expiration: 2033-12-31
Also published as: CN103763130B; WO2015101089A1

Abstract

Embodiments of the invention provide a method, device and system for managing a large-scale cluster, which can carry out performance management and resource scheduling for users according to service grade and improve user experience. The method comprises: determining at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster; determining the target performance of the at least one management object; obtaining the actual performance of the at least one management object; and carrying out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance. Because at least one management object is determined among the management objects corresponding to the first service grade, and performance management is carried out on all management objects corresponding to that grade according to the target performance and actual performance of the at least one management object, the performance of most or even all users can reach the target performance, and user experience is improved.

Description

Method, device and system for managing a large-scale cluster

Technical field

The present invention relates to the field of cloud computing, and more specifically to a method, device and system for managing a large-scale cluster.

Background

With the further development of computer networks and the growing demand for mass data computing capability, computer hardware with massive computing power continues to emerge. In addition, the World Wide Web, a global information system, has become very popular. The emergence of these software and hardware technologies and devices has made possible a new computation model called "cloud computing" (Cloud Computing).

Cloud computing in the narrow sense refers to a delivery and usage model for information technology (Information Technology, "IT") infrastructure, in which required resources are obtained over the network in an on-demand, easily scalable manner. The network that provides the resources is called the "cloud" (Cloud). From the user's point of view, the resources in the cloud can be expanded without limit, obtained at any time, used on demand, and paid for according to use.

Cloud computing in the broad sense refers to a delivery and usage model for services, in which required services are obtained over the network in an on-demand, easily scalable manner. Such a service may be related to IT, software or the Internet, or may be some other service. The network that provides the service is called the "cloud" (Cloud). The cloud consists of virtual computing resources that can maintain and manage themselves, typically large server clusters, including computing servers, storage servers, broadband resources and so on. Cloud computing uniformly manages and schedules a large number of computing resources connected by a network to form a computing resource pool that provides on-demand services to users.

Because cloud computing offers very large scale, virtualization, high reliability, versatility, high scalability and on-demand service, it is attracting increasingly wide attention.

In cloud computing applications, a cloud computing data center integrates computing, storage and network resources and offers them to users over the network using technologies such as virtualization. The resources may be offered in forms such as virtual machines (Virtual Machine, "VM") and storage volumes. By producing virtual machines and storage volumes on a large scale, virtualization technology forms large-scale clusters. How to carry out performance management and guarantee user experience for a large-scale cluster has become a problem that demands increasing attention.

Existing management of large-scale clusters usually takes a server (Server), a resource pool (Pool) or even a cluster (Cluster) as the unit. Even where performance management is carried out per user, it covers only the small amount of resources belonging to a few VIP users. As a result, performance cannot be guaranteed for most users, and their experience is poor.

Summary of the invention

Embodiments of the present invention provide a method, device and system for managing a large-scale cluster, which can carry out performance management and resource scheduling for users according to service grade and improve user experience.

In a first aspect, a method for managing a large-scale cluster is provided, comprising: determining at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster; determining the target performance of the at least one management object; obtaining the actual performance of the at least one management object; and carrying out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

With reference to the first aspect, in a first implementation of the first aspect, before the determining of at least one management object among the management objects corresponding to the first service grade of multiple service grades, the method further comprises: determining the multiple service grades for the management objects in the large-scale cluster according to a service-level agreement (SLA).

With reference to the first aspect and the foregoing implementations, in a second implementation of the first aspect, after the determining of the multiple service grades for the management objects in the large-scale cluster, the method further comprises: determining the target performance of the first service grade among the multiple service grades according to the SLA; and the determining of the target performance of the at least one management object comprises: taking the target performance of the first service grade as the target performance of the at least one management object.

With reference to the first aspect and the foregoing implementations, in a third implementation of the first aspect, the determining of the target performance of the at least one management object comprises at least one of: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or setting the target performance of the at least one management object manually.

With reference to the first aspect and the foregoing implementations, in a fourth implementation of the first aspect, the type of the target performance comprises at least one of response delay, read/write operations per second (IOPS), data transmission rate and CPU usage.

With reference to the first aspect and the foregoing implementations, in a fifth implementation of the first aspect, the obtaining of the actual performance of the at least one management object comprises: monitoring the actual performance of the at least one management object periodically or routinely.

With reference to the first aspect and the foregoing implementations, in a sixth implementation of the first aspect, the carrying out of performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance comprises: determining whether the obtained actual performance meets the target performance; and, when the actual performance does not meet the target performance, carrying out the performance management on the management objects corresponding to the first service grade and/or on the management objects corresponding to service grades other than the first service grade among the multiple service grades, so that the actual performance of the first service grade meets the target performance.

With reference to the first aspect and the foregoing implementations, in a seventh implementation of the first aspect, the performance management comprises at least one of: service migration; traffic limiting; flow control; resource scheduling; sending an alarm.

With reference to the first aspect and the foregoing implementations, in an eighth implementation of the first aspect, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service grade of multiple service grades is repeated, or the step of obtaining the actual performance of the at least one management object is repeated.

With reference to the first aspect and the foregoing implementations, in a ninth implementation of the first aspect, the determining of at least one management object among the management objects corresponding to the first service grade of multiple service grades comprises: determining, among the management objects corresponding to the first service grade, at least one management object that meets a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load condition and historical record; or determining at least one management object among the management objects corresponding to the first service grade according to a predetermined algorithm, wherein the predetermined algorithm comprises at least one of random selection, sequential selection and time-based dynamic selection.

With reference to the first aspect and the foregoing implementations, in a tenth implementation of the first aspect, the management objects comprise at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth and a server.

In a second aspect, a device for managing a large-scale cluster is provided, comprising: a determining unit, configured to determine at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster, the determining unit being further configured to determine the target performance of the at least one management object; an acquiring unit, configured to obtain the actual performance of the at least one management object; and a performance management unit, configured to carry out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

With reference to the second aspect, in a first implementation of the second aspect, the determining unit is further configured to: determine the multiple service grades for the management objects of the large-scale cluster according to a service-level agreement (SLA).

With reference to the second aspect and the foregoing implementations, in a second implementation of the second aspect, the determining unit is further configured to: determine the target performance of the first service grade among the multiple service grades; and take the target performance of the first service grade as the target performance of the at least one management object.

With reference to the second aspect and the foregoing implementations, in a third implementation of the second aspect, the determining unit is specifically configured to: determine the target performance corresponding to the at least one management object according to a predetermined performance policy; or have the target performance of the at least one management object set manually.

With reference to the second aspect and the foregoing implementations, in a fourth implementation of the second aspect, the type of the target performance determined by the determining unit comprises at least one of response delay, read/write operations per second (IOPS), data transmission rate and CPU usage.

With reference to the second aspect and the foregoing implementations, in a fifth implementation of the second aspect, the acquiring unit is specifically configured to monitor the actual performance of the at least one management object periodically or routinely.

With reference to the second aspect and the foregoing implementations, in a sixth implementation of the second aspect, the performance management unit is specifically configured to: determine, by means of the determining unit, whether the obtained actual performance meets the target performance; and, when the actual performance does not meet the target performance, carry out the performance management on the management objects corresponding to the first service grade and/or on the management objects corresponding to service grades other than the first service grade among the multiple service grades, so that the actual performance of the first service grade meets the target performance.

With reference to the second aspect and the foregoing implementations, in a seventh implementation of the second aspect, the performance management comprises at least one of: service migration; traffic limiting; flow control; resource scheduling; sending an alarm.

With reference to the second aspect and the foregoing implementations, in an eighth implementation of the second aspect, when the actual performance meets the target performance, the determining unit repeats the step of determining at least one management object among the management objects corresponding to the first service grade of multiple service grades, or the acquiring unit repeats the step of obtaining the actual performance of the at least one management object.

With reference to the second aspect and the foregoing implementations, in a ninth implementation of the second aspect, the determining unit is specifically configured to: determine, among the management objects corresponding to the first service grade, at least one management object that meets a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load condition and historical record; or determine at least one management object among the management objects corresponding to the first service grade according to a predetermined algorithm, wherein the predetermined algorithm comprises at least one of random selection, sequential selection and time-based dynamic selection.

With reference to the second aspect and the foregoing implementations, in a tenth implementation of the second aspect, the management objects comprise at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth and a server.

In the embodiments of the present invention, at least one management object is determined among the management objects corresponding to the first service grade of the large-scale cluster, and performance management is carried out on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. In this way the performance of the vast majority of users, or even all users, can reach the target performance, and user experience is improved or guaranteed.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is the system block diagram of the large-scale cluster management system of one embodiment of the invention;

Fig. 2 is the flow chart of the management method of one embodiment of the invention;

Fig. 3 is the flow chart of the management method of one embodiment of the invention;

Fig. 4 is the schematic block diagram of the management devices of one embodiment of the invention;

Fig. 5 is the schematic block diagram of the management devices of another embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 is the system block diagram of the management system of the large-scale cluster of one embodiment of the invention. The management system 100 shown in Fig. 1 comprises: a management object determination module 101, a target performance determination module 102, an actual performance acquisition module 103, a performance management module 104 and a large-scale cluster 105. The management object determination module 101, the actual performance acquisition module 103 and the performance management module 104 are each connected to the large-scale cluster 105; the management object determination module 101 is connected to the target performance determination module 102; and the target performance determination module 102 and the actual performance acquisition module 103 are both connected to the performance management module 104.

The management object determination module 101 is configured to determine at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster 105. Resource units can be divided into computing resource units, storage resource units, network resource units, physical resource units and so on. More specifically, a computing resource unit may be a virtual machine (Virtual Machine, VM) or the like; a storage resource unit may be a storage volume, a logical unit number (Logical Unit Number, LUN) or the like; a network resource unit may be an input/output (Input/Output, I/O) port, network bandwidth, a virtual switch (Virtual Switch, vSwitch), a virtual local area network (Virtual Local Area Network, vLAN), a switch or the like; and a physical resource unit may be a server or the like.
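The classification of resource units above can be pictured as a small data model. The following is a minimal sketch only: the names `ResourceKind`, `ManagementObject`, `objects_in_grade` and the grade labels "gold" and "silver" are illustrative assumptions and do not appear in the embodiment.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceKind(Enum):
    """The four kinds of resource unit named in the description."""
    COMPUTE = "compute"    # e.g. virtual machines
    STORAGE = "storage"    # e.g. storage volumes, LUNs
    NETWORK = "network"    # e.g. I/O ports, bandwidth, vSwitch, vLAN, switches
    PHYSICAL = "physical"  # e.g. servers


@dataclass(frozen=True)
class ManagementObject:
    object_id: str
    kind: ResourceKind
    service_grade: str  # hypothetical grade label, e.g. "gold"


def objects_in_grade(objects, grade):
    """Return the management objects that belong to one service grade."""
    return [obj for obj in objects if obj.service_grade == grade]


# Hypothetical cluster inventory.
cluster = [
    ManagementObject("vm-1", ResourceKind.COMPUTE, "gold"),
    ManagementObject("lun-7", ResourceKind.STORAGE, "gold"),
    ManagementObject("vm-2", ResourceKind.COMPUTE, "silver"),
]
gold = objects_in_grade(cluster, "gold")
```

Here `gold` holds the candidates from which module 101 would pick its sample for the first service grade.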

The target performance determination module 102 is configured to determine the target performance of the at least one management object. Specifically, the target performance corresponding to the at least one management object may be determined according to a predetermined performance policy; or the target performance of the at least one management object may be set manually; or the target performance of the first service grade to which the at least one management object corresponds may be taken as the target performance of the at least one management object.

The actual performance acquisition module 103 is configured to obtain the actual performance of the at least one management object. Specifically, it may monitor and collect statistics on the actual performance of the at least one management object periodically or routinely.

The performance management module 104 is configured to carry out performance management on the management objects corresponding to the first service grade according to the target performance determined by the target performance determination module 102 and the actual performance obtained by the actual performance acquisition module 103.

Specifically, when the actual performance does not meet the target performance, performance management is carried out on the management objects corresponding to the first service grade and/or on the management objects corresponding to service grades other than the first service grade among the multiple service grades, so that the actual performance of the first service grade meets the target performance. The methods of performance management include, but are not limited to: service migration; traffic limiting; flow control; resource scheduling; sending an alarm; and so on.

When the actual performance meets the target performance, at least one management object may be determined again by the target performance determination module 102, or the actual performance acquisition module 103 may continue to monitor the actual performance of the at least one management object determined previously.

In the management system 100 of the embodiment of the present invention, at least one management object is determined among the management objects corresponding to the first service grade, and performance management is carried out on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. In this way the performance of the vast majority of users, or even all users, can reach the target performance, and user experience is improved or guaranteed.

Fig. 2 is the flow chart of the management method of one embodiment of the invention.

201. Determine at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster.

202. Determine the target performance of the at least one management object.

203. Obtain the actual performance of the at least one management object.

204. Carry out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

In the embodiment of the present invention, at least one management object is determined among the management objects corresponding to the first service grade of the large-scale cluster, and performance management is carried out on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. In this way the performance of the vast majority of users, or even all users, can reach the target performance, and user experience is improved.
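Steps 201 to 204 can be sketched as a single pass of a sampling loop. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function name `manage_service_grade`, the callbacks `measure_cpu` and `manage`, random sampling, and the use of CPU usage (where lower is better) as the only metric are all assumptions made for the example.

```python
import random


def manage_service_grade(objects, target_cpu, measure_cpu, manage, sample_size=2, rng=None):
    """One pass over steps 201 to 204 for a single service grade."""
    rng = rng or random.Random()
    # Step 201: pick at least one management object from the grade (here: at random).
    sample = rng.sample(objects, min(sample_size, len(objects)))
    # Step 202: the target performance is supplied by the caller as target_cpu.
    # Step 203: obtain the actual performance of each sampled object.
    actual = {obj: measure_cpu(obj) for obj in sample}
    # Step 204: if any sampled object misses the target, manage the whole grade.
    if any(value > target_cpu for value in actual.values()):
        manage(objects)
        return True
    return False


readings = {"vm-1": 95.0, "vm-2": 40.0}  # hypothetical CPU usage, in percent
managed = []                             # records which objects were managed
triggered = manage_service_grade(["vm-1", "vm-2"], 90.0, readings.get, managed.extend)
```

The point of the design is that only the sample is measured, while the management action is applied to every object of the grade.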

It should be understood that the resource units of the large-scale cluster can be divided into computing resource units, storage resource units, network resource units, physical resource units and so on, and are used to provide users with services such as computing, storage and transmission. More specifically, a computing resource unit may be a virtual machine (VM) or the like; a storage resource unit may be a storage volume, a logical unit number (LUN) or the like; a network resource unit may be an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth or the like; and a physical resource unit may be a server or the like.

Optionally, as an embodiment, before the determining of at least one management object among the management objects corresponding to the first service grade of multiple service grades, the method further comprises: determining the multiple service grades for the management objects in the large-scale cluster according to a service-level agreement (Service Level Agreement, SLA).

First, as a preliminary step, the users or management objects in the large-scale cluster may be divided into service grades before management objects are selected. Specifically, the division may be made according to the SLA, or by network maintenance staff according to certain attributes, such as the location information, service type or service objective of a management object. When the division is applied to users, it is equivalent to dividing the at least one resource unit that serves each user, that is, the management objects.
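The equivalence between grading users and grading the resource units that serve them can be illustrated as follows. This is a minimal sketch: the function `grade_management_objects`, the mappings `user_grade` and `owner_of`, and the grade labels are hypothetical and stand in for whatever the SLA actually records.

```python
def grade_management_objects(user_grade, owner_of):
    """Group management objects by the service grade of the user they serve.

    user_grade: user id -> service grade taken from that user's SLA (assumed given)
    owner_of:   management object id -> id of the user it serves
    """
    grades = {}
    for object_id, user in owner_of.items():
        grades.setdefault(user_grade[user], []).append(object_id)
    return grades


user_grade = {"alice": "gold", "bob": "silver"}
owner_of = {"vm-1": "alice", "vol-1": "alice", "vm-2": "bob"}
by_grade = grade_management_objects(user_grade, owner_of)
```

Each resulting group is the set of "management objects corresponding to a service grade" from which samples are later drawn.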

In addition, the division into service grades may be a simple grading, or the target performance of one or more service grades may be determined at the time of the division. The target performance can be understood here as the quality of service (Quality of Service, QoS) to be achieved.

Optionally, as an embodiment, after the determining of the multiple service grades for the management objects in the large-scale cluster according to the SLA, the method further comprises: determining the target performance of the first service grade among the multiple service grades; and the determining of the target performance of the at least one management object comprises: taking the target performance of the first service grade as the target performance of the at least one management object. In line with the above embodiment, if the target performance of a service grade was determined when the service grades were divided, that target performance can be taken as the target performance of the at least one management object selected as a sample in that service grade.

Optionally, as an embodiment, the determining of the target performance of the at least one management object comprises at least one of: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or setting the target performance of the at least one management object manually.

Besides taking the target performance of the service grade as the target performance of a management object, as described above, the target performance may also be determined directly for the at least one management object that has been determined. Specifically, it may be determined according to a predetermined performance policy: a performance policy file may be preset in the system, and certain attributes of a management object, combined with the policy file, determine the target performance at which the management object obtains a performance guarantee. For example, the policy file may contain the correspondence between information such as the service type and geographic location of a management object and the target performance. Alternatively, network maintenance staff may set the target performance of a management object manually through a management interface.
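A policy file of the kind described could be modelled as a lookup table keyed by object attributes. The sketch below is hypothetical throughout: the `POLICY` contents, the attribute names `service_type` and `location`, the metric names, and the rule that a manual setting takes precedence over the policy are assumptions for illustration; the text does not say how the two options interact.

```python
# Hypothetical policy: (service type, geographic location) -> target performance.
POLICY = {
    ("database", "site-a"): {"iops": 5000, "latency_ms": 5},
    ("web", "site-a"): {"latency_ms": 50},
}


def resolve_target(obj, policy, manual_targets):
    """Return the target performance for one management object.

    Assumption: a target set manually by maintenance staff wins over the policy.
    Returns None when neither source covers the object.
    """
    if obj["id"] in manual_targets:
        return manual_targets[obj["id"]]
    return policy.get((obj["service_type"], obj["location"]))


db_vm = {"id": "vm-1", "service_type": "database", "location": "site-a"}
from_policy = resolve_target(db_vm, POLICY, {})
from_manual = resolve_target(db_vm, POLICY, {"vm-1": {"latency_ms": 2}})
```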

Optionally, as an embodiment, the type of the target performance may include, but is not limited to, at least one of response delay, read/write operations per second (IOPS), data transmission rate and CPU usage. It is readily understood that the target performance may be a single parameter or a combination of several parameters; the present invention does not limit this.
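When the target performance combines several parameters, a check of whether it is met has to respect the direction of each metric: delay and CPU usage should stay at or below the target, while IOPS and data rate should stay at or above it. A minimal sketch follows; the metric names and the rule that a missing measurement counts as a miss are assumptions.

```python
HIGHER_IS_BETTER = {"iops", "data_rate_mbps"}
LOWER_IS_BETTER = {"latency_ms", "cpu_percent"}


def meets_target(actual, target):
    """Return True only if every metric named in the target is satisfied."""
    for metric, goal in target.items():
        value = actual.get(metric)
        if value is None:
            return False  # assumption: an unmeasured metric counts as a miss
        if metric in HIGHER_IS_BETTER:
            if value < goal:
                return False
        elif metric in LOWER_IS_BETTER:
            if value > goal:
                return False
        else:
            raise ValueError(f"unknown metric: {metric}")
    return True


target = {"latency_ms": 5, "iops": 5000}
ok = meets_target({"latency_ms": 4, "iops": 6000}, target)
slow = meets_target({"latency_ms": 6, "iops": 6000}, target)
```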

Optionally, as an embodiment, the obtaining of the actual performance of the at least one management object comprises: monitoring the actual performance of the at least one management object periodically or routinely. It should be understood that the type of the actual performance may be the same as or different from the type of the target performance.

Optionally, as an embodiment, the carrying out of performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance comprises: determining whether the obtained actual performance meets the target performance; and, when the actual performance does not meet the target performance, carrying out performance management on the management objects corresponding to the first service grade and/or on the management objects corresponding to service grades other than the first service grade among the multiple service grades, so that the actual performance of the first service grade meets the target performance.

Optionally, the performance management may include, but is not limited to, at least one of: service migration; traffic limiting; flow control; resource scheduling; sending an alarm.

That is, if the detected actual performance does not meet expectations (the target performance), operations such as service migration, traffic limiting, flow control and resource scheduling may be carried out on the first service grade currently being checked, or on other service grades, so that the first service grade can meet the target performance. For example, when the actual performance monitored for the at least one management object selected in the first service grade is a CPU usage above 90% (the target performance being a CPU usage of 90% or less), service migration may be carried out on the management objects of the first service grade to bring CPU usage down to 90% or below. It should be understood that other control methods may also be used to reach the target performance, for example allocating more resources to the management objects of the first service grade; the present invention does not limit this.
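The 90% CPU example can be turned into a toy migration plan. The sketch below models a host whose CPU usage is the sum of the loads of the workloads on it and moves workloads away until usage is at or below the limit. The additive load model, the function name `plan_migrations` and the heaviest-first order are assumptions; the text does not prescribe how migration candidates are chosen.

```python
def plan_migrations(workload_loads, limit=90.0):
    """Choose workloads to migrate until the summed CPU load is at or below limit.

    workload_loads: workload id -> CPU load in percent of the host (hypothetical model)
    Returns (ids to migrate, CPU load left on the host).
    """
    remaining = dict(workload_loads)
    moved = []
    # Assumption: migrate the heaviest workload first.
    for workload, _load in sorted(workload_loads.items(), key=lambda kv: kv[1], reverse=True):
        if sum(remaining.values()) <= limit:
            break
        moved.append(workload)
        del remaining[workload]
    return moved, sum(remaining.values())


moved, load_after = plan_migrations({"a": 50.0, "b": 30.0, "c": 15.0})
```

With the hypothetical loads above the host starts at 95%, so one workload is moved and the remaining load falls below the 90% target.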

In addition, the first service grade may also be brought to the target performance by controlling or scheduling other service grades. For example, when the actual I/O delay of the first service grade does not meet the target performance, the first service grade can be made to meet the target performance by reducing the service traffic of lower-priority service grades. Of course, the first service grade and other service grades may also be controlled or scheduled at the same time so that the first service grade reaches the target performance. Alternatively, an alarm may be sent without carrying out control or scheduling for the time being, pending further instructions from staff or other network management equipment. Without loss of generality, performance management may also be carried out on the first service grade so that other service grades reach their expected performance.
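Reducing the traffic of lower-priority grades to protect a higher one might look like the following sketch. The per-grade traffic limits, the priority list and the fixed scaling factor are illustrative assumptions rather than anything specified in the embodiment.

```python
def throttle_lower_grades(limits, priority, protected, factor=0.8):
    """Return new per-grade traffic limits with every grade below `protected` scaled down.

    limits:   grade -> current traffic limit (arbitrary units)
    priority: list of grades, highest priority first
    factor:   hypothetical scaling applied to lower-priority grades
    """
    rank = priority.index(protected)
    return {
        grade: limit * factor if priority.index(grade) > rank else limit
        for grade, limit in limits.items()
    }


limits = {"gold": 1000, "silver": 500, "bronze": 200}
new_limits = throttle_lower_grades(limits, ["gold", "silver", "bronze"], "gold", factor=0.5)
```

In practice such a step would be repeated, with re-measurement in between, until the protected grade meets its target or an alarm is raised.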

Optionally, when the actual performance does not meet the target performance, the step of determining at least one management object among the management objects corresponding to the first service grade of multiple service grades may also be repeated, or the step of obtaining the actual performance of the at least one management object may be repeated. That is, sampling and detection may be carried out again, or monitoring may simply continue. By setting a threshold on the number of repetitions, the sampling and monitoring of the performance management system can be made more accurate and closer to the actual situation. For example, it may be preset that the above performance management is carried out only when the actual performance monitored in 2 repeated samplings fails to meet the target performance both times.
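The repetition threshold can be sketched as a small counter that triggers performance management only after the configured number of failed samplings. Treating the failures as consecutive, so that a passing sample resets the count, is an assumption; the class name `ViolationCounter` is hypothetical.

```python
class ViolationCounter:
    """Signal performance management only after `threshold` consecutive misses."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.misses = 0

    def observe(self, meets_target):
        """Record one sampling result; return True when management should start."""
        # Assumption: a sample that meets the target resets the count.
        self.misses = 0 if meets_target else self.misses + 1
        return self.misses >= self.threshold


counter = ViolationCounter(threshold=2)
decisions = [counter.observe(result) for result in (False, True, False, False)]
```

A single failed sample followed by a passing one does not trigger management; two failures in a row do.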

Alternatively, as an embodiment, when actual performance meets target capabilities, repeat the step of determining at least one management object in management object corresponding to the first service grade of multiple grades of service, or repeat the step of the actual performance of obtaining at least one management object.When performance meets not needs management and control or scheduling, can carry out resampling, in first service grade, again select at least one management object.Also can continue to monitor at least one management object of previous sampling, so that carry out performance management when the discontented foot-eye performance of its performance.

Optionally, as an embodiment, determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades comprises: determining, among the management objects corresponding to the first service grade, at least one management object that meets a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load condition and historical record; or determining, according to a predetermined algorithm, at least one management object among the management objects corresponding to the first service grade, wherein the predetermined algorithm comprises at least one of random selection, sequential selection and time-based dynamic selection.

Optionally, as an embodiment, the management object comprises at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, network bandwidth and a server.

Fig. 3 is a flow chart of the management method according to an embodiment of the present invention.

301, divide the service grades

First, as an optional step, the users or management objects in the large-scale cluster can be divided into service grades before management objects are selected. Specifically, the grades can be divided according to the SLA, or by network maintenance staff according to some attribute, such as the location information, service type or service objective of the management objects. When the grades are divided by user, this is equivalent to dividing by the at least one resource unit that provides service to that user, that is, by management object.
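As a minimal sketch of step 301, the division can be viewed as grouping management objects by some attribute. The record layout, the `sla` field and the function name `divide_service_grades` below are assumptions made for illustration; the description does not prescribe any data model.

```python
from collections import defaultdict

def divide_service_grades(objects, grade_of):
    """Group management objects by the service grade that grade_of() assigns."""
    grades = defaultdict(list)
    for obj in objects:
        grades[grade_of(obj)].append(obj)
    return dict(grades)

# Hypothetical inventory: each object records the SLA tier of the user it serves.
inventory = [
    {"id": "vm-1", "kind": "VM", "sla": "gold"},
    {"id": "vol-7", "kind": "volume", "sla": "silver"},
    {"id": "vm-2", "kind": "VM", "sla": "gold"},
]

grades = divide_service_grades(inventory, lambda obj: obj["sla"])
```

Passing a different `grade_of` function (for example one keyed on location or service type) would model division by network maintenance staff according to another attribute.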

302, select management objects

A small number of management objects are selected in the large-scale cluster as samples. Here it must be ensured that at least one management object is selected within a service grade, where a management object is a resource unit that provides service to users in the large-scale cluster. Specifically, the resource units of the large-scale cluster can be divided into computing resource units, storage resource units, network resource units, physical resource units and so on, which provide users with services such as computation, storage and transmission. More specifically, a computing resource unit can be a virtual machine (VM) or the like; a storage resource unit can be a storage volume, a logical unit number (LUN) or the like; a network resource unit can be an input/output (I/O) port, network bandwidth or the like; and a physical resource unit can be a server or the like.

For the first service grade, at least one management object that meets a predetermined condition can be determined among the management objects corresponding to the first service grade, wherein the predetermined condition comprises at least one of creation time, location information, load condition and historical record. For example, the predetermined condition may be that the load reaches 90% of the maximum load, or that N or more faults appear in the historical record. It should be understood that the selected management objects can be of the same class or of different classes: for example, they can all be VMs, all be storage volumes, or include both VMs and storage volumes, as long as they meet the predetermined condition. In addition, predetermined conditions can also be combined, for example VMs whose load reaches 90% of the maximum load together with servers that have N or more faults in their historical record; the present invention does not limit this.
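The condition-based selection can be sketched as a simple filter. The field names (`load`, `max_load`, `faults`) and the default of N = 3 faults are hypothetical; only the 90% load figure and the "N or more faults" condition come from the example above.

```python
def meets_condition(obj, load_ratio=0.9, min_faults=3):
    """True if the object is heavily loaded or has min_faults or more recorded faults."""
    heavily_loaded = obj["load"] >= load_ratio * obj["max_load"]
    fault_prone = obj["faults"] >= min_faults
    return heavily_loaded or fault_prone

def select_by_condition(grade_objects, **thresholds):
    """Return the objects of one service grade that meet the predetermined condition."""
    return [obj for obj in grade_objects if meets_condition(obj, **thresholds)]

# Hypothetical objects belonging to the first service grade.
first_grade = [
    {"id": "vm-1", "load": 95, "max_load": 100, "faults": 0},
    {"id": "vm-2", "load": 40, "max_load": 100, "faults": 0},
    {"id": "srv-3", "load": 10, "max_load": 100, "faults": 5},
]

sample = select_by_condition(first_grade)
```

The sample here mixes a VM and a server, which matches the statement that the selected objects need not be of the same class.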

In addition, at least one management object can also be determined among the management objects corresponding to the first service grade according to a predetermined algorithm, wherein the predetermined algorithm includes but is not limited to random selection, sequential selection, time-based dynamic selection, intelligent selection and so on. As an example, if the predetermined algorithm is random selection, a certain number of management objects are selected at random within the first service grade, and this number can likewise be specified in advance in the predetermined algorithm. As another example, with time-based dynamic selection, management objects are selected dynamically in different time periods or as time passes, which keeps the sample up to date.
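The three named algorithms can be sketched as follows. This is one possible reading, not the patent's definition: in particular, modelling time-based dynamic selection as a window that rotates with a period index is an assumption, and all function names are hypothetical.

```python
import random

def choose_random(objects, count, rng=random):
    """Random selection: pick up to `count` distinct objects."""
    return rng.sample(objects, min(count, len(objects)))

def choose_sequential(objects, count, start=0):
    """Sequential selection: take `count` objects in order, wrapping around."""
    n = len(objects)
    return [objects[(start + i) % n] for i in range(min(count, n))]

def choose_by_time(objects, count, period_index):
    """Time-based dynamic selection (assumed form): the window moves each period."""
    if not objects:
        return []
    start = (period_index * count) % len(objects)
    return choose_sequential(objects, count, start=start)

first_grade = ["vm-a", "vm-b", "vm-c", "vm-d", "vm-e"]
random_sample = choose_random(first_grade, 2, rng=random.Random(0))
```

Because the window in `choose_by_time` advances with `period_index`, successive periods sample different objects, which is one way to keep the sample from going stale.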

Without loss of generality, the sampled management objects can also be specified directly. For example, network maintenance staff can select one or more management objects for a given service grade in a network topology interface, to serve as the sample for performance management.

It should be understood that step 301 is optional. When step 301 is performed, the first service grade in step 302 is one of the multiple service grades divided in step 301; here, the "first" service grade merely denotes some service grade and can be any one of the multiple service grades. When step 301 is not performed, service grades can still exist in the large-scale cluster: they can be service grades determined historically, or service grades agreed when the user signed up for the network service, which is not limited here. A service grade can be understood as a grouping of management objects determined according to identical or similar performance requirements, performance indicators, service types and the like.

303, determine the target performance

After the at least one management object serving as the performance management sample has been determined, the target performance of the management object can be determined. Specifically, the target performance corresponding to the at least one management object can be determined according to a predetermined performance policy, or it can be set manually. That is, a performance policy file can be preset in the system, and certain attributes of a management object, combined with the policy file, determine the target performance at which the management object's performance is guaranteed. For instance, the policy file can contain the correspondence between information such as the service type and geographic location of a management object and the target performance. The target performance can also be set manually by network maintenance staff through a management interface. For example, if the management objects are storage volumes and there are multiple service grades, the target performance of the storage volumes sampled in one of the service grades can be set to a latency of less than 3 ms; this setting can be made manually or determined from the policy file.
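A policy lookup of this kind can be sketched as a table keyed on object attributes. The in-memory dictionary stands in for the policy file, and the keys, region names and the 5 ms entry are invented for illustration; only the 3 ms latency figure appears in the description.

```python
# Hypothetical policy table: (service type, location) -> target performance.
PERFORMANCE_POLICY = {
    ("block-storage", "region-a"): {"latency_ms": 3.0},
    ("block-storage", "region-b"): {"latency_ms": 5.0},
}

def determine_target(obj, policy=PERFORMANCE_POLICY, manual_override=None):
    """Return the target performance for one management object.

    A value set manually through a management interface takes precedence
    (an assumed ordering); otherwise the policy table is consulted.
    Returns None when no policy entry matches.
    """
    if manual_override is not None:
        return manual_override
    return policy.get((obj["service_type"], obj["location"]))

volume = {"id": "vol-1", "service_type": "block-storage", "location": "region-a"}
target = determine_target(volume)
```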

In addition, a service grade may already correspond to a target performance (quality of service, QoS). For example, if the target performance of a service grade was determined when the service grades were divided in step 301, that target performance can be used as the target performance of the at least one management object sampled from that service grade.

There are many types of target performance, including but not limited to response latency, I/O operations per second (IOPS), data transfer rate and CPU usage. It is readily understood that the target performance can be a single parameter or a combination of several parameters; the present invention does not limit this.

304, monitor the actual performance

The actual performance of the at least one management object determined in step 302 is monitored periodically or continuously. The type of the detected actual performance can be the same as the type of the target performance, or different. Specifically, when the target performance determined in step 303 is a latency of less than 3 ms, the type of the detected actual performance is also latency; for example, the actual latency of the management object is monitored as 4 ms. The detected actual performance can also differ in type from the target. For example, the target performance requires that VM creation time be less than 2 min, while the monitored actual performance indicator is bandwidth in MB/s; if the system considers that a bandwidth below 50 MB/s makes the goal of creating the VM within 2 min unreachable, it carries out performance policy scheduling and so on.
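Periodic monitoring can be sketched as a polling loop over the sampled objects. The `probe` callable is a hypothetical stand-in for whatever interface actually reports a metric; the interval and the reading format are likewise assumptions.

```python
import time

def monitor(probe, sample, rounds, interval_s=60.0, sleep=time.sleep):
    """Poll probe(obj_id) for every sampled object, `rounds` times.

    Returns one dict per round, mapping object id to its reading.
    The sleep function is injectable so the loop can be exercised
    without actually waiting.
    """
    readings = []
    for i in range(rounds):
        readings.append({obj_id: probe(obj_id) for obj_id in sample})
        if i < rounds - 1:
            sleep(interval_s)
    return readings

# Hypothetical probe that always reports the 4 ms latency from the example.
history = monitor(lambda obj_id: {"latency_ms": 4.0}, ["vol-1"], rounds=2,
                  sleep=lambda seconds: None)
```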

305, judge

After receiving the detected actual performance, the system can analyze the actual performance data in combination with the target performance and judge whether the actual performance reaches the target performance. That is, the performance of the management objects sampled in step 302 can be used to estimate, and make decisions about, all management objects or cluster resources of the same service grade, so that the service grade is evaluated and managed as a whole.

306, the target performance is not met
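The judgment can be sketched as a comparison of each monitored metric against its bound. The target encoding (a comparison operator plus a bound per metric) and the rule that a grade fails as soon as any sampled object fails are assumptions; the description only says the sample is used to estimate the whole grade.

```python
import operator

# Hypothetical target: latency must be strictly less than 3 ms.
TARGET = {"latency_ms": (operator.lt, 3.0)}

def meets_target(actual, target):
    """True if every targeted metric is present and satisfies its bound."""
    return all(
        metric in actual and compare(actual[metric], bound)
        for metric, (compare, bound) in target.items()
    )

def grade_meets_target(sample_readings, target):
    """Estimate the whole service grade from its sampled objects (assumed rule: all must pass)."""
    return all(meets_target(reading, target) for reading in sample_readings.values())

readings = {"vol-1": {"latency_ms": 2.0}, "vol-2": {"latency_ms": 4.0}}
grade_ok = grade_meets_target(readings, TARGET)
```

A combined target, such as latency together with IOPS, would simply add further entries to the dictionary, for example `"iops": (operator.ge, 1000)`.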

If it is judged that the actual performance does not meet the target performance, the mode of performance management to be carried out needs to be determined. In general there are several performance management modes, for example migration, limiting, scheduling and alarming. For example, suppose the target performance specifies I/O latency, IOPS and CPU usage. If the actually monitored CPU usage exceeds the limit, a migration policy can be specified and services migrated, reducing the service load on the management objects of this service grade so as to meet the user experience requirements while balancing the load of the whole system. If the actual I/O latency exceeds the limit, resource scheduling can be carried out to increase the resource allocation of this service grade, such as CPU and cache; the demand of this service grade can also be met by limiting the service traffic of lower-priority service grades. In addition, an alarm can be sent and control or scheduling deferred while awaiting further instructions from staff or from other network management equipment. Performance management can also be applied to the first service grade so that the demands of other service grades are met.
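The choice of management mode can be sketched as a mapping from the violated metric to candidate actions. The action names are hypothetical labels, and falling back to an alarm when no rule matches is an assumption; the CPU and I/O latency cases follow the example above.

```python
def plan_actions(violated_metrics, alarm_only=False):
    """Map violated metrics to candidate performance management actions."""
    if alarm_only:
        # Defer control and scheduling; wait for an operator's instruction.
        return ["send_alarm"]
    actions = []
    if "cpu_usage" in violated_metrics:
        actions.append("migrate_services")
    if "io_latency" in violated_metrics:
        actions.append("increase_resource_allocation")
        actions.append("limit_lower_priority_traffic")
    # Assumed fallback: raise an alarm when no specific rule applies.
    return actions or ["send_alarm"]

cpu_plan = plan_actions({"cpu_usage"})
latency_plan = plan_actions({"io_latency"})
```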

In addition, when the actual performance does not meet the target performance, the step of determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades may be repeated, or the step of obtaining the actual performance of the at least one management object may be repeated. That is, sampling and detection can be performed again, or monitoring can simply continue. By setting a threshold on the number of repetitions, the sampling and monitoring of the performance management system can be made more precise and closer to the actual situation. For example, it can be preset that the performance management described above is carried out only when the actual performance fails to meet the target performance in 2 repeated rounds of sampling and monitoring.

307, the target performance is met
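The repetition threshold can be sketched as a counter over successive judgment results. Requiring the failures to be consecutive is an assumption made here; the text only says that 2 repeated rounds all fail.

```python
def confirmed_violation(check_results, required_failures=2):
    """Return True once `required_failures` consecutive checks have failed.

    check_results is an iterable of booleans, True meaning the target was met.
    """
    consecutive = 0
    for target_met in check_results:
        consecutive = 0 if target_met else consecutive + 1
        if consecutive >= required_failures:
            return True
    return False

# One isolated miss does not trigger management; two in a row do.
isolated = confirmed_violation([True, False, True])
repeated = confirmed_violation([True, False, False])
```

This filters out transient spikes, which is one way to read the claim that a repetition threshold brings monitoring closer to the actual situation.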

When the actual performance meets the target performance, the flow can return to step 302 or to step 304. That is, when the performance is met and no control or scheduling is needed, resampling can be performed, that is, at least one management object is selected again within the first service grade. Alternatively, the previously sampled management objects can continue to be monitored, so that performance management can be carried out when their performance no longer meets the target performance.
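Steps 302 to 307 can be tied together in a small control loop. Every callable passed in is a hypothetical placeholder for the corresponding step, and the fixed number of cycles exists only so that the sketch terminates.

```python
def run_cycles(select, determine_target, measure, judge, manage, cycles, resample=True):
    """Run the select / target / monitor / judge loop for a fixed number of cycles."""
    log = []
    sample = select()                       # step 302
    target = determine_target(sample)       # step 303
    for _ in range(cycles):
        actual = measure(sample)            # step 304
        if judge(actual, target):           # step 305
            log.append("ok")                # step 307
            if resample:                    # return to step 302
                sample = select()
                target = determine_target(sample)
            # otherwise keep the sample and return to step 304
        else:
            log.append("managed")           # step 306
            manage(sample, actual, target)
    return log

# Hypothetical latencies (ms) observed over three cycles, against a 3 ms target.
latencies = iter([2.0, 4.0, 2.5])
managed = []
log = run_cycles(
    select=lambda: ["vol-1"],
    determine_target=lambda sample: 3.0,
    measure=lambda sample: next(latencies),
    judge=lambda actual, target: actual < target,
    manage=lambda sample, actual, target: managed.append((tuple(sample), actual)),
    cycles=3,
)
```

Setting `resample=False` models the alternative branch in which the previously sampled objects simply continue to be monitored.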

Fig. 4 is a schematic block diagram of a management device according to an embodiment of the present invention. The management device 400 in Fig. 4 comprises a determining unit 401, an acquiring unit 402 and a performance management unit 403.

The determining unit 401 determines at least one management object among the management objects corresponding to the first service grade of multiple service grades, wherein a management object is a resource unit in the large-scale cluster; the determining unit 401 also determines the target performance of the at least one management object. The acquiring unit 402 obtains the actual performance of the at least one management object. The performance management unit 403 carries out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

The management device 400 of the embodiment of the present invention determines at least one management object among the management objects corresponding to the first service grade of the large-scale cluster, and carries out performance management on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. This ensures that the performance of the great majority of users, or even all users, reaches the target performance, and improves the user experience.

It should be understood that the resource units of the large-scale cluster can be divided into computing resource units, storage resource units, network resource units, physical resource units and so on, which provide users with services such as computation, storage and transmission. More specifically, a computing resource unit can be a virtual machine (VM) or the like; a storage resource unit can be a storage volume, a logical unit number (LUN) or the like; a network resource unit can be an input/output (I/O) port, network bandwidth or the like; and a physical resource unit can be a server or the like.

It should also be understood that the determining unit 401 in the embodiment of the present invention can correspond to the management object determination module 101 and the target performance determination module 102 in the large-scale cluster management system 100 shown in Fig. 1; the acquiring unit 402 can correspond to the actual performance acquisition module 103 in the large-scale cluster management system 100 shown in Fig. 1; and the performance management unit 403 can correspond to the performance management module 104 in the large-scale cluster management system 100 shown in Fig. 1.

Optionally, as an embodiment, the determining unit 401 determines the multiple service grades for the management objects in the large-scale cluster according to a service level agreement (SLA).

First, as a preliminary process, the determining unit 401 can divide the users or management objects in the large-scale cluster into service grades before management objects are selected. Specifically, the grades can be divided according to the SLA, or by network maintenance staff according to some attribute, such as the location information, service type or service objective of the management objects. When the grades are divided by user, this is equivalent to dividing by the at least one resource unit that provides service to that user, that is, by management object.

Optionally, as an embodiment, after the multiple service grades are determined for the management objects in the large-scale cluster according to the SLA, the determining unit 401 can also be used to determine the target performance of the first service grade among the multiple service grades; determining the target performance of the at least one management object then comprises: using the target performance of the first service grade as the target performance of the at least one management object. In line with the embodiment above, if the target performance of a service grade was determined when the service grades were divided, that target performance can be used as the target performance of the at least one management object sampled from that service grade.

Optionally, as an embodiment, the determining unit 401 can also be used to determine the target performance corresponding to the at least one management object according to a predetermined performance policy; alternatively, the target performance of the at least one management object is set manually.

Besides using the target performance of the service grade as the target performance of the management object as described above, the determining unit 401 can also determine the target performance directly for the at least one determined management object. Specifically, this can be done according to a predetermined performance policy: a performance policy file can be preset in the system, and certain attributes of a management object, combined with the policy file, determine the target performance at which the management object's performance is guaranteed. For instance, the policy file can contain the correspondence between information such as the service type and geographic location of a management object and the target performance. The target performance of a management object can also be set manually by network maintenance staff through a management interface.

Optionally, as an embodiment, the acquiring unit 402 is specifically configured to monitor the actual performance of the at least one management object periodically or continuously. It should be understood that the actual performance can be of the same type as the target performance, or of a different type.

Optionally, as an embodiment, the performance management unit 403 is specifically configured to determine whether the obtained actual performance meets the target performance and, when the actual performance does not meet the target performance, to carry out performance management on the management objects corresponding to the first service grade and/or on the management objects corresponding to the other service grades of the multiple service grades, so that the actual performance of the first service grade meets the target performance.

In addition, the first service grade may be brought to its target performance by controlling or scheduling other service grades. For example, when the actual I/O latency of the first service grade does not meet the target performance, the service traffic of a lower-priority service grade can be reduced so that the first service grade meets the target performance. Of course, the first service grade and other service grades can also be controlled or scheduled at the same time to bring the first service grade to its target performance. Alternatively, an alarm can be sent and control or scheduling deferred while awaiting further instructions from staff or from other network management equipment.

Optionally, as an embodiment, when the actual performance meets the target performance, the determining unit 401 repeats the step of determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades, or the acquiring unit 402 repeats the step of obtaining the actual performance of the at least one management object. When the performance is met and no control or scheduling is needed, resampling can be performed, that is, at least one management object is selected again within the first service grade. Alternatively, the previously sampled management objects can continue to be monitored, so that performance management can be carried out when their performance no longer meets the target performance.

Optionally, as an embodiment, the determining unit 401 is also configured to determine, among the management objects corresponding to the first service grade, at least one management object that meets a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load condition and historical record; or to determine, according to a predetermined algorithm, at least one management object among the management objects corresponding to the first service grade, wherein the predetermined algorithm comprises at least one of random selection, sequential selection and time-based dynamic selection.

Optionally, as an embodiment, the management object comprises at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (VLAN), a switch, network bandwidth and a server.

The management device 400 of the embodiment of the present invention determines at least one management object among the management objects corresponding to the first service grade of the large-scale cluster, and carries out performance management on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. This ensures that the performance of the great majority of users, or even all users, reaches the target performance, and improves or safeguards the user experience.

Fig. 5 is a schematic block diagram of a management device according to another embodiment of the present invention. The management device 500 of Fig. 5 comprises a processor 51 and a memory 52, which are connected by a bus system 53.

The memory 52 is used to store instructions that cause the processor 51 to perform the following operations: determining at least one management object among the management objects corresponding to the first service grade of multiple service grades, wherein a management object is a resource unit in the large-scale cluster; determining the target performance of the at least one management object; obtaining the actual performance of the at least one management object; and carrying out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

The management device 500 of the embodiment of the present invention determines at least one management object among the management objects corresponding to the first service grade of the large-scale cluster, and carries out performance management on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. This ensures that the performance of the great majority of users, or even all users, reaches the target performance, and improves the user experience.

In addition, the management device 500 can also comprise a transmitting circuit 54, a receiving circuit 55 and so on. The processor 51 controls the operation of the management device 500; the processor 51 can also be called a CPU (Central Processing Unit). The memory 52 can comprise read-only memory and random access memory, and provides instructions and data to the processor 51. Part of the memory 52 can also comprise non-volatile random access memory (NVRAM). The components of the management device 500 are coupled together by the bus system 53, which, besides a data bus, can also comprise a power bus, a control bus, a status signal bus and so on. For clarity, however, the various buses are all denoted as the bus system 53 in the figure.

The methods disclosed in the embodiments of the present invention above can be applied in the processor 51, or implemented by the processor 51. The processor 51 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods can be completed by an integrated logic circuit of hardware in the processor 51 or by instructions in the form of software. The processor 51 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. A general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention can be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory 52; the processor 51 reads the information in the memory 52 and completes the steps of the above methods in combination with its hardware.

Optionally, as an embodiment, before at least one management object is determined among the management objects corresponding to the first service grade of the multiple service grades, the operations further comprise: determining the multiple service grades for the management objects in the large-scale cluster according to a service level agreement (SLA).

Optionally, as an embodiment, after the multiple service grades are determined for the management objects in the large-scale cluster according to the SLA, the operations further comprise: determining the target performance of the first service grade among the multiple service grades; determining the target performance of the at least one management object then comprises: using the target performance of the first service grade as the target performance of the at least one management object.

Optionally, as an embodiment, the type of the target performance comprises at least one of response latency, I/O operations per second (IOPS), data transfer rate and CPU usage.

Optionally, as an embodiment, obtaining the actual performance of the at least one management object comprises: monitoring the actual performance of the at least one management object periodically or continuously.

Optionally, as an embodiment, the performance management comprises at least one of the following: service migration; traffic limiting; flow control; resource scheduling; sending an alarm.

Optionally, as an embodiment, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades is repeated, or the step of obtaining the actual performance of the at least one management object is repeated.

Optionally, as an embodiment, the management object comprises at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, network bandwidth, a virtual switch (vSwitch), a virtual local area network (VLAN), a switch and a server.

The management device 500 of the embodiment of the present invention determines at least one management object among the management objects corresponding to the first service grade of the large-scale cluster, and carries out performance management on all management objects corresponding to the first service grade according to the target performance and actual performance of the at least one management object. This ensures that the performance of the great majority of users, or even all users, reaches the target performance, and improves or safeguards the user experience.

It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships can exist. For example, A and/or B can represent three cases: A alone, both A and B, and B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

Those of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled professionals can use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not performed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed can be indirect coupling or communication connection through some interfaces, devices or units, and can be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they can be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A management method for a large-scale cluster, characterized by comprising:

determining at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster;

determining the target performance of the at least one management object;

obtaining the actual performance of the at least one management object;

carrying out performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

2. The method according to claim 1, characterized in that, before the determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades, the method further comprises: determining the multiple service grades for the management objects in the large-scale cluster according to a service level agreement (SLA).

3. The method according to claim 2, characterized in that, after the determining the multiple service grades for the management objects in the large-scale cluster, the method further comprises: determining a target performance of the first service grade of the multiple service grades according to the SLA;

and the determining a target performance of the at least one management object comprises: determining the target performance of the first service grade as the target performance of the at least one management object.
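Claims 2 and 3 can be pictured as two lookups: an SLA assigns each management object a service grade, and each grade carries one target performance that its objects inherit. The sketch below assumes a hypothetical `SLA_TARGETS` table and hypothetical metric names; the numbers are illustrative and not taken from the source.

```python
# Hypothetical SLA table: service grade -> target performance of that grade.
SLA_TARGETS = {
    "gold": {"latency_ms": 10, "iops": 5000},
    "silver": {"latency_ms": 50, "iops": 1000},
}


def assign_grades(object_sla):
    """Group object names by the service grade their SLA specifies (claim 2)."""
    grades = {}
    for name, grade in object_sla.items():
        grades.setdefault(grade, []).append(name)
    return grades


def target_for(name, object_sla):
    """An object's target is the target of its service grade (claim 3)."""
    return SLA_TARGETS[object_sla[name]]
```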

4. The method according to claim 2 or 3, characterized in that the determining a target performance of the at least one management object comprises at least one of the following: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.

5. The method according to any one of claims 1-4, characterized in that the type of the target performance comprises at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU usage.

6. The method according to claim 5, characterized in that the obtaining an actual performance of the at least one management object comprises: monitoring the actual performance of the at least one management object periodically or at scheduled times.

7. The method according to claim 1, characterized in that the performing performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance comprises:

determining whether the obtained actual performance meets the target performance; and

when the actual performance does not meet the target performance, performing the performance management on the management objects corresponding to the first service grade and/or on the management objects corresponding to the service grades other than the first service grade among the multiple service grades, so that the actual performance of the first service grade meets the target performance.

8. The method according to claim 7, characterized in that the performance management comprises at least one of the following: service migration; service restriction; flow control; resource scheduling; and sending an alarm.

9. The method according to claim 7, characterized in that, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades is repeated, or the step of obtaining the actual performance of the at least one management object is repeated.
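The decision logic of claims 7 to 9 can be sketched as follows. The metric names, the direction in which each metric counts as "meeting" the target, and the particular actions chosen (an alarm for the first service grade, flow control for the other grades) are assumptions made for illustration; the claims only list the available kinds of performance management.

```python
# Assumed direction of each metric type. The claims list the metric types
# but do not state how "meets the target" is evaluated.
LOWER_IS_BETTER = {"latency_ms", "cpu_usage"}
HIGHER_IS_BETTER = {"iops", "throughput_mbps"}


def meets_target(actual, target):
    """Return True when every metric named in the target is satisfied."""
    for metric, goal in target.items():
        value = actual[metric]
        if metric in LOWER_IS_BETTER and value > goal:
            return False
        if metric in HIGHER_IS_BETTER and value < goal:
            return False
    return True


def plan_actions(actual, target, other_grades):
    """Pick performance-management actions for the first service grade.

    An empty list means the target is met and monitoring simply continues
    (claim 9). Otherwise an alarm is raised for the first grade and flow
    control is applied to the other grades (one possible reading of claim 7).
    """
    if meets_target(actual, target):
        return []
    actions = [("alarm", "first")]
    actions += [("flow_control", grade) for grade in other_grades]
    return actions
```

Acting on the other service grades reflects the "and/or" in claim 7: resources can be freed for the first grade by constraining objects that belong to a different grade.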

10. The method according to any one of claims 1-9, characterized in that the determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades comprises:

determining, among the management objects corresponding to the first service grade, at least one management object that meets a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load condition, and historical record; or

determining at least one management object among the management objects corresponding to the first service grade according to a predetermined algorithm, wherein the predetermined algorithm comprises at least one of random selection, sequential selection, and dynamic selection over time.
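Three of the selection options named in claim 10 can be sketched as small functions: random selection, sequential selection, and selection by a predetermined load condition. The function names, the wrap-around behaviour of the sequential variant, and the load threshold are hypothetical choices, not details given in the source.

```python
import random


def select_random(pool, k, seed=None):
    """Random selection: draw at most k distinct objects from the pool."""
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def select_sequential(pool, k, start=0):
    """Sequential selection: take k objects in order, wrapping around.

    Advancing `start` on each monitoring round lets every object be
    sampled eventually (an assumed usage pattern).
    """
    if not pool:
        return []
    return [pool[(start + i) % len(pool)] for i in range(min(k, len(pool)))]


def select_by_load(pool, load, threshold):
    """Predetermined condition: keep objects whose load is at or above threshold."""
    return [obj for obj in pool if load[obj] >= threshold]
```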

11. The method according to any one of claims 1-10, characterized in that the management object comprises at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, network bandwidth, a switch, and a server.

12. A device for managing a large-scale cluster, characterized by comprising:

a determining unit, configured to determine at least one management object among the management objects corresponding to a first service grade of multiple service grades, wherein the management objects are resource units in the large-scale cluster;

the determining unit being further configured to determine a target performance of the at least one management object;

an acquiring unit, configured to obtain an actual performance of the at least one management object; and

a performance management unit, configured to perform performance management on the management objects corresponding to the first service grade according to the target performance and the actual performance.

13. The device according to claim 12, characterized in that the determining unit is further configured to: determine the multiple service grades for the management objects in the large-scale cluster according to a service level agreement (SLA).

14. The device according to claim 13, characterized in that the determining unit is further configured to:

determine a target performance of the first service grade of the multiple service grades; and

determine the target performance of the first service grade as the target performance of the at least one management object.

15. The device according to claim 13 or 14, characterized in that the determining unit is specifically configured to: determine the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually set the target performance of the at least one management object.

16. The device according to any one of claims 12-15, characterized in that the type of the target performance determined by the determining unit comprises at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU usage.

17. The device according to claim 16, characterized in that the acquiring unit is specifically configured to monitor the actual performance of the at least one management object periodically or at scheduled times.

18. The device according to claim 12, characterized in that the performance management unit is specifically configured to:

determine, by means of the determining unit, whether the obtained actual performance meets the target performance;

19. The device according to claim 18, characterized in that the performance management comprises at least one of the following: service migration; service restriction; flow control; resource scheduling; and sending an alarm.

20. The device according to claim 18, characterized in that, when the actual performance meets the target performance, the determining unit repeats the step of determining at least one management object among the management objects corresponding to the first service grade of the multiple service grades, or the acquiring unit repeats the step of obtaining the actual performance of the at least one management object.

21. The device according to any one of claims 12-20, characterized in that the determining unit is specifically configured to:

22. The device according to any one of claims 12-21, characterized in that the management object comprises at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.