CN117724833B - PCIe tag cache self-adaptive resource allocation method and device based on stream attribute - Google Patents


Info

Publication number
CN117724833B
CN117724833B (granted from application CN202311673885.7A)
Authority
CN
China
Prior art keywords
tag
cache
data
pcie
read request
Prior art date
Legal status
Active
Application number
CN202311673885.7A
Other languages
Chinese (zh)
Other versions
CN117724833A (en)
Inventor
顾大晔
郭二辉
黄小菲
金俊浩
丁诗通
董树林
武卫红
钟世鹏
王烽
Current Assignee
Wuxi Zhongxing Microsystem Technology Co ltd
Original Assignee
Wuxi Zhongxing Microsystem Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Zhongxing Microsystem Technology Co ltd filed Critical Wuxi Zhongxing Microsystem Technology Co ltd
Priority to CN202311673885.7A
Publication of CN117724833A
Application granted
Publication of CN117724833B

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a PCIe tag cache adaptive resource allocation method and device based on stream attributes, which divide read requests into several data stream types. The data stream type of a read request is determined through flow table mapping; tag resources are allocated to the read request from the tag resource group matching its data stream type, and a corresponding context cache is constructed. A return data cache is allocated in the on-chip cache to each tag resource, the context cache is updated according to the allocation result, the read requests that have been allocated tag resources are arbitrated, and the winning request is formed into a read request packet and sent to the PCIe link. When a return packet is received, it is parsed to obtain the corresponding tag, and its data payload is stored into the return data cache corresponding to that tag. The tag resource groups, the cache granularity and the mapping flow table are updated according to the real-time running state. The invention realizes adaptive grouping and sharing of tags and adaptive granularity allocation and sharing of the cache, improves on-chip cache utilization, reduces chip area, and meets the performance requirements of different data streams.

Description

PCIe tag cache self-adaptive resource allocation method and device based on stream attribute
Technical Field
The invention belongs to the field of bus design, and particularly relates to a PCIe tag cache self-adaptive resource allocation method and device based on stream attributes.
Background
PCI-Express (Peripheral Component Interconnect Express, PCIe) is a high-speed serial computer expansion bus standard, widely used for interconnection among computer components. During a PCIe bus transaction, the tag field and the Requester ID field together form the transaction ID. The tag serves as the unique identifier of a request packet during transmission. The PCIe protocol specification requires that, for non-posted (NP) transactions, every transaction that has been issued but not yet completed must be uniquely marked by its transaction ID.
As PCIe links become faster, the number of outstanding requests (requests whose packet has been sent but whose completion has not yet returned) that a device must support to fully utilize the link bandwidth keeps increasing, as does the corresponding buffer space requirement. Current PCIe devices are often multi-function entities, each function having an independent configuration space and implementing a different type of host access. To meet the demands of virtualization, bandwidth utilization under multi-type access, and reasonable chip cost (access latency, area …), higher requirements are placed on tag allocation, cache configuration and related techniques in the circuit design of PCIe chips.
However, conventional patents and schemes mostly use static allocation, assigning tag resources in groups either by flow type or by function. The cache reserved for completions (CPL/CplD) is allocated per tag at a fixed granularity (either the read completion boundary RCB or the maximum payload size MPS), so the characteristics of different read requests cannot be distinguished and the cache cannot be managed at finer granularity. As link bandwidth keeps increasing and traffic types become more complex, this allocation manner is very inefficient and wastes cache resources.
Disclosure of Invention
The invention aims to provide a PCIe tag cache self-adaptive resource allocation method and device based on stream attributes, aiming at improving the utilization rate of tag resources and on-chip caches.
According to a first aspect of the present invention, there is provided a PCIe tag cache adaptive resource allocation method based on stream attributes, including:
determining data flow types of PCIe read requests from a plurality of functional entities through flow table mapping, distributing tag resources for the read requests from corresponding tag resource groups according to the data flow types, and constructing corresponding context caches;
Distributing a return data cache for the tag resources in an on-chip cache, updating the context cache according to the distribution result, arbitrating the read requests of the tag resources distributed, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is updated in real time based on the cache utilization rate of each data stream type;
and receiving a return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing a data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity.
Preferably, the determining the data flow type of the read request through flow table mapping further includes:
The data stream type of each read request is determined by feeding the read request ID from different functional entities or different streams into the flow table, wherein the data stream types comprise low-latency access, normal access, high-bandwidth access and zero-load access, and the flow table is either configured by software with the stream type of each read request ID, or dynamically updated by hardware tracking the characteristic information of read requests of different IDs.
Preferably, the allocating tag resources for the read request from the corresponding tag resource packet according to the data stream type further includes:
And grouping all tag resources according to different data stream types by utilizing preset configuration parameters, setting a shared area among all groups, distributing corresponding tag resources for the read request in the corresponding groups of the data stream types, temporarily distributing the tag resources of the shared area when the tag resources in the groups are exhausted, and dynamically updating the group boundary according to the use condition of the tag resources.
Preferably, the updating the context cache according to the allocation result further includes:
Each entry of the context cache records: the identity of the original request; whether the current tag is the earliest tag under that ID; the dependency of the current tag's returned data on earlier outstanding tags; whether the current tag is the last tag of the read request; whether the current tag has been allocated but not yet sent to the PCIe network; whether the request corresponding to the current tag has been sent and is waiting for returned data; whether the completion corresponding to the current tag has timed out; and whether all data of the current tag has been returned.
Preferably, before the returning data cache is allocated to the tag resource, the method further includes:
The cache utilization is calculated by dividing the sum of the read lengths corresponding to all tags in the outstanding state by the product of the number of outstanding tags and the cache granularity of the current cache packet, and the cache granularity is dynamically adjusted according to this utilization.
According to a second aspect of the present invention, there is provided a PCIe tag resource allocation apparatus based on stream attribute, including:
the tag allocation unit is used for acquiring PCIe read requests from a plurality of functional entities, determining the data flow types of the read requests through flow table mapping, allocating tag resources for the read requests from corresponding tag resource groups according to the data flow types, and constructing corresponding context caches;
The cache allocation unit is used for allocating a return data cache for the tag resource in the on-chip cache, updating the context cache according to an allocation result, arbitrating the read request of the allocated tag resource, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is determined based on the cache utilization rate of each data stream type;
and the receiving unit is used for receiving the return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing the data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity.
Compared with the prior art, the technical scheme of the invention has the following advantages:
Classified management of read data access requests better meets the performance requirements of different streams. Tag resources are managed in groups by data stream type, so tag allocation for different streams can proceed in parallel, increasing throughput. Tag resources can be shared among different data stream types in real time, preventing blocking caused by access bursts of one data stream and improving concurrency across streams. Without prior knowledge of the data streams, adaptive update iteration satisfies the performance needs of each access type: low-latency streams get the low link delay they require, high-bandwidth streams get their access burst requirements, host-side IO operations gain a larger optimization space, and the stream classification method facilitates tag grouping and cache management. The cache is grouped by data stream type and tag group, each group is managed at its own granularity, and the granularity of each group can be updated independently in real time, greatly improving on-chip cache utilization. The tag context, cache address and other information can be uniquely indexed by tag number, so indexing the tag context and its cache address is highly efficient. The method is not limited to hardware implementation; it can also be implemented in software, can be used for chip design, and can be used for performance convergence simulation during chip modeling.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure and process particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of an exemplary PCIe multi-function terminal device in accordance with the present invention.
FIG. 2 is a general flow chart of a PCIe tag resource allocation method based on stream attributes according to the present invention.
Fig. 3 is a circuit configuration diagram of a read channel implementing the method according to the invention.
Fig. 4 is a flow table structure diagram of indexing data flow types based on function ID/flow ID according to the present invention.
Fig. 5 is a schematic diagram of a tag packet based on a data stream type according to the present invention.
Fig. 6 is a schematic diagram of the mapping relationship between different packet tags and on-chip caches according to the present invention.
FIG. 7 is a schematic diagram of a data structure of a tag context according to the present invention.
Fig. 8 is a diagram of tag grouping and cache fine granularity mapping according to the present invention.
Fig. 9 is a flow chart of real-time update based on tag packet boundaries in accordance with the present invention.
FIG. 10 is a flow of real-time update of cache management granularity based on cache utilization in accordance with the present invention.
Fig. 11 is a flow of real-time update based on function ID/flow ID data stream type in accordance with the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which are derived by a person skilled in the art from the embodiments according to the invention without creative efforts, fall within the protection scope of the invention.
Based on the above analysis, the invention provides a PCIe tag cache adaptive resource allocation method and device based on stream attributes, mainly applicable to PCIe protocol interface devices. Tags are dynamically allocated in groups according to stream attributes: a flow classification scheme for the read channel of a PCIe function node is defined, tag resources are grouped by data stream type, tag contexts are maintained per tag group, data caches of different sizes are allocated to different tag types, the caches of different groups are managed at different granularities, and the granularity is dynamically adjustable, achieving high utilization of both tags and cache.
One PCIe device often contains multiple functional entities (functions), each function issuing read data requests to the host (host memory/cache) or other nodes. These typically include access to large amounts of data, access to control data, 0-length read requests used as write acknowledgements, and access to general data structures. Different access types have different requirements on access latency and data burst, so in a specific embodiment of the invention, read requests are subdivided into four types: low-latency access, normal access, high-bandwidth access and zero-load access. The four types have distinct features. Low-latency access is characterized by low latency requirements and a small data length per request. High-bandwidth access often involves sequential access to a segment of address space; one request often spans multiple tags and consumes multiple blocks of cache space. Zero-load access is characterized by read-after-write, confirming that a previous write request has been committed to main memory, without returning data. Normal access falls between low-latency and high-bandwidth access: it consumes more cache space than low-latency access but generally does not span tags.
In order to better use tag resources, the invention divides the whole tag resources into 4 groups according to the attribute of the stream. The boundaries between different groupings are not fixed and the window sizes of the groupings are automatically adjusted according to the runtime state. Meanwhile, a shared area exists between windows, and the shared area is used as shared resource allocation in tag allocation.
According to the tag grouping attributes, the data cache resources are divided into three groups; tags corresponding to zero-load access need no cache. To utilize the cache resources finely, the cache block size per tag differs between packets and is updated in real time based on the cache utilization; in addition, under pressure, low-latency access and high-bandwidth access can share the cache corresponding to the normal stream.
Read accesses from different functions are mapped according to function ID and stream ID (flow ID) to the 4 data stream access types, forming a stream classification table. The table may be updated in real time based on access status. According to the stream attributes, requests are scheduled onto the PCIe link under rules that keep multiple requests of a high-bandwidth flow contiguous and keep low-latency flows as unblocked as possible. The tag context records the state of each current tag, and returned data is delivered back to the corresponding function according to the destination function ID and flow ID based on the tag state.
According to the scheme provided by the invention, through reasonable tag allocation, the blocking among different data streams is effectively solved, the low delay and the high bandwidth are optimally balanced among different data streams, and meanwhile, through fine-granularity cache management, the cache utilization rate is improved, the actual cache size is reduced, and the chip area is finally reduced.
In the PCIe device shown in fig. 1, the mutually independent functional entities 101 to 104 are called functions. The data path comprises three separate channels: 105 is the function-reads-Host path, called the read channel; 106 is the function-writes-Host path, called the write channel; and 107 is the Host-accesses-function channel. 108 is a controller implementing the PCIe transaction layer over a high-speed serial interface, converting application-layer read requests into transaction layer packets (TLPs) and sending them to PCIe interconnect network 109. The functional entities 101-104 initiate different read requests to the PCIe link, and each read request requires a completion from the link, so operations such as tag marking, cache allocation and priority arbitration must be performed on the requests to achieve load balancing, low latency, and reasonable allocation of cache resources across multiple requests. The functional circuitry described in the invention is implemented in the read channel 105 of fig. 1.
Referring to the flowchart of fig. 2, the PCIe tag cache adaptive resource allocation method based on stream attributes provided by the present invention includes:
Step 101: determining data flow types of PCIe read requests from a plurality of functional entities through flow table mapping, distributing tag resources for the read requests from corresponding tag resource groups according to the data flow types, and constructing corresponding context caches.
Fig. 3 shows an implementation of the read channel 105 of fig. 1. Different types of read requests 201 from the respective functional entities are mapped to data stream types by the flow table in the flow classifier 202. The tag allocator 203 then allocates tags to each mapped read request according to its data stream type, and the allocation result forms a context that is written into the tag context cache 204.
FIG. 4 depicts a table lookup structure indexed by function ID/flow ID. This table is maintained in the flow classifier 202 of fig. 3. Requests from different functions and different flows first look up in this table which data stream type the current request belongs to; the data stream types may be the 4 stream types mentioned above. The table may be configured either statically or dynamically: under static configuration, software writes the stream type for each read request ID; under dynamic configuration, hardware tracks the characteristic information of read requests of different IDs and updates the table.
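The lookup above can be sketched as a small table keyed by (function ID, flow ID). This is an illustrative Python model, not the patent's hardware implementation; the table contents and the fallback-to-normal behavior are assumptions for the example.

```python
# Hypothetical flow-classification table keyed by (function ID, flow ID).
# The four stream types mirror those named in the text; entries may be
# written by software (static) or updated by hardware tracking (dynamic).
LOW_LATENCY, NORMAL, HIGH_BANDWIDTH, ZERO_LOAD = range(4)

flow_table = {
    (0, 0): LOW_LATENCY,     # e.g. control-data reads of function 0
    (0, 1): HIGH_BANDWIDTH,  # e.g. bulk sequential reads of function 0
    (1, 0): ZERO_LOAD,       # e.g. 0-length read-after-write acknowledgements
}

def classify(function_id, flow_id, default=NORMAL):
    # Unknown IDs fall back to the normal-access type (an assumption here).
    return flow_table.get((function_id, flow_id), default)
```

Dynamic updating would rewrite `flow_table` entries at runtime, as described for fig. 11 below.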
Fig. 5 depicts tag grouping according to data stream type. Configuration parameters 401 divide the entire tag resource into 4 groups using 4 sets of integer parameters: low-latency access tags, normal access tags, high-bandwidth access tags and zero-load tags, forming the linear arrangement 402 of tag packets, with no overlap between different tag packets. The boundary area between groups is the shared area; when the tag resources within a group are exhausted, tags from the shared area can be temporarily allocated. At runtime, the tag consumption of each data stream type is tracked in real time, and the boundaries of the different packets are updated accordingly.
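A minimal sketch of one tag packet with a borrowable shared border, in the spirit of fig. 5. The class and field names are illustrative assumptions; the patent describes the scheme in hardware terms, not this API.

```python
class TagGroup:
    """One tag packet: a private window [lo, hi) of tag numbers plus a
    shared border region usable when the private window is exhausted."""

    def __init__(self, lo, hi, shared):
        self.lo, self.hi = lo, hi        # private window boundaries
        self.shared = shared             # borrowable shared-area tags
        self.free = list(range(lo, hi))  # free tags inside the window

    def alloc(self):
        if self.free:
            return self.free.pop(0)
        if self.shared:                  # window exhausted: borrow a shared tag
            return self.shared.pop(0)
        return None                      # nothing available, request must wait

    def release(self, tag):
        # Returned tags go back to the pool they came from.
        if self.lo <= tag < self.hi:
            self.free.append(tag)
        else:
            self.shared.append(tag)
```

For example, a low-latency group might own tags 0-31 with tags 32-39 in the shared border; boundary movement (fig. 9) would then adjust `lo`/`hi` at runtime.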
Step 102: and distributing a return data cache for the tag resources in the on-chip cache, updating the context cache according to the distribution result, arbitrating the read requests of the tag resources distributed, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is updated in real time based on the cache utilization rate of each data stream type.
Referring to fig. 3, after tag allocation is completed, the cache management module 206 allocates a corresponding return data cache to the allocated tag, and the allocation result forms a context that is synchronously updated into the context cache 204. The tag allocator 203 forwards read requests that have completed tag allocation to the arbitration module 205 according to the per-stream contexts maintained in the context cache 204. The arbitration module 205 arbitrates among the multiple requests according to preset rules, updates the context cache 204 with the arbitration result, and sends the relevant information of the packet (tag, read address, read data length, etc.; the read address and read data length are also held in the context cache and are not detailed here) to the packet generator module 207. The packet generator module 207 forms a read request packet from the received information and sends it to the PCIe link through the PCIe protocol controller 210.
Fig. 6 depicts the mapping of tags to the return data cache. 501 denotes a low-latency access tag packet, 502 a normal access tag packet, 503 a high-bandwidth access tag packet, and 504 a zero-load tag packet. 505 denotes the data cache corresponding to the low-latency access tag packet, 506 the data cache corresponding to the normal access tag packet, and 507 the data cache corresponding to the high-bandwidth access tag packet; zero-load tag packets require no cache. 508 describes the offset, within the cache block, of the last valid byte of the corresponding data cache block. Each tag corresponds to an independent region of the data cache. The cache block sizes of the packets differ: blocks in 505 have the smallest granularity, blocks in 506 a medium granularity, and blocks in 507 the largest. In this tag-cache mapping structure, a memory area can be uniquely identified by its tag.
The context data structure corresponding to a tag is shown in fig. 7. Each tag in the low-latency access tag packet 601, the normal access tag packet 602, the high-bandwidth access tag packet 603 and the zero-load tag packet 604 corresponds to one context entry. 605 is a description of a context entry: the function ID/flow ID in the entry is the identity of the original request. Because read return data within the same function ID/flow ID must preserve order, the pre tag and next tag fields track whether the current tag is the earliest tag under that ID. The order field indicates the dependency of the current tag's returned data on earlier outstanding tags; when order is 0, the read request corresponding to the tag needs no order preservation, and its data can be returned to the function interface immediately upon arrival. Since one read request may span multiple tags in high-bandwidth streams, the last field identifies whether the current tag is the last tag of the read request. The req field identifies that the current tag has been allocated but not yet sent to the PCIe network; the wait field identifies that the request corresponding to the current tag has been issued and is waiting for return data; the timeout field identifies that the completion for the tag has timed out, triggering timeout recovery; the complete field identifies whether all data for the current tag has been returned.
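The entry described above maps naturally onto a record type. The sketch below models the fields of fig. 7 in Python; field names follow the text, while the types and single-link representation of pre/next tags are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TagContext:
    """One entry of the tag context cache (fig. 7), modeled for simulation."""
    function_id: int            # identity of the original request
    flow_id: int
    pre_tag: Optional[int]      # previous tag under the same ID (order keeping)
    next_tag: Optional[int]     # next tag under the same ID
    order: bool                 # depends on earlier outstanding tags
    last: bool                  # last tag of a (possibly multi-tag) request
    req: bool                   # allocated but not yet sent to the PCIe network
    wait: bool                  # request sent, waiting for return data
    timeout: bool               # completion timed out, triggers recovery
    complete: bool              # all data for this tag has returned
```

Indexing an array of such entries by tag number gives the fast tag-to-context lookup the text describes.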
For access requests of the 4 data stream types, the arbitration policy may be priority-weighted, with priority from high to low: low-latency access low_latency, zero-load access zero_load, normal access normal, high-bandwidth access high_bandwidth. In a high-bandwidth request the read length may span multiple tags, and once such a request wins arbitration, the arbitration module must not accept a new request until the read request corresponding to the last tag has been sent.
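A strict-priority reading of the policy above can be sketched as follows. The patent says "priority-weighted", so a real design may interleave lower-priority streams; this simplified model, with assumed queue names, only shows the ordering.

```python
# Priority order from the text, highest first. A weighted arbiter would
# additionally grant lower classes a share of slots; omitted here.
PRIORITY = ["low_latency", "zero_load", "normal", "high_bandwidth"]

def arbitrate(queues):
    """queues: dict mapping stream type -> list of pending requests.
    Returns the next request to send, or None if all queues are empty."""
    for stream in PRIORITY:
        if queues.get(stream):
            return queues[stream].pop(0)
    return None
```

In the multi-tag high-bandwidth case, the caller would keep re-issuing the winning request's remaining tags before calling `arbitrate` again.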
Step 103: and receiving a return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing a data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity.
The PCIe protocol controller 210 receives returned CPL/CplD packets (completion packet types defined by the PCIe bus protocol, carrying the read status and read data returned for a read request) and passes them to the packet parsing module 208 for validity checking and tag matching. For a matched CPL/CplD, the contained data and return information are extracted and written into the tag's cache in the cache management module 206, and the tag context in the context cache 204 is updated. After all data for a read request has returned, the stream sorting module 211 sorts the data to the function data return interfaces 212 according to the original stream ID, returning the read data to the function.
FIG. 8 depicts one runtime state of the data structures during operation. 704 represents a low-latency access tag packet and 701 its corresponding cache packet, where the data cache block per tag is 32B. 705 denotes a normal access tag packet and 702 its corresponding data cache packet, with a 64B cache block per tag. 706 represents a high-bandwidth access tag packet and 703 its corresponding data cache packet, with a 128B cache block per tag. The four tag groups 704, 705, 706 and 707 have shared areas between them, so the tag vacancy rate of each group can be tracked and counted at runtime and the tag group table updated in real time. The three data cache packets 701, 702, 703 are linearly arranged like the tag packets and are adjusted in real time as the tag packets change. The cache granularity within each packet is adjusted according to the cache utilization. 708 describes the context information corresponding to the allocated tags. All data structures shown in fig. 8 can be looked up quickly using the tag as index.
Fig. 9 depicts one way of updating the boundaries of tag packets. For the current tag packet, two counters, low_cnt and high_cnt, are maintained. When a new read request needs a tag and the number of tags in use after adding the current request would exceed the total number of tags in the group, low_cnt is decremented by 1 and high_cnt incremented by 1; when it would not exceed the total, low_cnt is incremented by 1 and high_cnt decremented by 1. When high_cnt accumulates to a threshold, the tag boundary is extended by 1 into the shared area and high_cnt is cleared. When low_cnt accumulates to a threshold, the tag boundary is retracted by 1 toward the group's own area and low_cnt is cleared. Fig. 9 thus adjusts the boundaries simply according to the dynamic utilization of the tag packet; the counters and thresholds may be set according to specific requirements.
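The counter-with-hysteresis scheme above can be sketched as below. The dictionary keys, the threshold value and the one-tag step are illustrative assumptions; the patent leaves these parameters to the implementer.

```python
def update_group_boundary(group, tags_needed, threshold=8):
    """Hysteresis update of a tag-group boundary, after the fig. 9 flow.

    group: dict with 'used', 'total' (tags in the group), 'boundary'
    (position of the group's shared-area edge), 'low_cnt', 'high_cnt'.
    Called once per allocation attempt needing `tags_needed` tags."""
    if group["used"] + tags_needed > group["total"]:
        group["low_cnt"] -= 1          # group under pressure
        group["high_cnt"] += 1
    else:
        group["low_cnt"] += 1          # group has headroom
        group["high_cnt"] -= 1

    if group["high_cnt"] >= threshold:
        group["boundary"] += 1         # extend 1 tag into the shared area
        group["total"] += 1
        group["high_cnt"] = 0
    elif group["low_cnt"] >= threshold:
        group["boundary"] -= 1         # retract 1 tag toward own area
        group["total"] -= 1
        group["low_cnt"] = 0
```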
Fig. 10 depicts the flow of adjusting the cache granularity of the cache packet corresponding to a tag packet. As shown in fig. 10, allocating cache space for each read request triggers a calculation of the cache utilization within the current cache packet. In one embodiment, the cache utilization is calculated by dividing the sum of the read lengths corresponding to all tags in the outstanding state by the number of tags in the outstanding state multiplied by the cache granularity of the current cache packet. When the cache utilization is higher than the upper limit, high_cnt is accumulated, and when high_cnt reaches a predefined threshold, the cache granularity of the on-chip cache is adjusted upwards by a preset step size; when the cache utilization is lower than the lower limit, low_cnt is accumulated, and when low_cnt reaches a predefined threshold, the cache granularity of the on-chip cache is adjusted downwards by a preset step size. The counters and predefined thresholds may be set according to specific requirements.
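The utilization formula and the step-wise granularity adjustment can be sketched as below. The formula matches the one stated above; the step size, limits, and threshold values are illustrative assumptions.

```python
# Sketch of the granularity adjustment of Fig. 10. Only the utilization
# formula is fixed by the text; step, limits and threshold are assumptions.
def cache_utilization(outstanding_read_lengths, granularity):
    """Sum of read lengths of outstanding tags, divided by
    (number of outstanding tags * current cache granularity)."""
    n = len(outstanding_read_lengths)
    if n == 0:
        return 0.0
    return sum(outstanding_read_lengths) / (n * granularity)

class GranularityController:
    def __init__(self, granularity, step=32, upper=0.9, lower=0.5, threshold=4):
        self.granularity = granularity
        self.step = step
        self.upper, self.lower = upper, lower
        self.threshold = threshold
        self.high_cnt = self.low_cnt = 0

    def on_allocation(self, outstanding_read_lengths):
        util = cache_utilization(outstanding_read_lengths, self.granularity)
        if util > self.upper:
            self.high_cnt += 1
            if self.high_cnt >= self.threshold:
                self.granularity += self.step   # adjust upwards by one step
                self.high_cnt = 0
        elif util < self.lower:
            self.low_cnt += 1
            if self.low_cnt >= self.threshold:
                # adjust downwards, never below one step
                self.granularity = max(self.step, self.granularity - self.step)
                self.low_cnt = 0
```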
Fig. 11 depicts the flow of updating the data stream type corresponding to a function ID/flow ID. When a new read request arrives for the corresponding function ID/flow ID, if the size of the request is larger than the mem_size corresponding to the current stream type, low_cnt is decremented by 1 and high_cnt is incremented by 1; if the size of the request is smaller than the mem_size corresponding to the current stream type, low_cnt is incremented by 1 and high_cnt is decremented by 1. When low_cnt reaches a predefined threshold, the data stream type corresponding to the current function ID/flow ID is migrated in the illustrated direction; specifically, normal access may be migrated to low-latency access, or high-bandwidth access may be migrated to normal access. When high_cnt reaches a predefined threshold, the data stream type corresponding to the current function ID/flow ID is migrated in the illustrated direction; specifically, low-latency access may be migrated to normal access, or normal access may be migrated to high-bandwidth access.
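The migration chain described above can be sketched as a small state machine per function ID/flow ID. The migration order follows the text; the mem_size values and threshold are illustrative assumptions.

```python
# Hedged sketch of the flow-type migration of Fig. 11. mem_size values and
# the threshold are placeholders, not values from the patent.
TYPES = ["low_latency", "normal", "high_band"]       # migration chain
MEM_SIZE = {"low_latency": 64, "normal": 256, "high_band": 1024}

class FlowEntry:
    def __init__(self, flow_type="normal", threshold=4):
        self.flow_type = flow_type
        self.threshold = threshold
        self.low_cnt = self.high_cnt = 0

    def on_read_request(self, size):
        if size > MEM_SIZE[self.flow_type]:
            self.low_cnt -= 1
            self.high_cnt += 1
        elif size < MEM_SIZE[self.flow_type]:
            self.low_cnt += 1
            self.high_cnt -= 1
        i = TYPES.index(self.flow_type)
        if self.high_cnt >= self.threshold and i < len(TYPES) - 1:
            self.flow_type = TYPES[i + 1]   # e.g. normal -> high_band
            self.low_cnt = self.high_cnt = 0
        elif self.low_cnt >= self.threshold and i > 0:
            self.flow_type = TYPES[i - 1]   # e.g. normal -> low_latency
            self.low_cnt = self.high_cnt = 0
```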
It should be noted that the update flows of figs. 9 to 11 may each be independently enabled or disabled by adding switches.
Compared with the prior art, the PCIe tag cache resource allocation method based on stream attributes has the following advantages:
The read-data access requests from the device to a PCIe node are divided into different streams according to their attributes, and classified management better satisfies the performance requirements of the different streams. Tag resources are managed in groups according to data stream type, so that tag allocation for different streams can be processed in parallel, increasing throughput. Buffer areas are arranged between the boundaries of different tag packets and the packet boundaries are updated in real time, so that tag resources can be shared between different types of data streams in real time, preventing blocking caused by access bursts between data streams. The cache is grouped by data stream type and tag group, and different groups are managed at different granularities, which improves cache utilization; buffer areas are likewise arranged between the cache groups, with dynamic sharing and boundary updating according to runtime conditions, which improves the utilization of the cache as a whole. A stream mapping table is established for the different data streams and updated in real time according to the running state, so that the performance requirements of different types of accesses can be satisfied through adaptive iterative updates even without prior knowledge of the data streams. Arbitration of the access requests issued by the four data stream types may adopt a priority-weighted policy, with priority from high to low being low-latency access (low_latency), zero-load access (zero_load), normal access (normal) and high-bandwidth access (high_band), which satisfies both the link-delay requirement of low-latency data streams and the burst requirement of high-bandwidth accesses, and provides a larger optimization space for IO operations on the host side.
When a high-bandwidth request is to be sent, the arbitration circuit accepts a new request only after the read request corresponding to the previous tag has been sent, which avoids truncated transmission of large blocks of data. The tag context, cache address and other information can be uniquely indexed by the tag number, so indexing the tag context and its corresponding cache address is highly efficient. The stream classification method provided facilitates tag grouping and cache management, and includes but is not limited to the four grouping modes mentioned herein. The method provided by the invention is not limited to hardware implementation; it can also be realized in cooperation with software, giving a flexible implementation. Being adaptive, the method can be used both for chip design and for performance-convergence simulation in chip modeling.
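The priority-weighted arbitration described above can be sketched as a fixed-priority pick over per-type queues, with the stated constraint that a new high-bandwidth request is accepted only after the previous one has been sent. The queue structure and the `busy_high_band` flag are illustrative assumptions.

```python
# Illustrative sketch of arbitration among the four stream types
# (low_latency > zero_load > normal > high_band). Not the patent's circuit;
# a software model of the stated priority order and high-band gating.
from collections import deque

PRIORITY = ["low_latency", "zero_load", "normal", "high_band"]

class Arbiter:
    def __init__(self):
        self.queues = {t: deque() for t in PRIORITY}
        self.busy_high_band = False  # a high-band read is still mid-transmission

    def push(self, stream_type, request):
        self.queues[stream_type].append(request)

    def select(self):
        """Pick the next read request to send on the PCIe link, or None."""
        for t in PRIORITY:
            if t == "high_band" and self.busy_high_band:
                continue  # wait until the previous high-band tag is sent
            if self.queues[t]:
                return t, self.queues[t].popleft()
        return None
```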
Accordingly, in a second aspect, the present invention provides a PCIe tag resource allocation apparatus based on stream attribute, including:
the tag allocation unit is used for determining data stream types of PCIe read requests from a plurality of functional entities through stream table mapping, allocating tag resources for the read requests from corresponding tag resource groups according to the data stream types, and constructing corresponding context caches;
The cache allocation unit is used for allocating a return data cache for the tag resources in the on-chip cache, updating the context cache according to an allocation result, arbitrating the read requests of the tag resources allocated, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is updated in real time based on the cache utilization rate of each data stream type;
and the receiving unit is used for receiving the return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing the data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity.
The above device may be implemented by the PCIe tag cache adaptive resource allocation method based on stream attributes provided by the embodiment of the first aspect; for the specific implementation, reference may be made to the description in the embodiment of the first aspect, which is not repeated herein.
It will be appreciated that the circuit structures, field structures and parameters described in the above embodiments are by way of example only. Those skilled in the art may adapt and adjust the structural features of the above embodiments as desired; the inventive concept is not limited to the specific details of the examples described above. For example, the four grouping modes mentioned herein are merely specific examples for implementing tag allocation and cache allocation; the application is not limited to these four grouping modes.
While the invention has been described in detail with reference to the foregoing embodiments, it will be appreciated by those skilled in the art that modifications may be made to the techniques described in the foregoing embodiments, or equivalents may be substituted for some of their technical features; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A PCIe tag cache self-adaptive resource allocation method based on stream attributes is characterized by comprising the following steps:
determining data flow types of PCIe read requests from a plurality of functional entities through flow table mapping, distributing tag resources for the read requests from corresponding tag resource groups according to the data flow types, and constructing corresponding context caches;
Distributing a return data cache for the tag resources in an on-chip cache, updating the context cache according to the distribution result, arbitrating the read requests of the tag resources distributed, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is updated in real time based on the cache utilization rate of each data stream type;
receiving a return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing a data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity;
the updating the context cache according to the allocation result further comprises:
the method includes the steps that an identification of an original request, whether a current tag is the earliest tag under a read request ID, the dependency relationship between returned data corresponding to the current tag and the previous outstanding tag, whether the current tag is the last tag of the read request, whether the current tag is allocated but not sent to a PCIe network, whether the request corresponding to the current tag is sent and is waiting for returned data, whether the response corresponding to the current tag returns overtime, and whether the data of the current tag are all returned are recorded in each table item of a context cache.
2. The PCIe tag cache adaptive resource allocation method based on flow attributes according to claim 1, wherein said determining a data flow type of said read request through flow table mapping further comprises:
The data stream type of a read request is determined by inputting the read request ID from different functional entities or different streams into the stream table, wherein the data stream types include low-latency access, normal access, high-bandwidth access and zero-load access, and the stream table is configured by software with the stream types of the different read request IDs, or is dynamically updated by hardware tracking the characteristic information of the read requests of the different IDs.
3. The PCIe tag cache adaptive resource allocation method based on stream attributes according to claim 1, wherein allocating tag resources for the read request from a corresponding tag resource packet according to the data stream type further comprises:
All tag resources are grouped according to the different data stream types using preset configuration parameters, with a shared area set between the groups; corresponding tag resources are allocated for the read request from the group of its data stream type, tag resources of the shared area are temporarily allocated when the tag resources in the group are exhausted, and the group boundary is dynamically updated according to tag resource usage.
4. The method for adaptive resource allocation of PCIe tag caches based on stream attributes according to claim 1, wherein before said allocating return data caches for said tag resources, the method further comprises:
calculating the cache utilization by dividing the sum of read lengths corresponding to all tags in the outstanding state by the number of tags in the outstanding state multiplied by the cache granularity of the current cache packet, and dynamically adjusting the cache granularity according to the cache utilization.
5. PCIe tag cache self-adaptive resource allocation device based on stream attribute, which is characterized by comprising:
the tag allocation unit is used for determining data stream types of PCIe read requests from a plurality of functional entities through stream table mapping, allocating tag resources for the read requests from corresponding tag resource groups according to the data stream types, and constructing corresponding context caches;
The cache allocation unit is used for allocating a return data cache for the tag resources in the on-chip cache, updating the context cache according to an allocation result, arbitrating the read requests of the tag resources allocated, forming a read request data packet and sending the read request data packet to a PCIe link, wherein the space granularity of the return data cache is updated in real time based on the cache utilization rate of each data stream type;
The receiving unit is used for receiving the return packet of the read request from the PCIe link, analyzing to obtain a tag corresponding to the return packet, storing the data load in the return packet into a return data cache corresponding to the tag, and returning to a corresponding functional entity;
the cache allocation unit is further configured to:
the method includes the steps that an identification of an original request, whether a current tag is the earliest tag under a read request ID, the dependency relationship between returned data corresponding to the current tag and the previous outstanding tag, whether the current tag is the last tag of the read request, whether the current tag is allocated but not sent to a PCIe network, whether the request corresponding to the current tag is sent and is waiting for returned data, whether the response corresponding to the current tag returns overtime, and whether the data of the current tag are all returned are recorded in each table item of a context cache.
6. The PCIe tag cache adaptive resource allocation apparatus based on stream attributes according to claim 5, wherein the tag allocation unit is further configured to:
The data stream type of a read request is determined by inputting the read request ID from different functional entities or different streams into the stream table, wherein the data stream types include low-latency access, normal access, high-bandwidth access and zero-load access, and the stream table is configured by software with the stream types of the different read request IDs, or is dynamically updated by hardware tracking the characteristic information of the read requests of the different IDs.
7. The PCIe tag cache adaptive resource allocation apparatus based on stream attributes according to claim 5, wherein the tag allocation unit is further configured to:
All tag resources are grouped according to the different data stream types using preset configuration parameters, with a shared area set between the groups; corresponding tag resources are allocated for the read request from the group of its data stream type, tag resources of the shared area are temporarily allocated when the tag resources in the group are exhausted, and the group boundary is dynamically updated according to tag resource usage.
8. The PCIe tag cache adaptive resource allocation apparatus based on stream attribute according to claim 5, wherein the cache allocation unit is further configured to:
before allocating the return data cache for the tag resources, the cache utilization is calculated by dividing the sum of read lengths corresponding to all tags in the outstanding state by the number of tags in the outstanding state multiplied by the cache granularity of the current cache packet, and the cache granularity is dynamically adjusted according to the cache utilization.
CN202311673885.7A 2023-12-06 2023-12-06 PCIe tag cache self-adaptive resource allocation method and device based on stream attribute Active CN117724833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311673885.7A CN117724833B (en) 2023-12-06 2023-12-06 PCIe tag cache self-adaptive resource allocation method and device based on stream attribute


Publications (2)

Publication Number Publication Date
CN117724833A CN117724833A (en) 2024-03-19
CN117724833B true CN117724833B (en) 2024-05-28

Family

ID=90206353




Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11604683B2 (en) * 2019-12-19 2023-03-14 Marvell Asia Pte Ltd System and methods for tag-based synchronization of tasks for machine learning operations

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681222A (en) * 2016-03-03 2016-06-15 深圳市同创国芯电子有限公司 Method and apparatus for data receiving and caching, and communication system
WO2021014114A1 (en) * 2019-07-19 2021-01-28 Arm Limited An apparatus and method for processing flush requests within a packet network
EP3999972A1 (en) * 2019-07-19 2022-05-25 ARM Limited An apparatus and method for processing flush requests within a packet network
WO2021202175A1 (en) * 2020-03-30 2021-10-07 Pure Storage, Inc. File systems constructed of block objects
CN114138502A (en) * 2020-09-03 2022-03-04 Arm有限公司 Data processing
CN112131155A (en) * 2020-09-29 2020-12-25 中国船舶重工集团公司第七二四研究所 PCIE transaction layer transmission method based on FPGA with high expansibility
CN114553776A (en) * 2022-02-28 2022-05-27 深圳市风云实业有限公司 Signal out-of-order control and rate self-adaptive transmission device and transmission method thereof
CN114741341A (en) * 2022-03-01 2022-07-12 西安电子科技大学 Method, system and storage medium for realizing Crossbar structure arbitration
CN114860785A (en) * 2022-07-08 2022-08-05 深圳云豹智能有限公司 Cache data processing system, method, computer device and storage medium
CN115002052A (en) * 2022-07-18 2022-09-02 井芯微电子技术(天津)有限公司 Layered cache controller, control method and control equipment
CN116821011A (en) * 2023-08-24 2023-09-29 摩尔线程智能科技(北京)有限责任公司 Parameter determination and data reading and writing method, processor, device and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dongyang Li et al., "A Parallel and Pipelined Architecture for Accelerating Fingerprint Computation in High Throughput Data Storages", IEEE, 2015. *
Li Shaobo, "Research on Tag-based PCIe Bus Transaction Parallel Processing Technology" (《基于Tag的PCIe总线事务并行处理技术研究》), China Master's Theses Full-text Database, 2017-02-01. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant