CN111124492B - Instruction generation method and device, instruction execution method, processor and electronic equipment - Google Patents

Instruction generation method and device, instruction execution method, processor and electronic equipment

Info

Publication number
CN111124492B
Authority
CN
China
Prior art keywords
instruction
path
source operand
address information
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911300243.6A
Other languages
Chinese (zh)
Other versions
CN111124492A (en
Inventor
蒋宇翔
王晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Haiguang Microelectronics Technology Co Ltd
Original Assignee
Chengdu Haiguang Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Haiguang Microelectronics Technology Co Ltd
Priority to CN201911300243.6A
Publication of CN111124492A
Priority to PCT/CN2020/114002 (WO2021120712A1)
Application granted
Publication of CN111124492B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094 Condition code generation, e.g. Carry, Zero flag
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30196 Instruction operation extension or modification using decoder, e.g. decoder per instruction set, adaptable or programmable decoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)

Abstract

The application relates to an instruction generation method, an instruction generation device, an instruction execution method, a processor and electronic equipment. The method comprises: determining that the instruction execution unit supports data pass-through; when generating instructions, setting in the ith instruction a first identifier indicating that the destination data of the ith instruction is to be written into a through path, and setting in the (i + j)th instruction a second identifier indicating that a source operand is to be obtained from the through path; and sending the ith instruction and the (i + j)th instruction to the instruction execution unit, so that, when executing the instructions, the instruction execution unit writes the result of the ith instruction into the through path according to the first identifier and acquires the required source operand from the through path according to the second identifier. By setting these two identifiers in the instructions, software forces the hardware to perform explicit data pass-through, which reduces the number of memory accesses.

Description

Instruction generation method and device, instruction execution method, processor and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to an instruction generation method, an instruction generation device, an instruction execution method, a processor and electronic equipment.
Background
Power consumption is a key concern in computing applications. In typical high-intensity computing applications, 70-80% of the power is consumed by the computing units, and within a computing unit 50% of the power is used for reading and writing data. In typical scientific computing and machine learning applications, matrix multiplication is one of the most common use cases, and in matrix multiplication 35-40% of the power is used to access the Vector General Purpose Registers (VGPR).
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide an instruction generation method, an instruction generation device, an instruction execution method, a processor, and an electronic device, so as to solve the problem that typical high-intensity computing applications require a large number of memory accesses and therefore consume a large amount of power.
The embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides an instruction generation method, where the method includes: determining that the instruction execution unit supports data pass-through; when generating instructions, if the destination data of an ith instruction is to be used as a source operand of an (i + j)th instruction, setting in the ith instruction a first identifier indicating that the destination data of the ith instruction is to be written into a through path, and setting in the (i + j)th instruction a second identifier indicating that the source operand is to be obtained from the through path, where i and j are positive integers; and sending the ith instruction and the (i + j)th instruction to the instruction execution unit, so that the instruction execution unit writes the result of the ith instruction into the through path according to the first identifier when executing the ith instruction, and acquires the required source operand from the through path according to the second identifier when executing the (i + j)th instruction. In the embodiment of the application, a first identifier indicating that destination data is written into the through path and a second identifier indicating that a source operand comes from the through path are set in the instructions, so that software forces the hardware to perform explicit data pass-through. This can greatly reduce the number of memory accesses and save a large amount of power.
With reference to a possible implementation manner of the embodiment of the first aspect, sending the ith instruction and the (i + j)th instruction to the instruction execution unit includes: splicing the ith instruction and the (i + j)th instruction in generation order to obtain an instruction block; and sending the instruction block to a decoder, so that the decoder sequentially acquires first key information in the ith instruction from the instruction block and sends the first key information to the instruction execution unit, and acquires second key information in the (i + j)th instruction and sends the second key information to the instruction execution unit, wherein the first key information comprises the first identifier and the second key information comprises the second identifier. In the embodiment of the application, the instructions are spliced in generation order into an instruction block, so that the generated instructions are executed correctly even when a plurality of instruction blocks exist. Because the hardware must finish executing the whole current instruction block before it can switch to another instruction block, execution cannot switch partway through to instructions that implement another function, which would otherwise cause a logic error in the data pass-through.
In a second aspect, an embodiment of the present application further provides an instruction execution method, where the method includes: acquiring an instruction to be executed; obtaining key information in the instruction to be executed, wherein the key information comprises: source operand address information and destination address information, wherein the source operand address information is used for indicating the source of a source operand, and the destination address information is used for indicating a writing path of destination data; judging whether the source operand indicated by the source operand address information is from a through path or not; when the source operand indicated by the source operand address information is from a through path, acquiring a required source operand from the through path; judging whether a write path of the destination data indicated by the destination address information is the through path; and when the write path of the destination data indicated by the destination address information is the through path, writing the result of executing the instruction to be executed into the through path. In the embodiment of the application, the source operand address information, the destination address information and other key information in the instruction to be executed are acquired, the required source operand is directly acquired from the through path when the source operand indicated by the source operand address information is from the through path, and the instruction execution result is directly written into the through path when the write path of the destination data indicated by the destination address information is the through path, so that the access of a memory is reduced, and the power consumption is reduced.
With reference to a possible implementation manner of the embodiment of the second aspect, determining whether the source operand indicated by the source operand address information comes from a through path includes: determining whether the source operand address information contains a second identifier; when the source operand address information contains the second identifier, the source operand indicated by the source operand address information comes from the through path. In the embodiment of the application, the second identifier indicating that the source operand is obtained from the through path is set in the instruction, so that whether the source operand comes from the through path can be determined quickly by checking whether the source operand address information contains the second identifier.
With reference to a possible implementation manner of the embodiment of the second aspect, determining whether the write path of the destination data indicated by the destination address information is the through path includes: determining whether the destination address information contains a first identifier; when the destination address information contains the first identifier, the write path of the destination data indicated by the destination address information is the through path. In the embodiment of the application, the first identifier indicating that the destination data of the instruction to be executed is written into the through path is set in the instruction, so that whether the write path of the destination data is the through path can be determined quickly by checking whether the destination address information contains the first identifier.
In a third aspect, an embodiment of the present application further provides a processor, including an instruction execution unit and a processor core. The processor core is configured to: determine that the instruction execution unit supports data pass-through; when generating instructions, if the destination data of the ith instruction is to be used as a source operand of the (i + j)th instruction, set in the ith instruction a first identifier indicating that the destination data of the ith instruction is to be written into a through path, and set in the (i + j)th instruction a second identifier indicating that the source operand is to be obtained from the through path, where i and j are positive integers; and send the ith instruction and the (i + j)th instruction to the instruction execution unit. The instruction execution unit is configured to, when executing the ith instruction, write the result of the ith instruction into the through path according to the first identifier, and, when executing the (i + j)th instruction, obtain the required source operand from the through path according to the second identifier.
With reference to a possible implementation manner of the third aspect, the processor further includes a decoder. The processor core is configured to splice the ith instruction and the (i + j)th instruction in generation order to obtain an instruction block, and to send the instruction block to the decoder. The decoder is configured to sequentially acquire first key information in the ith instruction from the instruction block and send the first key information to the instruction execution unit, and to acquire second key information in the (i + j)th instruction and send the second key information to the instruction execution unit, where the first key information includes the first identifier and the second key information includes the second identifier.
In a fourth aspect, an embodiment of the present application further provides a processor, including a decoder and an instruction execution unit. The decoder is configured to obtain an instruction to be executed and obtain key information in the instruction to be executed, where the key information includes source operand address information and destination address information, the source operand address information indicates the source of a source operand, and the destination address information indicates the write path of destination data; the decoder is further configured to send the key information to the instruction execution unit. The instruction execution unit is configured to determine whether the source operand indicated by the source operand address information comes from a through path and, when it does, to acquire the required source operand from the through path; it is further configured to determine whether the write path of the destination data indicated by the destination address information is the through path and, when it is, to write the result of executing the instruction to be executed into the through path.
With reference to one possible implementation manner of the embodiment of the fourth aspect, the instruction execution unit is configured to determine whether the source operand indicated by the source operand address information comes from a through path by determining whether the source operand address information contains a second identifier; when the source operand address information contains the second identifier, the source operand indicated by the source operand address information comes from the through path.
With reference to one possible implementation manner of the embodiment of the fourth aspect, the instruction execution unit is configured to determine whether a write path of destination data indicated by the destination address information is the through path by determining whether the destination address information includes a first identifier; and when the destination address information contains the first identifier, representing that a write path of destination data indicated by the destination address information is the through path.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a processor as provided in the above-mentioned third aspect embodiment and/or in connection with one possible implementation of the third aspect embodiment, or a processor as provided in the above-mentioned fourth aspect embodiment and/or in connection with any one possible implementation of the fourth aspect embodiment.
In a sixth aspect, an embodiment of the present application further provides an instruction generation device, where the device includes a determining module, a generating module and a sending module. The determining module is configured to determine that the instruction execution unit supports data pass-through. The generating module is configured to, when generating instructions, if the destination data of the ith instruction is to be used as a source operand of the (i + j)th instruction, set in the ith instruction a first identifier indicating that the destination data of the ith instruction is to be written into a through path, and set in the (i + j)th instruction a second identifier indicating that the source operand is to be obtained from the through path, where i and j are positive integers. The sending module is configured to send the ith instruction and the (i + j)th instruction to the instruction execution unit, so that the instruction execution unit writes the result of the ith instruction into the through path according to the first identifier when executing the ith instruction, and obtains the required source operand from the through path according to the second identifier when executing the (i + j)th instruction.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the application.
Fig. 1 shows a flowchart of an instruction generation method provided in an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating fields in a VOP3 instruction according to an embodiment of the present disclosure.
Fig. 3 is a logic diagram for executing a complete instruction block according to an embodiment of the present application.
Fig. 4 shows a flowchart of an instruction execution method provided in an embodiment of the present application.
Fig. 5 shows a hardware schematic diagram of a matrix multiplication application provided in an embodiment of the present application.
Fig. 6 shows a functional block diagram of an instruction generating apparatus according to an embodiment of the present application.
Fig. 7 shows a block diagram of a processor according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association relationship between associated objects and means that three kinds of relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
Typical high-intensity computing applications currently require a large number of memory accesses and therefore consume a large amount of power. For example, in a matrix multiplication application, it is necessary to repeatedly obtain source operand A, source operand B and source operand C from the vector general-purpose registers (VGPR), calculate A * B + C, and then write the result back into the VGPR. When instruction0 is executed, source operands A, B and C (C is 0 at this point) are obtained from the VGPR, and when the calculation is completed the result must be written into the VGPR. When instruction1 is executed, source operands A, B and C are again obtained from the VGPR (C is now the result of the previous calculation, C1 = A0 * B0 + C0), and when the calculation is completed the result must again be written into the VGPR. These steps repeat until the matrix calculation is completed. The embodiment of the present application therefore provides a new instruction that converts VGPR accesses into data pass-through: the destination data (calculation result) of the ith instruction can be used as the source data of the (i + j)th instruction, so that the hardware skips both the VGPR write of the destination data of the ith instruction and the VGPR read of the source operand of the (i + j)th instruction. Here i and j are both positive integers, and j does not exceed the maximum number of data pass-through stages supported by the hardware; for example, if the hardware supports at most 2 stages of data pass-through, the maximum value of j is 2.
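As a rough illustration of the potential saving, the following sketch counts VGPR accesses for a chain of n three-operand instructions in which each result becomes operand C of the next instruction. It assumes (this is not stated in the source) that only operand C is passed through with j = 1, that the first instruction still reads C from the VGPR, and that only the last result is written back to the VGPR. The function names are hypothetical.

```python
def vgpr_accesses_baseline(n: int) -> int:
    """Every instruction reads A, B and C from the VGPR and writes its result back."""
    return n * (3 + 1)


def vgpr_accesses_pass_through(n: int) -> int:
    """Operand C travels over the through path between consecutive instructions.

    Assumption: only the first instruction reads C from the VGPR, and only the
    last instruction writes its result to the VGPR.
    """
    if n <= 0:
        return 0
    reads = 3 + 2 * (n - 1)  # first instruction reads A, B, C; the rest read A and B
    writes = 1               # only the final result reaches the VGPR
    return reads + writes


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, vgpr_accesses_baseline(n), vgpr_accesses_pass_through(n))
```

Under these assumptions the access count grows as 2n + 2 instead of 4n, so the saving approaches one half for long chains.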
Referring to fig. 1, steps included in a method for generating an instruction according to an embodiment of the present application will be described with reference to fig. 1.
Step S101: it is determined that the instruction execution unit supports data pass-through.
In the embodiment of the present application, before generating instructions, a compiler (software) detects whether the instruction execution unit (hardware) supports data pass-through, and step S102 is performed when it is determined that the instruction execution unit does support it. Currently, most hardware supports 1 or 2 levels of implicit data pass-through: when the hardware executes instruction1 or instruction2 and detects that source data can be obtained from the through path (forwarding path), it may skip the VGPR read and obtain the source data directly from the through path. However, this approach still incurs a large number of memory writes for instruction0 (the preceding instruction), and forwarding may also fail; that is, even if the hardware supports implicit data pass-through, the source data cannot necessarily be obtained from the through path.
Step S102: when generating instructions, if the destination data of an ith instruction is to be used as a source operand of an (i + j)th instruction, a first identifier indicating that the destination data of the ith instruction is to be written into a through path is set in the ith instruction, and a second identifier indicating that the source operand is to be obtained from the through path is set in the (i + j)th instruction.
Once it has been determined that the hardware supports data pass-through, then during instruction generation, if the destination data of the ith instruction is to be used as a source operand of the (i + j)th instruction, a first identifier indicating that the destination data of the ith instruction is to be written into the through path is set in the ith instruction, and a second identifier indicating that the source operand is to be obtained from the through path is set in the (i + j)th instruction.
Here i and j are both positive integers, and j does not exceed the maximum number of data pass-through stages supported by the hardware; for example, if the hardware supports at most 2 stages of data pass-through, the maximum value of j is 2. If the destination data of an instruction is to be used as a source operand of the next instruction (j = 1), the first identifier is set in the ith instruction when the ith instruction is generated, and the second identifier is set in the (i + 1)th instruction when the (i + 1)th instruction is generated. If the destination data of an instruction is to be used as a source operand of the instruction after the next (j = 2), the first identifier is set in the ith instruction when the ith instruction is generated, and the second identifier is set in the (i + 2)th instruction when the (i + 2)th instruction is generated.
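A minimal sketch of how a compiler pass might set the two identifiers is given below. It is not the patent's algorithm: the `Instr` class, the `mark_pass_through` function and the rule that a producer is marked only when all of its consumers lie within the supported distance are assumptions made for illustration. The sketch also assumes that a value consumed inside the sequence is not needed from the VGPR afterwards.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instr:
    dst: str                       # destination register name
    srcs: List[str]                # source register names
    df: bool = False               # first identifier: write result to the through path
    vf: List[bool] = field(default_factory=list)  # second identifier, one flag per source

    def __post_init__(self):
        if not self.vf:
            self.vf = [False] * len(self.srcs)


def mark_pass_through(instrs: List[Instr], max_depth: int) -> None:
    """Set df on producers and vf on consumers when 1 <= j <= max_depth."""
    for i, producer in enumerate(instrs):
        consumers = []
        forwardable = True
        for k in range(i + 1, len(instrs)):
            later = instrs[k]
            if producer.dst in later.srcs:
                if k - i > max_depth:
                    forwardable = False  # this reader is too far away, keep the VGPR write
                    break
                consumers.append(k)
            if later.dst == producer.dst:
                break                    # value is overwritten, no further readers
        if forwardable and consumers:
            producer.df = True
            for k in consumers:
                for slot, name in enumerate(instrs[k].srcs):
                    if name == producer.dst:
                        instrs[k].vf[slot] = True


if __name__ == "__main__":
    # Three chained instructions that each consume the previous result in c.
    chain = [Instr("c", ["a", "b", "c"]) for _ in range(3)]
    mark_pass_through(chain, max_depth=2)
    for ins in chain:
        print(ins.df, ins.vf)
```

In the chained example the first two instructions get the first identifier, the last two get the second identifier on their third source operand, and the final result is still written to its register.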
For ease of understanding, the following description uses the generation of a VOP3 instruction (Vector Operation with 3 operands) as an example, as shown in fig. 2. The type field is set to "110010"; that is, 110010 indicates that the instruction is a VOP3 instruction. The meaning of each field in the VOP3 instruction is shown in Table 1.
TABLE 1
[Table 1: meaning of each field in the VOP3 instruction; supplied as image BDA0002320768500000091 in the original publication]
Note that the number of bits (bit width) of each field in Table 1 is relatively fixed, while its position may change. For example, Operand0_ID0 need not occupy bits [40:32]; it may instead occupy bits [8:0], which have the same width. The other fields are similar.
The Operand0_ID0, Operand1_ID1 and Operand2_ID2 fields indicate the source of each operand, that is, where the source operand is obtained. For operand Operand0, if VF = 125 (the second identifier) in the Operand0_ID0 field, Operand0 comes from the through path; otherwise Operand0 is obtained from the location pointed to by Operand0_ID0. For operand Operand1, if VF = 125 in the Operand1_ID1 field, Operand1 comes from the through path; otherwise Operand1 is obtained from the location pointed to by Operand1_ID1. For operand Operand2, if VF = 125 in the Operand2_ID2 field, Operand2 comes from the through path; otherwise Operand2 is obtained from the location pointed to by Operand2_ID2. The Result_ID field and the DF field indicate the write path of the destination data: if DF = 1 (the first identifier), the destination data is written directly to the through path; otherwise it is written to the location pointed to by Result_ID. Note that the VF value indicating that an operand comes from the through path is not limited to 125, and likewise the DF value indicating that the destination data is written to the through path is not limited to 1.
As described in Table 1, if the destination data of the ith instruction is to be used as a source operand of the (i + j)th instruction, the value of the destination pass-through (DF) field of the ith instruction is set to 1, and the vector pass-through (VF) value in the address field of the (i + j)th instruction that indicates the source of the source operand is set to 125. That is, if the VF value in the Operand0_ID0 field is set to 125, Operand0 comes from the through path; if the VF value in the Operand1_ID1 field is set to 125, Operand1 comes from the through path; and if the VF value in the Operand2_ID2 field is set to 125, Operand2 comes from the through path. In the embodiment of the application, by setting in the instructions a first identifier (DF = 1) indicating that destination data is written into the through path and a second identifier (VF = 125) indicating that a source operand comes from the through path, software forces the hardware to perform explicit data pass-through, so that forwarding is guaranteed to take place.
Step S103: sending the ith instruction and the (i + j)th instruction to the instruction execution unit, so that the instruction execution unit writes the result of the ith instruction into the through path according to the first identifier when executing the ith instruction, and acquires the required source operand from the through path according to the second identifier when executing the (i + j)th instruction.
After the ith instruction and the (i + j)th instruction are generated, they are sent to the instruction execution unit for execution. When executing the ith instruction, the instruction execution unit writes the result of the ith instruction into the through path according to the first identifier, and when executing the (i + j)th instruction, it acquires the required source operand from the through path according to the second identifier. For example, if DF = 1 in the ith instruction, the result of the ith instruction is written directly into the through path, and if VF = 125 in the Operand0_ID0 field of the (i + j)th instruction, source operand Operand0 is obtained directly from the through path.
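To make the role of the DF and VF values concrete, the sketch below packs and unpacks a 64-bit VOP3-like word. Only the type code 110010, the value 125 and the position [40:32] of the first operand field come from the text. Every other bit position in `LAYOUT`, and the names `encode` and `decode`, are hypothetical placeholders, since Table 1 is not available here.

```python
TYPE_VOP3 = 0b110010     # type code given in the text
PASS_THROUGH_ID = 125    # operand ID value that acts as the second identifier (VF)

# name -> (least significant bit, width). Only operand0_id0 at [40:32] appears
# in the text; all other positions are assumptions for this sketch.
LAYOUT = {
    "result_id": (0, 9),
    "df": (25, 1),
    "type": (26, 6),
    "operand0_id0": (32, 9),
    "operand1_id1": (41, 9),
    "operand2_id2": (50, 9),
}


def encode(fields: dict) -> int:
    """Pack the named field values into one integer instruction word."""
    word = 0
    for name, value in fields.items():
        lsb, width = LAYOUT[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def decode(word: int) -> dict:
    """Extract every field in LAYOUT from an instruction word."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in LAYOUT.items()}


if __name__ == "__main__":
    # ith instruction: DF = 1, so the result goes to the through path.
    producer = encode({"type": TYPE_VOP3, "df": 1, "result_id": 0,
                       "operand0_id0": 1, "operand1_id1": 2, "operand2_id2": 3})
    # (i + j)th instruction: third operand carries 125, so it comes from the through path.
    consumer = encode({"type": TYPE_VOP3, "df": 0, "result_id": 7,
                       "operand0_id0": 4, "operand1_id1": 5,
                       "operand2_id2": PASS_THROUGH_ID})
    print(decode(producer))
    print(decode(consumer))
```

Reserving one operand ID value as the marker means the encoding needs no extra bit per source operand, at the cost of one register index that can no longer be addressed.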
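The execution-side behaviour can be sketched with a small software model. This is an illustration under stated assumptions, not the hardware design: the `ExecutionUnit` class is hypothetical, the through path is modelled as a single slot (so only j = 1 is covered), and the arithmetic a * b + c is an assumed multiply-accumulate.

```python
PASS_THROUGH_ID = 125  # operand ID value treated as the second identifier (VF)


class ExecutionUnit:
    """Toy model: a VGPR file plus a one-entry through path."""

    def __init__(self, vgpr: dict):
        self.vgpr = dict(vgpr)
        self.through_path = None   # assumption: holds only the latest forwarded result
        self.vgpr_reads = 0
        self.vgpr_writes = 0

    def _read(self, operand_id: int):
        if operand_id == PASS_THROUGH_ID:
            if self.through_path is None:
                raise RuntimeError("through path is empty")
            return self.through_path          # no VGPR access
        self.vgpr_reads += 1
        return self.vgpr[operand_id]

    def execute(self, op0: int, op1: int, op2: int, result_id: int, df: int):
        a, b, c = self._read(op0), self._read(op1), self._read(op2)
        result = a * b + c                    # assumed multiply-accumulate
        if df == 1:
            self.through_path = result        # first identifier: skip the VGPR write
        else:
            self.vgpr_writes += 1
            self.vgpr[result_id] = result
        return result


if __name__ == "__main__":
    eu = ExecutionUnit({0: 2, 1: 3, 2: 0, 3: 4, 4: 5})
    eu.execute(0, 1, 2, result_id=10, df=1)                 # 2 * 3 + 0 goes to the through path
    eu.execute(3, 4, PASS_THROUGH_ID, result_id=10, df=0)   # 4 * 5 + forwarded value
    print(eu.vgpr[10], eu.vgpr_reads, eu.vgpr_writes)
```

In this two-instruction run the model performs five VGPR reads and one VGPR write, compared with six reads and two writes if both instructions used the VGPR for every operand and result.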
While the instructions that implement one function are being executed, instructions that implement another function must not be inserted midway. Therefore, to ensure that the pass-through scheme provided by the application is carried out correctly when there are multiple groups of instructions implementing different functions, in the embodiment of the application, when the ith instruction and the (i + j)th instruction are sent to the instruction execution unit, they are spliced in generation order to obtain an instruction block (that is, an instruction group or instruction set). The instruction block is then sent to a decoder (hardware), and the decoder processes it in order: it acquires first key information from the ith instruction and sends it to the instruction execution unit, so that the instruction execution unit writes the execution result of the ith instruction into the pass-through path according to the first identifier in the first key information; it then acquires second key information from the (i + j)th instruction and sends it to the instruction execution unit, so that the instruction execution unit obtains the required source operand from the pass-through path according to the second identifier in the second key information. The first key information comprises the first identifier, and the second key information comprises the second identifier. Consequently, during execution, switching to another instruction block is possible only after all instructions in the current instruction block have been executed. The instruction group includes a group header and a group body.
The group header defines how many resources the instruction group uses and how many instructions it contains; the group body contains all the instructions of the instruction group.
To avoid switching to other instruction blocks while the current instruction block is executing, a lock can be added that locks the arbitration logic while the hardware runs an instruction block. In each cycle, the decoder reads one instruction from the instruction block for execution. When an instruction with BS = 1 (indicating the start of an instruction block) is encountered, the lock logic is enabled and the arbitration logic holds the current wave_id, that is, the arbitration logic can select instructions only from the current instruction block. When an instruction with BE = 1 (indicating the end of an instruction block) is encountered, the lock logic is disabled, the arbitration logic is unlocked, and normal mode resumes. In other words, on the execution side the hardware must complete the entire instruction block before it can switch to another instruction block; the corresponding logic diagram is shown in fig. 3.
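The locking behaviour described above can be illustrated with a toy arbiter. This is a sketch under stated assumptions: the round-robin policy used in normal mode, the `arbitrate` function, and the `(name, bs, be)` tuple layout are hypothetical choices made for the example; the patent only specifies that BS = 1 enables the lock and BE = 1 disables it.

```python
def arbitrate(waves):
    """Issue instructions from several waves, honouring the BS/BE block lock.

    waves maps a wave_id to a list of (name, bs, be) tuples. In normal mode
    the arbiter picks waves round-robin (an assumed policy); once BS = 1 is
    seen it keeps selecting the same wave until BE = 1 is seen.
    """
    queues = {wave_id: list(instrs) for wave_id, instrs in waves.items()}
    order = list(queues)
    issued = []
    locked = None  # wave_id currently holding the lock, if any
    rr = 0         # round-robin pointer used in normal mode

    while any(queues.values()):
        if locked is not None and not queues[locked]:
            locked = None  # malformed block without BE: release the lock
        if locked is not None:
            wave = locked
        else:
            while not queues[order[rr % len(order)]]:
                rr += 1
            wave = order[rr % len(order)]
            rr += 1
        name, bs, be = queues[wave].pop(0)
        issued.append((wave, name))
        if bs:
            locked = wave  # block start: lock arbitration to this wave
        if be:
            locked = None  # block end: back to normal mode
    return issued
```

With one wave containing a three-instruction block followed by a free instruction, and a second wave of free instructions, the block is issued back to back before the second wave gets a turn.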
To support the new instruction, an embodiment of the present application further provides an instruction execution method, as shown in fig. 4; the steps it includes are described below with reference to fig. 4.
Step S201: acquiring an instruction to be executed.
Step S202: obtaining key information in the instruction to be executed, wherein the key information comprises: source operand address information and destination address information.
After the instruction to be executed is obtained, key information in the instruction to be executed is obtained, wherein the key information comprises source operand address information and destination address information. For example, the source operand address information corresponding to the Operand0_ID0 field, the Operand1_ID1 field, and the Operand2_ID2 field in the instruction shown in fig. 1 is obtained, and the destination address information corresponding to the Result_ID field and the DF field in the instruction shown in fig. 1 is obtained. The source operand address information indicates the source of the source operand, and the destination address information indicates the write path of the destination data.
Step S203: judging whether the source operand indicated by the source operand address information is from a pass-through path.
After the source operand address information is acquired, it is judged whether the source operand it indicates is from a pass-through path; if yes, step S204 is executed, and if not, the source operand is acquired from the address pointed to by the source operand address information.
The process of determining whether the indicated source operand comes from the pass-through path may be: determining whether the source operand address information contains a second identifier; when the source operand address information contains the second identifier (VF = 125), the source operand indicated by the source operand address information comes from the pass-through path. It should be noted that the VF value indicating that the operand comes from the pass-through path is not limited to 125.
Step S204: obtaining a required source operand from the pass-through path.
When the source operand indicated by the source operand address information is derived from the pass-through path, the required source operand is obtained from the pass-through path.
Step S205: judging whether the write path of the destination data indicated by the destination address information is the pass-through path.
After the destination address information is acquired, it is judged whether the write path of the destination data indicated by the destination address information is the pass-through path; if yes, step S206 is executed, and if not, the destination data is written to the address indicated by the destination address information.
Optionally, the process of determining whether the write path of the destination data indicated by the destination address information is the pass-through path may be: determining whether the destination address information contains a first identifier (DF = 1); when the destination address information contains the first identifier, the write path of the destination data indicated by the destination address information is the pass-through path. Note that the value used to indicate that the destination data is written to the pass-through path is not limited to 1.
Step S206: writing the result of executing the instruction to be executed into the pass-through path.
When the write path of the destination data indicated by the destination address information is the pass-through path, the result of executing the instruction to be executed is written into the pass-through path.
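Steps S201 to S206 can be summarised in a small behavioural model. This is a sketch, not the hardware design: the `ExecUnit` class, its method names, the dictionary used as a register file, and the access counters are hypothetical and exist only to show where the two identifiers are tested. The values VF = 125 and DF = 1 are the ones used in the description.

```python
VF_PASS_THROUGH = 125  # second identifier; treated as a reserved source ID in this model


class ExecUnit:
    """Toy model of an instruction execution unit with one pass-through path."""

    def __init__(self):
        self.vgpr = {}            # register file: register ID -> value
        self.pass_through = None  # value currently held on the pass-through path
        self.vgpr_reads = 0
        self.vgpr_writes = 0

    def read_operand(self, src_id):
        # S203/S204: a source ID equal to the second identifier means
        # "take the operand from the pass-through path".
        if src_id == VF_PASS_THROUGH:
            return self.pass_through
        self.vgpr_reads += 1
        return self.vgpr[src_id]

    def execute(self, op, src_ids, result_id, df):
        operands = [self.read_operand(s) for s in src_ids]
        result = op(*operands)
        # S205/S206: DF = 1 routes the result to the pass-through path
        # instead of the register file.
        if df == 1:
            self.pass_through = result
        else:
            self.vgpr_writes += 1
            self.vgpr[result_id] = result
        return result
```

Executing a producer with DF = 1 followed by a consumer whose first source ID is 125 performs no register-file write for the producer and no register-file read for the forwarded operand.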
In the embodiment of the application, when a compiler (a software program) detects that the hardware can use pass-through data as source data, the compiler explicitly forwards the destination data of instruction0 to a source operand of instruction1 or instruction2 to realize data pass-through, so that the hardware skips the VGPR write of instruction0 and the VGPR read of instruction1 or instruction2, saving a large amount of power. For ease of understanding, an example of applying the method provided herein to matrix multiplication is described below with reference to fig. 5.
It should be noted that the three A temporary registers (Temp Register For A) in fig. 5 are physically the same temporary register; three are drawn only because the operand is delayed by three unit times. Similarly, the two B temporary registers (Temp Register For B) in the figure are physically the same temporary register; two are drawn only because the operand is delayed by two unit times. Since the VGPR has only one read port, the three input operands (A, B, C) must be read at staggered times; they are delayed by the temporary registers and finally aligned at the input of the arithmetic unit (ALU). That is, operand A obtained from the VGPR is placed in the temporary register for A at the first time, operand B obtained from the VGPR is placed in the temporary register for B at the second time, and operand C obtained from the VGPR is placed in the temporary register for C at the third time, so that the three input operands (A, B, C) can be input into the ALU simultaneously for calculation. The flow shown by the dotted line in the figure is the execution logic of the existing instructions: when instruction0 is executed, source operand A, source operand B and source operand C (0 in this case) are obtained from the VGPR, and when the calculation is completed the result must be written into the VGPR; when instruction1 is executed, source operand A, source operand B and source operand C (now the result of the previous calculation, C1 = A0 × B0 + C0) are obtained from the VGPR, and when the calculation is completed the result must again be written into the VGPR; these steps repeat until the matrix calculation is completed. The execution logic with the new instructions provided by the application is shown by the solid lines in the figure, namely the solid lines marked ① and ②.
It is clear that, with the new instructions provided by the present application, the output of the ALU is bypassed directly to the input of the ALU. In the figure, the solid line marked ① represents data pass-through in the case where the operands are the same: the output of the ALU is used directly as the input of all three operands of the ALU, which applies when A = B = C. The solid line marked ② represents the output of the ALU being used directly as the input of one of the three operands of the ALU. The three result temporary registers (Temp Register For Result) in the figure represent three paths, that is, the output of the ALU serves as the input of operand A, of operand B, or of operand C.
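The staggered reads through the single VGPR read port described for fig. 5 can be checked with a few lines of arithmetic. In this sketch, the function name and the dictionaries are hypothetical; the delays of three and two stages for A and B follow the description, while the single stage assumed for C is an inference from the statement that C is read at the third time and is labelled as such in the code.

```python
def schedule_operand_reads(read_cycle, delay_stages):
    """Return the cycle at which each operand reaches the ALU input.

    read_cycle:   operand name -> cycle in which it is read from the VGPR
    delay_stages: operand name -> number of temporary-register stages it passes
    """
    cycles = list(read_cycle.values())
    if len(set(cycles)) != len(cycles):
        # The VGPR has one read port, so two reads cannot share a cycle.
        raise ValueError("single read port: only one VGPR read per cycle")
    return {name: read_cycle[name] + delay_stages[name] for name in read_cycle}


READ_CYCLE = {"A": 1, "B": 2, "C": 3}    # read order given in the description
DELAY_STAGES = {"A": 3, "B": 2, "C": 1}  # C's single stage is an assumption

arrival = schedule_operand_reads(READ_CYCLE, DELAY_STAGES)
```

Under these assumptions all three operands arrive at the ALU in the same cycle, which is the alignment the temporary registers are there to provide.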
As is clear from fig. 5, after the method provided by the embodiment of the present application is applied, only the VGPR read of the first instruction and the VGPR write of the last instruction are involved, and the VGPR reads and writes of a large number of intermediate instructions are omitted, so power consumption can be reduced considerably. The following uses the multiplication of a specific matrix A by a matrix B as an example; in matrix multiplication, the result of instruction0 is used as a source operand of instruction1, and the result of instruction1 is used as a source operand of instruction2. In normal mode, instruction0 writes its result to the VGPR, instruction1 reads its source operand from the VGPR and writes its result to the VGPR, and instruction2 reads its source operand from the VGPR. Take $C_{64\times64} = A_{64\times64} \times B_{64\times64}$ as an example; the 64x64 matrix size is used only as an example and is not limiting. Assume that there are 64 arithmetic operation units (ALUs), each with a VGPR space of 200x64 bits.
The calculation process is roughly as follows:
1) Matrix A is loaded in linear mode into the LDS (Local Data Share, the local share unit):
A(0,0) → LDS(Address0); // A(0,0) is stored at Address0 of the LDS
A(0,1) → LDS(Address1); // A(0,1) is stored at Address1 of the LDS
A(0,2) → LDS(Address2); // A(0,2) is stored at Address2 of the LDS
……
2) matrix B is loaded into the VGPR space as shown in Table 2.
TABLE 2
ALU0 ALU1 ALU2 …… ALU62 ALU63
B0,0 B0,1 B0,2 …… B0,62 B0,63
B1,0 B1,1 B1,2 …… B1,62 B1,63
…… …… …… …… …… ……
B63,0 B63,1 B63,2 …… B63,62 B63,63
During calculation, the elements of matrix A are loaded one by one into the 64 ALUs in parallel, and each ALU multiplies the element by the corresponding element of its own column of matrix B, which is stored in that ALU's vector general-purpose registers. The 64 ALUs accumulate, in parallel and in sequence, the products generated from the elements of one row of matrix A and the corresponding elements of matrix B, yielding all elements of the same row of matrix C; repeating this for every row completes the multiplication of matrix A by matrix B.
3) Calculating a matrix C:
The instructions for calculating matrix C in the normal mode are as follows:
M0_register=start_address; // Initial address of the M0 register. The M0 register stores the address from which each element of matrix A is read; after the 64 ALUs have read the corresponding element of matrix A from the LDS in parallel according to the current address in the M0 register, it is automatically updated to the address of the next element.
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
//......
The execution instruction for each ALU to calculate a corresponding element in the first row of the matrix C is as follows:
Block_Start::C(0,ALU_Index)=LDS_Direct(M0_register)*B(0,ALU_Index);
C(0,ALU_Index)=LDS_Direct(M0_register)*B(1,ALU_Index)+C(0,ALU_Index);
C(0,ALU_Index)=LDS_Direct(M0_register)*B(2,ALU_Index)+C(0,ALU_Index);
C(0,ALU_Index)=LDS_Direct(M0_register)*B(3,ALU_Index)+C(0,ALU_Index);
C(0,ALU_Index)=LDS_Direct(M0_register)*B(4,ALU_Index)+C(0,ALU_Index);
......
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+C(0,ALU_Index);
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
//......
The execution instruction for each ALU to compute a corresponding element in the second row of the matrix C is as follows:
Block_Start::C(1,ALU_Index)=LDS_Direct(M0_register)*B(0,ALU_Index);
C(1,ALU_Index)=LDS_Direct(M0_register)*B(1,ALU_Index)+C(1,ALU_Index);
C(1,ALU_Index)=LDS_Direct(M0_register)*B(2,ALU_Index)+C(1,ALU_Index);
C(1,ALU_Index)=LDS_Direct(M0_register)*B(3,ALU_Index)+C(1,ALU_Index);
C(1,ALU_Index)=LDS_Direct(M0_register)*B(4,ALU_Index)+C(1,ALU_Index);
......
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+C(1,ALU_Index);
......
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
//......
The execution instruction for each ALU to compute a corresponding element in the last row of the matrix C is as follows:
Block_Start::C(63,ALU_Index)=LDS_Direct(M0_register)*B(0,ALU_Index);
C(63,ALU_Index)=LDS_Direct(M0_register)*B(1,ALU_Index)+C(63,ALU_Index);
C(63,ALU_Index)=LDS_Direct(M0_register)*B(2,ALU_Index)+C(63,ALU_Index);
C(63,ALU_Index)=LDS_Direct(M0_register)*B(3,ALU_Index)+C(63,ALU_Index);
......
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+C(63,ALU_Index);
Referring to the above instruction listing, it can be seen that calculating each row of matrix C requires 64 instructions, so the total number of instructions is 64x64 = 4096. Each instruction is executed once on each thread, so the total number of executions is 64x64x64. The first instruction of the instruction block for each row of matrix C is, for example, as follows:
C(63,ALU_Index)=LDS_Direct(M0_register)*B(0,ALU_Index);
This instruction has one VGPR read and one VGPR write, and such an instruction occurs 64 times in total (once per row), so across the 64 threads there are 64x64 reads and 64x64 writes. The other instructions of the instruction block are, for example, as follows:
C(63,ALU_Index)=LDS_Direct(M0_register)*B(1,ALU_Index)+C(63,ALU_Index);
This instruction has two VGPR reads and one VGPR write, and such an instruction occurs 63x64 times in total, so there are 2x63x64x64 reads and 63x64x64 writes. A summary of the VGPR reads and writes is shown in Table 3.
TABLE 3 (summary of VGPR reads and writes in normal mode; provided as an image in the original publication)
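As a worked check, the per-instruction counts stated in the text for normal mode can be totalled as follows. The totals are derived here from those stated counts, not copied from Table 3, and the comparison with $3\cdot 2^{18}$ is an approximation.

```latex
\begin{aligned}
\text{reads}_{\text{normal}}  &= 64\cdot 64 + 2\cdot 63\cdot 64\cdot 64 = 127\cdot 2^{12} = 520{,}192\\
\text{writes}_{\text{normal}} &= 64\cdot 64 + 63\cdot 64\cdot 64 = 64\cdot 2^{12} = 2^{18} = 262{,}144\\
\text{total}_{\text{normal}}  &= 191\cdot 2^{12} = 782{,}336 \approx 3\cdot 2^{18} = 786{,}432
\end{aligned}
```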
By using the explicit vector pass-through technique provided by the embodiment of the present application, the instruction for calculating the matrix C is as follows:
M0_register=start_address;
//-----------------------------------------
// Calculate the first row of Matrix C:
// C(0,0) is calculated on ALU_Index0: ALU_Index=0.
// C(0,1) is calculated on ALU_Index1: ALU_Index=1.
//......
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
......
Block_End::C(0,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
//-----------------------------------------
// Calculate the second row of Matrix C:
// C(1,0) is calculated on ALU_Index0.
// C(1,1) is calculated on ALU_Index1.
//...........
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
......
Block_End::C(1,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
......
//-----------------------------------------
// Calculate the last row of Matrix C:
// C(63,0) is calculated on ALU_Index0.
// C(63,1) is calculated on ALU_Index1.
//...........
//-----------------------------------------
Block_Start::Forwarding=LDS_Direct(M0_register)*B(0,ALU_Index);
Forwarding=LDS_Direct(M0_register)*B(1,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(2,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(3,ALU_Index)+Forwarding;
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
......
Block_End::C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
Referring to the above instruction listing, it can be seen that calculating each row of matrix C requires 64 instructions, so the total number of instructions is 64x64 = 4096. Each instruction is executed once on each thread, so the total number of executions is 64x64x64. The last instruction of the instruction block for each row of matrix C is, for example, as follows:
C(63,ALU_Index)=LDS_Direct(M0_register)*B(63,ALU_Index)+Forwarding;
Here there is one VGPR read and one VGPR write, and such an instruction occurs 64 times in total, so there are 64x64 reads and 64x64 writes. The other instructions of the instruction block are, for example, as follows:
Forwarding=LDS_Direct(M0_register)*B(4,ALU_Index)+Forwarding;
Here there is one VGPR read and no VGPR write, and such an instruction occurs 63x64 times in total, so there are 63x64x64 reads. A summary of the VGPR reads and writes is shown in Table 4.
TABLE 4 (summary of VGPR reads and writes with explicit vector pass-through; provided as an image in the original publication)
In summary, in the typical matrix multiplication example described above, using the explicit vector pass-through technique of the present application reduces the number of VGPR reads and writes from 3x2^18 to 2^18, that is, to about 1/3, so that a great deal of energy can be saved.
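The comparison between the two instruction listings can be reproduced with a small functional model that executes the same multiply-accumulate sequence per ALU and counts VGPR accesses. This is a sketch under stated assumptions: the function `matmul_counts` and its counters are hypothetical, the reads of matrix A via `LDS_Direct` are treated as LDS accesses and not counted, and the ALUs are simulated one after another even though the hardware runs them in parallel.

```python
def matmul_counts(A, B, forwarding):
    """Compute C = A * B as in the listings and count VGPR reads and writes.

    forwarding=False models the normal-mode listing (accumulator kept in VGPR);
    forwarding=True models the listing that uses the Forwarding path.
    """
    n = len(A)
    reads = writes = 0
    C = [[0] * n for _ in range(n)]
    for row in range(n):              # one instruction block per row of C
        for alu in range(n):          # each ALU (thread) owns one column of C
            fwd = 0                   # value held on the pass-through path
            for k in range(n):        # k-th instruction of the block
                a = A[row][k]         # LDS_Direct(M0_register): not a VGPR access
                b = B[k][alu]         # B(k, ALU_Index) is read from the VGPR
                reads += 1
                if k == 0:
                    acc = 0           # first instruction has no accumulate term
                elif forwarding:
                    acc = fwd         # operand taken from the pass-through path
                else:
                    acc = C[row][alu] # accumulator read back from the VGPR
                    reads += 1
                result = a * b + acc
                if forwarding and k < n - 1:
                    fwd = result      # DF = 1: result stays on the pass-through path
                else:
                    C[row][alu] = result
                    writes += 1
    return C, reads, writes
```

For n = 64 this model gives 782,336 accesses in normal mode and 266,240 with forwarding, which are close to the 3x2^18 and 2^18 figures quoted above; the exact values differ slightly because the first instruction of each block has no accumulator read and the last instruction still writes to the VGPR.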
As shown in fig. 6, an embodiment of the present application further provides an instruction generating apparatus 100, including: a determination module 110, a generation module 120, and a transmission module 130.
A determining module 110, configured to determine that the instruction execution unit supports data pass-through.
A generating module 120, configured to, when an instruction is generated, if it is necessary to use destination data of an ith instruction as a source operand of an (i + j)th instruction, set, in the ith instruction, a first identifier for indicating to write the destination data of the ith instruction into a pass-through path, and set, in the (i + j)th instruction, a second identifier for indicating to obtain the source operand from the pass-through path; i and j are positive integers.
A sending module 130, configured to send the ith instruction and the (i + j) th instruction to the instruction execution unit, so that the instruction execution unit writes a result of the ith instruction into a through path according to the first identifier when executing the ith instruction, and obtains a required source operand from the through path according to the second identifier when executing the (i + j) th instruction.
Optionally, the sending module 130 is configured to splice the ith instruction and the (i + j) th instruction according to a generation order to obtain an instruction block; sending the instruction block to a decoder, so that the decoder sequentially acquires first key information in the ith instruction from the instruction block and sends the first key information to the instruction execution unit, acquires second key information in the (i + j) th instruction and sends the second key information to the instruction execution unit, wherein the first key information comprises the first identifier, and the second key information comprises the second identifier.
The instruction generating apparatus 100 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to corresponding contents in the foregoing method embodiments for parts that are not mentioned in the apparatus embodiments.
Fig. 7 is a block diagram illustrating the structure of a processor 200 according to an embodiment of the present disclosure. The processor 200 includes: a processor core 210 (kernel), a decoder 220, and an instruction execution unit 230. The processor core 210, decoder 220, and instruction execution unit 230 are connected by a bus interconnect.
The processor core 210 contains program code which, when executed, generates instructions. Accordingly, the processor core 210 is configured to: determine that the instruction execution unit 230 supports data pass-through; when an instruction is generated, if the destination data of the ith instruction is required to be used as the source operand of the (i + j)th instruction, set in the ith instruction a first identifier for indicating that the destination data of the ith instruction is written into a pass-through path, and set in the (i + j)th instruction a second identifier for indicating that the source operand is obtained from the pass-through path; and send the ith instruction and the (i + j)th instruction to the instruction execution unit 230; wherein i and j are positive integers.
The instruction execution unit 230 is configured to, when the ith instruction is executed, write a result of the ith instruction into a pass-through path according to the first identifier, and, when the (i + j) th instruction is executed, obtain a required source operand from the pass-through path according to the second identifier.
Optionally, the processor core is further configured to splice the ith instruction and the (i + j) th instruction according to a generation sequence to obtain an instruction block; the block of instructions is sent to the decoder 220. Correspondingly, the decoder 220 is configured to sequentially obtain first key information in the ith instruction from the instruction block, send the first key information to the instruction execution unit 230, obtain second key information in the (i + j) th instruction, and send the second key information to the instruction execution unit 230. Wherein the first key information includes the first identifier, and the second key information includes the second identifier.
In addition, the decoder 220 is further configured to obtain an instruction to be executed, and obtain key information in the instruction to be executed, where the key information includes: source operand address information and destination address information, wherein the source operand address information is used for indicating the source of a source operand, and the destination address information is used for indicating a writing path of destination data; and also for issuing the critical information to the instruction execution unit 230. Accordingly, the instruction execution unit 230 is configured to: judging whether the source operand indicated by the source operand address information is from a through path or not; when the source operand indicated by the source operand address information is from a through path, acquiring a required source operand from the through path; judging whether a write path of the destination data indicated by the destination address information is the through path; and when the write path of the destination data indicated by the destination address information is the through path, writing the result of executing the instruction to be executed into the through path.
Optionally, the instruction execution unit 230 is configured to determine whether the source operand indicated by the source operand address information comes from a pass-through path by determining whether the source operand address information contains a second identifier; when the source operand address information contains the second identifier, the source operand indicated by the source operand address information comes from the pass-through path. Optionally, the instruction execution unit 230 is configured to determine whether the write path of the destination data indicated by the destination address information is the pass-through path by determining whether the destination address information contains a first identifier; when the destination address information contains the first identifier, the write path of the destination data indicated by the destination address information is the pass-through path.
The processor 200 may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 200 may be any conventional processor or the like.
The embodiment of the application also provides electronic equipment comprising the processor, and the electronic equipment can be equipment such as a computer and a server.
The present embodiment also provides a non-volatile computer-readable storage medium (hereinafter referred to as a storage medium), where the storage medium stores a computer program, and when the computer program is executed by the processor 200, the computer program executes the steps included in the instruction generating method and the instruction executing method in the above embodiments.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An instruction generation method, the method comprising:
determining that an instruction execution unit supports data pass-through;
when generating instructions, if destination data of an ith instruction is to be used as a source operand of an (i+j)th instruction, setting, in destination address information of the ith instruction, a first identifier indicating that the destination data of the ith instruction is to be written into a pass-through path, and setting, in source operand address information of the (i+j)th instruction, a second identifier indicating that the source operand is to be obtained from the pass-through path; wherein i and j are positive integers, and the maximum value of j does not exceed the maximum number of data pass-through stages supported by hardware;
sending the ith instruction and the (i+j)th instruction to the instruction execution unit, so that the instruction execution unit, when executing the ith instruction, writes the result of the ith instruction into the pass-through path according to the first identifier and, when executing the (i+j)th instruction, obtains the required source operand from the pass-through path according to the second identifier.
2. The method of claim 1, wherein sending the ith instruction and the (i+j)th instruction to the instruction execution unit comprises:
splicing the ith instruction and the (i+j)th instruction in order of generation to obtain an instruction block;
sending the instruction block to a decoder, so that the decoder sequentially obtains first key information in the ith instruction from the instruction block and sends the first key information to the instruction execution unit, and obtains second key information in the (i+j)th instruction and sends the second key information to the instruction execution unit, wherein the first key information comprises the first identifier and the second key information comprises the second identifier.
3. An instruction execution method, the method comprising:
acquiring an instruction to be executed;
obtaining key information in the instruction to be executed, wherein the key information comprises source operand address information and destination address information, the source operand address information indicating the source of a source operand and the destination address information indicating a write path of destination data;
determining whether the source operand indicated by the source operand address information comes from a pass-through path by determining whether the source operand address information contains a second identifier, wherein the second identifier indicates that a source operand is to be obtained from the pass-through path, and the presence of the second identifier in the source operand address information indicates that the source operand comes from the pass-through path;
when the source operand indicated by the source operand address information comes from the pass-through path, obtaining the required source operand from the pass-through path;
determining whether the write path of the destination data indicated by the destination address information is the pass-through path by determining whether the destination address information contains a first identifier, wherein the first identifier indicates that destination data of an instruction is to be written into the pass-through path, and the presence of the first identifier in the destination address information indicates that the write path of the destination data is the pass-through path;
and when the write path of the destination data indicated by the destination address information is the pass-through path, writing the result of executing the instruction to be executed into the pass-through path.
4. A processor, comprising:
an instruction execution unit;
a processor core, configured to determine that the instruction execution unit supports data pass-through; when generating instructions, if destination data of an ith instruction is to be used as a source operand of an (i+j)th instruction, to set, in destination address information of the ith instruction, a first identifier indicating that the destination data of the ith instruction is to be written into a pass-through path, and to set, in source operand address information of the (i+j)th instruction, a second identifier indicating that the source operand is to be obtained from the pass-through path; the processor core being further configured to send the ith instruction and the (i+j)th instruction to the instruction execution unit; wherein i and j are positive integers, and the maximum value of j does not exceed the maximum number of data pass-through stages supported by hardware;
the instruction execution unit is configured to, when executing the ith instruction, write the result of the ith instruction into the pass-through path according to the first identifier and, when executing the (i+j)th instruction, obtain the required source operand from the pass-through path according to the second identifier.
5. The processor of claim 4, further comprising a decoder, wherein the processor core is configured to splice the ith instruction and the (i+j)th instruction in order of generation to obtain an instruction block, and to send the instruction block to the decoder;
the decoder is configured to sequentially obtain first key information in the ith instruction from the instruction block and send the first key information to the instruction execution unit, and to obtain second key information in the (i+j)th instruction and send the second key information to the instruction execution unit, wherein the first key information comprises the first identifier and the second key information comprises the second identifier.
6. A processor, comprising:
a decoder, configured to acquire an instruction to be executed and obtain key information in the instruction to be executed, wherein the key information comprises source operand address information and destination address information, the source operand address information indicating the source of a source operand and the destination address information indicating a write path of destination data; the decoder being further configured to send the key information to an instruction execution unit;
the instruction execution unit is configured to determine whether the source operand indicated by the source operand address information comes from a pass-through path by determining whether the source operand address information contains a second identifier, wherein the second identifier indicates that a source operand is to be obtained from the pass-through path, and the presence of the second identifier in the source operand address information indicates that the source operand comes from the pass-through path; when the source operand indicated by the source operand address information comes from the pass-through path, to obtain the required source operand from the pass-through path; to determine whether the write path of the destination data indicated by the destination address information is the pass-through path by determining whether the destination address information contains a first identifier, wherein the first identifier indicates that destination data of an instruction is to be written into the pass-through path, and the presence of the first identifier in the destination address information indicates that the write path of the destination data is the pass-through path; and, when the write path of the destination data indicated by the destination address information is the pass-through path, to write the result of executing the instruction to be executed into the pass-through path.
7. An electronic device, comprising: a processor according to claim 4 or 5 or a processor according to claim 6.
8. An instruction generating apparatus, the apparatus comprising:
a determining module, configured to determine that an instruction execution unit supports data pass-through;
a generating module, configured to, when generating instructions, if destination data of an ith instruction is to be used as a source operand of an (i+j)th instruction, set, in destination address information of the ith instruction, a first identifier indicating that the destination data of the ith instruction is to be written into a pass-through path, and set, in source operand address information of the (i+j)th instruction, a second identifier indicating that the source operand is to be obtained from the pass-through path; wherein i and j are positive integers, and the maximum value of j does not exceed the maximum number of data pass-through stages supported by hardware;
a sending module, configured to send the ith instruction and the (i+j)th instruction to the instruction execution unit, so that the instruction execution unit, when executing the ith instruction, writes the result of the ith instruction into the pass-through path according to the first identifier and, when executing the (i+j)th instruction, obtains the required source operand from the pass-through path according to the second identifier.
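
For illustration only, the identifier-setting rule of the generation claims and the identifier-checking rule of the execution claims can be sketched as a small Python model. Everything concrete here is an assumption rather than something the claims specify: the `Instr` record, the `li`/`add` toy operations, the choice to encode the second identifier as the distance j, the modelling of the pass-through path as a shift register of the last `max_stages` results, and the simplification that a flagged result is also kept in the register file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record; not the encoding used by the patent."""
    op: str                    # "li" (load immediate) or "add"
    dst: int                   # destination register number
    srcs: List[int]            # source register numbers (the immediate for "li")
    dst_bypass: bool = False   # stands in for the "first identifier"
    # Stands in for the "second identifier": 0 = read the register file,
    # j > 0 = read the result produced j instructions earlier (assumption).
    src_bypass: List[int] = field(default_factory=list)


def mark_pass_through(program: List[Instr], max_stages: int) -> List[Instr]:
    """Generation side: flag producer/consumer pairs at distance j <= max_stages."""
    last_writer: Dict[int, int] = {}   # register -> index of most recent producer
    for idx, ins in enumerate(program):
        ins.src_bypass = [0] * len(ins.srcs)
        if ins.op != "li":
            for k, reg in enumerate(ins.srcs):
                producer = last_writer.get(reg)
                if producer is not None and idx - producer <= max_stages:
                    program[producer].dst_bypass = True   # set first identifier
                    ins.src_bypass[k] = idx - producer    # set second identifier
        last_writer[ins.dst] = idx
    return program


def execute(program: List[Instr], max_stages: int) -> Tuple[Dict[int, int], int]:
    """Execution side: honour the identifiers; returns (registers, bypass reads)."""
    regs: Dict[int, int] = {}
    # bypass[j - 1] holds the flagged result produced j instructions ago.
    bypass: List[Optional[int]] = [None] * max_stages
    bypass_reads = 0
    for ins in program:
        if ins.op == "li":
            result = ins.srcs[0]
        else:
            operands = []
            for reg, j in zip(ins.srcs, ins.src_bypass):
                if j > 0:                      # second identifier present
                    operands.append(bypass[j - 1])
                    bypass_reads += 1
                else:                          # ordinary register read
                    operands.append(regs[reg])
            result = sum(operands)
        regs[ins.dst] = result                 # simplification: always kept
        # Shift the path by one stage; only flagged results enter it.
        bypass = ([result if ins.dst_bypass else None] + bypass)[:max_stages]
    return regs, bypass_reads


program = mark_pass_through(
    [
        Instr("li", 1, [5]),        # r1 = 5
        Instr("li", 2, [7]),        # r2 = 7
        Instr("add", 3, [1, 2]),    # r3 = r1 + r2
        Instr("add", 4, [3, 3]),    # r4 = r3 + r3
        Instr("add", 5, [1, 4]),    # r5 = r1 + r4 (r1 is 4 instructions back)
    ],
    max_stages=2,
)
regs, bypass_reads = execute(program, max_stages=2)
print(regs[5], bypass_reads)
```

In this toy program the last instruction reads r1 from the register file because its producer lies four instructions back, beyond the two stages assumed for the hardware, while r4 is taken from the pass-through path.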
CN201911300243.6A 2019-12-16 2019-12-16 Instruction generation method and device, instruction execution method, processor and electronic equipment Active CN111124492B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911300243.6A CN111124492B (en) 2019-12-16 2019-12-16 Instruction generation method and device, instruction execution method, processor and electronic equipment
PCT/CN2020/114002 WO2021120712A1 (en) 2019-12-16 2020-09-08 Instruction generation method and apparatus, instruction execution method, processor, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911300243.6A CN111124492B (en) 2019-12-16 2019-12-16 Instruction generation method and device, instruction execution method, processor and electronic equipment

Publications (2)

Publication Number Publication Date
CN111124492A CN111124492A (en) 2020-05-08
CN111124492B true CN111124492B (en) 2022-09-20

Family

ID=70498193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911300243.6A Active CN111124492B (en) 2019-12-16 2019-12-16 Instruction generation method and device, instruction execution method, processor and electronic equipment

Country Status (2)

Country Link
CN (1) CN111124492B (en)
WO (1) WO2021120712A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124492B (en) * 2019-12-16 2022-09-20 成都海光微电子技术有限公司 Instruction generation method and device, instruction execution method, processor and electronic equipment
CN112182496B (en) * 2020-09-24 2022-09-16 成都海光集成电路设计有限公司 Data processing method and device for matrix multiplication
CN114968358A (en) * 2020-10-21 2022-08-30 上海壁仞智能科技有限公司 Apparatus and method for configuring cooperative thread bundle in vector computing system
CN112199119B (en) * 2020-10-21 2022-02-01 上海壁仞智能科技有限公司 Vector operation device
CN112506567B (en) * 2020-11-27 2022-11-04 海光信息技术股份有限公司 Data reading method and data reading circuit
CN112559045B (en) * 2020-12-23 2022-09-16 中国电子科技集团公司第五十八研究所 RISCV-based random instruction generation platform and method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5974538A (en) * 1997-02-21 1999-10-26 Wilmot, Ii; Richard Byron Method and apparatus for annotating operands in a computer system with source instruction identifiers
US6542982B2 (en) * 2000-02-24 2003-04-01 Hitachi, Ltd. Data processer and data processing system
US20080148020A1 (en) * 2006-12-13 2008-06-19 Luick David A Low Cost Persistent Instruction Predecoded Issue and Dispatcher
US7921279B2 (en) * 2008-03-19 2011-04-05 International Business Machines Corporation Operand and result forwarding between differently sized operands in a superscalar processor
CN101477456B (en) * 2009-01-14 2011-06-08 北京大学深圳研究生院 Self-correlated arithmetic unit and processor
US9424041B2 (en) * 2013-03-15 2016-08-23 Samsung Electronics Co., Ltd. Efficient way to cancel speculative ‘source ready’ in scheduler for direct and nested dependent instructions
CN104216681B (en) * 2013-05-31 2018-02-13 华为技术有限公司 A kind of cpu instruction processing method and processor
CN103455454B (en) * 2013-09-02 2016-09-07 华为技术有限公司 A kind of method and apparatus controlling memory startup
CN104516726B (en) * 2013-09-27 2018-08-07 联想(北京)有限公司 A kind of method and device of instruction processing
CN104536914B (en) * 2014-10-15 2017-08-11 中国航天科技集团公司第九研究院第七七一研究所 The associated processing device and method marked based on register access
US10108417B2 (en) * 2015-08-14 2018-10-23 Qualcomm Incorporated Storing narrow produced values for instruction operands directly in a register map in an out-of-order processor
US20170315812A1 (en) * 2016-04-28 2017-11-02 Microsoft Technology Licensing, Llc Parallel instruction scheduler for block isa processor
CN110058884B (en) * 2019-03-15 2021-06-01 佛山市顺德区中山大学研究院 Optimization method, system and storage medium for computational storage instruction set operation
CN111124492B (en) * 2019-12-16 2022-09-20 成都海光微电子技术有限公司 Instruction generation method and device, instruction execution method, processor and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
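Data forwarding optimization method based on an early write-back strategy; Cai Weiguang et al.; Journal of Zhejiang University (Engineering Science); 2010-01-15; Vol. 44, No. 01; pp. 81-86 *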

Also Published As

Publication number Publication date
WO2021120712A8 (en) 2021-08-05
WO2021120712A1 (en) 2021-06-24
CN111124492A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN111124492B (en) Instruction generation method and device, instruction execution method, processor and electronic equipment
JP5647859B2 (en) Apparatus and method for performing multiply-accumulate operations
US8918627B2 (en) Multithreaded processor with multiple concurrent pipelines per thread
US9367319B2 (en) System and method for a multi-schema branch predictor
US6948051B2 (en) Method and apparatus for reducing logic activity in a microprocessor using reduced bit width slices that are enabled or disabled depending on operation width
US7694112B2 (en) Multiplexing output from second execution unit add/saturation processing portion of wider width intermediate result of first primitive execution unit for compound computation
US8577948B2 (en) Split path multiply accumulate unit
KR101105474B1 (en) Instruction and logic for performing range detection
CN104035895A (en) Apparatus and Method for Memory Operation Bonding
US5053986A (en) Circuit for preservation of sign information in operations for comparison of the absolute value of operands
US5119324A (en) Apparatus and method for performing arithmetic functions in a computer system
CN101438236A (en) Method and system to combine corresponding half word units from multiple register units within a microprocessor
JP5326314B2 (en) Processor and information processing device
Knofel Fast hardware units for the computation of accurate dot products
CN110377339B (en) Long-delay instruction processing apparatus, method, and device, and readable storage medium
US20120102496A1 (en) Reconfigurable processor and method for processing a nested loop
CN110688153B (en) Instruction branch execution control method, related equipment and instruction structure
JP7461953B2 (en) Mixed-precision processing unit
US10289386B2 (en) Iterative division with reduced latency
US8700887B2 (en) Register, processor, and method of controlling a processor using data type information
CN113407154A (en) Vector calculation device and method
KR20150089570A (en) Method and apparatus of dynamic analysis
CN117008977B (en) Instruction execution method, system and computer equipment with variable execution period
US11966619B2 (en) Background processing during remote memory access
US9164770B2 (en) Automatic control of multiple arithmetic/logic SIMD units

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 300450 Tianjin Binhai New Area Huayuan Industrial Zone Haitai West Road 18 North 2-204 Industrial Incubation-3-8

Applicant after: Haiguang Information Technology Co.,Ltd.

Address before: 1809-1810, block B, blue talent port, No.1, Intelligent Island Road, high tech Zone, Qingdao, Shandong Province

Applicant before: HAIGUANG INFORMATION TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20210331

Address after: 610000 China (Sichuan) pilot Free Trade Zone, Chengdu high tech Zone

Applicant after: CHENGDU HAIGUANG MICROELECTRONICS TECHNOLOGY Co.,Ltd.

Address before: Industrial incubation-3-8, North 2-204, No. 18, Haitai West Road, Huayuan Industrial Zone, Binhai New Area, Tianjin 300450

Applicant before: Haiguang Information Technology Co.,Ltd.

GR01 Patent grant