CN116530092A - Visual sensor chip, method and device for operating visual sensor chip - Google Patents

Visual sensor chip, method and device for operating visual sensor chip

Info

Publication number
CN116530092A
Authority
CN
China
Prior art keywords
bit
data
light intensity
preset
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080104370.3A
Other languages
Chinese (zh)
Inventor
董思维
刘闯闯
方舒
方运潭
陈褒扬
刘畅
张慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN116530092A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)

Abstract

A vision sensor chip (200) includes: a pixel array circuit (210) configured to generate at least one data signal corresponding to a pixel in the pixel array circuit (210) by measuring an amount of change in light intensity, the at least one data signal indicating a light intensity change event, and the light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel in the pixel array circuit (210) exceeds a predetermined threshold; and a reading circuit (220) coupled to the pixel array circuit (210) and configured to read the at least one data signal from the pixel array circuit (210) in a first event representation. The reading circuit (220) is further configured to provide the at least one data signal to a control circuit (230), and to switch to reading the at least one data signal from the pixel array circuit (210) in a second event representation upon receiving, from the control circuit (230), a switch signal generated based on the at least one data signal.

Description

Visual sensor chip, method and device for operating visual sensor chip
Technical Field
The present application relates to the field of computers, and more particularly, to a vision sensor chip, a method of operating a vision sensor chip, and a related apparatus.
Background
Visual sensing technology is widely used in fields such as video surveillance, digital cameras, robot navigation, autonomous driving, biomedical image analysis, human-computer interfaces, virtual reality, industrial control, wireless remote sensing, microscopy, and scientific instruments. Using optical elements and an imaging device, a vision sensor can acquire image information from the external environment and perform operations such as image processing, image storage, and image output.
Many different types of vision sensors have emerged over decades of development. A biomimetic vision sensor, for example, simulates the biological retina with an integrated circuit: each pixel in a pixel array circuit simulates a biological neuron, and changes in light intensity are expressed in the form of events. In practice, two event representations are used: representing an event with light intensity information and representing an event with polarity information. Currently, because of hardware design, chip architecture, and similar constraints, a single vision sensor can use only one event representation. However, a vision sensor that employs a single event representation has difficulty adapting to varied environmental changes and motion states, resulting in poor performance in certain application scenarios.
Disclosure of Invention
Embodiments of this application provide an image processing method and apparatus, which are used to obtain a clearer image.
In a first aspect, this application provides a switching method applied to an electronic device, where the electronic device includes an RGB sensor and a motion sensor, the RGB (red green blue) sensor is configured to collect an image within a shooting range, and the motion sensor is configured to collect information generated when an object moves relative to the motion sensor within the detection range of the motion sensor. The method includes: selecting at least one of the RGB sensor and the motion sensor based on scene information, where the scene information includes at least one of status information of the electronic device, a type of an application program in the electronic device that requests image acquisition, or environmental information, and collecting data through the selected sensor.
Therefore, in this embodiment of this application, different sensors in the electronic device can be selectively enabled for different scenes, so that the electronic device adapts to more scenes and has strong generalization capability. Moreover, only the sensors required by the actual scene need to be enabled, rather than all sensors, which reduces the power consumption of the electronic device.
In a possible implementation, the status information includes the remaining power and remaining memory of the electronic device, and the environmental information includes a change value of illumination intensity within the shooting range of the RGB sensor and the motion sensor, or information about a moving object within the shooting range.
Therefore, in this embodiment of this application, the sensor to be enabled can be selected according to the status of the electronic device or the environmental information, so that the electronic device adapts to more scenes and has strong generalization capability.
In addition, the enabled sensor may differ among the following embodiments. When a sensor is described as collecting data, that sensor is understood to be turned on; this is not repeated below.
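As a rough illustration of the scene-based selection in the first aspect, the following Python sketch chooses which sensors to enable from scene information. The data structure, thresholds, and application types (SceneInfo, the battery and memory limits, "photo", "motion_detection") are illustrative assumptions rather than values specified by this application.

```python
# Illustrative sketch only: selecting sensors from scene information.
# All names and thresholds below are assumptions, not the application's values.
from dataclasses import dataclass

@dataclass
class SceneInfo:
    battery_pct: float          # remaining power of the electronic device, 0-100
    free_memory_mb: float       # remaining memory
    app_type: str               # type of application requesting an image
    illumination_change: float  # change of illumination intensity in the shooting range
    moving_object: bool         # whether a moving object was detected in range

def select_sensors(scene: SceneInfo) -> set:
    """Return the set of sensors to turn on for this scene."""
    selected = set()
    # Motion-oriented tasks, or scenes dominated by motion / illumination
    # changes, favor the low-power motion (event) sensor.
    if scene.app_type == "motion_detection" or scene.moving_object or scene.illumination_change > 0.5:
        selected.add("motion")
    # Image-capture tasks need the RGB sensor, if resources allow.
    if scene.app_type == "photo" and scene.battery_pct > 20 and scene.free_memory_mb > 100:
        selected.add("rgb")
    # Fall back to the motion sensor alone rather than enabling everything.
    if not selected:
        selected.add("motion")
    return selected
```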
In a second aspect, this application provides a vision sensor chip, which may include: a pixel array circuit configured to generate at least one data signal corresponding to a pixel in the pixel array circuit by measuring an amount of change in light intensity, the at least one data signal indicating a light intensity change event, and the light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; and a reading circuit coupled to the pixel array circuit and configured to read the at least one data signal from the pixel array circuit in a first event representation. The reading circuit is further configured to provide the at least one data signal to a control circuit, and to switch to reading the at least one data signal from the pixel array circuit in a second event representation upon receiving, from the control circuit, a switch signal generated based on the at least one data signal. According to the second aspect, the vision sensor can adaptively switch between the two event representations, so that the read data rate is always kept below a preset read data rate threshold, which reduces the cost of data transmission, parsing, and storage for the vision sensor and significantly improves sensor performance. In addition, such a vision sensor can compile statistics on the events generated over a period of time to predict the likely event generation rate over the next period of time, and thus select a reading mode better suited to the current external environment, application scenario, and motion state.
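The event statistics mentioned above could, for example, be kept per read-out window. The sketch below uses a simple moving average of recent window counts to predict the next window's event count; the class name and the moving-average predictor are assumptions for illustration, not the method specified here.

```python
# Illustrative sketch: per-window event statistics for predicting the next
# window's event rate (moving average is an assumed, simple predictor).
from collections import deque

class EventRateMonitor:
    def __init__(self, window_count: int = 4):
        # Event counts of the most recent statistics windows.
        self.windows = deque(maxlen=window_count)

    def end_window(self, events_in_window: int) -> float:
        """Close a statistics window and predict the next window's event count."""
        self.windows.append(events_in_window)
        return sum(self.windows) / len(self.windows)
```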
In a possible implementation, the first event representation represents an event with polarity information. The pixel array circuit may include a plurality of pixels, each of which may include a threshold comparison unit configured to output polarity information when the amount of change in light intensity exceeds the predetermined threshold, the polarity information indicating whether the light intensity increased or decreased. The reading circuit is specifically configured to read the polarity information output by the threshold comparison unit. In this implementation, events are represented with polarity information, which is usually represented with 1-2 bits and therefore carries less information. This avoids the event bursts with large data volumes that the vision sensor would otherwise face during large-area object motion or light intensity fluctuations (for example, entering or exiting a tunnel, or switching lights in a room), and thus avoids event loss given the fixed preset maximum bandwidth (hereinafter referred to as bandwidth) of the vision sensor.
In a possible implementation, the first event representation represents an event with light intensity information. The pixel array may include a plurality of pixels, and each pixel may include a threshold comparison unit, a readout control unit, a light intensity acquisition unit, and a light intensity detection unit configured to output an electrical signal corresponding to the light signal incident on it, the electrical signal indicating the light intensity. The threshold comparison unit is configured to output a first signal when the electrical signal indicates that the amount of change in light intensity exceeds the predetermined threshold. The readout control unit is configured to, in response to receiving the first signal, instruct the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the first signal is received. The reading circuit is specifically configured to read the electrical signal buffered by the light intensity acquisition unit. In this implementation, events are represented with light intensity information. When the amount of transmitted data does not exceed the bandwidth limit, representing events with light intensity information, which is usually represented with multiple bits (for example, 8-12 bits), carries more information than polarity information, which benefits event processing and analysis, for example by improving the quality of image reconstruction.
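To make the bandwidth trade-off between the two representations concrete, the sketch below compares the bits needed per event; the 2-bit polarity, 10-bit intensity, and 20-bit pixel-address widths are example values, not figures taken from this application.

```python
# Example field widths (assumptions) for the two event representations.
POLARITY_BITS = 2      # event carries only "increase" / "decrease"
INTENSITY_BITS = 10    # event carries the sampled light intensity (8-12 bits is typical)
ADDRESS_BITS = 20      # pixel coordinates, shared by both representations

def bits_per_event(representation: str) -> int:
    """Bits needed to read out one event in the given representation."""
    payload = POLARITY_BITS if representation == "polarity" else INTENSITY_BITS
    return ADDRESS_BITS + payload

print(bits_per_event("polarity"))   # 22 bits per event
print(bits_per_event("intensity"))  # 30 bits per event
```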
In a possible implementation, the control circuit is further configured to: determine statistical data based on the at least one data signal received from the reading circuit; and if the statistical data is determined to meet a predetermined switching condition, transmit the switch signal to the reading circuit, where the predetermined switching condition is determined based on the preset bandwidth of the vision sensor chip. This implementation gives a way to switch between the two event representations, with the switching condition derived from the amount of data to be transmitted. For example, when the amount of transmitted data is large, events are represented with polarity information, which ensures that all the data can be transmitted and avoids event loss caused by event data that cannot be read out. When the amount of transmitted data is small, the chip switches to representing events with light intensity information, so that each transmitted event carries more information, which benefits event processing and analysis and can improve the quality of image reconstruction.
In a possible implementation, the first event representation represents an event with light intensity information and the second event representation represents an event with polarity information. The predetermined switching condition is that the total amount of data read from the pixel array circuit in the first event representation is greater than the preset bandwidth, or that the number of the at least one data signal is greater than the ratio of the preset bandwidth to a first bit, where the first bit is the preset bit width of the data format of the data signal. This implementation gives a specific condition for switching from representing events with light intensity information to representing events with polarity information. When the amount of transmitted data is greater than the preset bandwidth, representing events with polarity information ensures that all the data can be transmitted and avoids event loss caused by event data that cannot be read out.
In a possible implementation, the first event representation represents an event with polarity information and the second event representation represents an event with light intensity information. The predetermined switching condition is that, if the at least one data signal were read from the pixel array circuit in the second event representation, the total amount of data read would not be greater than the preset bandwidth, or that the number of the at least one data signal is not greater than the ratio of the preset bandwidth to a first bit, where the first bit is the preset bit width of the data format of the data signal. This implementation gives a specific condition for switching from representing events with polarity information to representing events with light intensity information. When the amount of transmitted data is not greater than the preset bandwidth, switching to representing events with light intensity information lets each transmitted event carry more information, which benefits event processing and analysis and can improve the quality of image reconstruction.
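A minimal sketch of the switching rule in the two implementations above, assuming the control circuit counts the events of each statistics window; the function and parameter names are illustrative, not terms from this application.

```python
# Sketch of the control circuit's switching rule (names are assumptions).
def choose_representation(current: str, num_events: int, bandwidth_bits: int,
                          intensity_bits: int = 10) -> str:
    """Return 'intensity' or 'polarity' for the next read-out window."""
    if current == "intensity":
        # Total data in the intensity representation exceeds the preset bandwidth
        # (equivalently, num_events > bandwidth / first_bit): fall back to polarity.
        if num_events * intensity_bits > bandwidth_bits:
            return "polarity"
    else:
        # If the same events would fit in the intensity representation,
        # switch back so each event carries more information.
        if num_events * intensity_bits <= bandwidth_bits:
            return "intensity"
    return current
```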
In a third aspect, this application provides a decoding circuit, which may include: a reading circuit configured to read a data signal from a vision sensor chip; and a decoding circuit configured to decode the data signal in a first decoding mode, and further configured to decode the data signal in a second decoding mode when a switch signal is received from a control circuit. The decoding circuit provided in the third aspect corresponds to the vision sensor chip provided in the second aspect and is configured to decode the data signal output by that chip. It can switch between different decoding modes for the different event representations.
In a possible implementation, the control circuit is further configured to: determine statistical data based on the data signal read by the reading circuit; and if the statistical data is determined to meet a predetermined switching condition, transmit the switch signal to the decoding circuit, where the predetermined switching condition is determined based on the preset bandwidth of the vision sensor chip.
In a possible implementation, the first decoding mode decodes the data signal according to a first bit corresponding to a first event representation, where the first event representation represents an event with light intensity information; the second decoding mode decodes the data signal according to a second bit corresponding to a second event representation, where the second event representation represents an event with polarity information and the polarity information indicates whether the light intensity increased or decreased. The switching condition is that the total amount of data decoded in the first decoding mode is greater than the preset bandwidth, or that the number of data signals is greater than the ratio of the preset bandwidth to the first bit, where the first bit is the preset bit width of the data format of the data signal.
In a possible implementation, the first decoding mode decodes the data signal according to a first bit corresponding to a first event representation, where the first event representation represents an event with polarity information and the polarity information indicates whether the light intensity increased or decreased; the second decoding mode decodes the data signal according to a second bit corresponding to a second event representation, where the second event representation represents an event with light intensity information. The switching condition is that, if the data signal were decoded in the second decoding mode, the total amount of data would not be greater than the preset bandwidth, or that the number of data signals is greater than the ratio of the preset bandwidth to the first bit, where the first bit is the preset bit width of the data format of the data signal.
In a fourth aspect, this application provides a method of operating a vision sensor chip, which may include: generating, by a pixel array circuit of the vision sensor chip and by measuring the amount of change in light intensity, at least one data signal corresponding to a pixel in the pixel array circuit, the at least one data signal indicating a light intensity change event and the light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; reading, by a reading circuit of the vision sensor chip, the at least one data signal from the pixel array circuit in a first event representation; providing, by the reading circuit, the at least one data signal to a control circuit of the vision sensor chip; and, when a switch signal generated based on the at least one data signal is received from the control circuit by the reading circuit, switching to reading the at least one data signal from the pixel array circuit in a second event representation.
In a possible implementation, the first event representation represents an event with polarity information, the pixel array circuit may include a plurality of pixels, each pixel may include a threshold comparison unit, and reading the at least one data signal from the pixel array circuit in the first event representation by the reading circuit may include: outputting polarity information by the threshold comparison unit when the amount of change in light intensity exceeds the predetermined threshold, the polarity information indicating whether the light intensity increased or decreased; and reading, by the reading circuit, the polarity information output by the threshold comparison unit.
In a possible implementation, the first event representation represents an event with light intensity information, the pixel array may include a plurality of pixels, and each pixel may include a threshold comparison unit, a readout control unit, a light intensity acquisition unit, and a light intensity detection unit. Reading the at least one data signal from the pixel array circuit in the first event representation by the reading circuit may include: outputting, by the light intensity detection unit, an electrical signal corresponding to the light signal incident on it, the electrical signal indicating the light intensity; outputting a first signal by the threshold comparison unit when the electrical signal indicates that the amount of change in light intensity exceeds the predetermined threshold; in response to the first signal, instructing, by the readout control unit, the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the first signal is received; and reading, by the reading circuit, the electrical signal buffered by the light intensity acquisition unit.
In a possible implementation, the method may further include: determining statistical data based on the at least one data signal received from the reading circuit; and if the statistical data is determined to meet a predetermined switching condition, transmitting the switch signal to the reading circuit, where the predetermined switching condition is determined based on the preset bandwidth of the vision sensor chip.
In a possible implementation, the first event representation represents an event with light intensity information and the second event representation represents an event with polarity information. The predetermined switching condition is that the total amount of data read from the pixel array circuit in the first event representation is greater than the preset bandwidth, or that the number of the at least one data signal is greater than the ratio of the preset bandwidth to a first bit, where the first bit is the preset bit width of the data format of the data signal.
In a possible implementation, the first event representation represents an event with polarity information and the second event representation represents an event with light intensity information. The predetermined switching condition is that, if the at least one data signal were read from the pixel array circuit in the second event representation, the total amount of data read would not be greater than the preset bandwidth, or that the number of the at least one data signal is not greater than the ratio of the preset bandwidth to a first bit, where the first bit is the preset bit width of the data format of the data signal.
In a fifth aspect, this application provides a decoding method, including: reading a data signal from a vision sensor chip by a reading circuit; decoding the data signal in a first decoding mode by a decoding circuit; and, when a switch signal is received from a control circuit, decoding the data signal in a second decoding mode by the decoding circuit.
In a possible implementation, the method further includes: determining statistical data based on the data signal read by the reading circuit; and if the statistical data is determined to meet a predetermined switching condition, transmitting the switch signal to the decoding circuit, where the predetermined switching condition is determined based on the preset bandwidth of the vision sensor chip.
In a possible implementation, the first decoding mode decodes the data signal according to a first bit corresponding to a first event representation, where the first event representation represents an event with light intensity information; the second decoding mode decodes the data signal according to a second bit corresponding to a second event representation, where the second event representation represents an event with polarity information and the polarity information indicates whether the light intensity increased or decreased. The switching condition is that the total amount of data decoded in the first decoding mode is greater than the preset bandwidth, or that the number of data signals is greater than the ratio of the preset bandwidth to the first bit, where the first bit is the preset bit width of the data format of the data signal.
In a possible implementation, the first decoding mode decodes the data signal according to a first bit corresponding to a first event representation, where the first event representation represents an event with polarity information and the polarity information indicates whether the light intensity increased or decreased; the second decoding mode decodes the data signal according to a second bit corresponding to a second event representation, where the second event representation represents an event with light intensity information. The switching condition is that, if the data signal were decoded in the second decoding mode, the total amount of data would not be greater than the preset bandwidth, or that the number of data signals is greater than the ratio of the preset bandwidth to the first bit, where the first bit is the preset bit width of the data format of the data signal.
In a sixth aspect, this application provides a vision sensor chip, which may include: a pixel array circuit configured to generate at least one data signal corresponding to a pixel in the pixel array circuit by measuring the amount of change in light intensity, the at least one data signal indicating a light intensity change event and the light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; and a first encoding unit configured to encode the at least one data signal according to a first bit to obtain first encoded data. The first encoding unit is further configured to, when a first control signal is received from a control circuit, encode the at least one data signal according to a second bit indicated by the first control signal, where the first control signal is determined by the control circuit according to the first encoded data. In the solution provided in the sixth aspect, the bit width of the light intensity characteristic information is adjusted dynamically: when the event generation rate is low and the bandwidth limit has not been reached, events are quantized and encoded at the maximum bit width; when the event generation rate is high, the bit width of the light intensity characteristic information is gradually reduced to satisfy the bandwidth limit; and if the event generation rate later decreases, the bit width can be increased again without exceeding the bandwidth limit. The vision sensor can thus adaptively switch between multiple event representations to better transmit all events at the greatest possible representation accuracy.
In one possible embodiment, the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
In a possible implementation, when the data amount of the first encoded data is not less than the bandwidth, the second bit indicated by the control signal is smaller than the first bit, so that the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth. In other words, when the event generation rate is high, the bit width of the light intensity characteristic information is gradually reduced to satisfy the bandwidth limit.
In a possible implementation, when the data amount of the first encoded data is less than the bandwidth, the second bit indicated by the control signal is larger than the first bit, and the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth. If the event generation rate decreases, the bit width of the light intensity characteristic information can be increased without exceeding the bandwidth limit, so that all events can be transmitted at greater representation accuracy.
In a possible implementation, the pixel array may include N regions, the maximum bits of at least two of the N regions differ, and the maximum bit denotes the preset maximum bit for encoding the at least one data signal generated in one region. The first encoding unit is specifically configured to encode the at least one data signal generated in a first region according to a first bit to obtain first encoded data, where the first bit is not greater than the maximum bit of the first region and the first region is any one of the N regions. The first encoding unit is further specifically configured to, when a first control signal is received from the control circuit, encode the at least one data signal generated in the first region according to a second bit indicated by the first control signal, where the first control signal is determined by the control circuit according to the first encoded data. In this implementation, the pixel array can be divided into regions, and the maximum bit widths of different regions can be set with different weights to suit different regions of interest in the scene. For example, a larger weight may be set for a region likely to contain the target object, so that events output from that region are represented with higher accuracy, while a smaller weight may be set for the background region, so that events output from the background are represented with lower accuracy.
In a possible implementation, the control circuit is further configured to transmit the first control signal to the first encoding unit when it determines that the total data amount of the at least one data signal encoded with a third bit is greater than the bandwidth and the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth, where the third bit and the second bit differ by 1 bit. In such an implementation, all events can be transmitted at greater representation accuracy without exceeding the bandwidth limit.
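A minimal sketch of this 1-bit-step adaptation, with assumed names: the bit width is lowered while the encoded total would exceed the preset bandwidth, and raised again, up to the region's maximum bit, while one more bit would still fit.

```python
# Sketch of the bit-width adaptation in the sixth aspect (names are assumptions).
def adapt_bit_width(num_events: int, current_bits: int, max_bits: int,
                    bandwidth_bits: int, min_bits: int = 1) -> int:
    bits = current_bits
    # Reduce while the encoded total would exceed the preset bandwidth.
    while bits > min_bits and num_events * bits > bandwidth_bits:
        bits -= 1
    # Increase while one more bit still fits and the region's maximum bit
    # allows it (regions of interest may be assigned a larger maximum bit).
    while bits < max_bits and num_events * (bits + 1) <= bandwidth_bits:
        bits += 1
    return bits
```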
In a seventh aspect, this application provides a decoding apparatus, which may include: a reading circuit configured to read a data signal from a vision sensor chip; and a decoding circuit configured to decode the data signal according to a first bit, and further configured to decode the data signal according to a second bit indicated by a first control signal when the first control signal is received from a control circuit. The decoding apparatus provided in the seventh aspect corresponds to the vision sensor chip provided in the sixth aspect and is configured to decode the data signal output by that chip. It can dynamically adjust its decoding mode to match the encoding bits used by the vision sensor.
In one possible embodiment, the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
In a possible implementation, when the total data amount of the data signal decoded according to the first bit is not less than the bandwidth, the second bit is smaller than the first bit.
In a possible implementation, when the total data amount of the data signal decoded according to the first bit is less than the bandwidth, the second bit is larger than the first bit, and the total data amount of the data signal decoded with the second bit is not greater than the bandwidth.
In a possible implementation, the reading circuit is specifically configured to read, from the vision sensor chip, the data signal corresponding to a first region, where the first region is any one of N regions that the pixel array of the vision sensor may include, the maximum bits of at least two of the N regions differ, and the maximum bit denotes the preset maximum bit for encoding the at least one data signal generated in one region. The decoding circuit is specifically configured to decode the data signal corresponding to the first region according to the first bit.
In a possible implementation, the control circuit is further configured to transmit the first control signal to the first encoding unit when it determines that the total data amount of the data signal decoded with a third bit is greater than the bandwidth and the total data amount of the data signal decoded with the second bit is not greater than the bandwidth, where the third bit and the second bit differ by 1 bit.
In an eighth aspect, this application provides a method of operating a vision sensor chip, which may include: generating, by a pixel array circuit of the vision sensor chip and by measuring the amount of change in light intensity, at least one data signal corresponding to a pixel in the pixel array circuit, the at least one data signal indicating a light intensity change event and the light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; encoding, by a first encoding unit of the vision sensor chip, the at least one data signal according to a first bit to obtain first encoded data; and, when a first control signal is received by the first encoding unit from a control circuit of the vision sensor chip, encoding the at least one data signal according to a second bit indicated by the first control signal, where the first control signal is determined by the control circuit according to the first encoded data.
In one possible embodiment, the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
In a possible implementation, when the data amount of the first encoded data is not less than the bandwidth, the second bit indicated by the control signal is smaller than the first bit, so that the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth.
In a possible implementation, when the data amount of the first encoded data is less than the bandwidth, the second bit indicated by the control signal is larger than the first bit, and the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth.
In a possible implementation, the pixel array may include N regions, the maximum bits of at least two of the N regions differ, and the maximum bit denotes the preset maximum bit for encoding the at least one data signal generated in one region. Encoding the at least one data signal according to the first bit by the first encoding unit of the vision sensor chip may include: encoding, by the first encoding unit, the at least one data signal generated in a first region according to the first bit to obtain first encoded data, where the first bit is not greater than the maximum bit of the first region and the first region is any one of the N regions. Encoding the at least one data signal according to the second bit indicated by the first control signal when the first control signal is received by the first encoding unit from the control circuit of the vision sensor chip may include: when the first control signal is received by the first encoding unit from the control circuit, encoding the at least one data signal generated in the first region according to the second bit indicated by the first control signal, where the first control signal is determined by the control circuit according to the first encoded data.
In a possible implementation, the method may further include: transmitting, by the control circuit, the first control signal to the first encoding unit when it is determined that the total data amount of the at least one data signal encoded with a third bit is greater than the bandwidth and the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth, where the third bit and the second bit differ by 1 bit.
In a ninth aspect, this application provides a decoding method, which may include: reading a data signal from a vision sensor chip by a reading circuit; decoding the data signal according to a first bit by a decoding circuit; and, when a first control signal is received from a control circuit by the decoding circuit, decoding the data signal according to a second bit indicated by the first control signal.
In one possible embodiment, the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
In a possible implementation, when the total data amount of the data signal decoded according to the first bit is not less than the bandwidth, the second bit is smaller than the first bit.
In a possible implementation, when the total data amount of the data signal decoded according to the first bit is less than the bandwidth, the second bit is larger than the first bit, and the total data amount of the data signal decoded with the second bit is not greater than the bandwidth.
In a possible implementation, reading the data signal from the vision sensor chip by the reading circuit may include: reading, by the reading circuit, the data signal corresponding to a first region from the vision sensor chip, where the first region is any one of N regions that the pixel array of the vision sensor may include, the maximum bits of at least two of the N regions differ, and the maximum bit denotes the preset maximum bit for encoding the at least one data signal generated in one region. Decoding the data signal according to the first bit by the decoding circuit may include: decoding, by the decoding circuit, the data signal corresponding to the first region according to the first bit.
In a possible implementation, the method may further include: transmitting the first control signal to the first encoding unit when it is determined that the total data amount of the data signal decoded with a third bit is greater than the bandwidth and the total data amount of the data signal decoded with the second bit is not greater than the bandwidth, where the third bit and the second bit differ by 1 bit.
In a tenth aspect, this application provides a vision sensor chip, which may include: a pixel array circuit configured to generate a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit by measuring the amount of change in light intensity, the plurality of data signals indicating at least one light intensity change event and the at least one light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; and a third encoding unit configured to encode a first differential value according to a first preset bit, the first differential value being the difference between the amount of change in light intensity and the predetermined threshold. Reducing the accuracy of event representation, that is, reducing the bit width with which events are represented, reduces the information an event can carry and, in some scenarios, harms event processing and analysis. Reducing representation accuracy therefore may not suit all scenes: in some scenes an event needs to be represented with a high bit width, but a high-bit-width event, while able to carry more information, produces a larger data volume, and with the preset maximum bandwidth of the vision sensor fixed, event data may fail to be read out and data may be lost. The solution provided in the tenth aspect encodes the differential value instead, which reduces the cost of data transmission, parsing, and storage for the vision sensor, allows events to be transmitted at the highest accuracy, and significantly improves sensor performance.
In a possible implementation, the pixel array circuit may include a plurality of pixels, each of which may include a threshold comparison unit configured to output polarity information when the amount of change in light intensity exceeds the predetermined threshold, the polarity information indicating whether the light intensity increased or decreased. The third encoding unit is further configured to encode the polarity information according to a second preset bit. In this implementation, polarity information can also be encoded; it indicates whether the light intensity is increasing or decreasing, which helps obtain the current light intensity from the light intensity obtained at the previous decoding together with the polarity information.
In a possible implementation, each pixel may include a light intensity detection unit, a readout control unit, and a light intensity acquisition unit, where the light intensity detection unit is configured to output an electrical signal corresponding to the light signal incident on it, the electrical signal indicating the light intensity. The threshold comparison unit is specifically configured to output the polarity information when the electrical signal indicates that the amount of change in light intensity exceeds the predetermined threshold. The readout control unit is configured to, in response to receiving the polarity information, instruct the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the polarity information is received. The third encoding unit is further configured to encode a first electrical signal according to a third preset bit, where the first electrical signal is the electrical signal acquired by the light intensity acquisition unit at the first moment the polarity information is received, and the third preset bit is the maximum bit preset by the vision sensor for representing light intensity characteristic information. After the initial state is encoded in full, subsequent events only need to encode the polarity information and the difference between the amount of change in light intensity and the predetermined threshold, which effectively reduces the amount of encoded data. Full encoding here means encoding an event with the maximum bit width predefined by the vision sensor. In addition, the light intensity at the current moment can be reconstructed losslessly from the light intensity of the previous event, the decoded polarity information, and the decoded differential value.
In a possible implementation, the third encoding unit is further configured to encode, at preset time intervals, the electrical signal acquired by the light intensity acquisition unit according to the third preset bit. Full encoding is performed at every preset interval to reduce decoding dependence and prevent decoding errors.
In a possible implementation, the third encoding unit is specifically configured to encode the first differential value according to the first preset bit when the first differential value is less than the predetermined threshold.
In a possible implementation, the third encoding unit is further configured to encode a first residual differential value and the predetermined threshold according to the first preset bit when the first differential value is not less than the predetermined threshold, where the first residual differential value is the difference between the first differential value and the predetermined threshold.
In a possible implementation, the third encoding unit is specifically configured to: when the first residual differential value is not less than the predetermined threshold, encode a second residual differential value according to the first preset bit, where the second residual differential value is the difference between the first residual differential value and the predetermined threshold; encode the predetermined threshold a first time according to the first preset bit; and encode the predetermined threshold a second time according to the first preset bit. Because the vision sensor may have a certain delay, the amount of change in light intensity may exceed the predetermined threshold two or more times before an event is generated, so the differential value may be equal to or greater than the predetermined threshold and the amount of change in light intensity may be at least twice the predetermined threshold. For example, if the first residual differential value is not less than the predetermined threshold, the second residual differential value is encoded; if the second residual differential value is still not less than the predetermined threshold, a third residual differential value, which is the difference between the second residual differential value and the predetermined threshold, is encoded and the predetermined threshold is encoded a third time; this process is repeated until the residual differential value is less than the predetermined threshold.
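A minimal encoder sketch of this differential scheme, with assumed names and a list of fields standing in for a real bitstream: the initial (or periodic) sample is encoded in full at the third preset bit, and every other event encodes polarity, zero or more extra copies of the predetermined threshold, and a residual differential value smaller than the threshold.

```python
# Sketch only: differential encoding of one event (field list instead of bits).
from typing import List, Optional, Tuple

def encode_event(delta_intensity: float, threshold: float,
                 full_intensity: Optional[float] = None) -> List[Tuple[str, object]]:
    """delta_intensity: signed light intensity change since the last event.
    threshold: the predetermined threshold that triggered the event.
    full_intensity: if given, emit a full-precision sample (initial state or
    periodic refresh at the third preset bit) instead of a differential one."""
    fields = []
    if full_intensity is not None:
        fields.append(("full", full_intensity))        # third preset bit
        return fields
    fields.append(("polarity", delta_intensity >= 0))  # second preset bit
    residual = abs(delta_intensity) - threshold        # first differential value
    # The sensor's delay may let the variation exceed the threshold several
    # times before read-out, so peel off whole thresholds until the remaining
    # residual differential value is smaller than the threshold.
    while residual >= threshold:
        fields.append(("threshold", threshold))        # first preset bit
        residual -= threshold
    fields.append(("diff", residual))                  # first preset bit
    return fields
```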
In an eleventh aspect, this application provides a decoding apparatus, which may include: an acquisition circuit configured to read a data signal from a vision sensor chip; and a decoding circuit configured to decode the data signal according to a first bit to obtain a differential value, where the differential value is less than a predetermined threshold, the differential value is the difference between the amount of change in light intensity measured by the vision sensor and the predetermined threshold, the amount of change in light intensity exceeds the predetermined threshold, and the vision sensor generates at least one light intensity change event. The decoding apparatus provided in the eleventh aspect corresponds to the vision sensor chip provided in the tenth aspect and is configured to decode the data signal output by that chip. Its decoding circuit can apply a differential decoding mode matching the differential encoding mode adopted by the vision sensor.
In a possible implementation, the decoding circuit is further configured to decode the data signal according to a second bit to obtain polarity information, the polarity information indicating whether the light intensity increased or decreased.
In a possible implementation, the decoding circuit is further configured to decode the data signal received at a first moment according to a third bit to obtain the electrical signal, output by the vision sensor, corresponding to the light signal incident on it, where the third bit is the maximum bit preset by the vision sensor for representing light intensity characteristic information.
In a possible implementation, the decoding circuit is further configured to decode, at preset time intervals, the data signal received at the first moment according to the third bit.
In a possible implementation, the decoding circuit is specifically configured to decode the data signal according to the first bit to obtain the differential value and at least one predetermined threshold.
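A companion decoder sketch for the eleventh aspect, matching the assumed field layout of the encoder sketch above: the current light intensity is reconstructed from the previously decoded intensity, the polarity, the decoded thresholds, and the differential value.

```python
# Sketch only: reconstructing the light intensity from one decoded event.
def decode_event(fields, last_intensity: float, threshold: float) -> float:
    """Return the reconstructed light intensity after one event."""
    if fields and fields[0][0] == "full":
        return fields[0][1]                       # full-precision sample (third bit)
    increase = next(v for k, v in fields if k == "polarity")
    extra = sum(v for k, v in fields if k == "threshold")
    diff = next(v for k, v in fields if k == "diff")
    # Variation amount = the predetermined threshold that triggered the event,
    # plus any additionally encoded thresholds, plus the differential value.
    change = threshold + extra + diff
    return last_intensity + change if increase else last_intensity - change
```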
In a twelfth aspect, this application provides a method of operating a vision sensor chip, which may include: generating, by a pixel array circuit of the vision sensor chip and by measuring the amount of change in light intensity, a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit, the plurality of data signals indicating at least one light intensity change event and the at least one light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel exceeds a predetermined threshold; and encoding, by a third encoding unit of the vision sensor chip, a first differential value according to a first preset bit, where the first differential value is the difference between the amount of change in light intensity and the predetermined threshold.
In a possible implementation, the pixel array circuit may include a plurality of pixels, each pixel may include a threshold comparison unit, and the method may further include: outputting polarity information by the threshold comparison unit when the amount of change in light intensity exceeds the predetermined threshold, the polarity information indicating whether the light intensity increased or decreased; and encoding the polarity information according to a second preset bit by the third encoding unit.
In a possible implementation, each pixel may include a light intensity detection unit, a readout control unit, and a light intensity acquisition unit, and the method may further include: outputting, by the light intensity detection unit, an electrical signal corresponding to the light signal incident on it, the electrical signal indicating the light intensity. Outputting the polarity information by the threshold comparison unit may include: outputting the polarity information by the threshold comparison unit when the electrical signal indicates that the amount of change in light intensity exceeds the predetermined threshold. The method may further include: in response to receiving the polarity information, instructing, by the readout control unit, the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the polarity information is received; and encoding a first electrical signal according to a third preset bit, where the first electrical signal is the electrical signal acquired by the light intensity acquisition unit at the first moment the polarity information is received, and the third preset bit is the maximum bit preset by the vision sensor for representing light intensity characteristic information.
In a possible implementation, the method may further include: encoding, at preset time intervals, the electrical signal acquired by the light intensity acquisition unit according to the third preset bit.
In a possible implementation, encoding the first differential value according to the first preset bit by the third encoding unit of the vision sensor chip may include: encoding the first differential value according to the first preset bit when the first differential value is less than the predetermined threshold.
In a possible implementation, encoding the first differential value according to the first preset bit by the third encoding unit of the vision sensor chip may further include: encoding a first residual differential value and the predetermined threshold according to the first preset bit when the first differential value is not less than the predetermined threshold, where the first residual differential value is the difference between the first differential value and the predetermined threshold.
In a possible implementation, encoding the first residual differential value and the predetermined threshold according to the first preset bit when the first differential value is not less than the predetermined threshold may include: when the first residual differential value is not less than the predetermined threshold, encoding a second residual differential value according to the first preset bit, where the second residual differential value is the difference between the first residual differential value and the predetermined threshold; encoding the predetermined threshold a first time according to the first preset bit; and encoding the predetermined threshold a second time according to the first preset bit, where the first residual differential value may include a second residual differential value and two predetermined thresholds.
In a thirteenth aspect, this application provides a decoding method, which may include: reading a data signal from a vision sensor chip by an acquisition circuit; and decoding the data signal according to a first bit by a decoding circuit to obtain a differential value, where the differential value is less than a predetermined threshold, the differential value is the difference between the amount of change in light intensity measured by the vision sensor and the predetermined threshold, the amount of change in light intensity exceeds the predetermined threshold, and the vision sensor generates at least one light intensity change event.
In a possible implementation, the method may further include: decoding the data signal according to a second bit to obtain polarity information, the polarity information indicating whether the light intensity increased or decreased.
In a possible implementation, the method may further include: decoding the data signal received at a first moment according to a third bit to obtain the electrical signal, output by the vision sensor, corresponding to the light signal incident on it, where the third bit is the maximum bit preset by the vision sensor for representing light intensity characteristic information.
In a possible implementation, the method may further include: decoding, at preset time intervals, the data signal received at the first moment according to the third bit.
In a possible implementation, decoding the data signal according to the first bit by the decoding circuit to obtain the differential value may include: decoding the data signal according to the first bit to obtain the differential value and at least one predetermined threshold.
In a fourteenth aspect, this application provides an image processing method, including: acquiring motion information, where the motion information includes information about the motion trajectory of a target object as it moves within the detection range of a motion sensor; generating at least one frame of event image according to the motion information, where the at least one frame of event image represents the motion trajectory of the target object as it moves within the detection range; obtaining a target task and obtaining an iteration duration according to the target task; and iteratively updating the at least one frame of event image to obtain at least one updated frame of event image, where the time spent iteratively updating the at least one frame of event image does not exceed the iteration duration.
Therefore, in this embodiment of this application, a moving object can be monitored by the motion sensor, which acquires information about the object's motion trajectory within the detection range. After the target task is obtained, the iteration duration can be determined according to the target task, and the event image is iteratively updated within that duration, yielding an event image matched to the target task.
In a possible implementation, any one iteration of updating the at least one frame of event image includes: acquiring motion parameters, which represent parameters of the relative motion between the motion sensor and the target object; and iteratively updating a target event image in the at least one frame of event image according to the motion parameters to obtain an updated target event image.
Therefore, in this embodiment of this application, when the event image is iteratively updated, it can be compensated using the parameters of the relative motion between the target object and the motion sensor, so as to obtain a clearer event image.
In a possible implementation, acquiring the motion parameters includes: acquiring the value of a preset optimization model from the previous iteration; and calculating the motion parameters according to the value of the optimization model.
Therefore, in this embodiment of this application, the event image can be updated based on the value of the optimization model: better motion parameters are calculated from the optimization model, and the event image is then updated with those motion parameters to obtain a clearer event image.
In a possible implementation, iteratively updating the target event image in the at least one frame of event image according to the motion parameters includes: compensating the motion trajectory of the target object in the target event image according to the motion parameters to obtain the target event image produced by the current iteration.
Therefore, in the embodiment of the application, the motion parameter can be used to compensate the motion track of the target object in the event image, so that the motion track of the target object in the event image is clearer, and the event image is clearer.
In one possible embodiment, the motion parameters include one or more of the following: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
Therefore, in the embodiment of the application, the target object in the event image can be subjected to motion compensation through various motion parameters, so that the definition of the event image is improved.
In one possible implementation manner, during any one iteration update, the method further includes: terminating the iteration if the result of the current iteration meets a preset termination condition, wherein the termination condition comprises at least one of the following: the at least one frame of event image has been iteratively updated a preset number of times, or the change in the value of the optimization model during the updating of the at least one frame of event image is smaller than a preset value.
Therefore, in the embodiment of the application, in addition to setting the iteration duration, a convergence condition related to the iteration number or the value of the optimization model may be set, so that under the constraint of the iteration duration, an event image meeting the convergence condition is obtained.
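The iterative update described in this aspect can be summarized by the following Python sketch. It only illustrates the control flow (the iteration-duration budget, the iteration count and the change of the model value as termination conditions); the callables estimate_motion, warp_events and objective, as well as all thresholds, are placeholders and not part of any specific implementation.

```python
import time

def iterative_update(event_image, estimate_motion, warp_events, objective,
                     max_seconds, max_iters=50, min_delta=1e-4):
    """Iteratively compensate an event image, stopping when the iteration
    budget, the iteration count, or the change of the objective value is
    reached. All callables stand in for the steps described above."""
    start = time.monotonic()
    prev_value = None
    for _ in range(max_iters):                       # preset number of iterations
        if time.monotonic() - start > max_seconds:   # iteration duration from the target task
            break
        motion = estimate_motion(event_image)        # motion parameters of the relative motion
        event_image = warp_events(event_image, motion)  # compensate the motion trajectory
        value = objective(event_image)               # value of the preset optimization model
        if prev_value is not None and abs(value - prev_value) < min_delta:
            break                                    # model value change below the preset value
        prev_value = value
    return event_image
```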
In a fifteenth aspect, the present application provides an image processing method, including: generating at least one frame of event image according to motion information, wherein the motion information comprises information of a motion track of a target object when the target object moves within a detection range of a motion sensor, and the at least one frame of event image is an image representing the motion track of the target object when the target object moves within the detection range; acquiring motion parameters representing parameters of relative motion between the motion sensor and the target object; initializing a preset value of the optimization model according to the motion parameters to obtain the value of the optimization model; and updating the at least one frame of event image according to the value of the optimization model to obtain the updated at least one frame of event image.
In the embodiment of the application, the optimization model can be initialized by using the parameters of the relative motion between the motion sensor and the target object, so that the initial iteration times of the event image are reduced, the convergence rate of the iteration of the event image is accelerated, and a clearer event image is obtained under the condition of fewer iteration times.
In one possible embodiment, the motion parameters include one or more of the following: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
In a possible implementation manner, the acquiring motion parameters includes: acquiring data acquired by an Inertial Measurement Unit (IMU) sensor; and calculating the motion parameters according to the data acquired by the IMU sensor. Thus, in the embodiment of the application, the motion parameters can be calculated through the IMU, so that more accurate motion parameters can be obtained.
In a possible implementation manner, after the initializing the values of the preset optimization model according to the motion parameters, the method further includes: and updating parameters of the IMU sensor according to the value of the optimization model, wherein the parameters of the IMU sensor are used for acquiring data by the IMU sensor.
Therefore, in the embodiment of the application, the parameters of the IMU can be updated according to the value of the optimization model, so that correction of the IMU is realized, and the data acquired by the IMU is more accurate.
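As an illustration of how IMU data could seed the optimization model described in this aspect, the following hypothetical helper derives rough motion parameters from a window of gyroscope and accelerometer samples; the averaging and integration used here are only one possible choice, and the function name is an assumption.

```python
import numpy as np

def motion_params_from_imu(gyro_samples, accel_samples, dt):
    """Rough motion parameters derived from IMU data: the mean angular velocity
    over the window and the velocity change obtained by integrating the
    acceleration. This is only one possible way to derive such parameters."""
    omega = np.asarray(gyro_samples, dtype=np.float64).mean(axis=0)          # rad/s, shape (3,)
    delta_v = np.asarray(accel_samples, dtype=np.float64).sum(axis=0) * dt   # m/s, shape (3,)
    return np.concatenate([omega, delta_v])

# Usage sketch: start the iterative update of the event image from these
# parameters instead of a zero initialization, so fewer iterations are needed.
# initial_params = motion_params_from_imu(gyro, accel, dt)
```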
In a sixteenth aspect, the present application provides an image processing apparatus having a function of implementing the method of the fourteenth aspect or any one of its possible implementations, or having a function of implementing the method of the fifteenth aspect or any one of its possible implementations. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a seventeenth aspect, the present application provides a method of image processing, including: acquiring motion information, wherein the motion information comprises information of a motion track of a target object when the target object moves within a detection range of a motion sensor; generating an event image according to the motion information, wherein the event image is an image representing the motion track of the target object when the target object moves within the detection range; obtaining a first reconstructed image according to at least one event included in the event image, wherein the color types of a first pixel point and at least one second pixel point are different, the first pixel point is a pixel point in the first reconstructed image corresponding to any one of the at least one event, and the at least one second pixel point is among a plurality of pixel points adjacent to the first pixel point in the first reconstructed image.
Therefore, in the embodiment of the present application, when there is relative motion between the subject and the motion sensor, image reconstruction can be performed based on the data acquired by the motion sensor, and a reconstructed image can be obtained, and even when the RGB sensor is not clearly imaged, a clear image can be obtained.
In a possible implementation manner, the determining, according to at least one event included in the event image, a color type corresponding to each pixel point in the event image, to obtain a first reconstructed image includes: scanning each pixel point in the event image in a first direction and determining a color type corresponding to each pixel point in the event image, to obtain the first reconstructed image, wherein if a first pixel point is scanned and has an event, the color type of the first pixel point is determined to be a first color type; if a second pixel point arranged before the first pixel point in the first direction does not have an event, the color type corresponding to the second pixel point is a second color type; the first color type and the second color type are different color types; and a pixel point with an event is a pixel point in the event image corresponding to a position at which the motion sensor detects a change.
In the embodiment of the application, image reconstruction can be performed based on the event at each pixel point by scanning the event image, so that a clearer reconstructed image is obtained. The information acquired by the motion sensor can therefore be used for image reconstruction, and the reconstructed image can be obtained efficiently and quickly, which improves the efficiency of subsequent processing such as image recognition and image classification performed on the reconstructed image. Even in scenes where a clear RGB image cannot be captured, such as shooting a moving object or shooting with camera shake, a clearer image can still be reconstructed quickly and accurately from the information acquired by the motion sensor, facilitating subsequent tasks such as recognition or classification.
In a possible implementation manner, the first direction is a preset direction, or the first direction is determined according to data acquired by the IMU, or the first direction is determined according to an image captured by the color RGB camera. Thus, in the embodiments of the present application, the direction in which the event image is scanned may be determined in a variety of ways, accommodating more scenes.
In one possible implementation manner, if a plurality of consecutive third pixels arranged after the first pixel according to the first direction do not have an event, the color type corresponding to the plurality of third pixels is the first color type. Therefore, in the embodiment of the present application, when there are a plurality of continuous pixels without events, the color types corresponding to the continuous pixels are the same, so as to avoid the situation that the edges are unclear due to the movement of the same object in the actual scene.
In one possible implementation manner, if a fourth pixel point arranged after the first pixel point in the first direction and adjacent to the first pixel point has an event, and a fifth pixel point arranged after the fourth pixel point in the first direction and adjacent to the fourth pixel point does not have an event, the color types corresponding to the fourth pixel point and the fifth pixel point are both the first color type.
Therefore, when at least two continuous pixel points in the event image have events, the reconstructed color type can not be changed when the second event is scanned, so that the unclear edge of the reconstructed image caused by the over-wide edge of the target object is avoided.
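The scanning rules above can be illustrated with the following Python sketch, which scans each row from left to right as one possible "first direction". It assumes a binary event image; toggling the color again at a later run of events is an assumption consistent with reconstructing alternating regions, and the color values are placeholders.

```python
import numpy as np

def scan_reconstruct(event_image, colors=(0, 1)):
    """Binary reconstruction by scanning each row from left to right.
    colors = (second color type, first color type) are placeholder values."""
    h, w = event_image.shape
    recon = np.empty((h, w), dtype=np.uint8)
    for y in range(h):
        current = colors[0]          # second color type before any event is met
        prev_is_event = False
        for x in range(w):
            is_event = event_image[y, x] != 0
            if is_event and not prev_is_event:
                # the first event of a run marks an edge: switch the color
                current = colors[1] if current == colors[0] else colors[0]
            # consecutive events after the first one do not switch again,
            # and pixels without events keep the current color
            recon[y, x] = current
            prev_is_event = is_event
    return recon
```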
In one possible implementation manner, after the scanning each pixel point in the event image according to the first direction, determining a color type corresponding to each pixel point in the event image, and obtaining a first reconstructed image, the method further includes: scanning the event image according to a second direction, and determining a color type corresponding to each pixel point in the event image to obtain a second reconstructed image, wherein the second direction is different from the first direction; and fusing the first reconstructed image and the second reconstructed image to obtain the updated first reconstructed image.
In the embodiment of the application, the event images can be scanned according to different directions, so that a plurality of reconstructed images are obtained from a plurality of directions, and then the reconstructed images are fused to obtain a more accurate reconstructed image.
In one possible embodiment, the method further comprises: if the first reconstructed image does not meet the preset requirement, the motion information is updated, the event image is updated according to the updated motion information, and the updated first reconstructed image is obtained according to the updated event image.
In the embodiment of the application, the event image can be updated by combining the information acquired by the motion sensor, so that the updated event image is clearer.
In a possible implementation manner, before determining the color type corresponding to each pixel point in the event image according to at least one event included in the event image, and obtaining a first reconstructed image, the method further includes: compensating the event image according to a motion parameter when the target object and the motion sensor perform relative motion, so as to obtain the compensated event image, wherein the motion parameter comprises one or more of the following: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
Therefore, in the embodiment of the application, the motion compensation can be performed on the event image by combining the motion parameters, so that the event image is clearer, and the reconstructed image obtained by reconstruction is clearer.
In one possible implementation, the color type of the pixels in the reconstructed image is determined based on the color acquired by the color RGB camera. In the embodiment of the application, the colors in the actual scene can be determined according to the RGB camera, so that the colors of the reconstructed image are matched with the colors in the actual scene, and the user experience is improved.
In one possible embodiment, the method further comprises: obtaining an RGB image according to the data acquired by the RGB camera; and fusing the RGB image and the first reconstruction image to obtain the updated first reconstruction image. Therefore, in the embodiment of the application, the RGB image and the reconstructed image can be fused, so that the finally obtained reconstructed image is clearer.
In an eighteenth aspect, the present application further provides an image processing apparatus having a function of implementing the method of the seventeenth aspect or any one of the possible implementation manners of the seventeenth aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a nineteenth aspect, the present application provides a method for image processing, including: acquiring a first event image and a plurality of captured first images, wherein the first event image comprises information of an object moving within a preset range during the capture period of the plurality of first images, the exposure durations corresponding to the plurality of first images are different, and the preset range is the shooting range of a camera; calculating, according to the first event image, a first jitter degree corresponding to each first image in the plurality of first images, wherein the first jitter degree is used for representing the jitter degree of the camera when the corresponding first image is captured; determining a fusion weight of each first image in the plurality of first images according to the first jitter degree corresponding to each first image, wherein the first jitter degrees corresponding to the plurality of first images and the fusion weights are negatively correlated; and fusing the plurality of first images according to the fusion weight of each first image to obtain a target image.
Therefore, in the embodiment of the present application, the degree of shake when capturing the RGB images can be quantified by the event image, and the fusion weight of each RGB image can be determined according to its degree of shake. In general, an RGB image with a higher jitter degree is given a smaller weight and an RGB image with a lower jitter degree is given a larger weight, so that the information in the final target image tends to come from the clearer RGB images, the target image is clearer, and the user experience is improved. If the target image is used for subsequent image recognition, feature extraction, or the like, the obtained recognition result or extracted features are also more accurate.
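A minimal sketch of such jitter-weighted fusion is given below; taking the reciprocal of the jitter degree is merely one mapping that satisfies the stated negative correlation, and the function name is an assumption.

```python
import numpy as np

def fuse_by_jitter(images, jitter_degrees, eps=1e-6):
    """Weighted fusion of multi-exposure RGB frames. Weights are chosen to be
    negatively correlated with the per-frame jitter degree; the inverse of the
    jitter degree is only one possible choice of such a mapping."""
    jitter = np.asarray(jitter_degrees, dtype=np.float64)
    weights = 1.0 / (jitter + eps)                 # lower jitter -> higher weight
    weights /= weights.sum()                       # normalize the fusion weights
    stack = np.stack([img.astype(np.float64) for img in images], axis=0)
    fused = np.tensordot(weights, stack, axes=1)   # weighted sum over the frames
    return fused.astype(images[0].dtype)
```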
In one possible embodiment, before the determining the fusion weight of each first image of the plurality of first images according to the first jitter degree, the method further includes: and if the first dithering degree is not higher than a first preset value and is higher than a second preset value, performing dithering removal processing on each first image to obtain each first image after dithering removal.
Therefore, in the embodiment of the present application, different shake situations can be distinguished based on the dynamic data: the RGB images are fused directly when there is no shake, the RGB images are adaptively de-shaken when the shake is mild, and a supplementary RGB image is captured when the shake is strong. The method therefore applies to scenes with various degrees of shake and has strong generalization capability.
In one possible implementation manner, the determining the fusion weight of each first image in the plurality of first images according to the first dithering degree includes: if the first jitter degree is higher than a first preset value, a second image is obtained through re-shooting, and the second jitter degree of the second image is not higher than the first preset value; calculating the fusion weight of each first image according to the first jitter degree of each first image, and calculating the fusion weight of the second image according to the second jitter degree; the step of fusing the plurality of first images according to the fusion weight of each first image to obtain a target image comprises the following steps: and fusing the plurality of first images and the second image according to the fusion weight of each first image and the fusion weight of the second image to obtain the target image.
In general, an RGB image with a higher jitter degree is given a smaller weight and an RGB image with a lower jitter degree is given a larger weight, so that the information in the final target image tends to come from the clearer RGB images, the target image is clearer, and the user experience is improved; this also benefits subsequent image recognition or feature extraction. For an RGB image with a high jitter degree, a supplementary RGB image with a lower jitter degree and higher clarity can be captured and used in the subsequent fusion, so that the finally obtained target image is clearer.
In one possible embodiment, before the capturing again the second image, the method further comprises: acquiring a second event image, wherein the second event image is acquired before the first event image is acquired; and calculating exposure parameters according to the information included in the second event image, wherein the exposure parameters are used for shooting the second image.
Therefore, in the embodiment of the application, the exposure strategy is adaptively adjusted by using the information acquired by the dynamic sensing camera (i.e. the motion sensor), that is, the high dynamic range sensing characteristic of the texture in the shooting range is utilized by using the dynamic sensing information, the image with proper shooting exposure time is adaptively supplemented, and the capability of capturing the texture information of the strong light area or the dark light area by the camera is improved.
In one possible implementation manner, the re-shooting obtains a second image, and the method further includes: dividing the first event image into a plurality of areas, and dividing a third image into a plurality of areas, wherein the third image is a first image with the minimum exposure value in the plurality of first images, the plurality of areas included in the first event image correspond to the positions of the plurality of areas included in the third image, and the exposure value comprises at least one of exposure duration, exposure amount or exposure level; calculating whether each region in the first event image includes first texture information and whether each region in the third image includes second texture information; if a first area in the first event image includes the first texture information and an area corresponding to the first area in the third image does not include the second texture information, shooting according to the exposure parameter to obtain the second image, wherein the first area is any area in the first dynamic area.
Therefore, in the embodiment of the present application, if a region of the first event image includes texture information while the corresponding region in the RGB image with the minimum exposure value does not, it means that this region of the RGB image is severely blurred or underexposed, and a supplementary RGB image can be captured. If no region of the first event image includes texture information, no supplementary RGB image needs to be captured.
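The following sketch illustrates one way such a region-wise texture test could be implemented; the grid size, the event-count threshold and the gradient-energy threshold are assumptions, and the gradient energy is only a stand-in for any texture measure.

```python
import numpy as np

def needs_recapture(event_img, min_exposure_img, grid=(4, 4), ev_thresh=20, tex_thresh=5.0):
    """Split both images into a grid of regions; if any region shows texture in
    the event image (enough events) but none in the lowest-exposure RGB frame
    (low local gradient energy), request a supplementary shot."""
    h, w = event_img.shape[:2]
    gy, gx = grid
    gray = min_exposure_img.mean(axis=2) if min_exposure_img.ndim == 3 else min_exposure_img
    for i in range(gy):
        for j in range(gx):
            ys = slice(i * h // gy, (i + 1) * h // gy)
            xs = slice(j * w // gx, (j + 1) * w // gx)
            ev_texture = np.count_nonzero(event_img[ys, xs]) > ev_thresh
            rgb_texture = np.abs(np.gradient(gray[ys, xs])).mean() > tex_thresh
            if ev_texture and not rgb_texture:
                return True          # this region is blurred/underexposed in the RGB frame
    return False
```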
In a twentieth aspect, the present application provides an image processing method, including: firstly, detecting motion information of a target object, wherein the motion information can comprise information of a motion track of the target object when the target object moves within a preset range, and the preset range is a shooting range of a camera; then, determining focusing information according to the motion information, wherein the focusing information comprises parameters for focusing a target object in a preset range; and then focusing the target object in a preset range according to the focusing information, and shooting an image of the preset range.
Therefore, in the embodiment of the application, the movement track of the target object in the shooting range of the camera can be detected, and then the focusing information is determined according to the movement track of the target object and focusing is completed, so that a clearer image can be shot. Even if the target object is in motion, the target object can be accurately focused, a clear image in a motion state is shot, and user experience is improved.
In one possible embodiment, the determining the focusing information according to the motion information may include: predicting the motion trail of the target object in a preset duration according to motion information, namely the information of the motion trail of the target object in a preset range, so as to obtain a prediction area, wherein the prediction area is an area where the target object is located in the preset duration obtained by prediction; and determining a focusing area according to the prediction area, wherein the focusing area comprises at least one focusing point for focusing the target object, and the focusing information comprises the position information of the at least one focusing point.
Therefore, in the embodiment of the application, the future motion trail of the target object can be predicted, and the focusing area is determined according to the predicted area, so that the focusing on the target object can be accurately completed. Even if the target object moves at a high speed, the embodiment of the application can focus the target object in advance in a prediction mode, so that the target object is in a focusing area, and a clearer target object moving at a high speed is shot.
In one possible implementation, determining the focusing area according to the prediction area may include: if the predicted area meets the preset condition, determining the predicted area as a focusing area; if the predicted area does not meet the preset condition, predicting the motion trail of the target object in the preset time length according to the motion information again to obtain a new predicted area, and determining a focusing area according to the new predicted area. The preset condition may be that the prediction area includes a complete target object, or that the area of the prediction area is larger than a preset value, or the like.
Therefore, in the embodiment of the application, the focusing area is determined according to the prediction area only when the prediction area meets the preset condition, and the camera is triggered to shoot, and when the prediction area does not meet the preset condition, the camera is not triggered to shoot, so that incomplete target objects in a shot image can be avoided, or meaningless shooting can be avoided. And when shooting is not performed, the camera can be in an unactuated state, and the camera is triggered to perform shooting only when the predicted area meets the preset condition, so that the power consumption generated by the camera can be reduced.
In one possible embodiment, the motion information further includes at least one of a motion direction and a motion speed of the target object; the predicting the motion trail of the target object within the preset duration according to the motion information to obtain the predicted area may include: and predicting the motion trail of the target object in a preset duration according to the motion trail of the target object when the target object moves in a preset range, and the motion direction and/or the motion speed, so as to obtain a prediction area.
Therefore, in the embodiment of the application, the motion trail of the target object in the future preset duration can be predicted according to the motion trail of the target object in the preset range, the motion direction and/or the motion speed and the like, so that the area where the target object is located in the future preset duration of the target object can be accurately predicted, the target object can be more accurately focused, and a clearer image can be shot.
In a possible implementation manner, the predicting the motion track of the target object in the preset duration according to the motion track of the target object in the preset range and the motion direction and/or the motion speed to obtain the prediction area may include: fitting a change function of the central point of the area where the target object is located, which changes along with time, according to the motion trail and the motion direction and/or the motion speed of the target object when the target object moves within a preset range; calculating a predicted central point according to the change function, wherein the predicted central point is the central point of the area where the target object is located in the predicted preset duration; and obtaining a prediction area according to the prediction center point.
Therefore, in the embodiment of the application, according to the motion track of the target object during motion, a change function of the center point of the area where the target object is located along with time change is fitted, then the center point of the area where the target object is located at a certain moment in the future is predicted according to the change function, the predicted area is determined according to the center point, further, the target object can be focused more accurately, and further, a clearer image can be shot.
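A possible realization of this fitting-and-prediction step is sketched below, using a low-order polynomial as the change function and a fixed-size box around the predicted center as the prediction area; both choices, and the function names, are assumptions.

```python
import numpy as np

def predict_center(timestamps, centers, t_future, degree=2):
    """Fit the change of the region center over time with a low-order
    polynomial (one possible 'change function') and evaluate it at a future
    time to obtain the predicted center point."""
    t = np.asarray(timestamps, dtype=np.float64)
    c = np.asarray(centers, dtype=np.float64)          # shape (N, 2): (x, y) per timestamp
    fx = np.polynomial.polynomial.polyfit(t, c[:, 0], degree)
    fy = np.polynomial.polynomial.polyfit(t, c[:, 1], degree)
    x_pred = np.polynomial.polynomial.polyval(t_future, fx)
    y_pred = np.polynomial.polynomial.polyval(t_future, fy)
    return float(x_pred), float(y_pred)

def region_from_center(center, size):
    """Build the prediction area as a box of the given size around the center."""
    cx, cy = center
    w, h = size
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```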
In one possible implementation manner, the image of the preset range may be captured by an RGB camera, and the focusing on the target object in the preset range according to the focusing information may include: using, as the focusing point, at least one of the plurality of focusing points of the RGB camera that has the smallest norm distance from the central point of the focusing area, and focusing with it.
Therefore, in the embodiment of the present application, at least one point closest to the norm of the center point of the focusing area may be selected as the focusing point, and focusing may be performed, thereby completing focusing on the target object.
In one possible implementation manner, the motion information includes a current area of the target object, and the determining the focusing information according to the motion information may include: and determining the current area of the target object as a focusing area, wherein the focusing area comprises at least one focusing point for focusing the target object, and the focusing information comprises the position information of the at least one focusing point.
Therefore, in the embodiment of the present application, the information of the motion trail of the target object in the preset range may include the current area of the target object and the historical area of the target object, and the current area of the target object may be used as the focusing area, so that focusing on the target object is completed, and a clearer image may be shot.
In one possible implementation manner, before capturing the image in the preset range, the method may further include: acquiring exposure parameters; the capturing an image of a preset range may include: and shooting an image in a preset range according to the exposure parameters.
Therefore, in the embodiment of the application, the exposure parameters can be adjusted, so that shooting is completed through the exposure parameters, and a clear image is obtained.
In one possible implementation manner, the acquiring exposure parameters may include: and determining an exposure parameter according to the motion information, wherein the exposure parameter comprises exposure time, the motion information comprises the motion speed of the target object, and the exposure time and the motion speed of the target object are in negative correlation.
Therefore, in the embodiment of the application, the exposure time length can be determined by the movement speed of the target object, so that the exposure time length is matched with the movement speed of the target object, for example, the faster the movement speed is, the shorter the exposure time length is, the slower the movement speed is, and the longer the exposure time length is. Overexposure or underexposure and the like can be avoided, so that a clearer image can be shot later, and user experience is improved.
In one possible implementation manner, the acquiring exposure parameters may include: and determining exposure parameters according to the illumination intensity, wherein the exposure parameters comprise exposure time, and the magnitude of the illumination intensity in a preset range and the exposure time are in negative correlation.
Therefore, in the embodiment of the application, the exposure time can be determined according to the detected illumination intensity, when the illumination intensity is larger, the exposure time is shorter, and when the illumination intensity is smaller, the exposure time is longer, so that a proper amount of exposure can be ensured, and a clearer image can be shot.
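The two negative correlations described above (exposure duration versus motion speed, and exposure duration versus illumination intensity) can be illustrated with a toy mapping such as the following; the functional form, the constants and the clamping range are assumptions.

```python
def exposure_time(motion_speed, illumination, base=0.02, k_speed=1.0, k_light=1.0,
                  t_min=0.001, t_max=0.1):
    """Illustrative mapping only: the exposure duration (seconds) decreases as
    the motion speed of the target object or the illumination intensity
    increases, i.e. it is negatively correlated with both quantities."""
    t = base / (1.0 + k_speed * motion_speed + k_light * illumination)
    return max(t_min, min(t_max, t))
```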
In one possible embodiment, after capturing the image of the preset range, the method may further include: fusing the captured image of the preset range with the monitored motion information of the target object corresponding to the image, so as to obtain a target image of the preset range.
Therefore, in the embodiment of the application, while capturing an image, the motion condition of the target object in the preset range can be monitored, information of the corresponding motion of the target object in the image, such as the contour of the target object, the position of the target object in the preset range and the like, is obtained, and the captured image is enhanced by the information, so that a clearer target image is obtained.
In one possible implementation manner, the detecting the motion information of the target object within the preset range may include: monitoring the motion condition of the target object within the preset range by a dynamic vision sensor (DVS) to obtain the motion information.
Therefore, in the embodiment of the application, the object moving in the shooting range of the camera can be monitored by the DVS, so that accurate movement information can be obtained, and even if the target object is in a state of moving at a high speed, the movement information of the target object can be timely captured by the DVS.
In a twenty-first aspect, the present application further provides an image processing apparatus having a function of implementing the method of the nineteenth aspect or any one of the possible implementation manners of the nineteenth aspect, or having a function of implementing the method of the twentieth aspect or any one of the possible implementation manners of the twentieth aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a twenty-second aspect, an embodiment of the present application provides a graphical user interface GUI, wherein the graphical user interface is stored in an electronic device comprising a display screen, a memory, one or more processors configured to execute one or more computer programs stored in the memory, the graphical user interface comprising: responding to a triggering operation of shooting a target object, shooting an image of a preset range according to focusing information, displaying the image of the preset range, wherein the preset range is a camera shooting range, the focusing information comprises parameters for focusing the target object in the preset range, the focusing information is determined according to movement information of the target object, and the movement information comprises information of a movement track of the target object when moving in the preset range.
The advantageous effects produced by the twenty-second aspect and any possible implementation manner of the twenty-second aspect may be referred to the description of the twentieth aspect and any possible implementation manner of the twentieth aspect.
In one possible implementation, the graphical user interface may further comprise: and responding to the motion information to predict the motion trail of the target object within a preset time length to obtain a predicted area, wherein the predicted area is the area where the target object is located within the preset time length, which is obtained by prediction, and determining the focusing area according to the predicted area, wherein the focusing area is displayed in the display screen, the focusing area comprises at least one focusing point for focusing the target object, and the focusing information comprises the position information of at least one focusing point.
In one possible implementation, the graphical user interface may specifically include: if the predicted area meets the preset condition, responding to the determination of the focusing area according to the predicted area, and displaying the focusing area in the display screen; if the predicted area does not meet the preset condition, the predicted area is obtained in response to predicting the motion trail of the target object within the preset time period again according to the motion information, the focusing area is determined according to the new predicted area, and the focusing area is displayed in the display screen.
In a possible embodiment, the motion information further includes at least one of a motion direction and a motion speed of the target object; the graphical user interface may specifically include: and predicting the motion trail of the target object in a preset time period according to the motion trail of the target object in a preset range and the motion direction and/or the motion speed to obtain the prediction area, and displaying the prediction area in the display screen.
In one possible implementation, the graphical user interface may specifically include: and responding to a motion track when the target object moves within a preset range, the motion direction and/or the motion speed, fitting a change function of the central point of the area where the target object is located along with the change of time, calculating a prediction central point according to the change function, wherein the prediction central point is the central point of the area where the target object is predicted, obtaining the prediction area according to the prediction central point, and displaying the prediction area in a display screen.
In a possible implementation manner, the image of the preset range is taken by an RGB camera, and the graphical user interface may specifically include: in response to focusing with at least one of the plurality of focusing points of the RGB camera that has the smallest norm distance from the central point of the focusing area as the focusing point, displaying, on the display screen, the image captured after focusing based on the at least one point.
In one possible implementation manner, the motion information includes a current area of the target object, and the graphical user interface specifically may include: in response to the current region of the target object being the focusing region, the focusing region includes at least one focus point for focusing the target object, the focusing information includes position information of the at least one focus point, and the focusing region is displayed in the display screen.
In one possible implementation, the graphical user interface may further comprise: and in response to the monitored information of the movement of the target object corresponding to the image, fusing the images in the preset range to obtain a target image in the preset range, and displaying the target image in the display screen.
In one possible implementation, the motion information is obtained by monitoring the motion condition of the target object within the preset range through a dynamic vision sensor DVS.
In one possible implementation, the graphical user interface may specifically include: acquiring exposure parameters before shooting the image in the preset range, and displaying the exposure parameters in a display screen; and responding to the image of the preset range shot according to the exposure parameters, and displaying the image of the preset range shot according to the exposure parameters in a display screen.
In one possible implementation manner, the exposure parameter is determined according to the motion information, and the exposure parameter includes an exposure time period, where the exposure time period has a negative correlation with the motion speed of the target object.
In one possible implementation manner, the exposure parameter is determined according to illumination intensity, the illumination intensity can be illumination intensity detected by a camera or illumination intensity detected by a motion sensor, the exposure parameter comprises exposure duration, and the magnitude of the illumination intensity in the preset range is in negative correlation with the exposure duration.
In a twenty-third aspect, the present application provides an image processing method, including: first, an event stream and one frame of RGB image (which may be referred to as a first RGB image) are acquired by a camera equipped with a motion sensor (e.g., DVS) and an RGB sensor, respectively, wherein the acquired event stream includes at least one frame of event image, each frame of event image in the at least one frame of event image is generated from motion trajectory information when a target object (i.e., a moving object) moves within a monitoring range of the motion sensor, and the first RGB image is a superposition of photographed scenes at each time captured by the camera during an exposure period. After the event stream and the first RGB image are acquired, a mask may be constructed according to the event stream, where the mask is used to determine a motion area of each frame of the event image in the event stream, that is, to determine a position of a moving object in the RGB image. After the event stream, the first RGB image and the mask are obtained according to the above steps, a second RGB image, which is an RGB image of the removal target object, may be obtained according to the event stream, the first RGB image and the mask.
In the above embodiment of the present application, the moving object may be removed based on only one RGB image and event stream, so as to obtain an RGB image without moving object.
In one possible implementation, before building the mask from the event stream, the method may further include: when the motion sensor detects, at a first moment, that an abrupt motion change occurs within the monitoring range, triggering the camera to capture a third RGB image. In this case, the obtaining of the second RGB image from the event stream, the first RGB image and the mask may be: obtaining the second RGB image according to the event stream, the first RGB image, the third RGB image and the mask.
In the above embodiment of the present application, whether an abrupt motion change occurs can be determined from the motion data acquired by the motion sensor. When an abrupt motion change exists, the camera is triggered to capture a third RGB image; an event stream and one first RGB image are then obtained in a similar manner, a mask is constructed according to the event stream, and finally a second RGB image without the moving foreground is obtained according to the event stream, the first RGB image, the third RGB image and the mask. Because the third RGB image is captured automatically by the camera when the abrupt motion change occurs, the sensitivity is high: a frame can be obtained as soon as the user perceives that the moving object changes, and a better removal effect on the moving object can be achieved based on the third RGB image and the first RGB image.
In one possible implementation, the motion sensor detecting an abrupt motion change within the monitoring range at the first moment includes: within the monitoring range, the overlap between the generation area of the first event stream acquired by the motion sensor at the first moment and the generation area of the second event stream acquired by the motion sensor at a second moment is smaller than a preset value.
In the above embodiments of the present application, the determination conditions for motion mutation are specifically described, and the feasibility is provided.
In one possible implementation, the mask may be constructed from the event stream as follows: first, the monitoring range of the motion sensor is divided into a plurality of preset neighborhoods (each denoted as a neighborhood k); then, for each neighborhood k, if the number of events of the event stream within the preset duration Δt exceeds a threshold P, the neighborhood is determined to be a motion region and may be marked as 0; if the number of events within the preset duration Δt does not exceed the threshold P, the neighborhood is determined to be a background region and may be marked as 1.
In the above embodiments of the present application, a method for constructing a mask is specifically described, which is simple and easy to operate.
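A sketch of this neighborhood-counting construction is given below; the event tuple layout (t, x, y, polarity), the grid size k and the threshold p are assumptions.

```python
import numpy as np

def build_mask(events, height, width, t0, delta_t, k=8, p=10):
    """Divide the monitoring range into k x k neighborhoods; a neighborhood
    whose event count inside the window [t0, t0 + delta_t) exceeds the
    threshold p (the threshold P above) is marked 0 (motion region),
    otherwise 1 (background region)."""
    counts = np.zeros((k, k), dtype=np.int64)
    for t, x, y, _ in events:                       # events assumed as (t, x, y, polarity)
        if t0 <= t < t0 + delta_t:
            counts[min(y * k // height, k - 1), min(x * k // width, k - 1)] += 1
    return np.where(counts > p, 0, 1).astype(np.uint8)   # 0 = motion, 1 = background
```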
In a twenty-fourth aspect, the present application further provides an image processing apparatus having a function of implementing the method of the twenty-third aspect or any one of the possible implementation manners of the twenty-third aspect. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
In a twenty-fifth aspect, the present application provides a pose estimation method applied to a simultaneous localization and mapping (SLAM) scene. The method comprises the following steps: the terminal acquires a first event image and a first target image, wherein the first event image is aligned with the first target image in time sequence, and the first target image comprises an RGB image or a depth image. The first event image is an image representing a motion trajectory of the target object when the target object moves within a detection range of a motion sensor. The terminal determines the integration time of the first event image. If the integration time is less than a first threshold, the terminal determines that pose estimation is not performed with the first target image, and the terminal performs pose estimation according to the first event image.
In this scheme, when the integration time of the event image is smaller than the threshold, the terminal determines that the current scene is one in which it is difficult for the RGB camera to collect effective environment information, and therefore does not perform pose estimation with the poor-quality RGB image, which improves the precision of pose estimation.
Optionally, in one possible implementation manner, the method further includes: determining the acquisition time of the first event image and the acquisition time of the first target image; and determining that the first event image is aligned with the first target image in time sequence according to the time difference between the acquisition time of the first target image and the acquisition time of the first event image is smaller than a second threshold value. The second threshold may be determined according to the accuracy of SLAM and the frequency of RGB image acquisition by the RGB camera, for example, the second threshold may be 5 ms or 10 ms.
Optionally, in one possible implementation manner, the acquiring the first event image includes: acquiring N continuous DVS events; integrating the N consecutive DVS events into a first event image; the method further comprises the steps of: and determining the acquisition time of the first event image according to the acquisition time of the N continuous DVS events.
Optionally, in one possible implementation manner, the determining the integration time of the first event image includes: determining N consecutive DVS events for integration into the first event image; and determining the integration time of the first event image according to the acquisition time of the first DVS event and the last DVS event in the N continuous DVS events. Since the first event image is obtained by integrating N consecutive DVS events, the terminal may determine the acquisition time of the first event image according to the acquisition time corresponding to the N consecutive DVS events, that is, determine the acquisition time of the first event image as a time period from when the first DVS event to when the last DVS event is acquired in the N consecutive DVS events.
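The integration-time computation, the time-sequence alignment test and the first-threshold decision described in this aspect can be illustrated as follows; the data layouts and the 5 ms default are only examples, and the function names are assumptions.

```python
def integration_time(dvs_events):
    """Integration time of an event image built from N consecutive DVS events:
    the span between the first and the last event timestamp. Events are assumed
    to be time-sorted tuples whose first element is the timestamp (seconds)."""
    return dvs_events[-1][0] - dvs_events[0][0]

def is_time_aligned(event_image_time, target_image_time, second_threshold=0.005):
    """The event image and the RGB/depth image are treated as aligned in time
    sequence when their acquisition times differ by less than the second
    threshold (e.g. 5 ms)."""
    return abs(event_image_time - target_image_time) < second_threshold

def use_target_image_for_pose(integration_time_s, first_threshold):
    """If the integration time is below the first threshold, the RGB/depth
    frame is likely of poor quality in such a scene and is skipped."""
    return integration_time_s >= first_threshold
```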
Optionally, in one possible implementation manner, the method further includes: acquiring a second event image, wherein the second event image is an image representing a motion track of the target object when the target object moves within the detection range of the motion sensor, and the time period over which the motion sensor collects the data forming the first event image is different from the time period over which it collects the data forming the second event image. If no RGB image aligned with the second event image in time sequence exists, it is determined that the second event image has no RGB image for jointly performing pose estimation, and pose estimation is performed according to the second event image.
Optionally, in a possible implementation manner, before the determining the pose according to the second event image, the method further includes: if the second event image is determined to have time sequence aligned Inertial Measurement Unit (IMU) data, determining a pose according to the second event image and IMU data corresponding to the second event image; if it is determined that the second event image does not have time-aligned inertial measurement unit IMU data, a pose is determined from the second event image only.
Optionally, in one possible implementation manner, the method further includes: acquiring a second target image, wherein the second target image comprises an RGB image or a depth image; if no event image aligned with the second target image time sequence exists, determining that the second target image does not have an event image for jointly executing pose estimation; and determining the pose according to the second target image.
Optionally, in one possible implementation manner, the method further includes: and executing loop detection according to the first event image and a dictionary, wherein the dictionary is constructed based on the event image. That is, before performing loop-back detection, the terminal may construct a dictionary based on the event image in advance so that loop-back detection can be performed based on the dictionary in the course of performing loop-back detection.
Optionally, in one possible implementation manner, the method further includes: a plurality of event images are acquired, wherein the event images are event images used for training, and the event images can be event images shot by a terminal in different scenes. The visual features of the plurality of event images are acquired, and the visual features may include, for example, features such as texture, pattern, or gray statistics of the images. Clustering the visual features through a clustering algorithm to obtain clustered visual features, wherein the clustered visual features have corresponding descriptors. By clustering visual features, similar visual features can be grouped into a class to facilitate subsequent execution of matching of visual features. And finally, constructing the dictionary according to the clustered visual characteristics.
Optionally, in a possible implementation manner, the performing loop detection according to the first event image and the dictionary includes: determining a descriptor of the first event image; determining visual features corresponding to descriptors of the first event image in the dictionary; determining a bag-of-word vector corresponding to the first event image based on the visual features; and determining the similarity between the bag-of-words vector corresponding to the first event image and the bag-of-words vectors of other event images to determine the event image matched with the first event image.
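The loop-detection flow can be illustrated with the following sketch, in which descriptors are quantized against the clustered visual features (the dictionary) and frames are compared by the similarity of their bag-of-words vectors; nearest-word quantization and a cosine-similarity threshold are assumptions, as are the function names.

```python
import numpy as np

def bow_vector(descriptors, vocabulary):
    """Quantize the descriptors of one event image (shape (M, D)) against the
    clustered visual features (vocabulary, shape (K, D)) and return a
    normalized bag-of-words histogram."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)                    # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def loop_candidates(query_bow, keyframe_bows, sim_threshold=0.8):
    """Return indices of key frames whose bag-of-words vectors are similar
    enough to the query; vectors are unit-normalized, so the dot product is
    the cosine similarity. The threshold is an assumption."""
    sims = [float(np.dot(query_bow, kf)) for kf in keyframe_bows]
    return [i for i, s in enumerate(sims) if s > sim_threshold]
```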
In a twenty-sixth aspect, the present application provides a key frame selection method, including: acquiring an event image; determining first information of the event image, wherein the first information comprises events and/or features in the event image; and if the event image is determined to at least meet the first condition based on the first information, determining the event image as a key frame, wherein the first condition is related to the number of events and/or the number of features.
In this scheme, whether the current event image is a key frame is judged from information such as the number of events, the event distribution, the number of features and/or the feature distribution in the event image, so that key frames can be selected quickly with a small amount of computation, meeting the need for fast key-frame selection in scenarios such as video analysis, video encoding and decoding, or security monitoring.
Optionally, in one possible implementation manner, the first condition includes: one or more of the number of events in the event image being greater than a first threshold, the number of event effective areas in the event image being greater than a second threshold, the number of features in the event image being greater than a third threshold, and the feature effective areas in the event image being greater than a fourth threshold.
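A compact sketch of this first-condition check is given below; which of the four quantities are actually tested, and their thresholds, are design choices rather than fixed requirements.

```python
def meets_first_condition(stats, thresholds):
    """First-condition check for key-frame selection. `stats` and `thresholds`
    are dictionaries using any subset of the keys below; the event image is a
    key-frame candidate if every provided quantity exceeds its threshold."""
    keys = ("num_events", "num_event_regions", "num_features", "num_feature_regions")
    return all(stats[k] > thresholds[k] for k in keys if k in thresholds)
```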
Optionally, in one possible implementation manner, the method further includes: acquiring a depth image aligned with the event image time sequence; and if the event image is determined to at least meet the first condition based on the first information, determining that the event image and the depth image are key frames.
Optionally, in one possible implementation manner, the method further includes: acquiring RGB images aligned with the event image time sequence; acquiring the feature quantity and/or the feature effective area of the RGB image; and if the event image is determined to at least meet the first condition based on the first information, and the number of the features of the RGB image is larger than a fifth threshold value and/or the number of the feature effective areas of the RGB image is larger than a sixth threshold value, determining the event image and the RGB image as key frames.
Optionally, in one possible implementation manner, if it is determined that the event image at least meets a first condition based on the first information, determining that the event image is a key frame includes: determining second information of the event image if the event image at least meets the first condition based on the first information, wherein the second information comprises motion characteristics and/or pose characteristics in the event image; and if the event image is determined to meet at least a second condition based on the second information, determining that the event image is a key frame, wherein the second condition is related to the motion variation and/or the pose variation.
Optionally, in one possible implementation manner, the method further includes: determining a definition and/or brightness consistency index of the event image; and if the event image at least meets the second condition based on the second information, and the definition of the event image is larger than a definition threshold value and/or the brightness consistency index of the event image is larger than a preset index threshold value, determining the event image as a key frame.
Optionally, in one possible implementation manner, the determining a brightness consistency index of the event image includes: if the pixels in the event image represent the polarity of the light intensity change, calculating the absolute value of the difference value between the event number of the event image and the event number of the adjacent key frames, and dividing the absolute value by the pixel number of the event image to obtain a brightness consistency index of the event image; if the pixels in the event image represent light intensity, the event image and the adjacent key frames are subjected to pixel-by-pixel difference, the absolute value of the difference value is calculated, summation operation is carried out on the absolute value corresponding to each group of pixels, and the obtained summation result is divided by the number of pixels, so that the brightness consistency index of the event image is obtained.
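The two cases above translate directly into the following sketch; treating any non-zero pixel as an event in the polarity case is an assumption about the data layout.

```python
import numpy as np

def brightness_consistency(event_image, neighbor_keyframe, polarity_mode):
    """Brightness consistency index as described above. In polarity mode the
    index is |N_events - N_events_keyframe| / num_pixels; in intensity mode it
    is the mean absolute pixel-wise difference between the two images."""
    num_pixels = event_image.size
    if polarity_mode:
        n1 = np.count_nonzero(event_image)
        n2 = np.count_nonzero(neighbor_keyframe)
        return abs(n1 - n2) / num_pixels
    diff = np.abs(event_image.astype(np.float64) - neighbor_keyframe.astype(np.float64))
    return diff.sum() / num_pixels
```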
Optionally, in one possible implementation manner, the method further includes: acquiring RGB images aligned with the event image time sequence; determining a definition and/or brightness consistency index of the RGB image; and if the event image at least meets a second condition based on the second information, and the definition of the RGB image is larger than a definition threshold value and/or the brightness consistency index of the RGB image is larger than a preset index threshold value, determining the event image and the RGB image as key frames.
Optionally, in one possible implementation, the second condition includes: one or more of the distance between the event image and the last key frame exceeds a preset distance value, the rotation angle between the event image and the last key frame exceeds a preset angle value, and the distance between the event image and the last key frame exceeds a preset distance value, and the rotation angle between the event image and the last key frame exceeds a preset angle value.
In a twenty-seventh aspect, the present application provides a pose estimation method, including: acquiring a first event image and a target image corresponding to the first event image, wherein the first event image and the target image capture the same environment information, and the target image comprises a depth image or an RGB image; determining a first motion region in the first event image; determining a corresponding second motion region in the target image according to the first motion region; and estimating the pose according to the second motion region in the target image.
In this scheme, the dynamic region in the scene is captured by the event image, and the pose is determined based on this region, so that the pose information can be determined accurately.
Optionally, in one possible implementation manner, the determining the first motion area in the first event image includes: if the dynamic vision sensor DVS for acquiring the first event image is static, acquiring a pixel point with an event response in the first event image; and determining the first motion area according to the pixel points with the event responses.
Optionally, in one possible implementation manner, the determining the first motion area according to the pixel points with the event response includes: determining an outline formed by the pixel points with an event response in the first event image; and if the area surrounded by the outline is larger than a first threshold value, determining the area surrounded by the outline as a first motion area.
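For the static-DVS case, the outline-and-area test could be sketched with OpenCV as follows; this is a non-authoritative illustration that assumes the event image is a 2-D array in which non-zero pixels have an event response.

```python
import cv2
import numpy as np

def first_motion_regions(event_image: np.ndarray, first_threshold: float):
    # Binary mask of pixels with an event response.
    mask = (event_image != 0).astype(np.uint8)
    # Outlines formed by the responding pixels.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only outlines whose enclosed area exceeds the first threshold.
    return [c for c in contours if cv2.contourArea(c) > first_threshold]
```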
Optionally, in one possible implementation manner, the determining the first motion area in the first event image includes: if the DVS for acquiring the first event image is moving, acquiring a second event image, wherein the second event image is the previous frame event image of the first event image; calculating the displacement magnitude and the displacement direction of the pixels in the first event image relative to the second event image; and if the displacement direction of a pixel in the first event image is different from the displacement direction of surrounding pixels, or the difference between the displacement magnitude of the pixel in the first event image and the displacement magnitude of the surrounding pixels is larger than a second threshold value, determining that the pixel belongs to a first motion area.
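For the moving-DVS case, the per-pixel displacement comparison could be approximated as in the sketch below; here dense Farneback optical flow stands in for whatever displacement estimator the implementation actually uses, the "surrounding pixels" are taken as a local window average, and both event images are assumed to be 8-bit single-channel frames.

```python
import cv2
import numpy as np

def moving_pixel_mask(prev_event_img: np.ndarray, cur_event_img: np.ndarray,
                      second_threshold: float, angle_threshold_deg: float = 90.0):
    # Per-pixel displacement of the current event image relative to the previous one.
    flow = cv2.calcOpticalFlowFarneback(prev_event_img, cur_event_img, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    fx = np.ascontiguousarray(flow[..., 0])
    fy = np.ascontiguousarray(flow[..., 1])
    mag, ang = cv2.cartToPolar(fx, fy, angleInDegrees=True)
    # Displacement of the "surrounding pixels": average flow in a local window.
    avg_mag, avg_ang = cv2.cartToPolar(cv2.blur(fx, (15, 15)),
                                       cv2.blur(fy, (15, 15)), angleInDegrees=True)
    # Direction difference, handling the 0/360 degree wrap-around.
    ang_diff = np.abs(ang - avg_ang)
    ang_diff = np.minimum(ang_diff, 360.0 - ang_diff)
    # A pixel belongs to the first motion area if its direction differs from its
    # surroundings, or its displacement magnitude differs by more than the threshold.
    return (ang_diff > angle_threshold_deg) | (np.abs(mag - avg_mag) > second_threshold)
```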
Optionally, in one possible implementation manner, the method further includes: determining a corresponding static region in the image according to the first motion region; and determining the pose according to the static area in the image.
In a twenty-eighth aspect, the present application further provides a data processing apparatus, where the data processing apparatus has a function of implementing the method according to the twenty-fifth aspect or any possible implementation of the twenty-fifth aspect, or the method according to the twenty-sixth aspect or any possible implementation of the twenty-sixth aspect, or the method according to the twenty-seventh aspect or any possible implementation of the twenty-seventh aspect. The functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In a twenty-ninth aspect, embodiments of the present application provide an apparatus, including: a processor and a memory, wherein the processor and the memory are interconnected by a wire, the processor invoking program code in the memory for performing the processing-related functions in the method as set forth in any of the first through twenty-seventh aspects above. Alternatively, the device may be a chip.
In a thirtieth aspect, the present application provides an electronic device comprising: a display module, a processing module, and a storage module.
The display module is configured to display a graphical user interface of an application program stored in the storage module, where the graphical user interface may be any one of the graphical user interfaces provided in this application.
In a thirty-first aspect, an embodiment of the present application provides an apparatus, which may also be referred to as a digital processing chip or chip, the chip including a processing unit and a communication interface, the processing unit obtaining program instructions through the communication interface, the program instructions being executed by the processing unit, the processing unit being configured to perform a processing-related function as in any of the optional embodiments of the first to twenty-seventh aspects described above.
In a thirty-second aspect, embodiments of the present application provide a computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method of any of the optional embodiments of the first to twenty-seventh aspects described above.
In a thirty-third aspect, embodiments of the present application provide a computer program product comprising instructions that, when run on a computer, cause the computer to perform the method of any of the optional embodiments of the first to twenty-seventh aspects described above.
Drawings
FIG. 1A is a schematic diagram of a system architecture provided herein;
fig. 1B is a schematic structural diagram of an electronic device provided in the present application;
FIG. 2 is a schematic diagram of another system architecture provided herein;
FIG. 3-a is a schematic diagram of the read data amount versus time for an asynchronous read mode based on event streams;
FIG. 3-b is a diagram of the read data amount versus time for a synchronous read mode based on frame scanning;
FIG. 4-a is a block diagram of a vision sensor provided herein;
FIG. 4-b is a block diagram of another vision sensor provided herein;
FIG. 5 is a schematic diagram of the principle of a synchronous read mode based on frame scanning and an asynchronous read mode based on event streams according to an embodiment of the present application;
FIG. 6-a is a schematic diagram of a vision sensor operating in a frame-scan based read mode in accordance with an embodiment of the present application;
FIG. 6-b is a schematic diagram of a vision sensor operating in an event stream based read mode, according to an embodiment of the present application;
FIG. 6-c is a schematic diagram of a vision sensor operating in an event stream based read mode, according to an embodiment of the present application;
FIG. 6-d is a schematic diagram of a vision sensor operating in a frame-scan based read mode, according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for operating a vision sensor chip, according to a possible embodiment of the present application;
FIG. 8 is a block diagram of a control circuit provided herein;
FIG. 9 is a block diagram of an electronic device provided herein;
FIG. 10 is a diagram of a single data read mode and the data amount over time of an adaptively switched read mode according to a possible embodiment of the present application;
FIG. 11 is a schematic diagram of a pixel circuit provided herein;
FIG. 11-a is a schematic diagram showing an event by light intensity information and an event by polarity information;
FIG. 12-a is a schematic diagram showing a structure of a data format control unit in a read circuit according to the present application;
FIG. 12-b is a schematic diagram showing another configuration of the data format control unit in the read circuit of the present application;
FIG. 13 is a block diagram of another control circuit provided herein;
FIG. 14 is a block diagram of another control circuit provided herein;
FIG. 15 is a block diagram of another control circuit provided herein;
FIG. 16 is a block diagram of another control circuit provided herein;
FIG. 17 is a block diagram of another control circuit provided herein;
FIG. 18 is a distinguishing schematic diagram of a single event representation from an adaptive transition event representation provided in accordance with the present application;
FIG. 19 is a block diagram of another electronic device provided herein;
FIG. 20 is a flowchart of a method for operating a vision sensor chip, according to a possible embodiment of the present application;
FIG. 21 is a schematic diagram of another pixel circuit provided herein;
FIG. 22 is a schematic flow chart of an encoding method provided in the present application;
FIG. 23 is a block diagram of another vision sensor provided herein;
FIG. 24 is a schematic illustration of region division of a pixel array;
FIG. 25 is a block diagram of another control circuit provided herein;
FIG. 26 is a block diagram of another electronic device provided herein;
FIG. 27 is a schematic diagram of a binary data stream;
FIG. 28 is a flowchart of a method for operating a vision sensor chip, according to a possible embodiment of the present application;
FIG. 29-a is a block diagram of another vision sensor provided herein;
FIG. 29-b is a block diagram of another vision sensor provided herein;
FIG. 29-c is a block diagram of another vision sensor provided herein;
FIG. 30 is a schematic diagram of another pixel circuit provided herein;
FIG. 31 is a block diagram of a third encoding unit provided herein;
FIG. 32 is a schematic flow chart of another encoding method provided in the present application;
FIG. 33 is a block diagram of another electronic device provided herein;
FIG. 34 is a flowchart of a method for operating a vision sensor chip, according to a possible embodiment of the present application;
FIG. 35 is a schematic illustration of an event provided herein;
FIG. 36 is a schematic illustration of events at a certain time provided herein;
FIG. 37 is a schematic view of a movement region provided herein;
FIG. 38 is a flow chart of an image processing method provided in the present application;
FIG. 39 is a flowchart of another image processing method provided in the present application;
FIG. 40 is a flowchart of another image processing method provided in the present application;
FIG. 41 is a schematic illustration of an event image provided herein;
FIG. 42 is a flow chart of another image processing method provided in the present application;
FIG. 43 is a flowchart of another image processing method provided in the present application;
FIG. 44 is a flowchart of another image processing method provided in the present application;
FIG. 45 is a flow chart of a method of image processing provided herein;
FIG. 46A is a schematic illustration of another event image provided herein;
FIG. 46B is a schematic illustration of another event image provided herein;
FIG. 47A is a schematic illustration of another event image provided herein;
FIG. 47B is a schematic illustration of another event image provided herein;
FIG. 48 is a flow chart of another method of image processing provided herein;
FIG. 49 is a flow chart of another method of image processing provided herein;
FIG. 50 is a schematic illustration of another event image provided herein;
FIG. 51 is a schematic view of a reconstructed image provided herein;
FIG. 52 is a flowchart of an image processing method provided in the present application;
FIG. 53 is a schematic view of a method for fitting a motion profile provided herein;
FIG. 54 is a schematic diagram of one manner of determining a focal point provided herein;
FIG. 55 is a schematic diagram of one manner of determining a prediction center provided herein;
FIG. 56 is a flowchart of another image processing method provided in the present application;
fig. 57 is a schematic view of a shooting range provided in the present application;
FIG. 58 is a schematic view of a prediction region provided herein;
FIG. 59 is a schematic view of a focusing area provided herein;
FIG. 60 is a flowchart of another image processing method provided in the present application;
FIG. 61 is a schematic diagram of an image enhancement mode provided in the present application;
FIG. 62 is a flow chart of another image processing method provided in the present application;
FIG. 63 is a flowchart of another image processing method provided in the present application;
FIG. 64 is a schematic view of a scenario employed herein;
FIG. 65 is a schematic view of another scenario of the application of the present application;
FIG. 66 is a schematic view of a GUI provided herein;
FIG. 67 is a schematic view of another GUI provided herein;
FIG. 68 is a schematic diagram of another GUI provided herein;
FIG. 69A is a schematic view of another GUI provided herein;
FIG. 69B is a schematic view of another GUI provided herein;
FIG. 69C is a schematic view of another GUI provided herein;
FIG. 70 is a schematic view of another GUI provided herein;
FIG. 71 is a schematic view of another GUI provided herein;
FIG. 72A is a schematic diagram of another GUI provided herein;
FIG. 72B is a schematic diagram of another GUI provided herein;
FIG. 73 is a flow chart of another image processing method provided in the present application;
FIG. 74 is a schematic view of an RGB image with low jitter provided by the present application;
FIG. 75 is a schematic view of an RGB image with high dithering level provided in the present application;
FIG. 76 is a schematic view of an RGB image in a high light ratio scene provided herein;
FIG. 77 is a schematic illustration of another event image provided herein;
FIG. 78 is a schematic view of an RGB image provided herein;
FIG. 79 is a schematic view of another RGB image provided herein;
FIG. 80 is a schematic diagram of another GUI provided herein;
FIG. 81 is a schematic diagram of a relationship between a photosensitive unit and pixel values provided herein;
FIG. 82 is a flowchart of an image processing method provided in the present application;
FIG. 83 is a schematic diagram of an event stream provided herein;
FIG. 84 is a schematic diagram of a blurred image obtained after exposure and superposition of multiple shooting scenes provided in the present application;
FIG. 85 is a schematic illustration of a mask provided herein;
FIG. 86 is a schematic diagram of a build mask provided herein;
FIG. 87 is a diagram showing an effect of removing a moving object from an image I to obtain an image I';
FIG. 88 is a schematic flow chart of the process of removing a moving object from an image I to obtain an image I' according to the present application;
FIG. 89 is a schematic view of a moving object with small motion during photographing provided in the present application;
FIG. 90 is a schematic diagram of a trigger camera according to the present disclosure capturing a third RGB image;
Fig. 91 is a schematic diagram of an image B_k captured by triggering the camera based on an abrupt motion change and an image I actively captured by the user within a certain exposure time, provided in the present application;
FIG. 92 is a schematic flow chart of a process for obtaining a second RGB image without moving object based on a frame of the first RGB image and the event stream E;
FIG. 93 is a schematic flow chart of obtaining a second RGB image without moving object based on a frame of the first RGB image, the third RGB image and the event stream E;
FIG. 94A is another GUI schematic diagram provided herein;
FIG. 94B is another GUI schematic diagram provided herein;
fig. 95 is a schematic diagram comparing a scene photographed by a conventional camera with the same scene captured by a DVS, provided in the present application;
Fig. 96 is a schematic diagram comparing a scene photographed by a conventional camera with the same scene captured by a DVS, provided in the present application;
FIG. 97 is a schematic view of an outdoor navigation system using DVS provided herein;
FIG. 98a is a schematic view of a station navigation system using DVS provided in the present application;
FIG. 98b is a view point navigation schematic diagram using DVS provided in the present application;
FIG. 99 is a schematic diagram of a mall navigation system using DVS according to the present application;
FIG. 100 is a schematic flow chart of a SLAM execution provided in the present application;
Fig. 101 is a schematic flow chart of a pose estimation method 10100 provided in the present application;
FIG. 102 is a schematic illustration of a DVS event integration into an event image provided herein;
fig. 103 is a schematic flow chart of a key frame selection method 10300 provided in the present application;
FIG. 104 is a schematic view of region division of an event image provided herein;
fig. 105 is a flowchart of a key frame selection method 10500 provided in the present application;
fig. 106 is a schematic flow chart of a pose estimation method 10600 provided in the present application;
FIG. 107 is a schematic flow chart of performing pose estimation based on stationary region of image provided by the present application;
FIG. 108a is a schematic flow chart of performing pose estimation based on motion areas of an image provided by the present application;
FIG. 108b is a flowchart illustrating a pose estimation performed based on an overall region of an image provided herein;
FIG. 109 is a schematic view of an AR/VR glasses structure provided herein;
FIG. 110 is a schematic view of a gaze-aware structure provided herein;
FIG. 111 is a schematic diagram of a network architecture provided herein;
FIG. 112 is a schematic diagram of an image processing apparatus according to the present application;
Fig. 113 is a schematic structural view of another image processing apparatus provided in the present application;
FIG. 114 is a schematic diagram of another image processing apparatus provided in the present application;
FIG. 115 is a schematic view of another image processing apparatus provided herein;
FIG. 116 is a schematic diagram of another image processing apparatus provided in the present application;
FIG. 117 is a schematic diagram of another image processing apparatus provided herein;
FIG. 118 is a schematic view of another image processing apparatus according to the present application;
FIG. 119 is a schematic diagram of another configuration of a data processing apparatus provided herein;
FIG. 120 is a schematic diagram of another configuration of a data processing apparatus provided herein;
fig. 121 is a schematic diagram of another structure of the electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments herein without inventive effort shall fall within the protection scope of the present application.
The electronic device, the system architecture, the method flow and the like provided by the application are described in detail from different angles.
1. Electronic equipment
The method provided in this application can be applied to various electronic devices; equivalently, such electronic devices can be used to execute the method provided in this application. These electronic devices can be applied to shooting-related scenarios, such as photography, security surveillance, autonomous driving, and unmanned aerial vehicle photography.
The electronic device in the present application may include, but is not limited to: smart mobile phones, televisions, tablet computers, wrist bands, head mounted display devices (Head Mount Display, HMD), augmented reality (augmented reality, AR) devices, mixed reality (mixed reality, MR) devices, cellular phones, smart phones, personal digital assistants (personal digital assistant, PDA), in-vehicle electronics, laptop computers, personal computers (personal computer, PC), monitoring devices, robots, in-vehicle terminals, autonomous vehicles, and the like. Of course, in the following embodiments, there is no limitation on the specific form of the electronic device.
Illustratively, the architecture of the electronic device application provided by the present application is shown in fig. 1A.
The electronic devices described in fig. 1A, such as the car, the mobile phone, the AR/VR glasses, the security monitoring device, the camera, or other smart home terminals, may access the cloud platform through a wired or wireless network. A server is provided in the cloud platform; the server may be a centralized server or a distributed server, and the electronic device may communicate with the server of the cloud platform through the wired or wireless network to implement data transmission. For example, after the electronic device collects data, the data may be saved or backed up on the cloud platform to prevent data loss.
The electronic device may access an access point or a base station in a wireless or wired manner, and thereby access the cloud platform. For example, the access point may be a base station, and the electronic device is provided with a SIM card through which network authentication of an operator is performed, so as to access the wireless network. Alternatively, the access point may be a router, and the electronic device accesses the router through a 2.4 GHz or 5 GHz wireless network, thereby accessing the cloud platform through the router.
In addition, the electronic device may perform data processing on its own, or may perform data processing in cooperation with the cloud; the specific arrangement can be adjusted according to the actual application scenario. For example, a DVS may be disposed in the electronic device, where the DVS may work cooperatively with a camera or other sensors in the electronic device, or may work independently. The data collected by the DVS or other sensors may be processed by a processor disposed in the DVS, by a processor disposed in the electronic device, or in cooperation with a cloud device.
The specific structure of the electronic device is exemplarily described below.
By way of example, referring to fig. 1B, a specific structure of the electronic device provided in the present application is exemplified below.
It should be noted that, the electronic device provided in the present application may include more or fewer components than those shown in fig. 1B, and the electronic device shown in fig. 1B is merely an exemplary illustration, and those skilled in the art may add or subtract components from the electronic device according to the requirements, which is not limited in this application.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, an image sensor 180N, etc. The image sensor 180N may include a separate color sensor 1801N (whose photosensitive units may be referred to as color sensor pixels, not shown in fig. 1B) and a separate motion sensor 1802N (whose photosensitive units may be referred to as motion sensor pixels, not shown in fig. 1B), or may use photosensitive units that combine a color sensor and a motion sensor.
It should be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, such that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display functionality of the electronic device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, and may also be used to transfer data between the electronic device 100 and a peripheral device. It can also be used to connect a headset and play audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present invention is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include, but are not limited to: fifth-generation mobile communication technology (5th-Generation, 5G) systems, global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), wireless fidelity (wireless fidelity, WiFi), near field communication (near field communication, NFC), FM (which may also be referred to as frequency modulation broadcast), ZigBee, radio frequency identification technology (radio frequency identification, RFID) and/or infrared (IR) technology, and the like. The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a BeiDou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS), etc.
In some embodiments, the electronic device 100 may also include a wired communication module (not shown in fig. 1B), or the mobile communication module 150 or the wireless communication module 160 herein may be replaced with a wired communication module (not shown in fig. 1B) that may enable the electronic device to communicate with other devices via a wired network. The wired network may include, but is not limited to, one or more of the following: optical transport network (optical transport network, OTN), synchronous digital hierarchy (synchronous digital hierarchy, SDH), passive optical network (passive optical network, PON), ethernet (Ethernet), or flexible Ethernet (FlexE), etc.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV (a camera outputting standard RGB images may also be referred to as an RGB camera or RGB sensor). In some embodiments, the electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a "horn," is used to convert audio electrical signals into sound signals. The electronic device 100 may listen to music, or to hands-free conversations, through the speaker 170A.
A receiver 170B, also referred to as an "earpiece", is used to convert the audio electrical signal into a sound signal. When the electronic device 100 is answering a telephone call or a voice message, voice may be received by placing the receiver 170B close to the human ear.
The microphone 170C, also referred to as a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak with the mouth close to the microphone 170C, inputting a sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, can implement a noise reduction function. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to implement sound signal collection, noise reduction, sound source identification, directional recording, and the like.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a capacitive pressure sensor comprising at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic apparatus 100 detects the touch operation intensity according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example: and executing an instruction for checking the short message when the touch operation with the touch operation intensity smaller than the first pressure threshold acts on the short message application icon. And executing an instruction for newly creating the short message when the touch operation with the touch operation intensity being greater than or equal to the first pressure threshold acts on the short message application icon.
The gyro sensor 180B may be used to determine a motion gesture of the electronic device 100. In some embodiments, the angular velocity of electronic device 100 about three axes (i.e., x, y, and z axes) may be determined by gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance to be compensated by the lens module according to the angle, and makes the lens counteract the shake of the electronic device 100 through the reverse motion, so as to realize anti-shake. The gyro sensor 180B may also be used for navigating, somatosensory game scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude from barometric pressure values measured by barometric pressure sensor 180C, aiding in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip cover using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. Features such as automatic unlocking upon flip opening are then set according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. The acceleration sensor may also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
A distance sensor 180F for measuring a distance. The electronic device 100 may measure the distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.
The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode. The electronic device 100 detects infrared reflected light from nearby objects using a photodiode. When sufficient reflected light is detected, it may be determined that there is an object in the vicinity of the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object in the vicinity of the electronic device 100. The electronic device 100 can detect that the user holds the electronic device 100 close to the ear by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense ambient light level. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 100 is in a pocket to prevent false touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to unlock the fingerprint, access the application lock, photograph the fingerprint, answer the incoming call, etc.
The temperature sensor 180J is for detecting temperature. In some embodiments, the electronic device 100 performs a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by temperature sensor 180J exceeds a threshold, electronic device 100 performs a reduction in the performance of a processor located in the vicinity of temperature sensor 180J in order to reduce power consumption to implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid the low temperature causing the electronic device 100 to be abnormally shut down. In other embodiments, when the temperature is below a further threshold, the electronic device 100 performs boosting of the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperatures.
The touch sensor 180K, also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a different location than the display 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, bone conduction sensor 180M may acquire a vibration signal of a human vocal tract vibrating bone pieces. The bone conduction sensor 180M may also contact the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, bone conduction sensor 180M may also be provided in a headset, in combination with an osteoinductive headset. The audio module 170 may analyze the voice signal based on the vibration signal of the sound portion vibration bone block obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor may analyze the heart rate information based on the blood pressure beat signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The image sensor 180N is a device that converts an optical image into an electronic signal, and is widely used in digital cameras and other electronic optical devices. It uses the photoelectric conversion function of a photoelectric device to convert the optical image on its photosensitive surface into an electrical signal proportional to that optical image. In contrast to light-sensitive elements for "point" light sources, such as photodiodes and phototransistors, an image sensor is a functional device that divides the light image on its light-receiving surface into many small units (i.e., pixels) and converts them into usable electrical signals, where each small unit corresponds to a photosensitive unit within the image sensor, which may also be referred to as a sensor pixel. Image sensors are classified into photoconductive camera tubes and solid-state image sensors. Compared with a photoconductive camera tube, a solid-state image sensor has the characteristics of small volume, light weight, high integration level, high resolution, low power consumption, long service life, low price, and the like. Depending on the device type, image sensors can be divided into two main categories: charge coupled devices (charge coupled device, CCD) and complementary metal-oxide semiconductor (complementary metal-oxide semiconductor, CMOS) devices; depending on the type of optical image captured, they can be classified into the color sensor 1801N and the motion sensor 1802N.
Specifically, the color sensor 1801N, including a conventional RGB image sensor, may be used to detect objects within the range captured by the camera, where each photosensitive unit corresponds to one image point in the image sensor. Different sensor manufacturers have different solutions for how to cover the color filters; the most common method is to cover the pixels with red, green, and blue (RGB) filters in a 1:2:1 ratio, so that four pixels form one color pixel (i.e., the red and blue filters each cover one pixel, and the remaining two pixels are covered by green filters); the reason for this ratio is that the human eye is more sensitive to green. After receiving illumination, each photosensitive unit generates a current corresponding to the light intensity, so the electrical signal directly output by the photosensitive unit is analog. The output analog electrical signal is converted into a digital signal, and all the resulting digital signals are output, in the form of a digital image matrix, to a dedicated DSP processing chip for processing. A conventional color sensor outputs a full image of the photographing region in a frame format.
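As a toy illustration of the 1:2:1 arrangement described above (not the actual DSP pipeline), the sketch below combines one 2x2 filter cell into a single color pixel value, assuming an R-G / G-B layout within the cell.

```python
import numpy as np

def rggb_cell_to_color_pixel(cell: np.ndarray) -> np.ndarray:
    # cell is a 2x2 block of raw photosensitive-unit readings laid out as
    #   R G
    #   G B
    r = cell[0, 0]
    g = (cell[0, 1] + cell[1, 0]) / 2.0  # two green samples -> the 1:2:1 ratio
    b = cell[1, 1]
    return np.array([r, g, b], dtype=np.float64)
```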
In particular, motion sensor 1802N may include a variety of different types of visual sensors, such as may include frame-based motion detection visual sensors (motion detection vision sensor, MDVS) and event-based motion detection visual sensors. The method can be used for detecting the moving object in the range shot by the camera and collecting the moving contour or the moving track of the moving object.
In one possible scenario, the motion sensor 1802N may include a motion detection (motion detection, MD) vision sensor, which is a type of vision sensor that detects motion information. The motion information originates from relative motion between the camera and the target, which may be camera motion, target motion, or both. Motion detection vision sensors include frame-based motion detection and event-based motion detection. Frame-based motion detection vision sensors require exposure integration and obtain motion information from frame differences. Event-based motion detection vision sensors do not require integration and obtain motion information through asynchronous event detection.
In one possible scenario, the motion sensor 1802N may include a motion detection vision sensor (MDVS), a dynamic vision sensor (dynamic vision sensor, DVS), an active pixel sensor (active pixel sensor, APS), an infrared sensor, a laser sensor, an inertial measurement unit (Inertial Measurement Unit, IMU), or the like. The DVS may specifically include sensors such as DAVIS (Dynamic and Active-pixel Vision Sensor), ATIS (Asynchronous Time-based Image Sensor), or CeleX. Drawing on the characteristics of biological vision, the DVS simulates a neuron with each pixel, which responds independently to the relative change in illumination intensity (hereinafter referred to as "light intensity"). For example, if the motion sensor is a DVS, when the relative change of the light intensity exceeds a threshold, the pixel outputs an event signal including the position of the pixel, a time stamp, and characteristic information of the light intensity. It should be understood that in the following embodiments of the present application, the mentioned motion information, dynamic data, dynamic images, etc. may be acquired by a motion sensor.
For example, the motion sensor 1802N may include an inertial measurement unit (Inertial Measurement Unit, IMU), which is a device that measures the angular velocity and acceleration of an object. An IMU is typically composed of three single-axis accelerometers and three single-axis gyroscopes, which measure the acceleration signal and the angular velocity signal of the object relative to the navigation coordinate system, respectively, from which the pose of the object is calculated. For example, the IMU described above may specifically include the gyro sensor 180B and the acceleration sensor 180E described above. The IMU has the advantage of a high acquisition frequency: the data acquisition frequency of an IMU can generally reach more than 100 Hz, and a consumer-grade IMU can capture data at up to 1600 Hz, so the IMU can give high-accuracy measurements over a short time.
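As a minimal sketch of how pose information can be derived from IMU samples (the single-axis Euler integration below is a simplifying assumption for illustration, not the fusion scheme of any particular IMU):

```python
import math

def integrate_yaw(yaw_rad: float, gyro_z_rad_s: float, dt_s: float) -> float:
    """Integrate a single-axis angular velocity sample into a yaw angle.

    A real IMU fuses three accelerometer and three gyroscope axes (often with
    a complementary or Kalman filter); this one-axis Euler integration only
    illustrates why a high sampling rate (100 Hz to 1600 Hz) keeps dt small
    and the short-term pose estimate accurate.
    """
    return (yaw_rad + gyro_z_rad_s * dt_s) % (2 * math.pi)

if __name__ == "__main__":
    yaw = 0.0
    dt = 1.0 / 1600.0          # assumed consumer-grade IMU sampling at 1600 Hz
    for _ in range(1600):      # one second of samples at 90 deg/s
        yaw = integrate_yaw(yaw, gyro_z_rad_s=math.radians(90), dt_s=dt)
    print(f"yaw after 1 s at 90 deg/s: {math.degrees(yaw):.1f} deg")
```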
For example, the motion sensor 1802N may include an active pixel sensor (active pixel sensor, APS). For example, RGB images are captured at a high frequency of more than 100 Hz, and two adjacent frames are subtracted to obtain a change value. If the change value of a pixel is greater than the threshold, the pixel is set to 1; otherwise it is set to 0. The resulting data is similar to the data obtained by a DVS, completing the capture of the image of the moving object.
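A minimal sketch of the frame-differencing scheme just described (the threshold value and the tiny example frames are assumptions for illustration):

```python
from typing import List

Frame = List[List[int]]  # gray-level values of an APS/RGB frame, e.g. 0..255

def frame_difference_events(prev: Frame, curr: Frame, threshold: int) -> Frame:
    """Subtract two adjacent frames and binarize the change.

    A pixel whose absolute change exceeds the threshold is set to 1, otherwise
    to 0, which yields DVS-like motion data from frames captured at >100 Hz.
    """
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

if __name__ == "__main__":
    prev = [[10, 10, 10], [10, 10, 10]]
    curr = [[10, 80, 10], [10, 10, 90]]
    print(frame_difference_events(prev, curr, threshold=30))
    # [[0, 1, 0], [0, 0, 1]] -> only the changed (moving) pixels are marked
```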
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The electronic device 100 may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also correspond to different vibration feedback effects by touching different areas of the display screen 194. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 192 may be an indicator light and may be used to indicate a charging state or a change in battery level, or to indicate messages, missed calls, notifications, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 or removed from the SIM card interface 195 to enable contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously, and the types of the cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, as well as with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
2. System architecture
In the process of capturing, reading, or saving images, the electronic device involves cooperation among multiple components. The present application is described in detail below in terms of data acquisition, data encoding and decoding, image enhancement, image reconstruction, and application scenarios.
By way of example, taking a scene of capturing and processing an image as shown in fig. 2, a processing flow of the electronic device is exemplarily described.
Data acquisition: the data may be acquired by a brain-like camera, an RGB camera, or a combination thereof. The brain-like camera may include a biomimetic vision sensor that simulates the biological retina with an integrated circuit: each pixel simulates a biological neuron, and changes in light intensity are expressed in the form of events. Over time, various types of biomimetic vision sensors have been developed; their common characteristic is that the pixel array monitors light intensity changes independently and asynchronously and outputs the changes as event signals, for example the motion sensors DVS or DAVIS mentioned above. The RGB camera converts the analog signal into a digital signal and stores it in a storage medium. The data may also be collected by combining the brain-like camera and the RGB camera; for example, the data collected by the brain-like camera and the RGB camera may be projected into the same canvas, and the value of each pixel may be determined based on the values fed back by the brain-like camera and/or the RGB camera, or the value of each pixel may include the values of the brain-like camera and the RGB camera as independent channels, as sketched below. The optical signals may be converted into electrical signals by a brain-like camera, an RGB camera, or a combination thereof, resulting in a data stream organized in frames or an event stream organized in events. The image acquired by the RGB camera is hereinafter referred to as an RGB image, and the image acquired by the brain-like camera is referred to as an event image.
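The projection of both modalities into one canvas mentioned above could be sketched as follows (the per-pixel channel layout is an assumption for illustration, not an encoding defined by this application):

```python
from typing import List, Tuple

def combine_into_canvas(
    rgb: List[List[Tuple[int, int, int]]],   # H x W pixels, each (R, G, B)
    events: List[List[int]],                 # H x W event/polarity channel from the brain-like camera
) -> List[List[Tuple[int, int, int, int]]]:
    """Project RGB data and event data of the same resolution into one canvas.

    Each canvas pixel keeps the RGB values and the event value as independent
    channels, so later stages can read either modality per pixel.
    """
    return [
        [(r, g, b, ev) for (r, g, b), ev in zip(rgb_row, ev_row)]
        for rgb_row, ev_row in zip(rgb, events)
    ]

if __name__ == "__main__":
    rgb = [[(255, 0, 0), (0, 255, 0)]]
    events = [[0, 1]]   # the second pixel reported a light intensity change
    print(combine_into_canvas(rgb, events))
```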
Data encoding and decoding: this includes data encoding and data decoding. Data encoding may include encoding the acquired data after data acquisition and saving the encoded data to a storage medium. Data decoding may include reading data from the storage medium and decoding it into data available for subsequent recognition, detection, and the like. In addition, the data acquisition mode can be adjusted according to the data encoding and decoding mode, so as to achieve more efficient data acquisition and encoding/decoding. Data encoding and decoding can be classified into various types, including brain-like-camera-based encoding and decoding, RGB-camera-based encoding and decoding, and so on. Specifically, in the encoding process, the data collected by the brain-like camera, the RGB camera, or a combination thereof may be encoded so as to be stored in a storage medium in a certain format; in the decoding process, the data stored in the storage medium may be decoded into data available for subsequent use. For example, a user may collect video or image data via a brain-like camera, an RGB camera, or a combination thereof on the first day and encode the data for storage in a storage medium; the next day, the data may be read from the storage medium and decoded to obtain a playable video or image.
Image optimization: after images are collected by the brain-like camera or the RGB camera, the collected images are read and then subjected to optimization processing such as enhancement or reconstruction, so that subsequent processing can be performed based on the optimized images. Image enhancement and reconstruction may include, for example, image reconstruction or motion compensation. Motion compensation, for example, compensates for moving objects in the event image or the RGB image using the motion parameters of the moving objects acquired by the DVS, so that the resulting event image or RGB image is clearer. Image reconstruction, for example, reconstructs an RGB image from images acquired by the brain-like vision camera, so that a clear RGB image can be obtained from the data acquired by the DVS even in a moving scene.
Application scenario: after the optimized RGB image or event image is obtained through image optimization, it can be used for further applications; of course, the collected RGB image or event image can also be applied directly, and the processing can be adjusted according to the actual application scenario.
Specifically, the application scenarios may include: motion photography enhancement, fusion of DVS images and RGB images, detection and recognition, simultaneous localization and mapping (simultaneous localization and mapping, SLAM), eye tracking, key frame selection, pose estimation, and the like. For example, in a scene where a moving subject is photographed, enhancement processing is performed on the captured image so that a clearer moving subject is obtained. Fusion of a DVS image and an RGB image means enhancing the RGB image with the moving object acquired by the DVS, compensating for the moving object or for objects affected by a large light ratio in the RGB image, so as to obtain a clearer RGB image. Detection and recognition means performing target detection or target recognition based on an RGB image or an event image. Eye tracking means tracking the eye movement of a user according to the acquired RGB image or event image, or the optimized RGB image or event image, and determining information such as the user's gaze point and gaze direction. Key frame selection means selecting some frames from the video data collected by the RGB camera as key frames, in combination with the information collected by the brain-like camera.
Furthermore, in the following embodiments of the present application, the sensors that need to be activated differ from embodiment to embodiment. For example, when data is acquired and the event image is optimized by motion compensation, the motion sensor may be activated, and optionally an IMU or a gyroscope may also be activated; in an embodiment of image reconstruction, the motion sensor may be activated to acquire the event image, which is then optimized; and in an embodiment of motion photography enhancement, the motion sensor and the RGB sensor may both be activated. Thus, in different embodiments, the activation of the corresponding sensors can be selected.
Specifically, the method provided by the present application can be applied to an electronic device. The electronic device may include an RGB sensor, a motion sensor, and the like; the RGB sensor is used for acquiring an image within a shooting range, and the motion sensor is used for acquiring information generated when an object moves relative to the motion sensor within its detection range. The method includes: selecting at least one of the RGB sensor and the motion sensor based on scene information, and collecting data by the selected sensor, the scene information including at least one of status information of the electronic device, the type of the application program requesting image collection in the electronic device, or environmental information.
In one possible implementation, the foregoing status information includes information such as a remaining power of the electronic device, a remaining memory (or an available memory), or a CPU load.
In one possible embodiment, the aforementioned environmental information may include a variation value of the illumination intensity within the photographing range of the color RGB sensor and the motion sensor or information of the moving object within the photographing range. For example, the environmental information may include a change in illumination intensity within a photographing range of the RGB sensor or the DVS sensor, or a movement condition of an object within the photographing range, such as information of a movement speed, a movement direction, etc. of the object, or an abnormal movement condition of the object within the photographing range, such as a speed abrupt change, a direction abrupt change, etc. of the object.
The type of the application program requesting image collection in the electronic device may be understood as follows: the electronic device runs a system such as Android, Linux, or HarmonyOS, application programs run in that system, and the programs running in the system can be classified into various types, such as photographing applications or object detection applications.
Typically, a motion sensor is sensitive to motion changes and insensitive to static scenes; it responds to motion changes by emitting events, and since static areas rarely emit events, its data only expresses the light intensity information of the regions where motion changes occur and is not complete full-scene light intensity information. RGB color cameras are good at performing complete color recording of natural scenes and reproducing the texture details in the scenes.
Taking the foregoing electronic device as a mobile phone as an example, the default configuration is that the DVS camera (i.e., the DVS sensor) is turned off. When the camera is used, the start mode is decided according to the type of application currently calling it. For example, if a photographing APP calls the camera and the camera is in a high-speed motion state, the DVS camera and the RGB camera (i.e., the RGB sensor) need to be started at the same time; if the APP requesting the camera is an APP for object detection or motion detection, which does not need object photographing or face recognition, only the DVS camera may be started and the RGB camera left off.
Optionally, the camera start mode may be selected according to the current device status. For example, when the current battery level is lower than a certain threshold, or the user has started the power saving mode and cannot take pictures normally, only the DVS camera may be started, because although the DVS camera does not produce clear photo imaging, its power consumption is low, and high-definition imaging is not required when detecting a moving object.
Alternatively, the device may sense the surrounding environment to decide whether to switch camera modes. For example, in night scenes, or when the current device is moving at high speed, the DVS camera may be turned on; if the scene is static, the DVS camera may not be turned on.
By combining the application type, the environment information, and the device state, the camera start mode is determined, and during operation it is decided whether to trigger switching of camera modes, so that different sensors are started in different scenes, giving high adaptability.
It can be understood that there are three start modes: only the RGB camera is turned on, only the DVS camera is turned on, or the RGB and DVS cameras are turned on simultaneously. Also, the reference factors for detecting the application type and for environment detection may differ for different products.
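A minimal sketch of how one of the three start modes could be chosen from the application type, device state, and environment information (the category names and thresholds below are assumptions for illustration, not a fixed policy of this application):

```python
from enum import Enum

class StartMode(Enum):
    RGB_ONLY = "RGB camera only"
    DVS_ONLY = "DVS camera only"
    RGB_AND_DVS = "RGB and DVS cameras"

def select_start_mode(app_type: str,
                      battery_percent: int,
                      high_speed_motion: bool,
                      low_light: bool) -> StartMode:
    """Pick one of the three start modes from scene information.

    Assumed policy for illustration: detection-type applications and
    power-saving situations favor the low-power DVS camera; photographing
    under fast motion or poor light favors turning on both cameras.
    """
    if app_type in ("object_detection", "motion_detection"):
        return StartMode.DVS_ONLY
    if battery_percent < 10:
        return StartMode.DVS_ONLY
    if app_type == "photographing" and (high_speed_motion or low_light):
        return StartMode.RGB_AND_DVS
    return StartMode.RGB_ONLY

if __name__ == "__main__":
    print(select_start_mode("photographing", 80, high_speed_motion=True, low_light=False))
    print(select_start_mode("object_detection", 80, high_speed_motion=False, low_light=False))
```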
For example, a camera in a security scenario has a motion detection (motion detection) function: the camera stores video only when a moving object is detected, which reduces the storage space used and extends the storage time of the hard disk. Specifically, when the DVS and the RGB camera are applied in a home or security camera, by default only the DVS camera is turned on for motion detection and analysis. When the DVS camera detects abnormal motion or abnormal behavior (such as a sudden movement of an object or a sudden change of motion direction), for example a person approaching or a significant light intensity change, the RGB camera is started to shoot, and a full-scene texture image of that period is recorded as monitoring evidence. When the abnormal motion ends, the system switches back to DVS operation with the RGB camera in standby, which significantly saves the data volume and power consumption of the monitoring device.
This intermittent shooting method takes advantage of the low power consumption of the DVS. At the same time, event-based motion detection with the DVS is faster, responds more quickly, and detects more accurately than image-based motion detection, enabling all-weather uninterrupted detection that is more accurate, lower in power consumption, and saves storage space.
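For the security scenario above, the master-slave switching could be sketched as a small state machine (reducing "abnormal motion" to an event-count threshold is a simplifying assumption for illustration):

```python
class SecurityCameraController:
    """Keep the DVS always on; start the RGB camera only during abnormal motion.

    Illustrative sketch: "abnormal motion" is reduced here to the per-interval
    event count exceeding a threshold; a real system would also look at sudden
    changes of speed or direction, as described above.
    """

    def __init__(self, abnormal_event_threshold: int = 500):
        self.abnormal_event_threshold = abnormal_event_threshold
        self.rgb_recording = False

    def on_dvs_events(self, event_count: int) -> str:
        if event_count >= self.abnormal_event_threshold and not self.rgb_recording:
            self.rgb_recording = True
            return "start RGB recording (abnormal motion detected)"
        if event_count < self.abnormal_event_threshold and self.rgb_recording:
            self.rgb_recording = False
            return "stop RGB recording, back to DVS-only monitoring"
        return "no change"

if __name__ == "__main__":
    controller = SecurityCameraController()
    for count in (20, 800, 900, 30):
        print(count, "->", controller.on_dvs_events(count))
```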
For another example, when the DVS and the RGB camera are applied to vehicle-mounted assisted/automatic driving, situations may be encountered during driving in which an oncoming vehicle turns on its high beams, the sun shines directly into the camera, or the vehicle enters or exits a tunnel. In such cases the RGB camera may not be able to capture effective scene information, whereas the DVS, although unable to obtain texture information, can obtain rough contour information of the scene, which is of great assistance to the driver's judgment. In addition, in heavy fog, the contour information captured by the DVS can also assist in judging road conditions. Therefore, switching of the master-slave working state of the DVS and the RGB camera can be triggered in specific scenarios, such as when the light intensity changes drastically or in extreme weather.
The above procedure is also true when the DVS camera is applied to AR/VR glasses. When DVS is used for SLAM or eye tracking, the start mode of the camera may be decided according to the device status and the surrounding environment.
In the following embodiments of the present application, when data acquired by a certain sensor is used, the sensor is turned on, and details thereof are not repeated.
The various embodiments provided herein are described below in connection with the various modes of operation described above and with reference to fig. 2 described above.
3. Process flow of the method
The foregoing has exemplarily described the electronic device and system architecture provided in the present application. The method provided in the present application is described in detail below with reference to fig. 1A-2. Specifically, the methods corresponding to the modules in the architecture of fig. 2 are described separately. It should be understood that the method steps mentioned below in this application may be implemented separately or in combination in one device, and may be adjusted according to the actual application scenario.
1. Data acquisition and encoding and decoding
The following is an exemplary description of the process of data acquisition and data encoding and decoding.
In the conventional art, an asynchronous read mode based on an event stream (hereinafter also referred to simply as an "event stream-based read mode" or an "asynchronous read mode") and a synchronous read mode based on a frame scan (hereinafter also referred to simply as a "frame scan-based read mode" or a "synchronous read mode") are commonly employed by a vision sensor (i.e., the aforementioned motion sensor). For a visual sensor that has been manufactured, only one of two modes can be employed. According to specific application scenes and motion states, the signal data amounts required to be read in unit time in the two reading modes may have significant differences, and further the cost required to output the read data is different. Fig. 3-a and 3-b show schematic diagrams of the read data amount versus time in an asynchronous read mode based on an event stream and a synchronous read mode based on a frame scan, respectively.
In one aspect, biomimetic vision sensors, because of their motion-sensitive nature, typically do not generate light intensity change events (also referred to herein as "events") in static areas of the environment, and such sensors almost always employ an asynchronous read mode based on event streams, which means that the events are arranged in a certain order. The asynchronous read mode is exemplarily described below using the DVS as an example. According to the sampling principle of the DVS, the current light intensity is compared with the light intensity at the time the last event occurred, and an event is generated and output when the amount of change reaches a predetermined threshold C. That is, in general, when the difference between the current light intensity and the light intensity at the time the last event was generated exceeds the predetermined threshold C, the DVS will generate an event, which can be described by equation (1-1):
|L-L′|≥C (1-1)
where L and L' represent the light intensity at the current time and the light intensity at the last event occurrence, respectively.
Wherein for asynchronous read mode each event can be expressed as < x, y, t, m >, (x, y) represents the pixel position where the event was generated, t represents the time when the event was generated, and m represents the characteristic information of the light intensity. Specifically, pixels in the pixel array circuit of the vision sensor measure the amount of light intensity variation in the environment. The pixel may output a data signal indicating an event if the measured amount of change in light intensity exceeds a predetermined threshold. Thus, in an asynchronous read mode based on an event stream, the pixels of the vision sensor are further divided into pixels generating light intensity variation events and pixels not generating light intensity variation events. The light intensity variation event may be characterized by the coordinate information (x, y) of the pixel generating the event, the characteristic information of the light intensity at the pixel, the time t at which the characteristic information of the light intensity is read, and the like. The coordinate information (x, y) may be used to uniquely identify a pixel in the pixel array circuit, e.g., x represents a row index where the pixel is located in the pixel array circuit and y represents a column index where the pixel is located in the pixel array circuit. By identifying coordinates and time stamps associated with the pixels, the spatio-temporal location of the occurrence of the light intensity variation event can be uniquely determined, and then all events can be formed into an event stream according to the order of occurrence.
In some DVS sensors (such as the DAVIS sensor, ATIS sensor, etc.), m represents the trend of the light intensity change, which may also be called polarity information, and is usually represented by 1-2 bits. Its value may be ON/OFF, where ON represents an increase of light intensity and OFF represents a decrease of light intensity; that is, when the light intensity increases and the change exceeds a predetermined threshold, an ON pulse is generated, and when the light intensity decreases and the change exceeds a predetermined threshold, an OFF pulse is generated (this application indicates an increase in light intensity with "+1" and a decrease in light intensity with "-1"). In some DVS sensors, such as the CeleX sensor, m represents absolute light intensity information, also referred to as light intensity information; in a scene where a moving object is monitored, the light intensity information is typically represented by multiple bits, such as 8 to 12 bits.
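A minimal sketch of the per-pixel sampling rule of equation (1-1) and of the two event formats described above (the field names and widths are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    x: int                           # row index of the pixel in the pixel array
    y: int                           # column index of the pixel
    t: float                         # time stamp at which the event is read
    polarity: Optional[int] = None   # +1 / -1 for polarity-format events
    intensity: Optional[int] = None  # absolute light intensity for light-intensity-format events

def maybe_generate_event(x: int, y: int, t: float,
                         current: float, last: float,
                         threshold: float,
                         use_polarity: bool = True) -> Optional[Event]:
    """Generate an event when |L - L'| >= C, as in equation (1-1)."""
    if abs(current - last) < threshold:
        return None                  # no light intensity change event at this pixel
    if use_polarity:
        return Event(x, y, t, polarity=+1 if current > last else -1)
    return Event(x, y, t, intensity=int(current))

if __name__ == "__main__":
    print(maybe_generate_event(3, 5, t=0.01, current=120, last=100, threshold=15))
    print(maybe_generate_event(3, 5, t=0.01, current=104, last=100, threshold=15))
```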
In the asynchronous read mode, only the data signal at the pixel that generated the light intensity variation event is read. Thus, for a biomimetic visual sensor, the event data that needs to be read has sparse asynchronous characteristics. As shown in curve 101 of fig. 3-a, the visual sensor operates in an asynchronous read mode based on an event stream, and the amount of data that the visual sensor needs to read changes over time as the rate of light intensity change events occurring in the pixel array circuit change.
On the other hand, conventional vision sensors, such as cell phone cameras, digital cameras, etc., typically employ a synchronous read mode based on frame scanning. The read mode does not distinguish whether a light intensity change event is generated at a pixel of the visual sensor. The data signal generated by a pixel is read whether or not a light intensity variation event is generated at that pixel. When reading the data signals, the vision sensor scans the pixel array circuit in a predetermined order, synchronously reads the characteristic information m indicating the light intensity at each pixel (the characteristic information m of the light intensity has been described above, and is not repeated here), and sequentially outputs as 1 st frame data, 2 nd frame data, and so on. Thus, as shown in curve 102 of fig. 3-b, in the synchronous read mode, each frame data amount read by the vision sensor has the same size, and the data amount remains unchanged over time. For example, assuming that 8 bits are used to represent the light intensity value of one pixel, and the total number of pixels in the visual sensor is 66, the data amount of one frame of data is 528 bits. Typically, the frames are output at equal time intervals, for example, 30 frames per second, 60 frames per second, 120 frames per second, etc.
Applicant has found that existing vision sensors still have drawbacks, including at least the following:

First, a single read mode cannot accommodate all scenarios, which makes it difficult to relieve the pressure on data transmission and storage.
As shown in curve 101 of fig. 3-a, when the vision sensor operates in the asynchronous read mode based on an event stream, the amount of data that the vision sensor needs to read changes over time as the rate of light intensity change events occurring in the pixel array circuit changes. Fewer light intensity change events are generated in a static scene, so the total data volume that the vision sensor needs to read is also lower. In dynamic scenes, for example when there is intense motion, a large number of light intensity change events are generated and the total data amount that the vision sensor needs to read rises; in some scenes, so many light intensity change events are generated that the total data amount exceeds the bandwidth limit, and events may be lost or reading may be delayed. As shown by curve 102 of fig. 3-b, when the vision sensor operates in the frame-based synchronous read mode, the state or intensity value of every pixel must be represented in each frame, whether or not the pixel has changed. This representation is costly when only a small number of pixels change.
Under different application scenarios and motion states, the output and storage costs of the two modes may differ significantly. For example, when shooting a static scene, only a small number of pixels produce light intensity change events over a period of time. By way of example, suppose that in one scan, light intensity change events are generated at only three pixels of the pixel array circuit. In the asynchronous read mode, only the coordinate information (x, y), the time information t, and the light intensity variation of those three pixels need to be read to represent the three light intensity change events. Assuming that 4 bits, 4 bits, and 2 bits are allocated for the coordinates of a pixel, the read time stamp, and the light intensity variation, respectively, each event occupies 10 bits, and the total data amount that needs to be read in the asynchronous read mode is 30 bits. In the synchronous read mode, the data signals output by all the pixels of the entire array are read to form a complete frame of data, although only three pixels generate valid data signals indicating a light intensity change event. Assuming that 8 bits are allocated for each pixel in the synchronous read mode and the total number of pixels of the pixel array circuit is 66, a total of 528 bits needs to be read. It can be seen that even though a large number of pixels in the pixel array circuit do not generate events, this many bits must still be allocated in the synchronous read mode. This is uneconomical from the viewpoint of representation cost and increases the pressure on data transmission and storage. In this case it is therefore more economical to employ the asynchronous read mode.
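The representation costs in this example can be checked with a few lines of code (the bit allocations are the ones assumed above):

```python
def async_read_bits(num_events: int, coord_bits: int = 4,
                    time_bits: int = 4, change_bits: int = 2) -> int:
    """Total bits read in the event-stream (asynchronous) mode."""
    return num_events * (coord_bits + time_bits + change_bits)

def sync_read_bits(num_pixels: int, bits_per_pixel: int = 8) -> int:
    """Total bits read in the frame-scan (synchronous) mode: one full frame."""
    return num_pixels * bits_per_pixel

if __name__ == "__main__":
    # Static scene from the example: only 3 of 66 pixels generate events.
    print(async_read_bits(num_events=3))   # 30 bits
    print(sync_read_bits(num_pixels=66))   # 528 bits
```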
As another example, when strong motion occurs within the scene or the light intensity in the environment changes sharply, such as a large number of people walking or a light suddenly being switched on or off, a large number of pixels in the vision sensor measure a light intensity change in a short time and generate data signals indicative of light intensity change events. Since the amount of data characterizing a single event in the asynchronous read mode is greater than the amount of data characterizing a single pixel in the synchronous read mode, employing the asynchronous read mode in this case may incur a significant representation cost. Specifically, there may be many consecutive pixels in each row of the pixel array circuit that generate light intensity change events, and for each of them the coordinate information (x, y), the time information t, and the characteristic information m of the light intensity are transmitted. The coordinates of these events often differ by only one unit, as do the read times. In this case, the asynchronous read mode is costly in its representation of the coordinate and time information, which can cause a surge in the amount of data. In the synchronous read mode, however, each pixel outputs a data signal indicating only the amount of light intensity variation, regardless of how many light intensity change events are generated in the pixel array circuit at any one time, and no bits need to be allocated for the coordinate information and time information of each pixel. Thus, when events are dense, it is more economical to employ the synchronous read mode.
Second, a single event representation cannot suit all scenarios: representing events with light intensity information makes it difficult to relieve the pressure on data transmission and storage, while representing events with polarity information affects the processing and analysis of the events.
The synchronous read mode and the asynchronous read mode were introduced above, and the read events are represented by the characteristic information m of the light intensity, where the characteristic information of the light intensity includes polarity information and light intensity information. An event represented by polarity information is referred to herein as a polarity-format event, and an event represented by light intensity information is referred to herein as a light-intensity-format event. For a vision sensor that has already been manufactured, only one of the two event formats can be used, i.e., either polarity information or light intensity information is used to represent events. The advantages and disadvantages of polarity-format events and light-intensity-format events are described below, taking the asynchronous read mode as an example.
In the asynchronous read mode, when polarity information is used to represent an event, the polarity information p is usually represented by 1 to 2 bits. It carries little information and can only indicate whether the light intensity is increasing or decreasing, so representing events with polarity information affects the processing and analysis of the events; for example, image reconstruction from polarity-format events is difficult, and the accuracy of object recognition is poor. When light intensity information is used to represent an event, multiple bits are usually used, for example 8-12 bits. Compared with polarity information, light intensity information can carry more information and is favorable for processing and analyzing events, for example it can improve the quality of image reconstruction; however, because of the larger data amount, the time required to read out an event represented by light intensity information is longer. According to the DVS sampling principle, an event is generated whenever the light intensity variation of a pixel exceeds the predetermined threshold. When large-area object motion or light intensity fluctuation occurs in a scene (for example, entering or exiting a tunnel portal, or switching a room light), the vision sensor faces a surge of events, and with the preset maximum bandwidth (hereinafter referred to as bandwidth) of the vision sensor fixed, some event data may not be read out. At present, random discarding is generally adopted to handle this. Although random discarding ensures that the transmitted data amount does not exceed the bandwidth, data is lost, and in certain special application scenarios (such as automatic driving), the randomly discarded data may be highly important. In other words, when events are triggered in large numbers, the data amount exceeds the bandwidth and the light-intensity-format data cannot be completely output from the DVS, resulting in the loss of some events. These lost events may be detrimental to the processing and analysis of the events, for example causing streaks or incomplete contours during light intensity reconstruction.
In order to solve the above problems, an embodiment of the present application provides a vision sensor that compares the data amounts of the two read modes based on statistics of the light intensity change events generated by the pixel array circuit, so as to switch to the read mode suitable for the current application scenario and motion state. In addition, based on the statistics of the light intensity change events generated by the pixel array circuit, the relationship between the data amount of events represented by light intensity information and the bandwidth is compared, so as to adjust the representation precision of the events; in this way the events are transmitted in a suitable representation on the premise of meeting the bandwidth limit, and all events are transmitted with as high a representation precision as possible.
A vision sensor provided in an embodiment of the present application is described below.
Fig. 4-a shows a block diagram of a vision sensor provided herein. The vision sensor may be implemented as a vision sensor chip and is capable of reading data signals indicative of events in at least one of a frame-scan-based read mode and an event-stream-based read mode. As shown in fig. 4-a, the vision sensor 200 includes a pixel array circuit 210 and a read circuit 220. The vision sensor is coupled to a control circuit 230. It should be understood that the vision sensor shown in fig. 4-a is for exemplary purposes only and does not imply any limitation to the scope of the present application. Embodiments of the present application may also be embodied in different sensor architectures. In addition, it should also be understood that the vision sensor may also include other elements or entities for purposes of image acquisition, image processing, image transmission, and the like, which are not shown for ease of description, but this does not mean that the embodiments of the present application lack them.
The pixel array circuit 210 may include one or more pixel arrays, and each pixel array includes a plurality of pixels, each pixel having position information for unique identification, such as coordinates (x, y). The pixel array circuit 210 may be configured to measure the amount of change in light intensity and generate a plurality of data signals corresponding to a plurality of pixels. In some possible embodiments, each pixel is configured to respond independently to changes in light intensity in the environment. In some possible embodiments, the pixel compares the measured amount of change in light intensity with a predetermined threshold value, and if the measured amount of change in light intensity exceeds the predetermined threshold value, the pixel generates a first data signal indicative of a light intensity change event, e.g. the first data signal comprises polarity information, such as +1 or-1, or the first data signal may also be absolute light intensity information. In this example, the first data signal may indicate a trend of light intensity variation or an absolute light intensity value at the corresponding pixel. In some possible embodiments, the pixel generates a second data signal, e.g. 0, different from the first data signal if the measured amount of change in light intensity does not exceed the predetermined threshold. In embodiments of the present application, the data signal may indicate a polarity of the light intensity, an absolute light intensity value, a variation value of the light intensity, and the like. The intensity polarity may indicate a trend in the intensity change, e.g., an increase or decrease, generally indicated by +1 and-1. The absolute light intensity value may represent the light intensity value measured at the current time. Depending on the structure, use and kind of the sensor, the light intensity or the amount of change in the light intensity may have different physical meanings. The scope of the present application is not limited in this respect.
The read circuit 220 is coupled to and can communicate with the pixel array circuit 210 and the control circuit 230. The reading circuit 220 is configured to read the data signal output from the pixel array circuit 210, and it can be understood that the reading circuit 220 reads the data signal output from the pixel array 210 into the control circuit 230. The control circuit 230 is configured to control a mode in which the data signal is read by the reading circuit 220, and the control circuit 230 may also be configured to control a representation manner of the output data signal, in other words, control a representation accuracy of the data signal, for example, the control circuit may control the vision sensor to output an event represented by polarity information, an event represented by light intensity information, an event represented by a certain fixed number of bits, or the like, which will be described later in connection with a specific embodiment.
According to a possible embodiment of the present application, the control circuit 230 may be connected to the vision sensor 200 through a bus interface as a separate circuit or chip external to the vision sensor 200, as shown in fig. 4-a. In other possible embodiments, the control circuit 230 may also be integrated with the pixel array circuit and the read circuit as a circuit or chip internal to the vision sensor. Fig. 4-b shows a block diagram of another vision sensor 300 according to a possible embodiment of the present application; the vision sensor 300 may be implemented as an example of the vision sensor 200. As shown in fig. 4-b, the vision sensor includes a pixel array circuit 310, a read circuit 320, and a control circuit 330. The pixel array circuit 310, the read circuit 320, and the control circuit 330 are functionally identical to the pixel array circuit 210, the read circuit 220, and the control circuit 230 shown in fig. 4-a, and are therefore not described again here. It should be understood that this vision sensor is for exemplary purposes only and does not imply any limitation on the scope of the present application. Embodiments of the present application may also be embodied in different vision sensors. In addition, it should also be understood that the vision sensor may also include other elements, modules, or entities, which are not shown for clarity, but this does not mean that the embodiments of the present application lack them.
Based on the architecture of the vision sensor, the vision sensor provided in the present application is described in detail below.
The read circuit 220 may be configured to scan the pixels in the pixel array circuit 210 in a predetermined order to read the data signals generated by the corresponding pixels. In an embodiment of the present application, the read circuit 220 is configured to be able to read the data signals output by the pixel array circuit 210 in more than one signal read mode. For example, the read circuit 220 may read in one of a first read mode and a second read mode. In the present context, the first read mode and the second read mode each correspond to a different one of the frame-scan-based read mode and the event-stream-based read mode; further, the first read mode may refer to the current read mode of the read circuit 220, and the second read mode may refer to the alternative read mode that can be switched to.
Referring to fig. 5, a schematic diagram illustrating the principle of a synchronous read mode based on frame scanning and an asynchronous read mode based on event streams according to an embodiment of the present application is shown. As shown in the upper half of fig. 5, the black dots represent pixels generating the light intensity variation event, and the white dots represent pixels not generating the light intensity variation event. The left-hand dashed box represents a synchronous read mode based on frame scanning in which all pixels generate voltage signals based on received light signals and then output data signals after analog-to-digital conversion, and in which the read circuit 220 constructs one frame of data by reading the data signals generated by all pixels. The right dashed box represents an asynchronous read mode based on an event stream in which when the read circuit 220 scans a pixel that generates a light intensity variation event, coordinate information (x, y) of the pixel can be acquired. Then, only the data signal generated by the pixel generating the light intensity variation event is read, and the read time t is recorded. In the case where a plurality of pixels generating an event of a light intensity change exist in the pixel array circuit, the reading circuit 220 sequentially reads data signals generated by the plurality of pixels in the scanning order, and constructs an event stream as an output.
The lower half of fig. 5 depicts the two read modes from the perspective of representation cost (e.g., the amount of data required to be read). As shown in fig. 5, in the synchronous read mode, the amount of data read by the read circuit 220 each time is the same, for example 1 frame of data, shown in fig. 5 as frame 1 data 401-1 and frame 2 data 401-2. From the amount of data representing a single pixel (e.g., the number of bits B_p) and the total number M of pixels in the pixel array circuit, the amount of data in one frame to be read can be determined as M·B_p. In the asynchronous read mode, the read circuit 220 reads the data signals indicating light intensity change events and then constructs all events into an event stream 402 in order of occurrence. In this case, the amount of data read by the read circuit 220 each time is related to the event data amount B_ev used to represent a single event (e.g., the sum of the numbers of bits representing the coordinates (x, y) of the pixel generating the event, the read time stamp t, and the characteristic information of the light intensity) and the number N_ev of light intensity change events.
In some implementations, the read circuit 220 may be configured to provide the read at least one data signal to the control circuit 230. For example, the read circuit 220 may provide the data signals read over a period of time to the control circuit 230 for historical data statistics and analysis by the control circuit 230.
In some possible embodiments, where the currently employed first read mode is the event-stream-based read mode, the read circuit 220 reads the data signals generated by the pixels in the pixel array circuit 210 that generate light intensity change events; for convenience of description these are also referred to below as first data signals. Specifically, the read circuit 220 determines the position information (x, y) of a pixel related to a light intensity change event by scanning the pixel array circuit 210. Based on the position information (x, y) of the pixel, the read circuit 220 reads the first data signal generated by that pixel among the plurality of data signals, so as to acquire the characteristic information of the light intensity indicated by the first data signal and the read time information t. By way of example, in the event-stream-based read mode, the amount of event data read per second by the read circuit 220 may be expressed as B_ev·N_ev bits, i.e., the read data rate of the read circuit 220 is B_ev·N_ev bits per second (bps). Here B_ev is the event data amount (e.g., number of bits) allocated for each light intensity change event in the event-stream-based read mode, where the first b_x and b_y bits represent the pixel coordinates (x, y), the following b_t bits represent the time stamp t at which the data signal was read, and the final b_f bits represent the characteristic information of the light intensity indicated by the data signal, i.e., B_ev = b_x + b_y + b_t + b_f; N_ev is the average number of events generated per second, obtained by the read circuit 220 from historical statistics of the number of light intensity change events generated in the pixel array circuit 210 over a period of time. In the frame-scan-based read mode, the amount of data per frame read by the read circuit 220 can be expressed as M·B_p bits and the amount of data read per second as M·B_p·f bits, i.e., the read data rate of the read circuit 220 is M·B_p·f bps, where M is the total number of pixels in the vision sensor 200, B_p is the pixel data amount (e.g., number of bits) allocated for each pixel in the frame-scan-based read mode, and f is the predetermined frame rate of the read circuit 220 in the frame-scan-based read mode, i.e., the read circuit 220 scans the pixel array circuit 210 at the predetermined frame rate of f Hz in that mode to read the data signals generated by all pixels in the pixel array circuit 210. Since M, B_p, and f are known quantities, the read data rate of the read circuit 220 in the frame-scan-based read mode can be obtained directly.
In some possible embodiments, where the currently employed first read mode is the frame-scan-based read mode, the read circuit 220 can obtain, from historical statistics of the number of light intensity change events generated in the pixel array circuit 210 over a period of time, the average number of events N_ev generated per second. From the N_ev acquired in the frame-scan read mode, it can be calculated that in the event-stream-based read mode the amount of event data read per second by the read circuit 220 would be B_ev·N_ev bits, i.e., in the event-stream-based read mode the read data rate of the read circuit 220 would be B_ev·N_ev bps.
As can be seen from the above two embodiments, the read data rate of the read circuit 220 in the frame-scan-based read mode can be calculated directly from the predefined parameters, while the read data rate of the read circuit 220 in the event-stream-based read mode can be calculated from the N_ev obtained in either of the two modes.
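A minimal sketch of the two read data rates discussed above, using the quantities M, B_p, f, B_ev, and N_ev (the numerical values are chosen purely for illustration):

```python
def frame_scan_rate_bps(num_pixels_m: int, bits_per_pixel_bp: int, frame_rate_f: float) -> float:
    """Read data rate in the frame-scan mode: M * B_p * f bits per second."""
    return num_pixels_m * bits_per_pixel_bp * frame_rate_f

def event_stream_rate_bps(bits_per_event_bev: int, events_per_second_nev: float) -> float:
    """Read data rate in the event-stream mode: B_ev * N_ev bits per second."""
    return bits_per_event_bev * events_per_second_nev

if __name__ == "__main__":
    M, Bp, f = 66, 8, 30            # pixel count, bits per pixel, frame rate (illustrative)
    Bev = 4 + 4 + 4 + 2             # b_x + b_y + b_t + b_f, as defined above (illustrative)
    Nev = 20                        # average events per second from historical statistics
    print(frame_scan_rate_bps(M, Bp, f))      # 15840.0 bps
    print(event_stream_rate_bps(Bev, Nev))    # 280 bps
```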
The control circuit 230 is coupled to the read circuit 220 and is configured to control the read circuit 220 to read the data signals generated by the pixel array circuit 210 in a particular read mode. In some possible embodiments, the control circuit 230 may obtain at least one data signal from the read circuit 220 and determine which of the current read mode and the alternative read mode is more suitable for the current application scenario and the motion state based at least on the at least one data signal. Further, in some embodiments, the control circuit 230 may instruct the read circuit 220 to switch from the current data read mode to another data read mode based on the determination.
In some possible embodiments, the control circuit 230 may send an indication to the read circuit 220 to switch the read mode based on historical statistics of light intensity change events. For example, the control circuit 230 may determine statistical data related to at least one light intensity variation event based on at least one data signal received from the reading circuit 220. If the statistical data is determined to satisfy the predetermined switching condition, the control circuit 230 transmits a mode switching signal to the read circuit 220 to cause the read circuit 220 to switch to the second read mode. For ease of comparison, the statistics may be used to measure the read data rates of the first and second read modes, respectively.
In some embodiments, the statistics may include the total amount of event data measured by the pixel array circuit 210 per unit time. If the total data amount of light intensity change events read by the read circuit 220 in the first read mode is already greater than or equal to the total data amount of light intensity change events in the second read mode, this indicates that the read circuit 220 should switch from the first read mode to the second read mode. In some embodiments, the first read mode is the frame-scan-based read mode and the second read mode is the event-stream-based read mode. The control circuit 230 may determine the total data amount of light intensity change events read in the first read mode, M·B_p·f, based on the number of pixels M of the pixel array circuit, the frame rate f, and the pixel data amount B_p. The control circuit 230 may determine the total data amount of light intensity change events in the second read mode, B_ev·N_ev, based on the number of light intensity change events N_ev and the event data amount B_ev associated with the event-stream-based read mode. In some embodiments, a switching parameter may be used to adjust the relationship between the total data amounts in the two read modes. As shown in the following formula (1), when the total data amount M·B_p·f of light intensity change events read in the first read mode is greater than or equal to the total data amount B_ev·N_ev of light intensity change events in the second read mode, the read circuit 220 should switch to the second read mode:
η·M·B_p·f ≥ B_ev·N_ev    (1)
where η is the switching parameter used for adjustment. From the above formula (1), a first threshold data amount d_1 = η·M·B_p·f can further be derived. That is, if the total data amount B_ev·N_ev of light intensity change events is less than or equal to the threshold data amount d_1, it indicates that the total data amount of light intensity change events read in the first read mode is greater than or equal to the total data amount of light intensity change events in the second read mode, and the control circuit 230 may determine that the statistics of the light intensity change events satisfy the predetermined switching condition. In this embodiment, the threshold data amount d_1 may be determined based at least on the number of pixels M of the pixel array circuit, the frame rate f associated with the frame-scan-based read mode, and the pixel data amount B_p.
In an alternative implementation of the above embodiment, the condition that the total data amount M·B_p·f of light intensity change events read in the first read mode is greater than or equal to the total data amount B_ev·N_ev of light intensity change events in the second read mode can be represented by the following formula (2):
M·B_p·f − B_ev·N_ev ≥ θ    (2)
where θ is the switching parameter used for adjustment. From the above formula (2), a second threshold data amount d_2 = M·B_p·f − θ can further be derived. That is, if the total data amount B_ev·N_ev of light intensity change events is less than or equal to the second threshold data amount d_2, it indicates that the total data amount of light intensity change events read in the first read mode is greater than or equal to the total data amount of light intensity change events in the second read mode, and the control circuit 230 may determine that the statistics of the light intensity change events satisfy the predetermined switching condition. In this embodiment, the threshold data amount d_2 may be determined based at least on the number of pixels M of the pixel array circuit, the frame rate f associated with the frame-scan-based read mode, and the pixel data amount B_p.
In some embodiments, the first read mode is the event-stream-based read mode and the second read mode is the frame-scan-based read mode. Since in the event-stream-based read mode the read circuit 220 reads only the data signals generated by the pixels that generate events, the control circuit 230 can directly determine the number N_ev of light intensity change events generated in the pixel array circuit 210 based on the number of data signals provided by the read circuit 220. The control circuit 230 may determine the total data amount of light intensity change events read in the first read mode, B_ev·N_ev, based on the number of events N_ev and the event data amount B_ev associated with the event-stream-based read mode. Similarly, the control circuit 230 may also determine the total data amount of light intensity change events read in the second read mode, M·B_p·f, based on the number of pixels M of the pixel array circuit, the frame rate f, and the pixel data amount B_p. As shown in the following formula (3), when the total data amount B_ev·N_ev of light intensity change events read in the first read mode is greater than or equal to the total data amount M·B_p·f of light intensity change events in the second read mode, the read circuit 220 should switch to the second read mode:
B_ev·N_ev ≥ η·M·B_p·f    (3)
where η is the switching parameter used for adjustment. From the above formula (3), a first threshold data amount d_1 = η·M·B_p·f can further be derived. If the total data amount B_ev·N_ev of light intensity change events is greater than or equal to the threshold data amount d_1, the control circuit 230 determines that the statistics of the light intensity change events satisfy the predetermined switching condition. In this embodiment, the threshold data amount d_1 may be determined based at least on the number of pixels M of the pixel array circuit, the frame rate f, and the pixel data amount B_p.
In an alternative implementation of the above embodiment, the condition that the total data amount B_ev·N_ev of the light intensity variation events read in the first reading mode is greater than or equal to the total data amount M·B_p·f of the light intensity variation events of the second reading mode may be as shown in the following formula (4):
M·B_p·f − B_ev·N_ev ≤ θ (4)
where θ is the switching parameter used for adjustment. From the above equation (4), it can be further derived that the second threshold data amount d_2 = M·B_p·f − θ. If the total data amount B_ev·N_ev of the light intensity variation events is greater than or equal to the threshold data amount d_2, the control circuit 230 determines that the statistical data of the light intensity variation events satisfies the predetermined switching condition. In this embodiment, the threshold data amount d_2 may be determined based at least on the number of pixels M of the pixel array circuit, the frame rate f, and the pixel data amount B_p.
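To make the above conditions concrete, the comparison of total data amounts can be sketched as follows. The snippet is an illustrative aid only: the function and variable names are assumptions chosen for readability, and the numeric values in the example are arbitrary rather than taken from any embodiment.

```python
# Illustrative sketch of the data-amount switching conditions (equations (1)-(4)).
# All names (pixel_count, frame_rate, ...) are assumptions chosen for readability.

def frame_scan_data_amount(pixel_count: int, bits_per_pixel: int, frame_rate: float) -> float:
    """Total data amount per unit time in the frame-scan-based read mode: M * B_p * f."""
    return pixel_count * bits_per_pixel * frame_rate

def event_stream_data_amount(event_count: int, bits_per_event: int) -> float:
    """Total data amount per unit time in the event-stream-based read mode: B_ev * N_ev."""
    return event_count * bits_per_event

def should_switch_to_event_stream(event_count, pixel_count, bits_per_pixel, frame_rate,
                                  bits_per_event, eta=1.0):
    # Equation (1): leave frame scan when B_ev * N_ev <= eta * M * B_p * f,
    # i.e. the event stream would not cost more than the (adjusted) frame-scan budget.
    d1 = eta * frame_scan_data_amount(pixel_count, bits_per_pixel, frame_rate)
    return event_stream_data_amount(event_count, bits_per_event) <= d1

def should_switch_to_frame_scan(event_count, pixel_count, bits_per_pixel, frame_rate,
                                bits_per_event, eta=1.0):
    # Equation (3): leave the event stream when B_ev * N_ev >= eta * M * B_p * f.
    d1 = eta * frame_scan_data_amount(pixel_count, bits_per_pixel, frame_rate)
    return event_stream_data_amount(event_count, bits_per_event) >= d1

# Example: a 640x480 array at 30 fps, 2 bits per pixel per frame, 64 bits per event.
if __name__ == "__main__":
    M, Bp, f, Bev = 640 * 480, 2, 30.0, 64
    print(should_switch_to_event_stream(5_000, M, Bp, f, Bev))    # few events -> True
    print(should_switch_to_frame_scan(1_000_000, M, Bp, f, Bev))  # event surge -> True
```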
In other embodiments, the statistical data may include the number of events N_ev measured by the pixel array circuit 210 per unit time. If the first read mode is a frame-scan-based read mode and the second read mode is an event-stream-based read mode, the control circuit 230 determines the number of light intensity variation events N_ev based on the number of first data signals among the plurality of data signals provided by the read circuit 220. If the statistical data indicates that the number of light intensity variation events N_ev is less than or equal to a first threshold number n_1, the control circuit 230 may determine that the statistical data of the light intensity variation events satisfies the predetermined switching condition. The first threshold number n_1 may be determined based at least on the number of pixels M of the pixel array circuit, the frame rate f and the pixel data amount B_p associated with the frame-scan-based read mode, and the event data amount B_ev associated with the event-stream-based read mode. For example, in the foregoing embodiment, the following formula (5) can be further obtained based on formula (1):

N_ev ≤ η·M·B_p·f / B_ev (5)

That is, the first threshold number n_1 can be determined as n_1 = η·M·B_p·f / B_ev.
In an alternative implementation of the above example, the following formula (6) may be further derived based on formula (2):

N_ev ≤ (M·B_p·f − θ) / B_ev (6)

Correspondingly, the second threshold number n_2 can be determined as n_2 = (M·B_p·f − θ) / B_ev.
In still other embodiments, if the first read mode is an event-stream-based read mode and the second read mode is a frame-scan-based read mode, the control circuit 230 may directly determine the number of light intensity variation events N_ev based on the number of the at least one data signal provided by the read circuit 220. If the statistical data indicates that the number of light intensity variation events N_ev is greater than or equal to a first threshold number n_1, the control circuit 230 determines that the statistical data of the light intensity variation events satisfies the predetermined switching condition. The first threshold number n_1 may be determined based at least on the number of pixels M of the pixel array circuit 210, the frame rate f and the pixel data amount B_p associated with the frame-scan-based read mode, and the event data amount B_ev associated with the event-stream-based read mode. For example, in the foregoing embodiment, the following formula (7) can be further obtained based on formula (3):

N_ev ≥ η·M·B_p·f / B_ev (7)

That is, the first threshold number n_1 can be determined as n_1 = η·M·B_p·f / B_ev.
In an alternative implementation of the above example, the following formula (8) may be further derived based on formula (4):

N_ev ≥ (M·B_p·f − θ) / B_ev (8)

Correspondingly, the second threshold number n_2 can be determined as n_2 = (M·B_p·f − θ) / B_ev.
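The event-count form of the same conditions follows by dividing the data-amount thresholds by B_ev, as in formulas (5) to (8) above. A minimal sketch, again with assumed names and arbitrary example values:

```python
# Illustrative sketch: the switching conditions expressed as event-count thresholds.
# Dividing the threshold data amounts d_1, d_2 by B_ev yields threshold numbers n_1, n_2.

def threshold_n1(pixel_count, bits_per_pixel, frame_rate, bits_per_event, eta=1.0):
    # n_1 = eta * M * B_p * f / B_ev  (from equations (5)/(7))
    return eta * pixel_count * bits_per_pixel * frame_rate / bits_per_event

def threshold_n2(pixel_count, bits_per_pixel, frame_rate, bits_per_event, theta=0.0):
    # n_2 = (M * B_p * f - theta) / B_ev  (from equations (6)/(8))
    return (pixel_count * bits_per_pixel * frame_rate - theta) / bits_per_event

# In the frame-scan-first embodiment the switch fires when N_ev <= n_1 (or n_2);
# in the event-stream-first embodiment it fires when N_ev >= n_1 (or n_2).
M, Bp, f, Bev = 640 * 480, 2, 30.0, 64
print(threshold_n1(M, Bp, f, Bev), threshold_n2(M, Bp, f, Bev, theta=1e6))
```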
It should be understood that the formulas, switching conditions, and associated computing methods given above are merely one example implementation of embodiments of the present application, and that other suitable mode switching conditions, switching strategies, and computing methods may be employed, as the scope of the present application is not limited in this respect.
Fig. 6-a shows a schematic diagram of a vision sensor operating in a frame-scan-based read mode, according to an embodiment of the present application. Fig. 6-b shows a schematic diagram of a vision sensor operating in an event-stream-based read mode, according to an embodiment of the present application. As shown in fig. 6-a, the read circuit 220 or 320 is currently operating in the first read mode, i.e., the frame-scan-based read mode. The control circuit 230 or 330 determines, based on the historical statistics, that the number of events generated in the pixel array circuit 210 or 310 is small, for example only four valid data in one frame of data, and therefore predicts that the event generation rate in the next period is likely to be low. If the read circuit 220 or 320 continues to read in the frame-scan-based read mode, bits must still be allocated for every pixel, including the pixels that generate no event, thereby producing a large amount of redundant data. In this case, the control circuit 230 or 330 transmits a mode switching signal to the read circuit 220 or 320 to switch the read circuit 220 or 320 from the first read mode to the second read mode. After switching, as shown in fig. 6-b, the read circuit 220 or 320 operates in the second read mode and reads only valid data signals, thereby avoiding the transmission bandwidth and memory resources that would otherwise be occupied by a large number of invalid data signals.
Fig. 6-c shows a schematic diagram of a vision sensor operating in an event-stream-based read mode, according to an embodiment of the present application. Fig. 6-d shows a schematic diagram of a vision sensor operating in a frame-scan-based read mode, according to an embodiment of the present application. As shown in fig. 6-c, the read circuit 220 or 320 is currently operating in the first read mode, i.e., the event-stream-based read mode. The control circuit 230 or 330 determines, based on the historical statistics, that the number of events generated in the pixel array circuit 210 or 310 is high, for example nearly all pixels in the pixel array circuit 210 or 310 generate, within a short period of time, data signals indicating that the change in light intensity is above the predetermined threshold, and therefore predicts that the event generation rate in the next period is likely to be high. Because the read data signals would contain a large amount of redundant data, e.g., nearly identical pixel position information and read time stamps, continuing to read in the event-stream-based read mode would cause a surge in the amount of read data. Thus, in this case, the control circuit 230 or 330 transmits a mode switching signal to the read circuit 220 or 320 to switch the read circuit 220 or 320 from the first read mode to the second read mode. After switching, as shown in fig. 6-d, the read circuit 220 or 320 operates in the frame-scan-based read mode, whose per-pixel representation cost is lower, thereby relieving the pressure of storing and transmitting the data signals.
In some possible embodiments, the vision sensor 200 or 300 may further include a parsing circuit that may be configured to parse the data signal output by the reading circuit 220 or 320. In some possible embodiments, the parsing circuit may parse the data signal in a parsing mode that is compatible with the current data read mode of the read circuit 220 or 320. This will be described in detail below.
It should be understood that other existing or future developed data read modes, data parse modes, etc. are also suitable for use in the possible embodiments of the present application, and that all values in the embodiments of the present application are illustrative and not limiting, e.g., the possible embodiments of the present application may be switched between more than two data read modes.
According to a possible embodiment of the present application, a vision sensor chip is provided that is capable of adaptively switching between a plurality of reading modes based on historical statistics of light intensity variation events generated in a pixel array circuit. Therefore, in a dynamic scene or a static scene, the visual sensor chip can always realize good reading and analyzing performances, avoid the generation of redundant data and relieve the pressure of image processing, transmission and storage.
Fig. 7 shows a flow chart of a method for operating a vision sensor chip according to a possible embodiment of the present application. In some possible embodiments, the method may be implemented in the vision sensor 200 shown in fig. 4-a or the vision sensor 300 shown in fig. 4-b and the electronic device shown in fig. 9 below, or may be implemented using any suitable device, including various devices now known or later developed. For ease of discussion, the method will be described below in connection with the vision sensor 200 shown in fig. 4-a.
Referring to fig. 7, a method for operating a vision sensor chip according to an embodiment of the present application may include the steps of:
501. a plurality of data signals corresponding to a plurality of pixels in a pixel array circuit are generated.
The pixel array circuit 210 generates a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit 210 by measuring the amount of change in light intensity. In the context of this document, the data signal may indicate a polarity of light intensity, an absolute light intensity value, a value of a change in light intensity, and the like.
502. At least one of the plurality of data signals is read from the pixel array circuit in a first read mode.
The reading circuit 220 reads at least one data signal of the plurality of data signals from the pixel array circuit 210 in the first reading mode, and the data signals occupy a certain memory and transmission resource within the vision sensor 200 after being read. The manner in which the visual sensor chip 200 reads the data signal may be different depending on the particular read mode. In some possible embodiments, for example, in an event stream based read mode, the read circuit 220 determines the location information (x, y) of the pixel associated with the light intensity variation event by scanning the pixel array circuit 210. Based on the position information, the reading circuit 220 may read out a first data signal of the plurality of data signals. In this embodiment, the reading circuit 220 acquires the characteristic information of the light intensity, the position information (x, y) of the pixel generating the light intensity change event, the time stamp t of the read data signal, and the like by reading the data signal.
In other possible embodiments, the first read mode may be a frame scan based read mode. In this mode, the vision sensor 200 scans the pixel array circuit 210 at a frame frequency associated with a frame scan-based read mode to read all data signals generated by the pixel array circuit 210. In this embodiment, the reading circuit 220 acquires the characteristic information of the light intensity by reading the data signal.
503. At least one data signal is provided to the control circuit.
The read circuit 220 provides the read at least one data signal to the control circuit 230 for data statistics and analysis by the control circuit 230. In some embodiments, the control circuit 230 may determine statistical data related to at least one light intensity variation event based on the at least one data signal. The control circuit 230 may analyze the statistics using a switching policy module. If it is determined that the statistical data satisfies the predetermined switching condition, the control circuit 230 transmits a mode switching signal to the read circuit 220.
In the case where the first read mode is a frame scan based read mode and the second read mode is an event stream based read mode, in some embodiments, the control circuit 230 may determine the number of light intensity variation events based on the number of first data signals in the plurality of data signals. Further, the control circuit 230 compares the number of light intensity variation events with a first threshold number. If the statistical data indicates that the number of light intensity variation events is less than or equal to the first threshold number, the control circuit 230 determines that the statistical data of the light intensity variation events satisfies a predetermined switching condition and transmits a mode switching signal. In this embodiment, the control circuit 230 may determine or adjust the first threshold number based on the number of pixels of the pixel array circuit, the frame rate and the amount of pixel data associated with the frame scan based read mode, and the amount of event data associated with the event stream based read mode.
In the case where the first read mode is an event stream based read mode and the second read mode is a frame scan based read mode, in some embodiments, the control circuit 230 may determine statistical data related to the light intensity variation event based on the first data signal received from the read circuit 220. Further, the control circuit 230 compares the number of light intensity variation events with a second threshold number. If the number of light intensity variation events is greater than or equal to the second threshold number, the control circuit 230 determines that the statistical data of the light intensity variation events satisfies a predetermined switching condition and transmits a mode switching signal. In this embodiment, the control circuit 230 may determine or adjust the second threshold number based on the number of pixels of the pixel array circuit, the frame rate and the amount of pixel data associated with the frame scan based read mode, and the amount of event data associated with the event stream based read mode.
504. Based on the mode switching signal, the first reading mode is switched to the second reading mode.
The reading circuit 220 switches from the first reading mode to the second reading mode based on the mode switching signal received from the control circuit 230. Further, the reading circuit 220 reads at least one data signal generated by the pixel array circuit 210 in the second reading mode. The control circuit 230 may then continue to perform historical statistics on the light intensity variation events generated by the pixel array circuit 210 and, when the switching condition is satisfied again, send a mode switching signal to switch the reading circuit 220 from the second reading mode back to the first reading mode.
According to the method provided by the possible embodiments of the application, the control circuit continuously performs historical statistics and real-time analysis on the light intensity variation events generated in the pixel array circuit throughout the reading and parsing process, and once the switching condition is met, a mode switching signal is sent so that the reading circuit switches from the current reading mode to a more suitable alternative reading mode. The adaptive switching process is repeated until the reading of all data signals is completed.
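A minimal sketch of this adaptive loop (steps 501 to 504), assuming the data-amount comparison above as the switching condition, is given below. The class and function names are illustrative and do not correspond to actual circuit interfaces.

```python
# A minimal control-loop sketch of steps 501-504 under the assumption that the
# switching condition is the data-amount comparison described above.

from dataclasses import dataclass

FRAME_SCAN, EVENT_STREAM = "frame_scan", "event_stream"

@dataclass
class SensorConfig:
    pixel_count: int      # M
    bits_per_pixel: int   # B_p
    frame_rate: float     # f
    bits_per_event: int   # B_ev
    eta: float = 1.0      # switching parameter

def control_step(mode: str, events_last_period: int, cfg: SensorConfig) -> str:
    """Return the read mode to use for the next period based on historical statistics."""
    frame_cost = cfg.eta * cfg.pixel_count * cfg.bits_per_pixel * cfg.frame_rate
    event_cost = events_last_period * cfg.bits_per_event
    if mode == FRAME_SCAN and event_cost <= frame_cost:
        return EVENT_STREAM     # few events: frame data would be mostly redundant
    if mode == EVENT_STREAM and event_cost >= frame_cost:
        return FRAME_SCAN       # event surge: per-event overhead now dominates
    return mode                 # otherwise keep the current read mode

cfg = SensorConfig(pixel_count=640 * 480, bits_per_pixel=2, frame_rate=30.0, bits_per_event=64)
mode = FRAME_SCAN
for n_events in (3_000, 8_000, 900_000, 200_000, 4_000):   # simulated per-period counts
    mode = control_step(mode, n_events, cfg)
    print(n_events, "->", mode)
```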
Fig. 8 shows a block diagram of a control circuit according to a possible embodiment of the present application. The control circuit may be used to implement the control circuit 230 in fig. 4-a, the control circuit 330 in fig. 5, and other suitable devices. It should be understood that the control circuit is described for exemplary purposes only and does not imply any limitation on the scope of the application. Embodiments of the present application may also be embodied in different control circuits. In addition, it should also be understood that the control circuit may include other elements, modules, or entities that are not shown for clarity; this does not mean that embodiments of the present application do not provide them.
As shown in fig. 8, the control circuit includes at least one processor 602, at least one memory 604 coupled to the processor 602, and a communication mechanism 612 coupled to the processor 602. The memory 604 is used at least for storing a computer program and data signals retrieved from a read circuit. A statistical model 606 and a policy module 608 are preconfigured on the processor 602. The control circuit 630 may be communicatively coupled to the read circuit 220 of the vision sensor 200 as shown in fig. 4-a or to a read circuit external to the vision sensor via the communication mechanism 612 to perform control functions thereon. For ease of description, reference is made below to the read circuit 220 in fig. 4-a, but embodiments of the present application are equally applicable to configurations of peripheral read circuits.
Similar to the control circuit 230 shown in fig. 4-a, in some possible embodiments, the control circuit may be configured to control the read circuit 220 to read the plurality of data signals generated by the pixel array circuit 210 in a particular data read mode (e.g., a synchronous read mode based on frame scanning, an asynchronous read mode based on event streams, etc.). In addition, the control circuit may be configured to obtain a data signal from the read circuit 220, which may indicate, but is not limited to, a light intensity polarity, an absolute light intensity value, a change in light intensity value, and the like. For example, the intensity polarity may indicate a trend in the intensity change, such as an increase or decrease, generally indicated by +1/-1. The absolute light intensity value may represent the light intensity value measured at the current time. Depending on the structure, use and kind of sensor, the information about the light intensity or the change in the light intensity may have different physical significance.
The control circuit determines statistical data relating to at least one light intensity variation event based on the data signal obtained from the reading circuit 220. In some embodiments, the control circuitry may obtain data signals generated by the pixel array circuitry 210 over a period of time from the read circuitry 220 and store the data signals in the memory 604 for historical statistics and analysis. In the context of the present application, the first and second read modes may be one of an asynchronous read mode based on an event stream and a synchronous read mode based on a frame scan, respectively. It should be noted that all features described herein with respect to adaptively switching read modes are equally applicable to other types of sensors, as well as data read modes, and switching between more than two data read modes, whether currently known or developed in the future.
In some possible embodiments, the control circuit may utilize one or more preconfigured statistical models 606 to perform historical statistics on the occurrence, over a period of time, of the light intensity variation events generated by the pixel array circuit 210 and provided by the read circuit 220. The statistical model 606 may then transmit the statistical data as an output to the policy module 608. As previously described, the statistical data may indicate the number of light intensity variation events as well as the total data amount of the light intensity variation events. It should be appreciated that any suitable statistical model or statistical algorithm may be applied to possible embodiments of the present application; the scope of the present application is not limited in this respect.
Since the statistical data is a statistic of the historical conditions of the light intensity variation events generated by the vision sensor over a period of time, the policy module 608 can be used to analyze and predict the rate at which events will occur over the next period of time. The policy module 608 may be preconfigured with one or more switching decisions. When there are multiple switching decisions, the control circuit may select one of them for analysis and decision-making as desired, e.g., based on factors such as the type of the vision sensor 200, the nature of the light intensity variation events, the nature of the external environment, and the state of motion. Other suitable policy modules and mode switching conditions or policies may be employed in possible embodiments of the present application; the scope of the present application is not limited in this respect.
In some embodiments, if the policy module 608 determines that the statistical data satisfies the mode switching condition, an indication to switch the read mode is output to the read circuit 220. In another embodiment, if the policy module 608 determines that the statistical data does not satisfy the mode switching condition, no indication to switch the read mode is output to the read circuit 220. In some embodiments, the indication to switch the read mode may take an explicit form as described in the embodiments above, for example a switching signal or a flag bit informing the read circuit 220 to switch the read mode.
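The division of labor between the statistical model 606 and the policy module 608 can be sketched roughly as follows. This is an assumption-laden illustration: the window-based averaging, the registration interface, and the threshold value are all invented for the example and are not taken from the embodiments.

```python
# Rough sketch of a statistical model feeding a policy module that holds one or
# more switching decisions; names and thresholds are assumptions for illustration.

from collections import deque
from typing import Callable, Deque, Dict

class StatisticalModel:
    """Keeps a sliding window of per-period event counts (the 'historical statistics')."""
    def __init__(self, window: int = 10):
        self.counts: Deque[int] = deque(maxlen=window)
    def update(self, n_events: int) -> float:
        self.counts.append(n_events)
        return sum(self.counts) / len(self.counts)   # mean event rate over the window

class PolicyModule:
    """Holds one or more switching decisions and applies the selected one."""
    def __init__(self):
        self.decisions: Dict[str, Callable[[float], bool]] = {}
    def register(self, name: str, decision: Callable[[float], bool]) -> None:
        self.decisions[name] = decision
    def should_switch(self, name: str, mean_rate: float) -> bool:
        return self.decisions[name](mean_rate)

# Usage: a single threshold-based decision; others could be registered for other scenarios.
model, policy = StatisticalModel(), PolicyModule()
policy.register("event_rate_threshold", lambda rate: rate >= 10_000)
for n in (500, 1_200, 30_000, 45_000):
    rate = model.update(n)
    if policy.should_switch("event_rate_threshold", rate):
        print("send mode switching signal (mean rate =", rate, ")")
```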
Fig. 9 shows a block diagram of an electronic device according to a possible embodiment of the present application. As shown in fig. 9, the electronic device includes a vision sensor chip 901, communication interfaces 902 and 903, a control circuit 930, and a parsing circuit 904. It should be appreciated that the electronic device is described for exemplary purposes and may be implemented using any suitable device, including various sensor devices currently known or developed in the future. Embodiments of the present application may also be embodied in different sensor systems. In addition, it should also be understood that the electronic device may include other elements, modules, or entities that are not shown for clarity; this does not mean that embodiments of the present application do not provide them.
As shown in fig. 9, the visual sensor includes a pixel array circuit 710 and a read circuit 720, wherein read components 720-1 and 720-2 of the read circuit 720 are coupled to a control circuit 730 via communication interfaces 702 and 703, respectively. In embodiments of the present application, the reading components 720-1 and 720-2 may be implemented using separate devices, or may be integrated into the same device. For example, the read circuit 220 shown in FIG. 4-a is an example implementation of an integration. For ease of description, the reading components 720-1 and 720-2 may be configured to implement data reading functions in a frame scan based reading mode and an event stream based reading mode, respectively.
Pixel array circuit 710 may be implemented using pixel array circuit 210 in fig. 4-a or pixel array circuit 310 in fig. 5, as well as any suitable other device, as the application is not limited in this respect. The features of the pixel array circuit 710 are not described in detail herein.
The read circuit 720 may read the data signals generated by the pixel array circuit 710 in a specific read mode. For example, when the read component 720-1 is turned on and the read component 720-2 is turned off, the read circuit 720 initially reads the data signals in the frame-scan-based read mode. When the read component 720-2 is turned on and the read component 720-1 is turned off, the read circuit 720 initially reads the data signals in the event-stream-based read mode. The read circuit 720 may be implemented using the read circuit 220 of fig. 4-a, the read circuit 320 of fig. 5, or any suitable other device; the features of the read circuit 720 are not described in detail herein.
In embodiments of the present application, the control circuit 730 may instruct the read circuit 720 to switch from the first read mode to the second read mode by way of an instruction signal or a flag bit. In this case, the read circuit 720 may receive an indication from the control circuit 730 to switch the read mode, for example, to switch the read component 720-1 on and switch the read component 720-2 off, or to switch the read component 720-2 on and switch the read component 720-1 off.
As previously described, the electronic device may also include a parsing circuit 704. The parsing circuit 704 may be configured to parse the data signals read by the reading circuit 720. In a possible embodiment of the present application, the parsing circuit may employ a parsing mode that matches the current data read mode of the reading circuit 720. As an example, if the reading circuit 720 initially reads the data signals in the event-stream-based read mode, the parsing circuit accordingly parses the data based on the first data amount B_ev·N_ev associated with that read mode. When the reading circuit 720 switches from the event-stream-based read mode to the frame-scan-based read mode based on the indication of the control circuit 730, the parsing circuit starts to parse the data signals according to the second data amount, i.e., one frame of data of size M·B_p, and vice versa.
In some embodiments, the parsing circuit 704 may switch its parsing mode without requiring an explicit switching signal or flag bit. For example, the parsing circuit 704 may employ the same or corresponding statistical model and switching strategy as the control circuit 730, so as to perform the same statistical analysis on the data signals provided by the reading circuit 720 and reach switching predictions consistent with those of the control circuit 730. As an example, if the reading circuit 720 initially reads the data signals in the event-stream-based read mode, the parsing circuit initially parses the data based on the first data amount B_ev·N_ev associated with that read mode: the first b_x bits parsed by the parsing circuit indicate the x coordinate of the pixel, the next b_y bits indicate the y coordinate of the pixel, the following b_t bits indicate the reading time, and the final b_f bits indicate the characteristic information of the light intensity. The parsing circuit obtains at least one data signal from the reading circuit 720 and determines statistical data related to at least one light intensity variation event. If the parsing circuit 704 determines that the statistical data satisfies the switching condition, it switches to the parsing mode corresponding to the frame-scan-based read mode and parses the data signals according to the frame data size M·B_p.
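The event-stream parsing order described above (b_x bits, then b_y, b_t, and b_f) can be illustrated with the following sketch; the bit widths chosen here are assumptions, not values given in the embodiments.

```python
# Hedged sketch of event-stream parsing: each event occupies b_x + b_y + b_t + b_f
# bits, read in that order. The widths below are assumptions for illustration.

B_X, B_Y, B_T, B_F = 10, 10, 20, 8          # assumed widths of x, y, timestamp, feature
EVENT_BITS = B_X + B_Y + B_T + B_F          # B_ev

def parse_event_stream(bits: str):
    """Parse a string of '0'/'1' characters into (x, y, t, feature) tuples."""
    events = []
    for i in range(0, len(bits) - len(bits) % EVENT_BITS, EVENT_BITS):
        chunk = bits[i:i + EVENT_BITS]
        x = int(chunk[0:B_X], 2)
        y = int(chunk[B_X:B_X + B_Y], 2)
        t = int(chunk[B_X + B_Y:B_X + B_Y + B_T], 2)
        f = int(chunk[B_X + B_Y + B_T:], 2)
        events.append((x, y, t, f))
    return events

# Example: two synthetic events packed back to back.
sample = format(3, "010b") + format(7, "010b") + format(1234, "020b") + format(200, "08b")
print(parse_event_stream(sample * 2))
```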
As another example, if the reading circuit 720 initially reads the data signals in the frame-scan-based read mode, the parsing circuit 704 parses the data signals in the parsing mode corresponding to that read mode, taking the value of each pixel position within the frame in turn every B_p bits, where the value at a pixel position at which no light intensity change event occurred is 0. The parsing circuit 704 may then count the number of non-zero values, i.e., the number of light intensity change events in a frame, based on the data signal. See the sketch after this paragraph.
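A companion sketch for the frame-based parsing mode, again with assumed bit width and frame size, shows how the non-zero pixel values can be counted as events:

```python
# Sketch of frame-scan parsing: the frame is a flat sequence of B_p-bit pixel values,
# and pixels with value 0 generated no light intensity change event. Sizes are assumed.

B_P = 2                                     # assumed bits per pixel in frame-scan mode
WIDTH, HEIGHT = 8, 4                        # assumed tiny frame for the example

def parse_frame(bits: str):
    """Return (values, event_count): per-pixel values and the number of non-zero pixels."""
    values = [int(bits[i:i + B_P], 2) for i in range(0, WIDTH * HEIGHT * B_P, B_P)]
    event_count = sum(1 for v in values if v != 0)   # non-zero pixels generated events
    return values, event_count

frame = "00" * (WIDTH * HEIGHT - 3) + "01" + "10" + "01"   # three events in the frame
print(parse_frame(frame)[1])                               # -> 3
```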
In some possible embodiments, the parsing circuit 704 obtains at least one data signal from the reading circuit 720 and determines which of the current parsing mode and the alternative parsing mode corresponds to the reading mode of the reading circuit 720 based at least on the at least one data signal. Further, in some embodiments, resolution circuitry 704 may switch from the current resolution mode to another resolution mode based on the determination.
In some possible embodiments, the resolution circuit 704 may determine whether to switch resolution modes based on historical statistics of light intensity change events. For example, the parsing circuit 704 may determine statistical data related to at least one light intensity variation event based on at least one data signal received from the reading circuit 720. If the statistical data is determined to satisfy the switching condition, the parsing circuit 704 switches from the current parsing mode to an alternative parsing mode. For ease of comparison, the statistics may be used to measure the read data rates of the first and second read modes, respectively, of the read circuit 720.
In some embodiments, the statistics may include a total amount of data for the number of events measured by pixel array circuit 710 per unit time. If the parsing circuit 704 determines, based on the at least one data signal, that the total data amount of light intensity variation events read by the reading circuit 720 in the first reading mode has been greater than or equal to the total data amount of light intensity variation events of its second reading mode, it indicates that the reading circuit 720 has switched from the first reading mode to the second reading mode. In this case, the parsing circuit 704 should switch to the parsing mode corresponding to the current reading mode accordingly.
In some embodiments, the given first read mode is a frame-scan-based read mode and the second read mode is an event-stream-based read mode. In this embodiment, the parsing circuit 704 initially parses the data signals acquired from the reading circuit 720 in a frame-based parsing mode corresponding to the first read mode. The parsing circuit 704 may determine the total data amount M·B_p·f of the light intensity variation events read by the reading circuit 720 in the first reading mode based on the number of pixels M of the pixel array circuit 710, the frame rate f, and the pixel data amount B_p. The parsing circuit 704 may determine the total data amount B_ev·N_ev of the light intensity variation events read by the reading circuit 720 in the second reading mode based on the number of light intensity variation events N_ev and the event data amount B_ev associated with the event-stream-based read mode. In some embodiments, a switching parameter may be utilized to adjust the relationship between the total data amounts in the two read modes. Further, the parsing circuit 704 may determine, for example according to equation (1) above, whether the total data amount M·B_p·f of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount B_ev·N_ev of the light intensity variation events of the second reading mode. If so, the parsing circuit 704 determines that the reading circuit 720 has switched to the event-stream-based read mode and accordingly switches from the frame-based parsing mode to the event-stream-based parsing mode.
In an alternative implementation of the above embodiment, the parsing circuit 704 may determine, according to equation (2) above, whether the total data amount M·B_p·f of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount B_ev·N_ev of the light intensity variation events read in the second reading mode. Similarly, upon determining that the total data amount M·B_p·f of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount B_ev·N_ev of the light intensity variation events of the second reading mode, the parsing circuit 704 determines that the reading circuit 720 has switched to the event-stream-based read mode and accordingly switches from the frame-based parsing mode to the event-stream-based parsing mode.
In some embodiments, the first read mode is an event-stream-based read mode and the second read mode is a frame-scan-based read mode. In this embodiment, the parsing circuit 704 initially parses the data signals acquired from the reading circuit 720 in an event-stream-based parsing mode corresponding to the first read mode. As described above, the parsing circuit 704 can directly determine the number N_ev of light intensity variation events generated in the pixel array circuit 710 based on the number of first data signals provided by the reading circuit 720. The parsing circuit 704 may determine the total data amount B_ev·N_ev of the events read by the reading circuit 720 in the first reading mode based on the number of events N_ev and the event data amount B_ev associated with the event-stream-based read mode. Similarly, the parsing circuit 704 may also determine the total data amount M·B_p·f of the light intensity variation events read by the reading circuit 720 in the second reading mode based on the number of pixels M of the pixel array circuit, the frame rate f, and the pixel data amount B_p. The parsing circuit 704 may then determine, for example according to equation (3) above, whether the total data amount B_ev·N_ev of the light intensity variation events read in the first reading mode is greater than or equal to the total data amount M·B_p·f of the light intensity variation events of the second reading mode. When the parsing circuit 704 determines that the total data amount B_ev·N_ev of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount M·B_p·f of the light intensity variation events of the second reading mode, the parsing circuit 704 determines that the reading circuit 720 has switched to the frame-scan-based read mode and accordingly switches from the event-stream-based parsing mode to the frame-based parsing mode.
In an alternative implementation of the above embodiment, the parsing circuit 704 may determine, according to equation (4) above, whether the total data amount B_ev·N_ev of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount M·B_p·f of the light intensity variation events read in the second reading mode. Similarly, upon determining that the total data amount B_ev·N_ev of the light intensity variation events read by the reading circuit 720 in the first reading mode is greater than or equal to the total data amount M·B_p·f of the light intensity variation events of the second reading mode, the parsing circuit 704 determines that the reading circuit 720 has switched to the frame-scan-based read mode and accordingly switches from the event-stream-based parsing mode to the frame-scan-based parsing mode.
For the read time t of an event in the frame-scan-based read mode, by default all events within the same frame share the same read time t. In cases where the accuracy requirement for the event read time is higher, the read time of each event can be further determined as follows. Taking the above embodiment as an example, in the frame-scan-based read mode the reading circuit 720 scans the pixel array circuit at a frequency of f Hz, the time interval between reading two adjacent frames of data is S = 1/f, and the start time of each frame is given by:
T_k = T_0 + k·S (9)
where T_0 is the start time of the first frame and k is the frame number. The time required for the conversion of one of the M pixels can be determined by the following equation (10):

Δt = S / M (10)

The time at which the light intensity variation event occurs at the i-th pixel in the k-th frame can then be determined by the following equation (11):

t_{k,i} = T_k + i·Δt = T_0 + k·S + i·S/M (11)

where i is a positive integer. If the current parsing mode is the synchronous parsing mode and a switch to the asynchronous mode is determined, the data are then parsed at B_ev bits per event. In the above embodiment, the switching of the parsing mode may be implemented without explicit switching signals or flag bits. For other data reading modes known at present or developed in the future, the parsing circuit may parse the data in a similar manner adapted to the data reading mode, which will not be described herein.
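Under the reconstruction of equations (9) to (11) given above, a per-event read time can be recovered as in the following sketch; the parameter values in the example are arbitrary.

```python
# Sketch of equations (9)-(11): recovering a per-event read time in the frame-scan
# mode, assuming the frame period S = 1/f is shared evenly by the M pixels.

def event_time(t0: float, frame_rate: float, pixel_count: int, k: int, i: int) -> float:
    """Time of the light intensity variation event at the i-th pixel of the k-th frame."""
    s = 1.0 / frame_rate          # frame interval S; equation (9): T_k = T_0 + k*S
    dt = s / pixel_count          # conversion time per pixel, equation (10)
    return t0 + k * s + i * dt    # equation (11)

# Example: 30 fps, 640x480 pixels, event at pixel index 1000 of frame 5.
print(event_time(t0=0.0, frame_rate=30.0, pixel_count=640 * 480, k=5, i=1000))
```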
Fig. 10 shows a schematic diagram of the data amount over time of a single data read mode and of an adaptively switched read mode according to a possible embodiment of the present application. The left half of fig. 10 depicts the amount of read data over time of a conventional vision sensor or sensor system employing either the synchronous read mode or the asynchronous read mode alone. In the case of purely using the synchronous read mode, as shown by curve 1001, since each frame has a fixed data amount, the read data amount remains unchanged over time, i.e., the read data rate (the amount of data read per unit time) is stable. As described above, when a large number of events are generated in the pixel array circuit, it is reasonable to read the data signals in the frame-scan-based read mode: most of the frame data is valid data representing the generated events, and there is little redundancy. When fewer events are generated in the pixel array circuit, a large amount of invalid data exists in one frame, because the frame data structure still represents and reads the light intensity information at pixels that generated no event; this produces redundancy and wastes transmission bandwidth and storage resources.
In the case of using the asynchronous read mode alone, as shown by curve 1002, the amount of read data varies with the rate at which events occur, and thus the read data rate is not fixed. When fewer events are generated in the pixel array circuit, only a small number of bits need to be allocated per event to represent the coordinate information (x, y) of the pixel, the time stamp t at which the data signal is read, and the characteristic information f of the light intensity; the total amount of data to be read is small, and in this case it is reasonable to adopt the asynchronous read mode. When a large number of events are generated in the pixel array circuit in a short time, a large number of bits must be allocated to represent the events. However, the pixel coordinates of these events are almost adjacent, and the read times of the data signals are almost the same. In other words, there is a large amount of repeated data in the read event data, so the asynchronous read mode also suffers from redundancy; in this case the read data rate may even exceed that of the synchronous read mode, and it is no longer reasonable to continue using the asynchronous read mode.
The right half of fig. 10 depicts the data amount over time in an adaptive data read mode according to a possible embodiment of the present application. The adaptive data read mode may be implemented using the vision sensor 200 shown in fig. 4-a, the vision sensor 300 shown in fig. 4-b, or the electronic device shown in fig. 9; alternatively, a conventional vision sensor or sensor system may implement the adaptive data read mode by using the control circuit shown in fig. 8. For ease of description, features relating to the adaptive data read mode are described below with reference to the vision sensor 200 shown in fig. 4-a. As shown by curve 1003, the vision sensor 200 selects, for example, the asynchronous read mode in the initialized state. Since the number of bits B_ev used to represent each event in this mode is predetermined (e.g., B_ev = b_x + b_y + b_t + b_f), the vision sensor 200 can count the read data rate in the current mode as events are generated and read. On the other hand, the number of bits B_p used to represent each pixel of each frame in the synchronous read mode is also predetermined, so the read data rate that the synchronous read mode would have had during the same period can also be calculated. The vision sensor 200 may then determine whether the relationship between the data rates in the two read modes satisfies the mode switching condition. For example, the vision sensor 200 may compare, based on a predefined threshold, which of the two read modes has the smaller read data rate. Upon determining that the mode switching condition is met, the vision sensor 200 switches to the other read mode, for example from the initial asynchronous read mode to the synchronous read mode. These steps are performed continuously during the reading and parsing of the data signals until the output of all data is completed. As shown by curve 1003, the vision sensor 200 adaptively selects an optimal read mode throughout the data reading process, with the two read modes alternating, so that the read data rate of the vision sensor 200 never exceeds the read data rate of the synchronous read mode, thereby reducing the cost of data transmission, parsing and storage of the vision sensor.
In addition, according to the adaptive data reading method provided in the embodiment of the present application, the vision sensor 200 may perform historical statistics on the events to predict the possible event generation rate in the next time period, so that a reading mode more suitable for the application scenario and the motion state can be selected.
Through the scheme, the visual sensor can be adaptively switched among a plurality of data reading modes, so that the reading data rate is always kept not to exceed a preset reading data rate threshold, the cost of data transmission, analysis and storage of the visual sensor is reduced, and the performance of the sensor is obviously improved. In addition, such visual sensors can make data statistics on events generated over a period of time for predicting the likely event generation rate over the next period of time, thus enabling selection of a reading mode that is more appropriate for the current external environment, application scenario, and motion state.
The above describes that the pixel array circuit may be used to measure the amount of change in light intensity and generate a plurality of data signals corresponding to a plurality of pixels. The data signal may indicate a polarity of the light intensity, an absolute light intensity value, a value of a change in the light intensity, and the like. The pixel array circuit outputs the data signal as described in detail below.
Fig. 11 shows a schematic diagram of a pixel circuit 900 provided herein. Each of the pixel array circuit 210, the pixel array circuit 310, and the pixel array circuit 710 may include one or more pixel arrays, and each pixel array includes a plurality of pixels. Each pixel may be regarded as one pixel circuit, and each pixel circuit is used to generate the data signal corresponding to that pixel. Referring to fig. 11, a schematic diagram of a preferred pixel circuit according to an embodiment of the present application is provided; the present application also sometimes refers to one pixel circuit simply as one pixel. As shown in fig. 11, a preferred pixel circuit in the present application includes a light intensity detection unit 901, a threshold comparison unit 902, a readout control unit 903, and a light intensity acquisition unit 904.
The light intensity detection unit 901 is configured to convert the acquired optical signal into a first electrical signal. The light intensity detection unit 901 may monitor the light intensity information irradiating the pixel circuit in real time, convert the acquired optical signal into an electrical signal in real time, and output the electrical signal. In some possible embodiments, the light intensity detection unit 901 may convert the acquired optical signal into a voltage signal. The specific structure of the light intensity detection unit is not limited; any structure capable of converting an optical signal into an electrical signal may be adopted in the embodiments of the present application. For example, the light intensity detection unit may include a photodiode and a transistor, where the anode of the photodiode is grounded, the cathode of the photodiode is connected to the source of the transistor, and the drain and gate of the transistor are connected to a power supply.
The threshold comparison unit 902 is configured to determine whether the first electrical signal is greater than a first target threshold or less than a second target threshold. When the first electrical signal is greater than the first target threshold, or the first electrical signal is less than the second target threshold, the threshold comparison unit 902 outputs a first data signal, where the first data signal is used to indicate that a light intensity change event has occurred at the pixel. The threshold comparison unit 902 is used to compare whether the difference between the current light intensity and the light intensity at the time the last event was generated exceeds a predetermined threshold, which can be understood with reference to formula 1-1. The first target threshold may be understood as the sum of the first predetermined threshold and a second electrical signal, and the second target threshold may be understood as the sum of the second predetermined threshold and the second electrical signal, where the second electrical signal is the electrical signal output by the light intensity detection unit 901 at the time the last event occurred. The threshold comparison unit in the embodiments of the present application may be implemented by hardware or software, which is not limited in the embodiments of the present application. The type of the first data signal output by the threshold comparison unit 902 may differ. In some possible embodiments, the first data signal comprises polarity information, such as +1 or -1, indicating an increase or decrease in light intensity. In some possible embodiments, the first data signal may be an activation signal for instructing the readout control unit 903 to control the light intensity acquisition unit 904 to acquire and buffer the first electrical signal. When the first data signal is an activation signal, the first data signal may also carry polarity information, and when the readout control unit 903 acquires the first data signal, it controls the light intensity acquisition unit 904 to acquire the first electrical signal.
The readout control unit 903 is further configured to notify the reading circuit to read the first electrical signal stored in the light intensity acquisition unit 904, or to notify the reading circuit to read the first data signal, i.e., the polarity information, output by the threshold comparison unit 902.
The read circuit 905 may be configured to scan pixels in the pixel array circuit in a predetermined order to read data signals generated by the corresponding pixels. In some possible embodiments, the read circuit 905 may be understood with reference to the read circuit 220, the read circuit 320, and the read circuit 720, i.e., the read circuit 905 is configured to be capable of reading the data signals output by the pixel circuits in more than one signal read mode. For example, the reading circuit 905 may read in one of a first reading mode and a second reading mode, which respectively correspond to one of a frame scan-based reading mode and an event stream-based reading mode. In some possible embodiments, the read circuit 905 may also read the data signals output by the pixel circuits in only one signal read mode, such as the read circuit 905 being configured to read the data signals output by the pixel circuits in only a frame scan based read mode, or the read circuit 905 being configured to read the data signals output by the pixel circuits in only an event stream based read mode. In the embodiment corresponding to fig. 11, the data signals read by the reading circuit 905 are represented differently, that is, in some possible embodiments, the data signals read by the reading circuit are represented by polarity information, for example, the reading circuit may read the polarity information output by the threshold comparing unit; in some possible embodiments, the data signal read by the reading circuit may be represented by light intensity information, for example, the reading circuit may read the electrical signal buffered by the light intensity acquisition unit.
Referring to fig. 11-a, events represented by light intensity information and events represented by polarity information are described, taking as an example the reading of the data signals output by the pixel circuits in the event-stream-based read mode. As shown in the upper half of fig. 11-a, the black dots represent pixels generating light intensity change events; a total of 8 events are included in fig. 11-a, of which the first 5 events are represented by light intensity information and the last 3 events are represented by polarity information. As shown in the lower half of fig. 11-a, an event represented by light intensity information and an event represented by polarity information each need to include coordinate information (x, y) and time information t; the difference is that for an event represented by light intensity information the characteristic information m of the light intensity is the light intensity information a, whereas for an event represented by polarity information the characteristic information m of the light intensity is the polarity information p. The differences between light intensity information and polarity information have been described above and are not repeated here; it is only emphasized that the data amount of an event represented by polarity information is smaller than that of an event represented by light intensity information.
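The two event representations of fig. 11-a can be compared with a small sketch; the field widths below are assumptions used only to make the size difference concrete.

```python
# Illustrative comparison of the two event representations: both carry (x, y, t);
# one carries light intensity information a, the other a 1-bit polarity p.
# All bit widths are assumptions, not values from the embodiments.

from dataclasses import dataclass

B_X, B_Y, B_T = 10, 10, 20
B_A, B_P = 10, 1                      # assumed widths of intensity vs. polarity fields

@dataclass
class IntensityEvent:                 # event represented by light intensity information
    x: int; y: int; t: int; a: int
    BITS = B_X + B_Y + B_T + B_A      # 50 bits per event under these assumptions

@dataclass
class PolarityEvent:                  # event represented by polarity information
    x: int; y: int; t: int; p: int    # p in {+1, -1}
    BITS = B_X + B_Y + B_T + B_P      # 41 bits per event under these assumptions

# The polarity representation is always the cheaper of the two per event.
print(IntensityEvent.BITS, PolarityEvent.BITS)
```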
Whether the data signal read by the reading circuit is represented by polarity information or by light intensity information is determined in accordance with an instruction from the control circuit, as described in detail below.
In some implementations, the read circuit 905 may be configured to provide the read at least one data signal to the control circuit 906. For example, the read circuit 905 may provide the total data amount of the data signals read over a period of time to the control circuit 906 for historical data statistics and analysis by the control circuit 906. In one embodiment, the read circuit 905 counts the number of events N_ev generated per second by the pixel array circuit based on the number of light intensity change events generated by each pixel circuit 900 in the pixel array circuit over a period of time, where N_ev may be acquired in either the frame-scan-based read mode or the event-stream-based read mode.
The control circuit 906 is coupled to the read circuit 905 and is configured to control the read circuit 905 to read the data signals generated by the pixel circuit 900 in a particular event representation. In some possible embodiments, the control circuit 906 may obtain at least one data signal from the read circuit 905 and determine, based at least on the at least one data signal, which of the current event representation and the alternative event representation is more suitable for the current application scenario and motion state. Further, in some embodiments, the control circuit 906 may instruct the read circuit 905 to switch from the current event representation to the other event representation based on that determination.
In some possible embodiments, the control circuit 906 may send an indication to switch the event representation to the read circuit 905 based on historical statistics of light intensity change events. For example, the control circuit 906 may determine statistical data related to at least one light intensity change event based on at least one data signal received from the read circuit 905. If the statistical data is determined to satisfy the predetermined conversion condition, the control circuit 906 transmits an indication signal to the read circuit 905 to cause the read circuit 905 to switch the event representation in which it reads.
In some possible embodiments, provided that the read circuit 905 is configured to read the data signals output by the pixel circuits only in the event-stream-based read mode, the data provided by the read circuit 905 to the control circuit 906 is the total data amount of the events (light intensity change events) measured by the pixel array circuit per unit time. Assuming that the control circuit 906 currently controls the read circuit 905 to read the data output by the threshold comparison unit 902, i.e., events are represented by polarity information, the read circuit 905 may determine the total data amount N_ev×H of the light intensity change events from the number of events N_ev and the bit width H of the data format, where the bit width of the data format is H = b_x + b_y + b_t + b_p and b_p is the number of bits used to represent the polarity information of the light intensity indicated by the data signal, typically 1 to 2 bits. Since the polarity information of the light intensity is generally expressed in 1 to 2 bits, the total data amount of events represented by polarity information is necessarily smaller than the bandwidth. In order to transmit event data of higher accuracy as far as possible without exceeding the bandwidth limit, if the total data amount of the events, when represented by light intensity information, would also be less than or equal to the bandwidth, the read circuit switches to representing events by light intensity information. In some embodiments, a conversion parameter may be used to adjust the relationship between the data amount in one event representation and the bandwidth K; as shown in the following equation (12), when the total data amount N_ev×H of the events is less than or equal to the adjusted bandwidth, the conversion to representing events by light intensity information is made:
N_ev × H ≤ α × K (12)
where α is the conversion parameter used for adjustment. As can be further derived from the above equation (12), if the total data amount of the events represented by light intensity information is less than or equal to the bandwidth, the control circuit 906 may determine that the statistical data of the light intensity change events satisfies the predetermined conversion condition. Possible application scenarios include cases where the pixel acquisition circuit generates few events within a period of time, or where the rate at which the pixel acquisition circuit generates events is slow within a period of time. In these cases the events can be represented by light intensity information; since events represented by light intensity information carry more information, subsequent processing and analysis of the events is facilitated, for example the quality of image reconstruction can be improved.
In some embodiments, assuming that the control circuit 906 currently controls the read circuit 905 to read the electrical signal buffered by the light intensity acquisition unit 904, i.e., events are represented by light intensity information, the read circuit 905 may determine the total data amount N_ev×H of the light intensity change events from the number of events N_ev and the bit width H of the data format. When the event-stream-based read mode is employed, the bit width of the data format is H = b_x + b_y + b_t + b_a, where b_a is the number of bits used to represent the light intensity information indicated by the data signal, typically several bits, such as 8 to 12 bits. In some embodiments, a conversion parameter may be used to adjust the relationship between the data amount in one event representation and the bandwidth K; as shown in the following equation (13), when the total data amount N_ev×H of events represented by light intensity information is greater than the adjusted bandwidth, the read circuit 905 should read the data output by the threshold comparison unit 902, i.e., convert to events represented by polarity information:
N_ev × H > β × K (13)
where β is the conversion parameter used for adjustment. From the above equation (13), it can be further derived that if the total data amount N_ev×H of the light intensity change events is greater than the threshold data amount β×K, indicating that the total data amount of the light intensity change events represented by light intensity information is greater than or equal to the bandwidth, the control circuit 906 may determine that the statistical data of the light intensity change events satisfies the predetermined conversion condition. Possible application scenarios include cases where the pixel acquisition circuit generates a large number of events within a period of time, or where the rate at which the pixel acquisition circuit generates events is high within a period of time. In these cases, if light intensity information continues to be used to represent events, events may be lost; representing events by polarity information instead relieves the pressure of data transmission and reduces the loss of data.
In some embodiments, the data provided by the read circuit 905 to the control circuit 906 is the number of events N_ev measured by the pixel array circuit per unit time. In some possible embodiments, provided that the control circuit 906 currently controls the read circuit 905 to read the data output by the threshold comparison unit 902, i.e., events are represented by polarity information, the control circuit can compare the number of light intensity change events N_ev with α×K/H and determine whether the predetermined conversion condition is satisfied. If N_ev is less than or equal to α×K/H, the read circuit 905 should read the electrical signal buffered in the light intensity acquisition unit 904, i.e., convert to events represented by light intensity information, converting the current representation of events by polarity information into representation by light intensity information. For example, in the foregoing embodiment, the following formula (14) can be further derived based on formula (12):

N_ev ≤ α × K / H (14)
In some embodiments, assume that the control circuit 906 currently controls the reading circuit 905 to read the electrical signals buffered by the light intensity acquisition unit 904, i.e. events are represented by light intensity information. The control circuit 906 may compare the number of light intensity change events N_ev with β × K / H and determine whether the predetermined conversion condition is satisfied. If N_ev is greater than β × K / H, the reading circuit 905 should read the signals output by the threshold comparing unit 902, i.e. switch to representing events by polarity information, converting the current events represented by light intensity information into events represented by polarity information. For example, in the foregoing embodiment, the following formula (15) can be further derived from formula (13):

N_ev > β × K / H    (15)
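For illustration only, the following Python sketch models the event-stream switching conditions of formulas (12) to (15); the function name should_switch, the default values of α and β, and the default bit widths are assumptions introduced here and are not taken from the embodiment.

```python
# A minimal sketch, under stated assumptions, of the switching conditions in
# formulas (12)-(15) for the event-stream-based read mode.

def should_switch(current_repr, n_ev, k, h_intensity, alpha=0.9, beta=0.9):
    """Decide whether the control circuit should emit a conversion signal.

    current_repr: 'polarity' or 'intensity', the representation in use.
    n_ev:         number of light intensity change events per unit time.
    k:            preset bandwidth K (bits per unit time).
    h_intensity:  bit width H = b_x + b_y + b_t + b_a of one event when
                  represented by light intensity information.
    """
    if current_repr == "polarity":
        # Formulas (12)/(14): switch to light intensity information if the
        # resulting total data amount N_ev * H would stay within alpha * K.
        return n_ev * h_intensity <= alpha * k
    # Formulas (13)/(15): switch back to polarity information if the
    # projected total data amount N_ev * H exceeds beta * K.
    return n_ev * h_intensity > beta * k


# With K = 1000 bps, H = 5 + 4 + 10 + 8 = 27 bits and 30 events per second:
# 30 * 27 = 810 <= 0.9 * 1000, so a sensor currently in polarity mode would
# switch to the light intensity representation.
print(should_switch("polarity", n_ev=30, k=1000, h_intensity=27))
```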
in some possible embodiments, assuming that the read circuit 905 is configured to read the data signal output by the pixel circuit only in a read mode based on frame scanning, the data supplied by the read circuit 905 to the control circuit 906 is the total data amount of the number of events (light intensity conversion events) measured by the pixel array circuit per unit time. When a read mode based on frame scanning is employed, the bit width h=b of the data format p ,B p For the amount of pixel data (e.g., the number of bits) allocated for each pixel in the frame-scan-based read mode, when an event is represented by polarity information, B p Typically 1 to 2 bitsThe event is represented by the light intensity information, typically 8 bits to 12 bits. The read circuit 905 may determine the total data amount M x H of the light intensity variation events, where M represents the total number of pixels. It is assumed that the current control circuit 906 controls the read circuit 905 to read the data output from the threshold value comparing unit 902, that is, an event is represented by polarity information. The total data amount of the events represented by the polarity information must be smaller than the bandwidth, in order to ensure that the event data with higher accuracy can be transmitted as much as possible without exceeding the bandwidth limit, if the total data amount of the events represented by the light intensity information is also smaller than or equal to the bandwidth, the transition is made to the event represented by the light intensity information. In some embodiments, conversion parameters may be used to adjust the relationship between the amount of data and bandwidth K for one event representation, as shown in equation (16) below, the total amount of data N for an event represented by the intensity information ev X H is less than or equal to the bandwidth.
M × H ≤ α × K    (16)
In some embodiments, assume that the control circuit 906 currently controls the reading circuit 905 to read the electrical signals buffered by the light intensity acquisition unit 904, i.e. events are represented by light intensity information. The reading circuit 905 may determine the total data amount M × H of the light intensity change events. In some embodiments, a switching parameter may be used to adjust the relationship between the data amount under one event representation and the bandwidth K; as shown in formula (17) below, if the total data amount M × H of events represented by light intensity information is greater than the threshold data amount, the reading circuit 905 should read the data output by the threshold comparing unit 902, i.e. switch to representing events by polarity information:
M × H > α × K    (17)
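Similarly, for illustration only, the frame-scan conditions (16) and (17) can be sketched as follows; the function name, parameter names and default values are assumptions introduced here.

```python
# A minimal sketch, under stated assumptions, of the frame-scan conditions in
# formulas (16) and (17). In this read mode the per-frame data amount is
# M * H, where H is the per-pixel bit width of the light intensity
# representation.

def should_switch_frame_scan(current_repr, total_pixels_m, h_intensity, k, alpha=0.9):
    data_amount = total_pixels_m * h_intensity
    if current_repr == "polarity":
        # Formula (16): switch to light intensity information if M * H <= alpha * K.
        return data_amount <= alpha * k
    # Formula (17): switch to polarity information if M * H > alpha * K.
    return data_amount > alpha * k


# 100 pixels at 8 bits each: 800 <= 0.9 * 1000, so a sensor in polarity mode
# would switch to the light intensity representation.
print(should_switch_frame_scan("polarity", total_pixels_m=100, h_intensity=8, k=1000))
```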
In some possible embodiments, assume that the reading circuit 905 is configured to read in one of a first read mode and a second read mode, which correspond respectively to the frame-scan-based read mode and the event-stream-based read mode. The following describes, as an example, how the control circuit determines whether the switching condition is satisfied when the reading circuit 905 currently reads the data signals output by the pixel circuits in the event-stream-based read mode and the control circuit 906 controls the reading circuit 905 to read the data output by the threshold comparing unit 902, i.e. in the combination mode in which events are represented by polarity information:
In the initial state, either read mode may be selected, i.e. the frame-scan-based read mode or the event-stream-based read mode; likewise, either event representation may be selected, e.g. the control circuit 906 controls the reading circuit 905 to read the electrical signals buffered by the light intensity acquisition unit 904 (events represented by light intensity information) or to read the data output by the threshold comparing unit 902 (events represented by polarity information). Assume that the reading circuit 905 currently reads the data signals output by the pixel circuits in the event-stream-based read mode and that the control circuit 906 controls the reading circuit 905 to read the data output by the threshold comparing unit 902, i.e. events are represented by polarity information. The data provided by the reading circuit 905 to the control circuit 906 may be a first total data amount of the light intensity change events measured by the pixel array circuit per unit time. Since the total number of pixels M, the amount of pixel data B_p allocated to each pixel in the frame-scan-based read mode, and the bit width H of the data format when events are represented by light intensity information are all known, the following can be calculated from M, B_p and H: a second total data amount of the events measured per unit time in the combination mode in which the data signals are read in the event-stream-based read mode and events are represented by light intensity information; a third total data amount in the combination mode in which the data signals are read in the frame-scan-based read mode and events are represented by polarity information; and a fourth total data amount in the combination mode in which the data signals are read in the frame-scan-based read mode and events are represented by light intensity information. How the second, third and fourth total data amounts are calculated from M, B_p and H has been described above and is not repeated here. Whether the switching condition is satisfied is determined from the first total data amount provided by the reading circuit 905, the calculated second, third and fourth total data amounts, and their relationship to the bandwidth K. If the current combination mode cannot ensure that event data with higher precision is transmitted as far as possible without exceeding the bandwidth limit, it is determined that the switching condition is satisfied, and a switch is made to a combination mode that can ensure this.
For a better understanding of the above process, the following description is provided in connection with a specific example:
Let the bandwidth limit be K and the bandwidth adjustment factor be α. In the event-stream-based read mode, the bit width of the data format is H = b_x + b_y + b_t + b_p when an event is represented by polarity information, and H = b_x + b_y + b_t + b_a when an event is represented by light intensity information. Typically 1 ≤ b_p < b_a; for example, b_p is typically 1 to 2 bits and b_a is typically 8 to 12 bits.
In the frame-scan-based read mode, an event does not need to carry coordinates and time; events are determined from the state of each pixel. Assume that the data bit width allocated to each pixel is b_sp in polarity mode and b_sa in light intensity mode, and that the total number of pixels is M. Let the bandwidth limit be K = 1000 bps, b_x = 5 bits, b_y = 4 bits, b_t = 10 bits, b_p = 1 bit, b_a = 8 bits, b_sp = 1 bit, b_sa = 8 bits, total number of pixels M = 100, and bandwidth adjustment factor α = 0.9. Assume 10 events are generated in the 1st second, 15 events in the 2nd second, and 30 events in the 3rd second.
Assume that in the initial state the event-stream-based read mode is adopted by default and events are represented by polarity information.
Hereinafter, the event stream-based read mode and the event represented by the polarity information will be referred to as an asynchronous polarity mode, the event stream-based read mode and the event represented by the light intensity information will be referred to as an asynchronous light intensity mode, the frame scan-based read mode and the event represented by the polarity information will be referred to as a synchronous polarity mode, and the frame scan-based read mode and the event represented by the light intensity information will be referred to as a synchronous light intensity mode.
1st second: 10 events generated
Asynchronous polarity mode: N_ev = 10, H = b_x + b_y + b_t + b_p = 5 + 4 + 10 + 1 = 20 bits; the estimated data amount is N_ev·H = 200 bits, and N_ev·H < α·K, satisfying the bandwidth limit.
Asynchronous light intensity mode: H = b_x + b_y + b_t + b_a = 5 + 4 + 10 + 8 = 27 bits; the estimated data amount is N_ev·H = 270 bits, and N_ev·H < α·K, still satisfying the bandwidth limit.
Synchronous polarity mode: M = 100, H = b_sp = 1 bit; the estimated data amount is M·H = 100 bits, and M·H < α·K, still satisfying the bandwidth limit.
Synchronous light intensity mode: M = 100, H = b_sa = 8 bits; the estimated data amount is M·H = 800 bits, and M·H < α·K, still satisfying the bandwidth limit.
In summary, the asynchronous light intensity mode is selected in the 1st second: the light intensity information of all 10 events can be transmitted with the smaller data volume (270 bits) without exceeding the bandwidth limit. The control circuit 906 determines that the current combination mode (asynchronous polarity mode) cannot ensure that event data with higher precision is transmitted as far as possible without exceeding the bandwidth limit, determines that the switching condition is satisfied, and controls the switch from the asynchronous polarity mode to the asynchronous light intensity mode, e.g. by sending an indication signal that instructs the reading circuit 905 to switch from the current event representation to the other event representation.
2nd second: 15 events generated
Asynchronous polarity mode: the estimated data amount is N_ev·H = 15 × 20 = 300 bits, satisfying the bandwidth limit.
Asynchronous light intensity mode: the estimated data amount is N_ev·H = 15 × 27 = 405 bits, satisfying the bandwidth limit.
Synchronous polarity mode: the estimated data amount is M·H = 100 × 1 = 100 bits, satisfying the bandwidth limit.
Synchronous light intensity mode: the estimated data amount is M·H = 100 × 8 = 800 bits, satisfying the bandwidth limit.
In summary, in the 2nd second the control circuit 906 determines that the current combination mode (asynchronous light intensity mode) can still ensure that event data with higher precision is transmitted as far as possible without exceeding the bandwidth limit, determines that the switching condition is not satisfied, and keeps the asynchronous light intensity mode.
3rd second: 30 events generated
Asynchronous polarity mode: the estimated data amount is N_ev·H = 30 × 20 = 600 bits, satisfying the bandwidth limit.
Asynchronous light intensity mode: the estimated data amount is N_ev·H = 30 × 27 = 810 bits, satisfying the bandwidth limit.
Synchronous polarity mode: the estimated data amount is M·H = 100 × 1 = 100 bits, satisfying the bandwidth limit.
Synchronous light intensity mode: the estimated data amount is M·H = 100 × 8 = 800 bits, satisfying the bandwidth limit.
In the 3rd second, the synchronous light intensity mode can transmit the light intensity information of all 30 events with a data volume of 800 bits, smaller than the 810 bits required by the current combination mode (asynchronous light intensity mode). The current combination mode therefore no longer ensures that event data with higher precision is transmitted as efficiently as possible within the bandwidth limit; the switching condition is determined to be satisfied, and the asynchronous light intensity mode is switched to the synchronous light intensity mode, e.g. by sending an indication signal that instructs the reading circuit 905 to switch from the current read mode to the other read mode.
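For illustration only, the following sketch reproduces the mode selection of this example under an assumed selection rule (prefer the light intensity representation among the combination modes that satisfy the bandwidth limit, and among those pick the smallest data volume); the rule, the function names and the constants restate the example and are not taken verbatim from the embodiment.

```python
# A minimal sketch, under stated assumptions, of the four-mode selection in
# the example above.

K, ALPHA = 1000, 0.9
B_X, B_Y, B_T, B_P, B_A = 5, 4, 10, 1, 8
B_SP, B_SA, M = 1, 8, 100

def data_amounts(n_ev):
    """Estimated data amount of each combination mode for n_ev events."""
    return {
        "asynchronous polarity":  n_ev * (B_X + B_Y + B_T + B_P),
        "asynchronous intensity": n_ev * (B_X + B_Y + B_T + B_A),
        "synchronous polarity":   M * B_SP,
        "synchronous intensity":  M * B_SA,
    }

def select_mode(n_ev):
    amounts = data_amounts(n_ev)
    feasible = {m: d for m, d in amounts.items() if d <= ALPHA * K}
    intensity = {m: d for m, d in feasible.items() if "intensity" in m}
    candidates = intensity or feasible          # prefer higher precision
    return min(candidates, key=candidates.get)  # then the smallest data amount

for second, n_ev in enumerate([10, 15, 30], start=1):
    print(second, select_mode(n_ev), data_amounts(n_ev))
# Expected: seconds 1 and 2 -> asynchronous intensity (270 / 405 bits),
#           second 3 -> synchronous intensity (800 bits).
```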
It should be understood that the formulas, conversion conditions, and associated computing methods given above are merely one example implementation of embodiments of the present application, and that other suitable event representation conversion conditions, conversion strategies, and computing methods may be employed, as the scope of the present application is not limited in this respect.
In some embodiments, the reading circuit 905 includes a data format control unit 9051, which is configured to control the reading circuit to read the signal output by the threshold comparing unit 902, or read the electrical signal buffered by the light intensity collecting unit 904. By way of example, the data format control unit 9051 is described below in connection with two preferred embodiments.
Referring to fig. 12-a, a schematic diagram of a data format control unit in a reading circuit according to an embodiment of the present application is shown. The data format control unit may include AND gates 951 and 954, an OR gate 953 and a NOT gate 952. The input terminals of AND gate 951 receive the conversion signal sent by the control circuit 906 and the polarity information output by the threshold comparing unit 902; the input terminals of AND gate 954 receive the conversion signal sent by the control circuit 906 after it passes through NOT gate 952, and the electrical signal (light intensity information) output by the light intensity acquisition unit 904. The outputs of AND gate 951 and AND gate 954 are coupled to the inputs of OR gate 953, and the output of OR gate 953 is coupled to the control circuit 906. In one possible embodiment, the conversion signal may be 0 or 1, and the data format control unit 9051 accordingly controls the reading of either the polarity information output by the threshold comparing unit 902 or the light intensity information output by the light intensity acquisition unit 904. For example, if the conversion signal is 0, the data format control unit 9051 may control the output of the polarity information from the threshold comparing unit 902; if the conversion signal is 1, the data format control unit 9051 may control the output of the light intensity information from the light intensity acquisition unit 904. In one possible embodiment, the data format control unit 9051 may be connected to the control circuit 906 through a format signal line, through which the conversion signal transmitted by the control circuit 906 is received.
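For illustration only, the selection behaviour of the data format control unit can be modelled in software as follows; the function name and the mapping of the conversion signal's 0/1 levels to the two branches follow the figure's convention and are treated as assumptions of this sketch.

```python
# A minimal sketch, under stated assumptions, of the two-way selection
# realized in fig. 12-a: one AND gate passes the polarity information when the
# select line is asserted, the other passes the buffered light intensity
# information when it is not, and the OR gate merges the two paths.

def data_format_control(select_polarity: bool,
                        polarity_bit: int,
                        intensity_bits: int) -> int:
    if select_polarity:
        return polarity_bit     # path through AND gate 951
    return intensity_bits       # path through NOT gate 952 and AND gate 954


# Example: with the select line asserted, only the polarity information
# reaches the output of OR gate 953.
print(data_format_control(True, polarity_bit=1, intensity_bits=0b10110011))
```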
It should be noted that the data format control unit shown in fig. 12-a is only one possible structure; other logic structures capable of implementing line switching may be adopted in the embodiments of the present application. As shown in fig. 12-b, the reading circuit 905 may include reading components 955 and 956, which may be implemented as separate devices or integrated in the same device. Reading component 955 may be used to read the data output by the threshold comparing unit 902, and reading component 956 may be used to read the electrical signal buffered by the light intensity acquisition unit.
The reading circuit 905 may read the data signals generated by the pixel array circuit in a specific event representation. For example, the control circuit may turn reading component 955 on and reading component 956 off; the reading circuit 905 then reads the data output by the threshold comparing unit 902 through reading component 955, i.e. reads events represented by polarity information. Conversely, when reading component 956 is turned on and reading component 955 is turned off, the reading circuit 905 reads the electrical signals buffered in the light intensity acquisition unit 904, i.e. reads events represented by light intensity information.
It should be noted that, in some possible embodiments, the reading circuit may further include other circuit structures, for example an analog-to-digital conversion unit for converting analog signals into digital signals, a statistics unit for counting the number of events N_ev measured by the pixel array circuit per unit time, or a calculation unit for calculating the total data amount of the light intensity change events measured by the pixel array circuit per unit time. Furthermore, a connection in this application may denote a direct connection or coupling, or an indirect one: for example, the connection between OR gate 953 and the control circuit 906 may, in one possible embodiment, mean that OR gate 953 is connected to an input terminal of the statistics unit and the control circuit 906 is connected to an output terminal of the statistics unit.
According to the method provided by possible embodiments of the present application, the control circuit 906 continuously performs historical statistics and real-time analysis of the light intensity change events generated in the pixel array circuit during the whole reading and parsing process. Once the switching condition is met, it sends a conversion signal so that the reading circuit 905 switches from reading the information in the threshold comparing unit 902 to reading the information in the light intensity acquisition unit 904, or vice versa. This adaptive switching process is repeated until the reading of all data signals is completed.
Fig. 13 shows a block diagram of a control circuit of a possible embodiment of the present application. The control circuit may be used to implement the control circuit 906 of fig. 11, fig. 12-a, etc. As shown in fig. 13, the control circuit comprises at least one processor 1101, at least one memory 1102 coupled to the processor 1101, and a communication mechanism 1103 coupled to the processor 1101. The memory 1102 is used at least for storing a computer program and the data signals obtained from the reading circuit. A statistical model 111 and a policy module 112 are preconfigured on the processor 1101. The control circuit may be communicatively coupled, via the communication mechanism 1103, to the reading circuit 905 of the vision sensor as in fig. 11 and fig. 12-a, or to a reading circuit external to the vision sensor, in order to perform control functions on it.
In some possible embodiments, the control circuitry may be configured to control the read circuitry 905 to read the plurality of data signals generated by the pixel array circuitry in a particular event representation. In addition, the control circuit may be configured to obtain a data signal from the reading circuit 905, and when the control circuit controls the reading circuit 905 to read an event represented by light intensity information, the data signal may indicate an absolute light intensity value, which may represent a light intensity value measured at the current time. When the control circuit controls the read circuit 905 to read an event represented by polarity information, the data signal may indicate a light intensity polarity or the like. For example, the intensity polarity may indicate a trend in the intensity change, such as an increase or decrease, generally indicated by +1/-1.
The control circuit determines statistical data relating to at least one light intensity change event based on the data signals obtained from the reading circuit. For example, the statistical data may be the total data amount of the light intensity change events measured by the pixel array circuit per unit time, or the number of events N_ev measured by the pixel array circuit per unit time. In some embodiments, the control circuit may obtain from the reading circuit 905 the data signals generated by the pixel array circuit over a period of time and store them in the memory 1102 for historical statistics and analysis.
In some possible embodiments, the control circuit may use one or more preconfigured statistical models 111 to perform historical statistics on the occurrence of light intensity change events in the pixel array circuit over a period of time, as provided by the reading circuit 905. The statistical model 111 may then transmit the statistical data to the policy module 112. As previously described, the statistical data may indicate the number of light intensity change events as well as their total data amount. It should be appreciated that any suitable statistical model or statistical algorithm may be applied to possible embodiments of the present application; the scope of the present application is not limited in this respect.
Since the statistical data summarize the history of the light intensity change events generated by the visual sensor over a period of time, the policy module 112 can use them to analyze and predict the rate or data amount of events over the next period of time. The policy module 112 may be preconfigured with one or more switching decisions. When multiple switching decisions are present, the control circuit may select one of them for analysis and decision-making as needed, e.g. based on factors such as the type of visual sensor, the nature of the light intensity change events, the nature of the external environment, and the state of motion.
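For illustration only, the following sketch shows one possible split between a statistical model and a policy module; the class names, the sliding-window average used as the prediction, and the default parameters are assumptions introduced here.

```python
# A minimal sketch, under stated assumptions, of the statistical model /
# policy module division of labour described above.

from collections import deque

class StatisticalModel:
    """Keeps a sliding window of per-unit-time event counts."""
    def __init__(self, window=5):
        self.counts = deque(maxlen=window)

    def update(self, events_in_unit_time):
        self.counts.append(events_in_unit_time)

    def statistics(self):
        # Predicted event rate for the next period: here simply the average.
        return sum(self.counts) / len(self.counts) if self.counts else 0.0

class PolicyModule:
    """Applies a preconfigured switching decision to the statistics."""
    def __init__(self, k, h_intensity, alpha=0.9, beta=0.9):
        self.k, self.h, self.alpha, self.beta = k, h_intensity, alpha, beta

    def decide(self, current_repr, predicted_n_ev):
        if current_repr == "polarity":
            return "intensity" if predicted_n_ev * self.h <= self.alpha * self.k else "polarity"
        return "polarity" if predicted_n_ev * self.h > self.beta * self.k else "intensity"

model, policy = StatisticalModel(), PolicyModule(k=1000, h_intensity=27)
for n in (10, 15, 30):
    model.update(n)
print(policy.decide("polarity", model.statistics()))  # -> 'intensity'
```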
In some embodiments, the processor may include a plurality of statistical models and the policy modules corresponding to them; referring to fig. 14, a block diagram of another control circuit provided in an embodiment of the present application is shown. Statistical model 1 (121) may be understood with reference to the statistical model 111 of fig. 13, and policy module 1 (122) with reference to the policy module 112 of fig. 13. Statistical model 2 (123) may be understood with reference to the statistical model 606 of fig. 8, and policy module 2 (124) with reference to the policy module 608 of fig. 8. The communication mechanism 1203 may be understood with reference to the communication mechanism 1103 of fig. 13 and the communication mechanism 612 of fig. 8. In these embodiments, the control circuit may be configured to control the reading circuit 220 to read the plurality of data signals generated by the pixel array circuit in a specific data read mode (e.g., the synchronous read mode based on frame scanning or the asynchronous read mode based on event streams). Meanwhile, the control circuit may be configured to acquire data signals of different representations from the reading circuit (e.g., data signals represented by polarity information or by light intensity information). The memory 1202 is used at least for storing a computer program and the data signals acquired from the reading circuit; the computer programs stored in the memory 1202 may include programs related to switching the data read mode and programs related to switching the event representation. Furthermore, statistical model 1 and statistical model 2 may operate on different data: for example, statistical model 1 may perform statistics on the number of events N_ev measured by the pixel array circuit per unit time and output the result to policy module 1, while statistical model 2 performs statistics on the total data amount of the light intensity change events measured per unit time and outputs the result to policy module 2, or the other way around.
In some embodiments, referring to fig. 15, multiple processors (e.g., processor 1301 and processor 1302) may be included, each configured to output a control policy; statistical model 1 (131), policy module 1 (132), statistical model 2 (133) and policy module 2 (134) may be understood with reference to statistical model 1, policy module 1, statistical model 2 and policy module 2 in the embodiment corresponding to fig. 14, and are not repeated here. In some embodiments, referring to fig. 15, multiple memories (such as memory 1303 and memory 1304) may be included, each for storing a computer program related to a control policy or the data signals acquired from the reading circuit; for example, memory 1 stores the data signals acquired from the reading circuit and a program related to switching the event representation, while memory 2 stores the data signals acquired from the reading circuit and a program related to switching the data read mode. As another example, one memory stores the data signals acquired from the reading circuit and the other memory stores a computer program related to the control policy (such a scheme is not shown in the figure).
In some embodiments, only one communication mechanism may be included, or multiple communication mechanisms may be included, where the communication mechanism 1305 and the communication mechanism 1306 in fig. 15 may be understood as one communication mechanism, or may be understood as two different communication mechanisms, and may be understood with reference to the corresponding communication mechanism 1203 and the communication mechanism 1204 in fig. 14.
In some embodiments, if the policy module determines that the statistical data satisfy the conversion condition, an indication to switch the event representation is output to the reading circuit. In another embodiment, if the policy module determines that the statistical data do not satisfy the conversion condition, no such indication is output to the reading circuit. In some embodiments, the indication of the event representation may take an explicit form, such as a conversion signal of 0 or 1, as described in the embodiments above.
It should be understood that the control circuit is for exemplary purposes only and does not imply any limitation on the scope of the application. Embodiments of the present application may also be embodied in different control circuits. In addition, the control circuit may include other elements, modules or entities that are not shown for clarity; their omission does not mean that embodiments of the present application do not provide them. By way of example, a scheme is described below in which the control circuit is implemented in hardware to control the reading circuit to read events of different representations.
Referring to fig. 16, a block diagram of a control circuit according to an embodiment of the present application is provided. The control circuit 1400 may be used to implement the control circuit 906 of fig. 11 or 12-a, etc. As shown in fig. 16, the control circuit 1400 may include a counter 1401 and a comparator 1402. The counter may be coupled to the read circuit 1403 via a communication mechanism and the comparator may be coupled to the read circuit 1403 via a communication mechanism.
The control circuit 1400 may be configured to control the reading circuit 1403 to read the plurality of data signals generated by the pixel array circuit in a particular event representation. The control circuit 1400 may acquire the data signals transmitted by the reading circuit 1403 through the counter 1401; the counter value is incremented by one each time the counter 1401 receives an event. The counter may send the counted number of events to the comparator 1402, and the comparator 1402 determines whether to output a conversion signal to the reading circuit 1403 based on the switching condition and the number of events transmitted by the counter. For example, if events are currently represented by polarity information, the switching condition can be understood with reference to formula (14): when the comparator determines that the value output by the counter is less than or equal to α × K / H, the comparator 1402 outputs a conversion signal to the reading circuit 1403, controlling the reading circuit 1403 to read the electrical signals buffered in the light intensity acquisition unit. For another example, if events are currently represented by light intensity information, the switching condition can be understood with reference to formula (15): when the comparator determines that the value output by the counter is greater than β × K / H, the comparator 1402 outputs a conversion signal to the reading circuit 1403, controlling the reading circuit 1403 to read the signals output by the threshold comparing unit. After each comparison is completed, the comparator 1402 notifies the counter 1401 to reset. The reading circuit 1403 can be understood with reference to the reading circuit 905 of fig. 11 and fig. 12-a.
In some embodiments, the user may further be allowed to select a particular event representation in a customized manner. Referring to fig. 17, a block diagram of another control circuit 1500 provided in an embodiment of the present application is shown. A fixed signal is used to instruct the reading circuit to read the data signals in a fixed event representation, e.g. a signal instructing the reading circuit to read the output of the threshold comparing unit (events represented by polarity information) or the signals buffered by the light intensity acquisition unit (events represented by light intensity information). The selector 1503 is configured to receive the fixed signal and the signal output by the comparator 1502: when the selector 1503 receives the fixed signal, the reading circuit is controlled according to the instruction of the fixed signal; if the selector 1503 does not receive the fixed signal, the reading circuit is controlled according to the conversion signal output by the comparator 1502. The counter 1501 can be understood with reference to the counter 1401 in fig. 16, the comparator 1502 with reference to the comparator 1402 in fig. 16, and the reading circuit with reference to the reading circuit 1403 in fig. 16.
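For illustration only, a software model of the counter/comparator control circuit with the optional fixed-signal selector might look as follows; the class name, parameter names and default values are assumptions introduced here.

```python
# A minimal sketch, under stated assumptions, of the hardware-style control
# circuit of figs. 16 and 17: a counter accumulates events, a comparator
# checks the thresholds of formulas (14)/(15) once per unit time, and a
# selector lets a user-supplied fixed signal override the comparator.

class CounterComparatorControl:
    def __init__(self, k, h_intensity, alpha=0.9, beta=0.9):
        self.count = 0
        self.k, self.h = k, h_intensity
        self.alpha, self.beta = alpha, beta

    def on_event(self):
        # Counter 1401/1501: incremented by one for every event received.
        self.count += 1

    def compare(self, current_repr, fixed_signal=None):
        # Selector 1503: a fixed signal forces a particular representation.
        if fixed_signal is not None:
            decision = fixed_signal
        elif current_repr == "polarity":
            # Formula (14): switch to intensity if count <= alpha * K / H.
            decision = "intensity" if self.count <= self.alpha * self.k / self.h else "polarity"
        else:
            # Formula (15): switch to polarity if count > beta * K / H.
            decision = "polarity" if self.count > self.beta * self.k / self.h else "intensity"
        self.count = 0          # the comparator notifies the counter to reset
        return decision

ctrl = CounterComparatorControl(k=1000, h_intensity=27)
for _ in range(40):
    ctrl.on_event()
print(ctrl.compare("intensity"))   # 40 > 900 / 27, so -> 'polarity'
```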
Referring to fig. 18, a schematic diagram is shown distinguishing a single event representation from the adaptive switching of event representations provided in the present application. As shown in fig. 18, with a single event representation, e.g. representing events by light intensity information (shown as the light intensity mode in fig. 18), part of the data is randomly discarded when the amount of data the vision sensor needs to transmit exceeds its preset maximum bandwidth. As shown by curve 1601 in fig. 18, when a large number of events are generated in the pixel array circuit and the amount of data to be transferred exceeds the bandwidth, some event data cannot be read out, i.e. data is lost, as shown by the broken-line portion of curve 1901. With the scheme provided by the application, the representation precision of events is adjusted by comparing the data amount of events represented by light intensity information with the bandwidth. When the data amount is small and transmission does not exceed the bandwidth, light intensity information is used to represent events: the sampled brightness information of the changed pixels can be output as completely as possible, the representation precision is high, and the light intensity information can be used directly in subsequent processing, such as brightness reconstruction, without complex processing. When a large number of events are triggered and the data amount exceeds the bandwidth, the representation switches to polarity information (shown as the polarity mode in fig. 18): events are represented with lower precision, but since polarity information usually needs only 1 to 2 bits, the data amount is greatly reduced, which relieves the transmission pressure and reduces data loss. It should be noted that embodiments of the present application may reconstruct brightness from polarity information; this is an exemplary modeling and estimation method. An event is generated because the change of brightness exceeds the fixed threshold C, so when the brightness at a moment before the reconstruction time is known, the brightness can be reconstructed from the polarity information between the two moments and the principle of event generation. The reconstruction of the brightness can be represented by formula (1-2):

I(x, y, t) = I(x, y, t_pre) + C × Σ e_p    (1-2)

where the summation runs over the events generated at pixel (x, y) between t_pre and t.
Here x, y represent the row and column coordinates (coordinate information) of the pixel, t is the time stamp (time information) of the reconstruction time, and e_p denotes an event currently represented by polarity information. I(x, y, t_pre) represents the luminance information at a time before the reconstruction time; if more accurate luminance information is needed, it can be estimated from the spatial and temporal information of the pixel, for example by linear interpolation or bicubic interpolation. In some possible embodiments, I(x, y, t_pre) represents the luminance at the moment immediately preceding the reconstruction time, which further reduces the quantization error. It should be noted that in the prior art brightness reconstruction from polarity information alone is more difficult and the accuracy of object identification is correspondingly worse. Unlike the prior art, the scheme provided by the application switches between events represented by polarity information and events represented by light intensity information, so that brightness reconstruction from polarity information can incorporate the brightness (light intensity information) at a time before the reconstruction time; compared with the prior art this reduces the difficulty of brightness reconstruction and improves the accuracy of object identification. In the present application, light intensity information is sometimes also referred to as luminance information; the two terms have the same meaning.
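For illustration only, a minimal sketch of the brightness reconstruction of formula (1-2), under the assumption of the linear event-generation model described above; the function name is introduced here for illustration.

```python
# A minimal sketch, under stated assumptions, of brightness reconstruction
# from polarity events as in formula (1-2): the brightness at time t equals a
# known earlier brightness plus the fixed threshold C times the signed sum of
# the polarity events in between.

def reconstruct_brightness(i_prev, polarities, c):
    """i_prev:     brightness I(x, y, t_pre) known at an earlier time t_pre.
    polarities:    list of +1 / -1 polarity events at pixel (x, y) generated
                   between t_pre and the reconstruction time t.
    c:             fixed event-generation threshold C.
    """
    return i_prev + c * sum(polarities)


# Example: a pixel known to be at brightness 120 that afterwards fires three
# "increase" events and one "decrease" event with C = 10 is reconstructed as
# 120 + 10 * (3 - 1) = 140.
print(reconstruct_brightness(120, [+1, +1, -1, +1], c=10))
```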
To better demonstrate the advantages of adaptively switching the event representation, a specific example is described below. Assume that the preset maximum bandwidth of the vision sensor is 200 bps, that α and β are set to 1, that the initial event representation is events represented by light intensity information, with each event represented by 8 bits, and that the pixel array circuit generates the following events over 5 seconds: 30 events in the 1st second, 60 events in the 2nd second, 40 events in the 3rd second, 20 events in the 4th second, and 15 events in the 5th second. If events are always represented by light intensity information, then in the 1st second, since the 240 bits of data generated exceed the 200-bit bandwidth, only 25 events can be transmitted normally and the other events are lost due to the bandwidth limit. In the 2nd second, for the same reason, only part of the events are transmitted normally and part are lost; likewise in the 3rd second. In the 4th and 5th seconds, the entire data volume can be transmitted normally. With the adaptive event-representation scheme, the data amount to be transmitted in the 1st second exceeds the bandwidth, so the switching condition is satisfied and events are switched to being represented by polarity information; assuming each such event is represented by 2 bits, the probability of losing events is reduced, because fewer bits are needed to represent one event by polarity information than by light intensity information. In the 2nd second, the total data amount to be transmitted is smaller than the bandwidth, but representing the events by light intensity information would exceed the bandwidth, so events are still represented by polarity information and the probability of event loss is reduced. In the 3rd second, the data amount required by light intensity information still exceeds the bandwidth, so events remain represented by polarity information. In the 4th second, the total data amount to be transmitted is smaller than the bandwidth even when events are represented by light intensity information, so the switching condition is satisfied and events are represented by light intensity information, which improves the representation precision and allows events to carry more information. In the 5th second, light intensity information continues to be used, so that the events carry as much information as possible.
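For illustration only, the per-second decisions of this example can be reproduced with the following sketch; the 2-bit polarity width and the variable names restate the example's assumptions.

```python
# A minimal sketch, under stated assumptions, of the per-second decisions in
# the example above: 200 bps bandwidth, alpha = beta = 1, 8 bits per event in
# the light intensity representation, 2 bits in the polarity representation.

BANDWIDTH = 200
BITS_INTENSITY, BITS_POLARITY = 8, 2

events_per_second = [30, 60, 40, 20, 15]

for second, n_ev in enumerate(events_per_second, start=1):
    # Whenever the light intensity representation fits the bandwidth, it is
    # preferred for its higher precision; otherwise polarity is used.
    if n_ev * BITS_INTENSITY <= BANDWIDTH:
        chosen = "intensity"
    else:
        chosen = "polarity"
    print(second, chosen, n_ev * BITS_INTENSITY, n_ev * BITS_POLARITY)
# Expected: seconds 1-3 -> polarity (240/480/320 bits would exceed 200),
#           seconds 4-5 -> intensity (160/120 bits fit).
```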
In this example, with a single event representation, e.g. representing events only by light intensity information, the data loss rate is 36.4%. With the adaptive event-representation scheme, events can be switched to being represented by polarity information whenever the amount of data to be transmitted exceeds the bandwidth, which greatly reduces the probability of event loss while letting events carry as much information as possible when the bandwidth allows.
Fig. 19 shows a block diagram of an electronic device according to a possible embodiment of the present application. As shown in fig. 19, the electronic device includes a vision sensor chip 1700, a control circuit 1701 and a parsing circuit 1702, where the control circuit 1701 may be understood with reference to the control circuit 906. It should be appreciated that the electronic device is for exemplary purposes and may be implemented using any suitable device, including various sensor devices currently known and developed in the future. Embodiments of the present application may also be embodied in different sensor systems. In addition, the electronic device may include other elements, modules or entities that are not shown for clarity; their omission does not mean that embodiments of the present application do not provide them.
The vision sensor chip 1700 and the control circuit 1701 may be understood with reference to the vision sensor and the control circuit described in fig. 11 to fig. 18, and the detailed description is not repeated here. The parsing circuit 1702 may be configured to parse the data signals read by the reading circuit in the vision sensor chip 1700. In a possible embodiment of the present application, the parsing circuit 1702 may employ a parsing scheme that matches the current event representation of the reading circuit in the vision sensor chip 1700. As an example, if the reading circuit initially reads events represented by polarity information, the parsing circuit parses each event based on the number of bits associated with that representation; e.g., if the polarity information is predetermined to be represented by 1 bit, the parsing circuit parses each event based on 1 bit. When the reading circuit reads events represented by light intensity information, the parsing circuit parses each event based on the number of bits associated with that representation; e.g., if the light intensity information is predetermined to be represented by 12 bits, the parsing circuit parses each event based on 12 bits.
In some embodiments, the parsing circuit 1702 can switch its parsing mode without requiring an explicit switching signal or flag bit. For example, the parsing circuit 1702 may employ the same or corresponding statistical models and switching strategies as the control circuit 1701, so that it performs the same statistical analysis on the data signals provided by the reading circuit and reaches switching predictions consistent with those of the control circuit 1701. As an example, consider the reading circuit reading the data signals in the event-stream-based read mode. As described above, each event may be represented as <x, y, t, m>, where (x, y) is the pixel position at which the event is generated, t is the time at which the event is generated, and m is the characteristic information of the light intensity, comprising polarity information or light intensity information. Let x be represented by b_x bits, y by b_y bits and t by b_t bits; m is represented by 1 bit when it denotes polarity information and by 12 bits when it denotes light intensity information. Accordingly, when the control circuit 1701 controls the reading circuit to read events represented by polarity information in the initial state, the parsing circuit 1702 in the initial state parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 1 bit as the characteristic information of the light intensity, here polarity information. The parsing circuit 1702 obtains the data signals from the reading circuit and determines statistical data related to the light intensity change events. If the parsing circuit 1702 determines that the statistical data satisfy the switching condition, it switches to the parsing mode corresponding to events represented by light intensity information: after the switch from events represented by polarity information to events represented by light intensity information, the parsing circuit 1702 parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 12 bits as the characteristic information of the light intensity, here light intensity information.
As another example, if the reading circuit 905 initially reads data signals represented by light intensity information, the parsing circuit 1702 is in the parsing mode corresponding to that event representation: it parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 12 bits as the light intensity information. The parsing circuit 1702 obtains the data signals from the reading circuit and determines statistical data related to the light intensity change events. If the parsing circuit 1702 determines that the statistical data satisfy the switching condition, it switches to the parsing mode corresponding to events represented by polarity information: it then parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 1 bit as the polarity information.
As another example, if the reading circuit 905 initially reads the data signals output by the pixel array circuit in the event-stream-based read mode, and in particular reads data signals represented by light intensity information, the parsing circuit 1702 parses the data signals acquired from the reading circuit 905 in the parsing mode corresponding to that read mode and event representation: in the mode corresponding to event-stream reading, it parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 12 bits as the light intensity information. The parsing circuit 1702 obtains the data signals from the reading circuit and determines statistical data related to the light intensity change events. If the parsing circuit 1702 determines that the statistical data satisfy the switching condition, it switches to the parsing mode corresponding to events represented by polarity information: it then parses the first b_x bits as the pixel coordinate x, the next b_y bits as the pixel coordinate y, the following b_t bits as the read time, and finally takes 1 bit as the polarity information.
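For illustration only, the bit-level parsing described above might be sketched as follows; the function name parse_event and the bit-string input format are assumptions introduced here.

```python
# A minimal sketch, under stated assumptions, of how a parsing circuit could
# unpack one event-stream record <x, y, t, m>: b_x, b_y and b_t bits for the
# coordinates and timestamp, then 1 bit (polarity mode) or 12 bits (light
# intensity mode) for the characteristic information m.

def parse_event(bits: str, b_x=5, b_y=4, b_t=10, intensity_mode=False):
    """bits: a string of '0'/'1' characters holding one event record."""
    b_m = 12 if intensity_mode else 1
    assert len(bits) == b_x + b_y + b_t + b_m
    x = int(bits[:b_x], 2)
    y = int(bits[b_x:b_x + b_y], 2)
    t = int(bits[b_x + b_y:b_x + b_y + b_t], 2)
    m = int(bits[b_x + b_y + b_t:], 2)
    return x, y, t, m


# Example: the same coordinate/time prefix followed by 1 trailing bit,
# parsed in polarity mode.
record = "00011" + "0010" + "0000000111" + "1"
print(parse_event(record, intensity_mode=False))   # -> (3, 2, 7, 1)
```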
In some possible embodiments, the parsing circuit 1702 obtains a data signal from the reading circuit 905 and determines, based on the data signal, which of the current parsing mode and the alternative parsing mode corresponds to the event representation read by the reading circuit 905. Further, in some embodiments, the parsing circuit 1702 may switch from the current parsing mode to the other parsing mode based on the determination.
The embodiment of the application also provides a method for operating the vision sensor chip. Referring to fig. 20, which is a flowchart of a method for operating a vision sensor chip according to a possible embodiment of the present application, may include the steps of:
1801. at least one data signal corresponding to a pixel in the pixel array circuit is generated by measuring the amount of light intensity variation.
The pixel array circuit generates at least one data signal corresponding to a pixel in the pixel array circuit by measuring an amount of change in light intensity, the at least one data signal indicating a light intensity change event indicating that the amount of change in light intensity measured by the corresponding pixel in the pixel array circuit exceeds a predetermined threshold. Wherein the pixel array circuit may comprise one or more pixel arrays and each pixel array comprises a plurality of pixels, each pixel may be considered a pixel circuit, which may be understood with reference to the pixel circuit 900.
1802. At least one data signal is read from the pixel array circuit in a first event representation.
The reading circuit reads at least one data signal from the pixel array circuit in a first event representation. The read circuit may be understood with reference to read circuit 905.
In some possible embodiments, the first event representation is events represented by polarity information. The pixel array circuit includes a plurality of pixels, each pixel including a threshold comparing unit configured to output polarity information when the amount of light intensity change exceeds a predetermined threshold, the polarity information indicating whether the light intensity increases or decreases. The reading circuit is specifically configured to read the polarity information output by the threshold comparing unit.
In some possible embodiments, the first event representation is events represented by light intensity information; the pixel array comprises a plurality of pixels, each pixel comprising a light intensity detection unit, a threshold comparing unit, a readout control unit and a light intensity acquisition unit,
and the light intensity detection unit is configured to output an electrical signal corresponding to the light signal irradiating it, the electrical signal indicating the light intensity;
and the threshold comparing unit is configured to output a first signal, according to the electrical signal, when the amount of light intensity change exceeds a predetermined threshold;
and the readout control unit is configured to, in response to receiving the first signal, instruct the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the first signal is received;
and the reading circuit is specifically configured to read the electrical signals buffered by the light intensity acquisition unit.
Wherein the light intensity detection unit may be understood with reference to the light intensity detection unit 901, the threshold comparison unit may be understood with reference to the threshold comparison unit 902, the readout control unit may be understood with reference to the readout control unit 903, and the light intensity acquisition unit may be understood with reference to the light intensity acquisition unit 904.
1803. At least one data signal is provided to the control circuit.
The reading circuit is also used for providing at least one data signal to the control circuit. The control circuitry may be understood with reference to control circuitry 906.
1804. Upon receiving a transition signal from the control circuit, which is generated based on the at least one data signal, transition is made to reading the at least one data signal from the pixel array circuit in a second event representation.
The read circuit is configured to switch to reading the at least one data signal from the pixel array circuit in a second event representation upon receiving a switch signal generated based on the at least one data signal from the control circuit.
In some possible embodiments, the control circuit is further configured to determine the statistical data based on at least one data signal received from the read circuit. And if the statistical data is determined to meet the preset conversion condition, transmitting a conversion signal to the reading circuit, wherein the preset conversion condition is determined based on the preset bandwidth of the vision sensor chip.
In some possible embodiments, the first event representation is events represented by polarity information and the second event representation is events represented by light intensity information. The predetermined conversion condition is that, if the at least one data signal were read from the pixel array circuit in the second event representation, the total data amount read would not be greater than the preset bandwidth; alternatively, the predetermined conversion condition is that the number of the at least one data signal is not greater than the ratio of the preset bandwidth to a first bit width, the first bit width being a preset bit width of the data format of the data signal.
In some possible embodiments, the first event representation is events represented by light intensity information and the second event representation is events represented by polarity information. The predetermined conversion condition is that the total data amount read from the pixel array circuit in the first event representation is greater than the preset bandwidth; alternatively, the predetermined conversion condition is that the number of the at least one data signal is greater than the ratio of the preset bandwidth to a first bit width, the first bit width being a preset bit width of the data format of the data signal. With the adaptive event representation provided by the embodiments of the present application, the visual sensor can perform historical statistics on the events to predict the likely event generation rate in the next time period, so that an event representation better suited to the application scenario and motion state can be selected.
Through the scheme, the visual sensor can be adaptively switched between the two event representation modes, so that the reading data rate is always kept not to exceed the preset reading data rate threshold, the cost of data transmission, analysis and storage of the visual sensor is reduced, and the performance of the sensor is obviously improved. In addition, such visual sensors can make data statistics on events generated over a period of time for predicting the likely event generation rate over the next period of time, thus enabling selection of a reading mode that is more appropriate for the current external environment, application scenario, and motion state.
The above describes that the visual sensor is able to adaptively switch between two event representations, including an event represented by polarity information and an event represented by light intensity information. When the self-adaptive event representation mode is adopted, the relation between the data quantity of the event represented by the light intensity information and the bandwidth is compared, so that the representation precision of the event is adjusted, the event is transmitted in a proper representation mode on the premise of meeting the bandwidth limit, and all the events are transmitted with larger representation precision as much as possible. In some embodiments, the visual sensor may adaptively switch between multiple event representations to better achieve the goal of transmitting all events with greater accuracy of representation, as described below in connection with some specific embodiments.
Fig. 21 shows a schematic diagram of another preferred pixel circuit 1900 provided herein. Each of the pixel array circuit 210, the pixel array circuit 310 and the pixel array circuit 710 may include one or more pixel arrays, and each pixel array includes a plurality of pixels; each pixel may be regarded as a pixel circuit, each pixel circuit being used to generate a data signal corresponding to that pixel. The present application also sometimes refers to a pixel circuit simply as a pixel. As shown in fig. 21, this preferred pixel circuit comprises a light intensity detection unit 1901, a threshold comparing unit 1902, a readout control unit 1903 and a light intensity acquisition unit 1904.
The light intensity detection unit 1901 is configured to convert the acquired light signal into a first electrical signal. The light intensity detection unit 1901 may be understood with reference to the light intensity detection unit 901 in the embodiment corresponding to fig. 11, and the description thereof will not be repeated here.
The threshold comparison unit 1902 is configured to determine whether the first electrical signal is greater than a first target threshold or less than a second target threshold; when the first electrical signal is greater than the first target threshold or less than the second target threshold, the threshold comparison unit 1902 outputs a first data signal indicating that a light intensity change event has occurred at the pixel. The threshold comparison unit 1902 compares whether the difference between the current light intensity and the light intensity at the time of the last event exceeds a predetermined threshold, which can be understood with reference to formula 1-1. The first target threshold may be understood as the sum of the first predetermined threshold and the second electrical signal, and the second target threshold as the sum of the second predetermined threshold and the second electrical signal. The second electrical signal is the electrical signal output by the light intensity detection unit 1901 at the time of the last event. The threshold comparison unit in the embodiment of the present application may be implemented by hardware or by software, which is not limited in the embodiment of the present application.
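The event-trigger check performed by the threshold comparison unit can be sketched as follows. This is a minimal illustrative sketch, not the patent's circuit; the function name, signal values, and threshold values are assumptions chosen for illustration.

```python
# Illustrative sketch of the threshold comparison: an event is signalled when the current
# signal leaves the window [previous signal + neg_threshold, previous signal + pos_threshold].

def exceeds_thresholds(first_signal: float,
                       second_signal: float,
                       pos_threshold: float,
                       neg_threshold: float) -> bool:
    """Return True when a light intensity change event should be signalled.

    first_signal:  electrical signal currently output by the light intensity detection unit
    second_signal: electrical signal output at the time of the last event
    pos_threshold: first predetermined threshold (increase direction)
    neg_threshold: second predetermined threshold (decrease direction, typically negative)
    """
    first_target = second_signal + pos_threshold    # first target threshold
    second_target = second_signal + neg_threshold   # second target threshold
    return first_signal > first_target or first_signal < second_target

# Example: previous signal 1.0, thresholds +/-0.3 -> a reading of 1.4 triggers an event.
print(exceeds_thresholds(1.4, 1.0, 0.3, -0.3))  # True
```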
The readout control unit 1903 controls the light intensity acquisition unit 1904 to acquire the first electrical signal when the first data signal is acquired. The readout control unit 1903 is also configured to notify the reading circuit 1905 to read the data signal output by the pixel circuit.
The reading circuit 1905 may be configured to scan the pixels in the pixel array circuit in a predetermined order to read the data signals generated by the corresponding pixels. In some possible embodiments, the reading circuit 1905 may be understood with reference to the reading circuit 220, the reading circuit 320, and the reading circuit 720, i.e., the reading circuit 1905 may be configured to read the data signals output by the pixel circuits in more than one signal reading mode. For example, the reading circuit 1905 may read in one of a first reading mode and a second reading mode, corresponding respectively to a frame-scan-based reading mode and an event-stream-based reading mode. In some possible embodiments, the reading circuit 1905 may also read the data signals output by the pixel circuits in only one signal reading mode, e.g., only in a frame-scan-based reading mode, or only in an event-stream-based reading mode.
The first encoding unit 1907 is configured to encode the first electrical signal buffered by the light intensity acquisition unit 1904 according to the currently acquired bit width. The reading circuit 1905 is further configured to read the data signal encoded by the first encoding unit 1907. How the first encoding unit 1907 encodes the first electrical signal according to the acquired bit width is controlled by the control circuit 1906, as described in detail below.
In some implementations, the read circuit 1905 may be configured to provide the read at least one data signal to the control circuit 1906. The control circuit 1906 may control the first encoding unit 1907 to encode an event with a certain bit width according to the data signal acquired from the reading circuit 1905.
The reading circuit 1905 may provide the data signals read over a period of time to the control circuit 1906, which performs inference on them and instructs the first encoding unit 1907 to encode events with a certain bit width. In some possible embodiments, the control circuit 1906 may obtain at least one data signal from the reading circuit 1905 and determine, based at least on the at least one data signal, whether the encoding scheme currently adopted by the first encoding unit 1907 is suitable for the current application scenario and motion state, so as to adjust the encoding scheme of the first encoding unit 1907. In some possible embodiments, the first encoding unit 1907 may also interact with the control circuit 1906 directly, instead of sending the encoded data signal to the control circuit 1906 through the reading circuit; in that case, the control circuit 1906 determines, based on the encoded data signal received from the first encoding unit 1907, whether the encoding scheme currently adopted by the first encoding unit 1907 is suitable for the current application scenario and motion state, so as to adjust it.
In some possible embodiments, the data provided by the reading circuit 1905 to the control circuit 1906 is the number N_ev of events (light intensity change events) measured by the pixel array circuit per unit time. Assume the bit width currently used for the light intensity feature information is H_1, i.e., the control circuit 1906 currently controls the first encoding unit 1907 to encode the light intensity feature information of each event with H_1 bits (hereinafter simply: encoding each event); assume further that the bit width of the light intensity feature information is itself encoded with i bits and the number of events with s bits. The total number of bits the visual sensor needs to transmit is then N = N_ev × H_1 + i + s. If this total N is greater than or equal to the bandwidth K, the control circuit 1906 determines to reduce the bit width of the light intensity feature information. Suppose the bit width of the light intensity feature information calculated by the control circuit is H_2; when each event is encoded with H_2 bits, the total number of bits the visual sensor needs to transmit is N = N_ev × H_2 + i + s. If this number of bits N is less than or equal to the bandwidth K, the control circuit 1906 controls the first encoding unit 1907 to encode each event with H_2 bits, where H_2 is less than H_1.
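The bandwidth check just described can be sketched as follows. This is a minimal sketch under the stated assumptions (i = 4, s = 32 as in the later examples); the function names are illustrative, not part of the patent.

```python
# Sketch of the check N = N_ev * H + i + s <= K used by the control circuit.

def total_bits(n_events: int, h_bits: int, i_bits: int = 4, s_bits: int = 32) -> int:
    """Total number of bits the sensor must transmit for one unit of time."""
    return n_events * h_bits + i_bits + s_bits

def fits_bandwidth(n_events: int, h_bits: int, bandwidth: int,
                   i_bits: int = 4, s_bits: int = 32) -> bool:
    return total_bits(n_events, h_bits, i_bits, s_bits) <= bandwidth

# With 300 events, 12-bit encoding exceeds a 3000-bit budget, 9-bit encoding does not.
print(fits_bandwidth(300, 12, 3000))  # False (3636 bits)
print(fits_bandwidth(300, 9, 3000))   # True  (2736 bits)
```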
In some possible embodiments, the data provided by the reading circuit 1905 to the control circuit 1906 may be the total data amount of the events (light intensity change events) measured by the pixel array circuit per unit time. For example, if the bit width currently used for the light intensity feature information is H_1, the total data amount of the events measured per unit time that the reading circuit 1905 provides to the control circuit 1906 is N_ev × H_1.
The above description assumed that the bandwidth of the visual sensor is fixed, in which case there may be event data that cannot be read out. At present this is generally handled by random discarding. Random discarding keeps the transmitted data amount within the bandwidth, but data are lost, and in certain application scenarios (such as autonomous driving) the randomly discarded data may be highly important. The schemes described in fig. 11 to 20 compare the data amount of events represented by light intensity information with the bandwidth to adjust the precision of the event representation; they solve the problem by adaptively switching between two event representations, transmitting all events in a suitable representation while meeting the bandwidth limit, and transmitting all events with higher representation precision as far as possible. In the schemes described in fig. 11 to 20 and in existing schemes, however, the bit width of the light intensity feature information is fixed. Considering the problems in the prior art, the present application further provides a scheme for dynamically adjusting the bit width of the light intensity feature information, which, compared with the scheme of fig. 11 to 20 offering only two event representations, can transmit all events with greater representation precision while meeting the bandwidth limit. The idea is that when the data amount the visual sensor needs to transmit within a period of time (such as a unit time) exceeds the bandwidth, the bit width of the light intensity feature information is reduced, i.e., the precision of the event representation is reduced, until the bandwidth limit is met, and events are then encoded with that bit width (specifically, the light intensity feature information of each event is encoded). The bit width of the light intensity feature information that satisfies the bandwidth limit (hereinafter also simply called the bit width, or the bit width representing the light intensity feature information) may be determined in various ways, described below with reference to several preferred embodiments.
In some possible embodiments, the optimal bit width representing the light intensity feature information may be determined by decreasing the bit width step by step. As shown in fig. 22, in the initial state the first encoding unit 1907 may first encode events with the maximum bit width B. The control circuit 1906 calculates, based on the per-unit-time data provided by the reading circuit 1905, whether the event generation rate exceeds the bandwidth limit. If it does, the quantization precision is reduced step by step, i.e., the bit width of the light intensity feature information is reduced step by step: the bit width is adjusted to (B-1) and it is determined whether the event generation rate exceeds the bandwidth at (B-1) bits, then at (B-2) bits, and so on down to (B-n) bits, where n is a positive integer. The control circuit 1906 compares the predicted event generation rate at each adjusted bit width with the bandwidth; once the bandwidth limit is satisfied (i.e., the rate is not greater than the bandwidth), the first encoding unit 1907 is controlled to encode events with the bit width of that stage. For example, if it is determined that at a bit width of (B-1) the event generation rate does not exceed the bandwidth limit, events are encoded with (B-1) bits. For a better understanding of this embodiment, consider the following example. Assume the maximum bit width B is 12 bits, i.e., at most 12 bits may be used to encode one event; assume the bandwidth limit is 3000 bps (at most 3000 bits may be transmitted per second), i.e., the preset maximum bandwidth is 3000 bps; and assume that in an actual scenario 100 events are generated in the 1st second, 300 events in the 2nd second, 400 events in the 3rd second, and 180 events in the 4th second. A sketch of this decreasing search is given below.
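The decreasing search can be sketched as follows, assuming the overhead fields used in the example (a 32-bit event count and a 4-bit bit-width field); the function name is illustrative. It reproduces the second-by-second walk-through given below.

```python
# Hedged sketch of the decreasing-bit-width search: start from the maximum bit width B
# and decrement until the total transmitted bits fit within the bandwidth K.

def choose_bit_width_decreasing(n_events: int, max_bits: int = 12,
                                bandwidth: int = 3000,
                                i_bits: int = 4, s_bits: int = 32) -> int:
    """Return the largest bit width <= max_bits whose total data fits the bandwidth,
    or 0 if even 1-bit encoding does not fit."""
    for h in range(max_bits, 0, -1):
        if n_events * h + i_bits + s_bits <= bandwidth:
            return h
    return 0

# 100 events -> 12 bits, 300 -> 9 bits, 400 -> 7 bits, 180 -> 12 bits.
for n in (100, 300, 400, 180):
    print(n, choose_bit_width_decreasing(n))
```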
If the prior art random discard scheme is used, the following situation occurs:
second 1: 100 events are transmitted, 0 events are lost, the event loss rate is 0, and the visual sensor transmits 1200 bits in total.
Second 2: 250 events were transmitted, 50 events were lost, the event loss rate was 16.7%, and the visual sensor transmitted a total of 3000 bits.
3rd second: 250 events were transmitted, 150 events were lost, the event loss rate was 37.5%, and the visual sensor transmitted a total of 3000 bits.
Second 4: 180 events are transmitted, 0 events are lost, the event loss rate is 0, and the visual sensor transmits 2160 bits in total.
The above scheme loses 200 events, transmitting 9360 bits altogether, with a loss rate of 20.4%.
If the scheme provided by the present application is adopted, the bit width representing the light intensity feature information is adjusted dynamically, for example by determining the optimal bit width in the decreasing manner described above, which can effectively reduce the event loss rate; the description continues with the example above. As stated earlier, the parameter s means that the number of events is encoded with s bits, and the parameter i means that the bit width of the light intensity feature information is encoded with i bits. Since the maximum bit width B is 12 bits, the bit width field only requires ⌈log2(12)⌉ = 4 bits, where ⌈·⌉ denotes rounding up, so i may be 4; in addition, s is here assumed to be 32, i.e., the number of events is encoded with 32 bits.
With the scheme of dynamically adjusting the bit width representing the light intensity feature information, the event loss rate is reduced as follows:
second 1: since 100 events are generated, first the calculation is performed at maximum bit width B, i.e. the rate at which events are generated is 100×12<3000 bits, not exceeding the bandwidth limit, so that the number of events 100 (32 bits) and the bit width size 12 (4 bits) are transmitted, then 100 events are transmitted, each event being encoded at 12 bits, for a total of 32+4+100×12=1236 bits.
Second 2: first the event generation rate is calculated at the maximum bit width of 12 bits: 300×12 > 3000, which exceeds the bandwidth limit, and continuing to encode each event at the maximum bit width would cause event loss. The bit width is decremented: at 11 bits the rate is 300×11 > 3000, still exceeding the limit; decrementing again and taking into account that the 32-bit event number and the 4-bit bit width field also need to be transmitted, at 10 bits the rate is 300×10 = 3000, which as a whole still exceeds the bandwidth; decrementing again, at 9 bits the rate is 300×9 = 2700, which, including the 32 bits for the event number and the 4 bits for the bit width, does not exceed the bandwidth limit. Therefore 9 bits is determined to be the optimal bit width, and the control circuit controls the first encoding unit to encode each event with 9 bits. The event number 300 (32 bits) and the bit width 9 (4 bits) are transmitted, followed by the 300 events, each encoded with 9 bits, for a total of 32+4+300×9 = 2736 bits.
Second 3: first the event generation rate is calculated at the maximum bit width of 12 bits: 400×12 > 3000, exceeding the bandwidth limit. The bit width is decremented: at 11 bits the rate is 400×11 > 3000, still exceeding the limit; at 10 bits, 400×10 > 3000; at 9 bits, 400×9 > 3000; at 8 bits, 400×8 > 3000; at 7 bits, 400×7 = 2800, which no longer exceeds the limit. Therefore 7 bits is determined to be the optimal bit width, and the control circuit controls the first encoding unit to encode each event with 7 bits. The event number 400 (32 bits) and the bit width 7 (4 bits) are transmitted, followed by the 400 events, each encoded with 7 bits, for a total of 32+4+400×7 = 2836 bits, i.e., an event data rate of 2836 bps.
Second 4: first, the rate 180×12=2160 of event generation is calculated according to the maximum bit width of 12 bits, and the bandwidth limit is not exceeded, and then the control circuit controls the first encoding unit to encode one event by 12 bits. A total of 32+4+180×12=2196 bits needs to be transmitted.
With the scheme of dynamically adjusting the bit width of the light intensity feature information, 0 events are lost in this example and only 9004 bits are transmitted in total, saving 3.8% of the data volume compared with random discarding, while allowing events to be transmitted with different precision in different periods. Compared with the original data, i.e., encoding every event with 12 bits without considering event loss, the scheme saves 23.4% of the data volume.
In the above example, the control circuit calculates the event generation rate at the maximum bit width every second and decrements from the maximum bit width whenever the bandwidth limit is not met, until it is met; in this way all events are always transmitted with the greatest possible representation precision and no event is lost. In some possible embodiments, the event generation rate may instead be calculated every second at the current bit width: when the bandwidth limit is not met, the current bit width is decremented until it is met, and when the bandwidth limit is met, the bit width may be incremented so that all events are transmitted with the greatest possible representation precision. This is described further below in connection with the above example, and a sketch follows the walk-through.
Second 1: since 100 events are generated, first the calculation is performed at maximum bit width B, i.e. the rate at which events are generated is 100×12<3000 bits, not exceeding the bandwidth limit, so that the number of events 100 (32 bits) and the bit width size 12 (4 bits) are transmitted, then 100 events are transmitted, each event being encoded at 12 bits, for a total of 32+4+100×12=1236 bits.
Second 2: first the event generation rate is calculated at the current bit width, i.e., at 12 bits: 300×12 > 3000, exceeding the bandwidth limit; continuing to encode each event with 12 bits would cause event loss. The bit width is decremented: at 11 bits the rate is 300×11 > 3000, still exceeding the limit; decrementing again and taking into account that the 32-bit event number and the 4-bit bit width field also need to be transmitted, at 10 bits the rate is 300×10 = 3000, which as a whole still exceeds the bandwidth; decrementing again, at 9 bits the rate is 300×9 = 2700, which, including the 32 bits for the event number and the 4 bits for the bit width, does not exceed the bandwidth limit. Therefore 9 bits is determined to be the optimal bit width, and the control circuit controls the first encoding unit to encode each event with 9 bits. The event number 300 (32 bits) and the bit width 9 (4 bits) are transmitted, followed by the 300 events, each encoded with 9 bits, for a total of 32+4+300×9 = 2736 bits.
Second 3: the event generation rate is calculated first at the current bit width, i.e., at 9 bits: 400×9 > 3000, exceeding the bandwidth limit; at 8 bits the rate is 400×8 > 3000, still exceeding the limit; at 7 bits the rate is 400×7 = 2800, which does not. Therefore 7 bits is determined to be the optimal bit width, and the control circuit controls the first encoding unit to encode each event with 7 bits. The event number 400 (32 bits) and the bit width 7 (4 bits) are transmitted, followed by the 400 events, each encoded with 7 bits, for a total of 32+4+400×7 = 2836 bits, i.e., an event data rate of 2836 bps.
Second 4: the event generation rate is calculated first at the current bit width: at 7 bits the rate is 180×7 < 3000 and, including the 32-bit event number and the 4-bit bit width field, the total still does not exceed the bandwidth limit, so the bit width is incremented. At 8 bits, 180×8 < 3000, still within the limit; at 9 bits, 180×9 < 3000; at 10 bits, 180×10 < 3000; at 11 bits, 180×11 < 3000; and at 12 bits, 180×12 = 2160, the total still does not exceed the bandwidth limit. Since 12 bits is the maximum bit width, it is determined to be optimal, and the control circuit controls the first encoding unit to encode each event with 12 bits, requiring a total of 32+4+180×12 = 2196 bits to be transmitted.
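The second-by-second adjustment just walked through can be sketched as follows. This is an illustrative reading of the description above, not the patent's exact procedure: the helper names are assumptions, and the fitting check includes the 32-bit count and 4-bit width fields as in the example.

```python
# Sketch of the variant that starts from the bit width of the previous period,
# decrements while the bandwidth is exceeded, and otherwise increments as far as
# the bandwidth and the maximum bit width allow.

def adjust_bit_width(n_events: int, current_bits: int, max_bits: int = 12,
                     bandwidth: int = 3000, i_bits: int = 4, s_bits: int = 32) -> int:
    def fits(h: int) -> bool:
        return n_events * h + i_bits + s_bits <= bandwidth

    h = current_bits
    while h > 1 and not fits(h):          # over budget: reduce representation precision
        h -= 1
    while h < max_bits and fits(h + 1):   # under budget: raise representation precision
        h += 1
    return h

# Second-by-second adjustment for the example event counts: prints 12, 9, 7, 12.
h = 12
for n in (100, 300, 400, 180):
    h = adjust_bit_width(n, h)
    print(n, h)
```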
In addition to the bit-width-decreasing approach described above, the bit width representing the light intensity feature information that satisfies the bandwidth limit may also be determined in other ways, further described below with reference to several preferred embodiments.
In some possible embodiments, the bit width of the light intensity feature information satisfying the bandwidth limit can be determined by binary search. Taking the 400 events generated in the 3rd second as an example: first an event is encoded with 12 bits, and the control circuit determines that the event generation rate 400×12 > 3000; then the rate is calculated at half of 12 bits, i.e., 6 bits: 400×6 < 3000, which does not exceed the bandwidth limit; then the midpoint between 12 bits and 6 bits is evaluated, i.e., at 9 bits the rate 400×9 > 3000 still exceeds the limit; then the midpoint between 9 bits and 6 bits, i.e., at 8 bits the rate 400×8 > 3000 still exceeds the limit; then the midpoint between 8 bits and 6 bits, i.e., at 7 bits the rate 400×7 < 3000 is below the bandwidth limit. Since the rate at 8 bits exceeds the bandwidth limit while the rates at 6 bits and 7 bits do not, events are encoded with 7 bits, the higher quantization precision. This process requires only 5 comparisons, and the algorithm complexity is O(log B).
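A possible binary-search formulation of this choice is sketched below. It is a sketch rather than the patent's exact procedure; as in the walk-through above, the 32-bit and 4-bit overhead fields are not included in the comparison.

```python
# Binary search for the largest bit width in [1, B] whose event data rate stays
# within the bandwidth; O(log B) comparisons.

def choose_bit_width_binary(n_events: int, max_bits: int = 12,
                            bandwidth: int = 3000) -> int:
    lo, hi, best = 1, max_bits, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if n_events * mid <= bandwidth:   # rate at mid bits fits: try a larger width
            best = mid
            lo = mid + 1
        else:                             # rate exceeds the bandwidth: try a smaller width
            hi = mid - 1
    return best

print(choose_bit_width_binary(400))  # 7, as in the 3rd-second example
```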
In some possible embodiments, an approximate estimate can also be used to determine the bit width of the light intensity feature information satisfying the bandwidth limit. For example, 400 events in total are generated in the 3rd second, so from the 3000 bps bandwidth limit the bit width of each event can be calculated as at most ⌊3000/400⌋ = 7 bits, where ⌊·⌋ denotes rounding down. The control circuit then controls the first encoding unit to encode each event with 7 bits.
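A minimal sketch of this approximate estimate follows; the helper name is illustrative.

```python
# Divide the bandwidth by the event count, round down, and cap at the maximum bit width.

def estimate_bit_width(n_events: int, bandwidth: int = 3000, max_bits: int = 12) -> int:
    return min(max_bits, bandwidth // n_events)

print(estimate_bit_width(400))  # floor(3000 / 400) = 7
```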
By dynamically adjusting the bit width of the light intensity feature information, events are quantized and encoded at the maximum bit width when the event generation rate is small and the bandwidth limit is not reached; when the event generation rate is large, the bit width is gradually reduced to meet the bandwidth limit; and if the event generation rate later falls, the bit width can be increased again, provided the bandwidth limit is not exceeded.
In some embodiments, the pixel array may further be divided into regions, and different weights may be used to set the maximum bit widths of different regions so as to adapt to different regions of interest in the scene. For example, a larger weight may be set for a region that may contain a target object, so that events output by that region are represented with higher precision, while a smaller weight may be set for the background region, so that events output by the background region are represented with lower precision.
Referring to fig. 23, a block diagram of another vision sensor provided herein. In this vision sensor the pixel array circuit is divided into regions, and different regions use different encoding schemes, i.e., different regions output events with different representation precision. As shown in fig. 23, two pixel circuits are taken as an example: assume the pixel circuit 1900 is a pixel circuit in a first region of the pixel array circuit of the vision sensor, and the pixel circuit 2100 is a pixel circuit in a second region. The first region and the second region are two different regions of the pixel array circuit, and the representation precision of the events output by the first region and the second region is different.
The reading circuit 2105 may be configured to read the data signals generated by the pixel circuit 1900 and the pixel circuit 2100, and to transmit the encoded data signal output by the first encoding unit 1907 and the encoded data signal output by the second encoding unit 2107 to the control circuit 2106. The control circuit 2106 may control, according to the data transmitted by the reading circuit, how many bits the first encoding unit 1907 uses to encode events and how many bits the second encoding unit 2107 uses to encode events; that is, the control circuit 2106 can separately control the representation precision of the events output by the pixel circuit 1900 and the pixel circuit 2100.
The light intensity detection unit 2101, the threshold comparison unit 2102, the reading control unit 2103, the light intensity acquisition unit 2104 and the second encoding unit 2107 may refer to the light intensity detection unit 1901, the threshold comparison unit 1902, the reading control unit 1903, the light intensity acquisition unit 1904 and the first encoding unit 1907, respectively, and are not repeated here.
The following describes, with a specific example, how the control circuit controls different regions of the pixel array circuit to use different encodings. Referring to fig. 24, a schematic diagram of the region division of a pixel array, the pixel array circuit is divided into 6 regions: region A, region B, region C, region D, region E, and region F. Different weights may be set for different regions; for example, a region that may contain a target object is given a larger weight and the background region a smaller weight. In fig. 24, regions D, E, and F are given larger weights and regions A, B, and C smaller weights: specifically, region A has a weight of 0.05, region B a weight of 0.1, region C a weight of 0.05, region D a weight of 0.2, region E a weight of 0.4, and region F a weight of 0.2. For example, if the maximum bit width is 12 bits, the maximum bit width of region E, which has the greatest weight, is set to 12 bits, and according to the weight of each region the maximum bit width of region A is set to 2 bits, of region B to 3 bits, of region C to 2 bits, of region D to 6 bits, and of region F to 6 bits. It should be noted that dividing the pixel array into 6 regions and the weights set for each region in fig. 24 are for illustration only and do not limit the scheme; in practice, a different number of regions may be divided as required and different maximum bit widths set for different regions. The way the control circuit determines the optimal bit width for each region is the same as described above for the pixel circuit 1900: the bit-width-decreasing manner, the binary search method, the approximate estimation method, and so on may all be used. The bit-width-decreasing manner is used for further illustration below.
Assume the bandwidth limit is 3000 bps. Since the pixel array is divided into regions, the bandwidth allocated to each region also differs. Continuing the example above: region A has a weight of 0.05 and is allocated 3000×0.05 = 150 bps; region B has a weight of 0.1 and is allocated 3000×0.1 = 300 bps; region C has a weight of 0.05 and is allocated 3000×0.05 = 150 bps; region D has a weight of 0.2 and is allocated 3000×0.2 = 600 bps; region E has a weight of 0.4 and is allocated 3000×0.4 = 1200 bps; region F has a weight of 0.2 and is allocated 3000×0.2 = 600 bps. Assume that 50 events are generated in region A, 80 in region B, 60 in region C, 90 in region D, 100 in region E, and 80 in region F. Then:
region a: for the a region, the control circuit first determines that the rate of event generation is 50×2 < 150 according to the maximum bit width 2 bits of the a region, and considering that the number of events is 32 bits and the bit width size is 4 bits and needs to be transmitted, and still does not exceed the bandwidth limit, so the control circuit controls the encoding unit corresponding to the a region to encode one event with 2 bits, for example, the pixel circuit 1900 is one pixel circuit in the a region, and the control circuit 2106 controls the first encoding unit 1907 to encode one event with 2 bits, and the a region needs to transmit 32+4+50×2=136 bits in total.
Region B: for the B region, the control circuit first determines the rate of event generation 80×3 < 300 according to the maximum bit width 3 bits of the B region, and considering that the number of events is 32 bits and the bit width size is 4 bits and needs to be transmitted, the bandwidth limit is still not exceeded, so the control circuit controls the corresponding encoding unit of the B region to encode one event with 3 bits, for example, the pixel circuit 2100 is one pixel circuit in the B region, and the control circuit 2106 controls the second encoding unit 2107 to encode one event with 3 bits, and the B region needs to transmit 32+4+80×3=276 bits in total.
Region C: the control circuit first determines the event generation rate at region C's maximum bit width of 2 bits, 60×2 < 150; however, considering that the 32-bit event number and the 4-bit bit width field also need to be transmitted, the total 32+4+60×2 = 156 bits exceeds the 150 bps allocated to region C, so the bit width is decremented to 1 bit, where the rate 60×1 < 150 together with the overhead no longer exceeds the limit. The control circuit therefore controls the encoding unit corresponding to region C to encode each event with 1 bit. Region C needs to transmit 32+4+60×1 = 96 bits in total.
Region D: the bandwidth is 3000×0.2=600 bps, encoded in 6 bits, and the D region requires a total of 32+4+90×6=576 bits to be transmitted.
Region E: with a bandwidth of 3000 x 0.4=1200 bps, encoding at 12 bits would exceed the bandwidth limit, encoding at 11 bits, the E region would require a total of 32+4+100 x 11=1136 bits to be transmitted.
Region F: the bandwidth is 3000×0.2=600 bps, encoded with 6 bits, and the F region requires a total of 32+4+80×6=516 bits to be transmitted.
In summary, a total of 2736 bits are transmitted in 1 second; compared with the original data, i.e., encoding every event with 12 bits without considering event loss, the scheme saves 50.4% of the data.
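The region-weighted allocation just walked through can be sketched as follows. The weights, event counts, and per-region maximum bit widths are taken from the example; the code itself, including the per-region decreasing search, is an illustrative sketch rather than the patent's implementation.

```python
# Split the bandwidth by region weight, then pick each region's bit width by the
# same decreasing search as before (32-bit count and 4-bit width field per region).

REGION_WEIGHTS = {"A": 0.05, "B": 0.1, "C": 0.05, "D": 0.2, "E": 0.4, "F": 0.2}
REGION_MAX_BITS = {"A": 2, "B": 3, "C": 2, "D": 6, "E": 12, "F": 6}
EVENTS = {"A": 50, "B": 80, "C": 60, "D": 90, "E": 100, "F": 80}

def region_bit_width(n_events: int, max_bits: int, region_bandwidth: float,
                     i_bits: int = 4, s_bits: int = 32) -> int:
    for h in range(max_bits, 0, -1):
        if n_events * h + i_bits + s_bits <= region_bandwidth:
            return h
    return 0

total = 0
for name, weight in REGION_WEIGHTS.items():
    bw = 3000 * weight                                  # bandwidth allocated to the region
    h = region_bit_width(EVENTS[name], REGION_MAX_BITS[name], bw)
    bits = 32 + 4 + EVENTS[name] * h
    total += bits
    print(name, h, bits)
print("total", total)   # 2736 bits in all, as in the summary above
```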
Fig. 25 shows a block diagram of a control circuit of a possible embodiment of the present application. The control circuit may be used to implement the control circuit 1906 of fig. 21, the control circuit 2106 of fig. 23, and the like. As shown in fig. 25, it includes at least one processor 2301, at least one memory 2302 coupled to the processor 2301, and a communication mechanism 2303 coupled to the processor 2301. The memory 2302 is used at least for storing a computer program and the data signals acquired from the reading circuit. Through the communication mechanism 2303, the control circuit may be communicatively coupled to the reading circuit of the vision sensor (such as the reading circuit 2105 in fig. 21 and fig. 23) or to a reading circuit external to the vision sensor, as well as to the first encoding unit 1907 and the second encoding unit 2107, to implement control functions for them. When the processor reads the computer program stored in the memory 2302, the actions performed by the control circuits described above in fig. 21 to 24 are carried out.
It should be noted that the control circuit shown in fig. 25 may further include a preconfigured statistical model 231 and a policy module 232. The statistical model 231 gathers historical statistics, based on the data provided by the reading circuit 2105, on the light intensity change events generated by the pixel array circuit over a period of time (e.g., a unit time), and may then transmit the statistical data to the policy module 232. The statistical data may indicate the number of light intensity change events or the total data amount of the light intensity change events.
In some embodiments, the processor may include multiple statistical models and the policy modules corresponding to them; for example, the control circuit shown in fig. 25 may be combined with the control circuit shown in fig. 8. In some such embodiments, the processor of the control circuit includes a statistical model 606, a policy module 608, a statistical model 231, and a policy module 232.
Fig. 26 shows a block diagram of an electronic device according to a possible embodiment of the present application. As shown in fig. 26, the electronic device includes a vision sensor chip 2400, a control circuit 2401, and a parsing circuit 2402. It should be understood that the electronic device is described for exemplary purposes and may be implemented using any suitable device, including various sensor devices known today or developed in the future; embodiments of the present application may also be embodied in different sensor systems. It should also be understood that the electronic device may include other elements, modules, or entities that are not shown; they are omitted for clarity and do not limit the embodiments of the present application.
The vision sensor chip 2400 and the control circuit 2401 may be understood with reference to the vision sensor and the control circuit described in fig. 21 to 25, and the detailed description thereof will not be repeated here. The parsing circuit 2402 may be configured to parse data signals read by the read circuit in the vision sensor chip 2400. In a possible embodiment of the present application, the parsing circuit 2402 may parse the data signal transmitted by the vision sensor using a bit width that is compatible with the data format bit width currently employed by the vision sensor chip 2400. In order to better understand how the parsing circuit parses the data signals transmitted by the vision sensor chip, two specific examples are described below.
One example was mentioned above: assume a maximum bit width B of 12 bits, a bandwidth limit of 3000 bps (at most 3000 bits transmitted per second), the number of events encoded with 32 bits, and the bit width representing the light intensity feature information encoded with 4 bits; the visual sensor generates 100 events in the 1st second, 300 events in the 2nd second, 400 events in the 3rd second, and 180 events in the 4th second. With the scheme provided in this application, events are encoded with 12 bits in the 1st second, 9 bits in the 2nd second, 7 bits in the 3rd second, and 12 bits in the 4th second. The following continues with this example to describe how the parsing circuit parses the data signals transmitted by the vision sensor chip.
The data output by the vision sensor chip may be a binary data stream comprising three parts: the number of events, the bit width, and each encoded event. As shown in fig. 27, a schematic diagram of such a binary data stream, the first s bits of the stream output by the vision sensor chip represent the number of events; in the example above s is 32, so the parsing circuit reads the first 32 bits to obtain the event count, and parsing the first 32 bits of the stream corresponding to the 1st second yields 100 events. The next i bits give the bit width representing the light intensity feature information; in the example above i is 4, and parsing the 4 bits at the corresponding position of the 1st second's stream yields a bit width of 12 bits, i.e., in the 1st second each event is represented with 12 bits. The 100 events are then parsed sequentially from the 1st second's stream, 12 bits at a time.
In the same way as for the 1st second, parsing the first 32 bits of the binary data stream corresponding to the 2nd second yields 300 events, reading 4 more bits yields a bit width of 9 bits, and the 300 events are then parsed sequentially at 9 bits each. Parsing the first 32 bits of the stream corresponding to the 3rd second yields 400 events, the next 4 bits yield a bit width of 7 bits, and the 400 events are parsed sequentially at 7 bits each. Parsing the first 32 bits of the stream corresponding to the 4th second yields 180 events, the next 4 bits yield a bit width of 12 bits, and the 180 events are parsed sequentially at 12 bits each.
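One possible way a parsing circuit could walk such a stream is sketched below. The field widths (s = 32, i = 4) follow the running example; the function name and the bit-string representation are assumptions made for illustration.

```python
# Sketch: read an s-bit event count, an i-bit bit width, then that many events at that width.

def parse_stream(bits: str, s_bits: int = 32, i_bits: int = 4):
    """bits is a string of '0'/'1' characters; returns (count, width, events)."""
    pos = 0
    count = int(bits[pos:pos + s_bits], 2); pos += s_bits
    width = int(bits[pos:pos + i_bits], 2); pos += i_bits
    events = [int(bits[pos + k * width: pos + (k + 1) * width], 2) for k in range(count)]
    return count, width, events

# Example stream: 3 events, 12-bit width, intensity codes 80, 112, 150.
stream = format(3, "032b") + format(12, "04b") + "".join(format(v, "012b") for v in (80, 112, 150))
print(parse_stream(stream))  # (3, 12, [80, 112, 150])
```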
In some possible embodiments, in the case of the event-stream-based read mode, each event is represented as <x, y, t, m>, where (x, y) is the pixel position at which the event is generated, t is the time at which the event is generated, and m is the light intensity feature information. Assume x is represented with b_x bits, y with b_y bits, t with b_t bits, and m with the bit width representing the light intensity feature information. Correspondingly, in the initial state the parsing circuit parses the first b_x bits as the pixel coordinate x, the next b_y bits as the coordinate y, and the next b_t bits as the time t; it then parses the s bits representing the number of events and the i bits representing the bit width of the light intensity feature information, and finally parses the events according to the bit width indicated by those i bits.
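A per-event parse for the event-stream-based read mode might look like the sketch below. The field widths b_x, b_y, b_t and the 12-bit light intensity width are assumptions chosen for illustration; only the field order follows the description above.

```python
# Sketch: parse one <x, y, t, m> event from a bit string with assumed field widths.

def parse_event(bits: str, b_x: int = 10, b_y: int = 10, b_t: int = 20, b_m: int = 12):
    pos = 0
    fields = []
    for width in (b_x, b_y, b_t, b_m):      # coordinates, timestamp, intensity feature
        fields.append(int(bits[pos:pos + width], 2))
        pos += width
    x, y, t, m = fields
    return x, y, t, m

event = format(5, "010b") + format(7, "010b") + format(1000, "020b") + format(80, "012b")
print(parse_event(event))  # (5, 7, 1000, 80)
```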
In some possible embodiments, if different regions of the pixel array circuit use different encodings, then for each region the data signal is parsed with the bit width representing the light intensity feature information of that region. An example was given above in connection with fig. 24; the description continues with that example. For region A, the parsing circuit reads 32 bits to obtain 50 events and 4 bits to obtain a bit width of 2, then parses the 50 events sequentially at 2 bits each. For region B, it reads 32 bits to obtain 80 events and 4 bits to obtain a bit width of 3, then parses the 80 events at 3 bits each. For region C, it reads 32 bits to obtain 60 events and 4 bits to obtain a bit width of 1, then parses the 60 events at 1 bit each. For region D, it reads 32 bits to obtain 90 events and 4 bits to obtain a bit width of 6, then parses the 90 events at 6 bits each. For region E, it reads 32 bits to obtain 100 events and 4 bits to obtain a bit width of 11, then parses the 100 events at 11 bits each. For region F, it reads 32 bits to obtain 80 events and 4 bits to obtain a bit width of 6, then parses the 80 events at 6 bits each.
In one possible implementation, the parsing circuit 2402 may employ the same control strategy as the vision sensor chip 2400 to determine a parsing pattern that matches the event representation currently used by the reading circuit. As an example, if the vision sensor chip 2400 represents one event with R bits in the initial state, the parsing circuit parses events based on the associated number of bits (e.g., R bits in the initial state); and if the vision sensor adjusts the event representation according to the amount of data to be transmitted and its preset maximum bandwidth, the parsing circuit 2402 uses the same adjustment policy as the vision sensor to determine the number of bits with which to parse events.
The embodiment of the application also provides a method for operating the vision sensor chip. Referring to fig. 28, which is a flowchart of a method for operating a vision sensor chip according to a possible embodiment of the present application, may include the steps of:
2601. at least one data signal corresponding to a pixel in the pixel array circuit is generated by measuring the amount of light intensity variation.
The pixel array circuit generates at least one data signal corresponding to a pixel in the pixel array circuit by measuring an amount of light intensity change, the at least one data signal being indicative of a light intensity change event, the light intensity change event being indicative of the amount of light intensity change measured by the corresponding pixel in the pixel array circuit exceeding a predetermined threshold.
Step 2601 may be understood with reference to step 1801 in the corresponding embodiment of fig. 20, and the detailed description will not be repeated here.
2602. At least one data signal is encoded according to the first bits to obtain first encoded data.
The first encoding unit encodes at least one data signal according to the first bit to obtain first encoded data. The first encoding unit may be understood with reference to the steps performed by the first encoding unit 1907 in fig. 21.
2603. When the first control signal is received from the control circuit, at least one data signal is encoded according to a second bit indicated by the first control signal, the first control signal being determined by the control circuit according to the first encoded data.
The first encoding unit encodes at least one data signal according to a second bit indicated by the first control signal when receiving the first control signal from the control circuit, the first control signal being determined by the control circuit according to the first encoded data.
The first encoding unit may be understood with reference to the steps performed by the first encoding unit 1907 in fig. 21.
In some possible embodiments, the control signal is determined by the control circuit based on the first encoded data and a preset bandwidth of the vision sensor chip.
In some possible embodiments, the second bit indicated by the control signal is smaller than the first bit when the data amount of the first encoded data is not smaller than the bandwidth, such that the total data amount of the at least one data signal encoded by the second bit is not greater than the bandwidth.
In some possible embodiments, when the data amount of the first encoded data is not greater than the bandwidth, the second bit indicated by the control signal is greater than the first bit, and the total data amount of the at least one data signal encoded by the second bit is not greater than the bandwidth.
In some possible embodiments, the pixel array includes Y regions, a maximum bit of at least two regions of the Y regions is different, the maximum bit representing a preset maximum bit of encoding the at least one data signal generated by one of the regions, and the first encoding unit is specifically configured to encode the at least one data signal generated by the first region according to the first bit to obtain first encoded data, where the first bit is not greater than the maximum bit of the first region, and the first region is any one region of the Y regions; the first encoding unit is specifically configured to encode the at least one data signal generated in the first area according to a second bit indicated by the first control signal when the first control signal is received from the control circuit, where the first control signal is determined by the control circuit according to the first encoded data.
In some possible embodiments, the control circuit is further configured to: transmit the first control signal to the first encoding unit when it is determined that the total data amount of the at least one data signal encoded with the third bit is greater than the bandwidth and the total data amount of the at least one data signal encoded with the second bit is not greater than the bandwidth, the third bit and the second bit differing by 1 bit unit. This ensures that events are encoded with as many bits as possible and that all events are transmitted, while meeting the bandwidth limit.
In order to transmit all events generated by the visual sensor while meeting the bandwidth limit, the scheme above adjusts the representation precision of events so that all events are transmitted with the greatest possible precision within the bandwidth limit. However, reducing the representation precision of events, i.e., reducing the bit width with which events are represented, reduces the information an event can carry, which in some scenarios is detrimental to processing and analyzing the events. Reducing representation precision is therefore not suitable for every scene: in some scenes events need to be represented with a high bit width, but, as mentioned above, although a high-bit-width representation can carry more data, its data amount is also larger, and with the preset maximum bandwidth of the visual sensor fixed there may be event data that cannot be read out, so data are lost. To solve this problem, embodiments of the present application further provide a vision sensor, described in detail below.
Referring to fig. 29-a, a block diagram of another vision sensor provided herein. The vision sensor in the present application may be implemented as a vision sensor chip, which is not repeated here. As shown in fig. 29-a, the vision sensor includes a pixel array circuit 2701 and a reading circuit 2702. The reading circuit 2702 may read the data signals output by the pixel array circuit 2701 and transmit them to the third encoding unit 2703, so that the third encoding unit 2703 encodes the acquired data signals; how the third encoding unit 2703 does so is described below. The data signal encoded by the third encoding unit 2703 may be read outside the vision sensor.
In some possible embodiments, the third coding unit 2703 may be disposed inside the vision sensor, referring to fig. 29-b, which is a block diagram of another vision sensor provided in an embodiment of the present application, as shown in fig. 29-b, the vision sensor 2800 further includes the third coding unit 2703, where the third coding unit 2703 may be implemented by software or hardware, and the embodiment of the present application is not limited thereto.
In some possible embodiments, the vision sensor may further include a control circuit. Referring to fig. 29-c, a block diagram of another vision sensor provided for embodiments of the present application, as shown in fig. 29-c the vision sensor 2900 further includes a control circuit 2704, which may be configured to control the mode in which the reading circuit 2702 reads the data signals. For example, the reading circuit 2702 may read in one of a first reading mode and a second reading mode, corresponding respectively to a frame-scan-based reading mode and an event-stream-based reading mode. It should be noted that the control circuit 2704 need not be disposed inside the vision sensor. In some possible embodiments, the reading circuit 2702 may also read the data signals output by the pixel circuits in only one signal reading mode; for example, the reading circuit 2702 may be configured to read the data signals only in a frame-scan-based reading mode, or only in an event-stream-based reading mode.
It has been mentioned above that each pixel array circuit may comprise one or more pixel arrays and each pixel array comprises a plurality of pixels, each pixel may be regarded as one pixel circuit, each pixel circuit being for generating a data signal corresponding to the pixel. Referring to fig. 30, a schematic diagram of another preferred pixel circuit according to an embodiment of the present application is shown. The pixel circuit 3000 includes a light intensity detection unit 3001, a threshold comparison unit 3002, a readout control unit 3003, and a light intensity acquisition unit 3004.
The light intensity detection unit 3001 is configured to convert the acquired light signal into an electrical signal. The light intensity detection unit 3001 may be understood with reference to the light intensity detection unit 901 in the embodiment corresponding to fig. 11, and the description is not repeated here.
The threshold comparison unit 3002 is configured to determine whether the first electrical signal is greater than a first target threshold or less than a second target threshold. The first electrical signal is the electrical signal currently output by the light intensity detection unit. When the first electrical signal is greater than the first target threshold or less than the second target threshold, the threshold comparison unit 3002 outputs polarity information indicating whether the light intensity change is an increase or a decrease; for example, the polarity information may be +1 or -1, where +1 indicates that the light intensity increases and -1 indicates that it decreases. The threshold comparison unit 3002 compares whether the difference between the current light intensity and the light intensity at the time of the last event exceeds a predetermined threshold, which can be understood with reference to formula 1-1. The first target threshold may be understood as the sum of the first predetermined threshold and the second electrical signal, and the second target threshold as the sum of the second predetermined threshold and the second electrical signal. The second electrical signal is the electrical signal output by the light intensity detection unit 3001 at the time of the last event. The threshold comparison unit in the embodiment of the present application may be implemented by hardware or by software.
When the readout control unit 3003 acquires the polarity information, the light intensity acquisition unit 3004 is controlled to acquire the first electrical signal.
The readout control unit 3003 is also configured to notify the reading circuit to read the first electrical signal stored in the light intensity acquisition unit 3004, and to notify the reading circuit 3005 to read the polarity information output by the threshold comparison unit 3002.
The reading circuit 3005 may be configured to scan the pixels in the pixel array circuit in a predetermined order to read the data signals generated by the corresponding pixels. In some possible embodiments, the reading circuit 3005 may be understood with reference to the reading circuit 220, the reading circuit 320, and the reading circuit 720, i.e., the reading circuit 3005 may be configured to read the data signals output by the pixel circuits in more than one signal reading mode. For example, the reading circuit 3005 may read in one of a first reading mode and a second reading mode, corresponding respectively to a frame-scan-based reading mode and an event-stream-based reading mode. In some possible embodiments, the reading circuit 3005 may also read the data signals output by the pixel circuits in only one signal reading mode, e.g., only in a frame-scan-based reading mode, or only in an event-stream-based reading mode.
The third encoding unit 3007 encodes the polarity information and the difference between the light intensity change amount and a predetermined threshold, based on the data signal acquired from the reading circuit 3005. The working principle of the bionic visual sensor was introduced above: taking DVS as an example, the current light intensity is compared with the light intensity at the time of the last event, and when the change amount reaches the preset issuing threshold C an event is generated and output; that is, the DVS typically generates an event when the difference between the current light intensity and the light intensity at the last event exceeds the predetermined threshold C. The present scheme makes full use of this working principle and uses the predetermined threshold to reduce the cost of event representation. The principle is as follows. For the light intensity information, the absolute light intensity L is encoded in the initial state (i.e., at the first event readout); afterwards, whenever a new event occurs, only the difference K between the light intensity change amount and the predetermined threshold, together with the polarity information, is encoded. The reason is that the condition for generating a new event is that the light intensity change at the current moment, compared with the last event, reaches the predetermined threshold; considering possible delay and noise, the change amount is not exactly equal to the predetermined threshold, but their difference is close to 0, so encoding and transmitting this differential value significantly reduces the cost of the data representation. Meanwhile, to decode accurately, the polarity information is transmitted to help determine the direction of change (i.e., positive or negative) of the current light intensity compared with the last event, so that the light intensity at the current moment can be reconstructed.
Exemplarily, referring to fig. 31, a block diagram of a third encoding unit is provided herein. The third encoding unit 2703 may include a storage module 271, a comparison module 272, and an encoding module 273. The storage module 271 may be configured to store a data signal acquired from the reading circuit 2702; the data signal may include the polarity information that the reading circuit 2702 acquires from the threshold comparing unit 3002 and the light intensity information that it acquires from the light intensity acquisition unit 3004. The comparison module 272 is configured to compute the light intensity variation, that is, the difference between the currently acquired light intensity information and the last acquired light intensity information; this difference is hereinafter referred to as the light intensity variation. The comparison module 272 is further configured to determine the difference between the light intensity variation and the predetermined threshold, where the value of the predetermined threshold may differ depending on whether the polarity information indicates that the light intensity increased or decreased. The difference between the light intensity variation and the predetermined threshold is hereinafter referred to as the differential value, which can be expressed as K = |L - L'| - C. The encoding module 273 encodes the polarity information stored in the storage module, for example using 1-2 bits, and is further configured to encode the differential value output by the comparison module; this is referred to hereinafter as differential encoding. In a preferred embodiment, the number of bits used to encode the differential value may be determined from the predetermined threshold: for a threshold of, say, 30, the differential value should theoretically be no greater than 30, so the maximum number of bits required for the differential value is ceil(log2 30) = 5 bits. In one possible embodiment, the differential value may still be greater than the predetermined threshold; in that case the remaining differential value (the difference between the differential value and the predetermined threshold) may continue to be encoded until the remaining differential value is not greater than the predetermined threshold. For example, if the first calculated differential value (hereinafter referred to as the first differential value) is greater than the predetermined threshold, the first differential value may be encoded as a second differential value, namely the difference between the first differential value and the predetermined threshold; the absolute light intensity information is then represented by the second differential value and two predetermined thresholds, that is, the second differential value is encoded and the predetermined threshold is encoded twice, thereby obtaining the encoded absolute light intensity information. For a better understanding of the process of encoding differential values in the embodiments of the present application, reference is made to fig. 32, which is described below in connection with a specific example:
assuming that the absolute light intensity information is represented by 10 bits, i.e., the maximum bit width representing the light intensity characteristic information is 10 bits, and that the predetermined threshold is 30, then according to the above analysis the theoretical differential value should be no greater than the event issue threshold 30, and thus the maximum number of bits required for encoding the differential value is ceil(log2 30) = 5 bits. If the number of events is 10, the cost of representing the events by light intensity information alone is 10x10=100 bits. With the coding scheme provided by the present application, the representation cost of the events, i.e., the amount of data to be transmitted, can be reduced, as detailed below. Assume that the absolute light intensities of the 10 events to be transmitted are {80, 112, 150, 100, 65, 24, 81, 123, 170, 211}. In the initial state the event is encoded at the maximum bit width, so the absolute light intensity 80 of the 1st event is encoded in 10 bits.
From the 2nd event onward, the polarity information is encoded in 1 bit, and the difference between the light intensity variation and the issue threshold 30 is encoded in 5 bits. For the 2nd event, the light intensity variation compared with the absolute light intensity 80 of the 1st event is |112-80|=32, and the difference between the light intensity variation and the issue threshold 30 is 32-30=2. Since the light intensity increases compared with the 1st event, i.e. 112 > 80, the polarity information is +1. The polarity information +1 is encoded with 1 bit and the differential value 2 with 5 bits, respectively.
For the 3rd event, the light intensity variation compared with the absolute light intensity 112 of the 2nd event is |150-112|=38, and the difference between the light intensity variation and the issue threshold is 38-30=8; the polarity information is still +1. The polarity information +1 is encoded with 1 bit and the differential value 8 with 5 bits, respectively.
For the 4th event, the light intensity variation compared with the absolute light intensity 150 of the 3rd event is |100-150|=50, and the difference between the light intensity variation and the issue threshold is 50-30=20; the polarity information is -1, since the current absolute light intensity decreases compared with the 3rd event, i.e. 100 < 150. The polarity information -1 is encoded with 1 bit and the differential value 20 with 5 bits, respectively.
For the 5th event, the light intensity variation compared with the absolute light intensity 100 of the 4th event is |100-65|=35, and the difference between the light intensity variation and the issue threshold is 35-30=5; the polarity information is -1, since the current absolute light intensity decreases compared with the 4th event, i.e. 65 < 100. The polarity information -1 is encoded with 1 bit and the differential value 5 with 5 bits, respectively.
For the 6th event, the light intensity variation compared with the absolute light intensity 65 of the 5th event is |65-24|=41, and the difference between the light intensity variation and the issue threshold is 41-30=11; the polarity information is -1, since the current absolute light intensity decreases compared with the 5th event, i.e. 24 < 65. The polarity information -1 is encoded with 1 bit and the differential value 11 with 5 bits, respectively.
For the 7th event, the light intensity variation compared with the absolute light intensity 24 of the 6th event is |81-24|=57, and the difference between the light intensity variation and the issue threshold is 57-30=27; since the current absolute light intensity increases compared with the 6th event, i.e. 81 > 24, the polarity information is +1. The polarity information +1 is encoded with 1 bit and the differential value 27 with 5 bits, respectively.
For the 8th event, the light intensity variation compared with the absolute light intensity 81 of the 7th event is |123-81|=42, and the difference between the light intensity variation and the issue threshold is 42-30=12; since the current absolute light intensity increases compared with the 7th event, i.e. 123 > 81, the polarity information is +1. The polarity information +1 is encoded with 1 bit and the differential value 12 with 5 bits, respectively.
For the 9th event, the light intensity variation compared with the absolute light intensity 123 of the 8th event is |170-123|=47, and the difference between the light intensity variation and the issue threshold is 47-30=17; since the current absolute light intensity increases compared with the 8th event, i.e. 170 > 123, the polarity information is +1. The polarity information +1 is encoded with 1 bit and the differential value 17 with 5 bits, respectively.
For the 10th event, the light intensity variation compared with the absolute light intensity 170 of the 9th event is |211-170|=41, and the difference between the light intensity variation and the issue threshold is 41-30=11; since the current absolute light intensity increases compared with the 9th event, i.e. 211 > 170, the polarity information is +1. The polarity information +1 is encoded with 1 bit and the differential value 11 with 5 bits, respectively.
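The per-event steps just walked through can be summarized in a short sketch. The following Python snippet is illustrative only and is not part of the patent description; the function name, the packing of the output into a string of '0'/'1' characters, and the mapping of the polarity bit ('1' for an increase, '0' for a decrease) are assumptions made for clarity.

```python
import math

def encode_events(intensities, threshold=30, full_bits=10):
    """Differentially encode a sequence of absolute light intensities.

    The first event is encoded at the full bit width; every later event is
    encoded as 1 polarity bit plus ceil(log2(threshold)) bits holding the
    difference between the light intensity variation and the threshold.
    """
    diff_bits = math.ceil(math.log2(threshold))        # 5 bits for threshold 30
    stream = format(intensities[0], f'0{full_bits}b')  # full-scale encoding of event 1
    for prev, cur in zip(intensities, intensities[1:]):
        change = abs(cur - prev)                       # light intensity variation
        polarity = '1' if cur > prev else '0'          # 1 bit: increase / decrease
        differential = change - threshold              # assumed 0 <= differential < threshold
        stream += polarity + format(differential, f'0{diff_bits}b')
    return stream

bits = encode_events([80, 112, 150, 100, 65, 24, 81, 123, 170, 211])
print(len(bits))   # 10 + (1 + 5) * 9 = 64 bits, versus 100 bits at a fixed 10 bits/event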
In this example, the 1st event in the initial state is encoded with 10 bits; each of the following 9 events encodes the light intensity polarity with 1 bit and the difference between the light intensity variation and the issue threshold 30 with 5 bits, for a total of 10+(1+5)x9=64 bits. Encoding the absolute light intensity with a fixed 10 bits would require 10x10=100 bits, so the data encoding mode provided by the present application saves at least 36% of the data amount. Existing vision sensors do not consider an efficient coding strategy for event transmission and storage: the coordinate information (x, y) of the pixel, the time t at which the light intensity characteristic information is read, and the light intensity characteristic information itself are generally encoded at fixed bit widths, and when the characteristic information is the light intensity information it often needs to be represented by a large number of bits. From the DVS sampling principle, the light intensity information of consecutive events is correlated and, especially considering that the predetermined threshold is fixed, this correlation can be used to reduce the redundancy of the event data and achieve efficient compression. The scheme provided by the present application exploits this data correlation and reduces it through event differential coding, thereby reducing the data volume. The specific improvement is that, after the full-scale encoding of the initial state, subsequent events only need to encode the polarity information and the difference between the light intensity variation and the predetermined threshold, which effectively reduces the amount of encoded data. Here, full-scale encoding refers to encoding an event with the maximum bit width predefined by the vision sensor. In addition, the light intensity information at the current moment can be reconstructed losslessly from the light intensity information of the last event, the decoded polarity information, and the decoded differential value. The decoding process is described below with reference to fig. 33.
Fig. 33 shows a block diagram of an electronic device according to a possible embodiment of the present application. As shown in fig. 33, the electronic device includes a vision sensor chip 3100 and a parsing circuit 3101. It should be appreciated that the electronic device is shown for exemplary purposes and may be implemented using any suitable device, including various sensor devices currently known or developed in the future. Embodiments of the present application may also be embodied in different sensor systems. In addition, the electronic device may include other elements, modules, or entities, which are not shown for the sake of clarity, but this does not mean that the embodiments of the present application do not provide them.
The vision sensor chip 3100 may be understood by referring to the vision sensors described in fig. 29-a to 32, and the detailed description thereof will not be repeated here. The parsing circuit 3101 may be configured to parse data signals read by the reading circuit in the vision sensor chip 3100. In a possible embodiment of the present application, the parsing circuit 3101 may decode the polarity information and the differential value according to a preset decoding method to obtain the light intensity information at the current time. In order to better understand how the parsing circuit 3101 parses the data signals transmitted by the vision sensor chip, the following description is made in connection with the above example.
In the initial state, the parsing circuit 3101 decodes the acquired binary data stream; the 1st event is decoded according to the maximum bit width to obtain the absolute light intensity at the corresponding moment. For example, in the above example, the absolute light intensity 80 of the 1st event is decoded from 10 bits.
In the subsequent decoding process, the polarity information is parsed first: the parsing circuit 3101 reads the first 1 bit in the binary data stream and decodes it to obtain the polarity information, and then decodes the differential value according to the bit width used to represent the light intensity information under differential encoding. The absolute light intensity at the current moment is then reconstructed from the absolute light intensity of the last event of the same pixel and the predetermined threshold.
For example, for the 2nd event, the light intensity polarity is decoded from 1 bit as +1, and the differential value 2 is then decoded from 5 bits. Since the light intensity polarity is positive, meaning that the light intensity of the 2nd event is enhanced compared with the 1st event, the absolute light intensity of the 2nd event is calculated as 80+2+30=112, where 80 is the decoded absolute light intensity of the 1st event, 2 is the differential value, and 30 is the event issue threshold.
For the 3rd event, the light intensity polarity +1 is decoded from 1 bit and the differential value 8 from 5 bits, so the absolute light intensity of the 3rd event is reconstructed as 112+8+30=150.
For the 4th event, the light intensity polarity -1 is first decoded from 1 bit, then the differential value 20 is decoded from 5 bits; since the light intensity polarity is negative, representing a decrease in light intensity compared with the 3rd event, the absolute light intensity is reconstructed as 150-20-30=100.
For the 5th event, the light intensity polarity -1 is first decoded from 1 bit, then the differential value 5 is decoded from 5 bits; since the light intensity polarity is negative, representing a decrease in light intensity compared with the 4th event, the absolute light intensity is reconstructed as 100-5-30=65.
For the 6th event, the light intensity polarity -1 is first decoded from 1 bit, then the differential value 11 is decoded from 5 bits; since the light intensity polarity is negative, representing a decrease in light intensity compared with the 5th event, the absolute light intensity is reconstructed as 65-11-30=24.
For the 7th event, the light intensity polarity +1 is first decoded from 1 bit, then the differential value 27 is decoded from 5 bits; since the light intensity polarity is positive, representing an increase in light intensity compared with the 6th event, the absolute light intensity is reconstructed as 24+27+30=81.
For the 8th event, the light intensity polarity +1 is first decoded from 1 bit, then the differential value 12 is decoded from 5 bits; since the light intensity polarity is positive, representing an increase in light intensity compared with the 7th event, the absolute light intensity is reconstructed as 81+12+30=123.
For the 9th event, the light intensity polarity +1 is first decoded from 1 bit, then the differential value 17 is decoded from 5 bits; since the light intensity polarity is positive, representing an increase in light intensity compared with the 8th event, the absolute light intensity is reconstructed as 123+17+30=170.
For the 10th event, the light intensity polarity +1 is first decoded from 1 bit, then the differential value 11 is decoded from 5 bits; since the light intensity polarity is positive, representing an increase in light intensity compared with the 9th event, the absolute light intensity is reconstructed as 170+11+30=211.
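The reconstruction just described mirrors the encoder. The following Python sketch continues the encoder sketch given earlier (it reuses the bit string `bits` produced there); the function name and the bit-string representation are assumptions made for illustration.

```python
def decode_events(stream, num_events, threshold=30, full_bits=10, diff_bits=5):
    """Reconstruct absolute light intensities from the differential bit stream.

    Mirrors encode_events above: the first event is read at the full bit width,
    each later event as 1 polarity bit plus a diff_bits-wide differential value.
    """
    pos = full_bits
    intensities = [int(stream[:pos], 2)]               # absolute intensity of event 1
    for _ in range(num_events - 1):
        polarity = 1 if stream[pos] == '1' else -1     # +1: increase, -1: decrease
        differential = int(stream[pos + 1:pos + 1 + diff_bits], 2)
        pos += 1 + diff_bits
        # current intensity = previous intensity +/- (differential + threshold)
        intensities.append(intensities[-1] + polarity * (differential + threshold))
    return intensities

print(decode_events(bits, 10))  # [80, 112, 150, 100, 65, 24, 81, 123, 170, 211]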
In some possible embodiments, full-scale encoding may be performed once every preset time period, to reduce decoding dependency and prevent bit errors. Continuing with the above example: it was mentioned that the parsing circuit 3101 can obtain the light intensity trend by reading the 1-bit polarity information; since full-scale encoding is now interleaved, it is also necessary to indicate to the parsing circuit 3101 whether the current event uses full-scale encoding or differential encoding. For example, this can be indicated with 2 bits: when the parsing circuit 3101 decodes +1 or -1 from the 2 bits, it decodes according to the decoding method corresponding to differential encoding (for example, +1 indicates that the light intensity increased and -1 that it decreased), and when it parses 0 from the 2 bits, it decodes according to the decoding method corresponding to full-scale encoding. In the following illustration, the 1st event is encoded with 10 bits, and the 2nd to 7th events each use 2 bits for the polarity information and 5 bits for the differential value. Since full-scale encoding is performed once every preset time period to prevent decoding dependency and bit errors, assume that the 8th event is separated from the 1st event by the preset time period; then for the 8th event the differential encoding mode is no longer adopted, i.e., the 5-bit differential value is not encoded. Instead, the 2 bits indicate full-scale encoding and the light intensity information 123 corresponding to the 8th event is represented with 10 bits. The 9th and 10th events still use differential encoding, with the polarity information encoded in 2 bits and the differential value in 5 bits.
The total data amount of this encoding process is 10+(2+5)x7+(2+10)+(2+5)x2=85 bits. Compared with encoding the absolute light intensity with a fixed 10 bits, which requires 10x10=100 bits, the scheme of performing full-scale encoding once every preset time period can still save at least 15% of the data amount.
For the scheme of performing full-scale encoding once every preset time, during decoding, the parsing circuit 3101 may determine which decoding mode should be used according to the polarity information, and reconstruct the current light intensity according to the differential value, the polarity information, the predetermined threshold value, and the decoded light intensity at the time of the last event issue, which will be described further below with reference to the above examples.
For the 1st event, decoding is performed according to the maximum bit width of 10 bits to obtain the absolute light intensity at the corresponding moment. Thereafter, for every event the light intensity polarity is first decoded from 2 bits. If the polarity information indicates that differential encoding is used, for example if the polarity information is not 0, the differential value is decoded from 5 bits; if the polarity information indicates that full-scale encoding is used, for example if the polarity information is 0, the light intensity information is decoded from 10 bits.
Specifically, the 2 nd event decodes the light intensity polarity to +1 according to 2 bits first, and decodes the difference value to 2 according to 5 bits since the light intensity polarity is non-zero, and the absolute light intensity is reconstructed to 80+2+30=112.
And the 3 rd event is decoded according to 2 bits to obtain the light intensity polarity of +1, and then the difference value of 8 is decoded according to 5 bits, so that the absolute light intensity of the 3 rd event is reconstructed to 112+8+30=150.
For the 4 th event, firstly, decoding the light intensity polarity to be-1 according to 2 bits, and then decoding the differential value to be 20 according to 5 bits; since the intensity polarity is negative, representing a decrease in intensity compared to event 3, the absolute intensity is reconstructed to 150-20-30 = 100.
For the 5 th event, firstly decoding the light intensity polarity to be-1 according to 2 bits, and then decoding the differential value to be 5 according to 5 bits; since the intensity polarity is negative, representing a decrease in intensity compared to event 4, the absolute intensity is reconstructed to 100-5-30 = 65.
For the 6 th event, firstly, decoding the light intensity polarity to be-1 according to 2 bits, and then decoding the differential value to be 11 according to 5 bits; since the intensity polarity is negative, representing a decrease in intensity compared to event 5, the absolute intensity is reconstructed to 65-11-30=24.
For the 7th event, the light intensity polarity is first decoded from 2 bits as +1, and then the differential value is decoded from 5 bits as 27; since the light intensity polarity is positive, representing an increase in light intensity compared with the 6th event, the absolute light intensity is reconstructed as 24+27+30=81.
For the 8th event, the light intensity polarity decoded from 2 bits is 0, which means that the event uses full-scale encoding, so the absolute light intensity 123 is decoded from 10 bits.
For the 9 th event, the light intensity polarity is +1 after decoding according to 2 bits, and then the difference value is 17 after decoding according to 5 bits, and the absolute light intensity is reconstructed to be 123+17+30=170.
For the 10 th event, the light intensity polarity is +1 after decoding according to 2 bits, and then the difference value is 11 after decoding according to 5 bits, and the absolute light intensity is reconstructed to 170+11+30=211.
The decoding and light intensity reconstruction of 10 events are completed so far.
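A decoder that handles the 2-bit flag can be sketched as follows. This Python snippet is illustrative only; the concrete binary patterns chosen for the flag ('01' for polarity +1, '10' for polarity -1, '00' for full-scale encoding) are assumptions, since the text only states that +1, -1, and 0 are distinguished.

```python
def decode_with_flag(stream, num_events, threshold=30, full_bits=10, diff_bits=5):
    """Decode a stream in which each event after the first starts with a 2-bit flag.

    Assumed flag mapping: '01' -> polarity +1, '10' -> polarity -1,
    '00' -> a full-scale encoded intensity follows.
    """
    pos = full_bits
    intensities = [int(stream[:pos], 2)]
    for _ in range(num_events - 1):
        flag = stream[pos:pos + 2]
        pos += 2
        if flag == '00':                               # full-scale encoding
            intensities.append(int(stream[pos:pos + full_bits], 2))
            pos += full_bits
        else:                                          # differential encoding
            polarity = 1 if flag == '01' else -1
            differential = int(stream[pos:pos + diff_bits], 2)
            pos += diff_bits
            intensities.append(intensities[-1] + polarity * (differential + threshold))
    return intensities
```

For the 10-event example above, such a stream would contain 10+(2+5)x7+(2+10)+(2+5)x2=85 bits, matching the total given earlier.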
The above embodiments assume that the differential value, i.e., the difference between the light intensity variation and the predetermined threshold, is not less than 0. In some possible embodiments, the encoding mode may be selected accordingly: if this difference is less than 0, full-scale encoding is used, and if it is not less than 0, differential encoding is used.
In some possible embodiments, the vision sensor may have a certain delay, so that the condition that the light intensity variation is greater than the predetermined threshold may be satisfied two or more times before an event is generated. This causes the differential value to be equal to or larger than the predetermined threshold, i.e., the light intensity variation is at least twice the predetermined threshold. To solve this problem, a recursive index coding scheme is used, which is described below.
The third encoding unit 2703 may first determine whether the differential value exceeds the predetermined threshold. If the differential value is smaller than the predetermined threshold, it is encoded directly according to the differential encoding method mentioned above. If the differential value is not smaller than the predetermined threshold, the difference between the differential value and the predetermined threshold (the first remaining differential value) is denoted M1; if M1 is smaller than the predetermined threshold, the predetermined threshold is encoded once and M1 is encoded. For a better understanding of this scheme, the following description is given in connection with an example:
assume that the maximum bit width representing the light intensity characteristic information is 10 bits, that 4 events are to be transmitted with absolute light intensities {80, 150, 100, 200}, and that the predetermined threshold is 30. The specific encoding procedure of the third encoding unit 2703 is as follows:
in the initial state, for event 1, the absolute light intensity 80 is encoded in 10 bits.
The absolute light intensity of the 2nd event is 150; the light intensity variation compared with the 1st event is |150-80|=70, and the polarity information is +1. The difference between the light intensity variation and the predetermined threshold is 70-30=40; since 40 exceeds the predetermined threshold 30, it cannot be encoded directly. One predetermined threshold 30 is subtracted from the differential value 40 to obtain the remaining differential value 10, which is smaller than the predetermined threshold, so the predetermined threshold 30 is encoded once and the remaining differential value 10 is encoded. That is, the polarity information +1, one predetermined threshold 30, and the remaining differential value 10 are encoded.
The absolute light intensity of the 3rd event is 100; the light intensity variation compared with the 2nd event is |100-150|=50, and the polarity information is -1. The difference between the light intensity variation and the predetermined threshold is 50-30=20; since 20 is smaller than the predetermined threshold, the polarity information -1 and the differential value 20 are encoded.
The total data amount of the above encoding process is 10+(1+5+5)+(1+5)=27 bits, whereas encoding according to a fixed 10 bits would require 3x10=30 bits, so the method of this embodiment can save at least 10% of the data amount.
It was mentioned above that if the first remaining differential value M1 is smaller than the predetermined threshold, the predetermined threshold is encoded once and M1 is encoded. There may, however, be cases where M1 is still larger than the predetermined threshold. If the difference between M1 and the predetermined threshold were calculated again, and so on, until a remaining differential value smaller than the predetermined threshold is reached (say the second remaining differential value M2, obtained after subtracting n predetermined thresholds, so that the predetermined threshold would be encoded n times followed by M2), the cost of representing the event might exceed the cost of full-scale encoding. Therefore, when the first remaining differential value M1 is still larger than the predetermined threshold, the third encoding unit 2703 determines to apply full-scale encoding to the event. The following example illustrates this. Continuing the above example with the 4th event: its light intensity information is 200, the light intensity variation is |200-100|=100, the polarity information is +1, and the difference between the light intensity variation and the predetermined threshold is 100-30=70. Since 70 exceeds the predetermined threshold 30, it cannot be encoded directly; 70-30=40 is calculated, which still exceeds 30; 40-30=10 is calculated, which is smaller than the predetermined threshold. Two predetermined thresholds have then been subtracted from the differential value 70 to reach the remaining differential value 10, so two predetermined thresholds would be encoded followed by the remaining differential value 10, i.e., the polarity information +1, the first predetermined threshold 30, the second predetermined threshold 30, and the remaining differential value 10. If this differential encoding scheme were still used, the total data amount of the encoding process would be 10+(1+5+5)+(1+5)+(1+5+5+5)=43 bits, whereas the original event data requires only 4x10=40 bits at a fixed 10-bit encoding. Therefore, when the first remaining differential value M1 is still larger than the predetermined threshold, the third encoding unit 2703 determines to use full-scale encoding for the event, which can further save the data amount.
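The encoding-side decision just described can be sketched as follows. This Python snippet is not part of the patent; the function name, the 1-bit polarity mapping, and the bit-string form are assumptions, and how the full-scale fallback is flagged to the decoder is not specified in the text, so the sketch simply returns the full-bit-width intensity in that case.

```python
import math

def encode_recursive(prev, cur, threshold=30, full_bits=10):
    """Encode one event under the recursive index coding scheme described above."""
    diff_bits = math.ceil(math.log2(threshold))        # 5 bits for threshold 30
    change = abs(cur - prev)
    polarity = '1' if cur > prev else '0'
    differential = change - threshold
    if differential < threshold:                       # e.g. event 3: 20 -> 1 + 5 bits
        return polarity + format(differential, f'0{diff_bits}b')
    remaining = differential - threshold               # first remaining differential M1
    if remaining < threshold:                          # e.g. event 2: encode 30, then 10
        return (polarity + format(threshold, f'0{diff_bits}b')
                + format(remaining, f'0{diff_bits}b'))
    # M1 still >= threshold: fall back to full-scale encoding of the intensity
    return format(cur, f'0{full_bits}b')

print(len(encode_recursive(80, 150)))   # 11 bits: polarity + threshold + remaining value
print(len(encode_recursive(150, 100)))  # 6 bits: polarity + differential value 20
print(len(encode_recursive(100, 200)))  # 10 bits: full-scale fallback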
The decoding scheme corresponding to the recursive index coding scheme is described below. The parsing circuit 3101 decodes the acquired binary data stream; the 1st event is decoded according to the maximum bit width to obtain the absolute light intensity at the corresponding moment, for example, in the above example, the absolute light intensity 80 of the 1st event is decoded from 10 bits. In the subsequent decoding process, the polarity information is parsed first: the parsing circuit 3101 reads the first 1 bit in the binary data stream and decodes it to obtain the polarity information, and then decodes the differential value according to the bit width used to represent the light intensity information under differential encoding. If the decoded differential value is equal to the predetermined threshold, decoding continues according to the light intensity information representation bit width to obtain the remaining differential value. The following description is made in connection with the above example:
in the initial state, the absolute light intensity 80 of the 1 st event is decoded in 10 bits.
For the 2nd event, the light intensity polarity +1 is decoded from 1 bit, then the differential value 30 is decoded from 5 bits; since this value equals the issue threshold, another 5 bits are decoded to obtain the remaining differential value 10. The light intensity differential value of the 2nd event is therefore actually 30+10=40, and the absolute light intensity is reconstructed as 80+40+30=150.
For event 3, the absolute light intensity is reconstructed to 150-20-30=100 by decoding the light intensity polarity to-1 with 1 bit and then decoding the difference value to 20 with 5 bits.
So far, the decoding and light intensity reconstruction of 3 events are completed.
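The corresponding decoding step can be sketched in Python as follows; the function name and the return of the number of consumed bits are assumptions made for illustration, and the sketch covers only the differential form (not the full-scale fallback).

```python
def decode_recursive_event(stream, prev_intensity, threshold=30, diff_bits=5):
    """Decode one differentially encoded event of the recursive index scheme.

    Returns (absolute_intensity, bits_consumed), assuming the event was encoded
    in the differential form produced by encode_recursive above.
    """
    polarity = 1 if stream[0] == '1' else -1
    value = int(stream[1:1 + diff_bits], 2)
    consumed = 1 + diff_bits
    total = value
    if value == threshold:                             # a threshold field: read the remainder
        total += int(stream[consumed:consumed + diff_bits], 2)
        consumed += diff_bits
    return prev_intensity + polarity * (total + threshold), consumed

# Continuing the example: event 2 was encoded as '1' + '11110' (30) + '01010' (10)
intensity, used = decode_recursive_event('1' + '11110' + '01010', 80)
print(intensity, used)   # 150, 11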
The embodiment of the application also provides a method for operating the vision sensor chip. Referring to fig. 34, which is a flowchart of a method for operating a vision sensor chip according to a possible embodiment of the present application, the method may include the following steps:
3201. at least one data signal corresponding to a pixel in the pixel array circuit is generated by measuring the amount of light intensity variation.
The pixel array circuit generates a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit by measuring the amount of light intensity variation, the plurality of data signals indicating at least one light intensity variation event, the at least one light intensity variation event indicating that the amount of light intensity variation measured by the corresponding pixel in the pixel array circuit exceeds a predetermined threshold.
3202. And encoding the differential value according to the first preset bit width.
The third encoding unit encodes a differential value, which is a difference between the light intensity conversion amount and a predetermined threshold value, according to the first preset bit width. The third encoding unit may be understood with reference to the third encoding unit 2703, and a detailed description thereof will not be repeated here.
In some possible embodiments, the pixel array circuit includes a plurality of pixels, each pixel including a threshold comparing unit for outputting polarity information indicating whether the light intensity variation exceeds a predetermined threshold, the polarity information indicating whether the light intensity variation is increased or decreased. And the third coding unit is also used for coding the polarity information according to the second preset bit width.
In some possible embodiments, each pixel includes a light intensity detection unit, a readout control unit, and a light intensity acquisition unit. The light intensity detection unit is configured to output an electrical signal corresponding to the light signal irradiated on it, the electrical signal indicating the light intensity. The threshold comparing unit is specifically configured to output the polarity information, based on the electrical signal, when the light intensity variation exceeds the predetermined threshold. The readout control unit is configured to, in response to receiving the polarity signal, instruct the light intensity acquisition unit to acquire and buffer the electrical signal corresponding to the moment the polarity information is received. The third encoding unit is further configured to encode the first electrical signal according to a third preset bit width, where the first electrical signal is the electrical signal acquired and buffered by the light intensity acquisition unit at the first receiving moment of the corresponding polarity information, and the third preset bit width is the maximum bit width preset by the vision sensor for representing the light intensity characteristic information.
In some possible embodiments, the third encoding unit is further configured to: and encoding the differential value according to a third preset bit width at intervals of preset time.
In some possible embodiments, the third coding unit is specifically configured to: encode the differential value according to the first preset bit width when the differential value is not greater than the predetermined threshold.
In some possible embodiments, the third encoding unit is further configured to: encode the remaining differential value and the predetermined threshold according to the first preset bit width when the differential value is greater than the predetermined threshold, where the remaining differential value is the difference between the differential value and the predetermined threshold.
To better demonstrate that the differential-value encoding scheme can save the amount of data required to transmit events, experimental data are discussed below. The CeleX sensor is an existing vision sensor that adopts an asynchronous reading mode, i.e., an event-stream-based reading mode; the events it transmits are represented by light intensity information, generally with 8-13 bits, i.e., the maximum bit width representing the light intensity characteristic information is 8-13 bits. In the experiment, the CeleX sensor parameters were set to a spatial resolution of 1280x800, a temporal resolution of 14 us, the sampling mode Fixed Event-Intensity Mode, and a maximum bit width of 12 bits for representing the light intensity characteristic information. Experiments were performed using 7 sets of event data collected by the CeleX sensor in Event-Intensity mode. The experimental results are shown in Table 1: compared with directly transmitting the original data, i.e., the data encoded at 12 bits, the amount of data required for transmission can be greatly reduced. In addition, compared with existing coding schemes, the coding scheme provided by the present application fully considers the correlation between the light intensity variation and the predetermined threshold; only the difference between the light intensity variation and the predetermined threshold and the polarity information are transmitted, from which the light intensity at the current moment can be reconstructed, so the data amount is reduced significantly. According to the experimental data in Table 1, the differential coding mode provided by the present application obtains an average lossless compression ratio of about 1.65 over the 7 groups of data, saving about 41.1% of the data amount; in contrast, the existing coding scheme only obtains an average compression ratio of about 1.306 (saving about 26.6% of the data).
Table 1:
in addition, it should be noted that a reading circuit outside the vision sensor may read the data signal encoded by the encoding module 273 out of the vision sensor. For example, when the vision sensor is mounted in an electronic device that includes a processor and a memory, the reading circuit of the electronic device may read the data signal encoded by the third encoding unit 3007 to the processor or to the memory of the electronic device. It should also be noted that the encoding described here refers to encoding the light intensity characteristic information; the present application does not limit the encoding or other processing of the other information used to represent an event, for example the coordinate information (x, y) of the pixel generating the event, the time t at which the light intensity characteristic information is read, and so on.
The vision sensor provided by the embodiments of the present application has been introduced above. With the scheme provided by the present application, the vision sensor can adaptively switch among multiple data reading modes, so that the read data rate always remains no higher than a preset read data rate threshold, and can likewise adaptively switch between two event representations with the same guarantee, which reduces the cost of data transmission, parsing, and storage of the vision sensor and significantly improves the performance of the sensor. The vision sensor provided by the present application can also adjust the precision of the event representation, so that all events are transmitted with the largest representation precision that satisfies the bandwidth limitation. The vision sensor provided by the present application can further adopt the differential-value encoding mode, which reduces the cost of data transmission, parsing, and storage, allows events to be transmitted with the highest precision, and significantly improves the performance of the sensor.
The vision sensor provided by the present application can be assembled in any device that needs to utilize visual information; for example, it can be assembled in a smart phone, a television, a tablet device, a monitoring device, a camera module, a security device, and the like.
2. Image optimization
After data acquisition and data encoding and decoding, usable RGB images, event images, video data, and the like can be obtained, and the acquired data can then be further optimized for subsequent applications. For example, an RGB image may be acquired by an RGB camera and encoded by the foregoing encoding and decoding methods; when the RGB image is needed, the data can be decoded to obtain a usable RGB image. Similarly, an event image may be acquired by the DVS and stored in a storage medium with the encoding method provided above; when the event image is needed, it can be read with the decoding method described above for subsequent processing. The flow of the image optimization methods provided in the present application is described below by way of example.
Before introducing the flow of the methods provided herein, some general concepts involved in these methods are described first for ease of understanding.
a. Motion sensor
In connection with the foregoing description of fig. 1B, when a target object moves within a certain range, the change of illumination intensity causes a series of pixels of the motion sensor to generate event outputs, so that an event stream over a period of time is obtained. The motion information mentioned in the embodiments of the application can be obtained by using a motion sensor to monitor the motion of the target object within a preset range, i.e., it is the information generated when the target object moves within the detection range.
Taking DVS as an example of the motion sensor, as shown in fig. 35, events may be generated as follows: the DVS generates events in response to motion changes, and events are mostly generated in regions where moving objects exist, because static regions do not excite events. In general, when the difference between the current light intensity and the light intensity at the time of the last event exceeds a threshold, the DVS generates an event, such as the events N1, N2, and N3 shown in fig. 3; event generation is related only to the relative change in light intensity. Each event may be expressed as <x, y, t, f>, where (x, y) represents the pixel position at which the event is generated, t represents the time at which the event is generated, and f represents the light intensity characteristic information. In some DVS sensors (such as the DAVIS sensor and the ATIS sensor), f represents the trend of the light intensity change, which may also be referred to as polarity; it is usually indicated by 1 bit and its value may be ON/OFF, where ON represents an increase in light intensity and OFF a decrease. In other DVS sensors, such as the CeleX sensor, f represents the absolute light intensity and is typically represented by a plurality of bits, such as 9 bits representing light intensity values in the range 0-511.
It will be appreciated that the DVS generates an event only when the light intensity change exceeds a threshold, so moving objects can be detected by the DVS while it remains insensitive to static areas.
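To make the event representation <x, y, t, f> concrete, the following Python sketch defines an illustrative event type; the class name, field names, and example values are assumptions for illustration and are not part of the described sensors.

```python
from typing import NamedTuple

class DvsEvent(NamedTuple):
    """One DVS event <x, y, t, f> as described above (field names assumed)."""
    x: int        # pixel column where the event was generated
    y: int        # pixel row where the event was generated
    t: float      # time at which the event was generated
    f: int        # light intensity feature: polarity (ON/OFF) or absolute intensity

# Polarity-style sensor (e.g. DAVIS/ATIS): f is +1 (ON, brighter) or -1 (OFF, darker)
ev_polarity = DvsEvent(x=120, y=45, t=0.014, f=+1)
# Absolute-intensity-style sensor (e.g. CeleX): f is a multi-bit value, e.g. 0-511
ev_intensity = DvsEvent(x=120, y=45, t=0.014, f=307)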
b. Event image
The event image may include an image generated from data acquired by the motion sensor, specifically, an image generated based on motion track information of the target object when moving within the monitoring range of the motion sensor, or the event image may be used to identify information of the target object when moving within the detection range of the motion sensor within a period of time.
For example, if the hand is swung in the detection range of the DVS, the event at one of the detected moments is shown in fig. 36, where the white color in fig. 36 indicates the event detected by the DVS, that is, the DVS may detect the contour and the position of the moving object in the preset range.
Specifically, for example, an image made up of data acquired by the DVS may be expressed as I(x, y) = sum over t in [t1, t2] of events(x, y, t), where (x, y) represents the coordinates of a position in the image, t represents time, t1 is the time at which capture of the exposure image starts minus 50 milliseconds (i.e., a time window), t2 is the time at which capture of the exposure image starts, and events represents the data acquired by a motion sensor, such as the DVS.
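A minimal sketch of this accumulation is given below; it assumes events shaped like the DvsEvent tuples of the earlier sketch (fields x, y, t), and simply counts events per pixel within the time window, which is one common way to realize the formula above.

```python
import numpy as np

def build_event_image(events, height, width, t1, t2):
    """Accumulate events whose timestamps fall in [t1, t2] into an event image."""
    image = np.zeros((height, width), dtype=np.int32)
    for ev in events:                       # events as produced by the motion sensor
        if t1 <= ev.t <= t2:
            image[ev.y, ev.x] += 1          # one count per event at its pixel position
    return image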
c. Motion information
The motion information may include information generated when the target object moves within a preset range.
The motion condition of the target object in the preset range can be monitored through the motion sensor, and the motion information of the target object in the preset range can be obtained. The target object is an object moving within a preset range, the number of the target objects can be one or more, and the movement information can comprise information of a movement track when the target object moves within the preset range.
For example, the motion information may include information such as a size of a region where the target object is located, a frame, or coordinates of corner points within a preset range when the target object moves within the preset range.
Specifically, a time window can be generated from the data monitored by the DVS; the events in the time window are then segmented into short time windows, the events in each short time window are accumulated, and the motion trail is obtained after computing the connected domains. Further, the series of motion trails in the time window is analyzed, and information such as the motion direction and motion speed of the moving target object is obtained by calculating optical flow or motion vectors.
Illustratively, as shown in fig. 37, the time window may be segmented into a plurality of short time windows, such as the k short time windows shown in fig. 37. The segmentation may be performed according to a set duration, according to random durations, according to the change of the motion trail, and so on, and may be adjusted according to the actual application scenario. After the k short time windows are obtained, the positions of the events in each short time window are analyzed and the region where the target object is located in each short time window is determined; for example, the motion region in short time window 1 is motion region 1 shown in fig. 37, and the motion region in short time window k is motion region k shown in fig. 37. The motion region and the motion characteristics of the target object, such as the motion direction or the motion speed, are then determined from the change of the motion region over short time windows 1 to k.
In general, the motion characteristics included in the motion information may include a motion speed or a motion direction, and the like. In particular, the movement speed may be a trend of the speed of the target object compared to the speed of the previous short time window, including but not limited to a faster, slower, etc. speed trend state quantity, or even more levels of speed trend state quantity, such as fast, faster, very fast, slow, slower, very slow, etc. The direction of movement may also be a change in direction over a previous short window, including but not limited to a state amount of direction trend that is left, right, up, down, unchanged, etc., or even more levels of state amount of direction trend, such as up left, down left, up right, down right, left, right, up, down, unchanged, etc.
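The derivation of coarse motion information from short time windows can be sketched as follows. This Python snippet is illustrative only; it uses a bounding box of the events as the motion region instead of a full connected-domain analysis, and derives the direction and speed trend from the displacement of region centres, all of which are simplifying assumptions.

```python
import numpy as np

def motion_info(events, t_start, t_end, num_windows):
    """Split a time window into short windows and derive coarse motion information."""
    edges = np.linspace(t_start, t_end, num_windows + 1)
    centres, regions = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = [ev.x for ev in events if lo <= ev.t < hi]
        ys = [ev.y for ev in events if lo <= ev.t < hi]
        if not xs:
            continue
        regions.append((min(xs), min(ys), max(xs), max(ys)))   # motion region (box)
        centres.append((sum(xs) / len(xs), sum(ys) / len(ys)))  # region centre
    # Motion vector between the last two short windows: direction and speed trend
    if len(centres) >= 2:
        dx = centres[-1][0] - centres[-2][0]
        dy = centres[-1][1] - centres[-2][1]
        return regions, (dx, dy), float(np.hypot(dx, dy))
    return regions, (0.0, 0.0), 0.0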
The above general concepts may be substituted in the following embodiments provided in the present application, and will not be described in detail below.
Some possible embodiments of image enhancement or reconstruction are described below.
The purpose of image enhancement and reconstruction is, among other things, to obtain a clearer RGB image or event image; several possible embodiments are described below.
(1) Motion compensation
In general, the information collected by a motion sensor can be used for image reconstruction, shooting a moving object, shooting with a moving device, deblurring, motion estimation, depth estimation, object detection and identification, and so on, so how to obtain more accurate motion information is a problem to be solved.
The application provides an image processing method which is used for updating motion information by using motion parameters to obtain more accurate motion information.
First, in this scenario, the specific flow of the image processing method provided by the present application may include: acquiring, with a motion sensor, the motion information of the target object moving within the detection range, where the motion information may come from frame-based motion detection, event-based motion detection, or the like; generating an event image based on the motion information; calculating motion parameters, which include parameters of the relative motion between the motion sensor and the target object; and updating the event image according to the motion parameters to obtain an updated event image.
In the image processing method provided by the application, a plurality of implementation modes are provided for the process of updating the event image, and different embodiments and combined embodiments thereof are respectively described below.
In one possible implementation, the event image may be updated based on a preset optimization model; for example, the event image is updated with the goal of optimizing the value of the optimization model, resulting in an updated event image. In this process, the initial value of the optimization model can be determined from the motion parameters, so that the motion information monitored by the motion sensor is used as a constraint for initializing the value of the optimization model, and the initial value used in updating the event image is more accurate. Compared with performing many global iterative updates on the event image, initializing the optimization model based on the acquired motion parameters can significantly increase the update speed, improve the update efficiency, give a better initial update direction, and improve the optimization effect under a limited number of iterations.
In one possible implementation, multiple iterative updates may be performed when updating the event image; generally, the more iterations, the better the resulting event image. In each iterative update, the motion parameters output by the previous iteration can be used, which avoids recalculating the motion parameters in every iteration and improves the update efficiency.
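A minimal sketch of such a warm-started iterative update loop is given below. Only the warm start from the motion parameters and the reuse of the previous iteration's parameters are taken from the description above; the callables warp, objective, and refine stand in for the (unspecified) optimization model and update rule and are assumptions for illustration.

```python
def update_event_image(event_image, motion_params, warp, objective, refine, num_iters=5):
    """Iteratively update an event image under a preset optimization model (sketch).

    Assumptions: warp(event_image, params) compensates the event image with
    candidate motion parameters, objective(image) is the value of the
    optimization model to be maximized, and refine(params, image) proposes
    updated parameters.
    """
    params = motion_params                   # initialize the model from the motion parameters
    best_image, best_value = event_image, objective(event_image)
    for _ in range(num_iters):
        params = refine(params, best_image)            # reuse last iteration's parameters
        candidate = warp(event_image, params)
        value = objective(candidate)
        if value > best_value:                         # keep the update only if it improves
            best_image, best_value = candidate, value
    return best_image, params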
The following describes the process of initializing the value of the optimization model and iteratively updating the event image, respectively.
Process one, process for initializing optimization model using motion parameters
Referring to fig. 38, a flowchart of an image processing method is provided.
3801. Motion information is acquired.
The motion condition of the target object in the preset range can be monitored through the motion sensor, and the motion information of the target object in the preset range can be obtained. The target object is an object moving within a preset range, the number of the target objects can be one or more, and the movement information can comprise information of a movement track when the target object moves within the preset range.
For example, the motion information may include information such as a size of an area where the target object is located, a frame, or coordinates of corner points within a preset range when the target object moves within the preset range.
For ease of understanding, the area in which the target object is located at each time of detection when the target object moves within a preset range is hereinafter referred to as a movement area of the target object. For example, if the target object is a pedestrian and the pedestrian is performing a whole-body motion, the whole-body motion of the pedestrian may be included in the motion region, and if the pedestrian is moving only with the arm, the target object may be merely the arm of the pedestrian and the motion region may include the arm portion of the pedestrian.
Typically, the preset range is related to the focal length or field angle of the camera, etc. For example, the larger the angle of view of the camera, the larger the area of the captured range, and the smaller the angle of view of the camera, the smaller the area of the captured range. For another example, the larger the focal length of the camera, the farther the shooting range is, and the more clearly the object with a long shooting distance is, the smaller the focal length of the camera, and the closer the shooting range is.
In this embodiment, the range monitored by the motion sensor includes a shooting range of the camera, the preset range may be a shooting range of the camera, and the range monitored by the motion sensor includes the preset range, i.e., the range monitored by the motion sensor may be greater than or equal to the preset range.
In one possible implementation manner, the motion information may include an area where the target object is currently located and an area where the history after entering the preset range is located, and may further include a motion speed or a motion direction of the target object.
In combination with the foregoing data acquisition and data encoding, the motion information in this embodiment may be data obtained by the foregoing data acquisition and encoding and decoding methods, for example, the event stream may be obtained by DVS acquisition, and the available motion information may be obtained by the foregoing provided data encoding and decoding processing methods.
3802. At least one frame of event image is generated from the motion information.
Wherein, after obtaining the motion information, at least one frame of event image can be generated from the information acquired by the motion sensor within the detection range. Generally, the motion information may include information of a track of the target object moving within the detection range within a period of time, so that an image corresponding to the detection range is generated, and the track of the target object included in the motion information is mapped into the image to obtain at least one frame of event image. The at least one frame of event image may be understood as an image representing a motion trajectory of the target object when the target object generates motion within the detection range.
For example, the event images may be described with reference to FIGS. 35-37 and related description.
When the at least one frame of event image mentioned in the application is a multi-frame event image, the event images may belong to the same time window or to different time windows; for example, event image 1 is the event image in the period [t1, t2] and event image 2 is the event image in the period [t2, t3]. Of course, the at least one frame of event image may also be event images of different areas within the same period. For example, the monitoring region of the DVS may be divided into a plurality of regions, and a corresponding event image may be generated based on the events detected within each region.
It should also be noted that, with the data acquisition and encoding/decoding methods described above, at least one frame of event image may be read directly from the storage medium without performing steps 3801-3802; this is merely illustrative and not limiting.
3803. And acquiring motion parameters.
Wherein the motion parameter represents a parameter related to a relative motion between the sensor and the target object, such as a motion speed of the target object at an image plane, a motion direction, a motion acceleration, optical flow information, a depth of the target object from the motion sensor, an acceleration of the motion sensor, or an angular velocity of the motion sensor, wherein the optical flow information represents a speed of the relative motion between the motion sensor and the target object.
In addition, the motion parameters may be calculated in various ways. For example, if the motion parameters include not only parameters related to the motion sensor itself but also the motion speed, motion direction, motion acceleration, or depth of the target object, these can be calculated from the information acquired by the motion sensor. For another example, if the motion parameters include parameters related to the motion sensor itself, such as optical flow information, the acceleration of the motion sensor, the angular velocity of the motion sensor, or the depth, these parameters can be obtained from the information collected by the motion sensor or from an IMU, a gyroscope, an accelerometer, or the like.
For example, taking the motion parameter obtained from the data collected by the IMU as an example, the data collected by the IMU may include the angular velocity ω or the acceleration α of the IMU. One or more items may be selected from the angular velocity ω, the acceleration α, and the like as the motion parameter.
In one possible implementation manner, the motion parameters may be acquired by a motion sensor. In some scenarios, the motion sensor may be affected by noise or bias when acquiring the motion parameters, causing the motion parameters to be offset; therefore, correction parameters may be used to correct the motion parameters and improve their accuracy. The motion parameters may be corrected after they are determined, yielding corrected motion parameters, or the motion sensor may be configured to apply the correction parameters when acquiring data, so that corrected data are obtained and the unbiased motion parameters can be extracted directly from the data acquired by the motion sensor. Therefore, in the embodiment of the application, corrected motion parameters can be obtained, making the motion parameters more accurate.
For example, IMU data is susceptible to noise and to zero offset parameters, and the zero offset parameters are affected by random walk and therefore require constant update and correction. Therefore, when the motion parameters are extracted, the influence of noise and zero offset can be removed from the data acquired by the IMU. For example, the true value of the angular velocity is typically expressed as ω = ω̂ − b_g − n_g, and the true value of the acceleration as a = â − b_a − n_a − R·g, where ω̂ and â are the measured values, R represents the transformation matrix of the camera from i to j, which here can be taken as the transformation from the space coordinate system to the camera body coordinate system, g represents gravitational acceleration, n is the noise, and b is the zero offset parameter.
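For illustration only, the following Python sketch applies the de-biasing described above under the assumed standard IMU measurement model; the gravity vector, frame convention, and function name are assumptions, and the noise term n is treated as zero-mean and left uncorrected.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, 9.81])  # assumed world-frame gravity vector

def correct_imu(omega_meas, acc_meas, b_g, b_a, R_wb, g=GRAVITY):
    """Remove the zero offset (bias) terms from raw IMU readings.

    Assumed measurement model, matching the formulas above (noise n is
    zero-mean and is not explicitly subtracted here):
        omega_meas = omega + b_g + n_g
        acc_meas   = a + b_a + n_a + R_wb @ g
    R_wb: transformation matrix from the space (world) coordinate system
          to the camera body coordinate system.
    """
    omega_true = omega_meas - b_g
    a_true = acc_meas - b_a - R_wb @ g
    return omega_true, a_true
```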
Specifically, for example, the relative motion between the target object and the motion sensor may be represented by a uniform linear motion model, in which case the motion parameter may be a velocity vector v. It should be noted that different motion models may be selected in different scenes and may correspond to multiple motion parameters; taking uniform-acceleration linear motion as an example, the motion parameters are the velocity vector v and the acceleration a_p. The initial velocity vector can be calculated from the transformation matrix derived from the IMU data, the depth Z of the image plane, and the projection model π; in that expression, E represents an identity matrix, the origin of the image plane serves as the reference point, and Δt is a period of time.
In addition, taking calculation of motion parameters according to data collected by the DVS as an example, white in fig. 36 indicates an event monitored by the DVS, that is, the DVS may monitor the contour and the position of the moving object within the preset range, calculate the motion velocity of the target object according to the motion track of the target object monitored by the DVS within the preset range, and extract the motion direction of the target object.
Specifically, a time window can be generated from the data monitored by the DVS, the events can then be segmented into short time windows, and the events within each short time window can be accumulated; the accumulated events in each short time window can be understood as one frame of event image. Further, the series of motion trajectories in the time window may be analyzed, and by calculating optical flow, motion vectors and the like, motion characteristics of the moving target object, such as its motion direction and motion speed, can be obtained.
Illustratively, as shown in fig. 37 described above, the time window may be divided into a plurality of short time windows, such as the k short time windows shown in fig. 37, each of which may correspond to one frame of event image. The segmentation may be performed according to a set time length, a random time length, changes in the motion track, and so on, and may be adjusted according to the actual application scene. After the k short time windows are obtained by segmentation, the positions of the events in each short time window are analyzed, and the area where the target object is located in each short time window is determined; for example, the motion area in short time window 1 is motion area 1 shown in fig. 37, and the motion area in short time window k is motion area k shown in fig. 37. Then, the motion area and the motion characteristics of the target object, such as the motion direction or the motion speed, are determined from the change of the motion area across short time windows 1-k.
In one possible embodiment, after the motion parameters are obtained and before the optimization model is initialized, the motion parameters may be used to compensate the event image, thereby obtaining a compensated event image. For example, taking the motion track of the target object as uniform linear motion and the motion sensor as a DVS, let x_k denote the position captured by the DVS at time t_k within the time window [t, t+Δt] (the motion track of the target object may also be divided into multiple segments of linear motion), and let θ (i.e., the motion parameter) denote the motion speed of the target object in the event image. The position of the event image after motion compensation of the target object is then x'_k = x_k − (t_k − t_ref)·θ; applying this motion compensation to all events of the event image yields the compensated event image.
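As a minimal sketch of this compensation step (assuming events are stored as coordinate and timestamp arrays; the function name is hypothetical):

```python
import numpy as np

def motion_compensate(xy, t, theta, t_ref):
    """Warp event positions back to the reference time t_ref.

    xy    : (N, 2) event coordinates x_k
    t     : (N,)  event timestamps t_k inside the window [t, t + dt]
    theta : (2,)  motion speed of the target object on the image plane
    Implements x'_k = x_k - (t_k - t_ref) * theta for every event.
    """
    return xy - (t - t_ref)[:, None] * theta
```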
3804. Initializing a preset value of the optimization model according to the motion parameters to obtain the value of the optimization model.
After the motion parameters are obtained, the event images may be compensated using the motion parameters, resulting in compensated event images, from which initial values, otherwise referred to as initial optimal values, of the optimization model are then calculated.
The optimization model can be of many types, and different optimization models can be selected for different scenes. For example, the optimization model may include, but is not limited to, one or more of the following: variance, mean square, image entropy, gradient magnitude, Laplacian, SoS loss function, R2 loss function, a uniform linear motion model, and the like. The variance may also be referred to as contrast; the algorithm for maximizing contrast may include gradient ascent, Newton's method, and the like, where each iteration calculates updated motion parameters, and the process is repeated until an optimal contrast is achieved.
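For illustration, the following sketch computes a few of the listed quality measures for an event image using NumPy; the exact measures and their definitions in the embodiment may differ, and the histogram bin count is an assumption.

```python
import numpy as np

def contrast(event_image):
    """Variance (contrast) of an event image: larger usually means sharper edges."""
    return np.var(event_image)

def image_entropy(event_image, bins=64):
    """Shannon entropy of the pixel-intensity histogram."""
    hist, _ = np.histogram(event_image, bins=bins)
    p = hist[hist > 0].astype(float)
    p = p / p.sum()
    return -np.sum(p * np.log2(p))

def gradient_magnitude(event_image):
    """Mean gradient magnitude, another possible sharpness score."""
    gy, gx = np.gradient(event_image.astype(float))
    return np.mean(np.sqrt(gx ** 2 + gy ** 2))
```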
It should be noted that, in the embodiment of the present application, only the contrast algorithm is used as the optimization model to perform the exemplary description, generally, the greater the contrast of the event image, the better the compensation effect or the update effect of the event image, and in other scenarios, the optimal value of the optimization model may be the minimum value, that is, the smaller the value of the optimization model, the better the update effect of the event image.
3805. And updating at least one frame of event image according to the value of the optimization model to obtain at least one updated frame of event image.
After initializing the optimizing model by using the motion parameters to obtain the value of the optimizing model, iteratively updating at least one frame of event image based on the value of the optimizing model, thereby obtaining at least one updated frame of event image.
Specifically, after the initial value of the optimization model is obtained, in the process of iteratively updating the event image, the motion parameter can be derived in reverse from that value, the event image can be compensated according to the derived motion parameter to obtain a compensated event image, and the optimal value of the optimization model can then be recalculated from the compensated event image. These steps are repeated until a condition for ending the iteration is met, for example, the number of iterations reaches a preset number, the iteration duration reaches a preset duration, the difference between event images obtained in adjacent iterations is smaller than a preset value, or the difference between the optimal values of the optimization model obtained in adjacent iterations is smaller than a preset difference, and the finally obtained event image is output.
For example, taking the contrast (or variance) F as the optimization model and the motion parameter as a constant speed, after the motion parameter θ is initialized, in order to obtain the event image that maximizes F, the motion parameter must be updated with F as the objective function, i.e., θ = argmax_θ F(θ, x).
The process of compensating a moving image can be understood as warping the event images within the time window [t, t+Δt] back to time t according to the motion model, thereby realizing motion compensation. Taking the motion track of the target object as uniform linear motion as an example, let x_k denote the position captured by the DVS at time t_k within the time window [t, t+Δt] (the motion track may also be divided into multiple segments of linear motion), and let θ denote the motion speed of the target object in the event image. The motion-compensated position is then x'_k = x_k − (t_k − t_ref)·θ; accumulating the motion-compensated positions of the target object in the event image yields an event image after one update.
Then, the image contrast of the event image obtained after motion compensation is calculated as F = (1/N_p)·Σ_{i,j}(h_{i,j} − μ)², where h_{i,j} represents a pixel of the event image formed by motion-compensating the events in the time window, N_p represents the number of pixels in the frame, and μ represents the mean of the frame. Subsequently, the θ value that optimizes F(x, θ), i.e., θ = argmax_θ F(θ, x), is calculated by an optimization algorithm; the optimal motion parameter θ is obtained through multiple iterations, and the event image is compensated according to the optimal motion parameter to obtain a better event image. The optimization algorithm can adopt gradient ascent, Newton's method, the conjugate gradient method (Conjugate Gradient), the momentum optimization method (Momentum), or other algorithms, and can be adjusted according to the actual application; the application is not limited in this respect.
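A simplified sketch of this contrast-maximization loop is given below, using a numerical gradient-ascent step; the learning rate, perturbation size, stopping threshold, and the event-image builder are illustrative assumptions rather than the claimed procedure.

```python
import numpy as np

def build_event_image(xy, t, theta, t_ref, height, width):
    """Accumulate motion-compensated events into an event image."""
    warped = xy - (t - t_ref)[:, None] * theta
    xs = np.clip(np.round(warped[:, 0]).astype(int), 0, width - 1)
    ys = np.clip(np.round(warped[:, 1]).astype(int), 0, height - 1)
    img = np.zeros((height, width), dtype=np.float64)
    np.add.at(img, (ys, xs), 1.0)
    return img

def maximize_contrast(xy, t, theta0, t_ref, height, width,
                      lr=1e-4, eps=1e-3, max_iters=50, tol=1e-6):
    """Gradient-ascent search for theta = argmax F(theta, x), with F = variance."""
    theta = np.asarray(theta0, dtype=float)
    f_prev = np.var(build_event_image(xy, t, theta, t_ref, height, width))
    for _ in range(max_iters):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            # forward-difference estimate of dF/dtheta_i
            d = np.zeros_like(theta)
            d[i] = eps
            f_plus = np.var(build_event_image(xy, t, theta + d, t_ref, height, width))
            grad[i] = (f_plus - f_prev) / eps
        theta = theta + lr * grad          # gradient ascent step
        f_new = np.var(build_event_image(xy, t, theta, t_ref, height, width))
        if abs(f_new - f_prev) < tol:      # change of the optimal value is small enough
            break
        f_prev = f_new
    return theta, f_prev
```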
For ease of understanding, step 3805 may be understood as compensating the event image using the motion parameter after obtaining the motion parameter, calculating an optimal value (i.e., an initial value) of the optimization model based on the compensated event image, then back-pushing the optimal motion parameter based on the optimal value, compensating the event image again using the back-pushed optimal motion parameter, and iterating the foregoing steps to obtain the final updated event image.
More specifically, the process of iteratively updating at least one frame of event image may also refer to the following second embodiment, which is not described herein.
Therefore, in the embodiment of the application, before updating the event image, the motion parameters can be used for initializing the optimization model, so that the event image can be updated based on the value of the initialized optimization model, and the event image does not need to be updated from the minimum value or the random initial value of the optimization model, so that the iterative update times of the event image can be reduced, the iterative update efficiency of the event image can be improved, and the optimal event image can be obtained quickly.
In a possible implementation manner, in the process of calculating the optimal value of the optimization model at each iteration, the correction parameters can also be updated. The correction parameters can be used for obtaining corrected motion parameters; for example, after the motion sensor collects data, the correction parameters are used to correct the collected data, so that corrected data are obtained and more accurate motion parameters can later be extracted from the corrected data. Alternatively, after the motion parameters are extracted from the data collected by the motion sensor, the motion parameters may be corrected using the correction parameters to obtain more accurate motion parameters. For example, the motion parameters may be acquired by the IMU, and the IMU parameters (otherwise known as correction parameters), which are used when the IMU acquires data, may be updated each time the optimal value of the optimization model is calculated. Specifically, IMU data is susceptible to noise n and the zero offset parameter b, where the zero offset parameter is affected by random walk and therefore requires constant update and correction. The true value of the angular velocity is generally expressed as ω = ω̂ − b_g − n_g, and the true value of the acceleration as a = â − b_a − n_a − R·g; b and n_g are the IMU parameters used for correcting the acquired data to obtain more accurate motion parameters. Therefore, in the embodiment of the application, the IMU parameters can be updated in the process of updating the event image, so that the updated IMU parameters can be used to obtain more accurate motion parameters. Generally, the motion parameters are obtained by integrating the data collected by the IMU, so errors accumulate gradually and the drift of the calculated motion parameters grows with time, while the IMU data cannot be recalibrated within a short time; updating the IMU parameters during image optimization helps compensate for this drift.
The flow of the image processing method provided in the present application is exemplified below by taking DVS as an example, with reference to fig. 39, in which motion parameters are obtained from data acquired by the IMU.
First, the IMU data 3901 is data acquired by the IMU, and may specifically include an angular velocity, an acceleration, a speed, or the like of the IMU, and in general, the IMU and the DVS may be disposed in the same device or have a connection relationship, or the angular velocity, the acceleration, the speed, or the like of the IMU may also be expressed as an angular velocity, the acceleration, the speed, or the like of the DVS.
The motion parameters 3902 may be data derived from the IMU data, such as angular velocity, acceleration, or velocity. In general, the data acquired by the IMU is susceptible to noise n and the zero offset parameter b, where the zero offset parameter is affected by random walk and therefore requires constant update and correction. The true value of the angular velocity is generally expressed as ω = ω̂ − b_g − n_g, and the true value of the acceleration as a = â − b_a − n_a − R·g, where R represents the transformation matrix of the camera from i to j, which here can be taken as the transformation from the space coordinate system to the camera body coordinate system, g represents gravitational acceleration, n is the noise, and b is the zero offset parameter.
The contrast 3904 may be initialized with the motion parameters 3902 before the event image is updated, and the event image 3903 may be compensated with the motion parameters to obtain a compensated event image.
When compensating the event image, the events within the time window [t, t+Δt] can be warped back to time t according to the motion model, thereby realizing motion compensation. For example, the compensated position is x'_k = x_k − (t_k − t_ref)·θ; accumulating the compensated positions of the target object in the image yields the compensated image, whose pixels are denoted h_{i,j}.
It should be noted that, in the embodiment of the present application, an algorithm in which the optimization model is the contrast (also referred to as variance) is used for illustration; in an actual application scene the contrast may be replaced by other indicators, such as mean square, image entropy, gradient magnitude, Laplacian, and the like, and may be adjusted according to the actual application scene.
After the compensated event image is obtained, the maximized contrast can be calculated based on the compensated event image, the motion parameters are updated by using the maximized contrast, the event image is compensated by continuously using the updated motion parameters, the updated event image is obtained, the steps are repeated until the condition of ending the iteration is met, and the final event image is output.
After the compensated event image h_{i,j} is obtained, the image contrast is calculated as F = (1/N_p)·Σ_{i,j}(h_{i,j} − μ)², where N_p represents the number of pixels in the event image and μ represents the pixel mean of the event image. The motion parameter θ that maximizes F(x, θ) is then calculated, i.e., the value of θ for which F is largest. The event image can then be further iteratively compensated based on the motion parameter θ obtained from the maximization, giving an updated image.
In the process of maximizing contrast, the IMU parameters can be updated at the same time, and the IMU parameters can be used for acquiring data by the IMU or correcting the data acquired by the IMU.
For example, the true value of the angular velocity is expressed as ω = ω̂ − b_g − n_g and the true value of the acceleration as a = â − b_a − n_a − R·g, so the IMU parameters may include the noise n and the zero offset parameter b. If the process of calculating the motion parameters from the IMU data is denoted θ = G(b_a, b_g, a, ω), then (b'_a, b'_g) = argmax_{b_a, b_g} F(G(b_a, b_g, a, ω), x), thereby obtaining the updated noise n and zero offset parameter b.
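The following sketch illustrates, under the assumption of a simple coordinate search, how the zero offset parameters b_a and b_g might be refined through the mapping θ = G(b_a, b_g, a, ω); G and the contrast evaluator are placeholder callables supplied by the caller, the perturbation range is arbitrary, and a gradient-based optimizer could be used instead.

```python
import numpy as np

def refine_imu_bias(b_a, b_g, acc, omega, G, contrast_of,
                    deltas=np.linspace(-0.05, 0.05, 11)):
    """Search for zero-offset corrections that maximize event-image contrast.

    b_a, b_g    : current bias estimates (3-vectors, numpy arrays)
    G           : callable mapping (b_a, b_g, acc, omega) -> motion parameter theta
    contrast_of : callable mapping theta -> contrast F of the compensated event image
    """
    best_ba, best_bg = b_a.copy(), b_g.copy()
    best_f = contrast_of(G(best_ba, best_bg, acc, omega))
    for axis in range(3):
        for d in deltas:
            for which in ("a", "g"):
                ba, bg = best_ba.copy(), best_bg.copy()
                if which == "a":
                    ba[axis] += d
                else:
                    bg[axis] += d
                f = contrast_of(G(ba, bg, acc, omega))
                if f > best_f:          # keep the perturbation that improves contrast
                    best_f, best_ba, best_bg = f, ba, bg
    return best_ba, best_bg
```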
Therefore, in the embodiment of the application, the motion parameters can be obtained based on the data acquired by the IMU, so that the contrast is initialized based on the motion parameters, and then the event image is updated based on the initialized contrast, so as to obtain the updated event image. The method is equivalent to setting the initial value of the contrast based on the motion parameters, reduces the number of times of repeated iteration on the event image, and improves the efficiency of obtaining the final event image.
Procedure two, iterative update procedure
Referring to fig. 40, a flowchart of another image processing method provided in the present application is as follows.
4001. And acquiring a target task, and determining the iteration time according to the target task.
The target task may include a period of time for iteratively updating at least one frame of event image, or the target task may be a task performed using at least one frame of event image, and include a period of time for iteratively updating at least one frame of event image, and the like.
For example, the target task may directly carry a time period for iteratively updating at least one frame of event images, for example, the time period for iteratively updating each frame of event images may be set to 30ms by a user.
For another example, the target task may be a task of performing target detection, image reconstruction, or capturing a moving object using at least one frame of event image, and the target task may further specify that the iterative update for each frame of event image lasts 50ms, or that the total iteration duration for the at least one frame of event image is 3900ms, and so on.
It should be noted that, step 4001 in the present application is an optional step, in some scenarios, the iteration duration of the event image may not be set, for example, the iteration number of the iterative update performed on the event image reaches a preset number, or the variation value of the output value of the optimization model does not exceed a preset value, which may be specifically adjusted according to the actual application scenario, which is not limited herein.
4002. Motion information is acquired.
4003. At least one frame of event image is generated from the motion information.
Steps 4002 through 4003 are similar to steps 3801 through 3802 described above, and are not repeated here.
After the event image is obtained, the event image may be iteratively updated, as described in steps 4004-4006 below.
It should be noted that, the execution order of the step 4001 and the step 4003 is not limited in this application, the step 4001 may be executed first, the step 4003 may be executed first, the step 4001 and the step 4003 may be executed simultaneously, and the execution may be specifically adjusted according to the actual application scenario, which is not limited herein.
4004. And obtaining the motion parameters obtained according to the optimization model in the last iteration.
Wherein the motion parameter represents a parameter related to a relative motion between the sensor and the target object, such as a motion speed, a motion direction, a motion acceleration, optical flow information, an acceleration of the motion sensor, an angular speed or a depth of the motion sensor, etc., of the target object, the optical flow information representing a speed of the relative motion between the motion sensor and the target object.
If the current iteration is the first iteration, the motion parameter may be set to an initial value, for example, to 0 or a preset value, or the motion parameter may be calculated according to information acquired by the motion sensor.
If the current iteration is not the first iteration, the value of the motion parameter can be derived in reverse from the optimal value of the optimization model in the previous iteration, and the derived value is then used as the value of the motion parameter. Alternatively, the motion parameter obtained by this reverse derivation and the motion parameter determined in the manner of step 3803 may be weighted and fused to obtain a fused motion parameter.
For example, in each iterative update of the event image, after the optimal value F(x, θ) is calculated, the value of the motion parameter θ is derived in reverse, i.e., θ = argmax_θ F(θ, x), to obtain the updated motion parameter.
For another example, in addition to the motion parameter obtained based on the optimal value of the optimization model from the last iterative update of the event image (referred to as motion parameter 1 for convenience of distinction), a motion parameter may be obtained from the data collected by the motion sensor (referred to as motion parameter 2 for convenience of distinction); the manner of obtaining motion parameter 2 may refer to the foregoing step 3803 and is not described again here. The motion parameter used in the current iterative update in the embodiment of the application can be obtained by weighted fusion of motion parameter 1 and motion parameter 2, as shown in the sketch after this paragraph. For example, the weight value of motion parameter 1 may be set to 0.8, motion parameter 2 may be a parameter acquired by the IMU with its weight value set to 0.2, and the motion parameter used in the current iterative update = 0.2 × motion parameter 2 + 0.8 × motion parameter 1.
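A one-line sketch of this weighted fusion (weights 0.8 and 0.2 as in the example above; the function name is hypothetical):

```python
def fuse_motion_parameters(theta_from_model, theta_from_imu, w1=0.8, w2=0.2):
    """Weighted fusion of the back-derived motion parameter (weight w1) and the
    IMU-derived motion parameter (weight w2), matching the 0.8 / 0.2 example."""
    return w1 * theta_from_model + w2 * theta_from_imu
```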
In addition, if the at least one frame of event image is a multi-frame event image and the event image updated in the current iteration is different from the event image updated in the last iteration, the motion parameter obtained in the event image updating in the last iteration may be used to update the event image updated in the current iteration. Therefore, in the embodiment of the present application, when updating a different event image, the current event image may be updated using the motion parameter obtained by iteratively updating the event image of the previous frame, so that the update may be performed using a more accurate motion parameter. Compared with the value of the reinitialized motion parameter, the embodiment of the application provides an effective motion parameter, and can obviously improve the updating efficiency of the event image.
4005. And updating at least one frame of event image according to the motion parameters to obtain at least one updated frame of event image.
And after the motion parameters of the current iteration are obtained, compensating the event images according to the motion parameters to obtain at least one frame of event image obtained by updating the current iteration.
Specifically, when the at least one frame of event image is one frame of event image, the one frame of event image may be iteratively updated in each iteration process. If the at least one frame of event image is a multi-frame event image, after one frame of event image is updated, the next frame of event image can be continuously updated, or different event images can be updated for each iteration, so that iterative updating of all event images is completed. For example, the event image of the [ t0, t1] period may be iteratively updated a plurality of times first, and after the update of the event image of the [ t0, t1] period is completed, a final motion parameter is calculated, and based on the motion parameter, the event image of the [ t1, t2] period is updated, and so on. For another example, in the first iteration process, the event image in the period of [ t0, t1] may be updated, and after the motion parameter is obtained by calculation, the event image in the period of [ t1, t2] may be updated based on the motion parameter, and so on.
For ease of understanding, the present embodiment will be described by taking an example of one frame of event image (or referred to as a target event image).
For example, after the motion parameter θ is determined, the position of each event in the target event image is compensated as x'_k = x_k − (t_k − t_ref)·θ, where x'_k is the compensated position of x_k. The transformed events at each position are then accumulated to form the updated target event image H(x) = Σ_{k=1..N_e} b_k·δ(x − x'_k), where N_e represents the number of events in the target event image and b_k, the representation value of an event in the target event image, may take the value 0 or 1.
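As an illustrative sketch of this accumulation (assuming per-event values b_k are stored alongside the coordinates and timestamps; the array layout is an assumption):

```python
import numpy as np

def accumulate_warped_events(xy, t, b, theta, t_ref, height, width):
    """Form the updated target event image H(x) = sum_k b_k * delta(x - x'_k).

    b : per-event value b_k, e.g. 1 (event present) or 0 -- assumed encoding
    """
    warped = xy - (t - t_ref)[:, None] * theta   # x'_k = x_k - (t_k - t_ref) * theta
    xs = np.clip(np.round(warped[:, 0]).astype(int), 0, width - 1)
    ys = np.clip(np.round(warped[:, 1]).astype(int), 0, height - 1)
    H = np.zeros((height, width), dtype=np.float64)
    np.add.at(H, (ys, xs), b)                    # accumulate b_k at the warped position
    return H
```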
As shown in fig. 41, for example, the event images are represented in different forms in fig. 41, and it is apparent that the resulting event images become clearer as the number of iterations increases.
After compensating the event image, the quality of the event image may be measured by the value of an optimization model, which may take a variety of forms, including but not limited to one or more of the following: variance, mean square, image entropy, gradient magnitude, Laplacian, SoS loss function, R2 loss function, and the like. An optimization algorithm may be used to calculate the optimal value of the optimization model, from which new motion parameters may then be calculated.
For ease of understanding, step 4005 may be understood as follows: after compensating the event image, the quality of the event image, such as variance, mean square, image entropy, gradient magnitude, or Laplacian, is measured by a predetermined evaluation index, for example F = (1/N_p)·Σ_{i,j}(h_{i,j} − μ)², and the θ value that maximizes F(x, θ), i.e., θ = argmax_θ F(θ, x), is calculated by an optimization algorithm to obtain the updated motion parameter of the current iterative update. Taking F as the contrast as an example, the optimization algorithm for maximizing the contrast can adopt gradient ascent, Newton's method, or similar methods to calculate the updated motion parameters; these motion parameters are then used to update the current event image or the next frame of event image, and the above process is repeated to obtain the final updated at least one frame of event image.
4006. Whether the iteration is ended is determined, if yes, step 4007 is executed, and if not, step 4004 is executed.
In each iteration update process of the event image, after the event image is updated, it may be determined whether to end the iteration update of the event image, if the iteration is ended, at least one updated frame of image may be output, and if the iteration is not ended, the iteration update of the event image may be continued, that is, step 4004 may be executed.
Specifically, the method for judging whether to end the iteration may include judging whether the result of the current iteration meets a preset condition, if yes, ending the iteration, where the ending condition includes one or more of the following: the number of times of iterative updating of at least one frame of event image reaches a preset number of times, the time length of iterative updating of at least one frame of event image reaches a preset time length, or the change of the optimal value of the optimization model in the updating process of at least one frame of event image is smaller than a preset value, and the like. The preset duration may be determined according to the target task in step 4001, or may be a preset duration, such as 100ms or 50 ms. For example, the user may set the iterative update duration of each frame of event image through the interactive interface of the terminal.
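For illustration, the ending conditions listed above could be checked as in the following sketch; the default limits are assumptions and would normally come from the target task or a user setting.

```python
def should_stop(iter_count, elapsed_ms, f_history,
                max_iters=50, max_duration_ms=50.0, min_delta=1e-6):
    """End the iteration when any listed condition is met: iteration count,
    iteration duration, or change of the optimization-model value."""
    if iter_count >= max_iters:
        return True
    if elapsed_ms >= max_duration_ms:
        return True
    if len(f_history) >= 2 and abs(f_history[-1] - f_history[-2]) < min_delta:
        return True
    return False
```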
Therefore, in some scenes, the iteration times can be determined according to actual requirements, the efficiency of event image processing and the quality of the event images are considered, the event images are updated under the condition that the real-time processing requirement is met, and the balance of the efficiency and the quality is realized.
4007. And outputting the updated at least one frame of event image.
After the iterative update of the event image is terminated, at least one updated frame of event image may be output.
Optionally, the updated at least one frame of event image may be used for subsequent analysis, such as depth estimation, optical flow estimation, image reconstruction, object detection and recognition, motion estimation, shooting of moving objects, shooting with moving equipment, or shooting deblurring.
In the embodiment of the application, in each iteration process, the motion parameter used in the current iteration can be obtained through the optimal value of the optimization model obtained in the last iteration, and the event image is compensated according to the motion parameter, so that the compensated event image is obtained. Therefore, the efficiency of updating the event image can be improved, and the event image with higher quality can be obtained quickly. And the iteration times can be adjusted according to specific scenes, so that the updating efficiency and quality of the event images can be considered, and the event images meeting the requirements can be obtained rapidly and efficiently.
For ease of understanding, the flow of the image processing method provided in the present application is exemplified below by a few specific iterative processes.
Illustratively, taking a frame of event image as an example, referring to fig. 42, another image processing method provided in the present application is a flowchart as follows.
First, DVS-collected data 4204 may be acquired and an event image 4202 may be initialized based on the DVS-collected data, resulting in an initial event image. In general, the representation of the event image may use polarity (b= -1 or +1) information, such as 1 indicates that a certain pixel exists an event, -1 indicates that a certain pixel does not exist an event, or may only count events (b=0 or 1), such as 1 indicates that a certain pixel exists an event, 0 indicates that a certain pixel does not exist an event, and so on.
If the current iteration is the first time, the motion parameter 4201 may be an initialized parameter, such as initialized to 0 or a preset value, or may be initialized according to data collected by the IMU, for example, an acceleration or a velocity collected by the IMU may be used as the initialized motion parameter. In addition, in the subsequent iteration process, the motion parameter may be a motion parameter obtained in the previous iteration, or may be a motion parameter obtained based on data acquired by a motion sensor (such as DVS, IMU, accelerometer or gyroscope, etc.), or may be a motion parameter used in the current iteration obtained by performing a weighted operation in combination with a motion parameter obtained in the previous iteration and a motion parameter obtained by data acquired by a motion sensor (such as DVS, IMU, accelerometer or gyroscope, etc.).
After the motion parameter 4201 is obtained, the event image 4202 is compensated using the motion parameter, resulting in a compensated event image. For example, after the motion parameter θ is determined, the position of each event in the event image is compensated as x'_k = x_k − (t_k − t_ref)·θ, where x'_k is the compensated position of x_k, and the transformed events at each position are accumulated to form the updated event image H(x) = Σ_{k=1..N_e} b_k·δ(x − x'_k).
After the event image is compensated using the motion parameters 4201, the compensated event image is used to maximize the contrast, e.g., F(x, θ) = (1/N_p)·Σ_{i,j}(h_{i,j} − μ)², and the θ value that maximizes F(x, θ), i.e., θ = argmax_θ F(x, θ), is calculated by an optimization algorithm, realizing the update of the motion parameters.
When the iteration times of the event images reach the preset times, or the iteration time of the event images reach the preset time, or the change value of the maximized contrast does not exceed the preset change value, and the like, the iteration update of the event images can be terminated, and the final event images are output.
Therefore, in the embodiment of the application, the motion parameter can be reversely pushed by using the maximized contrast obtained in the last iteration, so that the event image can be compensated by using the motion parameter when the event image is updated next time, the updated event image can be obtained quickly, and the event image with better quality can be obtained while the updating efficiency is ensured.
While the foregoing has been exemplified by updating one frame of event image, the following is exemplified by updating a plurality of frames of event image.
Illustratively, as shown in fig. 43, an iterative update process of 3-frame event images (event image 1, event image 2, and event image 3 as shown in fig. 43) is taken as an example. Wherein, the three-frame event image can be generated based on the data acquired by the DVS at different time periods. For example, events acquired during a time period [ t0, t1] may be accumulated to obtain an event image 1; accumulating the events acquired in the time periods [ t1, t2] to obtain an event image 2; and accumulating the events acquired in the time periods [ t2, t3] to obtain an event image 3 and the like.
In the process of iterative update 1, if the current iteration is the first iteration, the motion parameter θ_1 can be obtained by initialization based on data acquired by a motion sensor, or can be initialized to a preset value, and so on. For example, the motion parameter θ_1 can be extracted from the data acquired by the IMU: if the IMU can acquire its acceleration, angular velocity, or speed, one or more of these can be directly selected as the motion parameter θ_1. For another example, the initialization value of the motion parameter may be set in advance to 0 or another value.
In the update process of event image 1, the motion parameter θ_1 may be used to compensate event image 1 to obtain a compensated event image, the maximized contrast is calculated based on the compensated event image and an optimization algorithm, and the motion parameter θ_1 is updated using the maximized contrast.
The detailed procedure of the iterative update of the event image 1-event image 3 is similar to the update procedure of fig. 42 described above, and will not be repeated here.
After the iterative update of event image 1 is terminated, the motion parameter θ_1 obtained from the resulting contrast may also be used to initialize the motion parameter θ_2; after the iterative update of event image 2 is terminated, the motion parameter θ_2 obtained from the contrast of the final iteration may also be used to initialize the motion parameter θ_3.
In one possible implementation, in addition to updating the next frame of event image after each frame of event image is updated, each frame of event image may also be updated in a loop, so as to implement updating of the multi-frame event images.
Therefore, in the embodiment of the present application, after the update of one frame of event image is implemented, the motion parameter used when updating the next frame of event image may be initialized based on the motion parameter obtained by updating the event image, so that each time the event image is updated, the existing motion parameter may be used to update, thereby implementing efficient update of the event image.
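A schematic sketch of this frame-to-frame reuse of the motion parameter, with the per-frame optimization and compensation steps passed in as placeholder callables (the function names are hypothetical):

```python
def update_event_images(event_images, theta_init, optimize_theta, compensate):
    """Update a sequence of event images, re-using each frame's optimized motion
    parameter to initialize the next frame (theta_1 -> theta_2 -> theta_3 ...).

    optimize_theta(image, theta0) -> refined theta  (e.g. contrast maximization)
    compensate(image, theta)      -> updated (motion-compensated) event image
    """
    theta = theta_init
    updated = []
    for image in event_images:
        theta = optimize_theta(image, theta)   # previous result used as the new start
        updated.append(compensate(image, theta))
    return updated
```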
Procedure three, initializing the optimization model using the motion parameters and performing iterative updating
The foregoing processes of initializing the motion parameter and iteratively updating the event image are described separately, and in some scenarios, the processes of initializing the motion parameter and iteratively updating the event image in the image processing method provided in the present application may be implemented in combination, and the methods implemented in combination are described below.
Illustratively, in some scenarios, as one terminal device may include a variety of motion sensors, e.g., the terminal device may include both a DVS and an IMU, an event image may be generated from data acquired by the DVS, motion parameters may be initialized using data acquired by the IMU, and then the event image may be iteratively updated based on the initialized motion parameters.
In the following, the motion parameters are exemplified as being initialized based on the data acquired by the IMU, and in some scenarios, the initial motion parameters may be determined based on data acquired by other sensors, such as an accelerometer, a gyroscope, a gravity sensor, a DVS, and the like. Referring to fig. 44, a flowchart of another image processing method is provided.
4401. And acquiring data acquired by the IMU.
The IMU may be configured to measure a triaxial angular velocity and an acceleration of the IMU, and the data collected by the IMU may include an angular velocity ω or an acceleration α of the IMU.
For example, in one scenario, a user may take a photograph through a cell phone, where in addition to an RGB camera (or RGB sensor), a DVS and an IMU may be included, and the data collected by the DVS may be used to provide other auxiliary functions for the photographing of the RGB camera, such as focusing or compensating for the RGB image photographed by the RGB camera. The IMU can detect the motion change condition of the IMU, such as angular velocity or acceleration, and the like, simultaneously when a user shoots by using the mobile phone.
4402. Initializing motion parameters.
The motion parameter may be selected from data collected by the IMU. For example, the data collected by the IMU may include an angular velocity ω or an acceleration α of the IMU. One or more items may be selected from the angular velocity ω, the acceleration α, and the like as the motion parameter.
4403. And acquiring a target task, and determining the iteration time according to the target task.
4404. Motion information is acquired.
4405. At least one frame of event image is generated from the motion information.
Steps 4403-4405 may be described in steps 4001-4003, and are not described herein.
4406. And updating the event image according to the motion parameters to obtain an updated event image.
Step 4406 may refer to the description in step 4005, and is not described herein.
4407. If the iteration duration is reached, step 4409 is executed, and if not, step 4408 is executed.
After the event image is updated at the present time, if the iteration duration of the event image reaches the preset iteration duration, the iteration update of the event image can be terminated, and a final event image is output.
In addition, if the multi-frame event image needs to be iteratively updated, after each event image update, whether the preset iteration duration is reached can be judged, and after all event images are updated, at least one updated event image can be output.
4408. And updating the motion parameters.
If the iterative update of the event image is not completed, after each update the updated event image can be used as the input of the optimization model, a preset optimization algorithm is used to calculate the optimal value of the optimization model, and the motion parameters are updated according to the optimal value.
It should be noted that, if the current iteration is the last iteration update of at least one frame of event image, step 4408 may be performed, or step 4408 may not be performed, and specifically, the adjustment may be performed according to the actual application scenario.
In one possible implementation, in addition to updating the motion parameters using the optimal values of the optimization model, more accurate motion parameters may be obtained in combination with the data acquired by the IMU. For example, the motion parameter obtained by back-pushing the optimal value of the optimization model is referred to as a motion parameter 1, the motion parameter obtained continuously by using the IMU is referred to as a motion parameter 2, after the motion parameter 1 and the motion parameter 2 are obtained, the motion parameter 1 and the motion parameter 2 may be weighted to obtain a final motion parameter, or one of the motion parameter 1 and the motion parameter 2 may be selected as a final motion parameter, which may be specifically adjusted according to an actual application scenario.
For example, the application can be applied to a motion-photography scene. Taking the motion parameter as the motion speed of the camera as an example, motion parameter 1 can be calculated from the optimal value of the optimization model as v1 = argmax_v F(v, x), and motion parameter 2 may be a value v2 selected from the data acquired by the IMU. After one iteration over the event image, the motion parameter is updated as θ = ω1×v1 + ω2×v2, where ω1 is the weight value of motion parameter 1 and ω2 is the weight value of motion parameter 2. Of course, one of v1 and v2 may instead be selected directly as the new motion parameter.
Specifically, the specific process of updating the motion parameters may refer to the related description in step 4004, which is not described herein.
In addition, after determining to terminate the iterative update to the event image, step 4408 may be performed, i.e. the motion parameter may be updated, or step 4408 may not be performed, i.e. the motion parameter may not be updated, and specifically may be adjusted according to the actual application scenario.
4409. And outputting the updated at least one frame of event image.
After the iterative updating of all the event images in the at least one frame of event images is completed, the final updated at least one frame of event images can be output.
Specifically, step 4409 may refer to the related description in step 4007, and is not described herein.
Therefore, in the embodiment of the application, the motion parameters can be initialized by using the data acquired by the motion sensor, such as the IMU, the accelerometer or the gyroscope, so that the event image can be updated based on the motion parameters later, which is equivalent to providing a higher starting point when the event image is updated, and the updated event image can be obtained efficiently. In the updating process, the iteration time length can be determined according to the target task, so that the event image can be updated on line according to the actual application scene, more application scenes are met, and the generalization capability is high. In addition, in the process of updating the multi-frame event image, the next frame event image is updated by multiplexing the motion parameters obtained when the previous frame event image is updated, so that the event image can be updated by using more accurate motion parameters, and a clearer event image can be obtained efficiently.
The foregoing describes in detail the process flow of optimizing the event image by means of motion compensation provided in the present application. The following provides the structure of an image processing apparatus configured to execute the steps in the first, second, or third procedure.
First, the present application provides an image processing apparatus, referring to fig. 112, for performing the steps in the second or third process, which may include:
an obtaining module 11201, configured to obtain motion information, where the motion information includes information of a motion track of a target object when the target object moves within a detection range of the motion sensor 11203;
a processing module 11202, configured to generate at least one frame of event image according to the motion information, where the at least one frame of event image is an image that represents a motion track of the target object when motion is generated in the detection range;
the acquiring module 11201 is further configured to acquire a target task, and acquire an iteration duration according to the target task;
the processing module 11202 is further configured to perform iterative update on the at least one frame of event image to obtain an updated at least one frame of event image, where a duration of performing iterative update on the at least one frame of event image does not exceed the iterative duration.
In one possible implementation, the processing module 11202 is specifically configured to: acquiring motion parameters representing parameters of relative motion between the motion sensor and the target object; and iteratively updating one frame of target event image (such as a target event image) in the at least one frame of event images according to the motion parameters to obtain an updated target event image.
In one possible implementation, the processing module 11202 is specifically configured to: acquiring a value of an optimization model preset in the last iteration updating process; and calculating according to the value of the optimization model to obtain the motion parameter.
In one possible implementation, the processing module 11202 is specifically configured to: and compensating the motion trail of the target object in the target event image according to the motion parameters to obtain a target event image obtained by current iteration updating.
In one possible embodiment, the motion parameters include one or more of the following: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
In a possible implementation manner, the processing module 11202 is further configured to terminate, during the updating process of any iteration, the iteration if the result of the current iteration meets a preset condition, where the termination condition includes at least one of the following: and iteratively updating the at least one frame of event images for a preset number of times or enabling the value change of the optimization model in the updating process of the at least one frame of event images to be smaller than a preset value.
The present application also provides an image processing apparatus, referring to fig. 113, which may be used to perform the steps in the foregoing first or third process, the image processing apparatus including:
a processing module 11302, configured to generate at least one frame of event image according to motion information, where the motion information includes information of a motion track of a target object when the target object generates motion within a detection range of a motion sensor, and the at least one frame of event image is an image that represents the motion track of the target object when the target object generates motion within the detection range;
an acquisition module 11301 for acquiring motion parameters, the motion parameters representing parameters of relative motion between the motion sensor 11303 and the target object;
The processing module 11302 is further configured to initialize a value of a preset optimization model according to the motion parameter, so as to obtain the value of the optimization model;
the processing module 11302 is further configured to update the at least one frame of event image according to the value of the optimization model, and obtain the updated at least one frame of event image.
In one possible embodiment, the motion parameters include one or more of the following: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
In one possible implementation manner, the obtaining module 11301 is specifically configured to: acquiring data acquired by an inertial measurement unit (IMU) sensor; and calculating the motion parameters according to the data acquired by the IMU sensor.
In a possible implementation manner, the processing module 11302 is further configured to update, after initializing a value of a preset optimization model according to the motion parameter, a parameter of the IMU sensor according to the value of the optimization model, where the parameter of the IMU sensor is used for the IMU sensor to acquire data.
(2) Image reconstruction
The foregoing describes the manner of compensating and optimizing the event image by the motion parameters, and in another possible implementation manner, the RGB image may be reconstructed by combining the data acquired by the motion sensor, so that the reconstructed RGB image may be used for further applications, such as license plate recognition, two-dimensional code recognition, or guideboard recognition.
Typically, in the course of image reconstruction, a neural network may be used to output the reconstructed image. However, as the complexity of the image is higher, the complexity of the computation is also higher. For example, when the dimension of the two-dimensional code is higher, the calculation complexity is larger, and the efficiency of reconstructing the image is lower. Therefore, the application provides an image processing method, which is used for reconstructing an image of information acquired by a motion sensor and efficiently and accurately obtaining a reconstructed image.
First, the specific flow of the image processing method provided in the present application may include: acquiring motion information, wherein the motion information comprises information of a motion track of a target object when the target object moves within a detection range of a motion sensor; generating an event image according to the motion information, wherein the event image is an image representing a motion track of a target object when the target object moves within a detection range; and determining a color type corresponding to each pixel point in the event image according to at least one event included in the event image, and obtaining a first reconstructed image, wherein the color type of the first pixel point is different from that of at least one second pixel point, the first pixel point is a pixel point corresponding to any one event in at least one of the first reconstructed images, and the at least one second pixel point is included in a plurality of pixel points adjacent to the first pixel point in the first reconstructed image. Therefore, in the embodiment of the present application, the information acquired by the motion sensor may be used to reconstruct an image, so as to obtain a reconstructed image, and the reconstructed image may be used subsequently to perform image recognition, target detection, and the like.
Specifically, the event image may be an image obtained by accumulating N events at the corresponding positions of the events (or positions corrected by compensating along the motion trajectory) within a period of time, and the value of the position in the image where no event is generated is typically 0.
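As a minimal illustration (not the claimed reconstruction method itself), an event image could be turned into a two-level reconstructed image by assigning one color type to pixels that recorded events and the other color type to event-free pixels; the specific gray values and the function name are assumptions.

```python
import numpy as np

def reconstruct_binary(event_image, fg=0, bg=255):
    """Assign a color type per pixel: event pixels (e.g. edges of a license plate
    or two-dimensional code) become the foreground color, event-free pixels the
    background color. fg / bg values are illustrative."""
    recon = np.full(event_image.shape, bg, dtype=np.uint8)
    recon[event_image != 0] = fg
    return recon
```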
In some scenes such as the movement of a target object or the existence of shake of an imaging device, the image reconstruction can be performed by using information acquired by a motion sensor such as DVS, so that subsequent operations such as image recognition or target detection can be performed according to the reconstructed image.
For example, in some license plate recognition scenes of a garage, when a vehicle enters a garage entrance, a license plate recognition system provided at the entrance can shoot a license plate through an RGB camera, and then recognize the license plate number from the shot image. In general, RGB images that may be photographed due to movement of a vehicle are unclear, resulting in low license plate recognition efficiency. Taking a motion sensor arranged in a license plate recognition system as an example, the license plate recognition system can reconstruct an image through information acquired by the DVS by taking the DVS as an example, and the image of the license plate is quickly reconstructed, so that the license plate recognition efficiency is improved.
For example, in some two-dimensional code recognition scenes, the two-dimensional code image is unclear and cannot be recognized due to shake or unfixed two-dimensional code and other conditions of a user handheld terminal, or the two-dimensional code cannot be recognized due to the fact that a camera is turned on to scan the two-dimensional code in a scene with a large light ratio, such as a dark night environment, a flash lamp on the terminal causes overexposure of the two-dimensional code, and therefore the two-dimensional code cannot be recognized. Taking a motion sensor arranged in a terminal as an example of DVS, in the embodiment of the application, the terminal can use information acquired by the DVS to quickly reconstruct an image, and a reconstructed two-dimensional code image is obtained, so that the efficient identification of the two-dimensional code is realized.
The method of image processing provided in the present application is described in detail below.
Referring to fig. 45, another image processing method provided in the present application is a flowchart.
4501. Motion information is acquired.
The motion condition of the target object in the detection range of the motion sensor can be monitored through the motion sensor, and the motion information of the target object in the detection range can be obtained. Wherein the target object is an object moving within the detection range, the number of the target objects can be one or more, and the movement information can comprise information of a movement track of the target object when moving within the detection range.
It should be noted that, in the embodiments of the present application, the object or object in motion is an object or object having a relative motion with the motion sensor, and the motion referred to in the present application is understood to be a motion existing with respect to the motion sensor.
For example, the motion information may include information such as a size of an area in which the target object is located when the target object moves within the detection range, coordinates of a frame or corner point within the detection range, and the like.
For ease of understanding, the region in which the target object is located at each time of detection when the target object moves within the detection range is hereinafter referred to as a movement region of the target object. For example, if the target object is a pedestrian and the pedestrian is performing a whole-body motion, the whole-body motion of the pedestrian may be included in the motion region, and if the pedestrian is moving only with the arm, the target object may be merely the arm of the pedestrian and the motion region may include the arm portion of the pedestrian.
4502. Event images are generated from the motion information.
Wherein, after obtaining the motion information, at least one frame of event image can be generated from the information acquired by the motion sensor within the detection range. In general, the motion information may include information of a track of a target object moving within a detection range within a period of time, so that an image corresponding to the detection range is generated, and the track of the target object included in the motion information is mapped into the image to obtain an event image.
For example, the event image may be understood with reference to FIGS. 35-37 and the related description. Alternatively, the event image may be an image that has been optimized by the motion compensation method described above.
In one possible embodiment, the method provided by the present application may further include: and compensating the event image according to the motion parameters during the relative motion between the target object and the motion sensor, and obtaining the compensated event image. The motion parameter represents a parameter related to a relative motion between the sensor and the target object, e.g. the motion parameter comprises one or more of: depth, optical flow information, acceleration of movement of the motion sensor, or angular velocity of movement of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a movement velocity of relative movement between the motion sensor and the target object. Therefore, in the embodiment of the application, the event image can be compensated through the motion parameters to obtain a clearer event image, so that a clearer reconstructed image can be obtained when the image reconstruction is carried out later.
For example, taking the case where the motion track of the target object is uniform linear motion and the motion sensor is a DVS, suppose the DVS captures an event at position x_k at time t_k within the time window [t, t+Δt]. The motion track of the target object can be divided into a plurality of segments of linear motion, and θ (i.e. the motion parameter) represents the motion speed of the target object in the event image; the position x'_k of the event after motion compensation of the target object is then x'_k = x_k − (t_k − t_ref)·θ, where t_ref is the reference time to which the events are compensated. Applying this motion compensation to all events of the event image yields the compensated event image.
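A minimal sketch of this compensation is shown below, assuming uniform linear motion, a per-event timestamp and a two-dimensional speed θ in the image plane; the variable names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def motion_compensate(xs, ts, theta, t_ref):
    """Shift each event position back along the motion direction:
    x'_k = x_k - (t_k - t_ref) * theta,
    where theta is the (assumed constant) motion speed in the image plane.
    xs: (N, 2) array of event positions, ts: (N,) array of timestamps.
    """
    xs = np.asarray(xs, dtype=np.float64)
    ts = np.asarray(ts, dtype=np.float64)
    return xs - (ts - t_ref)[:, None] * np.asarray(theta, dtype=np.float64)

# Events drifting to the right at 100 px/s, compensated back to t_ref = 0;
# both events end up at the same compensated position [10, 4].
xs = [[10.0, 4.0], [12.0, 4.0]]
ts = [0.00, 0.02]
compensated = motion_compensate(xs, ts, theta=[100.0, 0.0], t_ref=0.0)
```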
In one possible implementation, the motion parameter may be extracted from data acquired by the IMU, or may be calculated based on data acquired by a motion sensor, such as DVS, or the like.
For example, if the motion parameters include not only parameters related to the motion sensor itself but also a motion speed, a motion direction, a motion acceleration, or the like of the target object, the motion parameters may be calculated from information acquired by the motion sensor.
For another example, if the motion parameters include parameters related to the motion sensor itself, such as optical flow information, acceleration of the motion sensor, angular velocity or depth of the motion sensor, the motion sensor may obtain parameters related to the motion sensor itself through information collected by the motion sensor, or IMU, gyroscope, accelerometer, or the like.
For example, taking the motion parameter obtained from the data collected by the IMU as an example, the data collected by the IMU may include the angular velocity ω or the acceleration α of the IMU. One or more items may be selected from the angular velocity ω, the acceleration α, and the like as the motion parameter.
Optionally, the event image may be further optimized by a method corresponding to the foregoing fig. 38-44, so as to obtain a clearer event image, and specifically, reference may be made to the foregoing description related to fig. 38-44, which is not repeated herein.
4503. A color type corresponding to each pixel point in the event image is determined from a preset color pool according to at least one event included in the event image, and a first reconstructed image is obtained.
The event image may be any one of the at least one frame image, or may be one frame image selected from the at least one frame image. For example, the information collected by the DVS in a period of time may be converted into a multi-frame event image, and one frame of event image may be arbitrarily selected from the multi-frame event image to perform color reconstruction, so as to obtain a frame of reconstructed image; or selecting one frame (such as a first frame, a fifth frame or a last frame) from the multi-frame event image to perform color reconstruction, so as to obtain a frame reconstructed image; alternatively, color reconstruction may be performed on a plurality of event images or all event images selected from the plurality of event images, so as to obtain a plurality of reconstructed images, where the process of performing color reconstruction for each of the plurality of event images is similar.
The color pool is optional, i.e. the color type of a pixel point may also be determined without a color pool. For example, when an event image is scanned, the default starting color type may be white, and when the next pixel with an event is scanned the color defaults to black; that is, only default color types are used and no selection from a color pool is required. In the embodiments of the present application, for ease of understanding, selecting the color type from a color pool is used as an example and not as a limitation. In an actual application scene, a color pool may be set, or the default color type of the pixel points may be fixed, according to actual requirements.
For ease of understanding, the present application will illustratively describe the process of reconstructing a frame of event images to obtain a reconstructed image.
The color type corresponding to the pixel point of each event and the color types of the pixel points adjacent to it can be determined from a preset color pool according to the positions of the events included in the event image, so as to obtain the first reconstructed image. Specifically, taking one pixel point with an event as an example (hereinafter referred to as a first pixel point), at least one pixel point adjacent to the first pixel point has a color type different from that of the first pixel point; such an adjacent pixel point with a different color type is hereinafter referred to as a second pixel point. If a certain area in the event image consists of continuous pixel points without events, the color types of the pixel points in that area are the same, or the change in illumination intensity among them is small, for example smaller than a threshold value.
Specifically, obtaining the first reconstructed image may include: scanning each pixel point in the event image in a first direction, and determining a color type for each pixel point from a preset color pool, to obtain the first reconstructed image. If an event is scanned at the first pixel point, the color type of the first pixel point is determined to be a first color type; if a second pixel point arranged before the first pixel point in the first direction does not have an event, the color type of the second pixel point is a second color type, where the first color type and the second color type are two color types included in the color pool.
For example, the color pool may include color type 1 and color type 2, and each pixel in the event image may be scanned row by row or column by column. Before any event is scanned, each pixel is reconstructed with color type 2. When a pixel with an event is scanned, that pixel is reconstructed with color type 1; if a continuous segment of subsequently scanned pixels has no events, the color type of that segment is also color type 1. If, after that continuous segment, a pixel with an event is scanned again, that pixel is reconstructed with color type 2, and so on for the following pixels, so as to obtain the reconstructed image.
In one possible implementation manner, if a plurality of consecutive third pixels arranged after the first pixel in the first direction do not have an event, the color type corresponding to the plurality of third pixels is the same as the color type of the first pixel, that is, the color type corresponding to the plurality of third pixels is the first color type.
For example, for ease of understanding, the event image may be represented as shown in fig. 46A, where the positions marked S1-S5 represent pixels that have an event and the remaining positions represent pixels that do not; after the event image is obtained, the color type corresponding to each pixel is determined. Illustratively, the resulting reconstructed image may be as shown in fig. 46B, where the event image is scanned row by row. When scanning the first row, the initial color is set to the second color type; when the event S1 is reached, the color type is toggled and the pixel is reconstructed with the first color type, and the continuous pixels after it that have no event take the same first color type as the pixel of the first event. When the event S2 is scanned, the color type of the corresponding pixel is set to the first color type, and the following pixels without events are also set to the first color type; the second and third rows are reconstructed in a similar way and are not repeated here. In the fourth row, the initial color type is also the second color type; at the first event S4, the pixel where S4 is located is set to the first color type, and the pixels between S4 and S5 are also the first color type. After scanning to S5, the reconstructed color type is toggled back to the second color type, so the pixel corresponding to S5 and the pixels after S5 are the second color type. The fifth row has no events, so all pixels in the fifth row are the second color type.
In one possible embodiment, if a fourth pixel point arranged after the first pixel point in the first direction and adjacent to the first pixel point has an event, and a fifth pixel point arranged after the fourth pixel point in the first direction and adjacent to the fourth pixel point does not have an event, the color types corresponding to the fourth pixel point and the fifth pixel point are both the first color type. It will be appreciated that when at least two consecutive pixels in the event image have events, the reconstructed color type is not changed again when the second of those events is scanned, thereby avoiding unclear edges in the reconstructed image caused by object edges that are wider than one pixel.
For example, fig. 47A shows an event image in which two consecutive pixels each have an event; the event image shown in fig. 47A is otherwise similar to the one shown in fig. 46A and is not described again. The difference is that in the first row of fig. 47A, two consecutive pixel points S1 and S2 have events. The reconstructed image obtained after scanning this event image may be as shown in fig. 47B: in the first row, the scan starts with the second color type as the initial color type; when the event S1 is scanned, the reconstructed color type is toggled to the first color type; when the event S2 is scanned, because S2 is adjacent to S1, the reconstructed color type is not changed again, that is, the color type corresponding to S2 is also the first color type. This avoids a spurious interlayer at the edge and improves the accuracy of the reconstructed image.
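The scan-and-toggle rule just described, including the handling of consecutive event pixels as in fig. 47A/47B, can be sketched as follows. This is one illustrative reading of the rule, not the only possible implementation; the 0/1 encoding of the two color types is an assumption.

```python
import numpy as np

def reconstruct_rowwise(event_image, first_color=1, second_color=0):
    """Scan the event image row by row.

    Each row starts with second_color; whenever a pixel with an event is
    reached and the previous pixel had no event, the reconstructed color is
    toggled. Consecutive event pixels (an edge wider than one pixel) do not
    toggle again, which avoids spurious stripes at thick edges.
    """
    h, w = event_image.shape
    recon = np.empty((h, w), dtype=np.int32)
    for r in range(h):
        color = second_color
        prev_had_event = False
        for c in range(w):
            has_event = event_image[r, c] != 0
            if has_event and not prev_had_event:
                color = first_color if color == second_color else second_color
            recon[r, c] = color
            prev_had_event = has_event
    return recon

# A toy single-row example: events mark the two edges of a bright bar
row = np.array([[0, 1, 0, 0, 1, 0, 0]])
print(reconstruct_rowwise(row)[0])   # -> [0 1 1 1 0 0 0]
```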
In one possible implementation, the direction in which the event image is scanned may be preset or determined according to information acquired by the motion sensor.
For example, scanning may be set in advance in accordance with a row or a column of the event image.
For another example, if the terminal is used to photograph the two-dimensional code, and the DVS and the IMU are disposed in the terminal, when the two-dimensional code is photographed, the motion information collected by the DVS may be used to generate an event image, and the IMU may be used to determine a moving direction of the terminal, and then the moving direction is set as a direction in which the event image is scanned.
For another example, a DVS is disposed in the terminal, and when the two-dimensional code is photographed, motion information collected by the DVS may be used to generate an event image, and then, a motion direction of the terminal may be calculated according to the information collected by the DVS, and then, the motion direction is set as a direction in which the event image is scanned. Alternatively, the moving direction of the terminal may be recognized from information captured by the RGB camera, and the moving direction may be set as a direction in which the event image is scanned.
For another example, if a license plate is recognized by a license plate recognition system, an image of the license plate needs to be photographed. The event image may be generated from the information acquired by the DVS, and the moving direction of the vehicle may also be calculated from that information, so that this moving direction is set as the direction in which the event image is scanned. Alternatively, the moving direction of the vehicle may be recognized from information captured by the RGB camera and set as the direction in which the event image is scanned.
In a possible embodiment, the color types included in the color pool may be set before reconstruction based on the event image, and this can be done in multiple ways: two or more colors (such as black and white) may be preset as defaults; after an RGB image captured by the RGB camera is obtained, a color histogram of the RGB image may be generated and the two (or more) colors with the largest proportion may be selected from the histogram and added to the color pool; or input data of a user may be received and the color types determined from the input data and added to the color pool, and so on.
In one possible implementation, after the event image is obtained, the RGB images collected by the RGB camera may be combined to perform fusion, so as to obtain a clearer image, so as to facilitate subsequent tasks such as identification or classification.
Specifically, the event image and the RGB image may each be divided into a plurality of regions, where the regions divided in the event image and in the RGB image have the same positions. The blur degree of each region in the RGB image is then measured (for example, using the variance, a Laplace transform, etc.). When a region is determined to be of poor quality, i.e. too blurred, image reconstruction is performed based on the corresponding region in the event image (the reconstruction process may refer to step 4503) to obtain a reconstructed image of that region, and the reconstructed region is then spliced with the RGB image, for example by replacing the overly blurred region of the RGB image with the reconstructed region, so as to obtain the final reconstructed image. In other words, compensation reconstruction may be performed for the poorer-quality parts of the RGB image, while the better-quality parts remain unchanged. For example, for a two-dimensional code with a highlighted (overexposed) portion, the highlighted portion can be reconstructed while the non-highlighted portion is kept unchanged.
It can be understood that when the image reconstruction is performed, only a certain region in the event image can be reconstructed without completely reconstructing the event image, and the reconstructed image in the region is fused with the RGB image to obtain a new reconstructed image, namely a new first reconstructed image, so that the size of the region required to perform image reconstruction is reduced, and the efficiency of obtaining the reconstructed image is further improved.
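The region-level fusion described above can be sketched as below. The sketch assumes that blur is measured with the variance of a Laplacian response (a lower variance meaning a blurrier region); the block size, the threshold and the reconstruct() callback are illustrative placeholders, and scipy is assumed to be available for the filter.

```python
import numpy as np
from scipy.ndimage import laplace   # assumed available for the blur measure

def fuse_rgb_with_reconstruction(rgb_gray, event_image, reconstruct,
                                 block=64, sharpness_thresh=50.0):
    """Split both images into aligned blocks; blocks of the RGB image whose
    Laplacian variance falls below sharpness_thresh are considered too
    blurred and are replaced by a reconstruction from the event image."""
    fused = rgb_gray.astype(np.float64).copy()
    h, w = rgb_gray.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            rgb_blk = fused[r:r + block, c:c + block]
            if laplace(rgb_blk).var() < sharpness_thresh:   # poor-quality block
                evt_blk = event_image[r:r + block, c:c + block]
                fused[r:r + block, c:c + block] = reconstruct(evt_blk)
    return fused
```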
After the first reconstructed image is obtained, other operations may be performed on it, as shown in steps 4504-4506 below. It may be understood that, in the embodiment of the present application, steps 4504-4506 are optional: any of them may be performed, or none of them, and this may be adjusted according to the actual application scenario, which is not limited in this application.
4504. The event image is scanned multiple times in different directions to obtain multi-frame reconstructed images, and the multi-frame reconstructed images are fused to obtain an updated first reconstructed image.
It should be noted that, step 4504 in the embodiment of the present application is an optional step, and specifically, whether multiple scans are needed may be determined according to an actual application scenario, which is not limited in this application. For example, taking a scene identified by a two-dimensional code as an example, if a frame of reconstructed image including the two-dimensional code is reconstructed and then the two-dimensional code cannot be identified, the event image may be scanned multiple times in different directions, so as to obtain an updated reconstructed image.
In one possible implementation, the same frame of event image may be scanned and reconstructed in a plurality of different directions to obtain a multi-frame reconstructed image, and then the multi-frame reconstructed image may be fused to output a final more accurate reconstructed image.
In particular, there are various ways to fuse the multi-frame reconstructed images; the following is exemplary. The fusion may be performed pixel by pixel across the multi-frame reconstructed images. Taking one pixel as an example, if the values of that pixel in every reconstructed frame are the same, that common value is taken as the value of the pixel in the final reconstructed image. If the values differ, the values from the individual frames may be weighted and fused to obtain the value of the pixel in the final reconstructed image, or the value may be determined by voting: for example, if the values of the pixel at the same position in five reconstructed frames are 1, 1, 1, 0 and 1, most of the values are 1, so the value of that pixel in the final reconstructed image is determined to be 1.
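Both fusion variants mentioned above (majority voting and weighted fusion) can be sketched as follows for binary reconstructed images; the 0/1 encoding, the 0.5 threshold for the weighted variant and the frame count are assumptions for illustration.

```python
import numpy as np

def fuse_by_vote(reconstructions):
    """Fuse several binary reconstructed images (e.g. one per scan direction)
    by per-pixel majority voting: the value chosen by most frames wins."""
    stack = np.stack(reconstructions).astype(np.int32)      # (k, h, w)
    return (stack.sum(axis=0) * 2 > stack.shape[0]).astype(np.int32)

def fuse_by_weights(reconstructions, weights):
    """Alternative: per-pixel weighted fusion followed by thresholding at 0.5."""
    stack = np.stack(reconstructions).astype(np.float64)
    w = np.asarray(weights, dtype=np.float64)[:, None, None]
    return (np.sum(stack * w, axis=0) / w.sum() >= 0.5).astype(np.int32)

# Five frames voting 1, 1, 1, 0, 1 at one pixel -> fused value 1
frames = [np.array([[v]]) for v in (1, 1, 1, 0, 1)]
print(fuse_by_vote(frames)[0, 0])   # -> 1
```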
4505. Whether the first reconstructed image meets the preset condition is determined, if yes, step 4506 is performed, and if not, step 4501 is performed.
In the method provided by the embodiment of the application, it can further be judged whether the first reconstructed image meets the preset requirement. If it does not, the motion information can be acquired again, a new event image can be generated from the new motion information, and a new reconstructed image can then be obtained. Cases in which the preset requirement is not met may include, but are not limited to: the sharpness of the reconstructed image does not reach a preset value, the information included in the reconstructed image cannot be recognized, the accuracy of the result recognized from the reconstructed image is lower than a threshold value, and the like.
For example, in the process of scanning the two-dimensional code by using the terminal, the terminal generates a frame of event image through the information acquired by the DVS, and performs image reconstruction according to the event image to obtain a frame of reconstructed image, if the reconstructed image is identified but the two-dimensional code is not identified, the information acquired by the DVS can be acquired again, a new event image is obtained, a new reconstructed image is obtained, and so on until the two-dimensional code is identified.
4506. Other processing.
After the first reconstructed image meeting the preset requirement is obtained, other processing can be performed on the first reconstructed image, for example, identifying information included in the first reconstructed image, or saving the first reconstructed image, etc., and in different scenes, the processing manner of the first reconstructed image may be different, and in particular, the processing manner may be adjusted according to the actual application scene.
Therefore, in the embodiment of the application, the information acquired by the motion sensor can be used for image reconstruction, and the reconstructed image can be obtained efficiently and quickly, so that the efficiency of image recognition, image classification and the like of the reconstructed image can be improved. In some scenes such as shooting moving objects or shooting jitters, clear RGB images cannot be shot, image reconstruction can be performed through information acquired by a motion sensor, and clearer images can be quickly and accurately reconstructed so as to facilitate subsequent tasks such as recognition or classification.
The foregoing describes in detail the flow of the method of image processing provided in the present application. For easy understanding, the method of image processing provided in the present application will be described in more detail below with reference to the foregoing flow by taking a specific application scenario as an example.
Referring to fig. 48, which is a flowchart of another image processing method provided in the present application, as follows.
First, an event image 4801 is acquired.
The event image can be obtained from data acquired by the DVS: the DVS data is accumulated and framed, and the specific framing mode may be accumulation by time, accumulation by the number of events, or accumulation by time planes, so as to obtain event images.
In general, the photographed object may be an object having two or more colors, such as a two-dimensional code, an applet code, a bar code, a license plate, a guideboard, etc. The DVS responds to regions where brightness changes; taking a two-color photographed object as an example, brightness changes only occur at the edges of the object, so that an event image with clearer edges can be obtained by using the characteristics of the DVS together with the characteristics of the two-color object.
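The accumulation-and-framing step can be sketched as below; the event tuple layout (t, x, y) and the two framing policies shown (fixed time window and fixed event count) follow the description above, while the concrete parameters and helper names are assumptions.

```python
import numpy as np

def frame_by_time(events, window, height, width):
    """Group events into frames covering consecutive time windows of length
    `window`; each frame is an event image accumulated from its group.
    events: list of (t, x, y) tuples sorted by t."""
    if not events:
        return []
    frames, img = [], np.zeros((height, width), dtype=np.int32)
    t0 = events[0][0]
    for t, x, y in events:
        if t - t0 >= window:                 # start a new frame
            frames.append(img)
            img = np.zeros((height, width), dtype=np.int32)
            t0 = t
        img[y, x] += 1
    frames.append(img)
    return frames

def frame_by_count(events, n, height, width):
    """Group events into frames of n events each."""
    frames = []
    for i in range(0, len(events), n):
        img = np.zeros((height, width), dtype=np.int32)
        for _, x, y in events[i:i + n]:
            img[y, x] += 1
        frames.append(img)
    return frames
```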
Preprocessing 4802, operations such as denoising or motion compensation can be performed on the event image. The denoising target is to remove noise irrelevant to the edge of the target object, and the denoising modes can comprise various modes such as neighborhood denoising, point cloud denoising and the like. The motion compensation may compensate the event image using motion parameters, which may include a motion speed, an angular speed, an acceleration, etc. of the DVS, and may further include a motion speed or an acceleration, etc. of the target object. The motion compensation can make the boundary of the target object in the event image clearer, and the edge is compensated by combining the event in time, so that a clearer and more accurate event image is obtained.
Image reconstruction 4803, that is, image reconstruction is performed based on the event image obtained by the preprocessing 4802, to obtain a reconstructed bicolor image.
For ease of understanding, the embodiment of the present application is exemplified by taking the reconstructed image as a bi-color image having two colors, and not by way of limitation, the bi-color image reconstruction 4803 may refer to the reconstruction process shown in fig. 49.
First, two colors 4901 are initialized.
Taking a color pool that includes two color types as an example, initializing the two colors 4901 means initializing the color types used when reconstructing the image; the two colors are the colors of the two-color image. The two colors may be obtained in a variety of ways, such as: defaulting to black and white; judging the colors from the type of the bicolor image, e.g. defaulting to black and white for a bar code; or obtaining them from other sensors, such as calling an RGB camera to capture an image, determining the region of the bi-color object in the RGB image from the region detected by the DVS, and then counting the two main colors in that region (with a histogram method or the like) and using them as the initial color pool. For example, in a scene where a two-dimensional code is scanned, the two color types may be initialized to black and white, so that an image including a black-and-white two-dimensional code can be reconstructed in the subsequent image reconstruction. For example, in a scene of scanning a license plate, the two colors may be selected from a histogram of the license plate area in the RGB image: if the license plate has white characters on a blue background, the two color types may be initialized to blue and white; if it has black characters on a yellow background, they may be initialized to yellow and black, and so on. This may be adjusted according to the actual application scene or requirement and is not limited here.
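The histogram-based initialization mentioned above can be sketched as follows; the coarse color quantization step and the function name are assumptions, and the fallback to black and white when no RGB data is available mirrors the default case described in the text.

```python
import numpy as np

def init_two_colors(rgb_region=None):
    """Initialize the two colors of the color pool.

    If no RGB data is available, default to black and white; otherwise pick
    the two most frequent colors in the RGB region that corresponds to the
    bi-color object (a simple histogram over coarsely quantized colors)."""
    if rgb_region is None:
        return (0, 0, 0), (255, 255, 255)          # default: black and white
    q = (rgb_region // 32) * 32                     # coarse quantization
    colors, counts = np.unique(q.reshape(-1, 3), axis=0, return_counts=True)
    top2 = colors[np.argsort(counts)[-2:]]
    return tuple(top2[1]), tuple(top2[0])           # (most frequent, second)

# A blue-background / white-character license plate region would yield
# approximately blue and white as the initial color pool.
```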
Then, the event image 4902 is scanned. The event image may be scanned in a direction of movement of the target object, a direction of movement of the DVS, a line-by-line or column-by-column manner, etc., to scan whether each pixel has an event.
In the process of scanning the event image, it may be determined whether the scanning is ended 4903, if the scanning is ended, step 4907 is executed, and if the scanning is not ended, steps 4902 to 4906 may be executed continuously.
If the scanning is not finished, it may be determined whether the current pixel includes an event 4904. If it does, the color 4905 is changed, that is, the current pixel is given a color type different from that of the previous pixel; for example, if the color of the previous pixel is white, the color of the current pixel is black.
If the current pixel point does not include the event, the color type corresponding to the current pixel point does not need to be changed, namely the color corresponding to the current pixel point is the same as the color type corresponding to the previous pixel point. For example, if the color type corresponding to the previous pixel is white, the current pixel corresponds to the same color type as white.
After determining the color type corresponding to the current pixel, the color 4906 of the current pixel in the bi-color image may be reconstructed. For example, if the color type of the current pixel is white, the color type of the pixel is set to be white in the two-color image.
For example, in a scene in which the target object is a two-dimensional code, the event image is scanned line by line in the two-dimensional code reconstruction area. During the scan of each row of pixel points, before any event is scanned the color of the scanned pixels is reconstructed as black. When an event is encountered for the first time during the scan, the color white is reconstructed. In the subsequent scan, when a scanned pixel has an event and its preceding pixel does not, the color is changed and the new color is placed at that pixel position; for example, if the color of the previous pixel point is white, the color of the current pixel point is changed to black.
Scanning of the event image and reconstruction of the pixel colors in the bicolor image then continue, i.e. steps 4902-4906 are executed in a loop until the event image has been fully scanned; that is, once all pixel points in the event image have been scanned and the colors of all pixel points have been reconstructed, the bicolor image is obtained.
After the two-color image is obtained, it may be further determined whether the two-color image satisfies the requirement 4907, and if the two-color image satisfies the requirement, the two-color image 4908 may be output. If the bi-color image does not meet the requirements, the scan direction may be selected to be changed 4909 and then the event image may be rescanned, i.e., steps 4902-4907 may be repeated until a bi-color image is obtained that meets the requirements.
The requirement may be that the recognition accuracy of the two-color image exceeds a preset accuracy value, or that information included in the two-color image is recognized, or that the degree of blurring of the two-color image is lower than a preset degree of blurring, or that the number of times of image reconstruction is performed exceeds a preset number of times, or the like. For example, if the reconstructed object is a two-dimensional code, a two-color image can be identified, if the two-dimensional code is identified and information included in the two-dimensional code is obtained, the two-color image meets the requirement, and the image reconstruction process can be terminated.
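The overall loop of fig. 49 (scan, check the requirement, change the scan direction, rescan) can be summarized by the sketch below. The reconstruct_in_direction and satisfies_requirement callbacks, the set of scan directions and the attempt budget are placeholders for whatever the application actually uses (for example a successful two-dimensional-code decode as the requirement check).

```python
def reconstruct_with_retries(event_image, reconstruct_in_direction,
                             satisfies_requirement, max_attempts=4):
    """Try one scan direction after another until the reconstructed bi-color
    image meets the requirement or the attempt budget is exhausted."""
    directions = ["row", "column", "row_reverse", "column_reverse"]
    best = None
    for direction in directions[:max_attempts]:
        bi_color = reconstruct_in_direction(event_image, direction)  # 4902-4906
        best = bi_color
        if satisfies_requirement(bi_color):      # 4907, e.g. QR code decoded
            return bi_color                      # 4908: output the image
    return best                                  # give back the last attempt
```

Here reconstruct_in_direction corresponds to steps 4902-4906 and satisfies_requirement to the check in 4907; changing the direction corresponds to 4909.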
For example, taking a scanned two-dimensional code as an example, an event image obtained based on data acquired by DVS may be shown in fig. 50, and each pixel point in the event image may be scanned row by row or column by column, to obtain a final two-color image, where the two-color image is shown in fig. 51.
In addition, after the scanning direction is changed and the event image is scanned again, a new bicolor image can be obtained, and the new bicolor image and one or more frames of bicolor images obtained before the scanning direction is changed can be fused, so that a fused bicolor image is obtained. It can be understood that the event image is scanned by adopting a plurality of different directions, so that a plurality of image reconstruction is completed, and then the reconstructed images in different directions are fused. The fusion mode can be, for example, a voting method on pixels, so as to obtain a final bicolor image.
Therefore, in the embodiment of the application, the two-color image can be reconstructed quickly. The above embodiment exploits the characteristics of the two-color image and the DVS trigger mechanism, and can implement fast reconstruction simply by scanning the image; the algorithm has low complexity and does not depend on any complex algorithm. When there is relative motion between the two-color object and the DVS sensor, the events it generates in the DVS allow the original two-color image to be reconstructed, detected and identified quickly. Higher reconstruction accuracy can be obtained in scenes with fast motion, high dynamic range or low latency requirements, and on devices with low computing power. In other words, by fully using the characteristics of the two-color image and the DVS event triggering mechanism, the two-color image can be quickly reconstructed by scanning the image, which facilitates rapid identification and detection, and the recognition accuracy of the two-color image can be improved in high dynamic range environments and fast-motion scenes. The method is simple, has low computational complexity and high robustness, and can improve the recognition speed of two-color images.
In addition, the image reconstruction can be performed by combining the RGB images acquired by the RGB camera. Specifically, after the RGB image is obtained, an area to be subjected to image reconstruction is determined, then a corresponding area is determined from the event image, and then the corresponding area in the event image is scanned to reconstruct a bicolor image, and the reconstruction process is similar to the above 4901-4909, and is not repeated here. After the two-color image of the region is obtained, the two-color image and the RGB image are fused, for example, the region in the RGB image is replaced by the two-color image, or the pixel value of the two-color image and the pixel value of the region in the RGB image are weighted and fused, so that the finally obtained fused image is clearer than the RGB image.
Furthermore, it should be noted that if there may be parallax between the RGB camera and the DVS, that is, if their fields of view may differ, the RGB image and the event image may be registered before the image is reconstructed, for example by aligning them to the same coordinate system, so that the RGB image and the event image are in the same coordinate system.
Specifically, for example, when there is no parallax between the RGB camera and the DVS, the poor-quality regions of the RGB image can be reconstructed directly, while the good-quality regions can be used as they are. For example, in highlighted (overexposed) areas of an RGB image, a bi-color object is of poor quality and hard to distinguish, whereas the DVS has a high dynamic range and can still distinguish the boundaries. The highlighted portion can therefore be reconstructed using the fast scan reconstruction approach provided herein, and the non-highlighted portion can directly use the RGB data. Image quality can be measured using contrast: when the contrast is smaller than a threshold value, the region is considered a poor-quality area. In this embodiment of the present application, the difference between the RGB data and the DVS data may also be used to judge the quality of the original RGB image. Specifically, edge information of the RGB image is extracted and then compared with the event image, for example by a pixel-by-pixel difference; the comparison may also be computed on segmented blocks of the event image. Regions where the two kinds of data differ greatly are regions of poor image quality and need to be reconstructed using the fast reconstruction approach, while the other regions use the RGB data.
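A sketch of the block-wise comparison between RGB edge information and the event image is given below. It assumes a Sobel magnitude as the RGB edge extractor, a simple mismatch ratio as the difference measure and scipy for the filtering; block size and threshold are illustrative.

```python
import numpy as np
from scipy import ndimage   # assumed available for the edge extraction

def poor_quality_mask(rgb_gray, event_image, block=32, diff_thresh=0.3):
    """Flag blocks where the RGB edge map and the event image disagree most.

    Edges are extracted from the RGB image with a Sobel magnitude; both maps
    are binarized and compared block by block. Blocks whose mismatch ratio
    exceeds diff_thresh are treated as poor-quality RGB regions that should
    be reconstructed from DVS data."""
    gx = ndimage.sobel(rgb_gray.astype(np.float64), axis=1)
    gy = ndimage.sobel(rgb_gray.astype(np.float64), axis=0)
    mag = np.hypot(gx, gy)
    rgb_edges = (mag > mag.mean()).astype(np.int32)
    evt_edges = (event_image != 0).astype(np.int32)
    h, w = rgb_gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for r in range(0, h, block):
        for c in range(0, w, block):
            a = rgb_edges[r:r + block, c:c + block]
            b = evt_edges[r:r + block, c:c + block]
            if np.mean(a != b) > diff_thresh:    # large disagreement: poor RGB quality
                mask[r:r + block, c:c + block] = True
    return mask
```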
Therefore, in the embodiment of the application, the image reconstruction can be performed by combining the RGB images, and the reconstruction is performed on the part with poor quality in the RGB images through the information acquired by the DVS, so that clear images can be obtained rapidly and accurately, and the subsequent tasks such as image recognition or image classification can be performed efficiently.
The present application further provides an image processing apparatus, referring to fig. 114, which may be configured to execute steps of a method flow corresponding to the foregoing fig. 45 to 51, where the image processing apparatus may include:
an acquiring module 11401, configured to acquire motion information, where the motion information includes information of a motion track of the target object when the target object moves within a detection range of the motion sensor 11403;
a processing module 11402, configured to generate an event image according to the motion information, where the event image is an image representing a motion track of the target object when the target object moves within the detection range;
the processing module 11402 is further configured to obtain a first reconstructed image according to at least one event included in the event image, where a color type of a first pixel is different from a color type of at least one second pixel, where the first pixel is a pixel corresponding to any one event in the at least one first reconstructed image, and the at least one second pixel is included in a plurality of pixels adjacent to the first pixel in the first reconstructed image.
Alternatively, the image processing apparatus may also execute the steps of the method flow corresponding to fig. 38 to 44.
In a possible implementation manner, the processing module 11402 is specifically configured to scan each pixel in the event image according to a first direction, determine a color type corresponding to each pixel in the event image, and obtain a first reconstructed image, where if the first pixel is scanned to have an event, the color type of the first pixel is determined to be a first color type, and if a second pixel arranged in front of the first pixel according to the first direction does not have an event, the color type corresponding to the second pixel is a second color type, the first color type and the second color type are different color types, and a pixel with an event indicates a pixel corresponding to a position where the motion sensor detects that there is a change in the event image.
In one possible implementation manner, the first direction is a preset direction, or the first direction is a direction determined according to data acquired by the IMU, or the first direction is a direction determined according to an image captured by the color RGB camera.
In one possible implementation manner, if a plurality of consecutive third pixels arranged after the first pixel according to the first direction do not have an event, the color type corresponding to the plurality of third pixels is the first color type.
In one possible implementation manner, if a fourth pixel point arranged after the first pixel point in the first direction and adjacent to the first pixel point has an event, and a fifth pixel point arranged after the fourth pixel point in the first direction and adjacent to the fourth pixel point does not have an event, the color types corresponding to the fourth pixel point and the fifth pixel point are both the first color type.
In a possible implementation manner, the processing module 11402 is further configured to scan each pixel in the event image according to a first direction, determine a color type corresponding to each pixel in the event image, obtain a first reconstructed image, scan the event image according to a second direction, determine a color type corresponding to each pixel in the event image, and obtain a second reconstructed image, where the second direction is different from the first direction; and fusing the first reconstructed image and the second reconstructed image to obtain the updated first reconstructed image.
In a possible implementation manner, the processing module 11402 is further configured to update the motion information if the first reconstructed image does not meet the preset requirement, update the event image according to the updated motion information, and obtain an updated first reconstructed image according to the updated event image.
In a possible implementation manner, the processing module 11402 is further configured to, before determining, according to at least one event included in the event image, a color type corresponding to each pixel in the event image, and obtaining a first reconstructed image, compensate the event image according to a motion parameter during a relative motion between the target object and the motion sensor, to obtain the compensated event image, where the motion parameter includes one or more of: depth, optical flow information, acceleration of the motion sensor or angular velocity of the motion sensor, the depth representing a distance between the motion sensor and the target object, the optical flow information representing information of a motion velocity of a relative motion between the motion sensor and the target object.
In one possible implementation, the color type of the pixel point in the reconstructed image is determined according to the color acquired by the color RGB camera.
In one possible implementation, the processing module 11402 is further configured to: obtaining an RGB image according to the data acquired by the RGB camera; and fusing the RGB image and the first reconstruction image to obtain the updated first reconstruction image.
3. Application flow
The method for optimizing the image has been described in detail above. After the optimized RGB image or event image is obtained, the optimized event image can be used for further applications. Alternatively, the data acquisition and data encoding/decoding part may provide RGB images or event images, and the acquired RGB images or event images may be used for further applications; specific application scenarios of the RGB images or event images are described below.
The application scene provided in the application has a plurality of corresponding method flows, and may specifically include scenes such as motion photography enhancement, DVS image and RGB image fusion, key frame selection, SLAM or pose estimation, and the like, and the following are respectively described in an exemplary manner.
(1) Motion photography enhancement
Photographing is a commonly used function; for example, a terminal may be provided with or connected to a color (RGB) camera, and a user may shoot RGB images with it. In some scenes, pictures may be taken of moving objects, with a moving camera, or in environments where there is a large difference in illumination intensity.
Typically, in an RGB image captured by a camera, it may be represented by information of a plurality of channels, each of which is represented by a limited range, such as a range of 0-255. In an actual application scene, a scene with a larger difference between the maximum illumination intensity and the minimum illumination intensity may exist, and a range of 0-255 may not represent different illumination intensities presented in the actual scene, so that the texture of the finally obtained RGB image is not abundant enough, and a condition such as blurring exists in a visual interface. Or in an actual application scene, situations such as lens shake or high-speed movement of an object in a shooting range may occur, so that an RGB image obtained by final shooting is blurred, an image presented to a user in a visual interface is unclear, and user experience is reduced.
In some scenes, in order to obtain an image including more information, a high dynamic range image (high dynamic range image, HDR) can be obtained by fusing images with different exposure durations, and simultaneously, texture information of a bright part and a dark part in the scene is captured, so that the definition of the finally obtained image is improved. For example, two images of short exposure time and long exposure time can be shot, and then the images corresponding to the short exposure time and the long exposure time are fused to obtain HDR, wherein the HDR comprises richer information, so that the image finally presented to the user in the visual interface is clearer. In a shooting scene, scenes such as larger difference between maximum illumination intensity and minimum illumination intensity (hereinafter referred to as large light ratio), lens shake or high-speed movement of an object in a shooting range may occur, so that a finally obtained image is blurred, and user experience is reduced.
The image processing method provided by the application can be applied to various scenes, such as shooting scenes, monitoring scenes and the like, to shoot clearer images or to make shot images clearer. For example, in one scenario, a user may use a terminal to take one or more clearer images, or, after taking multiple images, combine them into one clearer image. The image processing method provided by the application includes multiple embodiments: in the first embodiment, one or more images are shot during the shooting process, for example a moving object is shot by combining the information acquired by the motion sensor; in the second embodiment, a plurality of images are shot and then combined to obtain a higher-definition image. In the second embodiment, the process of capturing the plurality of images may be implemented with reference to the image-capturing manners of the first and second embodiments provided in the present application, separately or in combination; this may be adjusted according to the actual application scenario. The present application describes the first and second embodiments separately only for clarity, and is not limited thereto.
In the first embodiment, for example, a user uses a mobile phone with a shooting function to shoot a scene containing a moving object; a mode for shooting moving objects can be selected, and after the user clicks the shooting button, focusing on the moving object can be completed automatically by combining the information acquired by the DVS, and one or more images can be shot, so that a clearer image of the moving object is obtained.
In the second embodiment, for example, in some large-light-ratio or sports scenes, a user may use a mobile phone with a shooting function to shoot; after the user clicks the shooting button, the mobile phone may automatically set different exposure durations to shoot multiple images and fuse them to obtain the final HDR image. Each of the multiple images may be captured in the manner of the first embodiment, so that clear images are captured efficiently.
The first and second embodiments are described below, and the first and second embodiments may be implemented separately or in combination, and may be specifically adjusted according to the actual application scenario.
Embodiment one, image processing method when moving object exists in shooting range
With the rapid development and wide spread of smartphones, digital cameras and the like, users' demands on photography are becoming stronger. However, although existing mobile phones, digital cameras and the like can cover most shooting scenes, their performance when shooting motion is not satisfactory: the user needs to accurately grasp the moment of shooting, and operation skills such as focusing on the motion area and exposure control influence the final imaging effect. In the existing scheme, a color (RGB) camera is generally used for shooting; the user usually has to trigger the RGB camera manually at the moment of motion, select an area to focus on before shooting, and then press the shutter (or the shooting key of a mobile phone) at the right time when the motion occurs to record that moment. Specifically, a series of processes such as focusing, focus locking, pressing the shutter, exposure and output need to be triggered by the user's operations before an image is finally output. However, when the user triggers operations such as focusing and focus locking, the optimal trigger time point may be missed, resulting in unclear captured images and reduced user experience.
Therefore, the application provides an image processing method, in a moving scene, focusing on a moving target object is completed by capturing a moving track of the target object during moving, and the definition of an obtained image is improved.
It should be noted that, in the scenes mentioned in the present application where a moving object exists in the shooting range, there is relative motion between the camera and an object in the shooting range. In an actual application scene, the camera may move, the object in the shooting range may move, or both may move simultaneously, depending on the actual application scene; from the camera's point of view, the object in the shooting range can be regarded as being in motion.
The method of image processing provided in the present application is described in detail below. Referring to fig. 52, a flowchart of an image processing method provided in the present application is as follows.
5201. Motion information of the target object is detected.
The motion condition of the target object in the preset range can be monitored through the motion sensor, and the motion information of the target object in the preset range can be obtained. The target object is an object moving within a preset range, the number of the target objects can be one or more, and the movement information can comprise information of a movement track when the target object moves within the preset range.
For example, the motion information may include information such as a size of an area where the target object is located, a frame, or coordinates of corner points within a preset range when the target object moves within the preset range.
For ease of understanding, the area in which the target object is located at each time of detection when the target object moves within a preset range is hereinafter referred to as a movement area of the target object. For example, if the target object is a pedestrian and the pedestrian is performing a whole-body motion, the whole-body motion of the pedestrian may be included in the motion region, and if the pedestrian is moving only with the arm, the target object may be merely the arm of the pedestrian and the motion region may include the arm portion of the pedestrian.
Typically, the preset range is related to the focal length or the angle of view of the camera, etc., and is typically not smaller than the detection range of the motion sensor. For example, the larger the angle of view of the camera, the larger the area of the captured range, and the smaller the angle of view of the camera, the smaller the area of the captured range. For another example, the larger the focal length of the camera, the larger the shooting range, and the more clear the object that is far away, the smaller the focal length of the camera, and the smaller the shooting range.
In this embodiment, the range monitored by the motion sensor includes a shooting range of the camera, the preset range may be a shooting range of the camera, and the range monitored by the motion sensor includes the preset range, i.e., the range monitored by the motion sensor may be greater than or equal to the preset range.
For example, the motion information may be referred to in the foregoing description of fig. 35-37, and will not be described herein.
Furthermore, the motion information may also comprise data or data streams derived from the data acquisition or data codec mentioned in section 1, etc.
5202. Focusing information is determined from the motion information.
After the motion information of the target object within the preset range is acquired, the focusing information is determined according to the motion information. The motion information includes a motion track of the target object, that is, focusing information for focusing the target object in a preset range can be determined according to the motion track.
Optionally, there are various ways of determining the focusing information, which are described in detail below.
Mode one, focus information is obtained by predicting an area
For ease of understanding, in the following embodiments of the present application, a region in which at least one corresponding focal point is located when a target object is photographed is referred to as a focus region.
The focusing information may include position information of at least one point in the focusing area, such as a frame of the focusing area or coordinates of corner points within a preset range. The specific manner of determining the focus area may include: predicting the motion trail of the target object within a preset time length according to the motion information to obtain a prediction area, and determining a focusing area according to the prediction area, wherein the focusing area comprises at least one focusing point for focusing the target object, and the focusing information comprises the position information of the at least one focusing point. The preset duration may be a preset duration, such as 10 microseconds, 5 microseconds, etc.
It will be appreciated that in some scenarios, because motion has already occurred, if the RGB camera were triggered to take a picture based only on the region in which the target object is currently located and its motion characteristics, the target object might already have moved to the next position or state, so the captured image would lag. Therefore, it is necessary to predict the area of the target object within a preset future time length, to filter out incomplete motion (especially the situations where the moving object has only just entered the field of view of the lens, or the moving object is too far away to be photographed well, etc.), to decide the best shooting time, and then to trigger the RGB camera to work.
In a specific embodiment, the motion trail of the target object in the future preset duration may be predicted according to the motion information obtained in the foregoing step 5201, and specifically, the motion trail of the target object in the future preset duration may be predicted according to the motion trail of the target object when moving in the preset range and at least one of the motion direction and the motion speed, so as to obtain the prediction area.
In a more specific embodiment, a change function of the center point of the area where the target object is located over time is fitted according to the monitored motion track of the target object within the preset range and its motion direction and/or motion speed; a prediction center point is then calculated from the change function, the prediction center point being the center point of the prediction area, and the prediction area is determined from the prediction center point.
Illustratively, as shown in fig. 53, a change function F(x_c, y_c, t) may be fitted, where (x_c, y_c) is the center point of the area where the target object is located and t is time, so that the position of the area where the moving object is located in the next time period can be calculated. The center point (x_c, y_c) is obtained by averaging the coordinate positions (x_i, y_i) of the events, i = 1, 2, …, n, where n is the number of events in a short time window and n is a positive integer; specifically, for example, x_c = (x_1 + x_2 + … + x_n)/n and y_c = (y_1 + y_2 + … + y_n)/n.
The change function may be a linear function, an exponential function, or the like, and may be adjusted according to the actual application scenario, which is not limited herein. A future motion track of the target object is then predicted according to the change function, a point is selected from the motion track as the prediction center point, and the prediction area is determined from the prediction center point; the shape of the prediction area may be adjusted according to the actual application scene, and may for example be a circumscribed rectangle, a circumscribed minimum circle, a polygon, an irregular shape, and so on.
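A minimal sketch of fitting a linear (uniform-motion) change function and evaluating it at a future time is given below; for simplicity the fit is done directly over the observed events rather than over per-window centers, and the function names, the first-order fit and the axis-aligned prediction box are assumptions.

```python
import numpy as np

def predict_center(event_xy, event_t, t_future):
    """Fit a linear change function for the center of the moving area and
    evaluate it at a future time t_future.

    event_xy: (n, 2) event coordinates in a short time window,
    event_t:  (n,) timestamps. A first-order (uniform-motion) fit is assumed;
    other change functions (e.g. exponential) could be used instead."""
    t = np.asarray(event_t, dtype=np.float64)
    xy = np.asarray(event_xy, dtype=np.float64)
    kx, bx = np.polyfit(t, xy[:, 0], deg=1)   # x_c(t) ~ kx * t + bx
    ky, by = np.polyfit(t, xy[:, 1], deg=1)   # y_c(t) ~ ky * t + by
    return np.array([kx * t_future + bx, ky * t_future + by])

def prediction_box(center, half_size):
    """Axis-aligned prediction area around the predicted center point."""
    cx, cy = center
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)

# Events observed over 30 ms drifting to the right; predict the center 20 ms ahead
xy = [[10, 50], [12, 50], [14, 51], [16, 51]]
t = [0.00, 0.01, 0.02, 0.03]
box = prediction_box(predict_center(xy, t, t_future=0.05), half_size=20)
```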
In one possible implementation, if the predicted area meets a preset condition, determining a focusing area according to the predicted area; if the predicted area does not meet the preset condition, predicting the motion trail of the target object in the preset time length according to the motion information again to obtain a new predicted area, and determining a focusing area according to the new predicted area. The preset condition may be that the target object included in the preset area is in a complete form, that is, the preset area includes a complete target object, or the area of the predicted area is greater than a preset value, or the distance between the target object and the camera is greater than a preset distance threshold, or the like.
In general, the prediction center point may be predicted by a motion sensor, such as a DAVIS or CeleX sensor, or by a processor of the electronic device; then, when the predicted area meets the preset condition, the image capturing module of the electronic device may be triggered to focus according to the focusing area.
In the first mode, the area where the target object is located in the future preset duration can be predicted by fitting the motion track of the target object in the preset range, so that the prediction of the focusing area is realized, and the picture shot later is clearer. Particularly in a scene that some target objects move at a high speed, the prediction of the focusing area can be realized by predicting the area where the target objects are located in a future preset time length, so that a clearer image of the target objects in a motion state can be captured in time later, and the user experience is improved.
Mode two: determine the focusing information directly according to the area where the target object is currently located
After the motion track of the target object moving within the preset range is obtained, the current area of the target object can be used as a focusing area, the focusing area comprises at least one focusing point for focusing the target object, and the focusing information comprises the position information of the at least one focusing point. For example, if the current area of the target object is monitored by DVS and the movement speed of the target object is less than the speed threshold, the movement speed of the target object is slow, and the focusing time is sufficient. Therefore, the area where the target object is currently located can be directly used as the focusing area, so that a clear image can be shot.
The manner of acquiring the current area of the target object may refer to the first manner, and will not be described herein.
In the second mode, the area where the target object is currently located is used as the focusing area, so that the target object can be accurately focused. Particularly in some low-speed scenes the focusing time is sufficient, so focusing can be performed based only on the current area and a clearer image can still be obtained. Moreover, no prediction is needed, which reduces the workload.
5203. Focusing the target object in a preset range according to the focusing information, and shooting an image in the preset range.
The focusing information may include position information of at least one point in the focusing area; after the focusing area is determined, the target object within the preset range is focused according to the focusing area, and an image within the preset range is captured.
Specifically, the focusing area may be the same as the prediction area, or may be larger than the prediction area, and may be specifically adjusted according to the actual application scenario. For example, after the prediction area is determined, the prediction area may be directly used as the focusing area, or a larger range than the prediction area may be selected as the focusing area, so that the integrity of the photographed target object may be ensured. In another scenario, for example, a scenario of low-speed movement, the focusing area may be the current area of the target object, and then focusing may be directly performed in the current area, that is, a clear image may be shot, so that the workload of the step of prediction is reduced.
In one possible implementation, the image capturing may be performed by a camera, so as to obtain an image within the preset range, as captured by the camera 193 shown in fig. 1B described above. The camera may comprise a color (RGB) sensor (which may also be referred to as an RGB camera), i.e., the image is photographed by the RGB camera. Accordingly, a specific focusing mode may include: among the plurality of focusing points of the RGB camera, at least one point with the minimum norm distance to the central point of the focusing area is used as the focusing point to focus on the area where the target object is located or on the predicted area; shooting of the target object is then completed, and an image shot by the RGB camera is obtained, which may be referred to as an RGB image hereinafter. Of course, in some scenes, the center point of the prediction area may be directly used as the focusing point to complete focusing and shooting and obtain the RGB image.
For example, as shown in fig. 54, the RGB camera may have a plurality of preset focusing points; after the prediction area of the target object is predicted and the focusing area is determined according to the prediction area, one or more points with the smallest norm distance to the center point of the focusing area are selected as the focusing points and focused, thereby completing photographing of the target object. The calculation method of the distance may include, but is not limited to, an L1 norm distance or an L2 norm distance. For example, the L1 norm distance may be calculated as |x1 - x2| + |y1 - y2|, and the L2 norm distance as sqrt((x1 - x2)^2 + (y1 - y2)^2), where (x1, y1) is the center point of the prediction area and (x2, y2) is a preset focusing point of the RGB camera.
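A minimal sketch of this focusing-point selection, for illustration only, is given below; the function and parameter names are assumptions, and only the L1/L2 distance definitions above are taken from the description.

    import math

    def pick_focus_points(preset_points, center, k=1, norm="l2"):
        """Select the k preset focusing points of the RGB camera with the
        smallest norm distance to the center point of the focusing area.

        preset_points: list of (x, y) focusing points; center: (x1, y1).
        """
        x1, y1 = center

        def dist(p):
            x2, y2 = p
            if norm == "l1":
                return abs(x1 - x2) + abs(y1 - y2)      # L1 norm distance
            return math.hypot(x1 - x2, y1 - y2)         # L2 norm distance

        return sorted(preset_points, key=dist)[:k]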
In another possible scene, the RGB camera may not preset a focusing point, and after determining the focusing area, directly take the center point of the focusing area as the focusing point, or take all the pixel points in the focusing area as the focusing point, or select one or more pixel points in the focusing area as the focusing point, which may be specifically adjusted according to the actual application scene.
In one possible embodiment, exposure parameters may also be acquired before the image is taken, and the image taken in accordance with the exposure parameters.
The exposure parameters may include, but are not limited to, exposure value (EV), exposure amount, exposure duration, aperture size, or sensitivity (International Organization for Standardization, ISO), etc. The exposure duration can be understood as the time for which the shutter is kept open so that light is projected onto the photosensitive surface of the photosensitive material of the camera. By adjusting the exposure duration, the shooting duration of the camera can be matched with the movement speed of the target object, so that the camera can rapidly capture clearer images. The exposure value represents a combination of the aperture and the exposure duration. The exposure amount represents the integral over time of the illuminance received by a certain surface element on the surface of the object. ISO is a value determined according to the exposure amount.
In a specific embodiment, the manner of obtaining the exposure parameters may include: the exposure parameters are determined from the motion information. Taking an example that the exposure parameter includes an exposure time length, the exposure time length and the movement speed of the target object are in a negative correlation. For example, the faster the movement speed of the target object, the shorter the exposure time period, the slower the movement speed of the target object, and the longer the exposure time period, so that the camera can shoot a clearer image under the matched exposure time period.
In another specific embodiment, the manner of obtaining the exposure parameter may include: the exposure parameters are determined from the illumination intensity. Taking exposure parameters including exposure time as an example, the exposure time and the illumination intensity are in negative correlation. For example, the greater the illumination intensity, the shorter the exposure time period, the smaller the illumination intensity, and the longer the exposure time period.
For example, the RGB camera may adjust the exposure parameters according to the predicted motion characteristics, specifically the trend of the motion speed. The exposure parameters default to a plurality of gears adapted to motions of different speeds, such as 1/30 second, 1/60 second, 1/100 second, 1/200 second, 1/500 second, and so on. When the motion becomes faster, if the exposure duration is long, it is reduced appropriately and the gear is shifted down by one step; when the motion becomes slower, if the exposure duration is short, it is increased appropriately and the gear is shifted up by one step, so that the exposure amount during shooting is matched with the illumination intensity and conditions such as overexposure or insufficient illumination are avoided.
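For illustration, a small sketch of this gear-based adjustment driven by the speed trend reported by the DVS follows; the gear list, trend labels and function name are assumptions, not a prescribed implementation.

    # Preset exposure gears (seconds), from longest to shortest exposure.
    EXPOSURE_GEARS = [1/30, 1/60, 1/100, 1/200, 1/500]

    def adjust_exposure_gear(current_index, speed_trend):
        """Shift the exposure gear according to the motion-speed trend:
        'faster' -> shorter exposure (higher index),
        'slower' -> longer exposure (lower index), otherwise unchanged."""
        if speed_trend == "faster":
            current_index = min(current_index + 1, len(EXPOSURE_GEARS) - 1)
        elif speed_trend == "slower":
            current_index = max(current_index - 1, 0)
        return current_index, EXPOSURE_GEARS[current_index]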
In one possible implementation, after shooting by the camera, the method may further include: fusing the image shot by the camera with the motion information of the target object monitored by the motion sensor while the image was being shot, so as to obtain a target image within the preset range.
For example, as shown in fig. 55, the RGB camera completes exposure and photographing, and outputs an RGB camera image after processing an image signal inside thereof. The DVS records event data of a simultaneous period, accumulates events in the period to obtain the outline and the position of the moving object, registers with the RGB camera image, i.e. aligns pixel coordinates, and highlights edge details of the moving object, including but not limited to filtering, edge sharpening, and other modes. And the enhanced target image is used as final output and presented to a user or stored in a mobile phone memory. Thereafter, depending on the system settings or user settings, the DVS may continue with motion detection, triggering the RGB camera to take the next shot, i.e., a continuous shot of the moving object.
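For illustration only, a minimal Python/OpenCV sketch of such fusion is shown below, under the assumptions that the DVS event accumulation has already been registered to the RGB pixel coordinates and that an unsharp mask is an acceptable stand-in for the edge-enhancement step; the parameter values are placeholders.

    import cv2
    import numpy as np

    def enhance_motion_details(rgb_image, event_mask):
        """Sharpen edge details of the moving object in the RGB image using a
        mask accumulated from DVS events over the same period.

        rgb_image: HxWx3 uint8 array; event_mask: HxW uint8 array (0/255),
        assumed to be registered to the RGB pixel coordinate system.
        """
        blurred = cv2.GaussianBlur(rgb_image, (0, 0), sigmaX=2.0)
        sharpened = cv2.addWeighted(rgb_image, 1.5, blurred, -0.5, 0)  # unsharp mask
        mask3 = cv2.merge([event_mask] * 3) > 0
        out = rgb_image.copy()
        out[mask3] = sharpened[mask3]     # only enhance where motion was detected
        return out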
Therefore, in the embodiment of the application, focusing can be performed according to the detected motion track of the target object in the preset range during motion, so that a clearer image is shot. And the focusing area is identical to or intersects with the area where the target object is in when in motion or the predicted area where the target object is in motion, so that a clearer image is shot, and the user experience is improved. And further, the area where the target object is located in the future preset time length can be predicted according to the motion track of the target object in the preset range, so that focusing can be performed according to the predicted area, the focusing area can be determined in advance, and the shot moving object can be clearer. And, the shot image can be enhanced according to the motion information in the same time period as the shot image, so that the definition of the obtained target image is further improved.
The foregoing describes the flow of the image processing method provided in the present application, and for convenience of understanding, a specific application scenario is taken as an example to describe in more detail based on the foregoing description.
Scene one
For example, a flow of photographing a high-speed moving object may refer to fig. 56.
5601. DVS performs motion monitoring.
The shooting range of the RGB camera, that is, the foregoing preset range, may be monitored by DVS, to monitor one or more objects moving in the shooting range.
It should be noted that the one or more objects may be a person, an animal, a vehicle, an unmanned aerial vehicle, a robot, or the like, which are active in a shooting range, may have different objects in different application scenarios, and may specifically be adjusted according to an actual application scenario, which is not limited in this application.
Specifically, the DVS may generate an event in response to a change in illumination intensity within a photographing range. One or more events may be included within a short time window. Since static regions do not trigger events, events mostly occur in regions where motion is present. And acquiring events in a short time window for accumulation, and solving a connected domain of the events to obtain one or more regions with motion. For ease of understanding, this region where motion exists will be referred to as a motion region hereinafter. The shape of the motion region includes, but is not limited to, circumscribed rectangle, circumscribed minimum circle, polygon, irregular shape, etc. Typically, if the motion area is less than a predetermined threshold, the area is screened out. It will be appreciated that when the monitored movement area is less than a threshold, the movement area may be noisy, or the monitored movement object may be incomplete, etc., filtering out the area may reduce meaningless effort.
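A minimal sketch of this event accumulation, connected-domain extraction and small-region screening is given below for illustration; the dilation step, area threshold and use of OpenCV are assumptions rather than part of the described method.

    import numpy as np
    import cv2

    def motion_regions(events, height, width, min_area=100):
        """Accumulate DVS events of one short time window into a binary map,
        extract connected domains, and screen out regions smaller than a
        preset threshold.

        events: iterable of (x, y) pixel coordinates of events in the window.
        Returns a list of bounding boxes (x, y, w, h) of candidate motion regions.
        """
        acc = np.zeros((height, width), dtype=np.uint8)
        for x, y in events:
            acc[y, x] = 255
        acc = cv2.dilate(acc, np.ones((3, 3), np.uint8))   # bridge sparse events
        num, _, stats, _ = cv2.connectedComponentsWithStats(acc, connectivity=8)
        boxes = []
        for i in range(1, num):                            # label 0 is background
            x, y, w, h, area = stats[i]
            if area >= min_area:                           # screen out small/noisy regions
                boxes.append((int(x), int(y), int(w), int(h)))
        return boxes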
The specific manner of DVS monitoring the target object may be referred to the related description in step 3801, which is not described herein.
As shown in fig. 57, for example, the shooting range of the camera, that is, the aforementioned preset range, is correlated with the angle of view α of the camera. In general, the larger the field angle of the camera, the larger the shooting range, and the smaller the field angle, the smaller the shooting range. The DVS monitoring range includes a photographing range of the camera, so that monitoring of a moving object within the preset range is achieved. Events monitored by the DVS have sparsity; meanwhile, each pixel in the DVS responds to continuous light intensity change independently and asynchronously without the synchronous exposure influence of the RGB camera, and is not limited by exposure time and frame rate, so that the DVS generally has extremely high time resolution, for example, the time precision of the DAVIS can reach 1us, and the DVS is suitable for capturing objects moving at high speed.
It should be noted that the high speed and the low speed mentioned in the present application are relatively speaking, the division between the high speed and the low speed may be adjusted according to the practical application scenario, for example, a speed higher than 10KM/h may be referred to as a high speed, and a speed lower than 10KM/h may be referred to as a low speed.
5602. Prediction is performed to obtain a prediction area, and it is determined whether to trigger RGB camera shooting; if so, step 5603 is executed, and if not, step 5601 is executed.
The DVS can continuously predict the area of the target object within a period of time in the future according to the continuously monitored motion trail of the target object, and judge whether to trigger the RGB camera to shoot according to the predicted area.
The specific manner of determining the prediction area may be referred to the related description in step 3802, which is not repeated here.
After the predicted area of the target object is determined, it is judged whether the predicted area meets the preset condition; if so, the RGB camera is triggered to perform subsequent focusing and shooting, and if not, the shooting range continues to be monitored until a predicted area meeting the preset condition is obtained or shooting ends.
For example, as shown in fig. 58, when the vehicle is traveling at high speed on a road, the traveling trajectory of the vehicle may be predicted based on the moving direction and moving speed of the vehicle monitored by the DVS, so that the area into which the vehicle is about to travel, i.e., 5801 shown in fig. 58, can be predicted. When the predicted area meets the preset condition, the RGB camera can be triggered to focus; if the predicted area does not meet the preset condition, the RGB camera is not triggered to focus and the movement track of the vehicle continues to be monitored. The predicted area fails to meet the preset condition when, for example, the vehicle in the predicted area is incomplete or the area of the predicted area is too small. For example, if the vehicle has not completely entered the field of view of the lens, the RGB camera is not triggered to take a picture.
When the predicted area meets the preset condition, the DVS can transmit the predicted area as a focusing area to the RGB camera to trigger the RGB camera to shoot. In general, there may be parallax between the RGB camera and the DVS, and thus a registration operation is required. The coordinate system of the prediction area is aligned with the pixel coordinate system of the RGB camera, so that the prediction area has the same coordinate system as the field of view of the RGB camera after registration.
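For illustration, the registration step could be sketched as below, assuming a pre-calibrated planar homography between the DVS and RGB pixel coordinate systems; the homography source and function names are assumptions.

    import numpy as np
    import cv2

    def register_region_to_rgb(region_corners, H_dvs_to_rgb):
        """Map the corner points of the prediction area from DVS pixel
        coordinates into RGB-camera pixel coordinates using a homography
        assumed to come from an offline DVS/RGB calibration step.

        region_corners: (N, 2) array of corner points in DVS coordinates.
        H_dvs_to_rgb: 3x3 homography matrix.
        """
        pts = np.asarray(region_corners, dtype=np.float32).reshape(-1, 1, 2)
        mapped = cv2.perspectiveTransform(pts, H_dvs_to_rgb)
        return mapped.reshape(-1, 2)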
Specifically, the focusing area may be the current area of the target object or may be the predicted area. The focusing area can be described by geometric shape parameters: if the focusing area adopts a circumscribed rectangle, the DVS may transmit the vertex coordinates of the upper-left corner, the width, the height and the like to the RGB camera; if the focusing area adopts a polygon, the DVS may transmit each vertex of the polygon to the RGB camera in clockwise (or counterclockwise) order; if the focusing area adopts a circumscribed minimum circle, the DVS may transmit the center coordinates and the radius of the circle to the RGB camera. This may be adjusted according to the actual application scenario and is merely illustrative, not limiting.
In addition, the DVS may also transmit motion characteristics of the target object, such as a motion speed and a motion direction, to the RGB camera. The movement speed may be a change value or a change trend of the speed of the target object compared to the previous short time window. The trend may include, but is not limited to, a faster, slower, etc. speed trend state quantity, or even more levels of speed trend state quantity, such as fast, faster, very fast, slow, slower, very slow, etc. The direction of movement may also be the direction or change of direction compared to the previous short window. The direction change may include, but is not limited to, a left, right, up, down, unchanged, etc., direction trend state quantity, even more levels of direction trend state quantity, such as up left, down left, up right, down right, left, right, up, down, unchanged, etc.
5603. Focusing is performed based on the predicted area.
After determining the predicted area, the predicted area may be taken as a focusing area, and at least one in-focus point may be determined according to the focusing area and focused based on the at least one in-focus point. Specifically, focusing may be performed directly according to the point included in the focusing area, or focusing may be performed by selecting a focusing point closest to the center point of the focusing area, or the like.
Typically, an RGB camera has multiple focusing points; according to the focusing area provided by the DVS, one or more focusing points with the smallest norm distance to the focusing area are selected for focusing, and the focus is locked, i.e., those focusing points are maintained in focus. For example, referring to fig. 54, within the predicted region that the RGB camera receives from the DVS, one or more points with the smallest norm distance to the center point of the focusing area may be selected as the focusing points and locked. The focusing mode includes, but is not limited to, phase focusing or contrast focusing.
5604. And adjusting exposure parameters and shooting.
After focusing, the exposure parameters can also be adjusted according to the motion characteristics monitored by the DVS. For example, the faster the movement speed of the target object, the smaller the exposure parameter, and the slower the movement speed of the target object, the larger the exposure parameter, so that the camera can capture a clearer image. Specifically, the camera can convert the collected optical signals into electrical signals, so that a shot image in a preset range is obtained.
Illustratively, as shown in fig. 59, the focusing area 4401 is determined by predicting the running track of the vehicle and focusing is then completed; after a suitable exposure duration is set, the vehicle travels into the predicted area during the period of focusing and adjusting the exposure duration, and the shooting of the moving vehicle is completed, so that a clear image of the vehicle is obtained.
For example, in some scenes, a mapping relationship between the movement speed of the target object and the exposure time period may be established, and after the movement speed of the target object is determined, the exposure time period may be adjusted according to the mapping relationship, so that the exposure time period is matched with the movement speed of the target object, and a clearer image is captured. Specifically, the map may be a preset map, such as an exposure time of 1/60 second when the movement speed is in the first range, an exposure time of 1/360 second when the movement speed is in the second range, and the like. The mapping relationship may also be a linear relationship, an exponential relationship, an inverse proportion relationship, etc., and may specifically be adjusted according to an actual application scenario, which is not limited herein.
For another example, in some scenarios, a mapping relationship between the magnitude of the change in the movement speed of the target object and the manner of adjustment of the exposure time period may be established. For example, if the movement speed of the target object increases, the exposure time is reduced, and if the movement speed of the target object decreases, the exposure time is increased, so that the camera can capture a clearer image. More specifically, the amount of adjustment of the exposure time period may be related to the magnitude of change in the movement speed, e.g., the larger the amount of change in the movement speed, the larger the amount of adjustment of the exposure time period, and the smaller the amount of change in the movement speed, the smaller the amount of adjustment of the exposure time period.
Also for example, in some scenarios, the exposure duration may be adjusted in combination with the movement speed and movement direction of the target object. For example, the movement speed may be the speed of the target object in the actual environment; the component of this speed in the direction perpendicular to the shooting direction of the camera may be determined according to the speed and the movement direction, and the exposure duration may then be adjusted according to that component. The higher the speed in the direction perpendicular to the shooting direction of the camera, the shorter the exposure duration; the lower that speed, the longer the exposure duration.
In addition, for how to adjust the exposure parameters, reference may also be made to the following description in step 7304, which is not repeated here.
5605. Enhancing motion details.
After the shot image is obtained by shooting through the camera, the motion details of the image shot by the camera can be enhanced according to the information of the moving object in the preset range monitored by the DVS, such as the outline of the target object or the position in the image, and the like, so as to obtain a clearer target image.
It can be understood that, while shooting by the camera (hereinafter, the period of shooting by the camera is referred to as a shooting period), the DVS may continuously monitor a moving object within a preset range, obtain information of the moving object within the preset range in the shooting period, such as a contour of a target object, a position in an image, and the like, and perform noise filtering or edge sharpening and other processing on the shot image according to the information, thereby enhancing texture details or contours of the image shot by the camera, further obtaining a clearer image, and improving user experience.
Therefore, in the embodiment of the application, the motion trail of the target object can be fitted through the acquired motion information of the target object. And then obtaining a predicted area of the target object according to the motion trail of the target object obtained by fitting, wherein the predicted area is an area to be moved to by the target object within a period of time in the future, focusing and locking focus according to the predicted area, and adjusting exposure parameters according to the motion characteristics of the target object, thereby completing shooting of the moving target object. It can be understood that after a series of steps such as focusing, focus locking, and exposure parameter adjustment, the target object moves into the predicted area, i.e., the focusing area, and at this time, the target object is photographed, so that a clearer image can be photographed. Therefore, even if the target object is in a state of high-speed movement, focusing on the target object can be accurately completed, so that a clearer image is photographed.
The foregoing details of the specific flow of the image processing method provided by the present application are described in detail, and for convenience of understanding, a specific scenario is taken as an example below to describe some application scenarios of the image processing method provided by the present application, and different application scenarios are described below respectively.
Illustratively, for ease of understanding, the flow of scenario one is described in more detail below. Referring to fig. 60, another flow chart of the image processing method provided in the present application is shown.
First, motion detection is performed by the DVS, that is, a moving object within the photographing range of the RGB camera is detected, and event data is generated from information of the detected target object, taking the moving target object as an example. The DVS may generate event data in the detection range according to the change of the light intensity in the detection range, and when the difference between the current light intensity and the light intensity generated by the last event exceeds a threshold value, the DVS will generate an event to obtain data of the event. In general, event data of an event may include one or more information such as a position of a pixel point where a light intensity change occurs in an event, a pixel value of the pixel point, or a light intensity change value.
The DVS may fit a motion trajectory of the target object according to the event data obtained by monitoring, and predict an area to which the target object is about to move according to the motion trajectory of the target object, to obtain a predicted area.
Optionally, during the motion detection and prediction region obtaining process of the DVS, the RGB camera may be in a closed state, so as to reduce power consumption of the RGB camera. For example, when shooting an object moving at a high speed, such as an airplane, a vehicle, a user moving at a high speed, etc., the motion condition of the object can be monitored through the DVS, and when the obtained prediction area meets a preset condition, the DVS triggers the RGB camera to shoot, so that the power consumption generated by the RGB camera is reduced.
After the DVS obtains the prediction area, the prediction area is transmitted to the RGB camera, the RGB camera is triggered to start, and the RGB camera is instructed to focus according to the prediction area. Alternatively, the DVS may determine a focus area according to the predicted area, the range of the focus area being greater than the range of the predicted area, and then instruct the RGB camera to focus according to the focus area. The following is an exemplary explanation taking an example of instructing an RGB camera to focus according to a predicted area.
In general, before the DVS transmits the predicted area to the RGB camera, the predicted area may be registered, that is, the coordinate system where the predicted area is located is kept consistent with the coordinate system of the RGB camera, so that the RGB camera may accurately obtain the position of the predicted area in the shooting range, thereby accurately determining the focusing point.
The RGB camera can be started under the triggering of DVS, and focusing is carried out according to the prediction area. For example, an RGB camera may select one or more in-focus points closest to the norm of the center point of the predicted area to focus and lock in focus, i.e., remain in focus.
In addition, the DVS transmits the motion characteristics of the target object to the RGB camera, and the motion characteristics may include information such as a motion speed or a motion direction of the target object.
The RGB camera adjusts exposure parameters including exposure time length or exposure value and the like according to the received motion characteristics. For example, a mapping relationship between the movement speed of the target object and the corresponding exposure time period may be set, and when the movement speed of the target object is received, the exposure time period associated with the movement speed may be determined according to the mapping relationship, so as to adjust the exposure time period. Specifically, for example, as shown in table 2,
    Movement speed      Exposure duration (s)
    [0, 5)              1/60
    [5, 10)             1/200
    [10, 15)            1/500
    [15, 20)            1/800

TABLE 2
The motion speed may be calculated by coordinates of the target object within the shooting range, for example, a coordinate system may be established according to the shooting range, and the coordinate system may be a two-dimensional coordinate system or a three-dimensional coordinate system, and may be specifically adjusted according to an actual application scene. And then calculating the movement speed of the target object according to the change value of the target object in the coordinate system.
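As an illustrative sketch of this speed estimation and table lookup (assuming a two-dimensional coordinate system and the ranges of Table 2; the function names are assumptions):

    # Mapping from movement-speed range to exposure duration, as in Table 2.
    SPEED_TO_EXPOSURE = [
        ((0, 5),   1/60),
        ((5, 10),  1/200),
        ((10, 15), 1/500),
        ((15, 20), 1/800),
    ]

    def movement_speed(p0, p1, dt):
        """Estimate the movement speed from two positions of the target object
        in the chosen coordinate system, observed dt seconds apart."""
        dx, dy = p1[0] - p0[0], p1[1] - p0[1]
        return ((dx * dx + dy * dy) ** 0.5) / dt

    def exposure_for_speed(speed):
        """Look up the exposure duration associated with the movement speed."""
        for (lo, hi), exposure in SPEED_TO_EXPOSURE:
            if lo <= speed < hi:
                return exposure
        return SPEED_TO_EXPOSURE[-1][1]   # clamp to the shortest listed exposure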
After the exposure adjustment is performed, image signals within a photographing range are collected by a photosensitive element of the RGB camera, and the collected image signals are processed, for example, the collected analog signals are converted into electrical signals, thereby obtaining a photographed image.
While the RGB camera shoots, the DVS can continuously monitor the movement condition of the target object in the shooting range, so that event data in the shooting period can be obtained.
After the RGB camera shoots an image in a shooting range, the image and event data in the same time period can be fused, so that motion details of the shot image are enhanced, and a clearer target image is obtained.
Illustratively, as shown in fig. 61, the DVS event in the shooting period may include a contour of a moving vehicle, and the image shot by the RGB camera, that is, the RGB image shown in fig. 61, may be fused according to the DVS event, so as to enhance the motion details of the RGB image, such as filtering noise, edge sharpening, and so on, so as to obtain the target image after enhancing the motion details. The enhanced image may be displayed in a display interface or stored in a storage medium of an electronic device as a final output.
For example, a more specific manner of capturing a target image by an RGB camera and DVS may refer to fig. 62. The DVS monitors an object moving in a shooting range, acquires a long time window, fits a motion track of the target object in a time window segmentation mode, predicts an area where the target object is located in a period of time in the future according to the motion track obtained by fitting, and obtains a prediction area. When the prediction area meets the preset condition, the RGB camera is triggered to start, and focusing is carried out according to the prediction area.
Secondly, the DVS calculates running characteristics such as the movement speed or the movement direction of the target object according to the monitored movement track of the target object, and transmits the running characteristics to the RGB camera. The RGB camera adjusts the exposure parameters according to the motion characteristics to use the exposure parameters matched with the motion characteristics, such as exposure time length, exposure value and the like.
After the exposure parameters are adjusted, shooting is carried out, signals acquired by the photosensitive elements are converted into electric signals, and an RGB image obtained through shooting is obtained.
When the RGB camera focuses, adjusts exposure parameters and outputs RGB images, the DVS continuously monitors the moving object in the shooting range to obtain event data in the shooting period, wherein the event data comprise the outline of the moving object, the position of the moving object in a preset area and the like.
The RGB image may then be enhanced by a processor of the electronic device based on the event data collected by the DVS, such as to filter out noise, edge sharpening, etc., to obtain a clearer target image.
Therefore, in the scene, for a high-speed moving object, focusing can be performed in advance by predicting the area where the target object is located within a period of time in the future, so that a clear moving image can be shot. And the exposure parameters can be adjusted to expose the target object in a manner of matching with the movement speed, so that a clearer image is further shot by the camera. In addition, the motion details of the shot image can be enhanced through the event detected by the DVS in the same time period, so that a clearer target image can be obtained.
Scene two
For example, a process of photographing non-high-speed motion may refer to fig. 63. Scenes of non-high-speed motion include, for example, security monitoring and entrance guard (access control) scenarios.
6301. DVS performs motion monitoring.
In this scenario, the target object may be an object that moves at a low speed.
Specifically, step 6301 may refer to the related description in step 5201, which is not described herein.
For example, in the second scenario, as shown in fig. 64, an RGB camera and a DVS may be provided in the entrance guard device, and devices such as an ISP or a display may further be provided; this is merely illustrative and is not described in detail here.
6302. Judging whether the RGB camera shooting is triggered or not according to the current area of the target object, if so, executing step 6303, and if not, executing step 6301.
In the scene, because the target object moves at a low speed, whether the RGB camera is triggered to shoot can be judged according to the current area of the target object. Specifically, it may be determined whether the current area of the target object meets the preset condition, if so, step 6303 is executed, and if not, step 6301 is executed.
For example, it may be determined whether the target object in the area where it currently exists is complete, whether the area of that current region is greater than a preset value, and the like. When the target object in the current area is complete, or the area of the current area is larger than a preset value, the DVS may send the current area as a focusing area to the RGB camera, to trigger the RGB camera to start and to shoot according to the current area.
For example, as shown in fig. 65, when there is a target object entering the monitoring range of the entrance guard and abnormal movement occurs, such as approaching the entrance guard or touching the entrance guard, the area where the object may exist covers the photographing ranges of the DVS and the RGB cameras, resulting in the DVS detecting a change in the illumination intensity. For example, the entrance guard is a community public entrance guard, and when personnel enter the front of the entrance guard, the light of a corridor can be blocked, so that the light intensity in the whole visual field is reduced. When the DVS monitors the moving object according to the change of the illumination intensity, as shown in 1801 in fig. 65, the current area of the target object may be monitored, and then it is determined whether the area of the current area of the target object is greater than a preset value, or whether the target object in the current area of the target object is complete, etc., to determine whether to trigger the RGB camera to shoot. When the trigger RGB camera shooting is determined, the DVS can transmit the current area of the target object to the RGB camera as a focusing area, the RGB camera can focus based on the current area of the target object, and the exposure parameters are adjusted according to the motion characteristics of the target object, so that shooting of the target object is completed, and an RGB image of the target object is obtained. Meanwhile, the DVS may continuously monitor the region where the target object is located during the photographing period.
6303. Focusing is performed based on the area where the target object is currently located.
Focusing based on the current motion area is similar to focusing based on the predicted area, and is not described again here. Step 6303 is similar to step 5203 described above and is not repeated here.
6304. And adjusting exposure parameters and shooting.
In this scenario, the exposure parameters may be adjusted according to the light intensity. Specifically, the exposure parameter may include an exposure time period that has a negative correlation with the intensity of light in the photographing range.
The illumination intensity value used for adjusting the exposure parameter may be an illumination intensity value collected by DVS, or an illumination intensity value collected by an RGB camera or other devices, specifically may be adjusted according to an actual application scene, and is not limited herein.
For example, the change in the average light intensity can be estimated from the rate of occurrence of DVS events overall; the average light intensity L and the DVS event rate R are in positive correlation, i.e., L ∝ R. The exposure parameters may be adjusted according to this relationship: the exposure duration is increased, e.g., from 1/100 second to 1/30 second, when the estimated average light intensity decreases, and decreased, e.g., from 1/30 second to 1/100 second, when the estimated average light intensity increases.
For another example, the value of the average light intensity may be calculated, and the exposure parameter may then be determined based on that value: the larger the value of the average light intensity, the shorter the exposure duration; the smaller the value, the longer the exposure duration. In this way, the exposure duration of the camera matches the value of the average light intensity, so that the image within the shooting range can be adequately exposed, a clearer image is obtained, and user experience is improved.
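A minimal sketch of the event-rate-based adjustment described above follows, for illustration only; the gear values and the 20% relative-change threshold are assumptions.

    def adjust_exposure_by_event_rate(prev_rate, curr_rate, exposure_s,
                                      longer_s=1/30, shorter_s=1/100,
                                      rel_change=0.2):
        """Adjust the exposure duration from the change of the overall DVS
        event rate, used as a proxy for average light intensity (L is taken
        to be positively correlated with R)."""
        if prev_rate <= 0:
            return exposure_s
        change = (curr_rate - prev_rate) / prev_rate
        if change < -rel_change:       # estimated light intensity decreased
            return longer_s            # e.g. 1/100 s -> 1/30 s
        if change > rel_change:        # estimated light intensity increased
            return shorter_s           # e.g. 1/30 s -> 1/100 s
        return exposure_s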
6305. Enhancing motion details.
Step 6305 is similar to step 5605 and is not described again here.
Therefore, in the application scene, focusing can be performed according to the current region where the target object is monitored by the DVS, and the region where the moving object is can be accurately identified, so that accurate focusing is performed. In addition, the exposure parameters can be adjusted according to the light intensity, so that the RGB camera can be accurately adapted to the light intensity, and a clearer image can be shot. In addition, the application scene can also enhance the motion details of the shot image through the event detected by the DVS in the same period, so as to obtain a clearer target image.
In addition, in this scenario, particularly in some monitoring scenarios, if the RGB camera were used continuously for monitoring, large power consumption would result; for example, the power consumption of an RGB camera shooting continuously is typically hundreds of milliwatts to tens of watts, and the amount of data generated is large. With the image processing method provided in the present application, the RGB camera starts shooting only when the DVS detects a moving object, and DVS power consumption is usually tens of milliwatts; for example, the power consumption of a DAVIS346-type sensor is 10-30 milliwatts, so power consumption can be reduced. Moreover, the DVS only acquires the outline of the moving object, so the user's detailed data, such as privacy data, is not captured, which can improve user experience. Abnormal movement can also be photographed, and a subsequent alarm operation can be performed according to the captured image, with specific adjustments made according to the actual application scenario, thereby improving safety. It can be understood that the image processing method provided in the present application monitors external motion in real time with low power consumption through the DVS and triggers the RGB camera to work only when abnormal motion is determined, which gives a power-consumption advantage; meanwhile, the events output by the DVS do not contain specific texture details and only provide the outline and position of the moving object, which gives privacy and security advantages.
The present application also provides a graphical user interface (graphical user interface, GUI) that may be applied in an electronic device, such as a terminal, a monitoring device, an autonomous vehicle, etc., that may include a display screen, a memory, one or more processors for executing one or more computer programs stored in the memory, such as the steps of the image processing method mentioned in the foregoing fig. 52-65, and for displaying, via the display screen, pictures taken by the cameras in the foregoing fig. 52-65.
The GUI provided in the present application is described in detail below.
The graphical user interface comprises: responding to a triggering operation of shooting a target object, shooting an image of a preset range according to focusing information, displaying the image of the preset range, wherein the preset range is a camera shooting range, the focusing information comprises parameters for focusing the target object in the preset range, the focusing information is determined according to movement information of the target object, and the movement information comprises information of a movement track of the target object when moving in the preset range.
As an example, as shown in fig. 66, the GUI may specifically include, in response to detecting movement information of the target object, information of a movement locus of the target object within a preset range, which is a camera shooting range; then, determining focusing information according to the motion information, wherein the focusing information comprises parameters for focusing a target object in a preset range; then, the target object is focused in a preset range according to the focusing information, and after an image of the vehicle is shot by the camera, the shot image is displayed in a display screen, and the image can comprise the vehicle in high-speed movement.
Therefore, in the embodiment of the application, the movement track of the moving target object can be detected in the shooting range of the camera, and then the focusing information is determined according to the movement track of the target object and focusing is completed, so that a clearer image can be shot. Even if the target object is in motion, the target object can be accurately focused, a clear motion state image is shot, and user experience is improved.
In one possible implementation, the focusing information includes information of a focusing area, and the graphical user interface may further include: and responding to the motion information, predicting the motion trail of the target object within a preset time length to obtain a prediction area, determining the focusing area according to the prediction area, and displaying the focusing area in the display screen.
For example, as shown in fig. 67, when the vehicle is in a high-speed motion state, the motion trajectory of the vehicle in a preset time period in the future may be predicted according to the detected motion trajectory of the vehicle in the shooting range, a predicted region in which the vehicle will arrive in the future is obtained, the region is taken as a focusing region 6701, and focusing is performed based on the focusing region 6701, as shown in fig. 68, so that a clearer image of the target object is shot.
Therefore, in the embodiment of the application, the motion track of the target object within the future preset time period can be predicted, the focusing area is determined according to the predicted area, and the focusing on the target object can be accurately completed. Even if the target object moves at a high speed, the embodiment of the application can focus the target object in advance in a prediction mode, so that the target object is in a focusing area, and a clearer target object moving at a high speed is shot.
In one possible implementation, the graphical user interface may specifically include: if the predicted area meets the preset condition, displaying the focusing area on the display screen in response to the focusing area being determined according to the predicted area; if the predicted area does not meet the preset condition, obtaining a new predicted area in response to the motion trail of the target object within the preset time period being predicted again according to the motion information, determining the focusing area according to the new predicted area, and displaying the focusing area on the display screen.
The preset condition may be that the prediction area includes a complete target object, or that the area of the prediction area is larger than a preset value, or the like.
For example, as shown in fig. 69A, when the target object photographed by the camera is incomplete, the predicted area for the target object may be small, that is, the focusing area 6901 is smaller than the area of the vehicle, which may result in part of the subsequently photographed vehicle being unclear. When the vehicle body is completely within the photographing range, as shown in fig. 69B, a predicted area of the desired size, i.e., the focusing area 6902, can be obtained, so that a complete and clear image of the vehicle is photographed based on the focusing area 6902, as shown in fig. 69C.
Therefore, in the embodiment of the application, the focusing area is determined according to the prediction area only when the prediction area meets the preset condition, and the camera is triggered to shoot, and when the prediction area does not meet the preset condition, the camera is not triggered to shoot, so that incomplete target objects in a shot image can be avoided, or meaningless shooting can be avoided. And when shooting is not performed, the camera can be in an unactuated state, and the camera is triggered to perform shooting only when the predicted area meets the preset condition, so that the power consumption generated by the camera can be reduced.
In a possible embodiment, the motion information further includes at least one of a motion direction and a motion speed of the target object; the graphical user interface may specifically include: and responding to the motion trail when the target object moves within a preset range, and predicting the motion trail of the target object within a preset time length according to the motion direction and/or the motion speed to obtain the prediction area, and displaying the prediction area in the display screen.
Therefore, in the embodiment of the application, the motion trail of the target object in the future preset duration can be predicted according to the motion trail of the target object in the preset range, the motion direction and/or the motion speed and the like, so that the future motion trail of the target object can be accurately predicted, the target object can be more accurately focused, and a clearer image can be shot.
In one possible implementation, the graphical user interface may specifically include: and responding to a motion track of the target object in a preset range, the motion direction and/or the motion speed, fitting a change function of the central point of the area where the target object is located along with the change of time, calculating a prediction central point according to the change function, wherein the prediction central point is the central point of the area where the target object is located, and displaying the prediction area in a display screen.
In a possible implementation manner, the image of the preset range is captured by an RGB camera, and the graphical user interface may specifically include: in response to focusing on at least one point, among the plurality of focusing points of the RGB camera, with the smallest norm distance to the central point of the focusing area, displaying on the display screen an image shot after focusing based on the at least one point as the focusing point.
In one possible implementation manner, the focusing information includes information of a focusing area, the motion information includes an area where the target object is currently located, and the graphical user interface specifically may include: and responding to the current area of the target object as the focusing area, and displaying the focusing area in the display screen.
For example, as shown in fig. 70, the target object may be a pedestrian moving at low speed. Since the moving speed of the target object is low, the area where the target object is currently located may be directly used as the focusing area 7001, and focusing is then performed based on the focusing area 7001, so that a clear image can be obtained.
Therefore, in the embodiment of the present application, the information of the motion trail of the target object in the preset range may include the current area and the historical area of the target object, and in some low-speed scenes, the current area of the target object may be used as a focusing area, so as to complete focusing on the target object, and further, a clearer image may be shot.
In one possible implementation, the graphical user interface may specifically include: acquiring exposure parameters before shooting the image in the preset range, and displaying the exposure parameters in a display screen; and responding to the image of the preset range shot according to the exposure parameters, and displaying the image of the preset range shot according to the exposure parameters in a display screen. Therefore, in the embodiment of the application, the exposure parameters can be adjusted, so that shooting is completed through the exposure parameters, and a clear image is obtained.
Specifically, the exposure parameters may include parameters such as EV, exposure duration, exposure amount, aperture size, or ISO, and when an image is shot, the exposure parameters may be displayed in the shooting interface, so that a user may obtain a current shooting situation according to the displayed exposure parameters, and user experience is improved.
For example, as shown in fig. 71, the exposure parameters may include EV. When capturing an image, if EV = 6, "EV: 6" may be displayed in the display interface, so that the user can learn the current shooting situation from the displayed EV value, improving user experience.
In one possible implementation manner, the exposure parameter is determined according to the motion information, and the exposure parameter includes an exposure time period, where the exposure time period has a negative correlation with the motion speed of the target object.
Therefore, in the embodiment of the application, the exposure time length can be determined by the movement speed of the target object, so that the exposure time length is matched with the movement speed of the target object, for example, the faster the movement speed is, the shorter the exposure time length is, the slower the movement speed is, and the longer the exposure time length is. Overexposure or underexposure and the like can be avoided, so that a clearer image can be shot later, and user experience is improved.
In one possible implementation manner, the exposure parameter is determined according to illumination intensity, the illumination intensity can be illumination intensity detected by a camera or illumination intensity detected by a motion sensor, the exposure parameter comprises exposure duration, and the magnitude of the illumination intensity in the preset range is in negative correlation with the exposure duration.
Therefore, in the embodiment of the application, the exposure time can be determined according to the detected illumination intensity, when the illumination intensity is larger, the exposure time is shorter, and when the illumination intensity is smaller, the exposure time is longer, so that a proper amount of exposure can be ensured, and a clearer image can be shot.
In one possible implementation, the graphical user interface may further comprise: in response to the monitored motion information of the target object corresponding to the image, fusing that information with the image of the preset range to obtain a target image of the preset range, and displaying the target image on the display screen.
Therefore, in the embodiment of the application, while capturing an image, the motion condition of the target object in the preset range can be monitored, information of the corresponding motion of the target object in the image, such as the contour of the target object, the position of the target object in the preset range and the like, is obtained, and the captured image is enhanced by the information, so that a clearer image is obtained.
Illustratively, the DVS may collect the contour of the moving target object, so that the image collected by the RGB camera may be enhanced according to the contour of the target object collected by the DVS, where the image collected by the RGB camera may be as shown in fig. 72A, for example, to eliminate noise of the contour of the target object, enhance the contour of the target object, and so on, so as to obtain a clearer image of the target object, as shown in fig. 72B.
In one possible implementation, the motion information is obtained by monitoring the motion condition of the target object within the preset range through a dynamic vision sensor DVS.
Therefore, in the embodiment of the application, the object moving in the shooting range of the camera can be monitored by the DVS, so that accurate movement information can be obtained, and even if the target object is in a state of moving at a high speed, the movement information of the target object can be timely captured by the DVS.
Manner of acquiring an image in an HDR shooting scene
First, in a shooting scenario in an HDR mode, referring to fig. 73, a flowchart of an image processing method provided in the present application is shown. Note that, the same or similar terms or steps as those in the first embodiment are not described in detail below.
7301. A first event image and a plurality of photographed RGB images are acquired.
The first event image may be an image acquired by a motion sensor, including information of an object moving within a preset range, where the preset range may be understood as a detection range of the motion sensor, and the preset range includes a shooting range of a camera. For ease of understanding, reference to an event image in this application may be understood as a dynamic image generated based on information acquired by a motion sensor over a period of time, representing a change in motion of an object moving relative to the motion sensor over a detection range of the motion sensor over a period of time.
Note that, the first event image and the second event image mentioned in the second embodiment refer to event images used in the shooting scene in fig. 73 to 80, and the first event image and the second event image mentioned in fig. 95 to 108 or fig. 118 to 120 corresponding to SLAM below may be the same event image or different event images, and may be specifically adjusted according to the actual application scene.
The plurality of RGB images (or referred to as first images) may be images captured using different exposure durations. For example, the images shot with short exposure time length and the images shot with long exposure time length can be used in the multiple images, in general, the longer the exposure time length is, the more texture information can be acquired in the weak light scene, the shorter the exposure time length is, the more texture information can be acquired in the strong light scene, and therefore, the images with rich textures can be acquired through different exposure time lengths.
For example, referring to fig. 36, the first event image may be acquired by the DVS when the camera captures a plurality of RGB images, where the first event image may include information such as a size of an area where the target object is located, a frame or coordinates of a corner point within a preset range when the object moves within a capturing range of the camera within a period of time.
Specifically, for example, an image made up of the data acquired by the DVS may be expressed as D(x, y) = Σ_{t∈[t_1, t_2]} events(x, y, t), where (x, y) represents the coordinates of a position in the image, t represents time, t_1 is the time 50 milliseconds before capture of the exposure image starts, t_2 is the time at which capture of the exposure image starts, and events represents the data acquired by the motion sensor, such as the DVS.
More specifically, the first event image is similar to the motion information mentioned in the foregoing step 5201, and will not be described here again.
Alternatively, the event images mentioned in this embodiment may be optimized by the method flows corresponding to fig. 38 to fig. 44, so as to obtain clearer event images.
In the present embodiment, for ease of understanding, the image captured by the RGB camera is referred to as an RGB image, and the information acquired by the motion sensor is referred to as an event image.
7302. And calculating the corresponding dithering degree of each RGB image according to the first event image.
After the first event image and the plurality of RGB images are obtained, the first event image is used to calculate the shake degree corresponding to each RGB image. The shake degree may be understood as the degree of camera shake when an RGB image is captured, or the degree of blur caused by an object in the capturing range being in motion when the RGB image is captured, or a combination of both.
Specifically, the first event image includes information such as a position, a contour, and the like of an object in a moving state within a photographing range of the camera for a period of time that covers photographing periods of a plurality of RGB images. Therefore, the degree of shake of each RGB image can be calculated by the information of the object in motion included in the photographing period of each RGB image in the first event image.
By way of example, the degree of shake when capturing RGB images may be quantified as Blur_e = (1/(H·W)) · Σ_{(x,y)} r_e(x, y), with r_e(x, y) = α · |Σ_{t∈[t_e0, t_e0+t_e]} events(x, y, t)|, where Blur_e measures the degree of blur of the e-th RGB exposure image, i.e. the degree of shake when that image is taken, t_e0 is the time at which capture of the exposure image starts, t_e is the exposure duration of the current RGB image, H·W represents the height and width of the current RGB image, (x, y) represents a position in the RGB image, r_e(x, y) represents the local blur degree in the e-th RGB exposure image, events represents the DVS-acquired data, and α is a normalization factor used to normalize r_e(x, y) to the range [0, 1].
It should be noted that, the manner of quantifying the jitter degree in this application is merely an example, and specifically, the jitter degree may be quantified by using other manners, such as a Brenner gradient function, a variance function, or an entropy function, which may be specifically adjusted according to an actual application scenario, which is not limited in this application.
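As an illustration only, the shake-degree quantification described above can be sketched in Python with NumPy as follows; the per-pixel event-count representation of the DVS data, the array shapes, and normalization by the maximum local response are assumptions of this sketch rather than details fixed by the embodiment.

```python
import numpy as np

def shake_degree(event_counts: np.ndarray) -> float:
    """Quantify the shake degree Blur_e of one RGB exposure image.

    event_counts: H x W array, number of DVS events accumulated at each pixel
    position during the exposure window [t_e0, t_e0 + t_e] of the e-th RGB image
    (an assumed representation of the event data).
    Returns a value in [0, 1]; larger means more shake / motion blur.
    """
    h, w = event_counts.shape
    r = np.abs(event_counts).astype(np.float64)      # local blur response r_e(x, y)
    alpha = 1.0 / r.max() if r.max() > 0 else 1.0    # normalization factor to [0, 1]
    r_norm = alpha * r
    return float(r_norm.sum() / (h * w))             # average local blur over H x W

# usage example: a synthetic event image with a small moving region
events = np.zeros((480, 640), dtype=np.int32)
events[200:240, 300:360] = 5                         # events fired where the object moved
print(round(shake_degree(events), 4))
```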
7303. Whether the RGB image needs to be captured is determined, if yes, step 7304 is executed, and if not, step 7306 is executed.
After calculating the shake degree of capturing each RGB image, it is determined whether or not the RGB image needs to be captured again based on the shake degree corresponding to each RGB image, if the RGB image needs to be captured again, step 7304 may be continuously performed, and if the RGB image does not need to be captured again, step 7306 may be continuously performed.
Specifically, it can be determined whether the shake degree of each RGB image exceeds a first preset value. If the shake degree of a certain image exceeds the first preset value, an additional RGB image can be captured; if the shake degree of none of the RGB images exceeds the first preset value, no additional RGB image needs to be captured.
In one possible implementation manner, in addition to determining whether the shake degree of each RGB image exceeds the first preset value, it may also be determined whether the number of times of re-shooting the RGB image exceeds a certain number of times, if the number of times of re-shooting exceeds a certain number of times, the re-shooting may not be performed any more, and if the number of times of re-shooting does not exceed a certain number of times, the re-shooting of the RGB image may be continued. For example, the number of re-shooting times may be set in advance to not more than 5 times, and when the number of re-shooting times reaches 5 times, re-shooting of the RGB image is not performed even if the degree of shake of the RGB image is high.
For example, the re-capture criterion may be expressed as: when Blur_e ≥ threshold_2, the RGB image needs to be re-captured, where threshold_2 is the first preset value.
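A minimal sketch of this re-capture decision, combining the first preset value with the retry-count limit from the preceding paragraphs, might look as follows; the numeric threshold and the parameter names are illustrative assumptions.

```python
def needs_recapture(blur_degrees, threshold_2=0.3, retries_done=0, max_retries=5):
    """Decide whether an additional RGB image should be captured.

    blur_degrees: list of Blur_e values, one per captured RGB image.
    threshold_2:  the first preset value for the shake degree (assumed 0.3 here).
    Returns True only if some image is too shaky and the retry budget
    (e.g. at most 5 re-captures) has not been exhausted yet.
    """
    if retries_done >= max_retries:
        return False
    return any(b >= threshold_2 for b in blur_degrees)

print(needs_recapture([0.05, 0.42, 0.10]))             # True: one image exceeds threshold_2
print(needs_recapture([0.05, 0.42], retries_done=5))   # False: retry budget used up
```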
In another possible implementation manner, when the shake degree of one or more RGB images (but not all RGB images) is higher than the first preset value, the remaining RGB images with shake degrees not higher than the first preset value may be used for fusion, so as to obtain a final image, thereby improving the efficiency of obtaining the image.
For ease of understanding, an image with a lower degree of shake may be illustrated in fig. 74, and an image with a higher degree of shake in fig. 75. Clearly, the information included in an image with a higher shake degree is inaccurate and appears blurred in the visual interface; if such an image is used for fusion, the information included in the resulting target image is inaccurate and may even appear blurred, so in this case one or more RGB images need to be re-captured. For another example, in a scene where the ratio of the strongest illumination intensity to the weakest illumination intensity is large, as shown in fig. 76, the captured RGB image may be overexposed, so that the most strongly lit portion of the image is unclear, whereas the image captured by the DVS has high sharpness; the DVS data can therefore be used to determine whether the RGB image needs to be re-captured, thereby obtaining a clearer RGB image.
In one possible implementation, the method for determining whether the RGB image needs to be re-captured may further include: the first event image is divided into a plurality of regions, and the RGB image with the smallest exposure value among the plurality of RGB images (also referred to as a third image) is likewise divided into a plurality of regions, where the shape and position of the regions in the first event image correspond to the shape and position of the regions in the RGB image. For example, if the first event image is divided into 16 rectangular regions, the RGB image with the smallest exposure value may be divided into 16 rectangular regions of the same shape, size, and position as the regions in the first event image. The exposure value may include one or more of an exposure duration, an exposure amount, or an exposure level. Then, it is calculated whether each region in the first event image includes texture information (also referred to as first texture information) and whether each region in the RGB image with the smallest exposure value includes texture information, and each region in the first event image is compared with the corresponding region in the RGB image with the smallest exposure value. If a region in the first event image includes texture information while the corresponding region in the RGB image with the smallest exposure value does not, that region of the RGB image has a higher degree of blur, and an additional RGB image can be captured. If no region in the first event image includes texture information, the RGB image does not need to be re-captured.
For example, in a scene where the ratio between the maximum illumination intensity and the minimum illumination intensity is large, that is, a scene with strong contrast between bright and dark regions, the first event image and the RGB image with the smallest exposure value are both divided into regions of the same shape and size, and it is then calculated whether each region in the first event image and in the RGB image with the smallest exposure value includes texture information. As shown in fig. 77, the first event image and the RGB image with the smallest exposure value may be divided into a plurality of macroblocks. Macroblocks on the first event image whose variance is greater than a preset threshold threshold_0 are denoted as the macroblock set {MB_i}, i.e. each macroblock MB_i includes texture information. Correspondingly, the corresponding macroblock regions are found in the RGB image I_e with the smallest exposure value, and the pixel variance of these macroblock regions in I_e is computed to determine whether the texture contained in the first event image has been captured by the RGB image with the smallest exposure value. If there is a macroblock MB_i whose pixel variance in I_e is smaller than a preset threshold threshold_1, the texture of this region has not been completely captured by the RGB image, and an additional RGB image needs to be captured.
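The macroblock comparison described above can be sketched roughly as below; the macroblock size, the values of threshold_0 and threshold_1, and the use of plain pixel variance as the texture measure are assumptions of this sketch.

```python
import numpy as np

def recapture_needed(event_img, rgb_min_exposure, mb=16, thr0=4.0, thr1=25.0):
    """Compare texture per macroblock between the event image and the
    shortest-exposure RGB image; return True if some textured macroblock of
    the event image shows too little texture (low pixel variance) in the RGB
    image, i.e. an additional RGB image should be captured."""
    h, w = event_img.shape
    for y in range(0, h - mb + 1, mb):
        for x in range(0, w - mb + 1, mb):
            ev_block = event_img[y:y + mb, x:x + mb].astype(np.float64)
            rgb_block = rgb_min_exposure[y:y + mb, x:x + mb].astype(np.float64)
            if ev_block.var() > thr0 and rgb_block.var() < thr1:
                return True      # texture seen by the DVS but missing in the RGB image
    return False
```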
In one possible implementation manner, texture features included in the first event image and features in the third image may also be extracted and matched against each other; for example, the closer the Euclidean distance between two features, the higher the matching degree, and the farther the Euclidean distance, the lower the matching degree. If the matching degree is low, for example lower than 0.8, the texture information of the moving object may not have been completely captured in the RGB image, and the RGB image needs to be re-captured. The features extracted from the first event image or the RGB image may be scale-invariant feature transform (scale invariant feature transform, SIFT) features, features extracted by a deep neural network, or a gray-level histogram, which may be adjusted according to the actual application scenario; this is not limited in this application.
Alternatively, in one possible embodiment, the size of the area where the event is detected in the first event image may be calculated, if the size of the area where the event is detected exceeds a preset size, it is determined that the RGB image needs to be captured, if the size of the area where the event is detected does not exceed the preset size, it may be unnecessary to capture the RGB image, or it may be determined by other embodiments whether the RGB image needs to be captured.
In addition, in some scenes, if the location of the event area is located in the central area of the first event image or RGB image, such as a center point of the RGB image is covered, it is necessary to additionally capture the RGB image. If the area is in the peripheral area of the RGB image, for example, near the boundary line of the RGB image, and the area of the event area is smaller than a certain value, the RGB image may not need to be additionally captured. Alternatively, it is also possible to determine whether or not the supplementary photographing of the RGB image is required based on the distance between the event area and the center point of the event image. For example, if the distance between the event area and the center point of the event image is smaller than a preset distance, if the distance is smaller than 200 pixels, the RGB image needs to be shot in a supplementary manner, and if the distance is not smaller than 200 pixels, the RGB image does not need to be shot in a supplementary manner, and the adjustment can be specifically performed according to the actual application scene.
Specifically, for example, an event image corresponding to an RGB image may be expressed as D(x, y) = Σ_t events(x, y, t), accumulated over the exposure period of the RGB image. The event image is divided into a plurality of macroblocks of the same size that do not overlap, and the size of the motion region is measured by the number of motion macroblocks: a macroblock is determined to be a motion macroblock if and only if the number of non-zero pixels contained in that macroblock of the event image is greater than a preset threshold threshold_3. For example, if the event image contains 16×16 macroblocks, threshold_3 may be set to 128. When the number of motion macroblocks included in the motion region exceeds 128, the RGB image needs to be re-captured; when it does not exceed 128, the RGB image does not need to be re-captured, or whether the RGB image needs to be re-captured may be determined in other ways.
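Under the same macroblock division, the motion-region size test described above might be sketched as follows; the macroblock grid, threshold_3 = 128, and the motion-macroblock count limit are taken from the example above, while the array representation of the event image is an assumption.

```python
import numpy as np

def too_much_motion(event_img, mb_grid=(16, 16), thr3=128, max_motion_blocks=128):
    """Count motion macroblocks (macroblocks with more than thr3 non-zero event
    pixels) and report whether the motion region is large enough that the RGB
    image should be re-captured."""
    h, w = event_img.shape
    bh, bw = h // mb_grid[0], w // mb_grid[1]
    motion_blocks = 0
    for i in range(mb_grid[0]):
        for j in range(mb_grid[1]):
            block = event_img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            if np.count_nonzero(block) > thr3:
                motion_blocks += 1
    return motion_blocks > max_motion_blocks
```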
In a possible implementation, the range captured by the motion sensor and the capturing range of the RGB camera may be different, and in this scenario, before step 7302, the first event image and the plurality of RGB images need to be aligned, so that the capturing range corresponding to the first event image matches the capturing range corresponding to the RGB image.
7304. The exposure parameters are calculated.
The exposure parameter may include one or more of an exposure time period, an exposure amount, an exposure level, or the like.
For example, the exposure duration may be calculated as follows. The true pixel values of the overexposed regions are estimated from the DVS data corresponding to the image with the shortest exposure, for example as log f̂(x, y) = log I_short(x, y) + C · Σ_t p(x, y, t), where C is the threshold of the DVS camera, typically C = 2, and p(x, y, t) ∈ {−1, 0, 1} is a signed event at pixel position (x, y) at time t: compared with the previous moment, the event is −1 when the light intensity decreases, 1 when the light intensity increases, and no event occurs (recorded as 0) when the light intensity is unchanged. Based on the camera response curve (Camera Response Function, CRF), the exposure values of the overexposed regions are obtained from the estimated pixel values, and the optimal exposure time is then estimated from these exposure values so that they fall within [Vmin, Vmax], where Vmin = 0 and Vmax = 255.
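A rough sketch of this exposure-time estimation is given below, under the simplifying assumptions that the camera response is linear and that the per-pixel signed event sums are already available; the linear-CRF shortcut, the parameter names, and the way the optimal exposure time is chosen are assumptions of the sketch, not details given by the embodiment.

```python
import numpy as np

C = 2.0                      # DVS contrast threshold, typically C = 2
VMIN, VMAX = 0.0, 255.0

def estimate_exposure_time(short_img, short_t, signed_event_sum, overexposed_mask):
    """Estimate a better exposure time for the overexposed regions.

    short_img:        shortest-exposure RGB image (H x W, values in 0..255).
    short_t:          its exposure time in seconds.
    signed_event_sum: per-pixel sum of signed events p(x, y, t) over the same
                      window (assumed representation of the DVS data).
    overexposed_mask: boolean H x W mask of the overexposed region.
    """
    # estimate true (log) intensity of the overexposed pixels from the events
    log_true = np.log(np.clip(short_img, 1.0, None)) + C * signed_event_sum
    true_intensity = np.exp(log_true)[overexposed_mask]
    # with a linear response assumption, exposure value scales with exposure time;
    # pick the time that maps the brightest recovered intensity just below VMAX
    peak = true_intensity.max()
    return short_t * (VMAX - 1.0) / peak if peak > 0 else short_t
```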
In addition, if the dithering degree of a certain RGB image is higher than the first preset value, the exposure parameter corresponding to the RGB image may be directly used when calculating the exposure parameter for re-shooting.
For another example, the manner of calculating the exposure parameters here may refer to the manner of adjusting the exposure parameters in the foregoing embodiment, such as the manner of adjusting the exposure mentioned in the foregoing examples corresponding to step 5604, step 7404, fig. 60 or fig. 62.
It should be noted that, step 7304 is an optional step, the exposure parameters, such as the exposure time, the exposure amount, or the exposure level, may be recalculated, or the exposure parameters corresponding to the RGB image with the shake degree greater than the first preset value may be used for recapturing, or the like.
7305. And re-shooting to obtain an RGB image with the jitter degree not higher than a first preset value.
When the RGB image is captured in a complementary manner, the motion sensor may be used to capture an image of a subject moving in the capturing range, and then an event image captured by the motion sensor during the capturing of the complementary RGB image is used to calculate the shake degree corresponding to the complementary RGB image. When the shake degree is higher than the first preset value, step 7303 may be continued until the shake degree of the RGB image obtained by the additional shooting is not higher than the first preset value, or the number of additional shots reaches a preset number of times, or the like.
If step 7304 is performed before step 7305, i.e. the exposure parameters are calculated, the RGB image may be re-captured using those exposure parameters; if step 7304 is not performed, the new RGB image, also referred to as the second image, may be obtained by shooting with the exposure parameters corresponding to the RGB image whose shake degree is greater than the first preset value.
In addition, if the RGB image is captured in a scene where a moving object exists in the capturing range, the specific manner of capturing the RGB image in a complementary manner may refer to the first embodiment, for example, the motion trail of the object in the capturing range may be predicted by the data collected by the DVS, so that focusing is performed according to the prediction result, and a clearer image is captured, which is not described herein.
Therefore, in this embodiment of the application, the exposure strategy is adaptively adjusted using the information acquired by the dynamic sensing camera (i.e. the motion sensor): the high-dynamic-range sensing characteristic of the dynamic sensing information is used to perceive the texture in the shooting range, an image with a suitable exposure time is adaptively captured in addition, and the ability of the camera to capture texture information in strong-light or dark regions is improved.
7306. And calculating fusion weights according to the dithering degree of each RGB image, and fusing a plurality of RGB images according to the fusion weights of each image to obtain a target image.
After obtaining a plurality of RGB images with the jitter degree not exceeding a first preset value, calculating fusion weights corresponding to each RGB image according to the jitter degree of each RGB image, and fusing the plurality of RGB images according to the fusion weights of each RGB image to obtain a high-definition target image.
In the process of fusing a plurality of RGB images, RGB images with different exposure durations can be fused so that the fused target image is clearer. In general, the RGB images are first aligned, which may be accomplished by calculating optical flow, finding the corresponding positions between the plurality of RGB images based on pixel points or feature points, calculating an offset, and aligning the images according to that offset.
For example, a specific way of aligning RGB images may be as follows. First, local optical flow information is calculated from the event data. The calculation method is: assuming that the event data p(x, y, t) in a local space Ω lie on the same plane, the plane is represented by the parameters a, b, c, d, i.e. Σ_e = ax + by + ct + d, and the parameters a, b, c, d are solved by minimizing the fitting error of this plane to the events in Ω (for example, by least squares over Σ_{p(x,y,t)∈Ω} (ax + by + ct + d)²).
According to the plane Σ_e = ax + by + ct + d, the local offset (u, v) of this local space Ω is solved, and the images are aligned according to the values of u and v, i.e. I_e(x, y) = I_{e+N}(x + u, y + v), where I_e is an exposure image captured earlier, I_{e+N} is an exposure image captured after I_e, N = 1, 2, …, and p(x, y, t) is the event data occurring in the local space Ω between I_e and I_{e+N}; the spatial resolution of Ω is 8 × 8. A local offset is estimated for each local space Ω on the image to complete the alignment of the whole RGB image. In particular, when the resolution of Ω is H × W, a global offset of the entire image is calculated.
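As an illustrative sketch only, the local plane fit over the events of one neighbourhood Ω and the resulting offset could be implemented roughly as below; fitting t directly against x and y (a simplification of the four-parameter plane above) and converting the fitted slopes into an offset via v = ∇t/|∇t|² are assumptions made for this sketch.

```python
import numpy as np

def local_offset(events_xyt, dt):
    """Fit a plane t ~ a*x + b*y + d to the events (x, y, t) of one local space
    Omega by least squares, then derive the local offset (u, v) that aligns a
    later exposure to an earlier one, dt being the time between the exposures."""
    xyt = np.asarray(events_xyt, dtype=np.float64)
    A = np.column_stack([xyt[:, 0], xyt[:, 1], np.ones(len(xyt))])
    (a, b, d), *_ = np.linalg.lstsq(A, xyt[:, 2], rcond=None)
    g2 = a * a + b * b
    if g2 < 1e-12:                      # no measurable motion in Omega
        return 0.0, 0.0
    vx, vy = a / g2, b / g2             # pixels per unit time (plane-fitting flow)
    return vx * dt, vy * dt             # offset (u, v) over the interval dt

# usage: events of an edge moving 2 px per unit time in +x within an 8 x 8 patch
ev = [(x, y, 0.5 * x) for x in range(8) for y in range(8)]
print(local_offset(ev, dt=1.0))         # approximately (2.0, 0.0)
```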
In the process of calculating the fusion weight of each RGB image, a higher fusion weight can be set for the RGB image with low jitter degree, and a lower fusion weight can be set for the RGB image with high jitter degree, so that the information included in the finally obtained target image is clearer.
There are various ways of setting the fusion weight for each image, such as setting the fusion weight for each RGB image by a ratio of the degree of dithering of a plurality of RGB images, or setting the fusion weight for each RGB image according to the magnitude of the degree of dithering of each RGB image, or the like.
For example, an initial fusion weight w_e(x, y) may be set for each RGB image as a function of the pixel value at (x, y) relative to the range [Vmin, Vmax], where w_e(x, y) represents the fusion weight of the e-th RGB image at position (x, y), Vmin = 0, and Vmax = 255.
If the camera shakes, or an object in the shooting range is in motion, when an RGB image is captured, the fusion weight of each image can be adjusted according to its shake degree; for example, the adjusted fusion weight can be obtained by scaling w_e(x, y) by a factor that decreases as Blur_e and BS_e increase, where Blur_e is the shake degree of each RGB image and BS_e indicates the size of its motion region.
Generally, if the dithering degree of the RGB image is higher, i.e. the whole is blurred, and a larger motion area exists in the RGB image in a corresponding period of the event image, the fusion weight of the RGB image may be greatly reduced, so as to avoid a blurred area in the finally obtained target image. If the dithering degree of the RGB image is low, i.e. the whole image is clear, and the RGB image has a small or no motion area in the corresponding period of the event image, the fusion weight of the RGB image can be increased on the basis of the initial fusion weight, so that the finally obtained target image is clearer.
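A compact sketch of this weighted fusion is shown below; the particular hat-shaped initial weight and the simple division by a shake/motion factor are assumptions chosen to illustrate the negative correlation described above, not the exact formulas of the embodiment.

```python
import numpy as np

def fuse_exposures(images, blur_degrees, motion_sizes, vmin=0.0, vmax=255.0):
    """Fuse aligned RGB exposures into one target image.

    images:        list of H x W (or H x W x 3) float arrays in [vmin, vmax].
    blur_degrees:  Blur_e per image; motion_sizes: BS_e per image.
    The per-pixel weight is high for well-exposed pixels and is divided by a
    factor that grows with the shake degree and the motion-region size."""
    acc = np.zeros_like(images[0], dtype=np.float64)
    wsum = np.zeros_like(images[0], dtype=np.float64)
    mid = (vmin + vmax) / 2.0
    for img, blur, bs in zip(images, blur_degrees, motion_sizes):
        w = 1.0 - np.abs(img - mid) / mid        # hat weight: 1 at mid-gray, 0 at limits
        w = w / (1.0 + blur * bs)                # penalize shaky / moving frames
        acc += w * img
        wsum += w
    return acc / np.maximum(wsum, 1e-8)
```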
In one possible implementation manner, if the first shake degree of each RGB image is not higher than the first preset value but is higher than a second preset value, de-shake processing is performed on each first image to obtain de-shaken first images. Specifically, the de-shake processing may use an AI de-shake algorithm, an optical-flow-based de-shake algorithm, a USM (Unsharp Mask) sharpening algorithm, or the like, which may be adjusted according to the actual application scenario and is not limited in this application. Therefore, in this embodiment of the application, shake conditions can be distinguished based on the dynamic data: the images are fused directly when there is no shake, the RGB images are adaptively de-shaken when the shake is mild, and the RGB image is re-captured when the shake is strong, so that scenes with various degrees of shake can be handled and the generalization capability is strong.
For example, a frame of image may be directly shot as shown in fig. 78, and a target image obtained by fusing multiple RGB images with an event image according to the method provided by the present application may be as shown in fig. 79.
Therefore, in this embodiment of the present application, the degree of shake when capturing RGB images can be quantified using an event image, and the fusion weight of each RGB image can be determined according to its shake degree. Generally, the fusion weight corresponding to an RGB image with low shake is higher, so that the information included in the final target image tends toward the clearer RGB images, yielding a clearer target image. In addition, for an RGB image with a high shake degree, the RGB image can be re-captured to obtain an image with a lower shake degree and greater clarity, so that when image fusion is subsequently performed, the clearer image can be used for fusion and the final target image is clearer.
For ease of understanding, the flow of the method of image processing provided herein is described in more detail below in one more specific scenario.
The image processing method provided by the application can be executed by a device such as a mobile phone or a camera provided with or connected with a camera and a motion sensor such as a DVS, and the mobile phone is taken as an example for illustration.
As shown in fig. 80, in a scene where a user photographs with a mobile phone, the user can turn on the HDR mode, and photograph a clearer image in the HDR mode.
After the user clicks the shooting button and keeps the mobile phone still for a short time, the mobile phone can shoot a plurality of RGB images with different exposure durations. In this process, if the shake degree of one or more RGB images is higher than a preset value, an additional RGB image can be captured, so that a new RGB image is added. In the process of re-capturing the RGB image, the exposure time corresponding to the RGB image whose shake degree is higher than the preset value can be used, or an exposure time obtained by reducing the exposure level on that basis can be used, so as to obtain a clearer supplementary RGB image.
If the dithering degree of the RGB image is not higher than the preset value but higher than 0, the dithering removal processing can be carried out on the RGB image, so that a clearer RGB image is obtained, and the efficiency of obtaining a final target image can be improved relative to the process of capturing the RGB image. If the dithering degree of all RGB images does not exceed the preset value, the RGB images do not need to be shot in a supplementary mode and the dithering removing process is not needed.
And then, according to the dithering degree of each RGB image, distributing fusion weights for each RGB image. In general, the RGB image with higher jitter degree has smaller corresponding weight value, and the RGB image with lower jitter degree has larger corresponding weight value, so that the information included in the final obtained target image is more prone to the information included in the clearer RGB image, the final obtained target image is clearer, and the user experience is improved. And if the target image is used for subsequent image recognition or feature extraction and the like, the obtained recognition result or the extracted feature is more accurate.
In addition, the method (i.e. the method of capturing an image in a shooting scene in an HDR mode) of generating a high-quality image by using a DVS sensor and an RGB camera in a cooperative manner may be applied to a scene of a high frame rate video application (High frame rate video, HFR video), and the image quality of each frame in the HFR is improved by using the motion blur removal and HDR characteristics of the DVS, so as to enhance the image quality. Meanwhile, the RGB sensor shoots an image sequence (video) at a fixed frame rate, and the DVS event between two frames of RGB images is utilized to assist in reconstructing the high-frame-rate video, so that the video frame rate can be improved.
Referring to fig. 115, a schematic structural diagram of an image processing apparatus provided in the present application may include:
the motion sensor 11501 is configured to detect motion information of a target object, where the motion information includes information of a motion track of the target object when the target object moves within a preset range, and the preset range is a shooting range of the camera;
a calculation module 11502, configured to determine focusing information according to the motion information, where the focusing information includes a parameter for focusing on a target object within a preset range;
the photographing module 11503 is configured to focus the target object in a preset range according to the focusing information, and is configured to photograph an image of the preset range.
In one possible implementation, the computing module 11502 may be a module coupled to the motion sensor 11501 or a module disposed within the motion sensor 11501.
In one possible implementation, the focus information includes information of a focus area; the computing module 11502 is specifically configured to: predicting the motion trail of the target object in a preset time length according to the motion information to obtain a prediction area; and determining a focusing area according to the prediction area.
In one possible implementation, the computing module 11502 is specifically configured to: if the predicted area meets the preset condition, the predicted area is used as a focusing area, and the shooting module 11503 is triggered to focus; if the predicted area does not meet the preset condition, predicting the motion trail of the target object in the preset time length according to the motion information again to obtain a new predicted area, and determining the focusing area according to the new predicted area.
It may be understood that when the calculating module 11502 determines that the preset area meets the preset condition, the preset area is taken as a focusing area, for example, the preset area is taken as a focusing area or a range larger than the preset area is determined as a focusing area, and the shooting module is triggered to shoot. Before this, the camera module may be in a closed state, for example, if the camera module includes a camera, before the calculation module 11502 triggers shooting, if the preset area does not meet the preset condition, the camera may be in a closed state, so as to reduce power consumption of the camera and save resources.
In one possible embodiment, the motion information further includes at least one of a motion direction and a motion speed of the target object;
the calculating module 11502 is specifically configured to predict a motion trajectory of the target object within a preset duration according to a motion trajectory of the target object when the target object moves within a preset range, and a motion direction and/or a motion speed, so as to obtain a prediction area.
In one possible implementation, the computing module 11502 is specifically configured to: fitting a change function of the center point of the motion area of the target object along with the change of time according to the motion area, the motion direction and/or the motion speed; calculating a predicted central point according to the change function, wherein the predicted central point is the central point of the area where the target object is located in the predicted preset duration; and obtaining a prediction area according to the prediction center point.
In one possible implementation, the photography module 11503 includes an RGB camera;
the photographing module 11503 is specifically configured to focus, as the focus, at least one point with a minimum norm distance from a center point of the focusing area among a plurality of focus points of the RGB camera.
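For illustration, selecting the focus point(s) with the smallest norm distance to the centre of the focusing area might look like the following; the focus-point grid and the use of the Euclidean norm are assumptions of this sketch.

```python
import numpy as np

def pick_focus_points(focus_points, focus_center, k=1):
    """Return the k RGB-camera focus points closest (in Euclidean norm) to the
    centre point of the focusing area."""
    pts = np.asarray(focus_points, dtype=np.float64)
    d = np.linalg.norm(pts - np.asarray(focus_center, dtype=np.float64), axis=1)
    return pts[np.argsort(d)[:k]]

# usage: a 3 x 3 grid of focus points, focusing area centred at (70, 40)
grid = [(x, y) for x in (20, 60, 100) for y in (20, 60, 100)]
print(pick_focus_points(grid, (70, 40)))   # -> [[60. 20.]] (one of the nearest points)
```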
In one possible implementation, the focusing information includes information of a focusing area, the moving area includes an area where the target object is currently located, and the calculating module 11502 is specifically configured to take the area where the target object is currently located as the focusing area.
In one possible implementation, the capturing module 11503 is further configured to obtain an exposure parameter before the capturing module 11503 captures an image of the preset range, and capture the image of the preset range according to the exposure parameter.
In one possible implementation manner, the shooting module 11503 is specifically configured to obtain an exposure parameter according to the motion information, where the exposure parameter includes an exposure duration, and the exposure duration has a negative correlation with the motion speed of the target object.
In a possible implementation manner, the shooting module 11503 is specifically configured to obtain an exposure parameter according to the illumination intensity, where the exposure parameter includes an exposure duration, and the magnitude of the illumination intensity in the preset range has a negative correlation with the exposure duration.
In one possible embodiment, the image processing apparatus may further include:
the enhancement module 11504 is configured to, after the photographing module photographs an image in a preset range, fuse the images in the preset range according to the monitored motion information of the target object and the image, and obtain a target image in the preset range.
In one possible implementation, the motion sensor 11501 may include a dynamic vision sensor DVS, where the DVS is configured to monitor a motion condition of a target object within a preset range to obtain motion information.
Referring to fig. 116, the present application further provides an image processing apparatus, including:
the acquiring module 11601 is configured to acquire a first event image and a plurality of captured first images, where the first event image includes information of an object moving in a preset range in a capturing period of the plurality of first images, specifically may be obtained by using a motion sensor 11603, and exposure durations corresponding to the plurality of first images are different, where the preset range is a capturing range of a camera;
the processing module 11602 is configured to calculate a first shake degree corresponding to each of the plurality of first images according to the first event image, where the first shake degree is used to represent a degree of camera shake when the plurality of first images are captured;
The processing module 11602 is further configured to determine a fusion weight of each first image in the plurality of first images according to the first jitter degree corresponding to each first image, where the first jitter degree corresponding to the plurality of first images and the fusion weight are in a negative correlation;
the processing module 11602 is further configured to fuse the plurality of first images according to the fusion weight of each first image, so as to obtain a target image.
In a possible implementation manner, the processing module 11602 is further configured to, before determining, according to the first shake degree, a fusion weight of each of the plurality of first images, if the first shake degree is not higher than a first preset value and is higher than a second preset value, perform a de-shake process on each of the first images, so as to obtain each of the first images after de-shake.
In a possible implementation manner, the obtaining module 11601 is further configured to, if the first jitter degree is higher than a first preset value, re-shoot to obtain a second image, where the second jitter degree of the second image is not higher than the first preset value;
the processing module 11602 is specifically configured to calculate a fusion weight of each first image according to a first jitter degree of the each first image, and calculate a fusion weight of the second image according to the second jitter degree;
The processing module 11602 is specifically configured to fuse the plurality of first images and the second image according to the fusion weight of each first image and the fusion weight of the second image, so as to obtain the target image.
In a possible implementation manner, the obtaining module 11601 is further configured to obtain a second event image before the capturing the second image, where the second event image is obtained before the obtaining the first event image;
and calculating exposure parameters according to the information included in the second event image, wherein the exposure parameters are used for shooting the second image.
In one possible implementation manner, the obtaining module 11601 is specifically configured to: dividing the first event image into a plurality of areas, and dividing a third image into a plurality of areas, wherein the third image is a first image with the minimum exposure value in the plurality of first images, the plurality of areas included in the first event image correspond to the positions of the plurality of areas included in the third image, and the exposure value comprises at least one of exposure duration, exposure amount or exposure level; calculating whether each region in the first event image includes first texture information and whether each region in the third image includes second texture information; if a first area in the first event image includes the first texture information and an area corresponding to the first area in the third image does not include the second texture information, shooting according to the exposure parameter to obtain the second image, wherein the first area is any area in the first dynamic area.
(2) DVS image and RGB image fusion
Photographing technology is commonly used in mobile phones, cameras, and other terminal devices. Photographing is a process in which a photosensitive device receives photons (natural light) within a period of time (the set exposure time) and quantizes them into digital signals (for example, 0–255). The photosensitive device, which may also be called a photosensitive element or an image sensor, is an important component of a digital camera. According to the element used, image sensors can be divided into two major types: charge-coupled devices (charge coupled device, CCD) and metal-oxide-semiconductor devices (complementary metal-oxide semiconductor, CMOS). According to the type of optical image captured, they can also be divided into color sensors and motion sensors, where a color sensor may be called an RGB sensor, and a motion sensor may be a motion detection vision sensor (motion detection vision sensor, MDVS), also called a dynamic vision sensor (dynamic vision sensor, DVS) for short.
Driven by intelligent terminal devices such as mobile phones, image sensors have developed rapidly. As the uses of image sensors continue to grow, the types and sensing functions of the image sensors equipped on a terminal device keep increasing, which in turn broadens the range of usage scenarios the terminal device can handle. Therefore, how to construct an image sensor with rich sensing functions, how to process the data obtained by the image sensor for different usage scenarios, how to output data, and what kind of data to output have all become urgent problems to be solved in the shooting process.
In general, the information acquired by an image sensor may be used in scenarios such as image reconstruction, target detection, capturing moving objects, capturing with a moving device, capture deblurring, motion estimation, depth estimation, or target detection and recognition.
Based on the above, the embodiment of the application combines the respective advantages of the color sensor and the motion sensor to construct a new image sensor, and provides a new data processing method for realizing data acquisition and data output in various application modes, so that the constructed new image sensor has rich and powerful supported functions and wider use scenes. The following three aspects will explain the contents of the embodiments of the present application, in which, in the first aspect, how to construct a new structure of an image sensor with more powerful sensing function based on an existing photosensitive unit (i.e., a sensor pixel) or an existing image sensor, and the chip architecture, the circuit structure and the corresponding workflow of the new image sensor are involved; in a second aspect, based on the newly constructed image sensor, how to implement data acquisition and data output involves new data processing algorithms and different application modes, and developing adapted algorithms for different data fusion modes to process the corresponding data streams. The third aspect is an application example of output data, namely how to efficiently and accurately remove moving objects (also referred to as moving prospects) from captured images. In particular, the implementation of the second aspect may be further divided into the following points: 1) Based on the newly constructed image sensor, how to realize data acquisition in different application modes; 2) How to output data, what kind of data to output, etc. in different application modes. Specifically, under different output modes, with the cooperation of a new algorithm, high-quality image reconstruction can be realized, for example, high-frame-rate image reconstruction, high-dynamic-range (high dynamic range, HDR) image reconstruction, or low-power-consumption target detection and recognition functions can be realized, and semantic information and images are associated, so that better experience is provided for users. It should be noted that, for ease of understanding, in the following embodiments, a motion sensor is taken as an example of DVS.
(1) New structure of image sensor and workflow thereof
Since the embodiments of the present application relate to a lot of knowledge about the image sensor, in order to better understand the schemes of the embodiments of the present application, related terms and concepts that may be related to the embodiments of the present application will be described below. It should be understood that the related conceptual illustrations may be limited by the specific embodiments of this application, but are not intended to limit the application to that specific embodiment, and that differences between the specific embodiments may exist, and are not specifically defined herein.
Since the purpose of any imaging system is to obtain a picture that meets the requirements, in an imaging system the task of the graphics processor is to extract enough high-quality picture information for the corresponding imaging system. Specifically, the imaging objective lens images a scene, under the irradiation of external illumination light (or self-luminescence), onto the image plane of the objective lens, thereby forming a light intensity distribution (an optical image) in two-dimensional space; a sensor capable of converting the optical image of this two-dimensional light intensity distribution into a one-dimensional time-sequential electrical signal is called an image sensor. In an image sensor, each photosensitive unit corresponds to a pixel (Pixels); the larger the number of pixels included in an image sensor, the more object detail it is able to sense and the clearer the image, i.e. the higher the resolution of the picture it provides. Most mainstream cameras on the market use image sensors of about 300,000 pixels, that is, about 300,000 photosensitive units, and the corresponding imaging resolution is 640 × 480 (i.e. 307,200 pixels). As shown in fig. 81, which illustrates two conventional color sensors, sensor A includes 5 × 5 = 25 photosensitive units (for illustration only), and the corresponding captured image A includes 25 pixel values, each obtained from the photosensitive unit at the corresponding position; sensor B includes 10 × 10 = 100 photosensitive units (for illustration only), and similarly the corresponding captured image B includes 100 pixel values, each obtained from the photosensitive unit at the corresponding position.
It should be noted here that, similar to the color sensor corresponding to fig. 81, a DVS also includes a plurality of photosensitive units, each of which corresponds to a pixel point on an image. Unlike a conventional color sensor, which outputs a full image in frame format, the DVS uses a three-layer model of the human retina: each pixel operates independently and asynchronously, without the concepts of frame or exposure time. The DVS can only capture dynamic changes; when the photographed scene does not change at all, the camera produces no output (noise aside) and therefore cannot capture static information.
When at least one frame of event image mentioned in the embodiment of the present application is a multi-frame event image, the event images may be event images in the same time window, or event images in different time windows, for example, event image 1 is an event image in a period of [ t1, t2], and event image 2 is an event image in a period of [ t2, t3 ]. Of course, the at least one frame of event image may be event images of different areas within the same period. For example, the monitoring region of the DVS may be divided into a plurality of regions, and a corresponding event image may be generated based on events detected within each region. In addition, events in different pixel positions and within a period of time form an event data stream, which may also be simply referred to as an event stream.
Illustratively, as shown in fig. 37 described above, the time window may be divided into a plurality of short time windows, such as k short time windows shown in fig. 37, each of which may correspond to one frame of event image. The cutting mode can be cutting according to set time length, cutting according to random time length, cutting according to motion track change condition, and the like, and can be specifically adjusted according to actual application scenes. After the k short-time windows are obtained by segmentation, the positions of the events in each short-time window are analyzed, and the area where the target object is located in each short-time window is determined, for example, the motion area in the short-time window 1 is the motion area 1 shown in fig. 37, and the motion area in the short-time window k is the motion area k shown in fig. 37. Then, the motion area and motion characteristics of the target area, such as the motion direction or the motion speed, are determined by the change condition of the motion area in the short-time window 1-k. In addition, events at different pixel locations throughout the time window (i.e., the lower solid rectangular box in FIG. 37) form an event data stream.
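A small sketch of slicing an event stream into per-window event images, as described above, is shown below; the (x, y, t, polarity) tuple format and the fixed-length windows are assumptions of this sketch.

```python
import numpy as np

def events_to_frames(events, height, width, t_start, t_end, n_windows):
    """Accumulate an event stream into n_windows event images.

    events: iterable of (x, y, t, polarity) tuples with t in [t_start, t_end).
    Returns an array of shape (n_windows, height, width) where each frame sums
    the signed events that fell into its short time window."""
    frames = np.zeros((n_windows, height, width), dtype=np.int32)
    span = (t_end - t_start) / n_windows
    for x, y, t, p in events:
        k = min(int((t - t_start) / span), n_windows - 1)
        frames[k, y, x] += p
    return frames

# usage: three events of a dot drifting right across a 4 x 8 sensor
evs = [(1, 2, 0.01, 1), (3, 2, 0.12, 1), (6, 2, 0.25, -1)]
print(events_to_frames(evs, 4, 8, 0.0, 0.3, 3).sum(axis=(1, 2)))  # [ 1  1 -1]
```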
In daily photographing activities of users, moving objects (which may be referred to as a moving foreground) are sometimes unexpectedly appeared in a photographing range, so that photographing effects are affected, and some methods for removing moving objects are currently available on the market, for example, a Lumia mobile phone can remove moving objects in a certain scene by taking a dynamic photo (such as 2 seconds) for a certain period of time and performing photo stitching based on the dynamic photo, which has a high requirement on photographing time, needs to stably photograph for a certain period of time (such as 2 seconds as described above), and has poor removing effect, so that objects moving at a high speed cannot be identified and removed. Based on this, how to efficiently and accurately remove moving objects from a captured image is a problem to be solved.
The embodiment of the application provides a data processing method, in particular to an image processing method which is used for efficiently and accurately removing moving objects from shot images.
First, the specific flow of the image processing method provided in the present application may include: firstly, acquiring an event stream and a first RGB image, wherein the event stream comprises at least one frame of event image, each frame of event image in the at least one frame of event image is generated by motion track information when a target object moves in a monitoring range of a motion sensor, and the first RGB image is superposition of shooting scenes at each moment captured by a camera in exposure time; constructing a mask according to the event stream, wherein the mask is used for determining a motion area of each frame of event image; a second RGB image is obtained from the event stream, the first RGB image and the mask, the second RGB image being an RGB image from which the target object (i.e., the moving object) has been removed. In the embodiment of the application, the moving object can be removed based on only one RGB image and event stream, so that an RGB image without the moving object is obtained.
Moving-object removal is of great significance in application fields such as photography, detection and recognition, background modeling, and panoramic stitching. For example, in application scenarios such as mobile phone photography, when a user wants to take scenery photos, there are sometimes many pedestrians in the shooting area (such as crowded scenic spots), and scenery photos that meet the user's requirements can be obtained through moving-object removal. For another example, in a surveillance scenario, the background and the foreground (i.e. the moving foreground) can be separated by background subtraction, so that the purpose of detecting moving objects can be achieved quickly. Also, in panoramic stitching scenarios, stitching the multiple photos involved in panoramic stitching requires removal of the moving foreground.
The image processing method provided in the present application is described in detail below. Referring specifically to fig. 82, fig. 82 is a flowchart of an image processing method according to an embodiment of the present application, where the method may include the following steps:
8201. an event stream and a frame of a first RGB image are acquired.
First, an event stream and one frame of RGB image (which may be referred to as a first RGB image) are acquired by a camera equipped with a motion sensor (e.g., DVS) and an RGB sensor, respectively, wherein the acquired event stream includes at least one frame of event image, each frame of event image in the at least one frame of event image is generated from motion trajectory information when a target object (i.e., a moving object) moves within a monitoring range of the motion sensor, and the first RGB image is a superposition of photographed scenes at each time captured by the camera during an exposure period.
For ease of understanding, a description will be given below of how to acquire the event stream and the first RGB image, respectively.
a. Process for acquiring event streams
Firstly, motion information is acquired through a motion sensor, specifically, the motion condition of a target object in the detection range of the motion sensor can be monitored through the motion sensor, and the motion information of the target object in the detection range is obtained. Wherein the target object is an object moving within the detection range, the number of the target objects can be one or more, and the movement information can comprise information of a movement track of the target object when moving within the detection range. For example, the motion information may include information such as a motion profile of the target object, a size of a region in which the target object is located, or coordinates of corner points in the detection range when the target object moves in the detection range.
For ease of understanding, the region in which the target object is located at each time of detection when the target object moves within the detection range is hereinafter referred to as a movement region of the target object. For example, if the target object is a pedestrian and the pedestrian is performing a whole-body motion, the whole-body motion of the pedestrian may be included in the motion region, and if the pedestrian is moving only with the arm, the target object may be merely the arm of the pedestrian and the motion region may include the arm portion of the pedestrian.
Then, an event image is generated from the motion information, that is, after the motion information is obtained, at least one frame of event image is generated from the information acquired by the motion sensor in the detection range. In general, the motion information may include information of a track of the target object moving in the detection range within a period of time, and the motion information may be regarded as an event, and the motion information acquired within a period of time forms an event stream. And mapping all motion information corresponding to one time window in the event stream into the same image according to the corresponding coordinates to obtain an event image.
For example, the event image may refer to fig. 35-37 and the related description thereof, and will not be described herein.
Alternatively, the event images mentioned in this embodiment may be optimized by the method flows corresponding to fig. 38 to fig. 44, so as to obtain clearer event images.
For ease of understanding, an example is illustrated below. Referring to fig. 83, because the motion sensor does not require exposure, it captures the moving lightning with extremely high temporal resolution: within a very small time window (which may be regarded as a single time t_k), the motion sensor can capture a clear outline of the location of the lightning; then, within the exposure time (assumed to be (t_1, t_2]), the continuously changing motion trail of the lightning is captured, thereby forming an event stream as shown in fig. 83.
b. Process for obtaining a frame of a first RGB image
The first RGB image can be obtained by a color sensor: when the color sensor is started, the camera can obtain one frame of the first RGB image through it, and the principle of the color sensor determines that the obtained image is a superposition of the scenes over a period of exposure time. Assuming that the first RGB image is denoted I, image I represents the exposure result of the shooting scene f from time t_1 to time t_2, the exposure period being assumed here to be [t_1, t_2]; the shooting scene f refers to the true, sharp scene within the shooting range of the camera. As shown in fig. 84, which illustrates the image I, the shooting scene f(t_1) corresponding to exposure time t_1, and the shooting scene f(t_2) corresponding to exposure time t_2, the shooting scene f represented by image I is the superposition of the exposure results from time t_1 to time t_2; it can be seen that the image I obtained after the exposure superposition of multiple shooting scenes is a blurred image.
8202. A mask is constructed from the event stream.
After the event stream and the first RGB image are acquired, a mask may be constructed according to the event stream, where the mask is used to determine a motion area of each frame of the event image in the event stream, that is, to determine a position of a moving object in the RGB image. As shown in fig. 85, the gray area is a static area, which may also be referred to as a background area, and the black area is a moving area.
Note that the process of constructing the mask M(t) based on the event stream E is denoted as g(·). For the shooting scene f(t) at time t, since its motion has already been recorded in the event stream E by the motion sensor, the mask can be expressed as M(t) = g(E(t + Δt)), where E(t + Δt) denotes the events accumulated in the interval [t, t + Δt], as shown in fig. 86. As a representation of how the mask is constructed, g(·) may be implemented in various ways: a connected region may be constructed morphologically from the position information of the event image; or a time-decaying function may be chosen to give different weights to the regions of the event images generated over a period of time, thereby obtaining the mask; or the mask may be constructed by marking a region in the spatial neighborhood where the number of events generated over a period of time exceeds a preset threshold as 0 (representing a motion region), and a region not exceeding the preset threshold as 1 (representing a background region). The specific implementation of the mask is not limited in this embodiment of the application.
For ease of understanding, one specific implementation of constructing a mask is described herein: first, the monitoring range of the motion sensor may be divided into a plurality of preset neighborhoods (set as a neighborhood k), then, in each neighborhood k, when the number of event images of the event stream within the preset duration Δt exceeds a threshold value P, the corresponding neighborhood is determined to be a motion region, the motion region may be marked as 0, and if the number of event images of the event stream within the preset duration Δt does not exceed the threshold value P, the corresponding neighborhood is determined to be a background region, the background region may be marked as 1, and specifically, the following formula (18) may be shown:
Wherein M_xy(t) represents the value of the mask M at the (x, y) position at time t, and e_ij(s) represents an event (belonging to the event stream) recorded at the (i, j) position of the event image e at time s in the event stream.
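As an illustration of the neighborhood-counting construction just described, the following Python sketch builds a mask from an event stream. The event tuple layout (x, y, σ, t), the cell size and the threshold values are illustrative assumptions, not values from this application.

import numpy as np

def build_mask(events, height, width, t, delta_t, k=8, p_threshold=5):
    """Sketch: mark k x k neighborhoods with more than p_threshold events
    inside [t, t + delta_t) as motion regions (0); the rest is background (1)."""
    counts = np.zeros((height // k + 1, width // k + 1), dtype=np.int32)
    for x, y, _sigma, ts in events:          # assumed event tuple (x, y, sigma, t)
        if t <= ts < t + delta_t:
            counts[y // k, x // k] += 1

    mask = np.ones((height, width), dtype=np.uint8)      # 1 = background region
    cells_y, cells_x = counts.shape
    for cy in range(cells_y):
        for cx in range(cells_x):
            if counts[cy, cx] > p_threshold:              # motion region -> 0
                mask[cy * k:(cy + 1) * k, cx * k:(cx + 1) * k] = 0
    return mask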
8203. Obtain a second RGB image according to the event stream, the first RGB image and the mask, where the second RGB image is an RGB image with the target object removed.
After the event stream, the first RGB image and the mask are obtained according to the above steps, a second RGB image can be obtained according to the event stream, the first RGB image and the mask, where the second RGB image is an RGB image of the removal target object, as shown in fig. 87, and fig. 87 is an image I' with the moving object removed corresponding to the image I.
In the following, it is described in detail how the second RGB image is derived from the event stream, the first RGB image (which may also be referred to as image I) and the mask.
First, it is described how to calculate the image f(t1); specifically, it can be obtained through the following series of formulas:
log f(t) = log f(t1) + c·E(t)    (22)
The shooting principle of the color sensor determines that the shooting result is the superposition of the scenes corresponding to all moments within the exposure time; that is, if the standard sharp image corresponding to the shot scene at time t is f(t), the image I is formed by integrating f(t) from t1 to t2, as shown in formula (19) above and in image I in fig. 84: each lightning bolt represents the position of the moving object at a particular real moment; because the exposure time is long, the lightning moves to different positions and is captured by the camera multiple times, and finally a blurred picture is obtained.
Because the motion sensor does not need exposure, it can capture the lightning in motion with extremely high time resolution, which can be expressed as formula (20) above. The motion sensor captures discrete motion information, that is, discrete events (x, y, σ, t0), where x, y represent the coordinates at which the light intensity changes, σ represents the direction of the light intensity change, and t0 represents the time at which the change occurs. Using e_xy(t) to represent a continuous function of time t at the (x, y) position, e_xy(t) can be written as an impulse function that integrates to 1 at time t0, where σ represents whether there is a change in light intensity at time t0: if the change in light intensity in the logarithmic domain is greater than a threshold c, σ = 1; if it is less than the opposite number of the threshold, -c, σ = -1; otherwise σ = 0. In the following, e(t) is used to denote e_xy(t) at the (x, y) position. For example, within a very small time window (which may be regarded as a time instant t_k), the DVS can capture a clear outline of the location of the lightning, and a point on the outline can be denoted as e(t_k), like the outline of each lightning bolt in fig. 83. The movement of the lightning is then captured continuously during the exposure time, thereby forming an event image, and the value of each pixel in a specific event image can be expressed as shown in formula (21) above:
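A minimal sketch of accumulating the discrete events into a per-pixel event map over a time window, consistent with the description above; the event tuple layout is an assumption.

import numpy as np

def accumulate_events(events, height, width, t_start, t_end):
    """Sketch: signed per-pixel accumulation of events within [t_start, t_end),
    standing in for the integral of e_xy(t) described in the text."""
    E = np.zeros((height, width), dtype=np.float32)
    for x, y, sigma, ts in events:      # sigma is +1, -1 or 0
        if t_start <= ts < t_end:
            E[y, x] += sigma            # signed accumulation per pixel
    return E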
also because the principle of event generation is that the log value change of the light intensity at the corresponding pixel position reaches the c value, as shown in the following formula (29):
true sharp image f (t 1 ) The true sharp image f (t) at time t can be expressed as f (t 1 ) The intensity variation is continuously superimposed, as shown in the above formula (22): logf (t) =logf (t) 1 ) +c.E (t). Substituting the formula (22) into the formula (19) to obtain the formula (23), substituting the formula (21) into the formula (23) to obtain the formula (24), performing operations such as term shifting on the formula (24) to obtain a real clear image f (t) obtained from the blurred image I and the event stream e 1 ) Is shown in formula (25):
therefore, when a plurality of images at different moments are shot by the camera, two frames of shot images I are taken 1 And I 2 For illustration, referring to FIG. 88, if the position of the moving object in each image is known to be the ROI 1 、ROI 2 Mask M with background area of 1 and motion area of 0 can be obtained respectively 1 、M 2 . Then an image of a motionless object can be obtained by combining images of different motion areas at different timesMay be represented as I', and may be specifically represented as the following formula (27):
from the two images, it can be deduced that if n images are taken, the expression of the image I' without moving object is obtained, as shown in fig. 28:
Both formula (27) and formula (28) above require the user to manually take at least two images to obtain an image I' without the moving object. If the shooting window is short, for example when shooting fireworks in the sky or an aircraft flying past the window at high speed, these shooting scenes exist within the shooting range only for a very short time, and the user may only manage to take one image. In this case, the image I' without the moving object can be obtained as follows: first, assuming that the photographed image is denoted as I, the image I represents the exposure result of the shooting scene f from time t1 to time t2, where the exposure period is assumed to be [t1, t2] and the shooting scene f refers to the image of the true, sharp scene within the shooting range of the camera, that is, f(t) represents an ideal image without any motion blur. Then the image I' with the motion foreground removed can be expressed as the following formula (29):
Wherein M(t) represents the mask of the image I at time t, and the mask may be constructed as shown in formula (18) above, which is not repeated here.
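The masked-averaging idea behind formulas (27) through (29) can be sketched as follows: background pixels from each frame (whether separately shot images I1…In, or sharp frames f(t) sampled within the exposure) are kept and merged. The averaging rule used for overlapping background regions is an assumption; the exact weighting of the formulas is not reproduced here.

import numpy as np

def compose_background(images, masks):
    """Sketch: combine n grayscale frames so that only background pixels remain.
    Each mask is 1 for background and 0 for the motion region; every output
    pixel is averaged over the frames in which it belongs to the background."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, m in zip(images, masks):
        num += img.astype(np.float64) * m
        den += m
    return num / np.maximum(den, 1e-6)   # avoid division by zero where never background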
It is also known how to derive the ideal true sharp image f(t1), as shown in formula (25) above; that is, through a series of transformations, the following formulas (30) to (32) are obtained:
By combining formulas (29) and (32) above, it can finally be obtained how to derive the corresponding image I' without the moving object when only one image I is captured, specifically as shown in formula (33):
It should be noted that, in the process of obtaining an image I' without the moving object from an image I, the user needs to manually trigger the camera to shoot the image I. However, in some application scenarios, as shown in fig. 89, when the moving object moves only slightly during shooting, the overlapping area of the moving object at time t1 and time t2 is too large, which may make it impossible to remove the moving object in the above manner, or result in a poor removal effect. Thus, some embodiments of the present application further provide an image processing method, which differs from the above-described image processing method as follows: it is determined whether there is a motion mutation in the motion data acquired by the motion sensor; when a motion mutation exists, the camera is triggered to shoot a third RGB image (as shown in fig. 90); an event stream and a frame of first RGB image are acquired in a manner similar to steps 8201 to 8202, and a mask is constructed from the event stream (refer to steps 8201 to 8202, which are not repeated here); finally, a second RGB image without the motion foreground is obtained from the event stream, the first RGB image, the third RGB image and the mask. The third RGB image is automatically captured by triggering the camera when the motion changes abruptly, which gives high sensitivity, so that a frame of image can be obtained as soon as the moving object is perceived to change. This scheme can be combined with the method for removing the moving object in the single image I proposed in formula (33), yielding, as shown in the following formula (34), a method that removes the moving object using both the third RGB image (namely, the image B_k) automatically captured by the camera and the first RGB image (namely, the image I) manually shot by the user:
Wherein the image B_k is a third RGB image that the camera is triggered to shoot at time k within the exposure time. If there are 3 motion mutations within the exposure time, there are 3 corresponding times k, and the camera is triggered to shoot a third RGB image at each time k. M_k then denotes the mask constructed at time k based on the image B_k. Thus, as shown in fig. 91, the image(s) B_k captured by the camera triggered by the motion mutation and the image I actively taken by the user within a certain exposure time can be used to obtain an image I' without the moving object through formula (34) above.
To facilitate understanding of the two ways of removing the motion foreground in an image according to the embodiments of the present application, the following description is given by way of example. Fig. 92 is a schematic flow diagram of obtaining a second RGB image (i.e., image I') without the moving object based on a frame of first RGB image (i.e., image I) and an event stream E, together with the expression of the image I'. Fig. 93 is a schematic flow diagram of obtaining a second RGB image (i.e., image I') without the moving object based on a frame of first RGB image (i.e., image I), a third RGB image (i.e., one or more images B_k) obtained by triggered shooting upon motion mutation, and the event stream E, together with the expression of the image I'. It can be seen that the mode of fig. 92 establishes a relationship between a single image I and the event stream E and calculates the image I' with the moving object removed based on the correspondingly constructed mask M; the specific process can be summarized as follows: the event camera acquires the event stream E; the user takes a picture and the image I is acquired through the RGB camera; masks M are generated for different moments in the event stream E; and, based on the image I, the event stream E and the masks M, the image I' with the moving object removed is calculated through formula (33) above. The mode of fig. 93 determines from the motion data collected by the motion sensor whether a motion mutation exists, and if so triggers the RGB camera to capture an image B_k for the subsequent removal of the moving object in the image I; the specific process can be summarized as follows: the event camera acquires the event stream E; by analyzing the event stream E it is determined whether a motion mutation (such as a new moving object) occurs within the monitoring range, and if so the RGB camera is triggered to capture an image B_k; the user takes a picture and the image I is acquired through the RGB camera; masks M are generated for different moments in the event stream E; and, based on the image I, the image B_k, the event stream E and the masks M, the image I' with the moving object removed is calculated through formula (34) above.
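For illustration, the following sketch strings together the helper sketches above to mimic the data flow of the fig. 92 mode (events → masks, events + I → f(t), masked average → I'). It does not reproduce the exact weighting of formula (33); the number of sampled times and the sampling scheme are assumptions.

import numpy as np

def remove_moving_object_single_shot(I, events, c, t1, t2, num_samples=16):
    """Sketch of the single-image pipeline of fig. 92, reusing build_mask,
    accumulate_events and recover_sharp_image defined in the sketches above."""
    h, w = I.shape
    ts = np.linspace(t1, t2, num_samples, endpoint=False)
    dt = (t2 - t1) / num_samples

    # events + I -> latent sharp frame f(t1)
    event_maps = [accumulate_events(events, h, w, t1, t) for t in ts]
    f_t1 = recover_sharp_image(I, event_maps, c)

    # events -> masks; masked average of the sharp frames -> I'
    num = np.zeros((h, w), dtype=np.float64)
    den = np.zeros((h, w), dtype=np.float64)
    for t, E in zip(ts, event_maps):
        f_t = f_t1 * np.exp(c * E)           # log f(t) = log f(t1) + c*E(t)
        m = build_mask(events, h, w, t, dt)  # 1 = background at time t
        num += f_t * m
        den += m
    return num / np.maximum(den, 1e-6)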
It should be noted that, in some embodiments of the present application, the event camera and the RGB camera may be integrated into one camera, or two independent cameras may operate separately, which is not limited herein.
In addition, in the photographing scene applied in the application, sensors used in combination during photographing can be displayed in a display interface, for example, selection such as DVS, IMU or infrared can be displayed in the display interface, and whether to turn on the sensors is selected by a user, so that an image conforming to the user's expectations is obtained. For example, when the user opens the photographing interface, as shown in fig. 94A, the configuration at the time of photographing may be selected in the setting options, and as shown in fig. 94B, whether DVS, IMU, infrared, or the like is opened may be selected by the user, thereby obtaining an image or video conforming to the user's desire.
In order to better implement the above-described scheme of the embodiment of the present application, on the basis of the embodiments corresponding to fig. 81 and 94, a related apparatus for implementing the above-described scheme is further provided below. Referring specifically to fig. 117, fig. 117 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, where the image processing apparatus 11700 includes: the device comprises an acquisition module 11701, a construction module 11702 and a processing module 11703, wherein the acquisition module 11701 is used for acquiring an event stream and a frame of first RGB image, the event stream comprises at least one frame of event image, each frame of event image in the at least one frame of event image is generated by motion track information when a target object moves within a monitoring range of a motion sensor, and the first RGB image is superposition of shooting scenes at each moment captured by a camera within exposure time; a construction module 11702 for constructing a mask from the event stream, the mask for determining a motion region of the per-frame event image; and a processing module 11703, configured to obtain a second RGB image according to the event stream, the first RGB image, and the mask, where the second RGB image is an RGB image from which the target object is removed.
In the above embodiment of the present application, the moving object may be removed based on only one RGB image and event stream, so as to obtain an RGB image without moving object.
In one possible design, the obtaining module 11701 is further configured to trigger the camera to capture a third RGB image when the motion sensor detects that a motion mutation occurs in the monitoring range at a first time; the processing module 11703 is further configured to obtain a second RGB image based on the event stream, the first RGB image, the third RGB image, and the mask.
In the above embodiment of the present application, whether the motion data acquired by the motion sensor has a motion mutation may be determined, when the motion mutation exists, the camera is triggered to capture a third RGB image, then an event stream and a frame of first RGB image are obtained according to the similar manner, a mask is constructed according to the event stream, and finally a second RGB image without motion foreground is obtained according to the event stream, the first RGB image, the third RGB image and the mask. The obtained third RGB image triggers the camera to automatically snap under the condition of sudden movement change, so that the sensitivity is high, a frame of image can be obtained at the beginning of the user perceiving that the moving object changes, and a better removal effect on the moving object can be realized based on the third RGB image and the first RGB image.
In one possible design, the motion sensor monitoring the monitored range for motion abrupt changes at a first time includes: in the monitoring range, an overlapping portion between a generation region of a first event stream acquired by the motion sensor at a first time and a generation region of a second event stream acquired by the motion sensor at a second time is smaller than a preset value.
In the above embodiments of the present application, the determination conditions for motion mutation are specifically described, and the feasibility is provided.
In one possible design, the construction module 11702 is specifically configured to: divide the monitoring range of the motion sensor into a plurality of preset neighborhoods; and, in a target preset neighborhood, when the number of event images of the event stream within a preset duration exceeds a threshold, determine the target preset neighborhood as a motion sub-region, where the target preset neighborhood is any one of the plurality of preset neighborhoods, and the motion sub-regions form the mask.
In the above embodiments of the present application, a method for constructing a mask is specifically described, which is simple and easy to operate.
(3)SLAM
The traditional APS (Advanced Photo System) camera mainly locates a moving target based on background-subtraction-like methods and then analyzes the key information; the simplest implementation is the frame difference method. The DVS, by contrast, captures moving objects by sensing the change in luminance of individual pixels, which achieves almost the same effect as the frame difference method but with lower latency. The DVS camera can quickly locate the rectangular area/mask where a foreground moving object is located in a single-moving-object scene, for example in a monitored scene where the lens is fixed and the photographed background is clean. For example, referring to fig. 95, fig. 95 is a schematic comparison between a conventional camera and a DVS shooting the same scene according to an embodiment of the present application. In fig. 95, (a) is a schematic view of the scene shot by a conventional APS camera, and (b) is a schematic view of the scene shot by the DVS.
Specifically, the flow of moving object detection using DVS is as follows.
When a moving object appears in the picture or the scene light changes, events are generated in the corresponding region of the DVS. By setting the pixel positions where events occur within a certain period of time (for example, 1 second) to 1 and the pixel positions without events to 0, a binary image as shown in (b) of fig. 95 is obtained. Connected rectangular frame areas are then found on the binary image. Next, the size of each rectangular frame area is judged: when the area of the rectangular frame area is greater than threshold 1, it is judged that the scene light has changed; when threshold 2 is greater than the area of the rectangular frame area, the area is considered too small and is treated as noise, such as a motion region generated by leaves shaking in the wind; when threshold 1 > area of the rectangular frame area > threshold 2, it is further judged whether it is a moving object based on the continuity of the movement.
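A Python sketch of this detection flow follows. The two area thresholds are expressed as fractions of the image area and are illustrative; scipy is used here only as a convenient connected-component routine, not as part of the described scheme.

import numpy as np
from scipy import ndimage

def detect_motion_regions(events, height, width, t, window=1.0,
                          area_thresh_high=0.5, area_thresh_low=0.001):
    """Sketch: binary image from events in a time window -> connected regions
    -> classify each region by bounding-box area (light change / noise /
    candidate moving object)."""
    binary = np.zeros((height, width), dtype=np.uint8)
    for x, y, _sigma, ts in events:
        if t <= ts < t + window:
            binary[y, x] = 1                          # events occurred -> 1

    labels, _num = ndimage.label(binary)              # connected regions
    total = float(height * width)
    results = []
    for sl in ndimage.find_objects(labels):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        frac = (h * w) / total                        # bounding-box area fraction
        if frac > area_thresh_high:
            results.append(("light_change", sl))      # scene light changed
        elif frac < area_thresh_low:
            results.append(("noise", sl))             # e.g. leaves shaking in wind
        else:
            results.append(("candidate_object", sl))  # check motion continuity next
    return results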
Optionally, the flow of moving object detection and identification using DVS sensor and RGB camera is as follows:
When a moving object appears in the picture or the scene light changes, events are generated in the corresponding region of the DVS. By setting the pixel positions where events occur within a certain period of time (for example, 1 second) to 1 and the pixel positions without events to 0, a binary image as shown in (b) of fig. 95 is obtained. Connected rectangular box areas are found on the binary image. After expanding the rectangular frame area outward by one ring (h × w × 0.1), the corresponding rectangular area is found on the corresponding frame of the RGB camera as the moving object region. An existing RGB-image deep learning network is then used to identify the category of the object within the moving object region.
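A small sketch of the box-expansion and cropping step; interpreting the "one ring (h × w × 0.1)" expansion as roughly 10% of the box size on each side is an assumption.

def crop_motion_region(rgb_frame, box, expand=0.1):
    """Sketch: expand a DVS-detected bounding box and crop the corresponding
    region from the time-aligned RGB frame for recognition.
    box is (y0, y1, x0, x1); rgb_frame is an H x W x 3 numpy array."""
    y0, y1, x0, x1 = box
    h, w = y1 - y0, x1 - x0
    dy, dx = int(h * expand), int(w * expand)
    y0 = max(0, y0 - dy)
    x0 = max(0, x0 - dx)
    y1 = min(rgb_frame.shape[0], y1 + dy)
    x1 = min(rgb_frame.shape[1], x1 + dx)
    return rgb_frame[y0:y1, x0:x1]   # feed this crop to an RGB classifier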
In general, using the DVS for moving object detection has the advantage of low latency: the DVS is sensitive to high-speed moving objects, can quickly capture motion events and analyze them, and has a higher "time resolution" than the APS. In addition, the DVS sensor is highly sensitive to object motion and is not greatly affected by the scene light intensity, i.e., it can still identify moving objects in an excessively bright or excessively dark scene.
The DVS is applied to SLAM technology, and can provide accurate positioning and map reconstruction functions. This functionality is very useful in AR scene applications. In addition, based on DVS, the user can see the virtual information of the physical space through virtual-real fusion.
Illustratively, some examples of virtual information that enables a user to see physical space based on DVS are described below.
1. Virtual signage of an actual building, virtual presentation of building entrances, virtual signage of a campus service facility, and the like. Such as train stations, canteens, cafes, convenience stores, maternal and infant rooms, gymnasiums, charging posts, cash dispensers, toilets, etc.
2. Intelligent information such as temperature and humidity of indoor places, air quality, the number of building people, meeting room names, meeting issues and the like is displayed. Specifically, referring to fig. 96, fig. 96 is a schematic diagram of indoor navigation with DVS applied according to an embodiment of the present application.
3. 3D walking navigation supports real-time walking navigation indoor and outdoor. Specifically, referring to fig. 97 to 99, fig. 97 is an outdoor navigation schematic diagram with DVS applied according to an embodiment of the present application; fig. 98a is a schematic diagram of station navigation using DVS according to an embodiment of the present application; FIG. 98b is a view point navigation schematic diagram with DVS applied according to an embodiment of the present application; fig. 99 is a schematic diagram of market navigation using DVS according to an embodiment of the present application.
Specifically, in the whole scene space calculation, the pose estimation can be performed by using the fusion framework of the DVS camera and the traditional sensor (such as a camera) disclosed by the invention, so that the precision of the rapid movement, the high-dynamic environment acquisition and the environment with larger light intensity change is improved. For example, since DVS cameras are sensitive to changes in light intensity, image matching points can be found also at night, making night positioning possible.
The characteristics of high speed, high dynamic range and capability of detecting the light intensity change caused by movement of the DVS can solve the problem that the conventional SLAM is easy to fail in a rapid movement and high dynamic environment. In addition, the DVS only detects the light intensity change, the data redundancy is low, the acquisition power consumption (20 mW) and the bandwidth (100 kB/s) are small, the data volume input into the SLAM is small, and the power consumption of the SLAM can be obviously reduced.
After the pose estimation information is acquired in the full scene space, the method can be used for environment or scene recognition understanding.
Optionally, in order to improve the accuracy of scene understanding, precise positioning can be performed in combination with a high-accuracy map.
Finally, virtual information can be rendered and imaged at the corresponding position in the real environment based on the map information, the position estimation information and the current application requirements.
Referring to fig. 100, fig. 100 is a schematic flow chart of performing SLAM according to an embodiment of the present application. As shown in fig. 100, a SLAM system is deployed on a terminal, which may be, for example, a terminal of a robot, an unmanned aerial vehicle, or an unmanned aerial vehicle, and the terminal acquires input data by operating the SLAM system and performs a series of SLAM procedures based on the input data, thereby completing SLAM. The input data for performing the SLAM procedure may include, but is not limited to, one or more of event images, RGB images, depth images, and IMU data, among others. For example, if an event image sensor (e.g., DVS) and an RGB camera are disposed on the terminal, input data of the SLAM system on the terminal are an event image and an RGB image. For another example, if DVS, RGB camera, and depth camera are deployed on the terminal, input data of the SLAM system on the terminal is an event image, RGB image, and depth image.
Optionally, the event images mentioned in this embodiment may be optimized by the method flows corresponding to fig. 38 to fig. 44, so as to obtain clearer event images, which will not be described in detail below.
Alternatively, in practical applications, a plurality of devices for acquiring different types of input data may be disposed on the terminal, for example, DVS, RGB cameras, and depth cameras are disposed on the terminal, and the terminal may adaptively select data for performing SLAM according to a current scene. For example, in a high-speed motion or illumination abrupt scene, the terminal may select only an event image as data for performing SLAM. That is, the terminal may choose to turn on only the event image sensor and turn off the RGB camera and the depth camera in the scene; alternatively, the SLAM system in the terminal may acquire only data transmitted by the event image sensor, and not data transmitted by the RGB camera and the depth camera, in the process of performing SLAM.
In the process of performing the SLAM procedure by the SLAM system of the terminal, the terminal may perform pose estimation based on the input data and determine whether the input data is a key frame. And under the condition that the input data is a key frame, mapping is performed based on the determined key frame. In addition, the terminal can also continuously perform closed loop detection based on the determined key frame, and perform global optimization to continuously perform SLAM flow when a closed loop is detected.
For ease of understanding, the respective steps of the SLAM procedure performed by the terminal will be described below.
a. Pose estimation
Referring to fig. 101, fig. 101 is a schematic flow chart of a pose estimation method 10100 according to an embodiment of the present application. As shown in fig. 101, the pose estimation method 10100 includes the following steps.
In step 10101, a first event image and a target image are acquired, the target image comprising an RGB image or a depth image.
In this embodiment, the pose estimation method 10100 may be applied to a SLAM scene, and the main body for executing the pose estimation method 10100 may be a terminal (or the electronic device of fig. 1B described above), for example, a robot terminal, an unmanned vehicle terminal, or an unmanned vehicle terminal for executing SLAM.
In the present embodiment, the first event image is generated from the motion trajectory information when the target object moves within the monitoring range of the motion sensor. For example, the first event image may be a DVS event map: the terminal may be connected with or preset with a DVS, monitor the environment through the DVS, and acquire the DVS events corresponding to the environment, thereby obtaining the first event image. The terminal may also be connected with or preset with a red green blue (RGB) camera or a depth camera for capturing environmental information, through which the terminal may obtain an RGB image or a depth image of the environment, for example the above-mentioned target image. The RGB image, also called a true color image, uses the R, G and B components to identify the color of a pixel; R, G and B respectively represent the 3 basic colors of red, green and blue, and any color can be synthesized from the 3 primary colors.
A single event output by a DVS typically carries little information and is susceptible to noise. Therefore, in practical applications, an event image can be formed based on a plurality of consecutive events output by the DVS.
In one possible embodiment, the terminal may acquire N consecutive DVS events output by the DVS, and integrate the N consecutive DVS events into a first event image, N being an integer greater than 1. In practical application, the value of N may be adjusted according to practical situations, for example, the value of N is determined to be a value of 4, 5 or 10 according to sensitivity of DVS and accuracy requirement of SLAM, and the embodiment does not specifically limit the value of N.
In this embodiment, after acquiring the event image and the RGB image, the terminal may perform a time-sequence alignment operation on the event image and the RGB image to obtain the time-sequence aligned event image and RGB image, so as to perform pose estimation based on the time-sequence aligned event image and RGB image. Because the terminal is continuously moved in the process of acquiring the event image and the RGB image, and the event image and the RGB image are acquired based on different devices, in order to ensure that the event image and the RGB image can be used together for performing subsequent pose estimation, a time sequence alignment operation needs to be performed. After the time sequence alignment operation is performed, it can be ensured that the event image and the RGB image which are aligned in time sequence are acquired at the same time or at similar times, that is, it is ensured that the environmental information acquired by the event image and the RGB image is the same.
For example, after acquiring a first event image and a target image, a terminal may determine an acquisition time of the first event image and an acquisition time of the target image. Then, the terminal may determine that the first event image is time-aligned with the target image according to a time difference between the acquisition time of the target image and the acquisition time of the first event image being less than a second threshold. The second threshold may be determined according to the accuracy of SLAM and the frequency of RGB image acquired by the RGB camera, for example, the second threshold may be 5 ms or 10 ms, which is not specifically limited in this embodiment.
Since the first event image is obtained by integrating N consecutive DVS events, the terminal may determine the acquisition time of the first event image according to the acquisition times corresponding to the N consecutive DVS events, that is, determine the acquisition time of the first event image as the time period from when the first DVS event is acquired to when the last DVS event is acquired among the N consecutive DVS events. The acquisition time of the target image may refer to the time at which the terminal receives the target image from the RGB camera. Since the acquisition time corresponding to the first event image is actually a period of time and the acquisition time of the target image is a single moment, the terminal can determine the time difference between them based on whether the acquisition time of the target image falls within the range of the acquisition time corresponding to the first event image. For example, if the acquisition time of the target image is within the time period of the acquisition time corresponding to the first event image, it may be determined that the time difference between them is 0 (i.e., the time difference is smaller than the second threshold); if the acquisition time of the target image is not within the time period of the acquisition time corresponding to the first event image, the time difference may be determined based on the acquisition time of the first DVS event or of the last DVS event of the first event image.
For example, assuming that N is 4, the moments when the terminal acquires 4 consecutive DVS events integrated into the first event image are t1, t2, t3, t4, respectively, the terminal may determine a period of time when the acquisition time of the first event image is t1 to t 4; in addition, the time of the target image acquired by the terminal is t5, and the time t5 is out of the range of the time period t1-t 4. In this way, the terminal can determine the time difference 1 between the time t5 and the time t1 and the time difference 2 between the time t5 and the time t4, and if either one of the time difference 1 or the time difference 2 is smaller than the second threshold value, can determine that the first event image is time-aligned with the target image.
For example, referring to fig. 102, fig. 102 is a schematic diagram of integrating DVS events into event images according to an embodiment of the present application. As shown in fig. 102, each point in the first row represents a DVS event acquired by the DVS camera. In fig. 102, every 4 DVS events are integrated into an event image, and the integrated event images are Wk, Wk+1, Wk+2, Wk+3, Wk+4, Wk+5, and Wk+6. It can be seen that, since the time interval between every two DVS events is different, the integration time of the event images integrated from different groups of 4 DVS events is also different. In fig. 102, the vertical dashed lines represent the RGB images acquired by the RGB camera, and tk, tk+1, tk+2, tk+3, and tk+4 are the times at which the RGB camera acquires RGB image k, RGB image k+1, RGB image k+2, RGB image k+3, and RGB image k+4, respectively. As can be seen from fig. 102, the time difference between the acquisition time of the event image Wk and the acquisition time of the RGB image k is smaller than the second threshold, so it can be determined that the event image Wk is time-aligned with the RGB image k; the time differences between the acquisition time of the event image Wk+1 and the acquisition times of the RGB image k and the RGB image k+1 are both greater than the second threshold, so it can be determined that the event image Wk+1 (i.e., the second event image) does not have a time-aligned RGB image.
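The alignment test described above can be sketched as follows; treating a timestamp that falls inside the event-image span as a zero time difference follows the example in the text, and the 5 ms threshold is only the example value mentioned above.

def is_time_aligned(event_window, rgb_time, second_threshold=0.005):
    """Sketch: an event image whose N events span [t_first, t_last] is treated
    as time-aligned with an RGB frame if the frame's timestamp falls inside
    that span, or within second_threshold of either end. Times in seconds."""
    t_first, t_last = event_window
    if t_first <= rgb_time <= t_last:
        return True                                   # time difference treated as 0
    gap = min(abs(rgb_time - t_first), abs(rgb_time - t_last))
    return gap < second_threshold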
In step 10102, an integration time of the first event image is determined.
In the case where the first event image is integrated from a plurality of consecutive DVS events, the integration time of the first event image may then be the time interval between the plurality of consecutive DVS events. In brief, the terminal may determine the integration time of the first event image by determining a time interval between a last DVS event and a first DVS event of the plurality of consecutive DVS events.
For example, assuming that N is 4, the moments when the terminal acquires 4 consecutive DVS events integrated into the first event image are t1, t2, t3, t4, respectively, the terminal may determine that the integration time of the first event image is t4-t1.
Step 10103 determining not to perform pose estimation by the target image if the integration time is less than a first threshold.
In this embodiment, when the terminal is in a scene of high-speed movement or abrupt illumination change, since the environmental information changes rapidly, the DVS may capture a large number of events in a short time, that is, output a large number of events in a short time, so that the integration time corresponding to the event images obtained by a fixed number of events is small. In this case, since the environmental information changes rapidly, it is often difficult for the RGB camera to capture effective environmental information, for example, in a case where a scene has a high-speed moving object, it is often difficult for the RGB camera to capture the high-speed moving object in an RGB image, that is, a region corresponding to the high-speed moving object in the RGB image is a blurred region.
Accordingly, when the integration time corresponding to the first event image is less than the first threshold value, the terminal may determine that the RGB image quality is low or even effective information of the environment cannot be captured, and thus may determine that pose estimation is not performed through the target image. The value of the first threshold may be determined according to the accuracy requirement of SLAM, for example, the value of the first threshold may be 5 ms or 7 ms, which is not specifically limited in this embodiment. That is, when the terminal calculates the pose at the acquisition time corresponding to the first event image, the terminal performs pose estimation based on only the first event image even if the terminal can determine that the first event image has a time-aligned target image. Thus, the situation that the pose estimation effect is poor due to the fact that the quality of the target image is low when the pose estimation is jointly performed by adopting the first event image and the target image can be effectively avoided.
In one possible embodiment, the terminal may determine that pose estimation is not performed by RGB images corresponding to the first event image after obtaining the first event image and determining that the integration time of the first event image is less than the first threshold value, in addition to determining that pose estimation is not performed by the target image based on the integration of the first event image being less than the first threshold value after determining that the target image of the first event image is time-aligned. That is, the terminal may perform the pose estimation operation directly through the first event image without performing the timing alignment operation on the first event image, thereby saving the resource overhead of performing the timing alignment operation.
Step 10104, performing pose estimation according to the first event image.
After determining that pose estimation is not performed by the target image, the terminal may perform pose estimation according to the first event image, so as to calculate and obtain the pose of the terminal corresponding to the moment of acquiring the first event image.
In this embodiment, when the terminal determines that the current scene is in a scene where the RGB camera is difficult to collect effective environmental information based on the integration time of the event image is less than the threshold, the terminal determines not to perform pose estimation through the RGB image with poor quality, so as to improve the precision of pose estimation.
The above describes in detail the process of the terminal performing pose estimation when the terminal is in a scene of high-speed motion or abrupt illumination change; the process of the terminal performing pose estimation when the terminal is in a static or low-speed moving scene will be described below.
In one possible embodiment, in the case that the input signal acquired by the terminal includes DVS event, RGB image and IMU data, the terminal integrates the DVS event to obtain an event image, and performs time alignment on the event image, RGB image and IMU data to perform pose estimation according to the time alignment result.
Specifically, the process of the terminal performing the timing alignment operation is as follows:
When the terminal integrates the event image based on N continuous DVS events, the terminal determines whether the time difference between the acquisition time of the RGB image and the IMU data adjacent to the event image and the acquisition time of the event image is smaller than a second threshold value so as to determine whether the event image has the RGB image and/or the IMU data aligned in time sequence.
Since the frequency of IMU acquisition data is much greater than the frequency of RGB camera acquisition images, it can be considered that IMU data aligned with RGB image timing exists at any time. Thus, when the terminal acquires an RGB image, the terminal may determine whether a time difference between the acquisition time of the RGB image and the acquisition time of an adjacent event image is less than a second threshold value to determine whether the RGB image has a time-aligned event image.
Based on the above process of performing the timing alignment operation by the terminal, the terminal may obtain a plurality of possible signal combinations for the timing alignment after performing the timing alignment operation. Specifically, the various possible time aligned signal combinations are shown below.
1. Event images, RGB images, and IMU data.
After the terminal acquires the event image, the terminal determines the acquisition time of the RGB image adjacent to the event image. When the time difference between the acquisition time of the RGB image and the acquisition time of the event image is smaller than a second threshold value, the terminal determines that the event image and the RGB image are aligned in time sequence. Since the frequency of IMU acquisition data is much greater than the frequency of RGB camera acquisition images, it can be considered that IMU data aligned with RGB image timing exists at any time. Thus, after determining that the event image and the RGB image are time-aligned, IMU data aligned with the event image and the RGB image may be acquired.
In this case, the terminal may perform pose estimation from the event image, the RGB image, and the IMU data.
2. RGB image and IMU data.
After the acquisition terminal acquires the RGB image, the terminal determines whether an event image aligned with the RGB image in time sequence exists, that is, whether a time difference between the acquisition time of the RGB image and the acquisition time of an adjacent event image is smaller than a second threshold. If the time difference between the acquisition time of the RGB image and the acquisition time of the adjacent event image is not less than the second threshold, the terminal determines that the RGB image does not have the event image aligned in time sequence, i.e. the RGB image has only the IMU data aligned in time sequence.
In this case, the terminal may perform pose estimation from the RGB image and IMU data.
3. Event images and IMU data.
After the terminal acquires the event image, the terminal determines the acquisition time of the RGB image adjacent to the event image. When the time difference between the event image and the acquisition time of the adjacent RGB image is not less than the second threshold, the terminal determines that the event image does not have a time-aligned RGB image. After determining that the event image does not have a time-aligned RGB image, the terminal may continue to determine whether there is IMU data aligned with the event image in time sequence. Specifically, the terminal determines the acquisition time of the IMU data adjacent to the event image. If the time difference between the acquisition time of the IMU data adjacent to the event image and the acquisition time of the adjacent event image is less than a third threshold, the terminal determines that the event image is time-aligned with the IMU data.
In this case, the terminal may perform pose estimation from the event image and IMU data.
4. Event images.
Similarly, after the terminal acquires the event image, the terminal determines the acquisition time of the RGB image adjacent to the event image. When the time difference between the event image and the acquisition time of the adjacent RGB image is not less than the second threshold, the terminal determines that the event image does not have a time-aligned RGB image. After that, the terminal may continue to determine whether there is IMU data aligned with the event image in time sequence. Specifically, the terminal determines the acquisition time of the IMU data adjacent to the event image. If the time difference between the acquisition time of the IMU data adjacent to the event image and the acquisition time of the adjacent event image is not less than the third threshold, the terminal determines that the event image also does not have time-aligned IMU data.
In this case, the terminal may perform pose estimation from the event image.
Illustratively, the terminal acquires a second event image, which is an image representing the motion trajectory of the target object when the target object moves within the detection range of the motion sensor. The time periods corresponding to the second event image and the first event image are different, that is, the time period over which the motion sensor detects the first event image is different from the time period over which it detects the second event image. If there is no target image aligned with the second event image in time sequence, it is determined that the second event image has no target image for jointly performing pose estimation. Thus, the terminal performs pose estimation from the second event image, for example from the second event image only, or from the second event image and IMU data.
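The four signal combinations, together with the fast-motion rule of the previous sub-section, can be summarized in a selection sketch such as the following (reusing is_time_aligned from the earlier sketch). The assumption that aligned IMU data always exists when an RGB image is aligned follows the text; the threshold values are illustrative.

def select_pose_inputs(event_window, rgb_time=None, imu_time=None,
                       first_threshold=0.005, second_threshold=0.005,
                       third_threshold=0.005):
    """Sketch: decide which time-aligned inputs to feed to pose estimation.
    If the event image's integration time is below first_threshold (high-speed
    motion / sudden illumination change), the RGB image is not used even when
    it is aligned."""
    t_first, t_last = event_window
    integration_time = t_last - t_first

    use_rgb = (rgb_time is not None and
               is_time_aligned(event_window, rgb_time, second_threshold) and
               integration_time >= first_threshold)
    use_imu = use_rgb or (imu_time is not None and
                          min(abs(imu_time - t_first), abs(imu_time - t_last))
                          < third_threshold)
    return {"event_image": True, "rgb_image": use_rgb, "imu": use_imu}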
In one possible embodiment, where a sliding window based visual inertial pose estimation is employed, the process of pose estimation is essentially a joint optimization process of the cost function. Based on the above-described various possible time-aligned signal combinations, a cost function for different signal combinations can be obtained.
Illustratively, for the case where the time-aligned signal combination is event image, RGB image, and IMU data, the cost function includes three terms: the weighted reprojection error of the event camera, the weighted reprojection error of the RGB camera, and the inertial error term. Specifically, the cost function is shown in equation 35.
Wherein J represents the cost function; i represents the camera index, with i = 0 denoting the event camera and i = 1 denoting the RGB camera; k represents the frame index; j represents the landmark index; the remaining terms are, respectively, the set of landmark indices of sensor i kept in the k-th frame, the information matrix of the landmark measurements, and the information matrix of the k-th IMU error; and e_s represents the inertial error term.
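The drawing that carries equation 35 is not reproduced in this text. Assuming it takes the standard visual-inertial joint-optimization form implied by the terms listed above, it can be sketched as:

\[
J(\mathbf{x}) \;=\; \sum_{i=0}^{1}\sum_{k}\sum_{j\in\mathcal{J}(i,k)}
  {\mathbf{e}_r^{i,j,k}}^{\!\top}\,\mathbf{W}_r^{i,j,k}\,\mathbf{e}_r^{i,j,k}
\;+\; \sum_{k}{\mathbf{e}_s^{k}}^{\!\top}\,\mathbf{W}_s^{k}\,\mathbf{e}_s^{k}
\]

where, in this sketch's own naming, e_r^{i,j,k} is the reprojection error of landmark j observed by camera i in frame k, J(i,k) is the set of landmarks of sensor i kept in frame k, W_r^{i,j,k} and W_s^{k} are the information matrices of the landmark measurement and of the k-th IMU error, and e_s^{k} is the inertial error term; the symbols are placeholders rather than the exact notation of equation 35.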
Optionally, when the integration time of the event image is smaller than the first threshold, that is, when the terminal is in a scene of high-speed motion or abrupt illumination, the cost function is not obtained through the RGB image, so as to improve the pose estimation precision. That is, the cost function includes two terms, namely, weighted reprojection error and inertial error terms of the event camera. In brief, the modification may be performed on the basis of the formula 35, so that the value range of i is 0, thereby obtaining a formula corresponding to the cost function.
For the combination of the time sequence aligned signals into RGB image and IMU data, the terminal only obtains the cost function through the RGB image and the IMU data. That is, the cost function includes two terms, namely, weighted reprojection error and inertial error terms of the RGB camera. In brief, the modification may be performed on the basis of the formula 35, so that the value range of i is 1, thereby obtaining a formula corresponding to the cost function.
And combining the time sequence aligned signals into event images and IMU data, and solving a cost function by the terminal only through the event images and the IMU data. That is, the cost function includes two terms, namely, weighted reprojection error and inertial error terms of the event camera.
For the time sequence aligned signal combination to be event image, the terminal only obtains the cost function through the event image. That is, only one term is included in the cost function, which is the weighted re-projection error of the event camera.
In the pose estimation process, pose estimation is often a recursive process, that is, the pose of the current frame is calculated from the pose of the previous frame. Thus, errors in the pose estimation process are transferred frame by frame and form an accumulated error. Therefore, in the process of performing pose estimation, the terminal may also perform loop detection, so as to reduce the accumulated error of the pose estimation and improve its accuracy. Loop detection, also called closed-loop detection, refers to the process of recognizing that the terminal has arrived at a scene it has visited before, so that the map can be closed. Through loop detection, it can be determined whether the terminal has returned to a previously passed position. If a loop is detected, the information is passed to the back end for optimization processing, so as to eliminate the accumulated error.
In one possible embodiment, the terminal performs loop detection according to the first event image and a dictionary constructed based on the event image.
Alternatively, before performing loop-back detection, the terminal may construct a dictionary in advance based on the event image so that loop-back detection can be performed based on the dictionary in the course of performing loop-back detection. Specifically, the process of constructing the dictionary by the terminal includes: the terminal acquires a plurality of event images, wherein the event images are event images used for training, and the event images can be event images shot by the terminal under different scenes. The terminal acquires visual features of the plurality of event images through a feature extraction algorithm, wherein the visual features can comprise features such as textures, patterns or gray statistics of the images. Event images captured in different scenes have different visual features, and thus individual elements in the scene can be represented by the visual features. After the visual features of a plurality of event images are obtained, the terminal clusters the visual features through a clustering algorithm to obtain clustered visual features, wherein the clustered visual features have corresponding descriptors. By clustering visual features, similar visual features can be grouped into a class to facilitate subsequent execution of matching of visual features. And constructing the dictionary according to the clustered visual characteristics.
Illustratively, after the terminal extracts visual features from the plurality of event images, a descriptor corresponding to each visual feature may be obtained, for example an ORB descriptor or a BRIEF descriptor, where the descriptor is used to represent the visual feature. Then, the visual features are classified into K clusters through a hierarchical K-means clustering algorithm or a K-means++ clustering algorithm, and each cluster is described by its centroid, thereby obtaining a descriptor for each cluster. The quality of the visual-feature clustering can generally be expressed by the sum of squared errors (SSE) within a cluster: the smaller the SSE, the closer the data points of a cluster are to its centroid, and the better the clustering effect. The "proximity" here may be implemented using different distance measures, which may also affect the clustering effect.
In the process of constructing the dictionary, all N clustered descriptors can be distributed over the leaf nodes of a k-branch, d-depth k-ary tree, thereby obtaining a tree structure with k^d leaf nodes. In practical applications, the values of k and d may be adjusted according to the scene size and the effect to be achieved. Thus, when retrieving the visual features of an event image, their corresponding cluster centers can be found with logarithmic time complexity (on the order of d = log_k N), without requiring an expensive brute-force search.
After obtaining the dictionary, the terminal performs loop detection according to the first event image and the dictionary, which specifically may include: the terminal determines a descriptor of the first event image, for example, the terminal extracts visual features in the first event image through a feature extraction algorithm, and determines a descriptor of the extracted visual features. The terminal determines visual features corresponding to the descriptors of the first event image in the dictionary, e.g., the terminal retrieves visual features matching the descriptors of the first event image in a k-ary tree of the dictionary. And the terminal determines a bag-of-word vector (BoW vector) corresponding to the first event image based on the visual features, and determines the similarity between the bag-of-word vector corresponding to the first event image and the bag-of-word vectors of other event images so as to determine the event image matched with the first event image.
In short, a dictionary built by a terminal based on event images can be considered as a set of all visual features in the entire scene. The terminal determines the corresponding visual feature in the dictionary based on the current event image as: the terminal searches the dictionary for visual features included in the current event image. Based on the visual features included in the current event image, a bag of words vector may be constructed, for example, in which the visual features included in the current event image are represented as 1 and the visual features not included in the current event image are represented as 0. Finally, by comparing the similarity of the bag of words vectors between different event images, it can be determined whether there is a match between the two event images. If the similarity of the bag-of-words vectors between the two event images is greater than or equal to a preset threshold value, the two event images can be determined to be matched; if the similarity of the bag of words vectors between the two event images is less than a preset threshold, it may be determined that the two event images do not match.
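A minimal sketch of the bag-of-words matching step follows. The brute-force nearest-centroid lookup stands in for the k-ary tree search, Euclidean distance stands in for the Hamming distance normally used with binary descriptors, and the 0/1 word weighting and similarity threshold are illustrative.

import numpy as np

def bow_vector(descriptors, vocabulary):
    """Sketch: map each feature descriptor to its nearest visual word and mark
    that word as present. vocabulary is a (K, D) array of cluster centroids
    built offline from training event images."""
    v = np.zeros(len(vocabulary), dtype=np.float32)
    for d in descriptors:
        word = int(np.argmin(np.linalg.norm(vocabulary - d, axis=1)))
        v[word] = 1.0                      # visual word present in this image
    return v

def is_loop_closure(desc_current, desc_candidate, vocabulary, sim_thresh=0.8):
    """Sketch: compare two event images via cosine similarity of their
    bag-of-words vectors; above the threshold they are treated as a match."""
    a = bow_vector(desc_current, vocabulary)
    b = bow_vector(desc_candidate, vocabulary)
    sim = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return sim >= sim_thresh               # matched -> candidate loop closure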
b. Key frame
A keyframe is an image frame in a video or image set that can represent the key information of the video or image set, and typically there is a large change between the two segments of content that the keyframe connects. As the volume of video data increases, key frames are widely used in the fields of video analysis, video coding, security monitoring, robot vision and the like.
Key frame selection is an essential part of video analysis by which appropriate video summaries for video indexing, browsing, retrieving, and detecting, etc., can be provided. The use of the key frames reduces the redundancy of the video data stream in content and time, not only effectively reflects video content, but also remarkably reduces video analysis time.
In the video coding process, a coding key frame needs to be dynamically added to ensure that the image quality and the coding bit rate meet the requirements. In general, key frames need to be inserted when there is a significant change in the image content.
Video surveillance is widely used in many fields as an effective means of security protection, and since all-weather surveillance generates a large amount of video data, but only a small portion of the video data is of interest to the user, the key frame extraction technique can screen out key frame sequences of interest to the user.
With the rapid rise of robots, unmanned aerial vehicles, unmanned vehicles, AR/VR and other devices in recent years, visual simultaneous localization and mapping (VSLAM) has also become one of the key technologies in this field. VSLAM refers to the process of constructing an environment map while calculating the position of a moving object according to the information of the sensors, and can be used to realize the positioning of the moving object, as well as subsequent path planning and scene understanding. In VSLAM, key frames are usually used for mapping and positioning, so that the problems of reduced system real-time performance, increased calculation cost and excessive system memory consumption caused by frame-by-frame insertion can be avoided.
In the fields of video analysis, video coding, security monitoring, robot vision and the like, DVS may be used to obtain a corresponding event image, and a required key frame may be selected from the obtained plurality of event images.
In the related art, the key frame selection method for the event image is to perform complex calculations such as feature extraction and pose estimation on all event images, and then determine whether the event image is a key frame. The calculation amount of the scheme is large because complex calculation is required for all event images.
Referring to fig. 103, fig. 103 is a flowchart of a key frame selecting method 10300 according to an embodiment of the disclosure. As shown in fig. 103, the key frame selection method 10300 includes the following steps.
In step 10301, an event image is acquired.
In this embodiment, the key frame selection method 10300 may be applied to video analysis, video encoding and decoding, security monitoring, or other scenarios, and the main body performing the key frame selection method 10300 may be a terminal or a server, for example, a server for performing video analysis, a terminal or a server for performing video encoding and decoding, or a terminal for performing monitoring. For convenience of description, the key frame selection method 10300 provided in the embodiments of the present application will be described below by taking a terminal as an execution body.
In this embodiment, the terminal may be connected to or preset with a DVS, and monitor the target environment through the DVS, and acquire an event image corresponding to the target environment.
Step 10302, determining first information of the event image.
Wherein the first information may include an event and/or feature in the event image, and the terminal may determine the first information by detecting the event and/or feature in the event image.
In one possible example, if a pixel in the event image represents a trend of change in light intensity, the event image is a binary image, and the terminal may determine a pixel having a pixel value other than 0 as an event in the event image. That is, the number of pixels having a value other than 0 is the number of events in the event image.
In another possible example, if the pixels in the event image represent absolute light intensity, the event image is a gray-scale image. In this case, the terminal may determine that the pixels whose values exceed a certain threshold are events in the event image, i.e., the number of pixels whose values exceed the threshold is the number of events in the event image. Alternatively, the terminal may subtract, pixel by pixel, the event image of the previous adjacent time from the current event image, take the absolute value, and determine the pixels whose absolute difference exceeds a certain threshold as events in the event image.
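For illustration only, the following Python sketch shows the two event-counting rules described above. It is not part of the embodiment; the function names and the example threshold value are assumptions.

```python
import numpy as np

def count_events_binary(event_image: np.ndarray) -> int:
    # Binary event image: any pixel with a non-zero value is an event.
    return int(np.count_nonzero(event_image))

def count_events_gray(event_image: np.ndarray, threshold: int = 30) -> int:
    # Gray-scale event image: pixels whose value exceeds a threshold are events.
    return int(np.count_nonzero(event_image > threshold))

def count_events_gray_diff(event_image: np.ndarray,
                           prev_event_image: np.ndarray,
                           threshold: int = 30) -> int:
    # Alternative: subtract the previous-time event image pixel by pixel,
    # take the absolute value, and count pixels exceeding the threshold.
    diff = np.abs(event_image.astype(np.int32) - prev_event_image.astype(np.int32))
    return int(np.count_nonzero(diff > threshold))
```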
For features in the event image, the terminal may extract features in the event image through a feature extraction algorithm. The feature extraction algorithm may include, but is not limited to, features from accelerated segment test (FAST), oriented FAST and rotated BRIEF (ORB), speeded up robust features (SURF), scale-invariant feature transform (SIFT), and the like. After extracting the features in the event image, the terminal can determine the feature quantity of the event image by counting the extracted features.
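As an illustration of counting features with one of the algorithms named above, the following sketch uses OpenCV's ORB detector. It is only one possible realization under assumed parameters (number of features, 8-bit input image), not the method of the embodiment.

```python
import cv2
import numpy as np

def count_features(event_image: np.ndarray, max_features: int = 500) -> int:
    # ORB combines the FAST corner detector with the rotated BRIEF descriptor.
    # event_image is assumed to be an 8-bit single-channel image.
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints = orb.detect(event_image, None)
    return len(keypoints)
```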
Step 10303, if it is determined that the event image satisfies the first condition based on the first information, determining that the event image is a key frame.
In one possible example, the first condition may include one or more of a number of events greater than a first threshold, a number of event active areas greater than a second threshold, a number of features greater than a third threshold, and a number of feature active areas greater than a fourth threshold. That is, the terminal may determine that the event image is a key frame when it is determined that the event image satisfies one or more of the number of events being greater than the first threshold, the number of event effective areas being greater than the second threshold, the number of features being greater than the third threshold, and the number of feature effective areas being greater than the fourth threshold based on the first information.
In case the first condition includes that the number of event effective areas is greater than the second threshold, the terminal may divide the event image into a plurality of areas and determine the number of events in each area. When the number of events of a region is greater than a certain threshold, the terminal may determine that the region is an effective region. In this way, the terminal can determine whether the event image satisfies the first condition by counting whether the number of effective areas is greater than the second threshold. The threshold corresponding to the number of events in the area may be determined according to the division manner of the area, and the embodiment does not specifically limit the threshold.
There are various ways in which the terminal divides the event image into a plurality of areas. In one possible implementation, the terminal may divide the event image into a plurality of regions equally, for example equally dividing the event image into 1030 regions, each of which has the same area. In another possible implementation, the terminal may divide the event image into a plurality of regions unevenly, for example, the region divided at the middle position of the event image has a smaller area, and the region divided at the edge position of the event image has a larger area. For example, referring to fig. 104, fig. 104 is a schematic view of region division of an event image according to an embodiment of the present application. As shown in (a) of fig. 104, the event image is uniformly divided into 1030 areas, each of which has the same area. As shown in (b) in fig. 104, the event image is unevenly divided into a plurality of areas, and the area size of the area at the edge position of the event image is 4 times the area size of the area at the intermediate position.
In the case where the first condition includes that the number of feature effective regions is greater than the fourth threshold, the terminal may divide the event image into a plurality of regions and determine the number of features in each region. When the feature quantity of the region is greater than a certain threshold, the terminal can determine that the region is a feature effective region. In this way, the terminal can determine whether the event image satisfies the first condition by counting whether the number of feature effective areas is greater than the fourth threshold. The threshold corresponding to the number of features in the region may be determined according to a division manner of the region, and the embodiment is not specifically limited to the threshold.
In addition, the manner in which the terminal divides the event image into a plurality of areas is similar to the manner in which the event effective area is determined, and specific reference may be made to the above description, which is not repeated here.
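A minimal sketch of the effective-area check described above, assuming an even division of the event image into a grid; the grid size and the per-region threshold are illustrative assumptions, not values taken from the embodiment.

```python
import numpy as np

def count_effective_regions(event_image: np.ndarray,
                            rows: int = 8, cols: int = 8,
                            per_region_threshold: int = 20) -> int:
    # Divide the event image evenly into rows x cols regions and count the
    # regions whose number of events exceeds the per-region threshold.
    h, w = event_image.shape[:2]
    effective = 0
    for r in range(rows):
        for c in range(cols):
            block = event_image[r * h // rows:(r + 1) * h // rows,
                                c * w // cols:(c + 1) * w // cols]
            if np.count_nonzero(block) > per_region_threshold:
                effective += 1
    return effective

def satisfies_first_condition(event_image: np.ndarray,
                              second_threshold: int = 10) -> bool:
    # First-condition check based only on the number of event effective areas.
    return count_effective_regions(event_image) > second_threshold
```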
In one possible embodiment, the terminal may acquire a plurality of event images generated by the DVS and aligned in time sequence. In that case, any one of these frames may be selected to perform the above key frame selection; if that frame satisfies the above first condition, the current plurality of event images are all determined as key frames.
In one possible embodiment, the event image may also have time-aligned depth images. For example, in the case of a depth camera configured, a depth image aligned with the event image timing may be generated by the depth camera. In this case, if the event image satisfies the first condition, both the event image and the depth image time-aligned with the event image may be determined as the key frame.
In one possible embodiment, the event image may also have a corresponding RGB image that is time-aligned with the event image. That is, the terminal can acquire the event image and the RGB image aligned in time series. In this case, the terminal may acquire the feature number and/or the feature effective area corresponding to the RGB image. Whether the RGB image satisfies a specific condition is determined by determining whether the number of features corresponding to the RGB image is greater than a specific threshold and/or whether the feature effective area corresponding to the RGB image is greater than a specific threshold. In this way, the terminal can determine whether to determine the RGB image and the event image as a key frame by determining whether the RGB image satisfies a specific condition and/or whether the event image satisfies a first condition. The specific threshold value corresponding to the number of features of the RGB image and the specific threshold value corresponding to the feature effective region of the RGB image may be different threshold values.
For example, when the terminal determines that the number of features corresponding to the RGB image is greater than a specific threshold, or that the feature effective area corresponding to the RGB image is greater than a specific threshold, the terminal may determine the RGB image and the corresponding event image as key frames. When the terminal determines that the event image satisfies the first condition, the terminal may also determine the event image and the corresponding RGB image as key frames.
In this embodiment, whether the current event image is a key frame is determined by determining information such as the number of events, the event distribution, the feature number and/or the feature distribution in the event image, so that the key frame can be quickly selected, the algorithm amount is small, and the quick selection of the key frame in the scenes such as video analysis, video encoding and decoding or security monitoring can be satisfied.
Referring to fig. 105, fig. 105 is a flowchart of a key frame selection method 10500 according to an embodiment of the present application. As shown in fig. 105, the key frame selection method 10500 includes the following steps.
In step 10501, an event image is acquired.
Step 10502, determining first information for the event image, the first information including events and/or features in the event image.
In this embodiment, the key frame selection method 10500 may be applied to a VSLAM scene, and the main body performing the key frame selection method 10500 may be a terminal, such as a robot terminal, an unmanned aerial vehicle terminal, or an unmanned vehicle terminal.
Step 10501 and step 10502 are similar to step 10301 and step 10302 described above, and reference may be made to step 10301 and step 10302 described above, which are not repeated here.
In step 10503, if it is determined, based on the first information, that the event image meets a first condition, second information of the event image is determined, where the second information includes motion features and/or pose features in the event image, and the first condition is related to the number of events and/or the number of features.
In this embodiment, determining that the event image satisfies the first condition based on the first information is similar to the step 10303, and the specific reference may be made to the step 10303, which is not repeated herein.
The second information may include motion features and/or pose features in the event image; the manner in which the terminal determines the second information is described below.
In one possible embodiment, the terminal may determine the second information by employing an epipolar constraint method. The epipolar constraint method includes the following steps.
The terminal initializes the three-dimensional pose of the first keyframe (i.e., the first event image determined to be a keyframe) to the origin of the coordinate system.
The terminal determines the features of the current event image and matches them against the features of the last key frame to obtain matching point pairs. The method of matching the features of the event image with the features of the previous key frame includes, but is not limited to, brute-force search, i.e., traversing the features in the event image and determining one by one whether each feature has a matching feature in the previous key frame.
The terminal selects, from the above matching point pairs and by means of a random sample consensus (RANSAC) algorithm, as large a subset of samples as possible that fits a 6-degree-of-freedom relative motion model. When the number of matching point pairs conforming to the relative motion model is larger than a preset threshold, a least squares method is applied to the found matching point pairs to calculate the three-dimensional relative motion matrix between the current event image and the key frame image. From the calculated relative motion matrix, the terminal can then derive the motion change of the current event image relative to the last key frame, namely the motion feature and the pose feature, as illustrated by the sketch below.
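A sketch of the epipolar-constraint step under stated assumptions: OpenCV's essential-matrix estimation with RANSAC stands in for the generic relative-motion model, the camera intrinsic matrix K is assumed to be known, and the recovered translation is only defined up to scale. Function names and parameter values are illustrative.

```python
import cv2
import numpy as np

def relative_motion(prev_keyframe: np.ndarray, curr_image: np.ndarray,
                    K: np.ndarray, min_inliers: int = 30):
    """Estimate rotation R and (unit-scale) translation t of the current
    event image relative to the last key frame via the epipolar constraint."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(prev_keyframe, None)
    kp2, des2 = orb.detectAndCompute(curr_image, None)
    if des1 is None or des2 is None:
        return None
    # Brute-force descriptor matching (the "brute-force search" above).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < min_inliers:
        return None
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # RANSAC selects the inlier subset consistent with one relative-motion model.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    if E is None or int(mask.sum()) < min_inliers:
        return None
    # Recover R, t from the essential matrix using the inlier matches.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```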
Step 10504, if it is determined based on the second information that the event image satisfies a second condition, determining that the event image is a key frame, where the second condition is related to a motion variation amount and/or a pose variation amount.
Wherein the second condition may include: one or more of a distance of the current event image from the last key frame exceeding a preset distance value (which may be, for example, 10 mm), a rotation angle of the current event image from the last key frame exceeding a preset angle value (which may be, for example, 10 °), a distance of the current event image from the last key frame exceeding a preset distance value and a rotation angle exceeding a preset angle value, and a distance of the current event image from the last key frame exceeding a preset distance value or a rotation angle exceeding a preset angle value.
That is, the terminal may determine whether the event image satisfies the second condition based on one or more of the motion variation amount and the pose variation amount in the event image, thereby determining whether the event image can be used as a key frame.
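For illustration, a small sketch of the second-condition check using the R and t produced by a relative-motion estimate such as the one above. The threshold values follow the example values in the text (10 mm, 10°); the function name and the "and/or" mode parameter are assumptions.

```python
import numpy as np

def satisfies_second_condition(R: np.ndarray, t: np.ndarray,
                               dist_threshold: float = 10.0,
                               angle_threshold_deg: float = 10.0,
                               mode: str = "or") -> bool:
    """Check the motion/pose change of the current event image against the
    last key frame using distance and rotation-angle thresholds."""
    distance = float(np.linalg.norm(t))
    # Rotation angle recovered from the trace of the rotation matrix.
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = float(np.degrees(np.arccos(cos_angle)))
    if mode == "and":
        return distance > dist_threshold and angle_deg > angle_threshold_deg
    return distance > dist_threshold or angle_deg > angle_threshold_deg
```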
In this embodiment, the coarse screening condition is set, the images are coarsely screened with smaller calculation amount, the images meeting the coarse screening condition are added into the key frame sequence, and then the images in the key frame sequence are screened based on the fine screening condition, so that the key frame is selected. Because the images which partially do not meet the conditions are screened in advance through smaller calculation amount, and then the key frame selection is carried out in the images which meet the conditions, calculation with larger calculation amount is not needed to be carried out on all the images, and the calculation amount can be reduced.
In a possible embodiment, in the case that the input acquired by the terminal is a multi-path event image stream, the terminal may select any 1 path event image to calculate motion features and pose features according to the above method; the terminal can also select any 2 paths or multiple paths of event images, and calculate the motion characteristics and the pose characteristics in a binocular or multi-purpose mode.
In one possible embodiment, the event image may also have synchronized depth images. For example, in the case where a depth camera is configured, a depth image synchronized with the event image may be generated by the depth camera. In this case, if the event image satisfies the first condition and the second condition, the terminal may determine the event image and the corresponding depth image as key frames. Further, in step 10503, when the depth image corresponding to the event image is available, the terminal can obtain the corresponding depth information, so the terminal may also calculate the motion feature and the pose feature through a Perspective-n-Point (PnP) algorithm or an iterative closest point (ICP) algorithm.
In one possible embodiment, the event image may also have a synchronized RGB image that is time aligned with the event image. That is, the terminal can acquire the event image and the RGB image aligned in time series. In this case, the terminal may determine whether to determine the event image and its corresponding RGB image as the key frame by judging whether the event image satisfies the second condition and/or whether the RGB image satisfies the second condition after determining that the event image satisfies the first condition based on the first information. For example, when determining that the RGB image satisfies the second condition, the terminal may determine the event image and the RGB image corresponding thereto as a key frame; the terminal may determine that the event image and the RGB image corresponding to the event image are key frames when determining that the event image and the RGB image satisfy the second condition. In the process of determining whether the RGB image meets the second condition, the terminal may determine the corresponding motion feature and pose feature based on the RGB image, and the process of determining the motion feature and pose feature corresponding to the RGB image by the terminal is similar to the process of determining the motion feature and pose feature corresponding to the event image, which may be specifically referred to the description of step 10503 and will not be repeated herein.
In one possible embodiment, in the case that the event image has a synchronized RGB image, in some scenes, such as a 3D reconstructed scene that requires the generation of a high quality texture map, the requirements for sharpness and brightness consistency of the RGB image are high, so that the choice of a key frame may also take into account the sharpness and brightness consistency of the RGB image at this time.
Specifically, the terminal may determine whether to determine the event image and the RGB image corresponding thereto as the key frame by determining whether the event image satisfies the second condition, whether the sharpness of the event image or the RGB image is greater than a sharpness threshold, and/or whether the brightness consistency index of the event image or the RGB image is greater than a preset index threshold.
For example, the terminal may determine that the event image and the RGB image corresponding thereto are key frames according to the event image satisfying the second condition; the terminal can also determine that the event image and the corresponding RGB image are key frames according to the event image or the RGB image with the definition larger than the definition threshold; the terminal may further determine that the event image or the RGB image corresponding to the event image is a key frame according to whether the brightness consistency index of the event image or the RGB image is greater than a preset index threshold. In addition, the terminal may determine that the event image and the RGB image corresponding to the event image are key frames according to the event image satisfying the second condition and the definition of the event image or the RGB image being greater than the definition threshold, or the definition of the RGB image being greater than the definition threshold and the brightness consistency index of the event image or the RGB image being greater than the preset index threshold.
Specifically, the method by which the terminal determines the sharpness of the RGB image may include, but is not limited to, the Brenner gradient method, the Tenengrad gradient method, the Laplacian gradient method, the variance method, and the like. Taking the Brenner gradient method as an example, the terminal calculates the squared gray difference between a pixel and the pixel two positions away in the x direction; the function is defined as follows:

D(f) = Σ_y Σ_x |f(x+2, y) − f(x, y)|²

wherein f(x, y) represents the gray value of the pixel (x, y) of the image f, and D(f) is the image sharpness calculation result.
As can be seen from the above functions, in the process of calculating the definition of the RGB image, all pixels in the RGB image participate in calculation, and the calculated amount is large.
In this embodiment, the method by which the terminal determines the sharpness of the event image may likewise include, but is not limited to, the Brenner gradient method, the Tenengrad gradient method, the Laplacian gradient method, the variance method, and the like described above. The terminal calculates the sharpness based on the event image, normalizes the result by dividing it by the number of pixels involved in the calculation, and uses the normalized result as the final sharpness value. In this way, since only the pixels with an event response effectively participate in the calculation when computing the sharpness of the event image, the calculation amount of the terminal can be reduced as much as possible. A sketch of this computation follows.
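A minimal sketch of the Brenner sharpness measure and its normalized variant for event images. It assumes the normalization simply divides by the number of event-response pixels; the function names are our own.

```python
import numpy as np

def brenner_sharpness(image: np.ndarray) -> float:
    # D(f) = sum over y, x of |f(x+2, y) - f(x, y)|^2
    f = image.astype(np.float64)
    diff = f[:, 2:] - f[:, :-2]
    return float(np.sum(diff ** 2))

def event_image_sharpness(event_image: np.ndarray) -> float:
    # Normalize by the number of event-response pixels so that, in effect,
    # only pixels with an event response contribute to the final result.
    responding = int(np.count_nonzero(event_image))
    if responding == 0:
        return 0.0
    return brenner_sharpness(event_image) / responding
```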
Methods for the terminal to calculate the brightness consistency index of the RGB image include, but are not limited to, the following:
1. the average brightness of the current RGB image is calculated, namely, the brightness values of all pixels of the RGB image are summed, and then divided by the number of pixels to obtain the average brightness value of the RGB image. Similarly, the average luminance of neighboring RGB image key frames is calculated based on the foregoing manner. Finally, calculating the absolute value of the difference between the average brightness of the current RGB image and the average brightness of the adjacent key frames, and taking the absolute value as the brightness consistency index of the RGB image.
2. The current RGB image and the neighboring RGB image keyframes are differenced pixel by pixel (i.e., the luminance difference between each set of corresponding pixels is calculated), and the absolute value of the difference is calculated. And then, carrying out summation operation on absolute values corresponding to each group of pixels, and finally dividing the obtained summation result by the number of pixels to obtain a normalization result, wherein the normalization result can be used as a brightness consistency index.
As can be seen from the above methods for calculating the brightness consistency index of an RGB image, all pixels in the RGB image participate in the calculation, and the calculation amount is large. In this embodiment, the terminal may instead calculate the brightness consistency index based on the event image, so that only the pixels with an event response participate in the calculation, which reduces the calculation amount of the terminal as much as possible. Illustratively, the terminal may calculate the brightness consistency index of the event image in the following ways (a code sketch is given after this list):
1. If the pixels in the event image represent the polarity of the light intensity variation, the terminal may first calculate the absolute value of the difference between the number of events in the current event image and the number of events in the adjacent event image key frame, and then divide that absolute value by the number of pixels of the event image to obtain the brightness consistency index.
2. If the pixels in the event image represent light intensity, the current event image and the neighboring event image key frames are differenced pixel by pixel (i.e., the luminance difference between each set of corresponding pixels is calculated), and the absolute value of the difference is calculated. And then, carrying out summation operation on absolute values corresponding to each group of pixels, and finally dividing the obtained summation result by the number of pixels to obtain a normalization result, wherein the normalization result can be used as a brightness consistency index.
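For illustration, the sketch below implements one RGB method and the two event-image methods listed above; the function names are assumptions, and the inputs are assumed to be same-sized arrays.

```python
import numpy as np

def brightness_consistency_rgb(curr_rgb: np.ndarray, prev_key_rgb: np.ndarray) -> float:
    # RGB method 1: absolute difference of the two images' average brightness.
    return float(abs(curr_rgb.astype(np.float64).mean()
                     - prev_key_rgb.astype(np.float64).mean()))

def brightness_consistency_event_polarity(curr_event: np.ndarray,
                                          prev_key_event: np.ndarray) -> float:
    # Event method 1 (pixels represent polarity of change):
    # |event count difference| / number of pixels.
    diff = abs(int(np.count_nonzero(curr_event)) - int(np.count_nonzero(prev_key_event)))
    return diff / curr_event.size

def brightness_consistency_event_intensity(curr_event: np.ndarray,
                                           prev_key_event: np.ndarray) -> float:
    # Event method 2 (pixels represent light intensity):
    # mean absolute pixel-wise difference.
    diff = np.abs(curr_event.astype(np.float64) - prev_key_event.astype(np.float64))
    return float(diff.sum() / curr_event.size)
```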
In one possible embodiment, the event image may have synchronized depth images in addition to the corresponding RGB images. For example, in the case where a depth camera is configured, a depth image synchronized with an event image may be generated by the depth camera. In this case, if the event image satisfies the first condition and the second condition, the terminal may determine the event image, the RGB image, and the corresponding depth image as key frames. In addition, in the case of obtaining the depth image corresponding to the event image, the terminal can obtain the corresponding depth information, so the terminal can calculate the motion feature and the pose feature through the PnP algorithm or the ICP algorithm.
c. Dynamic SLAM
For a moving object adopting the SLAM technology, in order to realize autonomous movement, the moving object needs to have the functions of sensing the environment and accurately estimating the pose of the moving object. In the related art, a moving object collects an environment image through a camera in the moving process, and feature point extraction and inter-frame matching are performed on the environment image, so that observation data of key point coordinate change can be obtained. Then, a functional relation between the observation information and the pose is established, and an extremum of the function is solved by using an optimization method, so that estimated pose information can be obtained finally.
At present, the algorithms for pose estimation in the related art are designed for static scenes, i.e., scenes without dynamic objects, and such algorithms generally have difficulty achieving accurate pose estimation in dynamic scenes.
In view of this, the embodiments of the present application provide a pose estimation method that captures the dynamic region in a scene through an event image and determines the pose based on that dynamic region, thereby enabling accurate determination of pose information.
Referring to fig. 106, fig. 106 is a schematic flow chart of a pose estimation method 10600 according to an embodiment of the present application. As shown in fig. 106, the pose estimation method 10600 includes the following steps.
Step 10601, acquiring an event image and an image corresponding to the event image, where the event image and the image capture the same environmental information.
In this embodiment, the pose estimation method 10600 may be applied to a SLAM scene, and the subject performing the pose estimation method 10600 may be a terminal for performing SLAM, for example, a robot terminal, an unmanned aerial vehicle terminal, or an unmanned vehicle terminal.
In this embodiment, the event image is generated from the motion trajectory information produced when a target object moves within the monitoring range of the motion sensor. For example, the event image may be a DVS event map; the terminal may be connected to or preset with a DVS, monitor an environment through the DVS, and acquire the DVS event map corresponding to the environment. The terminal may also be connected to or preset with a camera for capturing environmental information, such as a depth camera or an RGB camera, through which the terminal can acquire a corresponding environmental image, for example a depth image of the environment through the depth camera or a red-green-blue (RGB) image of the environment through the RGB camera. An RGB image, also called a true color image, uses the R, G, and B components to identify the color of a pixel; R, G, and B represent the three base colors red, green, and blue respectively, and any color can be synthesized from these three primary colors.
In one possible embodiment, after the terminal acquires the event image and the image (referred to below as the target image), the terminal may align the event image with the target image, thereby obtaining the target image corresponding to the event image. Illustratively, the terminal may align the event image with the target image in the time domain by matching nearest-neighbor signals in the time domain and by calibration. That is, the aligned event image and target image may be regarded as capturing the environmental information of the same scene at the same time.
Step 10602, determining a first motion region in the event image.
It will be appreciated that since the DVS captures only dynamically changing portions of the scene and the DVS responds strongly to dynamically changing object edges, the terminal can determine the regions of motion in the event image, i.e., the regions where dynamic changes have occurred, based on the response of the event image.
During the acquisition of the event images, the DVS may be stationary or moving. Under the condition that the DVS is static, an event in the event image acquired by the DVS is an object moving in the current scene. In the case of DVS motion, both the stationary object and the moving object in the current scene are moved relative to the DVS, so the event in the event image acquired by the DVS may include the stationary object and the moving object in the current scene. That is, the manner in which the terminal determines the motion region in the event image may be different for event images acquired by the DVS in different motion states.
In one possible embodiment, when the DVS collects an event image in a static state, the terminal may perform binarization processing on the event image, that is, a pixel point having an event response in the event image is set to 1, and a pixel point having no event response in the event image is set to 0, so as to obtain a binary image corresponding to the event image. Then, the terminal detects the contour in the binary image, and if the area surrounded by the contour is larger than a set threshold value, the area surrounded by the contour can be determined as a motion area. The set threshold may be, for example, 10 pixels or 106 pixels, that is, if more than 10 pixels or 106 pixels in the area surrounded by the contour, the area surrounded by the contour may be determined to be a motion area.
It should be understood that there may be some noise in the event image acquired by the terminal, that is, the area in the scene where no motion occurs may also have corresponding pixels in the event image. In this way, by determining the motion region in the event image by presetting the threshold value, noise in the event image can be removed as much as possible, thereby avoiding the determination of the noise region in the event image as the motion region.
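A sketch of the static-DVS case under assumptions: OpenCV contour detection stands in for the generic contour step, `cv2.contourArea` approximates the number of enclosed pixels, and the area threshold is an example value from the text.

```python
import cv2
import numpy as np

def motion_regions_static_dvs(event_image: np.ndarray,
                              min_area_pixels: int = 10) -> list:
    """Binarize the event image (event response -> 1, no response -> 0),
    detect contours, and keep contours enclosing more than the set threshold."""
    binary = (event_image > 0).astype(np.uint8)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for contour in contours:
        if cv2.contourArea(contour) > min_area_pixels:
            regions.append(contour)  # the area enclosed by this contour is a motion area
    return regions
```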
In another possible embodiment, when the DVS acquires the event image while in motion, the terminal may acquire the current event image (i.e., the event image acquired at the current time) and the previous-frame event image (i.e., the event image acquired at the previous time), and calculate the optical flow between them. Optical flow is a 2D vector field representing the displacement vector of each pixel between adjacent frames. After calculating the optical flow, the terminal may traverse the displacement vector of each pixel of the current event image. If the displacement direction of the current pixel is inconsistent with that of the surrounding pixels, or if the direction is consistent but the difference in displacement magnitude is greater than a preset threshold, the pixel is marked as belonging to a motion area (for example, marked as 1); otherwise the pixel is marked as belonging to a stationary area (for example, marked as 0), thereby obtaining a marked image (that is, a binary image marked with 1 and 0). The preset threshold may be, for example, 5 pixels or 10 pixels; that is, when the difference between the displacement magnitude of a pixel and that of its surrounding pixels is greater than 5 pixels or 10 pixels, the pixel may be marked as belonging to a motion area. After obtaining the marked image, the terminal may detect the pixels marked as belonging to the motion area, thereby obtaining the contour they form, and the terminal may determine that the area enclosed by the contour is the first motion area and that the area outside it is a stationary area. A simplified sketch of this marking is given below.
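A simplified sketch of the moving-DVS case, assuming 8-bit single-channel inputs and Farnebäck dense optical flow. For brevity it compares each pixel's displacement magnitude with a local (blurred) average instead of performing the full direction-and-magnitude comparison with surrounding pixels described above; the threshold is an example value.

```python
import cv2
import numpy as np

def motion_mask_moving_dvs(prev_event_image: np.ndarray,
                           curr_event_image: np.ndarray,
                           magnitude_diff_threshold: float = 5.0) -> np.ndarray:
    """Compute dense optical flow between the previous and current event images
    and mark pixels whose displacement disagrees with their neighbourhood."""
    flow = cv2.calcOpticalFlowFarneback(prev_event_image, curr_event_image, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)          # per-pixel displacement magnitude
    local_mag = cv2.blur(mag, (15, 15))         # rough "surrounding pixels" average
    mask = (np.abs(mag - local_mag) > magnitude_diff_threshold).astype(np.uint8)
    return mask  # 1 = belongs to a motion area, 0 = belongs to a stationary area
```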
Step 10603, determining a corresponding second motion region in the image according to the first motion region.
For the image, each pixel in the image has a corresponding pixel in the event image, and therefore, the terminal can determine a second motion region in the image corresponding to the first motion region based on the first motion region in the event image. The second motion area is the same as the environment information corresponding to the first motion area. For example, the event image may be an event image captured by the DVS in an indoor scene, and there is a moving pedestrian in the indoor scene, that is, a first moving area in the event image is an area where the pedestrian is located, and a second moving area in the image corresponding to the event image is also an area where the pedestrian is located.
The terminal illustratively retains pixels in the image corresponding to a first motion region in the event image, and the resulting region after other pixels are removed is then a second motion region in the image.
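For illustration, a minimal sketch of mapping the first motion region onto the target image, assuming the motion mask and the target image have the same resolution after the alignment described above.

```python
import numpy as np

def extract_second_motion_region(target_image: np.ndarray,
                                 motion_mask: np.ndarray) -> np.ndarray:
    """Keep the pixels of the target (RGB/depth) image that correspond to the
    first motion region of the time-aligned event image; zero out the rest."""
    if target_image.ndim == 3:
        mask = motion_mask[:, :, None]   # broadcast the mask over colour channels
    else:
        mask = motion_mask
    return target_image * (mask > 0)
```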
Step 10604, estimating the pose according to the second motion area in the image.
In this embodiment, the camera may be stationary or moving during the process of capturing the image. That is, the image acquired by the terminal may be an image acquired by the camera in a stationary state or an image acquired by the camera in a moving state. The manner in which the terminal determines pose based on the images may also be different for images acquired by the camera in different states.
In the first mode, the image is an image acquired by the camera in a stationary state.
When the camera is in a stationary state, the camera captures a plurality of different images in the same scene. Thus, for an object stationary in the scene, the pose of the camera relative to the object is unchanged, i.e. the position and pose of the camera relative to the object are unchanged. However, for an object moving in the scene, the pose of the camera relative to the object changes, i.e. the position or pose of the camera changes. For example, when the camera collects environmental information in an indoor scene, the pose of the camera is unchanged relative to a stationary pillar in the room in the case that the camera is in a stationary state; the pose of the camera changes relative to the person moving in the room.
In this way, since the pose of the camera with respect to the stationary object is unchanged, the terminal can determine the pose of the camera based on the moving object. That is, the terminal may determine the pose of the camera based on the second motion region in the acquired image without determining the pose of the camera based on the stationary region in the image.
Specifically, in the case that the image is an RGB image, the terminal may extract the feature point of the second motion region in the image and match the feature point in the previous frame image, to obtain a plurality of pairs of feature points. Alternatively, the terminal may perform matching through optical flow to obtain multiple pairs of feature points. Then, for each pair of feature points, the terminal can estimate the pose according to the method of the static scene VSLAM, and for the non-feature points of the motion area of the current frame, the pose is calculated through interpolation.
The feature point is a point where the gray value of the image changes drastically or a point with a large curvature on the edge of the image (i.e., an intersection point of two edges). The image feature points play a very important role in the feature point-based image matching algorithm. The image feature points can reflect the essential features of the image and can identify the target object in the image. Matching of images can be completed through the feature points.
In the case where the image is a depth image, matching is performed using the ICP algorithm, and then pose estimation is carried out on each pair of matched points according to the static-scene VSLAM method; if the motion area of the current frame has unmatched pixels, their pose is calculated by interpolation. This pose estimation method has a large calculation amount. An alternative is to first sample the motion area of the depth image (for example, equidistant sampling or key point detection), perform pose estimation on the sampled points, and obtain the pose of the non-sampled points by interpolating the poses of the sampled points.
And in a second mode, the image is acquired by the camera in a motion state.
When the camera is in a motion state, the pose of the camera is changed for both a stationary object and a moving object in a scene, and the pose change amount of the camera relative to the stationary object is different from the pose change amount of the camera relative to the moving object. For example, when the camera collects environmental information in an indoor scene, the pose of the camera changes relative to a stationary pillar in the room when the camera is in a rightward moving state; the pose of the camera is also changed relative to a person moving to the left in the room, and the pose change amount of the camera relative to the person is larger than the pose change amount of the camera relative to the pillar.
In this way, the terminal may determine the pose of the camera based on stationary and/or moving objects in the scene, i.e. the terminal may determine the pose of the camera based on the second moving region in the acquired image and/or the stationary region in the image. The process by which the terminal determines the pose of the camera based on different regions in the image will be described below.
1. The terminal determines a camera pose based on a stationary region in the image.
Referring to fig. 107, fig. 107 is a schematic flow chart of performing pose estimation based on a still region of an image according to an embodiment of the present application. As shown in fig. 107, the terminal may perform detection of a still region of an image based on an event image and a depth image or an RGB image. Specifically, after determining a motion region in a depth image or an RGB image, the terminal eliminates pixels corresponding to the motion region in the depth image or the RGB image, and the remaining region is a still region in the depth image or the RGB image. Then, the terminal can estimate the pose of the static region in the depth image or the RGB image according to the static scene VSLAM method.
2. The terminal determines a camera pose based on a motion region in the image.
Referring to fig. 108a, fig. 108a is a schematic flow chart of performing pose estimation based on a motion area of an image according to an embodiment of the present application. As shown in fig. 108a, the terminal may perform detection of a still region of an image based on an event image and a depth image or an RGB image. Specifically, after determining a motion region in a depth image or an RGB image, the terminal eliminates pixels except the motion region in the depth image or the RGB image, and the remaining region is the motion region in the depth image or the RGB image. Then, the terminal can estimate the pose of the motion area in the depth image or the RGB image according to the method of the static scene VSLAM.
3. The terminal determines a camera pose based on a moving region and a stationary region in the image.
Referring to fig. 108b, fig. 108b is a schematic flow chart for performing pose estimation based on the whole area of the image according to the embodiment of the present application. As shown in fig. 108b, the terminal may detect a still region and a dynamic region of an image based on the event image and the depth image or the RGB image, and detect the still region and the dynamic region in the depth image or the RGB image, respectively. The process of detecting the still region of the image by the terminal is similar to that of the embodiment corresponding to the above-mentioned fig. 107, and specific reference may be made to the embodiment corresponding to the above-mentioned fig. 107; the process of detecting the motion area of the image by the terminal is similar to the process of the embodiment corresponding to fig. 108a, and the embodiment corresponding to fig. 108a may be referred to specifically, and will not be described herein. Then, the terminal can respectively estimate the pose of the static region and the motion region in the depth image or the RGB image according to the method of the static scene VSLAM.
In addition to the above application scenario, the method provided in the present application can be applied to other, more specific scenarios such as eye tracking and detection and recognition. Eye tracking may include remote eye tracking, AR/VR near-eye tracking, gaze-response interaction, and the like; detection and recognition may include moving target positioning, face detection and recognition, vehicle-mounted detection and recognition, gesture recognition, detection and recognition in security scenarios, and the like. For example, after obtaining a clearer event image, the method may further process that event image and, based on it, carry out application scenarios such as eye tracking, gaze response, detection and recognition in security scenarios, or vehicle-mounted detection and recognition. Some more specific application scenarios of the method provided in the present application are described below by way of example.
Scene one, eye movement tracking
First, for wearable AR/VR glasses, the camera is close to the eyes and the distance between the camera and the eyes is relatively fixed, so eye motion information can be captured conveniently. Because a DVS camera tracks dynamic objects more quickly and can output motion change information, it is more suitable for eye tracking than a traditional camera.
In an eye-tracking scenario, AR/VR glasses may be constructed using a DVS sensor and an infrared sensor, with a structure as shown in fig. 109. The number of DVS sensors may be one or more, for example one DVS sensor per lens frame, and the number of infrared light sources may also be one or more. The infrared light sources illuminate the cornea to generate glints, i.e., Purkinje images, produced by light reflected from the outer surface of the cornea (corneal reflection, CR). Since the eyeball approximates a sphere, the position of a glint on the eyeball does not substantially change as the eyeball rotates. The corneal curvature center is calculated from one or more glints and the light source positions and used as the anchor-point coordinates for eye movement. As the eye movement is tracked in real time, the DVS sensor generates four-element information [X, Y, t, e], where X and Y are position information, t is time information, and e is event change information. The information output by the DVS can then be motion-compensated by the method provided in the present application, so that a clearer event image is obtained. Next, according to the relative position between the DVS and the user's eyes and a geometric model of the eyeball, iris, and pupil, the rotation angles of the eyeball in the horizontal and vertical planes can be derived from the pixel coordinates (x, y) of the events in the event image, and the gaze angle of the human eye relative to the DVS camera can be calculated. The event image can be optimized by motion compensation to obtain a clearer event image. Finally, based on the relative relation between the DVS camera and the screen (for example, a binocular camera on the screen can locate the spatial position of the head relative to the screen) and combined with the gaze angle of the eyes relative to the head, the position of the viewpoint on the screen is estimated, realizing eye tracking.
In another eye-tracking scenario, the DVS may also be combined with an eye tracker to achieve more accurate eye tracking. For example, the eye tracker acquires viewpoints at a frequency lower than 1000 Hz, while the DVS acquires viewpoints in the gaps at 1000 Hz. Eye movement data with higher temporal precision is obtained by fusing the viewpoints acquired by the eye tracker and the DVS.
In another eye-tracking scenario, the DVS sensor described above may also be used for remote eye tracking, for example, as shown in fig. 13, to implement eye tracking on mobile phone and tablet terminal devices. Because the distance between the eye and the sensor is large and varies on such devices, the user can interact with the screen by determining the corneal center and the eye movement center in a three-dimensional coordinate system through multiple sensors, in a manner similar to the scene of fig. 109 described above.
In a gaze-response scenario, the structure shown in fig. 110 may also be used to sense eye gaze. If the DVS senses that the user has been gazing at the screen for longer than a duration t, corresponding eye movement control may be performed, such as lighting up the screen when gazed at; the controlled device may be a mobile phone, a tablet, a wearable watch, and the like. For example, a binary mask map as described above is obtained by setting the pixel positions where events occur within a period of time (for example, 1 second) to 1 and the pixel positions where no events occur to 0, as sketched below.
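A minimal sketch of building such a binary mask from DVS events, assuming each event is an (x, y, t, e) tuple as in the four-element format mentioned earlier; function name and time units are assumptions.

```python
import numpy as np

def build_gaze_mask(events, height: int, width: int,
                    window_start: float, window: float = 1.0) -> np.ndarray:
    """Accumulate DVS events over a time window (e.g. 1 s) into a binary mask:
    1 where at least one event occurred, 0 elsewhere."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x, y, t, e in events:
        if window_start <= t < window_start + window:
            mask[int(y), int(x)] = 1
    return mask
```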
Training phase: collect the binary mask images captured while the user gazes at the screen from different angles and distances, and annotate the human eye region; collect the binary mask images captured while the user is not gazing at the screen from different angles and distances, and likewise annotate the human eye region. Based on the collected data, train a model that can locate the human eye region and distinguish between the "gazing" and "non-gazing" states.
Testing: the model is used on the current binary mask map to find the human eye region and identify whether it is in a "gazing" state. If the screen is in the 'gazing' state, further judging gazing time length, and if the gazing time length is greater than or equal to 1 second, lighting the screen. If the device is in the 'non-gazing' state, the device is ensured to be in the screen-off state.
In addition, when the gazing duration exceeds a certain duration (such as 3 seconds), eye features are extracted for identity recognition, enabling quick unlocking of the mobile phone. When the distance between the eyes and the screen exceeds a certain distance (for example, more than 30 cm), the identity-recognition unlocking function is not started, so as to ensure the security of the user's mobile phone.
In this scenario, DVS-based eye tracking is faster and has lower power consumption than traditional camera-based eye-tracking schemes. Moreover, for remote eye tracking or gaze-interaction recognition, the DVS sensor does not have to collect all facial features or perform full face recognition, so user privacy is protected better than with RGB devices. For example, the DVS sensor only needs to sense changes in the user's eyes, as opposed to using RGB to sense the user's gaze, and does so with low power consumption.
Detection and identification are carried out in scene two and security scene
A traditional APS (Advanced Photo System) camera mainly locates a moving target based on background-subtraction-like methods and then analyzes the key information; the simplest implementation is the frame difference method. A DVS, by contrast, captures moving objects by sensing the brightness change of individual pixels, which has almost the same effect as the frame difference method but with lower latency. The DVS camera can quickly locate the rectangular area/mask where a foreground moving object is located in a single-moving-object scene, for example, in a monitored scene where the lens is fixed and the photographed background is clean. By the method provided in the present application, the image acquired by the DVS can be motion-compensated to obtain a clearer event image, making detection and recognition in security scenarios more accurate.
For example, a process of moving object detection using a DVS sensor in one scenario may include: when a moving object appears in the picture or the scene light changes, the corresponding region of the DVS generates events. Set the pixel positions where events occur within a period of time (such as 1 second) to 1 and the pixel positions where no events occur to 0 to obtain a mask image, and screen out connected rectangular-frame areas on the event image. Then judge the size of each rectangular frame: when the area of the rectangular frame is greater than threshold 1, the change detected by the DVS can be understood as a scene light change, and the motion region detected by the DVS does not need detection and recognition; when the area of the rectangular frame is smaller than threshold 2, the rectangular frame can be regarded as noise, such as a motion region generated by leaves shaking in the wind; when threshold 1 > rectangular frame area > threshold 2, whether it is a moving object can be further determined based on the continuity of motion, thereby determining whether further detection and recognition are required. A sketch of this rule is given below.
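A minimal sketch of this rectangle-size rule under assumptions: connected regions are taken from contours of the mask, bounding-rectangle area stands in for "rectangular frame area", and the two thresholds are scene-dependent parameters the caller must choose (threshold_1 > threshold_2).

```python
import cv2
import numpy as np

def classify_motion_rectangles(mask: np.ndarray,
                               threshold_1: int, threshold_2: int) -> list:
    """Apply the rectangle-size rule to each connected region of the event mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    decisions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        area = w * h
        if area > threshold_1:
            decisions.append(((x, y, w, h), "scene light change - skip"))
        elif area < threshold_2:
            decisions.append(((x, y, w, h), "noise - skip"))
        else:
            decisions.append(((x, y, w, h), "candidate moving object - check motion continuity"))
    return decisions
```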
In another scenario, for example, a process of moving object detection and recognition using a DVS sensor together with an RGB camera may include: when a moving object appears in the picture or the scene light changes, events are generated in the corresponding DVS region; a clear event image is obtained through the method provided in the present application; a rectangular frame representing the moving object is determined in the event image; after the rectangular frame is expanded by one circle (h × w × 0.1), the corresponding rectangular region in the corresponding RGB camera frame is found and used as the moving object region. An existing RGB image deep learning network is then used to identify the category of the object within the moving object region.
Therefore, in the present scenario, since the DVS sensor is sensitive to high-speed moving objects, motion events can be captured quickly and response analysis can be performed, and the "time resolution" is higher than APS, so that the DVS sensor is used for moving object detection, which has the advantage of low latency. In addition, the DVS sensor is highly sensitive to object motion and is not greatly affected by scene light intensity, i.e., can still perform moving object identification information in an excessively bright or excessively dark scene.
Scene three and vehicle-mounted detection and identification
In general, during the running of a vehicle, a dynamic sensing camera can capture the outline, license plate information and lane lines of the vehicle which are stationary or moving in the visual field. Three main applications are included in this scenario: moving object detection, high-speed license plate recognition and lane line detection. Specifically, the DVS sensor may be deployed on a vehicle to detect an object outside the vehicle, or may be deployed on a camera for public transportation to perform security monitoring.
More specifically, the detection of moving objects realizes the real-time target detection of moving objects (vehicles and pedestrians) in a vehicle-mounted scene by means of the motion sensitivity and low time delay of the dynamic sensing chip, and particularly assists a driver in obstacle avoidance judgment for a fast motion scene (avoiding motion blur) and a high dynamic range scene.
High-speed road sign and license plate recognition: the method is divided into two scenes, namely ADAS and traffic monitoring. The vehicle-mounted DVS is used for identifying the target outside the vehicle, only a small amount of simple textures (two-color images and the like) are needed, and the DVS is particularly suitable for identifying high-speed license plates and road signs, relatively measuring the speed of adjacent vehicles and the like; the latter is DVS traffic monitoring of fixed scene, and the application scene includes highway snapshot, stop-break snapshot, red light running snapshot etc. Overall, DVS is of greater value in on-board ADAS, and traffic monitoring may need to be done cooperatively with other sensors (to remedy the lack of texture).
Lane line detection: the system is used for the functions of lane keeping, lane merging assistance and the like of automatic driving, and can detect lane lines in real time by means of DVS.
The high dynamic characteristic of the DVS has the advantage of all-weather availability under the scene, and can still be detected and identified under the conditions of backlight, night and the like.
The autopilot networking diagram may be shown in fig. 111, where the networking may include an autopilot vehicle (such as autopilot vehicle a, autopilot vehicle B, autopilot vehicle C shown in fig. 14, etc.) and a centralized control device, and may also include a monitoring camera or other devices, etc. The centralized control device may be used to control or identify environmental data of vehicles in the roadway.
In this scenario, moving object detection, lane line detection, high-speed road marking or license plate recognition, etc. may be performed. And in particular may be a clearer image of an event obtained in connection with the method provided herein.
Detecting a moving object: by means of motion sensitivity and low time delay of the dynamic sensing chip, real-time target detection of moving objects (vehicles and pedestrians) in a vehicle-mounted scene is achieved, and particularly, obstacle avoidance judgment is carried out on a fast motion scene (motion blur avoidance) and a high dynamic range scene by a driver.
Lane line detection: the system is used for the functions of lane keeping, lane merging assistance and the like of automatic driving, and can detect lane lines in real time by means of DVS.
High-speed road sign and license plate recognition: the method is divided into two scenes, namely ADAS and traffic monitoring. The vehicle-mounted DVS is used for identifying the target outside the vehicle, only a small amount of simple textures (such as a bicolor image) are needed, and the DVS is particularly suitable for identifying high-speed license plates and road signs, relatively measuring the speed of adjacent vehicles and the like; the latter is DVS traffic monitoring of fixed scene, and the application scene includes highway snapshot, stop-break snapshot, red light running snapshot etc. In general, DVS is of greater value in on-board ADAS, and traffic monitoring may need to be accomplished cooperatively by other sensors (e.g., to remedy the lack of texture).
In this scenario, moving objects can be identified more quickly and accurately in combination with the images acquired by the DVS. In particular, for simple images, recognition is more accurate and power consumption is lower. Recognition is also not affected by light intensity, for example during night driving or in tunnels.
In order to better implement the above-described scheme of the embodiments of the present application, on the basis of the embodiments corresponding to fig. 95, 108a and 108b, the following further provides related devices for implementing the above-described scheme. Referring specifically to fig. 118, fig. 118 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 118, the data processing apparatus includes an acquisition module 11801 and a processing module 11802. The acquiring module 11801 is configured to acquire a first event image and a first RGB image, where the first event image is aligned with the first RGB image in time sequence, and the first event image is an image that represents a motion track of a target object when the target object moves within a detection range of the motion sensor; the processing module 11802 is configured to determine an integration time of the first event image; the processing module 11802 is further configured to determine not to perform pose estimation with the first RGB image if the integration time is less than a first threshold; the processing module 11802 is further configured to perform pose estimation according to the first event image.
In one possible design, the processing module 11802 is further configured to determine the acquisition time of the first event image and the acquisition time of the first RGB image; and determining that the first event image is aligned with the first RGB image in time sequence according to the time difference between the acquisition time of the first RGB image and the acquisition time of the first event image is smaller than a second threshold value.
In one possible design, the acquiring module 11801 is further configured to acquire N consecutive DVS events; the processing module 11802 is further configured to integrate the N consecutive DVS events into a first event image; the processing module 11802 is further configured to determine an acquisition time of the first event image according to the acquisition times of the N consecutive DVS events.
In one possible design, the processing module 11802 is further configured to determine N consecutive DVS events for integration into the first event image; the processing module 11802 is further configured to determine an integration time of the first event image according to the acquisition times of the first DVS event and the last DVS event in the N consecutive DVS events.
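For illustration only, the following minimal Python sketch shows how N consecutive DVS events might be integrated into one event image and how the acquisition time and integration time of that image could be derived from the event timestamps, as in the designs above. The event tuple layout, the per-pixel polarity accumulation, and the choice of the midpoint timestamp as the acquisition time are assumptions made for this sketch, not part of the described apparatus.

```python
import numpy as np

def integrate_events(events, height, width):
    """Integrate N consecutive DVS events into one event image and derive
    its acquisition time and integration time.

    `events` is assumed to be a list of (x, y, polarity, timestamp) tuples
    sorted by timestamp; the tuple layout is illustrative.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for x, y, polarity, _t in events:
        frame[y, x] += 1 if polarity > 0 else -1   # accumulate signed events per pixel

    t_first = events[0][3]
    t_last = events[-1][3]
    integration_time = t_last - t_first            # time span covered by the N events
    acquisition_time = 0.5 * (t_first + t_last)    # assumed choice: midpoint timestamp
    return frame, acquisition_time, integration_time
```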
In one possible design, the acquiring module 11801 is further configured to acquire a second event image; the processing module 11802 is further configured to determine that the second event image does not have an RGB image for jointly performing pose estimation if there is no RGB image aligned with the second event image in time sequence; the processing module 11802 is further configured to perform pose estimation according to the second event image.
In one possible design, the processing module 11802 is further configured to: if it is determined that the second event image has time-aligned inertial measurement unit (IMU) data, determine a pose according to the second event image and the IMU data corresponding to the second event image; if it is determined that the second event image does not have time-aligned IMU data, determine a pose from the second event image only.
In one possible design, the acquiring module 11801 is further configured to acquire a second RGB image; the processing module 11802 is further configured to determine that the second RGB image does not have an event image for jointly performing pose estimation if there is no event image aligned with the second RGB image in time sequence; the processing module 11802 is further configured to determine a pose according to the second RGB image.
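As a rough illustration of how the designs above fit together, the sketch below selects the measurements used for pose estimation: the RGB image is used jointly with the event image only when the two are temporally aligned and the integration time is not below the first threshold; otherwise IMU data is used if available, and the event image alone as a last resort. The function name, argument names, and the exact priority order are assumptions for this sketch.

```python
def select_pose_sources(event_img, rgb_img, imu_data, aligned, integration_time,
                        first_threshold):
    """Pick the inputs for pose estimation following the designs above.
    The priority order and argument names are illustrative assumptions."""
    if rgb_img is not None and aligned and integration_time >= first_threshold:
        return ("event+rgb", event_img, rgb_img)    # joint pose estimation
    if imu_data is not None:
        return ("event+imu", event_img, imu_data)   # fuse event image with IMU data
    if event_img is not None:
        return ("event_only", event_img, None)      # event image alone
    return ("rgb_only", rgb_img, None)              # RGB image with no aligned event image
```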
In one possible design, the processing module 11802 is further configured to perform loop detection according to the first event image and a dictionary, where the dictionary is a dictionary constructed based on the event image.
In one possible design, the acquiring module 11801 is further configured to acquire a plurality of event images, where the plurality of event images are event images for training; the acquiring module 11801 is further configured to acquire visual features of the plurality of event images; the processing module 11802 is further configured to cluster the visual features through a clustering algorithm to obtain clustered visual features, where the clustered visual features have corresponding descriptors; the processing module 11802 is further configured to construct the dictionary according to the clustered visual features.
In one possible design, the processing module 11802 is further configured to: determining a descriptor of the first event image; determining visual features corresponding to descriptors of the first event image in the dictionary; determining a bag-of-word vector corresponding to the first event image based on the visual features; and determining the similarity between the bag-of-words vector corresponding to the first event image and the bag-of-words vectors of other event images to determine the event image matched with the first event image.
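The loop-detection design above can be pictured with the following sketch, which uses k-means as one possible clustering algorithm to build the dictionary from training descriptors, maps an event image's descriptors to visual words, and compares bag-of-word vectors by cosine similarity. The descriptor format, the choice of k-means, the vocabulary size, and the similarity measure are all assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(training_descriptors, num_words=500):
    """Cluster descriptors from training event images into visual words."""
    kmeans = KMeans(n_clusters=num_words, n_init=10).fit(training_descriptors)
    return kmeans  # the cluster centres act as the dictionary

def bow_vector(descriptors, dictionary):
    """Histogram of visual-word occurrences for one event image."""
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)

def loop_score(vec_a, vec_b):
    """Cosine similarity between bag-of-word vectors; a high score suggests
    the two event images were captured at the same place."""
    return float(np.dot(vec_a, vec_b))
```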
Referring specifically to fig. 119, fig. 119 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 119, the data processing apparatus includes an acquisition module 11901 and a processing module 11902. The acquiring module 11901 is configured to acquire an event image; the processing module 11902 is configured to determine first information of the event image, the first information including events and/or features in the event image; the processing module 11902 is configured to determine that the event image is a key frame if it is determined that the event image satisfies at least a first condition based on the first information, where the first condition is related to the number of events and/or the number of features.
In one possible design, the first condition includes: one or more of the number of events in the event image being greater than a first threshold, the number of event effective areas in the event image being greater than a second threshold, the number of features in the event image being greater than a third threshold, and the feature effective areas in the event image being greater than a fourth threshold.
In one possible design, the acquiring module 11901 is configured to acquire depth images aligned with the event image timing; the processing module 11902 is further configured to determine that the event image and the depth image are key frames if it is determined that the event image satisfies at least a first condition based on the first information.
In one possible design, the acquiring module 11901 is configured to acquire RGB images aligned with the event image timing; the obtaining module 11901 is configured to obtain a feature number and/or a feature effective area of the RGB image; the processing module 11902 is further configured to determine that the event image and the RGB image are key frames if it is determined based on the first information that the event image at least meets a first condition and the number of features of the RGB image is greater than a fifth threshold and/or the number of feature effective areas of the RGB image is greater than a sixth threshold.
In one possible design, the processing module 11902 is further configured to: determining second information of the event image if the event image at least meets the first condition based on the first information, wherein the second information comprises motion characteristics and/or pose characteristics in the event image; and if the event image is determined to meet at least a second condition based on the second information, determining that the event image is a key frame, wherein the second condition is related to the motion variation and/or the pose variation.
In one possible design, the processing module 11902 is further configured to determine a sharpness and/or brightness consistency indicator for the event image; the processing module 11902 is further configured to determine that the event image is a key frame if it is determined, based on the second information, that the event image at least meets the second condition, and the sharpness of the event image is greater than a sharpness threshold and/or the brightness uniformity index of the event image is greater than a preset index threshold.
In one possible design, the processing module 11902 is further configured to: if the pixels in the event image represent the polarity of the light intensity change, calculating the absolute value of the difference value between the event number of the event image and the event number of the adjacent key frames, and dividing the absolute value by the pixel number of the event image to obtain a brightness consistency index of the event image; if the pixels in the event image represent light intensity, the event image and the adjacent key frames are subjected to pixel-by-pixel difference, the absolute value of the difference value is calculated, summation operation is carried out on the absolute value corresponding to each group of pixels, and the obtained summation result is divided by the number of pixels, so that the brightness consistency index of the event image is obtained.
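For clarity, the brightness consistency index described above can be sketched as follows; counting nonzero pixels as the number of events, and the function and argument names, are assumptions of this sketch.

```python
import numpy as np

def brightness_consistency(event_img, key_frame, polarity_mode):
    """Brightness consistency index between an event image and an adjacent key
    frame, following the two cases described above. Counting nonzero pixels as
    the number of events is an assumption of this sketch."""
    num_pixels = event_img.size
    if polarity_mode:
        # pixels encode the polarity of the light intensity change
        diff = abs(np.count_nonzero(event_img) - np.count_nonzero(key_frame))
        return diff / num_pixels
    # pixels encode light intensity: pixel-wise absolute difference, summed
    total = np.abs(event_img.astype(float) - key_frame.astype(float)).sum()
    return total / num_pixels
```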
In one possible design, the acquiring module 11901 is configured to acquire RGB images aligned with the event image timing; the processing module 11902 is further configured to determine a sharpness and/or brightness uniformity indicator for the RGB image; the processing module 11902 is further configured to determine that the event image and the RGB image are key frames if it is determined based on the second information that the event image at least meets a second condition, and the sharpness of the RGB image is greater than a sharpness threshold and/or the brightness uniformity index of the RGB image is greater than a preset index threshold.
In one possible design, the second condition includes one or more of the following: the distance between the event image and the last key frame exceeds a preset distance value; the rotation angle between the event image and the last key frame exceeds a preset angle value; or both the distance between the event image and the last key frame exceeds the preset distance value and the rotation angle between the event image and the last key frame exceeds the preset angle value.
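Taken together, the two-stage key-frame check described in the designs above might look like the sketch below; the particular sub-conditions chosen, the threshold names, and the OR-combination within each stage are assumptions made for illustration.

```python
def is_key_frame(num_events, num_features, translation, rotation_angle,
                 event_thresh, feature_thresh, dist_thresh, angle_thresh):
    """Two-stage key-frame check: a first condition on event/feature counts,
    then a second condition on motion relative to the last key frame."""
    first_condition = (num_events > event_thresh) or (num_features > feature_thresh)
    if not first_condition:
        return False                                  # not informative enough
    second_condition = (translation > dist_thresh) or (rotation_angle > angle_thresh)
    return second_condition                           # enough motion since the last key frame
```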
Referring to fig. 120 in particular, fig. 120 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 120, the data processing apparatus includes an acquisition module 12001 and a processing module 12002. The acquisition module is configured to acquire a first event image and a target image corresponding to the first event image, where the first event image captures the same environment information as the target image, the target image includes a depth image or an RGB image, and the first event image is an image representing the motion track of a target object when the target object moves within the detection range of the motion sensor; the processing module is configured to determine a first motion area in the first event image; the processing module is further configured to determine a corresponding second motion area in the target image according to the first motion area; and the processing module is further configured to estimate the pose according to the second motion area in the target image.
In one possible design, the acquiring module is further configured to acquire a pixel point in the first event image having an event response if the dynamic vision sensor DVS that acquires the first event image is stationary; the processing module is further configured to determine the first motion area according to the pixel point having the event response.
In one possible design, the processing module is further configured to: determining an outline formed by pixel points with event response in the first event image; and if the area surrounded by the outline is larger than a first threshold value, determining the area surrounded by the outline as a first movement area.
In one possible design, the acquiring module is further configured to acquire a second event image if the DVS for acquiring the first event image is moving, where the second event image is a previous frame event image of the first event image; the processing module is further used for calculating the displacement size and the displacement direction of the pixels in the first event image relative to the second event image; the processing module is further configured to determine that the pixel belongs to the first motion region if a displacement direction of the pixel in the first event image is different from a displacement direction of surrounding pixels, or if a difference between a displacement magnitude of the pixel in the first event image and a displacement magnitude of the surrounding pixels is greater than a second threshold.
In one possible design, the processing module is further configured to determine a corresponding stationary region in the image according to the first motion region; the processing module is also used for determining the pose according to the static area in the image.
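To illustrate the motion-region designs above: when the DVS is stationary, contours of event-responding pixels whose enclosed area exceeds a threshold are treated as motion regions; when the DVS is moving, pixels whose displacement differs from that of their surroundings are assigned to the motion region. In the sketch below, dense Farneback optical flow and the median flow magnitude as a stand-in for the "surrounding pixels" are assumptions, not part of the described apparatus.

```python
import cv2
import numpy as np

def motion_regions_static_dvs(event_img, area_threshold):
    """Stationary DVS: contours of event-responding pixels whose enclosed area
    exceeds a threshold are treated as motion regions."""
    mask = (event_img != 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours if cv2.contourArea(c) > area_threshold]

def motion_mask_moving_dvs(prev_event_img, event_img, mag_threshold):
    """Moving DVS: pixels whose displacement differs from their surroundings belong
    to the motion region. Both inputs are assumed to be single-channel 8-bit event
    images; dense Farneback flow stands in for the per-pixel displacement."""
    flow = cv2.calcOpticalFlowFarneback(prev_event_img, event_img, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    background = np.median(mag)                      # proxy for the surrounding pixels
    return np.abs(mag - background) > mag_threshold  # boolean motion mask
```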
Referring to fig. 121, another schematic structural diagram of an electronic device provided in the present application is described below.
The electronic device may include a processor 12101, a memory 12102, an RGB sensor 12103, and a motion sensor 12104. The processor 12101, the RGB sensor 12103 and the motion sensor 12104 are interconnected by wires. Therein, the memory 12102 stores program instructions and data. The RGB sensor 12103 is used for photographing, and converts the acquired analog signal into an electrical signal. The motion sensor 12104 is used to monitor a moving object within a photographing range.
The memory 12102 stores therein program instructions and data corresponding to the steps in fig. 3-a to 108-b described above.
Processor 12101 is used to perform the method steps performed by the electronic device shown in any of the foregoing embodiments of fig. 3-a through 108-b.
The RGB sensor 12103 is used to perform the steps of capturing images that are performed by the electronic device shown in any of the foregoing embodiments of fig. 3-a through 108-b.
The motion sensor 12104 is used to perform the steps of monitoring a moving object performed by the electronic device as described in any of the embodiments of fig. 3-a through 108-b.
There is also provided in an embodiment of the present application a computer-readable storage medium having a program stored therein which, when run on a computer, causes the computer to perform the steps of the methods described in the embodiments shown in the foregoing fig. 3-a to 108-b.
Alternatively, the device shown in the aforementioned fig. 121 may be a chip.
The embodiment of the application also provides an electronic device, which may also be called a digital processing chip or a chip. The chip includes a processing unit and a communication interface; the processing unit obtains program instructions through the communication interface, the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps executed by the electronic device shown in any of the foregoing embodiments of fig. 3-a to 108-b.
The embodiment of the application also provides a digital processing chip. The digital processing chip has integrated therein circuitry and one or more interfaces for implementing the functions of the processor 12101 described above. When a memory is integrated into the digital processing chip, the digital processing chip may perform the method steps of any one or more of the preceding embodiments. When no memory is integrated in the digital processing chip, the digital processing chip can be connected with an external memory through the communication interface, and implements the actions executed by the electronic device in the above embodiments according to the program codes stored in the external memory.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the electronic device in the method described in the embodiments of fig. 3-a to 108-b described above.
The electronic device provided in this embodiment of the present application may be a chip, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the server to perform the image processing method described in the embodiments shown in fig. 3-a to 108-b. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc., and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), etc.
In particular, the aforementioned processing unit or processor may be a central processing unit (central processing unit, CPU), a Network Processor (NPU), a graphics processor (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC) or field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The general purpose processor may be a microprocessor or may be any conventional processor or the like.
It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate components may or may not be physically separate, and that components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by this application, the connection relation between modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: the foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (74)

  1. A vision sensor chip, comprising:
    a pixel array circuit for generating at least one data signal corresponding to a pixel in the pixel array circuit by measuring an amount of change in light intensity, the at least one data signal being indicative of a light intensity change event indicative of the amount of change in light intensity measured by the corresponding pixel in the pixel array circuit exceeding a predetermined threshold;
    a read circuit coupled to the pixel array circuit for reading the at least one data signal from the pixel array circuit in a first event representation;
    the read circuit is further configured to provide the at least one data signal to a control circuit;
    the reading circuit is further configured to, upon receiving a transition signal generated based on the at least one data signal from the control circuit, transition to reading the at least one data signal from the pixel array circuit in a second event representation.
  2. The vision sensor chip of claim 1, wherein the first event representation is an event represented by polarity information, the pixel array circuit comprises a plurality of pixels, and each of the pixels comprises a threshold comparison unit,
    The threshold value comparing unit is used for outputting polarity information when the light intensity conversion quantity exceeds a preset threshold value, and the polarity information is used for indicating whether the light intensity conversion quantity is increased or decreased;
    the reading circuit is specifically configured to read the polarity information output by the threshold comparing unit.
  3. The vision sensor chip of claim 1, wherein the first event representation is an event represented by light intensity information, the pixel array comprises a plurality of pixels, and each of the pixels comprises a light intensity detection unit, a threshold comparison unit, a readout control unit and a light intensity acquisition unit,
    the light intensity detection unit is used for outputting an electric signal corresponding to the light signal irradiated on the light intensity detection unit, and the electric signal is used for indicating the light intensity;
    the threshold comparison unit is used for outputting a first signal when the light intensity conversion amount is determined to exceed a preset threshold according to the electric signal;
    the reading control unit is used for responding to the received first signal and indicating the light intensity acquisition unit to acquire and buffer the electric signal corresponding to the first signal receiving moment;
    the reading circuit is specifically configured to read the electrical signal buffered by the light intensity acquisition unit.
  4. A vision sensor chip as claimed in any one of claims 1 to 3, wherein the control circuit is further configured to:
    determining statistical data based on the at least one data signal received from the read circuit;
    and if the statistical data is determined to meet a preset conversion condition, transmitting the conversion signal to the reading circuit, wherein the preset conversion condition is determined based on the preset bandwidth of the vision sensor chip.
  5. The vision sensor chip of claim 4, wherein the first event representation is an event represented by light intensity information, the second event representation is an event represented by polarity information, the predetermined conversion condition is that a total amount of data read from the pixel array circuit by the first event representation is greater than the preset bandwidth, or the predetermined conversion condition is that a number of the at least one data signal is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  6. The vision sensor chip of claim 4, wherein the first event representation is an event represented by polarity information, the second event representation is an event represented by light intensity information, the predetermined transition condition is that if the at least one data signal is read from the pixel array circuit by the second event representation, a total amount of data read is not greater than the preset bandwidth, or the predetermined transition condition is that a number of the at least one data signal is not greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
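Claims 4 to 6 describe switching the event representation so that readout stays within a preset bandwidth. The following sketch, which assumes fixed per-event bit widths for the light-intensity representation and a bandwidth expressed in bits per readout cycle, is one way to picture that rule; it is an illustration only, not the claimed circuit.

```python
def choose_representation(num_events, intensity_bits, bandwidth_bits, current="intensity"):
    """Illustrative switching rule: fall back to the compact polarity representation
    when intensity events would exceed the preset bandwidth, and switch back once
    intensity events would fit again (cf. claims 4-6)."""
    if current == "intensity":
        if num_events * intensity_bits > bandwidth_bits:
            return "polarity"          # total intensity data would exceed the bandwidth
    else:
        if num_events * intensity_bits <= bandwidth_bits:
            return "intensity"         # intensity events would now fit the bandwidth
    return current
```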
  7. A decoding circuit, comprising:
    a reading circuit for reading the data signal from the vision sensor chip;
    a decoding circuit for decoding the data signal according to a first decoding scheme;
    the decoding circuit is further configured to decode the data signal according to a second decoding method when receiving the conversion signal from the control circuit.
  8. The decoding circuit of claim 7, wherein the control circuit is further configured to:
    determining statistical data based on the data signal read from the read circuit;
    and if it is determined that the statistical data meets a preset conversion condition, transmitting the conversion signal to the decoding circuit, wherein the preset conversion condition is determined based on the preset bandwidth of the vision sensor chip.
  9. The decoding circuit according to claim 8, wherein the first decoding means decodes the data signal according to a first bit corresponding to a first event representation means representing an event by light intensity information, the second decoding means decodes the data signal according to a second bit corresponding to a second event representation means representing an event by polarity information indicating whether the light intensity variation amount is increased or decreased, the conversion condition is that a total data amount decoded according to the first decoding means is greater than the preset bandwidth, or the predetermined conversion condition is that a number of the data signals is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  10. The decoding circuit according to claim 8, wherein the first decoding means decodes the data signal according to a first bit corresponding to a first event representation means representing an event by polarity information indicating whether the amount of change in light intensity is increased or decreased, the second decoding means decodes the data signal by a second bit corresponding to a second event representation means representing an event by light intensity information, the conversion condition is that a total amount of data is not greater than the preset bandwidth if the data signal is decoded according to the second decoding means, or the predetermined conversion condition is that a number of the data signals is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  11. A method of operating a vision sensor chip, comprising:
    generating, by a pixel array circuit of the vision sensor chip, at least one data signal corresponding to a pixel in the pixel array circuit, the at least one data signal indicating a light intensity variation event indicating that the measured light intensity variation of the corresponding pixel in the pixel array circuit exceeds a predetermined threshold;
    Reading the at least one data signal from the pixel array circuit in a first event representation by a reading circuit of the vision sensor chip;
    providing the at least one data signal to a control circuit of the vision sensor chip by the reading circuit;
    upon receiving a transition signal generated based on the at least one data signal from the control circuit by the read circuit, transitioning to reading the at least one data signal from the pixel array circuit in a second event representation.
  12. The method of claim 11, wherein the first event representation is an event represented by polarity information, the pixel array circuit comprising a plurality of pixels, each pixel comprising a threshold comparison unit, the reading of the at least one data signal from the pixel array circuit by the read circuit of the vision sensor chip in the first event representation comprising:
    when the light intensity conversion quantity exceeds a preset threshold value, outputting polarity information through the threshold value comparison unit, wherein the polarity information is used for indicating whether the light intensity conversion quantity is increased or reduced;
    and reading the polarity information output by the threshold value comparing unit through the reading circuit.
  13. The method of claim 11, wherein the first event representation is an event represented by light intensity information, the pixel array includes a plurality of pixels, each of the pixels includes a threshold comparison unit, a readout control unit, and a light intensity acquisition unit, and the reading of the at least one data signal from the pixel array circuit by the reading circuit of the vision sensor chip in the first event representation comprises:
    outputting, through the light intensity acquisition unit, an electric signal corresponding to the light signal irradiated thereon, wherein the electric signal is used for indicating the light intensity;
    when it is determined according to the electric signal that the light intensity conversion amount exceeds a preset threshold value, outputting a first signal through the threshold comparison unit;
    responding to the received first signal, and indicating a light intensity acquisition unit to acquire and buffer the electric signal corresponding to the first signal receiving moment through the read-out control unit;
    and reading the electric signals cached by the light intensity acquisition unit through the reading circuit.
  14. The method according to any one of claims 11 to 13, further comprising:
    determining statistical data based on the at least one data signal received from the read circuit;
    And if the statistical data is determined to meet a preset conversion condition, transmitting the conversion signal to the reading circuit, wherein the preset conversion condition is determined based on the preset bandwidth of the vision sensor chip.
  15. The method of claim 14, wherein the first event representation is an event represented by light intensity information, the second event representation is an event represented by polarity information, the predetermined conversion condition is that a total amount of data read from the pixel array circuit by the first event representation is greater than the preset bandwidth, or the predetermined conversion condition is that a number of the at least one data signal is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  16. The method according to claim 14, wherein the first event representation is an event represented by polarity information, the second event representation is an event represented by light intensity information, the predetermined conversion condition is that if the at least one data signal is read from the pixel array circuit by the second event representation, a total amount of data read is not greater than the preset bandwidth, or the predetermined conversion condition is that a number of the at least one data signal is not greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  17. A decoding method, comprising:
    reading a data signal from the vision sensor chip by a reading circuit;
    decoding the data signal according to a first decoding mode by a decoding circuit;
    when receiving the conversion signal from the control circuit, the decoding circuit decodes the data signal according to a second decoding mode.
  18. The decoding method of claim 17, further comprising:
    determining statistical data based on the data signal read from the read circuit;
    and if it is determined that the statistical data meets a preset conversion condition, transmitting the conversion signal to the decoding circuit, wherein the preset conversion condition is determined based on the preset bandwidth of the vision sensor chip.
  19. The decoding method according to claim 18, wherein the first decoding means decodes the data signal according to a first bit corresponding to a first event representation means representing an event by light intensity information, the second decoding means decodes the data signal according to a second bit corresponding to a second event representation means representing an event by polarity information indicating whether the light intensity variation amount is increased or decreased, the conversion condition is that the total amount of data decoded according to the first decoding means is greater than the preset bandwidth, or the predetermined conversion condition is that the number of data signals is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  20. The decoding method according to claim 18, wherein the first decoding means decodes the data signal according to a first bit corresponding to a first event representation means representing an event by polarity information indicating whether the light intensity variation amount is increased or decreased, the second decoding means decodes the data signal by a second bit corresponding to a second event representation means representing an event by light intensity information, the conversion condition is that a total data amount is not greater than the preset bandwidth if the data signal is decoded according to the second decoding means, or the predetermined conversion condition is that a number of the data signals is greater than a ratio of the preset bandwidth to the first bit, the first bit being a preset bit of a data format of the data signal.
  21. A vision sensor chip, comprising:
    a pixel array circuit for generating at least one data signal corresponding to a pixel in the pixel array circuit by measuring an amount of change in light intensity, the at least one data signal being indicative of a light intensity change event indicative of the amount of change in light intensity measured by the corresponding pixel in the pixel array circuit exceeding a predetermined threshold;
    A first encoding unit, configured to encode the at least one data signal according to a first bit to obtain first encoded data;
    the first coding unit is further configured to code the at least one data signal according to a second bit indicated by a first control signal when the first control signal is received from the control circuit, where the first control signal is determined by the control circuit according to the first coded data.
  22. The vision sensor chip of claim 21, wherein the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
  23. The visual sensor chip of claim 22, wherein the second bit indicated by the control signal is less than the first bit when the amount of data of the first encoded data is not less than the bandwidth such that the total amount of data of the at least one data signal encoded by the second bit is not greater than the bandwidth.
  24. The visual sensor chip of claim 22, wherein the second bit indicated by the control signal is greater than the first bit when the amount of data of the first encoded data is less than the bandwidth, and the total amount of data of the at least one data signal encoded by the second bit is not greater than the bandwidth.
  25. The vision sensor chip of any one of claims 21-24, wherein the pixel array comprises N regions, a maximum bit of at least two of the N regions being different, the maximum bit representing a preset maximum bit encoding the at least one data signal generated by one of the regions,
    the first coding unit is specifically configured to code the at least one data signal generated in the first area according to the first bit, so as to obtain first coded data, where the first bit is not greater than a maximum bit of the first area, and the first area is any one of the N areas;
    the first encoding unit is specifically configured to encode the at least one data signal generated in the first area according to a second bit indicated by the first control signal when the first control signal is received from the control circuit, where the first control signal is determined by the control circuit according to the first encoded data.
  26. The vision sensor chip of claim 21, wherein the control circuit is further configured to:
    when it is determined that the total data amount of the at least one data signal encoded by the third bit is greater than the bandwidth and the total data amount of the at least one data signal encoded by the second bit is not greater than the bandwidth, the first control signal is transmitted to the first encoding unit, the third bit and the second bit differing by 1 bit unit.
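Claims 21 to 26 describe adapting the encoding bit width so that the encoded event data does not exceed a preset bandwidth, with claim 26 stepping between bit widths that differ by one bit unit. The sketch below picks the largest bit width that still fits; the search strategy and the bandwidth unit are assumptions made for illustration only.

```python
def select_encoding_bits(num_events, max_bits, bandwidth_bits):
    """Illustrative bit-width selection: start from the preset maximum bit width and
    step down one bit unit at a time until the total encoded data fits the preset
    bandwidth (cf. claims 21-26)."""
    bits = max_bits
    while bits > 1 and num_events * bits > bandwidth_bits:
        bits -= 1                      # reduce the bit width by one bit unit
    return bits
```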
  27. A decoding apparatus, comprising:
    a reading circuit for reading the data signal from the vision sensor chip;
    a decoding circuit for decoding the data signal according to a first bit;
    the decoding circuit is further configured to decode the data signal according to a second bit indicated by the first control signal when the first control signal is received from the control circuit.
  28. The decoding device of claim 27, wherein the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
  29. The decoding device of claim 28, wherein the second bit is smaller than the first bit when a total data amount of the data signal decoded from the first bit is not smaller than the bandwidth.
  30. The decoding device of claim 28, wherein the second bit is greater than the first bit when a total amount of data of the data signal decoded from the first bit is less than the bandwidth, and wherein the total amount of data of the data signal decoded by the second bit is not greater than the bandwidth.
  31. The decoding device according to any one of claims 27 to 30, wherein,
    the reading circuit is specifically configured to read a data signal corresponding to a first area from a vision sensor chip, where the first area is any one of N areas included in a pixel array of the vision sensor, and maximum bits of at least two areas in the N areas are different, where the maximum bits represent preset maximum bits for encoding the at least one data signal generated in one area;
    the decoding circuit is specifically configured to decode a data signal corresponding to the first area according to the first bit.
  32. The decoding device of claim 27, wherein the control circuit is further configured to:
    and when it is determined that the total data amount of the data signal decoded by the third bit is larger than the bandwidth and the total data amount of the data signal decoded by the second bit is not larger than the bandwidth, transmitting the first control signal to the decoding circuit, wherein the difference between the third bit and the second bit is 1 bit unit.
  33. A method of operating a vision sensor chip, comprising:
    Generating, by a pixel array circuit of the vision sensor chip, at least one data signal corresponding to a pixel in the pixel array circuit, the at least one data signal indicating a light intensity variation event indicating that the measured light intensity variation of the corresponding pixel in the pixel array circuit exceeds a predetermined threshold;
    encoding the at least one data signal according to a first bit by a first encoding unit of the vision sensor chip to obtain first encoded data;
    when a first control signal is received from a control circuit of the vision sensor chip through the first coding unit, the at least one data signal is coded according to a second bit indicated by the first control signal, and the first control signal is determined by the control circuit according to the first coding data.
  34. The method of claim 33, wherein the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
  35. The method of claim 34, wherein the second bit indicated by the control signal is smaller than the first bit when the amount of data of the first encoded data is not smaller than the bandwidth such that the total amount of data of the at least one data signal encoded by the second bit is not greater than the bandwidth.
  36. The method of claim 34, wherein the second bit indicated by the control signal is greater than the first bit when the amount of data of the first encoded data is less than the bandwidth, and wherein the total amount of data of the at least one data signal encoded by the second bit is not greater than the bandwidth.
  37. The method according to any one of claims 33 to 36, wherein the pixel array comprises N regions, a maximum bit of at least two regions of the N regions being different, the maximum bit representing a preset maximum bit encoding the at least one data signal generated by one of the regions,
    the encoding of the at least one data signal by the first encoding unit of the vision sensor chip according to a first bit includes:
    encoding the at least one data signal generated in the first area according to the first bit by the first encoding unit to obtain first encoded data, wherein the first bit is not greater than the maximum bit of the first area, and the first area is any one area of the N areas;
    when the first coding unit receives a first control signal from the control circuit of the vision sensor chip, the at least one data signal is coded according to a second bit indicated by the first control signal, and the method comprises the following steps:
    And when the first control signal is received from the control circuit through the first coding unit, the at least one data signal generated by the first area is coded according to a second bit indicated by the first control signal, and the first control signal is determined by the control circuit according to the first coded data.
  38. The method as recited in claim 33, further comprising:
    when it is determined that the total data amount of the at least one data signal encoded by the third bit is greater than the bandwidth and the total data amount of the at least one data signal encoded by the second bit is not greater than the bandwidth, the first control signal is transmitted to the first encoding unit by the control circuit, the third bit and the second bit differing by 1 bit unit.
  39. A decoding method, comprising:
    reading a data signal from the vision sensor chip by a reading circuit;
    decoding the data signal by a decoding circuit according to a first bit;
    and when the decoding circuit receives the first control signal from the control circuit, decoding the data signal according to the second bit indicated by the first control signal.
  40. The decoding method of claim 39, wherein the first control signal is determined by the control circuit based on the first encoded data and a bandwidth preset by the vision sensor chip.
  41. The decoding method of claim 40, wherein the second bit is smaller than the first bit when a total data amount of the data signal decoded according to the first bit is not smaller than the bandwidth.
  42. The decoding method of claim 40, wherein the second bit is greater than the first bit when a total amount of data of the data signal decoded from the first bit is less than the bandwidth, and wherein the total amount of data of the data signal decoded from the second bit is not greater than the bandwidth.
  43. The decoding method of any one of claims 39 to 42, wherein the reading the data signal from the vision sensor chip by the reading circuit includes:
    reading a data signal corresponding to a first area from a vision sensor chip through the reading circuit, wherein the first area is any one of N areas included in a pixel array of the vision sensor, the maximum bits of at least two areas in the N areas are different, and the maximum bits represent preset maximum bits for encoding at least one data signal generated by one area;
    The decoding of the data signal by the decoding circuit according to the first bit comprises:
    and decoding the data signal corresponding to the first area according to the first bit by the decoding circuit.
  44. The decoding method of claim 39, wherein the method further comprises:
    and when it is determined that the total data amount of the data signal decoded by the third bit is larger than the bandwidth and the total data amount of the data signal decoded by the second bit is not larger than the bandwidth, transmitting the first control signal to the decoding circuit, wherein the difference between the third bit and the second bit is 1 bit unit.
  45. A vision sensor chip, comprising:
    a pixel array circuit for generating a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit by measuring an amount of light intensity variation, the plurality of data signals being indicative of at least one light intensity variation event indicative of the amount of light intensity variation measured by the corresponding pixels in the pixel array circuit exceeding a predetermined threshold;
    and a third encoding unit for encoding a first differential value according to a first preset bit, wherein the first differential value is a difference value between the light intensity conversion amount and the preset threshold value.
  46. The vision sensor chip of claim 45, wherein the pixel array circuit comprises a plurality of pixels, each of the pixels comprising a threshold comparison unit,
    the threshold value comparing unit is used for outputting polarity information when the light intensity conversion quantity exceeds the preset threshold value, wherein the polarity information is used for indicating whether the light intensity conversion quantity is increased or reduced;
    the third encoding unit is further configured to encode the polarity information according to a second preset bit.
  47. The vision sensor chip of claim 46, wherein each of the pixels comprises a light intensity detection unit, a readout control unit, and a light intensity acquisition unit,
    the light intensity detection unit is used for outputting an electric signal corresponding to the light signal irradiated on the light intensity detection unit, and the electric signal is used for indicating the light intensity;
    the threshold value comparing unit is specifically configured to output the polarity information when the light intensity conversion amount is determined to exceed a predetermined threshold value according to the electrical signal;
    the readout control unit is used for responding to the received polarity signal, and indicating the light intensity acquisition unit to acquire and buffer the electric signal corresponding to the polarity information receiving moment;
    The third coding unit is further configured to code a first electrical signal according to a third preset bit, where the first electrical signal is the electrical signal collected by the light intensity collection unit and corresponds to the first receiving time of the polarity information, and the third preset bit is a maximum bit preset by the vision sensor and used for representing the characteristic information of the light intensity.
  48. The vision sensor chip of claim 47, wherein the third encoding unit is further configured to:
    and encoding the electric signals acquired by the light intensity acquisition unit according to the third preset bit at intervals of preset time.
  49. The visual sensor chip of any one of claims 45 to 48, wherein the third encoding unit is specifically configured to:
    and when the first differential value is smaller than the preset threshold value, encoding the first differential value according to the first preset bit.
  50. The visual sensor chip of any one of claims 45 to 49 wherein the third encoding unit is further configured to:
    and when the first differential value is not smaller than the preset threshold value, encoding the first residual differential value and the preset threshold value according to the first preset bit, wherein the first residual differential value is the difference between the first differential value and the preset threshold value.
  51. The visual sensor chip of claim 50, wherein the third encoding unit is specifically configured to:
    when the first residual differential value is not smaller than the preset threshold value, a second residual differential value is encoded according to the first preset bit, and the second residual differential value is the difference value between the first residual differential value and the preset threshold value;
    performing first encoding on the preset threshold according to the first preset bit;
    and carrying out second coding on the preset threshold value according to the first preset bit.
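Claims 45 to 51 describe encoding the part of the light-intensity change that exceeds the trigger threshold as a differential value, emitting the threshold itself (possibly more than once) whenever the remaining value is still not smaller than the threshold. The sketch below is one way to picture that scheme together with its inverse; the code-word layout and function names are assumptions, not the claimed circuit.

```python
def encode_light_change(delta, threshold, diff_bits):
    """Encode the light-intensity change `delta` (assumed >= threshold, since an
    event was triggered): emit the threshold repeatedly while the remainder is
    still not smaller than the threshold, then emit the residual (cf. claims 45-51)."""
    codewords = []
    residual = delta - threshold                     # first differential value
    while residual >= threshold:
        codewords.append(("THRESHOLD", threshold, diff_bits))
        residual -= threshold
    codewords.append(("RESIDUAL", residual, diff_bits))  # residual < threshold
    return codewords

def decode_light_change(codewords, threshold):
    """Inverse of the sketch above: sum the emitted values and add back the
    trigger threshold to recover the light-intensity change."""
    return threshold + sum(value for _tag, value, _bits in codewords)
```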
  52. A decoding apparatus, comprising: an acquisition circuit for reading the data signal from the vision sensor chip;
    and the decoding circuit is used for decoding the data signal according to the first bit to obtain a differential value, the differential value is smaller than a preset threshold value, the differential value is the difference value between the light intensity transformation amount measured by the vision sensor and the preset threshold value, the light intensity transformation amount exceeds the preset threshold value, and the vision sensor generates at least one light intensity change event.
  53. The decoding device of claim 52, wherein the decoding circuit is further configured to:
    The data signal is decoded according to the second bit to obtain polarity information indicating whether the light intensity variation is increased or decreased.
  54. The decoding device of claim 52 or 53, wherein the decoding circuit is further configured to:
    and decoding the data signal received at the first moment according to a third bit to acquire an electric signal, output by the vision sensor, corresponding to the optical signal irradiated on the vision sensor, wherein the third bit is the maximum bit preset by the vision sensor for representing the characteristic information of the light intensity.
  55. The decoding device of claim 54, wherein the decoding circuit is further configured to:
    and decoding the data signal received at the first moment according to the third bit at intervals of preset time length.
  56. Decoding apparatus according to any one of claims 52 to 55, characterized in that the decoding circuit is specifically configured to:
    the data signal is decoded according to the first bit to obtain a differential value and at least one of the predetermined thresholds.
  57. A method of operating a vision sensor chip, comprising:
    generating, by a pixel array circuit of the vision sensor chip, a plurality of data signals corresponding to a plurality of pixels in the pixel array circuit, the plurality of data signals indicating at least one light intensity variation event that indicates that the measured light intensity variation of the corresponding pixels in the pixel array circuit exceeds a predetermined threshold;
    And encoding a first differential value according to a first preset bit by a third encoding unit of the vision sensor chip, wherein the first differential value is the difference value between the light intensity conversion quantity and the preset threshold value.
  58. The method of claim 57, wherein the pixel array circuit comprises a plurality of pixels, each pixel comprising a threshold comparison unit, the method further comprising:
    outputting polarity information by the threshold comparing unit when the light intensity conversion amount exceeds the predetermined threshold, wherein the polarity information is used for indicating whether the light intensity conversion amount is increased or decreased;
    and encoding the polarity information according to a second preset bit by the third encoding unit.
  59. The method of claim 58, wherein each of said pixels comprises a light intensity detection unit, a readout control unit, and a light intensity acquisition unit, said method further comprising:
    outputting, through the light intensity detection unit, an electric signal corresponding to the light signal irradiated thereon, wherein the electric signal is used for indicating the light intensity;
    the outputting of the polarity information by the threshold comparing unit includes:
    outputting the polarity information through the threshold comparing unit when the light intensity conversion amount is determined to exceed a preset threshold according to the electric signal;
    The method further comprises the steps of:
    responding to the received polarity signal, and indicating the light intensity acquisition unit to acquire and buffer the electric signal corresponding to the polarity information receiving time through the read-out control unit;
    and encoding a first electric signal according to a third preset bit, wherein the first electric signal is the electric signal which is acquired by the light intensity acquisition unit and corresponds to the first receiving moment of the polarity information, and the third preset bit is the maximum bit which is preset by the vision sensor and is used for representing the characteristic information of the light intensity.
  60. The method of claim 59, further comprising:
    and encoding the electric signals acquired by the light intensity acquisition unit according to the third preset bit at intervals of preset time.
  61. The method according to any one of claims 57 to 60, wherein said encoding the first differential value by the third encoding unit of the vision sensor chip according to a first preset bit comprises:
    and when the first differential value is smaller than the preset threshold value, encoding the first differential value according to the first preset bit.
  62. The method according to any one of claims 57 to 61, wherein said encoding the first differential value by the third encoding unit of the vision sensor chip according to a first preset bit, further comprises:
    and when the first differential value is not smaller than the preset threshold value, encoding the first residual differential value and the preset threshold value according to the first preset bit, wherein the first residual differential value is the difference between the first differential value and the preset threshold value.
  63. The method of claim 62, wherein, when the first differential value is not smaller than the predetermined threshold, the encoding the first residual differential value and the predetermined threshold according to the first preset bit comprises:
    when the first residual differential value is not smaller than the predetermined threshold, encoding a second residual differential value according to the first preset bit, wherein the second residual differential value is the difference between the first residual differential value and the predetermined threshold;
    performing a first encoding of the predetermined threshold according to the first preset bit;
    and performing a second encoding of the predetermined threshold according to the first preset bit, wherein the first differential value comprises the second residual differential value and two of the predetermined thresholds.
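For illustration, the sketch below shows one reading of the differential encoding of claims 61 to 63: the portion of the differential value below the predetermined threshold is emitted as a remainder codeword, and each full threshold contained in the differential is emitted as an additional threshold codeword, all on the first preset bit width. The bit width and threshold value are illustrative assumptions. Repeating the threshold codeword keeps every emitted value within the fixed first preset bit width, however large the differential is.

```python
FIRST_PRESET_BITS = 4     # assumed width of each differential codeword
PRESET_THRESHOLD = 10     # assumed predetermined threshold

def encode_word(value: int) -> str:
    """Encode one value on the first preset bit width."""
    return format(value, f"0{FIRST_PRESET_BITS}b")

def encode_differential(first_differential: int) -> list[str]:
    """Emit the sub-threshold residual first, then one threshold codeword for every
    full threshold contained in the differential (claims 61-63)."""
    n_thresholds, residual = divmod(first_differential, PRESET_THRESHOLD)
    return [encode_word(residual)] + [encode_word(PRESET_THRESHOLD)] * n_thresholds

# Example: a differential of 23 yields residual 3 plus two threshold codewords.
print(encode_differential(23))   # ['0011', '1010', '1010']
```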
  64. A decoding method, comprising:
    reading, by an acquisition circuit, a data signal from the vision sensor chip;
    decoding, by a decoding circuit, the data signal according to a first bit to obtain a differential value, wherein the differential value is smaller than a predetermined threshold and is the difference between the light intensity change amount measured by the vision sensor and the predetermined threshold, the light intensity change amount exceeds the predetermined threshold, and the vision sensor generates at least one light intensity change event.
  65. The decoding method of claim 64, further comprising:
    decoding the data signal according to a second bit to obtain polarity information, wherein the polarity information indicates whether the light intensity is increased or decreased.
  66. The decoding method of claim 64 or 65, further comprising:
    decoding the data signal received at a first moment according to a third bit to obtain an electric signal, output by the vision sensor, corresponding to the light signal irradiated on the vision sensor, wherein the third bit is the maximum bit preset by the vision sensor for representing the characteristic information of the light intensity.
  67. The decoding method of claim 66, further comprising:
    decoding, at intervals of a preset duration, the data signal received at the first moment according to the third bit.
  68. The decoding method of any one of claims 64 to 67, wherein the decoding, by the decoding circuit, the data signal according to the first bit to obtain the differential value comprises:
    decoding the data signal according to the first bit to obtain the differential value and at least one predetermined threshold.
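For illustration, the sketch below shows a decoder matching the encoding assumptions used above for claims 64 to 68: it reads the polarity on the second bit width, the light intensity on the third bit width, and then sums the differential codewords (remainder plus repeated thresholds) read on the first bit width. The packet layout and bit widths are assumptions for this example, not values mandated by the claims.

```python
FIRST_PRESET_BITS = 4
SECOND_PRESET_BITS = 1
THIRD_PRESET_BITS = 10
PRESET_THRESHOLD = 10

def decode_packet(bits: str) -> dict:
    """Decode polarity, light intensity, and the differential codewords from one packet."""
    pos = 0
    polarity = int(bits[pos:pos + SECOND_PRESET_BITS], 2)      # claim 65: second bit width
    pos += SECOND_PRESET_BITS
    intensity = int(bits[pos:pos + THIRD_PRESET_BITS], 2)      # claim 66: third bit width
    pos += THIRD_PRESET_BITS
    differential = 0
    while pos < len(bits):                                     # claims 64 and 68: first bit width
        differential += int(bits[pos:pos + FIRST_PRESET_BITS], 2)
        pos += FIRST_PRESET_BITS
    return {
        "increased": bool(polarity),
        "intensity": intensity,
        # The measured light intensity change is the decoded differential plus the
        # threshold that was exceeded when the event fired (claim 64).
        "light_change": differential + PRESET_THRESHOLD,
    }

# Example: the packet produced by the encoder sketches above.
print(decode_packet("1" + "1001100100" + "0011" + "1010" + "1010"))
# {'increased': True, 'intensity': 612, 'light_change': 33}
```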
  69. A vision sensor chip comprising a processor and a memory, the processor being coupled to the memory, characterized in that:
    the memory is configured to store a program;
    and the processor is configured to execute the program in the memory, so that the vision sensor chip performs the method according to any one of claims 11 to 16, 33 to 38, or 57 to 63.
  70. A decoding device comprising a processor and a memory, the processor being coupled to the memory, characterized in that:
    the memory is configured to store a program;
    and the processor is configured to execute the program in the memory, so that the decoding device performs the method according to any one of claims 17 to 20, 39 to 44, or 64 to 68.
  71. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 11 to 16, 33 to 38, or 57 to 63.
  72. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method according to any one of claims 17 to 20, 39 to 44, or 64 to 68.
  73. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 11 to 16, 33 to 38, or 57 to 63.
  74. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 17 to 20, 39 to 44, or 64 to 68.
CN202080104370.3A 2020-12-31 2020-12-31 Visual sensor chip, method and device for operating visual sensor chip Pending CN116530092A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/141973 WO2022141351A1 (en) 2020-12-31 2020-12-31 Vision sensor chip, method for operating vision sensor chip, and device

Publications (1)

Publication Number Publication Date
CN116530092A true CN116530092A (en) 2023-08-01

Family

ID=82258823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080104370.3A Pending CN116530092A (en) 2020-12-31 2020-12-31 Visual sensor chip, method and device for operating visual sensor chip

Country Status (2)

Country Link
CN (1) CN116530092A (en)
WO (1) WO2022141351A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117879714A (en) * 2023-06-27 2024-04-12 脉冲视觉(北京)科技有限公司 Compression encoding method, circuit, electronic device and storage medium for pulse signal
JP2024090012A (en) * 2022-12-22 2024-07-04 ソニーセミコンダクタソリューションズ株式会社 Signal processing apparatus, signal processing method, and imaging system
CN116563419B (en) * 2023-07-11 2023-09-19 上海孤波科技有限公司 Correction method and device for wafer map configuration data, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9554100B2 (en) * 2014-09-30 2017-01-24 Qualcomm Incorporated Low-power always-on face detection, tracking, recognition and/or analysis using events-based vision sensor
KR102503543B1 (en) * 2018-05-24 2023-02-24 삼성전자주식회사 Dynamic vision sensor, electronic device and data transfer method thereof
US10909824B2 (en) * 2018-08-14 2021-02-02 Samsung Electronics Co., Ltd. System and method for pulsed light pattern capturing using a dynamic vision sensor
CN109005329B (en) * 2018-09-19 2020-08-11 广东工业大学 Pixel unit, image sensor and camera
CN110971792B (en) * 2018-09-29 2021-08-13 华为技术有限公司 Dynamic vision sensor
US10827135B2 (en) * 2018-11-26 2020-11-03 Bae Systems Information And Electronic Systems Integration Inc. BDI based pixel for synchronous frame-based and asynchronous event-driven readouts
KR102707749B1 (en) * 2019-03-28 2024-09-23 삼성전자주식회사 Dynamic vision sensor configured to calibrate event signals using optical black region and method of operating the same

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117319450A (en) * 2023-11-27 2023-12-29 成都秦川物联网科技股份有限公司 Ultrasonic metering instrument data interaction method, device and equipment based on Internet of things
CN117319450B (en) * 2023-11-27 2024-02-09 成都秦川物联网科技股份有限公司 Ultrasonic metering instrument data interaction method, device and equipment based on Internet of things
CN117479031A (en) * 2023-12-22 2024-01-30 珠海燧景科技有限公司 Pixel arrangement structure of event sensor and denoising method

Also Published As

Publication number Publication date
WO2022141351A1 (en) 2022-07-07

Similar Documents

Publication Publication Date Title
JP7516675B2 (en) Pose estimation method and related apparatus
CN116113975A (en) Image processing method and device
CN116134829A (en) Image processing method and device
CN116114260A (en) Image processing method and device
CN114946169B (en) Image acquisition method and device
CN116530092A (en) Visual sensor chip, method and device for operating visual sensor chip
CN110445978B (en) Shooting method and equipment
CN116134484A (en) Image processing method and device
CN111050269B (en) Audio processing method and electronic equipment
US20190108622A1 (en) Non-local means denoising
US11810269B2 (en) Chrominance denoising
CN113052056B (en) Video processing method and device
CN113709464B (en) Video coding method and related equipment
WO2021077878A1 (en) Image processing method and apparatus, and electronic device
CN113810601A (en) Terminal image processing method and device and terminal equipment
CN113850726A (en) Image transformation method and device
CN113920010A (en) Super-resolution implementation method and device for image frame
CN114449151B (en) Image processing method and related device
CN112188094B (en) Image processing method and device, computer readable medium and terminal equipment
CN115225799B (en) Image processing method and terminal equipment
WO2022033344A1 (en) Video stabilization method, and terminal device and computer-readable storage medium
CN117519555A (en) Image processing method, electronic equipment and system
CN115706869A (en) Terminal image processing method and device and terminal equipment
CN117974519B (en) Image processing method and related equipment
CN118071611A (en) Image fusion method, device, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination