CN106887230A - Voiceprint recognition method based on feature space - Google Patents
- Publication number
- CN106887230A CN106887230A CN201510947369.8A CN201510947369A CN106887230A CN 106887230 A CN106887230 A CN 106887230A CN 201510947369 A CN201510947369 A CN 201510947369A CN 106887230 A CN106887230 A CN 106887230A
- Authority
- CN
- China
- Prior art keywords
- sequence number
- frequency range
- data group
- sequence
- identification feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3226—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
- H04L9/3231—Biological data, e.g. fingerprint, voice or retina
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Biodiversity & Conservation Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a voiceprint recognition method based on feature space, belonging to the technical field of biometric identification. The method comprises: presetting a first frequency band (high band) and a second frequency band (low band), and then performing the following operations on each band separately: dividing speech into multiple recognition segments; applying a feature transform to each segment to obtain identification feature points, which together form an identification feature space; partitioning the identification feature space into several subspaces; applying the feature transform to training sentences to obtain temporal feature points, assigning each point to a subspace, and forming a first sequence from the subspace sequence numbers, which in turn yields a training identification feature. A test identification feature is obtained from test sentences in the same way. Finally, the test identification feature is compared with the training identification feature, and the voiceprint recognition result is derived from the comparison. The beneficial effect of this technical solution is that the computational load of voiceprint recognition is small, saving storage and computing resources.
Description
Technical field
The present invention relates to the technical field of biometric identification, and more particularly to a voiceprint recognition method based on feature space.
Background technology
Voiceprint recognition, like fingerprint, iris, and face recognition, is a form of biometric identification, and is considered the most natural biometric identity authentication method. Voiceprint recognition makes it easy to verify a speaker's identity, and this verification method offers strong privacy, because a voiceprint is usually difficult to copy fraudulently or to steal. Voiceprint recognition therefore has outstanding application advantages in many fields, especially in smart devices.
The basic pipeline of voiceprint recognition is speech acquisition, feature extraction, and classification modeling. A common feature extraction method exploits the short-term stationarity of speech and converts speech into an identification feature set with the Mel-cepstrum transform; a classification model of the speaker is then built from the speaker's speech through a learning process, and the recognition result is obtained from the various identification models. However, this process has several problems: (1) such a recognition model needs many training samples before it can be applied; (2) the computation performed with such a model is complex; (3) the amount of model data produced is large. In summary, for resource-constrained intelligent systems, these problems limit the application of prior-art voiceprint recognition algorithms.
Summary of the invention
In view of the above problems in the prior art, a technical solution for a voiceprint recognition method is now provided, specifically comprising:
A voiceprint recognition method based on feature space, wherein a first frequency band and a second frequency band are preset, the first frequency band being higher than the second frequency band, the method further comprising:
Step S1, within the first frequency band or the second frequency band respectively, dividing speech recorded under different backgrounds and from different voices into recognition segments of a specific length;
Step S2, applying a feature transform to each recognition segment to obtain corresponding identification feature points, and using all the identification feature points associated with all the recognition segments to form the identification feature space of the first frequency band, or the identification feature space of the second frequency band;
Step S3, partitioning the identification feature space into a plurality of subspaces, describing each partitioned subspace with description information, and assigning each subspace a corresponding sequence number;
Step S4, within the first frequency band or the second frequency band respectively, applying the feature transform to every training sentence associated with the training model to obtain a temporal feature point set containing the corresponding temporal feature points; assigning each temporal feature point to one of the subspaces under the same frequency band; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a first sequence associated with the first frequency band or the second frequency band; and thereby forming the corresponding training identification feature;
Step S5, within the first frequency band or the second frequency band respectively, applying the feature transform to every test sentence associated with the test model to obtain a temporal feature point set; assigning each temporal feature point to one of the subspaces; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a second sequence associated with the first frequency band or the second frequency band; and thereby forming the corresponding test identification feature;
Step S6, comparing whether the training identification feature associated with the first frequency band is similar to the test identification feature, and deriving a confirmation result of voiceprint recognition from the comparison; or comparing whether the training identification feature associated with the second frequency band is similar to the test identification feature, and deriving a confirmation result of voiceprint recognition from the comparison.
Preferably, in the voiceprint recognition method, in step S4, each temporal feature point is assigned to a subspace according to the nearest-neighbor rule.
Preferably, in the voiceprint recognition method, in step S4, the subspaces to which the temporal feature points are assigned form a spatial sequence according to their sequence numbers, and this spatial sequence serves as the first sequence, forming the training identification feature.
Preferably, in the voiceprint recognition method, in step S5, the subspaces to which the temporal feature points are assigned form a spatial sequence according to their sequence numbers, and this spatial sequence serves as the second sequence, forming the test identification feature.
Preferably, in the voiceprint recognition method, in step S4, the spatial sequence comprises data groups associated with the subspaces, one data group corresponding to one sequence number;
after the spatial sequence is formed, the method further comprises a first data compression performed on the spatial sequence within the first frequency band or the second frequency band respectively, specifically:
Step S41, recording the sequence number of each data group, and recording the repetition count associated with each sequence number;
Step S42, judging whether any sequence number has a repetition count of 1, and turning to step S43 when a data group with a repetition count of 1 exists;
Step S43, deleting the data group whose sequence number has a repetition count of 1;
Step S44, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding data group and the following data group;
if different, retaining both the preceding data group and the following data group;
all the data groups in the spatial sequence form the first sequence after the first data compression has been performed.
Preferably, in the voiceprint recognition method, in step S5, the spatial sequence comprises data groups associated with the subspaces, one data group corresponding to one sequence number;
after the spatial sequence is formed, the method further comprises a second data compression performed on the spatial sequence within the first frequency band or the second frequency band respectively, specifically:
Step S51, recording the sequence number of each data group, and recording the repetition count associated with each sequence number;
Step S52, judging whether any sequence number has a repetition count of 1, and turning to step S53 when a data group with a repetition count of 1 exists;
Step S53, deleting the data group whose sequence number has a repetition count of 1;
Step S54, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding data group and the following data group;
if different, retaining both the preceding data group and the following data group;
all the data groups in the spatial sequence form the second sequence after the second data compression has been performed.
Preferably, in the voiceprint recognition method, the feature transform is the Mel-cepstrum transform.
Preferably, in the voiceprint recognition method, during the Mel-cepstrum transform, each sentence is divided into 20 ms frames with a 10 ms frame shift to obtain the sentence frames associated with that sentence; silence is then removed frame by frame, 12 coefficients are kept per frame after the Mel-cepstrum transform of the sentence frames, and these 12 coefficients constitute an identification feature point.
Preferably, in the voiceprint recognition method, in step S3, the identification feature space is divided into several subspaces with the K-means algorithm, and for each subspace after division the K-means cluster center is recorded as the description information of that subspace.
The beneficial effect of the above technical solution is that it provides a voiceprint recognition method based on feature space in which the computational load of voiceprint recognition is small, storage and computing resources are saved, and the problems of modeling methods based on probability statistics are overcome, making it suitable for intelligent systems with limited system resources. In addition, a first frequency band representing child speakers and a second frequency band representing adult speakers are preset and compared separately, further improving the accuracy of voiceprint recognition.
Brief description of the drawings
Fig. 1 is an overall flowchart of a voiceprint recognition method based on feature space in a preferred embodiment of the invention;
Fig. 2 is a schematic flowchart of the first data compression in a preferred embodiment of the invention;
Fig. 3 is a schematic flowchart of the second data compression in a preferred embodiment of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, and not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.
It should be noted that, in the absence of conflict, the embodiments of the invention and the features in the embodiments may be combined with one another.
The invention is further described below with reference to the drawings and specific embodiments, which do not limit the invention.
In a preferred embodiment of the invention, in view of the above problems in the prior art, a voiceprint recognition method based on feature space is provided. The method is applicable to smart devices with voice control functions, for example intelligent robots used in personal spaces.
In this voiceprint recognition method, a first frequency band and a second frequency band are first preset, the first frequency band being higher than the second frequency band. Specifically, different users may speak at different frequencies; a rough division by frequency yields a lower band corresponding to adult speakers and a higher band corresponding to child speakers.
Further, voiceprint recognition may differ between adult and child speakers, in particular in the extraction of voiceprint features and the structure of the corresponding voiceprint models. Therefore, in the technical solution of the invention, two speech reception bands are set, and the speech of adults and children is distinguished according to these two bands, further improving recognition accuracy. In other words, the first frequency band can be used to represent the voice band of child speakers, and the second frequency band the voice band of adult speakers. Accordingly, in preferred embodiments of the invention, the two bands can be adjusted as experimental data accumulate, so that they accurately represent the voice bands of adult and child speakers respectively.
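The patent leaves the concrete band boundaries to experimental tuning, so any routing logic is only an illustration. The sketch below routes an utterance to the first (child) or second (adult) band by a crude autocorrelation estimate of its fundamental frequency; the band limits, the `estimate_f0` helper, and the synthetic test tones are all assumptions, not part of the disclosure:

```python
import numpy as np

# Hypothetical band boundaries in Hz; the patent leaves the exact
# cut-offs to experimental tuning, so these values are assumptions.
CHILD_BAND = (250.0, 450.0)   # "first frequency band" (higher)
ADULT_BAND = (80.0, 250.0)    # "second frequency band" (lower)

def estimate_f0(signal, sr):
    """Crude autocorrelation-based fundamental-frequency estimate."""
    signal = signal - np.mean(signal)
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # search for the strongest peak at a lag within the plausible F0 range
    lo, hi = int(sr / CHILD_BAND[1]), int(sr / ADULT_BAND[0])
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

def route_band(f0):
    """Assign an utterance to the child or adult band by its mean F0."""
    return "first (child)" if f0 >= CHILD_BAND[0] else "second (adult)"

sr = 16000
t = np.arange(4000) / sr                    # 0.25 s of signal
child_like = np.sin(2 * np.pi * 300 * t)    # 300 Hz tone
adult_like = np.sin(2 * np.pi * 120 * t)    # 120 Hz tone
print(route_band(estimate_f0(child_like, sr)))  # first (child)
print(route_band(estimate_f0(adult_like, sr)))  # second (adult)
```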
In a preferred embodiment of the invention, as shown in Fig. 1, the voiceprint recognition method specifically comprises:
Step S1, within the first frequency band or the second frequency band respectively, dividing speech recorded under different backgrounds and from different voices into recognition segments of a specific length;
Step S2, applying a feature transform to each recognition segment to obtain corresponding identification feature points, and using all the identification feature points associated with all the recognition segments to form the identification feature space of the first frequency band, or that of the second frequency band;
Step S3, partitioning the identification feature space into a plurality of subspaces, describing each partitioned subspace with description information, and assigning each subspace a corresponding sequence number;
Step S4, within the first frequency band or the second frequency band respectively, applying the feature transform to every training sentence associated with the training model to obtain a temporal feature point set containing the corresponding temporal feature points; assigning each temporal feature point to a subspace under the same frequency band; forming, from the sequence numbers of the corresponding subspaces, a first sequence associated with the first or the second frequency band; and thereby forming the corresponding training identification feature;
Step S5, within the first frequency band or the second frequency band respectively, applying the feature transform to every test sentence associated with the test model to obtain a temporal feature point set; assigning each temporal feature point to a subspace; forming, from the sequence numbers of the corresponding subspaces, a second sequence associated with the first or the second frequency band; and thereby forming the corresponding test identification feature;
Step S6, comparing whether the training identification feature associated with the first frequency band is similar to the test identification feature, and deriving the confirmation result of voiceprint recognition from the comparison; or comparing whether the training identification feature associated with the second frequency band is similar to the test identification feature, and deriving the confirmation result of voiceprint recognition from the comparison.
In a preferred embodiment of the invention, on the basis of the above presetting, steps S1 and S2 first obtain speech under the first or the second frequency band from different backgrounds and different voices, and divide this speech into recognition segments of a specific length. Specifically, each sentence from the different backgrounds and voices can be divided into 20 ms frames with a 10 ms frame shift; silence is then removed frame by frame, the Mel-cepstrum transform is applied to the speech frames, and 12 coefficients are kept per frame, these 12 coefficients constituting an identification feature point. The identification feature points of all speech segments form the identification feature set, i.e. the corresponding identification feature space.
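The 20 ms / 10 ms framing, frame-wise silence removal, and 12 Mel-cepstrum coefficients per frame described above can be sketched as follows. The FFT size, filterbank size, and silence threshold are assumed values the patent does not specify:

```python
import numpy as np

def frame_signal(x, sr, frame_ms=20, shift_ms=10):
    """Split a signal into 20 ms frames with a 10 ms shift, as in the patent."""
    flen, fshift = int(sr * frame_ms / 1000), int(sr * shift_ms / 1000)
    n = 1 + max(0, (len(x) - flen) // fshift)
    return np.stack([x[i * fshift:i * fshift + flen] for i in range(n)])

def remove_silence(frames, rel_thresh=0.01):
    """Drop frames whose energy falls below a fraction of the peak energy."""
    e = (frames ** 2).sum(axis=1)
    return frames[e > rel_thresh * e.max()]

def mel_cepstrum(frames, sr, n_mels=26, n_ceps=12):
    """Minimal Mel cepstrum: power spectrum -> mel filterbank -> log -> DCT,
    keeping 12 coefficients per frame as the patent specifies."""
    nfft = 512
    spec = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), nfft)) ** 2
    # triangular mel filterbank
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((nfft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, nfft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(spec @ fb.T + 1e-10)
    # DCT-II over the mel channels, keeping the first 12 coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_mels))
    return logmel @ dct.T

sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s synthetic tone
feats = mel_cepstrum(remove_silence(frame_signal(x, sr)), sr)
print(feats.shape)  # one 12-dimensional feature point per retained frame
```

Each row of `feats` corresponds to one identification feature point in the sense of step S2.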
In a preferred embodiment of the invention, in step S3, the identification feature space is partitioned into a plurality of subspaces with the K-means algorithm; for each subspace after division, the K-means cluster center is recorded as the data description of that subspace, each subspace is numbered, and the description information of each subspace is recorded together with its sequence number. These steps are performed separately for the identification feature spaces under the first and the second frequency band.
In a preferred embodiment of the invention, the operation of step S4 is performed on the subspaces under the first or the second frequency band respectively: the feature transform is applied to every training sentence associated with the training model to obtain a temporal feature point set containing the corresponding temporal feature points; each temporal feature point is assigned to a subspace under the same frequency band; a first sequence associated with the first or the second frequency band is formed from the sequence numbers of the corresponding subspaces; and the corresponding training identification feature is thereby formed.
Specifically, in a preferred embodiment of the invention, a training sentence is a sentence that, after repeated training, is stored inside the system as part of the training model against which the system performs comparisons.
Specifically, in a preferred embodiment of the invention, in step S4, each temporal feature point is assigned, according to the nearest-neighbor rule, to one of the subspaces under the same frequency band (the first or the second frequency band), and the sequence number of the subspace corresponding to each temporal feature point is recorded. This ultimately forms a first sequence composed of the sequence numbers of the subspaces, for example (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5), from which the corresponding training identification feature is formed.
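Steps S3 and S4 — partitioning the feature space with K-means and mapping a sentence's temporal feature points to subspace sequence numbers by the nearest-neighbor rule — can be sketched as follows, using random vectors as stand-ins for real cepstral features. The dimension 12 follows the feature extraction described above; the number of subspaces (k = 8) is an assumption:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: the cluster centers serve as the
    'description information' of the subspaces (step S3)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def to_sequence(temporal_points, centers):
    """Step S4: map each temporal feature point to the sequence number of
    its nearest subspace (nearest-neighbor rule)."""
    d = ((temporal_points[:, None] - centers) ** 2).sum(-1)
    return np.argmin(d, axis=1).tolist()

rng = np.random.default_rng(1)
# stand-in for the identification feature space: 12-dim cepstral points
space = rng.normal(size=(500, 12))
centers = kmeans(space, k=8)
# stand-in for the temporal feature points of one training sentence
sentence = rng.normal(size=(11, 12))
first_sequence = to_sequence(sentence, centers)
print(len(first_sequence))  # 11 subspace sequence numbers, one per point
```

The same `to_sequence` call applied to a test sentence's points yields the second sequence of step S5.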
In a preferred embodiment of the invention, similarly, in step S5 the following operations are performed on the subspaces under the first or the second frequency band respectively: the feature transform is applied to the test sentences associated with the test model to obtain a temporal feature point set; each temporal feature point is assigned to a subspace; a second sequence associated with the first or the second frequency band is formed from the sequence numbers of the corresponding subspaces; and the corresponding test identification feature is thereby formed.
In a preferred embodiment of the invention, a test sentence is a sentence associated with the test model, i.e. the sentence to be compared.
Specifically, in a preferred embodiment of the invention, in step S5, each temporal feature point in the test sentence is likewise assigned, according to the nearest-neighbor rule, to one of the subspaces under the same frequency band (the first or the second frequency band), and the sequence number of the subspace corresponding to each temporal feature point is recorded. This ultimately forms a second sequence, likewise composed of the sequence numbers of the subspaces, for example (2, 3, 3, 5, 5, 8, 6, 6, 6, 4, 4), from which the corresponding test identification feature is formed. In a preferred embodiment of the invention, steps S4 and S5 do not depend on each other (i.e. the execution of step S5 is not premised on step S4 having finished), so steps S4 and S5 can be performed simultaneously. Fig. 1 nevertheless shows an embodiment in which steps S4 and S5 are performed in sequence.
In a preferred embodiment of the invention, in step S6, the training identification feature formed above is compared with the test identification feature, and the final voiceprint recognition result is derived from the comparison. Specifically, in step S6 the comparison is likewise made per band: the test identification feature under the first frequency band is compared with the training identification feature under the first frequency band, and the voiceprint recognition result is derived from the comparison. Similarly, the test identification feature under the second frequency band is compared with the training identification feature under the second frequency band, and the result is derived from that comparison.
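The patent does not specify the similarity measure used in step S6. One plausible choice for comparing two sequences of subspace numbers is a normalized edit distance with an acceptance threshold; both the metric and the threshold below are assumptions, sketched purely as an illustration:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (x != y))
    return dp[-1]

def similar(train_seq, test_seq, threshold=0.5):
    """Accept the speaker if the normalized edit distance between the
    training and test sequences is below the (assumed) threshold."""
    dist = edit_distance(train_seq, test_seq)
    return dist / max(len(train_seq), len(test_seq)) < threshold

train = [2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]   # example first sequence
test_ok = [2, 4, 8, 8, 5, 5, 5, 5]          # similar utterance
test_bad = [1, 3, 3, 7, 7, 6, 6, 6]         # different speaker
print(similar(train, test_ok), similar(train, test_bad))  # True False
```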
Further, in a preferred embodiment of the invention, in step S4, the spatial sequence comprises data groups associated with the subspaces, one data group corresponding to one sequence number.
After the spatial sequence is formed, a first data compression is performed on the spatial sequence within the first or the second frequency band respectively; as shown in Fig. 2, the process is:
Step S41, recording the sequence number of each data group, and recording the repetition count associated with each sequence number;
Step S42, judging whether any sequence number has a repetition count of 1, and turning to step S43 when a data group with a repetition count of 1 exists;
Step S43, deleting the data group whose sequence number has a repetition count of 1;
Step S44, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups;
if different, retaining both the preceding and the following data groups;
all data groups in the spatial sequence form the first sequence after the first data compression has been performed.
Specifically, in a preferred embodiment of the invention, during the first data compression, the sequence numbers of the subspaces and the counts of identical consecutive sequence numbers are recorded, each sequence number and its count being arranged as one data group; when the count of a sequence number is 1, that data group is removed. For example, in a preferred embodiment of the invention, if the data group with sequence number 4 has a count of 1, that group is deleted during the first data compression.
If, after a group is removed, the sequence number of the group before it is identical to the sequence number of the group after it, the two groups are merged: the sequence number of the newly formed group is the same as that of the group preceding the deleted group, and its count is the sum of the counts of the preceding and following groups. Otherwise, if the sequence numbers of the preceding and following groups differ, both groups are retained unchanged. For example, in a preferred embodiment of the invention, after the data group with sequence number 4 is removed, the preceding group has sequence number 2 and the following group has sequence number 8; since 2 and 8 differ, the original data groups are retained.
In a preferred embodiment of the invention, the first sequence after the first data compression is the training identification feature.
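The first data compression of steps S41–S44 (and the identical second compression of steps S51–S54) amounts to run-length encoding followed by removal of length-1 runs and merging of newly adjacent runs with equal sequence numbers; a direct sketch:

```python
def compress(seq):
    """First/second data compression (steps S41-S44): run-length encode,
    drop runs of length 1, then merge newly adjacent runs that share a
    sequence number."""
    # S41: run-length encode into [sequence number, repetition count] groups
    groups = []
    for s in seq:
        if groups and groups[-1][0] == s:
            groups[-1][1] += 1
        else:
            groups.append([s, 1])
    # S42-S43: delete data groups whose repetition count is 1
    groups = [g for g in groups if g[1] > 1]
    # S44: merge neighbours that became adjacent and share a sequence number
    merged = []
    for g in groups:
        if merged and merged[-1][0] == g[0]:
            merged[-1][1] += g[1]
        else:
            merged.append(g)
    return merged

# example from the description: group 4 has count 1 and is removed;
# its neighbours (2 and 8) differ, so they are retained as-is
seq = [2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]
print(compress(seq))  # [[2, 2], [8, 3], [5, 5]]
```

When the neighbours do match, as in `[1, 1, 7, 1, 1]`, deleting the lone 7 leaves two runs of 1s that are merged into a single group `[1, 4]`.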
Correspondingly, in a preferred embodiment of the invention, in the above step S5 the spatial sequence includes data groups associated with each subspace, one data group corresponding to one sequence number;
after the spatial sequence is formed, a second data compression is also performed on the spatial sequence in the first frequency range or the second frequency range; as shown in Figure 3, the process is:
Step S51: record the sequence number of each data group, and record the count of repeated sequence numbers associated with each sequence number;
Step S52: judge whether any sequence number has a repeat count of 1, and turn to step S53 when a data group whose repeat count is 1 exists;
Step S53: delete the data group whose repeat count is 1;
Step S54: judge whether the sequence number of the data group in front of the deleted data group is identical to the sequence number of the data group behind it:
if identical, merge the previous data group and the latter data group into one;
if they differ, retain both the previous data group and the latter data group;
after the second data compression has been performed on all data groups in the spatial sequence, the second sequence is formed.
Specifically, similarly to the steps described in the above step S4, in step S5 the sequence numbers of the subspaces and the counts of identical sequence numbers are likewise recorded, and each sequence number is arranged together with its count as one data group. When the count of a sequence number is 1, that data group is removed.
If, after a data group has been removed, the sequence number of the data group in front of it is identical to the sequence number of the data group behind it, the two groups are merged into one. The newly formed data group carries the same sequence number as the group in front of the deleted group, and its count is the sum of the counts of the groups in front of and behind the deleted group. Otherwise, if after the deletion the sequence numbers of the front and rear groups differ, both groups are retained unchanged. For example, in a preferred embodiment of the invention, after the data group with sequence number 4 is removed, the data group in front of it has sequence number 2 and the data group behind it has sequence number 8; since 2 and 8 differ, the original data groups are retained.
Similarly, in a preferred embodiment of the invention, the second sequence obtained through the second data compression serves as the test identification feature.
Then, as described above, in the above step S6 the training identification feature and the test identification feature under the same frequency range (the first frequency range or the second frequency range) are finally compared, and the comparison result is processed to obtain the final voiceprint recognition result.
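The patent does not spell out how the two identification features are compared in step S6. Purely as an illustration, one could score similarity as one minus the normalized edit distance between the sequence numbers of the two compressed sequences; the function names and the threshold below are assumptions, not taken from the text:

```python
def sequence_similarity(train_feature, test_feature):
    """Illustrative similarity in [0, 1] between two compressed
    (sequence number, count) features: 1 minus the normalized edit
    distance over their sequence numbers."""
    a = [num for num, _ in train_feature]
    b = [num for num, _ in test_feature]
    m, n = len(a), len(b)
    if max(m, n) == 0:
        return 1.0
    # Classic dynamic-programming Levenshtein distance.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[m][n] / max(m, n)

def confirm_speaker(train_feature, test_feature, threshold=0.8):
    # threshold is a hypothetical tuning parameter
    return sequence_similarity(train_feature, test_feature) >= threshold
```

Because the counts are ignored here, two utterances that visit the same subspaces in the same order score as identical even if they dwell in each subspace for different durations; a real implementation would have to decide how much weight the counts deserve.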
Performing the above steps keeps the computational load of voiceprint recognition small, improves the recognition rate, and also keeps the amount of data to be processed relatively small.
The above are only preferred embodiments of the present invention and do not thereby limit its embodiments or scope of protection. Those skilled in the art should appreciate that all equivalent schemes obtained by using the description and drawings of the present invention, together with obvious variations thereof, fall within the scope of protection of the present invention.
Claims (9)
1. A voiceprint recognition method based on feature space, characterized in that: a first frequency range and a second frequency range are preset, the first frequency range being higher than the second frequency range, the method further comprising:
Step S1: in the first frequency range or the second frequency range, dividing speech recorded under different backgrounds and of different lengths into identification segments of a specific length;
Step S2: performing a feature transform on each identification segment to obtain a plurality of corresponding identification features, and using all the identification features associated with all the identification segments to form an identification feature space corresponding to the first frequency range, or an identification feature space corresponding to the second frequency range;
Step S3: dividing the identification feature space into a plurality of subspaces, describing each divided subspace with description information, and assigning a corresponding sequence number to each subspace;
Step S4: performing the feature transform on each training sentence of a training model in the first frequency range or in the second frequency range to obtain a temporal feature point set comprising corresponding temporal feature points, allocating each temporal feature point to one of the subspaces under the same frequency range, forming, according to the sequence numbers of the subspaces corresponding to the temporal feature points, a first sequence associated with the first frequency range or the second frequency range, and thereby forming a corresponding training identification feature;
Step S5: performing the feature transform on each test sentence of a test model in the first frequency range or in the second frequency range to obtain a temporal feature point set, allocating each temporal feature point to one of the subspaces, forming, according to the sequence numbers of the subspaces corresponding to the temporal feature points, a second sequence associated with the first frequency range or the second frequency range, and thereby forming a corresponding test identification feature;
Step S6: comparing whether the training identification feature and the test identification feature associated with the first frequency range are similar, and processing the comparison result to obtain a confirmation result of the voiceprint recognition; or comparing whether the training identification feature and the test identification feature associated with the second frequency range are similar, and processing the comparison result to obtain a confirmation result of the voiceprint recognition.
2. The voiceprint recognition method of claim 1, characterized in that in the step S4, each temporal feature point is allocated to a subspace according to the nearest-neighbor rule.
3. The voiceprint recognition method of claim 1, characterized in that in the step S4, the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence serves as the first sequence, so as to form the training identification feature.
4. The voiceprint recognition method of claim 1, characterized in that in the step S5, the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence serves as the second sequence, so as to form the test identification feature.
5. The voiceprint recognition method of claim 3, characterized in that in the step S4, the spatial sequence includes data groups associated with each subspace, one data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a first data compression process performed on the spatial sequence in the first frequency range or the second frequency range, specifically:
Step S41: recording the sequence number of each data group, and recording the count of repeated sequence numbers associated with each sequence number;
Step S42: judging whether any sequence number has a repeat count of 1, and turning to step S43 when a data group whose repeat count is 1 exists;
Step S43: deleting the data group whose repeat count is 1;
Step S44: judging whether the sequence number of the data group in front of the deleted data group is identical to the sequence number of the data group behind it:
if identical, merging the previous data group and the latter data group into one;
if they differ, retaining both the previous data group and the latter data group;
all the data groups in the spatial sequence forming the first sequence after the first data compression has been performed.
6. The voiceprint recognition method of claim 4, characterized in that in the step S5, the spatial sequence includes data groups associated with each subspace, one data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a second data compression process performed on the spatial sequence in the first frequency range or the second frequency range, specifically:
Step S51: recording the sequence number of each data group, and recording the count of repeated sequence numbers associated with each sequence number;
Step S52: judging whether any sequence number has a repeat count of 1, and turning to step S53 when a data group whose repeat count is 1 exists;
Step S53: deleting the data group whose repeat count is 1;
Step S54: judging whether the sequence number of the data group in front of the deleted data group is identical to the sequence number of the data group behind it:
if identical, merging the previous data group and the latter data group into one;
if they differ, retaining both the previous data group and the latter data group;
all the data groups in the spatial sequence forming the second sequence after the second data compression has been performed.
7. The voiceprint recognition method of claim 1, characterized in that the feature transform is a Mel cepstral transform.
8. The voiceprint recognition method of claim 7, characterized in that, when performing the Mel cepstral transform, each sentence is divided into frames of 20 ms each, taken at a shift of 10 ms, to obtain the sentence frames associated with that sentence;
then silence is removed frame by frame, 12 coefficients per frame are retained after the Mel cepstral transform is applied to the sentence frames, and the 12 coefficients constitute the identification feature.
9. The voiceprint recognition method of claim 1, characterized in that in the step S3, a K-means algorithm is used to divide the identification feature space into a plurality of subspaces, and the K-means center point of each divided subspace is recorded as the description information of that subspace.
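The framing scheme of claim 8 (20 ms frames taken every 10 ms) can be sketched as follows; the 16 kHz sample rate and the function name are illustrative assumptions, and the silence removal and the Mel cepstral transform itself are omitted:

```python
def frame_signal(signal, sample_rate=16000, frame_ms=20, shift_ms=10):
    """Split a sampled sentence into 20 ms frames taken every 10 ms,
    yielding the sentence frames described in claim 8."""
    frame_len = sample_rate * frame_ms // 1000   # 320 samples at 16 kHz
    shift = sample_rate * shift_ms // 1000       # 160 samples at 16 kHz
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += shift
    return frames
```

One second of audio at 16 kHz yields 99 overlapping frames of 320 samples each; a Mel cepstral transform keeping 12 coefficients per frame would then map the sentence to a 99 x 12 feature matrix, whose rows are the temporal feature points of steps S4 and S5.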
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510947369.8A CN106887230A (en) | 2015-12-16 | 2015-12-16 | A kind of method for recognizing sound-groove in feature based space |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106887230A true CN106887230A (en) | 2017-06-23 |
Family
ID=59176730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510947369.8A Pending CN106887230A (en) | 2015-12-16 | 2015-12-16 | A kind of method for recognizing sound-groove in feature based space |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106887230A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111785291A (en) * | 2020-07-02 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Voice separation method and voice separation device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6130949A (en) * | 1996-09-18 | 2000-10-10 | Nippon Telegraph And Telephone Corporation | Method and apparatus for separation of source, program recorded medium therefor, method and apparatus for detection of sound source zone, and program recorded medium therefor |
CN101661754A (en) * | 2003-10-03 | 2010-03-03 | 旭化成株式会社 | Data processing unit, method and control program |
CN101944359A (en) * | 2010-07-23 | 2011-01-12 | 杭州网豆数字技术有限公司 | Voice recognition method facing specific crowd |
CN102354496A (en) * | 2011-07-01 | 2012-02-15 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
CN102623008A (en) * | 2011-06-21 | 2012-08-01 | 中国科学院苏州纳米技术与纳米仿生研究所 | Voiceprint identification method |
CN103943104A (en) * | 2014-04-15 | 2014-07-23 | 海信集团有限公司 | Voice information recognition method and terminal equipment |
CN104185868A (en) * | 2012-01-24 | 2014-12-03 | 澳尔亚有限公司 | Voice authentication and speech recognition system and method |
CN104392718A (en) * | 2014-11-26 | 2015-03-04 | 河海大学 | Robust voice recognition method based on acoustic model array |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106971737A (en) | A kind of method for recognizing sound-groove spoken based on many people | |
CN109817246B (en) | Emotion recognition model training method, emotion recognition device, emotion recognition equipment and storage medium | |
CN108597496B (en) | Voice generation method and device based on generation type countermeasure network | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization based | |
CN104167208B (en) | A kind of method for distinguishing speek person and device | |
KR101963993B1 (en) | Identification system and method with self-learning function based on dynamic password voice | |
CN108122556A (en) | Reduce the method and device that driver's voice wakes up instruction word false triggering | |
CN107767861B (en) | Voice awakening method and system and intelligent terminal | |
Khan et al. | Principal component analysis-linear discriminant analysis feature extractor for pattern recognition | |
CN110164452A (en) | A kind of method of Application on Voiceprint Recognition, the method for model training and server | |
CN106448684A (en) | Deep-belief-network-characteristic-vector-based channel-robust voiceprint recognition system | |
CN107274905B (en) | A kind of method for recognizing sound-groove and system | |
CN109473105A (en) | The voice print verification method, apparatus unrelated with text and computer equipment | |
CN106898355B (en) | Speaker identification method based on secondary modeling | |
CN101540170B (en) | Voiceprint recognition method based on biomimetic pattern recognition | |
CN110415701A (en) | The recognition methods of lip reading and its device | |
CN104091602A (en) | Speech emotion recognition method based on fuzzy support vector machine | |
Fong | Using hierarchical time series clustering algorithm and wavelet classifier for biometric voice classification | |
CN105845141A (en) | Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness | |
CN111091809B (en) | Regional accent recognition method and device based on depth feature fusion | |
CN113129927A (en) | Voice emotion recognition method, device, equipment and storage medium | |
CN109273011A (en) | A kind of the operator's identification system and method for automatically updated model | |
CN106971727A (en) | A kind of verification method of Application on Voiceprint Recognition | |
CN106971730A (en) | A kind of method for recognizing sound-groove based on channel compensation | |
CN102623008A (en) | Voiceprint identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170623 |