CN102867513A - Pseudo-Zernike moment based voice content authentication method - Google Patents
Pseudo-Zernike moment based voice content authentication method
- Publication number
- CN102867513A (application CN201210278724.3)
- Authority
- CN
- China
- Prior art keywords
- frame
- watermark
- pseudo
- voice
- zernike
- Prior art date
- Legal status
- Granted
Landscapes
- Editing Of Facsimile Originals (AREA)
Abstract
The invention discloses a pseudo-Zernike moment based voice content authentication method. During watermark embedding, the original voice signal A is divided into P frames and each frame into N segments; a watermark W is generated from the mean magnitude of the n-th order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame; the watermark is then embedded by quantizing the pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame, yielding a watermarked voice signal A'. Because the magnitudes of the pseudo-Zernike moments of the DCT low-frequency coefficients are closely tied to the voice content yet robust to common signal processing, the method remains sensitive to malicious tampering while tolerating conventional voice signal processing operations.
Description
Technical field
The present invention relates to speech signal processing, and in particular to a solution to the problem of authenticating the authenticity and integrity of voice content.
Background technology
In recent years, with the rapid development of digital voice communication, the wide spread of voice products, and the emergence of powerful audio processing software, the transmission and use of digital speech have become increasingly frequent and widespread. At the same time, tampering with voice content during transmission and storage has become relatively easy. For example, if the key parts of an important piece of court-testimony audio are maliciously tampered with during storage or transmission, the consequences are easy to imagine. Questions such as how to determine whether an important or sensitive piece of voice content has been tampered with, where it was tampered with, and whether the recording source is genuine and trustworthy all concern the authenticity of digital speech, and they have attracted great research interest from scholars at home and abroad. Audio watermarking has attracted attention since its emergence in the 1990s and, as a technical means of protecting audio, has become a focus of research in the field of information security.
Compared with general audio signals, voice signals have a lower sampling rate and are more sensitive to common signal processing. Many existing audio content authentication algorithms therefore cannot be used for voice content authentication, or perform poorly when they are. In practice, the main concern for audio is copyright protection, whereas for voice it is the authentication of content authenticity and integrity. In digital-watermarking-based voice content authentication, if the embedded watermark is unrelated to the voice content itself, the amount of transmitted information increases and a certain security risk arises; voice authentication algorithms that generate the watermark from the voice's own features or content therefore have greater research significance and practical value.
The magnitude of pseudo-Zernike moments (and of Zernike moments) is rotation invariant, a property that has been widely used in image representation, image retrieval and image watermarking but rarely applied to audio. The paper "Robust audio watermarking based on low-order Zernike moments" (Xiang Shi-jun, Huang Ji-wu, Yang Rui, 5th International Workshop on Digital Watermarking, pp. 226-240, Oct. 2006) first maps the audio from one dimension to two dimensions and then applies the Zernike transform to the resulting two-dimensional signal. It shows experimentally that the magnitude of the Zernike moments is highly robust to common signal processing, analyses the linear relationship between the Zernike moment magnitude and the audio sample values, and on that basis proposes a robust audio watermarking algorithm based on low-order Zernike moments. The paper "A pseudo-Zernike moments based audio watermarking scheme robust against desynchronization attacks" (Wang Xiang-yang, Ma Tian-xiao, Niu Pan-pan, Computers and Electrical Engineering, vol. 37, no. 4, pp. 425-443, July 2011) first embeds a synchronization code in the time domain based on a statistical mean and then embeds the watermark by quantizing the magnitude of the pseudo-Zernike moments, yielding an audio watermarking algorithm resistant to desynchronization attacks. These pseudo-Zernike (and Zernike) moment based watermarking algorithms have two drawbacks. On the one hand, the pseudo-Zernike moments of all sample points must be computed, which is computationally expensive and time-consuming; moreover, the watermark is embedded by proportionally scaling the sample values of each audio segment, and analysis shows that directly scaling the audio samples changes the original audio considerably and noticeably degrades its quality. On the other hand, the embedding position and method of the watermark are public, and the computation of the feature of each audio frame (its pseudo-Zernike moments) is also known. An attacker can therefore locate each audio frame, compute its feature, and re-quantize the pseudo-Zernike moments to remove the embedded watermark, so that the algorithm no longer protects the copyright. Alternatively, the attacker can replace a watermarked audio segment with other audio and then quantize the replacement so that it still satisfies the condition for correct watermark extraction, thereby attacking the content. Research on content-based voice content authentication algorithms with strong resistance to such attacks is therefore of significant practical importance.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a pseudo-Zernike moment based voice content authentication algorithm that can effectively distinguish common signal processing operations from malicious attacks and can effectively locate maliciously tampered voice content, thereby realizing the authentication of the authenticity and integrity of voice content.
To achieve this object, the present invention designs a new watermark generation and embedding method based on the robustness of the pseudo-Zernike moment magnitudes of the DCT low-frequency coefficients to common signal processing.
The pseudo-Zernike moment based voice content authentication method can effectively distinguish common signal processing operations from malicious attacks while effectively localizing malicious tampering, thereby realizing the authentication of the authenticity and integrity of voice content. It comprises the following steps:
(1) Watermark embedding: starting from the K-th sample of the voice signal, the original voice signal A is divided into P frames (K serves as the key of the watermarking system), and each frame is divided into N segments. The sum of the magnitudes of the n-th order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame is then computed, the mean of these magnitude sums is obtained, and the watermark W is generated from the mean. The watermark is embedded into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients; the resulting watermarked voice signal is denoted A'.
(2) Voice content authentication: similarly to the watermark embedding process, the voice signal to be authenticated A* is divided into P frames starting from its k_1-th sample, and each frame is divided into N segments. The sum of the magnitudes of the n-th order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame is computed and its mean is obtained, from which a watermark W' is generated. The magnitudes of the n-th order pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame are computed, and a watermark W* is extracted from these magnitudes. W* and W' are compared; the positions where they differ are the positions at which the voice signal has been tampered with, thereby realizing the authentication of the authenticity and integrity of the voice content.
Compared with existing voice watermarking algorithms for content authentication, the present invention generates the watermark from the voice content itself, so the receiving end obtains the embedded watermark at the same time as it receives the voice signal. This reduces the transmission bandwidth and saves resources, while also strengthening the security of watermark transmission. Watermark embedding only requires a pseudo-Zernike transform of the DCT low-frequency coefficients, which improves the efficiency of the algorithm and the tolerance of the watermark to common signal processing. The present invention is therefore well suited to practical application.
Description of drawings
Fig. 1 is the watermarked voice signal of the embodiment of the invention.
Fig. 2 is the voice signal of Fig. 1 after a muting attack on part of the voice content.
Fig. 3 is the voice signal of Fig. 1 after a substitution attack on part of the voice content.
Fig. 4 is the tampering localization result for Fig. 2.
Fig. 5 is the tampering localization result for Fig. 3.
Fig. 6 lists the imperceptibility test results.
Fig. 7 lists the robustness test results for common signal processing.
Embodiment
The technical scheme of the present invention is further described below with reference to the accompanying drawings and an embodiment.
1. Watermark generation and embedding:
(1) Framing of the voice data and division of each frame into segments. The original voice signal A = {a(l), 1 ≤ l ≤ L_A + K} is divided into P frames starting from the K-th sample (K serves as the key of the watermarking system); each frame has length I = L_A / P, and the i-th frame is denoted A(i) (i = 1, 2, ..., P). Each frame is divided into N segments of length I/N; the j-th segment of the i-th frame is denoted A(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N.
(2) DCT. A DCT is applied to A(i, j); D(i, j) denotes the DCT coefficients of the j-th segment of the i-th frame. The DCT coefficients of the first N/2 segments of the i-th frame are denoted D_1(i, j).
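As an illustration only, the following Python sketch shows one way the framing, segmentation and per-segment DCT described above could be implemented; the row-major arrangement of the first m × m low-frequency coefficients into a two-dimensional block is an assumed detail, not something specified in this text.

```python
import numpy as np
from scipy.fft import dct

def frame_and_segment(signal, K, P, N):
    """Split the voice signal into P frames of N segments each, starting
    from sample index K (K acts as the key of the watermarking system).
    Assumes the usable length is divisible by P*N; returns segments[i, j] = A(i, j)."""
    usable = np.asarray(signal, dtype=float)[K:]
    seg_len = len(usable) // (P * N)
    trimmed = usable[:P * N * seg_len]
    return trimmed.reshape(P, N, seg_len)

def segment_dct_lowfreq(segment, m):
    """DCT of one segment; return its first m*m low-frequency coefficients
    arranged (row-major, an assumed detail) as an m x m block."""
    coeffs = dct(segment, norm='ortho')
    return coeffs[:m * m].reshape(m, m)
```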
(3) Computation of the n-th order, m-repetition pseudo-Zernike moments. The first m_1 × m_1 low-frequency coefficients of D_1(i, j) are rearranged into a two-dimensional signal, and its n-th order, m-repetition pseudo-Zernike moments are computed as follows.
Let {V_nm} denote the pseudo-Zernike polynomials, a set of complex-valued polynomials that form a complete orthogonal basis on the unit circle, defined by
V_nm(x, y) = V_nm(ρ, θ) = R_nm(ρ)exp(imθ)
where n is a non-negative integer, m is an integer satisfying |m| ≤ n, l is the vector from the origin to the point (x, y), ρ = |l|, θ is the counter-clockwise angle from the positive x-axis to the vector l, and R_nm(ρ) is the radial polynomial
R_nm(ρ) = Σ_{s=0}^{n−|m|} (−1)^s · (2n+1−s)! / [s!(n+|m|+1−s)!(n−|m|−s)!] · ρ^{n−s}.
A two-dimensional signal f(x, y) on the unit disc (x² + y² ≤ 1) can be expressed as a linear combination of the V_nm(x, y):
f(x, y) = Σ_n Σ_m A_nm V_nm(x, y)
where V_nm*(x, y) denotes the complex conjugate of V_nm(x, y) and A_nm is the n-th order, m-repetition pseudo-Zernike moment, defined as
A_nm = ((n+1)/π) ∫∫_{x²+y²≤1} f(x, y) V_nm*(x, y) dx dy.
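The following is a minimal numerical sketch (not taken from the patent) of how the n-th order, m-repetition pseudo-Zernike moment of a small two-dimensional block could be approximated by mapping the block onto the unit disc and discretizing the integral above; the placement of sample points on the disc is an assumption.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Pseudo-Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    r = np.zeros_like(rho)
    for s in range(n - m + 1):
        c = ((-1) ** s * factorial(2 * n + 1 - s)
             / (factorial(s) * factorial(n + m + 1 - s) * factorial(n - m - s)))
        r += c * rho ** (n - s)
    return r

def pseudo_zernike_moment(block, n, m):
    """Approximate the n-th order, m-repetition pseudo-Zernike moment A_nm of
    a square 2-D block by discretizing the unit-disc integral."""
    size = block.shape[0]
    coords = (2 * np.arange(size) + 1) / size - 1   # map sample centres to [-1, 1]
    x, y = np.meshgrid(coords, coords)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                               # keep samples inside the unit disc
    v_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    cell_area = (2.0 / size) ** 2                   # area of one sample cell
    return (n + 1) / np.pi * np.sum(block[mask] * v_conj[mask]) * cell_area
```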
(4) Generation of the voice watermark. The watermark is generated from the first N/2 segments of each frame. Let C_1(i, j), 1 ≤ i ≤ P, 1 ≤ j ≤ N/2, denote the sum of the magnitudes of the n-th order pseudo-Zernike moments of the j-th segment, and compute the mean of C_1(i, j) over the N/2 segments. Let M_1(i) be the most significant digit of this mean and let W_1(i) = {w_1(i, t), 1 ≤ t ≤ N/2} be the binary representation of M_1(i); W_1(i) is the watermark generated for the i-th frame.
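A small sketch of this watermark-generation step, under the assumption that the binary representation of the most significant digit M_1(i) is simply its binary expansion zero-padded to N/2 bits (the exact digit-to-bit mapping is not reproduced in this text):

```python
def most_significant_digit(value):
    """Most significant decimal digit of a positive number."""
    s = f"{abs(value):.10e}"        # e.g. '3.1415926536e+02' -> '3'
    return int(s[0])

def generate_watermark(c1_sums, n_bits):
    """Watermark bits W_1(i) for one frame from the mean of the pseudo-Zernike
    magnitude sums C_1(i, j) of its first N/2 segments."""
    mean_c1 = sum(c1_sums) / len(c1_sums)
    msd = most_significant_digit(mean_c1)
    # assumed mapping: binary expansion of the digit, zero-padded to n_bits bits
    return [int(b) for b in format(msd, f"0{n_bits}b")]
```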
(5) Watermark embedding. The DCT coefficients of the last N/2 segments of the i-th frame are denoted D_2(i, j), N/2+1 ≤ j ≤ N. The first m_2 × m_2 low-frequency coefficients of D_2(i, j) are rearranged into a two-dimensional signal and the sum of the magnitudes of its n-th order pseudo-Zernike moments is computed, denoted C_2(i, j). Let M_2(i, j) be the most significant digit of C_2(i, j). The watermark is embedded as follows: when w_1(i, t) = 1, M_2'(i, j) is obtained from M_2(i, j) by the first quantization rule, and when w_1(i, t) = 0 by the second; in both rules, when M_2(i, j) = 9, M_2'(i, j) = M_2(i, j) − 1, and j = t + N/2, 1 ≤ t ≤ N/2. M_2'(i, j) then replaces the most significant digit of the integer part of C_2(i, j), the next most significant digit is quantized to 5, and the resulting value is denoted C_2'(i, j).
The first m_2 × m_2 low-frequency coefficients of D_2(i, j) are then scaled by a factor α_2(i, j), computed from C_2(i, j) and C_2'(i, j); the scaled coefficients are denoted D_2'(i, j).
An inverse DCT is applied to D_2'(i, j); the resulting signal forms the second half of the i-th frame, and the first half together with this second half constitute the watermarked i-th frame.
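Because the two quantization formulas and the expression for α_2(i, j) are not reproduced in this text, the sketch below substitutes plausible rules: a parity-style quantization of the most significant digit (w = 1 gives an odd digit, w = 0 an even digit, with the 9 to 8 exception stated above) and a linear scaling α_2 = C_2'/C_2, which relies on the moments being linear in the signal. Both are assumptions, not the patent's exact formulas; pseudo_zernike_moment is the helper from the earlier sketch.

```python
def embed_bit(c2_value, bit):
    """Quantize the magnitude sum C_2(i, j) to carry one watermark bit.
    Assumed rule: make the most significant digit of the integer part odd
    for bit = 1 and even for bit = 0 (9 becomes 8), then set the next
    digit to 5."""
    digits = list(str(int(c2_value)))
    msd = int(digits[0])
    if msd == 9:
        new_msd = 8
    elif (msd % 2 == 1) != (bit == 1):
        new_msd = msd + 1
    else:
        new_msd = msd
    digits[0] = str(new_msd)
    if len(digits) > 1:
        digits[1] = "5"                 # quantize the next digit to 5
    return float("".join(digits))

def embed_in_segment(d2_block, bit, n, m_list):
    """Scale the m2 x m2 low-frequency DCT block so that its pseudo-Zernike
    magnitude sum takes the quantized value; alpha = C_2'/C_2 is an assumed
    scaling rule (the moments are linear in the block, so scaling the block
    scales the magnitude sum by the same factor)."""
    c2 = sum(abs(pseudo_zernike_moment(d2_block, n, m)) for m in m_list)
    c2_quantized = embed_bit(c2, bit)
    alpha = c2_quantized / c2
    return d2_block * alpha
```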
(6) The above embedding is performed on each of the P speech frames in turn; once all frames have been processed, the watermarked voice signal A' is obtained.
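A sketch of this frame-by-frame loop, reusing the helpers from the sketches above (frame_and_segment, segment_dct_lowfreq, pseudo_zernike_moment, generate_watermark, embed_in_segment); the choice of m_list (the repetitions summed for order n) and the reassembly of samples are assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def embed_all_frames(signal, K, P, N, n, m_list, m1, m2):
    """Embed the watermark frame by frame and reassemble the watermarked
    signal A'; the first N/2 segments of each frame stay untouched and the
    last N/2 carry the watermark bits."""
    segments = frame_and_segment(signal, K, P, N)
    half = N // 2
    frames_out = []
    for i in range(P):
        # regenerate the frame watermark from the first N/2 segments
        c1 = [sum(abs(pseudo_zernike_moment(segment_dct_lowfreq(segments[i, j], m1), n, m))
                  for m in m_list) for j in range(half)]
        bits = generate_watermark(c1, half)
        out = [segments[i, j] for j in range(half)]          # first half unchanged
        for t, j in enumerate(range(half, N)):
            coeffs = dct(segments[i, j], norm='ortho')
            block = coeffs[:m2 * m2].reshape(m2, m2)
            coeffs[:m2 * m2] = embed_in_segment(block, bits[t], n, m_list).ravel()
            out.append(idct(coeffs, norm='ortho'))           # watermarked second half
        frames_out.append(np.concatenate(out))
    return np.concatenate(frames_out)
```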
2. Voice content authentication:
(1) As in steps (1)–(4) of watermark generation and embedding, the voice signal to be authenticated A* is divided into P frames starting from its K-th sample, and each frame is divided into N segments; the i-th frame is denoted A*(i) (i = 1, 2, ..., P) and its j-th segment A*(i, j), 1 ≤ j ≤ N. A DCT is applied to A*(i, j) and the resulting coefficients are denoted D*(i, j). The DCT coefficients of the first N/2 segments of the i-th frame are denoted D_1*(i, j); their first m_1 × m_1 low-frequency coefficients are rearranged into a two-dimensional signal and the sum of the magnitudes of its n-th order pseudo-Zernike moments is computed, denoted C_1*(i, j), 1 ≤ j ≤ N/2. The mean of C_1*(i, j) over 1 ≤ j ≤ N/2 is computed, its most significant digit M_1*(i) is taken, and its binary representation W'(i) is the watermark regenerated for the i-th frame.
(2) The DCT coefficients of the last N/2 segments of the i-th frame are denoted D_2*(i, j); their first m_2 × m_2 low-frequency coefficients are rearranged into a two-dimensional signal and the sum of the magnitudes of its n-th order pseudo-Zernike moments is computed, denoted C_2*(i, j), N/2+1 ≤ j ≤ N. Let M_2*(i, j) be the most significant digit of C_2*(i, j); the extracted watermark W*(i) is computed from M_2*(i, j).
(3) The authentication sequence TA(i) is defined by comparing W'(i) with W*(i): TA(i) = 0 if they are identical, and TA(i) = 1 otherwise. TA(i) = 0 indicates that the content of the i-th frame is authentic; TA(i) = 1 indicates that the content of the i-th frame has been tampered with.
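A sketch of the per-frame authentication decision, operating on already-computed pseudo-Zernike magnitude sums and reusing generate_watermark from the earlier sketch; the extraction rule (parity of the most significant digit) mirrors the assumed embedding rule and is likewise an assumption.

```python
def extract_bit(c2_value):
    """Assumed extraction rule, mirroring the embedding sketch above:
    the parity of the most significant digit of the integer part."""
    return int(str(int(c2_value))[0]) % 2

def authenticate_frame(c1_sums, c2_sums):
    """TA(i) for one frame: 0 if the watermark W'(i) regenerated from the
    first N/2 segments matches the watermark W*(i) extracted from the last
    N/2 segments, 1 if the frame appears tampered."""
    w_regenerated = generate_watermark(c1_sums, len(c1_sums))   # W'(i)
    w_extracted = [extract_bit(c) for c in c2_sums]             # W*(i)
    return 0 if w_regenerated == w_extracted else 1
```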
The effectiveness of the method of the invention can be verified by the following performance evaluation:
1. Imperceptibility
A monophonic voice signal with a sampling rate of 22.05 kHz, a length of 1,024,078 samples and 16-bit quantization is used for the imperceptibility test. Fig. 6 gives the SNR values for three types of voice; the test results show that the proposed algorithm has good imperceptibility.
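The SNR reported in Fig. 6 is the standard signal-to-noise ratio between the original and the watermarked signal; a minimal sketch:

```python
import numpy as np

def snr_db(original, watermarked):
    """Signal-to-noise ratio in dB between the original and the watermarked
    voice signal, used here as the imperceptibility measure."""
    original = np.asarray(original, dtype=float)
    noise = np.asarray(watermarked, dtype=float) - original
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))
```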
2. Robustness to common signal processing
The robustness of the proposed algorithm to common signal processing is tested with the bit error rate (BER), defined as BER = E / T, where E is the number of erroneous bits in the extracted watermark and T is the total number of watermark bits embedded in the voice signal. The smaller the BER, the stronger the robustness of the algorithm to common signal processing.
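A direct transcription of the BER definition:

```python
def bit_error_rate(extracted_bits, embedded_bits):
    """BER = E / T: the fraction of embedded watermark bits that are
    extracted incorrectly."""
    errors = sum(int(a != b) for a, b in zip(extracted_bits, embedded_bits))
    return errors / len(embedded_bits)
```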
Fig. 7 lists the BER values for an adult male voice after several common signal processing operations (the results for the other voice types are similar); it can be seen that the method of the invention is robust to common voice signal processing such as MP3 compression, low-pass filtering and resampling.
3. Malicious tampering localization
The watermarked voice signal shown in Fig. 1 was subjected to a muting attack and to a substitution attack. The attacked voice signals are shown in Fig. 2 and Fig. 3 respectively, and the corresponding tampering localization results in Fig. 4 and Fig. 5. In Fig. 4 and Fig. 5, frames with TA(i) = 1 mark the parts that were maliciously attacked, and frames with TA(i) = 0 mark the parts that were not. The localization results show that the method of the invention localizes malicious tampering effectively.
The above description of the preferred embodiment is quite specific. Those of ordinary skill in the art will appreciate that the embodiment described here is intended to help the reader understand the principles of the present invention, and that the scope of protection of the invention is not limited to such specific statements and embodiments.
Claims (1)
1. A pseudo-Zernike moment based voice content authentication method for distinguishing common signal processing operations from malicious attacks while effectively localizing malicious tampering, comprising the following steps:
(1) watermark embedding: starting from the K-th sample of the voice signal, the original voice signal A is divided into P frames and each frame is divided into N segments; the sum of the magnitudes of the n-th order pseudo-Zernike moments of the discrete cosine transform (DCT) low-frequency coefficients of the first N/2 segments of each frame is computed, the mean of the pseudo-Zernike moment magnitudes is obtained, and a watermark W is generated from the mean; the watermark is embedded into the last N/2 segments of each frame by quantizing the pseudo-Zernike moments of their DCT low-frequency coefficients, yielding the watermarked voice signal A';
(2) voice content authentication: similarly to the watermark embedding process, starting from the k_1-th sample of the voice signal to be authenticated A*, the signal is divided into P frames and each frame is divided into N segments; the sum of the magnitudes of the n-th order pseudo-Zernike moments of the DCT low-frequency coefficients of the first N/2 segments of each frame is computed, its mean is obtained, and a watermark W' is generated from the mean; the magnitudes of the n-th order pseudo-Zernike moments of the DCT low-frequency coefficients of the last N/2 segments of each frame are computed, and a watermark W* is extracted from these magnitudes; W* and W' are compared, and the positions where they differ are the positions at which the voice signal has been tampered with, thereby realizing the authentication of the authenticity and integrity of the voice content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210278724.3A CN102867513B (en) | 2012-08-07 | 2012-08-07 | Pseudo-Zernike moment based voice content authentication method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102867513A (en) | 2013-01-09
CN102867513B CN102867513B (en) | 2014-02-19 |
Family
ID=47446337
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210278724.3A Expired - Fee Related CN102867513B (en) | 2012-08-07 | 2012-08-07 | Pseudo-Zernike moment based voice content authentication method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102867513B (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7280970B2 (en) * | 1999-10-04 | 2007-10-09 | Beepcard Ltd. | Sonic/ultrasonic authentication device |
CN101609675A (en) * | 2009-07-27 | 2009-12-23 | 西南交通大学 | A kind of fragile audio frequency watermark method based on barycenter |
Non-Patent Citations (3)
Title |
---|
Xiang Shi-jun et al., "Robust Audio Watermarking Based on Low-Order Zernike Moments", Digital Watermarking, Lecture Notes in Computer Science |
Wang Xiang-yang et al., "A pseudo-Zernike moments based audio watermarking scheme robust against desynchronization attacks", Computers and Electrical Engineering |
Xiong Yi-qun et al., "An Audio Zero-watermark Algorithm Combined DCT with Zernike Moments", International Conference on Cyberworlds 2008 |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456308A (en) * | 2013-08-05 | 2013-12-18 | 西南交通大学 | Restorable ciphertext domain speech content authentication method |
CN103456308B (en) * | 2013-08-05 | 2015-08-19 | 西南交通大学 | A kind of recoverable ciphertext domain voice content authentication method |
CN106157972A (en) * | 2015-05-12 | 2016-11-23 | 恩智浦有限公司 | Use the method and apparatus that local binary pattern carries out acoustics situation identification |
CN105304091A (en) * | 2015-06-26 | 2016-02-03 | 信阳师范学院 | Voice tamper recovery method based on DCT |
CN105304091B (en) * | 2015-06-26 | 2018-10-26 | 信阳师范学院 | A kind of voice tamper recovery method based on DCT |
CN111213203A (en) * | 2017-10-20 | 2020-05-29 | 思睿逻辑国际半导体有限公司 | Secure voice biometric authentication |
CN111213203B (en) * | 2017-10-20 | 2021-03-02 | 思睿逻辑国际半导体有限公司 | Secure voice biometric authentication |
CN107886956A (en) * | 2017-11-13 | 2018-04-06 | 广州酷狗计算机科技有限公司 | Audio identification methods, device and computer-readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102867513B (en) | 2014-02-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140219; Termination date: 20160807 |