On the Development Trends and Application Prospects of Speech Recognition Technology

First, the definition of speech recognition technology

Speech recognition technology, also known as automatic speech recognition (ASR), aims to convert the lexical content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences. It differs from speaker recognition and speaker verification, which attempt to identify or confirm the speaker who produced the speech rather than the lexical content it contains.

Applications of speech recognition technology include voice dialing, voice navigation, indoor device control, voice document retrieval, and simple dictation data entry. Speech recognition technology combined with other natural language processing techniques such as machine translation and speech synthesis technology can be used to build more complex applications, such as speech-to-speech translation.

Second, the principle of speech recognition technology

The speech recognition system prompts the user to speak a new password on each occasion, so the user does not need to remember a fixed password and the system cannot be deceived by a recording. Text-dependent voice recognition methods can be classified into dynamic time warping and hidden Markov model methods. Text-independent voice recognition has been studied for a long time, and the performance degradation caused by mismatched environments is a major obstacle to its application.

How it works:

The dynamic time warping method uses instantaneous, time-varying cepstra. In 1963, Bogert et al. published "The Quefrency Alanysis of Time Series for Echoes". By rearranging the letters of existing terms ("spectrum" became "cepstrum", for example), they defined a new signal processing technique together with a broad vocabulary for it. The cepstrum is usually computed with a fast Fourier transform.
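
As a rough illustration of how a cepstrum can be computed with a fast Fourier transform, the following sketch takes the inverse FFT of the log magnitude spectrum of one frame. The function names (fft, ifft, real_cepstrum) and the floor parameter are chosen here for illustration and do not come from the article; the small recursive FFT is included only to keep the example self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT built from the forward FFT by conjugation."""
    n = len(x)
    conj = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in conj]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.

    `floor` (an illustrative safeguard) avoids log(0) on empty bins.
    """
    spectrum = fft([complex(s) for s in frame])
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    return [v.real for v in ifft([complex(v) for v in log_mag])]
```

Applied to a toy frame that contains an impulse plus a half-amplitude echo 8 samples later, the largest cepstral value away from index 0 falls at index 8, which is the echo delay. This is the kind of echo detection that motivated the original cepstrum work.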

Since 1975, hidden Markov models have become very popular. A hidden Markov model measures the statistical variation of the spectral features. Examples of text-independent speech recognition methods are the average spectrum method, the vector quantization method, and the multivariate autoregressive method.
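
To make the hidden Markov model idea more concrete, here is a minimal sketch of the forward algorithm, which scores how well a model explains an observed sequence. It assumes a discrete HMM whose observations are symbol indices (for instance, spectral feature vectors that have already been quantized); the function name and parameter layout are illustrative, not taken from the article.

```python
import math


def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Return log P(obs | model) for a discrete HMM (scaled forward pass).

    obs      : list of observation symbol indices
    start_p  : start_p[s]    = P(first state is s)
    trans_p  : trans_p[r][s] = P(next state is s | current state is r)
    emit_p   : emit_p[s][o]  = P(symbol o | state s)
    """
    if not obs:
        return 0.0
    n_states = len(start_p)
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[r] * trans_p[r][s] for r in range(n_states))
                * emit_p[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik
```

In a recognizer of this kind, one model would typically be trained per word or per speaker, and the model with the highest log-likelihood for the observed sequence would be selected.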

The average spectrum method uses a suitable cepstral distance, and the influence of the phonemes on the speech spectrum is removed by averaging the spectrum. With vector quantization, the set of short-term training feature vectors of a speaker can be used directly to describe the essential characteristics of that speaker. However, when the number of training vectors is large, this direct representation is impractical because the storage and computation required become prohibitively large. Vector quantization is therefore used to find an efficient way to compress the training data. Montacié et al. applied a multivariate autoregressive model to the time series of cepstral vectors to characterize the speaker, and achieved good results.
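
The compression step described above can be sketched with a plain k-means codebook: many training vectors are reduced to a few codewords, and a test utterance is scored by its average distortion against each speaker's codebook. This is only one common way to build a codebook; the function names and the simple evenly spaced initialization are assumptions made for this example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to vector v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_codebook(vectors, size, iterations=20):
    """Compress the training vectors into `size` codewords with k-means."""
    if len(vectors) < size:
        raise ValueError("need at least as many vectors as codewords")
    # Illustrative initialization: evenly spaced training vectors.
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ]
    return codebook


def avg_distortion(codebook, vectors):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(sq_dist(codebook[nearest(codebook, v)], v) for v in vectors)
    return total / len(vectors)
```

With one codebook per enrolled speaker, a test utterance would be attributed to the speaker whose codebook gives the lowest average distortion. Only the codewords need to be stored, not the full training set.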

Fooling a speech recognition system requires a high-quality recorder, which is not easy to obtain. A typical recorder cannot capture the complete spectrum of a voice, and the quality loss of the recording system would also have to be very low. For most speech recognition systems, an imitated voice will not succeed. Using speech recognition alone to establish an identity is very complicated, so such systems are usually combined with a personal identification number (PIN) or a chip card.

Speech recognition systems benefit from inexpensive hardware: most computers have sound cards and microphones that are easy to use. But speech recognition still has some shortcomings. A voice changes over time, so the biometric templates must be adapted accordingly. A voice can also change because of a cold, hoarseness, emotional stress, or puberty. Speech recognition systems have a higher false positive rate than fingerprint recognition systems because people's voices are not as unique as fingerprints. To compute fast Fourier transforms, the system requires a coprocessor and more processing power than a fingerprint system. Currently, speech recognition systems are not well suited to mobile applications or battery-powered systems.

Third, the technical realization of speech recognition

Speech recognition technology mainly involves three aspects: feature extraction, pattern matching criteria, and model training. Underlying all of them is the selection of the speech recognition unit.

(1) Selection of the speech recognition unit. The basis of speech recognition research is the selection of the recognition unit. There are three types of unit: words (sentences), syllables, and phonemes. The appropriate unit is determined by the specific research task:

The word (sentence) unit is widely used in small and medium vocabulary speech recognition systems. It is not suitable for large vocabulary systems, because the model library becomes too large, model matching becomes complex, and real-time performance is poor.

The syllable unit is mainly used for Chinese speech recognition. Chinese is a monosyllabic language, and although it has about 1,300 syllables when tones are counted, there are only 408 toneless syllables, which is relatively few. Syllable units are therefore feasible for medium and large vocabulary Chinese systems.

The phoneme unit has long been widely used in English speech recognition, and is increasingly used in medium and large vocabulary Chinese speech recognition systems. The reason is that Chinese syllables are built from only 22 initials and 28 finals. Refining the initials into finer-grained units increases the number of models, but improves the ability to distinguish easily confused syllables.

(2) Feature parameter extraction technology. Feature extraction analyzes and processes the speech signal, removing the redundant information in it to obtain the information that is useful for speech recognition. It is a process of compressing the information in the speech signal. A feature extraction technique in common use today is linear prediction (LP) analysis, from which cepstrum parameters are extracted. Mel parameters and the perceptual linear predictive cepstrum, extracted by perceptual linear prediction (PLP) analysis, model how the human ear processes sound and further improve the performance of speech recognition systems.
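
As a hedged sketch of LP-based feature extraction, the code below estimates predictor coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion, then converts them to LP cepstrum parameters with the standard recursion for an all-pole model. The function names are illustrative, and practical front ends normally add steps not shown here (pre-emphasis, windowing, framing).

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err), where the predictor is
        x_hat[n] = a[0]*x[n-1] + a[1]*x[n-2] + ... + a[order-1]*x[n-order]
    and err is the final prediction error energy.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or perfectly predictable frame
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a):
    """LP cepstrum c[1..p] from predictor coefficients a[1..p]."""
    c = []
    for n in range(1, len(a) + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a decaying exponential such as x[n] = 0.9^n, a second-order analysis returns coefficients very close to 0.9 and 0, since each sample is 0.9 times the previous one.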

(3) Pattern matching and model training techniques. The pattern matching and model training technique used in early speech recognition applications was dynamic time warping (DTW), which achieves good performance in isolated word recognition. Because it is poorly suited to large vocabulary and continuous speech recognition, it was replaced by hidden Markov models (HMMs) and artificial neural networks (ANNs).
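
The DTW matching used for isolated word recognition can be sketched as follows. The example works on sequences of scalar features for brevity; a real system would compare sequences of feature vectors, for example cepstral vectors, with a vector distance. The names dtw_distance and recognize, and the template dictionary layout, are hypothetical.

```python
def dtw_distance(seq_a, seq_b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two feature sequences.

    Uses the basic step pattern (match, insertion, deletion) with no
    slope constraints or path-length normalization.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # stretch seq_b
                cost[i][j - 1],      # stretch seq_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance.

    `templates` maps a word label to one reference feature sequence.
    """
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
```

Because the warping path may repeat frames, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0.0: the second sequence is just a slower rendering of the first. The cost table grows with the product of the two sequence lengths for every template compared, which is one reason this approach does not scale well to large vocabularies.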
