Blind Source Separation

The LASP audio research line works on forensic applications, focused on the comparison and identification of speakers, in partnership with the Federal Police through the PRO-FORENSE project.

Imagine an environment where several people speak at the same time and we want to recognize each voice. This problem can be addressed with Blind Source Separation (BSS), a procedure that recovers source signals, such as audio and speech, from their observed mixtures. The process is said to be blind because no prior knowledge about the signals or the environment is available. The schematic of this separation process is shown in Figure 1.

Figure 1 Scheme of the separation process of the sources.
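
To make the idea concrete, the following is a minimal sketch of blind separation under strong simplifying assumptions: two synthetic sources, two sensors, and an instantaneous (non-convolutive) mixture. The mixing matrix A, the test signals and all function names are hypothetical and chosen only for illustration. The separation uses a simple kurtosis-based ICA (whitening followed by a search over rotation angles), which is a stand-in for illustration and not necessarily the method used at the LASP. Real room recordings are convolutive and need more elaborate algorithms.

```python
import math
import random

def center(x):
    """Remove the mean of a signal."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def corr(x, y):
    """Sample correlation coefficient, used only to check the result."""
    x, y = center(x), center(y)
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def rotate(z1, z2, angle):
    """Apply a 2x2 rotation to a pair of signals."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * u + s * v for u, v in zip(z1, z2)],
            [-s * u + c * v for u, v in zip(z1, z2)])

def whiten(x1, x2):
    """Decorrelate the mixtures and scale them to unit variance."""
    x1, x2 = center(x1), center(x2)
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    d = sum(v * v for v in x2) / n
    # Angle of the principal axes of the covariance [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    p1, p2 = rotate(x1, x2, theta)
    sd1 = math.sqrt(sum(v * v for v in p1) / n)
    sd2 = math.sqrt(sum(v * v for v in p2) / n)
    return [v / sd1 for v in p1], [v / sd2 for v in p2]

def kurtosis(y):
    """Excess kurtosis of a zero-mean, unit-variance signal."""
    return sum(v ** 4 for v in y) / len(y) - 3.0

def separate(x1, x2, steps=180):
    """Whiten, then pick the rotation that maximizes non-Gaussianity."""
    z1, z2 = whiten(x1, x2)
    best_angle, best_score = 0.0, -1.0
    for k in range(steps):
        angle = 0.5 * math.pi * k / steps
        y1, y2 = rotate(z1, z2, angle)
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if score > best_score:
            best_angle, best_score = angle, score
    return rotate(z1, z2, best_angle)

# Synthetic, non-Gaussian sources (assumption: not real speech).
rng = random.Random(0)
n = 2000
s1 = [math.sin(2.0 * math.pi * 5.0 * k / n) for k in range(n)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Hypothetical mixing matrix: each sensor picks up both sources.
A = [[1.0, 0.6], [0.4, 1.0]]
x1 = [A[0][0] * u + A[0][1] * v for u, v in zip(s1, s2)]
x2 = [A[1][0] * u + A[1][1] * v for u, v in zip(s1, s2)]

# The separator sees only x1 and x2, never A, s1 or s2.
y1, y2 = separate(x1, x2)
for name, y in (("y1", y1), ("y2", y2)):
    print(name, round(abs(corr(y, s1)), 3), round(abs(corr(y, s2)), 3))
```

Each recovered signal should correlate strongly with exactly one source. The order and the sign of the outputs are arbitrary, which is an inherent ambiguity of blind separation.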

After the separation is carried out, other algorithms can be applied, such as speaker identification. Figure 2 shows a block diagram illustrating the application of BSS techniques to improve speaker identification algorithms. The sounds are picked up by a microphone array, and the separation system reconstructs the sound emitted by each source. Each reconstructed signal is then processed to identify the speaker from voice characteristics.

Figure 2 Block diagram illustrating an application of BSS techniques for speaker identification.

Several separation algorithms exist today, such as parallel factor analysis (PARAFAC), joint approximate diagonalization (JAD), independent component analysis (ICA) and triple-N ICA for convolutive mixtures (TRINICON). These algorithms also have several applications beyond speech processing, such as telecommunications, biomedical signal processing, and the analysis of image data or of data from astronomical satellites.
This research at the LASP focuses on BSS techniques with low complexity and high accuracy, aiming to improve on state-of-the-art BSS systems.

Speaker Identification

In forensic applications, the LASP works on the comparison and identification of speakers in partnership with the Brazilian Federal Police through the PRO-FORENSE project. This project focuses on creating an automated system that compares and identifies speakers through features extracted from voice signals. The most important characteristic studied is the long-term fundamental frequency (LTF0), since it is one of the parameters most robust to the limitations imposed by the forensic environment.

In the forensic comparison of speakers, several analyses are performed, such as perceptual analysis, acoustic analysis and analyses that combine acoustics with the articulation rate, all of which seek to determine the suspect through the analysis of the recorded audio. Usually, these recordings, called vestiges, are acquired from wiretaps or telephone calls.
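
As an illustration of the long-term fundamental frequency mentioned above, the sketch below estimates F0 frame by frame with a basic autocorrelation method and then summarizes the voiced frames by their mean and standard deviation. This rests on several assumptions: that the long-term F0 is summarized by these two statistics, that a plain autocorrelation estimator is adequate, and that a synthetic harmonic tone sampled at 8 kHz can stand in for telephone speech. The function names, thresholds and frame sizes are hypothetical and do not describe the PRO-FORENSE system.

```python
import math
import statistics

def frame_f0(frame, fs, f0_min=70.0, f0_max=400.0, voicing=0.3):
    """Estimate the F0 of one frame by autocorrelation (None if unvoiced)."""
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(x) - 1)
    best_lag, best_r = None, voicing
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation at this lag.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else None

def long_term_f0(signal, fs, frame_ms=40, hop_ms=20):
    """Mean and standard deviation of F0 over all voiced frames."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    values = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f0 = frame_f0(signal[start:start + frame_len], fs)
        if f0 is not None:
            values.append(f0)
    if not values:
        return None, None
    return statistics.mean(values), statistics.pstdev(values)

# Synthetic "voiced" signal: 120 Hz fundamental plus two harmonics
# (assumption: a stand-in for speech, sampled at a telephone-like 8 kHz).
fs = 8000
f0_true = 120.0
signal = [
    math.sin(2.0 * math.pi * f0_true * n / fs)
    + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0_true * n / fs)
    + 0.25 * math.sin(2.0 * math.pi * 3.0 * f0_true * n / fs)
    for n in range(fs)  # one second of audio
]

ltf0_mean, ltf0_sd = long_term_f0(signal, fs)
print("long-term F0 mean (Hz):", ltf0_mean)
print("long-term F0 std (Hz):", ltf0_sd)
```

Because the lag is restricted to whole samples, the estimate is quantized (at 8 kHz, neighboring lags near 120 Hz differ by almost 2 Hz). A practical system would interpolate around the autocorrelation peak and use a more robust voicing decision.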

The forensic comparison exams follow the structure shown in Figure 1.

In step 1, the traits (dialect, sociolect and acoustic-phonetic parameters) of the speech present in the questioned material are identified. Subsequently, in step 2, a standard voice sample is collected from the suspect, according to the characteristics observed. In step 3, the traits of the standard material are identified. Step 4 compares the characteristics identified (vestige versus standard), thus identifying the suspect.
At the LASP, research on the forensic comparison and identification of speakers is advancing, and we aim for more effective results that provide the correct identification of the speaker. We also seek a system with greater robustness and a shorter response time, to provide the Brazilian Federal Police with efficient tools for forensic identification.