Study of Multimodal Identification Algorithms Using Modern Methods and Tools of Multivariate Analysis

Nataliya Boyko


The rapid growth of information technology and computer hardware has led to its integration into nearly every area of society, and the resulting digitalization has profoundly changed the processes and phenomena within those areas. In this context, multimodal algorithms have become indispensable tools whose role grows as their range of functions expands. This article investigates the theoretical and practical foundations, and the distinctive characteristics, of multimodal identification algorithms, using modern methods and tools of multivariate analysis. The multimodal algorithm is developed by fusing modalities at the feature level, which integrates several algorithms rooted in multivariate analysis: a combined voice activity detector, a face detector based on the MTCNN (Multi-task Cascaded Convolutional Networks) architecture, mel-frequency cepstral coefficients, facial image features, and a decision-making module. A framework for combining these algorithms is proposed. Analysis of the acquired data indicates that “Test 1,” which uses facial image data, achieves the highest performance, approaching 100%, and stands out from the other tests in how accurately it determines the key algorithm parameters. “Test 2” and “Test 3,” which involve voice signals, show a minor error at the pre-processing stage, attributed to the delay experienced by participants during the video conference: short pauses occurred within 40-60 seconds, accompanied by occasional noise from external processes. This interference significantly affects the accuracy with which the algorithm processes the signal. Nevertheless, the error remains small: 0.32-0.39% for “Test 3” and 0.06-0.12% for “Test 2.” Integrated within a biometric identification system, the proposed multimodal algorithm enables successful user verification by means of a combined multivariate analysis algorithm. It also outperforms other comparable multimodal identification algorithms, yielding precise results.


Algorithms, modality, multimodal data, multimodal machine learning, multivariate analysis.
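
To make the idea of feature-level modality fusion concrete, the following is a minimal sketch and not the implementation used in this study. It assumes that a voice feature vector (for example, averaged cepstral coefficients) and a face feature vector have already been extracted. The function names, the toy vectors, the cosine-similarity decision rule, and the threshold of 0.8 are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse_features(voice_features, face_features):
    """Feature-level fusion: normalize each modality, then concatenate.

    Normalizing per modality keeps one modality from dominating the
    fused vector only because its raw values are larger.
    """
    return l2_normalize(voice_features) + l2_normalize(face_features)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(enrolled, probe, threshold=0.8):
    """Hypothetical decision step: accept if the fused vectors are close enough."""
    return cosine_similarity(enrolled, probe) >= threshold


# Toy vectors standing in for real voice and face features (assumed values).
enrolled = fuse_features([1.0, 2.0, 3.0], [0.5, 0.5])
probe_same = fuse_features([1.1, 2.0, 2.9], [0.5, 0.6])
probe_other = fuse_features([3.0, -1.0, 0.2], [-0.4, 0.9])

print(verify(enrolled, probe_same))
print(verify(enrolled, probe_other))
```

Because each modality is normalized before concatenation, the cosine similarity of two fused vectors equals the mean of the per-modality cosine similarities, so both modalities contribute equally to the decision in this sketch. A trained classifier could take the place of the fixed threshold.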