Francesco Camilli, International Centre for Theoretical Physics (ICTP)

# Fundamental limits in structured principal component analysis and how to reach them

Principal Component Analysis is a powerful tool for dimension reduction and clustering of high-dimensional data. It has been widely studied in various theoretical settings to assess its performance in retrieving low-rank structures inside high-rank data matrices. One of the most relevant settings from an information-theoretic perspective is a teacher-student scenario, where a teacher plants a low-rank structure, called a spike, inside a noise matrix, and the student is tasked with reconstructing it as accurately as possible. It turns out that the student's best possible performance is in direct correspondence with information-theoretic quantities such as the mutual information (MI) between the data and the spike. Prior to our contribution [1], the MI had been computed only under the assumption of i.i.d., and thus structureless, Gaussian noise. With a novel technique, inspired by the theory of spin glasses with rotationally invariant couplings, we extended the class of admissible noise to random matrix ensembles of trace type with a low-degree polynomial matrix potential. The predicted student's performance is shown to be in perfect agreement with that of an algorithm that we named Bayes-optimal Approximate Message Passing, whose iterates are rigorously characterized step by step. Despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing to strong universality properties. More recently [2], we were able to extend our analysis to any trace ensemble with an arbitrary matrix potential, and to identify an algorithm whose performance is predicted by the theory. Rigorously tracking the iterates of this latter algorithm remains an open problem.

[1] J. Barbier, F. Camilli, M. Mondelli, M. Sáenz, "Fundamental limits in structured principal component analysis and how to reach them", Proceedings of the National Academy of Sciences 120 (30), e2302028120 (2023).

[2] J. Barbier, F. Camilli, M. Mondelli, Y. Xu, in preparation.