Constrained Coupled Matrix and Tensor Factorization#

Reference

[Cabral Farias et al., 2016] R. Cabral Farias, J. E. Cohen, and P. Comon, “Exploring multimodal data fusion through joint decompositions with flexible couplings,” IEEE Transactions on Signal Processing, vol. 64, pp. 4830-4844, September 2016. pdf

[Cohen and Bro, 2018] J. E. Cohen, R. Bro, “Nonnegative PARAFAC2, a flexible coupling approach”, LVA/ICA 2018, arXiv:1802.05035

[Schenker et al., 2021] C. Schenker, J. E. Cohen, E. Acar, “An optimization framework for regularized linearly coupled matrix-tensor factorization”, EUSIPCO 2020, 2021. pdf

[Schenker et al., 2021] C. Schenker, J. E. Cohen, E. Acar, “A Flexible Optimization Framework for Regularized Matrix-Tensor Factorizations with Linear Couplings”, IEEE Journal of Selected Topics in Signal Processing, 2021. arxiv IEEE Xplore Supplementary Materials Matlab code

[Roald et al., 2021] M. Roald, C. Schenker, J. E. Cohen, E. Acar, “PARAFAC2 AO-ADMM: Constraints in all modes”, EUSIPCO 2021, arxiv

[Roald et al., 2022] M. Roald, C. Schenker, V. D. Calhoun, T. Adali, R. Bro, J. E. Cohen, E. Acar, “An AO-ADMM approach to constraining PARAFAC2 on all modes”, SIAM Journal on Mathematics of Data Science, 2022. arxiv code SIMODS

[Chatzis et al., 2025] C. Chatzis, C. Schenker, J. E. Cohen, E. Acar, “DCMF: Learning Interpretable Evolving Patterns from Temporal Multiway Data”, EUSIPCO 2025. arxiv

Background on joint factorization models#


Joint factorization models are collections of matrix or tensor factorization problems in which some of the variables are explicitly related. There are at least two motivations for considering this family of problems.

  • Multimodal acquisitions: Several datasets informing on the same phenomenon are acquired with different modalities, e.g., EEG+FMRI, NMR+LCMS [Acar and Yener, 2009], oculometry+EEG [Rivet and Cohen, 2016]. Joint factorization techniques allow the extraction of latent information from each dataset while leveraging the shared information across datasets to reduce estimation error and enhance uniqueness [Sorensen and De Lathauwer, 2015], thereby improving interpretability.

  • Alternative formulations of tensor decompositions: Joint factorization models are useful for rewriting classical tensor models such as the CP decomposition, and for proposing extended models such as PARAFAC2 [Harshman, 1972] [Kiers et al., 1999], Shift/Conv NMF/CP, PARATUCK2 [Usevich, 2025] and so on. These formulations are also useful for identifiability proofs, as they relate tensor decompositions to matrix low-rank approximation problems.

Let us describe a few important joint factorization models, namely CPD, CMTF, and PARAFAC2.

Joint diagonalisation#

CP decomposition is a particular case of coupled matrix factorization. Indeed, if a tensor \(T\in\mathbb{R}^{n_1\times n_2\times n_3}\) follows a rank \(r\) CP decomposition with factors \(A,B,C\), then for each slice index \(k\leq n_3\)

\[ T[:,:,k] = A\text{Diag}(C[k,:])B^T. \]

All the slices \(T[:,:,k]\) are therefore jointly diagonalized with left and right bases \(A\) and \(B\). This establishes a strong connection between the CP decomposition and joint diagonalisation methods such as GSVD, SOBI [Belouchrani et al., 1997], and JBSS [Lahat and Jutten, 2018].
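This slicewise view is easy to check numerically. The following sketch (all names and dimensions are illustrative) builds a small CP tensor from random factors with NumPy and verifies that every frontal slice is jointly diagonalized by \(A\) and \(B\):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3, r = 5, 4, 3, 2

# Hypothetical random CP factors A (n1 x r), B (n2 x r), C (n3 x r)
A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))

# Build the rank-r CP tensor from its factors
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Each frontal slice is A Diag(C[k,:]) B^T: joint diagonalization
for k in range(n3):
    assert np.allclose(T[:, :, k], A @ np.diag(C[k, :]) @ B.T)
```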

CMTF (direct coupling)#

One of the best-known joint decompositions is Coupled Matrix and Tensor Factorization (CMTF) [Acar et al., 2013]. CMTF couples a matrix and a tensor on a single mode, where the matrix \(Y\) is low-rank and the tensor \(T\) has low CP-rank. CMTF may be formulated as an optimization problem

\[ \argmin{A,B,C,D} \|T - I_r \times_1 A \times_2 B \times_3 C \|_F^2 + \|Y-AD^T\|_F^2. \]

The first factor of the low-rank approximations of the tensor and the matrix is the same. This means that the same underlying patterns are sought in both datasets along the first mode. In practice, the two terms in the cost function should be balanced based on the SNR of each dataset [Cohen et al., 2016].
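As a concrete illustration, the sketch below (hypothetical data and names; the weight `w` is an assumed balancing parameter, not the specific scheme of [Cohen et al., 2016]) evaluates a weighted version of the CMTF cost on synthetic data sharing the first-mode factor:

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, n3, m, r = 6, 5, 4, 7, 2

# Hypothetical shared factor A and modality-specific factors B, C, D
A = rng.standard_normal((n1, r))
B = rng.standard_normal((n2, r))
C = rng.standard_normal((n3, r))
D = rng.standard_normal((m, r))

# Noisy observations: tensor T and matrix Y share the first-mode factor A
T = np.einsum('ir,jr,kr->ijk', A, B, C) + 0.1 * rng.standard_normal((n1, n2, n3))
Y = A @ D.T + 0.01 * rng.standard_normal((n1, m))

def cmtf_cost(A, B, C, D, T, Y, w=0.5):
    """Weighted CMTF cost; w balances the two terms, e.g. according to SNR."""
    res_T = T - np.einsum('ir,jr,kr->ijk', A, B, C)
    res_Y = Y - A @ D.T
    return w * np.sum(res_T**2) + (1 - w) * np.sum(res_Y**2)

cost = cmtf_cost(A, B, C, D, T, Y)
```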

CMTF assumes that the parameter matrix \(A\) is shared across measurement modalities. Therefore, CMTF lacks flexibility in many applications, such as genomics and chemometrics, where some components cannot be observed by certain modalities due to physical constraints. It is then necessary to modify the CMTF model to share only a subset of components. This variant of CMTF is referred to as ACMTF [Acar et al., 2017]. Another extension of CMTF assumes that the two modalities share a parameter matrix through a known linear relationship. This allows modeling possible variations in the sampling rates by linear interpolation [Cabral Farias et al., 2016], or coupling only the derivatives of the components [Rivet et al., 2015]. This work is discussed below.

PARAFAC2 and variants#

Starting from the joint diagonalization formulation of the CPD, one may observe that CPD imposes the coupled slices/matrices \(T[:,:,k]\) to have the same row and column factors. In some applications, it could be interesting to relax this assumption and suppose that only a single mode, say the first one, has a shared factor. Then the coupled factorization model becomes

\[ T[:,:,k] = A\text{Diag}(C[k,:])B_k^T \]

where each slice \(T[:,:,k]\) has a different factor matrix \(B_k\in\mathbb{R}^{n_{2,k}\times r}\). The issue with this relaxed formulation is that it is equivalent to an unconstrained matrix factorization of the stacked slices,

\[\begin{split} \left[T[:,:,1], \ldots, T[:,:,n_3] \right] &= A \left[\text{Diag}(C[1,:])B_1^T, \ldots, \text{Diag}(C[n_3,:])B_{n_3}^T \right], \\ T_{[1]} &= A \tilde{B}^T. \end{split}\]
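This collapse to an unconstrained matrix factorization can be verified directly. The sketch below (illustrative names and sizes) builds slices with distinct \(B_k\) and checks that their horizontal stacking equals \(A\tilde{B}^T\):

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3, r = 5, 4, 3, 2

A = rng.standard_normal((n1, r))
C = rng.standard_normal((n3, r))
# A different second-mode factor B_k for every slice
Bs = [rng.standard_normal((n2, r)) for _ in range(n3)]

# Slices T[:,:,k] = A Diag(C[k,:]) B_k^T, stacked side by side
slices = [A @ np.diag(C[k, :]) @ Bs[k].T for k in range(n3)]
T1 = np.hstack(slices)                    # mode-1 unfolding, n1 x (n2*n3)

# The same unfolding factors as A @ Btilde.T, with Btilde stacking the products
Btilde = np.vstack([Bs[k] @ np.diag(C[k, :]) for k in range(n3)])
assert np.allclose(T1, A @ Btilde.T)
```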

The factor matrix \(\tilde{B}\) is virtually unconstrained, since any matrix can be written as such a stack of products. Consequently, to obtain a multiway decomposition that does not reduce to unconstrained low-rank matrix approximation, further constraints must be imposed on the parameter matrices \(B_k\). There are several ways to do so, for instance imposing some shift-invariance with respect to the slice index (Shift-PARAFAC) [Harshman et al., 2003], or imposing further low-rank structure on the factors (PARATUCK2) [Usevich, 2025]. A popular extension of CPD based on this construction is the PARAFAC2 model, obtained by imposing that the cross-products of the \(B_k\) matrices are constant,

\[ \forall k\leq n_3, \;B_k^TB_k = \Delta^T\Delta \]

with \(\Delta\) of size \(r\) by \(r\).

There are several reasons for assuming that the correlation matrices of the second-mode factors are constant across slices. A simple explanation may be obtained by observing that this constraint is equivalent to the reparameterization

\[ B_k = P_k \Delta \]

with \(P_k\) a left-orthogonal matrix. Therefore, all the slices have the same rank \(r\) factor matrix, transformed from slice to slice by a rotation matrix. Orthogonal linear coupling between slices models many linear transformations, such as circular shifts or diffeomorphisms [Cohen et al., 2018]. The PARAFAC2 model has therefore been used extensively in chemometrics applications where such transformations occur, in particular LCMS and GCMS data [Cohen and Bro, 2018]. The computation of PARAFAC2 with nonnegativity constraints is a topic I have worked on, detailed in ./NNParafac2.ipynb.

General formulation: Linearly-Coupled Constrained CMTF#

My earliest contribution on joint factorization models was to consider a generalization of CMTF that allows for more complicated coupled relationships [Cabral Farias et al., 2016]. If CMTF supposes that the same factor matrix can be extracted from several datasets, Linearly-Coupled CMTF (LC-CMTF) assumes that the shared components are linked through a linear relationship. For the particular case of a joint matrix and tensor decomposition, LC-CMTF can be formalized as the optimization problem

\[\begin{split} \argmin{A_1, B, C, A_2, D, A} \|T - I_r \times_1 A_1 \times_2 B \times_3 C \|_F^2 + \|M-A_2D^T\|_F^2 \\ \text{s.t.}~ \vec{A_1} = H_1 \vec{A} \text{ and } \vec{A_2} = H_2 \vec{A} \end{split}\]

for known linear coupling matrices \(H_1\) and \(H_2\) and a shared, unknown latent factor matrix \(A\), whose size may differ from that of \(A_1\) and \(A_2\).

This model is flexible enough to express several realistic scenarios, such as partially shared components and unaligned data. Two particular linear couplings often encountered are \([A_1; A_2] = [H_1; H_2]A\) (which allows for resampling on a common grid) and \([A_1, A_2] = A[H_1,H_2]\). The latent variable \(A\) is introduced so that the model easily extends to more than two coupled tensors. In general, a necessary condition for identifiability of \(\vec{A}\) is that the matrix \([H_1; H_2]\) has full column rank.
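The sampling-based coupling can be made concrete as follows. In this sketch (hypothetical operators: modality 1 observes the even rows of a common fine grid, modality 2 the odd rows), row-space couplings \(A_i = S_i A\) are rewritten as vectorized couplings \(\vec{A_i} = (I_r \otimes S_i)\vec{A}\), and the stacked coupling matrix is checked to have full column rank:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 2

# Latent shared factor A (n x r), sampled on a fine common grid
A = rng.standard_normal((n, r))

# Hypothetical sampling operators: modality 1 sees even rows, modality 2 odd rows
S1 = np.eye(n)[0::2]
S2 = np.eye(n)[1::2]

# Row-space couplings A_i = S_i A become vec couplings with H_i = I_r kron S_i
H1 = np.kron(np.eye(r), S1)
H2 = np.kron(np.eye(r), S2)

vec = lambda M: M.flatten(order='F')      # column-major vectorization
assert np.allclose(H1 @ vec(A), vec(S1 @ A))
assert np.allclose(H2 @ vec(A), vec(S2 @ A))

# Stacked [H1; H2] has full column rank, a necessary condition to recover vec(A)
H = np.vstack([H1, H2])
assert np.linalg.matrix_rank(H) == n * r
```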

In collaboration with Carla Schenker and Evrim Acar [Schenker et al., 2021], we formalized the linear couplings more clearly and proposed an algorithm based on Alternating Optimization, where each subproblem is solved with ADMM [Huang et al., 2016]. ADMM is chosen for its flexibility: it also allows imposing constraints on the parameter matrices, including the coupled ones, and using loss functions other than the Frobenius norm. A MATLAB implementation is available here. Sadly, there is no available Python implementation, so I cannot show examples of the AO-ADMM algorithm in action in this manuscript.
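To give at least a flavor of the approach, here is a toy NumPy sketch of a single ADMM subproblem of the kind such AO-ADMM schemes alternate over: a least-squares factor update under a nonnegativity constraint, with a splitting variable and a scaled dual variable. This is an illustrative sketch, not the framework's actual implementation, and all names (`rho`, `admm_nn_ls`, the problem sizes) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 8, 6, 3

# Synthetic nonnegative ground truth, known mixing A
A = rng.random((m, r))
B_true = rng.random((n, r))
X = A @ B_true.T

def admm_nn_ls(X, A, rho=1.0, n_iter=200):
    """ADMM for min_B 0.5*||X - A B^T||_F^2  s.t.  B >= 0 (split B = Z)."""
    r = A.shape[1]
    B = np.zeros((X.shape[1], r))
    Z = np.zeros_like(B)
    U = np.zeros_like(B)                    # scaled dual variable
    G = np.linalg.inv(A.T @ A + rho * np.eye(r))
    for _ in range(n_iter):
        B = (X.T @ A + rho * (Z - U)) @ G   # regularized least-squares step
        Z = np.maximum(0.0, B + U)          # projection onto nonnegativity
        U = U + B - Z                       # dual update
    return Z

B_hat = admm_nn_ls(X, A)
```

Swapping the projection step for another proximal operator is what makes this scheme flexible: any constraint or regularizer with a cheap prox fits in the same loop.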