Cross-Domain Collaborative Learning in Social Multimedia

Shengsheng Qian, Tianzhu Zhang, Richang Hong and Changsheng Xu

Summary


Cross-domain data analysis is one of the most important tasks in social multimedia. It has a wide range of real-world applications, including cross-platform event analysis, cross-domain multi-event tracking, and cross-domain video recommendation. It is also very challenging because the data have multi-modal and multi-domain properties, and there are no explicit correlations to link different domains. To deal with these issues, we propose a generic Cross-Domain Collaborative Learning (CDCL) framework based on a non-parametric Bayesian dictionary learning model for cross-domain data analysis. The proposed CDCL model makes use of shared domain priors and modality priors to collaboratively learn data representations while accounting for the domain discrepancy and the multi-modal property. As a result, our CDCL model can effectively exploit the virtues of different information sources so that they complement and enhance each other for cross-domain data analysis. To evaluate the proposed model, we apply it to two different applications: cross-platform event recognition and cross-network video recommendation. Extensive experimental evaluations demonstrate the effectiveness of the proposed algorithm for cross-domain data analysis.

Framework

[Figure: Overview of the proposed CDCL framework.]

We propose a novel generic Cross-Domain Collaborative Learning (CDCL) framework based on a non-parametric Bayesian dictionary learning model for cross-domain data analysis. In CDCL, the non-parametric Bayesian dictionary learning model jointly explores the multi-domain, multi-modality, and sparse properties of the data. (1) To deal with the domain discrepancy, we adopt shared domain priors across multiple domains so that the domains share a common feature space. (2) To exploit the multi-modal property, we learn a sparse representation of multi-modal data by introducing shared modality priors that infer the sparse structure shared among different modalities of media data. (3) To deal with the sparsity of the media data, we learn a shared dictionary space to bridge cross-domain information. The details of our CDCL algorithm are shown in the figure. For simplicity, we only show an example of cross-platform data association: two domains (Google News and Flickr) with two modalities (text and image) related to the event "United States Presidential Election". In the left panel, the data associated with the event include textual and visual information; each social event instance contains text and its corresponding images. Since the multi-modal data from different domains have their own characteristics but also share commonalities, we can collaboratively learn a shared feature representation by adopting the shared domain priors and modality priors across multiple domains. As a result, the proposed CDCL can effectively combine the virtues of different information sources to complement each other for cross-domain multi-modal data analysis. The generic framework can be applied to many applications, such as cross-platform event recognition and cross-network video recommendation.
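To make the input concrete, the following is a minimal sketch of how one event's multi-modal, multi-domain data could be organized before learning; the instance counts, feature dimensions, and random placeholder features are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout of cross-platform multi-modal data for one event,
# mirroring the figure: two domains x two modalities. The instance counts,
# feature dimensions, and random features below are all placeholders.
n_news, n_flickr = 200, 150   # instances per domain (made up)
d_text, d_image = 300, 500    # per-modality feature dimensions (made up)

data = {
    ("google_news", "text"):  rng.random((n_news,   d_text)),
    ("google_news", "image"): rng.random((n_news,   d_image)),
    ("flickr",      "text"):  rng.random((n_flickr, d_text)),
    ("flickr",      "image"): rng.random((n_flickr, d_image)),
}

# CDCL-style learning would couple all four matrices through shared domain
# and modality priors so that every block maps into one common code space.
for (domain, modality), X in data.items():
    print(domain, modality, X.shape)
```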

Cross-Domain Collaborative Learning

[Figure: The proposed CDCL model.]

In the proposed model: (1) To deal with the domain discrepancy, we introduce shared domain priors to associate information across multiple domains; meanwhile, shared modality priors are adopted to associate the multi-modality information within each domain. (2) Sparseness is achieved by introducing Beta Process priors. Since different domains may favor different sparse reconstruction coefficients, a joint-sparsity constraint across domains is necessary to make the sparse coefficient estimation robust; because the priors are shared, our model realizes joint sparsity across different domains. (3) The shared dictionary space is learned by using the shared domain and modality priors, and it bridges the domain gap for cross-domain data analysis.
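As an illustration of the beta-Bernoulli sparsity mechanism, here is a minimal generative sketch of truncated Beta Process sparse coding in which the atom-usage probabilities are drawn once and shared across domains; all names (K, a0, b0, pi, D) and hyperparameter values are illustrative and do not follow the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

K, d = 64, 32        # dictionary size (truncation level) and feature dimension
a0, b0 = 5.0, 1.0    # Beta Process hyperparameters (illustrative values)

# Shared atom-usage probabilities: drawing pi once and reusing it for every
# domain/modality encourages the binary supports z to agree across them,
# which is the "joint sparsity via shared priors" idea described above.
pi = rng.beta(a0 / K, b0 * (K - 1) / K, size=K)

def sample_code(pi, rng):
    """Sample one sparse code: binary atom support times Gaussian weights."""
    z = rng.random(K) < pi            # z_k ~ Bernoulli(pi_k): active atoms
    s = rng.normal(0.0, 1.0, size=K)  # atom weights
    return z * s

D = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, K))  # shared dictionary

# Two domains (e.g., news text vs. Flickr images after feature extraction)
# generate observations under the same shared prior pi.
x_news   = D @ sample_code(pi, rng) + rng.normal(0.0, 0.01, d)
x_flickr = D @ sample_code(pi, rng) + rng.normal(0.0, 0.01, d)
```

In the full model, posterior inference (e.g., Gibbs sampling) would update pi, D, and the per-instance codes jointly rather than sampling them forward as done here.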

Results

We evaluate the performance of the proposed CDCL algorithm on two different applications: cross-platform event recognition and cross-network video recommendation.

A. Cross-platform Event Recognition

Table 1: The event classification accuracy of different methods.

Figure 4: The classification accuracy for each event on Google News.

Figure 5: The classification accuracy for each event on Flickr.

The classification results of different methods are shown in Table 1, and the per-event accuracy comparison is given in Figure 4 and Figure 5. Based on these results, we have the following observations. (1) The BOW model shows inferior classification performance, because it models textual and visual words indiscriminately and cannot capture the associations between multi-modal data. (2) CCA and our CDCL achieve better performance than BOW, which shows that it is useful to model and fuse the textual and visual information. (3) SRC-L1-DL achieves better results than SRC-L1, which shows that dictionary learning with an auxiliary domain can obtain a more compact and representative dictionary and thus improve performance. (4) Overall, the proposed CDCL method consistently outperforms the other methods. The major reason is that the proposed non-parametric Bayesian dictionary learning model adopts the shared domain priors and modality priors to collaboratively learn the feature representation while considering the domain discrepancy and the multi-modal property. As a result, it can effectively combine the virtues of different information sources to complement and enhance each other.
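For context on the SRC-L1 baseline, the sketch below follows the standard sparse-representation classification recipe: code a sample against a dictionary with an L1 penalty (here solved with plain ISTA) and pick the class whose atoms give the smallest reconstruction residual. This is a generic sketch under those assumptions, not the implementation used in the experiments.

```python
import numpy as np

def src_classify(x, D, atom_labels, lam=0.1, n_iter=200):
    """Sparse-representation classification (SRC) sketch.

    Solves min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA, then assigns
    the class whose atoms reconstruct x with the smallest residual."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = a - grad / L                   # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
    residuals = {}
    for c in np.unique(atom_labels):
        a_c = np.where(atom_labels == c, a, 0.0)  # keep class-c coefficients
        residuals[c] = np.linalg.norm(x - D @ a_c)
    return min(residuals, key=residuals.get)
```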

B. Cross-network Video Recommendation

Figure 6: The Precision and MAP of cross-network video recommendation for new YouTube users. (a) Precision@K. (b) MAP@K.

Figure 7: Four examples on cross-network video recommendation from Twitter to YouTube users.

The evaluation results of different methods are shown in Figure 6. From the results, we draw the following conclusions: (1) The POP method shows inferior performance because it can neither learn users' personalized preferences nor exploit cross-network user behaviors. (2) The KNN and CNAS methods achieve better results, which shows that it is useful to adopt the auxiliary domain and consider cross-network collaboration for the cold-start recommendation task. (3) The proposed CDCL method outperforms both CNAS and KNN, achieving the best recommendation performance in terms of precision and MAP for all evaluated values of K. In Figure 7, we also show four new YouTube users with their history information on Twitter and the corresponding recommended video lists from YouTube.
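For reference, Precision@K and MAP@K can be computed as in the sketch below; AP@K normalization conventions vary slightly across papers, so this is one common choice rather than the paper's exact evaluation protocol.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k

def average_precision_at_k(recommended, relevant, k):
    """AP@K: precision accumulated at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), k)

def map_at_k(all_recommended, all_relevant, k):
    """Mean of AP@K over all users."""
    aps = [average_precision_at_k(rec, rel, k)
           for rec, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps)

# Example: one user's ranked list vs. ground-truth relevant videos.
print(precision_at_k(["v1", "v2", "v3"], {"v1", "v3"}, k=3))  # ~0.667
```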

Publication

Cross-Domain Collaborative Learning in Social Multimedia. [pdf][slides]

Shengsheng Qian, Tianzhu Zhang, Richang Hong and Changsheng Xu
ACM International Conference on Multimedia, Brisbane, Australia, 26-30 Oct 2015, pp. 99-108.