Cross-Domain Multi-Event Tracking via CO-PMHT

Tianzhu Zhang, Changsheng Xu

Summary

With the massive growth of events on the Internet, efficient organization and monitoring of events has become a practical challenge. To address this problem, we propose a novel CO-PMHT (CO-Probabilistic Multi-Hypothesis Tracking) algorithm for cross-domain multi-event tracking, which obtains informative summaries of events and their evolutionary trends over time. We collect a large-scale dataset by searching keywords on two domains (Google News and Flickr) and downloading both the images and the textual content for each event. Given the input data, our algorithm tracks multiple events in the two domains collaboratively, which boosts the tracking performance. Specifically, the bridge between the two domains is a semantic posterior probability, which avoids the domain gap. After tracking, we can visualize the whole evolutionary process of each event over time and mine the semantic topics of each event for deeper understanding and event prediction. Extensive experimental evaluations on the collected dataset demonstrate the effectiveness of the proposed algorithm for cross-domain multi-event tracking.

Framework

Figure 2: The flowchart of our cross-domain multi-event tracking via the proposed CO-PMHT.

We propose a novel CO-PMHT (CO-Probabilistic Multi-Hypothesis Tracking) algorithm for cross-domain multi-event tracking. The details of our framework are shown in Figure 2. The cross-domain dataset is built by searching keywords on two domains (Google News and Flickr) and downloading both images and textual content. Given this data, the algorithm tracks multiple events in the two domains collaboratively, which boosts the tracking performance. Specifically, the link between the two domains carries semantic information and effectively bridges the domain gap. Based on the tracking results, we can visualize the whole evolutionary process of an event over time and discover the semantic topics of events, as shown in Figure 2.

CO-Probabilistic Multi-Hypothesis Tracking

Figure 3: The basic idea of our proposed CO-PMHT algorithm for multi-event tracking across two domains (Google News and Flickr).

In the mmETM model, a document can be a tagged photo or a long news article with images. Figure 3 illustrates the graphical representation of mmETM. As the figure shows, the proposed model extends the traditional mm-LDA model by adding non-visual-representative topics, which lets it model multi-modal social event documents effectively. Each document is associated with two different topic distributions: $\theta$ over topics shared between the textual and visual modalities, and $\psi$ over topics unique to the textual modality. Each topic is a probability distribution over textual or visual words. In the model, a binary variable $x$ controls whether a topic word is generated from the visual-representative topic space ($x=1$) or from the non-visual-representative topic space ($x=0$). We assume that all visual aspect words are generated from the visual-representative topic space, i.e., $x_v = 1$, and for simplicity we omit $x$ from the plate of visual aspect words. Given the multimedia documents $E_t$ in epoch $t$, our aim is to infer the two document-topic distributions $\theta_d$ and $\psi_d$, a set of $K$ visual-representative topics $\{\phi_v^s, \phi_w^s\}$, and $H$ non-visual-representative topics $\phi_w^p$. Here $\theta_d$ is the document-specific distribution over topics shared by the textual and visual information in a social event document, while $\psi_d$ covers only the textual information in that document.
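The role of the switch variable $x$ can be illustrated with a minimal sketch of how one document might be sampled. This is not the authors' implementation: the symmetric Dirichlet concentrations `theta_prior` and `psi_prior`, the switch probability `p_shared`, and all function names are hypothetical choices made only for illustration, and the topic-word tables are assumed to be given rather than inferred.

```python
import random

def sample_dirichlet(rng, size, concentration):
    """Draw from a symmetric Dirichlet by normalizing Gamma samples."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(rng, probs):
    """Draw an index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(rng, n_text, n_visual, phi_w_s, phi_v_s, phi_w_p,
                      theta_prior=0.5, psi_prior=0.5, p_shared=0.6):
    """Sample one multi-modal document (illustrative assumptions only).

    phi_w_s / phi_v_s: K visual-representative topics over textual / visual words.
    phi_w_p: H non-visual-representative topics over textual words.
    theta_prior, psi_prior, p_shared: hypothetical hyperparameters.
    """
    K, H = len(phi_w_s), len(phi_w_p)
    theta = sample_dirichlet(rng, K, theta_prior)  # shared topic distribution
    psi = sample_dirichlet(rng, H, psi_prior)      # text-only topic distribution

    text = []
    for _ in range(n_text):
        x = 1 if rng.random() < p_shared else 0    # switch variable
        if x == 1:
            z = sample_index(rng, theta)           # visual-representative topic
            w = sample_index(rng, phi_w_s[z])
        else:
            z = sample_index(rng, psi)             # non-visual-representative topic
            w = sample_index(rng, phi_w_p[z])
        text.append((x, z, w))

    visual = []
    for _ in range(n_visual):
        z = sample_index(rng, theta)               # x_v = 1 for every visual word
        visual.append((1, z, sample_index(rng, phi_v_s[z])))
    return theta, psi, text, visual
```

Each returned tuple is `(x, topic index, word index)`. Textual words with `x == 0` index into the `H` text-only topics, while visual words always share `theta` with the visual-representative textual words, which is what ties the two modalities together in this sketch.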

An overview of the proposed online mmETM algorithm is shown in Algorithm 1. The inputs of the algorithm are the fixed Dirichlet values $a$ and $b$, which are used to initialize the priors $\{\alpha, \beta, \gamma\}$ and $\eta$, respectively, at epoch 1, and the multimedia documents of a social event over time, $\{E_t, t \in \{1, \cdots, T\}\}$. Here, $T$ is the number of stories according to the evolution time of the social event. The outputs of the algorithm are $T$ generative models, comprising the visual-representative topics $\{\phi_{t,w}^s, \phi_{t,v}^s\}_{t=1}^T$, the non-visual-representative topics $\{\phi_{t,w}^p\}_{t=1}^T$, and the document-topic distributions $\{\theta_{t,d}\}_{t=1}^T$ and $\{\psi_{t,d}\}_{t=1}^T$, together with the evolution matrices ${\bf B}_{w}^s$, ${\bf B}_{v}^s$, and ${\bf B}_{w}^p$.

Results

We now show the multi-event tracking results. They indicate that the proposed CO-PMHT algorithm can collaboratively track multiple events across domains using multi-modal information, and that doing so boosts the tracking performance.

A. Qualitative Evaluation

Figure 4: Tracking results of our CO-PMHT on eight events. Six stories from each event are selected to show the results. Each story includes one image and several keywords. Stories tracked incorrectly are denoted with red bounding boxes.

In Figure 4, we show tracking results of our CO-PMHT on a subset of the events due to limited space. Figure 4(a) shows tracking results for four events on Google News: “2008 Chinese Milk Scandal”, “2012 United States presidential election”, “Occupy Wall Street”, and “Senkaku Islands Dispute”. Figure 4(b) shows tracking results for four events on Flickr: “Occupy Wall Street”, “2008 Chinese Milk Scandal”, “Greek protests”, and “Bahraini Uprising”. To present the results compactly, we randomly select six stories from each event and show each event as one row. For each story, we select several keywords to represent the textual information and one image to represent the visual information. In both Figure 4(a) and Figure 4(b), stories marked with red bounding boxes were tracked incorrectly.

B. Quantitative Evaluation

Figure 5: Comparison Results on Google News and Flickr with Three Different Features.

In Figure 5, we show the multi-event tracking results on Google News and Flickr. Note that, for PMHT, MAP (Google) is computed using Google News only and MAP (Flickr) using Flickr only. For CO-PMHT, MAP (Google) refers to Google-based event tracking with the help of Flickr, and MAP (Flickr) refers to Flickr-based event tracking with the help of Google.
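The text does not spell out how MAP is computed. Assuming it denotes the usual mean average precision over events, a minimal sketch could look as follows. The function names and the input layout (one ranked list of 0/1 correctness labels per event) are hypothetical, and average precision is normalized here by the number of correct stories that appear in the ranked list.

```python
def average_precision(relevance):
    """Average precision of one ranked list of 0/1 correctness labels."""
    hits = 0
    precisions = []
    for rank, is_correct in enumerate(relevance, start=1):
        if is_correct:
            hits += 1
            precisions.append(hits / rank)  # precision at this correct story
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    """Mean of per-event average precision; `rankings` maps event -> label list."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)

# Toy labels for two events (made up for illustration, not results from the paper).
toy_rankings = {
    "event_a": [1, 0, 1],
    "event_b": [0, 1],
}
toy_map = mean_average_precision(toy_rankings)
```

Under this assumption, MAP (Google) and MAP (Flickr) would each be obtained by running the same computation on the stories tracked in the corresponding domain.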

Publication

Cross-Domain Multi-Event Tracking via CO-PMHT. [pdf][slides]

Tianzhu Zhang, Changsheng Xu and Jie Shao
ACM Transactions on Multimedia Computing Communications and Applications, 2015, 11(3): Article 35.