Research (2013, ongoing)

Theories and Methods on Cross-Media Analysis

Digital media understanding, focusing on text, image, audio, and video, aims to obtain semantic descriptions accessible to users. With the exploding prevalence of multimedia resources, we are witnessing new problems and challenges in content diversity, computing complexity, and the plurality of user demands. Successfully understanding these challenging multimedia data is therefore of great scientific and practical value. The project aims to learn an effective cross-media representation model that reflects the multi-modality of media content. The cross-media model is hierarchical and expresses correlations at three levels: feature-to-feature, feature-to-semantics, and semantics-to-semantics. Furthermore, we propose a multi-granularity semantic mapping model, in which a structural descriptive system for semantics is built from entities, events, and their correlations.
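As a minimal illustration of the lowest level of such a hierarchy (not the project's actual model), feature-to-feature correlation can be as simple as a Pearson correlation between paired feature values extracted from two modalities of the same documents; the feature values below are hypothetical.

```python
# Illustrative sketch only: the simplest feature-to-feature correlation is
# a Pearson correlation between one visual and one textual feature measured
# on the same documents. The numbers below are made up for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical paired scores on five documents
image_feature = [0.1, 0.4, 0.5, 0.8, 0.9]
text_feature = [0.2, 0.5, 0.4, 0.9, 1.0]
r = pearson(image_feature, text_feature)
```

A strong positive `r` would suggest the two modalities carry related information for these documents; the project's hierarchical model goes well beyond this by also linking features to semantics and semantics to semantics.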

Principal Investigator: Hanqing Lu

Ongoing

Contextual Latent Semantic Model for Image and Video Classification

The effective management of large-scale multimedia data is one of the hottest research topics in the computer science community, and the classification of images and videos is a key enabling technology. This project investigates latent semantic models that can be applied to image and video classification. Motivated by the success of latent semantic models in document analysis, we aim to propose new latent semantic models tailored to the characteristics of image and video classification. The project focuses on three aspects. First, we will improve traditional bag-of-words methods by integrating spatio-temporal information with multi-level descriptions. Second, since traditional latent semantic models cannot effectively describe the correlations between concepts, we will improve them by introducing ideas from manifold learning algorithms, such as locality preserving projection. Third, we will propose new contextual latent semantic models by mining the context of images and videos and analyzing their correlations, which will benefit image and video classification.

Principal Investigator: Hanqing Lu

Ongoing

Web Video Recommendation Based on Spectral Graph Analysis

Web video recommendation must not only address the scalability, heterogeneity, and ambiguity issues inherent in understanding video content, but also take contextual relations and social networks into account to assist semantic understanding. Unfortunately, traditional recommendation models cannot effectively describe such complex and diverse relations. This project aims to explore and discover these complex user-item relations and represent them with graphs; spectral graph theory is then used to solve the resulting graph problems and produce recommendations. Concretely, we will first exploit the abundant user-user, user-item, and item-item relations available on the Internet and in online communities to build a user-item hypergraph relation model. To address the multiple attributes linking users and items, such as low-level features and context, we will extend the traditional 2-D user-item relation to a higher-dimensional relation using tensor analysis. Moreover, we will leverage recent advances in spectral graph theory, computer vision, and machine learning to build and solve large-scale, complex user-item relation models.
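As a minimal stand-in for the graph-based formulation (the project's actual model is a hypergraph solved with spectral methods and tensor analysis), a random walk with restart on a plain bipartite user-item graph already illustrates how graph structure scores unseen videos; the users and videos below are hypothetical.

```python
# Hedged sketch: random walk with restart on a bipartite user-item graph.
# Items reachable through shared viewers accumulate score; disconnected
# items stay at zero. This is a toy stand-in, not the project's hypergraph.
def recommend(edges, user, restart=0.15, n_iters=100):
    nodes = sorted({u for u, _ in edges} | {i for _, i in edges})
    adj = {n: [] for n in nodes}
    for u, i in edges:
        adj[u].append(i)
        adj[i].append(u)
    score = {n: 1.0 if n == user else 0.0 for n in nodes}
    for _ in range(n_iters):
        # restart mass returns to the query user; the rest diffuses
        nxt = {n: (restart if n == user else 0.0) for n in nodes}
        for n in nodes:
            share = (1 - restart) * score[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        score = nxt
    seen = {i for u, i in edges if u == user}
    unseen = {i for _, i in edges if i not in seen}
    return unseen, score

# hypothetical viewing relations: u1 and u2 share video v2; v4 is isolated
edges = {("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u2", "v3"), ("u3", "v4")}
unseen, score = recommend(edges, "u1")
best = max(unseen, key=lambda i: score[i])
```

Here `v3` is recommended to `u1` because it is reachable through the shared video `v2`, while `v4`, in a disconnected component, gets no score; the spectral and tensor machinery in the proposal generalizes exactly this kind of relational reasoning.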

Principal Investigator: Jian Cheng

Ongoing

Web Image Annotation and Tag Recommendation with Social Tags

With the permeation of Web 2.0, large-scale collections of user-contributed images with tags are easily available on social websites. Due to the subjectivity and diversity of social tagging, noisy and missing tags are inevitable, which limits the performance of tag-based image retrieval systems. In this proposal, we aim to solve the problems of image annotation and tag recommendation by exploring the correlations among images, tags, and users, with particular attention to the use of social tags. We plan to carry out the project along the following lines: (1) hierarchical structure learning and correlation estimation for social tags; (2) semantic-incorporated image representation and similarity measures; (3) improved matrix factorization for image annotation; (4) image tag recommendation based on user preference learning. Note that image annotation and tag recommendation reinforce each other: because tag recommendation decreases the cost of tagging, more users are willing to tag images, and more high-quality tagging data becomes available for image annotation methods; in turn, the results of image annotation can provide prior guidance for tag recommendation. We believe that uniting image annotation and tag recommendation makes it possible to index and search large-scale web resources incrementally and effectively.
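To make aspect (3) concrete, the sketch below is the plain matrix factorization baseline, not the improved method the proposal develops: an observed image-tag matrix R is factored as R ≈ PQᵀ by stochastic gradient descent over the observed entries only, so that missing tags can be predicted from the factors. All data here is made up.

```python
import random

# Hedged sketch: plain SGD matrix factorization of an image-tag matrix
# (the baseline behind aspect (3), not the proposal's improved method).
# ratings holds only the observed (image, tag, value) entries.
def factorize(ratings, n_images, n_tags, k=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_images)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_tags)]
    for _ in range(epochs):
        for i, j, r in ratings:
            pred = sum(P[i][f] * Q[j][f] for f in range(k))
            err = r - pred
            for f in range(k):
                # simultaneous regularized gradient step on both factors
                P[i][f], Q[j][f] = (
                    P[i][f] + lr * (err * Q[j][f] - reg * P[i][f]),
                    Q[j][f] + lr * (err * P[i][f] - reg * Q[j][f]),
                )
    return P, Q

# hypothetical data: 4 images, 2 tags, some image-tag pairs unobserved
ratings = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (2, 1, 1), (2, 0, 0), (3, 1, 1)]
P, Q = factorize(ratings, n_images=4, n_tags=2)
predict = lambda i, j: sum(P[i][f] * Q[j][f] for f in range(2))
```

The proposal's improvements would, for example, weight entries by tag reliability and fold in tag correlations and user preferences, which this baseline ignores.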

Principal Investigator: Jing Liu

Ongoing

Video Search and Browsing for Mobile Terminals

With the popularization of mobile multimedia terminals and the rapid development of the Internet, providing effective video retrieval and content adaptation has gradually become a hot research topic. This proposal studies approaches to mobile video retrieval and browsing. Through semantic understanding of videos with noisy tags, we investigate a multiple-instance graph model for video annotation. Combined with context information, we study video retrieval methods based on graph models. To adjust the frame size and video length simultaneously, we investigate video retargeting based on subspace theory to overcome bandwidth and storage limitations. We also plan to perform hierarchical clustering of search results with dual-constraint topic models and to generate video thumbnails adapted to the screen.
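The hierarchical clustering step can be illustrated with a generic single-linkage agglomerative routine on toy feature vectors; the dual-constraint topic models the proposal actually uses are a different, richer formulation, and the "search results" below are hypothetical 2-D points.

```python
# Hedged sketch: generic single-linkage agglomerative clustering of search
# results in a toy 2-D feature space (not the proposal's dual-constraint
# topic model). Clusters are merged until n_clusters remain.
def hierarchical_cluster(points, n_clusters):
    clusters = [[i] for i in range(len(points))]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    while len(clusters) > n_clusters:
        # pick the pair of clusters with the smallest single-linkage distance
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: min(dist(points[i], points[j])
                                      for i in clusters[ab[0]]
                                      for j in clusters[ab[1]]))
        clusters[a] += clusters.pop(b)
    return clusters

# two obvious groups of hypothetical "video search results"
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
groups = hierarchical_cluster(feats, n_clusters=2)
```

On a mobile screen, each resulting cluster could then be represented by a single adapted thumbnail, which is the browsing scenario the proposal targets.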

Principal Investigator: Jinqiao Wang

Ongoing

Hybrid Knowledge- and Data-Driven Probabilistic Graphical Models with Application to Activity Analysis

Substantial progress has been made in computer vision over the past decades, in particular through the application of statistical machine learning methods. However, mainstream data-driven approaches generalize poorly and become brittle when the training data is inadequate. Furthermore, current machine learning methods cannot easily exploit readily available prior knowledge, which is essential both to alleviate data scarcity and to regularize the ill-posed nature of many vision problems. In this proposal, a hybrid knowledge- and data-driven probabilistic graphical model is proposed. We will systematically identify and exploit prior knowledge from various sources and integrate it with the image training data. The knowledge will be encoded as prior models, constraints, or pseudo data, in order to restrict the hypothesis space and regularize otherwise ill-posed problems. As a result, we expect to obtain probabilistic graphical models that are less prone to overfitting, less dependent on image training data, more robust and accurate under realistic conditions, and readily generalizable to novel visual learning tasks. The method will be applied to human activity analysis to evaluate its effectiveness, and the robustness and generalization ability of the models will be studied in particular under varying quantities and qualities of training data.
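The core idea of blending prior knowledge with scarce data can be illustrated in its simplest form (a Beta prior on a Bernoulli event, far simpler than the project's graphical models): with few observations the MAP estimate stays near the prior, and with ample data it converges to the purely data-driven MLE.

```python
# Hedged illustration, not the project's model: a Beta(alpha, beta) prior
# regularizing a Bernoulli probability estimate. The prior encodes the
# "knowledge", the counts encode the "data".
def mle(successes, trials):
    return successes / trials

def map_estimate(successes, trials, alpha, beta):
    # mode of the Beta(alpha + s, beta + n - s) posterior
    return (successes + alpha - 1) / (trials + alpha + beta - 2)

# hypothetical prior knowledge: the event happens about 50% of the time
alpha, beta = 10, 10
small_map = map_estimate(2, 2, alpha, beta)       # 2/2 observed; MLE says 1.0
large_map = map_estimate(900, 1000, alpha, beta)  # ample data dominates prior
```

With 2-of-2 successes the MLE of 1.0 badly overfits, while the MAP estimate of 0.55 stays close to the prior; with 900-of-1000 the MAP estimate is essentially the empirical 0.9. The proposal applies this same principle at the scale of structured graphical models for activity analysis.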

Principal Investigator: Yifan Zhang

Ongoing