ACCV 2016 Tutorial, 2016.11.20 AM

Modern Local Image Descriptors: Hand-crafted vs. Learning-based Methods

Bin Fan1, Jiwen Lu2, Kwang Moo Yi3, and Pascal Fua3

1Institute of Automation, Chinese Academy of Sciences, China. Email: bfan@nlpr.ia.ac.cn

2Tsinghua University, China. Email: lujiwen@mail.tsinghua.edu.cn

3CVLab, EPFL, Switzerland. Email: pascal.fua@epfl.ch

Abstract:

       Local image descriptors have been developed for over two decades. Representative methods such as SIFT, SURF, and LBP have been widely used in various computer vision applications. Despite their popularity, these methods are either unsuitable or insufficiently robust for many new applications, which has motivated the development of modern local image descriptors. In this half-day tutorial, we will give an extensive introduction to the latest advances on this topic. In particular, after a brief introduction to local descriptors and a review of the classical methods, we will introduce modern approaches to local image description. We divide them into two categories: those proposed for high matching performance, and those proposed for high efficiency. For each category, we will introduce both hand-crafted and learning-based methods and discuss their advantages and disadvantages. We will also introduce benchmarks and software for performance evaluation, along with some typical applications.

       This tutorial requires only basic knowledge of image processing, computer vision, and machine learning. The lecture slides will be made available during the tutorial. The source code for most of the methods introduced will be released to the public.

The slides of the tutorial can be downloaded here.

Course description:

I.   Introduction and Overview of the Tutorial (10 minutes)

II.  A Brief Review of Classical Feature Descriptors (35 minutes)

        (a) Scale Invariant Feature Transform (SIFT)

        (b) Speeded Up Robust Features (SURF)

        (c) Extensions to SIFT

III. Modern Descriptors: Towards High Matching Performance (45 minutes)

        (a) Hand-crafted Feature Descriptors

        (b) Learned Feature Descriptors

IV. Modern Descriptors: Towards High Efficiency (45 minutes)

        (a) Hand-crafted Binary Descriptors

        (b) Learning Compact Binary Descriptors

V.  Evaluation and Applications (30 minutes)

        (a) Benchmarks and Software

        (b) Applications

VI. Open Questions and Discussion (15 minutes)

About the Lecturers:

Bin Fan received the B.Eng. degree from Beijing University of Chemical Technology in 2006, and the Ph.D. degree from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA) in 2011. Since receiving his doctoral degree, he has worked at the NLPR, CASIA, first as an Assistant Professor and now as an Associate Professor. He visited the Computer Vision Laboratory at EPFL twice as a visiting professor, from May to June 2014 and from March 2015 to March 2016. He is currently a member of the IEEE (Institute of Electrical and Electronics Engineers), serves on the editorial board of Neurocomputing (Elsevier), and was an Area Chair of WACV 2016. He also serves regularly as a program committee member for major vision conferences.

        His research interests focus on computer vision, specializing in local feature extraction, indexing, and matching. He has published one book on "Local Image Descriptor" (Springer) and over 20 papers in top journals, including IEEE Trans. on Pattern Analysis and Machine Intelligence, IEEE Trans. on Image Processing, and Pattern Recognition (Elsevier), and in leading international conferences, such as CVPR, ICCV, ECCV, and AAAI.

Webpage: http://www.nlpr.ia.ac.cn/fanbin

Jiwen Lu is an Associate Professor with the Department of Automation, Tsinghua University, China. From March 2011 to November 2015, he was a Research Scientist at the Advanced Digital Sciences Center (ADSC), Singapore. He received the B.Eng. degree in mechanical engineering and the M.Eng. degree in electrical engineering from the Xi'an University of Technology, Xi'an, China, and the Ph.D. degree in electrical engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

        His research interests include computer vision, pattern recognition, and machine learning. He has authored or co-authored over 130 scientific papers in these areas, of which 29 were published in IEEE Transactions journals (T-PAMI/IP/CSVT/IFS/MM/CYB) and 19 in top-tier computer vision conferences (ICCV/CVPR/ECCV). He is an elected member of the Information Forensics and Security Technical Committee of the IEEE Signal Processing Society. He serves as an Associate Editor of Pattern Recognition Letters, Neurocomputing, IEEE Access, and IEEE Biometrics Council Newsletters, and as a Guest Editor of Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing. He is or was an Area Chair for BTAS'16, ICB'16, WACV'16, ICME'15, and ICB'15, a Workshop Co-Chair for ACCV'16, and a Special Session Co-Chair for VCIP'15. He is a senior member of the IEEE.

Webpage: http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/

Pascal Fua received an engineering degree from Ecole Polytechnique, Paris, in 1984 and the Ph.D. degree in Computer Science from the University of Orsay in 1989. He then worked at SRI International and INRIA Sophia-Antipolis as a Computer Scientist. He joined EPFL in 1996 where he is now a Professor in the School of Computer and Communication Science and heads the Computer Vision Laboratory.

His research interests include shape modeling and motion recovery from images, analysis of microscopy images, and augmented reality. He has (co)authored over 300 publications in refereed journals and conferences. He is an IEEE Fellow and has been an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He often serves as a program committee member, area chair, or program chair of major vision conferences.

Webpage: http://cvlab.epfl.ch/~fua

Relevant References:

[1] David Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2): 91-110, 2004.

[2] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3): 346-359, 2008.

[3] Zhenhua Wang, Bin Fan, Gang Wang, and Fuchao Wu. Exploring local and overall ordinal information for robust feature description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[4] Bin Fan, Zhenhua Wang, and Fuchao Wu. Local image descriptor: modern approaches. Springer, 2015.

[5] Kun Ding, Chunlei Huo, Bin Fan, and Chunhong Pan. kNN hashing with factorized neighborhood representation. In IEEE ICCV, 2015.

[6] Bin Fan, Qingqun Kong, Tomasz Trzcinski, Zhiheng Wang, Chunhong Pan, and Pascal Fua. Receptive fields selection for binary feature description. IEEE Transactions on Image Processing, 23(6): 2583-2595, 2014.

[7] Zhenhua Wang, Bin Fan, and Fuchao Wu. Affine subspace representation for feature description. In ECCV, 2014.

[8] Bin Fan, Fuchao Wu, and Zhanyi Hu. Rotationally invariant descriptors using intensity order pooling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10): 2031-2045, 2012.

[9] Bin Fan, Fuchao Wu, and Zhanyi Hu. Aggregating gradient distributions into intensity orders: a novel local image descriptor. In IEEE CVPR, 2011.

[10] Zhenhua Wang, Bin Fan, and Fuchao Wu. Local intensity order pattern for feature description. In IEEE ICCV, 2011.

[11] Kevin Lin, Jiwen Lu, Chu-Song Chen, and Jie Zhou. Learning compact binary descriptors with unsupervised deep neural networks. In IEEE CVPR, 2016.

[12] Jiwen Lu, Venice Erin Liong, and Jie Zhou. Simultaneous local binary feature learning and encoding for face recognition. In IEEE ICCV, 2015.

[13] Venice Erin Liong, Jiwen Lu, Gang Wang, Pierre Moulin, and Jie Zhou. Deep hashing for compact binary codes learning. In IEEE CVPR, 2015.

[14] Jiwen Lu, Venice Erin Liong, Xiuzhuang Zhou, and Jie Zhou. Learning compact binary face descriptor for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10): 2041-2056, 2015.

[15] Jiwen Lu, Venice Erin Liong, and Jie Zhou. Cost-Sensitive local binary feature learning for facial age estimation. IEEE Transactions on Image Processing, 24(12): 5356-5368, 2015.

[16] Yi Jin, Jiwen Lu, and Qiuqi Ruan. Coupled discriminative feature learning for heterogeneous face recognition. IEEE Transactions on Information Forensics and Security, 10(3): 640-652, 2015.

[17] Shenghua Gao, Yuting Zhang, Kui Jia, Jiwen Lu, and Yingying Zhang. Single sample face recognition via learning deep supervised auto-encoders. IEEE Transactions on Information Forensics and Security, 10(10): 2108-2118, 2015.

[18] Kwang Moo Yi, Yannick Verdie, Pascal Fua, and Vincent Lepetit. Learning to assign orientations to feature points. In IEEE CVPR, 2016.

[19] Hani Altwaijry, Eduard Trulls, James Hays, Pascal Fua, and Serge Belongie. Learning to match aerial images with deep attentive architectures. In IEEE CVPR, 2016.

[20] Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, and Francesc Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In IEEE ICCV, 2015.

[21] Tomasz Trzcinski, Christos M. Christoudias, Pascal Fua, and Vincent Lepetit. Boosting binary keypoint descriptors. In IEEE CVPR, 2013.

[22] Christoph Strecha, Alexander M. Bronstein, Michael M. Bronstein, and Pascal Fua. LDAHash: improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1): 66-78, 2012.

[23] Michael Calonder, Vincent Lepetit, Mustafa Ozuysal, Tomasz Trzcinski, Christoph Strecha, and Pascal Fua. BRIEF: computing a local binary descriptor very fast. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1281-1298, 2012.

[24] Engin Tola, Vincent Lepetit, and Pascal Fua. DAISY: an efficient dense descriptor applied to wide baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5): 815-830, 2010.