CVPR 2017 Tutorial, 2017.7.21 PM
Local Feature Extraction and Learning for Computer Vision
Institute of Automation, Chinese Academy of Sciences, China.
Tsinghua University, China
CVLab, EPFL, Switzerland.
Local features, or local image descriptors, are at the core of many computer vision tasks. Classical methods such as SIFT, SURF, and LBP have been used extensively in various computer vision applications. Despite their popularity, these methods are either unsuitable or not robust enough for many new applications, which motivates continued innovation in this area. In this tutorial, we will give an extensive introduction to the latest advances on this topic. In particular, after a brief introduction to local feature descriptors and a review of the classical methods, we will introduce modern approaches to local image description. We divide them into two categories: those proposed for high matching performance and those proposed for high efficiency. For each category, we will introduce both hand-crafted and learning-based local descriptors and discuss their advantages and disadvantages. Finally, we will introduce typical computer vision applications based on local features.
This tutorial only requires basic knowledge of image processing, computer vision, and machine learning. The lecture slides will be made available during the tutorial. Source code for most of the methods introduced will be released to the public.
I. Introduction and Overview of the Tutorial (10 minutes)
Outline: This part will give a brief introduction to local image descriptors: what is a local image descriptor, why should we use it, and in which cases do we need it? This part also gives an overview of the tutorial, introducing what will be presented in the following parts.
II. A Brief Review of Classical Feature Descriptors (35 minutes)
a) Scale Invariant Feature Transform (SIFT)
b) Speeded Up Robust Features (SURF)
c) Extensions to SIFT
Outline: This part will give a brief review of several classical methods in local image description. These methods include the milestone works SIFT and SURF. It will also introduce some well-known extensions to these classical methods, such as GLOH, DAISY, and CS-LBP.
III. Modern Descriptors: Towards High Matching Performance (45 minutes)
a) Hand-crafted Feature Descriptors
b) Learned Feature Descriptors
Outline: This part introduces advances in designing local descriptors with high distinctiveness and robustness. There are two approaches: one designs hand-crafted local descriptors based on researchers' expertise, while the other relies on machine learning techniques and large-scale labeled sets of matching and non-matching pairs. For the hand-crafted feature descriptors, we will mainly introduce the intensity-order-based methods, which were proposed in recent years and have been reported to perform well. For the learned feature descriptors, after introducing the traditional descriptor learning methods, we will show how CNN-based methods can be used to learn good descriptors. Finally, we will summarize these methods and analyze their advantages and the situations in which they are suitable or have been successful.
IV. Modern Descriptors: Towards High Efficiency (45 minutes)
a) Hand-crafted Binary Descriptors
b) Learning Compact Binary Descriptors
Outline: This part introduces advances in designing local binary descriptors, a descriptor type that has emerged in recent years thanks to its high efficiency and low memory footprint. We will cover both hand-crafted binary descriptors and learned ones, and analyze their advantages and disadvantages.
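As a rough illustration of where the efficiency comes from, the sketch below matches binary descriptors by Hamming distance, which reduces to an XOR followed by a bit count. It is a minimal example under stated assumptions and not the implementation of any method covered in the tutorial: the 256-bit descriptor length, the random database, and the helper names `hamming_distance` and `nearest_neighbor` are all hypothetical. At that length a descriptor occupies 32 bytes, compared with 512 bytes for a 128-dimensional descriptor stored as 32-bit floats.

```python
import random


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the two descriptors disagree; count those bits.
    x = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(x).count("1")


def nearest_neighbor(query: bytes, database: list) -> tuple:
    """Brute-force search: return (index, distance) of the closest descriptor."""
    distances = [hamming_distance(query, d) for d in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]


# Hypothetical data: 1000 random 256-bit (32-byte) descriptors.
rng = random.Random(0)
database = [bytes(rng.getrandbits(8) for _ in range(32)) for _ in range(1000)]

# Build a query by flipping three bits of entry 42, imitating a noisy observation.
noisy = bytearray(database[42])
noisy[0] ^= 0b00000001
noisy[5] ^= 0b00010000
noisy[31] ^= 0b10000000
query = bytes(noisy)

index, distance = nearest_neighbor(query, database)
print(index, distance)
```

Practical systems usually replace the brute-force loop with hardware population-count instructions or hashing-based indexing, but the distance itself stays the same.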
V. Computer Vision Applications (30 minutes)
a) Structure from Motion
b) Visual SLAM
c) Image Classification
d) Image Retrieval
e) Visual Localization
f) Face Recognition
Outline: This part elaborates on several typical applications that rely on good local image descriptors. We describe the typical pipelines of these applications and discuss the challenges they face. We also show how descriptors play a fundamental role in these applications.
VI. Open Questions and Discussion (15 minutes)
Bin Fan received the B.Eng. degree from Beijing University of Chemical Technology in 2006, and the Ph.D. degree from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA) in 2011. Since receiving his doctoral degree, he has worked at the NLPR, CASIA, first as an Assistant Professor and now as an Associate Professor. From May to June 2014 and from March 2015 to March 2016, he visited the Computer Vision Laboratory at EPFL as a visiting professor. He is currently a senior member of IEEE, serves on the editorial board of Neurocomputing (Elsevier), and was an Area Chair of WACV 2016. He also serves regularly as a program committee member for major vision conferences.
His research interests focus on computer vision, specializing in local feature extraction, indexing, and matching. He has published one book on "Local Image Descriptor" (Springer) and over 30 journal and conference papers in top venues, including IEEE Transactions (TPAMI/TIP/TNNLS/TMM/TVCG/TGRS) and Pattern Recognition (Elsevier), and leading international conferences such as CVPR, ICCV, ECCV, and AAAI.
Jiwen Lu is currently an Associate Professor with the Department of Automation, Tsinghua University, Beijing, China. From 2011 to 2015, he was a Research Scientist with the Advanced Digital Sciences Center, Singapore.
His current research interests include computer vision, pattern recognition, and machine learning. He has authored or co-authored over 150 scientific papers in these areas, of which 40 are IEEE Transactions papers. He has served as Workshop Chair, Special Session Chair, or Area Chair for over ten international conferences. He was a recipient of the National 1000 Young Talents Plan Program in 2015. He serves as an Associate Editor of Pattern Recognition Letters, Neurocomputing, and IEEE Access, as a Guest Editor for special issues of five journals, including Pattern Recognition, Computer Vision and Image Understanding, and Image and Vision Computing, and as an Elected Member of the Information Forensics and Security Technical Committee of the IEEE Signal Processing Society.
Pascal Fua received an engineering degree from Ecole Polytechnique, Paris, in 1984 and the Ph.D. degree in Computer Science from the University of Orsay in 1989. He then worked at SRI International and INRIA Sophia-Antipolis as a Computer Scientist. He joined EPFL in 1996 where he is now a Professor in the School of Computer and Communication Science and heads the Computer Vision Laboratory.
His research interests include shape modeling and motion recovery from images, analysis of microscopy images, and augmented reality. He has (co)authored over 300 publications in refereed journals and conferences. He is an IEEE Fellow and has been an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He often serves as program committee member, area chair, and program chair of major vision conferences.
 Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. Computer Vision and Image Understanding, 110(3): 346-359, 2008. [OpenCV's implementation]
 Yurun Tian, Bin Fan, and Fuchao Wu. L2-Net: Deep learning of discriminative patch descriptor in Euclidean space. In IEEE CVPR 2017.
 Kun Ding, Bin Fan, Chunlei Huo, Shiming Xiang, and Chunhong Pan. Cross-modal hashing via rank-order preserving. IEEE Transactions on Multimedia, 2016, accepted.
 Kun Ding, Chunlei Huo, Bin Fan, Shiming Xiang, and Chunhong Pan. In defense of locality-sensitive hashing. IEEE Transactions on Neural Networks and Learning Systems, 2016, accepted.
 Zhenhua Wang, Bin Fan, Gang Wang, and Fuchao Wu. Exploring local and overall ordinal information for robust feature description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2198-2211, 2016. [code]
 Bin Fan, Zhenhua Wang, and Fuchao Wu. Local image descriptor: modern approaches. Springer, 2015.
 Kun Ding, Chunlei Huo, Bin Fan, and Chunhong Pan. kNN hashing with factorized neighborhood representation. In IEEE ICCV 2015.
 Bin Fan, Qingqun Kong, Tomasz Trzcinski, Zhiheng Wang, Chunhong Pan, and Pascal Fua. Receptive fields selection for binary feature description. IEEE Transactions on Image Processing, 23(6): 2583-2595, 2014. [code]
 Zhenhua Wang, Bin Fan, and Fuchao Wu. Affine subspace representation for feature description. In ECCV, 2014.
 Bin Fan, Fuchao Wu, and Zhanyi Hu. Rotationally invariant descriptors using intensity order pooling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10): 2031-2045, 2012. [Linux64 Binary][Win32 Binary]
 Bin Fan, Fuchao Wu, and Zhanyi Hu. Aggregating gradient distributions into intensity orders: a novel local image descriptor. In IEEE CVPR 2011. [code]
 Zhenhua Wang, Bin Fan, and Fuchao Wu. Local intensity order pattern for feature description. In IEEE ICCV, 2011. [code]
 Zhixiang Chen, Jiwen Lu, Jianjiang Feng, and Jie Zhou. Nonlinear structural hashing for scalable video search. IEEE Transactions on Circuits and Systems for Video Technology, 2017, accepted.
 Venice Erin Liong, Jiwen Lu, Yap-Peng Tan, and Jie Zhou. Deep video hashing. IEEE Transactions on Multimedia, 2017, accepted.
 Zhixiang Chen, Jiwen Lu, Jianjiang Feng, and Jie Zhou. Nonlinear discrete hashing. IEEE Transactions on Multimedia, 19(1): 123-135, 2017.
 Kevin Lin, Jiwen Lu, Chu-Song Chen, and Jie Zhou. Learning compact binary descriptors with unsupervised deep neural networks. In IEEE CVPR, 2016. [code]
 Jiwen Lu, Venice Erin Liong, and Jie Zhou. Simultaneous local binary feature learning and encoding for face recognition. In IEEE ICCV, 2015.
 Venice Erin Liong, Jiwen Lu, Gang Wang, Pierre Moulin, and Jie Zhou. Deep hashing for compact binary codes learning. In IEEE CVPR 2015.
 Jiwen Lu, Venice Erin Liong, Xiuzhuang Zhou, and Jie Zhou. Learning compact binary face descriptor for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10): 2041-2056, 2015.
 Jiwen Lu, Venice Erin Liong, and Jie Zhou. Cost-sensitive local binary feature learning for facial age estimation. IEEE Transactions on Image Processing, 24(12): 5356-5368, 2015.
 Jiwen Lu, Venice Erin Liong, Gang Wang, and Pierre Moulin. Joint feature learning for face recognition. IEEE Transactions on Information Forensics and Security, 10(7): 1371-1383, 2015.
 Yi Jin, Jiwen Lu, and Qiuqi Ruan. Coupled discriminative feature learning for heterogeneous face recognition. IEEE Transactions on Information Forensics and Security, 10(3): 640-652, 2015.
 Shenghua Gao, Yuting Zhang, Kui Jia, Jiwen Lu, and Yingying Zhang. Single sample face recognition via learning deep supervised auto-encoders. IEEE Transactions on Information Forensics and Security, 10(10): 2108-2118, 2015.
 Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. LIFT: Learned invariant feature transform. In ECCV 2016. [code]
 Kwang Moo Yi, Yannick Verdie, Pascal Fua, and Vincent Lepetit. Learning to assign orientations to feature points. In IEEE CVPR 2016. [code]
 Hani Altwaijry, Eduard Trulls, James Hays, Pascal Fua, and Serge Belongie. Learning to match aerial images with deep attentive architectures. In IEEE CVPR 2016.
 Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, and Francesc Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In IEEE ICCV, 2015. [code]
 Tomasz Trzcinski, Christos M. Christoudias, Pascal Fua, and Vincent Lepetit. Boosting binary keypoint descriptors. In IEEE CVPR 2013. [code]
 Christoph Strecha, Alexander M. Bronstein, Michael M. Bronstein, and Pascal Fua. LDAHash: improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34: 66-78, 2012. [code]
 Michael Calonder, Vincent Lepetit, Mustafa Ozuysal, Tomasz Trzcinski, Christoph Strecha, and Pascal Fua. BRIEF: computing a local binary descriptor very fast. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7): 1281-1298, 2012. [code]
 Engin Tola, Vincent Lepetit, and Pascal Fua. DAISY: an efficient dense descriptor applied to wide baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5): 815-830, 2010. [code]