Application of Deep Learning to 3D Object Reconstruction from a Single Image

Chen Jia; Zhang Yu-Qi; Song Peng; Wei Yan-Tao; Wang Yu

doi:10.16383/j.aas.2018.c180236

[1]

Rezende D J, Eslami S M A, Mohamed S, Battaglia P, Jaderberg M, Heess N. Unsupervised learning of 3D structure from images. In: Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016). New York, USA: Curran Associates, Inc., 2016. 4996-5004

[2]

Häming K, Peters G. The structure-from-motion reconstruction pipeline: a survey with focus on short image sequences. Kybernetika, 2010, 46(5):926-937 https://dml.cz/bitstream/handle/10338.dmlcz/141400/Kybernetika_46-2010-5_8.pdf

[3]

Lhuillier M, Quan L. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3):418-433 doi: 10.1109/TPAMI.2005.44

[4]

Habbecke M, Kobbelt L. A surface-growing approach to multi-view stereo reconstruction. In: Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN, USA: IEEE, 2007. 1-8

[5]

Oswald M R, Töppe E, Nieuwenhuis C, Cremers D. A review of geometry recovery from a single image focusing on curved object reconstruction. In: Innovations for Shape Analysis. Berlin, Germany: Springer-Verlag, 2013. 343-378

[6]

Yi L, Shao L, Savva M, Huang H B, Zhou Y, Wang Q R, et al. Large-scale 3D shape reconstruction and segmentation from ShapeNet Core55. arXiv preprint arXiv: 1710.06104, 2017.

[7]

Aspert N, Santa-Cruz D, Ebrahimi T. MESH: measuring errors between surfaces using the Hausdorff distance. In: Proceedings of the 2002 IEEE International Conference on Multimedia and Expo. Lausanne, Switzerland: IEEE, 2002. 705-708

[8]

Choy C B, Xu D F, Gwak J Y, Chen K, Savarese S. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 628-644

[9]

Fan H Q, Su H, Guibas L. A point set generation network for 3D object reconstruction from a single image. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, Hawaii, USA: IEEE, 2017. 2463-2471

[10]

Kemelmacher-Shlizerman I, Basri R. 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2):394-405 doi: 10.1109/TPAMI.2010.63

[11]

Wang H K, Stout D B, Chatziioannou A F. Mouse atlas registration with non-tomographic imaging modalities: a pilot study based on simulation. Molecular Imaging and Biology, 2012, 14(4):408-419 doi: 10.1007/s11307-011-0519-x

[12]

Dworzak J, Lamecker H, Von Berg J, Klinder T, Lorenz C, Kainmüller D, et al. 3D reconstruction of the human rib cage from 2D projection images using a statistical shape model. International Journal of Computer Assisted Radiology and Surgery, 2010, 5(2):111-124 doi: 10.1007/s11548-009-0390-2

[13]

Baka N, Kaptein B L, De Bruijne M, Van Walsum T, Giphart J E, Niessen W J, et al. 2D-3D shape reconstruction of the distal femur from stereo X-ray imaging using statistical shape models. Medical Image Analysis, 2011, 15(6):840-850 doi: 10.1016/j.media.2011.04.001

[14]

Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques. New York, USA: ACM Press, 1999. 187-194

[15]

Cashman T J, Fitzgibbon A W. What shape are dolphins? Building 3D morphable models from 2D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1):232-244 doi: 10.1109/TPAMI.2012.68

[16]

Bakshi S, Yang Y H. Shape from shading for non-Lambertian surfaces. In: Proceedings of the 1st International Conference on Image Processing. Austin, TX, USA: IEEE, 1994. 130-134

[17]

Ahmed A, Farag A. Shape from shading for hybrid surfaces. In: Proceedings of the 2007 IEEE International Conference on Image Processing. San Antonio, TX, USA: IEEE, 2007. II-525-II-528

[18]

Jin H L, Soatto S, Yezzi A J. Multi-view stereo reconstruction of dense shape and complex appearance. International Journal of Computer Vision, 2005, 63(3):175-189 doi: 10.1007/s11263-005-6876-7

[19]

Vicente S, Carreira J, Agapito L, Batista J. Reconstructing PASCAL VOC. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 41-48

[20]

Kar A, Tulsiani S, Carreira J, Malik J. Category-specific object reconstruction from a single image. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 1966-1974

[21]

Prasad M, Zisserman A, Fitzgibbon A W. Fast and controllable 3D modelling from silhouettes. In: Proceedings of the 2005 Eurographics. Hamburg, Federal Republic of Germany: Elsevier Science Publishing Company, 2005. 9-12

[22]

Ikeuchi K, Horn B K P. Numerical shape from shading and occluding boundaries. Artificial Intelligence, 1981, 17(1-3):141-184 doi: 10.1016/0004-3702(81)90023-0

[23]

Prasad M, Fitzgibbon A. Single view reconstruction of curved surfaces. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, NY, USA: IEEE, 2006. 1345-1354

[24]

Daum M, Dudek G. On 3-D surface reconstruction using shape from shadows. In: Proceedings of the 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Santa Barbara, CA, USA: IEEE, 1998. 461-468

[25]

Kato H, Ushiku Y, Harada T. Neural 3D mesh renderer. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 37-44

[26]

Rother D, Sapiro G. Seeing 3D objects in a single 2D image. In: Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE, 2009. 1819-1826

[27]

Nevatia R, Binford T O. Description and recognition of curved objects. Artificial Intelligence, 1977, 8(1):77-98 https://dl.acm.org/citation.cfm?id=3015410.3015415

[28]

Gupta A, Efros A A, Hebert M. Blocks world revisited: image understanding using qualitative geometry and mechanics. In: Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece: Springer-Verlag, 2010. 482-496

[29]

Xiao J X, Russell B C, Torralba A. Localizing 3D cuboids in single-view images. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates Inc., 2012. 746-754

[30]

Pentland A P. Perceptual organization and the representation of natural form. Artificial Intelligence, 1986, 28(3):293-331 doi: 10.1016/0004-3702(86)90052-4

[31]

Haag M, Nagel H H. Combination of edge element and optical flow estimates for 3D-model-based vehicle tracking in traffic image sequences. International Journal of Computer Vision, 1999, 35(3):295-319 doi: 10.1023/A:1008112528134

[32]

Koller D, Daniilidis K, Nagel H H. Model-based object tracking in monocular image sequences of road traffic scenes. International Journal of Computer Vision, 1993, 10(3):257-281 doi: 10.1007/BF01539538

[33]

Lim J J, Pirsiavash H, Torralba A. Parsing IKEA objects: fine pose estimation. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. Sydney, NSW, Australia: IEEE, 2013. 2992-2999

[34]

Satkin S, Rashid M, Lin J, Hebert M. 3DNN: 3D nearest neighbor. International Journal of Computer Vision, 2015, 111(1):69-97 doi: 10.1007/s11263-014-0734-4

[35]

Pepik B, Stark M, Gehler P, Ritschel T, Schiele B. 3D object class detection in the wild. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Boston, MA, USA: IEEE, 2015. 1-10

[36]

Huang Q X, Wang H, Koltun V. Single-view reconstruction via joint analysis of image and shape collections. ACM Transactions on Graphics (TOG), 2015, 34(4): Article No. 87

[37]

Liu F, Zeng D, Li J, Zhao Q J. Cascaded regressor based 3D face reconstruction from a single arbitrary view image [Online], available: https://arxiv.org/abs/1509.06161v1, March 25, 2019

[38]

Blanz V, Vetter T. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003, 25(9):1063-1074 doi: 10.1109/TPAMI.2003.1227983

[39]

Twarog N R, Tappen M F, Adelson E H. Playing with puffball: simple scale-invariant inflation for use in vision and graphics. In: Proceedings of the 2012 ACM Symposium on Applied Perception. Los Angeles, California, USA: ACM, 2012. 47-54

[40]

Aloimonos J. Shape from texture. Biological Cybernetics, 1988, 58(5):345-360 doi: 10.1007/BF00363944

[41]

Marinos C, Blake A. Shape from texture: the homogeneity hypothesis. In: Proceedings of the 3rd International Conference on Computer Vision. Osaka, Japan: IEEE, 1990. 350-353

[42]

Loh A M, Hartley R I. Shape from non-homogeneous, non-stationary, anisotropic, perspective texture. In: Proceedings of the 2005 British Machine Vision Conference. Oxford, UK: BMVC, 2005. 69-78

[43]

Horn B K P. Obtaining Shape from Shading Information. Cambridge: MIT Press, 1989. 123-171

[44]

Robles-Kelly A, Hancock E R. An eigenvector method for shape-from-shading. In: Proceedings of the 12th International Conference on Image Analysis and Processing. Mantova, Italy: IEEE, 2003. 474-479

[45]

Cheung W P, Lee C K, Li K C. Direct shape from shading with improved rate of convergence. Pattern Recognition, 1997, 30(3):353-365 doi: 10.1016/S0031-3203(96)00097-0

[46]

Yang L, Han J Q. 3D shape reconstruction of medical images using a perspective shape-from-shading method. Measurement Science and Technology, 2008, 19(6): Article No. 065502

[47]

Tankus A, Kiryati N. Photometric stereo under perspective projection. In: Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE, 2005. 611-616

[48]

Saxena A, Chung S H, Ng A Y. Learning depth from single monocular images. In: Proceedings of the 18th International Conference on Neural Information Processing Systems. Vancouver, British Columbia, Canada: MIT Press, 2005. 1161-1168

[49]

Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5):824-840 doi: 10.1109/TPAMI.2008.132

[50]

Delage E, Lee H, Ng A Y. A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, USA: IEEE, 2006. 2418-2428

[51]

Tulsiani S, Kar A, Carreira J, Malik J. Learning category-specific deformable 3D models for object reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4):719-731 doi: 10.1109/TPAMI.2016.2574713

[52]

Wang Wei, Gao Wei, Zhu Hai, Hu Zhan-Yi. Rapid and robust piecewise planar reconstruction of urban scenes. Acta Automatica Sinica, 2017, 43(4):674-684 http://www.aas.net.cn/CN/abstract/abstract19045.shtml

[53]

Miao Jun, Chu Jun, Zhang Gui-Mei, Wang Lu. Dense multi-planar scene reconstruction from sparse point cloud. Acta Automatica Sinica, 2015, 41(4):813-822 http://www.aas.net.cn/CN/abstract/abstract18655.shtml

[54]

Zhang Feng, Shi Li-Min, Sun Feng-Mei, Hu Zhan-Yi. An image based 3D reconstruction system for large indoor scenes. Acta Automatica Sinica, 2010, 36(5):625-633 http://www.aas.net.cn/CN/abstract/abstract13353.shtml

[55]

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553):436-444 doi: 10.1038/nature14539

[56]

Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088):533-536 doi: 10.1038/323533a0

[57]

Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786):504-507 doi: 10.1126/science.1127647

[58]

Jiao Li-Cheng, Yang Shu-Yuan, Liu Fang, Wang Shi-Gang, Feng Zhi-Xi. Seventy years beyond neural networks: retrospect and prospect. Chinese Journal of Computers, 2016, 39(8):1697-1716 http://d.old.wanfangdata.com.cn/Periodical/jsjxb201608015

[59]

Feng X, Zhang Y D, Glass J. Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Florence, Italy: IEEE, 2014. 1759-1763

[60]

Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada: IEEE, 2013. 6645-6649

[61]

Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland: ACM, 2008. 160-167

[62]

Huang E H, Socher R, Manning C D, Ng A Y. Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Jeju Island, Korea: Association for Computational Linguistics, 2012. 873-882

[63]

Mikolov T, Chen K, Corrado G S, Dean J. Efficient estimation of word representations in vector space [Online], available: http://www.oalib.com/paper/4057741, March 25, 2019

[64]

Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates Inc., 2012. 1097-1105

[65]

Le Q V. Building high-level features using large scale unsupervised learning. In: Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada: IEEE, 2013. 8595-8598

[66]

Socher R, Huval B, Bhat B, Manning C D, Ng A Y. Convolutional-recursive deep learning for 3D object classification. In: Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates Inc., 2012. 656-664

[67]

Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O, et al. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015. 1912-1920

[68]

Gupta S, Girshick R, Arbeláez P, Malik J. Learning rich features from RGB-D images for object detection and segmentation. In: Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer-Verlag, 2014. 345-360

[69]

Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7):1527-1554 doi: 10.1162/neco.2006.18.7.1527

[70]

Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Proceedings of the 19th International Conference on Neural Information Processing Systems. Canada: MIT Press, 2006. 153-160

[71]

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11):2278-2324 doi: 10.1109/5.726791

[72]

Williams R J, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1989, 1(2):270-280 doi: 10.1162/neco.1989.1.2.270

[73]

Girdhar R, Fouhey D F, Rodriguez M, Gupta A. Learning a predictable and generative vector representation for objects. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer-Verlag, 2016. 484-499

[74]

Kar A, Häne C, Malik J. Learning a multi-view stereo machine. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017). New York, USA: Curran Associates, Inc., 2017. 364-375

[75]

Wu J J, Wang Y F, Xue T F, Sun X Y, Freeman W T, Tenenbaum J B. MarrNet: 3D shape reconstruction via 2.5D sketches. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017). New York, USA: Curran Associates, Inc., 2017. 8-15

[76]

Kanazawa A, Jacobs D W, Chandraker M. WarpNet: weakly supervised matching for single-view reconstruction. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 3253-3261

[77]

Tulsiani S, Zhou T H, Efros A A, Malik J. Multi-view supervision for single-view reconstruction via differentiable ray consistency. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 209-217

[78]

Tulsiani S. Learning Single-view 3D Reconstruction of Objects and Scenes [Ph.D. dissertation], UC Berkeley, USA, 2018

[79]

Yan X C, Yang J M, Yumer E, Guo Y J, Lee H. Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016). New York, USA: Curran Associates, Inc., 2016. 1696-1704

[80]

Gwak J Y, Choy C B, Garg A, Chandraker M, Savarese S. Weakly supervised generative adversarial networks for 3D reconstruction. arXiv preprint arXiv: 1705.10904, 2017. 263-272

[81]

Rosca M, Lakshminarayanan B, Warde-Farley D, Mohamed S. Variational approaches for auto-encoding generative adversarial networks. arXiv preprint arXiv: 1706.04987, 2017.

[82]

Zhu R, Galoogahi H K, Wang C Y, Lucey S. Rethinking reprojection: closing the loop for pose-aware shape reconstruction from a single image. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 57-65

[83]

Liu J, Yu F, Funkhouser T. Interactive 3D modeling with a generative adversarial network. In: Proceedings of the 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE, 2018. 126-134

[84]

Wu J J, Zhang C K, Xue T F, Freeman W T, Tenenbaum J B. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016). New York, USA: Curran Associates, Inc., 2016. 82-90

[85]

Gadelha M, Maji S, Wang R. 3D shape induction from 2D views of multiple objects. In: Proceedings of the 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE, 2017. 402-411

[86]

Wang P S, Liu Y, Guo Y X, Sun C Y, Tong X. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics (TOG), 2017, 36(4): Article No. 72

[87]

Sun Y B, Liu Z W, Wang Y, Sarma S E. Im2Avatar: colorful 3D reconstruction from a single image [Online], available: https://arxiv.org/abs/1804.06375, March 25, 2019

[88]

Tatarchenko M, Dosovitskiy A, Brox T. Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 2107-2115

[89]

Riegler G, Ulusoy A O, Geiger A. OctNet: learning deep 3D representations at high resolutions. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 6620-6629

[90]

Häne C, Tulsiani S, Malik J. Hierarchical surface prediction for 3D object reconstruction. In: Proceedings of the 2017 International Conference on 3D Vision (3DV). Qingdao, China: IEEE, 2017. 76-84

[91]

Qi C R, Su H, Mo K, Guibas L J. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, Hawaii, USA: IEEE, 2017. 77-85

[92]

Qi C R, Yi L, Su H, Guibas L J. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017). New York, USA: Curran Associates, Inc., 2017. 5099-5108

[93]

Klokov R, Lempitsky V. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 863-872

[94]

Newell A, Yang K Y, Deng J. Stacked hourglass networks for human pose estimation. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer, 2016. 483-499

[95]

Lin C H, Kong C, Lucey S. Learning efficient point cloud generation for dense 3D object reconstruction. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, California, USA: AAAI, 2017. 3-11

[96]

Pontes J K, Kong C, Sridharan S, Lucey S, Eriksson A, Fookes C. Image2Mesh: a learning framework for single image 3D reconstruction [Online], available: https://arxiv.org/abs/1711.10669v1, March 25, 2019

[97]

Wang N Y, Zhang Y D, Li Z W, Fu Y W, Liu W, Jiang Y G. Pixel2Mesh: generating 3D mesh models from single RGB images [Online], available: https://arxiv.org/abs/1804.01654v1, March 25, 2019

[98]

Xiang Y, Mottaghi R, Savarese S. Beyond PASCAL: a benchmark for 3D object detection in the wild. In: Proceedings of the 2014 IEEE Winter Conference on Applications of Computer Vision. Steamboat Springs, CO, USA: IEEE, 2014. 75-82

[99]

Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2):303-338 doi: 10.1007/s11263-009-0275-4

[100]

Deng J, Dong W, Socher R, Li L J, Li K, Li F F. ImageNet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA: IEEE, 2009. 248-255

[101]

Chang A X, Funkhouser T, Guibas L, Hanrahan P, Huang Q X, Li Z M, et al. ShapeNet: an information-rich 3D model repository [Online], available: https://arxiv.org/abs/1512.03012v1, March 25, 2019

[102]

Miller G A. WordNet: a lexical database for English. Communications of the ACM, 1995, 38(11):39-41 doi: 10.1145/219717.219748

[103]

Song H O, Xiang Y, Jegelka S, Savarese S. Deep metric learning via lifted structured feature embedding. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 4004-4012

[104]

Shilane P, Min P, Kazhdan M, Funkhouser T. The Princeton shape benchmark. In: Proceedings of the 2004 Shape Modeling Applications. Genova, Italy: IEEE, 2004. 167-178