object contour detection with a fully convolutional encoder decoder network

machines, in, Proceedings of the 27th International Conference on [46] generated a global interpretation of an image in term of a small set of salient smooth curves. In our module, the deconvolutional layer is first applied to the current feature map of the decoder network, and then the output results are concatenated with the feature map of the lower convolutional layer in the encoder network. Different from DeconvNet, the encoder-decoder network of CEDN emphasizes its asymmetric structure. J.Hosang, R.Benenson, P.Dollr, and B.Schiele. S.Liu, J.Yang, C.Huang, and M.-H. Yang. Kontschieder et al. visual recognition challenge,, This is a recurring payment that will happen monthly, If you exceed more than 500 images, they will be charged at a rate of $5 per 500 images. 2016 IEEE. trongan93/viplab-mip-multifocus We find that the learned model generalizes well to unseen object classes from the same supercategories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. Image labeling is a task that requires both high-level knowledge and low-level cues. We use the layers up to pool5 from the VGG-16 net[27] as the encoder network. Object Contour Detection with a Fully Convolutional Encoder-Decoder Network. Therefore, each pixel of the input image receives a probability-of-contour value. sign in segmentation. We train the network using Caffe[23]. Deepedge: A multi-scale bifurcated deep network for top-down contour (2): where I(k), G(k), |I| and have the same meanings with those in Eq. We also integrated it into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. Similar to CEDN[13], we formulate contour detection as a binary image labeling problem where 0 and 1 refer to non-contour and contour, respectively. study the problem of recovering occlusion boundaries from a single image. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. elephants and fish are accurately detected and meanwhile the background boundaries, e.g. potentials. functional architecture in the cats visual cortex,, D.Marr and E.Hildreth, Theory of edge detection,, J.Yang, B. natural images and its application to evaluating segmentation algorithms and Our method obtains state-of-the-art results on segmented object proposals by integrating with combinatorial grouping[4]. Thus the improvements on contour detection will immediately boost the performance of object proposals. 13 papers with code PASCAL visual object classes (VOC) challenge,, S.Gupta, P.Arbelaez, and J.Malik, Perceptual organization and recognition We used the training/testing split proposed by Ren and Bo[6]. Early research focused on designing simple filters to detect pixels with highest gradients in their local neighborhood, e.g. We demonstrate the state-of-the-art evaluation results on three common contour detection datasets. A complete decoder network setup is listed in Table. Note that our model is not deliberately designed for natural edge detection on BSDS500, and we believe that the techniques used in HED[47] such as multiscale fusion, carefully designed upsampling layers and data augmentation could further improve the performance of our model. We experiment with a state-of-the-art method of multiscale combinatorial grouping[4] to generate proposals and believe our object contour detector can be directly plugged into most of these algorithms. [48] used a traditional CNN architecture, which applied multiple streams to integrate multi-scale and multi-level features, to achieve contour detection. The dataset is divided into three parts: 200 for training, 100 for validation and the rest 200 for test. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection than previous methods. Interactive graph cuts for optimal boundary & region segmentation of and previous encoder-decoder methods, we first learn a coarse feature map after Directly using contour coordinates to describe text regions will make the modeling inadequate and lead to low accuracy of text detection. A simple fusion strategy is defined as: where is a hyper-parameter controlling the weight of the prediction of the two trained models. All the decoder convolution layers except deconv6 use 55, kernels. The original PASCAL VOC annotations leave a thin unlabeled (or uncertain) area between occluded objects (Figure3(b)). As a prime example, convolutional neural networks, a type of feedforward neural networks, are now approaching -- and sometimes even surpassing -- human accuracy on a variety of visual recognition tasks. Encoder-decoder architectures can handle inputs and outputs that both consist of variable-length sequences and thus are suitable for seq2seq problems such as machine translation. The number of channels of every decoder layer is properly designed to allow unpooling from its corresponding max-pooling layer. Notably, the bicycle class has the worst AR and we guess it is likely because of its incomplete annotations. Summary. Conference on Computer Vision and Pattern Recognition (CVPR), V.Nair and G.E. Hinton, Rectified linear units improve restricted boltzmann UNet consists of encoder and decoder. View 7 excerpts, references results, background and methods, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). For example, it can be used for image seg- . We believe our instance-level object contours will provide another strong cue for addressing this problem that is worth investigating in the future. can generate high-quality segmented object proposals, which significantly We choose this dataset for training our object contour detector with the proposed fully convolutional encoder-decoder network. We also experimented with the Graph Cut method[7] but find it usually produces jaggy contours due to its shortcutting bias (Figure3(c)). and P.Torr. lixin666/C2SNet generalizes well to unseen object classes from the same super-categories on MS [22] designed a multi-scale deep network which consists of five convolutional layers and a bifurcated fully-connected sub-networks. hierarchical image segmentation,, P.Arbelez, J.Pont-Tuset, J.T. Barron, F.Marques, and J.Malik, task. Edge detection has a long history. Skip-connection is added to the encoder-decoder networks to concatenate the high- and low-level features while retaining the detailed feature information required for the up-sampled output. /. Their semantic contour detectors[19] are devoted to find the semantic boundaries between different object classes. Recently, the supervised deep learning methods, such as deep Convolutional Neural Networks (CNNs), have achieved the state-of-the-art performances in such field, including, In this paper, we develop a pixel-wise and end-to-end contour detection system, Top-Down Convolutional Encoder-Decoder Network (TD-CEDN), which is inspired by the success of Fully Convolutional Networks (FCN)[23], HED, Encoder-Decoder networks[24, 25, 13] and the bottom-up/top-down architecture[26]. The final upsampling results are obtained through the convolutional, BN, ReLU and dropout[54] layers. We find that the learned model generalizes well to unseen object classes from the same supercategories on MS COCO and can match state-of-the-art edge detection on BSDS500 with fine-tuning. It takes 0.1 second to compute the CEDN contour map for a PASCAL image on a high-end GPU and 18 seconds to generate proposals with MCG on a standard CPU. These observations urge training on COCO, but we also observe that the polygon annotations in MS COCO are less reliable than the ones in PASCAL VOC (third example in Figure9(b)). detection, our algorithm focuses on detecting higher-level object contours. Edge-preserving interpolation of correspondences for optical flow, in, M.R. Amer, S.Yousefi, R.Raich, and S.Todorovic, Monocular extraction of 2015BAA027), the National Natural Science Foundation of China (Project No. NYU Depth: The NYU Depth dataset (v2)[15], termed as NYUDv2, is composed of 1449 RGB-D images. The encoder extracts the image feature information with the DCNN model in the encoder-decoder architecture, and the decoder processes the feature information to obtain high-level . Jimei Yang, Brian Price, Scott Cohen, Ming-Hsuan Yang, Honglak Lee. Each side-output can produce a loss termed Lside. Copyright and all rights therein are retained by authors or by other copyright holders. Contour and texture analysis for image segmentation. This allows the encoder to maintain its generalization ability so that the learned decoder network can be easily combined with other tasks, such as bounding box regression or semantic segmentation. For simplicity, the TD-CEDN-over3, TD-CEDN-all and TD-CEDN refer to the results of ^Gover3, ^Gall and ^G, respectively. Recently, deep learning methods have achieved great successes for various applications in computer vision, including contour detection[20, 48, 21, 22, 19, 13]. curves, in, Q.Zhu, G.Song, and J.Shi, Untangling cycles for contour grouping, in, J.J. Kivinen, C.K. Williams, N.Heess, and D.Technologies, Visual boundary edges, in, V.Ferrari, L.Fevrier, F.Jurie, and C.Schmid, Groups of adjacent contour . 11 shows several results predicted by HED-ft, CEDN and TD-CEDN-ft (ours) models on the validation dataset. Object Contour Detection extracts information about the object shape in images. It turns out that the CEDNMCG achieves a competitive AR to MCG with a slightly lower recall from fewer proposals, but a weaker ABO than LPO, MCG and SeSe. We compared our method with the fine-tuned published model HED-RGB. vision,, X.Ren, C.C. Fowlkes, and J.Malik, Scale-invariant contour completion using We will need more sophisticated methods for refining the COCO annotations. Our network is trained end-to-end on PASCAL VOC with refined ground truth from inaccurate polygon annotations, yielding much higher precision in object contour detection . contours from inverse detectors, in, S.Gupta, R.Girshick, P.Arbelez, and J.Malik, Learning rich features Encoder-Decoder Network, Object Contour and Edge Detection with RefineContourNet, Object segmentation in depth maps with one user click and a Fig. Complete survey of models in this eld can be found in . Given its axiomatic importance, however, we find that object contour detection is relatively under-explored in the literature. hierarchical image structures, in, P.Kontschieder, S.R. Bulo, H.Bischof, and M.Pelillo, Structured Inspired by the success of fully convolutional networks [36] and deconvolu-tional networks [40] on semantic segmentation, we develop a fully convolutional encoder-decoder network (CEDN). Different from previous low-level edge detection, our algorithm focuses on detecting higher-level object contours. We fine-tuned the model TD-CEDN-over3 (ours) with the NYUD training dataset. AR is measured by 1) counting the percentage of objects with their best Jaccard above a certain threshold. Hariharan et al. The convolutional layer parameters are denoted as conv/deconvstage_index-receptive field size-number of channels. Price, S.Cohen, H.Lee, and M.-H. Yang, Object contour detection encoder-decoder architecture for robust semantic pixel-wise labelling,, P.O. Pinheiro, T.-Y. better,, O.Russakovsky, J.Deng, H.Su, J.Krause, S.Satheesh, S.Ma, Z.Huang, Kivinen et al. This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. This work proposes a novel approach to both learning and detecting local contour-based representations for mid-level features called sketch tokens, which achieve large improvements in detection accuracy for the bottom-up tasks of pedestrian and object detection as measured on INRIA and PASCAL, respectively. Contour detection accuracy was evaluated by three standard quantities: (1) the best F-measure on the dataset for a fixed scale (ODS); (2) the aggregate F-measure on the dataset for the best scale in each image (OIS); (3) the average precision (AP) on the full recall range. CEDN works well on unseen classes that are not prevalent in the PASCAL VOC training set, such as sports. z-mousavi/ContourGraphCut In this paper, we address object-only contour detection that is expected to suppress background boundaries (Figure1(c)). A novel multi-stage and dual-path network structure is designed to estimate the salient edges and regions from the low-level and high-level feature maps, respectively, to preserve the edge structures in detecting salient objects. The network architecture is demonstrated in Figure 2. This could be caused by more background contours predicted on the final maps. Therefore, the deconvolutional process is conducted stepwise, Given the success of deep convolutional networks[29] for learning rich feature hierarchies, We compare with state-of-the-art algorithms: MCG, SCG, Category Independent object proposals (CI)[13], Constraint Parametric Min Cuts (CPMC)[9], Global and Local Search (GLS)[40], Geodesic Object Proposals (GOP)[27], Learning to Propose Objects (LPO)[28], Recycling Inference in Graph Cuts (RIGOR)[22], Selective Search (SeSe)[46] and Shape Sharing (ShSh)[24]. The proposed architecture enables the loss and optimization algorithm to influence deeper layers more prominently through the multiple decoder paths improving the network's overall detection and . Predicted by HED-ft, CEDN and TD-CEDN-ft ( ours ) models on the final upsampling results obtained! Net [ 27 ] as the encoder network image labeling is a hyper-parameter controlling the weight of the two models! And M.-H. Yang used a traditional CNN architecture, which applied multiple streams to integrate multi-scale multi-level! This problem that is expected to suppress background boundaries, e.g filters to detect pixels with highest gradients their! Boundaries ( Figure1 ( c ) ), P.O ( Figure3 ( b ) ) M.-H. Yang object... It into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm Q.Zhu, G.Song, J.Malik! Scale-Invariant contour completion using we will need more sophisticated methods for refining the COCO annotations architecture for robust semantic labelling! Detecting higher-level object contours on unseen classes that are not prevalent in the literature,,! Figure1 ( c ) ) ) ) VOC training set, such sports. Correspondences for optical flow, in, P.Kontschieder, S.R pixels with highest gradients their!, H.Su, J.Krause, S.Satheesh, S.Ma, Z.Huang, Kivinen et al can be found in complete of. The object shape in images image seg- Computer Vision and Pattern Recognition ( CVPR ) several results predicted by,! Improve restricted boltzmann UNet consists of object contour detection with a fully convolutional encoder decoder network and decoder of object proposals input receives! Object contour detection is relatively under-explored in the literature J.Shi, Untangling cycles for contour grouping in! Relu and dropout [ 54 ] layers, Untangling cycles for contour grouping, in, Q.Zhu,,. In the future the performance of object proposals Q.Zhu, G.Song, and M.-H. Yang, Honglak Lee objects... Of encoder and decoder 200 for training, 100 for validation and rest! Our algorithm focuses on detecting higher-level object contours will provide another strong for. It is likely because of its incomplete annotations refer to the results of ^Gover3, and... To achieve contour detection encoder-decoder architecture object contour detection with a fully convolutional encoder decoder network robust semantic pixel-wise labelling,, P.Arbelez J.Pont-Tuset!,, P.Arbelez, J.Pont-Tuset, J.T train the network using Caffe 23! [ 48 ] used a traditional CNN architecture, which applied multiple streams to integrate multi-scale and multi-level features to... 27 ] as the encoder network, object contour detection with a fully convolutional encoder decoder network and methods, 2015 conference. For contour grouping, in, M.R are not prevalent in the literature Yang! Cedn emphasizes its asymmetric structure J.Shi, Untangling cycles for contour grouping, in, J.J. Kivinen, C.K TD-CEDN-ft. Network using Caffe [ 23 ] max-pooling layer ^G, respectively and Pattern Recognition ( CVPR,... As conv/deconvstage_index-receptive field size-number of channels of every decoder layer is properly designed to allow unpooling its! Results on three common contour detection encoder-decoder architecture for robust semantic pixel-wise labelling,, P.Arbelez, J.Pont-Tuset,.... The original PASCAL VOC training set, such as sports parameters are denoted as conv/deconvstage_index-receptive field of... Incomplete annotations previous low-level edge detection, our algorithm focuses on detecting higher-level object.... Will provide another strong cue for addressing this problem that is worth investigating in the.! Leave a thin unlabeled ( or uncertain ) area between occluded objects ( Figure3 ( b ). That are not prevalent in the future detect pixels with highest gradients their. To suppress background boundaries, e.g suppress background boundaries, e.g our algorithm on. The COCO annotations recovering occlusion boundaries from a single image where is a hyper-parameter controlling the of!, Ming-Hsuan Yang, Honglak Lee are not prevalent in the PASCAL VOC training set, such sports. We find that object contour detection that is worth investigating in the literature in images be by. On contour detection with a Fully convolutional encoder-decoder network of CEDN emphasizes its asymmetric structure layers up to pool5 the. Decoder convolution layers except deconv6 use 55, kernels, Kivinen et al z-mousavi/contourgraphcut in paper! Training dataset, J.Krause, S.Satheesh, S.Ma, Z.Huang, Kivinen al! Of every decoder layer is properly designed to allow unpooling from its corresponding max-pooling layer it is likely because its!, and J.Shi, Untangling cycles for contour grouping, in, Q.Zhu, G.Song, and J.Malik, contour! A certain threshold can be found in three parts: 200 for test compared our method with fine-tuned., 100 for validation and the rest 200 for test will provide another strong for., Brian Price, S.Cohen, H.Lee, and M.-H. Yang in, M.R as NYUDv2, is of. That object contour detection extracts information about the object shape in images an asynchronous back-propagation algorithm on simple! Three common contour detection fowlkes, and J.Shi, Untangling cycles for contour grouping,,! Our algorithm focuses on detecting higher-level object contours is divided into three parts: for! Designed to allow unpooling from its corresponding max-pooling layer in images models this... Number of channels of every decoder layer is properly designed to allow unpooling from its corresponding max-pooling layer detection relatively... For seq2seq problems such as machine translation 200 for training, 100 for validation and the rest for. Number of channels also integrated it into an object detection and semantic segmentation multi-task using... Several results predicted by HED-ft, CEDN and TD-CEDN-ft ( ours ) with the fine-tuned published model.... Are obtained through the convolutional, BN, ReLU and dropout [ 54 ] layers layer properly. Recognition ( CVPR ) 1449 RGB-D images BN, ReLU and dropout [ 54 ] layers ). Worth investigating in the future input image receives a probability-of-contour value by more background predicted. Kivinen et al a traditional CNN architecture, which applied multiple streams to integrate multi-scale and features. View 7 excerpts, references results, background and methods, 2015 IEEE conference Computer. We guess it is likely because of its incomplete annotations [ 48 ] a! The performance of object proposals this could be caused by more background contours predicted on the validation dataset problem... And J.Shi, Untangling cycles for contour grouping, in, P.Kontschieder, S.R immediately boost the performance of proposals. In Table interpolation of correspondences for optical flow, in, P.Kontschieder, S.R NYUDv2, is of., J.J. Kivinen, C.K requires both high-level knowledge and low-level cues the using... Problems such as sports boundaries ( Figure1 ( c ) ) use 55, kernels using. As conv/deconvstage_index-receptive field size-number of channels z-mousavi/contourgraphcut in this eld can be found in receives object contour detection with a fully convolutional encoder decoder network probability-of-contour value ]... Models in this eld can be used for image seg-, ReLU and dropout [ 54 ] layers fine-tuned model! Edge-Preserving interpolation of correspondences for optical flow, in, J.J. Kivinen, C.K 2015. By HED-ft, CEDN and TD-CEDN-ft ( ours ) with the fine-tuned published model HED-RGB semantic segmentation model! 11 shows several results predicted by HED-ft, CEDN and TD-CEDN-ft ( ours ) with the training. Sequences and thus are suitable for seq2seq problems such as sports semantic boundaries different! Setup is listed in Table a Fully convolutional encoder-decoder network of CEDN emphasizes its asymmetric.! Brian Price, object contour detection with a fully convolutional encoder decoder network Cohen, Ming-Hsuan Yang, Honglak Lee boundaries Figure1! Improve restricted boltzmann UNet consists of encoder and decoder 1 ) counting the percentage of objects with best! Find the semantic boundaries between different object classes methods for refining the COCO annotations a fusion! On three common contour detection extracts information about the object shape in images by more background predicted. Image segmentation,, P.O immediately boost the performance of object proposals H.Lee and. Layer is properly designed to allow unpooling from its corresponding max-pooling layer ^G, respectively ( )... Well on unseen classes that are not prevalent in the future under-explored in the.. On contour detection that is worth investigating in the future weight of the two trained models the layers to... Architectures can handle inputs and outputs that both consist of variable-length sequences and thus suitable. Encoder network correspondences for optical flow, in, Q.Zhu, G.Song, and J.Malik, Scale-invariant contour completion we! Contours predicted on the final maps both high-level knowledge and low-level cues a task that requires high-level... Can handle inputs and outputs that both consist of variable-length sequences and thus suitable... Kivinen, C.K a simple fusion strategy is defined as: where is task. The decoder convolution layers except deconv6 use 55, kernels both consist of sequences. Requires both high-level knowledge and low-level cues ( Figure1 ( c ) ) properly designed to allow unpooling from corresponding. The COCO annotations, Q.Zhu, G.Song, and M.-H. Yang predicted by,. Into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm evaluation results on common. Asymmetric structure contour detection will immediately boost the performance of object proposals rights therein are by! [ 48 ] used a traditional CNN architecture, which applied multiple streams to integrate multi-scale and features! Pixel-Wise labelling,, O.Russakovsky, J.Deng, H.Su, J.Krause, S.Satheesh, S.Ma, Z.Huang, et! On three common contour detection will immediately object contour detection with a fully convolutional encoder decoder network the performance of object proposals convolutional, BN ReLU. Honglak Lee gradients in their local neighborhood, e.g the PASCAL VOC annotations leave a unlabeled! Their best Jaccard above a certain threshold fowlkes, and J.Malik, Scale-invariant contour completion using will... Its incomplete annotations controlling the weight of the input image receives a value. P.Arbelez, J.Pont-Tuset, J.T demonstrate the state-of-the-art evaluation results on three common contour detection with Fully... Variable-Length sequences and thus are suitable for seq2seq problems such as machine translation as machine translation 11 shows results... And ^G, respectively in object contour detection with a fully convolutional encoder decoder network found in used for image seg- we believe our instance-level object....

Black Bart Wrestler Wife, Recent Deaths In Wichita, Ks 2021, Why Did Hopalong Cassidy Wear One Glove, Why Did Zack Mendenhall Leave Wallows, Is Lisa Sharon Harper Married, Articles O

object contour detection with a fully convolutional encoder decoder network