While endoscopy is routinely used for surveillance, its high operator dependence demands robust automated image analysis methods. Automated segmentation of regions of interest (ROIs), including lesions, inflammation, and instruments, can help mitigate this operator dependence. Most supervised methods are developed by fitting models to the available ground truth mask samples only. This work proposes a joint training approach that couples a UNet with a variational auto-encoder (VAE) to improve endoscopic image segmentation by exploiting original samples, predicted masks, and ground truth masks. In the proposed UNet-eVAE, the VAE utilises the masks to constrain ROI-specific feature representations, with reconstruction as an auxiliary task. The fine-grained spatial information from the VAE is fused with the UNet decoder to enrich the feature representations and improve segmentation performance. Our experimental results on both colonoscopy and ureteroscopy datasets demonstrate that the proposed architecture can learn robust representations and generalise segmentation performance to unseen samples while improving on the baseline.
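The joint training idea described above can be illustrated with a combined objective: a segmentation term plus an auxiliary VAE term. The following is a minimal sketch only, not the paper's loss. It assumes the VAE reconstructs the mask, that both terms use binary cross-entropy, and that the VAE contributes a standard KL regulariser. The names `joint_loss`, `aux_weight`, and `kl_weight` are hypothetical, and the fusion of VAE features into the UNet decoder is not modelled here.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a binary mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def joint_loss(seg_pred, recon_pred, gt_mask, mu, logvar,
               aux_weight=0.5, kl_weight=1e-3):
    """Segmentation loss plus a weighted auxiliary VAE loss (assumed form).

    seg_pred:   flattened per-pixel probabilities from the segmentation branch
    recon_pred: flattened per-pixel probabilities from the VAE branch
    gt_mask:    flattened binary ground truth mask
    mu, logvar: parameters of the VAE latent posterior
    """
    seg = bce(seg_pred, gt_mask)          # main segmentation term
    recon = bce(recon_pred, gt_mask)      # auxiliary reconstruction term
    kl = kl_to_standard_normal(mu, logvar)
    return seg + aux_weight * (recon + kl_weight * kl)
```

Setting `aux_weight` to zero reduces this to plain supervised segmentation on ground truth masks, which is the setting the abstract contrasts against. In a real model the same scalar would be computed with a tensor library and backpropagated through both branches.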

Conference paper

Volume: 13583 LNCS
Pages: 161-170