Acquiring accurate real-world monocular depth data in surgery is often infeasible. Widely used synthetic datasets can provide accurate ground-truth labels; however, they do not reflect the variability of real-world surgery. To address this limitation, we aim to leverage high-fidelity synthetic depth data and transfer this understanding to diverse surgical datasets. To achieve this, we introduce a novel, efficient teacher-student architecture, namely 'PatchSurg'. PatchSurg exploits the structural details in synthetic datasets and transfers them to real-world cases by mitigating the domain gap between synthetic and real-world data using a detail and scale disentangling technique. Furthermore, we utilise a pose prediction network that processes temporally adjacent frames to enhance temporal consistency in depth estimation. In terms of root mean squared error (RMSE), PatchSurg achieves a 20.5% improvement over recent approaches on the synthetic SimuScope dataset and substantial gains over existing state-of-the-art methods on real surgical datasets: 9% on EndoNeRF, 21.4% on SCARED, and 26.4% on SERV-CT, each relative to the most accurate competing method.
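The abstract reports its gains as relative reductions in root mean squared error (RMSE). As a minimal sketch, assuming the standard per-pixel RMSE over pixels with valid ground truth and assuming the percentages are computed as (baseline RMSE - new RMSE) / baseline RMSE, the two quantities could be computed as follows. The function names `rmse` and `relative_improvement`, and the convention that a ground-truth depth of zero marks a missing pixel, are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Optional

import numpy as np


def rmse(pred: np.ndarray, gt: np.ndarray, valid: Optional[np.ndarray] = None) -> float:
    """Root mean squared error between predicted and ground-truth depth maps."""
    if valid is None:
        # Assumption (hypothetical convention): zero depth marks missing ground truth.
        valid = gt > 0
    # Compare only pixels where ground truth is available.
    diff = pred[valid] - gt[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_improvement(rmse_new: float, rmse_baseline: float) -> float:
    """Percentage reduction in RMSE of a new method relative to a baseline."""
    return 100.0 * (rmse_baseline - rmse_new) / rmse_baseline
```

With purely illustrative numbers, a baseline RMSE of 5.0 reduced to 3.975 corresponds to a 20.5% relative improvement; the figures reported above depend on each dataset's own units and evaluation protocol.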


DOI

10.1109/ISBI61048.2026.11516012

Type

Conference paper

Publication Date

2026-01-01T00:00:00+00:00

Volume

2026-April