PatchSurg: Leveraging Synthetic Datasets for High-Fidelity Depth Estimation in Surgery
Xu Z., Zhang C., Rittscher J., Ali S.
Acquiring accurate real-world monocular depth data in surgery is often infeasible. Widely used synthetic datasets can provide accurate ground-truth labels, but they do not reflect the variability of real-world surgery. To address this limitation, we aim to leverage high-fidelity synthetic depth data and transfer this understanding to diverse surgical datasets. To achieve this, we introduce a novel, efficient teacher-student architecture named 'PatchSurg'. PatchSurg exploits the structural details in synthetic datasets and transfers them to real-world cases by mitigating the domain gap between synthetic and real-world data, using a detail and scale disentangling technique. Furthermore, we utilise a pose prediction network that processes temporally adjacent frames to enhance temporal consistency in depth estimation. In terms of root mean squared error, PatchSurg achieves a 20.5% improvement over recent approaches on the synthetic SimuScope dataset and substantial gains over existing state-of-the-art methods on real surgical datasets, including 9% on EndoNeRF, 21.4% on SCARED, and 26.4% on SERV-CT, compared to the most accurate method.
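For readers unfamiliar with the reported metric, the following is a minimal sketch of how root mean squared error and a relative improvement percentage can be computed. It assumes the percentages above denote a relative reduction in RMSE against a baseline; the abstract does not state the exact formula. The function names and the toy depth values are hypothetical and are not taken from the paper.

```python
import math


def rmse(pred, gt):
    """Root mean squared error over paired depth values (same units)."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    squared_errors = [(p - g) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, method_rmse):
    """Percentage reduction in RMSE relative to a baseline (lower is better).

    Assumption: improvement is measured as (baseline - method) / baseline.
    """
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return 100.0 * (baseline_rmse - method_rmse) / baseline_rmse


# Hypothetical toy depths, for illustration only.
gt = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [12.0, 18.0, 33.0, 37.0]
method_pred = [11.0, 19.0, 31.0, 39.0]

baseline_err = rmse(baseline_pred, gt)
method_err = rmse(method_pred, gt)
print(f"baseline RMSE: {baseline_err:.3f}")
print(f"method RMSE:   {method_err:.3f}")
print(f"improvement:   {relative_improvement(baseline_err, method_err):.1f}%")
```

In practice depth maps are dense arrays and evaluation is typically restricted to pixels with valid ground truth; the flat lists here only keep the sketch dependency-free.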