For clinical combined optoacoustic (OA) and ultrasound (US) imaging, maximum flexibility is achieved when the irradiation optics are integrated with the ultrasound detector in a handheld probe. The drawback of this epi-illumination geometry is that the high laser intensity near the detection aperture generates strong background signals that clutter the OA image and thus limit imaging depth (see Figure, phantom study). Our niche is the development of techniques that reduce this clutter, with the goal of enabling deep OA imaging. One of these techniques is localized vibration tagging (LOVIT): a long-pulsed ultrasound beam generates an acoustic radiation force (ARF) that induces localized tissue displacement at its focus. Subtracting OA images acquired before and after the ARF push preserves the true OA signal in the displacement focus while eliminating the clutter background (see Figure).
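
The subtraction step can be illustrated with a minimal one-dimensional toy sketch in Python. It is not the actual processing chain: the function `lovit_difference`, the bipolar absorber pulse, the noise-like clutter model, and the displacement of 2 samples are all hypothetical choices made for illustration. The sketch assumes that the clutter is identical in the pre-push and post-push acquisitions, whereas the signal from an absorber in the ARF focus is shifted by the induced displacement.

```python
import numpy as np

def lovit_difference(pre_push, post_push):
    """Subtract the post-push OA line from the pre-push OA line (illustrative helper)."""
    return np.asarray(pre_push, dtype=float) - np.asarray(post_push, dtype=float)

# Depth axis in samples (hypothetical sampling, no physical units implied).
z = np.arange(200)

# Assumed clutter model: a static background that is the same in both acquisitions.
rng = np.random.default_rng(0)
clutter = rng.normal(0.0, 1.0, z.size)

def absorber(center, width=3.0):
    """Bipolar pulse (derivative of a Gaussian) standing in for a true OA signal."""
    return -(z - center) * np.exp(-0.5 * ((z - center) / width) ** 2)

focus_center = 120  # assumed absorber position inside the ARF focus
shift = 2           # assumed displacement caused by the ARF push, in samples

pre = clutter + absorber(focus_center)           # acquisition before the push
post = clutter + absorber(focus_center + shift)  # acquisition after the push

diff = lovit_difference(pre, post)

# Residual outside the focus (clutter only) versus inside the focus (displaced signal).
residual_outside = np.max(np.abs(diff[:80]))
residual_inside = np.max(np.abs(diff[110:135]))
print(f"outside focus: {residual_outside:.2e}, inside focus: {residual_inside:.2f}")
```

In this idealized setting the clutter cancels almost exactly, and only the displaced signal at the focus survives the subtraction. Real data would additionally contain noise and motion between the two acquisitions, which this sketch ignores.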