Zihou Ng

A common visual convention in computer vision is the overlay of geometric annotations on images, such as bounding boxes for object detection or keypoints for human pose estimation. These annotations reflect how machine learning models operate: they extract, compress, and reduce information from unstructured raw data. ML-related artworks often embrace this convention, overlaying geometric annotations on images or videos to subtly imply the presence of an underlying machine learning model.

In this work, we explore the inverse perspective: the annotations are treated not as an added layer but as the underlying essence. Specifically, we focus on human pose estimation data, taking joint positions from the AMASS dataset [1] and applying inverse rendering [2] to partially reconstruct the human body from them. Through these images, we invite viewers to reconsider the reductive nature of machine learning models and to reflect on the origins of the data they process.
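To make the data side of this concrete, the sketch below shows one plausible way to go from an AMASS sequence to the sparse joint positions such a work would start from. This is not the artist's actual pipeline: the file name, the models directory, the chosen gender, and the use of the smplx body-model package are assumptions for illustration, while the .npz fields ('poses', 'betas', 'trans') follow AMASS's documented format.

```python
# Minimal sketch (assumptions noted above): load one AMASS motion sequence
# and recover sparse 3D joint positions from its SMPL-H parameters.
import numpy as np
import torch
import smplx  # assumed dependency: pip install smplx, plus downloaded SMPL-H model files

data = np.load("Vasso_Bachata_01_poses.npz")                 # hypothetical sequence file
poses = torch.from_numpy(data["poses"]).float()              # (frames, 156) axis-angle pose parameters
betas = torch.from_numpy(data["betas"][:10]).float()[None]   # body shape coefficients
trans = torch.from_numpy(data["trans"]).float()              # (frames, 3) root translation

body = smplx.create(model_path="models", model_type="smplh",
                    gender="female",   # would normally come from the sequence's 'gender' field
                    use_pca=False)

f = 0  # pose the body at a single frame
out = body(
    betas=betas,
    global_orient=poses[f:f + 1, :3],
    body_pose=poses[f:f + 1, 3:66],
    left_hand_pose=poses[f:f + 1, 66:111],
    right_hand_pose=poses[f:f + 1, 111:156],
    transl=trans[f:f + 1],
)
joints = out.joints.detach().numpy()[0]  # (J, 3): the sparse "annotation"
# The inverse-rendering step (a differentiable SDF renderer in the spirit of [2])
# would then attempt to grow a plausible body surface back out of these joints.
```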

AnnaCortesi_RandB_C3D_posed.jpg

Vasso_Bachata_01_posed.jpg

Vasso_Salsa_Shines_01_posed.jpg

StefanosKoullapis_Bachata_C3D_posed.jpg

CLIO_Dimitroula_1_posed.jpg

Clio_Flamenco_C3D_posed.jpg

CLIO_Kolo_posed.jpg

Fanie_Zumba_C3D_posed.jpg

Vasso_Reggaeton_01_posed.jpg

StefanosKoullapis_Reggaeton_C3D_posed.jpg

Bibliography

[1] Mahmood, N., Ghorbani, N., Troje, N., Pons-Moll, G., & Black, M. (2019). AMASS: Archive of Motion Capture as Surface Shapes. In International Conference on Computer Vision (pp. 5442–5451).

[2] Vicini, D., Speierer, S., & Jakob, W. (2022). Differentiable Signed Distance Function Rendering. ACM Transactions on Graphics (Proceedings of SIGGRAPH), 41(4), 125:1–125:18.