A 3D scan of a person’s face using newly developed algorithms that can turn 2D images from a mobile camera into a 3D model of the face. (Image: Glasses.com)

Smart Vision

Semantic 3D face sensing using webcams.

  • 7 October 2013

Our research capabilities

People find visual perception easy; machines, by contrast, perceive only a spatial array of digitally sampled light-intensity measurements. Our Smart Vision research efficiently, accurately and densely locates points of interest on any subject’s face using only 2D images and video. This capability leans heavily on the group’s seminal work in semantic 3D face sensing using webcams.

Based on commercial and academic investigation, there is no solution on the market that offers the reliability and fidelity our face-tracking algorithm delivers.

We use a principled optimisation strategy that allows for efficient yet accurate facial feature point location. Our approach can track up to 66 points on a subject’s face in 3D faster than real time on a modern CPU, that is, at more than 30 frames per second.
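
To make the idea of a principled optimisation concrete, here is a minimal sketch assuming a linear 3D shape model and a weak-perspective camera; neither is confirmed by this article as CSIRO’s actual formulation. Under those assumptions, landmark fitting reduces to a least-squares solve for the shape coefficients. The 66-point count comes from the text; the basis, mode count and camera model are illustrative.

```python
import numpy as np

# Illustrative sketch only: fit a linear 3D shape model to observed 2D
# landmarks by least squares. The basis, mode count and weak-perspective
# camera are assumptions, not CSIRO's published formulation.

N_POINTS = 66      # landmarks tracked per frame (from the text)
N_MODES = 10       # number of shape-deformation modes (assumed)

rng = np.random.default_rng(0)
mean_shape = rng.standard_normal((N_POINTS, 3))      # mean 3D face
basis = rng.standard_normal((N_MODES, N_POINTS, 3))  # deformation modes

def project(shape_3d):
    """Weak-perspective projection: keep x and y, drop depth."""
    return shape_3d[:, :2]

def fit_shape(observed_2d):
    """Solve for shape coefficients in one least-squares step; exact
    here because the model is linear in the coefficients."""
    jacobian = basis[:, :, :2].reshape(N_MODES, -1).T   # (132, 10)
    residual = (observed_2d - project(mean_shape)).ravel()
    coeffs, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
    return coeffs

# Round-trip check on synthetic data.
true_coeffs = 0.1 * rng.standard_normal(N_MODES)
shape = mean_shape + np.tensordot(true_coeffs, basis, axes=1)
print(np.allclose(fit_shape(project(shape)), true_coeffs))  # True
```

In a real tracker the projection is nonlinear (head rotation, perspective), so this solve would become one Gauss-Newton iteration inside a per-frame loop.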

Our technology

One of our scientific strengths is semantic 3D face sensing using cheap, ubiquitous digital cameras, such as those found in laptops and mobile phones. This involves face tracking and avatar rendering.

(i) Face tracking: The Smart Vision research group has recently had great success in developing pioneering algorithms that take the 2D pixels of a human face captured by a normal webcam or tablet camera and turn that camera into a real-time semantic 3D face-sensing device. The technology is semantic in the sense that it interprets an array of 2D pixels as meaningful locations on the face, such as the corner of the left eye or the tip of the nose.
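
To make the “semantic” point concrete, here is a minimal sketch of what the output of such a system could look like: tracked 3D points paired with named facial locations. The index-to-name assignments are invented for illustration; the article does not specify CSIRO’s numbering.

```python
import numpy as np

# Hypothetical index-to-name mapping: "semantic" means each tracked
# point is a named facial location, not just a pixel coordinate.
# The indices below are invented for illustration.
SEMANTIC_LANDMARKS = {
    0: "left eye outer corner",
    1: "left eye inner corner",
    2: "nose tip",
    3: "left mouth corner",
}

def describe(points_3d):
    """Map a (66, 3) array of tracked 3D points to named locations."""
    return {name: points_3d[idx] for idx, name in SEMANTIC_LANDMARKS.items()}

points = np.zeros((66, 3))            # stand-in for one frame of tracker output
print(describe(points)["nose tip"])   # -> [0. 0. 0.]
```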

Watch the video on YouTube: glasses.com App with 3Dfit Technology [external link]

(ii) Avatar rendering: A key and novel aspect of our technology centres on how we employ the tracked 3D face points. Building on our face-tracking capability, the research team has developed a real-time system that transfers the expression of a user in front of a webcam to an avatar, where the avatar has been created from a single still image of the desired target.
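
The article does not detail the transfer step itself, but a minimal way to frame it, purely as an assumption on our part, is an additive expression model: subtract the user’s neutral landmarks from the current frame and add the offset to the avatar’s neutral landmarks.

```python
import numpy as np

# Minimal additive expression-transfer sketch (an assumed model, not
# necessarily CSIRO's): expression = current pose minus neutral pose,
# added to the avatar that was built from a single still image.

def transfer_expression(user_neutral, user_current, avatar_neutral):
    """All inputs are (66, 3) landmark arrays in a common aligned frame;
    returns the avatar's landmarks carrying the user's expression."""
    expression_offset = user_current - user_neutral
    return avatar_neutral + expression_offset

# Per-frame usage: track the user, deform the avatar, then render it.
user_neutral = np.zeros((66, 3))
avatar_neutral = np.ones((66, 3))
frame_points = user_neutral.copy()
frame_points[2, 1] += 0.01            # e.g. one landmark moves slightly
avatar_now = transfer_expression(user_neutral, frame_points, avatar_neutral)
```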

Watch the video on YouTube: CI2CV group at CSIRO's Avatar Capability (real-time) [external link]

Our facial expression transfer system runs on a standard Intel-based desktop or laptop as well as on the Apple iPad. The system is considered world class: other systems on the market animate cartoon avatars, whereas ours animates avatars of real people.

Key applications

(i) Commerce: Our technology is used by glasses.com to let customers virtually try on glasses, producing photorealistic images of the user wearing them. By simply rotating their head from left to right in front of a digital device with a cheap camera, a user enables our prototype to capture their face on a tablet and generate a dense semantic 3D model of around 10,000 vertices. Semantics are again important in this application, as the algorithm needs to know where the nose, brow and ears are to within millimetre accuracy.
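
The article does not say how the head rotation is used internally, but the geometric intuition is standard multi-view reconstruction: seeing the same facial point from two head poses constrains its depth. The sketch below shows classical linear (DLT) triangulation of one point from two views; the camera matrices are synthetic placeholders, not part of CSIRO’s pipeline.

```python
import numpy as np

# Classical two-view linear triangulation (DLT) of a single point,
# shown to illustrate why turning the head helps build a 3D model;
# this is not a description of the actual pipeline.

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 camera projection matrices for two head poses.
    x1, x2: the same facial point's 2D coordinates in each view.
    Returns the 3D point that reprojects into both views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]               # homogeneous -> Euclidean

# Quick check with two synthetic cameras a small baseline apart.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])
X_true = np.array([0.02, -0.01, 1.0, 1.0])      # a point at depth 1
x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
print(np.allclose(triangulate(P1, P2, x1, x2), X_true[:3]))  # True
```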

(ii) Communication: The eyes, and more specifically gaze, are an important signal for social communication; even at the earliest stage of an interaction, the initiation of contact, gaze plays a crucial role. Traditional paradigms for video-conferencing are poor at maintaining this important social signal because of the physical misalignment between the position of the camera and the rendered speaker on the screen. An obvious avenue for overcoming this problem is a “virtual” alignment of the rendered speaker and the camera through specialised hardware and software, so the listener has the illusion of “eye-to-eye” contact. Our same 3D face-sensing capability solves this problem by transferring viewpoint instead of expression.
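
As a sketch of what “transferring viewpoint” could mean geometrically, assuming a deliberately minimal model rather than the published method: rotate the sensed 3D face by the angular offset between the physical camera and the on-screen window, then re-render the result.

```python
import numpy as np

# Minimal viewpoint-transfer sketch for gaze correction (an assumed
# model, not the published method): rotate the sensed 3D face by the
# angle between the physical camera and the on-screen window.

def rotation_y(theta):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def correct_viewpoint(points_3d, camera_offset_rad):
    """Re-pose the (66, 3) tracked face as if the camera sat directly
    behind the remote speaker's window on the screen."""
    return points_3d @ rotation_y(camera_offset_rad).T

face = np.random.default_rng(1).standard_normal((66, 3))
rendered_pose = correct_viewpoint(face, np.deg2rad(10.0))  # 10 degree offset
```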

(iii) Entertainment: The online gaming industry is growing rapidly. The expression transfer technology that the Computer Vision group is developing for the CyWee Group has the potential to tap into this vast, lucrative market by allowing users to communicate with one another as a “character” rather than as themselves.

Find out more about Autonomous systems.

3Dfit technology video
Video showing how 3Dfit technology works.

Transcript

[Image of a man’s face appears, text appears: User moves head left to right]

[The man’s face turns from left to right, a blue computer generated mask appears over the man’s face and moves left to right, text appears: CSIRO Algorithm creates 3D model]

[Image changes to the man’s face with a grid that matches the contours of his face with the text: Glasses.com App Output]

[Image changes to the man with a pair of glasses on his face moving his head left to right, text appears: User can “Virtually Try On” Glasses]

[Glasses then tilt down the man’s nose, image then changes to multiple faces with ‘select to share’ option at the top right of iPad, text appears: As well as try on multiple pairs.]

[Glasses change to different tint styles]