Intel RealSense: Reverse-Engineering Human Perception


Dr. Achin Bhowmik, VP and GM of Intel’s Perceptual Computing Group.

Can the way people see and hear be replicated in a computer system? The human version, accomplished with a pair of eyes, a pair of ears, and certain parts of the brain, is the outcome of countless evolutionary iterations over geological eras. Anyone attempting a version of it in software and hardware has to invent, refine, and perfect theirs in a far shorter time. That’s the challenge undertaken by Intel’s Perceptual Computing Group, headed by Dr. Achin Bhowmik.

Bhowmik, VP and GM of the Perceptual Computing Group, has his office in the Robert Noyce Building in Santa Clara, California, home of the Intel Museum. Filled with depth-sensing cameras, augmented reality/virtual reality (AR/VR) goggles, smart mirrors, and self-navigating drones and robots, his workspace might seem like a playground to some. Bhowmik moved from one corner to the next with energetic steps, at a speed that matched his multi-threaded thinking.

“You can think of human perception as a series of modules,” he said. “The eyes are basically light sensors. These are your visual sensors, just as your ears are audio sensors. The processing happens in the visual cortex and auditory cortex of the brain. Behind their ears, humans also have the vestibular sensing system [three fluid-filled circular ducts connected to a nerve], which helps with orientation and movements. We want to build a sensing system into computing devices to do the same thing.”

Bhowmik believes technologies that mimic this biological system—biomimicry, as it’s often called—will lead to advancements in self-navigating bots and drones, autonomous vehicles, and interactive VR systems.

In the computer-powered perceptual system, depth-sensing cameras capable of capturing information in point clouds, like the RealSense SR300 and F200, are the substitute for eyes. Accelerometers and gyroscopes can perform the function of the vestibular rings. And a robust set of algorithms or software must do what the brain does.
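To make the point-cloud idea concrete: a depth camera reports a distance for every pixel, and each pixel can be turned into a 3D point by inverting the pinhole camera model. The sketch below is a minimal illustration that assumes an ideal, distortion-free pinhole camera. The function name and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical; they are not SR300 or F200 specifications, nor RealSense SDK calls.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_point_cloud(depth_m: List[List[float]], fx: float, fy: float,
                         cx: float, cy: float) -> List[Point3D]:
    """Back-project a depth image (meters per pixel) into 3D points.

    Hypothetical helper assuming an ideal pinhole camera with no lens
    distortion: fx, fy are focal lengths in pixels and (cx, cy) is the
    principal point. Pixels with depth 0 are treated as "no reading".
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth_m):        # v = pixel row (image y)
        for u, z in enumerate(row):          # u = pixel column (image x)
            if z <= 0:
                continue                     # skip missing depth
            x = (u - cx) * z / fx            # invert u = fx * x / z + cx
            y = (v - cy) * z / fy            # invert v = fy * y / z + cy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A tiny 2 x 2 depth image; the 0.0 entry is a pixel with no reading.
    depth = [[1.0, 0.0],
             [2.0, 1.5]]
    for p in depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5):
        print(p)
```

Every valid pixel becomes one (x, y, z) point, so a full depth frame yields a cloud of points that software can then analyze, which is the "brain" half of the analogy.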

Intel’s offering includes the Intel RealSense SDK, a development platform that pairs the 3D-sensing devices with a number of software libraries and application programming interfaces (APIs). The 2016 R2 version of the SDK contains improvements “made for the SR300 in Background Segmentation (BGS), Hand Tracking Cursor Mode, and 3D Scan,” according to Intel.

“We want to build something so small that it can fit into thin, lightweight devices,” said Bhowmik. “The system projects high-density infrared patterns, then uses the deformation it sees to calculate the structure of the objects in its sight. Capturing and calculating the 3D points’ locations happens right on the Intel chip in the camera. [The more computation-heavy] higher-level processing happens in the computer.”
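The principle Bhowmik describes, inferring shape from how a projected pattern is displaced, comes down to triangulation. The sketch below uses the textbook relation for a rectified projector-camera pair, Z = f * B / d (depth equals focal length times baseline divided by the observed shift). It is a simplification: the function name and the optical values are hypothetical, and the article does not describe the actual algorithm running on the camera’s chip.

```python
def depth_from_shift(shift_px: float, focal_px: float, baseline_m: float) -> float:
    """Triangulate depth (meters) from the observed shift of a projected feature.

    Simplified model (an assumption, not the camera's documented method):
    the IR projector and camera are rectified and separated by baseline_m
    along the image x-axis, and shift_px is the horizontal offset (disparity)
    between where a pattern feature is expected and where the camera sees it.
    """
    if shift_px <= 0:
        raise ValueError("shift must be positive; zero shift means the point is at infinity")
    return focal_px * baseline_m / shift_px


if __name__ == "__main__":
    # Hypothetical optics: 600-pixel focal length, 5 cm projector-camera baseline.
    for shift in (60.0, 30.0, 15.0):
        depth = depth_from_shift(shift, focal_px=600.0, baseline_m=0.05)
        print(f"shift {shift:>4} px -> depth {depth:.2f} m")
```

Because depth is inversely proportional to shift, a fixed one-pixel error in locating the pattern produces a larger depth error for distant objects than for near ones.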

Humans learn from large amounts of sample data to recognize distinct shapes as predators, harmless animals, moving vehicles, household objects, and the faces of their loved ones. In the discipline known as machine learning, programmers today use a similar method to teach machines to recognize shapes.

“We are at the stage where we can now teach devices to learn from what it sees,” Bhowmik said. “For example, we trained our system to do finger tracking with machine learning. How does it know where your finger joints are from what it captures in the camera? Because we fed it millions of samples of hand positions to train it. Similarly, we are training robots to learn the 3D environment and autonomously navigate, and drones to avoid collisions as they fly.”
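Bhowmik doesn’t say which learning algorithm Intel used for finger tracking, so the following only illustrates the general idea of learning from labeled samples. It substitutes k-nearest neighbors, about the simplest supervised method there is, for whatever model Intel actually trained. The two hand features, the labels, and the sample values are all made up for the example.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Each labeled sample pairs a feature vector with a gesture label.
Sample = Tuple[Sequence[float], str]

# Made-up training data. Hypothetical features: how far the fingertips sit
# from the palm, and how spread apart they are, both normalized to 0..1.
SAMPLES: List[Sample] = [
    ((0.90, 0.80), "open_hand"),
    ((0.85, 0.90), "open_hand"),
    ((0.95, 0.75), "open_hand"),
    ((0.10, 0.10), "fist"),
    ((0.15, 0.05), "fist"),
    ((0.20, 0.15), "fist"),
]


def knn_predict(training: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    # Rank every labeled sample by Euclidean distance to the query.
    nearest = sorted(training, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(SAMPLES, (0.80, 0.70)))
    print(knn_predict(SAMPLES, (0.12, 0.20)))
```

Even in this toy version, the classifier contains no hand-written rule for what a fist looks like; its behavior comes entirely from the labeled samples, which is why more, and more varied, training data generally makes such a system more reliable.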

Computer vision, which deals with how computers “see” (in a manner of speaking), is a key component of autonomous vehicles. The machine’s ability to make critical real-time maneuvers stems from a program that can recognize and correctly interpret road signs, surrounding vehicles, nearby obstacles, and pedestrians.

The Intel RealSense SDK comes with face recognition, gesture recognition, automatic background removal, and 3D scanning, among other capabilities. These provide the foundation for developing AR/VR experiences, virtual green-screen projection, and movement-driven human-machine interactions.
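Depth data is what makes a "virtual green screen" possible without an actual green screen: anything beyond a chosen distance can be treated as background. The sketch below shows only the simplest depth-threshold version of the idea. It assumes the color and depth images are already aligned pixel for pixel, and the function name and parameters are hypothetical; the SDK’s Background Segmentation feature is not described in the article and is likely more sophisticated.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def remove_background(color: List[List[Pixel]], depth_m: List[List[float]],
                      max_depth_m: float, backdrop: Pixel) -> List[List[Pixel]]:
    """Replace every pixel farther than max_depth_m, or with no depth, by backdrop.

    Assumes `color` and `depth_m` have the same shape and are aligned so that
    color[v][u] and depth_m[v][u] describe the same point in the scene.
    """
    result: List[List[Pixel]] = []
    for color_row, depth_row in zip(color, depth_m):
        # Keep foreground pixels (valid depth within range); swap the rest.
        result.append([
            px if 0 < z <= max_depth_m else backdrop
            for px, z in zip(color_row, depth_row)
        ])
    return result


if __name__ == "__main__":
    color = [[(255, 0, 0), (0, 255, 0)],
             [(0, 0, 255), (10, 10, 10)]]
    depth = [[0.8, 2.5],
             [0.0, 1.0]]
    print(remove_background(color, depth, max_depth_m=1.2, backdrop=(0, 0, 0)))
```

Pixels with no depth reading are treated as background here, a conservative choice; a real system would also need to clean up ragged edges around hair and fingers, where depth readings tend to be noisy.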

DE editor Kenneth Wong’s game avatar, created with Uraniom based on a 3D scan captured from a depth-sensing camera.

In VR applications, a user immersed in a virtual environment cannot easily see the boundaries and barriers of his or her physical environment (the studio or room where the VR setup is installed). It’s a problem programmers must address with alert mechanisms of one kind or another.

“With the addition of RealSense technology, when you’re inside the VR environment, if you get too close to a real object, like a wall, then that physical object will appear inside your VR environment,” explained Bhowmik. “You also see your real hands in the virtual environment. Using real hands to manipulate the virtual objects is much more natural and intuitive than just using the controllers. So it’s more accurate to call it mixed reality, not just virtual.”
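One plausible way to decide when a real object should "appear" is a proximity test on each depth frame. The sketch below is an assumption about how such a check could work, not a description of RealSense behavior; the function name, the warning distance, and the `min_fraction` threshold are all hypothetical.

```python
from typing import List


def should_show_real_world(depth_m: List[List[float]], warn_distance_m: float,
                           min_fraction: float = 0.02) -> bool:
    """Return True when enough of the view is closer than warn_distance_m.

    Requiring a minimum fraction of close pixels, rather than a single one,
    keeps isolated noisy depth readings from triggering the alert.
    A depth value of 0 is treated as "no reading" and ignored.
    """
    valid = [z for row in depth_m for z in row if z > 0]
    if not valid:
        return False
    close = sum(1 for z in valid if z < warn_distance_m)
    return close / len(valid) >= min_fraction


if __name__ == "__main__":
    frame = [[3.0] * 10 for _ in range(10)]   # open room, 3 m to the nearest wall
    for i in range(5):
        frame[i][0] = 0.4                     # a wall edge 40 cm away fills 5% of the view
    print(should_show_real_world(frame, warn_distance_m=0.5))
```

When the check fires, the application would blend the camera’s view of the nearby object into the rendered scene instead of, say, flashing a generic warning.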

Online sites like Uraniom let users create game avatars based on 3D scans from depth-sensing cameras and 3D scanners to add a personal touch to the experience. Engineers and product designers could use a similar approach to import digital avatars of real consumers or 3D scans of real-world objects into virtual environments, further blurring the line between digital and physical objects.

In many VR setups, the environment must also be equipped with trackers to identify the position and posture of the VR user. This is usually done with cameras installed in the corners of the room or studio.

“You don’t need to do that with our system,” said Bhowmik. “The tracking of your position is done on the device itself.”
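The article doesn’t explain how the on-device tracking works. Inside-out tracking generally combines camera data with inertial sensors, and the inertial half, the "vestibular" accelerometers and gyroscopes mentioned earlier, can be illustrated with a complementary filter, a standard textbook technique for fusing the two. The sketch below tracks a single tilt angle; the function name, axis convention, and parameter values are assumptions for illustration only.

```python
import math


def complementary_filter(angle_deg: float, gyro_rate_dps: float,
                         accel_y: float, accel_z: float, dt_s: float,
                         alpha: float = 0.98) -> float:
    """One update of a single-axis complementary filter; returns the new tilt.

    - Gyroscope path: integrate angular rate (deg/s). Smooth, but any bias
      accumulates into drift.
    - Accelerometer path: infer tilt from the direction of gravity. Drift-free,
      but noisy whenever the device accelerates.
    alpha weights the gyro path; (1 - alpha) continually pulls the estimate
    toward the accelerometer reading, cancelling long-term drift.
    Axis and sign conventions here are assumptions for illustration.
    """
    gyro_angle = angle_deg + gyro_rate_dps * dt_s
    accel_angle = math.degrees(math.atan2(accel_y, accel_z))
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle


if __name__ == "__main__":
    # Hypothetical scenario: headset held still at a 30-degree tilt, gyro with a
    # constant 0.5 deg/s bias, samples arriving at 100 Hz for 10 seconds.
    tilt = math.radians(30.0)
    angle = 0.0  # deliberately wrong initial guess
    for _ in range(1000):
        angle = complementary_filter(angle, gyro_rate_dps=0.5,
                                     accel_y=math.sin(tilt), accel_z=math.cos(tilt),
                                     dt_s=0.01)
    print(f"estimated tilt: {angle:.2f} deg")
```

On its own, the biased gyroscope could never recover the true 30-degree tilt from a wrong starting guess, and it would drift by half a degree every second. The blended estimate instead settles about a quarter of a degree above 30, the leftover offset coming from the uncorrected gyro bias.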

AR/VR applications are now a growing segment of the entertainment and game industry, which may be a harbinger of things to come for engineering and manufacturing. By mimicking human perception, AR/VR applications bring a degree of realism not possible with flat screens. That opens new doors for digital prototyping and simulation, where the digital model serves as a replica of the vehicle, aircraft, or building to be built.

Some technologies from Intel’s Perceptual Computing Group have already found their way into online games. Others are available for you to try for free. Bhowmik believes many of them point to new ways in which users will interact with computer-equipped systems and microprocessor-driven products. As a result, the interfaces of many products, such as vehicles, may be redesigned to take advantage of what’s now possible.

“Perceptual computing, the computer’s ability to capture and process its own environment in 3D, is something new. It’s just now getting deployed,” he pointed out. “That means your drone knows how to avoid trees and obstacles. A car or robot with 3D sensing has additional intelligence that makes it autonomous. Maybe the car can even detect when you, the driver, are drowsy. Once you have worked with a system that recognizes the world in 3D, you won’t want to go back.”


About the Author

Kenneth Wong

Kenneth Wong is Digital Engineering’s resident blogger and senior editor.
