AI taught to ‘see’ by mimicking how humans visualise objects

An AI system mimics how humans visualise and identify objects

Source: Institution of Engineering and Technology (IET)
Date: 2018-12-21


Full text:
The system is an advance in a type of technology called “computer vision,” which enables computers to read and identify visual images.

It is an important step toward general AI systems that are intuitive, can make decisions based on reasoning and interact with humans in a more human-like way.

Although current AI computer vision systems are increasingly powerful and capable, they are task-specific, meaning their ability to identify what they see is limited by how much they have been trained and programmed by humans.

Even today’s best computer vision systems cannot create a full picture of an object after seeing only certain parts of it, and they can be fooled when an object appears in an unfamiliar setting.

Current computer vision systems are not designed to learn on their own and must be trained manually, usually by reviewing thousands of images in which the objects they are trying to identify are labelled for them.
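For contrast, the manual training the article describes corresponds to standard supervised learning. Below is a minimal sketch of such a pipeline, assuming PyTorch and torchvision; the folder path, class layout and hyperparameters are hypothetical, chosen only to show that every image must arrive with a human-provided label.

    # Minimal supervised-training sketch (assumes PyTorch/torchvision).
    # "labelled_images/" is a hypothetical folder with one sub-folder per
    # object class; the folder names are the human-provided labels.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    data = datasets.ImageFolder(
        "labelled_images/",
        transform=transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ]),
    )
    loader = DataLoader(data, batch_size=32, shuffle=True)

    model = models.resnet18(num_classes=len(data.classes))
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for images, labels in loader:      # `labels` is the manual supervision
        optimiser.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimiser.step()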

Now researchers from the UCLA Samueli School of Engineering and Stanford University have developed a new approach that begins by breaking up an image into small chunks, which the researchers call “viewlets.”

The computer learns how these viewlets fit together to form the object in question. It also assesses what other objects are in the surrounding area and whether information about those objects is relevant to describing and identifying the primary object.
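To make the idea concrete, here is a toy sketch (not the researchers’ implementation) of how viewlets might be extracted and related, assuming scikit-learn and randomly generated stand-in images: each image is cut into patches, recurring patches are clustered into candidate viewlet types, and co-occurring types are counted as evidence that parts belong together.

    # Toy "viewlets" sketch: patch extraction, clustering, co-occurrence.
    # Illustrative only; real images and learned features would replace
    # the random arrays and raw-pixel clustering used here.
    import numpy as np
    from sklearn.cluster import KMeans
    from itertools import combinations
    from collections import Counter

    def viewlets(image, size=16):
        """Cut a (H, W) grayscale image into non-overlapping size x size patches."""
        h, w = image.shape
        return [image[y:y + size, x:x + size]
                for y in range(0, h - size + 1, size)
                for x in range(0, w - size + 1, size)]

    # Hypothetical unlabelled dataset: stand-in grayscale images.
    images = [np.random.rand(64, 64) for _ in range(100)]

    # Pool patches from all images and cluster them; each cluster is a
    # candidate recurring part (a "viewlet type").
    patches = np.array([p.ravel() for img in images for p in viewlets(img)])
    kmeans = KMeans(n_clusters=20, n_init=10).fit(patches)

    # Count which viewlet types appear together in the same image;
    # frequent pairs suggest parts that make up the same object.
    cooccurrence = Counter()
    for img in images:
        types = kmeans.predict(np.array([p.ravel() for p in viewlets(img)]))
        cooccurrence.update(combinations(sorted(set(types)), 2))

A real system would rely on learned visual features rather than raw pixels, and on far more data, but the co-occurrence counting conveys the contextual flavour of the approach described above.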

To help the new system “learn” more like humans, the engineers decided to immerse it in an internet replica of the environment humans live in.

“Fortunately, the internet provides two things that help a brain-inspired computer vision system learn the same way humans do,” said UCLA professor Vwani Roychowdhury, the study’s principal investigator.

“One is a wealth of images and videos that depict the same types of objects. The second is that these objects are shown from many perspectives – obscured, bird’s eye, up-close – and they are placed in different kinds of environments.”

To develop the framework, the researchers drew insights from cognitive psychology and neuroscience.

“Starting as infants, we learn what something is because we see many examples of it, in many contexts,” Roychowdhury said. “That contextual learning is a key feature of our brains, and it helps us build robust models of objects that are part of an integrated worldview where everything is functionally connected.”

The researchers tested the system with about 9,000 images, each showing people and other objects. The platform was able to build a detailed model of the human body without external guidance and without the images being labelled.

The engineers ran similar tests using images of motorcycles, cars and aircraft. In every case, the system performed as well as or better than traditional computer vision systems that had been developed through years of training.