Bilinear Attention Networks for VizWiz Grand Challenge 2018

Abstract

The VizWiz challenge addresses two tasks: visual question answering and visual question answerability. Unlike conventional VQA datasets, the VizWiz dataset was collected by visually impaired people using mobile phones and has the following characteristics: (1) the images contain a significant amount of blur and unstable lighting, while the questions are written in a conversational style; (2) the questions can be unrelated to the images because the visual information is limited. In this paper, we propose bilinear attention networks (BAN), which exploit bilinear interactions between multimodal input channels, followed by a two-layer MLP for each task, trained with a joint loss. Experimental results on the VizWiz dataset show that the proposed method significantly outperforms previous methods.
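The pipeline described in the abstract (bilinear attention over question and image features, then one two-layer MLP per task, trained with a joint loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. All names, shapes, and random weights here are hypothetical, the attention and the joint feature share one pair of projections for brevity, and the specific form of the joint loss (softmax cross-entropy for the answer plus weighted binary cross-entropy for answerability) is an assumption, since the abstract does not state it.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def bilinear_attention(X, Y, U, V, p):
    """X: (n_q, d_x) question tokens, Y: (n_v, d_y) image regions."""
    XU = relu(X @ U)                      # (n_q, k) projected question features
    YV = relu(Y @ V)                      # (n_v, k) projected image features
    logits = (XU * p) @ YV.T              # (n_q, n_v) bilinear scores per pair
    e = np.exp(logits - logits.max())
    A = e / e.sum()                       # softmax over all (token, region) pairs
    # joint_k = sum_i sum_j A_ij * XU_ik * YV_jk
    joint = np.einsum("ij,ik,jk->k", A, XU, YV)
    return A, joint

def mlp(h, W1, b1, W2, b2):
    """Two-layer MLP head."""
    return relu(h @ W1 + b1) @ W2 + b2

def joint_loss(ans_logits, ans_index, able_logit, able_label, weight=1.0):
    """Assumed form: answer cross-entropy + weighted answerability BCE."""
    z = ans_logits - ans_logits.max()
    ans_ce = -(z[ans_index] - np.log(np.exp(z).sum()))
    # Numerically stable binary cross-entropy computed from a logit.
    able_bce = (np.maximum(able_logit, 0.0) - able_logit * able_label
                + np.log1p(np.exp(-np.abs(able_logit))))
    return ans_ce + weight * able_bce

# Hypothetical toy sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_q, n_v, d_x, d_y, k, hidden, n_ans = 4, 6, 8, 10, 5, 7, 3
X = rng.normal(size=(n_q, d_x))
Y = rng.normal(size=(n_v, d_y))
U = rng.normal(size=(d_x, k))
V = rng.normal(size=(d_y, k))
p = rng.normal(size=(k,))

A, joint = bilinear_attention(X, Y, U, V, p)

# One head per task: answer classification and answerability.
ans_logits = mlp(joint, rng.normal(size=(k, hidden)), np.zeros(hidden),
                 rng.normal(size=(hidden, n_ans)), np.zeros(n_ans))
able_logit = mlp(joint, rng.normal(size=(k, hidden)), np.zeros(hidden),
                 rng.normal(size=(hidden, 1)), np.zeros(1))[0]

loss = joint_loss(ans_logits, ans_index=1, able_logit=able_logit, able_label=1.0)
```

Both heads read the same joint feature, so a single backward pass through the summed loss would update the shared attention parameters for both tasks; the `weight` argument is a hypothetical knob for balancing the two terms.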

Date

Sep 14, 2018

10:50 AM — 11:20 AM

Event

VizWiz Grand Challenge: Answering Visual Questions from Blind People

Location

Theresianum 601, TU München, Munich, Germany

Links

PDF Code