Advances in Brain-Computer Interfaces and Neural Integration

Image to Caption Generator Using Machine Learning and Deep Learning Models

Abhiraj Singh Sengar, Pragya Tewari and Kritika Pandey

Abstract

Image captioning, the task of generating descriptive text from images, has become a research focal point thanks to advances in deep learning. This paper presents a comprehensive image captioning method that merges Convolutional Neural Networks (CNNs) with Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, to produce natural language descriptions. The approach builds on earlier work such as Vinyals et al.'s "Show and Tell" model (2015), one of the first to pair CNNs with LSTMs for this purpose. We integrate attention mechanisms, as suggested by Xu et al. (2015) and Anderson et al. (2018), to improve the model's focus on salient image regions, employing both bottom-up and top-down attention to strengthen the accuracy and relevance of the generated captions. We train and evaluate our model on datasets including MSCOCO and Flickr8k, using standard metrics such as BLEU, METEOR and CIDEr. The results show that our method surpasses existing models in both the quality of the captions produced and computational efficiency. The research contributes to the ongoing development of image captioning, with promising applications in assistive technologies, content-based image retrieval and human-computer interaction.
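To illustrate one of the evaluation metrics mentioned above, the sketch below computes a single-sentence BLEU-1 score (modified unigram precision with a brevity penalty) in plain Python. This is a simplified, illustrative version, not the corpus-level, multi-reference BLEU implementation used in the paper's evaluation; the example captions are invented for demonstration.

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Sentence-level BLEU-1: clipped unigram precision times a
    brevity penalty. `candidate` and `reference` are token lists.
    Illustrative single-reference sketch only."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate word count by its count in the reference,
    # so repeating a correct word cannot inflate the score.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty discourages overly short candidate captions.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * precision

# Hypothetical candidate caption vs. reference caption.
cand = "a dog runs on the grass".split()
ref = "a dog is running on the grass".split()
score = bleu1(cand, ref)
```

Here 5 of the 6 candidate tokens appear in the reference, so the precision is 5/6, which the brevity penalty then scales down slightly because the candidate is shorter than the reference. Metrics such as METEOR and CIDEr refine this idea with stemming, synonym matching, and TF-IDF weighting of n-grams.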

