Automatic Image and Video Caption Generation with Deep Learning


International Journal of Advanced Computer Science and Applications(IJACSA), Volume 15 Issue 1, 2024.


Abstract: Automatic image caption synthesis using deep learning is a promising approach for video summarization applications. This methodology uses neural networks to autonomously produce detailed textual descriptions for significant frames or instants in a video. By examining visual elements, deep learning models can discern and classify objects, scenes, and actions, enabling the generation of coherent and useful captions. This paper presents a novel methodology for generating image captions in the context of video summarization applications. The DenseNet201 architecture is used to extract image features, enabling the effective extraction of comprehensive visual information from keyframes in the videos. For text processing, GloVe embeddings, which are pre-trained word vectors that capture semantic associations between words, are employed to represent textual information efficiently. These embeddings establish a foundation for comprehending the contextual variations and semantic significance of the words in the captions. LSTM models then process the GloVe embeddings, facilitating the generation of captions that maintain coherence, context, and readability. The integration of GloVe embeddings with LSTM models in this study enables the effective fusion of visual and textual data, leading to captions that are both informative and contextually relevant for video summarization. The proposed model significantly enhances performance by combining the strengths of convolutional neural networks for image analysis and recurrent neural networks for natural language generation. The experimental results demonstrate the effectiveness of the proposed approach in generating informative captions for video summarization, offering a valuable tool for content understanding, retrieval, and recommendation.
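The fusion described in the abstract (DenseNet201 keyframe features initializing an LSTM that is driven by GloVe word vectors) can be sketched in miniature. This is an illustrative NumPy sketch, not the authors' implementation: the dimensions, the random stand-ins for the DenseNet201 feature vector and GloVe lookups, and the greedy decoding loop are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper does not list exact dimensions here.
FEAT_DIM = 1920    # DenseNet201 global-average-pooled feature size
EMBED_DIM = 100    # e.g. 100-d GloVe word vectors
HIDDEN = 256       # LSTM hidden-state size
VOCAB = 5000       # caption vocabulary size

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the input embedding, (h, c) the recurrent state."""
    z = W @ x + U @ h + b                                 # stacked gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
    g = np.tanh(g)                                        # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

# Toy parameters (random; a real model learns these during training).
W = rng.normal(0, 0.01, (4 * HIDDEN, EMBED_DIM))
U = rng.normal(0, 0.01, (4 * HIDDEN, HIDDEN))
b = np.zeros(4 * HIDDEN)
W_img = rng.normal(0, 0.01, (HIDDEN, FEAT_DIM))   # projects the image feature
W_out = rng.normal(0, 0.01, (VOCAB, HIDDEN))      # hidden state -> vocab logits

# Visual/textual fusion: the keyframe feature vector initializes the LSTM
# hidden state, then GloVe-style word vectors drive each decoding step.
img_feat = rng.normal(size=FEAT_DIM)              # stand-in DenseNet201 feature
h, c = np.tanh(W_img @ img_feat), np.zeros(HIDDEN)

caption_ids = []
word_vec = np.zeros(EMBED_DIM)                    # <start> token embedding
for _ in range(5):                                # generate a few token ids
    h, c = lstm_step(word_vec, h, c, W, U, b)
    next_id = int(np.argmax(W_out @ h))           # greedy decoding
    caption_ids.append(next_id)
    word_vec = rng.normal(size=EMBED_DIM)         # stand-in GloVe lookup

print(caption_ids)
```

In a trained system the projection of the image feature, the embedding lookup table, and the LSTM weights would all be learned end-to-end, and the argmax step would typically be replaced by beam search.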

Keywords: Video summarization; deep learning; image caption synthesis; DenseNet201; GloVe embeddings; LSTM

Mohammed Inayathulla and Karthikeyan C, “Image Caption Generation using Deep Learning For Video Summarization Applications,” International Journal of Advanced Computer Science and Applications (IJACSA), 15(1), 2024. http://dx.doi.org/10.14569/IJACSA.2024.0150155

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.

https://thesai.org/Publications/ViewPaper?Volume=15&Issue=1&Code=IJACSA&SerialNo=55
