From Image to Summary: Using Tesseract and BART for Text Extraction and Summarization

Bambo Traore
Jul 7, 2024


In an increasingly data-driven world, the ability to extract and summarize information quickly is crucial. This article combines two technologies. The first is Tesseract, an open-source OCR engine, which extracts text from images. The second is Facebook’s BART (Bidirectional and Auto-Regressive Transformers), which summarizes that text. By wrapping both in a Gradio web interface, we will show how to automate the path from an uploaded image to a short summary.

Installing the Libraries

To set up the project, follow these steps:

  • 1 Create a virtual environment

Creating a virtual environment helps you manage dependencies and keeps the workspace clean.

python -m venv env
source env/bin/activate # On Windows, use `env\Scripts\activate`
  • 2 Create a requirements.txt file

List the necessary Python libraries in requirements.txt. Each package appears only once. The Tesseract OCR engine is not a pip package, so it is not listed here.

transformers==4.9.2
torch==2.3.1
gradio==4.37.2
numpy==1.23.5
pillow==10.3.0
pytesseract==0.3.10
protobuf
  • 3 Install dependencies

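Install the libraries listed in requirements.txt:

pip install -r requirements.txt

pytesseract is only a Python wrapper around the tesseract command-line program. You also need to install the Tesseract engine with your system package manager. For example, use sudo apt install tesseract-ocr on Debian/Ubuntu or brew install tesseract on macOS.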
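pytesseract calls the external tesseract executable, so it helps to confirm that the executable is reachable before starting the app. The sketch below uses only the standard library. The function name tesseract_version is hypothetical, and the code assumes the binary is on the PATH under the name tesseract.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if tesseract is not on PATH.

    Hypothetical helper; it only checks the PATH, not custom install locations.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    # Some Tesseract releases print the version to stderr instead of stdout.
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else exe


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("Tesseract not found on PATH; install it with your system package manager.")
    else:
        print("Found:", version)
```

If Tesseract is installed outside the PATH (common on Windows), you can point pytesseract at the binary by setting pytesseract.pytesseract.tesseract_cmd to its full path.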

Now let’s set up the Python application file, app.py, with the necessary imports and functions:

import pytesseract
from transformers import pipeline, BartForConditionalGeneration, BartTokenizer
import gradio as gr
from PIL import Image, UnidentifiedImageError
import re
import logging

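Initializing the Model

We initialize BART with the Hugging Face pipeline helper. It uses the facebook/bart-large-cnn checkpoint, which is BART fine-tuned for news summarization on the CNN/Daily Mail dataset. The weights are downloaded from the Hugging Face Hub the first time this line runs.

summarize = pipeline('summarization', model="facebook/bart-large-cnn")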
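facebook/bart-large-cnn accepts at most 1,024 input tokens, and a dense page of OCR output can exceed that. One option is to split the text into word-bounded chunks, summarize each one, and join the results. The sketch below shows the splitting step. The helper chunk_words and its 600-word default are hypothetical. Counting words is only a rough stand-in for counting BART tokens, because a word can map to several tokens, so the default leaves some headroom.

```python
def chunk_words(text, max_words=600):
    """Split text into consecutive chunks of at most max_words whitespace-separated words.

    Hypothetical helper: word count approximates BART's 1,024-token input limit;
    an exact split would count tokens with the model's tokenizer instead.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    # Step through the word list max_words at a time; the last chunk may be shorter.
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1500))
    chunks = chunk_words(sample)
    print(len(chunks), [len(c.split()) for c in chunks])  # 3 [600, 600, 300]
```

Each chunk could then be passed to summarize and the partial summaries joined with spaces. A simpler alternative is to pass truncation=True in the pipeline call. This cuts the input to the model's limit, but any text past the limit is ignored.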

Function to Extract Text from an Image

Next, create a function that extracts the text from an image with Tesseract, cleans it up, and summarizes it with BART:

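def image_extract(image):
    """Extract text from a PIL image with Tesseract, then summarize it with BART."""
    try:
        if image is None:
            return "No image found"
        # Extract the raw text from the image with Tesseract
        raw_text = pytesseract.image_to_string(image)
        # Collapse newlines, tabs, and repeated spaces into single spaces
        preprocessing = re.sub(r'\s+', ' ', raw_text).strip()
        if not preprocessing:
            return "No text found in the image"
        # Summarize the cleaned text without sampling, keeping the summary between 50 and 512 tokens
        text_summary = summarize(preprocessing, do_sample=False, min_length=50, max_length=512)
        summary_text_from_image = text_summary[0].get('summary_text')

        return summary_text_from_image
    except UnidentifiedImageError:
        return "Error loading image"
    except Exception as e:
        return str(e)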
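To test the extraction logic without installing Tesseract or downloading BART, one option is to pass the OCR and summarization steps in as functions. The sketch below is a hypothetical restructuring of image_extract. The names clean_ocr_text and image_to_summary and the min_words threshold are illustrative, not from the original. The sketch also returns very short OCR output unchanged, because a minimum summary length of 50 tokens forces BART to generate at least that many tokens even when there is little to summarize.

```python
import re
from typing import Callable


def clean_ocr_text(raw_text: str) -> str:
    """Collapse every run of whitespace (newlines, tabs, repeated spaces) into one space."""
    return re.sub(r"\s+", " ", raw_text).strip()


def image_to_summary(image, ocr: Callable[[object], str],
                     summarizer: Callable[[str], str], min_words: int = 30) -> str:
    """Hypothetical variant of image_extract with the OCR and summary steps injected.

    Passing fake callables lets this flow be tested offline.
    """
    if image is None:
        return "No image provided"
    text = clean_ocr_text(ocr(image))
    if not text:
        return "No text found in the image"
    # Short OCR output gives BART little to work with, so return it as-is.
    if len(text.split()) < min_words:
        return text
    return summarizer(text)


if __name__ == "__main__":
    def fake_ocr(img):
        return "Line one\n\n  line   two\t"

    print(clean_ocr_text(fake_ocr(None)))  # Line one line two
    print(image_to_summary("img", fake_ocr, lambda t: t.upper(), min_words=2))  # LINE ONE LINE TWO
```

In app.py, ocr could be pytesseract.image_to_string and summarizer could be lambda t: summarize(t, do_sample=False, min_length=50, max_length=512)[0]['summary_text'].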

Gradio Interface

  • The Gradio interface takes an uploaded image as input. Setting type='pil' makes Gradio pass it to image_extract as a PIL image, which pytesseract accepts directly.
  • It displays the BART summary of the extracted text in a text box.

Create the Gradio interface:

interface = gr.Interface(
    fn=image_extract,
    inputs=gr.Image(label="Upload an image to extract text from", type='pil'),
    outputs=gr.Textbox(label="Summary"),
    title="From Image to Summary: Using BART and Tesseract"
)

if __name__ == "__main__":
    interface.launch()

Run the Application:

python app.py

Then open http://127.0.0.1:7860 in your browser. This is Gradio’s default local address.

Conclusion

This project shows how OCR can be combined with a pretrained NLP model to turn images of text into short summaries without manual transcription. Wrapping the pipeline in a Gradio web interface makes it usable from any browser. This illustrates how such technologies can streamline data-driven workflows in many domains.

Demonstration Link to Hugging Face

Link to my GitHub:


Written by Bambo Traore

ML and AI developer. I never miss an opportunity to learn new things, and I am passionate about computer technologies.