Build a smart product data generator from image with GPT-4o and Langchain
When listing new products in an online store, owners and marketers often find it too time-consuming to fill in essential information such as the title, description, and tags for each product from scratch. Most of this information can be derived from the product image itself. With the right combination of LLM and AI tools, such as Langchain and OpenAI, we can automate writing a product's information from an input image, which is our focus in today's post.
Table of contents
- Brief introduction about Langchain and OpenAI
- The flow of generating product data
- Step 1: Load a product image into base64 format
- Step 2: Ask GPT to generate a product's metadata
- Step 3: Extract the result from GPT in a structured Product format
- Chaining all the steps together using Langchain
- Resources
- Summary
Brief introduction about Langchain and OpenAI
Langchain is a powerful tool that allows you to architect and run AI-powered functions with ease. It provides a simple interface for integrating with different LLM (large language model) APIs and services such as OpenAI and Hugging Face. It also offers an extensible architecture that allows you to create and manage custom chains (pipelines), agents, and workflows tailored to your specific needs.
OpenAI is a leading AI research lab that has developed several powerful models, including the GPT-3 and GPT-4 LLMs and the DALL-E image generator. These models can generate human-like text and media based on the input prompt, making them ideal for a wide range of applications, from chatbots to content and image generation.
Setting up Langchain and OpenAI
In this post, we will use OpenAI's GPT-4o model for better image analysis and text completion, along with the following Langchain Python packages:
- langchain-openai - A package that provides a simple interface to interact with the OpenAI API.
- langchain-core (imported as langchain_core) - The core package of Langchain that provides the necessary tools to build your AI functions.
To install these packages, run the following command:
python -m pip install langchain-openai langchain-core
Next, let's define the flow of how we generate product information based on a given image.
The flow of generating product data
Our tool will perform the following steps upon receiving an image path from the user:
- Load the given product image into base64 data URI text format.
- Ask GPT to analyze and generate the required product's metadata based on such data.
- Extract the result from GPT in a structured Product format.
The diagram below shows what our workflow looks like:
With this flow in mind, let's walk through each step's implementation in detail.
Step 1: Load a product image into base64 format
Before we can ask GPT to generate a product's metadata from a given image, we need to convert the image into a format that GPT can understand: a base64 data URI. To do so, we create an image.py file with the following code:
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
The above encode_image function takes an image_path, opens and reads the image as bytes, and then returns the base64-encoded text version.
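As a quick sanity check, the sketch below writes a few bytes to a temporary file, encodes them with the same encode_image logic, and decodes the result back. The temporary file and the sample bytes are assumptions made purely for illustration; a real image file should behave the same way.

```python
import base64
import os
import tempfile


def encode_image(image_path):
    """Read a file as bytes and return its base64 text representation."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# Sample bytes standing in for real image data (illustration only).
sample_bytes = b"\xff\xd8\xff\xe0 fake jpeg payload"

# Write the sample to a temporary file so encode_image has a path to open.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(sample_bytes)
    tmp_path = tmp.name

try:
    encoded = encode_image(tmp_path)
finally:
    os.remove(tmp_path)

# Decoding the base64 text gives back exactly the bytes we started with.
round_trip_ok = base64.b64decode(encoded) == sample_bytes
print(round_trip_ok)
```

Because base64 is a reversible text encoding, nothing about the image is lost in this step; the file's bytes are only re-expressed as plain text.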
We then write a load_image function, which performs the following:
- Receives inputs as a dictionary, which contains an image_path key with the path to the image file.
- Reads inputs["image_path"] into base64 format using the encode_image function, which calls base64.b64encode().
- Assigns the result to the image property of the returned dictionary.
The code is as follows:
def load_image(inputs: dict) -> dict:
    """Load image from file and encode it as base64."""
    image_file = inputs["image_path"]
    image_base64 = encode_image(image_file)
    return {
        "image": image_base64
    }
Now we have the image processing step implemented. Next, we will create a function to communicate with GPT for the information desired based on this image data.
Step 2: Ask GPT to generate a product's metadata
In this step, since we are going to send requests to the OpenAI API, we need to set up an API key for the Langchain OpenAI package to pick up when it initializes the service.
Setting up OpenAI API key
The most straightforward way is to create an .env file with an OPENAI_API_KEY variable, whose value can be found under the Settings panel, as shown below:
OPENAI_API_KEY=your-open-ai-api-key
Then, we install the python-dotenv package using the command below:
python -m pip install python-dotenv
And in our generate.py file, we add the following code to load the key from the .env file into our project:
import os
from dotenv import load_dotenv
load_dotenv()
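If the key is missing, the failure tends to surface later, when the model is created or invoked. One way to fail earlier with a clearer message is a small check like the sketch below. The require_api_key helper is a hypothetical name introduced for this example; it is not part of python-dotenv or Langchain.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from `env`, or raise a readable error."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it to your .env file "
            "and call load_dotenv() before creating the model."
        )
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "your-open-ai-api-key"}))
```

In generate.py, you could call require_api_key() with no arguments right after load_dotenv(), so it reads from the real environment.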
And with that, we can implement the function that will invoke the GPT model for answers.
Creating a model to process the image and prompt
In generate.py, we create a function image_model that takes inputs as a dictionary containing two fields, image and prompt, where image is the base64-encoded image data from Step 1.
def image_model(inputs: dict):
    """Invoke model with image and prompt."""
    image = inputs["image"]
    prompt = inputs["prompt"]
From the given inputs, we build a user message to pass to the model. To do so, we use the HumanMessage class from the langchain_core.messages package:
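from langchain_core.messages import HumanMessage

def image_model(inputs: dict):
    #... previous code
    # Combine the text prompt and the image into a single user message.
    message = HumanMessage(
        content=[
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{image}"
                }
            },
        ]
    )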
In the above code, we pass to HumanMessage a content array containing:
- A text object with the prompt text
- An image_url object with the base64-encoded image data as the URL
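The message above hardcodes image/jpeg as the MIME type. If your images may also be PNG or WebP files, one option is to guess the type from the file name. The sketch below does this with Python's standard mimetypes module. build_image_content is a hypothetical helper name, not part of Langchain, and falling back to JPEG for unknown extensions is an assumption.

```python
import mimetypes


def build_image_content(prompt: str, image_base64: str, image_path: str) -> list:
    """Build a content list with a text part and an image part."""
    # Guess the MIME type from the file extension; fall back to JPEG if unknown.
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    return [
        {"type": "text", "text": prompt},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{image_base64}"},
        },
    ]


# "aGVsbG8=" is placeholder base64 text, not a real image.
content = build_image_content("Describe this product", "aGVsbG8=", "shoe.png")
print(content[1]["image_url"]["url"])
```

Using this helper would mean passing image_path along with image and prompt, so treat it as an optional refinement rather than a required change.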
Once we have the message ready, we initialize a ChatOpenAI model instance using gpt-4o, a temperature of 0.5, and a maximum of 1024 tokens:
from langchain_openai import ChatOpenAI

def image_model(inputs: dict):
    """Invoke model with image and prompt."""
    #... previous code
    model = ChatOpenAI(temperature=0.5, model="gpt-4o", max_tokens=1024)
Then we invoke the model with the message and return the content of the response, as follows:
def image_model(inputs: dict):
    #... previous code
    result = model.invoke(message)
    return result.content
At this stage, we have the content of the response from GPT. In the next step, we will extract that content in a structured Product format.
Step 3: Extract the result from GPT in a structured Product format
The response from GPT always comes back as text, so we need to parse it and extract the relevant information into a structured Product format. This is not a straightforward step. Fortunately, Langchain provides several tools to help with this task, starting with defining the output structure.
Define the Product structure
We will define a Product class as a Pydantic model using BaseModel and Field from the langchain_core.pydantic_v1 package, as shown below:
# Product.py
from langchain_core.pydantic_v1 import BaseModel, Field

class Product(BaseModel):
    '''Product description'''
    title: str = Field(..., title="Product Title", description="Title of the product")
    description: str = Field(..., title="Product Description", description="Description of the product")
    tags: list = Field([], title="Product Tags", description="Tags for SEO")
The above class defines a Product model with the following fields:
- title - The title of the product
- description - The description of the product
- tags - The tags for SEO
Next, we declare a parser that will extract the GPT response into the Product structure.
Create a function to extract the product information
We can use the JsonOutputParser class to create a custom parser by passing our Product structure as its pydantic_object, as follows:
from langchain_core.output_parsers import JsonOutputParser
#... previous code
parser = JsonOutputParser(pydantic_object=Product)
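To make the parsing step more concrete, here is a rough, standard-library-only sketch of the kind of work involved: unwrapping an optional Markdown code fence from the model's reply and loading the JSON into a small structure. It is an illustration only, not how JsonOutputParser is implemented. The parse_product helper, the ProductData dataclass, and the sample reply are all made up for this example.

```python
import json
import re
from dataclasses import dataclass, field

# Three backticks, built programmatically to keep this snippet easy to embed.
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


@dataclass
class ProductData:
    """Hypothetical stand-in for the Product model, for illustration."""
    title: str
    description: str
    tags: list = field(default_factory=list)


def parse_product(reply: str) -> ProductData:
    """Parse a model reply that contains a JSON object, fenced or not."""
    # Models often wrap JSON in a Markdown code fence; unwrap it if present.
    match = FENCED_JSON.search(reply)
    raw = match.group(1) if match else reply.strip()
    data = json.loads(raw)
    return ProductData(
        title=data["title"],
        description=data["description"],
        tags=list(data.get("tags", [])),
    )


# Hand-written sample reply, not real model output.
sample_reply = (
    FENCE + "json\n"
    '{"title": "Red Mug", "description": "A ceramic mug.", "tags": ["mug", "red"]}\n'
    + FENCE
)
product = parse_product(sample_reply)
print(product)
```

In the actual flow, the Langchain parser takes care of this step, so a helper like this is only useful for understanding or debugging what the model returns.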
Great. All that is left is to modify the content array from Step 2 to include the parser's format instructions, by adding the following element to the array:
content = [
    #... previous code
    {"type": "text", "text": parser.get_format_instructions()},
    {
        "type": "image_url",
        # ... code
    },
]
And with that, all the components for the flow are ready. It's time to chain them together.
Chaining all the steps together using Langchain
A chain works like a train, where each carriage is a step such as an LLM call, a data transformation, or any other tool, and the carriages are connected together. Chains support streaming, async, and batch processing out of the box. In our case, we will use TransformChain to transform our image_path input into base64 data as a pre-processing step of the main flow.
from langchain.chains import TransformChain

load_image_chain = TransformChain(
    input_variables=['image_path'],
    output_variables=["image"],
    transform=load_image
)
From there, we create another chain, generate_product_chain, that connects all the flow components using the | operator. It starts by loading and transforming the image path into base64 text, then passes that output as the input to our image model to generate the desired data, and finally parses the result into our target Product format:
generate_product_chain = load_image_chain | image_model | parser
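If the | syntax looks unfamiliar, the toy sketch below shows the general idea of pipe-style composition in plain Python, where each step's output becomes the next step's input. It is not Langchain's implementation; the Step class and the three stand-in functions are assumptions made for illustration.

```python
class Step:
    """Toy pipeline step: wraps a function and supports the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Accept either another Step or a plain callable on the right side.
        next_func = other.func if isinstance(other, Step) else other
        return Step(lambda value: next_func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


def fake_load(inputs):
    # Stand-in for load_image: adds an "image" entry to the inputs.
    return {**inputs, "image": "<base64>"}


def fake_generate(inputs):
    # Stand-in for image_model: returns a text reply.
    return f"{inputs['prompt']} | {inputs['image']}"


def fake_parse(text):
    # Stand-in for the parser: wraps the text in a dictionary.
    return {"raw": text}


pipeline = Step(fake_load) | fake_generate | fake_parse
result = pipeline.invoke({"image_path": "shoe.jpg", "prompt": "Describe"})
print(result)
```

The real chain follows the same left-to-right reading: load the image, call the model, parse the reply.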
Finally, we define a get_product_info function to invoke the chain with the initial image_path and prompt inputs, as follows:
def get_product_info(image_path: str) -> dict:
    generate_product_chain = load_image_chain | image_model | parser
    prompt = """
    Given the image of a product, provide the following information:
    - Product Title
    - Product Description
    - At least 13 Product Tags for SEO purposes
    """
    return generate_product_chain.invoke({
        'image_path': image_path,
        'prompt': prompt
    })
And that's it! We have successfully built a smart product information generator. You can now use the get_product_info function to generate product information by giving it a valid image path:
product_info = get_product_info("path/to/image.jpg")
print(product_info)
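Since the model's reply is not guaranteed to follow the prompt exactly, it can be worth checking the returned dictionary before saving it to your store. The sketch below shows one possible check. check_product_info is a hypothetical helper, the minimum of 13 tags mirrors the prompt above, and the sample dictionary is hand-written rather than real model output.

```python
def check_product_info(info: dict, min_tags: int = 13) -> list:
    """Return a list of problems found in a generated product dictionary."""
    problems = []
    # Title and description must be non-empty strings.
    for key in ("title", "description"):
        value = info.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty {key}")
    # Tags must be a list with at least `min_tags` entries.
    tags = info.get("tags")
    if not isinstance(tags, list):
        problems.append("tags is not a list")
    elif len(tags) < min_tags:
        problems.append(f"expected at least {min_tags} tags, got {len(tags)}")
    return problems


# Hand-written sample input, not real model output.
sample = {"title": "Red Mug", "description": "A ceramic mug.", "tags": ["mug", "red"]}
print(check_product_info(sample))
```

An empty list means the result has the expected shape; otherwise you might retry the call or flag the product for manual review.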
Resources
Summary
In this post, we have explored how to generate essential product data such as the title, description, and tags from a given image using Langchain and OpenAI's GPT-4o. We have walked through the flow, including loading an image into base64 text format, asking GPT to generate a product's metadata, and extracting the result from GPT in a structured Product format. We have also seen how to chain all the steps together using Langchain to create a working product information generator.
In the next post, we will explore how to deploy this tool as a web service API using Flask. Until then, happy coding!