Introduction
The rapid expansion of Generative AI model capabilities lets us build many services around GenAI. Today’s models not only generate text but, with powerful multimodal models like GPT-4 and Gemini, can also use image data to generate information. This capability has large potential in the business world: for example, you can use any image to get information about it directly from the AI without any overhead. In this article, we will go through the process of using the Gemini Pro Vision multimodal model to get product information from an image and then creating a FastAPI-based REST API to consume the extracted information. So, let’s start learning by building a product discovery API.

Learning Objectives
- What is REST architecture?
- Using REST APIs to access web data
- How to use FastAPI and Pydantic for creating REST APIs
- What steps to take to build APIs using Google Gemini Pro Vision
- How to use the LlamaIndex library to access Google Gemini models
This article was published as a part of the Data Science Blogathon.
What is a REST API?
A REST API or RESTful API is an application programming interface (API) that uses the design principles of the Representational State Transfer architecture. It helps developers integrate application components in a microservices architecture.
An API is a way to enable an application or service to access resources within another service or application.
Let’s take a restaurant analogy to understand the concepts.
You are a restaurant owner, so you have two services running when the restaurant is open.
- One is the kitchen, where the delicious food is made.
- Two, the sitting or table area where people sit and eat food.

Here, the kitchen is the SERVER where delicious food (data) is produced for the people (clients). The people (clients) check the menu (API) and place an order (request) with the kitchen (server) using specific codes (HTTP methods) like “GET”, “POST”, “PUT”, or “DELETE”.
Understand the HTTP methods using the restaurant analogy
- GET: The client browses the menu before ordering food.
- POST: The client places an order, which means the kitchen starts making the food, or creating data on the server for the client.
- PUT: To understand the “PUT” method, imagine that after placing your order, you forgot to add ice cream, so you update the existing order, which means updating the data.
- DELETE: If you want to cancel the order, delete the data using the “DELETE” method.
These are the most used methods for building APIs using the REST framework.
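The analogy can be sketched in a few lines of plain Python. This is only an illustration of what each method means, not how FastAPI routes requests; the handle function, the MENU list, and the orders dictionary are hypothetical names made up for this sketch.

```python
from typing import List, Optional

MENU = ["pizza", "pasta", "ice cream"]  # illustrative menu
orders = {}  # order id -> list of items (the "data on the server")


def handle(method: str, order_id: Optional[int] = None,
           items: Optional[List[str]] = None):
    """Dispatch a request by HTTP method, mirroring the restaurant analogy."""
    if method == "GET":  # browse the menu
        return MENU
    if method == "POST":  # place a new order (create data)
        new_id = max(orders, default=0) + 1
        orders[new_id] = list(items or [])
        return new_id
    if method == "PUT":  # update an existing order
        if order_id not in orders:
            raise KeyError(order_id)
        orders[order_id] = list(items or [])
        return orders[order_id]
    if method == "DELETE":  # cancel the order (delete data)
        return orders.pop(order_id)
    raise ValueError(f"Unsupported method: {method}")
```

Calling handle("POST", items=["pizza"]) creates an order and returns its id, which can then be passed to the "PUT" and "DELETE" branches.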
What is the FastAPI framework?
FastAPI is a high-performance modern web framework for API development. It is built on top of Starlette for the web parts and Pydantic for data validation and serialization. The most noticeable features are below:
- High Performance: It is based on ASGI (Asynchronous Server Gateway Interface), which means FastAPI leverages asynchronous programming and can handle high-concurrency scenarios without much overhead.
- Data Validation: FastAPI uses Pydantic, the most widely used data validation library. We will learn about it later in the article.
- Automatic API documentation using Swagger UI with full OpenAPI standard JSON data.
- Easy Extensibility: FastAPI integrates easily with other Python libraries and frameworks.
What is LlamaIndex?
LlamaIndex is a tool that acts as a bridge between your data and LLMs. LLMs can be local using Ollama (run LLMs on a local machine) or an API service such as OpenAI, Gemini, etc. LlamaIndex can be used to build a Q&A system, a chat process, an intelligent agent, and other LLM-based applications. It lays the foundation of Retrieval Augmented Generation (see below image) with ease in three well-defined steps:

- Step One: Knowledge Base (Input)
- Step Two: Trigger/Query (Input)
- Step Three: Task/Action (Output)
In the context of this article, we will build Step Two and Step Three. We will give an image as input and retrieve the product information from the product in the image.
Set up the project environment
Here is the not-so-good flowchart of the application:

I will use conda to set up the project environment. Follow the steps below.
Schematic project structure

Step 1: Create a conda environment
conda create --name api-dev python=3.11
conda activate api-dev
Step 2: Install the necessary libraries
# llamaindex libraries
pip install llama-index llama-index-llms-gemini llama-index-multi-modal-llms-gemini
# google generative ai
pip install google-generativeai google-ai-generativelanguage
# for API development (uvicorn serves the app, jinja2 renders templates, python-dotenv loads .env)
pip install fastapi uvicorn jinja2 python-dotenv
Step 3: Getting the Gemini API key
Go to Google AI and click on Get an API Key. It will take you to Google AI Studio, where you can create an API key.

Keep it safe and save it; we will need it later.
Implementing the REST API
Create a separate folder for the project; let’s name it gemini_productAPI
# create empty project directory
mkdir gemini_productAPI
# activate the conda environment
conda activate api-dev
To ensure FastAPI is installed correctly, create a Python file named main.py and copy the code below into it.
from fastapi import FastAPI, Request
# Create the application instance that Uvicorn will serve
app = FastAPI()
# Landing page for the application
@app.get("/")
def index(request: Request):
    return "Hello World"
Because FastAPI is an ASGI framework, we will use an asynchronous web server to run the FastAPI application. There are two types of server gateway interfaces: WSGI and ASGI. They both sit between a web server and a Python web application or framework and handle incoming client requests, but they do it differently.
- WSGI or Web Server Gateway Interface: It is synchronous, which means each worker handles one request at a time and blocks the others until the previous request is completed. The popular Python web framework Flask is a WSGI framework.
- ASGI or Asynchronous Server Gateway Interface: It is asynchronous, which means it can handle multiple requests concurrently without blocking others. It is more modern and robust for multiple clients, long-lived connections, and bidirectional communication such as real-time messaging, video calls, etc.
Uvicorn is an ASGI web server implementation for Python. ASGI provides a standard interface between async-capable Python web servers, frameworks, and applications. FastAPI is an ASGI framework and is commonly served with Uvicorn.
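The difference between handling requests one after another and handling them concurrently can be sketched with the standard asyncio module. This is a toy model of the idea only, not how Uvicorn or any WSGI server is implemented; handle_request and the two serve_* functions are hypothetical names, and asyncio.sleep(0) stands in for waiting on I/O.

```python
import asyncio


async def handle_request(name: str, log: list) -> None:
    """Pretend request handler; the await stands in for non-blocking I/O."""
    log.append(f"start {name}")
    await asyncio.sleep(0)  # yield control to the event loop
    log.append(f"end {name}")


async def serve_sequentially() -> list:
    # One request must finish before the next one starts.
    log = []
    await handle_request("A", log)
    await handle_request("B", log)
    return log


async def serve_concurrently() -> list:
    # Both requests are in flight at once; the event loop interleaves them.
    log = []
    await asyncio.gather(handle_request("A", log), handle_request("B", log))
    return log


sequential_log = asyncio.run(serve_sequentially())
concurrent_log = asyncio.run(serve_concurrently())
```

In the sequential log, request B only starts after request A has ended, while in the concurrent log B starts while A is still waiting.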
Now start the Uvicorn server and go to http://127.0.0.1:8000 in your browser. You will see Hello World written on it.
# open your VS Code terminal and type
uvicorn main:app --reload

Now, we are set to start coding the main project.
Importing Libraries
import os
from typing import List
# loads variables from the .env file
from dotenv import load_dotenv
# fastapi libs
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
# pydantic libs
from pydantic import BaseModel, ConfigDict, Field, HttpUrl
# llamaindex libs
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
After importing the libraries, create a .env file and put in it the Google Gemini API key you got earlier.
# put it in the .env file
GOOGLE_API_KEY="AB2CD4EF6GHIJKLM-NO6P3QR6ST"
Now instantiate the FastAPI class and load the Google API key from the environment.
app = FastAPI()
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
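Note that os.getenv returns None when the variable is missing, so a forgotten .env file only shows up later as a failed Gemini call. One optional safeguard is to fail fast at startup. The require_env helper below is a hypothetical name used for illustration and is not part of the original project.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this helper, GOOGLE_API_KEY = require_env("GOOGLE_API_KEY") would stop the application immediately with a clear message if the key is missing.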
Create a simple landing page
Create a GET method for the project’s simple landing page.
# Landing page for the application
@app.get("/", response_class=HTMLResponse)
def landingpage(request: Request):
    return templates.TemplateResponse(
        "landingpage.html",
        {"request": request}
    )
To render HTML in FastAPI we use Jinja templates. To do this, create a templates directory at the root of your project, and for static files such as CSS and JavaScript files create a directory named static. Copy the code below into your main.py after the app instance.
# Linking the templates directory using Jinja templates
templates = Jinja2Templates(directory="templates")
# Mounting static files from the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")
The code above links your templates and static directories to the FastAPI application.
Now, create a file called landingpage.html in the templates directory. Go to GithubLink and copy the code from /template/landingpage.html to your project.
Truncated code snippets
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Product Discovery API</title>
<link rel="stylesheet" href="{{ url_for('static', path='/landingpage.css') }}">
</head>
<body>
<header class="header">
<div class="container header-container">
<h1 class="content-head is-center">
Discover Products with Google Gemini Pro Vision
....
</header>
<main>
....
</main>
<footer class="footer">
<div class="container footer-container">
<p class="footer-text">© 2024 Product Discovery API.
All rights reserved.</p>
</div>
</footer>
</body>
</html>
After that, create a directory named static and two files, landingpage.css and landingpage.js, in it. Now, go to GithubLink and copy the code from landingpage.js to your project.
Truncated code snippets
document.getElementById('upload-btn').addEventListener('click', function() {
    const fileInput = document.getElementById('upload');
    const file = fileInput.files[0];
......
document.getElementById('contact-form').addEventListener(
    'submit', function(event) {
        event.preventDefault();
        alert('Message sent!');
        this.reset();
    }
);
For CSS, go to the GitHub link and copy the code from landingpage.css to your project.
Truncated code snippets
body {
    font-family: 'Arial', sans-serif;
    margin: 5px;
    padding: 5px;
    box-sizing: border-box;
    background-color: #f4f4f9;
}
.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 0 20px;
}
The final page will look like the one below

This is a very basic landing page created for the article. You can make it more attractive using CSS styling or a React UI.
We will use the Google Gemini Pro Vision model to extract product information from an image.
def gemini_extractor(model_name, output_class, image_documents, prompt_template_str):
    gemini_llm = GeminiMultiModal(api_key=GOOGLE_API_KEY, model_name=model_name)
    llm_program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=gemini_llm,
        verbose=True,
    )
    response = llm_program()
    return response
In this function, we use LlamaIndex’s GeminiMultiModal API to work with the Google Gemini API. The LlamaIndex MultiModalLLMCompletionProgram API takes the output parser, image data, prompt, and GenAI model and returns our desired response from the Gemini Pro Vision model.
To extract information from the image, we have to engineer a prompt for this purpose.
prompt_template_str = """
You are an expert system designed to extract products from images for
an e-commerce application. Please provide the product name, product color,
product category and a descriptive query to search for the product.
Accurately identify every product in an image and provide a descriptive
query to search for the product. You just return a correctly formatted
JSON object with the product name, product color, product category and
description for each product in the image
"""
With this prompt, we instruct the model that it is an expert system that can extract information from an image. It will extract the information below from the given image input.
- Name of product
- Color
- Category
- Description
This prompt will be used later as an argument to the gemini_extractor function above.
We all know that a Generative AI model can often produce undesired responses. This is a problem when working with a generative AI model because it follows the prompt most of the time, but not always. To mitigate this type of issue, Pydantic comes into the picture. FastAPI also uses Pydantic to validate its API models.
Creating a Product model using Pydantic
class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str
class ExtractedProductsResponse(BaseModel):
    products: List[Product]
class ImageRequest(BaseModel):
    url: str
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "url": "https://images.pexels.com/photos/356056/pexels-photo-356056.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
                }
            ]
        }
    )
The above Product class defines a data model for a product, the ExtractedProductsResponse class represents a response structure that contains a list of these products, and the ImageRequest class describes the input image URL that clients provide. We use Pydantic to ensure the structural integrity of the data through validation and serialization.
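To make concrete what this kind of validation protects against, here is a hand-rolled sketch using only the standard library. It is not Pydantic and not the article’s code; Pydantic performs these checks (and many more) for you. The ProductRecord dataclass and the parse_products function are hypothetical names, and the field names are assumed to match the Product model.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ProductRecord:  # hypothetical stdlib mirror of the Product model
    id: int
    product_name: str
    color: str
    category: str
    description: str


def parse_products(raw_json: str) -> list:
    """Parse model output and reject items with missing or mistyped fields."""
    payload = json.loads(raw_json)
    records = []
    for item in payload["products"]:
        for f in fields(ProductRecord):
            if f.name not in item:
                raise ValueError(f"missing field: {f.name}")
            if not isinstance(item[f.name], f.type):
                raise ValueError(f"wrong type for field: {f.name}")
        # Keep only the declared fields; extra keys from the model are dropped.
        known = {f.name: item[f.name] for f in fields(ProductRecord)}
        records.append(ProductRecord(**known))
    return records
```

If the model returns, say, a string where id should be an integer, the function raises ValueError instead of passing malformed data on to API clients.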
all_products = []
@app.post("/extracted_products")
def extracted_products(image_request: ImageRequest):
    responses = gemini_extractor(
        model_name="models/gemini-pro-vision",
        output_class=ExtractedProductsResponse,
        image_documents=load_image_urls([image_request.url]),
        prompt_template_str=prompt_template_str,
    )
    all_products.append(responses)
    return responses
In the above code snippet, we create an endpoint in the FastAPI application using the POST method with the decorator @app.post("/extracted_products"), which processes the requested image to extract product information. The extracted_products function handles requests to this endpoint. It takes the image_request parameter of type ImageRequest.
We call the gemini_extractor function we created previously for information extraction, and the response is stored in the all_products list. We use a built-in Python list to store the responses for simplicity. You can add database logic to store the responses in a database. MongoDB would be a good choice for storing this type of JSON data.
Requesting an image from the OpenAPI docs
Go to http://127.0.0.1:8000/docs in your browser; you will see the OpenAPI docs.

Expand the /extracted_products endpoint and click Try It Out on the right.

Then click Execute, and it will extract the product information from the image using the Gemini Pro Vision model.

The response from the POST method will look like this.

In the above image, you can see the requested URL and the response body, which is the generated response from the Gemini model.
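The same request can also be prepared from code instead of the docs page. The sketch below uses only the standard library and builds the request without sending it; build_extract_request and API_BASE are hypothetical names, and the address assumes the local Uvicorn server from this article.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # assumed local Uvicorn address


def build_extract_request(image_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /extracted_products."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/extracted_products",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it while the server is running:
# with urllib.request.urlopen(build_extract_request(image_url)) as resp:
#     print(json.load(resp))
```

The JSON body has a single url key, matching the ImageRequest model that the endpoint expects.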
Creating a product endpoint with a GET method for fetching the data
@app.get("/api/products/", response_model=list[ExtractedProductsResponse])
async def get_all_products_api():
    return all_products
Go to http://127.0.0.1:8000/api/products to see all the products

In the above code, we created an endpoint to fetch the extracted data, which in this article is stored in the in-memory all_products list rather than a database. Others can use this JSON data for their own products, such as building e-commerce sites.
All the code used in this article is available in the GitHub repository Gemini-Product-Application.
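As a small illustration of how a consumer might use this JSON, the sketch below filters the returned products by category. It assumes each response is a dictionary with a products list whose items carry a category key, as in the Product model; products_by_category is a hypothetical helper, not part of the project.

```python
def products_by_category(responses: list, category: str) -> list:
    """Collect products of one category from a list of /api/products responses."""
    return [
        product
        for response in responses
        for product in response["products"]
        if product["category"].lower() == category.lower()
    ]
```

The comparison is case-insensitive because the model may not capitalize category names consistently.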
Conclusion
This is a simple yet systematic way to access and use the Gemini multimodal model to build a minimal viable product discovery API. You can use this technique to build a more robust product discovery system that works directly from an image. This type of application has very useful business potential, e.g., an Android application that uses the camera API to take photos and the Gemini API to extract product information from them, which can then be used to buy products directly from Amazon, Flipkart, etc.
Key Takeaways
- The Representational State Transfer architecture for building high-performance APIs for business.
- LlamaIndex has an API to connect different GenAI models. Today, we learned how to use LlamaIndex with the Gemini API to extract image data using the Gemini Pro Vision model.
- Many Python frameworks, such as Flask, Django, and FastAPI, are used to build REST APIs. We learned how to use FastAPI to build a robust REST API.
- Prompt engineering to get the expected response from the Gemini model.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Frequently Asked Questions
A: LlamaIndex uses OpenAI by default, but if you want to use other models such as Cohere, Gemini, Llama2, Ollama, or MistralAI, you can install a model-specific library using pip. See use cases here.
A: You can use any UI framework you want with FastAPI. You have to create a frontend directory and a backend directory for the API application in the FastAPI root and link your frontend UI with the backend. See the Full Stack application here.
A: The responses from the model are JSON, so any document database such as MongoDB would be good for storing the responses and retrieving the data for the application. See MongoDB with FastAPI here.


