Building a Product Discovery API with Gemini Vision Pro

May 28, 2024

Introduction

The rapid expansion of Generative AI model capabilities lets us build many kinds of services around GenAI. Today's models do not only generate text: powerful multimodal models such as GPT-4 and Gemini can also use image data to generate information. This capability has huge potential in the business world; for example, you can use any image to get information about it directly from the AI with little overhead. In this article, we will go through the process of using the Gemini Vision Pro multimodal model to get product information from an image and then create a FastAPI-based REST API to consume the extracted information. So, let's start learning by building a product discovery API.

Learning Objectives

What is REST architecture?
Using REST APIs to access web data
How to use FastAPI and Pydantic to create a REST API
What steps to take to build APIs using Google Gemini Vision Pro
How to use the LlamaIndex library to access Google Gemini models

This article was published as a part of the Data Science Blogathon.

What is a REST API?

A REST API, or RESTful API, is an application programming interface (API) that follows the design principles of the Representational State Transfer architecture. It helps developers integrate application components in a microservices architecture.

An API is a way to enable an application or service to access resources within another service or application.

Let's use a restaurant analogy to understand the concepts.

You are a restaurant owner, so you have two services running when the restaurant is open.

One is the kitchen, where the delicious food is made.
Two is the seating or table area, where people sit and eat.

Here, the kitchen is the SERVER, where delicious food (data) is produced for the people (clients). The clients check the menu (API) and place an order (request) with the kitchen (server) using specific codes (HTTP methods) such as "GET", "POST", "PUT", or "DELETE".

Understanding the HTTP methods through the restaurant analogy

GET: The client browses the menu before ordering food.
POST: The client places an order, which means the kitchen starts making the food, that is, creating data on the server for the client.
PUT: Imagine that after placing your order you realize you forgot to add ice cream, so you update the existing order, which means updating the data.
DELETE: If you want to cancel the order, you delete the data using the "DELETE" method.

These are the most commonly used methods for building an API with the REST architecture.
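As a rough illustration of how the four methods map onto create, read, update, and delete, here is a minimal, framework-free sketch of the restaurant analogy. The ORDERS dictionary and the handle function are hypothetical names invented for this example; a real API would use a framework such as FastAPI and return proper HTTP status codes.

```python
from itertools import count
from typing import Dict, List, Optional

# Hypothetical in-memory "kitchen": order id -> list of dishes
ORDERS: Dict[int, List[str]] = {}
_next_id = count(1)


def handle(method: str, order_id: Optional[int] = None,
           dishes: Optional[List[str]] = None):
    """Dispatch a request to the matching create/read/update/delete step."""
    if method == "GET":      # read: look at one order, or all of them
        return ORDERS if order_id is None else ORDERS.get(order_id)
    if method == "POST":     # create: place a new order
        new_id = next(_next_id)
        ORDERS[new_id] = list(dishes or [])
        return new_id
    if method == "PUT":      # update: replace an existing order
        if order_id not in ORDERS:
            raise KeyError(f"no order {order_id}")
        ORDERS[order_id] = list(dishes or [])
        return order_id
    if method == "DELETE":   # delete: cancel the order
        return ORDERS.pop(order_id, None)
    raise ValueError(f"unsupported method: {method}")


order = handle("POST", dishes=["pizza"])
handle("PUT", order, ["pizza", "ice cream"])   # forgot the ice cream
print(handle("GET", order))
handle("DELETE", order)
print(handle("GET"))
```

The point of the sketch is only the mapping: POST creates, GET reads, PUT replaces, and DELETE removes.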

What is the FastAPI framework?

FastAPI is a high-performance, modern web framework for API development. It is built on top of Starlette for the web parts and Pydantic for data validation and serialization. Its most noticeable features are listed below:

High Performance: It is based on ASGI (Asynchronous Server Gateway Interface), which means FastAPI leverages asynchronous programming to handle high-concurrency scenarios without much overhead.
Data Validation: FastAPI uses Pydantic, a widely used data validation library. We will learn about it later in the article.
Automatic API documentation using Swagger UI, with a full OpenAPI-standard JSON schema.
Easy Extensibility: FastAPI integrates easily with other Python libraries and frameworks.
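To give a feel for what asynchronous handling buys you, here is a small sketch that uses only the standard library asyncio module rather than FastAPI itself. The fake_request coroutine is a hypothetical stand-in for an endpoint that waits on I/O; the three calls wait at the same time instead of one after another.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on I/O (database, network, LLM call)
    await asyncio.sleep(delay)
    return f"{name} done"


async def serve_all() -> list:
    # The three "requests" wait concurrently instead of sequentially
    return await asyncio.gather(
        fake_request("a", 0.2),
        fake_request("b", 0.2),
        fake_request("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(serve_all())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Run one after another, the three waits would add up to about 0.6 seconds; run concurrently, the total should be close to the longest single wait.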

What is LlamaIndex?

LlamaIndex is a tool that acts as a bridge between your data and LLMs. The LLM can be local, using Ollama (which runs LLMs on a local machine), or an API service such as OpenAI, Gemini, and so on. LlamaIndex can be used to build Q&A systems, chat applications, intelligent agents, and other LLM-based applications. It lays the foundation of Retrieval Augmented Generation (see the image below) in three well-defined steps:

Step One: Knowledge Base (Input)
Step Two: Trigger/Query (Input)
Step Three: Task/Action (Output)

In the context of this article, we will build Step Two and Step Three. We will give an image as input and retrieve information about the product shown in the image.

Set up the project environment

Here is the not-so-good flowchart of the application:

I will use conda to set up the project environment. Follow the steps below.

Schematic project structure

Step 1: Create a conda environment

conda create --name api-dev python=3.11

conda activate api-dev

Step 2: Install the necessary libraries

# llamaindex libraries
pip install llama-index llama-index-llms-gemini llama-index-multi-modal-llms-gemini

# google generative ai

pip install google-generativeai google-ai-generativelanguage

# for API development (uvicorn serves the app, jinja2 renders the templates,
# python-dotenv provides load_dotenv)

pip install fastapi uvicorn jinja2 python-dotenv

Step 3: Getting the Gemini API key

Go to Google AI and click on Get an API Key. It will take you to Google AI Studio, where you can click Create API Key.

Keep it safe and save it; we will need it later.

Implementing REST API

Create a separate folder for the project; let's name it gemini_productAPI

# create an empty project directory
mkdir gemini_productAPI

# activate the conda environment
conda activate api-dev

To make sure FastAPI is installed correctly, create a Python file named main.py and copy the code below into it.

# Landing page for the application
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/")
def index(request: Request):
    return "Hello World"

Because FastAPI is an ASGI framework, we will use an asynchronous web server to run the FastAPI application. There are two kinds of server gateway interfaces: WSGI and ASGI. They both sit between a web server and a Python web application or framework and handle incoming client requests, but they do it differently.

WSGI, or Web Server Gateway Interface: It is synchronous, which means each worker handles one request at a time and blocks until that request is completed. The popular Python web framework Flask is a WSGI framework.
ASGI, or Asynchronous Server Gateway Interface: It is asynchronous, which means it can handle multiple requests concurrently without blocking the others. It is more modern and better suited to many clients, long-lived connections, and bidirectional communication such as real-time messaging, video calls, and so on.

Uvicorn is an ASGI web server implementation for Python. ASGI provides a standard interface between async-capable Python web servers, frameworks, and applications. FastAPI is an ASGI framework that is commonly served with Uvicorn.
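To make the ASGI interface less abstract, here is a minimal sketch of a bare ASGI application written without any framework. An ASGI app is a coroutine that receives scope, receive, and send. The call_once helper is a hypothetical stand-in for a real server such as Uvicorn; it feeds the app one fake GET request and collects what the app sends back.

```python
import asyncio


async def hello_app(scope, receive, send):
    """A bare ASGI application: a coroutine taking scope, receive and send."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello World"})


async def call_once(app, path="/"):
    """Hypothetical stand-in for an ASGI server: send one fake GET request."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_once(hello_app))
print(messages[0]["status"], messages[1]["body"])
```

FastAPI builds routing, validation, and documentation on top of this same calling convention, which is why any ASGI server can run it.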

Now start the Uvicorn server and go to http://127.0.0.1:8000 in your browser. You will see Hello World written on it.

-- open your VS Code terminal and type

uvicorn main:app --reload

Now we are set to start coding the main project.

Importing Libraries

import os
from typing import List

# loads variables from the .env file
from dotenv import load_dotenv

# fastapi libs
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

# pydantic libs
from pydantic import BaseModel, ConfigDict, Field, HttpUrl

# llamaindex libs
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

After importing the libraries, create a .env file and put in it the Google Gemini API key you got earlier.

# put it in the .env file

GOOGLE_API_KEY="AB2CD4EF6GHIJKLM-NO6P3QR6ST"

Now instantiate the FastAPI class and load the Google API key from the environment

app = FastAPI()
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
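Note that os.getenv returns None when the variable is missing, and the problem would only surface later when the Gemini client is called. One possible guard, sketched here with a hypothetical require_env helper that is not part of the original project, is to fail early with a clear message.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or fail with a clear message."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return value


# Demonstration with a placeholder value (hypothetical variable name)
os.environ["DEMO_API_KEY"] = "placeholder-key"
print(require_env("DEMO_API_KEY"))
```

In main.py, the same helper could be called with "GOOGLE_API_KEY" right after load_dotenv().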

Create a simple landing page

Create a GET method for our simple landing page for the project.

# Landing page for the application

@app.get("/", response_class=HTMLResponse)
def landingpage(request: Request):
    return templates.TemplateResponse(
        "landingpage.html",
        {"request": request}
    )

To render HTML in FastAPI, we use Jinja templates. To do this, create a templates directory at the root of your project, and for static files such as CSS and JavaScript files, create a directory named static. Copy the code below into your main.py after the app instance is created.

# Linking the templates directory using Jinja templates
templates = Jinja2Templates(directory="templates")

# Mounting static files from the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

The code above links your templates and static directories to the FastAPI application.
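If you prefer to create the folders and empty files in one go, a small helper script can do it. This is only a convenience sketch using the file names mentioned in this article; the scaffold function itself is a hypothetical helper, not part of the original project.

```python
from pathlib import Path
import tempfile


def scaffold(root: Path) -> list:
    """Create the empty files and folders the landing page expects."""
    files = [
        root / "main.py",
        root / "templates" / "landingpage.html",
        root / "static" / "landingpage.css",
        root / "static" / "landingpage.js",
    ]
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)   # leaves existing content untouched
    return sorted(p.relative_to(root).as_posix() for p in files)


# Try it in a throwaway directory; pass your real project path instead
demo_root = Path(tempfile.mkdtemp()) / "gemini_productAPI"
created = scaffold(demo_root)
print(created)
```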

Now, create a file called landingpage.html in the templates directory. Go to the GitHub link and copy the code from /template/landingpage.html to your project.

Truncated code snippets

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Product Discovery API</title>
    <link rel="stylesheet" href="{{ url_for('static', path='/landingpage.css') }}">
</head>
<body>
    <header class="header">
        <div class="container header-container">
            <h1 class="content-head is-center">
            Discover Products with Google Gemini Pro Vision
            ....
    </header>

    <main>
        ....
    </main>

    <footer class="footer">
        <div class="container footer-container">
            <p class="footer-text">© 2024 Product Discovery API. 
            All rights reserved.</p>
        </div>
    </footer>

</body>
</html>

After that, create a directory named static and two files, landingpage.css and landingpage.js, in it. Now, go to the GitHub link and copy the code from landingpage.js to your project.

Truncated code snippets


document.getElementById('upload-btn').addEventListener('click', function() {
    const fileInput = document.getElementById('upload');
    const file = fileInput.files[0];

   ......

document.getElementById('contact-form').addEventListener(
'submit', function(event) {
    event.preventDefault();
    alert('Message sent!');
    this.reset();
}
);

For CSS, go to the GitHub link and copy the code from landingpage.css to your project.

Truncated code snippets

body {
    font-family: 'Arial', sans-serif;
    margin: 5px;
    padding: 5px;
    box-sizing: border-box;
    background-color: #f4f4f9;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 0 20px;
}

The final page will look like the one below

This is a very basic landing page created for the article. You can make it more attractive using CSS styling or a React UI.

We will use the Google Gemini Pro Vision model to extract product information from an image.

def gemini_extractor(model_name, output_class, image_documents, prompt_template_str):
    gemini_llm = GeminiMultiModal(api_key=GOOGLE_API_KEY, model_name=model_name)

    llm_program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=gemini_llm,
        verbose=True,
    )

    response = llm_program()
    return response

In this function, we use LlamaIndex's GeminiMultiModal API to work with the Google Gemini API. LlamaIndex's MultiModalLLMCompletionProgram takes the output parser, image data, prompt, and GenAI model and returns our desired response from the Gemini Pro Vision model.

To extract information from the image, we have to engineer a prompt for this purpose.

prompt_template_str = """
                You are an expert system designed to extract products from images for
                an e-commerce application. Please provide the product name, product color,
                product category and a descriptive query to search for the product.
                Accurately identify every product in an image and provide a descriptive
                query to search for the product. You just return a correctly formatted
                JSON object with the product name, product color, product category and
                description for each product in the image
"""

With this prompt, we instruct the model that it is an expert system that can extract information from an image. It will extract the information below from the given image input.

Name of product
Color
Category
Description

This prompt will be passed as an argument to the gemini_extractor function above later.

We all know that a Generative AI model can often produce undesired responses. This is a problem when working with a generative AI model because it will not always follow the prompt. To mitigate this kind of issue, Pydantic comes into the picture. FastAPI itself uses Pydantic to validate its API models.

Creating a Product model using Pydantic

class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str


class ExtractedProductsResponse(BaseModel):
    products: List[Product]

class ImageRequest(BaseModel):
    url: str

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "url": "https://images.pexels.com/photos/356056/pexels-photo-356056.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
                }
            ]
        }
    )

The Product class above defines a data model for a product, the ExtractedProductsResponse class represents a response structure that contains a list of these products, and the ImageRequest class describes the input image that clients provide. We use Pydantic for data validation and serialization to ensure the structural integrity of the data.
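To see what this validation does in practice, here is a small sketch that checks two hand-written JSON strings against a trimmed copy of the models, with fields for the product name, color, category, and description. It assumes Pydantic v2 (which ConfigDict implies); the good and bad strings are hypothetical stand-ins for what a model might return, not real Gemini output.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str


class ExtractedProductsResponse(BaseModel):
    products: List[Product]


# Hypothetical raw outputs standing in for what a model might return
good = (
    '{"products": [{"id": 1, "product_name": "Mug", "color": "white", '
    '"category": "Kitchen", "description": "white ceramic coffee mug"}]}'
)
bad = '{"products": [{"id": "first", "product_name": "Mug"}]}'

parsed = ExtractedProductsResponse.model_validate_json(good)
print(parsed.products[0].product_name)

try:
    ExtractedProductsResponse.model_validate_json(bad)
    rejected = False
except ValidationError as err:
    rejected = True
    print(f"rejected with {err.error_count()} validation errors")
```

The malformed payload never reaches the rest of the application; it is rejected at the boundary with a list of the fields that failed.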

all_products = []

@app.post("/extracted_products")
def extracted_products(image_request: ImageRequest):
    responses = gemini_extractor(
        model_name="models/gemini-pro-vision",
        output_class=ExtractedProductsResponse,
        image_documents=load_image_urls([image_request.url]),
        prompt_template_str=prompt_template_str,
    )
    all_products.append(responses)
    return responses

In the code snippet above, we create an endpoint in the FastAPI application using the POST method with the decorator @app.post("/extracted_products"), which processes the requested image to extract product information. The extracted_products function handles requests to this endpoint. It takes the image_request parameter of type ImageRequest.

We call the gemini_extractor function we created previously for information extraction, and the response is stored in the all_products list. We use a built-in Python list to store the responses for simplicity. You can add database logic to store the responses in a database. MongoDB would be a good choice for storing this type of JSON data.
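As one possible shape for that database logic, here is a minimal sketch that stores each response as a JSON string. It uses SQLite only because the sqlite3 module ships with Python; it is a swapped-in illustration, not the MongoDB setup suggested above. The table name, column names, and helper functions are all hypothetical.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extractions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, image_url TEXT, payload TEXT)"
    )
    return conn


def save_extraction(conn, image_url: str, payload: dict) -> int:
    """Store one extraction result as JSON text and return its row id."""
    cur = conn.execute(
        "INSERT INTO extractions (image_url, payload) VALUES (?, ?)",
        (image_url, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def load_extractions(conn) -> list:
    """Return every stored payload, oldest first, as Python dicts."""
    rows = conn.execute("SELECT payload FROM extractions ORDER BY id")
    return [json.loads(payload) for (payload,) in rows]


conn = open_store()
save_extraction(
    conn,
    "https://example.com/shoe.jpg",
    {"products": [{"id": 1, "product_name": "Sneaker"}]},
)
print(load_extractions(conn))
```

With a Pydantic v2 model, the dictionary to store could be obtained by calling model_dump() on the response object.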

Requesting an image from the OpenAPI docs

Go to http://127.0.0.1:8000/docs in your browser; you will see the OpenAPI docs

Expand /extracted_products and click Try It Out on the right

Then click Execute, and it will extract the product information from the image using the Gemini Vision Pro model.

The response from the POST method will look like this.

In the image above, you can see the requested URL and the response body, which is the generated response from the Gemini model
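The same request can also be sent from a script instead of the docs page. The sketch below uses only the standard library; build_extract_request and extract_products are hypothetical helper names, and the example image URL is a placeholder. Building the request works offline, while sending it needs the Uvicorn server from this article to be running.

```python
import json
import urllib.request


def build_extract_request(image_url: str, base_url: str = "http://127.0.0.1:8000"):
    """Build (but do not send) the POST request for /extracted_products."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/extracted_products",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_products(image_url: str) -> dict:
    """Send the request; requires the local server to be running."""
    with urllib.request.urlopen(build_extract_request(image_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_extract_request("https://example.com/shoe.jpg")
print(req.get_method(), req.full_url)
```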

Creating a product endpoint with a GET method for fetching the data

@app.get("/api/products/", response_model=list[ExtractedProductsResponse])
async def get_all_products_api():
    return all_products

Go to http://127.0.0.1:8000/api/products to see all the products

In the code above, we created an endpoint to fetch the extracted data stored in the all_products list (or in a database, if you add one). Others can use this JSON data for their own products, such as e-commerce sites.
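A consumer of this endpoint will usually want one flat list of products rather than a list of responses. Here is a small sketch of that step, assuming each stored response is a JSON object with a products list; the sample payload and the flatten_products helper are hypothetical.

```python
from collections import Counter


def flatten_products(responses: list) -> list:
    """Merge the `products` lists of every stored response into one list."""
    return [
        product
        for response in responses
        for product in response.get("products", [])
    ]


# Hypothetical payload shaped like the output of GET /api/products/
sample = [
    {"products": [{"id": 1, "product_name": "Sneaker", "category": "Footwear"}]},
    {"products": [
        {"id": 1, "product_name": "Mug", "category": "Kitchen"},
        {"id": 2, "product_name": "Kettle", "category": "Kitchen"},
    ]},
]

catalog = flatten_products(sample)
by_category = Counter(p["category"] for p in catalog)
print(len(catalog), dict(by_category))
```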

All the code used in this article is in the GitHub repository Gemini-Product-Application

Conclusion

This is a simple yet systematic way to access and use the Gemini multimodal model to build a minimum viable product discovery API. You can use this technique to build a more robust product discovery system directly from an image. This type of application has very useful business potential, e.g., an Android application that uses the camera API to take photos and the Gemini API to extract product information from that image, which can then be used to buy products directly from Amazon, Flipkart, and so on.

Key Takeaways

The Representational State Transfer architecture for building high-performance APIs for business.
LlamaIndex has an API to connect to different GenAI models. In this article, we learned how to use LlamaIndex with the Gemini API to extract image data using the Gemini Vision Pro model.
Many Python frameworks, such as Flask, Django, and FastAPI, are used to build REST APIs. We learned how to use FastAPI to build a robust REST API.
Prompt engineering to get the expected response from the Gemini model.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Frequently Asked Questions

Q1. How to use LlamaIndex with different models?

A: LlamaIndex uses OpenAI by default, but if you want to use other models such as Cohere, Gemini, Llama2, Ollama, or MistralAI, you can install a model-specific library using pip. See use cases here.

Q2. How to use frontend frameworks such as NextJS, Vite, and React with FastAPI?

A: You can use any UI framework you want with FastAPI. You have to create a frontend directory and a backend directory for the API application in the FastAPI root and link your frontend UI with the backend. See the Full Stack application here.

Q3. Which database is good for storing responses?

A: The responses from the model are JSON, so any document database such as MongoDB would be good for storing the responses and retrieving the data for the application. See MongoDB with FastAPI here.
