Gen AI Powered Data Insight Generation Using LIDA

June 18, 2024


Introduction

This article introduces LIDA, an open-source Python library for generating data visualizations and infographics. We will first look at how LIDA works and what its core capabilities are, and then see it in action by building a Streamlit application that lets the user explore a provided CSV dataset and automatically surface insights through generated data visualizations.

Learning Objectives

Understand the challenges associated with manual data exploration and analysis.
Explore the LIDA architecture and its core building blocks.
Learn to build a fully functional Streamlit application for automated data exploration and insight generation.

This article was published as a part of the Data Science Blogathon.

Challenges with Manual Data Exploration

Manual data exploration is a labor-intensive process that demands significant time and effort to clean, analyze, and visualize data. Analysts often have to sift through large datasets, which increases the likelihood of human error and of overlooked patterns or insights. The manual approach can also be inconsistent, because it relies heavily on the individual skills and expertise of the analyst, making it difficult to reproduce results or scale the process to larger datasets.

Automating data exploration speeds up the analysis process and helps produce more consistent and comprehensive insights. Automation tools like LIDA streamline data visualization and insight generation, allowing users to focus on decision-making and strategic planning.

What is LIDA and How Does it Work?

LIDA is an open-source Python library for generating data visualizations and infographics. LIDA is grammar agnostic and can work with any programming language. It also supports multiple visualization libraries such as matplotlib and seaborn.

LIDA consists of the following four key modules, which work together in sequence to generate data visualizations and infographics automatically:

Summarizer

Function: Converts datasets into a rich but compact natural language representation (context).
Process: Uses rules and large language models (LLMs) to analyze the dataset.

Goal Explorer

Function: Generates a set of potential “goals” based on the dataset context.
Process: Uses LLMs to interpret the context and suggest relevant visualization goals.

Viz Generator

Function: Generates, evaluates, repairs, filters, and executes visualization code to meet specified goals.
Process: Leverages LLMs to create visualization code in the appropriate programming language or grammar.

Infographer

Function: Generates stylized infographics based on the visualization and style prompts.
Process: Applies image generation models (IGMs) to transform visualizations into styled infographics.
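
As a rough illustration of the rule-based half of the Summarizer, the sketch below derives a compact per-column description from CSV text using only the Python standard library. It is not LIDA's implementation: the function name summarize_columns and the output fields (column, dtype, num_unique, samples) are assumptions chosen for illustration. In a tool like LIDA, an LLM can then enrich a summary of this kind with natural language descriptions.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize_columns(csv_text, max_samples=3):
    """Build a compact, rule-based summary of each column in a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = []
    for name in reader.fieldnames or []:
        # Ignore empty cells so they do not distort the type inference.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        summary.append({
            "column": name,
            "dtype": "number" if is_numeric else "string",
            "num_unique": len(set(values)),
            "samples": values[:max_samples],
        })
    return summary


if __name__ == "__main__":
    demo = "name,mpg\nA,21.5\nB,30\nA,18\n"
    for column in summarize_columns(demo):
        print(column)
```

A summary like this is small enough to pass to an LLM as context, which is the reason for compacting the dataset before asking for goals or code.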

Now that we are familiar with the building blocks of LIDA and their respective functions, let’s see how these blocks integrate and work together in a single workflow:

Dataset Input: The user provides a CSV dataset (e.g., Cars.csv).
Summarization: The Summarizer processes the dataset and generates a natural language context.
Goal Exploration: The Goal Explorer uses the context to suggest possible visualization goals.
Visualization Generation: The Viz Generator creates and executes code to produce visualizations based on the selected goals.
Infographic Creation: The Infographer transforms these visualizations into styled infographics according to user-defined prompts.
Output Delivery: The system outputs a natural language summary, suggested goals, visualization code, and the final stylized infographics.

This integrated approach streamlines data exploration, visualization, and infographic creation, making the process efficient and user-friendly.
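
The first three stages of this workflow can be sketched as a single function that chains the calls together. This is a minimal sketch, not part of LIDA: the function name explore_dataset is hypothetical, and the manager argument is assumed to expose summarize, goals, and visualize methods like the ones used by the application later in this article. The infographic stage is left out.

```python
def explore_dataset(manager, data, n_goals=3, library="seaborn"):
    """Chain summarize -> goals -> visualize and collect the results.

    `manager` is assumed to behave like a LIDA Manager; it is passed in
    so the function has no hard dependency on any particular LLM provider.
    """
    # Stage 1: compact natural language context for the dataset.
    summary = manager.summarize(data)
    # Stage 2: candidate visualization goals derived from that context.
    goals = manager.goals(summary, n=n_goals)
    # Stage 3: one visualization request per goal.
    results = []
    for goal in goals:
        charts = manager.visualize(summary=summary, goal=goal, library=library)
        results.append((goal, charts))
    return summary, results
```

With LIDA installed and an API key configured, the same function could be called with Manager(text_gen=llm("openai")) as the manager and a pandas DataFrame as the data.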

Building an Application for Automatic Insight Generation

Now that we have a fair idea of LIDA and how it functions, let’s roll up our sleeves and build a Streamlit application that accepts a CSV dataset as input and then uses LIDA to generate data visualizations automatically.

Step 1: Install Python Libraries

First, let’s install the Python libraries required for our application. We’ll create a requirements.txt file with the following set of libraries:

Python Library	Description/Use case
uvicorn	A lightning-fast ASGI server for running Python web applications
streamlit	An open-source app framework for creating and sharing custom web apps
pandas	A data manipulation and analysis library providing data structures like DataFrames
lida	A toolkit for generating data visualizations and data-faithful infographics
python-dotenv	Loads environment variables, such as API keys, from a .env file

Then install all the libraries by running the command “pip install -r requirements.txt”.
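
After installation, a quick check that the packages can be found may save a confusing error later. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and the list holds import names rather than package names (python-dotenv is imported as dotenv).

```python
from importlib.util import find_spec

# Import names for the packages listed in requirements.txt.
# python-dotenv is imported as "dotenv".
REQUIRED_MODULES = ["uvicorn", "streamlit", "pandas", "lida", "dotenv"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-running the pip command inside the same virtual environment that will run the app is usually the first thing to try.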

Step 2: Integrating LIDA with an LLM

Next, we need to integrate LIDA with an LLM that will be used to summarize the dataset, create goals, and then generate and execute visualization code. LIDA is flexible and integrates with multiple large language model providers, including OpenAI, Azure OpenAI, PaLM, Cohere, and Huggingface. For our application, we will use the GPT-3.5 Turbo model by OpenAI, and for that we need an OpenAI API key.

To generate an API key, first create an OpenAI account or sign in. Next, navigate to the API key page and click “Create new secret key”, optionally naming the key. Make sure to save the key somewhere safe and don’t share it with anyone.

Once we have the API key, create a .env file and save the key there under the name OPENAI_API_KEY, which is the variable the application reads.
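
A missing or empty key tends to surface later as an opaque authentication error, so it can be worth failing early with a clear message. The sketch below is one way to do that with the standard library; the helper name require_env is hypothetical. It assumes the .env file contains a line of the form OPENAI_API_KEY=your-key and that load_dotenv() from python-dotenv has already been called, so the value is present in the process environment.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value


if __name__ == "__main__":
    # In the application, call load_dotenv() before this line.
    try:
        require_env("OPENAI_API_KEY")
        print("OPENAI_API_KEY found.")
    except RuntimeError as error:
        print(error)
```

Keeping the key in .env rather than in app.py also makes it easier to keep it out of version control.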

Step 3: Streamlit Application Logic

Finally, we will create the app.py file containing the Streamlit application logic and the LIDA API calls.

import streamlit as st
import pandas as pd
from lida import Manager, TextGenerationConfig, llm
from PIL import Image
from io import BytesIO
import base64
from dotenv import load_dotenv
import os
import openai

# Configure the OpenAI API key from the .env file
load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')

# Convert charts into images so that they can be displayed on the Streamlit front end
def base64_to_image(base64_string):
    # Decode the base64 string
    byte_data = base64.b64decode(base64_string)
    # Use BytesIO to convert the byte data to an image
    return Image.open(BytesIO(byte_data))

# Streamlit app code
st.set_page_config(
    page_title="Automatic Insights and Visualization App",
    page_icon="🤖",
    layout="centered",
    initial_sidebar_state="expanded",
)

st.header("Automatic Insights and Visualization 🤖")

menu = st.sidebar.selectbox("Choose an Option", ["Automatic Insights"])

if menu == "Automatic Insights":
    st.subheader("Generate Automatic Insights")
    # Upload a CSV dataset as input
    uploaded_file = st.file_uploader("Choose a csv file")
    if uploaded_file is not None:
        dataframe = pd.read_csv(uploaded_file)
        st.write(dataframe)
        btn = st.button("Generate Suggestions", type="primary")

        if btn:
            # Summarize the dataset and generate goals using LIDA
            lida = Manager(text_gen=llm("openai"))
            textgen_config = TextGenerationConfig(n=1,
                                                  temperature=0.5,
                                                  model="gpt-3.5-turbo-0301",
                                                  use_cache=True)
            summary = lida.summarize(dataframe,
                                     summary_method="default",
                                     textgen_config=textgen_config)
            goals = lida.goals(summary, n=5, textgen_config=textgen_config)

            # Create the corresponding data visualization for each goal
            library = "seaborn"
            imgs = []
            textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
            for goal in goals:
                charts = lida.visualize(summary=summary,
                                        goal=goal,
                                        textgen_config=textgen_config,
                                        library=library)
                # No chart may be returned if the generated code fails to execute
                imgs.append(base64_to_image(charts[0].raster) if charts else None)

            # One tab per goal, showing the goal's question and its chart
            tabs = st.tabs([f"Goal {i + 1}" for i in range(len(goals))])
            for i, tab in enumerate(tabs):
                with tab:
                    st.header(f"Goal {i + 1}")
                    st.write(goals[i].question)
                    if imgs[i] is not None:
                        st.image(imgs[i])
                    else:
                        st.warning("No chart could be generated for this goal.")

Once all the files are ready, you can run the Streamlit application using the command “streamlit run app.py”.

Conclusion

We explored the challenges associated with manual data exploration and how tools like LIDA help streamline the process by providing a flexible and fully automatic solution for data exploration and insight generation. We also gained an understanding of the LIDA system architecture and its core capabilities. Finally, we saw LIDA in action by building an automatic insight generation application using Streamlit.

Here is the link for the video depicting the final application and its working.

Key Takeaways

Whether you’re working with Matplotlib or Seaborn, Python or any other programming language, LIDA fits right into your workflow.
Leverage the latest language models to generate intelligent insights and recommendations for your data.
No steep learning curves here. LIDA is designed to be intuitive and easy to use, so you can focus on what matters to you and the business: making data-driven decisions.
Automating data exploration speeds up the analysis process and helps produce more consistent and comprehensive insights.

Frequently Asked Questions

Q1. What are the different LLM providers supported by LIDA?

A. LIDA supports multiple large language model providers, including OpenAI, Azure OpenAI, PaLM, Cohere, and Huggingface.

Q2. Is an API key required to work with LIDA?

A. LIDA is an open-source library that you install and run locally, so LIDA itself does not require an API key. However, you may need an API key for the LLM that you use with LIDA. For example, you will need an OpenAI API key if you are using a model like GPT-3.5 Turbo.

Q3. Does LIDA support query-based visualization generation?

A. Yes. Instead of relying on LIDA for goal generation, a user can explicitly provide the query/goal and generate the desired chart. LIDA also supports multilingual input.
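
A query-based call might look like the sketch below. It is hedged on one assumption that should be checked against the installed LIDA version: that visualize accepts a plain string in place of a generated goal. The wrapper name visualize_query is hypothetical, and manager is assumed to behave like the LIDA Manager used in app.py.

```python
def visualize_query(manager, summary, query, library="seaborn"):
    """Generate charts for a user-supplied question instead of a generated goal."""
    query = query.strip()
    if not query:
        raise ValueError("query must be a non-empty string")
    # Assumption: the manager accepts a free-text question as the goal.
    return manager.visualize(summary=summary, goal=query, library=library)
```

In the Streamlit app, the query could come from a text input, with the dataset summary computed once and reused for each question.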

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
