BaseLVM API Reference

class lmitf.base_lvm.BaseLVM(api_key: str | None = None, base_url: str | None = None)[source]

Bases: object

Wrapper class for the OpenAI LVM (Language Vision Model) client

Provides a simplified interface to the OpenAI Vision API, with support for image processing and text generation. Handles environment-variable configuration automatically and maintains a history of API calls.

client

OpenAI image-processing client instance

Type:

openai.Image

call_history

History of API call responses

Type:

list[str | dict[str, Any]]

__init__(api_key: str | None = None, base_url: str | None = None)[source]

Initialize the LVM client

Parameters:
  • api_key (str, optional) – OpenAI API key. If not provided, it is read from the OPENAI_API_KEY environment variable

  • base_url (str, optional) – API base URL. If not provided, it is read from the OPENAI_BASE_URL environment variable

create(prompt: str, model: str = 'gpt-image-1', size: str = '1024x1024') Image[source]

Generate an image from a text prompt. Returns the first generated image as a PIL Image.

edit(image: Image, prompt: str, mask: Image | None = None, model: str = 'gpt-image-1', size: str = '1024x1024') Image[source]

Edit an existing image with a prompt and optional mask.

The image and mask (if provided) are sent as file-like objects. Returns the first edited image as a PIL Image.
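A minimal create-then-edit round trip might look like the sketch below. It assumes credentials are configured via environment variables; the prompts and output file names are illustrative placeholders, not part of the lmitf API.

```python
def create_and_edit_demo() -> None:
    """Sketch: generate an image with create(), then revise it with edit().

    Assumes OPENAI_API_KEY (and optionally OPENAI_BASE_URL) are set.
    """
    from lmitf import BaseLVM  # deferred import: requires lmitf installed

    lvm = BaseLVM()

    # create() returns the first generated image as a PIL Image
    base = lvm.create(
        prompt="A watercolor fox in a snowy forest",
        model="gpt-image-1",
        size="1024x1024",
    )
    base.save("fox.png")

    # edit() takes the PIL Image (and an optional mask) plus a new prompt
    revised = lvm.edit(
        image=base,
        prompt="Add northern lights to the sky",
        model="gpt-image-1",
        size="1024x1024",
    )
    revised.save("fox_aurora.png")
```

The function is not invoked here because it performs live API calls.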

class lmitf.base_lvm.AgentLVM(api_key: str | None = None, base_url: str | None = None)[source]

Bases: object

__init__(api_key: str | None = None, base_url: str | None = None)[source]

Initialize the Agent LVM client

Parameters:
  • api_key (str, optional) – OpenAI API key. If not provided, it is read from the OPENAI_API_KEY environment variable

  • base_url (str, optional) – API base URL. If not provided, it is read from the OPENAI_BASE_URL environment variable

create(msg: list[dict], model: str = 'gpt-4o') Image[source]

Generate an image from a list of message dicts. Returns the first generated image as a PIL Image.

edit(prompt: str, image: Image | list[Image], model: str = 'gpt-4o') Image[source]

Edit an existing image with a prompt.

The image is sent as a file-like object. Returns the first edited image as a PIL Image.
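The expected shape of the `msg` parameter is not spelled out above. Assuming it follows the OpenAI chat-completions message format (an assumption, not confirmed by this reference), a text-plus-image request could be assembled like this; the helper name and the data-URL convention are illustrative:

```python
import base64


def image_message(prompt: str, image_bytes: bytes) -> list[dict]:
    """Build a chat-style message list pairing text with an inline image.

    Assumption: AgentLVM.create accepts OpenAI chat-completions messages,
    with images passed as base64 data URLs.
    """
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


msg = image_message("Redraw this sketch in oil-paint style", b"\x89PNG...")
# msg could then be passed to AgentLVM().create(msg, model="gpt-4o")
```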

Overview

The BaseLVM class provides an interface for working with Large Vision Models (LVMs) that can process both text and images. It’s designed for multimodal AI tasks that require understanding visual content.

Key Features

  • Multimodal Processing: Handle both text and image inputs

  • Multiple Image Support: Process multiple images in a single request

  • Flexible Image Input: Support for file paths, URLs, and PIL Image objects

  • Automatic Encoding: Handle base64 encoding transparently

  • OpenAI Integration: Built on OpenAI’s vision models
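The “automatic encoding” feature above can be sketched with the standard library alone: a local file is read and wrapped as the kind of base64 data URL vision APIs accept. The helper name and MIME handling are illustrative, not lmitf’s actual internals.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```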

Usage Examples

Basic Image Analysis

from lmitf import BaseLVM

lvm = BaseLVM()

# Analyze a single image
response = lvm.call(
    messages="What do you see in this image?",
    image_path="path/to/image.jpg"
)
print(response)

Multiple Image Analysis

# Compare multiple images
images = ["image1.jpg", "image2.jpg", "image3.jpg"]
response = lvm.call(
    messages="Compare these images and describe the differences",
    image_path=images
)
print(response)

Image with Complex Prompt

# Detailed analysis with specific instructions
prompt = """
Analyze this image and provide:
1. A description of the main objects
2. The color palette used
3. Any text visible in the image
4. The overall mood or atmosphere
"""

response = lvm.call(
    messages=prompt,
    image_path="screenshot.png",
    model="gpt-4-vision-preview"
)
print(response)

Image Generation

# Generate an image using the create method; it returns a PIL Image
image = lvm.create(
    prompt="A futuristic city skyline at sunset",
    model="gpt-image-1",
    size="1024x1024"
)
image.save("skyline.png")

Supported Image Formats

  • JPEG (.jpg, .jpeg)

  • PNG (.png)

  • GIF (.gif)

  • WebP (.webp)

  • BMP (.bmp)
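A small pre-flight guard based on the list above can reject unsupported files before any API call is made. This is a convenience sketch, not part of the lmitf API.

```python
from pathlib import Path

# Extensions from the supported-formats list above
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```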

Input Methods

File Paths

# Single image file
response = lvm.call("Describe this image", image_path="photo.jpg")

# Multiple image files  
response = lvm.call("Compare these", image_path=["img1.jpg", "img2.png"])

PIL Image Objects

from PIL import Image

# Load image with PIL
img = Image.open("photo.jpg")
response = lvm.call("What's in this image?", image_path=img)

URLs (if supported by model)

# Remote image URL
response = lvm.call(
    "Analyze this image",
    image_path="https://example.com/image.jpg"
)

Method Reference

call()

Main method for vision-language tasks.

Parameters:

  • messages (str | list): Text prompt or conversation

  • image_path (str | list | PIL.Image): Image(s) to analyze

  • model (str): Vision model to use

  • max_tokens (int): Maximum response length

  • **kwargs: Additional API parameters

Returns:

  • str: Generated response describing the image(s)

create()

Generate an image from a text prompt.

Parameters:

  • prompt (str): Text description of the desired image

  • model (str): Image-generation model to use (default: "gpt-image-1")

  • size (str): Image dimensions (e.g., “1024x1024”)

Returns:

  • PIL.Image: The first generated image

Configuration

Environment Setup

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"

Manual Configuration

lvm = BaseLVM(
    api_key="your-api-key",
    base_url="https://your-endpoint.com/v1"
)

Error Handling

try:
    response = lvm.call("Describe this", image_path="missing.jpg")
except FileNotFoundError:
    print("Image file not found")
except Exception as e:
    print(f"API Error: {e}")

Best Practices

  1. Image Quality: Use high-quality images for better analysis

  2. File Size: Keep images under 20MB for optimal performance

  3. Batch Processing: Process multiple related images together

  4. Clear Prompts: Be specific about what you want to analyze

  5. Error Handling: Always handle file and network errors
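The size guideline in item 2 can be enforced with a quick check before uploading; the helper name and error message are illustrative.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MB guideline from item 2 above


def check_image_size(path: str) -> None:
    """Raise ValueError if the image exceeds the 20 MB guideline."""
    size = Path(path).stat().st_size
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {size} bytes; keep images under 20 MB")
```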

Common Use Cases

Document Analysis

# OCR and document understanding
response = lvm.call(
    "Extract all text from this document and summarize the key points",
    image_path="document.png"
)

Visual QA

# Answer questions about images
response = lvm.call(
    "How many people are in this photo and what are they doing?",
    image_path="group_photo.jpg"
)

Chart/Graph Analysis

# Analyze data visualizations
response = lvm.call(
    "What trends do you see in this chart? Provide key insights.",
    image_path="sales_chart.png"
)