BaseLVM API Reference

class lmitf.base_lvm.BaseLVM(api_key: str | None = None, base_url: str | None = None)[source]

Bases: object

Wrapper class for the OpenAI LVM (Language Vision Model) client

Provides a simplified interface to the OpenAI Vision API, with support for image processing and text generation. Handles environment-variable configuration automatically and maintains a history of API calls.

client

OpenAI image-processing client instance

Type:

openai.Image

call_history

History of API call responses

Type:

list[str | dict[str, Any]]

__init__(api_key: str | None = None, base_url: str | None = None)[source]

Initialize the LVM client

Parameters:
  • api_key (str, optional) – OpenAI API key. If not provided, it is read from the OPENAI_API_KEY environment variable

  • base_url (str, optional) – API base URL. If not provided, it is read from the OPENAI_BASE_URL environment variable

create(prompt: str, model: str = 'gpt-image-1', size: str = '1024x1024') Image[source]

Generate an image from a text prompt. Returns the first generated image as a PIL Image.

edit(image: Image, prompt: str, mask: Image | None = None, model: str = 'gpt-image-1', size: str = '1024x1024') Image[source]

Edit an existing image with a prompt and optional mask.

The image and mask (if provided) are sent as file-like objects. Returns the first edited image as a PIL Image.
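A minimal create-then-edit round trip might look like the sketch below. It assumes credentials are configured via environment variables; the prompts and output file names are illustrative placeholders, not part of the lmitf API.

```python
def create_and_edit_demo() -> None:
    """Sketch: generate an image with create(), then revise it with edit().

    Assumes OPENAI_API_KEY (and optionally OPENAI_BASE_URL) are set.
    """
    from lmitf import BaseLVM  # deferred import: requires lmitf installed

    lvm = BaseLVM()

    # create() returns the first generated image as a PIL Image
    base = lvm.create(
        prompt="A watercolor fox in a snowy forest",
        model="gpt-image-1",
        size="1024x1024",
    )
    base.save("fox.png")

    # edit() takes the PIL Image (and an optional mask) plus a new prompt
    revised = lvm.edit(
        image=base,
        prompt="Add northern lights to the sky",
        model="gpt-image-1",
        size="1024x1024",
    )
    revised.save("fox_aurora.png")
```

The function is not invoked here because it performs live API calls.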

class lmitf.base_lvm.AgentLVM(api_key: str | None = None, base_url: str | None = None)[source]

Bases: object

__init__(api_key: str | None = None, base_url: str | None = None)[source]

Initialize the Agent LVM client

Parameters:
  • api_key (str, optional) – OpenAI API key. If not provided, it is read from the OPENAI_API_KEY environment variable

  • base_url (str, optional) – API base URL. If not provided, it is read from the OPENAI_BASE_URL environment variable

create(msg: list[dict], model: str = 'gpt-4o') Image[source]

Generate an image from a list of message dicts. Returns the first generated image as a PIL Image.

edit(prompt: str, image: Image | list[Image], model: str = 'gpt-4o') Image[source]

Edit an existing image with a prompt.

The image is sent as a file-like object. Returns the first edited image as a PIL Image.
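The expected shape of the `msg` parameter is not spelled out above. Assuming it follows the OpenAI chat-completions message format (an assumption, not confirmed by this reference), a text-plus-image request could be assembled like this; the helper name and the data-URL convention are illustrative:

```python
import base64


def image_message(prompt: str, image_bytes: bytes) -> list[dict]:
    """Build a chat-style message list pairing text with an inline image.

    Assumption: AgentLVM.create accepts OpenAI chat-completions messages,
    with images passed as base64 data URLs.
    """
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


msg = image_message("Redraw this sketch in oil-paint style", b"\x89PNG...")
# msg could then be passed to AgentLVM().create(msg, model="gpt-4o")
```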

Overview

The BaseLVM class provides an interface for working with Large Vision Models (LVMs) that can process both text and images. It’s designed for multimodal AI tasks that require understanding visual content.

Key Features

  • Multimodal Processing: Handle both text and image inputs

  • Multiple Image Support: Process multiple images in a single request

  • Flexible Image Input: Support for file paths, URLs, and PIL Image objects

  • Automatic Encoding: Handle base64 encoding transparently

  • OpenAI Integration: Built on OpenAI’s vision models
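The “automatic encoding” feature above can be sketched with the standard library alone: a local file is read and wrapped as the kind of base64 data URL vision APIs accept. The helper name and MIME handling are illustrative, not lmitf’s actual internals.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```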

Usage Examples

Basic Image Analysis

from lmitf import BaseLVM

lvm = BaseLVM()

# Analyze a single image
response = lvm.call(
    messages="What do you see in this image?",
    image_path="path/to/image.jpg"
)
print(response)

Multiple Image Analysis

# Compare multiple images
images = ["image1.jpg", "image2.jpg", "image3.jpg"]
response = lvm.call(
    messages="Compare these images and describe the differences",
    image_path=images
)
print(response)

Image with Complex Prompt

# Detailed analysis with specific instructions
prompt = """
Analyze this image and provide:
1. A description of the main objects
2. The color palette used
3. Any text visible in the image
4. The overall mood or atmosphere
"""

response = lvm.call(
    messages=prompt,
    image_path="screenshot.png",
    model="gpt-4-vision-preview"
)
print(response)

Image Generation

# Generate an image using the create method; it returns a PIL Image
image = lvm.create(
    prompt="A futuristic city skyline at sunset",
    model="gpt-image-1",
    size="1024x1024"
)
image.save("skyline.png")

Supported Image Formats

  • JPEG (.jpg, .jpeg)

  • PNG (.png)

  • GIF (.gif)

  • WebP (.webp)

  • BMP (.bmp)
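A small pre-flight guard based on the list above can reject unsupported files before any API call is made. This is a convenience sketch, not part of the lmitf API.

```python
from pathlib import Path

# Extensions from the supported-formats list above
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```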

Input Methods

File Paths

# Single image file
response = lvm.call("Describe this image", image_path="photo.jpg")

# Multiple image files  
response = lvm.call("Compare these", image_path=["img1.jpg", "img2.png"])

PIL Image Objects

from PIL import Image

# Load image with PIL
img = Image.open("photo.jpg")
response = lvm.call("What's in this image?", image_path=img)

URLs (if supported by model)

# Remote image URL
response = lvm.call(
    "Analyze this image",
    image_path="https://example.com/image.jpg"
)

Method Reference

call()

Main method for vision-language tasks.

Parameters:

  • messages (str | list): Text prompt or conversation

  • image_path (str | list | PIL.Image): Image(s) to analyze

  • model (str): Vision model to use

  • max_tokens (int): Maximum response length

  • **kwargs: Additional API parameters

Returns:

  • str: Generated response describing the image(s)

create()

Generate an image from a text prompt.

Parameters:

  • prompt (str): Text description of the desired image

  • model (str): Image-generation model to use (default: "gpt-image-1")

  • size (str): Image dimensions (e.g., “1024x1024”)

Returns:

  • PIL.Image: The first generated image

Configuration

Environment Setup

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"

Manual Configuration

lvm = BaseLVM(
    api_key="your-api-key",
    base_url="https://your-endpoint.com/v1"
)

Error Handling

try:
    response = lvm.call("Describe this", image_path="missing.jpg")
except FileNotFoundError:
    print("Image file not found")
except Exception as e:
    print(f"API Error: {e}")

Best Practices

  1. Image Quality: Use high-quality images for better analysis

  2. File Size: Keep images under 20MB for optimal performance

  3. Batch Processing: Process multiple related images together

  4. Clear Prompts: Be specific about what you want to analyze

  5. Error Handling: Always handle file and network errors
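The size guideline in item 2 can be enforced with a quick check before uploading; the helper name and error message are illustrative.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MB guideline from item 2 above


def check_image_size(path: str) -> None:
    """Raise ValueError if the image exceeds the 20 MB guideline."""
    size = Path(path).stat().st_size
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"{path} is {size} bytes; keep images under 20 MB")
```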

Common Use Cases

Document Analysis

# OCR and document understanding
response = lvm.call(
    "Extract all text from this document and summarize the key points",
    image_path="document.png"
)

Visual QA

# Answer questions about images
response = lvm.call(
    "How many people are in this photo and what are they doing?",
    image_path="group_photo.jpg"
)

Chart/Graph Analysis

# Analyze data visualizations
response = lvm.call(
    "What trends do you see in this chart? Provide key insights.",
    image_path="sales_chart.png"
)