
Getting Started

What is Iris?

Iris is an advanced browser automation and Robotic Process Automation (RPA) platform that combines visual recognition, AI-driven analysis, and browser control capabilities. It allows you to:

  • Automate repetitive browser tasks with intelligent processing

  • Record and replay browser sessions with precise timing

  • Process videos of browser interactions for RPA analysis

  • Parameterize workflows for data-driven automation

  • Access browser sessions remotely through VNC interfaces

The platform is designed for both developers building automation tools and end-users who need to automate repetitive web tasks without coding.

Key Features

  • Browser Automation: Control web browsers programmatically through an API

  • Session Recording: Capture and replay user interactions with browsers

  • Video Processing: Analyze recorded sessions with AI vision models

  • RPA Execution: Convert recorded sessions into replayable automation workflows

  • Parameterization: Create dynamic workflows with variable inputs

  • Remote Access: View and control automated browsers through VNC

  • RESTful API: Comprehensive API with OpenAPI documentation

Prerequisites

  • Node.js 20.x or later

  • PNPM 8.x or later (recommended package manager)

  • Docker and Docker Compose (for containerized setup)

Installation

Local Development Setup

  1. Clone the repository:

git clone <repository-url>
cd iris

  2. Install dependencies:

pnpm install

  3. Set up environment variables:

cp .env.example .env

  4. Edit the .env file with your configuration:

# Server Configuration
PORT=3000
HOST=0.0.0.0

# VLM Configuration
VLM_BASE_URL=...           # Visual Language Model API URL
VLM_API_KEY=...            # API key for VLM service
VLM_MODEL_NAME=tgi
VLM_PROVIDER=ui_tars_1_5

# Application Settings
LANGUAGE=en
MAX_LOOP_COUNT=10
LOOP_INTERVAL_MS=1000
DEFAULT_OPERATOR=browser
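
The settings above can be read and validated once at startup. The following is a minimal sketch, not Iris's actual configuration code: the `IrisConfig` shape and the `loadConfig` helper are hypothetical names, the fallback values simply mirror the example .env file, and treating the two VLM connection settings as required is an assumption.

```typescript
// Hypothetical sketch: parse and validate the .env values shown above.
// IrisConfig and loadConfig are illustrative names, not part of Iris.
interface IrisConfig {
  port: number;
  host: string;
  vlmBaseUrl: string;
  vlmApiKey: string;
  vlmModelName: string;
  vlmProvider: string;
  language: string;
  maxLoopCount: number;
  loopIntervalMs: number;
  defaultOperator: string;
}

type Env = Record<string, string | undefined>;

// Read a variable that must be present, failing fast when it is missing.
function required(env: Env, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Read an integer variable, falling back to a default when it is unset.
function intOrDefault(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): IrisConfig {
  return {
    port: intOrDefault(env, "PORT", 3000),
    host: env["HOST"] ?? "0.0.0.0",
    vlmBaseUrl: required(env, "VLM_BASE_URL"), // assumed to be mandatory
    vlmApiKey: required(env, "VLM_API_KEY"), // assumed to be mandatory
    vlmModelName: env["VLM_MODEL_NAME"] ?? "tgi",
    vlmProvider: env["VLM_PROVIDER"] ?? "ui_tars_1_5",
    language: env["LANGUAGE"] ?? "en",
    maxLoopCount: intOrDefault(env, "MAX_LOOP_COUNT", 10),
    loopIntervalMs: intOrDefault(env, "LOOP_INTERVAL_MS", 1000),
    defaultOperator: env["DEFAULT_OPERATOR"] ?? "browser",
  };
}
```

In a Node.js process you would typically call `loadConfig(process.env)` after the .env file has been loaded, so that a missing API key surfaces as a startup error rather than a failed request later on.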

Docker Setup

For a containerized setup:

docker-compose up --build

Running the Application

Development Mode

pnpm run start:dev

Production Mode

pnpm run build
pnpm run start:prod

Accessing the Application

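After starting the application, you can reach it at the host and port configured in your .env file (port 3000 in the example configuration above). The available routes are listed under API Endpoints.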

Core Functionality

Browser Automation

Iris provides two main operator types for automation:

  • Browser Operator: For web browser automation

  • Computer Operator: For desktop automation (using @ui-tars/operator-nut-js)

You can create sessions, execute actions, and record interactions through the API.

RPA Workflow

  1. Record a Session: Capture browser interactions as a recording

  2. Process the Recording: Extract actions and metadata

  3. Parameterize if Needed: Add variable inputs to the workflow

  4. Execute RPA: Replay the recorded actions automatically

  5. Monitor Execution: Track progress and handle errors
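
Step 3 can be pictured as substituting variable values into a recorded action list before it is replayed. The recording format is not described in this guide, so the sketch below assumes a simple `RecordedAction` shape and a `{{name}}` placeholder syntax; both are hypothetical and chosen only to illustrate the idea.

```typescript
// Hypothetical sketch of parameterization. The RecordedAction shape and the
// {{name}} placeholder syntax are assumptions, not Iris's recording format.
interface RecordedAction {
  action: string;
  [field: string]: string;
}

// Replace every {{name}} placeholder in a string with its parameter value.
function fillTemplate(template: string, params: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    const value: string | undefined = params[name];
    if (value === undefined) {
      throw new Error(`No value supplied for parameter "${name}"`);
    }
    return value;
  });
}

// Return a copy of the recording with all placeholders filled in.
// The original recording is left untouched so it can be reused.
function parameterize(
  actions: RecordedAction[],
  params: Record<string, string>,
): RecordedAction[] {
  return actions.map((recorded) => {
    const filled: RecordedAction = { action: recorded.action };
    for (const field of Object.keys(recorded)) {
      filled[field] = fillTemplate(recorded[field], params);
    }
    return filled;
  });
}

// One recording, replayed with a different search term on each run.
const recording: RecordedAction[] = [
  { action: "navigate", url: "https://example.com/search" },
  { action: "type", text: "{{query}}" },
];

const firstRun = parameterize(recording, { query: "quarterly report" });
```

Throwing on a missing parameter, rather than leaving the placeholder in place, keeps a half-filled workflow from typing literal `{{query}}` text into a live page.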

API Endpoints

  • /api/sessions - Session management

  • /api/config - Configuration management

  • /api/operators - Operator management

  • /api/rpa - RPA execution and management

  • /api/docs - Swagger API documentation

  • /api/reference - Scalar API Reference documentation
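
If you call these routes from your own code, it can help to keep the prefixes in one place. This is a small illustrative helper: the route prefixes are copied from the list above, while the `apiUrl` function and the base URL you pass to it are assumptions.

```typescript
// Route prefixes copied from the list above. The apiUrl helper itself is
// a hypothetical convenience, not something Iris provides.
const ROUTES = {
  sessions: "/api/sessions",
  config: "/api/config",
  operators: "/api/operators",
  rpa: "/api/rpa",
  docs: "/api/docs",
  reference: "/api/reference",
} as const;

type RouteName = keyof typeof ROUTES;

// Join a base URL, a route prefix, and optional path segments.
function apiUrl(baseUrl: string, route: RouteName, ...segments: string[]): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const tail = segments.map((segment) => "/" + encodeURIComponent(segment)).join("");
  return base + ROUTES[route] + tail;
}
```

For example, `apiUrl("http://localhost:3000/", "sessions", "abc123", "execute")` yields `http://localhost:3000/api/sessions/abc123/execute`, where `abc123` stands in for a real session ID.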

Example Workflow

Here's a typical workflow for using Iris:

  1. Create a new browser automation session:

    POST /api/sessions
    {
      "operatorType": "browser"
    }
  2. Execute browser actions:

    POST /api/sessions/{sessionId}/execute
    {
      "action": "navigate",
      "url": "https://example.com"
    }
  3. Record the session for later replay:

    POST /api/sessions/{sessionId}/record
    {
      "recordingName": "example-workflow"
    }
  4. Execute the recording as an RPA workflow:

    POST /api/rpa/execute
    {
      "recordingId": "example-workflow",
      "actionDelay": 1000
    }
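
The four requests above can be chained from a script. The sketch below reuses the request bodies from the example, but several details are assumptions because response shapes are not documented here: the `sessionId` field read from the first response, the `Transport` type, and the `fetchTransport` helper are all hypothetical.

```typescript
// Sketch of the four steps above as one function. The request bodies come
// from the example; the Transport type and the `sessionId` response field
// are assumptions, since response shapes are not documented in this guide.
interface ApiRequest {
  method: "POST";
  path: string;
  body: Record<string, unknown>;
}

// Any function that sends a request and resolves with the parsed JSON reply.
type Transport = (request: ApiRequest) => Promise<Record<string, unknown>>;

function post(path: string, body: Record<string, unknown>): ApiRequest {
  return { method: "POST", path, body };
}

async function recordAndReplay(
  send: Transport,
  url: string,
  recordingName: string,
): Promise<void> {
  // 1. Create a browser session and read its ID (assumed field name).
  const created = await send(post("/api/sessions", { operatorType: "browser" }));
  const sessionId = String(created["sessionId"]);
  const sessionPath = `/api/sessions/${encodeURIComponent(sessionId)}`;

  // 2. Execute a browser action.
  await send(post(`${sessionPath}/execute`, { action: "navigate", url }));

  // 3. Record the session for later replay.
  await send(post(`${sessionPath}/record`, { recordingName }));

  // 4. Replay the recording; the example uses the recording name as its ID.
  await send(post("/api/rpa/execute", { recordingId: recordingName, actionDelay: 1000 }));
}

// One possible Transport built on the fetch global available in Node.js 18+.
function fetchTransport(baseUrl: string): Transport {
  return async (request) => {
    const response = await fetch(baseUrl + request.path, {
      method: request.method,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request.body),
    });
    if (!response.ok) {
      throw new Error(`${request.path} failed with status ${response.status}`);
    }
    return (await response.json()) as Record<string, unknown>;
  };
}
```

A call such as `recordAndReplay(fetchTransport("http://localhost:3000"), "https://example.com", "example-workflow")` would then run all four steps against a local instance. Passing the transport in as a parameter also lets you substitute a stub when testing the sequence without a running server.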

Testing

# Run unit tests
pnpm run test

# Run e2e tests
pnpm run test:e2e

# Run test coverage
pnpm run test:cov

Security Considerations

When implementing and using the RPA video processing feature:

  • Validate all uploaded videos for potential security risks

  • Implement size and format restrictions for uploads

  • Ensure sensitive content in videos is handled appropriately

  • Implement access controls for generated RPA steps and recordings

  • Store API keys securely using environment variables

  • Sanitize user-supplied content before processing
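
The size and format restrictions could be enforced with a check along these lines before a video is accepted for processing. This is a minimal sketch under stated assumptions: the 200 MiB cap, the allowed media types, and the `validateUpload` helper are placeholder choices, not limits or functions defined by Iris.

```typescript
// Illustrative upload checks. The size cap and the allowed formats are
// placeholder values, not limits defined by Iris.
const MAX_UPLOAD_BYTES = 200 * 1024 * 1024; // assumed 200 MiB cap

// Allowed media types mapped to the file extensions they may use.
const ALLOWED_VIDEO_TYPES = new Map<string, string[]>([
  ["video/mp4", [".mp4"]],
  ["video/webm", [".webm"]],
]);

interface UploadInfo {
  fileName: string;
  mimeType: string;
  sizeBytes: number;
}

// Return a list of problems; an empty list means the upload passed.
function validateUpload(upload: UploadInfo): string[] {
  const problems: string[] = [];

  if (upload.sizeBytes <= 0) problems.push("File is empty");
  if (upload.sizeBytes > MAX_UPLOAD_BYTES) problems.push("File exceeds the size limit");

  const extensions = ALLOWED_VIDEO_TYPES.get(upload.mimeType);
  if (extensions === undefined) {
    problems.push(`Unsupported media type: ${upload.mimeType}`);
  } else {
    const lowerName = upload.fileName.toLowerCase();
    if (!extensions.some((extension) => lowerName.endsWith(extension))) {
      problems.push("File extension does not match its media type");
    }
  }

  // Reject names that could escape the upload directory.
  if (/[\\/]|\.\./.test(upload.fileName)) {
    problems.push("File name contains path characters");
  }

  return problems;
}
```

The declared media type and file name are supplied by the client, so checks like these are only a first filter; inspecting the file contents on the server is still needed before a video is trusted.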

Troubleshooting

  • For browser automation issues, open the VNC view of the session to watch what the browser is doing in real time

  • Check the server logs for detailed error messages

  • Ensure your browser automation actions are compatible with the target website's structure