RE: LeoThread 2025-11-05 15-48

in LeoFinance · 21 days ago

Part 8/15:

GPT-4's supported modalities remain speculative, but several clues point toward multimodal capabilities:

  • OpenAI has developed projects like DALL·E (images) and Whisper (audio transcription).

  • These suggest a growing interest in multimodal models that integrate text, images, and speech.

However:

  • Currently, GPT models are primarily text-based.

  • Full multimodal integration, where a model seamlessly processes and generates across different media, might require architectural redesigns.

  • Some research hints at models that treat all data as raw bits and bytes, bypassing tokenization altogether—potentially allowing for unified handling of multiple modalities.
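The byte-level idea above can be illustrated with a minimal sketch (this is a hypothetical illustration, not any specific model's pipeline): if every input is reduced to raw bytes, then text, images, and audio all share a single vocabulary of 256 symbols, with no modality-specific tokenizer needed.

```python
# Minimal sketch: treating all data as raw bytes gives every modality
# the same 0-255 integer vocabulary, sidestepping tokenization.
# (Hypothetical example; not an actual GPT-4 mechanism.)

def to_byte_ids(data) -> list[int]:
    """Map text or binary data onto a shared 0-255 integer vocabulary."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # text becomes bytes like any other medium
    return list(data)

text = "GPT-4"
png_header = bytes([0x89, 0x50, 0x4E, 0x47])  # first four bytes of a PNG file

print(to_byte_ids(text))        # [71, 80, 84, 45, 52]
print(to_byte_ids(png_header))  # [137, 80, 78, 71]
```

Both a string and an image fragment end up in the same integer space, which is what makes the "unified handling of multiple modalities" plausible in principle; the open research question is whether a model can learn efficiently from such long, low-level sequences.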

The Window Size: Context Length and Memory Limits