Ingest Content Precisely and at Enterprise Scale

Pryon Ingestion Engine reads content like a human and converts unstructured content into structured data, ready for retrieval.

Prebuilt connectors to commonly used enterprise data repositories

Continuous ingestion of millions of pages

Accurate reading and analysis of multimodal content — text, images, audio, and video

Smart data chunking

How It Works


Access content

Pryon Ingestion Engine starts by accessing enterprise data from your trusted content repositories. A host of prebuilt connectors securely access content from sources such as Amazon S3, Google Drive, Zendesk, SharePoint, Box, Confluence, and Documentum. The Ingestion Engine’s universal connector framework enables the rapid development of bespoke connectors for additional unstructured and semi-structured data sources.
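To make the connector idea concrete, here is a minimal Python sketch of what a pluggable connector interface could look like. It is not Pryon's actual framework or API: the names `RawDocument`, `Connector`, `InMemoryConnector`, and `ingest` are hypothetical and exist only for illustration, and the toy connector reads from a dictionary rather than a real repository.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class RawDocument:
    """A document as fetched from a source, before any cleaning (hypothetical type)."""
    source: str      # name of the connector that produced it
    path: str        # location of the document within the source
    content: bytes   # raw bytes exactly as stored


class Connector(ABC):
    """Hypothetical base class: each connector yields RawDocument objects."""

    name: str = "unnamed"

    @abstractmethod
    def fetch(self) -> Iterator[RawDocument]:
        """Yield every document the connector can see."""


class InMemoryConnector(Connector):
    """Toy connector over a dict, standing in for a real repository."""

    name = "in-memory"

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def fetch(self) -> Iterator[RawDocument]:
        # Sort paths so the output order is deterministic.
        for path, content in sorted(self.files.items()):
            yield RawDocument(self.name, path, content)


def ingest(connectors: List[Connector]) -> List[RawDocument]:
    """Pull documents from every connector into one list."""
    return [doc for connector in connectors for doc in connector.fetch()]
```

In a design like this, supporting a new source means writing one subclass with a `fetch` method; the rest of the pipeline only ever sees `RawDocument` objects.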

Advanced technologies custom-developed by Pryon, such as optical character recognition (OCR) and vision segmentation, enable Pryon Ingestion Engine to understand content stored in various formats, including text, images, audio, and video.

Continuous ingestion automatically keeps ingested content in sync with its sources in near real time, so admins never need to refresh it manually.
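One common way to keep ingested content current is to fingerprint each document and re-ingest only what has changed. The Python sketch below illustrates that general technique; it is an assumption about how such a sync could work, not a description of Pryon's implementation, and `fingerprint` and `detect_changes` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(content: bytes) -> str:
    """Content hash used to tell whether a document has changed."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(
    seen: Dict[str, str], current: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Compare a source snapshot against previously seen fingerprints.

    Returns (paths to re-ingest, paths to delete) and updates `seen` in place.
    """
    changed = []
    for path, content in sorted(current.items()):
        digest = fingerprint(content)
        if seen.get(path) != digest:   # new or modified document
            changed.append(path)
            seen[path] = digest
    removed = sorted(path for path in seen if path not in current)
    for path in removed:               # document no longer exists at the source
        del seen[path]
    return changed, removed
```

Running `detect_changes` on a schedule, or whenever a source signals an update, means unchanged documents are never reprocessed.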


Clean and transform data

Once Pryon Ingestion Engine has ingested your organization’s trusted content in its rawest form, it cleans and transforms that data to ensure fast, accurate retrieval.

Key metadata elements (e.g., date modified, webpage URL, and file paths) are captured, and key components of documents (e.g., tables of contents, headers, footers, and page numbers) are identified and labeled. The Ingestion Engine also removes irrelevant text (e.g., non-ASCII characters) and prioritizes more meaningful text (e.g., body paragraphs).

Pryon Ingestion Engine is user-configurable, so you can define rules for how it works.
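As a rough illustration of rule-driven cleaning, the Python sketch below drops page numbers and running headers or footers, and strips non-ASCII characters from what remains. The `CleaningRules` fields, the `clean_pages` function, and the heuristics themselves (a line repeated on several pages is treated as a header or footer) are assumptions made for this example, not the Ingestion Engine's actual rules or configuration format.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class CleaningRules:
    """Hypothetical user-configurable cleaning rules."""
    strip_non_ascii: bool = True
    drop_page_numbers: bool = True
    # A line that appears on at least this many pages is treated as a header/footer.
    repeated_line_threshold: int = 2


# Matches lines such as "7" or "Page 7".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def clean_pages(pages: List[str], rules: CleaningRules) -> List[str]:
    """Return the cleaned body text of each page."""
    # Count on how many pages each distinct non-empty line appears.
    page_counts: Counter = Counter()
    for page in pages:
        page_counts.update({line.strip() for line in page.splitlines() if line.strip()})

    cleaned = []
    for page in pages:
        kept = []
        for line in page.splitlines():
            text = line.strip()
            if not text:
                continue
            if rules.drop_page_numbers and PAGE_NUMBER.match(text):
                continue
            if page_counts[text] >= rules.repeated_line_threshold:
                continue  # likely a running header or footer
            if rules.strip_non_ascii:
                text = text.encode("ascii", "ignore").decode("ascii").strip()
            if text:
                kept.append(text)
        cleaned.append("\n".join(kept))
    return cleaned
```

Changing a field on `CleaningRules`, for example `strip_non_ascii=False` for multilingual content, changes the behavior without touching the cleaning code.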


Chunk data

After your data has been cleaned and transformed, Pryon Ingestion Engine breaks it down into smaller chunks. The Ingestion Engine uses a variety of smart chunking approaches to optimize efficiency and accuracy.

Smart chunking enables you to configure how the Ingestion Engine chunks your data based on document structure.
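Chunking based on document structure can be sketched as follows. This Python example assumes a simplified input in which blocks are separated by blank lines and headings start with `#`; that convention, the `Chunk` type, and the `chunk_by_structure` function are all hypothetical, and the Ingestion Engine's real chunking strategies are not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    heading: str   # nearest heading above the chunk ("" if none)
    text: str      # one or more paragraphs


def chunk_by_structure(document: str, max_chars: int = 500) -> List[Chunk]:
    """Split on headings first, then on paragraph boundaries when a section is too long."""
    chunks: List[Chunk] = []
    heading = ""
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered paragraphs as one chunk under the current heading.
        if buffer:
            chunks.append(Chunk(heading, "\n\n".join(buffer)))
            buffer.clear()

    for block in document.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            flush()                      # a chunk never spans two sections
            heading = block.lstrip("#").strip()
            continue
        size = sum(len(p) for p in buffer) + len(block)
        if buffer and size > max_chars:
            flush()                      # start a new chunk within the same section
        buffer.append(block)
    flush()
    return chunks
```

Because each chunk carries its heading, a retrieved passage keeps the context of the section it came from, and paragraphs are never cut mid-sentence.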


Generate content embeddings

Pryon Ingestion Engine generates content embeddings and stores them in a vector database. You can select the embedding model you prefer, including models from Hugging Face, Cohere, and OpenAI, or create a custom embedding model.

Content embeddings and structured chunked data can be stored in a variety of vector databases, including Weaviate, Milvus, and Pinecone.
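The embed-then-store step can be illustrated with a self-contained Python sketch. The `embed` function here is a toy stand-in (word hashing into a fixed-size vector) rather than a real embedding model, and `InMemoryVectorStore` is a hypothetical class that stands in for a vector database; neither reflects the API of Pryon, the model providers, or the databases named above.

```python
import hashlib
import math
import re
from typing import List, Tuple


def embed(text: str, dim: int = 64) -> List[float]:
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[List[float], str]] = []

    def add(self, chunk: str) -> None:
        """Embed a chunk and store the vector alongside its text."""
        self.items.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 1) -> List[str]:
        """Return the k chunks whose vectors are most similar to the query."""
        q = embed(query)

        def similarity(item: Tuple[List[float], str]) -> float:
            # Vectors are unit length, so the dot product is the cosine similarity.
            return sum(a * b for a, b in zip(q, item[0]))

        ranked = sorted(self.items, key=similarity, reverse=True)
        return [chunk for _, chunk in ranked[:k]]
```

Swapping in a real model would mean replacing `embed` with a call to that model, and swapping in a real database would mean replacing the list in `InMemoryVectorStore`; the add-then-search flow stays the same.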


Which content repositories can Pryon Ingestion Engine ingest content from?

Through prebuilt connectors, Pryon Ingestion Engine can ingest content from dozens of enterprise content repositories, including SharePoint, Box, Amazon S3, Confluence, Google Drive, Salesforce knowledge articles, and Documentum. Need more? The universal connector framework enables you to rapidly develop bespoke connectors for both unstructured and structured data sources based on your needs.

What kinds of content can Pryon Ingestion Engine ingest?

Pryon Ingestion Engine can ingest a vast array of multimodal content, including text, images, audio, and video. The Ingestion Engine can ingest millions of pages today and will scale to a billion pages by the end of 2024, all while preserving accuracy.

What makes Pryon Ingestion Engine best in class?

Pryon Ingestion Engine is both deep and broad in its capabilities. It leverages custom-developed technologies such as optical character recognition (OCR) and vision segmentation to ingest a vast array of multimodal content, including text, images, audio, and video, and understand it like a human would. Additionally, Pryon Ingestion Engine offers connection pipelines to dozens of content sources.
