01
Access content
Pryon Ingestion Engine starts by accessing enterprise data from your trusted content repositories. A host of prebuilt connectors securely access content from sources such as Amazon S3, Google Drive, Zendesk, SharePoint, Box, Confluence, and Documentum. The Ingestion Engine’s universal connector framework enables the rapid development of bespoke connectors for additional unstructured and semi-structured data sources.
Advanced technologies custom-developed by Pryon, such as optical character recognition (OCR) and vision segmentation, enable Pryon Ingestion Engine to understand content stored in various formats, including text, images, audio, and video.
Continuous ingestion automatically keeps ingested content up to date in near real time, eliminating the need for admins to refresh sources manually.
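One common way to keep an index in sync with a source is to compare content fingerprints on each pass and re-ingest only what changed. The sketch below illustrates that general idea; it is not Pryon's implementation, and the names `fingerprint` and `plan_sync` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable hash of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_sync(source_docs, indexed_fingerprints):
    """Compare the source's current documents with what was last ingested.

    source_docs: dict of doc_id -> raw bytes currently in the repository.
    indexed_fingerprints: dict of doc_id -> fingerprint stored at last ingest.
    Returns (ids to ingest or re-ingest, ids to delete, new fingerprint map).
    """
    current = {doc_id: fingerprint(body) for doc_id, body in source_docs.items()}
    # New or modified documents have no stored fingerprint or a different one.
    to_ingest = sorted(
        doc_id for doc_id, fp in current.items()
        if indexed_fingerprints.get(doc_id) != fp
    )
    # Documents that vanished from the source should leave the index too.
    to_delete = sorted(
        doc_id for doc_id in indexed_fingerprints if doc_id not in current
    )
    return to_ingest, to_delete, current
```

Running `plan_sync` on a schedule, or whenever a source reports a change, means unchanged documents are skipped and only the differences are processed.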
02
Clean and transform data
Once Pryon Ingestion Engine has ingested your organization’s trusted content in its rawest form, it cleans and transforms that data to ensure fast, accurate retrieval.
Key metadata elements (e.g., date modified, webpage URL, and file paths) are captured, and key components of documents (e.g., tables of contents, headers, footers, and page numbers) are identified and labeled. The Ingestion Engine also removes irrelevant text (e.g., non-ASCII characters) and prioritizes more meaningful text (e.g., body paragraphs).
Pryon Ingestion Engine is user-configurable, so you can define rules for how it works.
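To make the cleaning step concrete, here is a minimal sketch of rule-driven cleanup: it labels page-number lines, optionally strips non-ASCII characters, and normalizes whitespace. The rule names (`strip_non_ascii`, `drop_labels`) and the `clean_lines` function are assumptions made for illustration, not the Ingestion Engine's actual configuration schema.

```python
import re

# Matches lines such as "12" or "Page 3" (an illustrative heuristic only).
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)


def clean_lines(lines, rules=None):
    """Label each line, drop labels the rules exclude, and tidy the text."""
    rules = rules or {"strip_non_ascii": True, "drop_labels": {"page_number"}}
    cleaned = []
    for line in lines:
        label = "page_number" if PAGE_NUMBER.match(line) else "body"
        if label in rules["drop_labels"]:
            continue
        text = line
        if rules["strip_non_ascii"]:
            # Discard any character that cannot be encoded as ASCII.
            text = text.encode("ascii", errors="ignore").decode("ascii")
        # Collapse runs of whitespace left behind by extraction.
        text = " ".join(text.split())
        if text:
            cleaned.append({"label": label, "text": text})
    return cleaned
```

Passing a different `rules` dictionary changes the behavior without touching the code, which is the kind of control that user-defined rules are meant to provide.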
03
Chunk data
After your data has been cleaned and transformed, Pryon Ingestion Engine breaks it down into smaller chunks. The Ingestion Engine uses a variety of smart chunking approaches to optimize efficiency and accuracy.
Smart chunking enables you to configure how the Ingestion Engine chunks your data based on document structure.
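As an illustration of chunking based on document structure, the sketch below splits text at heading lines and then packs each section's paragraphs into chunks up to a size limit. It assumes headings start with `#` and that blocks are separated by blank lines; the function name `chunk_by_structure` and the `max_chars` parameter are hypothetical, and this is one simple approach rather than the Ingestion Engine's own algorithm.

```python
def chunk_by_structure(text, max_chars=500):
    """Split text at heading blocks, then pack paragraphs up to max_chars."""
    sections = []
    current = {"heading": "", "paragraphs": []}
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # A heading closes the previous section and opens a new one.
            if current["paragraphs"]:
                sections.append(current)
            current = {"heading": block.lstrip("# ").strip(), "paragraphs": []}
        else:
            current["paragraphs"].append(block)
    if current["paragraphs"]:
        sections.append(current)

    chunks = []
    for section in sections:
        buffer = ""
        for para in section["paragraphs"]:
            candidate = f"{buffer}\n\n{para}" if buffer else para
            if buffer and len(candidate) > max_chars:
                # Adding this paragraph would exceed the limit: emit and restart.
                chunks.append({"heading": section["heading"], "text": buffer})
                buffer = para
            else:
                buffer = candidate
        if buffer:
            chunks.append({"heading": section["heading"], "text": buffer})
    return chunks
```

Because chunks never cross a heading, each one carries the heading it belongs to, which can help a retriever keep context. A single paragraph longer than `max_chars` is kept whole in this sketch.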
04
Generate content embeddings
Pryon Ingestion Engine generates content embeddings and stores them in a vector database. You can select the embedding model you prefer, including models from Hugging Face, Cohere, OpenAI, and others, or create a custom embedding model.
Content embeddings and structured chunked data can be stored in a variety of vector databases, including Weaviate, Milvus, and Pinecone.
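The sketch below shows the general pattern behind this step: turn each chunk into a vector, store it, and retrieve by similarity. It is only an illustration. `toy_embed` is a hashing stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical placeholder for a vector database; neither reflects Pryon's API or any vendor's client library.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Hash each token into a fixed-size vector, then L2-normalize it.

    A stand-in for a real embedding model; it captures token overlap only.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Minimal store: keeps (id, text, vector) rows and ranks by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed  # Any callable mapping text to a list of floats.
        self.rows = []

    def add(self, chunk_id, text):
        self.rows.append((chunk_id, text, self.embed(text)))

    def search(self, query, k=3):
        q = self.embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), chunk_id, text)
            for chunk_id, text, vec in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]
```

Since the store accepts any `embed` callable, swapping in a different model does not change the storage or search code, which mirrors the idea of choosing the embedding model independently of the vector database.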