General AI/RAG/LLM Terms
Bounding box and coordinates: To attribute a response to a specific sentence or paragraph within a document, a RAG system needs to record where each content chunk appears on the page. This is done through bounding box/coordinate values. The RAG system’s computer vision model analyzes the layout of a document and assigns each chunk a coordinate value. These coordinates are stored as metadata within a vector database. When the LLM delivers a response, the system retrieves the coordinates of the source chunk so a link to the source chunk (attribution) can be added to the response.
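For illustration, here is a minimal sketch of how a chunk and its coordinates might be stored; the field names are hypothetical stand-ins rather than any particular vector database’s schema.

```python
# A minimal sketch of attribution metadata for one content chunk.
# All field names here are illustrative, not a specific product's schema.
chunk = {
    "id": "doc-42-chunk-7",
    "text": "Quarterly revenue grew 12% year over year.",
    "embedding": [0.12, -0.08, 0.33],  # vector from an embedding model (truncated)
    "metadata": {
        "source_document": "q3-report.pdf",
        "page": 5,
        # Bounding box on the page: [x_min, y_min, x_max, y_max]
        "bounding_box": [72.0, 410.5, 523.0, 455.0],
    },
}

# When this chunk is retrieved to answer a question, the stored page
# number and bounding box let the application link back to, and even
# highlight, the exact region of the source document.
```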
This concept also appears in object detection, a computer vision technique. Object detection models take an image as input and output the bounding-box coordinates and labels of the objects they detect. An image can contain multiple objects, each with its own bounding box and label (e.g., a car and a building), and the same kind of object can appear in several parts of an image (e.g., an image with several cars). This task is commonly used in autonomous driving to detect things like pedestrians, road signs, and traffic lights. Other applications include counting objects in images, image search, and more.
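Concretely, a detection model’s output often looks like the sketch below; the exact keys and box conventions vary by model, so treat these names as illustrative.

```python
# Illustrative output of an object detection model for one image.
# Each detection pairs a bounding box with a label and a confidence
# score; key names and formats vary across models and libraries.
detections = [
    {"box": [34, 120, 310, 480], "label": "car", "score": 0.97},
    {"box": [350, 95, 600, 470], "label": "car", "score": 0.91},
    {"box": [610, 40, 780, 500], "label": "building", "score": 0.88},
]

for d in detections:
    x_min, y_min, x_max, y_max = d["box"]
    print(f"{d['label']} ({d['score']:.0%}) at ({x_min}, {y_min})-({x_max}, {y_max})")
```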
GPUs: GPUs, or graphics processing units, power LLMs and SLMs (small language models). Historically, computers were powered solely by CPUs, but in the last couple of decades, computer engineers realized GPUs were more efficient at certain computing tasks: not just producing graphics, as the name implies, but also tasks related to AI. As their usage has skyrocketed, GPUs, made by companies like NVIDIA, have consumed more and more energy and required more intensive cooling, which can have major negative effects on the environment. Pryon’s proprietary set of LLMs and SLMs uses fewer GPU resources than many competing language models, making Pryon RAG Suite an energy-efficient way to deploy generative AI at enterprise scale.
Neural networks: Neural networks are used in machine learning to mimic how neurons in the human brain communicate. Just as neurons "fire" together to enable us to do things like walk, talk, and understand a book, a neural network is a series of interconnected nodes that work together to accomplish a certain task, such as summarizing text, performing natural language processing (NLP), or recognizing images.
The reason Pryon Ingestion Engine can understand enterprise content like a human would is its advanced deep learning techniques, such as visual segmentation and OCR (optical character recognition). These deep learning techniques are powered by neural networks.
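As a rough illustration of nodes working together, here is a minimal two-layer network forward pass in NumPy; this is a teaching sketch, not how production models (or Pryon’s) are built.

```python
import numpy as np

def relu(x):
    # Activation function: a node "fires" (passes its signal forward)
    # only when its weighted input is positive.
    return np.maximum(0, x)

# A tiny network: 3 inputs -> 4 hidden nodes -> 1 output node.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([0.5, -1.2, 3.0])   # input features
hidden = relu(x @ W1 + b1)       # hidden layer of nodes
output = hidden @ W2 + b2        # output node
print(output)
```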
Orchestration: The process of coordinating and managing the interactions between different AI components so they work together effectively. This process involves a central platform that manages the deployment, integration, and interaction of AI components such as databases, algorithms, AI models, and neural networks.
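For a concrete (if simplified) picture, the sketch below shows an orchestrator routing a user question through a retriever and a language model. All class and method names here are hypothetical.

```python
# A simplified, hypothetical orchestration loop. The component
# interfaces (retrieve, generate) are illustrative, not a real API.
class Orchestrator:
    def __init__(self, retriever, llm):
        self.retriever = retriever  # e.g., a vector database client
        self.llm = llm              # e.g., a language model client

    def answer(self, question: str) -> str:
        # 1. Retrieve relevant chunks from the knowledge base.
        chunks = self.retriever.retrieve(question, top_k=5)
        # 2. Assemble a prompt that grounds the model in those chunks.
        context = "\n".join(c["text"] for c in chunks)
        prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
        # 3. Hand the prompt to the language model and return its reply.
        return self.llm.generate(prompt)
```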
REST APIs: APIs make applications extensible, providing the ability for one application to integrate with another. They establish the content the consumer must send (the call) and the content the producer/provider returns (the response). For example, Weather.com’s API might call for a ZIP code and respond with the high and low temperature for that ZIP code.
REST APIs (aka RESTful APIs) are a kind of API that conforms to a specific architectural style (Representational State Transfer). Without going into a lot of technical detail, REST APIs are faster and more lightweight than many other kinds of APIs, and they scale more easily.
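In practice, calling a REST API from code often looks like the sketch below, which uses Python’s requests library against a made-up weather endpoint; the URL, parameters, and response fields are hypothetical.

```python
import requests

# Hypothetical endpoint and parameters, for illustration only.
response = requests.get(
    "https://api.example.com/v1/forecast",
    params={"zip": "10001"},  # the "call": what the consumer sends
    timeout=10,
)
response.raise_for_status()

data = response.json()  # the "response": what the producer returns
print(data["high"], data["low"])  # hypothetical response fields
```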
Small language models (SLMs): Like large language models (LLMs), SLMs are what provide the responses that generative AI application users receive. The difference is that unlike LLMs, which are built for general purpose use, SLMs are built for specific purposes. For instance, an LLM might be capable of everything from writing a poem to penning an academic dissertation, but an SLM might be trained specifically to answer questions posed by users (instead of generating original content). Because SLMs aren’t built for as many use cases as LLMs, they’re smaller in size, which is not only beneficial from a performance standpoint but also makes them far more energy-efficient.
Tokens: In natural language processing (NLP), a token is a fundamental building block of text. Tokens are the smallest units into which a piece of text, such as a sentence or document, is divided. The process of converting text to tokens is called tokenization.
Tokens are typically words, where each word in a sentence is considered a separate token. For example, in the sentence "I love NLP," there are three words and therefore three tokens: "I," "love," and "NLP." However, tokenization can also involve subword, character, or even sentence-level units, depending on the specific task and the tokenization approach used.
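The sketch below shows word-level and character-level tokenization in plain Python. Real LLM tokenizers typically use learned subword vocabularies (e.g., byte-pair encoding), which this simple example does not implement.

```python
sentence = "I love NLP"

# Word-level tokenization: split on whitespace.
word_tokens = sentence.split()
print(word_tokens)  # ['I', 'love', 'NLP']

# Character-level tokenization: every character is a token.
char_tokens = list(sentence)
print(char_tokens)  # ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'N', 'L', 'P']

# Subword tokenizers fall in between: common words stay whole, while
# rarer words are split into pieces (e.g., "tokenization" might become
# "token" + "ization", depending on the learned vocabulary).
```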
Tokens are the basic units of data processed by LLMs, so, broadly speaking, the more tokens an LLM can process at once (its context window), the more powerful the LLM. (It’s like cellular organisms; the more cells an organism has, the more complex that organism is.)
Transformers: Transformers are a kind of deep learning architecture designed to process natural language and other data. The transformer architecture was first described in the 2017 Google research paper "Attention Is All You Need" and gave rise to the modern LLM-based AI applications many of us are familiar with, such as ChatGPT. (Unrelated to the toys or Michael Bay films.)
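At the heart of the transformer is scaled dot-product attention. Here is a minimal NumPy rendition of the formula from that paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, kept deliberately small.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how relevant each key is to each query
    # Numerically stable softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted mix of the values

# Toy example: 2 query positions, 3 key/value positions, dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```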