General AI/RAG/LLM Terms
Bounding box and coordinates: To attribute a response to a specific sentence or paragraph within a document, a RAG system needs to record where each content chunk appears on the page. This is done through bounding box/coordinate values. The RAG system’s computer vision model analyzes the layout of a document and assigns each chunk a coordinate value. These coordinates are stored as metadata within a vector database. When the LLM delivers a response, the system retrieves the coordinates of the source chunk so a link to the source chunk (attribution) can be added to the response.
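For illustration, here is a minimal sketch of how a chunk and its coordinates might be stored; the field names are hypothetical stand-ins rather than any particular vector database’s schema.

```python
# A minimal sketch of attribution metadata for one content chunk.
# All field names here are illustrative, not a specific product's schema.
chunk = {
    "id": "doc-42-chunk-7",
    "text": "Quarterly revenue grew 12% year over year.",
    "embedding": [0.12, -0.08, 0.33],  # vector from an embedding model (truncated)
    "metadata": {
        "source_document": "q3-report.pdf",
        "page": 5,
        # Bounding box on the page: [x_min, y_min, x_max, y_max]
        "bounding_box": [72.0, 410.5, 523.0, 455.0],
    },
}

# When this chunk is retrieved to answer a question, the stored page
# number and bounding box let the application link back to, and even
# highlight, the exact region of the source document.
```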
This concept also appears in object detection, a computer vision technique. Object detection models take an image as input and output the bounding-box coordinates and labels of the objects they detect. An image can contain multiple objects, each with its own bounding box and label (e.g., a car and a building), and the same kind of object can appear in several parts of an image (e.g., an image with several cars). This task is commonly used in autonomous driving to detect things like pedestrians, road signs, and traffic lights. Other applications include counting objects in images, image search, and more.
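Concretely, a detection model’s output often looks like the sketch below; the exact keys and box conventions vary by model, so treat these names as illustrative.

```python
# Illustrative output of an object detection model for one image.
# Each detection pairs a bounding box with a label and a confidence
# score; key names and formats vary across models and libraries.
detections = [
    {"box": [34, 120, 310, 480], "label": "car", "score": 0.97},
    {"box": [350, 95, 600, 470], "label": "car", "score": 0.91},
    {"box": [610, 40, 780, 500], "label": "building", "score": 0.88},
]

for d in detections:
    x_min, y_min, x_max, y_max = d["box"]
    print(f"{d['label']} ({d['score']:.0%}) at ({x_min}, {y_min})-({x_max}, {y_max})")
```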
GPUs: GPUs, or graphics processing units, power LLMs and SLMs (small language models). Historically, computers were powered solely by CPUs, but in the last couple of decades, computer engineers realized GPUs were more efficient at certain computing tasks: not just producing graphics, as the name implies, but also tasks related to AI. As their usage has skyrocketed, GPUs, made by companies like NVIDIA, have consumed more and more energy and required more intensive cooling, which can have major negative effects on the environment. Pryon’s proprietary set of LLMs and SLMs uses fewer GPU resources than many competing language models, making Pryon RAG Suite an energy-efficient way to deploy generative AI at enterprise scale.
Neural networks: Neural networks are used in machine learning to mimic how neurons in the human brain communicate. Just as neurons "fire" together to enable us to do things like walk, talk, and understand a book, a neural network is a series of interconnected nodes that work together to accomplish a certain task, such as summarizing text, performing natural language processing (NLP), or recognizing images.
The reason Pryon Ingestion Engine can understand enterprise content like a human would is its advanced deep learning techniques, such as visual segmentation and OCR (optical character recognition). These deep learning techniques are powered by neural networks.
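As a rough illustration of nodes working together, here is a minimal two-layer network forward pass in NumPy; this is a teaching sketch, not how production models (or Pryon’s) are built.

```python
import numpy as np

def relu(x):
    # Activation function: a node "fires" (passes its signal forward)
    # only when its weighted input is positive.
    return np.maximum(0, x)

# A tiny network: 3 inputs -> 4 hidden nodes -> 1 output node.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = np.array([0.5, -1.2, 3.0])   # input features
hidden = relu(x @ W1 + b1)       # hidden layer of nodes
output = hidden @ W2 + b2        # output node
print(output)
```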
Orchestration: The process of coordinating and managing the interactions between different AI components so they work together effectively. This process involves a central platform that manages the deployment, integration, and interaction of AI components such as databases, algorithms, AI models, and neural networks.
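For a concrete (if simplified) picture, the sketch below shows an orchestrator routing a user question through a retriever and a language model. All class and method names here are hypothetical.

```python
# A simplified, hypothetical orchestration loop. The component
# interfaces (retrieve, generate) are illustrative, not a real API.
class Orchestrator:
    def __init__(self, retriever, llm):
        self.retriever = retriever  # e.g., a vector database client
        self.llm = llm              # e.g., a language model client

    def answer(self, question: str) -> str:
        # 1. Retrieve relevant chunks from the knowledge base.
        chunks = self.retriever.retrieve(question, top_k=5)
        # 2. Assemble a prompt that grounds the model in those chunks.
        context = "\n".join(c["text"] for c in chunks)
        prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
        # 3. Hand the prompt to the language model and return its reply.
        return self.llm.generate(prompt)
```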
REST APIs: APIs make applications extensible, providing the ability for one application to integrate with another. They establish the content the consumer must send (the call) and the content the producer/provider returns (the response). For example, Weather.com’s API might call for a ZIP code and respond with the high and low temperature for that ZIP code.
REST APIs (aka RESTful APIs) are a kind of API that conforms to a specific architectural style (Representational State Transfer). Without going into a lot of technical detail, REST APIs are faster and more lightweight than many other kinds of APIs, and they scale more easily.
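In practice, calling a REST API from code often looks like the sketch below, which uses Python’s requests library against a made-up weather endpoint; the URL, parameters, and response fields are hypothetical.

```python
import requests

# Hypothetical endpoint and parameters, for illustration only.
response = requests.get(
    "https://api.example.com/v1/forecast",
    params={"zip": "10001"},  # the "call": what the consumer sends
    timeout=10,
)
response.raise_for_status()

data = response.json()  # the "response": what the producer returns
print(data["high"], data["low"])  # hypothetical response fields
```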
Small language models (SLMs): Like large language models (LLMs), SLMs are what provide the responses that generative AI application users receive. The difference is that unlike LLMs, which are built for general purpose use, SLMs are built for specific purposes. For instance, an LLM might be capable of everything from writing a poem to penning an academic dissertation, but an SLM might be trained specifically to answer questions posed by users (instead of generating original content). Because SLMs aren’t built for as many use cases as LLMs, they’re smaller in size, which is not only beneficial from a performance standpoint but also makes them far more energy-efficient.
Tokens: In natural language processing (NLP), a token is a fundamental building block of text. Tokens are the smallest units into which a piece of text, such as a sentence or document, is divided. The process of converting text to tokens is called tokenization.
Tokens are typically words, where each word in a sentence is considered a separate token. For example, in the sentence "I love NLP," there are three words and therefore three tokens: "I," "love," and "NLP." However, tokenization can also involve subword, character, or even sentence-level units, depending on the specific task and the tokenization approach used.
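The sketch below shows word-level and character-level tokenization in plain Python. Real LLM tokenizers typically use learned subword vocabularies (e.g., byte-pair encoding), which this simple example does not implement.

```python
sentence = "I love NLP"

# Word-level tokenization: split on whitespace.
word_tokens = sentence.split()
print(word_tokens)  # ['I', 'love', 'NLP']

# Character-level tokenization: every character is a token.
char_tokens = list(sentence)
print(char_tokens)  # ['I', ' ', 'l', 'o', 'v', 'e', ' ', 'N', 'L', 'P']

# Subword tokenizers fall in between: common words stay whole, while
# rarer words are split into pieces (e.g., "tokenization" might become
# "token" + "ization", depending on the learned vocabulary).
```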
Tokens are the basic units of data processed by LLMs, so, broadly speaking, the more tokens an LLM can process at once (its context window), the more powerful the LLM. (It’s like cellular organisms; the more cells an organism has, the more complex that organism is.)
Transformers: Transformers are a kind of deep learning architecture designed to process natural language and other data. The transformer architecture was first described in the 2017 Google research paper "Attention Is All You Need" and gave rise to the modern LLM-based AI applications many of us are familiar with, such as ChatGPT. (Unrelated to the toys or Michael Bay films.)
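At the heart of the transformer is scaled dot-product attention. Here is a minimal NumPy rendition of the formula from that paper, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, kept deliberately small.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how relevant each key is to each query
    # Numerically stable softmax over the keys.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted mix of the values

# Toy example: 2 query positions, 3 key/value positions, dimension 4.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4)
```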