AI Inference Introduction
AI inference is the process of using a trained AI model to generate predictions or decisions from new input data. Inputs and outputs can take many forms, such as images, text, or video, powering applications that range from weather forecasting to conversations with a large language model (LLM).
How AI Inference Works
AI inference comes after the training phase (sometimes referred to as induction). During training, models are built by applying machine learning algorithms, such as neural networks, to datasets of labeled examples.
- Training: The model learns patterns from large datasets.
- Inference: The trained model makes predictions on new, unseen data.
The model learns to identify and generalize patterns from the data in order to make accurate predictions. Once trained, the model is evaluated on new, unseen data to verify its effectiveness and performance. If it performs well, the model can then be deployed for inference.

Inference involves giving the trained model new, unlabeled data so it can generate predictions or classifications. This process underpins a variety of use cases, including large language models, forecasting systems, and other predictive analytics tools.

Fundamentally, inference in neural networks is about transforming input numbers into output numbers. The distinction between different inference types lies in how data is processed before being fed into the model and after results are generated. For instance, in the case of a large language model, textual prompts must be converted into numerical form before input, and the numerical outputs must be translated back into human-readable text.
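To make that last point concrete, here is a minimal sketch of the text-to-numbers-to-text round trip. The `encode` and `decode` helpers are hypothetical stand-ins for a real tokenizer (actual LLMs use learned subword vocabularies), and the model itself is elided:

```python
# Toy illustration of the text -> numbers -> text round trip around a model.
# Character codes stand in for a real learned tokenizer's subword IDs.

def encode(text: str) -> list[int]:
    """Pre-processing: convert human-readable text into numeric IDs."""
    return [ord(ch) for ch in text]

def decode(ids: list[int]) -> str:
    """Post-processing: convert numeric model outputs back into text."""
    return "".join(chr(i) for i in ids)

prompt = "What is AI inference?"
input_ids = encode(prompt)     # text -> numbers, ready for the model
# ... input_ids would be fed to the model, which returns output IDs ...
output_ids = input_ids         # placeholder: echo the input back
print(decode(output_ids))      # numbers -> human-readable text
```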
AI Inference Workflow
```
+------------------+      +------------------+
|                  |      |                  |
|  Training Data   |      |    New Input     |
|  (e.g. Images,   |      |  (e.g. New image |
|  Text, Labels)   |      |  or sensor data) |
+--------+---------+      +--------+---------+
         |                         |
         v                         v
+------------------+      +------------------+
|                  |      |                  |
|  Train AI Model  |      |  Use Trained AI  |
| (Learning Phase) |      |      Model       |
|                  |      |   (Inference)    |
+--------+---------+      +--------+---------+
         |                         |
         v                         v
+------------------+      +------------------+
|                  |      |                  |
|  Trained Model   | ---> |  Prediction or   |
|                  |      |  Output Result   |
+------------------+      +------------------+
```
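The same workflow can be expressed in a few lines of code. Below is a minimal sketch using scikit-learn; the library and its bundled digits dataset are illustrative choices, not part of the workflow itself. Training on labeled data produces a model, and inference applies that model to unseen inputs:

```python
# Minimal train-then-infer sketch mirroring the workflow diagram above.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # labeled examples (images + labels)
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training (learning phase): the model fits patterns in labeled examples.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Inference: the trained model predicts labels for new, unseen inputs.
predictions = model.predict(X_new)
print(predictions[:10])  # predicted digit labels for the first 10 inputs
```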
Real-World Applications of AI Inference
| Use Case | Inference Example |
|---|---|
| Healthcare | Diagnosing diseases from X-rays or scans |
| Finance | Detecting fraudulent transactions |
| Retail | Recommending products to customers |
| Automotive | A self-driving car making navigation decisions |
| Security | Face recognition in surveillance systems |
AI Inference Tools
Some of NVIDIA's AI inference tools include:
- NVIDIA NIM
- NVIDIA Dynamo
- NVIDIA TensorRT
- NVIDIA DGX Cloud Serverless Inference
In summary, AI inference is a critical part of deploying AI models in real-world applications. It brings intelligence to devices, apps, and systems by enabling them to make informed decisions based on learned patterns.