Amazon Machine Learning Key Concepts
Data sources
| Term | Definition |
|---|---|
| Attribute | A unique, named property within an observation. In tabular-formatted data such as spreadsheets or comma-separated values (CSV) files, the column headings represent the attributes and the rows contain the values for each attribute. |
| Datasource Name | An optional, human-readable name for a datasource that helps you find and manage it in the Amazon ML console. |
| Input Data | The collective name for all the observations that a datasource refers to. |
| Location | Where the input data is stored. Amazon ML can use data stored in Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (Amazon RDS). |
| Observation | A single data point that is part of a datasource, for example one row of a CSV file. |
| Schema | The information needed to interpret the input data, including attribute names, their assigned data types, and the names of special attributes such as the target attribute. |
| Statistics | Summary statistics for each attribute in the input data. |
| Status | The current state of the datasource, such as In Progress, Completed, or Failed. |
| Target Attribute | The attribute whose value a trained ML model predicts. |
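To make the Schema and Statistics entries concrete, here is a minimal Python sketch that builds a schema for a CSV datasource and computes simple per-attribute statistics. The schema field names (`version`, `targetAttributeName`, `dataFormat`, `dataFileContainsHeader`, `attributes`) follow the Amazon ML schema format as we understand it; check them against the official schema reference before relying on them. The type-inference heuristic, the chosen statistics, and the function names are hypothetical simplifications, not how Amazon ML itself infers types or computes its statistics.

```python
import csv
import io
import json


def infer_attribute_type(values):
    """Hypothetical heuristic, NOT Amazon ML's own type inference:
    BINARY for columns containing only 0/1, NUMERIC if every value parses
    as a number, CATEGORICAL otherwise (including all-empty columns)."""
    present = [v.strip() for v in values if v.strip() != ""]
    if present and set(present) <= {"0", "1"}:
        return "BINARY"
    try:
        for v in present:
            float(v)
    except ValueError:
        return "CATEGORICAL"
    return "NUMERIC" if present else "CATEGORICAL"


def build_schema(csv_text, target_attribute):
    """Return an Amazon ML-style schema dict for CSV text with a header row."""
    header, *rows = list(csv.reader(io.StringIO(csv_text)))
    if target_attribute not in header:
        raise ValueError(f"target attribute {target_attribute!r} is not a column")
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    return {
        "version": "1.0",
        "targetAttributeName": target_attribute,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": [
            {"attributeName": name, "attributeType": infer_attribute_type(values)}
            for name, values in columns.items()
        ],
    }


def summary_statistics(csv_text):
    """Per-attribute count of missing values, plus min/max/mean for numeric columns."""
    header, *rows = list(csv.reader(io.StringIO(csv_text)))
    stats = {}
    for i, name in enumerate(header):
        values = [row[i].strip() for row in rows]
        present = [v for v in values if v]
        entry = {"missing": len(values) - len(present)}
        if infer_attribute_type(values) == "NUMERIC":
            numbers = [float(v) for v in present]
            entry.update(min=min(numbers), max=max(numbers), mean=sum(numbers) / len(numbers))
        stats[name] = entry
    return stats


if __name__ == "__main__":
    sample = "age,plan,churned\n34,basic,1\n51,premium,0\n29,,1\n"
    print(json.dumps(build_schema(sample, "churned"), indent=2))
    print(summary_statistics(sample))
```

With the sample data, `age` is typed NUMERIC, `plan` CATEGORICAL (with one missing value), and the target `churned` BINARY. A real datasource would usually need the inferred types reviewed by hand, because a 0/1-coded numeric column and a true binary label look identical to a heuristic like this one.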
ML Models
| Term | Definition |
|---|---|
| Regression | An ML model that predicts a numeric value. |
| Multiclass | An ML model that predicts values belonging to a limited, predefined set of permissible values. |
| Binary | An ML model that predicts values that can take only one of two states, such as true/false or yes/no. |
| Model Size | ML models capture and store patterns. The more patterns an ML model stores, the larger it is. Model size is measured in megabytes (MB). |
| Number of Passes | The number of times that Amazon ML is allowed to use the same data records during training. |
| Regularization | A machine learning technique that you can use to obtain higher-quality models. It penalizes extreme model weights, which helps prevent the model from overfitting (memorizing) the training data instead of generalizing from it. |
Evaluations
| Term | Definition |
|---|---|
| Model Insights | Amazon ML provides you with a metric that you can use to evaluate the predictive performance of your model. |
| Precision | The fraction of positive-class predictions that actually belong to the positive class: TP / (TP + FP). |
| Recall | The fraction of all positive examples in the dataset that the model predicts as positive: TP / (TP + FN). |
| AUC | Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples than for negative examples. |
| Accuracy | The percentage of predictions that are correct. |
| F1-score | The harmonic mean of precision and recall. The macro-averaged F1-score (the unweighted mean of the per-class F1-scores) is used to evaluate the predictive performance of multiclass ML models. |
| RMSE | The Root Mean Square Error (RMSE) is a metric used to evaluate the predictive performance of regression ML models. |
| Cut-off | The threshold used to convert a binary model's numeric prediction scores into labels: scores above the cut-off are labeled 1 (positive) and scores below it are labeled 0 (negative). |
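The evaluation terms can be made concrete with a small, dependency-free Python sketch. It applies a cut-off to binary scores, computes precision, recall, and accuracy from the resulting counts, and computes AUC as the probability that a random positive example outscores a random negative one. It also computes macro-averaged F1 for multiclass labels and RMSE for regression. These are the standard textbook definitions. Amazon ML computes its metrics for you during evaluation, and the function names here are hypothetical.

```python
import math


def apply_cutoff(scores, cutoff=0.5):
    """Turn numeric scores into 0/1 labels. Scores >= cutoff become 1;
    treating ties as positive is a choice of this sketch."""
    return [1 if s >= cutoff else 0 for s in scores]


def binary_metrics(labels, scores, cutoff=0.5):
    """Precision, recall, and accuracy of a binary model at a given cut-off."""
    pairs = list(zip(labels, apply_cutoff(scores, cutoff)))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # TP / (TP + FP)
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # TP / (TP + FN)
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(labels, scores):
    """Probability that a random positive example scores higher than a random
    negative one (ties count as half). O(P*N), fine for small datasets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    per_class = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def rmse(actual, predicted):
    """Root mean square error of numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(binary_metrics(labels, scores, cutoff=0.5))
    print(binary_metrics(labels, scores, cutoff=0.75))
    print("AUC:", auc(labels, scores))
```

Note that AUC depends only on how the scores are ordered, so it does not change with the cut-off, whereas precision, recall, and accuracy do. Raising the cut-off typically trades recall for precision; in the sample data, moving it from 0.5 to 0.75 lowers recall from 2/3 to 1/3.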
Batch Predictions
| Term | Definition |
|---|---|
| Output Location | The Amazon S3 location where the results of a batch prediction are stored. |
| Manifest File | A file that relates each input data file to its associated batch prediction results. It is stored in the S3 output location. |
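A manifest file is mainly useful programmatically, to find which result file belongs to which input file. The sketch below assumes the manifest is a JSON object whose keys are input file S3 URIs and whose values are the matching result file S3 URIs. That layout, the bucket name, and the paths are illustrative assumptions, so confirm them against a manifest produced by your own batch prediction. The helper names are hypothetical.

```python
import json
from urllib.parse import urlparse


def load_manifest(manifest_text):
    """Parse a batch-prediction manifest, ASSUMED to be a JSON object that maps
    each input data file URI to the URI of its prediction results."""
    mapping = json.loads(manifest_text)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping input URIs to result URIs")
    return mapping


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def result_location(manifest_text, input_uri):
    """Return (bucket, key) of the results for one input file."""
    manifest = load_manifest(manifest_text)
    if input_uri not in manifest:
        raise KeyError(f"{input_uri} is not listed in the manifest")
    return split_s3_uri(manifest[input_uri])


if __name__ == "__main__":
    # Hypothetical manifest contents; real bucket names and paths will differ.
    example = json.dumps({
        "s3://example-bucket/input/data.csv":
            "s3://example-bucket/output/batch-prediction/result/data.csv.gz",
    })
    print(result_location(example, "s3://example-bucket/input/data.csv"))
```

The returned bucket and key can then be passed to whatever S3 client you already use to download the results.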
Real-time Predictions
Real-time predictions are for applications with a low latency requirement, such as interactive web, mobile, or desktop applications.
| Term | Definition |
|---|---|
| Real-time Prediction API | The Real-time Prediction API accepts a single input observation in the request payload and returns the prediction in the response. |
| Real-time Prediction Endpoint | To use an ML model with the Real-time Prediction API, you first need to create a real-time prediction endpoint for it. Once created, the endpoint provides the URL that you use to request real-time predictions. |
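The endpoint-then-predict flow can be sketched in Python against the boto3 `machinelearning` client, which exposes `create_realtime_endpoint`, `get_ml_model`, and `predict`. This is a minimal sketch that assumes boto3 is installed, AWS credentials are configured, and your account still has access to Amazon ML. The helper names, polling interval, timeout, model ID, and region are hypothetical. The client is passed in as a parameter, so the included stub (whose responses are canned, not real) lets you try the flow offline.

```python
import time


def create_endpoint_and_wait(client, model_id, poll_seconds=10.0, timeout_seconds=600.0):
    """Create a real-time endpoint for an ML model and poll until it is READY.
    Returns the endpoint URL to pass to predict()."""
    info = client.create_realtime_endpoint(MLModelId=model_id)["RealtimeEndpointInfo"]
    deadline = time.monotonic() + timeout_seconds
    while info.get("EndpointStatus") != "READY":
        if info.get("EndpointStatus") == "FAILED":
            raise RuntimeError(f"endpoint for {model_id} failed to start")
        if time.monotonic() > deadline:
            raise TimeoutError(f"endpoint for {model_id} not ready after {timeout_seconds}s")
        time.sleep(poll_seconds)
        info = client.get_ml_model(MLModelId=model_id)["EndpointInfo"]
    return info["EndpointUrl"]


def predict_one(client, model_id, endpoint_url, observation):
    """Request a prediction for a single observation.
    The Record parameter maps attribute names to string values."""
    record = {name: str(value) for name, value in observation.items()}
    response = client.predict(MLModelId=model_id, Record=record, PredictEndpoint=endpoint_url)
    return response["Prediction"]


class StubMachineLearningClient:
    """Offline stand-in exposing the three methods used above.
    Its responses are canned examples, not real Amazon ML output."""

    def __init__(self):
        self.records = []

    def create_realtime_endpoint(self, MLModelId):
        return {"MLModelId": MLModelId,
                "RealtimeEndpointInfo": {"EndpointStatus": "UPDATING"}}

    def get_ml_model(self, MLModelId):
        return {"MLModelId": MLModelId,
                "EndpointInfo": {"EndpointStatus": "READY",
                                 "EndpointUrl": "https://realtime.example.invalid"}}

    def predict(self, MLModelId, Record, PredictEndpoint):
        self.records.append(Record)
        return {"Prediction": {"predictedLabel": "1", "predictedScores": {"1": 0.87}}}


# Real usage (assumes boto3 and AWS credentials; model ID and region are placeholders):
#   import boto3
#   client = boto3.client("machinelearning", region_name="us-east-1")
#   url = create_endpoint_and_wait(client, "your-ml-model-id")
#   print(predict_one(client, "your-ml-model-id", url, {"age": 34, "plan": "basic"}))

if __name__ == "__main__":
    stub = StubMachineLearningClient()
    url = create_endpoint_and_wait(stub, "ml-EXAMPLE", poll_seconds=0)
    print(predict_one(stub, "ml-EXAMPLE", url, {"age": 34, "plan": "basic"}))
```

Each `predict` call carries exactly one observation, which matches the low-latency, one-request-per-prediction pattern described above. For scoring large files, batch predictions are the better fit.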
AWS WhitePaper Summary