Introduction to ML Development:
At a very high level, machine learning-based software development involves three core components:
Data Pipeline - Data acquisition and data preparation
ML Pipeline - ML model training and serving
Code Pipeline - Integrating the ML model into the final product
Machine Learning Engineering
Data Pipeline:
The first step is to acquire and prepare the data to be analyzed. Data preparation is a critical activity in the data science workflow: errors that are not caught here propagate to the next phase, data analysis, and lead to wrong insights being derived from the data. The design should cover:
a. Data Ingestion - Collecting data using various frameworks and formats. If real data is unavailable, it may need to be synthesized. Consider masking sensitive fields according to the data classification.
b. Exploration and Validation - Includes data profiling to obtain information about the content and structure of the data. The output of this step is a set of metadata, such as the max, min, and avg of values. Data validation operations are user-defined error detection functions that scan the dataset to spot errors.
c. Data Wrangling (Cleaning) - The process of re-formatting particular attributes and correcting errors in the data, such as imputing missing values.
d. Data Labeling - The operation of the Data Engineering pipeline in which each data point is assigned to a specific category.
e. Data Splitting - Splitting the data into training, validation, and test datasets to be used during the core machine learning stages to produce the ML model.
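The profiling, validation, wrangling, and splitting steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production pipeline: the `ages` column, the valid range of 0 to 120, the 60/20/20 split, and all function names are assumptions made up for the example.

```python
import random
import statistics

def profile(values):
    """Data profiling: return simple metadata for the non-missing values."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "avg": statistics.mean(present),
        "missing": len(values) - len(present),
    }

def validate(values, low, high):
    """User-defined validation: indices of values outside [low, high]."""
    return [i for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]

def impute_mean(values):
    """Wrangling: replace missing values (None) with the mean of the rest."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    return [mean if v is None else v for v in values]

def split(rows, train=0.6, val=0.2, seed=0):
    """Shuffle, then split into training, validation, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

# Hypothetical raw column with two missing values and one impossible age.
ages = [34, 29, None, 41, 250, 38, 27, None, 45, 31]

meta = profile(ages)                      # metadata from profiling
bad = validate(ages, low=0, high=120)     # indices flagged by validation

# Treat flagged values as missing, then impute.
cleaned = impute_mean([None if i in bad else v for i, v in enumerate(ages)])

train_set, val_set, test_set = split(cleaned)
```

Fixing the shuffle seed keeps the split reproducible, which makes later model comparisons easier to trust.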
ML Pipeline:
The objective of this workflow is to leverage the data and experiment with various machine learning algorithms to obtain an ML model. The Model Engineering pipeline includes a number of operations that lead to a final model:
a. Model Training - The process of applying the machine learning algorithm to the training data to train an ML model. It also includes feature engineering and hyperparameter tuning for the model training activity.
b. Model Evaluation - Validating the trained model to ensure it meets the originally codified objectives before serving the ML model in production to the end-user.
c. Model Testing - Performing the final “Model Acceptance Test” by using the held-back test dataset.
d. Model Packaging - The process of exporting the final ML model into a specific format (e.g. PMML, PFA, or ONNX), which describes the model, in order to be consumed by the business application.
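As a rough sketch of how these four operations fit together, the example below trains a tiny k-nearest-neighbours classifier, tunes `k` on the validation set, runs an acceptance test on the held-back test set, and packages the result. Everything here is an assumption for illustration: the toy data, the 0.9 acceptance threshold, and the use of JSON as a stand-in for a real exchange format such as PMML, PFA, or ONNX.

```python
import json
from collections import Counter

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of (feature, label) pairs in data that are predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

# Made-up 1-D dataset of (feature, label) pairs; (2.2, 1) is deliberate noise.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 1), (6.0, 1), (6.5, 1), (7.0, 1)]
val = [(1.2, 0), (2.15, 0), (6.8, 1)]
test = [(1.8, 0), (6.2, 1)]

# Model training with hyperparameter tuning: pick k by validation accuracy.
candidates = [1, 3, 5]
best_k = max(candidates, key=lambda k: accuracy(train, val, k))

# Model evaluation: compare against a codified objective (hypothetical value).
ACCEPTANCE_THRESHOLD = 0.9
val_acc = accuracy(train, val, best_k)

# Model testing: the acceptance test uses data never seen during tuning.
test_acc = accuracy(train, test, best_k)
accepted = val_acc >= ACCEPTANCE_THRESHOLD and test_acc >= ACCEPTANCE_THRESHOLD

# Model packaging: JSON here is only a stand-in for PMML, PFA, or ONNX.
packaged = json.dumps({"algorithm": "knn", "k": best_k, "train": train})
restored = json.loads(packaged)
```

The test set is touched only once, after `k` has been fixed, so the acceptance result is not biased by the tuning step.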
Model Serve / Deploy / Operationalise:
The final stage of the ML workflow is the integration of the previously engineered ML model into existing software. This stage includes the following operations:
a. Model Serving - The process of making the ML model artifact available in a production environment.
b. Model Performance Monitoring - The process of observing the ML model performance on live, previously unseen data while it produces predictions or recommendations. In particular, we are interested in ML-specific signals, such as prediction deviation from previous model performance. These signals might be used as triggers for model re-training.
c. Model Performance Logging - Every inference request results in a log record.
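To make the serving, monitoring, and logging operations more concrete, here is a small sketch of a wrapper that writes one log record per inference request and flags a possible re-training trigger when recent predictions deviate from a baseline. The class name `MonitoredModel`, the threshold rule used as the model, the baseline rate, the tolerance, and the window size are all hypothetical choices for this example.

```python
import json
import time

class MonitoredModel:
    """Wraps a prediction function, logs every request, and watches for drift."""

    def __init__(self, predict_fn, baseline_rate, tolerance=0.2, window=5):
        self.predict_fn = predict_fn
        self.baseline_rate = baseline_rate  # positive rate observed at evaluation time
        self.tolerance = tolerance          # allowed deviation before raising a trigger
        self.window = window                # number of recent requests to inspect
        self.log = []                       # one JSON record per inference request

    def predict(self, features):
        prediction = self.predict_fn(features)
        record = {"ts": time.time(), "features": features, "prediction": prediction}
        self.log.append(json.dumps(record))
        return prediction

    def needs_retraining(self):
        """True if the recent positive-prediction rate deviates from the baseline."""
        recent = [json.loads(r)["prediction"] for r in self.log[-self.window:]]
        if len(recent) < self.window:
            return False  # not enough live data yet
        live_rate = sum(recent) / len(recent)
        return abs(live_rate - self.baseline_rate) > self.tolerance

# Stand-in model: a simple threshold rule returning 0 or 1.
model = MonitoredModel(lambda x: int(x > 4.0), baseline_rate=0.5)

# Live traffic that resembles the data seen during evaluation.
for x in [1.0, 6.0, 2.0, 7.0, 6.5]:
    model.predict(x)
stable = model.needs_retraining()

# Live traffic after the input distribution has shifted upward.
for x in [8.0, 9.0, 7.5, 8.2, 9.1]:
    model.predict(x)
drifted = model.needs_retraining()
```

Tracking the prediction rate is only one possible signal. When ground-truth labels arrive later, the same log records can be joined with them to monitor accuracy directly.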