Generative AI Maturity Model

Generative AI has captured the world's attention. Every week there's a new product that makes last week seem like ancient history.

This fast pace makes it hard to know how businesses are adapting and making use of this new technology. How do organisations tailor their AI usage and development to their evolving needs?

From curious consumers to fully automated pipelines, these are the levels most businesses will go through as they mature their use of generative AI.

  • Level 0: Solution Consumers
  • Level 1: API Endpoint Consumers
  • Level 2: Retrieval Augmented Generation
  • Level 3: Private LLM
  • Level 4: Fine-tuning
  • Level 5: Manual Training
  • Level 6: Continuous Training
  • Level 7: Automated Pipeline

A Guide to Enterprise Generative AI Maturity

Level 0: Solution Consumers

Consumers have no AI expertise. They simply want to take advantage of cool AI tools that are sold and supported by other businesses.

Level 1: API Endpoint Consumers

Endpoint consumers see the awesome potential of Large Language Models (LLMs) like those behind ChatGPT. Software developers at this level want to call an API to use these services to enhance their existing products or create new ones.

Example: A simple website that helps people plan a holiday itinerary.
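A Level 1 integration can be little more than a prompt template and a single HTTP call. A minimal sketch of the itinerary site, using an OpenAI-compatible chat endpoint (the model name and environment variable are assumptions; adapt them to your provider):

```python
# Level 1 sketch: build a chat-completion request and send it to an
# OpenAI-compatible endpoint. Only the standard library is used.
import json
import os
import urllib.request


def build_itinerary_request(destination: str, days: int) -> dict:
    """Build the JSON payload for a chat-completion request."""
    return {
        "model": "gpt-4o-mini",  # assumed model name; substitute your own
        "messages": [
            {"role": "system",
             "content": "You are a helpful travel planner."},
            {"role": "user",
             "content": f"Plan a {days}-day itinerary for {destination}."},
        ],
    }


def request_itinerary(destination: str, days: int) -> str:
    """Send the request and return the model's reply text."""
    payload = build_itinerary_request(destination, days)
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

All of the "AI" lives on the other side of the API; the app is just prompt construction and plumbing.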

Level 2: Retrieval Augmented Generation (RAG)

Many organisations will quickly become frustrated by the lack of domain specific knowledge provided by pre-trained foundation models. For their specific niche, these models are completely unusable. An infinite amount of prompt engineering won't help. They need the model to be aware of their specific business context.

This can be done with Retrieval Augmented Generation (RAG), which is just a fancy way of saying: include some extra, relevant data in your prompt so the LLM can answer more specific questions.

There's a bunch of complexity here that's out of scope for this blog post. Send me a message if you'd like me to write a follow up on RAG Apps.
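That said, the core loop is small enough to sketch. Here, naive word-overlap scoring stands in for a real vector database, and the documents are invented:

```python
# Toy RAG loop: retrieve the most relevant snippet from a small
# document store, then prepend it to the prompt. Word overlap is a
# stand-in for real embedding similarity search.

DOCS = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Our support line is open Monday to Friday, 9am to 5pm AEST.",
    "Premium members get free express shipping on all orders.",
]


def retrieve(question: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))


def build_rag_prompt(question: str) -> str:
    """Augment the prompt with the retrieved context."""
    context = retrieve(question, DOCS)
    return (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {question}"
    )
```

A production version swaps the word overlap for embeddings and a vector store, but the shape — retrieve, augment, generate — stays the same.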

Level 3: Private LLM

At either level 1 or level 2, organisations start to become concerned with data sensitivity, privacy, and access. Unfortunately, using publicly available LLM services with your private data is too risky. Organisations at this level need an easy button for standing up private GenAI models behind a consistent API. This API often mimics the OpenAI API so that existing applications need only change their base_url to point to a new model.
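Because many self-hosted serving stacks (vLLM, Ollama, and others) expose an OpenAI-compatible API, the switch can look as small as this — the URLs and model names below are placeholders:

```python
# Sketch of the base_url swap: the same application code talks to a
# public provider or a private endpoint depending on one config value.

def client_config(private: bool) -> dict:
    """Return connection settings for an OpenAI-compatible client."""
    if private:
        return {
            "base_url": "https://llm.internal.example.com/v1",  # your VPC endpoint (placeholder)
            "model": "llama-3-70b-instruct",  # assumed self-hosted model
        }
    return {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o",
    }
```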

These organisations have three options:

Host their own open-source LLM
This is quite an involved task: browsing the Hugging Face leaderboard to decide which model to run, then deploying it either on-premises or in your private VPC.

Hosted Commercial Models
Making use of an independent provider to host a model in a private datacenter or VPC. Examples include Cohere and Azure-hosted OpenAI.

Private API Endpoint
To reduce the complexity of the above options, some systems integrators and businesses offer solutions that run hosted open-source LLMs in your data center or cloud.

Level 4: Fine-tuning

At level 4, your organisation has reached the limits of what RAG can offer and you are looking to optimise for both cost and performance. For some applications you don't need the biggest model available. A fine-tuned smaller model can outperform larger models like GPT-4 on specific tasks.

[Chart: a fine-tuned smaller model outperforming GPT-4 on a specific task. Source: Anyscale]

A smaller fine-tuned model is cheaper to operate and can provide better performance for the target task. You might also want to fine-tune a model to support further data sensitivity and compliance requirements.

E.g. Making sure the model adheres to content guidelines.

Fine-tuning a model is far more complex than building a RAG application. A data scientist is required to collect a relevant dataset, choose a foundation model to fine-tune, run the fine-tuning job, and then evaluate the model's performance.
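As a taste of the data-collection step: many fine-tuning APIs, OpenAI's included, accept chat-formatted examples as JSON Lines. A sketch with invented examples:

```python
# Convert labelled (prompt, ideal answer) pairs into JSONL records of
# the chat-message shape commonly accepted by fine-tuning APIs.
import json


def to_jsonl(examples: list[tuple[str, str]]) -> str:
    """Serialise (prompt, ideal_answer) pairs as JSON Lines."""
    lines = []
    for prompt, answer in examples:
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```

The hard part is not the format but curating enough high-quality pairs — that is where the data scientist's time actually goes.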

Level 5: Manual Training

At this level, organisations are investing in their own AI teams and have data scientists on the payroll. Many of these teams can produce fantastic ML models but their process for experimentation, building and deploying these models is manual. They have no Machine Learning Operations (MLOps) engineers to automate this process.

The process looks like this:

  • Data extraction: Select and integrate the relevant data from various data sources.
  • Data preparation: Clean the data and split it into training, validation, and test sets.
  • Model exploration: The data scientist implements different algorithms with prepared data to determine which approach should be taken to the training phase and whether further infrastructure is required to meet training needs.
  • Model training: The data scientist will scale their training jobs with accelerators if needed.
  • Model evaluation: The model is evaluated on a held-out test set to measure its accuracy.
  • Model serving: The model is deployed for inferencing.
  • Model monitoring: Model accuracy is monitored to determine if it needs to be retrained.
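To make those steps concrete, here is the workflow compressed into a runnable sketch, with a synthetic dataset and a deliberately trivial "model" (a single threshold) standing in for the real thing:

```python
# The manual workflow as code: extract -> prepare -> train -> evaluate.
# The dataset and model are toys so every step fits in a few lines.
import random


def extract_data(n: int = 200) -> list[tuple[float, int]]:
    """Data extraction: fake a labelled dataset of (feature, label)."""
    rng = random.Random(42)
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.5)) for x in xs]


def prepare_data(data):
    """Data preparation: shuffle and split into train/val/test (60/20/20)."""
    rng = random.Random(0)
    data = data[:]
    rng.shuffle(data)
    n = len(data)
    return data[: int(0.6 * n)], data[int(0.6 * n): int(0.8 * n)], data[int(0.8 * n):]


def train(train_set) -> float:
    """Model training: threshold halfway between the class means."""
    pos = [x for x, y in train_set if y == 1]
    neg = [x for x, y in train_set if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate(threshold: float, test_set) -> float:
    """Model evaluation: accuracy on the held-out test set."""
    correct = sum(int(x > threshold) == y for x, y in test_set)
    return correct / len(test_set)
```

At this maturity level every one of these functions is run by hand; the next two levels are about wiring them together so nobody has to.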

Level 6: Continuous Training (CT)

Once an organisation starts producing more and more models, the manual operations supporting each step of the process become cumbersome.

To resolve these issues, organisations will attempt to automate each step I mentioned in Level 5.

An automated trigger can kick off the retraining process based on:

  • A set cadence (e.g. monthly)
  • Availability of new data
  • Performance degradation of an existing model
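Those three triggers boil down to a single predicate that a scheduler might call; the threshold values below are purely illustrative:

```python
# Retraining trigger: fire on cadence, on enough new data, or on
# degraded live accuracy. Defaults are illustrative, not prescriptive.
from datetime import date, timedelta


def should_retrain(
    last_trained: date,
    today: date,
    new_rows: int,
    live_accuracy: float,
    cadence: timedelta = timedelta(days=30),
    min_new_rows: int = 10_000,
    accuracy_floor: float = 0.90,
) -> bool:
    return (
        today - last_trained >= cadence      # set cadence (e.g. monthly)
        or new_rows >= min_new_rows          # availability of new data
        or live_accuracy < accuracy_floor    # performance degradation
    )
```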

Continuous training can help data scientists rapidly experiment and iterate during training phases by automating each step. Many proprietary solutions exist to help build a CT pipeline, as well as open source projects like Kubeflow.

Level 7: Automated Pipeline

Extending beyond continuous training, adding Continuous Integration and Continuous Delivery (CI/CD) allows organisations to rapidly bring models into production. It adds the following components:

  • Source control
  • Test/build services
  • Deployment services
  • Model registry
  • Feature store
  • Metadata store
  • Pipeline orchestrator

The full workflow looks something like this:

  1. Development and experimentation: Iteratively try out new ML algorithms and new modelling techniques, where the experiment steps are orchestrated.
  2. Pipeline continuous integration: Build source code and run various tests to produce pipeline components (packages, executables, and artefacts).
  3. Pipeline continuous delivery: Deploy the artefacts produced by the CI stage to your environment to produce a pipeline.
  4. Automated triggering: The pipeline is automatically executed in production based on a schedule or in response to a trigger. The output of this stage is a trained model that is pushed to the model registry.
  5. Model continuous delivery: Deploy the trained model for inferencing.
  6. Monitoring: Collect statistics on the model performance based on live data. This data may be used as a trigger to execute the pipeline again.
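Stripped of the infrastructure, the orchestrated part of that workflow is a list of steps where each consumes the previous step's output and every run is recorded, metadata-store style. A toy sketch:

```python
# Minimal stand-in for a pipeline orchestrator: run steps in order,
# thread each step's output into the next, and log every step the way
# a metadata store would record it.

def run_pipeline(steps, payload, metadata_log):
    """Run (name, fn) steps in sequence, logging each step's output."""
    for name, step in steps:
        payload = step(payload)
        metadata_log.append({"step": name, "output": payload})
    return payload


# Toy steps standing in for ingest -> train -> register.
steps = [
    ("ingest", lambda _: [1, 2, 3, 4]),
    ("train", lambda data: sum(data) / len(data)),  # "model" = the mean
    ("register", lambda model: {"model_id": "v1", "param": model}),
]
```

Real orchestrators (Kubeflow Pipelines among them) add retries, caching, distributed execution, and artefact tracking, but the mental model is the same: a graph of steps with recorded inputs and outputs.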

What did you think?

I'll update this reference as I get feedback and learn more. Send me a message with feedback so I can 'fine-tune' my model!