GCDL · Chapter 27 of 101 · Objective 3.3

Generative AI for Business Leaders

This chapter covers generative AI for business leaders: how it works, its key components, and how it is applied in enterprise settings. For the Google Cloud Digital Leader (GCDL) exam, this topic falls under Domain 3: Data Analytics and AI, Objective 3.3. Approximately 10-15% of exam questions touch on generative AI concepts, including foundation models, prompt engineering, grounding, and responsible AI. This chapter provides the depth needed to answer those questions correctly.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

Generative AI as a Master Chef

What if a master chef had studied thousands of recipes and could create new dishes on demand? The chef's training involves reading every recipe in existence, understanding how ingredients combine, and learning the underlying principles of flavor and texture. When you ask for a 'new pasta dish with a creamy lemon sauce,' the chef doesn't retrieve an existing recipe—they generate one from scratch, predicting which ingredients and steps are most likely to produce a delicious result. The chef's knowledge is not a copy of any single recipe but a probabilistic model of what makes a good dish. Similarly, generative AI models are trained on vast datasets to learn patterns and relationships. They then generate new content—text, images, code—by predicting the next most probable token based on the input prompt and the learned distribution. Just as the chef might occasionally produce a bland dish if the request is vague, generative AI can produce low-quality or hallucinated outputs if prompts are poorly designed. The key is the underlying statistical model, not a simple lookup or retrieval.

How It Actually Works

What is Generative AI?

Generative AI refers to a class of artificial intelligence models that can generate new content (text, images, audio, video, code, and more) based on patterns learned from training data. Unlike traditional AI models that classify or predict a label (e.g., 'is this email spam?'), generative models create novel outputs that mimic the statistical properties of the training data. The most prominent type of generative AI today is the large language model (LLM), such as Google's PaLM 2 and Gemini or OpenAI's GPT-4. These models are built on the Transformer architecture, introduced in the 2017 paper 'Attention Is All You Need' (Vaswani et al.).

How Generative AI Works Internally

At a high level, generative AI models are trained on massive datasets (e.g., trillions of tokens of text, billions of images). During training, the model learns to predict the next token (a word or subword) given the previous tokens. This is called autoregressive language modeling. The model's parameters (weights) are adjusted to minimize the difference between its predictions and the actual next token in the training data. Once trained, the model can generate text by repeatedly sampling from its predicted probability distribution over the vocabulary. Each new token becomes part of the input for the next prediction, creating a coherent sequence.
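The autoregressive loop described above can be illustrated with a toy bigram model. This is only a sketch: the corpus and the helper names `train_bigram` and `generate` are made up for illustration, and a real LLM conditions on the whole preceding context with a neural network rather than on counts of the previous word alone.

```python
import random
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count how often each word follows each other word."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts: dict, start: str, max_tokens: int, seed: int = 0) -> list:
    """Repeatedly sample the next word given the previous one."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(max_tokens):
        followers = counts.get(output[-1])
        if not followers:  # no known continuation: stop early
            break
        words = list(followers.keys())
        weights = list(followers.values())
        # Each sampled word becomes the context for the next prediction.
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return output

corpus = "the cat sat on the mat and the cat slept on the mat"
model = train_bigram(corpus)
print(" ".join(generate(model, "the", max_tokens=8)))
```

The key point is the feedback loop: every generated token is appended to the sequence and used to predict the one after it.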

Key steps in generation:
- Tokenization: Input text is split into tokens (e.g., 'Generative' might be split into 'Gen' and 'erative'). Each token is mapped to a unique ID.
- Embedding: Each token ID is converted into a high-dimensional vector (e.g., 768 or 4096 dimensions) that captures semantic meaning.
- Attention Mechanism: The model computes attention scores between every pair of tokens in the input sequence. This allows the model to weigh the importance of different tokens when predicting the next one. For example, in the sentence 'The cat sat on the mat,' the word 'cat' might have high attention to 'sat' and 'mat'.
- Feed-Forward Layers: After attention, the token representations pass through feed-forward neural networks that apply non-linear transformations.
- Output Layer: The final hidden state for the last token is projected to a vector of size equal to the vocabulary (e.g., 50,000 tokens). A softmax function converts these scores into probabilities. The model then samples from this distribution to pick the next token.
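The output-layer step can be made concrete with a small softmax calculation. The four-word vocabulary and the raw scores below are invented for illustration; a real model produces one score per entry in a vocabulary of tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["mat", "sofa", "roof", "moon"]  # toy vocabulary (assumption)
logits = [3.0, 2.0, 1.0, -1.0]           # made-up scores from the output layer
probs = softmax(logits)

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

Higher scores map to higher probabilities, but every token keeps a non-zero chance of being sampled, which is why generation is probabilistic rather than a lookup.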

Key Components and Parameters

- Context Window: The maximum number of tokens the model can consider at once. For example, Gemini 1.5 Pro has a context window of up to 1 million tokens. Larger context windows allow the model to process longer documents or conversations.
- Temperature: Controls randomness in generation. Low temperature (e.g., 0.1) makes the model more deterministic, strongly favoring the highest-probability token. High temperature (e.g., 1.0) increases randomness, leading to more creative but potentially less coherent outputs.
- Top-K and Top-P (Nucleus Sampling): Sampling strategies used alongside temperature. Top-K limits the next token to the K most probable tokens. Top-P selects the smallest set of tokens whose cumulative probability reaches P (e.g., 0.9). These techniques reduce the chance of generating nonsensical tokens.
- Max Output Tokens: The maximum number of tokens the model can generate in a single response. This prevents excessively long outputs and controls cost.
- Stop Sequences: Specific strings (e.g., a newline or a custom delimiter) that tell the model to stop generating when encountered.
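To see how temperature, top-K, and top-P interact, here is a minimal sampling sketch in plain Python. The function name `sample_next` and the example scores are hypothetical, and treating a temperature of 0 as greedy decoding is a simplifying assumption; production systems implement these steps inside the model-serving stack.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Return the index of a sampled token after temperature, top-K, and top-P."""
    rng = rng or random.Random()
    if temperature <= 0:  # treat 0 as greedy decoding (assumption)
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature rescales the scores before softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank candidate tokens from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    weights = [probs[i] for i in ranked]  # random.choices renormalizes weights
    return rng.choices(ranked, weights=weights, k=1)[0]

logits = [3.0, 2.0, 1.0, -1.0]  # made-up scores for four candidate tokens
print(sample_next(logits, temperature=0))                       # greedy pick
print(sample_next(logits, temperature=1.0, top_k=2, rng=random.Random(0)))
```

Lowering the temperature sharpens the distribution toward the top token, while top-K and top-P simply remove unlikely candidates before sampling.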

Foundation Models and Tuning

Google Cloud offers pre-trained foundation models (e.g., PaLM 2, Gemini, Codey, Imagen) that can be used out of the box or customized. Customization options include:
- Prompt Engineering: Crafting input prompts to elicit desired behavior without changing model weights. For example, adding 'Explain like I'm 5' changes the output style.
- Fine-Tuning: Further training the model on a smaller, task-specific dataset. This adjusts model weights to improve performance on specific tasks. Google Cloud's Vertex AI supports supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
- Grounding: Connecting the model to real-world data sources (e.g., Google Search, enterprise databases) to reduce hallucinations and provide up-to-date information. Vertex AI Agent Builder enables grounding with Google Search or private data.
- Retrieval-Augmented Generation (RAG): Combines a retrieval system (e.g., vector database) with a generative model. The model retrieves relevant documents from a knowledge base and uses them as context for generation. This improves accuracy and reduces hallucination.
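The retrieve-then-generate idea behind RAG can be sketched without any cloud service. In this illustration, word-count vectors and cosine similarity stand in for a real embedding model and vector database, and the knowledge base, function names, and prompt wording are all assumptions. The final prompt is what would be sent to a generative model.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Very rough stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_grounded_prompt(question, documents, k=1):
    """Place the retrieved text in the prompt so the model answers from it."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

knowledge_base = [  # hypothetical enterprise documents
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3 to 5 business days.",
    "Gift cards cannot be exchanged for cash.",
]
prompt = build_grounded_prompt("How many days do I have for returns?", knowledge_base)
print(prompt)
```

The model itself is unchanged; accuracy improves because the relevant facts are supplied at request time instead of being recalled from training.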

How Generative AI Interacts with Other Technologies

Vertex AI: Google Cloud's unified ML platform for building, deploying, and managing generative AI models. It provides access to foundation models, tuning tools, and deployment endpoints.

Model Garden: A curated set of foundation models from Google and third parties (e.g., Llama, Claude) accessible via Vertex AI.

Vertex AI Agent Builder: A tool for building conversational AI agents that can ground responses in enterprise data and execute actions.

BigQuery: Can be used to store and query training data or to analyze generated outputs. Integration allows models to access structured data.

Cloud Storage: Used to store large datasets for training or grounding.

Configuration and Verification Commands

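Using the gcloud CLI or Python SDK, you can interact with Vertex AI generative models. For example, gcloud can list the models registered in your own project (such as tuned models) in a given region; Google-hosted foundation models are accessed through the API rather than listed here:

gcloud ai models list --region=us-central1 --filter=display_name:PALM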

To generate text using the PaLM API via Python:

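import vertexai
from vertexai.language_models import TextGenerationModel

# Replace 'my-project' with your own Google Cloud project ID.
vertexai.init(project='my-project', location='us-central1')

# Load the PaLM 2 text model by its versioned name.
model = TextGenerationModel.from_pretrained('text-bison@002')

response = model.predict(
    'What is the capital of France?',
    temperature=0.2,        # low randomness for a factual answer
    max_output_tokens=256,  # cap on response length
    top_p=0.8               # nucleus sampling threshold
)
print(response.text)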

Responsible AI Considerations

Generative AI models can produce harmful, biased, or hallucinated outputs. Google Cloud provides tools to mitigate these risks:
- Safety Attributes: Scores for categories like toxicity, harassment, and hate speech. These can be used to filter outputs.
- Bias Detection: Tools to evaluate model outputs for demographic biases.
- Human-in-the-Loop: Deploying models with human reviewers to catch errors.
- Data Governance: Ensuring training data is properly sourced and anonymized.
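Filtering on safety scores can be sketched as a simple threshold check. The category names follow the examples above, but the score dictionary, the threshold values, and the function names are hypothetical; they are not the response format of any specific Google Cloud API.

```python
# Illustrative thresholds only; real values depend on the application.
BLOCK_THRESHOLDS = {"toxicity": 0.5, "harassment": 0.5, "hate_speech": 0.3}

def violated_categories(scores, thresholds=BLOCK_THRESHOLDS):
    """Return the categories whose score meets or exceeds its threshold."""
    return sorted(c for c, limit in thresholds.items() if scores.get(c, 0.0) >= limit)

def filter_output(text, scores, fallback="I can't help with that request."):
    """Return the text if it passes every threshold, otherwise a fallback."""
    return fallback if violated_categories(scores) else text

safe = filter_output("Here is your summary.", {"toxicity": 0.02})
blocked = filter_output("(some harmful text)", {"toxicity": 0.91, "hate_speech": 0.1})
print(safe)
print(blocked)
```

Borderline scores are a natural place to route outputs to a human reviewer instead of blocking them outright.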

Exam-Relevant Numbers and Defaults

Context window sizes: Gemini 1.5 Pro: 1M tokens; PaLM 2: 8k tokens (standard), 32k tokens (extended).

Typical temperature settings: close to 0.0 for deterministic tasks, around 0.8 for creative tasks.

Token limits for text-bison@002: 8,192 input tokens and 1,024 output tokens.

Cost: Charged for both input and output, measured in characters or tokens depending on the model. Example: text-bison@002 costs $0.0005 per 1,000 input characters and $0.0005 per 1,000 output characters (pricing may vary).

Latency: Typically 1-5 seconds for short outputs.
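The per-1,000-character rates quoted above make cost estimates simple arithmetic. The request sizes and monthly volume below are invented for illustration, and the rates are just the example figures from this section, not current pricing.

```python
def estimate_cost(input_chars, output_chars,
                  input_rate_per_1k=0.0005, output_rate_per_1k=0.0005):
    """Estimate the cost in dollars of one request billed per 1,000 characters."""
    return (input_chars / 1000) * input_rate_per_1k + (output_chars / 1000) * output_rate_per_1k

# Hypothetical workload: 2,000 input and 1,000 output characters per request.
per_request = estimate_cost(2000, 1000)
monthly = per_request * 100_000  # assumed 100,000 requests per month

print(f"Per request: ${per_request:.4f}")
print(f"Per month:   ${monthly:.2f}")
```

Because cost scales with both prompt length and response length, trimming prompts and capping output length are the two most direct levers.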

Common Pitfalls

Hallucination: The model generates plausible-sounding but incorrect information. Mitigate with grounding and RAG.

Prompt Injection: Malicious prompts that trick the model into ignoring instructions. Use input validation and safety filters.

Context Window Overflow: Input exceeds the context window, causing truncation or a rejected request. Monitor token counts.

Cost Overruns: Long outputs or high-frequency calls can increase costs. Set max tokens and use caching.
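The context window overflow pitfall can be guarded against with a budget check before each request. This sketch assumes the window covers input and output together, which is not true of every model, and the function names are hypothetical. Token lists would come from the model's own tokenizer in practice.

```python
def fits_context(prompt_tokens, max_output_tokens, context_window):
    """True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_output_tokens <= context_window

def truncate_to_budget(tokens, max_output_tokens, context_window):
    """Keep the most recent tokens that fit alongside the output budget."""
    budget = context_window - max_output_tokens
    if budget <= 0:
        raise ValueError("max_output_tokens leaves no room for input")
    return tokens[-budget:]

print(fits_context(8000, 256, 8192))   # 8256 > 8192, so this does not fit
print(fits_context(7000, 1024, 8192))  # 8024 <= 8192, so this fits
print(truncate_to_budget(list(range(10)), max_output_tokens=2, context_window=6))
```

Keeping the most recent tokens suits chat history; for documents, summarizing or retrieving only relevant chunks is usually a better way to stay within budget.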

Walk-Through

Define Business Use Case

Identify a specific problem generative AI can solve. For example, automating customer support responses, generating marketing copy, or summarizing meeting notes. This step ensures alignment with business goals and avoids vague applications. Consider data availability, privacy requirements, and expected ROI. Document the use case with clear success metrics (e.g., reduce response time by 50%).

Select Foundation Model

Choose a pre-trained model from Vertex AI Model Garden. Options include PaLM 2 for text, Gemini for multimodal, Codey for code, Imagen for images. Consider factors like context window size (e.g., 32k vs 1M tokens), latency, cost, and supported modalities. For example, use Gemini 1.5 Pro if you need to process long documents or videos. Access models via the Vertex AI console or API.

Design and Test Prompts

Craft input prompts that elicit the desired behavior. Use techniques like few-shot prompting (providing examples), chain-of-thought prompting (step-by-step reasoning), and role-playing (e.g., 'You are a helpful assistant'). Test prompts with sample inputs and iterate based on output quality. Use Vertex AI Studio for interactive prompt design. Measure against success metrics.
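Few-shot prompting amounts to placing worked examples ahead of the real input. The helper below is a hypothetical sketch for a sentiment task; the labels, the reviews, and the "Review:/Sentiment:" layout are illustrative choices, not a required format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, labeled examples, and the new input."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    # End with the unanswered case so the model completes the pattern.
    lines += [f"Review: {query}", "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("Arrived quickly and works perfectly.", "positive"),
    ("Broke after two days.", "negative"),
]
few_shot = build_few_shot_prompt(
    "Classify each review as positive or negative.", examples, "Great value for the price."
)
zero_shot = build_few_shot_prompt(
    "Classify each review as positive or negative.", [], "Great value for the price."
)
print(few_shot)
```

Passing an empty example list gives the zero-shot version of the same prompt, which makes it easy to compare the two during testing.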

Ground in Enterprise Data

Connect the model to relevant data sources to improve accuracy and reduce hallucinations. Use Vertex AI Agent Builder to ground with Google Search or private data stores (e.g., BigQuery, Cloud Storage). Implement RAG by storing documents in a vector database (e.g., Cloud SQL with pgvector) and retrieving relevant chunks during generation. This step is critical for enterprise applications where accuracy is paramount.
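Before documents can be retrieved as chunks, they have to be split. One common approach, shown here as a sketch, is fixed-size chunks with some overlap so that sentences cut at a boundary still appear whole in a neighboring chunk. The word-based sizes and the function name are assumptions; real pipelines often chunk by tokens or by document structure.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last chunk reached the end
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Each chunk would then be embedded and stored in the vector database so the most relevant ones can be looked up at question time.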

Deploy and Monitor

Deploy the model as an endpoint in Vertex AI for real-time predictions. Set up monitoring for latency, error rates, and output quality. Use Vertex AI Model Monitoring to detect drift (e.g., changes in input distribution). Implement human-in-the-loop for high-stakes decisions (e.g., medical advice). Log all interactions for audit and improvement. Scale endpoints based on demand using autoscaling.

What This Looks Like on the Job

Enterprise Scenario 1: Customer Support Automation

A large e-commerce company wants to automate responses to common customer inquiries (e.g., order status, return policy). They use Vertex AI with PaLM 2 for text generation. First, they gather historical chat logs and FAQ documents. They fine-tune the model on a dataset of 10,000 question-answer pairs. They also ground the model by connecting it to their order database via a RAG pipeline: when a customer asks 'Where is my order?', the model retrieves the latest tracking info from BigQuery and generates a personalized response. The system handles 80% of inquiries without human intervention, reducing support costs by 60%. Common issues include the model occasionally hallucinating shipping dates when the database query fails—mitigated by adding a fallback message like 'I'm unable to retrieve your order details right now.' Latency is kept under 2 seconds by using a dedicated endpoint with autoscaling.
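The fallback behavior in this scenario can be sketched as a guard around the data lookup: if no record comes back, the model is never asked to answer. The `lookup` and `generate` callables, the order data, and the prompt wording are stand-ins for illustration; in the scenario they would be a BigQuery query and a Vertex AI model call.

```python
FALLBACK = "I'm unable to retrieve your order details right now."

def respond(order_id, lookup, generate):
    """Answer from retrieved data, or fall back instead of letting the model guess."""
    try:
        record = lookup(order_id)
    except Exception:
        record = None  # treat any lookup failure as missing data
    if not record:
        return FALLBACK
    prompt = (
        "Answer the customer using only this tracking record.\n"
        f"Record: {record}\n"
        "Question: Where is my order?"
    )
    return generate(prompt)

# Stand-ins for illustration only.
ORDERS = {"A100": "out for delivery"}

def fake_lookup(order_id):
    return ORDERS.get(order_id)

def failing_lookup(order_id):
    raise TimeoutError("database unavailable")

def fake_generate(prompt):
    return f"[model output based on]\n{prompt}"

print(respond("A100", fake_lookup, fake_generate))
print(respond("A100", failing_lookup, fake_generate))
```

Putting the check in application code, rather than relying on prompt instructions alone, is what prevents hallucinated shipping dates when the data source is down.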

Enterprise Scenario 2: Marketing Content Generation

A media company uses Imagen (Google's image generation model) to create product images for their online catalog. They need to generate variations of product photos (e.g., different backgrounds, angles) without costly photoshoots. They use prompt engineering to specify style (e.g., 'photorealistic, bright lighting, white background'). They also use ControlNet techniques to ensure the generated image adheres to a product's shape. The system generates 100 images per hour, but quality varies—some images have distorted logos or unrealistic shadows. They implement a human review step where designers approve or reject images before publishing. Cost is $0.05 per image (Imagen pricing), which is cheaper than hiring a photographer.

Enterprise Scenario 3: Code Generation for Developers

A software company uses Codey (Code Generation API) to accelerate development. Developers describe a function in natural language (e.g., 'Write a Python function to calculate the Fibonacci sequence') and Codey generates the code. They integrate it into their IDE via a plugin. The model reduces boilerplate coding time by 30%. However, the generated code may contain security vulnerabilities (e.g., SQL injection). They use a static analysis tool to scan generated code before merging. The team also fine-tunes Codey on their internal codebase to improve accuracy for company-specific libraries.
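Scanning generated code before merging can be as simple as walking its syntax tree for risky constructs. This sketch uses Python's standard `ast` module to flag direct calls to `eval` and `exec`; that narrow rule is an illustrative assumption, and a real static analysis tool checks far more, including the SQL injection patterns mentioned above.

```python
import ast

RISKY_CALLS = {"eval", "exec"}  # illustrative rule set, not exhaustive

def find_risky_calls(source):
    """Return (line, name) for each direct call to a risky builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in RISKY_CALLS):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)

generated = "def run(expr):\n    return eval(expr)\n"
clean = "def add(a, b):\n    return a + b\n"

print(find_risky_calls(generated))
print(find_risky_calls(clean))
```

A check like this can run automatically in the merge pipeline so that flagged code always gets a human review.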

How GCDL Actually Tests This

What GCDL Tests on This Topic

Domain 3: Data Analytics and AI, Objective 3.3: 'Explain how generative AI can be used to create new content and insights.' The exam focuses on high-level concepts rather than deep technical details. You should know:

The difference between generative and discriminative AI.

Key Google Cloud services: Vertex AI, Model Garden, Agent Builder.

The purpose of foundation models, prompt engineering, fine-tuning, grounding, and RAG.

Responsible AI considerations (bias, safety, hallucination).

Business use cases: content creation, code generation, customer support, summarization.

Common Wrong Answers and Why Candidates Choose Them

'Generative AI models retrieve information from a database.' This is wrong because generative models generate content from learned patterns, not retrieve from a fixed database. Candidates confuse generative AI with traditional search or retrieval systems. The correct answer emphasizes that models predict the next token based on training data.

'Fine-tuning is the same as prompt engineering.' This is wrong because fine-tuning modifies model weights, while prompt engineering only changes the input. Candidates may think both are ways to 'customize' the model. The exam expects you to know that fine-tuning requires training data and changes the model, whereas prompt engineering does not.

'Grounding is used to increase creativity.' This is wrong because grounding reduces hallucinations by connecting to real data, not increasing creativity. Candidates may associate 'grounding' with 'grounded in reality' but misinterpret its purpose. The correct answer is that grounding improves factual accuracy.

'All generative AI models have the same context window.' This is wrong because context windows vary (e.g., 8k, 32k, 1M). Candidates might assume a standard value. The exam may ask about specific models like Gemini 1.5 Pro with 1M tokens.

Specific Numbers and Terms Verbatim on the Exam

Gemini 1.5 Pro context window: 1 million tokens.

PaLM 2 context window: 8,192 tokens (standard), 32,768 tokens (extended).

Temperature range: 0.0 to 1.0 (or higher). Low temp = deterministic, high temp = creative.

Token limits for text-bison@002: 8,192 input tokens and 1,024 output tokens.

Vertex AI Agent Builder: used for grounding and building conversational agents.

RAG: Retrieval-Augmented Generation.

Edge Cases and Exceptions

Zero-shot vs few-shot: The exam may ask about scenarios where no examples are provided (zero-shot) vs few examples. Few-shot often improves performance.

Multimodal models: Gemini can process text, images, audio, and video. The exam may test that Gemini is multimodal, while PaLM 2 is text-only.

Cost considerations: Input tokens are often cheaper than output tokens, though this varies by model. The exam may ask about cost optimization.

Latency: Larger models have higher latency. The exam may ask about trade-offs between accuracy and speed.

How to Eliminate Wrong Answers

If an answer mentions 'retrieving from a fixed database,' it is likely wrong unless it specifically describes RAG (which combines retrieval with generation).

If an answer says 'fine-tuning does not require training data,' it is wrong.

If an answer says 'generative AI always produces accurate results,' it is wrong due to hallucination.

Look for keywords like 'predicts next token,' 'probabilistic,' 'pattern learning' to identify correct answers about how generative AI works.

Key Takeaways

Generative AI creates new content by predicting the next token based on learned probability distributions.

Foundation models are pre-trained on massive datasets and can be customized via prompt engineering, fine-tuning, grounding, and RAG.

Vertex AI provides unified access to foundation models (PaLM 2, Gemini, Codey, Imagen) and tools for deployment and monitoring.

Grounding connects models to real-world data to reduce hallucinations; RAG combines retrieval with generation.

Responsible AI practices include using safety attributes, bias detection, and human-in-the-loop for high-stakes applications.

Key parameters: temperature (0.0-1.0), top-K, top-P, max output tokens, context window (e.g., Gemini 1.5 Pro: 1M tokens).

Common exam traps: confusing generative with discriminative AI, misstating fine-tuning vs prompt engineering, and underestimating hallucination risks.

Business use cases include customer support automation, content generation, code generation, and summarization.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Prompt Engineering

No model weight changes; only input prompt is modified.

Requires no training data; just craft effective prompts.

Faster and cheaper to implement; results are immediate.

Best for tasks where the base model already performs well.

Does not add new knowledge or skills to the model; it only steers what the model already learned.

Fine-Tuning

Model weights are updated using a task-specific dataset.

Requires a labeled dataset of hundreds to thousands of examples.

Slower and more expensive due to training costs.

Best for tasks requiring specialized knowledge or style.

Permanently changes model behavior for the fine-tuned task.

Watch Out for These

Mistake

Generative AI models understand and think like humans.

Correct

Generative AI models are statistical pattern matchers; they do not have understanding, consciousness, or intent. They predict the next token based on probabilities learned from training data. The illusion of understanding comes from their ability to generate coherent text.

Mistake

Generative AI always produces factual and accurate outputs.

Correct

Generative AI can hallucinate—generate plausible-sounding but incorrect information. This is because the model's goal is to produce likely text, not verified facts. Grounding and RAG are used to reduce hallucinations, but they do not eliminate them entirely.

Mistake

Fine-tuning is always better than prompt engineering.

Correct

Fine-tuning is not always necessary or beneficial. Prompt engineering is cheaper, faster, and often sufficient for many tasks. Fine-tuning requires a high-quality labeled dataset and can lead to overfitting. The choice depends on the use case.

Mistake

Generative AI models can only generate text.

Correct

Generative AI can generate text, images, audio, video, and code. For example, Imagen generates images, Chirp generates speech, and Codey generates code. Multimodal models like Gemini can handle multiple modalities simultaneously.

Mistake

Large context windows always improve performance.

Correct

Larger context windows allow processing more information but also increase computational cost and latency. Moreover, models may not effectively use all tokens in a long context (the 'lost in the middle' problem). Performance depends on the task.

Frequently Asked Questions

What is the difference between generative AI and discriminative AI?

Generative AI models learn the joint probability distribution of the data and can generate new samples (e.g., text, images). Discriminative AI models learn the boundary between classes and predict labels (e.g., spam vs not spam). In simple terms, generative AI creates, discriminative AI classifies. For the exam, remember that generative models produce new content, while discriminative models make predictions.

How does grounding reduce hallucinations in generative AI?

Grounding provides the model with access to external, authoritative data sources (e.g., Google Search, enterprise databases) during generation. The model can retrieve real-time or verified information and incorporate it into its output. This reduces the likelihood of the model fabricating facts. On Google Cloud, Vertex AI Agent Builder enables grounding with your own data or public data.

What is the context window and why does it matter?

The context window is the maximum number of tokens the model can consider when generating a response. A larger context window allows the model to process longer documents, maintain coherence over longer conversations, and incorporate more information. For example, Gemini 1.5 Pro has a 1 million token context window, enabling it to analyze entire books. However, larger windows increase computational cost and latency.

What is the difference between zero-shot, one-shot, and few-shot prompting?

Zero-shot prompting gives the model a task description with no examples. One-shot provides a single example. Few-shot provides several examples. Few-shot generally improves performance by showing the model the desired input-output pattern. For the exam, know that few-shot prompting is a technique within prompt engineering, not a separate customization method.

How does Vertex AI Model Garden help with generative AI?

Model Garden is a curated repository of foundation models from Google and third parties (e.g., Llama, Claude). It allows you to discover, compare, and deploy models from a single interface. You can also fine-tune models and deploy them as endpoints. It simplifies the process of choosing and using the right model for your use case.

What are the responsible AI considerations for generative AI?

Key considerations include: (1) Bias and fairness: models can amplify biases in training data; (2) Safety: outputs may contain harmful content; (3) Transparency: users should know they are interacting with AI; (4) Privacy: training data and prompts may contain sensitive information; (5) Accountability: human oversight is needed for critical decisions. Google Cloud provides tools like safety attributes and bias detection.

Can generative AI models be used for code generation?

Yes. Google Cloud's Codey API provides models specialized for code generation, completion, and chat. It supports languages like Python, Java, Go, and SQL. Codey can generate functions, provide code explanations, and convert code between languages. It is built on PaLM 2 and is available via Vertex AI.
