GCDL, Chapter 63 of 101, Objective 3.3

AI Use Cases: NLP, Vision, Prediction, Search

This chapter covers the four primary AI use cases on Google Cloud: Natural Language Processing (NLP), Computer Vision, Predictive Analytics, and Enterprise Search. These topics are critical for the Google Cloud Digital Leader (GCDL) exam as they represent the most common AI/ML applications in enterprise environments. Approximately 20-25% of exam questions touch on these use cases, often testing your ability to match business problems to the correct AI service. You will learn the core mechanisms, key services, and common pitfalls for each use case, enabling you to identify the right solution for any scenario.

Updated May 31, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security

The Factory with Four Specialized Workshops

Imagine a large factory that produces custom products. It has four specialized workshops: NLP, Vision, Prediction, and Search. Each workshop has its own machinery and processes, but they all draw on the same raw materials and contribute to the same final outputs.

The NLP workshop is like a language translation booth: it receives text or speech, breaks it into words, works out grammar and context, and outputs structured meaning. The Vision workshop is like a quality inspection station: cameras capture images, then algorithms detect edges, shapes, and patterns to identify defects or objects. The Prediction workshop is like a forecasting department: it takes historical production data, applies statistical models, and predicts future demand or machine failures. The Search workshop is like a massive library catalog: it indexes all product specifications and customer queries, then retrieves the most relevant documents.

The factory manager (the ML platform) coordinates these workshops, ensuring they share data and models. For example, a customer email (NLP) might trigger a product recommendation (Prediction) that includes images (Vision) and related documentation (Search). Each workshop has specialized tools: NLP uses tokenizers and transformers, Vision uses convolutional neural networks, Prediction uses regression or time-series models, and Search uses inverted indices and ranking algorithms.

In Google Cloud, these workshops correspond to services such as the Natural Language API, the Vision API, Vertex AI (formerly AI Platform) for prediction, and Discovery Engine for enterprise search. Each workshop has a distinct function, but they often work together to solve complex business problems, and the factory's success depends on how well they integrate and scale.

How It Actually Works

Natural Language Processing (NLP)

Natural Language Processing enables machines to understand, interpret, and generate human language. On Google Cloud, the primary services are the Natural Language API, Translation API, and Dialogflow. The Natural Language API provides pre-trained models for entity extraction, sentiment analysis, syntax analysis, and content classification. Internally, it relies on deep learning language models; BERT (Bidirectional Encoder Representations from Transformers) is a well-known example of the approach, processing text in both directions to capture context. The API returns confidence scores (0 to 1) for classifications and salience scores (0 to 1) for entities. For sentiment analysis, it returns a score and a magnitude: the score ranges from -1.0 (negative) to 1.0 (positive), and the magnitude is a non-negative number that indicates overall emotional strength regardless of direction.
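To make the score and magnitude pair concrete, here is a minimal sketch of how a caller might turn the two numbers into a rough label. The function name and both cutoffs are assumptions chosen for illustration; the API itself returns only the numbers and does not define these thresholds.

```python
def describe_sentiment(score: float, magnitude: float,
                       score_cutoff: float = 0.25,
                       magnitude_cutoff: float = 1.0) -> str:
    """Turn a (score, magnitude) pair into a rough label.

    The cutoffs are illustrative assumptions, not values defined by the API.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if magnitude < 0:
        raise ValueError("magnitude cannot be negative")
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Score near zero: a large magnitude suggests strong but conflicting
    # emotions, a small magnitude suggests little emotion at all.
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"


print(describe_sentiment(0.8, 3.0))
print(describe_sentiment(0.0, 3.5))
```

The point to remember is that a score near zero is ambiguous on its own; the magnitude is what separates "no emotion" from "strong emotions that cancel out".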

Key components include:

Entities: Proper nouns, common nouns, and other named items. Each entity has a type (e.g., PERSON, LOCATION, ORGANIZATION, EVENT).

Sentiment: Analyzed at the document or sentence level. The API uses a neural network trained on large corpora.

Syntax: Tokenization, part-of-speech tagging, and dependency parsing.

Content Classification: Uses a taxonomy of over 700 categories (e.g., "/Arts & Entertainment/Music").

If you do not specify a language, the API detects it automatically. Many languages are supported, although feature coverage varies by language. For custom models, you can use AutoML Natural Language to train on your own data. The Translation API supports 100+ languages and uses neural machine translation (NMT) for higher quality. Dialogflow builds conversational agents (chatbots) using intent matching and entity extraction; it supports both text and voice.

Computer Vision

Computer Vision extracts information from images and videos. Google Cloud offers the Vision API, Video Intelligence API, and AutoML Vision. The Vision API provides pre-trained models for label detection, optical character recognition (OCR), face detection, landmark detection, logo detection, and safe search (explicit content detection). Models of this kind are typically built on convolutional neural networks (CNNs) that learn hierarchical features from pixels. For example, label detection identifies objects (e.g., "car", "tree") with confidence scores. OCR extracts text from images in over 50 languages. Face detection returns bounding boxes, landmarks (eyes, nose), and attributes like joy, sorrow, anger, and surprise, each reported as a likelihood (VERY_UNLIKELY to VERY_LIKELY) rather than a numeric score.
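Because face attributes come back as likelihood names rather than numbers, client code usually compares them by rank. The sketch below assumes the face annotation is available as a plain dictionary with the REST field names (joyLikelihood and so on); the sample face is hand-written for illustration and is not real API output.

```python
# Likelihood values in increasing order of confidence.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def at_least(likelihood: str, threshold: str = "LIKELY") -> bool:
    """True if `likelihood` ranks at or above `threshold`."""
    return LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold)


def detected_emotions(face: dict, threshold: str = "LIKELY") -> list:
    """List the emotions whose likelihood meets the threshold."""
    return [
        name for name, field in EMOTION_FIELDS.items()
        if at_least(face.get(field, "UNKNOWN"), threshold)
    ]


# Hand-written sample, not real API output.
sample_face = {
    "joyLikelihood": "VERY_LIKELY",
    "sorrowLikelihood": "VERY_UNLIKELY",
    "angerLikelihood": "UNLIKELY",
    "surpriseLikelihood": "POSSIBLE",
}
print(detected_emotions(sample_face))
```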

Key values and defaults:

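Maximum image file size for the Vision API: 20 MB. Images sent inline as base64 are further constrained by the 10 MB limit on the JSON request body, so large files are better referenced by Cloud Storage URI.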

Supported image formats: JPEG, PNG, GIF, BMP, WEBP.

Number of results per feature: default 10, max 100 for label detection.

Video Intelligence API processes videos up to 200 GB or 3 hours; it breaks video into shots and analyzes frames at 1 frame per second (configurable).
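The defaults above show up directly in the request you send. The following sketch builds the JSON body for an image annotation request that references an image in Cloud Storage; the bucket and file name are hypothetical, and the helper function is an illustration rather than part of any Google client library.

```python
import json


def build_annotate_request(image_uri: str, features: list, max_results: int = 10) -> dict:
    """Build a Vision API annotate request body for one image.

    `image_uri` is a Cloud Storage URI; `features` is a list of feature
    type names such as "LABEL_DETECTION".
    """
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in features
                ],
            }
        ]
    }


# Hypothetical bucket and object name, used only for illustration.
body = build_annotate_request(
    "gs://example-bucket/part-001.jpg",
    ["LABEL_DETECTION", "LOGO_DETECTION"],
)
print(json.dumps(body, indent=2))
```

Referencing the image by URI keeps the request body small, which matters when the image itself is large.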

AutoML Vision allows training custom models with your own labeled images. It supports classification (single-label or multi-label) and object detection. The minimum dataset size is 10 images per label, but 100+ is recommended. The service automatically splits data into training, validation, and test sets.
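Before uploading a dataset, it can help to check each label against the minimum and recommended counts quoted above. This is a local sanity check written for illustration (the function and status strings are assumptions), not something the service requires you to run.

```python
from collections import Counter


def check_label_counts(labels, minimum: int = 10, recommended: int = 100) -> dict:
    """Return {label: (count, status)} for a list of per-image labels."""
    report = {}
    for label, count in Counter(labels).items():
        if count < minimum:
            status = "too few"
        elif count < recommended:
            status = "usable, more recommended"
        else:
            status = "ok"
        report[label] = (count, status)
    return report


# Hypothetical defect labels for a small inspection dataset.
labels = ["scratch"] * 120 + ["dent"] * 25 + ["crack"] * 4
for label, (count, status) in check_label_counts(labels).items():
    print(f"{label}: {count} images, {status}")
```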

Predictive Analytics

Predictive Analytics uses historical data to forecast future events. Google Cloud provides AI Platform (now Vertex AI) for custom model training and prediction, and pre-built services like Recommendations AI and Retail API. Vertex AI supports tabular data, time series, and text. For tabular data, it offers AutoML (automated machine learning) and custom training with frameworks like TensorFlow, PyTorch, and scikit-learn. AutoML automatically selects the best model architecture, tunes hyperparameters, and handles feature engineering. It supports regression (predicting numeric values) and classification (binary or multi-class).

Key components:

Training: Data must be in BigQuery, Cloud Storage, or CSV files. AutoML uses a neural network-based architecture search.

Prediction: Deploy models as endpoints with autoscaling. Latency targets: under 100 ms for online predictions.

Evaluation: AutoML provides metrics like RMSE, MAE, AUC, precision, recall, and confusion matrix.
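The evaluation metrics named above are simple to compute by hand, which makes them easier to remember. Here is a minimal sketch using the standard definitions; the input numbers are made up for illustration.

```python
import math


def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def rmse_mae(actual, predicted) -> tuple:
    """Root mean squared error and mean absolute error for a regression."""
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Classification: 8 true positives, 2 false positives, 2 false negatives.
print(precision_recall(tp=8, fp=2, fn=2))

# Regression: three actual values against three predictions.
print(rmse_mae([3, 5, 7], [2, 5, 9]))
```

Precision answers "of the items flagged, how many were right?", while recall answers "of the items that should have been flagged, how many were found?". RMSE penalizes large errors more heavily than MAE.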

Recommendations AI uses deep learning to recommend products based on user behavior (click, purchase, add-to-cart). It requires a product catalog plus historical user event data (minimum 10,000 events). The model learns user-item interactions and provides ranked recommendations.

Enterprise Search

Enterprise Search on Google Cloud is powered by Discovery Engine, the API behind the product now branded Vertex AI Search (formerly Enterprise Search on Generative AI App Builder). It allows organizations to index internal documents (e.g., PDFs, websites, databases) and provide natural language search. Under the hood, it uses a combination of traditional information retrieval (inverted index, TF-IDF) and modern neural search (dense embeddings) to understand query intent. The service supports structured and unstructured data, with built-in connectors for Google Drive, Cloud Storage, BigQuery, and third-party sources via API.
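The traditional half of that combination, an inverted index scored with TF-IDF, fits in a few lines. The sketch below is a toy illustration of the general technique with made-up documents; it is not how Discovery Engine is implemented, and it leaves out the neural (embedding) side entirely.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: dict) -> dict:
    """Inverted index: term -> {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(query: str, index: dict, n_docs: int) -> list:
    """Rank documents by the sum of TF-IDF over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Rare terms get a higher weight than terms found in most documents.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Made-up internal documents.
docs = {
    "hr": "vacation policy and leave policy",
    "fin": "q3 budget forecast and budget review",
    "it": "laptop refresh policy",
}
index = build_index(docs)
print(search("q3 budget", index, len(docs)))
```

A keyword index like this only matches exact terms. Understanding that "spending plan" and "budget" mean the same thing is what the dense-embedding side adds.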

Key features:

Search relevance: Uses ranking algorithms that consider term frequency, document length, and semantic similarity.

Natural language queries: Understands synonyms and context (e.g., "budget for Q3" returns relevant financial documents).

Faceted search: Allows filtering by metadata (date, author, department).

Generative AI integration: Can summarize search results or answer questions directly using large language models.

Indexing requires a schema definition. The service supports incremental indexing for near-real-time updates. Quota: up to 1,000 queries per second per project (default).

Interaction Between Use Cases

In practice, these use cases often combine. For example, a customer support chatbot (NLP) might use Vision to analyze uploaded images of a product, then use Predictive Analytics to estimate repair time, and Search to find relevant documentation. Google Cloud's AI services are designed to work together via APIs and integrate with other GCP services like BigQuery, Cloud Storage, and Pub/Sub. Understanding when to use a pre-trained API vs. AutoML vs. custom training is a key exam skill. Pre-trained APIs are best for common tasks (e.g., label detection, translation) with no custom data. AutoML is for domain-specific tasks with moderate data (100s to 1000s of examples). Custom training is for large-scale, highly specialized models.

Walk-Through

Identify Business Problem and Data

Start by defining the business problem and determining which AI use case applies. For example, if the problem is to automatically categorize customer emails, NLP is appropriate. If it's to detect defective products from images, Vision is the choice. Next, assess available data: is it labeled? How much? For pre-trained APIs, no training data is needed. For AutoML, you need at least 10 labeled examples per category. For custom training, you need thousands. Also consider data format: text, images, tabular, or video. This step ensures you select the right service and avoid misapplication.
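The data assessment above can be written as a small decision rule. In this sketch the 10-example minimum comes from the text, while the cutoff at which custom training becomes worth considering (5,000 here) is an assumption standing in for "thousands"; the function itself is purely a study aid.

```python
def choose_approach(domain_specific: bool, examples_per_label: int,
                    custom_threshold: int = 5000) -> str:
    """Suggest an approach from task type and labeled-data volume.

    custom_threshold is an illustrative assumption for "thousands".
    """
    if not domain_specific:
        # Common tasks are covered by Google's pre-trained models.
        return "pre-trained API"
    if examples_per_label < 10:
        return "gather more labeled data"
    if examples_per_label < custom_threshold:
        return "AutoML"
    return "custom training"


print(choose_approach(domain_specific=False, examples_per_label=0))
print(choose_approach(domain_specific=True, examples_per_label=300))
```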

Choose the Appropriate GCP Service

Match the problem to a specific service. For NLP: use Natural Language API for general analysis, Translation API for language conversion, or Dialogflow for chatbots. For Vision: use Vision API for image analysis, Video Intelligence API for video, or AutoML Vision for custom models. For Prediction: use Vertex AI AutoML for tabular data, Recommendations AI for product recommendations, or BigQuery ML for in-database predictions. For Search: use Discovery Engine (Enterprise Search) for document search. Consider factors like latency, throughput, and whether you need real-time or batch processing. For example, the Vision API supports both online requests (a small number of images annotated synchronously) and asynchronous batch requests (up to 2,000 images per request, with results written to Cloud Storage).
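As a revision aid, the mapping above can be captured in a lookup table. The dictionary keys are informal labels invented for this sketch; only the service names come from the chapter.

```python
SERVICES = {
    "nlp": {
        "general text analysis": "Natural Language API",
        "translation": "Translation API",
        "chatbot": "Dialogflow",
    },
    "vision": {
        "image analysis": "Vision API",
        "video analysis": "Video Intelligence API",
        "custom image model": "AutoML Vision",
    },
    "prediction": {
        "tabular data": "Vertex AI AutoML",
        "product recommendations": "Recommendations AI",
        "in-database prediction": "BigQuery ML",
    },
    "search": {
        "document search": "Discovery Engine",
    },
}


def pick_service(use_case: str, need: str) -> str:
    """Look up the service this chapter suggests for a use case and need."""
    try:
        return SERVICES[use_case][need]
    except KeyError:
        raise ValueError(f"no service listed for {use_case!r} / {need!r}") from None


print(pick_service("vision", "video analysis"))
print(pick_service("prediction", "in-database prediction"))
```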

Prepare and Upload Data

Data must be in a supported format and location. For Vision API, images can be uploaded as base64 strings or via Cloud Storage URIs. For Natural Language API, text can be sent directly or via Cloud Storage. For AutoML, data must be in CSV format with labeled examples, stored in Cloud Storage. For Vertex AI, tabular data can be in BigQuery or CSV. Ensure data is clean and representative. For best results, balance classes (e.g., equal number of positive and negative examples). For AutoML, the service automatically splits data into training (80%), validation (10%), and test (10%) sets unless you specify a custom split.
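To see what an 80/10/10 split looks like in practice, here is a minimal sketch that shuffles a dataset and cuts it by those ratios. It only illustrates the proportions; the service performs its own split, and the seed and function name here are assumptions.

```python
import random


def split_dataset(items, train: float = 0.8, validation: float = 0.1, seed: int = 42):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    val_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]  # whatever remains
    return train_set, val_set, test_set


train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))
```

The training set fits the model, the validation set guides tuning, and the test set is held back so the final evaluation uses data the model has never seen.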

Configure and Run the Service

For pre-trained APIs, send a request with the data and specify the feature(s) you want (e.g., LABEL_DETECTION, TEXT_DETECTION). The API returns a JSON response with entities, scores, and bounding boxes. For AutoML, start a training job by specifying the dataset and objective (e.g., classification, regression). Training time varies: AutoML can take hours to days depending on data size. For Vertex AI custom training, you write a training script and submit a job using the gcloud CLI or SDK. For Discovery Engine, create a search app, define a schema, and index documents. Monitor the job status via Cloud Console or API.
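On the receiving end, the JSON response is a nested structure that you filter for what you need. The sketch below parses a hand-written sample shaped like a label detection response (it is not real API output) and keeps only labels above a chosen score; the 0.8 cutoff is an arbitrary assumption.

```python
import json

# Hand-written sample in the shape of a label detection response.
SAMPLE_RESPONSE = """
{"responses": [{"labelAnnotations": [
  {"description": "Car", "score": 0.97},
  {"description": "Wheel", "score": 0.91},
  {"description": "Tree", "score": 0.55}
]}]}
"""


def labels_above(response_json: str, min_score: float = 0.8) -> list:
    """Return (description, score) pairs whose score meets min_score."""
    data = json.loads(response_json)
    labels = []
    for response in data.get("responses", []):
        for annotation in response.get("labelAnnotations", []):
            if annotation["score"] >= min_score:
                labels.append((annotation["description"], annotation["score"]))
    return labels


print(labels_above(SAMPLE_RESPONSE))
```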

Evaluate and Deploy

After training, evaluate model performance using provided metrics. For AutoML, check precision, recall, AUC, and confusion matrix. If performance is insufficient, improve data quality or quantity, adjust hyperparameters (if custom), or try a different service. For pre-trained APIs, test with sample data to verify accuracy. Once satisfied, deploy the model. For Vertex AI, create an endpoint and deploy the model; you can enable autoscaling (min/max nodes) and set traffic splitting for A/B testing. For Discovery Engine, the search app is automatically available after indexing. Monitor usage and costs: pre-trained APIs charge per request or per entity; AutoML charges per training hour and per prediction node hour.
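Traffic splitting for A/B testing comes down to assigning each deployed model a percentage of requests, with the percentages adding up to 100. The sketch below validates such a split and simulates routing locally; the model names are hypothetical and no Vertex AI call is made.

```python
import random
from collections import Counter


def validate_traffic_split(split: dict) -> dict:
    """Check that shares are non-negative integers that sum to 100."""
    if any(not isinstance(share, int) or share < 0 for share in split.values()):
        raise ValueError("each share must be a non-negative integer")
    if sum(split.values()) != 100:
        raise ValueError("shares must sum to 100")
    return split


def simulate_routing(split: dict, n_requests: int, seed: int = 0) -> Counter:
    """Route n_requests at random according to the split and count the hits."""
    rng = random.Random(seed)
    model_ids = list(split)
    weights = [split[model_id] for model_id in model_ids]
    return Counter(rng.choices(model_ids, weights=weights, k=n_requests))


# Hypothetical deployed model names.
split = validate_traffic_split({"current-model": 90, "candidate-model": 10})
print(simulate_routing(split, 10_000))
```

Sending a small share of traffic to the candidate lets you compare its real-world behavior against the current model before committing to it.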

What This Looks Like on the Job

Scenario 1: Customer Service Automation with NLP and Search

A large e-commerce company receives 50,000 customer emails daily. They want to automatically categorize emails (complaint, return, inquiry) and provide instant answers from their knowledge base. They use the Natural Language API for sentiment analysis and entity extraction to prioritize urgent complaints. They also use Discovery Engine to index their knowledge base (10,000 articles). When a customer asks a question, the system searches for relevant articles and presents a summary. The integration uses Cloud Functions to orchestrate the APIs. Common issue: misclassification due to ambiguous language; they had to train a custom model with AutoML Natural Language to improve accuracy. Performance: each email processed in ~2 seconds; search latency under 500 ms. Cost: ~$0.001 per API call for NLP, plus $0.05 per 1000 queries for Discovery Engine. Misconfiguration: forgetting to set up a Cloud Storage bucket for batch processing led to timeouts; proper setup with Pub/Sub for async processing resolved it.

Scenario 2: Quality Inspection with Computer Vision

A manufacturing plant uses cameras to inspect products on a conveyor belt. They need to detect defects in real time (within 100 ms). Because the pre-trained Vision API does not know their product-specific defect types, they train a custom model with AutoML Vision. They collected 5,000 labeled images (defective vs. non-defective). Training took 3 hours. The deployed model runs on a Vertex AI endpoint with autoscaling (min 1, max 10 nodes). Throughput: 200 predictions per second. Challenge: false positives due to lighting variations; they added data augmentation (rotation, brightness) and retrained. Cost: ~$0.50 per hour for training, $0.30 per node hour for prediction. Misconfiguration: requests with BMP images (instead of JPEG) failed; BMP is a supported format, so the failures came from improperly encoded image bytes rather than from the format itself.

Scenario 3: Predictive Maintenance with Tabular Data

A logistics company wants to predict vehicle breakdowns. They have sensor data (temperature, pressure, mileage) from 10,000 trucks over 2 years. They use Vertex AI AutoML for tabular data to train a classification model (breakdown vs. no breakdown). Data stored in BigQuery. Training took 6 hours with 50 features. The model achieved 95% precision. Deployed as an endpoint with online predictions; fleet managers get real-time alerts. Cost: ~$20 per training hour, $0.50 per 1000 predictions. Misconfiguration: not handling missing values; AutoML imputes missing values by default but can be configured. Another issue: data leakage from including future information (e.g., a breakdown flag from the next day). To fix it, they split the time-series data chronologically so the model is always evaluated on dates later than those it was trained on.
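A chronological split is easy to express in code: everything before a cutoff date goes to training, everything on or after it goes to testing. The sketch below uses made-up sensor rows with hypothetical field names to show the idea.

```python
from datetime import date


def chronological_split(rows, cutoff):
    """Train on rows strictly before the cutoff date, test on the rest."""
    train = [row for row in rows if row["day"] < cutoff]
    test = [row for row in rows if row["day"] >= cutoff]
    return train, test


# Made-up sensor readings; field names are hypothetical.
readings = [
    {"truck": "T1", "day": date(2024, 1, 5), "temp_c": 88, "breakdown": 0},
    {"truck": "T2", "day": date(2024, 2, 11), "temp_c": 97, "breakdown": 1},
    {"truck": "T1", "day": date(2024, 3, 2), "temp_c": 91, "breakdown": 0},
    {"truck": "T3", "day": date(2024, 3, 20), "temp_c": 102, "breakdown": 1},
]

train, test = chronological_split(readings, cutoff=date(2024, 3, 1))
print(len(train), "training rows,", len(test), "test rows")
```

A random split would scatter March readings into the training set, letting the model learn from information that would not have been available at prediction time.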

How GCDL Actually Tests This

What GCDL Tests

Objective 3.3 focuses on identifying appropriate AI use cases and matching them to Google Cloud services. The exam does not test deep ML theory but expects you to know:

Which service to use for a given problem (e.g., Vision API for image classification, Natural Language API for sentiment analysis, Vertex AI for custom prediction).

The difference between pre-trained APIs (no training needed) and AutoML (requires labeled data).

When to use Recommendations AI vs. custom model.

Basic capabilities of each service (e.g., Vision API can detect faces, labels, text; Natural Language API can extract entities and sentiment).

Common Wrong Answers and Why

Using AutoML when a pre-trained API suffices: Candidates often choose AutoML for common tasks like label detection because they think custom is always better. But the exam expects you to know that pre-trained APIs are faster and cheaper for standard use cases. Wrong answer: "Use AutoML Vision to detect common objects." Correct: "Use Vision API label detection."

Confusing NLP and Vision services: Using the Vision API to read text in an image is correct; using the Natural Language API to analyze an image is not. The trap: a question about extracting text from a scanned document. Candidates might choose Natural Language API because it deals with text, but the correct service is Vision API (OCR), since the input is an image.

Misunderstanding data requirements: Candidates think AutoML can work with very few examples (e.g., 5 images). The exam tests that AutoML requires at least 10 per label, and more is better. Also, pre-trained APIs require no training data.

Overlooking Enterprise Search: Many candidates forget about Discovery Engine for document search and instead suggest building a custom search with BigQuery or Cloud Search. The exam expects you to know Discovery Engine is the managed solution for enterprise search.

Specific Numbers and Terms

Natural Language API sentiment score range: -1.0 to 1.0.

Vision API maximum image size: 20 MB.

AutoML minimum dataset size: 10 images per label.

Vertex AI AutoML supports tabular, text, image, and video data.

Recommendations AI requires minimum 10,000 events.

Discovery Engine supports up to 1,000 QPS.

Edge Cases

Video vs. Image: If the problem involves video, use Video Intelligence API, not Vision API. Video Intelligence can analyze shots, detect objects, and transcribe speech.

Batch vs. Online: For real-time predictions (e.g., chatbot), use online prediction. For large offline processing (e.g., analyzing archived emails), use batch prediction.

Language Support: Translation API supports 100+ languages; Natural Language API supports fewer for advanced features like sentiment (e.g., sentiment analysis is available for ~10 languages).

Eliminating Wrong Answers

If the problem requires understanding the meaning of text (e.g., sentiment, entities), it's NLP.

If the problem involves extracting information from images or video, it's Vision.

If the problem is about forecasting or recommendations, it's Predictive Analytics.

If the problem is about searching internal documents, it's Enterprise Search.

Always check if the data is labeled and how much: if unlabeled or little data, use pre-trained API; if labeled and domain-specific, use AutoML; if large custom dataset, use custom training.

Key Takeaways

NLP services: Natural Language API (sentiment, entities, syntax), Translation API, Dialogflow (chatbots).

Vision services: Vision API (labels, OCR, faces), Video Intelligence API (shots, objects, speech), AutoML Vision.

Predictive services: Vertex AI AutoML (tabular, text, image), Recommendations AI (retail), BigQuery ML (in-database).

Enterprise Search: Discovery Engine (index internal documents, natural language queries).

Pre-trained APIs need no training data; AutoML requires at least 10 labeled examples per category.

Vision API max image size: 20 MB; supports JPEG, PNG, GIF, BMP, WEBP.

Natural Language API sentiment score range: -1.0 to 1.0; magnitude indicates emotional intensity.

Recommendations AI requires minimum 10,000 events for training.

Vertex AI AutoML supports tabular, text, image, and video data; automatically splits train/val/test.

Discovery Engine supports up to 1,000 queries per second per project.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Pre-trained APIs (e.g., Vision API, Natural Language API)

No training data required; uses Google's pre-trained models.

Ideal for common, general-purpose tasks (e.g., label detection, sentiment analysis).

Faster to implement; results in seconds via API call.

Lower cost for low-volume use; pay per request or per entity.

Limited customization; cannot adapt to domain-specific nuances.

AutoML (e.g., AutoML Vision, AutoML Natural Language)

Requires labeled training data (minimum 10 examples per label).

Best for domain-specific tasks (e.g., identifying proprietary parts in images).

Training time from hours to days; prediction latency similar to pre-trained.

Higher cost due to training hours and prediction node hours.

Customizable; can achieve higher accuracy on specialized data.

Watch Out for These

Mistake

Natural Language API can analyze images.

Correct

Natural Language API only processes text. For images, use Vision API or Video Intelligence API. The confusion arises because both can extract text, but NLP works on text input, while Vision extracts text from images (OCR).

Mistake

AutoML always produces better results than pre-trained APIs.

Correct

AutoML improves accuracy for domain-specific tasks, but for common tasks (e.g., general object detection, sentiment analysis on standard English), pre-trained APIs are often comparable and faster/cheaper. AutoML requires significant labeled data and training time.

Mistake

Enterprise Search is just a search engine for websites.

Correct

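Discovery Engine (Enterprise Search) indexes your organization's own content (PDFs, Google Drive, databases, and your own websites) and supports natural language queries. It is not a public web search engine like Google Search, and it is distinct from Google Cloud Search, which searches Google Workspace content.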

Mistake

Recommendations AI requires user profiles.

Correct

Recommendations AI works with event data (click, purchase) and does not require explicit user profiles. It uses collaborative filtering and deep learning on anonymized event sequences.

Mistake

Vertex AI AutoML can handle any data type without preprocessing.

Correct

AutoML handles missing values and basic feature engineering, but data must be in a supported format (CSV, BigQuery). For images, they must be in Cloud Storage. For text, it requires UTF-8 encoding. AutoML does not handle streaming data directly.

Frequently Asked Questions

What is the difference between Natural Language API and Dialogflow?

Natural Language API analyzes text to extract entities, sentiment, and syntax. It is a one-shot analysis tool. Dialogflow builds conversational agents (chatbots) that maintain context across multiple turns. It uses intents and entities to understand user requests and trigger responses. For a simple text analysis, use Natural Language API. For a chatbot, use Dialogflow.

Can Vision API detect text in images?

Yes, Vision API has a feature called TEXT_DETECTION (or DOCUMENT_TEXT_DETECTION for dense text). It extracts text from images using OCR. It supports over 50 languages. The response includes the text string and bounding boxes. For handwriting, DOCUMENT_TEXT_DETECTION is more accurate.

How much data do I need for AutoML Vision?

AutoML Vision requires a minimum of 10 images per label, but 100+ is recommended for good accuracy. The total dataset should be at least a few hundred images. More data generally improves performance. The service automatically splits data into training (80%), validation (10%), and test (10%).

What is the difference between Vertex AI AutoML and BigQuery ML?

Vertex AI AutoML is a full ML platform that supports various data types (tabular, text, image, video) and automates model training and deployment. BigQuery ML allows you to create and execute ML models using SQL queries directly on data in BigQuery. BigQuery ML is simpler for analysts familiar with SQL, but limited to tabular data and fewer model types (e.g., linear regression, deep neural networks).

When should I use Recommendations AI vs. a custom model on Vertex AI?

Use Recommendations AI if you have a retail or media use case with user interaction events (clicks, purchases) and want a managed service that handles model training, tuning, and serving. It is optimized for recommendation tasks. Use Vertex AI custom model if you have non-standard recommendation logic (e.g., content-based filtering, hybrid) or need full control over the model architecture.

Can I use Enterprise Search to search the public web?

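No, Enterprise Search (Discovery Engine) is designed to index your own content (documents, websites, databases) within your organization. Google Cloud Search is not a public web search product either; it searches content in Google Workspace (formerly G Suite) and connected sources. For results from the public web, Google offers the Programmable Search Engine with its Custom Search JSON API.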

What is the cost of using Natural Language API?

Natural Language API pricing is based on the amount of text sent and the features used. Text is billed in units of 1,000 characters, so a long document counts as several units. For entity extraction and sentiment, the first 5,000 units per month are free (for each feature). After that, it costs $1.00 per 1,000 units for entity extraction and $1.00 per 1,000 units for sentiment. Syntax analysis and content classification have separate pricing. Check the official pricing page for current rates.
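A quick way to sanity-check a bill is to apply the free tier and the per-1,000 rate quoted in the answer above. The sketch below does that arithmetic for one feature; the default figures are taken from this answer and may not match current pricing, so treat them as parameters rather than facts.

```python
def monthly_cost(billable_count: int, free_quota: int = 5000,
                 price_per_1000: float = 1.00) -> float:
    """Estimate one feature's monthly cost after the free tier.

    Defaults mirror the figures quoted in the text; check the official
    pricing page before relying on them.
    """
    chargeable = max(0, billable_count - free_quota)
    return chargeable / 1000 * price_per_1000


print(monthly_cost(4_000))    # entirely within the free tier
print(monthly_cost(55_000))   # 50,000 chargeable after the free tier
```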
