GCDL · Chapter 90 of 101 · Objective 4.1

Enterprise Search with Google Cloud

Vertex AI Search is Google Cloud's platform for building AI-powered search experiences over enterprise data. For the GCDL exam, this topic appears in roughly 5-7% of questions under Domain 4: Apps. Understanding how to index structured and unstructured data, apply natural language processing, and integrate with existing systems is essential when designing intelligent search solutions. This chapter covers the architecture, key components, configuration steps, and common pitfalls tested on the exam.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security


Enterprise Search as Library Card Catalog

Enterprise search is like a massive library with millions of books (documents, emails, databases) but no central catalog. Without a catalog, finding a specific book requires wandering through every aisle—impossible at scale. The library installs a card catalog system: each book is indexed by title, author, subject, and keywords. When a patron searches for 'cloud security,' the catalog instantly returns all matching books and their shelf locations. Behind the scenes, the catalog is updated nightly as new books arrive, and it uses a thesaurus to handle synonyms ('security' vs 'safety'). In Google Cloud, Vertex AI Search acts as that catalog: it ingests data from various sources (Cloud Storage, BigQuery, websites), builds an index, and provides a unified search endpoint. The index uses inverted indexes (like catalog cards pointing to books) and ranking algorithms (like relevance scores) to return the most relevant results. Just as a library catalog doesn't store the books themselves—only metadata and location—enterprise search indexes metadata and content snippets, pointing to the original data source. This decoupling allows searching across siloed repositories without moving data.

How It Actually Works

What is Enterprise Search and Why Does It Matter?

Enterprise search is the practice of making an organization's internal data (documents, emails, databases, intranets, wikis) searchable from a single interface. Unlike web search, which crawls public pages, enterprise search must handle authentication, diverse data formats, and varying access controls. Google Cloud's solution is Vertex AI Search (formerly Enterprise Search on Google Cloud), a managed service built on Google's search technology that indexes enterprise data and returns fast, relevant results.

How Vertex AI Search Works Internally

Vertex AI Search consists of three core components: data stores, search apps, and indexing pipelines.

Data Store: A container for indexed data. You create a data store and connect it to data sources: Cloud Storage, BigQuery, websites (via web crawling), or structured JSON data. Each data store has a schema that defines fields (e.g., title, body, author, date).

Search App: The front-end interface that users interact with. It connects to one or more data stores and provides a search API or a widget for embedding in a website. Search apps support features like autocomplete, spell correction, synonyms, and boosting.

Indexing Pipeline: When you connect a data source, Vertex AI Search automatically crawls and indexes the content. For Cloud Storage, it processes documents (PDF, DOCX, HTML, TXT). For BigQuery, it reads rows. Indexing extracts text and metadata and builds an inverted index, which is stored in a highly available, distributed system.
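To make the inverted-index idea concrete, here is a toy sketch in Python. It only illustrates the data structure the indexing pipeline is described as building; the document IDs and text are made up, and Vertex AI Search's real index (tokenization, storage, ranking signals) is managed for you and is not exposed like this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it (the 'catalog cards')."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Illustrative documents; only IDs and text are indexed, the files stay where they are.
docs = {
    "policy.pdf": "Cloud security policy for all employees",
    "handbook.docx": "Employee handbook: leave, benefits, security badges",
    "faq.html": "Cloud billing FAQ",
}
index = build_inverted_index(docs)
print(sorted(lookup(index, "cloud security")))  # ['policy.pdf']
```

A query never scans the documents themselves: it intersects the posting sets for its terms, which is why lookups stay fast as the corpus grows.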

Key Components, Values, and Defaults

Document Size Limit: Each document (e.g., a PDF) can be up to 10 MB. Larger documents are truncated.

Indexing Frequency: For Cloud Storage with bucket notifications enabled, new or updated files are picked up within minutes. For BigQuery, you can schedule periodic syncs (minimum interval of 30 minutes).

Search Relevance Ranking: Uses a combination of term frequency-inverse document frequency (TF-IDF) and neural matching (using BERT-based models) to rank results. You can adjust ranking with boosting rules (e.g., boost results from a specific department).

Synonyms: You can define custom synonyms to improve recall. For example, map "automobile" to "car". Up to 10,000 synonym pairs per data store.

Autocomplete: Suggests queries as the user types. Based on popular queries and indexed content. Can be customized with a deny list.

Access Control: Can integrate with Cloud IAM or Cloud Identity for document-level permissions, ensuring users only see results they are authorized to view.
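The components above can be tied together in a toy pipeline: expand the query with bidirectional synonyms, score documents with TF-IDF, drop documents the user is not allowed to see, then apply boost rules. This is a simplified sketch under stated assumptions, not how the service is implemented: neural matching is not reproduced, the per-document allowed_groups set is a hypothetical stand-in for real document-level access control, and boosts are modeled as plain score multipliers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(terms, synonym_pairs):
    """Add synonyms of the original query terms; each pair works in both directions."""
    expanded = set(terms)
    for a, b in synonym_pairs:
        if a in terms:
            expanded.add(b)
        if b in terms:
            expanded.add(a)
    return expanded


def tf_idf_scores(query_terms, docs):
    """Score each document by summing the TF-IDF weights of the query terms it contains."""
    counts = {doc_id: Counter(tokenize(doc["text"])) for doc_id, doc in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, tf in counts.items():
        total = sum(tf.values()) or 1
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = sum(1 for other in counts.values() if term in other)
            idf = math.log(n_docs / df) + 1.0  # +1 so a term found in every doc still counts
            score += (tf[term] / total) * idf
        scores[doc_id] = score
    return scores


def search(query, docs, user_groups, synonym_pairs=(), boosts=None):
    """Expand, score, filter by (hypothetical) ACL, apply multiplicative boosts, sort."""
    terms = expand_query(tokenize(query), synonym_pairs)
    ranked = []
    for doc_id, score in tf_idf_scores(terms, docs).items():
        doc = docs[doc_id]
        if score == 0.0 or not (doc["allowed_groups"] & user_groups):
            continue  # no match, or the user's groups don't intersect the document's ACL
        for (field, value), factor in (boosts or {}).items():
            if doc.get(field) == value:
                score *= factor
        ranked.append((score, doc_id))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in ranked]


DOCS = {
    "eng-laptops": {"text": "Laptop setup guide for new hires",
                    "department": "Engineering", "allowed_groups": {"all-staff"}},
    "it-laptops": {"text": "Laptop refresh policy and laptop returns",
                   "department": "IT", "allowed_groups": {"all-staff"}},
    "exec-laptops": {"text": "Executive laptop budget",
                     "department": "Finance", "allowed_groups": {"finance"}},
}
SYNONYMS = [("laptop", "notebook")]

print(search("notebook", DOCS, {"all-staff"}))            # []
print(search("notebook", DOCS, {"all-staff"}, SYNONYMS))  # ['it-laptops', 'eng-laptops']
print(search("notebook", DOCS, {"all-staff"}, SYNONYMS,
             {("department", "Engineering"): 3.0}))       # ['eng-laptops', 'it-laptops']
```

Without the synonym pair, "notebook" finds nothing; with it, the laptop documents match, the Finance-only document stays hidden from all-staff users, and the Engineering boost reorders the results.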

Configuration and Verification

To create a search app:

In the Google Cloud Console, navigate to Vertex AI > Search & Conversation.

Create a data store and specify the data source (e.g., Cloud Storage bucket).

Wait for indexing to complete (can take minutes to hours depending on data volume).

Create a search app and attach the data store.

Test the search using the built-in preview widget or the API.

Example API call to search:

curl -X POST -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json" \
https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/dataStores/DATA_STORE_ID/servingConfigs/default_search:search \
-d '{
  "query": "cloud security",
  "pageSize": 10
}'

You can also use the Google Cloud console to view indexing status: under the data store, the Documents tab shows the count of indexed documents and any errors.
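The same call can be prepared from Python with only the standard library. This sketch builds the request the curl command sends (same URL pattern, headers, and body) without sending it; send() is a thin, hypothetical wrapper around urllib.request.urlopen. PROJECT_ID, DATA_STORE_ID, and the token are placeholders you must supply.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1alpha"


def build_search_request(project_id, data_store_id, query, access_token, page_size=10):
    """Build (but do not send) the same POST request as the curl example."""
    url = (f"{API_ROOT}/projects/{project_id}/locations/global/dataStores/"
           f"{data_store_id}/servingConfigs/default_search:search")
    body = json.dumps({"query": query, "pageSize": page_size}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholders: substitute your project, data store ID, and a token from
    # `gcloud auth application-default print-access-token`.
    req = build_search_request("PROJECT_ID", "DATA_STORE_ID", "cloud security", "TOKEN")
    print(req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any quota is spent.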

Interaction with Related Technologies

Vertex AI Search vs Cloud Search: Google Cloud Search is an older product built around Google Workspace (formerly G Suite) content. Vertex AI Search is the recommended service for custom enterprise data.

BigQuery: Vertex AI Search can index BigQuery tables, making database content searchable. This is useful for product catalogs or knowledge bases stored in relational format.

Cloud Storage: Commonly used to store PDFs, Word documents, and scanned images. Vertex AI Search extracts text from these documents, applying OCR to scanned content.

Dialogflow: You can integrate Vertex AI Search with Dialogflow to power conversational search bots.

Performance and Scaling

Vertex AI Search scales automatically to handle millions of documents. Data stores are created in a location, either global or a multi-region such as us or eu, rather than an individual region like us-central1. Latency is typically under 200 ms for simple queries. For high-throughput applications, you can provision additional serving capacity (purchased through reserved capacity units).

Walk-Through

Create a Data Store

Navigate to Vertex AI > Search & Conversation in the Google Cloud Console. Click 'Create data store'. Choose the data source type: Cloud Storage, BigQuery, website, or structured JSON. Provide a name (e.g., 'company-knowledge-base') and select a location (for example, global). For Cloud Storage, specify the bucket path (e.g., gs://my-docs/). The data store schema is auto-detected for common file types. After creation, the system begins an initial indexing job. You can monitor progress in the 'Documents' tab.
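For Cloud Storage documents that need custom metadata (such as the case number, date, and author fields in Scenario 3), one approach is to import a JSON Lines file where each line points at an object and carries its metadata. The sketch below assumes the id / structData / content.mimeType / content.uri layout described in the Vertex AI Search documentation for unstructured data with metadata; verify the field names against the current docs. make_doc_id is a hypothetical helper, the ID rule it enforces (letters, digits, hyphens, underscores, at most 63 characters) is an assumption, and the metadata values are made up.

```python
import json
import os
import re

# Assumed MIME types for a few formats mentioned in this chapter.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".txt": "text/plain",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def make_doc_id(path):
    """Hypothetical helper: turn an object path into an ID of letters, digits, '-' and '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "-", path).strip("-")[:63]


def metadata_record(bucket, path, metadata):
    """One JSONL line pointing at a Cloud Storage object and carrying custom metadata."""
    ext = os.path.splitext(path)[1].lower()
    record = {
        "id": make_doc_id(path),
        "structData": metadata,
        "content": {"mimeType": MIME_TYPES[ext], "uri": f"gs://{bucket}/{path}"},
    }
    return json.dumps(record, sort_keys=True)


lines = [
    metadata_record("my-docs", "legal/case-1042/brief.pdf",
                    {"case_number": "1042", "author": "J. Doe", "date": "2024-03-01"}),
    metadata_record("my-docs", "hr/handbook.docx", {"department": "HR"}),
]
jsonl = "\n".join(lines)  # upload this file next to the documents, then import it
print(jsonl)
```

Generating the metadata file from code, rather than by hand, keeps IDs stable across re-imports so updated metadata replaces the old record instead of creating a duplicate.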

Configure Indexing Schedule

For dynamic data, set up periodic indexing. For Cloud Storage, you can enable 'Cloud Storage notifications' to trigger re-indexing on file changes. For BigQuery, set a sync schedule (minimum 30 minutes). Go to the data store 'Sources' tab and edit the schedule. You can also manually trigger a re-index. Note: Each re-index consumes resources; frequent updates may incur higher costs.
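The cost note above comes down to simple arithmetic: a shorter interval means more syncs per day. This sketch uses the 30-minute BigQuery minimum stated in this chapter (treat it as an assumption to check against current documentation) and rejects schedules below it.

```python
from datetime import timedelta

# Figure from this chapter; confirm against current product documentation.
MIN_BIGQUERY_SYNC = timedelta(minutes=30)


def syncs_per_day(interval: timedelta) -> int:
    """Number of scheduled BigQuery syncs in 24 hours for a given interval."""
    if interval < MIN_BIGQUERY_SYNC:
        raise ValueError(f"interval {interval} is shorter than the {MIN_BIGQUERY_SYNC} minimum")
    return timedelta(days=1) // interval


for hours in (0.5, 2, 6, 24):
    print(hours, "h ->", syncs_per_day(timedelta(hours=hours)), "syncs/day")
```

Going from a 6-hour to a 30-minute schedule multiplies the number of syncs by 12 (4 to 48 per day), which is the trade-off between freshness and re-indexing cost.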

Create a Search App

In the same console, click 'Create app'. Choose 'Search' as the app type. Provide a name and description. Select the data store(s) you created earlier. You can attach multiple data stores to one app. Configure search features: enable autocomplete, spell correction, and synonyms. Set a default serving configuration (e.g., 'default_search'). The app generates a unique ID used in API calls.

Define Synonyms and Boosting Rules

To improve search relevance, go to the Search App's 'Synonyms' tab. Add pairs of equivalent terms (e.g., 'laptop' and 'notebook'). Each pair is bidirectional. For boosting, go to the 'Boosting' tab and create rules that raise or lower ranking based on field values. For example, boost results where 'department' equals 'Engineering'. Rules can be applied to specific queries or globally.
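Boost rules can also be attached to an individual search request. Based on the Discovery Engine API reference, the request body is assumed to accept a boostSpec with conditionBoostSpecs entries, each pairing a filter-style condition with a boost strength between -1 and 1 (positive promotes, negative demotes) rather than a literal multiplier such as 2x; confirm the field names and range in the current API reference. The sketch only builds the JSON body, which could be posted to the same :search endpoint as the curl example.

```python
import json


def condition_boost(condition, boost):
    """One conditionBoostSpecs entry; boost is assumed to lie in [-1, 1]."""
    if not -1.0 <= boost <= 1.0:
        raise ValueError("boost must be between -1 and 1")
    return {"condition": condition, "boost": boost}


def search_body(query, boosts, page_size=10):
    """Request body for a :search call with a boostSpec attached."""
    return {
        "query": query,
        "pageSize": page_size,
        "boostSpec": {"conditionBoostSpecs": boosts},
    }


body = search_body(
    "password reset",
    [condition_boost('department: ANY("Engineering")', 0.5)],
)
print(json.dumps(body, indent=2))
```

Validating the strength client-side catches out-of-range values (such as a 2x multiplier copied from a spreadsheet) before the request is sent.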

Test and Deploy Search Widget

Use the 'Preview' tab in the console to test queries. Review results for relevance and access control. Once satisfied, deploy the search widget: copy the provided HTML snippet or use the REST API. The widget is a JavaScript embed that renders a search box and results. For custom UI, use the API directly. Monitor usage via Cloud Logging and set up alerts for error rates.

What This Looks Like on the Job

Scenario 1: Internal Knowledge Base for a Large Enterprise

A multinational corporation with 50,000 employees maintains thousands of internal documents across multiple departments. Previously, employees had to search SharePoint, Confluence, and file shares separately. The company deployed Vertex AI Search with a data store connected to Cloud Storage containing PDFs and Word docs. They also indexed their BigQuery table of IT support tickets. The search app was embedded in the company intranet. They configured synonyms for technical jargon (e.g., 'VM' and 'virtual machine') and boosted results from the IT department for queries containing 'password' or 'login'. The solution reduced average search time from 3 minutes to 15 seconds. However, they initially forgot to set up access control, causing users to see documents they shouldn't—a critical misconfiguration. They later integrated with Cloud IAM to enforce document-level permissions.

Scenario 2: E-commerce Product Search

An online retailer with 2 million products wanted to improve site search. They used Vertex AI Search to index their product catalog stored in BigQuery (fields: product_name, description, price, category). They configured autocomplete with a deny list to exclude inappropriate terms. They also used boosting to prioritize in-stock items and items on sale. The search API was called from their web front-end. During a flash sale, the system handled 10,000 queries per second with 99.9% uptime. The main challenge was handling product variants (e.g., different sizes) — they had to ensure each variant was a separate document with distinct SKUs. A common mistake was not normalizing text (e.g., 'iPhone 12' vs 'iphone12'), which they solved by adding synonyms and using the built-in spell correction.

Scenario 3: Legal Document Discovery

A law firm needed to search through millions of legal documents for e-discovery. They stored documents in Cloud Storage and used Vertex AI Search with custom metadata fields (case number, date, author). They used boosting to prioritize documents with higher relevance scores from previous searches. However, they hit the 10 MB document size limit for some scanned PDFs. They pre-processed these files by splitting them into smaller parts. They also had to handle OCR for scanned documents—Vertex AI Search automatically extracts text from images, but accuracy varied. They supplemented with manual tagging. The biggest issue was latency: complex queries with many filters took over 2 seconds. They optimized by reducing the number of fields in the schema and using pre-filtering.

How GCDL Actually Tests This

What the GCDL Exam Tests on Enterprise Search

The GCDL exam under Domain 4.1 focuses on understanding the capabilities and use cases of Google Cloud's enterprise search offerings, specifically Vertex AI Search (formerly Enterprise Search). You are expected to know:

The difference between Vertex AI Search and Cloud Search.

How to connect data sources (Cloud Storage, BigQuery, websites).

Basic concepts: data store, search app, indexing, synonyms, boosting.

Access control integration with Cloud IAM.

Pricing model (pay per query, plus indexing costs).

Common Wrong Answers and Why Candidates Choose Them

Confusing Vertex AI Search with Cloud Search: Many candidates think Cloud Search is the correct answer for custom enterprise data. Cloud Search is built for Google Workspace content (Gmail, Drive) and reaches other repositories only through connectors; Vertex AI Search is the service for data in Cloud Storage, BigQuery, and websites. The exam might ask: 'Which service should you use to index documents in Cloud Storage?' The wrong answer is Cloud Search.

Believing Vertex AI Search stores the original documents: It does not—it indexes metadata and snippets. The original data stays in the source. A common distractor is 'Vertex AI Search replicates your data to a managed storage.'

Thinking you need to train custom ML models: Vertex AI Search uses built-in models (BERT) for ranking. You do not need to train models. The exam might offer an option like 'Train a custom model using Vertex AI Training.' That is unnecessary.

Assuming all data sources support real-time indexing: Only Cloud Storage with notifications supports near-real-time. BigQuery has a minimum 30-minute sync interval. Web crawling is periodic (default every 7 days).

Specific Numbers and Terms to Memorize

10 MB maximum document size.

10,000 maximum synonym pairs per data store.

30 minutes minimum BigQuery sync interval.

7 days default web crawl frequency.

TF-IDF and neural matching used for ranking.

Data store and search app are the two main resources.

Edge Cases and Exceptions

Access control: If not configured, all indexed documents are visible to all users who can access the search app. The exam expects you to know that Cloud IAM integration is required for document-level permissions.

Multiple data stores: A single search app can query multiple data stores, but results are merged. There is no cross-data-store deduplication.

Unstructured vs structured data: Vertex AI Search handles both, but structured data (BigQuery) allows filtering on fields.
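The multiple-data-store case can be pictured with a small sketch. How the service actually blends results is not described in this chapter, so the sketch assumes scores from different data stores are comparable and simply merges by score; the hit lists are made up. The dedupe step is a hypothetical app-side fix for the missing cross-data-store deduplication.

```python
import heapq


def merge_results(per_store, page_size=10):
    """Merge (doc_id, score) hits from several data stores by score; duplicates are kept."""
    tagged = [(score, store, doc_id)
              for store, hits in per_store.items()
              for doc_id, score in hits]
    return [(store, doc_id, score)
            for score, store, doc_id in heapq.nlargest(page_size, tagged)]


def dedupe(merged, key=lambda doc_id: doc_id):
    """Hypothetical app-side step: keep only the highest-scoring hit for each key."""
    seen, unique = set(), []
    for store, doc_id, score in merged:
        k = key(doc_id)
        if k not in seen:
            seen.add(k)
            unique.append((store, doc_id, score))
    return unique


hits = {
    "documents": [("handbook", 0.9), ("vpn-guide", 0.7)],
    "tickets": [("vpn-guide", 0.8), ("ticket-123", 0.4)],
}
merged = merge_results(hits)
print([doc for _, doc, _ in merged])          # ['handbook', 'vpn-guide', 'vpn-guide', 'ticket-123']
print([doc for _, doc, _ in dedupe(merged)])  # ['handbook', 'vpn-guide', 'ticket-123']
```

The key function is the design lever: matching on a shared business key (a product SKU or document URL, for instance) catches duplicates whose IDs differ between data stores.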

How to Eliminate Wrong Answers

When you see a question about enterprise search:

If the question mentions Google Workspace (formerly G Suite), answer Cloud Search.

If the question mentions custom data (Cloud Storage, BigQuery, websites), answer Vertex AI Search.

If the answer suggests moving data to a new storage system, it's likely wrong—Vertex AI Search indexes in place.

If the answer mentions training models, it's wrong.

Look for keywords like 'index,' 'synonym,' 'boost'—these are Vertex AI Search features.

Key Takeaways

Vertex AI Search is the recommended service for custom enterprise search on Google Cloud.

Data sources: Cloud Storage, BigQuery, websites, and structured JSON.

Maximum document size is 10 MB; larger documents are truncated.

Synonyms can be defined up to 10,000 pairs per data store.

BigQuery sync interval minimum is 30 minutes; Cloud Storage with notifications is near-real-time.

Access control requires Cloud IAM integration; not automatic.

Ranking uses TF-IDF and neural matching (BERT). Boosting rules can adjust relevance.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Vertex AI Search

Indexes data from Cloud Storage, BigQuery, websites, and structured JSON.

Supports custom synonyms, boosting rules, and autocomplete.

Integrates with Cloud IAM for document-level access control.

Uses BERT-based neural matching for ranking.

Priced per query and per GB of indexed data.

Cloud Search

Built primarily for Google Workspace content (Gmail, Drive, Calendar, etc.); other repositories require connectors.

Relies on Google's built-in ranking, with no custom boosting.

Access control is inherited from Google Workspace sharing settings.

Uses Google's search technology, but its ranking is not customizable.

Included with Google Workspace subscriptions; no separate per-query cost.

Watch Out for These

Mistake

Vertex AI Search requires you to move your data into a new Google Cloud storage service.

Correct

Vertex AI Search indexes data in place from Cloud Storage, BigQuery, or websites. It does not copy or move the original documents; it only stores metadata and an inverted index.

Mistake

Cloud Search and Vertex AI Search are interchangeable.

Correct

Cloud Search is designed for Google Workspace content (Gmail, Drive, Calendar). Vertex AI Search is for enterprise data stored in Cloud Storage, BigQuery, or public websites.

Mistake

You need to train a custom machine learning model to improve search relevance.

Correct

Vertex AI Search uses Google's pre-trained BERT models for neural matching. You can adjust relevance using boosting rules and synonyms without any ML training.

Mistake

Vertex AI Search supports real-time indexing for all data sources.

Correct

Only Cloud Storage with bucket notifications provides near-real-time indexing (within minutes). BigQuery syncs at a minimum interval of 30 minutes, and web crawling is periodic (default every 7 days).

Mistake

Vertex AI Search automatically enforces document-level access control.

Correct

Access control must be explicitly configured using Cloud IAM. Without it, all indexed documents are visible to any user who can query the search app.


Frequently Asked Questions

Can Vertex AI Search index data from an on-premises file server?

Not directly from the file server. The data sources covered in this chapter are Cloud Storage, BigQuery, and public websites. To index on-premises data, first move it into Cloud Storage (for example, with Storage Transfer Service) or into BigQuery (for example, with BigQuery Data Transfer Service). Alternatively, you can use a third-party connector.

Does Vertex AI Search support filtering by metadata fields?

Yes, for structured data sources like BigQuery. When you define the schema, you can mark fields as filterable. In the search API, you pass a 'filter' expression (e.g., category: ANY("electronics")). For unstructured data (Cloud Storage), filtering is limited to document-level metadata such as file name and date.
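Filter strings are easy to get wrong by hand, so a few small builders help. This sketch assumes the expression syntax used in Vertex AI Search filters (ANY(...) for text fields, comparison operators for numeric fields, clauses joined with AND); confirm it against the filtering documentation for your data store type. The field names are illustrative.

```python
import json


def any_of(field, values):
    """Text-field clause such as category: ANY("electronics"); json.dumps quotes and escapes."""
    return f"{field}: ANY({', '.join(json.dumps(v) for v in values)})"


def compare(field, op, number):
    """Numeric clause such as price < 500."""
    if op not in {"<", "<=", ">", ">="}:
        raise ValueError(f"unsupported operator {op!r}")
    return f"{field} {op} {number}"


def all_of(*clauses):
    """Join clauses with AND, parenthesizing each one."""
    return " AND ".join(f"({c})" for c in clauses)


expr = all_of(any_of("category", ["electronics", "audio"]), compare("price", "<", 500))
print(expr)  # (category: ANY("electronics", "audio")) AND (price < 500)
```

Building clauses from values, rather than concatenating user input directly, keeps quotes inside values from breaking the expression.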

How does Vertex AI Search handle multiple languages?

Vertex AI Search uses Google's multilingual models to automatically detect language and provide relevant results. It supports over 100 languages. You can also configure language-specific synonyms. There is no need to create separate data stores per language.

What is the difference between a data store and a search app?

A data store holds the indexed content from a specific source. A search app is the interface that queries one or more data stores. You can attach multiple data stores to a single search app, allowing you to search across different datasets (e.g., documents and database records) from one endpoint.

Can I use Vertex AI Search to search images or videos?

Vertex AI Search can extract text from images using OCR, but it does not perform visual search. For video, it can index metadata (title, description) but not the video content itself. For advanced image search, consider using Vertex AI Vision or Cloud Vision API.

How does pricing work for Vertex AI Search?

Pricing is based on two components: indexing and serving. Indexing costs per GB of data processed (approx $2/GB). Serving costs per query (approx $0.50 per 1000 queries). There are also charges for autocomplete and spell correction. See the official pricing page for current rates.
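With the approximate rates quoted above (illustrative figures that change over time), a monthly estimate is a two-term sum. This sketch ignores autocomplete and spell-correction charges and any free tier; the 50 GB / 3 million query workload is a made-up example.

```python
from decimal import Decimal

# Approximate rates quoted in this chapter; check the official pricing page.
INDEX_PRICE_PER_GB = Decimal("2.00")
QUERY_PRICE_PER_1000 = Decimal("0.50")


def estimate_cost(indexed_gb, queries):
    """Indexing cost plus serving cost, rounded to cents."""
    indexing = Decimal(str(indexed_gb)) * INDEX_PRICE_PER_GB
    serving = Decimal(queries) / 1000 * QUERY_PRICE_PER_1000
    return (indexing + serving).quantize(Decimal("0.01"))


print(estimate_cost(50, 3_000_000))  # 1600.00
```

For that workload: 50 GB × $2 = $100 for indexing, plus 3,000 × $0.50 = $1,500 for serving, or $1,600 in total. Serving dominates at high query volume, which is why per-query pricing matters most for public-facing search.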

What happens if my document exceeds the 10 MB limit?

The document is truncated to the first 10 MB. Only the text within that limit is indexed. To index the full document, you must split it into smaller parts and upload them as separate documents. The exam may test this limit.
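Splitting a plain-text extract into parts under the limit can be done with a greedy byte count. This sketch assumes 10 MB means 10 × 1024 × 1024 bytes of UTF-8 text and cuts at character (not sentence) boundaries; a PDF would instead need to be split by pages with a PDF library, which is not shown.

```python
MAX_BYTES = 10 * 1024 * 1024  # the chapter's 10 MB limit, assumed here to mean MiB


def split_utf8(text, max_bytes=MAX_BYTES):
    """Split text into consecutive parts whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character (4 bytes)")
    parts, start, size = [], 0, 0
    for i, ch in enumerate(text):
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes:
            parts.append(text[start:i])  # close the current part before this character
            start, size = i, 0
        size += n
    if start < len(text):
        parts.append(text[start:])
    return parts


parts = split_utf8("Résumé de la politique de sécurité. " * 3, max_bytes=40)
print([len(p.encode("utf-8")) for p in parts])
```

Cutting on paragraph boundaries instead would keep related text together at the cost of slightly more code; the per-character loop is also slow on very large inputs.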
