$vectorize in collections

$vectorize is a reserved field in documents. It stores a string from which a vector embedding is automatically generated through an embedding provider integration. The resulting embedding is stored in the $vector field. The embedding can later be used for vector search and for the vector search component of hybrid search.

A collection must be configured with an embedding provider integration to support the $vectorize field. For more information, see Create a collection that can automatically generate vector embeddings and Auto-generate embeddings with vectorize.

Insert or update a document’s $vectorize field

When you insert or update a document, you can use the $vectorize field to store a string. The Data API passes this string to the collection’s integrated embedding provider to generate a vector embedding from the string. The resulting embedding is automatically stored in the reserved $vector field, and the original string is stored in the $vectorize field. For an example, see Insert documents and generate vector embeddings.
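The following is a minimal sketch of an insert that relies on $vectorize, written as a raw Data API JSON command body. It assumes the Data API's `insertOne` command shape (`{"insertOne": {"document": {...}}}`). The helper `build_insert_one` and the sample fields `_id` and `title` are hypothetical. The sketch only builds the request body. It does not send the request to a database endpoint.

```python
import json
from typing import Any


def build_insert_one(document: dict[str, Any], text_to_embed: str) -> dict[str, Any]:
    """Build a hypothetical insertOne command body that sets $vectorize.

    The server, not this code, is expected to embed `text_to_embed` with the
    collection's embedding provider, store the result in $vector, and keep
    the original string in $vectorize.
    """
    # Copy the input so the caller's dict is not mutated.
    doc = dict(document)
    doc["$vectorize"] = text_to_embed
    return {"insertOne": {"document": doc}}


command = build_insert_one(
    {"_id": "book-1", "title": "Example title"},
    "A short summary of the book to embed.",
)
print(json.dumps(command, indent=2))
```

The document does not set $vector itself. The embedding is generated server side from the $vectorize string.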

The $vectorize string must comply with the embedding provider's requirements, such as its maximum input token count.

Alternatively, you can use the $vector field to insert or update vector embeddings directly. However, you can’t include both the $vector and $vectorize fields in the same insert or update operation.
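The Data API does not allow $vector and $vectorize in the same insert or update. A client can catch this mistake before sending a request. The following is a minimal sketch of that kind of client-side check. `check_vector_fields` is a hypothetical helper and not part of any DataStax client.

```python
from typing import Any


def check_vector_fields(document: dict[str, Any]) -> None:
    """Raise ValueError if a document sets both $vector and $vectorize.

    This is only a client-side guard. The Data API enforces the rule itself.
    """
    if "$vector" in document and "$vectorize" in document:
        raise ValueError("Use either $vector or $vectorize in one operation, not both.")


# Valid: only $vectorize is set, so the embedding provider generates the vector.
check_vector_fields({"_id": "a", "$vectorize": "some text"})

# Valid: only $vector is set, so the embedding is supplied directly.
check_vector_fields({"_id": "b", "$vector": [0.1, 0.2, 0.3]})

# Invalid: both fields appear in the same document.
try:
    check_vector_fields({"_id": "c", "$vector": [0.1], "$vectorize": "text"})
except ValueError as err:
    print(f"Rejected: {err}")
```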

When you find, update, replace, or delete documents, you can use the $vectorize field to perform a vector search. For an example, see Use vector search and vectorize to find documents.
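The following is a minimal sketch of a vector search expressed as a `find` command that sorts on $vectorize. It assumes the Data API's `find` command accepts `filter`, `sort`, and `options.limit` as shown. `build_vectorize_find` and the `genre` filter field are hypothetical. Check the Data API reference for the exact command options.

```python
import json
from typing import Any, Optional


def build_vectorize_find(
    query_text: str,
    filter_: Optional[dict[str, Any]] = None,
    limit: int = 10,
) -> dict[str, Any]:
    """Build a hypothetical find command that runs a vector search.

    Sorting on $vectorize asks the server to embed `query_text` with the
    collection's embedding provider and order matches by vector similarity.
    """
    return {
        "find": {
            "filter": filter_ or {},  # an empty filter matches all documents
            "sort": {"$vectorize": query_text},
            "options": {"limit": limit},
        }
    }


command = build_vectorize_find("books about space travel", {"genre": "fiction"}, limit=5)
print(json.dumps(command, indent=2))
```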

Similarly, when you find and rerank documents, you can use the $vectorize field to perform a hybrid search. For an example, see Find documents with a hybrid search.

Return the $vectorize field

By default, the Data API excludes the $vectorize field from returned documents. To return it, use a projection that explicitly includes the $vectorize field.
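The following is a minimal sketch of a `find` command whose projection explicitly includes $vectorize. It assumes the inclusion syntax `{"field": 1}`. `build_find_returning_vectorize` and the `title` field are hypothetical.

```python
import json
from typing import Any


def build_find_returning_vectorize(
    filter_: dict[str, Any], fields: list[str]
) -> dict[str, Any]:
    """Build a hypothetical find command that returns $vectorize.

    $vectorize is excluded by default, so the projection names it
    explicitly, together with any other fields the caller wants back.
    """
    projection = {field: 1 for field in fields}  # 1 = include (assumed syntax)
    projection["$vectorize"] = 1
    return {"find": {"filter": filter_, "projection": projection}}


command = build_find_returning_vectorize({"_id": "book-1"}, ["title"])
print(json.dumps(command, indent=2))
```

With an inclusion projection like this one, expect the response to contain only the listed fields, and typically `_id`. The Data API projection reference gives the exact rules, including any form for returning every field.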

© 2025 DataStax