Yes, the same patterns work with imported fields, with a few important constraints.
What “imported fields” really are
In Vespa parent/child, an imported field is effectively an attribute value read through a reference (“real-time join”), and it is usable as if it were stored with the child. (docs.vespa.ai)
That means:
- You can use attribute(imported_field) in ranking (same syntax as local attributes). (docs.vespa.ai)
- You can use imported fields in query operators (filtering/boosting), but remember: only attribute fields can be imported, and attribute fields do not support full text matching. (GitHub)
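For context, a minimal sketch of the schema wiring, assuming illustrative names (mcat, product, mcat_ref are mine, not from your setup):

```
# mcat.sd (parent; the document type must be deployed global="true")
schema mcat {
    document mcat {
        field mcat_tree type array<string> {
            indexing: attribute | summary
        }
    }
}

# product.sd (child)
schema product {
    document product {
        field mcat_ref type reference<mcat> {
            indexing: attribute
        }
    }
    # the parent's attribute becomes usable as if it lived on the child
    import field mcat_ref.mcat_tree as imported_mcat_tree {}
}
```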
1) Your tensor-mirror approach works for imported fields
If you store mcat_tree_tensor in the parent document and import it into the child, then:
sum(query(my_param) * attribute(imported_mcat_tree_tensor))
works the same way as with local fields. The Vespa tutorial explicitly shows importing a sparse tensor and referencing it in ranking via attribute(imported_name) and tensor expressions. (docs.vespa.ai)
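As a sketch of the moving parts, reusing the illustrative names above plus a rank profile I am calling mcat_boost (all placeholders):

```
# parent document (mcat.sd): a sparse tensor keyed by category id
field mcat_tree_tensor type tensor<float>(mcat{}) {
    indexing: attribute | summary
}

# child schema (product.sd), outside the document block
import field mcat_ref.mcat_tree_tensor as imported_mcat_tree_tensor {}

# child rank profile
rank-profile mcat_boost inherits default {
    inputs {
        query(my_param) tensor<float>(mcat{})
    }
    first-phase {
        expression: nativeRank + sum(query(my_param) * attribute(imported_mcat_tree_tensor))
    }
}
```

At query time the tensor is supplied through the input.query(my_param) request parameter.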
Caveat: parent docs must be global="true", and that limits how many parent docs you can have, since every node indexes all parent docs. (docs.vespa.ai)
So this is ideal when many children share a relatively small set of parents (the intended parent/child use case), not when every child effectively has its own unique parent.
2) The “no tensor mirror” approach also works: rank() + matches() on imported array attributes
Why you can’t “just do contains in ranking”
Ranking expressions can directly index numeric array attributes (attribute(name, n)), but not string array attributes. (docs.vespa.ai)
So for array<string>, you need match-time/query-time operators to create match signals.
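To make the contrast concrete, a small hypothetical: element access works in ranking for a numeric array attribute, but there is no string-array equivalent, hence the match-operator pattern that follows.

```
# "prices" is a hypothetical array<float> attribute on the child
first-phase {
    expression: attribute(prices, 0) + attribute(prices, 1)
}
# there is no attribute(imported_mcat_tree, n) style access for array<string>,
# so string membership has to be expressed at match/query time instead
```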
Pattern
Use the YQL rank() operator so the membership check does not affect recall, but does compute match features. Only the first argument determines matching; all arguments contribute rank features. (docs.vespa.ai)
Then in ranking, use:
- matches(field) → 1 if that field was matched by the query (docs.vespa.ai)
- or matchCount(field) if you want counts (docs.vespa.ai)
Example (single ID)
where rank(
    userQuery(),
    imported_mcat_tree contains "191984"
)
Rank profile snippet:
first-phase {
    expression: nativeRank + 100 * matches(imported_mcat_tree)
}
matches(imported_mcat_tree) returns 1 if the imported field matched. (docs.vespa.ai)
And because it’s in rank(...) as the 2nd argument, it won’t change which docs are retrieved. (docs.vespa.ai)
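Assembled into a query request (the schema name product and profile name mcat_boost are assumptions carried over from the sketches above), the JSON body could look like:

```
{
    "yql": "select * from sources product where rank(userQuery(), imported_mcat_tree contains \"191984\")",
    "query": "some user query",
    "ranking": { "profile": "mcat_boost" }
}
```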
Example (set of IDs)
If you can use in (available since Vespa 8.293.15) as your shorthand for OR, you can do:
where rank(
    userQuery(),
    imported_mcat_tree in ("12345", "191984")
)
(docs.vespa.ai)
Then the same matches(imported_mcat_tree) boost applies.
Practical note: in is documented for "an integer or string field" (docs.vespa.ai). If your array<string> has match settings or tokenization expectations that make in awkward for your exact IDs, the fallback is plain contains OR-clauses (or your tensor approach), as sketched below.
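For completeness, the OR-clause fallback looks like this (same rank() placement, so it still does not change which documents are retrieved):

```
where rank(
    userQuery(),
    imported_mcat_tree contains "12345" or imported_mcat_tree contains "191984"
)
```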
3) Performance notes specific to imported fields (this is the main difference)
Searching imported attribute fields (with fast-search) has extra indirection. Vespa’s tuning guide recommends reducing this cost by:
- setting the imported field to rank: filter (see the field sketch after this list)
- configuring visibility-delay > 0 to enable TTL caching for the parent/child indirection (docs.vespa.ai)
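A sketch of the rank: filter part on the parent attribute backing the import (whether the setting belongs on the parent field definition or on the import declaration is worth confirming against the tuning guide for your Vespa version):

```
# parent schema: the attribute behind imported_mcat_tree
field mcat_tree type array<string> {
    indexing: attribute | summary
    attribute: fast-search
    # only match/no-match is needed for the boost, so filter-style rank data suffices
    rank: filter
}
```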
Where visibility-delay is configured
services.xml → <content> … <search> <visibility-delay>…
The reference explicitly states it controls TTL caching for parent-child imported fields. (docs.vespa.ai)
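Following that path, a minimal fragment (cluster id and delay value are placeholders):

```
<content id="mycluster" version="1.0">
    <search>
        <!-- per the reference, also acts as the TTL for the parent/child imported-field cache -->
        <visibility-delay>1.0</visibility-delay>
    </search>
    <!-- redundancy, documents, nodes etc. omitted -->
</content>
```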
Bound the cost with phased ranking when needed
If you’re doing multiple membership checks (4 imported arrays + 1 local), consider pushing the expensive logic into second-phase and using rerank-count to bound work. Vespa’s phased ranking doc explains first-phase runs for all hits retrieved, while second-phase is bounded by rerank-count. (docs.vespa.ai)
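A sketch of that shape, with placeholder names for the four imported arrays and the one local array, and an arbitrary rerank-count:

```
rank-profile bounded_boost inherits default {
    first-phase {
        # cheap; runs for every retrieved hit
        expression: nativeRank
    }
    second-phase {
        # only the top hits per node get the membership boosts
        rerank-count: 400
        expression {
            nativeRank +
            100 * (matches(imported_mcat_tree) +
                   matches(imported_tree_b) +
                   matches(imported_tree_c) +
                   matches(imported_tree_d) +
                   matches(local_tree))
        }
    }
}
```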
4) Constraints/pitfalls to keep in mind
Imported-field limitations
Only attribute fields can be imported; no tokenization/stemming; you can use word match and fast-search, but not text match (index-field style). (GitHub)
For your ID membership checks that's usually fine: whole-value word matching on untokenized attribute values behaves like database-style exact matching.
Parent document scaling
Parent docs are global and replicated to every content node; this limits how many parents you can practically have. (docs.vespa.ai)
If your “other schema” has very high cardinality (close to your 20M docs), parent/child may become the dominant memory problem regardless of whether you use tensors.
Recommended reading / similar material
- Parent/child + imported tensors used in ranking (official tutorial) (docs.vespa.ai)
- YQL rank() + in operator semantics (docs.vespa.ai)
- Rank features you'll use (matches, matchCount, attributeMatch) (docs.vespa.ai)
- Feature tuning notes for imported-field search (rank: filter, visibility-delay) (docs.vespa.ai)
- visibility-delay reference (TTL cache for imported fields) (docs.vespa.ai)
- GitHub issue documenting imported-field match limitations & fast-search notes (GitHub)