Make solr field/schema resolvers respect the tokenized field of attributes#7003
Make solr field/schema resolvers respect the tokenized field of attributes#7003clockard wants to merge 2 commits into
Conversation
|
chris.lockard seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
jaymcnallie
left a comment
There was a problem hiding this comment.
Trying to figure out why the unit tests lit up red on the PR build. I think you introduced a query side regression by selectively writing the tokenized copy of the attribute data.
Queries also need to respect the tokenized/not-tokenized field behavior when they run. As it stands, fuzzy searches will look in tokenized fields for tokenized=false attributes and never get results because those fields are now empty.
d2c89f0 to
2981850
Compare
a6cdfb8 to
41123f6
Compare
| return; | ||
| } | ||
|
|
||
| if (!descriptor.isTokenized()) { |
There was a problem hiding this comment.
I am not 100% confident that I know what all of these fields are for, but it looks like there are at least two cases where fields were added to the cache that will no longer be hit:
_phoneticsfor untokenized string fields when phonetics is enabled.
a. I don't know enough about this feature to know what this would impact._tptfor untokenized XML fields.
a. This one I am really not sure about because I don't see any references to_tptin our schema.
| if (!isExact) { | ||
| field += tokenized; | ||
| field = getMappedPropertyName(propertyName, AttributeFormat.STRING, false); | ||
| } |
There was a problem hiding this comment.
✏️ We can remove this case and pass isExact as the last argument above.
| @Test | ||
| public void testGetFieldNumericalFallsBackToRequestedSuffix() { | ||
| // When no numerical field exists in cache, should fall back to the requested suffix | ||
| // fieldsCache is empty for a new resolver |
There was a problem hiding this comment.
✏️ It will have a minimum of three items, but yes it will not have anything for these fields.
| SchemaField schema = new SchemaField("testField_int", "tint"); | ||
| schema.setSuffix("_int"); | ||
| when(mockResolver.getSchemaField("testField", true)).thenReturn(schema); | ||
| schema.setSpecialSuffix("_int"); |
There was a problem hiding this comment.
✏️ The special suffix is only used for the like visitor, so it can be removed in these other tests that set it.
What does this PR do?
Updates the solr resolvers to respect the tokenized field of attribute definitions instead of assuming all text fields are tokenized.
Also makes metacard-tags tokenized.
One question I still have is why wildcard id queries don't work in the testing framework. By default the solr schema does have id defined as tokenized but that's not how the attribute is defined. It should work either way but it wasn't in the tests which is why I had to switch them to anyText filters.
Who is reviewing it?
@jaymcnallie
@jrnorth
Any background context you want to provide?
The default ddf solr schema uses primarily dynamic field definitions. These definitions determine how solr stores/handles the data and completely disregards the MetacardType definitions of attributes when it comes to fields like indexed and tokenized. If a downstream project wants to define their fields explicitly for better data handling the resolvers assumption of all string fields being tokenized breaks down and causes issues.
Notes on Review Process
Please see Notes on Review Process for further guidance on requirements for merging and abbreviated reviews.
Review Comment Legend: