User-defined description of the model.
Defines the combination of features used. The options are: Simple: Calculates a single cosine-distance similarity score for each of the fields defined in keysFromTo. This is the fastest option. Bigram: Adds a similarity score based on the sequence of the terms. Frequency-Weighted-Bigram: Calculates a similarity score based on the sequence of the terms, giving higher weights to less commonly occurring tokens. Bigram-Extra-Tokenizers: Similar to Bigram, but able to learn that leading zeros and spaces should be ignored in matching. Bigram-Combo: Calculates all of the above options, relying on the model to determine the appropriate features to use. This is the slowest option.
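To make the Simple option more concrete, the following is a minimal sketch of a token-based cosine similarity score for one pair of field values. The tokenizer (lowercased alphanumeric runs) and the function names are assumptions made for illustration; the service's actual tokenization and feature computation are not specified here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # Build token-count vectors for both strings.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens the two strings share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty string has no tokens, so the score is defined as 0.0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


score = cosine_similarity("21 PT 1019", "21 PT 1020")
```

In this sketch, two values that share two of their three tokens score about 0.67, and values with no tokens in common score 0.0.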
If True, replaces missing fields in source or target entities, for fields set in matchFields, with empty strings. Otherwise, returns an error if any data is missing.
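The behavior described above can be sketched client-side as follows. The function name `fill_missing_fields` and the `replace_missing` argument are hypothetical names for this illustration, and treating a `None` value as missing is an assumption.

```python
def fill_missing_fields(entities, fields, replace_missing):
    """Return copies of the entities with every field in `fields` present."""
    result = []
    for index, entity in enumerate(entities):
        row = dict(entity)  # copy so the caller's data is left untouched
        for field in fields:
            # Assumption: an absent key and a None value both count as missing.
            if row.get(field) is None:
                if not replace_missing:
                    raise ValueError(f"entity {index} is missing field {field!r}")
                row[field] = ""
        result.append(row)
    return result


sources = [
    {"name": "21PT1019", "description": "Pressure transmitter"},
    {"name": "21PT1020"},  # no description
]
filled = fill_missing_fields(sources, ["name", "description"], replace_missing=True)
```

With `replace_missing=False`, the same input raises a `ValueError` instead of filling in the empty string.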
List of pairs of fields from the target and source items used to calculate features. All source and target items should have all the source and target fields specified here.
User-defined name of the model.
List of custom source objects to match from, for example, time series. String key -> value. Only string values are considered in the matching. The id and externalId fields are optional.
List of custom target objects to match to, for example, assets. String key -> value. Only string values are considered in the matching. The id and externalId fields are optional.
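As an illustration of "only string values are considered", the sketch below filters an entity down to its string-valued fields. The helper name `string_fields` and the sample entity are hypothetical. Whether the service also excludes identifier fields such as externalId from matching is not stated here, so the sketch keeps every string value.

```python
def string_fields(entity):
    # Keep only the key/value pairs whose value is a string.
    return {key: value for key, value in entity.items() if isinstance(value, str)}


source = {
    "id": 1,                      # numeric, dropped
    "name": "21PT1019",
    "description": "Pressure transmitter",
    "unit_count": 3,              # numeric, dropped
    "active": True,               # boolean, dropped
}
usable = string_fields(source)
```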
List of objects, each pairing a fromId or fromExternalId with a toId or toExternalId, which correspond to entities in matchFrom and matchTo respectively. Each pair indicates a confirmed match used to train the model. If omitted, an unsupervised model is used.
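Before submitting confirmed matches, it can help to check that each pair refers to entities that are present in the source and target lists. The following is a minimal sketch of such a check; `validate_true_matches` is a hypothetical helper, not part of any client library, and the sample data is invented for illustration.

```python
def validate_true_matches(true_matches, sources, targets):
    """Raise ValueError if a pair does not reference a known source and target."""

    def collect(entities, key):
        return {entity[key] for entity in entities if key in entity}

    source_ids = collect(sources, "id")
    source_ext = collect(sources, "externalId")
    target_ids = collect(targets, "id")
    target_ext = collect(targets, "externalId")

    for index, match in enumerate(true_matches):
        from_ok = (
            ("fromId" in match and match["fromId"] in source_ids)
            or ("fromExternalId" in match and match["fromExternalId"] in source_ext)
        )
        to_ok = (
            ("toId" in match and match["toId"] in target_ids)
            or ("toExternalId" in match and match["toExternalId"] in target_ext)
        )
        if not (from_ok and to_ok):
            raise ValueError(f"true match {index} references an unknown entity")


sources = [{"id": 1, "name": "21PT1019"}, {"externalId": "ts-2", "name": "21PT1020"}]
targets = [{"id": 10, "name": "21-PT-1019"}, {"externalId": "asset-2", "name": "21-PT-1020"}]
true_matches = [
    {"fromId": 1, "toId": 10},
    {"fromExternalId": "ts-2", "toExternalId": "asset-2"},
]
validate_true_matches(true_matches, sources, targets)  # passes silently
```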
The classifier used in the model. Only relevant if there are trueMatches/labeled data and a supervised model is fitted.