Using categorical fields in Graft
Overview
Combining the embeddings of categorical fields with those of other text fields can improve model accuracy. It also brings in data that is helpful when reviewing the results.
COMMON MODALITY ONLY
Categorical data can only be combined with data of the same modality, i.e. text.
Within the same entity, categorical fields and text fields can be embedded with independent trunk models. The resulting outputs are concatenated automatically into a single embedding space whose dimensionality is the sum of the parts. For example, if the categorical field has 12 dimensions and the text model has 768, the combined embedding has 780 dimensions. When this 'enhanced' model is used to build an enrichment, we typically see improved performance.
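The concatenation step can be illustrated with a small Python sketch. This is not Graft's internal implementation: the function name `concatenate_embeddings` and the placeholder vectors are hypothetical. The sketch only shows that joining a 12-dimensional categorical embedding with a 768-dimensional text embedding yields a 780-dimensional vector.

```python
from typing import List


def concatenate_embeddings(categorical: List[float], text: List[float]) -> List[float]:
    """Hypothetical helper: join a categorical embedding and a text embedding end to end.

    The combined vector's length is the sum of the two input lengths.
    """
    return list(categorical) + list(text)


# Placeholder vectors (not real model output): a 12-dimensional categorical
# embedding and a 768-dimensional text embedding for the same record.
category_vec = [0.0] * 12
text_vec = [0.1] * 768

combined = concatenate_embeddings(category_vec, text_vec)
print(len(combined))  # 780
```

Because concatenation keeps each input unchanged in its own slice of the combined vector, the categorical and text information remain side by side rather than being mixed together.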
MULTI DATA SOURCE CAPABLE
Fields can be combined from any data source within the project. They are not limited to a single data source.
Process overview
Combining data is as simple as selecting the fields to be embedded within the same entity and running the appropriate processing jobs. See Creating an Entity for extended details.
The following sections give a brief example of embedding a categorical field (Category) together with a sample text field (Data) from a single data source.
Entity Creation
Now that the data source has been created and the categorical field defined, we can create an entity in which to ingest and embed the data.
- Click CREATE ENTITY from the Entities tab
- Enter a suitable name for the entity
- Optionally, add a description so your colleagues know the purpose of the Entity
- Click NEXT
- Select the data source from the Data Source drop-down
- Select all the fields you require for the entity
- Click NEXT
- Select the field representing your primary key
- Click NEXT
We do not need to add existing entities to this entity, so skip ahead:
- Click NEXT
Select Fields to be embedded
- Select the fields to be embedded. In this example we use the Data field, which contains product descriptions, and the Category field, which contains the product category.
Note that the Category field offers only the Category Encoder, whereas the Data field offers a number of text-based trunk models, the default being Base.
- Click FINISH to complete the Entity creation workflow
A summary of the created Entity is displayed.
The processing jobs may now be started to ingest and embed the categorical and text data.