Each dataset entity carries two timestamp fields that support historical data: created_at, which records when the entity was created, and updated_at, which records when the update that produced a given entry took place. Every creation and update entry keeps the same id, so the entries for a given id form a complete audit trail of that entity's lifecycle. Users can therefore keep only the entry with the latest updated_at for each id to obtain the current state of every entity, or keep all entries to analyze the full history of changes over time.
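A minimal Python sketch of both access patterns follows. It assumes each entry is a plain dict with id and updated_at keys, and that updated_at values are ISO-8601 strings in one consistent format and time zone, so plain string comparison matches chronological order. The sample rows and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical sample rows: "a1" has a creation entry and one update; "b7" has only a creation entry.
rows = [
    {"id": "a1", "created_at": "2024-01-01T00:00:00Z", "updated_at": "2024-01-01T00:00:00Z", "name": "Widget"},
    {"id": "a1", "created_at": "2024-01-01T00:00:00Z", "updated_at": "2024-02-15T09:30:00Z", "name": "Widget v2"},
    {"id": "b7", "created_at": "2024-01-05T00:00:00Z", "updated_at": "2024-01-05T00:00:00Z", "name": "Gadget"},
]


def latest_versions(records: Iterable[dict]) -> dict[str, dict]:
    """Keep only the entry with the greatest updated_at for each id."""
    latest: dict[str, dict] = {}
    for rec in records:
        current = latest.get(rec["id"])
        # Assumes ISO-8601 strings in one format, so string order == time order.
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    return latest


def history(records: Iterable[dict], entity_id: str) -> list[dict]:
    """Return every entry for one id, oldest first: its full audit trail."""
    return sorted(
        (rec for rec in records if rec["id"] == entity_id),
        key=lambda rec: rec["updated_at"],
    )


latest_rows = latest_versions(rows)
print({k: v["name"] for k, v in latest_rows.items()})  # {'a1': 'Widget v2', 'b7': 'Gadget'}
print([r["updated_at"] for r in history(rows, "a1")])
```

In Spark itself, the same "latest version per id" selection is commonly expressed with a window partitioned by id and ordered by updated_at descending, keeping only the rows where row_number() equals 1.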
When an attribute supports multiple languages, it is represented as an array of values, each tied to a specific locale. This lets the dataset store and retrieve the attribute's value in every supported language. The structure of such a multilingual attribute array, using the names attribute as an example, is as follows:
|-- names: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- Id: string (nullable = true)
| | |-- Locale: string (nullable = true)
| | |-- Value: string (nullable = true)
| | |-- UpdatedAt: string (nullable = true)
Each element of the names array is a struct representing one localized version of the attribute, with the following fields:
Id: A unique identifier for the localized entry.
Locale: The locale code (e.g., "en", "fr") identifying the language, and where applicable the region, of the value.
Value: The translated or localized text for the attribute in the specified locale.
UpdatedAt: A timestamp, stored as a string, indicating when this localized value was last updated.
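The sketch below models the names array in plain Python and looks up a value by locale. The field names mirror the schema above; the class and function names, the "en" fallback locale, and the rule of keeping the entry with the latest UpdatedAt when a locale appears more than once are illustrative assumptions, not documented behavior of the dataset. Null elements and a null array are tolerated because the schema marks them nullable (containsNull = true, nullable = true).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalizedValue:
    # Field names mirror the schema's struct fields; all are nullable there.
    Id: Optional[str]
    Locale: Optional[str]
    Value: Optional[str]
    UpdatedAt: Optional[str]


def value_for_locale(
    names: Optional[list[Optional[LocalizedValue]]],
    locale: str,
    fallback: Optional[str] = "en",  # hypothetical default fallback locale
) -> Optional[str]:
    """Return the Value for `locale`, else for `fallback`, else None."""
    by_locale: dict[str, LocalizedValue] = {}
    for entry in names or []:  # the array itself is nullable
        if entry is None or entry.Locale is None:  # containsNull = true
            continue
        prev = by_locale.get(entry.Locale)
        # Assumption: on duplicate locales, prefer the latest UpdatedAt
        # (ISO-8601 strings compared as text; missing timestamps sort first).
        if prev is None or (entry.UpdatedAt or "") > (prev.UpdatedAt or ""):
            by_locale[entry.Locale] = entry
    for candidate in (locale, fallback):
        if candidate is not None and candidate in by_locale:
            return by_locale[candidate].Value
    return None


names = [
    LocalizedValue("n1", "en", "Bicycle", "2024-01-01T00:00:00Z"),
    LocalizedValue("n2", "fr", "Vélo", "2024-01-02T00:00:00Z"),
    None,
]
print(value_for_locale(names, "fr"))  # Vélo
print(value_for_locale(names, "de"))  # Bicycle (falls back to "en")
```

The PascalCase field names break Python naming conventions on purpose, so that each attribute maps one-to-one onto the struct fields in the schema.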
This format makes each attribute available in every supported language, so the dataset can be used across different linguistic contexts.