Sometimes you may need not just to connect data to your site, but also process it and/or generate pages from it. This is optional but possible.
The configuration file data.config.yaml (or data.config.json) defines how data will be processed before being connected to the site. It also configures page generation from data.
The file is written in yaml or json format. It consists of a list of data operations (array of objects). All actions described in the file are performed sequentially, except for page generation - pages are generated strictly after all other actions are completed.
Each action has mandatory parameters dataset and task. Below is the general scheme, all examples here and below are in yaml format.
- task: <action name>
dataset: <dataset name>
... other parameters may vary depending on the action
Actions can only be performed on tabular data, i.e. data that can be represented as a table. The number of actions is limited only by common sense.
Non-tabular data can be used on pages, inserted into templates, but cannot be processed.
Creates a new column based on the selected one, where each value from the selected column corresponds to a unique string consisting of Latin alphabet letters, numbers and underscores. This function is needed for creating pages - its result can always be used as part of a URL.
For example, if you have a table with actor names and you need to create a page for each of them, you can use slugify to add a column that will later be used as a filename when generating pages:
- task: slugify
dataset: tables.movies
input_col: actor_name
output_col: actor_page_name
Parameters:
input_col - column based on which the new column will be createdoutput_col - name of the new columnCreates a new column based on the selected one, containing the hash of the selected column’s content.
Useful when you need to use values that are too long in a URL. By default, a hash is generated
using the sha-256 algorithm with a length of 43 characters. When the short option is enabled,
a shortened 22-character hash is generated.
- task: hash
dataset: tables.movies
short: true
input_col: very_long_name
output_col: name_hash
input_col — the column from which the new column will be createdoutput_col — the name of the new columnshort — use a shorter hashCreates a new column based on several selected ones. The contents of the selected columns are concatenated and hashed.
This is necessary if you need to generate a more complex page structure. For example, for a list of movies, you might want to create a subsection for each director, and within it, pages for all the actors who worked with them. In this case, you need to combine the values of two columns and generate pages based on the contents of the new column — it will contain all unique director/actor combinations.
By default, a 43-character hash is generated. When the short option is enabled,
a shortened hash is generated.
- task: combine
dataset: tables.movies
short: false
input_cols:
- actor_name
- director_name
output_col: director_actor
input_cols — the list of columns from which the new column will be createdoutput_col — the name of the new columnshort — use a shorter hash
### sort
Sorts the dataset by the specified column.
Parameters:
- `col` - column to sort by
- `as_number` - boolean value - whether to interpret column content as a number. The result may be unexpected (for now) as locale is not taken into account.
- `desc` - boolean value, sort in descending order.
### del_cols
Deletes columns from the table. Useful when you want to save the table for JavaScript, to avoid saving and downloading unnecessary data.
- `cols` - list of column names to delete
```yaml
- task: del_cols
dataset: examples.movies
cols: ["minutes_long", "budget", "rating_metacritic"]
Warning: it's better to prepare data in advance using more suitable tools! The aggregate action allows simple aggregations - sum, count, median. For example, this can be used to add to a table listing actors and movies they starred in, the number of movies each actor appeared in:
- task: aggregate
dataset: table.actors
type: count_u
col: movie_name
group_by: actor_name # Optional parameter.
output_col: filmed_in
type - aggregation type: count, count_u, sum, average, mediancol - column where aggregation will be performedgroup_by - optional, column by which values will be groupedoutput_col - name of the new column where results will be storedThe render action generates pages from tabular data. Render actions are executed after all others.
Two main generation types are defined by the type parameter (see below). Other parameters determine the location and content of generated pages.
path - Full address of the generated pagemeta - Metadata, any fields can be filledmarkdown or html - Page content can be generated either in markdown or directly in html (in the latter case the text won't be processed by markdown parser). Template tags will be processed in any case.Column values from the processed dataset can be inserted into property values using simple substitution syntax:
task: render
dataset: examples.movies
type: row
# Filename will be generated from movie_slug column
path: /examples/movies/[movie_slug].html
meta:
# Page title = name field
title: "[name]"
The simplest: pages are generated for each table row. The generated page's local_data field will contain a single-row table.
Pages are generated for each unique value in the specified column. In the generated page, the local_data field will contain a table subset during generation - all rows where the selected column's value is the same.
This data can be iterated through using template tags, and/or saved for use in client scripts via JavaScript API.