Nesting and Accessing Data in d3v4
When creating data visualizations using d3, I often need to restructure my data by grouping it based on certain variables. In d3v4, this process is completed using the `nest` and `rollup` functions. This tutorial details how to use both.
Introduction
While learning how to make interactive data visualizations using d3.js, I ran into an issue with something new to me: nests. The general idea is that data sometimes needs to be grouped based on certain variables, and the groups need to be analyzed or graphed separately. Seems like a simple enough concept, but in practice, well… I got a little lost in the weeds.
I’m writing this post as a resource for how to nest and access nested d3 data both for myself and for anyone else who could benefit from my exploration of this topic.
This entire tutorial is based on functionality from version 4 of d3.js. This functionality changed completely in v6. This interactive tutorial details the updates to this system, now referred to as `groups` instead of `nest`.
Ok, let’s start at the beginning!
Before Nesting
For all the versions of nesting in this post, I’m going to be working with the same dataset. The first version looks like this:
Month | Sales |
---|---|
Jan | 80 |
Feb | 0 |
Mar | 71 |
Apr | 51 |
May | 3 |
Jun | 11 |
These data are just random numbers, but for our purposes, we’ll say that they are monthly sales of strawberries. And although there are a few ways we could represent this totally fake data, for demonstration purposes, we’ll use a line chart.
The basics for creating a line chart in d3 are outside the scope of this post, but if you need more background, this is a good place to start.
Here’s what the final version looks like!
Now, if we use `console.log()` to look at the structure of the data, it looks like this:
console
[
{
"Month": "2015-01-01T08:00:00.000Z",
"Sales": 87
},
{
"Month": "2015-02-01T08:00:00.000Z",
"Sales": 3
}
...
]
So there’s an array with 12 objects inside: one for each month. Each object contains both the `Month` variable and the `Sales` count. When generating the path for the line graph, we can access the data like this:
JavaScript
// Define the line
var valueLine = d3.line()
.x(function(d) { return x(d.Month); })
.y(function(d) { return y(+d.Sales); })
// Add the path element
svg.selectAll(".line")
.data([data])
.enter()
.append("path")
.attr("class", "line")
.attr("d", valueLine)
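A quick aside on the `+d.Sales` in the line generator: values read from a CSV file arrive as strings unless they are converted, and the unary plus turns them into numbers. Here is a tiny plain-JavaScript sketch of that idea. The row below is made up for illustration, and the `new Date()` call is just one possible way to get a date, not necessarily how the chart code parses `Month`.

```javascript
// Made-up row, shaped like an unconverted CSV record (every value is a string)
var rawRow = { Month: "2015-01-01", Sales: "87" };

// Unary plus coerces the string "87" to the number 87
var salesNumber = +rawRow.Sales;

// An ISO date string can be turned into a Date object (assumption: ISO format)
var monthDate = new Date(rawRow.Month);

console.log(typeof rawRow.Sales + " -> " + typeof salesNumber); // logs: string -> number
console.log(monthDate instanceof Date);                          // logs: true
```

Without the conversion, a scale would receive `"87"` rather than `87`, which is easy to miss until something sorts or sums incorrectly.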
So far so good. Now let’s expand the data and add a nest.
Nest Level 1
Ok, so now that we’ve plotted our monthly fake strawberry sales for one year, let’s add in our (again, randomly generated) grape and blueberry sales data.
Month | Sales | Fruit |
---|---|---|
Jan | 80 | strawberry |
Feb | 0 | strawberry |
Mar | 71 | strawberry |
Apr | 51 | strawberry |
May | 3 | strawberry |
Jun | 11 | strawberry |
If we try plotting the line chart the exact same way as before, we end up with something that looks like this:
Whoops, it looks like d3 tried to plot all of our data as one continuous line, which makes sense because we didn’t tell it that there are 3 separate categories here.
For native R users, the solution to this issue would be relatively straightforward. In ggplot2, you’d just use the `group` and/or `color` options like this:
r
library(ggplot2)
ggplot(ex2, aes(x = Month, y = Sales, group = Fruit, color = Fruit)) +
  geom_path()
In d3v4, this is where nests come in.
Just like with ggplot, we need to figure out which variable we want to group our data by. In this case, we want a separate line for each fruit’s sales. In R that’s `group = Fruit`, and in d3, you need to set the key to the `Fruit` variable.
It looks like this:
javascript
var nest = d3.nest()
.key(function(d){
return d.Fruit;
})
.entries(data)
At this stage, since we are simply grouping the data, this nest has only two parts:
- The `key` (in this case, the `d.Fruit` variable)
- The `entries` (the variable that holds the data that you are nesting)
Perhaps unsurprisingly, doing this changes the structure of our data. Before nesting it looks like this:
console
[
{
"Month": "2015-01-01T08:00:00.000Z",
"Sales": 87,
"Fruit": "strawberry"
},
{
"Month": "2015-02-01T08:00:00.000Z",
"Sales": 3,
"Fruit": "strawberry"
}
...
]
And after nesting:
console
[
{
"key": "strawberry",
"values": [
{
"Month": "2015-01-01T08:00:00.000Z",
"Sales": 87,
"Fruit": "strawberry"
},
{
"Month": "2015-02-01T08:00:00.000Z",
"Sales": 3,
"Fruit": "strawberry"
},
...
]
},
{
"key": "grape",
"values": Array(12)
}
...
]
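To make the shape change concrete, here is a minimal plain-JavaScript sketch that builds the same `key`/`values` layout by hand. It only mimics the output shape; it is not how d3 implements `d3.nest()`. The helper name `nestByKey` and the three sample rows are made up for illustration.

```javascript
// Hypothetical helper: groups rows into [{ key, values }], like the output above.
// It mimics the shape produced by d3.nest().key(...).entries(...), nothing more.
function nestByKey(rows, keyFn) {
  var groups = [];        // result, in order of first appearance of each key
  var index = new Map();  // key -> position in groups
  rows.forEach(function (row) {
    var key = String(keyFn(row)); // d3.nest also reports keys as strings
    if (!index.has(key)) {
      index.set(key, groups.length);
      groups.push({ key: key, values: [] });
    }
    groups[index.get(key)].values.push(row);
  });
  return groups;
}

// Made-up sample rows (not the dataset from this post)
var sampleRows = [
  { Month: "Jan", Sales: 87, Fruit: "strawberry" },
  { Month: "Jan", Sales: 40, Fruit: "grape" },
  { Month: "Feb", Sales: 3, Fruit: "strawberry" }
];

var sampleNest = nestByKey(sampleRows, function (d) { return d.Fruit; });

// Each group keeps its original rows untouched inside `values`
console.log(sampleNest.map(function (g) {
  return g.key + ": " + g.values.length;
}).join(", "));
// logs: strawberry: 2, grape: 1
```

The point to take away is that nothing is lost: every original row is still there, just moved one level down under its group.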
To access the nested data and generate multiple lines, we can do this:
javascript
// Define the line
var valueLine = d3.line()
.x(function(d) { return x(d.Month); })
.y(function(d) { return y(+d.Sales); })
// Draw the line
svg.selectAll(".line")
.data(nest)
.enter()
.append("path")
.attr("class", "line")
.attr("d", function(d){
return valueLine(d.values);
});
Notice that there are only two things that have changed here, but they’re important things!
- `.data([data])` became `.data(nest)`. Make sure to change the data source to your new nested data.
- `.attr("d", valueLine)` became `.attr("d", function(d){ return valueLine(d.values); })`. Instead of generating a line directly from the data as is, you now need to specify that each path is built from the `values` array of its group (the rows for one fruit, each with its own `Month` and `Sales`).
Just by making these small changes, you’ll see that we now have 3 separate lines. Hooray!
All of the js, html, css, and csv files are included here.
Rollup Level 1
Now that we’ve been able to draw 3 separate lines (one for each fruit), we can see the theoretical monthly sales for each fruit. But what if we wanted to compare the annual sales for each fruit instead?
For R users, my preferred option comes from the `dplyr` package and the `group_by` and `summarise` functions. That may look something like this:
r
library(dplyr)
annualSales <- ex2 %>%
group_by(Fruit) %>%
summarise(Annual = sum(Sales))
annualSales
r
## # A tibble: 3 x 2
## Fruit Annual
## <chr> <int>
## 1 blueberry 729
## 2 grape 673
## 3 strawberry 617
So we end up with data that has only one data point for each fruit. To replicate this in d3, we can add the `.rollup()` method to our nest.
Using `.rollup()` in d3v4 would look like this:
javascript
var nest = d3.nest()
.key(function(d){
return d.Fruit;
})
.rollup(function(leaves){
return d3.sum(leaves, function(d) {return (d.Sales)});
})
.entries(data)
The rollup function generates a sum of the sales data for each `Fruit` value, similarly to the `dplyr` `group_by` and `summarise` functions.
The data structure then looks like this:
console
[
{
"key": "strawberry",
"value": 617,
},
{
"key": "grape",
"value": 673,
}
...
]
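If it helps to see the group-then-summarise idea without d3, here is a rough plain-JavaScript sketch that produces the same `key`/`value` shape. The helper name `rollupByKey` and the sample rows are made up, and this is not d3’s implementation, just the same two steps written out by hand.

```javascript
// Hypothetical helper: group rows by key, then collapse each group to one value.
// Mimics the [{ key, value }] shape shown above.
function rollupByKey(rows, keyFn, reduceFn) {
  var groups = new Map(); // key -> array of rows (the "leaves")
  rows.forEach(function (row) {
    var key = String(keyFn(row));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  });
  var result = [];
  groups.forEach(function (leaves, key) {
    result.push({ key: key, value: reduceFn(leaves) });
  });
  return result;
}

// Made-up rows (not the dataset from this post)
var rollupRows = [
  { Fruit: "strawberry", Sales: 87 },
  { Fruit: "grape", Sales: 40 },
  { Fruit: "strawberry", Sales: 3 }
];

var totals = rollupByKey(
  rollupRows,
  function (d) { return d.Fruit; },
  function (leaves) {
    // Add up Sales across all rows in the group
    return leaves.reduce(function (sum, d) { return sum + Number(d.Sales); }, 0);
  }
);

console.log(totals.map(function (d) { return d.key + "=" + d.value; }).join(", "));
// logs: strawberry=90, grape=40
```

Notice that the individual rows are gone after the rollup: each group now holds a single `value` instead of a `values` array.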
Since we’ve reduced the data to just 3 values (one for each fruit), we can no longer represent the data using a line chart. Instead, here’s a bar chart generated with the nested and rolled-up data.
Although the code is different due to the difference in chart type, here is the code to generate the bars.
javascript
// Draw the bars
svg.selectAll(".bar")
.data(nest)
.enter()
.append("rect")
.attr("class", "bar")
.attr("x", function(d) { return x(d.key); })
.attr("y", function(d) { return y(d.value); })
.attr("width", x.bandwidth())
.attr("height", function(d) { return height - y(d.value); });
Notice that the `d.key` (remember, the key is our `Fruit` variable) is used for the x component of creating the shapes. Similarly, the `d.value` (this is our rolled-up sum of `Sales`) is used for the y component of the bars.
Sorting Keys
If necessary, you can also sort the keys using the `.sortKeys` function. For instance, to put our fruit data in alphabetical order (by fruit), our new nesting function may look like this:
javascript
var nest = d3.nest()
.key(function(d){
return d.Fruit;
})
.sortKeys(d3.ascending)
.rollup(function(leaves){
return d3.sum(leaves, function(d) {return (d.Sales)});
})
.entries(data)
Which results in an updated chart like this:
The full code for this example is available here.
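For a feel of what sorting the keys does to the rolled-up entries, here is a small plain-JavaScript sketch. The `ascending` function below is my own stand-in that orders ordinary strings the same way `d3.ascending` does, and the entries reuse the annual totals from the R output earlier, listed in an arbitrary starting order.

```javascript
// Stand-in comparator: same ordering as d3.ascending for ordinary strings
function ascending(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Rolled-up entries before sorting (starting order chosen for illustration)
var unsortedEntries = [
  { key: "strawberry", value: 617 },
  { key: "grape", value: 673 },
  { key: "blueberry", value: 729 }
];

// Sort a copy by key so the original array is left alone
var sortedEntries = unsortedEntries.slice().sort(function (a, b) {
  return ascending(a.key, b.key);
});

console.log(sortedEntries.map(function (d) { return d.key; }).join(", "));
// logs: blueberry, grape, strawberry
```

Only the order of the groups changes; the totals attached to each key stay the same.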
Nest Level 2
We’re now familiar with how `d3.nest()` and `.rollup()` work, but we don’t have to stop at one level. For instance, imagine that we now have multiple years of fruit sale data.
For this example, the data includes values for 2015 and 2016.
Month | Sales | Fruit | Year |
---|---|---|---|
Jan | 87 | strawberry | 2016 |
Feb | 3 | strawberry | 2016 |
Mar | 89 | strawberry | 2016 |
Apr | 56 | strawberry | 2016 |
May | 1 | strawberry | 2016 |
Jun | 17 | strawberry | 2016 |
Now, we may want to nest first by fruit and then by year. In this case, we don’t need the rollup, just two keys.
javascript
var nest = d3.nest()
.key(function(d){
return d.Fruit;
})
.key(function(d){
return d.Year;
})
.entries(data)
Our resulting data structure looks like this:
console
[
{
"key": "strawberry",
"values": [
{
"key": "2015",
"values": [
{
"Month": "2015-01-01T08:00:00.000Z",
"Sales": 87,
"Fruit": "strawberry"
}
]
}
...
]
},
{
"key": "grape",
"values": Array(2)
}
...
]
The original sales data is still present, but notice that it’s now two levels down. That’ll make it slightly more challenging to access for making graphics with it.
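In plain JavaScript terms, “two levels down” means two loops: one over the fruits and one over the years inside each fruit. Here is a small sketch of that walk. The structure below is made up and much smaller than the real data, and the year keys are written as strings because `d3.nest()` reports keys as strings.

```javascript
// Made-up two-level structure in the same shape as the output above
var twoLevel = [
  { key: "strawberry", values: [
    { key: "2015", values: [{ Sales: 87 }, { Sales: 3 }] },
    { key: "2016", values: [{ Sales: 80 }] }
  ] },
  { key: "grape", values: [
    { key: "2015", values: [{ Sales: 40 }] },
    { key: "2016", values: [{ Sales: 12 }] }
  ] }
];

// Outer loop: one entry per fruit. Inner loop: one entry per year for that fruit.
var lineLabels = [];
twoLevel.forEach(function (fruit) {
  fruit.values.forEach(function (year) {
    // year.values holds the original rows for this fruit/year pair
    lineLabels.push(fruit.key + " " + year.key + ": " + year.values.length);
  });
});

console.log(lineLabels.join("; "));
// logs: strawberry 2015: 2; strawberry 2016: 1; grape 2015: 1; grape 2016: 1
```

Each fruit/year pair found by the inner loop corresponds to one line on the chart.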
Here’s how we get to it:
First, we need to bind the upper levels of data to “groups”, or in d3, g-elements.
javascript
var fruitGroups = svg.selectAll(".fruitGroups")
.data(nest)
.enter()
.append("g")
This creates 3 groups: strawberry, grape, and blueberry. These were our first keys, so they are the first things to be grouped.
Now, we need to access the data inside each group by appending path elements like this:
javascript
var paths = fruitGroups.selectAll(".line")
.data(function(d){
return d.values
})
.enter()
.append("path");
This leaves us with 3 arrays: strawberry, grape, and blueberry. Within each array we’ll find 2 paths: one bound with 2015 data and one bound with 2016 data. Now all that’s left is to actually draw the path element.
javascript
paths
.attr("d", function(d){
return valueLine(d.values)
})
.attr("class", "line")
After we’ve added that bit of code, this is the resulting graph:
Awesome! We now have 6 lines on our chart. It’s a little hard to tell the difference between our lines though, so we can add some styling.
Styling Nested Elements
First, let’s make the color of the line reflect which fruit the data represents. We can do this by manually defining a color for each fruit: pink for strawberry, green for grape, and a blue-ish purple for blueberry.
javascript
// Set the color scheme
var colors = d3.scaleOrdinal()
.domain(["strawberry", "grape", "blueberry"])
.range(["#EF5285", "#88F284" , "#5965A3"]);
Now, adding a single line of code to the end of our grouping variable like this will adjust the color for each element:
javascript
var fruitGroups = svg.selectAll(".fruitGroups")
.data(nest)
.enter()
.append("g")
.attr("stroke", function(d){ return colors(d.key)}); // Adding color!
Getting closer! But we have a 2015 line for each fruit and a 2016 line for each fruit. Let’s separate those out by line type, adding a dash for 2015 lines.
We can do this by adding this line of code to the end of our path attributes:
javascript
paths
.attr("d", function(d){
return valueLine(d.values)
})
.attr("class", "line")
.style("stroke-dasharray", function(d){
return (d.key == 2015) ? ("3, 3") : ("0, 0")}); // Adding dashes to 2015!
And we end up with this:
Yay! We now have 6 lines, 2 for each of our 3 fruits.
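One detail worth spelling out is the `d.key == 2015` comparison. Keys produced by `d3.nest()` are strings, so the loose `==` is doing some quiet work there. The sketch below pulls the same ternary into a standalone function; the name `dashForYear` is made up for illustration.

```javascript
// Hypothetical helper mirroring the ternary used for stroke-dasharray
function dashForYear(yearKey) {
  // Loose equality (==) lets the string key "2015" match the number 2015
  return yearKey == 2015 ? "3, 3" : "0, 0";
}

console.log(dashForYear("2015")); // logs: 3, 3
console.log(dashForYear("2016")); // logs: 0, 0
console.log("2015" === 2015);     // logs: false (strict equality would never match)
```

If you prefer strict equality, comparing against the string `"2015"` should give the same dashes.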
All of the code for this chart is available here.
I’ve added a few more features like expansive y-axes and dropdowns in the following example. You’ll find the fully functional version with all of the necessary code here.
I hope this has been a helpful resource on using the `d3.nest()` functions in your work. Good luck!