Mindful tools for data exploration (or, how to make visual lasagna)
When people think of data visualization, they often imagine visual spaghetti.
Note: This post can also be found at Graphicacy, a data visualization firm helping you to visualize a better world.
World-changing ideas traveled fluently along the twenty-one inches between William Playfair’s mind and fingertips. Playfair, the eighteenth-century Scottish engineer who invented the bar, pie, and line charts, relied upon an oft-overlooked digital tool: his hand. He plotted each chart with pen and paper. Born of tedious hours spent drawing and measuring, his handcrafted charts describe his world with a visual, economical elegance. Today, we churn out effective charts, also known as data visualizations, with a computer. Like magic, a computer can leap through the variables and observations within a large dataset, mapping each individual record to a coherent visual framework. The computer does our meticulous work instantly. In this essay, I’ll describe some of the computer tools that have democratized data visualization, evaluating their strengths and offering thoughts to consider when using them to tell visual stories.
As a design educator who teaches data visualization, I thought my students would celebrate the computer drawing a simple pie chart for them. After all, those arcs would have taken Playfair real time to calculate and craft. Yet many of my design students have studio art backgrounds. Even as they’re amazed by the speed and convenience the computer affords, they’re dismayed to discover that we don’t plot our graphs by hand. Here, we’ll take a look at the free applications my students use, which require little to no coding knowledge. To transform the data, these tools ask only for the cursory knowledge of statistics taught in many middle schools. What do we gain and lose with digital tools that let us create charts at the speed of thought, with an easy-to-moderate learning curve?
Before discussing the tools, it’s important to acknowledge the adage “garbage in, garbage out.” In an ideal world, our dataset has been put together by a creator whom we know well. We know the creator’s intent in recording a set of observations about our world and breaking them down into variables, quantitative and qualitative. We also understand their data collection methods to be valid and authentic. In this green world, our records haven’t already been summarized for us. With a mindful approach, this type of data lends itself to digital tools that help us discover and present a story. Today’s digital tools liberate us to mix and remix data, sketching rapidly and exploring a dataset from multiple vantage points. Much like a chef, the data visualization author today needs to understand how to collect nourishing ingredients, prepare them, and choose an effective visualization recipe. Visualization researcher Noah Iliinsky calls effective data visualization a form of lasagna, unfolding its ideas over many layers. Ineffective visualization is like visual spaghetti: just google “data visualization,” and you’ll see abundant examples of elaborate graphics that would be impossible to create by hand. Often, this visual spaghetti is all sound and fury, signifying nothing, to borrow the line from Shakespeare’s Macbeth that William Faulkner, the prescient Nobel Prize-winning novelist for our noisy times, took for the title of The Sound and the Fury.
Pen and paper
With all of the bright pixels around us today, it’s easy to forget that pen and paper are inventions too. These inventions, tools for our real digits, evolved over many years to provide the most utility and ease of use. By sketching your data visualizations by hand, you’re able to see the big picture in a way a computer cannot. I promise that you’ll never draw the visual spaghetti you see in Google Images. You also might find some serendipity while drawing. Research shows that drawing is not a talent but a form of intelligence. Unexpected connections appear while thinking visually, along that Silk Road between the fingertips and the mind. In my class, at the beginning of each project, I ask my students to pose questions about our world, which we’ll attempt to picture through data visualization. Students write down their topic; a one-sentence question for themselves; their goal and audience; and the variables available within that topic.
Every project begins with a question. For instance, a student might ask, “How many monarch butterflies migrate in North America each year?” They write that sentence down on paper. Then they draw an actual portrait of their intended audience: a face with distinguishing demographic attributes. Broadly, a generalist audience will want simple graphics; scientists will want more elaboration. The variables might include monarch butterflies, other migrating creatures, monarch lifespans, distance traveled, time traveled, seasons of the year, temperature, geography, types of habitats, flowering plants, predators and prey, human populations, and so on. With this written guide in hand, students can then make predictions about the types of visual frameworks that might match their storytelling. Do they want to show relationships, comparisons, or changes over time? To help them explore possibilities, I give them an online resource: the data visualization catalog. To chunk out the story, I often ask the students to write a headline for the audience, taking a cue from journalists. The main idea of the headline can become the main graphic; the secondary ideas in the headline can become secondary graphics.
When sketching data visualizations with a pen, it’s important to set limits so that you can’t get too detailed. I recommend using index cards to ensure a big-picture approach, and the biggest, fattest Sharpie you can find. Create variations on your theme on index cards, surround yourself with them, and see if you and trusted peers can agree on a big picture that makes sense for the story. In every data visualization project, there are two ways of working, as outlined by David McCandless of Information is Beautiful. You can start with pen and paper and sketch from the big picture down to the details, which are the computer’s specialty. Or you might start on the computer, examine the details, and then produce a big-picture, pen-and-paper sketch, so you can discover what to prioritize and what to edit out of your complex story.
Data visualization can help us answer Who/What, How much/many, Where, and When questions. I call these puzzle piece questions. They are also like layers in your lasagna. Once you put them together, your audience can see a clear picture, and it’s rich and fulfilling for them to consume all of that information. But data visualization is less helpful in answering How and Why questions, the mystery questions that really ignite our imaginations. Often, How and Why questions can’t be visualized because they emanate from invisible belief systems that compel us to interact with each other and our world in certain ways. For nature stories, the How and the Why might be found in the complex, invisible interrelationships of a system, where the whole is greater than the sum of the parts. We can see this in the recent collapse of honeybee ecosystems. Researchers understand that some threads of the complex web that sustains honeybees have been disrupted, but they can’t actually see these threads. An ecosystem can’t be taken apart like a clock. Visualizing all of nature’s parts won’t yield the deeper, more mysterious answers you seek. It’s important to recognize the limits of data visualization during the big-picture sketching phases of your project. You can help answer How and Why questions through writing and annotations in your final presentation: text pairs with charts like signposts, guiding the reader through deeper issues below what’s visible. Or you might draw a conceptual illustration, often a visual metaphor, that helps people see the idea, like the iceberg model used in systems thinking.
Iceberg model: when you can’t chart the how and why questions, use a visual metaphor. Source: Northwest Earth Institute.
The world’s oldest spreadsheet: the Kish tablet, ca. 3500 BC, ancient Sumer (present-day Iraq). This tablet likely records beer allocations. Source: Wikipedia.
Consider the spreadsheet to be a form of writing and storytelling. In fact, writing began neither in poetry nor in novels, but in spreadsheets. In ancient Sumer, scribes used clay tablets to record allocations of foodstuffs, as well as legal matters. Perhaps taking a cue from the horizon line that divides earth from sky, these scribes divided their information with horizontal lines that resemble rows and vertical lines that resemble columns. Despite providing the seed for writing and boasting an ancient lineage, spreadsheets languished as design interfaces for many years. Most people associate them with arguably unpleasant contexts: accounting, cells, formulas. That’s why I offer a resounding thank you to the design team at Google Sheets for making its interface and tools so friendly.
While Google Sheets doesn’t offer the powerful functionality of Excel, its rival spreadsheet application, it boasts a winning user interface and experience. My students are able to perform the same tasks in Google Sheets that had befuddled and even angered them in Excel. With the green Explore button in the lower right corner of the spreadsheet, Google Sheets also lets you explore data automatically through descriptive analysis and charts. You simply click in a cell on the spreadsheet, then click “Explore,” and new vantage points on the data appear in a panel. This panel even tries to predict the type of question you might have about the data. Like Excel, Google Sheets also lets you summarize your data through pivot tables, so that you can reach an appropriate level of detail for the story you want to tell. By grouping the original records by category and chunking a variable into sums and averages for each group, you have a better chance of creating a coherent chart that shows relationships and comparisons between categories, rather than the visual spaghetti often associated with data visualization.
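To make the pivot-table idea concrete, here is a minimal Python sketch of what a pivot table does behind the scenes: it groups individual records by one category and collapses a numeric variable into a count, a sum, and an average. The field names (`region`, `sightings`), the function name `pivot`, and the tiny dataset are all hypothetical, invented for illustration; Google Sheets and Excel build the same kind of summary through their menus rather than code.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration: one row per observation,
# as in an original, unsummarized dataset.
records = [
    {"region": "East", "sightings": 120},
    {"region": "East", "sightings": 80},
    {"region": "West", "sightings": 30},
    {"region": "West", "sightings": 50},
    {"region": "West", "sightings": 40},
]

def pivot(rows, group_field, value_field):
    """Collapse rows into one summary per group: count, sum, and mean of value_field.

    The input rows are only read, never modified, so the original records survive.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    summary = {}
    for key, values in groups.items():
        total = sum(values)
        summary[key] = {"count": len(values), "sum": total, "mean": total / len(values)}
    return summary

if __name__ == "__main__":
    for region, stats in pivot(records, "region", "sightings").items():
        print(region, stats)
```

Five records become two summary rows. That reduction is what makes a chart drawn from the summary easier to read, and also what hides the individual records behind it.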
Hans Rosling plucks apart China to compare Shanghai and Guizhou. Source: Hans Rosling’s Visualization of Statistics.
Visual spaghetti ultimately leaves you entangled and unfulfilled. Yet even though summarizing is often necessary for visualization, summarized data also presents dangers: it molds individual records together until you can no longer see them with the clarity you had before. When summarizing your data with pivot tables, remember that your spreadsheet is a form of writing with the potential for coherent meaning, and beware of what the acclaimed novelist Chimamanda Ngozi Adichie calls the danger of a single story. In her talk of that name, Adichie describes stereotypical views of Nigeria, her home country, and its people, and explains how a novelist needs to see every person in their vivid individuality. Even with our computer tools, we need to find the humanity in our data. A brilliant example can be seen in Hans Rosling’s famous presentation comparing the health and wealth of two hundred countries over the past two hundred years, in four minutes. Near the end of the presentation, he shows China’s summarized data plotted on a bubble chart. But then he plucks China’s bubble apart. Some parts of China are similar in health and wealth to Western Europe, while others are similar to sub-Saharan Africa. If we had only the summarized data, which is so easy to produce in a spreadsheet application, we would have lost the diversity of voices that make up China in this story. While spreadsheets can take messy data and help you clean it up, pay keen attention to what might be lost in the process, and always preserve your original dataset on another worksheet tab.
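The risk of a single summary can be shown with a few lines of Python. The sketch below computes one average across the parts of a whole (the single bubble) and then "plucks it apart" by listing how far each part sits from that average. The province names and values are made up for illustration and are not Rosling’s data; the function names are hypothetical.

```python
from statistics import mean

# Made-up values for illustration only (not Rosling's data):
# life expectancy in years for provinces within one country.
provinces = {
    "Province A": 80.0,
    "Province B": 76.0,
    "Province C": 66.0,
    "Province D": 62.0,
}

def summarize(by_part):
    """Return the single-number summary a pivot table would give us."""
    return mean(list(by_part.values()))

def pluck_apart(by_part):
    """Return each part's distance from the overall mean, largest distance first."""
    overall = summarize(by_part)
    return sorted(
        ((name, value - overall) for name, value in by_part.items()),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )

if __name__ == "__main__":
    print("single story:", summarize(provinces))
    for name, gap in pluck_apart(provinces):
        print(f"{name}: {gap:+.1f} years from the average")
```

Here the single average, 71 years, describes none of the provinces, which range from 9 years above it to 9 years below it. Keeping `provinces` intact, like keeping the original worksheet tab, is what makes plucking apart possible at all.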
A simple alluvial flow diagram visualizing the animal kingdom in Raw (below): kingdom, phylum, and class. Beware of visual spaghetti when the computer charts for you. Figure out the level of detail, or summary, that you need to tell your story, and what design techniques you can apply to help people chunk the information and see the signal in the noise.
Here, I added order and family to the visualization to make visual spaghetti. What level of detail do you need to tell your story?
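One rough way to see why adding ranks turns an alluvial diagram into spaghetti is to count the connections it must draw between neighboring ranks. The Python sketch below does that for a handful of real animal classifications. The sample, the `RANKS` list, and the `flow_count` function are hypothetical illustrations, and counting adjacent-rank links is only a proxy for visual complexity, not how Raw actually lays out its flows.

```python
# Taxonomic ranks in order, from broadest to most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family"]

# A small, hand-picked sample of real classifications (illustrative, not a full dataset).
animals = [
    ("Animalia", "Chordata", "Mammalia", "Carnivora", "Felidae"),         # cat
    ("Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae"),         # dog
    ("Animalia", "Chordata", "Aves", "Passeriformes", "Corvidae"),        # crow
    ("Animalia", "Arthropoda", "Insecta", "Lepidoptera", "Nymphalidae"),  # monarch butterfly
    ("Animalia", "Arthropoda", "Insecta", "Hymenoptera", "Apidae"),       # honeybee
]

def flow_count(rows, depth):
    """Count distinct links between adjacent ranks when using the first `depth` ranks.

    Each link connects a value at one rank to a value at the next rank; the rank
    index is stored too, so identical names at different ranks are not merged.
    """
    links = set()
    for row in rows:
        for level in range(depth - 1):
            links.add((level, row[level], row[level + 1]))
    return len(links)

if __name__ == "__main__":
    for depth in (3, 5):
        print(RANKS[:depth], flow_count(animals, depth))
```

With just five animals, going from three ranks to five raises the count of connections from 5 to 14; a full animal-kingdom dataset grows far faster, which is why choosing the level of detail matters.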
If you want to learn more about D3, I recommend Scott Murray’s book Interactive Data Visualization for the Web, which does a surprisingly humorous job of leading the reader through its intricacies. Even though it is thought of as a library, D3 is meant for custom creations, and this requires writing a lot of code. A simplified version of D3 can also be studied and implemented in your projects: d3plus, which is often used by MIT for its data visualization interactives. While d3plus does require coding, it uses fewer lines of code to build interactive data visualizations, which can be useful when the bespoke approach of D3 seems too unwieldy.
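Much of that D3 code is about mapping data values to pixels. D3 provides `d3.scaleLinear()` for this; the Python function below is only a sketch of the arithmetic a linear scale performs (interpolating from a data domain to a pixel range), not D3’s API. The function name `linear_scale` and the example numbers are hypothetical.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`.

    Values outside the domain are extrapolated rather than clamped.
    """
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span a nonzero interval")

    def scale(value):
        t = (value - d0) / (d1 - d0)  # fraction of the way through the domain
        return r0 + t * (r1 - r0)

    return scale

if __name__ == "__main__":
    # Hypothetical: map a count of 0-300 monarchs onto a bar up to 600 pixels wide.
    x = linear_scale((0, 300), (0, 600))
    print(x(150))
    # Reversing the range flips the axis, as screen y-coordinates grow downward.
    y = linear_scale((0, 100), (400, 0))
    print(y(25))
```

Writing even this small piece by hand shows why D3 is described as bespoke: every axis, scale, and mark is something you assemble yourself.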
iNZight Lite is a free academic tool developed at the University of Auckland in New Zealand. iNZight was initially developed as a visualization package for R, the widely used open-source software for working with data. R’s interface is entirely text-based, and it was developed for people working in statistics. It functions like a Swiss Army knife: by installing packages, you can customize R for your own needs. iNZight has matured into an easy-to-use and beautiful tool for exploring and presenting your data visually, without any coding at all. This is true even of the R-based version, though the Lite, or online, version offers most of the same capabilities. iNZight has many strong presentation qualities. Its signature paired box plot and dot chart graphs, and its histograms built out of counted dots, are easy to read at the summary and detail levels at once: each circle represents an individual record, complemented by summaries drawn with boxes and lines. At the same time, iNZight lets you explore a large dataset visually. It is especially powerful for faceting your data, creating small multiple charts to compare changes in a single space. Alberto Cairo has several excellent tutorials that will introduce you to the tool, from single-variable to multivariable plots. All of these charts can also be downloaded and refined in more artistic image editing tools, such as Adobe Illustrator.
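The pairing of detail and summary, and the faceting into small multiples, can be sketched in a few lines of Python. Each facet keeps every value (the dots) alongside a five-number summary (the box and whiskers of a simple box plot whose whiskers run to the minimum and maximum). The data and function names are hypothetical, and using `method="inclusive"` for the quartiles is an assumption of this sketch; iNZight may compute its quartiles and whiskers differently.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical (facet, value) records, invented for illustration. Each value is one dot.
observations = [
    ("Group A", 1), ("Group A", 2), ("Group A", 3), ("Group A", 4),
    ("Group A", 5), ("Group A", 6), ("Group A", 7), ("Group A", 8),
    ("Group B", 10), ("Group B", 12), ("Group B", 14),
]

def facet(rows):
    """Split (facet, value) rows into one list of values per facet: one small multiple each."""
    panels = defaultdict(list)
    for name, value in rows:
        panels[name].append(value)
    return dict(panels)

def five_number_summary(values):
    """Min, lower quartile, median, upper quartile, max (needs at least two values)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

if __name__ == "__main__":
    for name, values in facet(observations).items():
        # Detail (every dot) and summary (the box) side by side, as in a paired plot.
        print(name, "dots:", sorted(values), "box:", five_number_summary(values))
```

Because the summary is computed alongside the raw values rather than replacing them, a reader can check the box against the dots in each panel.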
As mentioned before, R is not just a pirate’s barbaric yawp; it’s also the open-source Swiss Army knife for working with data. While it has a reputation for a steep learning curve, you can make effective visualizations with just a little training. To learn R and my favorite visualization package, ggplot2, simply google Hadley Wickham, the creator of ggplot2 and a great educator on using R and ggplot2 together, along with other packages he created for organizing and exploring your data. R is best to use when you have epic datasets that spreadsheet applications such as Google Sheets can’t handle, or when you want to plot your graphic in an unusual way, such as on a polar grid. R allows you to be creative and think outside the box in how you layer together and plot your visualization. You can even build an interactive in R that works online with web standards, with no HTML or CSS knowledge required. The package for that is called Shiny.
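In ggplot2, switching to a polar grid is a matter of adding a coordinate layer, `coord_polar()`; the geometry underneath turns each category’s position into an angle and each value into a distance from the center. The Python sketch below shows only that conversion, not ggplot2’s implementation. Starting at twelve o’clock and running clockwise is an assumption of this sketch, and the function name `to_polar_grid` is hypothetical.

```python
import math

def to_polar_grid(values):
    """Place each value on a polar grid: category index -> angle, value -> radius.

    Angles start at twelve o'clock and run clockwise (an assumption of this sketch).
    Returns a list of (x, y) points ready to draw around a circle.
    """
    n = len(values)
    points = []
    for i, radius in enumerate(values):
        theta = 2 * math.pi * i / n   # each category gets an equal slice of the circle
        x = radius * math.sin(theta)  # sin for x and cos for y put angle 0 straight up
        y = radius * math.cos(theta)
        points.append((x, y))
    return points

if __name__ == "__main__":
    # Hypothetical values, one per spoke of the polar grid.
    for point in to_polar_grid([3, 5, 2, 4]):
        print(tuple(round(c, 2) for c in point))
```

The same four numbers that would sit side by side as bars on a Cartesian grid become spokes around a circle; in R, that switch is a single added layer rather than hand-written trigonometry.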
Imagine the Google Sheets interface, with its vast territory devoted to columns and rows and its opportunity to insert charts around the spreadsheet, suddenly inverted. That’s Tableau Public, with its emphasis on charts over spreadsheets. Once you connect to your data in Google Sheets, Excel, CSV, or TSV form, Tableau automagically interprets it as numbers, strings for text, and latitudes and longitudes for mapping. On your first worksheet, these fields are broken up into two stacks of blue and green pills. The blue pills are discrete, or categorical, data, which Tableau calls Dimensions; they create buckets for you. The green pills are continuous data, which draw axes for mapping those buckets into space; Tableau calls these Measures. To sort through all of these variables, Tableau offers a Show Me panel, where you can select some of these pills and see appropriate visual frameworks for them. With your variables selected and just one click on a suggested chart, you can create a visualization, which can then be customized visually on the Marks card. For instance, you can create a map of tornadoes in the United States, with circles placed according to automagically geolocated data, sized according to number, and colored according to wind speed in miles per hour, from dark to light blue. Typically, these secondary encodings, size and color, work for broader, more general questions that you want to see visualized, while your most important question should be carried by one of our most powerful visual attributes: position in space, showing relationships, comparisons, and connections.
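The split between Dimensions and Measures can be approximated with a simple rule: fields whose values are all numbers are treated as continuous measures, and everything else becomes a categorical dimension. The Python sketch below applies that rule; it is a simplified, hypothetical stand-in for the automatic typing Tableau performs, not Tableau’s actual logic, which also handles dates and geographic fields. The tornado records are invented for illustration.

```python
def classify_fields(rows):
    """Label each field 'measure' (all values numeric) or 'dimension' (anything else).

    Simplified, hypothetical rule: booleans are excluded from numbers, and dates or
    geographic roles are not detected at all.
    """
    roles = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        roles[field] = "measure" if numeric else "dimension"
    return roles

# Hypothetical tornado records, invented for illustration.
tornadoes = [
    {"state": "Kansas", "month": "May", "speed_mph": 135},
    {"state": "Oklahoma", "month": "June", "speed_mph": 160},
]

if __name__ == "__main__":
    print(classify_fields(tornadoes))
```

In a tornado map like the one described above, the dimensions would supply the buckets, while the measure would drive a continuous encoding such as circle size or color.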
Tableau is effective for rapidly sketching variations on your dataset and then linking them together into interactive dashboards. To return to the kitchen, Tableau helps you avoid making spaghetti out of your data. Instead, you can create layered, rich stories, just like visual lasagna. The primary drawback of Tableau is that it’s a closed system, meaning the language encoding your graphics is proprietary to Tableau. While you can connect Tableau to R, and it lets you work with the shapefiles used by mapmakers, it is not as connected to web standards such as HTML and CSS. Newer data visualization tools, such as Quadrigram and Plotly, use web-based methods and work completely online. They show off the hallmarks of modern interactives; their smooth transitions are a notable improvement over Tableau in beauty, ease of reading, and ease of comparing variables. Hopefully, this competition will compel Tableau to create a web-based version of its software that integrates more seamlessly with the work of web developers.
Communication theorist Marshall McLuhan offered many timeless nuggets of wisdom. First, he said that the medium is the message. That’s why it’s so important to understand the strengths and weaknesses of the digital tools used for data visualization. The medium itself shapes how your audience receives the story, and it certainly helps hone the level of detail we can see while exploring the data, from the microscopic lens of the detailed view, to the telescopic lens provided by having a big picture in mind: broad, hazy, yet capturing the essence of your story. He also said that once you see the boundaries in your environment — in this case, your digital tools — they are no longer boundaries. You can find a way to integrate the hand-drawn approach of William Playfair with the computer tools that allow anyone to visualize complex datasets.
Marcel Proust once wrote that to explore and discover, we don’t need just to seek new landscapes; we need to have new eyes. Data visualization begins and ends in exploring, aided by many complementary tools for insightful navigation. With eyes, hands, mind, computer, and yes, heart, we can create and share authentic stories that speak to us at a human level.
Originally published at graphicacy.com.