A Framework for Improving Analytical Interaction in the Visualization of Economic Data for the General Public

INTRODUCTION

Nowadays, the significant role that data plays in economic growth is widely recognized, and many countries and enterprises are striving to collect and analyze data. In this data ecosystem, the general public seems to be merely the provider of economic data and to have little to do with its analysis. However, with the emergence of open data and information visualization, the public is beginning to share in the benefits of data. For example, coffee shop owners use visualization tools to analyze regional population distribution by age in order to predict new customer demand for coffee and cake; job seekers use visualization tools to explore the distribution of industries around the world and find where better opportunities exist for them. In this way, economic knowledge can be popularized and the local economy can prosper.

Great Britain’s Ordnance Survey, the New York City Business Atlas and the Atlas of Economic Complexity are all visualization tools for economic data, which were launched around 2000. They present economic data in a form that general users can understand when making business decisions. However, the design of many visualization tools still remains in an early phase, in which the only goal is to show the data clearly, and interaction for analysis is lacking. In addition, one of the most obvious differences between analysts and the general public is that analysts have their own analytical tools, while the general public does not. Therefore, it is important for visualization tools to provide novices with a means of analyzing data. The aim of this thesis is to design a framework that enhances analytical interaction in the visualization of economic data, so that users can not only read the data in the visualization but also analyze it and form their own insights.

There are two important references in this thesis. The first is Stephen Few’s book Now You See It, in which he summarized the core skills for visual analysis of economics-related data [1]. In his book, analytical interaction refers to interaction techniques that are especially useful for analyzing information. The second is the book by Ben Shneiderman et al., Readings in Information Visualization: Using Vision to Think, which provides a clear logic for considering interaction design in information visualization, involving cognitive science aspects, information visualization, and interaction techniques [2]. The framework in this thesis is built according to this architecture. For the purpose of this framework, I focus only on knowledge associated with economic data analysis, rather than data analysis in general.

In the first part, I introduce the related work. In the second part, I present a case study, the Atlas of Economic Complexity, together with the results of the usability test. In the third part, I present the construction of the framework from three perspectives: cognitive science, economic data visualization, and economic data interaction. In the fourth part, I present the framework as a visualization, with a prototype of the analytical interaction (due to the limited time, this part will be presented at the end of September). In the fifth part, I discuss the impact of the visualization of economic data on the general public. Based on the above, I propose some future directions and a conclusion at the end of this thesis.

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009. [2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999.

1. Related Work

1.1 Cognitive science aspects

Information visualization research is often associated with cognitive science, mostly with visual perception. Visual perception is the foundation for the science of data [3]. This thesis is about improving the analytical interaction of information visualization. According to Stephen Few, the interactive manipulation of computational resources is part of the reasoning process, in which users need not only to recognize visual patterns but also to memorize and analyze them [1].

Donald A. Norman addressed the overrated power of the unaided mind. He claimed that “Without external aids, memory, thought, and reasoning are all constrained” [4]. Ben Shneiderman et al. regarded this kind of situation as a hard time for mental work, which occurs when we try to analyze data in a visualization tool [2]. For example, a stockbroker stares at the financial data on a monitor in order to make timely decisions. He needs to memorize a lot of information and process it quickly in his mind, and he may easily feel overwhelmed by new information. Nelson Cowan claimed that during this kind of “hard time” we are challenging the capacity of our working memory [5]. A well-designed information visualization is exactly the external aid needed to overcome these limits. To prevent users from going through a hard time in mental work, it is important for designers to understand the capacity of working memory.

George A. Miller and Nelson Cowan are both experts in the area of working memory, although their research is separated by nearly 50 years. Miller propounded the magical number 7 to illustrate the capacity of working memory for numbers, images, or chunks of images [6]. Cowan, however, proposed the number 4 as the capacity of working memory when repetition and grouping are not available to assist the memory process [7]. Another part of short-term memory storage is visual information [8]. Steven J. Luck and Edward K. Vogel presented the capacity of visual working memory for features and conjunctions [9].

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009. [2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999. [3] Ware, C. “Information visualization: perception for design”. Morgan Kaufmann. 2012. [4] Donald A. Norman. “Things that make us smart: Defending human attributes in the age of the machine”. Addison-Wesley Publishing Company. 1993. p43 [5] Nelson Cowan. “Working memory capacity: classic edition”. Library of Congress Cataloging-in-Publication Data. 2016. [6] Miller GA. “The magical number seven plus or minus two: some limits on our capacity for processing information”. Psychological Review. 63 (2): 81-97, March 1956. [7] Nelson Cowan. “The magical number 4 in short-term memory: A reconsideration of mental storage capacity”. Cambridge University Press. February 2001. [8] Baddeley, A. D. “Working Memory”. Clarendon, Oxford. 1986. [9] Steven J. Luck & Edward K. Vogel. “The capacity of visual working memory for features and conjunctions”. Nature. 390, 279-281, 20 November 1997.

1.2 Economic data representation

Ben Shneiderman regarded graphic aids as an important form of external cognition that makes us smart [2]. Graphics can help us communicate an idea better: “A picture is worth ten thousand words.” More importantly, graphical means can help us in the reasoning process, or, as Bertin would say, in using vision to think. Bertin also summarized the visual variables for presenting values in 2D [10]. The development of information technology has expanded the formats and interactivity of graphic aids. Graphic representations of variables can be generated by computer, and users can interact with them directly. This kind of creative external graphic aid is information visualization.

Economic data can be represented with different types of graphs, and each graph has unique properties. William Playfair was probably the first economist to draw economic data so well, and he created many valuable diagrams of economic data. Tufte was the first to systematically organize the representation of quantitative information [11]. Stephen Few listed a few commonly used graphs of economic data, for example scatterplots, heat maps, and tree-maps, and he also gave meaningful tips for their application [1].

In 1973, Anscombe’s quartet showed the great power of the scatterplot in 2D rectangular coordinates [12]. In 2008, Elmqvist et al. presented a 3D scatterplot technique, a more powerful way to navigate between scatterplots via cube rotations [13]. However, the 3D scatterplot visualization seems very complicated and its interface is hard to use at first glance, so the system is too complicated for novices. Hans Rosling’s “moving bubble” presented an impressive animated scatterplot, which is a powerful tool even for novices to comprehend changes in correlation [14]. Brian Johnson and Ben Shneiderman claimed that tree-maps are a powerful tool for visualizing hierarchically structured information in a space-saving way [15]. Isabel Meirelles listed a few design tips for heat maps as well as choropleth maps [16].

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009. [2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999. [10] Jacques Bertin. “Graphics and Graphic Information Processing”. De Gruyter. 1981. [11] Edward R. Tufte. “The visual display of quantitative information”. Graphics Pr, 2nd edition. 2001. [12] F. J. Anscombe. “Graphs in Statistical Analysis”. The American Statistician, 27 (1): 17-21, February 1973. [13] Elmqvist, N., Fekete, J.-D., and Dragicevic, P. “Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation”. TVCG: Transactions on Visualization and Computer Graphics. 14(6): 1141–1148, Nov/Dec 2008. [14] Hans Rosling. “Trendalyzer”. 2007. https://www.gapminder.org/tools/#$chart-type=bubbles [15] Brian Johnson, Ben Shneiderman. “Tree-maps: a space-filling approach to visualization of hierarchical information structures”. Proceeding Visualization’91. IEEE. 1991 [16] Isabel Meirelles. “Design for Information”. Rockport Publishers. 2013

1.3 Economic data interaction

Interaction in visualization acts as the catalyst for the user’s dialogue with the data, and it is also closely connected to the user’s actual understanding of and insight into these data [1][17][18]. However, few studies within visualization research have focused on data interaction. One reason could be that interaction is an intangible concept that is difficult to quantify and evaluate [17]. Even fewer studies have focused on analytical interaction for economic data.

Ben Shneiderman introduced visualization controls and dynamic queries as the foundation of interaction in information visualization [2]. Stephen Few’s book, Now You See It, is a guidebook closely related to economic data analysis [1]. He summed up 13 interaction techniques commonly used by data analysts, which he called analytical interaction: comparing, sorting, adding variables, filtering, highlighting, aggregating, re-expressing, re-visualizing, zooming and panning, re-scaling, accessing details on demand, annotating, and bookmarking. These are important references for setting up the framework later. In addition, William A. Pike et al. identified seven constructive research directions for interactive tools, including embodied interaction and capturing user intentionality [18]. Niklas Elmqvist et al. proposed the new term “fluid interaction” for information visualization and categorized its properties, such as promoting flow, supporting direct manipulation, and minimizing the gulfs of action, which is very inspiring [17]. In section 3, I introduce those of the mentioned concepts and methods that are closely related to economic data analysis and summarize them into the framework.
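To make three of the analytical interactions listed above more concrete (filtering, sorting, and aggregating), the following is a minimal sketch of how they might operate on a small table of trade records. The record fields, the function names, and all values are hypothetical and chosen only for illustration; they are not taken from any of the cited tools.

```python
from collections import defaultdict

# Hypothetical trade records; every value here is made up for illustration.
records = [
    {"country": "A", "product": "coffee", "year": 2015, "export": 120.0},
    {"country": "A", "product": "cocoa", "year": 2015, "export": 80.0},
    {"country": "B", "product": "coffee", "year": 2015, "export": 200.0},
    {"country": "B", "product": "coffee", "year": 2016, "export": 150.0},
]


def filter_records(rows, **conditions):
    """Filtering: keep only the rows that match every given field value."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def sort_records(rows, key, descending=True):
    """Sorting: order rows by one field, largest first by default."""
    return sorted(rows, key=lambda r: r[key], reverse=descending)


def aggregate(rows, by, value):
    """Aggregating: sum one numeric field per distinct value of another field."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[by]] += r[value]
    return dict(totals)


coffee_2015 = filter_records(records, product="coffee", year=2015)
ranked = sort_records(coffee_2015, "export")
totals = aggregate(records, by="country", value="export")
```

In an interactive tool, each of these functions would be triggered by a control (a filter box, a column header, a grouping menu) rather than called directly.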

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009. [2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999. [17] Niklas Elmqvist, Andrew Vande Moere, Hans-Christian Jetter, Daniel Cernea, Harald Reiterer and Tj Jankun-Kelly. “Fluid interaction for information visualization”. Information Visualization, 10(4), October 2011. [18] William A. Pike, John Stasko, Remco Chang, and Theresa A. O’Connell. “The Science of Interaction”. Information Visualization. 8(4): 263–274, 2009.

1.4 The impact of the visualization of economic data

The book by Andrew Young and Stefaan Verhulst serves as the main reference here. They focused on the impact of open data on people’s lives and concluded that one dimension of impact is creating opportunity for citizens, which includes stimulating economic growth and fostering innovation [19]. They classified cases by topic, such as Great Britain’s Ordnance Survey, the New York City Business Atlas, NOAA: Opening up global weather data in collaboration with businesses, and Opening GPS data for civilian use. However, different projects focused on data from different sectors. Among these cases, Great Britain’s Ordnance Survey and the New York City Business Atlas are both visualizations of economic data for the general public, which makes them very valuable references for this thesis; I looked especially into the impact described in these two cases.

[19] Stefaan Verhulst, Andrew Young. “The global impact of open data”. O'Reilly Media, Inc.. 2016

1.5 Interim conclusion

The related work involves four aspects of improving analytical interaction: cognitive science, economic data visualization, economic data interaction, and the vision of economic InfoVis. These aspects were chosen for the following reasons:

Firstly, the aim of this thesis is to set up an analytical interaction framework for the visualization of economic data, which is a relatively new research area. The book by Ben Shneiderman et al. provides a clear logic for considering interaction design in information visualization, involving cognitive science aspects, information visualization, and interaction techniques [2]. The framework in this thesis is built according to this architecture. For the purpose of this framework, I focus only on knowledge associated with economic data analysis, rather than data analysis in general.

Secondly, the ultimate purpose of information visualization is to help users generate their own insights for making good decisions. This purpose cannot be fulfilled by the framework alone, which covers data interaction, including analytical techniques, analytical patterns, and practices. The other aspects are also highly relevant to this goal.

Thirdly, it is necessary to classify the types of visualization in order to find the best interpretation of economic data. For example, heat maps have been widely used in biology research, but a heat map of economic data can lead users to a different interpretation and understanding. Designers should examine the choice of visualization carefully.

[2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999.

2. Methodology

I conducted a usability test on an economic InfoVis product, the Atlas of Economic Complexity. The first part covers product discovery, with an introduction, the target audience, and the vision. The second part covers the content of the usability test. I asked 10 potential users to complete a set of tasks and determined the usability of the product by observing the users’ behavior. The results are used as a reference for further improvement.

2.1 Product discovery

The Atlas of Economic Complexity is a data visualization tool designed by the Center for International Development at Harvard University and released in 2011. “The atlas starts with the idea that the wealth and potential of nations are derived from productive knowledge. To maximize collective knowledge, a nation needs to connect its individual citizens, each of whom can benefit the whole. The more complex and interconnected a nation, the greater its economic productivity and potential. For example, a country that manufactures lithium batteries can soon expand into making computers, cell phones, or electric cars [20].” Following this logic, the designers first created a Trade Data Visualization, which collected data on the exports and imports of 128 countries between 1962 and 2016. They presented the data as 3 interactive visualizations: tree-map, heat map, and charts (see Figure 1). These are the main visualizations that I used as the object of the usability test. Apart from these, the most significant feature of the atlas is the Complexity Visualization (see Figure 2). The three charts in the Complexity Visualization are designed for experts. Since the target audience of this thesis is novices, I do not explain these three charts here.

The tool itself is a dynamic resource, which automatically updates with new data. The publisher of the product claims that the tool is designed for answering questions such as [20]:

- What does a country import and export?
- How has its trade evolved over time?
- What are the drivers of export growth?
- Which new industries are likely to emerge in a given geography? Which are likely to disappear?
- What are the GDP growth prospects of a given country in the next 5-10 years, based on its productive capabilities?

The vision of the tool is clearly written in the paper: “By providing maps, we do not pretend to tell potential explorers where to go, but to pinpoint what is out there and what routes may be shorter or more secure. We hope this will empower these explorers with valuable information that will encourage them to take on the challenge and thus speed up the process of economic development [21].” The target audience of this product is very broad. The “explorers” here could be any individual in society, such as students, entrepreneurs, and policymakers.

2.2 Usability test

The average correct rates (sheets 1, 2 and 4) are equal regardless of the participants’ economic background. Participants without an economic background (P2, P3, P4, P8, P9) achieved slightly higher correct rates than participants with an economic background (P1, P5, P6, P7, P10) on the simple descriptive questions. However, they achieved slightly lower correct rates than participants with an economic background on the difficult descriptive questions.

Sheets 1, 2 and 3 show that, although both groups achieved the same correct rates, participants with an economic background finished the tasks significantly faster than those without. Both groups also took less time on the second part of the questions than on the first part.

I classified the participants by economic background according to their answers to the self-check question “Is what you are doing/studying relevant to economics?”. Those who checked “Yes” are treated as having a certain economic background, and those who checked “No” as having none.

This is a brief analysis of part of the experiment results. Due to the limited number of participants, this interim conclusion is rather superficial. A more qualitative analysis of how the participants used the tool, such as their thinking process and interaction process, will be presented in the next part of the thesis. Combining both parts of the analysis will form a more solid basis for the report.

Although the visualization of economic data has a long history, it is fairly new to the general public. The participants in this usability test come from different age groups and have different personal, academic and professional backgrounds. Some of them major in economics, while others are not interested in economics at all. For some of them, it was the first time they had tried to understand economic data. Nevertheless, this usability test shows that the visualization is highly readable and easy to learn. It helps users answer descriptive questions, and the correct rate is high. When it came to the more difficult predictive questions, a few users were also able to understand the cause and effect of some economic phenomena and to predict the future from what they had learned with the visualization tool.

3. Analysis

3.1 Cognitive science aspects

3.1.1 Definition of working memory

Working memory is a cognitive system with limited capacity that is responsible for temporarily holding information available for processing [22], and it is important for reasoning and for the guidance of decision-making and behavior [23][24]. Cowan’s explanation can help us understand working memory in daily life. He compared the human brain to a computer. The RAM of a computer can fill up, and when it is close to its limit, new information cannot be stored. The normal human brain, by contrast, is able to accommodate new information without such a limit. Nevertheless, a person can be overwhelmed by too much new information that he or she finds difficult to comprehend [5].

I can recall the memory contest I participated in when I was in the fifth grade. We had to memorize 13 sets of two digits in 10 seconds and complete simple mental arithmetic afterwards (see Figure 2). It went well when I recited the 1st, 2nd and 3rd sets of numbers. By the time I attempted the 8th set, I had already made a lot of mistakes. My head felt extremely tired and painful. It felt as if my head would explode if I put more sets of numbers into it.

Cowan claims that the feeling of being overwhelmed by a lot of new information can occur because of the special type of memory that is typically termed working memory [5]. First, the capacity of working memory is limited, which is why I was only able to finish 8 sets. If the load on memory exceeds its capacity, the probability of error increases. Second, it is responsible for temporary preservation, like short-term memory. Third, the information it stores is prepared for the next process, which distinguishes it from short-term memory. The term “working” is meant to indicate that mental work requires the use of such information.

When a user analyzes data with an information visualization, it sometimes feels like being in a memory contest, only a more complex one. The user needs to compare multiple dimensions of data, memorize many patterns and colors, and remember both similarities and differences. In addition to the growing volume of information, the workload is also large. A good analysis tool for the visualization of economic data should minimize the user’s memory load, so that the user does not easily feel overwhelmed and becomes more efficient.

[5] Nelson Cowan. “Working memory capacity: classic edition”. Library of Congress Cataloging-in-Publication Data. 2016. [22] Miyake, A.; Shah, P., eds. “Models of working memory. Mechanisms of active maintenance and executive control”. Cambridge University Press. 1999. [23] Diamond A. “Executive functions”. Annu Rev Psychol. 64: 135-168. 2013 [24] Malenka RC, Nestler EJ, Hyman SE. “Chapter 13: Higher Cognitive Function and Behavioral Control”. In Sydor A, Brown RY. 2009

3.1.2 Capacity limits of working memory

People have different memory abilities. In the usability test, some participants could quickly remember the numbers and do easy calculations in their heads, while some needed to look back several times, and some preferred to write the numbers down. How many numbers can a human memorize in general?

Miller proposed the “magical number seven” in 1956. According to Miller, about 7 “chunks”, which could be digits, letters, words or other units, is the information capacity of a young adult [6]. As scientists controlled the conditions of their experiments more strictly, a different magical number emerged. At present, scientists believe that without methods such as repetition, grouping, and merging, human processing capacity decreases greatly. In 2001, Cowan proposed the “magical number 4” [7].

Moreover, both Miller and Cowan believe that regular chunking can alleviate the burden on memory [6][7]. For example, the series of letters “fbicbsibmirs” can be seen as 12 elements. It is hard to memorize because it exceeds the capacity of working memory. However, if someone finds the relationships among the letters and associates them into “FBI, CBS, IBM, and IRS”, then the load for the trial is only four chunks, which is much easier to memorize. McLean and Gregg provided three ways to form chunks for better memorization: “(a) Some stimuli may already form a unit with which S is familiar. (b) External punctuation of the stimuli may serve to create groupings of the individual elements. (c) The S may monitor his own performance and impose structure by selective attention, rehearsal, or other means [25].”
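The letter example above can be sketched in a few lines of code. This is only an illustration of the idea of chunking, under the assumption that the reader already knows a set of familiar units; the function name `chunk_letters` and the greedy longest-match strategy are my own choices, not a model from the cited literature.

```python
def chunk_letters(text, known_units):
    """Greedily split text into the longest familiar units.

    Letters that do not start any familiar unit remain single-letter chunks.
    """
    chunks = []
    i = 0
    max_len = max((len(u) for u in known_units), default=1)
    while i < len(text):
        # Try the longest candidate first, falling back to a single letter.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in known_units:
                chunks.append(piece)
                i += size
                break
    return chunks


letters = "fbicbsibmirs"
without_knowledge = chunk_letters(letters, set())
with_knowledge = chunk_letters(letters, {"fbi", "cbs", "ibm", "irs"})
```

Without familiar units the string is held as 12 separate elements; with them it collapses to four chunks, which is within the capacity limit discussed above.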

Designers should pay attention to the presentation of digits and letters in the visualization. It is recommended that no more than four elements be presented together. Moreover, we can reduce the burden on human memory by grouping elements. A long phone number is often divided into groups of four digits precisely because this makes it easier to remember. Another possible solution is to show percentages instead of raw digits, which can sometimes aid comprehension.
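As a small illustration of the two suggestions above, the sketch below groups a long digit string into blocks of four and re-expresses a raw value as a percentage of a total. The function names and the sample numbers are hypothetical.

```python
def group_digits(digits, size=4, sep=" "):
    """Split a digit string into blocks of `size` characters for display."""
    return sep.join(digits[i:i + size] for i in range(0, len(digits), size))


def as_share(value, total, decimals=1):
    """Re-express a raw value as a percentage of a total."""
    return f"{value / total * 100:.{decimals}f}%"


grouped = group_digits("123456789012")
share = as_share(1250, 5000)
```

Here `grouped` is "1234 5678 9012" and `share` is "25.0%"; both forms give the reader fewer and more meaningful units to hold in mind.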

[6] Miller GA. “The magical number seven plus or minus two: some limits on our capacity for processing information”. Psychological Review. 63 (2): 81-97, March 1956. [7] Nelson Cowan. “The magical number 4 in short-term memory: A reconsideration of mental storage capacity”. Cambridge University Press. February 2001. [25] McLean, R. S. & Gregg, L. W. “Effects of induced chunking on temporal aspects of serial recitation”. Journal of Experimental Psychology. 74:455-59. 1967.

3.1.3 Capacity limits of visual working memory

Another part of short-term memory storage is visual information [8]. Luck and Vogel focused on features and conjunctions for different types of information [9]. First, they claimed that visual working memory can retain only 4 colors or orientations at a time (see Figure 8a, 8b, 8c); when the number of items rises to eight or twelve, the correct rate decreases. Second, humans can retain both the color and the orientation of four objects, which indicates that visual working memory stores integrated objects rather than individual features (see Figure 8d, 8e). That is to say, integrating individual features into one object leads to a larger capacity for retaining individual features. Moreover, just as chunking is an important way to reduce the burden on memory for verbal information, integrating relevant features is an efficient way to enlarge the capacity of visual working memory.

Color plays a huge role in information visualization. From the perspective of data representation, when the amount of data in a database rises, the number of colors used to represent the data on the screen increases with it. Designers should keep the number 4 in mind and control the number of colors. When the total number of colors exceeds a certain amount, it is necessary to give users a way to organize the colors (for example, grouping and filtering), so that they can reduce the burden on their working memory. In this way, users can achieve higher accuracy and a better experience when understanding and analyzing data. This problem also emerged in the usability test.
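One simple way to keep the number of colors under control is to keep the largest categories and merge the remainder into a single group. The sketch below assumes a limit of four colors, following the discussion above; the function name `limit_categories`, the label "Other", and the sample shares are hypothetical.

```python
def limit_categories(values, max_colors=4, other_label="Other"):
    """Keep the largest categories and merge the rest into one group,
    so that at most `max_colors` categories need a distinct color."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_colors:
        return dict(ranked)
    kept = ranked[:max_colors - 1]
    rest = sum(v for _, v in ranked[max_colors - 1:])
    return dict(kept + [(other_label, rest)])


# Hypothetical export shares (percent) by sector, made up for illustration.
sectors = {"Machinery": 30, "Textiles": 25, "Chemicals": 20, "Food": 15, "Metals": 10}
reduced = limit_categories(sectors)
```

The merged group can then be expanded on demand, which combines the grouping and filtering ideas mentioned above with details on demand.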

[8] Baddeley, A. D. “Working Memory”. Clarendon, Oxford. 1986. [9] Steven J. Luck & Edward K. Vogel. “The capacity of visual working memory for features and conjunctions”. Nature. 390, 279-281, 20 November 1997.

3.2 Economic Data Representation

3.2.1 Rectangular coordinates

Rectangular coordinates were invented by Descartes in the 17th century. Although today we have more choices for representing economic data, such as trees, maps, and networks, rectangular coordinates are still widely used for this purpose. Moreover, after being refined by many great designers, rectangular coordinates have shown their powerful potential as an analytical tool.

First, rectangular coordinates can bring out the relationship between numbers clearly. Anscombe’s quartet illustrates this point well [12]. When we look at the arrays on the left side (Figure 9), we cannot find any pattern in the numbers; even a data scientist cannot tell the four data sets apart without the help of a graph, because all four are described by nearly the same linear model. And yet, on the right side, the data sets appear very different when graphed, and the relationship between the numbers can be recognized at first glance (see Figure 10 on the next page).
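The point about Anscombe’s quartet can be checked numerically. The sketch below fits an ordinary least-squares line to each of the four data sets published in [12]; the helper name `least_squares` is my own. All four fits come out with a slope close to 0.5 and an intercept close to 3.0, although the scatterplots look entirely different.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Anscombe's four data sets; sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: least_squares(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {slope:.2f}x + {intercept:.2f}")
```

Because the summary numbers agree, only the graph reveals that one set is curved, one is dominated by an outlier, and one has almost no variation in x.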

Moreover, the patterns presented in rectangular coordinates follow certain rules, for example, positive correlation, negative correlation, and no correlation (see Figure 11) [1]. A system could detect such patterns in the dataset and point them out to the user, which would be an efficient way to improve the analytical skills of novice users.
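A minimal sketch of such a pattern prompt is given below. It computes the Pearson correlation coefficient and turns it into a short message for the user. The threshold of 0.3 and the wording of the messages are assumptions made for illustration; a real tool would need to tune them and to handle non-linear patterns as well.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(xs, ys, threshold=0.3):
    """Translate the coefficient into a prompt; the threshold is an assumed cut-off."""
    r = pearson_r(xs, ys)
    if r >= threshold:
        return "positive correlation"
    if r <= -threshold:
        return "negative correlation"
    return "no clear correlation"


xs = [1, 2, 3, 4, 5]
rising = describe_correlation(xs, [2, 4, 6, 8, 10])
falling = describe_correlation(xs, [10, 8, 6, 4, 2])
flat = describe_correlation(xs, [1, -1, 0, -1, 1])
```

The three calls return "positive correlation", "negative correlation", and "no clear correlation" respectively, which correspond to the three patterns named above.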

Second, animated scatterplots are a great tool for analyzing changes in correlation. Hans Rosling demonstrated the power of the scatterplot with animation in the “moving bubble” visualization, which tells the story of changes over time. However, animated scatterplots have a shortcoming for analysis: an animated visualization only allows us to see a fraction of the changing pattern at a time [1]. For further analysis, the visualization allows the user to select particular bubbles and shows a trace line for each selected bubble as the animation progresses (as shown in Figure 12, with three traces for Andorra, Argentina, and Angola from 1800 to 2018). This so-called trace visualization is very helpful for verifying apparent anomalies. Moreover, the filtering, zooming and panning controls on the right side of the scatterplot provide analytical interaction that supports users in analyzing changes in correlation.
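The trace idea can be sketched as a simple data transformation: for each selected bubble, collect its positions in chronological order so that they can be drawn as one connected line. The field names, the function `build_traces`, and the observations below are hypothetical and do not reproduce the data of the “moving bubble” tool.

```python
from collections import defaultdict


def build_traces(observations, selected):
    """Return, for each selected entity, its (year, x, y) points ordered by year."""
    traces = defaultdict(list)
    for obs in observations:
        if obs["name"] in selected:
            traces[obs["name"]].append((obs["year"], obs["x"], obs["y"]))
    return {name: sorted(points) for name, points in traces.items()}


# Hypothetical observations; x and y stand for the two plotted indicators.
observations = [
    {"name": "Country B", "year": 2000, "x": 3.0, "y": 60.0},
    {"name": "Country A", "year": 2010, "x": 5.5, "y": 72.0},
    {"name": "Country A", "year": 2000, "x": 4.0, "y": 68.0},
    {"name": "Country C", "year": 2000, "x": 9.0, "y": 80.0},
    {"name": "Country B", "year": 2010, "x": 3.5, "y": 66.0},
]

traces = build_traces(observations, selected={"Country A", "Country B"})
```

Each resulting list can be passed to a line-drawing routine, so the whole path of a bubble stays visible instead of only its current position.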

In addition to scatterplots, bar graphs and line graphs can also be drawn on rectangular coordinates. It is important to point out that line graphs and bar graphs are interchangeable in many cases without affecting the reading. However, a line graph has the specific feature that it visually connects separate values [1]. Therefore, it can cause misunderstanding if the values on the horizontal axis are independent of one another. In the usability test, users made mistakes on some fairly easy questions because of this feature. The question was: in which year did China experience a significant decline? Looking at the chart, some users quickly wrote 2008, and some hesitated between 2008 and 2009 (see Figure 13). The correct answer is 2009. If we re-visualize the data as a bar graph, the answer becomes clearer (see Figure 14). The line connecting 2008 and 2009 misled the users.
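The ambiguity described above comes from reading a decline off the line segment rather than off the yearly values. A tool could remove it by computing the year-over-year change and attributing each change to the later year. The sketch below shows this under stated assumptions: the function names are my own and the yearly values are invented for illustration, not taken from the Atlas.

```python
def year_over_year(series):
    """Map each year (except the first) to its change from the previous year."""
    years = sorted(series)
    return {cur: series[cur] - series[prev] for prev, cur in zip(years, years[1:])}


def largest_decline(series):
    """Return (year, change) for the year with the most negative change."""
    changes = year_over_year(series)
    year = min(changes, key=changes.get)
    return year, changes[year]


# Hypothetical yearly values (arbitrary units), made up for illustration.
exports = {2006: 100, 2007: 115, 2008: 125, 2009: 105, 2010: 130}
year, change = largest_decline(exports)
```

With these values the function reports 2009 with a change of -20: the value peaks in 2008, but the decline itself belongs to 2009, which is exactly what a bar graph makes visible.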

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009. [12] F. J. Anscombe. “Graphs in Statistical Analysis”. The American Statistician, 27 (1): 17-21, February 1973.

3.2.2 Tree-maps

Tree-maps are known as an interactive visualization method for presenting hierarchical information. There are two types of hierarchical information: structural information associated with the hierarchy, and content information associated with each node. There are four forms of tree-maps that are able to represent both the structure and the content of the hierarchy (Figure 15). Figure 15-b and 15-c are the traditional forms, and Figure 15-d is the commonly used one. A major feature of tree-maps is their high space utilization. Figure 15-d and 15-e display data in a fixed rectangle and utilize 100% of its area. Studies have shown that on a 13-inch display, such tree-maps can arrange more than 1000 nodes very well without affecting readability [15]. The difference between them is that Figure 15-e simply eliminates the nesting offset used to separate objects at each level, so that it provides even more space for representing data.
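The space-filling property can be illustrated with a minimal sketch of a slice-and-dice layout, in which the rectangle of each node is divided among its children in proportion to their sizes and the split direction alternates with depth. This is a simplified version written for this thesis, without any nesting offset; the dictionary structure of the tree and the function names are assumptions, and all sizes are assumed to be positive.

```python
def size_of(node):
    """Size of a leaf, or the total size of all leaves below an inner node."""
    children = node.get("children")
    if children:
        return sum(size_of(c) for c in children)
    return node["size"]


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, width, height) rectangles for the leaves."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = size_of(node)
    offset = 0.0  # fraction of the parent rectangle already assigned
    for child in children:
        share = size_of(child) / total
        if depth % 2 == 0:
            # Even depth: place children side by side along the width.
            slice_and_dice(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:
            # Odd depth: stack children along the height.
            slice_and_dice(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy: one leaf and one group with two leaves.
tree = {
    "name": "root",
    "children": [
        {"name": "A", "size": 2},
        {"name": "B", "children": [{"name": "B1", "size": 1}, {"name": "B2", "size": 1}]},
    ],
}
rects = slice_and_dice(tree, 0, 0, 100, 100)
covered = sum(w * h for _, _, _, w, h in rects)
```

The leaf rectangles together cover the full 100 by 100 area, which is the 100% space utilization described above.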

The pattern in a dataset should be shown with visual attributes, such as color (hue, saturation, brightness), texture, and shape. Color is the most important of these visual attributes, and it is especially helpful for making quick and accurate decisions [26]. Designers should pay attention to the choice of visual attributes. If they are not well designed, the diagram can become cluttered once the amount of data rises. One of the test charts in the usability test was a tree-map (see Figure 16). Some participants thought that there were too many colors in the graph, and they tended not to use it unless they needed to.

No research gives a specific number of colors in a tree-map beyond which users are overwhelmed. We could refer to the conclusion in section 1.3 that the capacity limit of visual working memory is four items. However, the stimuli in the experiment behind that conclusion were not designed in the form of a tree-map. Therefore, the number four can serve only as a reference here.

[15] Brian Johnson, Ben Shneiderman. “Tree-maps: a space-filling approach to visualization of hierarchical information structures”. Proceeding Visualization’91. IEEE. 1991 [26] John F. Rice. Ten rules for color coding. Information Display, 7(3): 12-14. 1991.

3.2.3 Heat map

The heat map is another common graph for representing economic data. When a heat map is combined with a geo map, it resembles a choropleth map; otherwise, we call it a cluster heat map. Cluster heat maps are often used in biology, while the map-based form is used in economic research. As economic geography research becomes more and more important, I think this type of graph will be used more frequently.

A heat map integrated with a geo map provides a bird’s-eye view. It allows the user to quickly discover differences between the local and the global. Figure 17-a is an interactive heat map which shows Brazil’s exports in 2010. Brazil is shown in black, and the colors of the other countries are based on the volume of exports they receive from Brazil, running from blue for large volumes to yellow for small ones. All countries are filled with color, which indicates that Brazil’s products are exported to every country in the world. The United States, China, Germany, the Netherlands and Argentina are Brazil’s most important export partners, while New Zealand, Africa and Central Asia receive fewer exports. Compare this with Brazil’s export heat map for 1990 (see Figure 17-b): the changes and the trend over these two decades are clearly visible. From 1990 to 2010, Brazil gained more partners, e.g. Russia and southern Africa. The volume of trade between Brazil and China also increased significantly. The trade relationship between Brazil and the United States, Europe, and Argentina remained solid throughout.
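How a volume is turned into a color can be sketched as follows. This is a minimal illustration that assumes rank-based (quantile) classes and a five-step yellow-to-blue ramp; the function name, the hex values and the export volumes are hypothetical and are not taken from the Atlas of Economic Complexity.

```python
# Illustrative ramp from light yellow (small volume) to dark blue (large volume).
RAMP = ["#ffffcc", "#a1dab4", "#41b6c4", "#2c7fb8", "#253494"]


def classify(volumes, n_classes=len(RAMP)):
    """Assign each country a class from 0 to n_classes - 1 by rank."""
    ranked = sorted(volumes, key=volumes.get)
    return {
        country: rank * n_classes // len(ranked)
        for rank, country in enumerate(ranked)
    }


def colorize(volumes):
    classes = classify(volumes)
    return {country: RAMP[cls] for country, cls in classes.items()}


# Made-up export volumes, for illustration only.
volumes = {
    "United States": 19.0,
    "China": 31.0,
    "Argentina": 18.0,
    "Germany": 8.0,
    "New Zealand": 0.1,
}
colors = colorize(volumes)
```

Rank-based classes spread the countries evenly over the ramp; equal-interval classes would be another reasonable choice and would emphasize a few very large partners instead.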

One shortcoming of heat maps and choropleth maps is that when a color is assigned to a defined region (a country in this case), it can lead to the misinterpretation that the represented phenomenon applies uniformly to the entire region, which is often not true [16]. The misunderstanding is more likely when the region is relatively large. Take Germany’s exports to China in 1985 (see Figure 18-a): it seems that Germany’s products had spread all over China, which does not reflect reality. In 1985, the Chinese government had just decided to open 14 cities along the coast to imported goods, and the goods could be freely traded in these 14 cities only [27]. These 14 cities are quite small on the map of China (see Figure 18-b).

A further strength of the heat map is that it is especially useful for depicting economic concentration and business clustering. The World Development Report 2009: Reshaping Economic Geography, released by the World Bank, presented a series of classic and creative heat maps (Figure 19-a & b). They are built on geo maps, with the GDP per square kilometer stacked up over the corresponding area. A city with a small area and a high GDP value looks like a hill. The World Bank claims that “Markets favor some places over others” [28]. Development does not bring economic prosperity everywhere at once; production concentrates in big cities. These heat maps illustrate this phenomenon clearly. The high hills and the plains on the map convey the reality of uneven economic prosperity, and we are able to see both density and distance at the same time.

The next graph, a distorted world map, comes from the same World Bank report (Figure 20). I see it as a creative version of the heat map. The countries keep the same locations as on a normal world map, and the countries in the same region share the same color. What is special about this graph is that it adjusts each country’s size to show its share of global GDP. For example, the United States and Canada both belong to North America (marked in blue). Because the GDP of the United States is 13 times that of Canada [29], the United States is drawn 13 times bigger than Canada on this map. Instead of encoding a country’s GDP with color, the map encodes it with size, which makes the outcome more striking visually. Because users may already know the normal world map by heart, the comparison makes them more aware of the differences in GDP.
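The size adjustment involves a small piece of arithmetic that is easy to overlook. The sketch below assumes that “13 times bigger” refers to area and that a country’s outline is scaled uniformly; real cartograms also distort shapes, so this is only an approximation, and the function name is hypothetical.

```python
import math


def linear_scale(gdp_ratio: float) -> float:
    # If area is proportional to GDP, each linear dimension
    # (width, height) grows with the square root of the ratio.
    return math.sqrt(gdp_ratio)


gdp_ratio = 13.0                 # United States vs. Canada, as quoted in the text
side_factor = linear_scale(gdp_ratio)
area_factor = side_factor ** 2   # recovers the GDP ratio
```

Under this assumption the United States would be about 3.6 times as wide and as tall as Canada, not 13 times. Scaling the width and height by 13 would inflate the area by a factor of 169 and exaggerate the difference.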

[16] Isabel Meirelles. “Design for Information”. Rockport Publishers. 2003. [27] Wu Xiaobo. “Agitation for Three Decades: Chinese Enterprises 1978-2008”. 2008. [28] World Bank. “World Development Report 2009: Reshaping Economic Geography. English PDF, Geography in motion: The Report at a Glance–Density, Distance, and Division xix.”. 2009. https://openknowledge.worldbank.org/handle/10986/5991 [29] World Bank. “GDP (Dollar)”. 2005. https://data.worldbank.org.cn/indicator/NY.GDP.MKTP.CD?end=2017&locations=US&start=2005&view=chart&year_low_desc=true

3.3 Economic Data Interaction

3.3.1 Visualization controls

Ben Shneiderman claims that users come to an information visualization with certain tasks, which they try to solve through interaction, i.e. by changing parameters in the chain of these transformations [2]. The visualization control is the control panel through which users issue such commands. It can be integrated into the image itself, or separated from the image as a new layer or a bar beside it [2].

Designers mostly tend to use the image itself as the visualization control. This lets them remove the menu bar, so the display area for the graph becomes larger and clearer. When users manipulate the graph directly, they are not distracted by other elements, and sometimes they even have an immersive experience in the visualization. Visualization controls designed in this way work especially well for gaming. However, because economic data is complex and its use is analytical in nature, a fixed menu bar is still necessary.

The best solution for visualization controls in the visualization of economic data could resemble the way we carry tools in video games. For example, in “Stone Age”, players can store tools either at home or in a backpack. The tools in the backpack are the commonly used ones, like the functions integrated into the visualization. The tools at home are used less often and can be neatly categorized, like the functions in a fixed menu bar. When the player goes out to fight, he can carry only the backpack, and the items in it are controlled directly with the digit keys on the keyboard. He can quickly pick the right tool from the bag and hunt without searching or filtering, which is very important for a successful hunt. When the player returns home, he has to decide which tools to carry, throw away or replace, and he needs a large space to organize them. That is why a well-designed menu bar is crucial.

[2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999.

3.3.2 Dynamic Queries

Ben Shneiderman claims that “A dynamic query involves the interactive control by a user of visual query parameters that generate a rapid (100 ms update), animated, visual display of database search results” [30]. The user searches the database by adjusting sliders, buttons or other visual controls, and the result is presented immediately. The principles of direct manipulation, applied to the database environment, are [31]:

-Visual presentation of the query’s components
-Visual presentation of results
-Rapid, incremental, and reversible control of the query
-Selection by pointing, not typing
-Immediate and continuous feedback
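These principles can be sketched in a few lines. The class below is a hypothetical illustration, not an implementation from [30] or [31]: each slider is modeled as a value range on one attribute, every change returns the new result set at once, and each step can be undone. The rental listings are made-up sample data.

```python
class DynamicQuery:
    def __init__(self, records):
        self.records = records
        self.ranges = {}    # attribute -> (low, high), one entry per slider
        self.history = []   # earlier slider states, kept for undo

    def set_range(self, attribute, low, high):
        """Move one slider (incremental control) and return the results."""
        self.history.append(dict(self.ranges))
        self.ranges[attribute] = (low, high)
        return self.results()          # immediate feedback

    def undo(self):
        """Reverse the last slider change (reversible control)."""
        if self.history:
            self.ranges = self.history.pop()
        return self.results()

    def results(self):
        return [
            record for record in self.records
            if all(low <= record[attr] <= high
                   for attr, (low, high) in self.ranges.items())
        ]


# Made-up rental listings, for illustration only.
flats = [
    {"rent": 650, "rooms": 2},
    {"rent": 900, "rooms": 3},
    {"rent": 1200, "rooms": 3},
]
query = DynamicQuery(flats)
affordable = query.set_range("rent", 0, 1000)
affordable_large = query.set_range("rooms", 3, 5)
after_undo = query.undo()
```

In an interactive tool, `set_range` would be called on every slider movement, so the result set has to be recomputed quickly enough to meet the update time quoted above.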

Furthermore, I would like to point out certain advantages of dynamic queries for novices. First, through a dynamic query, users can articulate complex questions in simple language [32]. Novices need time to learn how to formulate queries correctly in a command language [31]. I observed this in the usability test as well: when participants faced a large multi-dimensional database, the attributes appeared fragmented to them, and they could not combine their separate thoughts into one question in a short time. Dynamic queries let novices break a long question into a series of short ones and then search the database more deeply step by step. Many housing rental websites already exploit this advantage. Second, through dynamic queries, graphical results can be presented in context, which aids comprehension [31]. Third, the question formulated by a dynamic query can be treated as a command, which can be copied and pasted for the next search. In the Atlas of Economic Complexity, the user’s search through buttons and sliders is automatically formulated as an interrogative sentence shown above the visualization (see Figure 22-a). This sentence serves not just as a title but also as a functional command. Users can edit it directly, e.g. change the year or other parameters, to search the database (see Figure 22-b). In this way, users can focus on the visualization and their own question without being distracted by the control area. Although this feature is not yet mature, I believe it has great potential to improve direct manipulation in the visualization.
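The idea of a sentence that is both a title and an editable command can be sketched as a round trip between query parameters and text. The template wording, the parameter names and both functions are assumptions made for this illustration; they do not reproduce how the Atlas of Economic Complexity actually works.

```python
import re

# Hypothetical sentence template with three query parameters.
TEMPLATE = "What did {country} {flow} in {year}?"
PATTERN = re.compile(
    r"What did (?P<country>.+) (?P<flow>export|import) in (?P<year>\d{4})\?"
)


def to_sentence(params):
    """Formulate the current control settings as a question."""
    return TEMPLATE.format(**params)


def from_sentence(sentence):
    """Read an edited question back into query parameters."""
    match = PATTERN.fullmatch(sentence)
    if match is None:
        raise ValueError("sentence does not match the query template")
    params = match.groupdict()
    params["year"] = int(params["year"])
    return params


title = to_sentence({"country": "Brazil", "flow": "export", "year": 2010})
# The user edits the year directly in the sentence.
edited = from_sentence("What did Brazil export in 1990?")
```

A fixed template keeps parsing simple but restricts what users can type, which may be one reason such a feature is hard to make mature.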

[30] Ben Shneiderman. “Dynamic Queries for Visual Information Seeking”. IEEE Software, 11(6), Nov. 1994. [31] B. Shneiderman. “Direct Manipulation: A Step Beyond Programming Languages”. Computer, Aug. 1983, pp. 57-69. [32] M. Egenhofer. “Manipulating the Graphical Representation of Query Results in Geographic Information Systems”. Proc. IEEE Workshop on Visual Language, IEEE CS Press, Los Alamitos, Calif., 1990, pp. 119-124.

3.3.3 Analytical interaction

The term “analytical interaction” comes from Stephen Few’s book [1]. As mentioned before, this book is about how to design software that helps the analyst to analyze data, especially data that supports business intelligence. He classified the interaction techniques that are especially useful into the following 13:

-Comparing
-Sorting
-Adding variables
-Filtering
-Highlighting
-Aggregating
-Re-expressing
-Re-visualizing
-Zooming and panning
-Re-scaling
-Accessing details on demand
-Annotating
-Bookmarking

Some of these interaction techniques, such as comparing and filtering, are also commonly used in other types of visualization. Here, I focus on their functionality when they are used in economic information visualization for the general public.

Comparing

Comparing is the most commonly used analytical interaction. When we compare things in a dataset, we look for both similarities and differences between two or more of them. In economic analysis, there are certain rules for comparing. Stephen Few has summarized them as ranking, part-to-whole, deviation, time-series and nominal [1] (see Figure 23-a & b). These rules can be regarded as shortcuts for novices when they start to analyze a database. They can be designed as five different comparing modes between which users can easily switch. In this way, users can find what they are looking for more efficiently.
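The five modes could be offered as interchangeable functions over the same data. The sketch below is one possible reading of each mode, with hypothetical function names and made-up sales figures; it is not Few’s definition of them.

```python
def ranking(values):
    # Largest value first.
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def part_to_whole(values):
    # Each value as a share of the total.
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


def deviation(values, reference):
    # Difference of each value from a reference such as a target.
    return {name: value - reference for name, value in values.items()}


def time_series(values_by_year):
    # Change from each year to the next.
    years = sorted(values_by_year)
    return [(later, values_by_year[later] - values_by_year[earlier])
            for earlier, later in zip(years, years[1:])]


def nominal(values):
    # Plain side-by-side listing with no implied order (alphabetical here).
    return sorted(values.items())


# A mode switch in the interface would select one of these by name.
MODES = {
    "ranking": ranking,
    "part-to-whole": part_to_whole,
    "deviation": deviation,
    "time-series": time_series,
    "nominal": nominal,
}

# Made-up figures, for illustration only.
sales = {"coffee": 50, "cake": 30, "tea": 20}
by_year = {2008: 10, 2009: 6, 2010: 8}
```

For example, `MODES["ranking"](sales)` orders the three products by sales, while `MODES["part-to-whole"](sales)` re-reads the same numbers as shares of the total.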

In addition, it is important to let users put whatever they want to compare on the screen. There are times when we want to view a single data set from different perspectives or to divide a data set into multiple parts. This is called multiple concurrent views [1]. In the visualization of economic data, trellises and crosstabs are commonly used. Multiple concurrent views can also be combined with all the other analytical interactions, especially “brushing” (discussed under “Highlighting” below).

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009.

Sorting

Different attributes in the database call for different ways of arranging the data. We can arrange records in alphabetical order, as shown in Figure 24-a, or by salary, as shown in Figure 24-b. In most cases both arrangements convey the correct information without misunderstanding. However, it is necessary to give the user a simple way to re-sort the data by various values, especially the values that are featured in the graph [1].


Adding variables

Adding variables is like adding new ideas in the process of analysis. For example, when we are thinking about the revenue per product, we may also want to know the profit per product. There are also more creative combinations, like relating employees’ performance to weather conditions. Stephen Few described the experience of adding variables as “looking for interesting attributes and examining them in various ways, which always leads to questions that we didn’t think to ask when we first began” [1]. When we design such a technique, it is important first to provide a convenient path for users to quickly find the appropriate variable. Second, users should be able to freely delete an existing variable and add a new one. The operation should be as convenient as possible.

Filtering

The purpose of filtering is to remove unnecessary information that distracts from the current task [1]. Filtering functions can be executed with controls, buttons, sliders or direct pointing by users. Filtering is an important technique, especially for information seeking and multiple concurrent views. First, the options in a filter should cover all the data in the database, including data that has not yet been shown on the graph [1]. Second, once a filter has been applied, it is necessary to provide a visible reminder of the applied filters and an easy means to check the filtered-out data.
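The two requirements can be made concrete with a minimal sketch. The `Filter` class, its method names and the sample records are hypothetical and only illustrate the points above.

```python
class Filter:
    def __init__(self, records, attribute):
        self.records = records
        self.attribute = attribute
        # Options come from the whole dataset, not only from what is on screen.
        self.options = sorted({record[attribute] for record in records})
        self.selected = set(self.options)

    def apply(self, keep):
        """Keep only the chosen options and return the visible records."""
        self.selected = set(keep) & set(self.options)
        return [r for r in self.records if r[self.attribute] in self.selected]

    def reminder(self):
        """Text for a visible reminder of what is currently hidden."""
        hidden = [o for o in self.options if o not in self.selected]
        if not hidden:
            return ""
        return f"{self.attribute}: hiding {', '.join(hidden)}"

    def filtered_out(self):
        """Let the user check the data that the filter removed."""
        return [r for r in self.records
                if r[self.attribute] not in self.selected]


# Made-up records, for illustration only.
records = [
    {"region": "Europe", "gdp": 5},
    {"region": "Americas", "gdp": 7},
    {"region": "Asia Pacific", "gdp": 9},
]
region_filter = Filter(records, "region")
visible = region_filter.apply(["Europe"])
```

After this call, `region_filter.reminder()` yields a short label that can stay on screen next to the graph, and `region_filter.filtered_out()` returns the two hidden records.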

Highlighting

Highlighting allows us to see the data of interest and the other data at the same time. This technique creates focus on a subset of data in the context of the whole. The brushing (or brushing and linking) technique highlights the same data simultaneously in several associated graphs (Figure 25). Stephen Few emphasized that, whether in a single chart or in multiple charts, the non-highlighted data elements should remain visible together with the highlighted ones. The highlighted elements cannot stand in for the others in any way. It is important to preserve the integrity of the data set [1].
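Brushing and linking can be sketched as one shared selection that every view reads. The class and the sample records below are hypothetical and are not taken from Few’s book.

```python
class LinkedViews:
    def __init__(self, records):
        self.records = records
        self.brushed = set()   # indices of highlighted records, shared by all views

    def brush(self, predicate):
        """Highlight every record that satisfies the predicate."""
        self.brushed = {i for i, record in enumerate(self.records)
                        if predicate(record)}

    def view(self, x, y):
        # Every record stays in the view; the flag only changes its emphasis.
        return [(record[x], record[y], i in self.brushed)
                for i, record in enumerate(self.records)]


# Made-up country records, for illustration only.
countries = [
    {"gdp": 3, "exports": 1, "population": 10},
    {"gdp": 8, "exports": 4, "population": 50},
    {"gdp": 9, "exports": 2, "population": 20},
]
views = LinkedViews(countries)
views.brush(lambda record: record["gdp"] > 5)
scatter_a = views.view("gdp", "exports")
scatter_b = views.view("population", "exports")
```

Both views return all three records, so the non-highlighted data stays visible, and the same two records carry the highlight flag in each view.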

Aggregating

When we aggregate or disaggregate information, we do not change the amount of information but the perspective from which we view it. When we aggregate data, we view it at a higher level of summarization, of which the overview is one example; when we disaggregate it, we view it at a lower level of detail [1]. Drilling is a special type of aggregating. Drilling involves moving down levels of summarization (and also back up) along a defined hierarchical path [1]. For example, the top of the hierarchy could be the Americas, Europe, and the Asia Pacific, followed by the countries within them, followed by specific regions within each country. The logic of drilling is much simpler, which may make it a better choice for novices.
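Drilling along a fixed path can be sketched as a sum over one hierarchy level, restricted to the path chosen so far. The level names, the function and the sample values are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical three-level hierarchy, from most to least summarized.
HIERARCHY = ["continent", "country", "region"]


def drill(records, path=()):
    """Sum values one level below the given path.

    An empty path gives the top-level overview; each element added to
    the path moves one level down, and removing it moves back up.
    """
    level = HIERARCHY[len(path)]
    totals = defaultdict(float)
    for record in records:
        if all(record[HIERARCHY[i]] == step for i, step in enumerate(path)):
            totals[record[level]] += record["value"]
    return dict(totals)


# Made-up figures, for illustration only.
data = [
    {"continent": "Europe", "country": "Germany", "region": "Bavaria", "value": 3},
    {"continent": "Europe", "country": "Germany", "region": "Berlin", "value": 2},
    {"continent": "Europe", "country": "France", "region": "Brittany", "value": 4},
    {"continent": "Americas", "country": "Brazil", "region": "Bahia", "value": 5},
]
overview = drill(data)
europe = drill(data, ("Europe",))
germany = drill(data, ("Europe", "Germany"))
```

The totals are preserved at every level: the values under Europe in the overview equal the sum of its countries, which reflects the point that aggregating changes the perspective and not the amount of information.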

Re-expressing

Re-expressing means changing the unit of measure [1]. For example, an actual number can be replaced with a percentage, which aids comprehension for novices.

Re-visualizing

This activity pertains only to visual forms of analysis. It involves changing the visual representation in some fundamental way, such as switching from a bar chart to a line graph [1]. A graph should be constructed and reconstructed according to the users’ demands. Being able to switch rapidly and easily between different graphs is very helpful for finding patterns in a data set. In addition, for novices, the software should offer guidance in choosing an appropriate type of graph.
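Such guidance could start from a few simple rules. The sketch below encodes rules of thumb drawn from the graph types discussed in section 3.2; the function, its parameters and the order of the rules are hypothetical, and a real tool would need many more cases.

```python
def suggest_chart(*, hierarchical=False, geographic=False, x_axis="independent"):
    """Suggest a graph type from a coarse description of the data.

    x_axis is "independent" for separate categories, "continuous" for
    values that are meaningfully connected, and "quantitative" when two
    quantitative variables are plotted against each other.
    """
    if hierarchical:
        return "tree-map"
    if geographic:
        return "heat map on a geo map"
    if x_axis == "quantitative":
        return "scatterplot"
    if x_axis == "continuous":
        # A line visually connects neighboring values.
        return "line graph"
    # Independent values are safer as bars, which do not imply a connection.
    return "bar graph"
```

For example, `suggest_chart(x_axis="independent")` returns a bar graph, which matches the usability finding in section 3.2 that connecting independent values with a line can mislead readers.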

Zooming and panning

Zooming and panning both help users to take a closer look at a specific section of a graph [1]. Some points need to be considered when designing them for novices. First, both zooming in and zooming back out should be operated through direct and easy selection. Second, the tool should provide an overview of the entire graph, so that novice users always keep a connection with the whole picture.


Re-scaling

There are two types of quantitative scale. One is the linear scale, which places equal space between equal intervals of value, for example 100, 200, 300. The other is the logarithmic scale. Along a log scale, each value is equal to the value of the previous interval multiplied by a base value, for example 100, 1000, 10000. A classic example illustrates the difference between the two. Figures 26-a and 26-b represent the same set of data; Figure 26-a uses a linear scale while Figure 26-b uses a logarithmic scale [1]. Although hardware and software products both grow at a rate of 1%, in Figure 26-a the growth of hardware seems much better than that of software. This is due to the high price of the hardware products themselves, so Figure 26-a causes misunderstanding at first glance. Re-scaling is more commonly used by experts. For novices, it is better to start from the linear scale because it is easier to understand.
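The effect can be checked with a short calculation. The sketch below uses made-up starting values and an arbitrary shared growth rate (any common rate gives the same result); it compares the distance between consecutive points on a linear axis and on a logarithmic axis.

```python
import math


def steps(series, scale):
    """Distance between consecutive points after applying an axis scale."""
    positions = [scale(value) for value in series]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


rate = 0.10   # hypothetical growth rate shared by both product lines
hardware = [1000 * (1 + rate) ** t for t in range(4)]   # high-priced products
software = [100 * (1 + rate) ** t for t in range(4)]    # low-priced products

linear_hardware = steps(hardware, lambda value: value)
linear_software = steps(software, lambda value: value)
log_hardware = steps(hardware, math.log10)
log_software = steps(software, math.log10)
```

On the linear axis the hardware line rises about ten times as steeply as the software line, although both grow at the same rate. On the logarithmic axis every step equals log10(1 + rate) for both series, so the two lines run in parallel.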


Accessing details on demand

A pop-up box (also called a tooltip) is a good solution for showing supplemental, detailed information about a specific item in a visualization. Users can control its visibility with a click or a mouse-over [1].


Annotating

Annotating allows users to add notes to the data elements in the visualization. A note points to a specific item or a specific location, e.g. a dot on a scatterplot or a specific pattern on a line graph (Figure 27-a & b). When the position of the item changes, the note moves with it (Figure 27-c) [1].


Bookmarking

The steps of a data analysis process normally do not follow a straight line. An analyst may turn right or left, go backward or forward. Therefore, when an analyst arrives at an interesting view during exploration, it is beneficial to store this particular perspective [1]. Storing a perspective covers not only the visualization itself but also the filtering, sorting and other analytical interactions applied to it. Bookmarking plays a special role in analyzing data. Like a bookmark in a book, it allows users to return to points recorded at any time during the analysis without affecting the current status. Users can easily leave the bookmarked point and go back to the current view.
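Storing a perspective can be sketched as saving a copy of the whole view state. The `Workspace` class and the fields of its state are hypothetical and only illustrate the behavior described above.

```python
import copy


class Workspace:
    def __init__(self):
        # The perspective covers the chart and the interactions applied to it.
        self.state = {"chart": "bar graph", "filters": {}, "sort_by": None}
        self.bookmarks = {}

    def bookmark(self, name):
        """Store a snapshot of the current perspective."""
        self.bookmarks[name] = copy.deepcopy(self.state)

    def peek(self, name):
        """Look at a stored perspective without changing the current view."""
        return copy.deepcopy(self.bookmarks[name])

    def restore(self, name):
        """Make a stored perspective the current view again."""
        self.state = copy.deepcopy(self.bookmarks[name])


workspace = Workspace()
workspace.state["filters"]["year"] = 2009
workspace.bookmark("decline in 2009")
workspace.state["filters"]["year"] = 2010      # the analysis moves on
stored = workspace.peek("decline in 2009")
```

The deep copies keep the bookmark independent of later changes, so looking at a stored perspective leaves the current status untouched.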


4. Analytical interaction framework design

The framework is presented as an information visualization in the second book, accompanied by a prototype of analytical interaction.

5. The Impact of Visualization of Economic Data for the General Public

5.1 Positive side

The visualization of economic data is gradually coming into the lives of the general public. As mentioned at the beginning of this thesis, coffee shop owners analyze regional population distribution by age, and job seekers analyze industry distribution around the world. Such practices will not only affect the decisions people make in their lives but also have an impact on the economy.

The scenes I describe here are already becoming reality step by step. Great Britain’s Ordnance Survey is a good example. Its flagship product “OS MasterMap”, launched in 2001, is “an intelligent geospatial database offering definitive, consistent, and maintained referencing to more than 460 million manmade and natural landscape features in Britain” [33]. It allows users to integrate external information, e.g. economic data. Any organization, company or individual is allowed to use the map for different purposes: urban planning, real estate development, environmental science, retail, and more. In addition to the data, OS also offers specific visualization tools for users to analyze the data on the map. OS OpenData products are estimated to deliver a net increase in GDP of between £13 million and £28.5 million over their first five years. Although OS OpenData is not free for the general public (the charge is pretty high), it is, after all, a good kick-off for the popularization of economic data.

Another example comes from New York City. The New York City Business Atlas was developed by the Mayor’s Office of Data Analytics (MODA). The aim of this visualization is to close the market research information gap between small and large businesses in New York City. Through this visualization, small businesses have access to high-quality data about their area, for example the value of taxable sales revenue, population distribution by age, and an estimated business breakdown. The data is first analyzed by professional analysts, which makes it especially useful for neighborhood retail such as restaurants, grocery stores, and dry cleaners. The visualization can be an important basis for their business decisions.

These two examples show that visualization can make economic data useful to the general public, not just to large companies. Because economic data was closed and was complex to collect and analyze, the general public used to be only the producer of data. The advantages of data were held by a small group, namely the big companies and the experts. Through their control of the data, they grew rapidly and developed more means to collect data from the general public. The general public is still vulnerable in the cycle of the data economy. If this cycle is not broken, the gap between the two sides will become larger and larger. Data turns out to be a sword for some people and a black box for others. The popularization of the visualization of economic data can close the gap. First, visualization provides a way for the general public to understand economic data. It helps to reduce the information gap between the two sides, so that the market will likely become more transparent and fair. Second, visualization is also a good way to popularize economic knowledge, for example Hans Rosling’s appealing moving bubble chart of data about income, life expectancy and other indicators. Therefore, the positive impact of the visualization of economic data is to let more people share the benefits of data, narrow the gap between the giants and the dwarfs of the data world, make the market more fair and transparent, and promote economic growth and prosperity.

[33] “Our history.” Ordnance Survey. https://www.ordnancesurvey.co.uk/about/overview/history.html

5.2 Negative side

Everything has good and bad sides. Among all the possible impacts of the visualization of economic data, however, I have not yet found or imagined any negative ones. Therefore, I can only speculate about some negative impacts without a theoretical foundation. In my opinion, if more people became masters of data, they might become more suspicious. They would be obsessed with examining every data source and would take each step of data analysis cautiously. I can picture the Finance Minister giving a speech on stage and announcing: “Based on XXX data, we came to this decision...” The audience (in the park or on the bus) would then expertly check the data and use the visualization tool to see whether he is right and whether there is any exaggeration or concealment. Everyone would comment below the visualization and share the results of their analysis. There would also be an area where the public could vote on the analyses. Does this picture show that technology is making our life better? Isn’t it an interesting way to think?

6. Future Direction and Conclusion

Two factors are important for the development of the visualization of economic data. The first is the visualization system itself. It enhances the readability of economic data, even for members of the general public who have not studied economics. In addition, it encourages certain reasoning processes through interaction, so that users can gain insights from it. The second factor is open data. The two are inseparable: if the visualization is not connected to a real-time database, not only its analytical power but also its credibility will be reduced. Therefore, the readability, the analyzability and the authenticity of data are all significant for the development of the visualization of economic data.

The goal of this thesis is to create a framework that enhances the analytical interaction in the visualization of economic data, especially for the general public, so that they can not only read the data in the visualization but also analyze it. As a result, they can use their insights from the visualization to solve problems or make better decisions. In this way, the general public (the novices) and small businesses can share the benefits of economic data. Judging from the results of the usability test, the outlook for the development of the visualization of economic data is highly positive. I hope this thesis will aid designers who attempt to develop visualizations of economic data for the general public.

REFERENCE

[1] Stephen Few. “Now you see it”. Jonathan G. Koomey. 2009.
[2] Ben Shneiderman, Stuart K. Card. “Readings in Information Visualization: Using Vision to Think (Interactive Technologies)”. Morgan Kaufmann. 1999.
[3] Ware, C. “Information visualization: perception for design”. Morgan Kaufmann. 2012.
[4] Donald A. Norman. “Things that make us smart: Defending human attributes in the age of machine”. Addison-Wesley Publishing Company. 1993. p43.
[5] Nelson Cowan. “Working memory capacity: classic edition”. Library of Congress Cataloging-in-Publication Data. 2016.
[6] Miller GA. “The magical number seven plus or minus two: some limits on our capacity for processing information”. Psychological Review. 63 (2): 81-97, March 1956.
[7] Nelson Cowan. “The magical number 4 in short-term memory: A reconsideration of mental storage capacity”. Cambridge University Press. February 2001.
[8] Baddeley, A. D. “Working Memory”. Clarendon, Oxford. 1986.
[9] Steven J. Luck & Edward K. Vogel. “The capacity of visual working memory for features and conjunctions”. Nature. 390, 279-281, 20 November 1997.
[10] Jacques Bertin. “Graphics and Graphic Information Processing”. De Gruyter. 1981.
[11] Edward R. Tufte. “The visual display of quantitative information”. Graphics Pr, 2nd edition. 2001.
[12] F. J. Anscombe. “Graphs in Statistical Analysis”. The American Statistician, 27 (1): 17-21, February 1973.
[13] Elmqvist, N., Fekete, J.-D., and Dragicevic, P. “Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation”. TVCG: Transactions on Visualization and Computer Graphics. 14(6): 1141–1148, Nov/Dec 2008.
[14] Hans Rosling. “Trendalyzer”. 2007. https://www.gapminder.org/tools/#$chart-type=bubbles
[15] Brian Johnson, Ben Shneiderman. “Tree-maps: a space-filling approach to visualization of hierarchical information structures”. Proceeding Visualization’91. IEEE. 1991.
[16] Isabel Meirelles. “Design for Information”. Rockport Publishers. 2003.
[17] Niklas Elmqvist, Andrew Vande Moere, Hans-Christian Jetter, Daniel Cernea, Harald Reiterer and Tj Jankun-Kelly. “Fluid interaction for information visualization”. Information Visualization, 10(4), October 2011.
[18] William A. Pike, John Stasko, Remco Chang and Theresa A. O’Connell. “The Science of Interaction”. Information Visualization. 8(4): 263–274, 2009.
[19] Stefaan Verhulst, Andrew Young. “The global impact of open data”. O'Reilly Media, Inc. 2016.
[20] The Atlas of Economic Complexity. About. http://atlas.cid.harvard.edu/about
[21] Ricardo Hausmann, César A Hidalgo. “Atlas of Economic Complexity: Mapping Paths to Prosperity”. The MIT Press. 2014.
[22] Miyake, A.; Shah, P., eds. “Models of working memory. Mechanisms of active maintenance and executive control”. Cambridge University Press. 1999.
[23] Diamond A. “Executive functions”. Annu Rev Psychol. 64: 135-168. 2013.
[24] Malenka RC, Nestler EJ, Hyman SE. “Chapter 13: Higher Cognitive Function and Behavioral Control”. In Sydor A, Brown RY. 2009.
[25] McLean, R. S. & Gregg, L. W. “Effects of induced chunking on temporal aspects of serial recitation”. Journal of Experimental Psychology. 74: 455-59. 1967.
[26] John F. Rice. “Ten rules for color coding”. Information Display, 7(3): 12-14. 1991.
[27] Wu Xiaobo. “Agitation for Three Decades: Chinese Enterprises 1978-2008”. 2008.
[28] World Bank. “World Development Report 2009: Reshaping Economic Geography. English PDF, Geography in motion: The Report at a Glance–Density, Distance, and Division xix.”. 2009. https://openknowledge.worldbank.org/handle/10986/5991
[29] World Bank. “GDP (Dollar)”. 2005.
[30] Ben Shneiderman. “Dynamic Queries for Visual Information Seeking”. IEEE Software, 11(6), Nov. 1994.
[31] B. Shneiderman. “Direct Manipulation: A Step Beyond Programming Languages”. Computer, Aug. 1983, pp. 57-69.
[32] M. Egenhofer. “Manipulating the Graphical Representation of Query Results in Geographic Information Systems”. Proc. IEEE Workshop on Visual Language, IEEE CS Press, Los Alamitos, Calif., 1990, pp. 119-124.
[33] “Our history.” Ordnance Survey. https://www.ordnancesurvey.co.uk/about/overview/history.html
