Thursday, May 9, 2013

A Data Mining Project (day 8)

How do I go about visualising these vast streams of varied (raw, derivative, summary, calculated) and complex real-time data?

It is a little counterintuitive, but in theory time-series data does not need to include time information (because it is implied), and event data may sometimes consist of nothing but time-stamps (because the event is implied). More generally, however, devices that produce event and/or time-series data are set up to provide as much information as possible, and so the two end up looking a lot like each other. The distinction is important, however: time-series data is the bread and butter of data analysis, and event data is the jam. A time series shows how processes behave over time (temperature, pressure). Event data warns you when things of interest appear to happen (think of alarms, limits, state transitions, a choreographed sequence of technology performance).

Are there some exceptionally good ways to visualise time series and event data?
Take a weather station as an example of a system that can generate a broad range of time-series and event data.
If the system captures an outside temperature reading every 10 minutes, then the sequence of readings [10.9, 11.1, 11.0, 11.0] can be associated with local weather station times [10:45, 10:55, 11:05, 11:15] to constitute time-series data.
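The idea that time is implied by a fixed sampling interval can be sketched in a few lines of Python. This is only an illustration: it assumes a 10-minute interval and a first reading at 10:45, and the function name `attach_timestamps` and the calendar date are placeholders of mine, not part of any weather station software.

```python
from datetime import datetime, timedelta

def attach_timestamps(readings, start, interval):
    """Pair each reading with the time implied by its position in the sequence."""
    return [(start + i * interval, value) for i, value in enumerate(readings)]

# Readings from the example above; the date is an arbitrary placeholder.
temperatures = [10.9, 11.1, 11.0, 11.0]
series = attach_timestamps(
    temperatures,
    start=datetime(2013, 5, 9, 10, 45),
    interval=timedelta(minutes=10),
)

for timestamp, value in series:
    print(timestamp.strftime("%H:%M"), value)
```

Storing only the start time and the interval is enough to rebuild every time-stamp, which is why the bare list of values still counts as a time series.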
The occurrence of rainfall could be treated as an event, as could a peak wind gust speed or barometer value.
Other elements of the 'whole system' may also produce event data. The weather service on the computer controlling the weather station receiver records the times at which a range of application behaviours occur: for example, the time the WeatherStation service started (10:37:04), the time an Application Error was generated (10:38:17), the shutdown time, and so on.
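A rough Python sketch of event data follows. It shows events as time-stamped records, and one way an event might be derived from a time series when a value crosses a limit. The `Event` class, the `gust_events` function, the gust readings and the limit of 30.0 are all invented for illustration; only the two service times come from the example above.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class Event:
    at: time
    source: str
    name: str

# Application events, using the times quoted in the example.
service_log = [
    Event(time(10, 37, 4), "WeatherStation service", "started"),
    Event(time(10, 38, 17), "WeatherStation service", "Application Error"),
]

def gust_events(samples, limit):
    """Emit one event each time the gust speed rises to or above the limit."""
    events = []
    above = False
    for at, speed in samples:
        if speed >= limit and not above:
            events.append(Event(at, "anemometer", f"gust >= {limit}"))
        above = speed >= limit
    return events

# Made-up gust speeds; the unit is whatever the sensor reports.
gusts = [
    (time(10, 45), 12.0),
    (time(10, 55), 31.5),
    (time(11, 5), 33.0),
    (time(11, 15), 9.0),
]

# Merge both sources into a single timeline ordered by time.
timeline = sorted(service_log + gust_events(gusts, limit=30.0), key=lambda e: e.at)
for event in timeline:
    print(event.at, event.source, event.name)
```

Only the crossing itself is recorded, so two consecutive readings above the limit produce a single event rather than two.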

Back to the question: what are some exceptionally good ways to visualise time series and event data?

In attempting to respond I feel it is important to identify what data visualisation tries to do. I think of it in terms of three domains: simulation-modelling-representation (usually focused on communicating state/status), data summaries (classic chart types that aggregate data), and what might be termed raw data (plots of data points, tabulated data, lists, node data). Most visualisation tools appear to include all three areas. I think it is useful to consider the temporal flow of a visualisation as a fourth domain. The temporal aspect of using a visualisation ties it all together in a way.

Another way of looking at approaches to data visualisation is to consider a kind of representational spectrum. It starts from bare data, possibly even raw signal traces, and follows the various necessary 'moves' that transform signal into codes, then numbers or values, summaries, aggregates, timeplots and graphs, right through to skeuomorphic realism or analogues displaying what the data is intended to represent.

This wind visualisation tool is a contender for the most elegant and beautiful presentation of wind direction & speed data. The tool was created by Fernanda Viegas and Martin Wattenberg of Google’s “Big Picture” research group. They describe the wind map as 'a personal art project' (see the article in Wired magazine; http://www.wired.com/geekdad/2012/04/google-wind/).

Screenshot from http://hint.fm/wind/index.html
This visualisation employs a minimalist visual vocabulary to depict the speed and direction of wind on a map of the US mainland. The visual vocabulary consists of a monochrome grey map combined with a particle animation of wind strength that also conveys the idea that wind has small variations of strength and direction. The map works well at all levels of scale, with relevant landmark centers appearing at each zoom level (although the animation density motif starts to fail at high levels of zoom).
The wind map is divided into two zones, the map and the keys. This representation employs two models (the geographical map and the wind animation) and one aggregation (the legend explaining the wind animation). The 2D cartographic representation, with North at the top, also informs the wind animation to suggest wind direction. This visualisation is visually striking but is also relatively weak at conveying meaningful, actionable information. What is the actual average wind direction or wind speed at some point? Hovering the pointer over a point prompts a small window displaying wind speed and lat/long for that point. What sources, sensors and devices produced the data? How reliable is it? The same questions (and challenges) can be asked of our traditional weather maps.

The Siemens Stratos traffic management infrastructure system offers hybrid interfaces that combine, by juxtaposition, time-series and event data with spatial visualisation (www.siemens.co.uk).
Image source www.siemens.co.uk
The traffic system employs a geographical map and route markers. In this case the route marker is the key to the information panel below. The map in this view utilises colour and visual cues like icons to denote the location of primary and secondary routes, landmarks, land use, and instrumentation. The information panel below employs two different data aggregate panels, one charting a calculated average journey sampled minute by minute over the whole day, the other displaying a combination of summary values (e.g. current average speed) and empirical data (15 vehicles on the road segment) for the current time.

Energy monitoring software for organisations running via cloud services can track power usage via software agents installed on IoT and other computing devices. Typical consumer dashboards display time/value graphs and data summary widgets.

Designs should apply a limited, graduated colour scheme consistently to identify and relate different elements and so enhance comprehension. Use chart colour variation to highlight extremes. Use colour cues to indicate action prompts. Action prompts range from those that require an immediate response to more subtle communicative devices intended to influence the longer-term behaviour of consumers. Muted colour schemes invite attention and enquiry rather than demand an immediate response.

Device form factor limiting user interaction (i.e. tablet, smartphone etc.)
Consider a multi-mode control panel. Display multiple dashboards (central consoles) by selecting side buttons/tabs. Side buttons/tabs present on all screens act as navigation controls. The simplest version of this is the main-menu/drill-down/drill-up style. Combinations of different side/tab selections also offer possibilities for exploring data using overlays and filters.
For example: time tabs along the bottom edge (daily, weekly, monthly, quarterly, annual), site/location tabs along the left side, report type tab along the top edge, and a drill down or edit panel on the right side. The central display renders the relevant summary views but also includes a relevant time-series graph. This mode makes good use of a limited spatial palette while also being amenable to touch interaction.

Machine utility activity footprints. 
Power Analytics bridges between simple summary style data that you'd expect to see on a tablet dashboard and detailed reports custom designed for an operations centre display-wall. The illustrations below from the Paladin tool highlight how they have linked ladder control models with the information design problem.
Power Analytics Paladin environment
The tool simulates a ladder logic design environment, coupled with typical window configuration screens, to generate simulation models. Data can be rendered using classic analogue controls and/or cell based report tables. The display elements visualise data sources connected to the model and can be combined with other display elements like webcams.

DPR has produced an impressive building dashboard for its new headquarters, a refurbished building in Phoenix, Arizona. The dashboard has an arrangement optimised for web device interaction similar to LinkCycle's above. The central graph (histogram, trendlines, datapoint chart, roll-up summary etc.) is configured from the LHS/RHS and bottom edge tabs; illustrations below (link).
DPR's Building Dashboard
Kongsberg's SiteCom suite suggests some interesting interfaces for displaying and analysing multiple, varied data sources.

Consider also the various images of the displays for time-data recording and interfaces for managing machine performance from Brüel & Kjær.



Thursday, May 2, 2013

Visits versus visit duration

Let's plot two metrics against each other in the Audience Overview:
Visits vs. Avg. Visit Duration (diagrams below).
Why look at the number of visits versus visit duration? This comparison might highlight specific styles of site engagement. Have a look at the following report from a low-traffic site...

Three distinct kinds of site involvement are evident:


  1. Involved readers, high engagement; a user who spends a long time reading through a small number of pages (circled in red).
  2. Skimming or surface engagement; a user who visits a few pages quickly and stays only a short time (circled in orange).
  3. Sticky or correlated readers; the longer the visit the more pages they view (circled in green).
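
The three styles can be expressed as a rough rule of thumb in Python. This is a sketch only: it assumes per-visit records of pages viewed and seconds on site, whereas the report itself shows aggregated values. The thresholds (3 pages, 300 seconds), the function name `engagement_style` and the sample visits are arbitrary choices made for illustration.

```python
def engagement_style(pages, seconds, few_pages=3, long_visit=300):
    """Label one visit using arbitrary, illustrative thresholds."""
    if pages <= few_pages and seconds >= long_visit:
        return "involved"      # long read of a small number of pages
    if pages <= few_pages:
        return "skimming"      # few pages, short stay
    if seconds >= long_visit:
        return "sticky"        # many pages over a long visit
    return "unclassified"      # many pages in a short time fits none of the three

# Made-up visits as (pages viewed, seconds on site).
visits = [(2, 640), (3, 45), (9, 900), (8, 60)]
for pages, seconds in visits:
    print(pages, seconds, engagement_style(pages, seconds))
```

Fixed thresholds only approximate the 'sticky' group, which is really defined by visit length and page count rising together.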

Figuring out who these readers are is the tricky bit.
Note, however, that you may not be able to draw the same conclusions for a high-traffic site, as different styles of reader behaviour will disappear into the averages.
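
A small Python illustration of how an average can hide distinct behaviours, using invented visit durations (in seconds) for two hypothetical groups of readers:

```python
from statistics import mean

# Made-up visit durations in seconds for two hypothetical reader groups.
involved = [620, 700, 655]
skimming = [30, 45, 20, 25, 40, 35]

blended = mean(involved + skimming)
print("involved:", round(mean(involved)))
print("skimming:", round(mean(skimming)))
print("all visits:", round(blended))
```

The blended figure sits between the two groups and describes neither of them, which is what happens when many reader types are rolled into a single site-wide average.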
