Our first meeting for 2015 was our best-attended event so far, possibly boosted by being part of Big Data Week. The theme was “Big Data Comes to Town”, and 86 people came along to hear three engaging presentations.
KPMG provided an excellent venue and excellent hospitality, for which we are most grateful.
Data Quality Challenges to Big Data: a Practical Insight
Steve Latham and Hugo van Hoogstraten, both of KPMG, highlighted that data quality underpins the value of big data. Without good quality control of the base data, interpretations of that data are less likely to be meaningful. As Steve and Hugo showed, in the worst case it can be disastrous: their example was the fitting of an incorrect fuel gauge in an aircraft, which led to a catastrophic crash.
While 92% of people believe that master data is extremely important to core business operations, 39% don’t trust that it is correct. Without that trust, it is difficult to make business decisions with confidence. Yet few organisations seem to have good data quality governance regimes.
One of the challenges is to convince senior management of the importance of effective data governance.
Analyse what? The future of Big Data Analytics
Mark Stevens, from new startup Skrydata, noted that data by itself doesn’t “inform”. Traditional analysis tools are designed to help the brain get to the Aha! moment: they tend to focus on visualisation, statistics, and forming then testing hypotheses. However, with large and diverse data there are many more possible permutations, and so rather than Aha! we get Aargh!
The transition in approach is to let the data speak for itself. New tools and techniques can look for significant factors amongst the data without having to form hypotheses beforehand. Forming hypotheses tends to introduce our own biases, and rule-based interpretations tend to miss patterns we haven’t thought of.
Mark’s case studies showed improvements to business processes that resulted from applying these new techniques to big data sets. One case study involving credit card fraud was particularly interesting. Traditionally, the approach is to look for predefined patterns of activity. In this case they let the data itself disclose unusual patterns, and from this were able to detect fraud that had previously gone undetected. Another example showed tools being used to find discrepancies in the data without building preformed rules.
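The idea of letting the data disclose its own unusual patterns can be sketched with a simple statistical outlier check. This is a toy illustration only, not Skrydata’s actual method (which the talk did not detail): no fraud “rules” are supplied up front; instead, a transaction is flagged simply because it deviates strongly from the rest of the sample.

```python
import statistics

def flag_anomalies(values, threshold=2.5):
    """Flag values that deviate strongly from the rest of the sample.

    No predefined fraud patterns are supplied: the data itself defines
    what is "unusual", via a simple z-score cut-off. (Illustrative only;
    real anomaly-detection techniques are far more sophisticated.)
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no variation, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Mostly routine transaction amounts, plus one that stands out.
amounts = [42, 38, 45, 40, 41, 39, 43, 44, 37, 900]
print(flag_anomalies(amounts))  # → [900]
```

The point of the sketch is the inversion Mark described: rather than encoding what fraud “looks like”, we ask the data which observations don’t fit.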
Mark’s challenge to us was not to focus on what we want the data to tell us, but rather on the question we’re trying to solve, and then let the data tell us what it can.
Big Data, the SKA architecture, and The Internet of Everything
Gary Hale, from CISCO, pointed out that Australia is a leader in the uptake of cloud technologies and in the adoption of new technology and innovation, and that there are huge potential economic gains from process improvements enabled by using data more effectively. Yet our investment in R&D is among the lowest of the OECD countries, at less than 4%.
Gary noted that the Internet of Things is predicted to grow from 6 billion devices today to over 50 billion devices in the next five years. Most of these devices include sensors of some sort, so we face potentially exponential growth in the data available to be used.
A new way of thinking about big data architecture is based on the work CISCO has done with the Square Kilometre Array project. Partly this will see more processing power move closer to the sensors, giving them capacity for localised decision-making, enough processing power to reduce the amount of information transmitted, and the ability to be configured dynamically. Gary calls this sensor-and-switching network component of the architecture the “fog”, in contrast to the cloud, which involves high-capacity networking, large data storage, and super-computing. Additionally, there is an important third part: the collaboration cloud, which allows people scattered around the world to work together.
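To make the “fog” idea concrete, here is a minimal, hypothetical sketch (the function name and threshold are mine, not from the talk) of a sensor doing local processing so that only meaningful changes cross the network, rather than every raw reading:

```python
def edge_filter(readings, delta=0.5):
    """Transmit a reading only when it differs meaningfully from the
    last value transmitted -- a toy version of pushing processing out
    to the "fog" so less raw data travels to the cloud."""
    transmitted = []
    last = None
    for r in readings:
        if last is None or abs(r - last) >= delta:
            transmitted.append(r)
            last = r  # remember what the cloud has already seen
    return transmitted

# A temperature sensor sampling often, but changing slowly.
readings = [20.0, 20.1, 20.05, 21.0, 21.1, 25.0, 25.05]
print(edge_filter(readings))  # → [20.0, 21.0, 25.0]
```

Seven samples become three transmissions; scaled to tens of billions of devices, this kind of localised decision-making is what keeps the data volumes manageable.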