Sipping at the firehose…

We recently completed a short project for part of Oxford University’s Radcliffe Department of Medicine, helping them with their analysis of attitudes to cancer medicine. The work involved identifying technology assessments of various cancer treatments made by NICE, the National Institute for Health and Care Excellence, extracting the relevant information, and summarising it in a single document that the researcher could browse easily and use to build up a coherent picture.

The purpose of this post is not to go into the details of the work itself – the technical aspects of the scraping are the subject of a subsequent post for those who are interested. Instead, it is to emphasise how simple the task was and to consider the implications for smaller businesses. Data is all around us – both within and outside the walls of the company – and although we believe that some of the more extreme claims for the ‘Big Data Revolution’ are somewhat fanciful for all but the biggest companies or smartest start-ups, we at Golden Orb do think that all businesses can benefit from the structured capture and analysis of the data available to them. In doing so, they can take advantage of many of the tools and techniques that have been built up to deal with Big Data. This was the subject of our 2015 talk on Big Data for the smaller business at the Technology for Marketing and Advertising show, in which we argued that with so many free and low-cost tools and data sources available, even small and medium-sized enterprises (SMEs) can take a sip at the Big Data firehose without it costing an arm and a leg.

A good place to start is gathering information on customers, competitors and market trends from what is freely available on public websites. Languages such as Python make it almost embarrassingly easy to visit large numbers of web pages and pull the data into a database, from where it can easily be viewed, summarised and acted upon. Some sites, such as Google and Twitter, offer formal APIs (published interfaces for retrieving data without scraping) that make extraction even easier. These APIs typically impose sensible usage limits of their own; for bespoke scraping, it is essential to observe some common-sense rules of etiquette, as the sketch below illustrates. Launching thousands of simultaneous requests at a website at all hours of the day and night is a sure way of being identified as a nuisance and having your traffic blocked.
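To give a flavour of how little code is involved, here is a minimal sketch of polite scraping in Python, using the widely available requests and BeautifulSoup libraries. The URLs, the User-Agent string and the CSS selectors are all hypothetical placeholders for whatever the target pages actually use; the pause between requests is the etiquette point.

```python
import time

import requests
from bs4 import BeautifulSoup

# Identify yourself honestly so the site owner can see who is calling.
HEADERS = {"User-Agent": "ExampleResearchBot/1.0 (research@example.com)"}

# Hypothetical pages to visit; a real list might come from a site map.
PAGES = [
    "https://example.com/products?page=1",
    "https://example.com/products?page=2",
]

def fetch(url):
    """Fetch one page, failing loudly on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text

for url in PAGES:
    soup = BeautifulSoup(fetch(url), "html.parser")
    # The selectors below are assumptions about the page structure.
    for item in soup.select(".product"):
        name = item.select_one(".name")
        price = item.select_one(".price")
        if name and price:
            print(name.get_text(strip=True), price.get_text(strip=True))
    time.sleep(5)  # a polite pause: one request at a time, never a flood
```

In practice the results would go into a database rather than to the screen, and the target site’s robots.txt and terms of use should be checked before scraping at all.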

Whilst our own research into somewhat abstruse ideas like sentiment analysis suggests that those techniques still require some work, there are many simpler practices that can help smaller businesses to learn more about their customers and competitive environment without entering the realms of advanced computing and mathematics. Collecting regular pricing data and information on promotions from key competitors can give early warning of potential threats, leaving more time to put together a response. Monitoring what consumers are searching for on Google (colours, styles, brands etc.) can surface trends and fashions early, helping canny businesses to steal a march on their competitors.
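For regular collection to be useful, the observations need to accumulate somewhere they can be compared over time. A minimal sketch, assuming a local SQLite file and illustrative values rather than real competitor data:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect("prices.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS prices (
           observed   TEXT,   -- ISO date of the observation
           competitor TEXT,
           product    TEXT,
           price      REAL
       )"""
)

def record(competitor, product, price):
    """Append today's observation; history builds up run by run."""
    conn.execute(
        "INSERT INTO prices VALUES (?, ?, ?, ?)",
        (date.today().isoformat(), competitor, product, price),
    )
    conn.commit()

record("CompetitorA", "Widget", 19.99)  # illustrative values only
```

Run daily (by cron or Windows Task Scheduler, say), a table like this is enough to chart trends and to feed the exception reports discussed below.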

At a small scale, this is easily manageable by SMEs that have the desire and management support to become more data-driven. However, the data volumes collected in this way can quickly become overwhelming, even if they never reach the levels usually associated with the term Big Data. It is therefore important to think through exactly what you hope to get out of the data. That determines how it should be presented, and it can also lead to automated exception reporting that immediately spots outliers – a price drop of more than 10% by a competitor, for example.
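To illustrate, a minimal sketch of that 10% check – the figures are made up, and in practice the old and new prices would come from the stored history above:

```python
def flag_price_drops(observations, threshold=0.10):
    """Return the products whose price fell by more than the threshold.

    observations: iterable of (product, old_price, new_price) tuples.
    """
    return [
        (product, old, new)
        for product, old, new in observations
        if old > 0 and (old - new) / old > threshold
    ]

# Illustrative figures: only the Widget breaches the 10% threshold.
observations = [("Widget", 20.00, 17.50), ("Gadget", 50.00, 48.00)]
for product, old, new in flag_price_drops(observations):
    print(f"ALERT: {product} dropped from {old:.2f} to {new:.2f}")
```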

The key message, as with most of our data work, is that this is not primarily about spending a lot of money on technology. It is about taking the time to think about what you would most like to know, and how it would need to be presented for you to take advantage of it. The technology is not the answer; it is simply a tool. In fact, a simple data-gathering and analysis strategy can typically be put together and implemented in a matter of weeks. See, for example, our case study on work for Epson, which involved gathering data from their online marketplace, or our forthcoming post on micro-IT.