We live in unprecedented times where, thanks to technology, any individual or organization can finally start taking advantage of data to improve its operation, through Data Science.
Like never before, we have access to an enormous amount of data and information that is just waiting to be analyzed and interpreted, through a now mature and widespread set of methods.
Not taking advantage of this opportunity is a big risk, as it gives competitors a chance to leap ahead of any of us.
Data Proliferation & Super Computers – The Perfect Storm
Why now? For two reasons: data creation has exploded in the last 10-15 years, and we now have computing power that was unimaginable even 5 years ago.
Let us give you a few examples to put things in perspective.
Talking about data creation, at the 2010 Techonomy event, Eric Schmidt (then CEO of Google) mentioned this astonishing fact: “since the dawn of civilization until 2003, there were 5 exabytes of information created; now [editor's note: in 2010], the same amount is created every two days.” (https://youtu.be/UAcCIsrAq70).
Schmidt was not far off. A report by IBM last year seemed to confirm these estimates for the year 2016 (https://www.iflscience.com/technology/how-much-data-does-the-world-generate-every-minute/). In 2018, we are already seeing estimates around 4-5 exabytes per day. To see it from another perspective, about 90% of all the world’s information has been created in the last two years alone. And that trend is only accelerating: some estimates foresee data creation increasing as much as 40 times by the years 2020-2022.
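To put these figures side by side, here is a back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the constant and function names are ours, and the result is a rough comparison of estimates, not a measurement.

```python
# Figures quoted in the text above (estimates, not measurements).
EXABYTES_PER_TWO_DAYS_2010 = 5.0   # Schmidt's 2010 statement
DAILY_ESTIMATE_2018 = (4.0, 5.0)   # low and high estimates, exabytes per day


def daily_rate_2010() -> float:
    """Convert '5 exabytes every two days' into exabytes per day."""
    return EXABYTES_PER_TWO_DAYS_2010 / 2


def growth_factor(estimate_2018: float) -> float:
    """How many times larger a 2018 daily estimate is than the 2010 daily rate."""
    return estimate_2018 / daily_rate_2010()


if __name__ == "__main__":
    low, high = DAILY_ESTIMATE_2018
    print(f"2010: {daily_rate_2010():.1f} EB/day")
    print(f"2018: {growth_factor(low):.1f}x to {growth_factor(high):.1f}x the 2010 rate")
```

In other words, the 2018 estimates correspond to roughly 1.6 to 2 times the daily rate Schmidt quoted in 2010.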
While it could be fun to continue sharing these whopping numbers and sensational trivia, perhaps a note should be made on why this is happening now. Most will immediately think, and rightly so, about the Internet and the explosion of social media and consumer-generated content. Certainly these played a big part, and we do indeed get plenty of amazing new content from Facebook’s post activity, YouTube’s uploaded videos and Instagram’s published pictures. But this is just the tip of the content iceberg.
Today, though, data is created mostly automatically by billions of technology devices all over the globe (and beyond). Sensors on every smartphone, GPS on cars and trucks, CCTV cameras, Internet-of-Things devices, banks’ simulators of stock and market trends, and any electronic device you can think of, will generate (and share) a significant amount of information that can be used by us, or others.
And if on one side we have literally lost control over what (and how much) data gets created every second around the globe, on the other side we can rely on an unprecedented computing capacity (and it’s not a human capacity; well, not a direct one at least). Our dear computers are taking the lead in collecting, organizing and processing data that we asked other machines to create in abundance.
While analysts and scientists are becoming bolder about admitting that the predictions by Moore (i.e., the so-called “Moore’s Law”) might seem anachronistic now, computers have so far done their part. Thanks to continued technical improvements both in hardware and software over the last 20-30 years, computers can still manage the exponential growth of data they receive by the second, and process it in a meaningful way. Furthermore, services such as Amazon Web Services, Google Cloud and others are enabling anyone to access the computing power needed, at any time and for any purpose.
In short, the “Perfect Storm” is finally here, and we are navigating right towards its core. Do we risk losing control of our “boat” in this storm? If we don’t prepare for it, sure we do. But, if you are reading this post, you are doing your part to learn how to navigate in this new weather.
What is Data Science?
The best way to understand what Data Science is, is to consider it the intersection between mathematics (and its algorithms), computer science and the necessary expertise in the specific domain where it needs to be implemented.
Are you running operations in a factory? Then you must instruct computers (i.e., computer science) with specific algorithms (i.e., mathematics) on how to analyze production reports collected by the machines in order to balance the production lines (i.e., domain expertise).
Do you manage an advertising agency? Then you need to use proprietary or other analytics platforms (i.e., computer science) and determine which KPIs are best to track (i.e., mathematics) in order to monitor advertising effectiveness on your digital campaigns (i.e., domain expertise).
The process starts by collecting data (structured or unstructured) from many sources and in many forms, then organizing it in the form of knowledge, and even interpreting it in the form of insight. If an organization is capable of mastering Data Science this way, it will have a significant competitive advantage in today’s age.
Now, this is not a new activity. Companies have been doing so for at least 30 or 40 years, with their Business Intelligence division (or equivalent units). What is changing the rules of the game nowadays is the amount and type of data available. This new phase of data creation and availability is called “Big Data”, and Big Data is best analyzed through Data Science.
There would be much to say about what “Big Data” really is, how we detect, identify, organize and process it, but we will leave these topics for another post. Here, we want to focus on the right steps to get started taking advantage of this ‘hot’ field of study.
How Companies Can Take Advantage of Data Science
OPERATIONS & CUSTOMERS
If you have not started bringing your organization up to speed on this field and the related management approach, it’s certainly time to catch up. The applications and benefits are practically unlimited.
Nowadays Data Science applications are in every imaginable field of operations of any organization, in any market or business. Learning how to collect data that matters, and knowing how to interpret it, can help you in two key ways: boosting your business and increasing your margins.
You can boost your business by making your business development activities more effective: measure conversion rates, effectiveness of marketing campaigns, customer satisfaction, and so on.
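As a minimal sketch of what "measuring conversion rates" can look like in practice, the Python snippet below ranks marketing campaigns by conversion rate. The campaign names and numbers are made up for illustration, and the `Campaign` class and both functions are hypothetical helpers of ours, not part of any specific analytics platform.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    visitors: int      # people who reached the campaign page
    conversions: int   # visitors who completed the desired action


def conversion_rate(campaign: Campaign) -> float:
    """Share of visitors who converted; 0.0 when there were no visitors."""
    if campaign.visitors == 0:
        return 0.0
    return campaign.conversions / campaign.visitors


def rank_by_conversion(campaigns: list[Campaign]) -> list[Campaign]:
    """Return campaigns sorted from highest to lowest conversion rate."""
    return sorted(campaigns, key=conversion_rate, reverse=True)


# Made-up example data, for illustration only.
campaigns = [
    Campaign("newsletter", visitors=2000, conversions=90),
    Campaign("social_ads", visitors=5000, conversions=100),
    Campaign("search_ads", visitors=1000, conversions=60),
]

for c in rank_by_conversion(campaigns):
    print(f"{c.name}: {conversion_rate(c):.1%}")
```

Note that the campaign with the most visitors is not necessarily the most effective one: in this invented data set, the smallest campaign converts best.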
But you can also increase margins, by knowing better which type of customer would pay for what. Furthermore, by looking under the hood, you can learn how to optimize your operations: Where can you cut costs? How can you increase production and yield? In which ways can you make your supply chain more efficient? And so on.
The applications of Data Science are practically unlimited. As a saying often attributed to management thinker Peter Drucker goes: “If you can measure it, you can improve it!” And today, we measure almost everything we can think of.
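To make the margin question concrete, here is a small sketch that computes gross margin per customer segment from a list of orders. The segments and amounts are invented, and `margin_by_segment` is a hypothetical helper; a real analysis would read this data from your own sales and cost systems.

```python
from collections import defaultdict

# Made-up orders: (customer segment, revenue, cost). Illustration only.
orders = [
    ("enterprise", 1200.0, 700.0),
    ("enterprise", 800.0, 500.0),
    ("small_business", 300.0, 240.0),
    ("small_business", 200.0, 160.0),
    ("consumer", 100.0, 90.0),
]


def margin_by_segment(rows):
    """Return {segment: gross margin as a fraction of revenue}."""
    revenue = defaultdict(float)
    cost = defaultdict(float)
    for segment, order_revenue, order_cost in rows:
        revenue[segment] += order_revenue
        cost[segment] += order_cost
    # Skip segments with no revenue to avoid dividing by zero.
    return {
        segment: (revenue[segment] - cost[segment]) / revenue[segment]
        for segment in revenue
        if revenue[segment] > 0
    }


ranked = sorted(margin_by_segment(orders).items(), key=lambda kv: kv[1], reverse=True)
for segment, margin in ranked:
    print(f"{segment}: {margin:.0%}")
```

Even a simple breakdown like this one can show which customer types contribute most to your margins and where costs deserve a closer look.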
‘DATA CULTURE’ FIRST
If you are serious about changing your organization so that it puts Data Science in the spotlight and gives it the importance it deserves, first you need to work on the culture of that organization. Change will not occur if people are not open to it and do not believe it will be an improvement for all.
Internal culture on the topic of Data Science is of paramount importance: before you can do something with data, you need to understand what you are doing and why you do it.
It’s certainly an ongoing process, and every new read or video helps. Top education institutions (including Stanford, Harvard and many others all across the globe) now offer full-time degrees in it. If you do not have time to go back to full-time school, then you should look for other learning opportunities. Certainly the web can help a lot. Below, in the reference section, we are happy to share the URLs of some resources that we found extremely useful to get started.
The second step is getting the resources to enable this cultural change within your organization. The resources you need (which do not necessarily need to be owned or controlled by your company) are people, tools (hardware and software) and, of course, data: the right kind, and lots of it.
RESOURCES: PEOPLE, DATA & TOOLS
Either you, or someone on your team, will need to lead this internal cultural transformation. That person will be the first Data Science Advocate in the team. It all starts from this person. She or he does not need to be an expert in the field, but certainly needs to have a good understanding of the value that Data Science can bring to the business or operations. S/he will be the first “sponsor”. Furthermore, this person will lead the Data Science team, once the organization gets to that point.
Now, building a team dedicated to Data Science would require a post in itself. Data Science, when done well, requires a set of skills that are difficult (if not impossible) to find in a single individual. Certainly a pivotal figure is the Data Scientist, considered one of the “sexiest” jobs today (see the Harvard Business Review article in the references below).
This instructive image shows the main skills that a Data Scientist should have. Not an easy profile to find, huh?
But even in the lucky event of finding the perfect job candidate, a Data Scientist alone would not work the “magic” needed in your organization. You will probably need software engineers, platform experts (for the tools we will talk about in just a couple of paragraphs), system admins, mathematicians and statisticians (who will focus on the algorithms) and finally business intelligence experts who will understand how to interpret the results and what actions to suggest to management to extract benefit from these new practices.
But let’s not panic yet. First things first. As long as there is a conscious decision by an organization to invest in Data Science practices, the ball will start rolling. It can then be decided whether to immediately build an internal team with all the needed resources, or to work with other companies or individuals.
Once the people are (or at least the key person is) in place, it will be time to evaluate what data the organization can access, and what value could potentially be extracted from it. This is obviously a key resource to have. As with the other resources, it is not necessary to own this data, even though we should consider that it could be of strategic importance and perhaps even a killer competitive advantage. Surely, in the initial phases, the value that you imagine could be extracted from the available data will still be hypothetical, but it will be a starting point. The actual analysis of the data based on the formulated hypotheses will get the process started. From that moment on, it will be a matter of continuous iteration and interpretation.
Finally, the third key resource to get the work done is a set of tools that can process the data, and hence perfect the algorithms you will decide to use to analyze it. You’ll need anything from storage drives to powerful CPUs, and if you are a small or medium enterprise, fortunately you can find most of these in the cloud, as SaaS (Software-as-a-Service) or PaaS (Platform-as-a-Service) solutions. You will need specific software and tools (e.g., Hadoop, AWS, Cloudera, MongoDB, Google BigQuery, Hive, Spark, etc.) and someone to master them.
Always remember though: these are just tools! So do not even consider getting your hands on any of them (which can be quite expensive, by the way) if you have not already decided which team will work on them and which type of (available) data will be processed.
In short: You need people (the right ones!), data and tools to get started in this realm. Then you need to get operations started by always coordinating and balancing these 3 types of resources. If you do not tick these boxes, it will be difficult to have a valuable Data Science operation within your organization.
OUTSOURCE OR BUILD WITHIN THE ORGANIZATION
The question of whether to outsource or build within the organization follows naturally, given the multiple resources needed. Unfortunately, there is no one-size-fits-all solution. Some companies will invest immediately to have everything in-house, while other enterprises will opt for less investment-heavy approaches, which might be less risky in the short term, but perhaps more dangerous strategically.
Depending on your company strategy, you might want to outsource some parts of the process. Often, organizations tend to outsource the collection, storage and processing of data. Less often, and we can understand the strategic reasons behind this decision, they outsource the interpretation of the data-analysis results, which is usually kept within the organization. Over time, as the organization gains know-how in the field of Data Science, the number of tasks handled internally increases, and only the most mechanical and least strategic ones remain outsourced.
Our team at Exa Futures is passionate about these topics. If you want to reach out to us for a conversation or to know more about what we do, we will be happy to meet you. Contact us at info [at] exafutures.com and we will get back to you promptly.
Please let us know in the comments if you have any questions or feedback on this article, or if you would like more information about a specific topic. And if you find this topic interesting, share it with someone who would appreciate it.
References
- Data Science (Wikipedia): https://en.wikipedia.org/wiki/Data_science
- 10 Key Marketing Trends in 2017 (IBM): https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=WRL12345USEN
- How Much Data does the World Generate Every Minute (IFLScience): https://www.iflscience.com/technology/how-much-data-does-the-world-generate-every-minute/
- Data Scientist – The Sexiest Job of the 21st Century (Harvard Business Review): https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
- The 9 Best Free Online Big Data and Data Science Courses: https://www.forbes.com/sites/bernardmarr/2017/06/06/the-9-best-free-online-big-data-and-data-science-courses
- Eric Schmidt at Techonomy 2010: https://youtu.be/UAcCIsrAq70
- Re-interpreting Moore’s Law (Electronic Engineering Journal): https://www.eejournal.com/article/re-interpreting-moores-law/