June 30, 2016
Yaniv Mor, the CEO of Xplenty, is witnessing a troubling trend, spawned in part by the rise of connected devices and incredibly affordable storage options. He calls it “data hoarding.”
“This is the new plague spreading among data-driven companies in the last three years or so,” he said. “Storage prices—especially for cloud storage—have come down considerably. So these companies think they need to collect every possible data point they can get their hands on. Their view is, they can extract useful insights from the data whenever they need to. The issue is, they now have lots of data and find it very difficult to extract insights from it.”
On Amazon Web Services, cloud storage pricing begins at $0.03 per GB per month. At Google Cloud Storage, pricing begins at $0.026 per GB per month, and rates drop further at larger capacities. At these prices, a health-monitoring device maker might consider storing all users’ heart rates, daily calorie counts, meals consumed, and jogging distances for future analysis. Similarly, a smart-home device maker might collect residents’ temperature preferences, hourly energy usage, and peak and low usage times for later scrutiny.
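To see why hoarding feels cheap, consider a rough back-of-the-envelope calculation at the quoted rates. The device counts and per-user data volumes below are hypothetical, chosen only to illustrate the scale:

```python
# Rough monthly cloud-storage cost at the rates quoted above.
# User counts and per-user data volumes are hypothetical illustrations.
AWS_RATE = 0.03    # USD per GB per month (entry tier)
GCP_RATE = 0.026   # USD per GB per month (entry tier)

users = 1_000_000
mb_per_user_per_month = 5   # heart rates, calorie counts, meals, jogging logs

gb_total = users * mb_per_user_per_month / 1024

print(f"New data stored: {gb_total:,.0f} GB/month")
print(f"AWS cost:        ${gb_total * AWS_RATE:,.2f}/month")
print(f"GCP cost:        ${gb_total * GCP_RATE:,.2f}/month")
```

At well under $200 a month for a million users, the storage bill gives a business little financial reason to be selective—which is exactly the temptation Mor warns against.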
Mor’s company, Xplenty, specializes in integrating, processing, and preparing data for analytics on the cloud—in “a coding and jargon-free environment.” Data-greedy behaviors could therefore offer him and his colleagues far more business opportunities. Yet Mor’s advice to his clients is, “Collect only the data that’s going to be meaningful. Don’t just collect everything because you can. Put some thought into deciding what you should collect.”
That, however, is the crux of the problem. Many businesses don’t know what they should collect, because that requires some long-term view of where the business is going. Perhaps more important, they don’t know what they can safely discard without risking subsequent regret. (You might discover that the one failure trigger you chose to exclude during data gathering turns out to be essential for predictive maintenance when your company decides to explore post-sales services as a new revenue stream.)
Mor points to an emerging practice among forward-thinking businesses—a dual-database approach. He said, “The first environment is a data lake, a repository. You might do some gentle pre-processing to make it relatively easy to sift through the data later, but not too much. You leave that data lake for really sophisticated technologists and data scientists. They’re the ones who can identify the trends others won’t find or see with normal data-visualization tools. The other is a more traditional data warehouse with well-defined schemas. You’ll only push up a subset of the data that drives your business reporting and business intelligence tools. That’s what your chief marketing officer and chief operating officer will look at to make day-to-day decisions.”
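The dual-database pattern can be sketched in a few lines. Below, SQLite stands in for both stores, and the table names, fields, and sample events are invented for illustration—real lakes and warehouses would be separate systems, but the division of labor is the same: raw events land untouched in the lake, while only a well-defined subset is transformed into the warehouse schema that reporting tools query.

```python
# Minimal sketch of the dual-database pattern described above.
# SQLite stands in for both stores; all names and data are invented.
import json
import sqlite3

db = sqlite3.connect(":memory:")

# "Lake": raw events kept as-is, one JSON blob per row, no fixed schema.
db.execute("CREATE TABLE lake_events (ingested_at TEXT, payload TEXT)")

# "Warehouse": well-defined schema holding only the reporting subset.
db.execute(
    "CREATE TABLE warehouse_daily (user_id TEXT, day TEXT, avg_heart_rate REAL)"
)

raw_events = [
    {"user_id": "u1", "day": "2016-06-01", "heart_rates": [62, 70, 68],
     "meals": ["oatmeal", "salad"], "jog_km": 4.2},
    {"user_id": "u2", "day": "2016-06-01", "heart_rates": [75, 80],
     "meals": ["toast"], "jog_km": 0.0},
]

# 1) Everything lands in the lake, untouched, for later deep analysis.
for event in raw_events:
    db.execute("INSERT INTO lake_events VALUES (?, ?)",
               (event["day"], json.dumps(event)))

# 2) Only the fields that drive reporting are transformed and loaded
#    into the warehouse (here: a daily average heart rate per user).
for event in raw_events:
    rates = event["heart_rates"]
    db.execute("INSERT INTO warehouse_daily VALUES (?, ?, ?)",
               (event["user_id"], event["day"], sum(rates) / len(rates)))

for row in db.execute("SELECT user_id, avg_heart_rate FROM warehouse_daily"):
    print(row)
```

The warehouse stays small and legible for executives and BI tools, while the lake keeps every field—including ones like meals and jogging distance that nobody queries yet—available for data scientists later.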
Mor describes Xplenty as “a middleware connecting data sources and data destinations.” He added, “We provide the users with a graphical interface that makes it easy to specify the data fields, entities, and tables they’d like to move to the destination. We give users the building blocks needed to build their data pipeline.” Since Xplenty is cloud-based, it works well with Amazon, Google, and other cloud-hosted storage systems.
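The building-block idea—pick fields, transform them, move them to a destination—can be illustrated with a toy pipeline. This is not Xplenty’s actual API; the block functions and field names below are invented to show the general composition pattern:

```python
# Toy illustration of composable pipeline "building blocks".
# Not Xplenty's actual API; functions and field names are invented.

def select_fields(records, fields):
    """Keep only the specified fields from each record."""
    return [{f: r[f] for f in fields if f in r} for r in records]

def rename_fields(records, mapping):
    """Rename fields on their way to the destination schema."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]

def run_pipeline(records, *blocks):
    """Apply each block in order: source -> transforms -> destination."""
    for block in blocks:
        records = block(records)
    return records

source_rows = [
    {"uid": "u1", "hr": 66, "gps_trace": "..."},
    {"uid": "u2", "hr": 78, "gps_trace": "..."},
]

result = run_pipeline(
    source_rows,
    lambda rs: select_fields(rs, ["uid", "hr"]),
    lambda rs: rename_fields(rs, {"uid": "user_id", "hr": "heart_rate"}),
)
print(result)
```

Each block does one job, so a pipeline is just a sequence of them—the same shape a graphical tool exposes when users drag sources, transformations, and destinations onto a canvas.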
Using AI-like algorithms and machine learning to sort and analyze large data sets—an idea that NVIDIA, Intel, and IBM are proposing—is not out of the question, Mor said. “AI, machine learning, deep learning—these types of algorithms have proven to be useful in sifting through large data sets. But I think it will take a few more years before we get algorithms that can provide useful insights in an artificial manner.”
In the era of costly data center hardware (for storage) and computing resources (for data analysis), businesses were highly selective about the type of data they gathered and kept; in the current era of almost limitless cloud storage, businesses have become less disciplined about how or why they collect certain data. The prudent approach lies somewhere in between.
The unscripted webcast discussion will cover:
- The current data management systems’ preparedness—or unpreparedness—to cope with IoT’s Big Data challenges;
- The cutting-edge AI-like analytics tools available to extract useful insights from large data sets;
- Using field data and real-time data from the devices for product design and simulation.
Register for the webcast here.