In his 1798 book “An Essay on the Principle of Population,” Thomas Malthus predicted that population growth would, in time, outpace our food supply. While he couldn’t have known the impact that technology would have on agriculture and the global food supply, he was correct in assuming that there would be daunting challenges to sustainably feeding a growing planet.
Changes to the US healthcare system in recent years have been met with broad acceptance by some groups and vitriol by others. The changes have resulted in skyrocketing medical costs for some portions of the population, while providing others with new, much-needed access to healthcare services. Achieving ideal health outcomes for a large population is a complex and difficult challenge - one that may seem to be out of reach on the surface - but with emerging technologies, we see cause for hope.
Artificial intelligence and machine learning can improve outcomes and access for medical treatment, while reducing overhead, cost, and time to diagnosis.
Earlier this year, Professional Coin Grading Service rolled out PCGS Gold Shield™ Service utilizing new coin grading technology developed in partnership with Positronic. Coin grading is the process of determining the grade or condition of a coin in order to determine its value. PCGS is one of the top coin grading services in the United States, providing the standard for coin authenticity.
By Justin Hofer
The titans of technology believe that they have seen the future, and they have changed the course of their companies to follow it. Google and Facebook, for example, have made sweeping changes to their platforms in the name of gathering and better employing data. Algorithms backed by gargantuan stockpiles of data can act with an intelligence that matches or surpasses humans. Seemingly inconsequential information today can enable amazing advances tomorrow. These corporations seek to amass as much data as they can so that they can reap the benefits of their knowledge and employ it to further improve and evolve their work. However, with great data comes great responsibility - including the proper storage and management of that data. In today’s article, we’re going to explore the best methods for managing large and complex data sets.
The Evolution of Data Storage
There has been a steady evolution of the technologies used in data storage:
Computer File: The first iteration was the humble computer file. Everything stored inside had to be located and parsed manually; the file itself imposed no searchable structure.
Schema Databases: Then, various structured databases came about, using extensive rules to catalog, store, and read out data on demand. Every piece of data conformed to a predetermined set of attributes, which allowed records to be easily queried, sorted, and updated as needed.
Schemaless Databases: New advances then allowed for databases that did not require pre-defining what would be recorded. These “unstructured” databases allowed new information to be easily added to future records without having to modify the old ones.
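The contrast between the schema and schemaless approaches can be sketched in a few lines of Python. This is a minimal illustration using the built-in sqlite3 module; the table, column, and record names here are invented for the example, not drawn from any real system.

```python
import json
import sqlite3

# --- Schema database: every attribute is declared up front ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "St. Louis"))

# Recording a brand-new attribute requires changing the schema first:
db.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
db.execute("INSERT INTO customers (name, city, loyalty_tier) VALUES (?, ?, ?)",
           ("Grace", "Chicago", "gold"))

# --- Schemaless store: each record carries its own fields ---
records = [{"name": "Ada", "city": "St. Louis"}]
# New information can appear on future records without touching old ones:
records.append({"name": "Grace", "city": "Chicago", "loyalty_tier": "gold"})

print(db.execute("SELECT name, loyalty_tier FROM customers ORDER BY id").fetchall())
print([json.dumps(r) for r in records])
```

Note that the schema database still answers queries about the new column for old rows (returning NULL), while the schemaless store simply never mentions the field on records that predate it.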
Throughout its history, data storage has been in a state of constant evolution, with each new approach carrying some decisive advantage over the old and making the data easier to access and utilize. A managed file system ensures that information written to a hard drive can be retrieved in the future, safe from being accidentally overwritten, with metadata that tells the computer how to read it. With schema-based databases, specific records can be easily searched and changed as needed, without having to hunt down specific files and edit them by hand. Schemaless databases opened this up further, allowing the data in one record to be completely unlike the data in another.
Old vs New
Traditional methods for storing data are highly structured, built around the information you know is important. What is being stored, manipulated, and retrieved is known before anything is entered. When that is the case, those methods work very well, but when those assumptions are taken away, things start to break down. Tangled databases of random tables and entries of unrecorded origin and content form a tumorous mass that even the database administrator might not be able to fully understand.
Enter the Data Lake, a schema built around the assumption that users do not know what will be important in the future, or what else might be added to it. Properly designed and maintained, it can hold many varieties of information, from traditionally structured database tables to more open-ended JSON data, or even images, video, recordings, and emails. All of these files are stored in a way that can be easily searched in the future, so that esoteric algorithms like Neural Networks and Fuzzy Clustering can put that data to work.
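A toy version of that idea can be sketched as follows: raw objects of any kind are stored untouched, while a small catalog records searchable metadata alongside them. Everything here - the directory layout, the `ingest` and `search` helpers, the tags - is an illustrative sketch, not a real data lake product or API.

```python
import json
import tempfile
import time
from pathlib import Path

# Illustrative layout: raw files in one place, a metadata catalog beside them.
LAKE = Path(tempfile.mkdtemp()) / "lake"
(LAKE / "raw").mkdir(parents=True)
CATALOG = LAKE / "catalog.jsonl"

def ingest(name: str, payload: bytes, kind: str, tags: list) -> None:
    """Store the raw bytes as-is and append a searchable catalog entry."""
    (LAKE / "raw" / name).write_bytes(payload)
    entry = {"name": name, "kind": kind, "tags": tags,
             "ingested_at": time.time(), "size": len(payload)}
    with CATALOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

# Anything goes in: a table export, free-form JSON, even an email.
ingest("orders.csv", b"id,total\n1,9.99\n", kind="table", tags=["sales"])
ingest("event.json", json.dumps({"user": 7, "action": "login"}).encode(),
       kind="json", tags=["clickstream"])
ingest("note.eml", b"Subject: renewal\n\nCustomer wants to renew.",
       kind="email", tags=["sales", "crm"])

def search(tag: str) -> list:
    """Find everything in the lake carrying a given tag."""
    with CATALOG.open() as f:
        return [e for e in map(json.loads, f) if tag in e["tags"]]

print([e["name"] for e in search("sales")])  # ['orders.csv', 'note.eml']
```

The point of the sketch is that the CSV, the JSON event, and the email are stored in completely different shapes, yet all remain findable later through the one lightweight catalog.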
The Data Lake is not a perfect solution. With improper design or lazy execution, it can end up just as tangled as the mass of a database mentioned earlier (a so-called “Data Swamp”). When done properly, however, it can serve as a stepping stone to becoming one of the machine learning and artificial intelligence powerhouses that will dominate the rapidly changing corporate world.
The Data Warehouse
The original data management method is what is referred to as the “Data Warehouse” model. Imagine for a moment a warehouse: nice, neatly spaced rows of pallets, organized by type and size. Ask for a particular pallet, and the manager can tell you which aisle it is in, how far down you have to go, and what is inside. Trucks bring new boxes into the warehouse, and deliver from the warehouse to customers. The warehouse holds pallets. It does not house livestock, act as a garage for cars, or serve as a home for people. This is how the “Data Warehouse” model works. In this analogy, the information is the pallets, the database is the warehouse manager, and the users are its customers. It is designed to store very specific data, with unrelated info discarded, or never allowed inside the warehouse at all.
Now let’s imagine a different warehouse. Amongst the pallets, sheep wander the aisles, cars are parked wherever they fit, and makeshift homes are set up. Ask for a specific pallet, and the warehouse manager can point you to where it should be. Ask for a specific animal, and you will be told where it was last seen (a month ago). What about where a certain car is parked? You might be told it is usually in aisle 4, but that might not be right. Deliveries of pallets are slowed by all the obstructions from cars and sheep. This warehouse is a mess, and it got that way by being asked to do what it shouldn’t have. Too many different types of things are stored within it, and a warehouse is not the right way to store some of them. This is what can happen when a “Data Warehouse” is misused: random tables are strewn about, it is a pain to use, and it is hard to find what you need.
The two most important features of a data warehouse are:
Highly Structured – Everything has a place and purpose
Strict Entry – Only data that is relevant right now is allowed to be stored
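Those two features can be demonstrated concretely. The sketch below uses Python's built-in sqlite3 with an invented `pallets` table: the schema defines exactly what may be stored, and the database turns away anything that doesn't fit - the software equivalent of the warehouse manager refusing to admit sheep.

```python
import sqlite3

# Highly Structured: everything has a declared place and purpose.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE pallets (
        pallet_id INTEGER PRIMARY KEY,
        aisle     INTEGER NOT NULL,
        contents  TEXT    NOT NULL
    )
""")

# Relevant, well-formed data is allowed in:
db.execute("INSERT INTO pallets (aisle, contents) VALUES (?, ?)", (4, "widgets"))

# Strict Entry: data that doesn't match the schema is rejected at the door.
try:
    db.execute("INSERT INTO pallets (aisle, species) VALUES (?, ?)", (4, "sheep"))
except sqlite3.OperationalError as err:
    print("rejected:", err)  # no such column exists, so the insert fails
```

The rigidity is the feature: because the schema is enforced on the way in, every query on the way out can trust exactly what it will find.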
The Data Lake
In the Data Lake, these features are turned on their heads. There is only a high-level structure, used to search for whatever data is useful right now, and what comes back could be of any type. And every available source is used to feed data in: what will be important in the future is unknown, so you don’t risk missing out on data that could matter later.
Imagine now a large lake. Fish swim inside the lake, boats float atop it, houses dot the shoreline. Instead of a warehouse manager, you have some old salt who knows everything there is to know about this lake. Need to find a certain house? They know the way. Where is the best spot to locate a certain type of fish? They know that too.
The key difference between the Data Lake and the Data Warehouse comes down to what they are used for. The Data Warehouse method of data storage is used when users know everything that goes in and everything that comes out; it is designed to handle files and data quickly and efficiently. The Data Lake method is used for discovery. Analysts can use it to glean new insights into your organization. Reports can analyze things going back years, without anyone having had the foreknowledge that the data would be used later on. And of course, Machine Learning and Artificial Intelligence benefit from having access to the colossal sum of knowledge recorded within.
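Discovery in this sense means asking questions of data that nobody planned for when it was recorded. A minimal sketch, with entirely made-up event records standing in for years of raw lake data:

```python
from collections import Counter

# Illustrative raw events kept in the lake for years, recorded with no
# particular analysis in mind at the time.
events = [
    {"year": 2015, "user": 1, "action": "login"},
    {"year": 2015, "user": 2, "action": "purchase"},
    {"year": 2016, "user": 1, "action": "purchase"},
    {"year": 2016, "user": 3, "action": "login"},
    {"year": 2017, "user": 2, "action": "purchase"},
]

# A question nobody anticipated when the data was written: how have
# purchases trended by year? Because the raw events were kept, the
# answer is a one-liner rather than "we never tracked that."
purchases_per_year = Counter(e["year"] for e in events if e["action"] == "purchase")
print(dict(purchases_per_year))  # {2015: 1, 2016: 1, 2017: 1}
```

In a warehouse, only the questions designed into the schema can be answered; in a lake, the raw material for future questions is still there.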
The importance of proper data storage to an organization cannot be overstated. Consider the valuable things that can be done if the information is there. Internal company messages could be analyzed to figure out which departments are at risk of losing valuable employees. Emails and calls from your salespeople could be used to help determine which leads are most likely to buy. Receipt data could be used to root out fraud. None of this can happen if the data doesn’t exist. With a Data Lake, everything is recorded so that it can be used in the future. Although any given data point might not be useful on its own, the more data you have, the more value it can give back.
If your organization is facing difficulties with unstructured data or putting your data to its best use, contact our data experts who specialize in developing solutions that keep your organization compliant and competitive. Contact us to explore solutions.
Platform Increases Talent Recruiter Effectiveness by 500%, Reduces Employee Turnover by 40%
We have had the pleasure of working hand-in-hand with client John Cage Enterprises, a talent recruiting consultant firm based in the St. Louis area, to help their team become more effective, efficient, and accurate in their work. One of our most recent projects was to create an artificial intelligence powered talent recruitment platform.
At Positronic we take a four-step approach to implementing a new data science project. First, we go through our DISCOVER process, where we visually navigate the available data looking for patterns and correlations, applying advanced analytics and visualizations as a guide to building hypotheses about what sorts of predictive models may fall out of the data. Second, we test those hypotheses through our TEST process.
By Benjamin Vierk, CEO
Thinking about existing value chains in new ways requires a new way of thinking for the organization. It’s important to drive that change into the company culture so that it continues to propel your company forward and keep you competitive long into the future. Lasting changes are anchored only by demonstrating results, both short and long term, and by having company leadership communicate those results loudly and often. Having company leaders who communicate the importance of AI, both internally and externally, is critical to institutionalizing the new capabilities.
Consider the email that Microsoft CEO Satya Nadella sent to his executive team. It starts, “I know that this is going to ruin a number of your weekends.” In it were links to several AI resources and the missive: “if you want to be an exec at this company - you need to be competent at AI.” Jeffrey Snover, a Microsoft Technical Fellow, shared this with me over a recent lunch after I demanded more of Azure ML and interrogated him about Microsoft’s ML strategy: “How seriously is Microsoft taking AI at the executive level?” Satya’s message is clear: the winners tomorrow will have a strategy for Artificial Intelligence today.
Listen to the way that Jeff Bezos talked about AI in his most recent letter to shareholders: “big trends are not that hard to spot (they get talked and written about a lot), but they can be strangely hard for large organizations to embrace. We’re in the middle of an obvious one right now: machine learning and artificial intelligence. Over the past decades, computers have broadly automated tasks that programmers could describe with clear rules and algorithms. Modern machine learning techniques now allow us to do the same for tasks where describing the precise rules is much harder.” Bezos understands the impact that AI is having on his industry. If yours doesn’t, it’s time for a lunch and learn.
The biggest challenge you’ll face in institutionalizing AI is that the demand for data science talent is significantly higher than the supply. Large companies are aggressively hiring and acquiring machine learning talent to build internal capabilities. According to a recent McKinsey report, 80-90% of today’s AI talent works at the largest technology companies in the world. In the current market, medium-sized businesses are finding more success partnering with AI technology firms than trying to recruit and retain a force of their own. Small businesses with limited budgets will find their best ROI buying off-the-shelf software enhanced by AI.
Positronic specializes in making businesses perform better and faster through the strategic implementation of technology, including artificial intelligence. Is there an area of your business where you need to up your game to stay competitive? Let’s talk about how we can help you change the game through artificial intelligence and machine learning technologies.