The Data Ecosystem in More Detail
In a previous post, I introduced the idea that data is an ecosystem of software tools that work together. In this article I want to explain in more detail how the ecosystem works, and introduce some of the commonly available tools.
The Three Phases of Data Analytics
In a previous post, I introduced the concept that data is an ecosystem of multiple software tools rather than a one-stop solution. In this follow-up I want to talk in more detail about the ecosystem in its entirety, and also identify some of the tools that I think fit into each phase or component.
From my past experience in developing operations analytics, I have settled on a data ecosystem that has three major components: 1) an input, 2) a repository and 3) a reporting component. When you scrutinize each of these aspects as separate and distinct, you can begin to ask: how did the phases originate? What are the optimal tools for each phase? And how do you knit it all together?
Identifying the Second Component of the Ecosystem
Here’s how my experience led to identifying the multi-part concept. At the outset of our data initiative, my team used our in-house ERP tools to see what kind of workload and revenue reports we could obtain. After a couple of years of working that way, the output from the ERP proved disappointing for our needs, and so we moved to using MS Excel to help with more responsive aggregation of exported data. This was a definite improvement due to its flexibility and ease of input. But the export and clean up was a tedious, time-consuming exercise each time we updated data.
Fortunately however, some new tools were coming on the market that were about to change everything, especially Power BI from Microsoft. Power BI was not only far more powerful than Excel, but it was a “clean once” software meaning that the data could be cleaned one time and then for subsequent updates, the cleanup was automated. This was a massive time-saver, so we quickly adopted it because it delivered much more power for aggregation, but especially reporting.
After successfully using Power BI for months, I began to understand that data management actually had at least two major components. One was the ERP repository system that all the departments in the firm were using as a data source: including operations, accounting, HR and marketing. Its main value was as a storage repository of data. But the second component, Power BI, helped us more clearly visualize all the data in the system. It was transformational in its ability to produce visual insights.
There’s a Third Phase?
Once we got the storage and reporting phases established however, we encountered the next hurdle - how to get information from the various partes into the system in the first place! The reality was that people were often simply too busy to convey the information they gathered from client meetings or phone calls, and it interfered with the timely reporting of important business processes. At this point we realized there is a third phase - one that precedes the ERP - the input phase. This realization launched us on a search for more effective modes of garnering input.
We again started with the tools we already had our disposal, this time MS Outlook. After some brainstorming, one of the team members discovered that you could make pre-addressed emails files (.oft files) that can be opened with a click of the link on a web page.
When clicked, it used a pre-populated email format, pre-addressed to the right recipients, with all the data points, categories and headings we needed. All the project manager had to do to send preliminary project data was click on the link in an intranet page, fill in the pieces of information we wanted, and hit send!
With this simple input mechanism, we had enough to launch a new data input initiative in the ERP. The results were significant. With this simple tool, we had created a new data input process into the ERP, and began to see significantly faster notification of new projects and opportunities.
It became clear therefore, that there are distinctly identifiable phases to the data analytics process, and it’s not uncommon to use wholly different software for each phase. Over the intervening years of course, new tools have emerged for each phase that were not available in 2017, and so the ecosystem now has even more enhanced options.
Identifying the Software Tools
So let's look at the actual applications that I believe (currently as of May 2024) are leading options for each phase of the data ecosystem. This list is ever evolving, and I will post in more detail about the various tools in the future.
ERP Options: Let's start in the middle phase - the ERP - as that is the heart of the system. Ultimately, the ERP choice is usually dictated by the type of business you’re running. Common options are Deltek Vantagepoint for A/E firms, NetSuite for general business, and Salesforce for sales-based organizations. But there are many more options out there such as Airtable, MS Dynamics, BQE, and even to some extent QuickBooks.
What’s important about the ERP software you choose is that it can handle project information, invoicing, timesheets, and personnel information. Another really important aspect is that it exports to a common data format, such as csv or Excel.
Reporting: The next evolution of the ecosystem is reporting. There are two dominant players right now, Power BI and Tableau. Each has their own advantages and most of my experience has been with Power BI, though I did experiment with Tableau at the time as I was evaluating Power BI. My assessment (which I believe remains current) is that Tableau produces beautiful visuals, but that Power BI has far more robust data structuring tools. Data structuring is actually one of the major tasks in data analytics, so for me it was a clear choice.
What’s important about the ERP software you choose is that it can handle project information, invoicing, timesheets, and personnel information. Another really important aspect is that it exports to a common data format, such as csv or Excel.
Most of the data we were getting from our ERP was what I called “dirty data”, full of useless headings and subtotals, and needing extensive cleaning and restructuring before we could start to get it into a clean tabular form. At the time that I was looking at Tableau, it's tools for restructuring data were not as advanced as Power BI, though since then there have been some improvements. Power BI also has some similarities to Excel formulas and so it can be easier to learn how to create custom fields.
But the most significant value of Power BI over typical ERP system reporting is its visualization capabilities, as well as its automatic aggregation. Adding a value to a chart is as simple as dragging that value onto the chart. What this means is when you add a metric to a screen it will show you the total for that metric, but when you then drag the category onto that same visual, it will subdivide the total into subtotals for each category. In that way it’s a lot like pivot tables in Excel. (probably a future post topic). The image below shows how highly visual data becomes in Power BI, and the image took about 20 seconds to produce, by simply switching the type of visual with one click.
Input: Finally, for input purposes, the most interesting tools for me right now would be Microsoft Lists and Smartsheet. Smartsheet in particular has developed into a strong option for certain tasks, especially scheduling tasks or phases. Both of these tools function equally well on a computer, smartphone or tablet , and are very simple. And both have a form input where the user can fill in fields and drop downs that make inputting data very simple. The data from these lists can then be integrated with ERP data by using Power BI or other reporting software.
Look for a future post about each of the software mentioned above. Each I believe has some features that distinguish it as the leader in one particular aspect of the ecosystem. But for now, I just wanted to illustrate the variety of different tools, and how they all relate to the larger ecosystem.
Conclusion
There is a plethora of software on the market for data analytics, and I haven't even mentioned half of the major players, just the ones that I think are most commonly used in the A/E space. I haven't made any mention of Python for data which is a very large and growing constituency. But I'm trying to make these articles relatively short (though not succeeding) so I might deal with some of the other options in future posts.
It’s important to realize that software changes constantly and so new capabilities will open up as new software becomes available. What will likely not change is the fact that any data ecosystem has multiple phases, and the real skill of a data analyst Is to figure out how to exchange data between the phases for optimal results.