What comes first: AI or Data Strategy?
Understand why Data Strategy is needed and what comes first. Hint: It's not always data strategy.
If you’re here, you’ve probably heard someone say that you need a data strategy before you can start working with AI. Let’s face it: data never gets the glamor it deserves. Today, let’s explore why a data strategy matters, dissect the components and variations of a robust one, and work out how data and AI strategies can evolve together as your business needs grow.
No Data Strategy? Read this.
Continuing from my previous post on AI Strategy, I’ll take the example of my favorite imaginary company - Techie McTech, a mid-sized software company.
They recently invested heavily in developing predictive analytics, customer personalization, and, of course, generative AI for their customers. But three months in, their newly hired data scientists are whiling away the days playing ping pong in the breakout area. Why?
They realized that there isn’t nearly enough data for analytics or for training predictive models. The data engineering teams in different departments would first have to agree on a consistent schema that maps well onto the rest of the data. Meanwhile, department heads are fighting over access controls that would have to be relaxed for data scientists to have any chance of exploring and understanding what lies beneath the mess. On top of that, nobody understands why total sales growth differs between dashboards built independently by two teams, or which one is accurate.
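To make the dashboard discrepancy concrete, here is a minimal sketch of how two teams can report different growth from the same records simply by defining "sales" differently. The order records, the `refunded` field, and both function names are hypothetical and invented purely for illustration; nothing here comes from a real Techie McTech system.

```python
# Hypothetical order records; every value is made up for illustration.
orders = [
    {"year": 2023, "amount": 100, "refunded": False},
    {"year": 2023, "amount": 50, "refunded": False},
    {"year": 2024, "amount": 120, "refunded": False},
    {"year": 2024, "amount": 60, "refunded": True},
]


def total_sales(rows, year, include_refunded):
    """Sum order amounts for one year, optionally skipping refunded orders."""
    return sum(
        row["amount"]
        for row in rows
        if row["year"] == year and (include_refunded or not row["refunded"])
    )


def growth_pct(rows, include_refunded):
    """Year-over-year growth from 2023 to 2024, as a percentage."""
    previous = total_sales(rows, 2023, include_refunded)
    current = total_sales(rows, 2024, include_refunded)
    return (current - previous) * 100 / previous


# Team A counts every order; team B drops refunded orders.
gross_growth = growth_pct(orders, include_refunded=True)
net_growth = growth_pct(orders, include_refunded=False)

print(f"Team A dashboard: {gross_growth:+.1f}%")
print(f"Team B dashboard: {net_growth:+.1f}%")
```

Neither number is wrong in isolation. The problem is that nobody agreed on, or wrote down, which definition the company uses.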
Without a data strategy, Techie McTech faces several issues:
Data Silos: Data is isolated in different departments, preventing a holistic view.
Data Quality: Inconsistent and inaccurate data leads to unreliable AI models.
Duplication of Effort: Teams duplicate efforts, wasting resources and time.
Lack of Documentation: Knowledge of which data fields to use is buried in an abyss of redundant code files.
No Cataloging: Which tables to use for different information often depends on who you ask.
Compliance Risks: Lack of data governance leads to potential regulatory issues.
The AI projects did eventually get back on track, but not before wasting time and resources, causing months of headaches for the leaders, driving severe attrition in some departments, and disappointing customers and shareholders who had been promised the world by the end of the year.
Instead of fixing these reactively, what could they have done differently?
How Could a Data Strategy Have Helped?
A well-defined data strategy could have significantly mitigated the challenges faced by Techie McTech. Here's how:
1. Unified Data View:
Data Catalog: A centralized inventory of all data assets would have let users discover and understand the data available across the organization.
Data Lake or Data Warehouse: A centralized repository would have provided a unified view of the company's data, eliminating silos and enabling cross-functional analysis.
2. Improved Data Quality:
Data Cleansing: Automated processes could have identified and corrected errors, inconsistencies, and missing values.
Data Validation: Rules and checks could have been implemented to ensure data integrity and accuracy.
Data Governance: Consistent data definitions, standards, and policies would have ensured data quality and consistency across the organization.
3. Enhanced Collaboration:
Data Catalog: A centralized catalog would have documented data assets, making them discoverable and reusable and reducing competing ‘data truths’.
Data Marketplace: A platform could have facilitated the exchange of data and insights between teams, reducing duplication of effort.
4. Efficient AI Development:
Data Preparation: Standardized data pipelines and tools would have streamlined the process of preparing data for AI models.
Data Warehouse: A centralized warehouse would have enabled cross-functional analysis, which is crucial for cross-department AI initiatives.
5. Reduced Time-to-Market:
Accelerated Insights: Access to high-quality, well-organized data would have enabled faster development and deployment of AI applications.
Data-Driven Decision Making: Informed decisions based on data would have led to more efficient resource allocation and project prioritization.
6. Easier Compliance:
Data Privacy: A data strategy could have included measures to protect sensitive data and comply with relevant regulations (e.g., GDPR, CCPA).
Data Security: Robust security controls would have helped prevent unauthorized access and data breaches.
In essence, a data strategy would have provided Techie McTech with a solid foundation for leveraging its data assets. By addressing the underlying data challenges, the company could have realized the full potential of its AI initiatives and avoided costly mistakes.
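As a rough illustration of the "Data Validation" idea above, the sketch below applies a few rule checks to records before they reach a model or a dashboard. The table, field names, and rules are all hypothetical; a real setup would take its rules from the agreed data definitions rather than hard-coding them like this.

```python
# Hypothetical validation rules for a made-up "customers" table.
def validate_customer(row):
    """Return a list of rule violations for one record (empty list means clean)."""
    problems = []
    if not row.get("customer_id"):
        problems.append("missing customer_id")
    email = row.get("email") or ""
    if "@" not in email:
        problems.append("invalid email")
    revenue = row.get("annual_revenue")
    if revenue is not None and revenue < 0:
        problems.append("negative annual_revenue")
    return problems


# Made-up sample records: the first is clean, the second breaks every rule.
records = [
    {"customer_id": "C001", "email": "ana@example.com", "annual_revenue": 1200},
    {"customer_id": "", "email": "bob-at-example.com", "annual_revenue": -5},
]

# Map each record's position to the violations found for it.
report = {index: validate_customer(row) for index, row in enumerate(records)}

for index, problems in report.items():
    print(index, "OK" if not problems else ", ".join(problems))
```

Keeping the rules in one shared function, instead of scattered across each team's scripts, is a small-scale version of the consistency that data governance aims for.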
But What Exactly is a Data Strategy?
A data strategy is a comprehensive plan that outlines how an organization will collect, manage, analyze, and use data to achieve its objectives. It provides a roadmap for leveraging data as a strategic asset. It’s generally a good idea to work top-down, starting by figuring out what you need the data for.
Here are the key components:
1. Business Alignment:
Goal Alignment: Ensuring that the data strategy is designed to directly support and drive the organization’s overarching business goals.
Prioritization of Data Initiatives: Focusing on data projects that have the highest potential to impact key business outcomes.
KPI Definition: Identifying key performance indicators (KPIs) that align with business objectives to measure success and track progress.
2. Data Governance:
Policies and Procedures: Establishing rules and guidelines for data management, quality, security, and privacy.
Data Ownership: Assigning responsibility for data stewardship.
Data Access Controls: Implementing measures to restrict access to sensitive data.
3. Data Architecture:
Data Model: Defining the structure and relationships of data elements.
Data Integration: Combining data from different sources into a unified view.
Data Catalog: Cataloging data for easier discovery and reduced duplication.
Data Warehouse or Data Lake: Implementing a centralized repository for storing and managing data.
4. Data Quality:
Data Cleansing: Identifying and correcting errors, inconsistencies, and missing values.
Data Validation: Implementing rules and checks to ensure data integrity.
Data Profiling: Analyzing data characteristics to understand its quality and suitability for analysis.
5. Data Analytics:
Tools and Technologies: Selecting appropriate tools for data analysis, visualization, and reporting.
Analytics Capabilities: Identifying the specific types of analytics needed (e.g., descriptive, predictive, prescriptive).
Data Science and Machine Learning: Incorporating advanced techniques for managing, cleaning and extracting insights from data.
6. Data Security and Privacy:
Security Measures: Implementing measures to protect data from unauthorized access, breaches, and unintentional data loss.
Privacy Compliance: Ensuring compliance with relevant regulations (e.g., GDPR, CCPA).
Data Retention and Deletion Policies: Establishing guidelines for data lifecycle management.
7. Data Monetization:
Data Products: Creating value-added products or services based on data.
Data Partnerships: Collaborating with other organizations to share or exchange data.
Data Marketplaces: Participating in platforms for buying and selling data.
8. Data Culture:
Data Literacy: Promoting awareness and understanding of data availability and standards within the organization.
Data-Driven Decision Making: Fostering a culture of using data to inform strategic decisions.
Data Collaboration: Encouraging cross-functional collaboration and data sharing.
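To give one of these components some shape, here is a minimal sketch of the "Data Profiling" item under Data Quality: it summarizes, per column, how much data is missing and how many distinct values appear. The `profile` function and the sample rows are hypothetical and use only the Python standard library; dedicated profiling tools report far more than this.

```python
def profile(rows):
    """Summarize each column: share of missing values and count of distinct values."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        summary[col] = {
            "missing_ratio": (len(values) - len(present)) / len(values),
            "distinct": len(set(present)),
        }
    return summary


# Made-up sample with typical problems: inconsistent casing and gaps.
sample = [
    {"region": "EMEA", "deal_size": 10},
    {"region": "emea", "deal_size": None},
    {"region": "APAC", "deal_size": 10},
    {"region": "", "deal_size": 25},
]

result = profile(sample)
for column, stats in result.items():
    print(column, stats)
```

In this sample, `region` shows three distinct values even though only two regions exist, because "EMEA" and "emea" are counted separately. Surfacing that kind of inconsistency before any model is trained is the point of profiling.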
So, What Comes First: Data or AI Strategy?
When it comes to determining whether a data strategy or AI strategy should come first, the answer isn't one-size-fits-all. It depends on the specific goals, existing infrastructure, and maturity level of the organization. Here’s a breakdown of different scenarios and how to approach them:
1. The Data-First Approach: Ideal for Data-Rich Organizations
Scenario:
Imagine an established company with a wealth of data already collected, perhaps even a legacy system in place. This data is stored across various departments but hasn't been fully utilized for advanced analytics or AI.
Best Approach:
In this scenario, a data-first strategy is critical. The company should focus on:
- Data Governance: Ensuring data quality, consistency, and security across the organization.
- Data Integration: Breaking down silos and creating a unified data architecture.
- Data Infrastructure: Upgrading or refining data storage and processing capabilities.
By putting data strategy first, the organization ensures that when it does introduce AI, the models will be trained on high-quality, consistent, and well-governed data, leading to more reliable outcomes. For companies like this, a strong data foundation is the key to unlocking AI’s full potential.
Example:
A healthcare provider with years of patient records, diagnostic images, and treatment data might focus first on creating a data strategy to clean, standardize, and secure this information before embarking on AI projects like predictive diagnostics.
2. The AI-First Approach: Suitable for AI-Centric Startups
Scenario:
Consider a startup aiming to disrupt the market with a novel AI-powered solution. They may not have much historical data, but they have a clear vision of what they want to achieve with AI, whether it’s creating a cutting-edge recommendation engine, an AI-driven chatbot, or a predictive analytics platform.
Best Approach:
Here, an AI-first strategy may be more appropriate. The company should:
- Identify AI Objectives: Clearly define what the AI solution needs to accomplish.
- Leverage External Data: Since the company might not have extensive internal data, it could focus on acquiring relevant data sets or generating new data through user interactions or partnerships.
- Iterate Quickly: Use AI models to drive early-stage product development, with a focus on gathering and learning from data as the product evolves.
In this context, AI is the primary driver, and the data strategy is developed in parallel to support the AI models as they mature. The emphasis is on quick iteration, experimentation, and scaling AI capabilities, with data strategy evolving in tandem.
Example:
A fintech startup developing a credit scoring model might initially focus on AI, using publicly available financial data, third-party APIs, and customer input to train and refine their algorithms.
3. The Balanced Approach: For Organizations with Both Data and AI Ambitions
Scenario:
Consider a mid-sized company, like our Techie McTech, which has decent data assets and a vision to integrate AI across its operations. They recognize the importance of data but also understand that they need to “do AI” to have any chance of success in their imminent funding round.
Best Approach:
A balanced strategy works best here, where data and AI strategies are developed in parallel:
- Simultaneous Planning: Establish a foundational data strategy while identifying key AI opportunities.
- Iterative Development: As AI models are developed, the data strategy is refined to ensure that the necessary data is available, clean, and ready for AI consumption.
- Feedback Loop: Use insights from AI implementations to continually improve the data strategy, making it more robust over time.
This approach allows the company to be agile, leveraging AI capabilities while ensuring that the data foundation is strong enough to support long-term growth.
Example:
A retail company with years of sales data, customer behavior analytics, and supply chain information might simultaneously develop a data strategy to clean and structure this data while also working on AI models to optimize inventory management and personalize customer experiences.
4. The Reactive Approach: When AI Reveals Data Gaps
Scenario:
In some cases, a company might start with AI projects only to realize midway that their data isn’t adequate or properly managed, which is exactly what happened at Techie McTech. This often happens when the excitement of AI leads to a premature focus on model development without considering the data foundation.
Best Approach:
Here, the reactive approach is the only option left, where the data strategy is developed as a response to issues encountered during AI implementation:
- Identify Gaps: Recognize where data issues are hindering AI progress, whether in data quality, availability, or integration.
- Revamp Data Strategy: Implement a targeted data strategy focused on filling these gaps and ensuring that future AI projects have a solid data foundation.
- Integrate and Align: Ensure that the new data strategy is aligned with ongoing AI projects to minimize disruption and maximize efficiency.
This approach, while not ideal, can help salvage AI initiatives and prevent future issues by establishing a more robust data strategy moving forward.
Example:
A manufacturing company that hastily implements AI for predictive maintenance might realize that their operational data is inconsistent and incomplete. They would then need to pause and develop a data strategy to clean, standardize, and augment their data before AI can deliver accurate predictions.
So which one fits the bill for you?