Garbage In, Garbage Out – The Pitfalls of Bad Data
What is it? In advance of our upcoming Data Science Bootcamp, we are pleased to announce an open evening exploring the importance of data quality. Talent Garden faculty member Steph Locke, data scientist and Microsoft AI MVP, presents Garbage In, Garbage Out – The Pitfalls of Bad Data.
Steph examines why good data matters, especially for AI subfields such as machine learning and deep learning, which gain greater capabilities over time by analysing large data sets, learning from them, and making adjustments that make applications more intelligent.
But what if the data is biased, corrupted or wrong? This workshop explores how to clean your data to safeguard data quality and ensure your insights are gold, not garbage.
Who is it for?
This workshop is ideal for anyone interested in learning more about the Data Science Bootcamp and how the course can accelerate their career. You might be an IT Professional, Finance Specialist, Business Analyst, Data Analyst, Researcher, Academic, or Software Engineer, or simply interested in learning more about the data science discipline.
Key Takeaways:
- Horror stories to help convince others about data quality
- Insight into incentive structures for front-line staff regarding quality data input
- An understanding of user experience and usability methods for improving data entry
- An awareness of post-collection cleaning and management techniques to improve quality
Faculty Bio
Steph Locke is one of only three individuals in the world to be recognised with both Microsoft’s Artificial Intelligence Most Valued Professional (MVP) award and their Data Platform MVP award. She is the founder of Locke Data, a UK-based data science consultancy, and Nightingale HQ, an online platform connecting data science and AI consultancies to businesses who need their expertise.
Article first published online.
FAQs
Why is data quality the foundation of manufacturing AI?
The GIGO principle — Garbage In, Garbage Out — has been a truism in computing since the 1960s, but it is more relevant than ever in the age of AI. Machine learning models are trained on data. If the training data is wrong, incomplete, or biased, the model will be wrong, incomplete, or biased in ways that are often harder to detect than the data problems themselves.
For manufacturers deploying AI tools, data quality is the single most important prerequisite. A cutting optimisation model trained on inaccurate stock dimensions will produce cutting plans that do not match reality. A demand forecasting model trained on historical data that includes anomalous pandemic-period orders will make incorrect predictions for normal periods. A mill certificate reading model trained on clean, well-formatted certificates will struggle with the poor-quality scans that are common in real production environments.
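One way to guard against the demand-forecasting failure mode above is to screen the training history for anomalous periods before fitting anything. The sketch below is illustrative only: the order figures are invented, and the modified z-score with a median absolute deviation (MAD) is one common screening choice, not a method from the talk.

```python
from statistics import mean, median

def robust_filter(history, threshold=3.5):
    """Drop points whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), which, unlike the standard
    deviation, is not inflated by the very outliers it is trying to find.
    history: list of (period, value) pairs; returns the retained pairs.
    """
    values = [v for _, v in history]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(history)  # no spread: nothing to flag
    return [(p, v) for p, v in history
            if 0.6745 * abs(v - med) / mad <= threshold]

# Hypothetical monthly order counts; the 2020-04/05 slump stands in for
# the anomalous pandemic-period demand mentioned above.
orders = [
    ("2020-01", 100), ("2020-02", 105), ("2020-03", 98),
    ("2020-04", 12),  ("2020-05", 15),  ("2020-06", 97),
    ("2020-07", 103), ("2020-08", 101),
]

clean = robust_filter(orders)          # the two anomalous months are dropped
forecast = mean(v for _, v in clean)   # naive "typical month" baseline
```

A MAD-based score is used here because the pandemic months are extreme enough to inflate an ordinary standard deviation and hide themselves from a plain z-score test.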
What is the data quality challenge in metals manufacturing?
Manufacturing data has particular quality challenges that make GIGO especially relevant:
- Legacy data: Many manufacturers have years of historical data entered manually, with inconsistencies in units, product codes, and nomenclature that accumulated before anyone thought about using the data for analysis.
- Multi-source data: Production data, quality data, ERP data, and certificate data often live in different systems with different formats and data models. Integrating them creates new quality challenges.
- Scan quality: Mill certificates and production records that have been scanned — often from paper originals of variable quality — create extraction challenges that require robust AI models.
- Real-time data: Data captured in real time on the shop floor is often incomplete or approximated under production pressure, creating gaps that affect downstream analysis.
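The legacy-data and multi-source problems above usually call for a normalisation pass that canonicalises units and product codes and flags records it cannot resolve rather than guessing. A minimal sketch, where the field names, the unit table, and the code-cleaning rule are all illustrative assumptions:

```python
# Hypothetical quality checks for legacy stock records.
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "in": 25.4}

def normalise_record(record):
    """Convert a dimension to millimetres and canonicalise the product code.

    record: dict with 'product_code', 'length', 'unit'.
    Returns a cleaned copy, or None if the record fails validation.
    """
    unit = record.get("unit", "").strip().lower()
    if unit not in UNIT_TO_MM or record.get("length") is None:
        return None  # unknown unit or missing value: flag it, don't guess
    return {
        "product_code": record["product_code"].strip().upper().replace(" ", "-"),
        "length_mm": round(record["length"] * UNIT_TO_MM[unit], 2),
    }

raw = [
    {"product_code": "st 37", "length": 250, "unit": "cm"},
    {"product_code": "ST-37", "length": 2.5, "unit": "m"},   # unit not in table
    {"product_code": "s355", "length": 98.4, "unit": "in"},
]
cleaned = [r for r in (normalise_record(x) for x in raw) if r is not None]
```

Rejecting unresolvable records instead of silently coercing them keeps the garbage out of downstream models and produces a reviewable list of what needs fixing at the source.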