Data and DevOps Digest, episode 5

Welcome to Data and DevOps Digest, brought to you by cloud consultancy Vivanti. A news and analysis podcast, it covers the trends, thought leadership and announcements happening in today’s Data, DataOps and DevOps space.

Topics span from data analytics and reporting, data science, machine learning, artificial intelligence and digital customer experience, through to DataOps, App Replatforming, CI/CD and more.

Episode five looks at six areas that went through the industry news cycle over the past fortnight:

1. Why are chess-playing robots breaking seven-year-olds’ fingers?

2. Could Machine Learning fuel a reproducibility crisis in science?

3. The death of the Data Scientist?

4. A 7-point framework for assessing and improving your company’s analytics maturity

5. Cloud trends: Is Machine Learning moving back to the data center for good?

6. Is AI environmentally unsustainable?

Finger breakin’ robots

Lachlan James 0:08
Alright, hello and welcome to Data and DevOps Digest – a news and analysis podcast about the latest data-driven innovations, and the trends and technology impacting the people who develop them.

I’m your host, Lachlan James. I’m joined by Vivanti Principal Consultant and all-round technologist, James Hunt.

James: I’m happier to see you than a 7-year-old getting his finger broken by an AI-powered robot at a chess tournament. So how are you!?

James Hunt 0:33
Did you just call me an all round technologist? Because I have been hitting the gym…

Lachlan James 0:38
I’ve noticed as well, I can tell.

James Hunt 0:41
I don’t know how I’m gonna top your opener there, Lachlan. Let’s just say I’m more pleased with myself than a chess-playing robot.

Lachlan James 0:49
Fair enough! And if you’ve missed what we’re talking about, firstly, how? Secondly, Google it. But all this caused a bit of a kerfuffle last week, with some pundits – such as Maria O’Sullivan, writing for theconversation.com – calling for stronger regulation of artificial intelligence.

James: In a few short sentences, is this an overreaction? And how would you regulate against such accidents occurring in the future? Personally, I just think the kid should have waited his turn.


James Hunt 1:18
I mean, clearly we need to adopt Asimov’s laws of robotics, but without the Lionel Hutz re-punctuation. That’s “Do No Harm”; not “Do? No! Harm!”

Seriously though, my question is this: Why did the robot need to grip the chess pieces with such force in the first place? My understanding is that the arm misinterpreted the child’s too-fast move, and identified the child’s finger as a pawn or a bishop or something. If a human opponent did that, it would be weird, but I doubt they’d grab your finger with crushing force. Am I missing something here?

As far as robotics is concerned, I think that’s where we’re headed regulation-wise: expectation of the unexpected, and mitigation of the fallout. That’s going to get messy quick, and there’s going to be more high-profile snafus like this one before we get it right.

Lachlan James 2:16
Yeah, I think that’s about where it sits, right? I mean, it’s not malicious, much as some people would like to make out that it’s got its own brain and has gone rogue.

Could machine learning fuel a reproducibility crisis in science?

Lachlan James 2:25
Speaking of machines causing damage, Elizabeth Gibney penned an interesting piece for nature.com, in which she suggests that ‘data leakage’ threatens the reliability of machine learning’s use across scientific disciplines.

Gibney summarizes research from New Jersey’s Princeton University, which asserts that a cavalier approach to machine learning is causing a reproducibility problem, where the predictive claims of certain ML models fall short of their promise, and other teams are unable to replicate the results.

The article says that both the accidental errors that damage a model’s reproducibility, and the ability to judge such errors, come down to sufficiently deep domain knowledge. In the former case, researchers are attempting to apply ML to problems without sufficiently learning the intricacies of machine learning. In the latter, identifying flawed models can be subjective, requiring extremely specialist knowledge of the field to which the machine learning is being applied.

The article touches on vital industries, arguing that “Over-optimism about the powers of machine-learning models could prove damaging when algorithms are applied in areas such as health and justice”.

In fact, Momin Malik from the Mayo Clinic is quoted as saying: “I’m somewhat surprised that there hasn’t been a crash in the legitimacy of machine learning already. But I think it could be coming very soon.”

James: Where do you stand on this? Is the Machine Learning industry being too bullish here? Are non-ML researchers rushing in too fast and putting the cart before the horse? Or is some other animal-based metaphor at play here?

James Hunt 4:23
I mean, didn’t we already go through this in the Enlightenment? Isn’t the basis of the modern scientific method a backlash against the lack of independently verifiable quote-unquote science?

For my part, I really like Kapoor and Narayanan’s idea of “model info” being included in all papers that lean heavily on machine learning to produce predictive models. I think it gets at the biggest problem with reproducibility – the cost of assembling (and archiving) training data. If you’ll humor me, Lachlan, think back to grade school science projects. Baking soda and vinegar. You can make a volcano with just baking soda and vinegar. Any baking soda; any vinegar will do. Obviously, ML training data sets take more effort to pull together than a trip down to the local grocer, but I like the approach.

Lachlan James 5:37
Yeah, absolutely. I think it’ll go a long way toward addressing that, though I think it’ll remain a problem and a challenge for a while to come.

The biggest process-oriented deficiency highlighted in this research is ‘data leakage’ – when the data set upon which a model is trained includes data that the model is later evaluated on.
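To make that concrete, here’s a minimal sketch – with made-up data and an arbitrary model choice, purely for illustration – of how leakage can creep in when preprocessing is fitted before the train/test split, and how a pipeline avoids it:

```python
# Minimal data-leakage sketch: the "leaky" version fits a scaler on ALL
# rows (train + test) before splitting, so test-set statistics leak into
# training. The "clean" version fits preprocessing on training rows only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                              # made-up features
y = (X[:, 0] + rng.normal(0, 0.5, 200) > 0).astype(int)   # made-up labels

# LEAKY: the scaler has already seen the test rows.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr)

# CLEAN: split first; the pipeline fits the scaler on training rows only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clean = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)

print("leaky:", leaky.score(X_te, y_te), "clean:", clean.score(X_te, y_te))
```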

James, this feels like a pretty basic principle to know and adhere to. So, does this speak further to the notion that ‘fools rush in’? Does the ML community need to do more to educate people about the rigor required to harness machine learning effectively – instead of just selling its virtues? Or does all this simply highlight that doing machine learning in the real world is harder than we’d like to admit?

James Hunt 6:23
I mean, I don’t know about casting academic researchers in Machine Learning as “fools”, but I do see a lot of what you might call amateur ML practitioners doing “a little bit of home machine learning” and then building tools and platforms off of that. And there definitely needs to be some more rigor there.

Don’t get me wrong, I love AI and ML. It’s neat stuff. But there’s got to be a balance to that optimism and futurist, protopian view. We have to be able to trust the things ML – and deep learning in particular – builds. And for that we have to get more rigorous and apply more of the scientific method to what we do.

Lachlan James 7:11
Yeah, absolutely. So having established as part of this conversation that domain knowledge is important to Machine Learning, another article caught my attention during the past two weeks – precisely because it does such a neat job of outlining why it’s important.

Published on KDnuggets, the idiocy of the article’s rhetorical-question-based headline – ‘Is Domain Knowledge Important for Machine Learning?’ – initially had me concerned. But author and Data Scientist, Nate Rosidi, neatly breaks down his write-up into bite-sized chunks, outlining why domain knowledge is important at each step of the model training process:

From Data Pre-Processing, to selecting the right model in the first place, and adjusting the chosen model and its supporting architecture over time.

James: Whilst I highly recommend viewers step through the article for themselves, what was your major takeaway here?

James Hunt 8:04
Frankly, Lachlan, I was shocked – SHOCKED I TELL YOU – to find out that, and I quote “not every data point has the same value.”

It actually says that, and then goes on to talk about how picking the right data points and assigning them weights is important. So I think this is a clickbait-y article, designed to get people worked up over the title’s questioning of the place of domain knowledge in ML. Do you need it? Yeah.

If I took anything away from this, it was the idea that domain knowledge can help in re-forming or re-shaping an algorithm based on what you know specifically about the domain. With NLP, he talks about how thinking like a linguist can really boost the veracity and utility of an NLP model through focus and attention. I thought that was a neat way of looking at the problem of how you bring that domain expertise to bear in new and interesting ways.
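To illustrate the weighting point James picked up on, here’s a toy sketch – everything in it is hypothetical – of encoding domain knowledge as per-sample weights, which many scikit-learn estimators accept:

```python
# Toy sketch: a domain expert tells us recent records are far more
# representative than old ones, so we down-weight rows by age.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                # made-up features
y = (X[:, 0] - X[:, 1] > 0).astype(int)      # made-up labels

age_days = rng.integers(0, 365, size=500)    # hypothetical record age
weights = np.exp(-age_days / 90.0)           # decay: newer rows count more

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)       # weighted rows steer the fit
```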

Lachlan James 9:19
There you go. You found something nice to say…

James Hunt 9:21
Yeah, I did find something.

Lachlan James 9:23
Good work! You worked your way around the corner. But no, I agree. Clearly a super clickbait-y headline. However, I do think there were some nice ways of outlining some of those points, even if some of the ways he segued into them were fairly obvious to begin with. I think there were some neat little takeaways in the end – not going to change your life – but a good way to break it down nonetheless.

The death of the Data Scientist?

Lachlan James 9:47
So next-up, we consider the death of the Data Scientist. It’s a phrase the scientific purists have hated for years, so maybe we shouldn’t be too quick to mourn its apparent demise. But, what did raise my hackles was its suggested replacement – the Product Scientist.

This Medium article argues that Data Science has lost its luster because the world has moved on, saying that the evolution of available tools and the demands of industry have ‘alleviated today’s Data Scientists of the burden of understanding statistics, beyond hypothesis testing, in exchange for deep product and market knowledge’ – previously the domain of the Product Manager.

The Decline of Data Science

So James: Are we seeing a blended role emerge, where – and I quote – “you don’t need a Stats graduate degree — just hustle, product intuition and SQL”? Beneath the dramatization, is there a sane takeaway here?

James Hunt 11:02
There’s a joke older than you and I, from software engineering circles that goes like this: One USENET poster writes “Is Lisp dead?” To which the old Lisp hacker responds: “Lisp doesn’t look any deader than usual to me”.

The point of that joke – besides being a shameless plug for my favorite family of uber-powerful languages – is that lots of people are concerned with the decline of things, especially as new kids enter the fray and things get easier.

In the early days of cloud you had to really know things like virtualization. When Kubernetes hit the scene you really needed to “get” container tech to run it effectively. Data processing used to be gate-kept (is that a word?) by a highly trained priesthood of machine operators. Those conditions no longer hold, and I think you’d be hard-pressed to say that we have less cloud, or less containerization, or less data processing.

That doesn’t mean people don’t still care about those topics; just that they don’t have to. That means the people who are still doing the data science of the world actually want to – and I think that’s a good thing. I think it’s nice that data science is not a requirement. You don’t have to have the stats degree, but it definitely helps. Right? Just like you don’t have to have a CS degree to write JavaScript, but it definitely helps.

Lachlan James 12:51
Whatever the case, I do think successful data leaders in organizations need a clear method for categorizing and communicating the benefits and intended outcomes of the programs or projects they push.

To that end, Joel Shapiro – a Clinical Associate Professor of Data Analytics at Northwestern University’s Kellogg School of Management – wrote an article for Forbes that grabbed my attention.

How To Support Your Data Science Team

Shapiro talks to a four-point framework he’s developed, which he claims:

    • Helps data professionals shape and organize their thinking;
    • Facilitates better communication and collaboration with business leaders;
    • And develops a ‘shared data language’ that empowers business leaders, thereby elevating the value of data teams’ work.

Shapiro’s framework categorizes data projects in four ways, which he says helps data scientists to better describe what they’re doing and why they’re doing it. So there’s:

    • Planning: The use of advanced analytics to make predictions – such as forecasting revenue and CRM deal close rates to underpin accurate planning and budgeting
    • Selecting: The use of analytics to pinpoint and model the attributes of optimal groups for business purposes – such as identifying an ideal customer profile, and then trying to help sales and marketing acquire more of those ideal types of customer orgs
    • Targeting: The use of predictive analytics to forecast behavior and look to change that behavior – such as predicting the likelihood of customer retention rates, and then intervening with selected higher-risk individuals
    • Then there’s Ideating: Using data modeling to identify the factors that might cause better business outcomes – such as improving customer experience or boosting website conversion rates.

So James; as a technologist who has to frequently communicate your value and intent to business folks, do you find this framework useful? Or is this more fluff from a data advisor spruiking their short course? Would you add anything to this framework – or does this particular baby deserve to be thrown out with the bathwater?

James Hunt 15:21
Never say the word “ideate” to me again, Lachlan.

Lachlan James 15:25
It’s hideous. Oh, Lord, I know…

James Hunt 15:28
I hate that word more than synergy.

Look, I think it’s a fine framework for categorizing the type of data science work you’re doing, and coming to a common shared understanding, but it does require that the business side also agrees to the framework’s bounds.

I personally find that business people tend to think along different lines of demarcation than this framework, however. Specifically, when I talk to business folks they are interested in two things: value and risk. Everything else can be explained in terms of these two things, and data scientists would be well-served to be able to align their communications to these two vectors.

Each of these four buckets can be recast in terms of risk and value quite effortlessly:

    • Planning is the reduction of risk by removing unknowns (or at least making them known)
    • Selecting is about identifying value so that it can be risk-managed (keep the high-paying customers happy) or outright increased – if you give a top shopper 20% off, they spend more.
    • Targeting is all about risk. Your example was finding problems and intervening. That’s risk reduction.
    • Finally, let’s call it “Idea Generation”. What’s it all about? Finding new value, or finding new ways to handle risk.


Lachlan James 17:35
I think that’s a pretty good breakdown. I do think that whole concept, from a business-centered perspective, can effectively be broken down across those two ideas, those two vectors of value and risk. That does make a lot of sense.

So, on the subject of frameworks, The Harvard Business Review published one! The authors state that it’s designed to help business and data leaders assess their current analytics capabilities and invest in strengthening them.

How Well Does Your Company Use Analytics
The framework was spurred by stats like this from a 2021 New Vantage study, which found just 24% of executives believe their companies are data-driven.

The assessment framework comprises seven dimensions, rated on a Likert scale, which you can see on the screen now. It covers: Culture; Leadership commitment; Operations and structure; Skills and competencies; Analytics strategy alignment; Proactive market orientation; and Employee empowerment.

James: Do you think this is a useful tool that can help organizations assess their current data literacy, and, hopefully improve how they use data for decision-making? Is there anything you’d change about it?

James Hunt 18:53
Well, halfway through this article I had forgotten two things: one, that it was an article about data-driven-ness, not DevOps-iness; and two, that it wasn’t 2013.

Seriously, this is the exact same conversation we have about digital transformation on the software engineering and delivery side of the house. The culture component has to be there or people won’t do the thing you want them to do; be that data-driven ops or DevOps driven delivery. Culture without leadership is doomed to fail as a grassroots-only campaign. As soon as leadership needs something counter to the direction of desired change, you’re dead in the water.

Support from all over the organization can do nothing about the skills and abilities of the practitioners. I personally want to be a really good guitarist, and my wife supports me in that, but I lack both the raw talent and the dedication to practice. The last three components are mostly a remix of those three.

Lachlan James 20:46
Fair enough. So the authors move on to offer what they term a ‘playbook’ for improving company performance on each of these seven dimensions.

James: Whilst I think listeners should pick apart the detail here individually, were there any nuggets of advice that stood out to you as particularly important? Personally, having worked in the analytics industry, the third tip under ‘honing skills and competencies’ – which suggests organizations should “create and nurture career paths that enable non-technical employees to embrace data and leverage its value” – rings true to me. What caught your attention?

James Hunt 21:25
I didn’t really care much for their advice, and I’ll tell you why. All of it was passive persuasion. For example, they say things like – and I quote – “Help [employees] feel empowered by showing them how analytics fit into their daily activities”.

This is awful. You know who you have to show benefit to? Children. Eat your vegetables to grow big and strong. Clean your room so you won’t lose your toys.

We’re all adults here. I think better advice would be something like: “Make sure that your data-driven efforts are actually in line with the day-to-day responsibilities of your employees.” People love using new tools that actually make their jobs easier, better, faster, and so on. You shouldn’t have to demonstrate the benefit; the benefit should be there, and they should be able to see it, because the tool makes it self-evident how that’s going to happen.

Lachlan James 22:24
James Hunt, ladies and gentlemen: Getting more cantankerous by the day.

James Hunt 22:30
What’s the name of the podcast again? Old man yells at cloud?

Lachlan James 22:35
Exactly, right? Cantankerous old men talk about stuff. But no, I think that is a pretty fair comment.

I do agree that you actually need better drivers than that. So I think that’s pretty fair. I think the seven dimensions are an interesting way to go and work out where you are in that analytics journey. But yeah, I’m not so sure about the part that comes afterwards. I agree.

Cloud trends and assertions

Lachlan James 23:17
With Amazon, Microsoft and Google releasing their quarterly earnings last week, it seems pertinent to discuss some recent cloud trends and assertions.

CRN.com reported that the big three now own 65% of the total worldwide cloud infrastructure services market, according to the latest data published by Synergy Research Group.

AWS vs Microsoft vs Google Cloud Earnings Face-Off

From a growth perspective, SDX Central reported that AWS recorded a 33% increase in cloud services revenue in Q2, while Google’s cloud services increased by 35% and Microsoft’s by 20%, compared to the same time last year.

Amazon, Microsoft, Google Lead Q2 Cloud Growth

So there’s continued massive growth there, despite some more dire economic outlooks for the tech industry as a whole. And on that note, I wanted to talk about a couple of recent articles which relate to cloud infrastructure spend, and point to where it’s been and perhaps where it’s heading, too.

David Linthicum had a bit of an axe to grind on InfoWorld.com during the week: He argues that the industry’s cloud love affair is breeding over-engineered cloud architectures.

Don't overengineer your cloud architecture

He also cites a recent Deloitte study, which indicates that cloud budgets – and effectively leveraging cloud computing to achieve business objectives – often don’t correlate like you might expect.

James: Firstly, do you agree with this notion – this apparent trend of over-engineering cloud solutions and budget being a poor predictor of success?

Deloitte US Future of Cloud Survey Report

James Hunt 25:05
We are really leaning hard into the ‘old man yells at cloud’ here.

So Linthicum opens up the article with an anecdote about repairing old motorcycles. Humblebrags aside, it’s worth zeroing in on that narrative for a bit. He says that some front-wheel braking systems he has duty-of-care for consist of 5 parts, and some of 20. He then goes on to say that the 5-part assembly breaks less frequently.

Curiously, he doesn’t talk about the single-part braking system, whereby you lean down and shove a lead pipe between the spokes of the front wheel. Surely that system is far simpler and less prone to breakage than even the 5-parter, right?

Lachlan James 25:57
Ah, the level of snark is amazing. I get the feeling I know where you’re going here. Sometimes you need more complex solutions for more complex problems. However, I think maybe the truth is somewhere in the middle. But yeah, you can’t just tear-down a more complex solution without looking at what it’s trying to solve. I agree with that.

Secondly, if we accept that there’s some truth to Linthicum’s article, the question then becomes this: What’s driving this over-engineering and disconnect between money spent and value achieved in cloud deployments? Are cloud providers over-hyping and over-complicating things to boost average customer spend? Are people caught up in cloud mania and so adopting poor strategic approaches – i.e., finding a solution before they know what problem they’re solving? Or is it a lack of vendor-agnostic cloud architects in the tech sector? What’s going on here?

James Hunt 26:49
Let’s take those point by point. Are cloud providers overhyping or overcomplicating things to make customers spend more money? No, I think that’s a charitable view of human nature. The reality is S3 alone is enough to make customers spend more money: Put something in a bucket, forget about it for a couple of years, spend money. Spin up EC2 instances, forget they’re running, spend money.
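As a side note, that “forgotten resources” problem is easy to audit. Here’s a minimal sketch – the region, the 30-day threshold and the output format are all illustrative assumptions – that lists EC2 instances that have been running for more than a month:

```python
# Minimal "forgotten instances" audit: list running EC2 instances older
# than 30 days. Region and threshold are illustrative assumptions only.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

pages = ec2.get_paginator("describe_instances").paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for page in pages:
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            if inst["LaunchTime"] < cutoff:  # LaunchTime is timezone-aware
                print(inst["InstanceId"], inst["InstanceType"],
                      inst["LaunchTime"].date())
```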

I think if Amazon really wanted to force customers to spend more money, they would just make the EC2 launch instance button bigger. Do I think people are caught up in cloud mania and so adopt bad strategies? Yes, and no.

I think that it’s a bit disingenuous to say that modern architects and engineers are, quote, ‘finding solutions before they know what the problem is’ – because they’ve been doing that for a very long time. It’s just now we do it in the cloud, versus the data center, or on paper or in you know, whatever happened before paper.

Do I think it’s a lack of cloud architects being vendor-agnostic? No. In fact, I think most of the over-engineering that Linthicum is upset about is people trying to be too vendor-agnostic, so they’re not actually using the more cost-effective, tailor-made solutions from the providers. But it’s also worth taking a step back…

Dan Luu wrote a really nice piece about interactive computer system lag.

Computer latency: 1977-2017

In 1983, the Apple IIe had a keyboard-to-screen lag of about 30ms. Most humans can generally detect a delay of somewhere between 120 and 180 milliseconds, so 30 is way under that. And what that keyboard-to-screen lag measurement means is that within 30 milliseconds of you pressing a key, the computer would do something: Type it back to you, run a program, something.

Contrast the Apple IIe of 1983 with a Lenovo laptop in 2016, where that same round trip took 150ms, or FIVE TIMES as long. Is a 2016 commodity laptop over-engineered? You bet, but that’s the point. It’s over-engineered because it’s a bunch of commodity systems glued together, and we are okay with that because it gets us a cheaper, more portable computing device.

If you look back at the Apple IIe, it had no operating system; it had no software to speak of in that 30-millisecond loop. So yeah, it was fast, but it wasn’t as capable. And I think modern cloud architectures are more complicated because we are trying to solve for different things; notably maintenance and resilience.

Also, when you read these articles, it’s almost always industry veterans saying that the kids of today are over-engineering stuff, right? Their stuff was fine – they had things to work towards, and requirements – but kids these days just spin up a Node.js container and it takes eight gigs of RAM. I think it’s not over-engineered, but it is more complicated.

Lachlan James 30:24
This is perfect. We’ve got another little Simpsons meme, we can put in the background right now. But I think that does actually ring pretty true there.

So whilst over-eagerness to embrace over-engineered architected solutions might play a part in continued profits growth, there’s another area where storm clouds could be gathering for the Big Three.

Protocol.com suggests that as more companies spin up machine learning initiatives, more organizations are also moving compute-intensive machine learning workloads from the cloud back into their own data centers. The article cites concerns over costs, latency and security.

Rolling your own machine learning

James: ML and AI are penciled in as huge growth areas for the major cloud vendors over the coming years. So does this spell trouble for them? Or is this trend simply indicative of the rise of edge computing, which the major cloud platforms are already busy spinning up solutions for?

James Hunt 31:42
No, I don’t think this spells disaster for the big three. I also don’t know about edge. Edge is interesting to me, and there’s a lot of stuff edge can and will do. But let’s look just at the pullback, as it were – the reverse cloud migration. What we’re dealing with right now is another turn in the cycle of IT bundling and unbundling, right?

We had the same concerns about cloud vendors when they started wanting to run our VMs: When EC2 launched and Google Compute started.

Sure, we were okay with outsourcing things like email and web pages to the cloud. Clickstream data tracking? Sure. But for the quote-unquote “real workloads” it had to be on VMs in the data center, right?

Before that, when we first started with virtualization, IT departments were okay with virtualizing lightly loaded servers and services, but the quote-unquote “business-critical” stuff still all had to be on dedicated hardware, right?

I think we’re at the same place with AI/ML. It’s expensive on the cloud, but it’s only going to get cheaper as the central providers amp up economies of scale. It’s cheap in the data center, but it’s only going to get pricier as you factor in power, cooling, staffing, training, hardware refreshes, etc.
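That trade-off is, at heart, a break-even calculation. As a back-of-the-envelope sketch – every figure below is a made-up assumption, not a quote from any provider – it looks something like this:

```python
# Back-of-the-envelope "cloud vs. on-prem ML" break-even sketch.
# EVERY figure here is an illustrative assumption.
CLOUD_MONTHLY = 40_000.0    # assumed monthly cloud GPU spend
ONPREM_CAPEX = 600_000.0    # assumed up-front cost of a GPU cluster
ONPREM_MONTHLY = 12_000.0   # assumed power, cooling and staffing per month
REFRESH_MONTHS = 48         # assumed hardware refresh cycle

# If cloud prices fall as providers scale (James's point), the break-even
# point drifts out toward, or past, the hardware refresh cycle.
for cloud_price_drop in (0.0, 0.25, 0.50):
    saving = CLOUD_MONTHLY * (1 - cloud_price_drop) - ONPREM_MONTHLY
    months = ONPREM_CAPEX / saving if saving > 0 else float("inf")
    verdict = "on-prem pays off" if months < REFRESH_MONTHS else "stay in cloud"
    print(f"cloud {cloud_price_drop:.0%} cheaper -> "
          f"break-even {months:.0f} months ({verdict})")
```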

Lachlan James 33:11
So is this simply just a natural cyclical thing, where new things come on the market, and there’s always a little bit of trepidation?

James Hunt 33:18
If you’re curious why we’re pulling things back into on-prem, it’s actually because we started a couple of AI and machine learning things in the cloud. As small organizations that didn’t have the staff, you know, to build the GPU rigs and all this stuff, it was just easier to try it out in the cloud.

You try it, you find some value, and then you go great: Now how do we scale this? Oh, that’s what it’s going to cost? Okay. Well, for that amount, we can just build a data center, and then we have fixed costs, right? And I think that’s why you’re seeing a pullback. But I think over the next three to five years, you’re going to see more of a push away.

I heard a joke on Twitter the other day, that really made me feel old, keeping in line with the old man yells at cloud motif. And that was this: somebody’s overheard in a tech discussion: ‘Active Directory? You mean Azure AD on-prem!’

Times are changing. And I think that’s where we’re headed.

Lachlan James 34:23
Makes good sense. I think there’s always going to be a little bit of pushback against new things until you hit that adoption curve. Then people realize it’s the way to go, it becomes cheaper and more accessible, and everyone moves on.

When I was thinking about this, another recent article on cloud trends, this time focusing on migration strategies, caught my attention on DevOps.com.

A Guide to Cloud Migration Trends and Strategies
Of the factors defining cloud migration today, environmental sustainability is cited as a major consideration. This leads me onto the final area of today’s discussion… Is AI environmentally unsustainable?

Is AI environmentally unsustainable?

Lachlan James 35:06
So, in the previous episode of Data and DevOps Digest, we looked at some of the moral conundrums posed by artificial intelligence. And as bad timing would have it, just the other day a really decent summary of that conversation, which distills 7 common ethical issues of AI, was published by Hima Pujara on Medium.

Artificial Intelligence - Underlining The 7 Most Common Ethical Issues

It’s an easy 5-minute read and I suggest you check it out – it certainly helped to crystallize some of the ethical unease I sometimes feel about the topic. But what I wanted to highlight in particular was point 6 – Environmental Concerns.

As Pujara points out: The computers powering AI endeavors, and the cloud infrastructure that facilitates them, require immense power. In fact, she goes on to say that “training some AI algorithms can create 17 times more carbon emissions than an average American does in a year”.

I feel like this conundrum will become increasingly pertinent over the mid-term. But, instead of taking the conversation in a fatalistic direction, I wanted to focus on potential solutions.

SiliconANGLE.com attempts to tackle this challenge, citing three roads to achieving environmental sustainability when it comes to AI:

    • Improving AI model efficiency
    • Streamlining compute environments
    • And harnessing AI itself to tackle the problem
How IT leaders can make AI environmentally sustainable

Stephanie Glen offered some additional thoughts when writing for TechTarget’s Search Enterprise AI publication:

    • From abandoning machine learning where its use isn’t absolutely necessary;
    • To deploying models in carbon-friendly regions that don’t rely on fossil fuels;
    • Using federated learning deployments to spread the load;
    • Embracing tinyML to shrink the size of the models trained;
    • As well as simply using consumption metrics to keep track of power-hungry GPUs – something like the sketch below.
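On that last suggestion, here’s a minimal sketch of what GPU consumption tracking can look like. The one-minute polling interval and CSV output are illustrative choices, and it assumes an NVIDIA GPU with nvidia-smi installed:

```python
# Minimal GPU power-draw logger: polls nvidia-smi once a minute and
# appends readings to a CSV. Interval and output are illustrative choices.
import csv
import subprocess
import time
from datetime import datetime, timezone

def gpu_power_watts():
    """Current power draw, in watts, for each visible NVIDIA GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return [float(line) for line in out.strip().splitlines()]

with open("gpu_power_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp_utc", "gpu_index", "power_watts"])
    while True:  # in practice, stop when the training job finishes
        stamp = datetime.now(timezone.utc).isoformat()
        for idx, watts in enumerate(gpu_power_watts()):
            writer.writerow([stamp, idx, watts])
        f.flush()
        time.sleep(60)
```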

James: What do you make of these suggestions? Are they realistic? Can they make a big enough difference? Or are we fiddling at the edges – pun definitely intended – to soothe our collective conscience?

Green AI tackles effects of AI, ML on climate change

James Hunt 37:20
Have you seen Silicon Valley all the way through to its finale?

Lachlan James 37:25
I have not seen the finale. You’re about to spoil it for people. Anyway.

James Hunt 37:29
I’ll try not to spoil it too much. So in the first couple of seasons, Gilfoyle builds Anton, right? The distributed grid – essentially a bunch of motherboards on wire shelves – that runs Pied Piper.

And towards the end, after Anton goes out the back of a U-Haul, he builds Son of Anton – which starts to get into the Data Science and AI bits. And they actually task Son of Anton with getting better at being Son of Anton. They give him the compression algorithm. And let’s just say it’s not great. It’s good for him, Son of Anton, but not great for everybody else. So teaching AI to make AI better may not be the thing we want to do. Probably okay; a little bit of a grey area.

But I think making things more efficient is the tool people always reach for when dealing with perceived waste, but you have to be careful there. The faster we make Internet access speeds, the more online we are. The cheaper we make door-to-door delivery, the more delivery we consume.

While I 100% agree that if you don’t need Machine Learning you shouldn’t be using Machine Learning, I do think it can be hard to determine if you do or don’t need it in the first place. Often, you have to try it and find out it’s not doing what you want. If lots of people do that, we’re wasting the power and cooling resources.

I think it’s also good to keep things in perspective. Carbon emissions from all data centers across the globe are estimated at anywhere from 2.5% to 3.7% of the global total, according to an article by Climatiq. By comparison, 27% of global carbon emissions can be attributed to transportation. Fuel efficiency and alternate fuel sources will do way more to move the needle on the climate crisis.

Measuring greenhouse gas emissions in data centres - the environmental impact of cloud computing
That’s not to say you can’t do your part in AI/ML. My biggest concern there, specifically with the “use regions that don’t rely on fossil fuels” advice, is the complacency it breeds. Do you know where Google, Amazon, and Microsoft get their electricity? And if they told you, how would you verify it? And how do you know that where they’re getting their electricity from is clean, right? They might be buying it from a carbon-neutral place, which is just buying carbon offsets to keep burning coal.

This stuff’s complicated, and there’s going to be no easy fix. But I think it’s worth having the discussion.

Your homework

Lachlan James 41:55
Alright, that’s our last discussion piece for today but, because there was such a good variety of news and analysis pieces out over the past two weeks, I’ve added a further reading list that you should check out at the end of this podcast.

My favorite is Bill Schmarzo’s piece on Data Science Central, which offers one of the best frameworks I’ve seen for helping data professionals quantify the value of business data, as well as the analytical initiatives that extract insights from that data.

What’s the Value of my Data

Until next time

Lachlan James 43:06
Ok, we’re officially out of time for episode 5 of Data and DevOps Digest, brought to you by Vivanti Consulting. James, thanks for your time. For those watching, thanks for tuning in.

To receive regular Data and DevOps news and analysis, subscribe to our YouTube Channel, or go to vivanti.com/contact-us, and sign-up to our mailing list.

It’s been a pleasure to have your company; bye for now!
