Lesson Contents
01. Machine Learning Lesson 1.01: Introduction - What is Machine Learning?
02. What Does Someone Who Does Machine Learning Actually Do?
03. Machine Learning (ML) vs Artificial Intelligence (AI)
04. What is a Data Scientist?
05. What Are Some Qualities of a Good Data Scientist?
06. Why is Machine Learning so Hyped? Is it Legit?
07. Different Types of Machine Learning
08. What Are The Consequences of Not Integrating Domain Knowledge Into The Machine Learning Process?
01. Machine Learning Lesson 1.01: Introduction - What is Machine Learning?
OK, so what is machine learning? There are lots of definitions out there and not everyone agrees on what it actually means. But there are a few definitions that I think are pretty solid, so we'll go through them now. One definition is: algorithms and statistical models that computer systems use to perform a specific task effectively without using explicit instructions. Another definition is: a set of methods that computers use to make and improve predictions of behaviors based on data. Another definition is: machine learning is the science of getting computers to learn and act like humans do, and improve their learning over time in an autonomous fashion, by feeding them data and information in the form of observations and real-world interactions. So basically, without machine learning we have to write the program. We have to program the computer to do exactly what we want. And with machine learning, we're using these special algorithms that people have come up with to feed the computer a bunch of data and have it learn the patterns, or learn the program, on its own.
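The contrast between writing the program and having the computer learn it from data can be sketched in a few lines. This is only a minimal illustration, not anything from the lesson itself: the data points, the function names (`explicit_rule`, `fit_line`, `learned_rule`) and the choice of a straight-line fit by ordinary least squares are all assumptions made for the example.

```python
# Traditional programming: a person writes the rule down explicitly.
def explicit_rule(x):
    return 2.0 * x + 1.0


# Machine learning: the rule (here, a slope and an intercept) is estimated from data.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations, invented for illustration (they happen to follow y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)


def learned_rule(x):
    # Uses parameters recovered from the data rather than typed in by a person.
    return slope * x + intercept


print("explicit:", explicit_rule(10.0), "learned:", learned_rule(10.0))
```

In the first function a person supplied the numbers 2 and 1; in the second case the same numbers come out of the data. Real problems have noisy data and far more complicated patterns, but the division of labor is the same.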
02. What Does Someone Who Does Machine Learning Actually Do?
So what does someone who does machine learning actually do? Again, this is something that is not agreed on by everyone, and there are a lot of different thoughts on what a machine learning practitioner actually does. One of the things that comes up a lot is that it's sort of a combination of statistics and computer science. Some people say, if you use machine learning in your work, you're probably someone who is not quite as good at programming as a real computer scientist and not quite as good at statistics as a real statistician, but you can do both reasonably well. And I would say in the energy industry in particular, and probably other industries too, domain knowledge is a really key part of being a good machine learning practitioner, as well as good communication skills. So you need to be able to take what you've built, your model or whatever, and understand it in the context of the technical space you're working in. In our case, if you're building machine learning models for reservoir engineering and you don't know anything about reservoir engineering, your model is probably going to be lacking in some way. And then the other angle of that is you have to be able to communicate it effectively to other reservoir engineers and managers. So I would say for effective machine learning in the energy industry, domain knowledge and good communication are at least as important as those first two. And there are lots of other things. I guess, in my mind, one of the things that makes a really good machine learning practitioner or data scientist is that you need to be curious.
So if you're the kind of person who gets told something and just accepts it at face value, and you're not interested in understanding why it happens or what alternative interpretations might be, then you might not get as far in the machine learning space as someone who's really curious, willing to experiment, is patient with iterations, and has maybe some hacking skills or at least likes the idea of hacking things together. And then there's math and probability, which relates to the statistics piece. And software engineering, because obviously, at the end of the day, you have to write some code to make it all work.
03. Machine Learning (ML) vs Artificial Intelligence (AI)
What is machine learning versus artificial intelligence? This is, again, something that has a relatively low degree of agreement out there, and there is no widely accepted definition. AI, artificial intelligence: one description is that it's a science or approach to developing technology that works like a human. So it learns by experience and it kind of has that element of, you know, it's like us, it's like our brain. It's kind of that conceptual label that people put on things. Machine learning, most people would consider it a subset of AI, and it focuses on algorithms that learn useful patterns or models from datasets. And then there's deep learning, which is a newer term, though not that new anymore. But it's basically just machine learning with multi-layered neural networks. So neural networks are one type of machine learning algorithm or structure, and neural networks with lots of layers are usually considered deep learning. And one of the key factors with deep learning is that it requires lots and lots of data. So in oil and gas, with most of the data that we're used to working with, if you take seismic and maybe image processing out of the equation, I would say deep learning is usually less applicable, certainly to the type of stuff that I'm going to focus on in this course for the most part.
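To make "multi-layered" a bit more concrete, here is a rough sketch of data passing through a small stack of neural network layers. Everything in it is an assumption for illustration: the weights are hand-picked rather than learned, the `tanh` activation is just one common choice, and a real deep learning model would have many more layers and be trained with a dedicated library.

```python
import math


def dense_layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn; more layers means a 'deeper' network."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical, hand-picked weights. In practice these are learned from (lots of) data.
layers = [
    ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 neurons
    ([[0.2, 0.2, 0.2], [-0.3, 0.1, 0.4]], [0.0, 0.0]),           # 3 neurons -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                                      # 2 neurons -> 1 output
]

output = forward([1.0, 2.0], layers)
print(len(layers), "layers, output:", output)
```

Each extra layer adds more weights to estimate, which is one way to see why these models tend to need a lot of data.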
04. What is a Data Scientist?
So what is a data scientist? Lately, I would say a data scientist is the name for anyone in a technical profession who's currently job hunting. And that's because I know a lot of people like that right now, and if you look at the number of data scientists out there on LinkedIn or wherever else, it seems to be just absolutely exploding. So a lot of people who used to be technical professionals in some other field are now simply calling themselves data scientists because they think it'll be an effective way to get a job. And I can't blame them. And honestly, maybe you could argue that I'm someone like that, because I have no formal data science training. Really, I guess again there's no one definition, but I would say it's someone whose role combines statistics, computer science, math and ideally domain knowledge. It's also, I would say, someone who can wrangle data, extract meaning from it, and communicate that meaning to other people using sound, defensible methods and in a language that they can understand. So right now, data science and data scientist will mean a lot of different things to a lot of different people. Again, there's no one commonly accepted definition. There is even substantial disagreement among well-known, recognized data scientists as to what a data scientist is. There are also lots of articles online complaining about how everyone is calling themselves a data scientist when they aren't really a data scientist, and that only a small minority of these people actually meet what the particular author considers to be a legit data scientist. The problem, again, is that different authors have different ideas of what is legit, and sometimes those don't even overlap. There are also other related roles out there in addition to data scientist, like data engineer, data analyst, and probably lots of others. And there's no one governing body that hands out data scientist licenses.
Like here in Alberta, we have APEGA, which basically says you are a professional engineer and you're licensed, or you're not. And there's nothing like that right now for data scientists. So there's nothing stopping anyone from calling themselves a data scientist. And again, for me, I think I probably would meet most definitions of a data scientist, but I actually have no idea.
05. What Are Some Qualities of a Good Data Scientist?
So what are some qualities of a good data scientist? In my mind, domain knowledge is absolutely key for a data scientist to be useful. So if you want to be useful as a data scientist in the oil and gas industry, I think it's important to know something about the industry. So again, whether you're using data science and machine learning techniques for geological understanding, reservoir engineering understanding, production performance prediction, or optimization, it's important to know something about the terminology and the physical realities of what we're talking about.
06. Why is Machine Learning so Hyped? Is it Legit?
So why is machine learning so hyped? Is it a legitimate technology that we can expect to be around for a long time? Is it something that's just the latest fad and that's why we're making a video about it? Or is it something that's here to stay? So this is the Gartner Hype Cycle for Emerging Technologies from back in 2015, and machine learning is pretty close to what they call the "peak of inflated expectations". So it's arguably just starting its descent down into the trough of disillusionment. And for the technologies that make it, the idea is that they will then turn the corner at the bottom of the trough of disillusionment, start to make their way up the slope of enlightenment, and reach a plateau of productivity. So this is obviously a hilariously simplified way of looking at the world, but the thing to note, I guess, is that back in 2015 machine learning was more or less at the peak of inflated expectations.
07. Different Types of Machine Learning
So there are a few different types of machine learning. The three categories that machine learning algorithms are usually lumped into are supervised learning, unsupervised learning and reinforcement learning.
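As a rough sketch of the difference between the first two categories: supervised learning learns from examples that come with known answers (labels), while unsupervised learning looks for structure in data that has no labels. The code below is an illustration under stated assumptions, not material from the lesson. The data values, the labels, and the two toy algorithms (a one-nearest-neighbor classifier and a two-cluster k-means on a single feature) were chosen only because they are short. Reinforcement learning, where an agent learns from rewards it receives for its actions, is left out to keep the sketch small.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: predict the label of the closest labeled training example."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means with k=2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical measurements, invented for illustration.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]

# Supervised: every training value has a label supplied by a person.
labels = ["low", "low", "low", "high", "high", "high"]
prediction = nearest_neighbor_predict(values, labels, 4.9)

# Unsupervised: the same values with no labels; the algorithm finds the two groups itself.
centers, groups = two_means(values)

print("supervised prediction:", prediction)
print("unsupervised cluster centers:", centers)
```

Note that the clustering step never sees the words "low" or "high". It only discovers that the values fall into two groups, and it is up to a person to decide what those groups mean.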
08. What Are The Consequences of Not Integrating Domain Knowledge Into The Machine Learning Process?
I think if you don't integrate domain knowledge, it's easier to fall victim to a few different kinds of traps. First of all, spurious correlations are easy to come by, and you may not recognize that they're spurious. So something like the phase of the moon affecting your well productivity results is something that might show up as a strong feature in the model, when obviously we know that there's no causal link there. The other thing is it's difficult to know what features we might be able to come up with out of the data. So with engineered features, we might take the raw data and say, if we look at it like this, that's probably physically relevant. Or that's maybe a good way to look at things that would be intuitive and interpretable to someone who's looking at the results of the model. If we don't have domain expertise, it's really hard to come up with those features. There are automated feature engineering methods and algorithms out there, but they usually just try to throw everything at the wall and see what sticks. And sometimes what sticks is, again, very unintuitive or just not particularly meaningful to someone who's interpreting the results. And usually we can find better links to causality with domain knowledge. So say we're trying to understand what actually drives performance for a well, or what is actually predictive of, let's say, core porosity or something like that. With domain knowledge, we can piece together what we know from the physical relationships and see if that's showing through in the model. Or if we think that some things are shining through in the model that we strongly suspect are not causal, we might want to immediately go investigate those or filter those out to make sure we're not falling into a trap there.
And really, at the end of the day, it's also important just for the ability to convince stakeholders of the value of the analysis or the predictions generated from the models. So if our features and the whole process are rooted in domain knowledge, we can communicate the modeling process in those terms, and we can link it back to some of the first principles that are already widely accepted, and make it much easier to get buy-in from management, let's say, in actually using this model to make decisions.
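The spurious-correlation trap described above is easy to reproduce with made-up numbers. The sketch below is an illustration under assumptions, not an analysis of real well data: the well count, the number of candidate features, the random seed and all variable names are invented. It generates a "productivity" value for a handful of wells plus a large number of features that are pure noise, then reports the noise feature that correlates most strongly with productivity.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_wells = 10            # hypothetical: only a few wells
n_features = 200        # hypothetical: many candidate features

# Everything here is random noise; no feature has any real link to productivity.
productivity = [rng.gauss(0.0, 1.0) for _ in range(n_wells)]
noise_features = [
    [rng.gauss(0.0, 1.0) for _ in range(n_wells)] for _ in range(n_features)
]

correlations = [pearson(feature, productivity) for feature in noise_features]
strongest = max(correlations, key=abs)

print("strongest correlation among pure-noise features:", round(strongest, 2))
```

With few wells and many candidate features, at least one noise feature will usually look strongly correlated with the target purely by chance. A model has no way of knowing that feature is the "phase of the moon"; someone with domain knowledge does.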