I came across Judea Pearl’s ideas on causality while browsing YouTube for videos about the mathematics of machine learning and physics. A few weeks later I decided to borrow his book “The Book of Why” from my local library and delve deeper into his ideas on causality. I found the book easy to read, and I felt he did a great job of explaining the complexity of causality in easy-to-understand terms. The book was highly thought-provoking, and I have spent several weeks investigating the challenge he posed in it, hence this post.
After being involved with machine learning for a while, I found myself both in awe of how beautiful the mathematics is and disappointed at how unintelligent the algorithms are. Machine learning models are extremely good at finding patterns in the data available to them, but not very good at understanding whether the data is actually related in any meaningful way. Judea touches on this issue a number of times in his book; the examples I felt illustrated it best were:
- The rooster and sun analogy - a machine learning algorithm shown data about a rooster and the sun would be able to very accurately predict the probability of a rooster waking up the sun.
- The Adam and Eve example - why they ate the forbidden fruit.
The Rooster and the Sun
As I said above, a machine learning algorithm shown data about the rooster and the sun would be very good at understanding how roosters and suns interact, and could predict very closely what would happen should the rooster crow. But being able to predict correlation (causal or not) isn’t intelligence; it’s simply modeling a reality based on the data the algorithm is provided.
We could patch this missing causality by teaching the algorithm to ask how, why, or even whether the rooster wakes up the sun. The algorithm could find, or a human being could provide, more data in the hope of getting closer to the real answer. If the algorithm knew to look for more data, its correlations would likely approach causation the more data it was fed. The drawbacks are:
- It does not seem computationally efficient or feasible to expect the algorithm to research every possible scenario and then come to a conclusion. It would be much more efficient for it to make an informed guess and then go down that route.
- After its research it may be able to model the data correctly, but it would be reliant on observing things after they happened; it would never be able to consider things outside the known reality. This could be solved by it asking questions of other machine learning algorithms or of human beings, but it would then be at risk of getting an incorrect answer or no answer. Either way, it wouldn’t create an intelligent algorithm; it would create an algorithm that is good at modeling reality based on the data it was provided.
Adam and Eve
Judea uses the example of Adam and Eve providing God with the reason why they ate the forbidden fruit. It occurs to me that this example does not fit the logical model of causality: Adam and Eve provide a reason, but a reason isn’t a why. A reason is just one guess out of many options; it isn’t enlightened, nor does it consider all the information available. For example, they do not exhibit the ability to self-reflect. This sounds very similar to what I would expect from a slightly more advanced machine learning model: a simple guessing machine that was sometimes right, and one that didn’t take responsibility for its actions or even understand the ramifications. It just acted.
When I came to the above realisation, I felt that maybe I had misunderstood the point Judea was trying to make, and that is definitely still possible. As I considered his book further (I’ve borrowed it twice), I felt there is a significant amount of information missing from it. Judea admits this, but I feel it would have been helpful to discuss some of the major challenges to automating causality. I believe that Judea and his students have discovered much about the mathematics of causality, but it is only the tip of the iceberg, and the equations still require human intelligence.
After reading the book the first time, I started mapping out how an algorithm could be programmed to find answers to questions or unknowns. I created some fake scenarios (much like the ones in the book) and considered how an algorithm might work out how each object interacted, or whether they interacted at all. I started drawing out the diagrams that Judea put in his book, but quickly realised that the algorithm needed to create these diagrams itself, and that makes the challenge much bigger.
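For concreteness, those diagrams can be represented in code as a directed graph where edges point from cause to effect. This is my own minimal sketch, not something from the book, and the graph here is hand-written; the hard part the post is about is getting an algorithm to construct it from observation in the first place.

```python
# A causal diagram as an adjacency list: edges point from cause to effect.
# The structure is assumed (hand-written), mirroring the rooster/sun example.
causal_diagram = {
    "sun_rises": ["rooster_crows"],  # the sunrise causes the rooster to crow
    "rooster_crows": [],             # crowing causes nothing in this diagram
}

def direct_causes(diagram, effect):
    """Return every variable with an edge pointing at `effect`."""
    return [cause for cause, effects in diagram.items() if effect in effects]

print(direct_causes(causal_diagram, "rooster_crows"))  # ['sun_rises']
```

Once the arrows exist, reading off causes is trivial; nothing in the data alone tells an algorithm which way to draw them.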
I took a step back and considered a very simple reality containing four objects: A, B, C and D. How might an algorithm observe the objects and consider what the information it observed tells it about the reality it is observing and the relations (if any) between the objects? How would an algorithm know how each object interacts, if they interact at all? How would an algorithm decide whether correlation equaled causation: was it direct interaction, just a coincidence, or something in between? These questions provided only a small amount of insight; I felt like I needed to tackle the problem another way.
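To make the easy part of that thought experiment concrete, here is a toy sketch (my own construction, with made-up data) of the one step that is straightforward to automate: measuring pairwise correlation between the four objects. By construction, B depends on A while C and D are independent, but the algorithm is not told this.

```python
import random

random.seed(0)

# Toy observations of the four objects. B follows A with noise; C and D
# are independent of everything. The "algorithm" only sees the numbers.
n = 1000
A = [random.gauss(0, 1) for _ in range(n)]
B = [0.8 * a + random.gauss(0, 0.3) for a in A]
C = [random.gauss(0, 1) for _ in range(n)]
D = [random.gauss(0, 1) for _ in range(n)]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

data = {"A": A, "B": B, "C": C, "D": D}
names = list(data)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        r = pearson(data[p], data[q])
        if abs(r) > 0.5:  # arbitrary threshold for "related"
            print(f"{p} and {q} appear related (r = {r:.2f})")
```

Notice that the result is symmetric: the same numbers would appear if B caused A, or if both were driven by something unobserved. That symmetry is the rooster-and-sun problem in miniature, and it is exactly where this approach stops helping.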
Comparison between humans and machine learning
If you consider how most machine learning vision algorithms work, they are trained to detect objects. Consider four billiard balls on a pool table, rolling around and doing what billiard balls do, with a human and a machine learning algorithm watching. Let’s compare their abilities:
- Both the human and the algorithm can detect all the balls on the table with ease.
- The human can estimate distances, but the algorithm has no spatial understanding; it just looks for and detects the balls.
- Detection is where the algorithm’s ability to understand reality stops. Even if the algorithm can detect changes in movement, it has no way of knowing whether a change in direction was produced by an interaction; it can only assume correlation equals causation.
- Neither the human nor the algorithm has any way of knowing how the balls started moving. The best the human can do is guess; the algorithm cannot even do that.
- When two balls inevitably collide, the human understands the context and is aware that balls can interact, but the algorithm can only assume.
Let’s consider the same scenario, but where the table was being recorded before the human and an algorithm (one with the same abilities as a human) started viewing it. Both are asked to tell us why the balls are moving the way they are, and both are aware that the only interactions possible are between the balls themselves and the walls of the pool table. Both would need to calculate the current trajectories of the balls and compute the possible interactions that could have occurred to result in the balls moving the way they are. It is likely that their answers would be different, and both correct based on the information they have. Their answers can be easily checked for “correctness”, but the time to find a possible “correct” answer is much higher than the time to check one. I put “correct” and “correctness” in double quotes because the human’s and the algorithm’s answers are likely to be different yet both correct given the information they know, while still being incorrect based on the video evidence.
Thinking about the last paragraph further, it seems to me that finding the correct answer has the shape of an NP problem: a candidate answer is quick to verify, but finding one requires a search that may be intractable. If this is the case, how are human beings able to solve such problems?
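The asymmetry behind that intuition can be shown with a toy sketch (my own, with a deliberately fake “physics”): verifying a proposed collision history against the observed final state is cheap, while finding one means searching an exponentially growing space of histories.

```python
from itertools import product

# Deliberately toy physics: a ball's final velocity is its initial velocity
# plus a fixed kick for every collision it took part in. We observe final
# velocities and want some collision history that explains them.
KICK = 1.0
initial = {"ball1": 1.0, "ball2": 0.0}
observed_final = {"ball1": 3.0, "ball2": 1.0}
balls = ["ball1", "ball2"]

def verify(history, initial, observed):
    """Cheap check: does this history reproduce the observed final state?"""
    v = dict(initial)
    for ball in history:
        v[ball] += KICK
    return all(abs(v[b] - observed[b]) < 1e-9 for b in observed)

def search(max_events):
    """Expensive part: brute-force all histories up to max_events long."""
    for length in range(max_events + 1):
        for history in product(balls, repeat=length):
            if verify(history, initial, observed_final):
                return history
    return None

print(search(4))  # ('ball1', 'ball1', 'ball2')
```

Note that `('ball1', 'ball2', 'ball1')` would verify just as well: several histories explain the same observation, which mirrors the human and the algorithm giving different but equally “correct” answers above.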
I decided to see if I could find any evidence of how humans manage to solve these hard search problems by considering human behavior. I’ve mentioned in other posts that I studied human psychology, human behavior and neurology for several years before I got into machine learning, so I have some idea of how we work. Much of human behavior can be contradictory, so comparisons of human behavior feel more like philosophical endeavors than mathematical ones. Looking for anomalies in human behavior could provide insights into how we think and allow us to find out how we process reality. Some example anomalies that come to mind:
- We often act in direct conflict with our own life goals.
- How are people able to act like victims, view themselves as not victims, and then go about creating victims themselves?
- Why are there extremely positive and extremely negative people? How are these people able to view such negativity or positivity as normal, acceptable behavior?
- Human beings are very smart and can solve problems very quickly, yet the world is full of huge problems that do not result in human action.
I decided to look at how human beings solve problems by considering our relationship to problem solving (causality). Before I do that, I think it is important to define what a problem is, by looking at problems from a number of different perspectives versus the human perspective:
- The universe doesn’t seem to care about problems, it just is.
- Dogs don’t care about problems; they care about their own safety and survival, but they don’t consider anything a problem.
- Very few animals consider problems; sure, they can assess interactions, but an interaction is not considered a problem, it just is what it is.
- Crows are one of the few animals that can solve problems, but they also seem to create problems (by seemingly holding grudges).
If problems are not problems (and we are at least subconsciously aware of that fact), what drives humans to create and solve them? Is it our need to make a difference, to give life meaning? Is it because we are constantly imagining events as problems? Is it because we have a fundamental need to create and imagine, and the pathways that solve problems sometimes create them?
Human beings have an unusual relationship with problems, we see them, we imagine them, we solve them and we create them by the millions every day:
- We get offended by others’ behavior and take it personally, creating internal and external problems
- We look for minute differences between each other and completely disregard how similar we all are
- We look at the world and see the need for a helicopter or space travel and then go about solving it
- We create widgets of all shapes and sizes to solve the most obscure of problems
- Many of us have difficulty understanding the difference between a problem people want solved and one that humans are happy to have exist - this raises the question of whether it is a problem at all, or something to distract us.
Why do we behave this way? What does this behavior tell us about our relationship with problems? Human beings seem to love problems: we love creating them, but we also love solving them. If problem solving (actually solving a problem, not creating a new one) is part of our DNA, then why are there so many problems in the world? Would it be better to describe humans as problem creators? Is our ability to solve problems proportional to our ability to create them?
Going back to the Adam and Eve example, they did not provide a definitive answer to “why”; they provided an excuse. If Eve had said “I ate the fruit because I wanted to be as smart and powerful as you, God”, it would be pretty hard to argue against that, but human beings rarely consider all the information available; we cherry-pick. In the Adam and Eve example, they tried to absolve themselves of guilt by ignoring facts, but at the same time they created guilt by not being truthful.
Another interesting behavior that I think provides further insight into human problem solving is flat earth philosophy. There are many people in the world who believe the earth is flat. It is hard for most of us to understand how they can hold this belief, but they do, and they are very passionate about it. How are they able to believe the earth is flat? What thought process is going on here? I believe it is a similar thought process to the invention of the helicopter: we imagine a reality that doesn’t exist, then cherry-pick information to prove that our imagined reality is possible (or already exists) in an effort to make the imagined reality real. This worked for helicopters; so far it hasn’t worked for the flat earthers.
Considering the above few paragraphs, it doesn’t seem like we are solving problems; we are simply guessing at possible reasons using our knowledge of context obtained from past experiences (the illusion of free will comes to mind). It does seem that we use our brains more for creating and solving problems than for observing the universe as it is. If all we are doing is guessing by using our imagination, I am not sure we should program machines to solve problems the way human beings do. Imagine the ramifications of creating machines with the ability to both create and solve problems like human beings. That said, without imagination, how will a machine solve a problem? And if we program a machine to have an imagination, how can we constrain it so it does not create problems at the same levels as extreme human behavior?
I believe there are many insights to be gained from looking at how human beings solve problems. Our ability to guess at a solution is much more advanced than that of any other animal, so there are certainly insights available. That said, if we solve problems by continually imagining possible realities, we need to work out how to constrain the guesses to stay within the scope of morality.