Issue 126 - June 14th 2022
It's been a while (day job, vacation, blah, blah, blah...), which means that there is plenty to show you this month, including some pretty major advances in AI. So let's dive right in...
DALL-E 2 vs Imagen vs the world
Both OpenAI and Google announced major text-to-image generation models in the last month. OpenAI's is a v2 of their DALL-E model, which I originally wrote about back in Issue 115 in January 2021, and which 'generates more realistic and accurate images with 4x greater resolution'. The image in the header of this newsletter was created by DALL-E 2 from the description 'An astronaut riding a horse, in a photorealistic style'. The AI is able to make different versions from the same description and in different styles, so I could have asked for a pencil drawing or a picture in the style of Andy Warhol. Here's another example, and one more, from OpenAI's co-founder, Sam Altman.

Google's Imagen model does pretty much the same stuff as DALL-E 2, but, apparently, slightly better. The image that accompanies this article was generated by asking for 'A photo of a raccoon wearing an astronaut helmet, looking out of the window at night' - there are lots more examples on their site and you can generate some limited examples yourself.

Neither model has been released for use by the general public, or even by researchers, although it would seem that OpenAI plans to do this at some point. Google has no plans to release Imagen for the time being, if ever. But why, when these tools could be so useful to graphic designers, marketers, educators, etc., would you not want to let everyone have a go? The answer lies in the messaging at the bottom of the Imagen home page, where it talks about the 'limitations and societal impact' of the model. The biggest issue is that both models were trained on very large, public image sets which (in Google's own words) 'often reflect social stereotypes, oppressive viewpoints, and derogatory, or otherwise harmful, associations to marginalized identity groups'. One of the reasons that many of the sample images are of cute animals doing cute things is that Imagen is simply not that good at generating images of people. But, more worryingly, when it does so, 'Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes'. This is not the sort of AI world that we want to live in (we have enough trouble with real people doing this without the machines reinforcing those behaviours).

OpenAI are (ironically, considering their name) less open about DALL-E's limitations - they focus instead on the restrictions that they will enforce on the model: 'Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies'. The problem here is: who defines what is good content and what is bad? Should we really leave that to corporations founded by billionaires? Behind all those cute panda pictures lies a big problem that is not going to be solved easily or soon.
Gato-aid
Another big advance in AI was also announced last month, and this one is arguably much more important and impactful. DeepMind took the wrappers off Gato, a 'generalist' AI model. Just as Einstein's 'General' Theory of Relativity is much more consequential than his 'Special' theory, an AI model that is able to do many things (Gato can do 604 different tasks, including playing Atari, captioning images, chatting and stacking blocks with a robot arm) is significantly more important than pretty much all other AI models, which are inherently specialist. Currently, most AI models forget what they already know when they are trained on something new (a problem called 'catastrophic forgetting'), so they need to be completely retrained to learn a new task. Gato is able to learn multiple tasks at once, which means it can switch between them without having to forget one before learning another. That doesn't sound like a big deal but, in the move toward Artificial General Intelligence, it is significant. Now, we have to be careful we don't get carried away with the hype, because Gato is very far away from being AGI - it can't adapt to tasks it hasn't learnt, for example, and it is very much a 'Jack of all trades, but master of none' - but it is one small step closer. And, if we never even get to AGI (which is my suspicion), models like Gato will still be really useful tools in their own right.
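For the technically curious, the trick behind Gato's generality is surprisingly simple in outline: every task - text, images, robot actions - is serialised into a single stream of tokens, and one model is trained on the mixture, so a 'new' task is just more sequences rather than a separate network. Here is a minimal, hypothetical sketch of that idea in Python (the tokenisers and episodes are toy stand-ins, not DeepMind's actual code):

```python
# A minimal sketch (not DeepMind's code) of Gato's core idea: serialise
# every task into one token stream and train a single model on the mixture.
import random

VOCAB = {"<text>": 0, "<image>": 1, "<action>": 2}  # hypothetical modality markers

def tokenise_text(s):
    # Toy stand-in for a real subword tokeniser.
    return [VOCAB["<text>"]] + [ord(c) % 256 + 3 for c in s]

def tokenise_image(pixels):
    # Gato discretises image patches; here we just bucket pixel values in [0, 1].
    return [VOCAB["<image>"]] + [int(p * 255) + 3 for p in pixels]

def tokenise_action(joints):
    # Continuous robot controls in [-1, 1] are binned into discrete tokens.
    return [VOCAB["<action>"]] + [int((j + 1) * 127) + 3 for j in joints]

def make_training_stream(episodes):
    # Interleave episodes from different tasks into one flat stream:
    # the model never knows (or cares) which 'specialist' domain comes next.
    stream = []
    for ep in episodes:
        if ep["kind"] == "caption":
            stream += tokenise_image(ep["pixels"]) + tokenise_text(ep["caption"])
        elif ep["kind"] == "control":
            stream += tokenise_image(ep["pixels"]) + tokenise_action(ep["actions"])
    return stream

episodes = [
    {"kind": "caption", "pixels": [0.1, 0.9], "caption": "a cat"},
    {"kind": "control", "pixels": [0.5, 0.2], "actions": [0.3, -0.7]},
]
random.shuffle(episodes)  # mixing tasks in every batch is what avoids forgetting
print(make_training_stream(episodes))
```

Because every task looks the same to the model - just a sequence of tokens to predict - there is no separate 'Atari network' or 'captioning network' to overwrite, which is why the forgetting problem largely disappears.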
I feel the need... the need for speed
Everyone loves a robot video, and this one of a 'cheetah' robot is no exception. The robot, called Mini Cheetah, has been optimised for speed, and it can maintain that speed across difficult terrain rather than just in controlled environments. You could hardly describe it as elegant, but much of that is down to the way it has been trained. The researchers at MIT used reinforcement learning, so the model taught itself how to cope with any scenario rather than having each scenario trained into it. This also meant that it was unencumbered by human guidance and was free to develop its own 'style' of running. It seems to do its job remarkably well, including running over ice, catching itself when it trips and, even with a broken leg, finding a way to hobble on as fast as possible.
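To give a flavour of how that kind of training works, here is a heavily simplified, hypothetical sketch (MIT's actual work trains a neural-network policy in a physics simulator; the simulator, gait parameters and search method below are toy stand-ins). The key point is that the learner is only ever scored on speed, so whatever gait scores best is kept, however inelegant it looks:

```python
# A minimal sketch of the reinforcement-learning idea, not MIT's code:
# reward forward speed, penalise falling, and let search find the gait.
import random

def simulate(gait_params):
    # Hypothetical stand-in for a physics simulator: returns the forward
    # velocity a gait achieves, with noise standing in for rough terrain.
    ideal = [0.8, -0.3, 0.5]
    error = sum((g - i) ** 2 for g, i in zip(gait_params, ideal))
    return max(0.0, 5.0 - error) + random.gauss(0, 0.1)

def reward(velocity, fell=False):
    # Speed is rewarded, falling is penalised - running 'style' is never specified.
    return -10.0 if fell else velocity

policy = [0.0, 0.0, 0.0]  # gait parameters the learner is free to shape
best = reward(simulate(policy))
for step in range(2000):
    candidate = [p + random.gauss(0, 0.05) for p in policy]
    r = reward(simulate(candidate))
    if r > best:  # keep whatever runs faster, however ungainly
        policy, best = candidate, r
print(f"learned gait {policy} with reward {best:.2f}")
```

Because nothing in the reward says 'run like a cheetah', the result is fast but awkward - exactly what you see in the video.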
NeRF-ing USA
Researchers from Waymo, the self-driving car company, have created photo-realistic 3D renditions of parts of San Francisco using 2.8 million photos and an AI algorithm they call Block-NeRF. Neural Radiance Fields (NeRF) is a technique where an AI learns a scene from a large number of photos and fills in the gaps between them to generate a 3D environment. Once it has been created, the user doesn't have to follow the routes that the original cars took, but can fly over and explore the area from any angle, and can see it at different times of day or in different weather conditions. This is obviously really useful for autonomous driving, but also for aerial surveying. Their showreel video shows how they did it all, as well as some examples of the outputs. One can imagine that this technique will spread very quickly, so that soon you will be able to travel the whole world from the comfort of your own home.
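If you're curious about the mechanics, the core NeRF idea is that a neural network learns to map any 3D point to a colour and a density, and a pixel is then rendered by marching a camera ray through the scene and accumulating what it hits. Here is a minimal sketch of that generic rendering step in Python (a toy function stands in for the trained network; this is not Waymo's Block-NeRF code, which stitches many such models together across the city):

```python
# A minimal sketch of the NeRF rendering step: the standard volume-rendering
# quadrature C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i.
import numpy as np

def radiance_field(points):
    # Hypothetical stand-in for the trained network: in a real NeRF, this
    # function is what the millions of photos are used to fit.
    density = np.exp(-np.linalg.norm(points - np.array([0.0, 0.0, 2.0]), axis=1))
    colour = np.tile([0.8, 0.4, 0.1], (len(points), 1))  # a constant orange blob
    return density, colour

def render_ray(origin, direction, n_samples=64, near=0.5, far=4.0):
    # Sample points along the camera ray, query the field, and accumulate.
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    sigma, colour = radiance_field(points)
    delta = np.append(np.diff(t), 1e10)                   # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                  # chance the ray stops here
    trans = np.cumprod(np.append(1.0, 1.0 - alpha))[:-1]  # light surviving so far
    weights = trans * alpha
    return (weights[:, None] * colour).sum(axis=0)        # final RGB for this pixel

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
print(pixel)
```

Because the scene lives in the learned function rather than in the photos themselves, you can render it from viewpoints no camera ever visited - which is exactly what lets you 'fly' over San Francisco.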
Kendrick Lamar - The Heart Part 5
Usually this slot in the newsletter is reserved for a purely musical choice, but in this case we have some amazing music and some AI. Kendrick Lamar recently released his fifth LP, Mr. Morale & the Big Steppers, his first in over five years. This is a bold record, where he turns the spotlight on himself and interrogates his own belief systems. Sometimes it makes for uncomfortable listening ('We Cry Together'), but it can be uplifting as well ('Auntie Diaries'). The AI comes in the form of deep fakes in the video for 'The Heart Part 5', in which Lamar seemingly transforms in front of our eyes into Kanye West, Nipsey Hussle, Will Smith, and O.J. Simpson. The techy stuff was done by a studio called Deep Voodoo, and it's good to see deep fakes being used for entertainment rather than deception (like the one of Volodymyr Zelenskyy). You can listen to Mr. Morale & the Big Steppers on Apple Music or Spotify.
With great power...
Writing about AI Ethics can be a binary experience - it feels as though there is only bad news (some company or organisation ignoring blatant ethical issues in order to chase profit) or good news (some other company or organisation taking affirmative action to mitigate the ethical issues with their AI). But there is also a middle ground, where companies say they are doing the latter in order to hide the fact that they are doing the former. So-called ethics-washing is not confined to AI ('green-washing' is particularly popular at the moment), and it means we have to read everything very carefully to understand the true motives behind a company's actions. Here are a couple of examples from each side of the bad/good spectrum. I'll let you decide what their motives really are.
South Korea is going to use AI to track individuals who have Coronavirus. According to Reuters, the system will 'analyse footage gathered by more than 10,820 CCTV cameras and track an infected person’s movements, anyone they had close contact with, and whether they were wearing a mask'. The article talks a lot about the efficiency benefits that the system will deliver, but says hardly anything about the blatant invasion of privacy that it brings.
I've already banged on about the dangers of identifying people's emotions using AI, but organisations are still trying to do it, and, in this collaboration between Intel and Class Technologies, they are trying to do it to children. The tools from Class Technologies work with Intel-based CPUs and Zoom, with the goal of analysing the emotions of students and providing insights to teachers. Intel say, 'Through technology, we have the ability to set the standard for impactful synchronous online learning experiences that empower educators', which sounds like BS to me, and completely ignores the needs of the children. Has anyone thought of just asking the students how they feel?
But there is some apparent good news: Oregon's Department of Human Services' 'Safety at Screening Tool' (an algorithm that generates a 'risk score' for abuse hotline workers, recommending whether a social worker needs to further investigate the contents of a call) is being shut down because the original model was found to be flagging a disproportionate number of Black children for 'mandatory' neglect investigations. Meanwhile, Google's data science platform, Colab, has decided to restrict developers from creating deep fakes. Whilst this won't halt the development of deep fakes completely, it will certainly impact their output, and it's good to see a BigTech firm with a tarnished ethical record doing something positive.
Afterword...
Way back in October 2020, I mentioned a ship called the Mayflower, which was about to start sea trials ahead of an autonomous, unmanned trip across the Atlantic. After a few malfunctions, I can report that it finally made it.
And a rather sad follow-up to an article from February on the man who had a heart transplant from a pig. He has since died, likely from a porcine virus that was lurking in the donor organ.
In the UK, a very large trial has just got under way to test the benefits of a 4-day week. The trial involves over 3,000 people at 70 firms, making it the world's largest to date. Results should be known in about six months.
And finally (as they say at the end of the News), in complete contrast to the technological advancements described above, here is a video from 1969 of Japanese craftsmen making Samurai swords. It's a long, painstaking process that is a marvel to watch - the polar opposite of today's world, where everything has to be done as quickly and efficiently as possible.
Greenhouse Intelligence Ltd thegreenhouse.ai
Andrew Burgess is the founder of Greenhouse Intelligence, a strategic AI advisory firm.
You are receiving this newsletter because you subscribed and/or attended one of our events.