QFM005: Machine Intelligence Reading List - February 2024
Source: Photo by Mike Kononov on Unsplash
Here is everything I found interesting about machines behaving intelligently during February 2024.
This month's reading highlights a recurring theme: the ethical implications and societal impact of machine intelligence. Examples include Marcin Jabłonowski's clever exploration of AI avatars in Marcin 2.0, the discourse on the replacement of human jobs by AI at Klarna, and Geoffrey Hinton's discussion of the potential future dangers of AI at scale. Going a bit deeper into practical advances in LLM tech, we see the introduction of Mamba, a State Space Model challenging Transformer models, and innovative approaches to AI safety and efficiency in GradSafe and Matryoshka Embedding Models.
Perhaps the most incredible generative AI release this month was OpenAI's Sora video generation. The potential for Sora to disrupt video creation and production is obvious and profound, but an intelligent system that has an understanding of real-world physics has much wider implications.
See the Slideshare version of the post, or read on.
As always, the Quantum Fax Machine Propellor Hat Key will guide your browsing. Enjoy!

Links
The article discusses strategies to mitigate AI hallucinations in generative models, emphasising the necessity of integrating anti-hallucination measures across the entire Retrieval Augmented Generation (RAG) pipeline. It argues that achieving near-perfect control over hallucinations is crucial for reliability, drawing parallels to business standards in security and uptime. Techniques include thorough testing, leveraging economies of scale in SaaS platforms, and applying specific technical solutions like query pre-processing and dynamic context boundary walls in prompts.
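To make the "dynamic context boundary walls" idea concrete, here is a minimal sketch of what such a RAG prompt might look like. The template wording, delimiter names, and helper function are illustrative assumptions, not the article's exact technique:

```python
# Illustrative sketch of a RAG prompt with explicit "context boundary walls":
# delimiters that fence retrieved passages off from the instructions, so the
# model is told to answer only from the fenced context. Template is assumed.

def build_rag_prompt(question: str, passages: list[str]) -> str:
    context = "\n\n".join(f"[PASSAGE {i+1}]\n{p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the material between the "
        "<context> walls. If the answer is not there, say \"I don't know\".\n"
        "<context>\n"
        f"{context}\n"
        "</context>\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was the company founded?",
    ["Acme Ltd was founded in 1947 in Leeds.", "Acme makes anvils."],
)
print(prompt)
```

The point of the walls is that query pre-processing and the instruction layer can then reason about what sits inside versus outside the fenced region.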
The paper introduces a novel method for eliciting chain-of-thought reasoning from large language models without the need for explicit prompting. By altering the decoding process, the study reveals that models can inherently generate reasoning paths, demonstrating a significant improvement in reasoning capabilities and model confidence over standard decoding methods.
The article introduces GradSafe, a method for detecting unsafe prompts in Large Language Models (LLMs) by analysing the gradients of safety-critical parameters. GradSafe outperforms existing methods by efficiently identifying unsafe prompts without requiring extensive data collection or training, demonstrating its effectiveness with Llama-2 against the Llama Guard system across different evaluation datasets.
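The core signal GradSafe relies on can be sketched in a few lines: compare the gradient a candidate prompt induces on safety-critical parameters against a reference gradient direction derived from known-unsafe prompts. The vectors below are random stand-ins (in the paper the gradients come from the LLM's compliance-loss backward pass), and the threshold is an assumed illustrative value:

```python
import numpy as np

# Toy sketch of GradSafe's gradient-similarity signal. Real gradients over
# safety-critical parameters are replaced here by stand-in vectors.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_unsafe(prompt_grad, unsafe_reference, threshold=0.5):
    # Flag the prompt if its gradient points in roughly the same direction
    # as the averaged gradient of known-unsafe prompts.
    return cosine(prompt_grad, unsafe_reference) > threshold

rng = np.random.default_rng(1)
unsafe_reference = rng.normal(size=64)                    # averaged unsafe-prompt gradient
unsafe_like = unsafe_reference + 0.1 * rng.normal(size=64)  # nearly parallel
benign = rng.normal(size=64)                              # ~orthogonal in high dimensions

print(is_unsafe(unsafe_like, unsafe_reference))  # True
print(is_unsafe(benign, unsafe_reference))
```

The appeal of the approach is exactly what this toy shows: no training loop, just one backward pass and a similarity test.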
The article discusses Mamba, a State Space Model (SSM) that challenges the dominance of Transformer models in AI by offering similar performance with faster processing and better scalability for long sequences. Mamba optimises efficiency and effectiveness, promising advancements in AI safety, interpretability, and applications across various modalities.
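For intuition, the heart of a State Space Model is a linear recurrence scanned over the sequence, which is why inference is fast and memory stays constant per step. The sketch below is the fixed-parameter toy version with assumed shapes; Mamba's contribution is making A, B, C input-dependent (selective) and computing the scan efficiently on hardware:

```python
import numpy as np

# Toy linear SSM scan: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# One state update per token, so cost is linear in sequence length.

def ssm_scan(A, B, C, xs):
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                 # the hidden state carries long-range context
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, seq_len = 4, 2, 3, 5
A = 0.9 * np.eye(d_state)        # stable dynamics so the state doesn't blow up
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
xs = rng.normal(size=(seq_len, d_in))
ys = ssm_scan(A, B, C, xs)
print(ys.shape)  # (5, 3): one output vector per input step
```

Contrast this with a Transformer, where each new token attends over the whole history, giving quadratic cost in sequence length.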
This video gives a quick (5m) intro to OpenAI's [SORA](https://openai.com/sora){:target="_blank"}, a groundbreaking AI that generates high-definition, detailed videos from text descriptions, capable of handling complex scenes and occlusion effectively.
In his Romanes Lecture at the University of Oxford, Geoffrey Hinton, known as the 'Godfather of AI,' discussed the potential dangers of AI, including its ability to replace human intelligence, the risk of AI taking control over humanity, and the implications for the workforce and the spread of misinformation.
Artificial intelligence has unlocked the contents of a papyrus scroll from Herculaneum, revealing a Greek philosopher's insights on pleasure, previously hidden by the eruption of Mount Vesuvius 2000 years ago. This breakthrough, winning a $700,000 prize, could lead to more ancient texts being deciphered.
The paper "Antagonistic AI" explores the concept of AI systems designed to exhibit disagreeable or challenging behaviours, arguing these characteristics can sometimes offer benefits like forcing users to confront assumptions or build resilience. The authors discuss the ethical considerations and potential design strategies for such AI systems.
This video provides a comprehensive (if somewhat introductory) guide for learning AI in 2024, covering technical skills, theoretical fundamentals, project ideas, specialised areas, AI safety, regulations, and recommended resources including courses, books, and newsletters to achieve a well-rounded AI education. This fantastic intro video also has a companion [Notion Site](https://gilded-enquiry-cb8.notion.site/Roadmap-How-to-Learn-AI-in-2024-a9e105c14c0f4915913b8cb2eccc7ff2){:target="_blank"} and a [PDF](https://drive.google.com/file/d/1dEfzIA7CS3bpHSkOV9h5Y7gripHPiTid/view){:target="_blank"}. Well worth a few minutes of your time.
Sora is OpenAI's AI model capable of generating videos from text prompts, creating realistic and imaginative scenes that simulate real-world motion. It's designed to assist in problem-solving that requires real-world interaction and is currently available to select visual artists, designers, and filmmakers for feedback. This is yet another mind-blowing piece of generative AI functionality from OpenAI. [The "LLM Event Horizon" continues its expansion at pace. First: text. Then: images. Now: video. What will be the next category consumed?](https://masto.ai/@matthewsinclair/111936967025552383#.){:target="_blank"}
EmoSpeaker introduces a revolutionary technique for generating emotional talking-head videos from a single image, input audio, and specified emotion, capable of adjusting emotional intensity through fine-grained control. This method surpasses existing technologies in expression variation and lip-sync accuracy.
Klarna's OpenAI-powered virtual assistant now handles two-thirds of customer service chats, equating to the workload of 700 humans, showcasing significant efficiency gains and potential profit improvement for the company.
This video breaks down the leaked GPT-4 system prompt. The capabilities hinted at within the prompt are *very* surprising. For example, the policy statements for the use of DALL-E are particularly interesting with respect to emulating the style of artists.
The article showcases a compilation of 12 **Video-To-Video** demos, highlighting how this technology could revolutionise the movie, animation, and social media industries with its astonishing results. It delves into Sora's technical aspects, including its use of spatiotemporal latent patches, transformer-based video diffusion models, and dataset creation using high-precision video captioning, without employing notably new technology but rather emphasising the importance of computational resources.
Research suggests that, while popular and productivity-enhancing, AI coding assistants like GitHub's Copilot may lead to lower code quality, with issues such as increased code churn and more repeated code.
This article explores how a small transformer language model predicts the next token, focusing on the role of transformer blocks and feed-forward networks beyond multi-head self-attention. The author shares findings from a six-month investigation, proposing that each transformer block predicts the next tokens based on learned associations with classes of strings from the training data.
OpenAI is reportedly developing a web search tool, potentially integrated with Bing, to directly challenge Google's search engine. This initiative aligns with Microsoft CEO Satya Nadella's strategy, as expressed last year, to innovate in search technologies through AI, notably with the Copilot AI tools in Bing. The competitive landscape in search engines is expanding, with Google's Bard/Gemini, Copilot, and emerging players like Perplexity joining the fray, indicating a rapidly evolving market.
This article explores the homogenisation of culture and creativity across various fields such as art, interior design, architecture, automotive design, personal appearance, and media. It argues that despite the illusion of choice and individuality, most creative domains have converged towards a median, characterised by widespread uniformity and a lack of distinctiveness, leading to an era where originality is rare. I have been referring to this phenomenon as **The Tyranny of the Banal**.
The FCC has declared AI-generated voice calls as illegal under the Telephone Consumer Protection Act, aiming to address the issue of artificial robocalls.
This article discusses a new technique for reducing AI-generated inaccuracies by augmenting large language models (LLMs) with proprietary data, which shows promise in enhancing the models' knowledge base.
Cory Doctorow's article in Locus Magazine explores the nature of AI as a bubble, comparing it to previous tech bubbles. He discusses this bubble's potential outcomes and remnants, highlighting the distinction between bubbles that leave valuable assets behind and those that do not. Doctorow expresses scepticism about AI's sustainable value and business models, questioning what will remain when the hype subsides.
Researchers have developed an innovative approach using explainable deep learning to identify new structural classes of antibiotics crucial for combating antibiotic resistance. By employing graph neural networks to analyse a vast array of chemical compounds, they have successfully discovered compounds effective against MRSA and other resistant bacteria with low human toxicity. This method surpasses traditional drug discovery methods in efficiency, marking a significant advancement in the ongoing fight against antibiotic-resistant infections. More details in the Nature paper here: [Discovery of a structural class of antibiotics with explainable deep learning](https://www.nature.com/articles/s41586-023-06887-8){:target="_blank"}.
This article introduces GALA3D, a tool for creating realistic 3D scenes from text descriptions using layout-guided generative models and large language models for layout descriptions, offering an end-to-end framework for state-of-the-art scene-level 3D content generation and editing.
Demis Hassabis discusses Google's latest AI models, the existential risks of AI, and the future of artificial general intelligence (AGI), including the temporary suspension of Gemini's human image generation due to controversial outputs.
This article discusses how the GPT-2 model and Transformer architecture can be understood through spreadsheets, enabling even non-developers to explore AI concepts directly with minimal abstraction.
The article criticises the hype around AI and Large Language Models (LLMs), arguing that instead of leading to a technological singularity of super-intelligence, we're more likely to encounter a "[bullshit](https://www.theguardian.com/news/2017/nov/23/from-inboxing-to-thought-showers-how-business-bullshit-took-over){:target="_blank"} singularity" where the internet becomes flooded with low-quality, AI-generated content, making it difficult to discern truth. _ED: There is more than a little bit of irony in using GPT to summarise an article criticising the rise of AI-generated bullshit. Which is why, careful reader, I make sure that I read what the AI generates and then editorialise as necessary._
This is an excellent and highly detailed primer on Large Language Models (LLMs). This paper covers the significant recent advances in natural language processing, with key developments in model families like GPT, LLaMA, and PaLM, and includes ongoing research focusing on building, augmenting, and evaluating these models against various benchmarks. Also in [PDF](https://arxiv.org/pdf/2402.06196v1.pdf){:target="_blank"} format.
The article introduces *Matryoshka Embedding Models*, which are designed to produce useful embeddings of variable sizes, allowing for more efficient performance in downstream tasks without a significant loss in effectiveness. These models, inspired by Matryoshka dolls, prioritise important information in smaller, truncated embeddings for tasks like search or classification.
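The key trick is simple enough to show directly: take the leading dimensions of a full embedding and re-normalise, trading a little accuracy for much cheaper storage and search. The vectors below are random stand-ins; a real Matryoshka model is trained so that the leading dimensions carry the most important information:

```python
import numpy as np

# Sketch of the Matryoshka idea: truncate a full embedding to its leading
# dimensions and re-normalise so cosine similarity still works downstream.

def truncate_embedding(v, dim):
    t = v[:dim]
    return t / np.linalg.norm(t)     # re-normalise after truncation

rng = np.random.default_rng(42)
full = rng.normal(size=768)          # full-size embedding from some model
small = truncate_embedding(full, 64) # 12x smaller index footprint
print(small.shape, round(float(np.linalg.norm(small)), 6))  # (64,) 1.0
```

With an ordinary embedding model this truncation would discard information uniformly; the Matryoshka training objective is what makes the nested prefixes useful on their own.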
The OWASP LLM AI Security and Governance Checklist provides a comprehensive framework for ensuring the security and responsible governance of Large Language Models (LLMs), addressing risks, legal and regulatory considerations, and strategies for deployment and evaluation.
OpenAI's Sora is not just a creative tool but a sophisticated data-driven physics engine capable of simulating complex, realistic, or fantastical worlds with detailed rendering and physics. Although, there [seems to be some debate](https://twitter.com/moo9000/status/1758218635485528202){:target="_blank"} as to the degree to which Sora is _actually_ a "data-driven physics engine".
This strongly worded opinion piece from [Punks and Pinstripes](https://www.punksandpinstripes.com/){:target="_blank"} argues against WPP's heavy investment in generative AI, likening it to a decoy masking stagnation akin to North Korea's strategy with nuclear investment. It suggests that while AI can handle operational tasks efficiently, it stifles creativity in fields that thrive on human ingenuity, urging companies to balance AI use to avoid creative atrophy.
What Apple does with machine intelligence in 2024 is anyone's guess. Whereas the other Big Tech vendors tend to release incrementally, Apple (traditionally) likes to save up releases for one big announcement each year, so we will have to wait and see. Some [breadcrumbs are starting to emerge](https://github.com/apple/ml-mgie){:target="_blank"}.
GeneGPT is a novel approach designed to improve large language models by utilising NCBI Web APIs for accurate biomedical information retrieval, achieving state-of-the-art performance on GeneTuring tasks. This method not only enhances accuracy in specialised knowledge areas but also showcases the effectiveness of API demonstrations over documentation for in-context learning.
This video examines a recent Harvard Business Review [paper](https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf){:target="_blank"} "Navigating the Jagged Technological Frontier" and explores the implications of the paper's findings on the use of generative AI in professional knowledge work environments.
The article analyses 5M freelancing jobs to identify the impact of AI on various job categories, finding that writing, translation, and customer service jobs saw significant declines, whereas video production, graphic design, and software development jobs increased. It suggests that while AI has replaced certain tasks, it has not yet fully replaced creative and technical jobs.
[Jeff Dean](https://research.google/people/jeffrey-dean/){:target="_blank"}, Google's Chief Scientist, gives a (Google-flavoured) talk on advancements in AI and machine learning, highlighting the creation of more capable, general-purpose systems like the Gemini family of multimodal models, and their applications in science, engineering, and health, underscoring the collaborative efforts at Google.
Sam Altman, CEO of OpenAI, is seeking to raise trillions to expand global semiconductor capabilities, aiming to address the shortage of AI chips and advance the development of artificial general intelligence. A trillion here, a trillion there. Pretty soon you're talking real money.
Regards,
M@
[ED: If you’d like to sign up for this content as an email, click here to join the mailing list.]
Originally published by M@ on Medium.