AI and Machine Learning: Top 5 Trends
Artificial intelligence adds a new level of intelligence to systems as diverse as a supercomputer working on the discovery of new drugs, a mobile phone applying face-flattering filters to its camera, and IoT nodes that perform some analysis on sensor data without sending anything to the cloud. The result for the semiconductor industry is an increased emphasis on techniques to run AI more efficiently on existing hardware, on new chip types for heterogeneous acceleration, and on futuristic technologies that can satisfy the ever-increasing demand for AI computation.
Here are some of the AI-specific trends that we think will begin or continue over the next two years.
Transformers take over
Transformer networks have been widely used for natural language processing for some time, with impressive results. Today’s large language models (LLMs) can power chatbots, answer questions, and write essays, and the text they produce is often indistinguishable from what a human might have written. These networks use a technique called attention to explore the relationships between words in a sentence or paragraph. The problem is that to truly understand language, LLMs must consider relationships between even the most distant words in the text. The result is that transformer models rapidly grow in size, as do the computers needed to train them. Training GPT-3 has been estimated to cost millions of dollars – that is, to train the model once. Despite the huge costs, demand for accelerated computing from transformer networks is not slowing down. Economic or practical limits to the size of transformers – if they exist – have yet to be seen, let alone reached.
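The attention mechanism at the heart of these models can be illustrated in a few lines. The sketch below is a minimal scaled dot-product attention over toy word vectors; the embeddings and their sizes are invented for the example, and real transformers add learned projections, multiple heads, and batching on top of this core operation.

```python
import math

def softmax(xs):
    """Normalize scores into attention weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each output is a weighted mix
    of all value vectors, weighted by query-key similarity --
    regardless of how far apart the words are in the sequence."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Three toy 2-dimensional "word" embeddings used as q, k, and v.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(emb, emb, emb)
```

Because every query attends to every key, the cost grows quadratically with sequence length, which is one reason transformer compute demands scale so quickly.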
Transformers are increasingly being applied to other use cases. These include vision transformers, which look for relationships between different pixel patches in an image. They are also used as a sort of intermediate step to teach neural networks about any branch of science or industry that can be described in language. The idea is quite simple: Need to use AI to discover the relationships between drugs and their side effects? Train an LLM on data from medical journals and articles, then simply ask your question and wait for the response. In theory, any facet of human knowledge for which there is significant unstructured linguistic data (books or scientific texts) would be applicable, although skeptics point out that not all human knowledge can be represented in language. Even so, the technique is certainly powerful.
Any convergence on neural network topologies, which the dominance of transformers now makes seem plausible, will of course make things easier for chipmakers. Transformer-specific acceleration features are already appearing in chips like Nvidia’s H100 and will continue to emerge.
Exploiting sparsity
Sparsity is a brain-inspired concept with practical applications in AI acceleration. If a neural network is sparse, a significant number of its parameters are zero. Some types of networks are sparser than others, but in general high levels of sparsity are common. The implication for AI accelerators is that when we multiply two numbers together and one of them is zero, we already know the answer will be zero. If we can skip that calculation and go straight to the answer, we save time and energy.
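The saving from zero-skipping can be sketched in a few lines of Python. This is only a behavioral toy model; a real accelerator performs the skip in hardware, across millions of MAC units in parallel.

```python
def sparse_dot(weights, activations):
    """Multiply-accumulate over a weight vector, skipping zero
    weights entirely -- the work a sparsity-aware accelerator
    avoids doing, since a zero operand makes the product zero."""
    total = 0.0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            skipped += 1   # known result: contributes nothing
            continue
        total += w * a
    return total, skipped

w = [0.0, 0.5, 0.0, 0.0, 2.0, 0.0]   # 4 of 6 weights are zero
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result, skipped = sparse_dot(w, x)   # result == 11.0, skipped == 4
```

Here two-thirds of the multiplications are avoided outright, which is exactly the kind of ratio that makes sparsity attractive to chip designers.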
Although it sounds simple, the application space for sparsity is complex and still quite immature. Pruning – the software technique of removing branches of the neural network downstream of zeros or near-zeros to reduce the size of the network – is well understood but often requires tedious manual tuning. Automated techniques for exploiting finer-grained sparsity are emerging, and we should also expect to see smarter ways to use sparsity in chip design.
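One common pruning criterion is weight magnitude: the smallest weights contribute least to the output, so they are zeroed first. The sketch below is a deliberately simplified illustration (the function name, weights, and sparsity target are made up); real pruning flows typically retrain or fine-tune the network afterwards to recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights, smallest magnitude
    first -- a toy version of magnitude-based pruning."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
p = magnitude_prune(w, 0.5)   # the 3 smallest magnitudes zeroed
```

The tedious part in practice is choosing the sparsity level per layer: prune too aggressively and accuracy collapses, too gently and the hardware savings evaporate.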
Chiplets for all
The acceleration of AI has been a major driver of heterogeneous computing over the past few years, and this trend is sure to continue as Moore’s Law slows down. Heterogeneous computing refers to the technique of system design in which accelerators for specific workloads are added to more general computing hardware like processors, either as separate chips or as blocks on a SoC. This trend is evident in the data center, but endpoint SoCs for everything from home appliances to mobile phones now have specific blocks dedicated to accelerating AI.
For large-scale chips, such as those used in data centers, chiplets are an important enabling technology. Chiplets make it possible to build huge devices by connecting several reticle-sized dies through a silicon interposer, but they also enable heterogeneous computing by allowing processor, memory, and accelerator chiplets to be connected at high bandwidth. Chiplet technologies are maturing and will continue to do so over the next two years as we see products like Intel’s Ponte Vecchio hit the market.
Shedding Light on Photonics
The maturation of silicon photonics manufacturing and processing technologies enables a whole new computing paradigm: optical computing. Exchanging electrons and electric currents for photons and light waves has the potential to create ultra-fast computers. Light travels through silicon waveguides much as current travels through a wire, and the photonic equivalent of multiply-accumulate (MAC) units can now be built reliably and at scale. These techniques have been applied to chips for AI workloads, which require a high proportion of MAC operations at extremely high speed. Exciting possibilities include shining in light at several different wavelengths to efficiently perform multiple inferences at the same time, further speeding up AI applications.
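One way to picture wavelength multiplexing is as several dot products sharing a single set of weights, one per wavelength. The sketch below is a purely behavioral model (the function and numbers are invented for illustration); it says nothing about the analog precision, calibration, or packaging challenges of real photonic hardware.

```python
def wdm_mac(weights, inputs_per_wavelength):
    """Toy model of wavelength-division-multiplexed photonic MACs:
    the same weight bank (think: fixed modulator settings) is
    applied to several input vectors at once, one per wavelength."""
    return [sum(w * x for w, x in zip(weights, xs))
            for xs in inputs_per_wavelength]

weights = [0.5, -1.0, 2.0]
# Two inferences carried on two wavelengths through the same circuit.
results = wdm_mac(weights, [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]])
# -> [4.5, 1.0]
```

In electronics, doubling throughput this way would mean duplicating the MAC array; in photonics, the same physical waveguides carry both computations simultaneously.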
Companies like Lightmatter and Lightelligence have demonstrated that system-level challenges, including packaging that incorporates electrical and photonic chips, can be overcome. Although these two companies are the most advanced, there are still companies emerging in this space with new ideas.
Neuromorphic computing
Neuromorphic computing refers to chips that use one of many brain-inspired techniques to produce ultra-low-power devices for specific types of AI workloads.
Although “neuromorphic” can be applied to any chip that mixes memory and computation at a fine granularity and uses many-to-many connectivity, it is more frequently applied to chips designed to process and accelerate spiking neural networks (SNNs). SNNs, which are distinct from traditional deep learning networks, copy the brain’s method of processing data and communicating between neurons. These networks are extremely sparse and can enable very low-power chips.
Our current understanding of neuroscience suggests that voltage spikes travel from neuron to neuron, with each neuron performing some form of data integration (roughly analogous to applying neural network weights) before triggering a spike at the next neuron in the circuit. Approaches that reproduce this can encode data in spike amplitudes and use digital electronics (BrainChip), or encode data in spike timing and use asynchronous digital electronics (Intel Loihi) or analog electronics (Innatera).
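The integrate-then-spike behavior described above is often modeled as a leaky integrate-and-fire (LIF) neuron. The sketch below is a minimal discrete-time version with made-up threshold and leak values, not the model any particular vendor uses.

```python
def lif_neuron(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire: the membrane potential integrates
    incoming current, leaks a little each step, and emits a spike
    (1) when it crosses the threshold, then resets."""
    v = 0.0
    spikes = []
    for i in input_current:
        v = leak * v + i          # integrate input with leak
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # reset after spiking
        else:
            spikes.append(0)
    return spikes

spikes = lif_neuron([0.4, 0.4, 0.4, 0.0, 0.6, 0.6])
# -> [0, 0, 1, 0, 0, 1]
```

Note how most time steps produce no spike at all: this event-driven sparseness is exactly what lets neuromorphic hardware stay quiet, and power-efficient, when nothing interesting is happening.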
As these technologies (and our understanding of neuroscience) continue to mature, we will see more brain-inspired chip companies, as well as further integration between neuromorphic computing and neuromorphic sensing, where there are certainly synergies to be exploited. SynSense, for example, is already working with Inivation and Prophesee to combine its neuromorphic chips with event-based image sensors.