This year’s GTC opened with NVIDIA CEO and founder Jensen Huang’s keynote announcing what will be released this year. Since COVID, most of the world’s business meetings have moved to unusual rooms of our own homes, and this opening keynote was no different: Jensen recorded it from his own kitchen¹. Last year’s edition was filmed there too, perhaps evidence that 2020 and 2021 still look very much alike.
Jensen’s main dish this year: “Grace”, the first CPU released by NVIDIA. Grace is Arm-based, no surprise after the company moved to buy Arm last September. Government regulators are still considering whether to approve the merger, but that hasn’t stopped NVIDIA from presenting this CPU, claiming it will let data centers achieve ten times the performance of NVIDIA DGX systems, which are based on Intel’s x86 processors. Grace will be available in 2023 and already has an interested early adopter: the Swiss National Supercomputing Centre, for its upcoming supercomputer named Alps. The other dishes on the menu were a new generation of the BlueField data processing unit and the DGX SuperPOD, an infrastructure intended for cloud AI research so huge that you are probably not considering buying one unless you are Sony, but you can now rent your share for a $9,000 monthly subscription.
But not everything at NVIDIA is hardware. They released Omniverse, a collaborative platform for 3D design and simulation. Design needs high-performance computers, which is generally a limitation for meetings and parallel work, especially during pandemic times when you can’t just walk over to your teammate’s desk. With Omniverse, anyone can work on any project from any computer, even at the same time. Big design teams like BMW, Foster + Partners, and WPP have been using the Omniverse beta since December.
The opening keynote ended with Jensen proclaiming that “ultimately, Nvidia is an instrument, an instrument for you to make your life’s work”, and then the whole kitchen was dismantled piece by piece. Drawers, spoons, chairs, and furniture flew off the scene, leaving us to wonder whether that kitchen was ever real or just an Omniverse simulation. Then GTC began, and across over a thousand virtual sessions, speakers talked about ways to use NVIDIA as an instrument.
The conference agenda is huge, so much so that it’s hard even to read the list of sessions for a single day. The range of topics is very wide, and so are the industries involved. A session could be as specific as “Neural Networks for Exotic Options Risk” or carry a title that would interest even your grandpa, like “The Next Decade in AI”. Some sessions are highlighted by NVIDIA, but beyond that it’s up to you to make your own way, diving into the agenda and finding what captivates you.
With more and more AI-based systems getting to production every day (finally!), MLOps is hotter than ever, and the GTC agenda was no stranger to it. In their talk, WWT presented MLOps as a matter of evolving or disappearing, and shared a template for how that evolution should happen inside an organization: progressive adoption of MLOps, not confined to one focused team but making everyone part of it. With so much said about MLOps, it sometimes turns into a celestial paradise we all want to reach but don’t really know what it looks like. That’s why I particularly enjoyed the session by Moses Guttmann from ClearML, where he showed what terrestrial MLOps looks like: create a training template on your machine, make it work locally, then deploy that template to remote machines and run multiple experiments, searching for the best combination of hyperparameters and trying different datasets, all shown live on his screen using, of course, ClearML. As Nick Elprin said in the J&J talk: “Please don’t try to build all the infrastructure yourself! It’s a sexy problem, but your engineers are going to be much more valuable focusing on problems that are unique to your business.”
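The template-then-sweep workflow Guttmann demonstrated can be sketched in a few lines of plain Python. The `train` function and its hyperparameter grid are invented for illustration; in ClearML, the local template is a tracked task that remote agents clone and re-run with new parameters:

```python
import itertools

def train(lr, batch_size):
    # Hypothetical training run standing in for the local "template"
    # experiment; it just returns a fake validation score that peaks
    # at lr=0.01 and penalizes large batches.
    return 1.0 / (1.0 + abs(lr - 0.01)) - 0.001 * batch_size

# The same template, re-run over a grid of hyperparameters
# (what ClearML would dispatch to remote machines).
grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 32, 64]}

results = []
for lr, bs in itertools.product(grid["lr"], grid["batch_size"]):
    score = train(lr, bs)
    results.append({"lr": lr, "batch_size": bs, "score": score})

best = max(results, key=lambda r: r["score"])
print(best)  # the winning hyperparameter combination
```

The point of the talk is that once the template works locally, scaling this loop out to many machines is an orchestration problem the platform solves for you.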
If NVIDIA is itself a tool, then ML frameworks like TensorFlow and PyTorch are the tools that let us use that other tool. These frameworks have seen frenetic development in recent years, trying to stay current with both academic research and industry trends. APIs keep getting larger, and an ML model is no longer defined by code plus weights: now model = code, as Soumith Chintala argued in his talk. Unless models stabilize around some fixed architecture, we won’t see much development of specialized hardware, and until that happens, ML frameworks simply have to keep up with the changes at high velocity.
Soumith argues that, if data-efficient models via priors end up being where it all goes, then ML practice will consist of applying the priors of a body of knowledge to a particular case, much like lawyers do in their work today.
Democratizing libraries of priors was indeed revealed as the direction in The Next Five Years for Keras and TensorFlow: better UX design to boost productivity by creating an ecosystem of reusable parts, with KerasCV for computer vision problems, KerasNLP for language, and even “antigravity should be as easy as import antigravity”. The ultimate UX design for an ML framework is an API where the user provides a dataset and a metric to optimize, and the algorithm does everything else; the user’s job is to focus on specifying the problem and curating the dataset. Deep learning should truly be possible for everyone, just as today anyone can build a website. In the State of PyTorch session, the same emphasis was placed on the desire for better frontend APIs, and a very interesting release was announced: the PyTorch Profiler, which lets us see a performance analysis of the model and its individual layers in a TensorBoard tab.
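To make the “dataset plus metric” idea concrete, here is a toy sketch in plain Python. `AutoModel`, its candidate models, and the `accuracy` helper are all invented stand-ins; real systems in this spirit (AutoKeras, for instance) search over actual network architectures rather than three hard-coded functions:

```python
# Toy stand-in for the "ultimate" framework API from the talk:
# the user supplies only data and a metric; the framework searches.

class AutoModel:
    def __init__(self, metric):
        self.metric = metric          # higher is better
        self.best = None

    def fit(self, xs, ys):
        # "Search": try a few trivial candidate models and keep
        # whichever scores best on the user's metric.
        candidates = [
            lambda x: x,              # identity
            lambda x: 2 * x,          # doubling
            lambda x: x + 1,          # offset
        ]
        scored = [(self.metric([c(x) for x in xs], ys), c) for c in candidates]
        self.best = max(scored, key=lambda t: t[0])[1]
        return self

    def predict(self, xs):
        return [self.best(x) for x in xs]

def accuracy(preds, targets):
    # Fraction of exact matches.
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

xs, ys = [1, 2, 3], [2, 4, 6]         # target relation: double the input
model = AutoModel(metric=accuracy).fit(xs, ys)
print(model.predict([5]))             # the doubling candidate wins: [10]
```

Everything the user wrote here is the dataset and the metric; the search itself, however primitive, is the framework’s problem.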
More academic research also had its place at GTC, with Yann LeCun, the father of convolutional networks, presenting an energy-based view of self-supervised learning, introducing both how energy-based models work and how they can be used for SSL. Modern Hopfield Networks were presented by Sepp Hochreiter (who formulated the vanishing gradient problem), who talked about how they work and how they can be integrated to act as “Deep Learning with Memories”. Featuring equations and integrals, these are definitely not the kind of talks you will want to watch at 1.5x speed.
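As a taste of the energy-based view from LeCun’s talk: the model assigns a scalar energy E(x, y) to each input–output pair, and inference means picking the y with the lowest energy. The quadratic energy below is a toy example of mine, not one from the session:

```python
# Minimal energy-based-model sketch: inference as energy minimization.

def energy(x, y):
    # Toy energy function: low when y is close to 2x.
    return (y - 2 * x) ** 2

def infer(x, candidates):
    # Prediction = the candidate output with the lowest energy.
    return min(candidates, key=lambda y: energy(x, y))

print(infer(3, candidates=range(10)))  # → 6
```

In the self-supervised setting discussed in the talk, the training challenge is shaping that energy surface so compatible pairs get low energy without everything collapsing to zero.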
How is the industry adopting so many newly developed and released AI tools? GTC sessions are a good opportunity to hear industry trends straight from the mouths of their creators.
During lockdown, virtual meetings have been more prominent than ever, and their forced, fast adoption opened the door to many innovative AI-based solutions. Automatic detection of somebody raising a hand to ask for attention, real-time speech-to-text conversion and translation between languages, and even an AI summarizing the call into bullet points are some of the features that will disrupt our experience of virtual meetings.
One thing to stress before talking about this industry is that, according to the experts on the call, the aim of AI is not to replace human cognitive abilities but, on the contrary, to enhance and automate the tasks best suited for machines (this is a long-running debate, and you may agree or disagree), whether that means simplifying complicated tasks or reducing mistakes in monotonous activities where the human error rate is higher than the machine’s. Some of the most relevant areas where AI is being applied are price optimization, anomaly detection, process automation, product recommendations, human verification, stock counting, and chatbots for customer care.
Even though AI-powered robots that build cars seem extraordinary, an example of next-level innovation, that is already somewhat old news. For an example of what’s coming next, look at BMW’s use case: they have built a 3D digital twin of a factory in NVIDIA Omniverse to train new robots and people. Yes, you read that correctly: no on-site training, which would incur great overhead in time and money; they use the 3D twin instead. And it doesn’t stop there, because this opens up a world of possibilities where changes to the factory can be tested virtually: new working routes for employees, floor-space utilization, and many more.
With the help of amazing rendering, a great ecosystem, and undoubtedly advanced AI techniques, companies like Adobe and NVIDIA are changing the world of digital art. Adobe Photoshop, which covers a wide range of uses, is launching a new feature that takes semantic segmentation to a whole new level: you can remove any background instantly and even re-create motion or places that weren’t in the original image, the latter achieved with another AI technology called GANs.
But what about cinematography? This is an exciting topic because it blends many AI technologies into one of the most creative ways to apply artificial intelligence! A sneak peek into these technologies includes human motion capture with 3D digitization, which picks up a real human pose and outputs a 3D virtual character; speech-to-face-expression, where, given an audio input, a 3D face mesh talks with the emotion of the speech; and automatic scene generation. All of this makes a perfect combo for passionate people from around the world to express their creativity using these forward-thinking tools. To see it in action, check out this talk: Intro to Omniverse Machinima.
With so many releases this year, so many encouraging talks, and after attending the wonderful review of AI history that Jürgen Schmidhuber gave in his session, how could one not believe we are very close to the singularity? I recommend the more conservative talk by Gary Marcus, who reminds us of the many failed forecasts of AI progress. We have been obsessed with big data and deep learning, which is good for learning but bad for abstraction. Classical AI models, on the other hand, are good for abstraction but bad for learning. Gary calls for hybrid models: we need ways to represent knowledge about space, time, and causality, and only truly reasoning, cognitive models will lead us there.
I hope this short review serves you as a path through GTC 2021. There are surely many other talks worth mentioning here, so leave a comment if there is one you would particularly like to recommend!
¹ Jensen’s kitchen hasn’t changed much since last year: the spoons are in the same place, and so are the salt and pepper grinders. So we can say he doesn’t cook food so much as he cooks new GPUs.