VRguy podcast Episode 7: Layla Mah, InsightfulVR, on the evolution of GPUs for VR

My guest is Layla Mah, CEO of InsightfulVR. This episode was recorded on Mar 30, 2016.

Layla and I discuss what GPUs can do and cannot do with video, audio and haptics. We talk about the evolution of the graphics architecture, VR standards for rendering and more.

Before founding InsightfulVR, Layla Mah was AMD’s Lead Architect of VR and Advanced Rendering, and the creator of the company’s LiquidVR technology. She was responsible for the company’s unified vision, strategy and architecture for current and future hardware and software as it relates to virtual reality and other advanced rendering paradigms.

Interview transcript

Yuval Boger (VRguy):     Hello, Layla, and welcome to the program.

Layla Mah:          Hi, how are you? Nice to be on the program.

VRguy:  Thank you. Who are you and what do you do?

Layla:    My name is Layla Mah and I’m founder and CEO of InsightfulVR. I can’t really say much about what the company is doing since we’re fairly new and staying pretty quiet, but I’ve been working on graphics and VR problems for quite a while now. My expertise is in those areas.

VRguy:  You were a part of, or actually the technical leader of, the LiquidVR team, right?

Layla :   Yes, at AMD I was the lead architect of VR and Advanced Rendering, and basically led the VR project from start to product. I named the thing as well, so that's always fun.

VRguy:  Yeah, that’s good when you get to choose the name. We at Sensics, in our role on the OSVR technical team, have worked with both LiquidVR and NVIDIA GameWorks VR. I think the functionality that both deliver is quite similar. You have to be able to detect an HMD, put it into direct mode, give high priority to the rendering task, know when Vsync is on, and so on.
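
To make that shared feature set concrete, here is a minimal sketch of what a vendor-neutral wrapper over those capabilities might look like. This is purely illustrative C++ with hypothetical names; it is not the actual LiquidVR, GameWorks VR, or OSVR RenderManager API.

```cpp
// Hypothetical vendor-neutral wrapper over the common IHV VR features
// discussed above. All names are illustrative, not a real SDK API.
#include <cstdint>
#include <memory>

struct HmdInfo {
    uint32_t displayWidth  = 0;   // panel resolution in pixels
    uint32_t displayHeight = 0;
    float    refreshRateHz = 0.f; // e.g. 90 Hz
};

class IVrGpuServices {
public:
    virtual ~IVrGpuServices() = default;

    // Enumerate attached HMDs and hide them from the OS desktop.
    virtual bool detectHmd(HmdInfo& outInfo) = 0;
    virtual bool enableDirectMode() = 0;          // bypass the desktop compositor

    // Ask the driver to prioritize VR rendering over other GPU work.
    virtual bool setHighPriorityContext() = 0;

    // Block until the next vertical sync of the HMD panel.
    virtual void waitForVsync() = 0;
};

// Each IHV backend (for example one built on LiquidVR, another on
// GameWorks VR) would implement this interface; the application only
// ever sees the wrapper. Hypothetical factory, declaration only:
std::unique_ptr<IVrGpuServices> createBackendForInstalledGpu();
```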

Layla :   Yeah, I think it’s natural that they evolved to become mirror images of each other for the two IHVs. In an ideal world it would have been vendor-neutral and there would have been one standard, but what happened was that we saw a need for the industry to fix these initial problems with the HMDs, each company took to fixing them as quickly as possible, and standardization would have taken too long. So I think it just ended up happening that way as a natural evolution.

VRguy:  Do you see any chance of having some kind of standard in the future, or a common API to both, and even to other vendors like Intel or Qualcomm on the mobile side?

Layla :   If that doesn’t happen it would be almost silly, because like you said, the features are largely the same. There are some implementation details that are different on each side, but standardizing them shouldn’t be much of a problem. In fact, if you look at the implementations, like Oculus, where they have their Async Time Warp and stuff, there are some differences, but there’s not a really good reason why it couldn’t go through the same path.

VRguy:  Actually, what we’ve done with OSVR is create a standard interface for this kind of functionality, and I think we ended up implementing NVIDIA first, and then AMD. The AMD implementation was really quick, and now an application developer can use the OSVR wrapper and we do the magic underneath. It’s not that much magic, really; you did most of the magic for us.

Layla :   Exactly, your wrapper kind of proves how close they are. You would have had a much tougher time if there were big differences, and I think the biggest difference at this point is in the … if you want to do something like Asynchronous Time Warp, which on AMD’s side is kind of a combination of Async Shaders, the Direct-to-Display functionality, and also late-latching, then you end up needing to use a compute shader on AMD, whereas you’re using the normal graphics path on NVIDIA at the moment. That’s really the largest difference, I think.
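
To make the asynchronous timewarp idea concrete: the scene is rendered with the pose sampled at the start of the frame, and just before scan-out a cheap reprojection pass re-samples the pose and shifts the image to compensate. The sketch below is a minimal CPU-side illustration with stub functions and made-up numbers; on real hardware the reprojection step is what runs as a compute shader on AMD or through the graphics path on NVIDIA, as described above.

```cpp
// Minimal CPU-side illustration of late-latching plus timewarp reprojection.
// Stub functions and illustrative numbers only; not LiquidVR or GameWorks VR code.
#include <cstdio>

struct Pose { float yawRadians; };          // head orientation, simplified to yaw only

// Stub tracker: in a real system this comes from the HMD's sensor fusion.
static float g_headYaw = 0.0f;
Pose sampleLatestPose() { return Pose{ g_headYaw }; }

void renderScene(const Pose& p) { std::printf("render scene at yaw %.3f rad\n", p.yawRadians); }
void shiftImage(float pixels)   { std::printf("timewarp shift: %.1f px\n", pixels); }

void presentFrame(float pixelsPerRadian) {
    Pose renderPose = sampleLatestPose();   // pose used for the full scene render
    renderScene(renderPose);                // this can take most of the frame time

    g_headYaw += 0.02f;                     // pretend the head kept moving meanwhile

    // Just before vsync: latch the newest pose and reproject the finished
    // image by the yaw delta so the displayed frame matches the head *now*.
    Pose latest = sampleLatestPose();
    shiftImage((latest.yawRadians - renderPose.yawRadians) * pixelsPerRadian);
}

int main() { presentFrame(1200.0f); }       // ~1200 px per radian is an assumed value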

VRguy:  Do you see this extending to other vendors? How about mobile? Same type of API, same type of functionality on mobile, or is it too OS-specific?

Layla :   I think it’s a somewhat different set of problems on mobile. Mobile doesn’t have as much expandability as PC. On PC, one of the problems was: if you have two GPUs, shouldn’t you be able to leverage those for VR? Before these APIs, two GPUs in VR was actually a worse experience, because alternate frame rendering, which is the way multi-GPU rendering has been done for years, actually adds latency. When you add a second GPU traditionally, you’re adding latency rather than removing it, so both AMD and NVIDIA went ahead and created solutions that actually allow you to use two GPUs, get better performance, and also reduce latency instead of adding it. On a smartphone you don’t tend to have two or four GPUs; it tends to be one standard part. Similarly, you don’t tend to have your HMD as a separate display device with a desktop and everything, so some of the problems we were trying to solve were really PC-specific. For example, you mentioned Intel. There’s really no reason Intel couldn’t jump onto a common standard with AMD and NVIDIA.
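
As a rough illustration of why alternate frame rendering (AFR) works against VR while per-eye multi-GPU rendering helps, here is a toy latency model. The per-eye cost and the assumption of roughly one extra frame of queuing under AFR are illustrative, not measured figures.

```cpp
// Back-of-the-envelope latency comparison (simplified model, illustrative
// numbers only): one GPU takes `perEyeMs` to render one eye.
#include <cstdio>

int main() {
    const double perEyeMs = 5.5;              // assumed cost to render one eye

    // One GPU renders both eyes back to back.
    double singleGpu = 2.0 * perEyeMs;

    // Alternate frame rendering: two GPUs alternate whole frames. Throughput
    // doubles, but each displayed frame still took a full 2*perEyeMs on its
    // GPU, and frames are queued deeper to keep both GPUs busy (modeled here
    // as roughly one extra frame of delay).
    double frameMs = 2.0 * perEyeMs;
    double afr     = frameMs + frameMs;

    // Per-eye (affinity) rendering: GPU0 draws the left eye while GPU1 draws
    // the right eye, so the frame is finished in half the time.
    double eyeParallel = perEyeMs;

    std::printf("single GPU      : ~%.1f ms to finish a frame\n", singleGpu);
    std::printf("2 GPUs, AFR     : ~%.1f ms (latency goes up)\n", afr);
    std::printf("2 GPUs, per-eye : ~%.1f ms (latency goes down)\n", eyeParallel);
}
```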

VRguy:  Just while we’re on this topic, how about other operating systems?  I could have my PC hardware run Linux, when can I get this kind of driver functionality for Linux?

Layla :   I pushed early on for it to be on Linux and all the platforms. Unfortunately, you’ll see things like Oculus dropping Linux support; I believe they dropped Mac support a while back, too. It ended up being that the first common platform was the Windows PC, but ideally you’d want to see this on Linux, on Mac OS, on every platform, and I hope the IHVs will follow up on that.

VRguy:  I hope so, too. Not to promote OSVR too much on this podcast, but we do have what we call Render Manager running on other operating systems such as Linux. We certainly have communities, for instance the Mac community or the WebVR community, that are looking for higher-performance VR experiences on non-Windows platforms.

Layla :   If you think about it, if you one day wanted to make an embedded headset that ran an APU or some kind of small ASIC right on the headset, you’d probably not want to put Windows on that unless you were Microsoft. You’d probably put Linux on it; it gives you more flexibility in turning off things you don’t need and setting up scheduling the way you want it to be. It makes sense: eventually we don’t want our HMDs tethered to PCs by HDMI or DisplayPort cables forever, we probably want headsets that have self-contained computers inside of them. Then Linux and Windows will probably both be important platforms, but I definitely see Linux being more important in that embedded case.

VRguy:  Yes, I agree. I saw somewhere that you were speaking about the GPU as also doing other things, for instance, audio. Could you explain why that would make sense?

Layla :   For one thing, audio becomes so much more important in VR. When you’re looking at a flat display and you have headphones on, turning your head relative to that display is not a very viscerally connected thing. Your body is not mapping that audio one-to-one with what you’re seeing; it’s kind of a disconnected thing, so audio is relatively less important to the experience. But when you’re wearing a headset where your head is tracked, and your vision is perfectly tracking your head movement in the virtual world, suddenly your audio should do the exact same thing. It should track perfectly with your head … when your head moves, the audio should change, because the directions from which the sound is arriving at your ears are different.

                This is a level of audio simulation that nobody’s dared to pursue since the late 90s, when audio fell off a cliff in terms of research and development and consumer products, but with VR becoming so big now, and AR as well, I think there’s a resurgence in the need to pursue audio.

                If you think about it, audio is a lot like light: waves bouncing around in the world, reflecting off things, refracting. In some ways it’s even more complex, because the frequency spectrum of audio is much wider; 20 Hz to 20 kHz is what humans can hear. From that perspective, you start to think of GPUs and ray tracing and all these other techniques that are quite applicable to audio, and if you really want to do it right, you could spend teraflops of compute on audio and still not get there. In the next few years I think we’ll see the importance of audio and of audio engines grow. In graphics, everybody is moving toward physically-based rendering; I think we’re going to see audio finally moving toward physically-based audio as well.

VRguy:  Just like you have transparency and light-reflection properties of triangles, you could have the sound-reflection property-

Layla :   Exactly, material properties for audio as well, and most of them are similar. The material properties that affect light might be smaller in scale. You can have micro-geometry that reflects light in all these different ways, whereas in audio the features that affect it are maybe larger. In general, you want the same system: a renderer that actually looks at the material properties for light and for sound and, in theory, processes them together, walks the triangles together, all of those things.
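
As a hypothetical illustration of shared material properties for light and sound, a renderer could tag each surface with both optical and acoustic coefficients and attenuate an audio ray’s energy per frequency band at every bounce. The sketch below is not any existing engine’s data model, just the idea in code, with assumed coefficient values.

```cpp
// Hypothetical shared material: optical properties for the light path,
// frequency-banded absorption for the sound path. Illustrative only.
#include <array>
#include <cstdio>

constexpr int kBands = 3;                     // e.g. low / mid / high frequency bands

struct SurfaceMaterial {
    // Light: how much of each RGB channel the surface reflects.
    std::array<float, 3>      albedoRgb;
    // Sound: fraction of energy absorbed per band on each reflection.
    std::array<float, kBands> acousticAbsorption;
};

// Attenuate an audio ray's per-band energy after bouncing off a surface.
void reflectAudio(std::array<float, kBands>& energy, const SurfaceMaterial& m) {
    for (int b = 0; b < kBands; ++b)
        energy[b] *= (1.0f - m.acousticAbsorption[b]);
}

int main() {
    SurfaceMaterial carpet { {0.4f, 0.3f, 0.3f}, {0.10f, 0.45f, 0.70f} };
    std::array<float, kBands> rayEnergy { 1.0f, 1.0f, 1.0f };

    reflectAudio(rayEnergy, carpet);          // one bounce off the carpet
    reflectAudio(rayEnergy, carpet);          // a second bounce

    std::printf("energy after 2 bounces: low %.2f  mid %.2f  high %.2f\n",
                rayEnergy[0], rayEnergy[1], rayEnergy[2]);
}
```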

VRguy:  Though the resolution for audio is probably lower.

Layla :   Yeah, in general, like we’re saying, the features that are going to affect what you hear are probably much larger; resolution-wise, they don’t need to be as detailed in some sense.

VRguy:  That’s good news; the last thing you want is another 4K audio processor.

Layla :   Right.

VRguy:  Could the same also be said about haptics? You’ve got this GPU, a massively parallel computer that can do localized calculations; I could envision objects having touch properties as well if I wear a glove.

Layla :   I would agree. The main difference I see between audio, graphics, and haptics is that, in the future, graphics and audio may move more toward a global simulation rather than a user-local simulation. At the moment, when we render things, we render them in the space of a screen, what you see; then in the next frame you move your head, you have a new screen, and we render what you see through that camera. But in the world of lighting, you could actually think about doing a lot of things outside the space of what you see, and the same goes for sound, whereas with touch and haptics, it’s really going to be very specific to what one user is seeing. The computations will be much more local than global.

VRguy:  I see. If we think a few years back, or maybe many years back, there were processors, and then there were math coprocessors; we used to have these chips that could compute sine functions really fast, and then that became embedded into the CPU. Now you’ve got the GPU as an assistant for rendering, and maybe sound and haptics functions…

Layla :   Even GPUs have special instructions to make sine go really fast now.

VRguy:  Sure, but my question is, do you see … I sometimes call it a VRPU, a Virtual Reality Processing Unit, that would sit next to the main processor and deal with sensor fusion, predictive tracking, and all the things that are specific to VR. Or do you still see that as just part of what the CPU has to do regularly?

Layla :   I really see everything coming together more than spreading apart. Technology does tend to go through cycles, and we can all point to cases where something old became new again and vice versa, so I’m not going to say it won’t happen in the meanwhile. But I think, with things like APUs, where a CPU and a GPU can coexist on the same die and share memory, you actually get to a much more efficient place with certain algorithms if they’re not limited to one type of processor or another, but can freely move to the right type of processing, or just be distributed between them locally.

                For example, a GPU core right next to a CPU core. Currently, when you have an APU, you’ve got, say, four CPU cores and then, say, eight GPU clusters, and those are two separate sides of the die. In the future, you could possibly see something becoming more integrated rather than less: a CPU core right next to some GPU compute units, right next to a rasterizer, right next to some local memory, right next to some other processor that does video processing or something like that, and you might have clusters of these that are then stamped out in parallel next to each other. I actually see things becoming more heterogeneous and more integrated as opposed to more specialized and spread apart, if that makes sense.

VRguy:  Absolutely. Do you think the computing architecture is going to change? Today, we have an application that uses a game engine to figure out where the objects are in space, and then pushes them down to a GPU, which tries to render an increasingly large number of pixels at an increasingly high frame rate. Do you see that changing in a substantial way?

Layla :   I definitely do. Largely, GPU architecture has been very evolutionary for the past 15 years or so; the changes have been quite obvious, and from some perspective that will continue. But I see the largest changes in architecture coming between the server side and the client side. At the moment, we don’t really differentiate much between the GPUs or CPUs that go into a server and those that go into a desktop or a local device. Intel has their Xeon for the server and their Core for the desktop, but largely they’re the same architecture, like Skylake for example.

                I actually see that, in 10 or 15 years, what we may end up having is something on the server that is massively parallel, massively focused on compute and on being able to crunch a lot of data, a lot of power by comparison. Whereas on the client side, I see us moving more toward clusters of, again, fixed-function hardware and things that are really low-power, and actually having, in some sense, a lower compute ratio on the client than what we’ve seen in the past. In the future you would have a separation of software, where a lot of things would be done on the server, in a global sense that can apply to all the clients, and then you’d have little one-watt or five-watt devices on the client side putting all of that back together. The real goal is to have the power of the cloud, but accessible from a device that’s the size of a Fitbit or smaller, and that’s never going to happen with the way current architectures scale.

VRguy:  Right, especially now that you get into multiplayer. You have an increasingly large computational burden to figure out where these players are, how they interact, and what they can do to each other, and it’s getting increasingly difficult to do that locally on a single client, I think.

Layla :   Yep, and especially when you’re doing … let’s say you’ve got a multi-user experience and lots of people are connected. If all the clients are duplicating the same work over and over, you’ve really wasted a lot of power. Effectively, you’ve done less than you could have if you had offloaded the common stuff and only worked on the local stuff locally. I shouldn’t say what I just said, that it’ll never happen. If you wait 50 years and Moore’s law keeps going even remotely how it has been, eventually we’ll get teraflops and petaflops on our wrists at one watt, but that’s a long time to wait. So what I really mean is that, in the interim, having a distributed and parallel system with different architectures on the server side and the client side can get us much more realism and presence in these virtual environments much sooner.

VRguy:  In the short term you’re seeing local optimizations like multi-resolution rendering or foveated rendering and other algorithmic improvements, but further out, you’re seeing more of an architecture change in the way things are done.

Layla :   Yeah, I think so.
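
For a sense of what the near-term optimizations mentioned above buy, here is a simplified multi-resolution rendering calculation: the central region is shaded at full resolution and the periphery at half resolution in each axis. The panel size and region split are assumed for illustration, not taken from any particular product.

```cpp
// Rough pixel-count saving from multi-resolution rendering (simplified:
// a full-resolution central region, periphery shaded at half resolution
// in each axis). Numbers are illustrative, not measured.
#include <cstdio>

int main() {
    const double width = 2160.0, height = 1200.0;   // assumed 2160x1200 HMD panel
    const double centerFraction = 0.6;              // central 60% of each axis kept full-res

    double total     = width * height;
    double centerPix = (width * centerFraction) * (height * centerFraction);
    double periphery = total - centerPix;
    double multiRes  = centerPix + periphery * 0.25; // half res in x and y => 1/4 the pixels

    std::printf("full resolution : %.1f Mpix per frame\n", total / 1e6);
    std::printf("multi-resolution: %.1f Mpix per frame (%.0f%% of the work)\n",
                multiRes / 1e6, 100.0 * multiRes / total);
}
```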

VRguy:  How does this translate into human interface, what you can do with it, how VR impacts humanity, and how the new displays and input devices come into the picture?

Layla :   I think that one of the biggest changes will be … currently, VR is very much a tethered and solo experience for the most part; even with these networked experiences like Toybox, people are physically separated from each other. I think one of the biggest changes will be to make that into a group experience. By that I mean physically co-located groups as well as distributed groups.

                At the moment, we all need $1000+ PCs, cables, headsets, and everything, in separate spaces, to go into VR together, and like I said, we could do something like Toybox. But you could imagine, in a couple of years, wireless headsets where we could all put on a headset and be tracked in the same physical space, so if we wanted to give each other a high-five in VR, our hands could actually touch because we’re in the same space. I think once you get to that point there’s a huge amount of possibility for group exploration of things. Imagine in schools, if you could have an affordable system where all, say, 30 students in a class could go together into VR, into a Magic School Bus experience with the teacher; I think that would be a transformative learning experience that we can’t currently achieve, or at least it’s not feasible, technologically and financially. It’s a very big bar at the moment, and if we can lower that bar, there are just so many applications that can improve: learning, medicine, social interaction, especially for people who might have certain disabilities. There are just a lot of ways we could improve connection between people in both the non-virtual and the virtual worlds. I would really like to see that change happen, so that VR becomes less isolating and more of something that brings people together.

VRguy:  I think we first have to make, or in parallel we have to make, VR more accessible: a $500 system that gives you an excellent VR experience, instead of three times that today.

Layla :   I would even go further and say we need to get to a $50 system, you know, that gives you a better experience than what we have today.

                For the moment, there’s still not that … at the moment, what is it, a few million people who can afford the current level of hardware to get into VR, high-end VR. Maybe, if you got it down to $500, that becomes 30 million or 50 million people, but if you get it down to $50 per person, then it becomes possibly something like a billion people.

VRguy:  Layla, this has been great. Where could people connect with you if they want to find out more about what you do online?

Layla :   There are a few ways: they could email me at Layla@InsightfulVR.com, or visit our website at www.insightfulVR.com and maybe sign up for our newsletter as well, or they could find me on Twitter @MissQuickstep. Any of those should work pretty well.

VRguy:  Excellent. Thanks again for coming on to the program.

Layla :   Thank you so much for having me. It’s been great.
