VR and AR: The Real(ity) Story
Paul Sagawa / Artur Pylak
203.901.1633 / .1634
psagawa@ / firstname.lastname@example.org
SEE LAST PAGE OF THIS REPORT FOR IMPORTANT DISCLOSURES
December 21, 2016
Virtual reality advocates hope the technology ends up more like HDTV than like 3D TV, but biological constraints, the demands of full immersion, and technical obstacles seem likely to limit VR to niche applications. Meanwhile, AI-fueled augmented reality solutions, interpolating digital images into a user’s view of the world, can address a much wider and more lucrative range of opportunities. While component costs for quality AR displays will likely keep them out of the consumer mass market for years, high value enterprise applications – e.g. machine maintenance, training, hospitality, security, etc. – could drive development until the wave guides and other parts hit volume economics. Commercial solutions will marry AR hardware with deep learning software required to integrate appropriate images precisely into the user’s perspective. 3rd parties may then adapt more general purpose platform products to the specific needs of a vertical application. MSFT (HoloLens), GOOGL (Glass), Magic Leap, and AAPL are all in the very early phases of delivering market-ready AR platforms. Of these, we see MSFT as ahead, based on its AI prowess, enterprise focused ecosystem and substantial progress to date.
- The reality about virtual reality. Hype around VR stepped up following FB’s $2B 2014 deal for consumer VR pioneer Oculus. While the experience is compelling, we see substantial drawbacks to the technology. First, fully immersive systems are impractical for most applications and run counter to decades of increasing mobility. Second, “sim sickness” – a physical reaction that many people have to VR – restricts developer choices in designing user experiences, eliminates many forms of cloud-delivered content (e.g. live events, multi-player interactions, etc.), and could have lingering cognitive effects on users who over-indulge. Finally, the cost of developing VR content, the lack of clear standards and tiny penetration of VR systems leave the technology far from critical mass.
- Cheap and dirty VR for consumers. We believe high end, deeply immersive systems, like FB’s Oculus, will fail to reach critical mass, appealing narrowly to high end gamers, who may be disappointed relative to their expectations. Lower end smartphone-based systems have more of a chance, but the need for specialized viewers and their anti-social nature will limit their uptake. We see both as niche markets, rather than game changers.
- Augmented reality will be the paradigm shift. AR systems, which project digital images directly into a user’s field of view, solve the two major problems with VR – they are fully mobile and do not cause sim sickness – albeit sacrificing full immersion. We believe AR will eventually replace displays in most devices – smartphones, PC monitors, industrial instruments, etc. – while enabling entirely new applications. Information about things that the user is seeing could be interpolated directly on those things – customer records, medical diagnoses, repair schematics, maps, tourism brochures, etc. – along with instructions for action, all without glancing away.
- AR hardware will be very expensive to start. Demos by AR pioneers MSFT and Magic Leap have left their audiences amazed, but these bulky, tethered prototype systems are far from commercial reality. Current iterations rely on “glass” etched with waveguides – one layer and one driver for each color on each of several focal planes. This adds cost, bulk, and image quality compromises that will preclude AR from mass markets for many years. Patents have been filed on solutions that could potentially resolve these limitations, but the technologies are years from commercial viability.
- AR demands world class AI. Unlike VR, AR systems must accurately interpolate appropriate digital images within a user’s field of view. Image processing – in this case, analyzing real time video from cameras capturing the view from each eye – uses deep learning AI to identify the aspects of the scene necessary for the application and provide a precise digital map. Very simple applications may just look for a blank wall, but more complicated solutions will require extremely sophisticated analysis done very, very quickly. Few scientists in the world are capable of leading teams toward the more complex end of this spectrum. Here, we believe MSFT leads the way, applying its AI expertise to AR hardware tech acquired with Nokia to build its HoloLens prototype systems.
- Enterprise first. AR will be impractical for consumer markets for several years, given hardware costs and remaining technology hurdles, but we see enterprise applications where the clear business value of a heads-up display connecting digital information to objects in view could justify hefty prices. These systems may not require as many focal planes or even full color to be useful. For example, a system that could overlay the schematics of a properly configured system onto a repair technician’s view could boost productivity significantly. We see obvious applications to maintenance/service, medicine, hospitality, manufacturing, security, and many other fields. Adoption of the technology into industrial markets would also drive costs down, while encouraging further development and standardization.
- Replacing displays. As AR becomes less cost prohibitive and more standard, we believe that wearables featuring the technology could spark a paradigm shift away from fixed displays for consumer electronics. Google Glass was a failure not just for the geeky/creepy form factor, but for the very poor quality of its display and the lack of thoughtful applications. We believe AR offers rich opportunities to integrate digital life with real life, and could one day make checking a smartphone as much of an anachronism as faxes and payphones. This future will not occur this decade, but it will come not long after.
- Who will win? We see AR as the game changing technology, with VR likely to be a modest niche. We believe the first big successes in AR will come from high value enterprise applications, with mass market platforms following well behind. In this scenario, we see MSFT, with its enterprise ecosystem, AR hardware bona fides, and AI prowess, as the early leader. GOOGL has the technology chops and strategic urgency to challenge. AR may be AAPL’s chance to re-establish its platform mojo, but the company has major R&D work to do, particularly in AI, to get in the game. AMZN, BIDU and FB are wild cards. There will also be HW component winners, but it is too early to call. Similarly, we expect start-ups to adapt AR platforms for specific high-value enterprise apps, but this market is nascent.
Is This the Real Life? Is This Just Fantasy?
In buying VR pioneer Oculus in 2014, FB CEO Mark Zuckerberg gushed about VR as a primary user interface, but we are highly skeptical. First, fully immersive VR users are tethered to a computer and cut off from the real world, reversing decades of momentum toward mobility and social applications. At some point, the novelty wears off and the inconvenience and isolation set in. Second, military VR training systems have found many people physiologically intolerant, and most cognitively impaired by more than short exposures. Avoiding “sim sickness” requires extremely low latency, longer focal lengths, and restricted experiences (no flying!). This functionally eliminates live streamed events, including multiplayer networked gaming, which have inherently excessive latency. Finally, VR is far from critical mass without clear standards, limiting its appeal to both consumers and developers. We believe that the technology will settle in as two separate niche markets – low end smartphone applications (simple games, virtual tourism, etc.) and high end gaming systems – neither of which is likely to be particularly lucrative.
Augmented reality is another story. AR systems leave users mobile and present in the real world, projecting digital images into their field of view. That grounding also neutralizes the disequilibrium behind sim sickness, removing a major potential barrier to future adoption. Still, there are substantial technology barriers that must be surmounted before AR is ready for the spotlight. Most AR depends on waveguides – sophisticated light transmitting paths that can be etched onto transparent substrates, which can be layered into lenses. To offer the illusion of depth, separate waveguides for various focal lengths are required – the more focal lengths, the better the 3D rendering. To offer full color, each focal length needs three waveguides. This stacking is cumbersome, expensive and degrades the user’s vision, while the necessary drivers are bulky. The software behind AR is sophisticated AI able to interpret the objects in view in real time, generate appropriate 3D images and project them accurately.
Thus, the hurdles for sophisticated AR applications on well-integrated hardware/software platforms are substantial, and we believe it will be several years before compelling mass market products are possible. However, there are clear use cases for AR in enterprise applications where the cost and convenience thresholds are much less a barrier. These applications may not require multiple focal planes or even full color, further reducing the cost and complexity. We believe AR will find its first success here – on the heads of repair technicians, customer service reps, or physicians – before the technology emerges for consumers. In the long run – as waveguides get thinner and cheaper and as the AI software gets more powerful – we believe augmented reality systems could displace most personal devices, including smartphones, eliminating the now necessary glance at a screen and opening a world of new uses.
A few companies are going for this brass ring. MSFT, with its HoloLens initiative, is the furthest along and has impressive technology assets. GOOGL has the technical chops but has been more focused on other AI opportunities. Recent scuttlebutt puts AAPL in the fray, and the long road to commercial viability may give them a chance to catch up. FB seems more focused on VR, but things could change. Hardware focused startups, like Magic Leap, will likely need AI partners to deliver fully integrated solutions. We also see substantial long term opportunity for component suppliers – waveguides, OLEDs, GPUs, etc. – and for vertical solutions developers.
Virtual Reality Bites
Anyone who has ever waited in line for a virtual reality demo understands its appeal. Strapping on the bulky viewer is an escapist experience like no other available today. It is easy to extrapolate the thrill of flying a Starfighter in a deep space dogfight into other possible deeply immersive digital experiences – sitting courtside for the NBA finals or in the front row of a sold-out concert, holding virtual meetings with colleagues around the globe or experiencing life in “The Matrix”. It is this easy narrative that led Facebook CEO Mark Zuckerberg to buy VR pioneer Oculus for $2B in 2014. We believe that was a mistake.
Immobile and unsocial. Virtual reality seems compelling, but there are serious drawbacks. The experience – alone, tethered to a computer, seated for your own safety – is inherently immobile and anti-social, counter to decades of platform development moving in exactly the opposite direction. Applications, including Facebook, are typically more compelling on a desktop computer, but all usage growth is coming from mobile devices with their relatively tiny screens, more cumbersome navigation, and modest processors. Why? The best computer is the one that you have with you. Mobility trumps experience. Similarly, Mark Zuckerberg, of all people, should understand the power of social connection – even in games, the action is in multiplayer on-line titles which bring friends and acquaintances together. VR users, isolated by their helmets and headphones, set themselves apart from others in physical space. We believe these inherent characteristics will seriously inhibit user engagement.
Exh 1: U.S. Military Studies on Simulator Sickness
Sim Sickness. Another serious drawback will separate VR users in cyberspace as well. The military has been using high-end virtual reality flight simulators to train its pilots for many years. In the process, they discovered that a significant proportion of the population (10-90%, depending on the duration and intensity of the task) cannot tolerate visual stimulus that conflicts with the body’s experience of physical movement (Exhibit 1). When the eyes say that the user is moving, but the vestibular sensors in the inner ears say that he or she is sitting still, the response is nausea, disorientation, headache and fatigue. The reaction is particularly bad when there is a lag of more than 10ms between the movement of the user’s head and the corresponding change in the image being displayed. This phenomenon is known as “sim sickness” and is analogous to the motion sickness that many people experience when in a moving vehicle without a view of the surrounding landscape. While individuals can become inured to a specific VR experience with repeated exposure, the adaptation does not carry over to a new experience (e.g. a new simulator or a new game), and the user must begin the adaptation process anew. Some very sensitive individuals may experience lingering effects as much as 6 hours beyond their time in the simulator.
Exh 2: Factors contributing to simulator sickness
Developers can work around sim sickness by limiting the degree of simulated motion available within the virtual experience, by keeping virtual objects at more distant focal lengths, and, most importantly, by ensuring that the images change at latencies below the 10ms threshold (Exhibit 2). This is a substantial constraint on application design. Certain experiences – such as human flight, close character interactions, or rapid maneuvering – are out of bounds. More importantly, the images must be served from a nearby computer – content streamed from the internet could never meet the 10ms threshold (Exhibits 3-4). This effectively dooms live telepresence and wide-area gaming applications. Even for applications within these design limitations, many users find the experience draining (e.g. eyestrain, headaches, fatigue, etc.), and despite the initial exhilaration, user enthusiasm quickly fades.
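A rough back-of-the-envelope sketch shows why streamed content struggles with a 10ms budget. It assumes that signals travel through optical fiber at roughly 200,000 km/s (about two-thirds of the vacuum speed of light) and counts only round-trip propagation delay. Router hops, queuing, rendering and display time are all ignored here and would only add to the total. The function names and sample distances are illustrative assumptions, not figures taken from the exhibits.

```python
# Back-of-the-envelope check: can a remote server meet a motion-to-photon budget?
# Assumption: signals travel through fiber at roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0
LATENCY_BUDGET_MS = 10.0  # threshold cited in the text


def round_trip_ms(distance_km: float) -> float:
    """Propagation delay only, there and back, ignoring all other overheads."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def max_server_distance_km(budget_ms: float = LATENCY_BUDGET_MS) -> float:
    """Farthest a server could sit if propagation used up the entire budget."""
    return budget_ms * FIBER_KM_PER_MS / 2.0


if __name__ == "__main__":
    for km in (50, 500, 1000, 4000):  # hypothetical server distances
        rtt = round_trip_ms(km)
        verdict = "within" if rtt <= LATENCY_BUDGET_MS else "over"
        print(f"{km:>5} km: {rtt:5.1f} ms round trip ({verdict} budget)")
    print(f"Hard ceiling: {max_server_distance_km():.0f} km")
```

Under these assumptions, a server more than about 1,000 km away cannot respond within 10ms even if every other source of delay were zero, which is consistent with the argument that the images must be generated on a nearby computer.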
Exh 3: Effect of Distance on Content Delivery Performance
Exh 4: Latency and router hops of major web properties
Critical mass. The final obstacle for mass market adoption of VR is practical. Content platforms face a chicken and egg problem – without enough viewers, developers won’t produce content, and without content, consumers won’t adopt the platform. Today, several content formats compete for adoption – Oculus, Sony, Samsung, HTC, Google, Steam, and others offer their own proprietary platforms – and none has critical mass (Exhibit 5). A fledgling standards effort – organized under the Khronos Group, an industry consortium that includes some of the leading contenders – was recently launched, but the differing visions of the technology will be difficult to reconcile (Exhibit 6). With time, we believe products will converge to two different product concepts – high end gaming systems and low end smartphone peripherals. Gaming systems will aim for very high frame rate, fully immersive systems – specs that will require tethering to a high-performance computer – and will cost hundreds of dollars beyond the cost of the host system. The PC gaming hardware market is $25-30B worldwide, but less than a quarter of that reflects spending by the true enthusiasts most likely to be drawn to VR (Exhibits 7-8).
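The market sizing above implies a ceiling on the enthusiast segment, which a few lines of arithmetic make explicit. This is a minimal sketch that treats “less than a quarter” as a 25% cap; the function name is an illustrative assumption, and the result is an upper bound rather than an estimate.

```python
# Upper bound on enthusiast PC gaming hardware spending implied by the text.
MARKET_LOW_B, MARKET_HIGH_B = 25.0, 30.0  # worldwide PC gaming hardware, $B
ENTHUSIAST_SHARE_CAP = 0.25               # "less than a quarter", used as a cap


def enthusiast_ceiling(market_b: float, share_cap: float = ENTHUSIAST_SHARE_CAP) -> float:
    """Maximum enthusiast spending, in $B, for a given market size."""
    return market_b * share_cap


if __name__ == "__main__":
    low = enthusiast_ceiling(MARKET_LOW_B)
    high = enthusiast_ceiling(MARKET_HIGH_B)
    print(f"Enthusiast spending is below ${low:.2f}B to ${high:.2f}B")
```

On these figures, the addressable spending base for high end VR sits below roughly $6-8B, before considering how much of it would actually shift to VR hardware.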
Exh 5: Current Generation VR Offering Specs
Exh 6: Khronos Group Membership
Exh 7: Global Video Games Market Size
Exh 8: Willingness to pay for video game that can be played on a VR headset
Low end systems – consisting of an inexpensive headpiece into which a smartphone can be inserted – will target a much broader audience, but may encounter indifference from users who may be relatively unimpressed by the low fidelity experience. Google Daydream and Samsung Galaxy Gear VR systems risk gathering dust on family room shelves after the thrill of Christmas morning wears off unless a steady stream of compelling content emerges. So far? Not so much.
In Summary. We believe the obstacles to VR adoption are too great for the technology to emerge as “the next major computing and communications platform” as predicted by Mark Zuckerberg upon Facebook’s acquisition of Oculus. Of late, even Zuckerberg has been more of a realist, quoted by Business Insider as saying the mass market adoption of VR could be “at least 10” years away. While 10 years might raise the standard of the experience and bring down the cost of content development enough to broaden the appeal, the isolation, immobility and physiological issues will remain. Of course, there is another way …
Augmented Reality Intrudes
Augmented reality (sometimes called mixed reality) sacrifices immersion for mobility, and in the process, solves for the social and physiological downsides of its virtual cousin. Unlike VR, AR systems keep the user in the real world, interpolating digital images into the field of view rather than replacing the view entirely. Users can move freely and safely, interact with other humans, and otherwise inhabit their normal lives while wearing their AR gear. Because their brains can ground the sense of motion with a view of the physical world, no one gets sick using AR. Nonetheless, the experience can be very compelling.
The lucky few who have been invited for a demo at Magic Leap’s locked down Florida HQ universally come away in awe. Animated robots attack over office cubicle walls, virtual displays float in the air “Minority Report” style, 3D solar system models hover over a desk. Microsoft’s HoloLens group also has a killer demo, showing a “Minecraft” excavation in a conference room table and guiding users through a home renovation task via animations projected right onto the task at hand. The experiences are both exciting and practical.
The possible applications are boundless. AR could replace glass screens in smartphones, computers, tablets, TVs, industrial instruments, and anywhere else you might find them. AR could paint your world with information – people and place names, customer records, usage instructions, map directions, device schematics, diagnostic analysis, etc. AR could keep you apprised of information without requiring even a glance away. AR could entertain you in unimagined ways and help you do your job better. This, not fully immersive VR, is the paradigm shift for computing and communication of which Mark Zuckerberg has dreamt.
High Priced, ‘Cause it Feels So Nice
Of course, all may not be as it seems with Magic Leap. Its demo system, the size of a dorm room refrigerator, used bespoke lenses crafted from flexible material to deliver the projected 3D images. The commercial products were to use a proprietary system called a Fiber Scanning Display (FSD), which would use a rapidly oscillating piece of optical fiber to “draw” a 3D image on the surface of a lens. Recent reports suggest that the development of the FSD has hit serious, and potentially insurmountable, technical obstacles, pushing its commercialization years into the future, if it happens at all. Instead, Magic Leap is falling back to technology similar to that used by Microsoft in its HoloLens system.
The basic concepts behind the hardware are straightforward. Images are projected by chip based microdisplays – LCOS (liquid crystal on silicon), DLP (digital light processing), and OLED (organic light emitting diodes) are the primary competing technologies. The projections are transmitted into the field of view via optronic waveguides etched into transparent glass-like substrates that allow the user to see through to the real world while also perceiving the digital images. To give 3-dimensional depth, the system splits the projection into multiple focal planes, each perceived at a different distance from the user. Only 3 or 4 focal planes are necessary, as the human brain fills in the gradations between the planes as it interprets the image, but each focal plane requires its own waveguides. To achieve full color, three waveguides per focal plane are necessary. Each waveguide is a layer of substrate, so a working AR system will need 9-12 layers, each with a microdisplay driver (Exhibit 9). The multiple waveguide layers present two challenges: they degrade the clarity of the user’s view of the real world, and they create an engineering headache in keeping the projected images precisely aligned.
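The layer count follows directly from the description above: one waveguide layer per color per focal plane. The short sketch below simply multiplies the two; the function name and the pared-down single-plane, single-color configuration are illustrative assumptions rather than the specification of any shipping product.

```python
# Layer count for a stacked-waveguide AR display, per the description in the text:
# one waveguide layer (and one microdisplay driver) per color per focal plane.
COLORS_FOR_FULL_COLOR = 3  # three waveguides per focal plane for full color


def waveguide_layers(focal_planes: int, colors: int = COLORS_FOR_FULL_COLOR) -> int:
    """Total substrate layers in the stack."""
    if focal_planes < 1 or colors < 1:
        raise ValueError("need at least one focal plane and one color")
    return focal_planes * colors


if __name__ == "__main__":
    for planes in (3, 4):
        print(f"{planes} focal planes, full color: {waveguide_layers(planes)} layers")
    # A pared-down design (hypothetical): one focal plane, one color.
    print(f"1 focal plane, monochrome: {waveguide_layers(1, colors=1)} layer")
```

The same arithmetic shows why a system that gives up full color or multiple focal planes can be far simpler: the hypothetical single-plane monochrome case needs one layer instead of nine to twelve.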
Exh 9: Waveguide technology in augmented reality
High end AR will also require very powerful local computing. The software for augmented reality must take the image of the user’s view, analyze it, determine what image to interpolate and exactly where, and then generate the image for the optronics to display. This is a job for deep learning based image processing, likely requiring a hefty GPU or ASIC on board. Today, the combination of waveguides, microdisplays and processing is necessarily bulky and expensive for demo solutions that are largely preprogrammed. Slim, stylish, and affordable AR is still a distant hope.
AR Needs AI
One of the key attributes of AR is its ability to juxtapose relevant digital images precisely in relation to the specific environment in view of the user. This aspect distinguishes augmented reality from a simple heads up display (or the game Pokemon Go), which simply projects digital content in front of anything that might be visible behind it. A car windshield that simply shows a readout of vehicle speed in the corner is not AR, while one that overlays a map with indicated directions directly on the driver’s view of the streets in question would qualify. This aspect of AR demands sophisticated machine vision, which, in turn, demands artificial intelligence.
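The difference between a screen-fixed readout and a world-anchored overlay can be illustrated with a simple pinhole camera projection. This is a minimal sketch under stated assumptions, not a description of how any commercial system works: the camera intrinsics are made up, only head yaw is modeled, and the anchor position and head pose are simply given, whereas a real system would have to estimate both from camera images.

```python
import math

# Hypothetical camera intrinsics, for illustration only.
FOCAL_PX = 800.0         # focal length in pixels
CENTER = (640.0, 360.0)  # principal point of a 1280x720 view


def screen_locked(_head_yaw_rad):
    """Simple HUD: the readout sits at a fixed pixel regardless of head pose."""
    return (100.0, 650.0)


def world_locked(point_xyz, head_yaw_rad):
    """AR-style overlay: re-project a fixed world point for the current head yaw.

    Axes: x right, y down, z forward. The head sits at the origin and a positive
    yaw turns it to the right. Returns None when the point is behind the viewer.
    """
    x, y, z = point_xyz
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    # Express the world point in the rotated camera frame.
    x_cam = c * x - s * z
    z_cam = s * x + c * z
    if z_cam <= 0:
        return None
    u = CENTER[0] + FOCAL_PX * x_cam / z_cam
    v = CENTER[1] + FOCAL_PX * y / z_cam
    return (u, v)


if __name__ == "__main__":
    anchor = (0.0, 0.0, 2.0)  # a point two meters straight ahead
    for deg in (0, 10, 20):
        yaw = math.radians(deg)
        print(deg, screen_locked(yaw), world_locked(anchor, yaw))
```

As the head turns to the right, the world-locked overlay slides left in the image so that it stays attached to the same spot in the room, while the screen-locked readout never moves. Knowing where the anchor is and how the head has moved is the part that requires machine vision.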
All AR approaches begin with some sort of camera positioned to see what the user sees. The images from that camera are analyzed to identify specific details to be accessed by the program – a blank wall for a virtual display, a desk or table to serve as a platform for animation, a face to identify against customer records, a piece of machinery on which to overlay schematics, etc. Given that analysis, the program then generates the digital image specifically for the intended 3D space, continually adjusting it to account for the user’s movements and the changing dictates of the program. The ability to map a perspective in 3D and analyze its contents is a major field of deep learning research, central to the future of autonomous vehicles and robots as well as augmented reality (Exhibit 10). In the training phase, a deep learning system must iterate through mountains of data to refine its ability to precisely identify and analyze those things within a video relevant to the goals of the application. Once well trained, the system must be streamlined for operational deployment, having embedded the learnings generated during the training phase.
Exh 10: Hierarchy of Deep Neural Networks
The sophistication of the AI will depend on the needs of the application. Obviously, it is easier to train a system to look for a blank wall than to recognize specific human faces or to identify flaws in machinery to be repaired. We believe that there are only a few organizations with the capability to address the most complex manifestations in the next few years – Alphabet, Microsoft, Facebook, Amazon, and Baidu lead the list, with a few others (e.g. Apple) investing to catch up. Longer term, hosted AI platforms on Google Cloud Platform, Microsoft Azure, Amazon Web Services and IBM Watson with machine vision APIs could support 3rd party AR development.
Enterprise AR – Boldly Going Where No One Has Been Before
The reality for AR is that the hardware will be expensive, limited and bulky for several years. AR AI will be tightly application specific and will add to the expense and bulk. This likely makes compelling mass market augmented reality products a decade away or more. Still, we see substantial near term opportunity for the technology.
Exh 11: Microsoft HoloLens Tech Specs
Enterprises can justify the expense and ignore the bulk if the application can deliver real value. Focused solutions may not require full color or a deep range of focal lengths, and the software can be tailored to the specific use case (Exhibit 11). Because of this, we expect the first beachhead for AR will be in vertical solutions for enterprise markets, where workers could benefit from accessing contextual information without having to divert their attention. AR could flag problems and guide repairs for technicians servicing sophisticated machinery. AR could provide diagnostic support to physicians during examinations and procedures. AR could help law enforcement, military or security personnel evaluate possible evidence or threats. AR could help hospitality workers identify guests and provide personalized information. AR could help architects and designers to envision possible changes on site. For many of these use cases, the cost and limitations of existing technology may be acceptable.
Adoption of AR into industrial uses would help propel development forward, while creating scale economies for system components. Future breakthroughs in optronics, ever more powerful processors, and advances in AI could yield step function improvements down the road that could make mass market consumer AR systems more realistic.
Where We are Going
The failure of Google Glass – an underpowered, barely functional prototype – notwithstanding, we believe that cheap, lightweight and stylish AR wearables will one day begin to displace physical displays as costs come down and performance gets better. The rise of voice activated AI assistants as a familiar input mechanism for home hubs and smartphones will help erase some of the social stigma for more personal interaction with tech so that, years from now, when the first fully functional consumer ready headsets come to market, early adopters will heartily embrace the transformative aspects of the technology.
Adoption will drive invention, and new use cases will emerge to exploit an always on, hands-free display that can interpolate data and images onto a view of real life. Social mores and privacy regulations will have to adjudicate the use of facial recognition and other potentially intrusive capabilities. Smartphone tech will be integrated right into AR wearables with AI as the primary user interface, and many well-off consumers will stop carrying phones in the now traditional form factor. The trend will spread to other device displays – computer monitors, tablets, even TVs – and enable entirely new applications and content forms.
This paradigm shift will be a threat and an opportunity for today’s consumer tech leaders. We believe the winning platforms will tightly integrate hardware and software – excellent optics, a powerful virtual assistant and well-designed bundled AR applications in a set of compact and stylish eyeglasses. AR platforms with momentum will attract 3rd party developers and hardware OEMs to their emerging standards, in much the same way that the smartphone market played out.
Who Might Win?
It is almost certainly folly to pick winners more than a decade before a game is to be played. Still, we believe incremental developments – industrial AR products, AI virtual assistants, general image processing progress – may suggest some leading contenders. Certainly, Alphabet, with its powerful Android franchise and overwhelming AI prowess, will be in the mix. Apple is rumored to be working on AR and has its obvious track record of peerless execution around hardware/software integration and device design – it will have to raise its game on AI, but there is plenty of time. Microsoft, an AI leader, gained substantial AR hardware intellectual property and expertise in the Nokia acquisition and has staked itself to a modest head start with its HoloLens prototype AR platform. Given that we expect the early market to center on enterprise applications, Microsoft would seem to be particularly well positioned. Facebook has Oculus, and some of that technology could be applicable to AR as well as VR. It is also a leader in AI with specific strength in image processing. Still, we are worried that Zuckerberg may have zigged when he should have zagged vis-à-vis full immersion. Amazon, secretive and aggressive, cannot be counted out – it has a foothold in AI virtual assistants with Alexa and has shown a fascination with consumer devices.
The movement toward AI hosting – Google Cloud Platform, Microsoft Azure, Amazon Web Services and IBM Watson are all building specialized deep learning datacenter infrastructure, development libraries and pre-trained APIs – could democratize the technology enough over the next decade to open the door for an AR hardware specialist. Magic Leap has drawn a lot of attention with its secrecy and flashy demos, and could turn out to be the real deal. However, given our view that the mass market for AR will emerge closer to 2030 than 2020, the company has a long row to hoe. Meanwhile, other science driven startups will almost certainly emerge – one might deliver the breakthrough needed to make high quality AR optronics compact and affordable for the consumer market.
Component suppliers will also play a role. Processor vendors Qualcomm and Nvidia are devoting R&D to AR, which will require a GPU-like platform both to drive graphics and to support the AI. The industrial systems that we believe will be the first areas of real traction are also a considerable opportunity for FPGA companies, of which Xilinx is the primary pure play public supplier. A wide range of companies – both big and small – are involved in developing optical waveguides and other photonic display technologies. It is difficult to tell which of these companies could have a technology advantage.
Exh 12: SSR TMT AI Heatmap