This is tcw#1

thinking about design strategies for 'magic lens' AR

Tuesday, September 1st, 2009

I love that we are on the cusp of a connected, augmented world, but I think the current crop of magic lenses are likely to overpromise and underdeliver. Here are some initial, rough thoughts on designing magic lens experiences for mobile augmented reality.

The magic lens

The magic lens metaphor [1] for mobile augmented reality overlays graphics on a live video display from the device’s camera, so that it appears you are looking through a transparent window to the world beyond. This idea was visualized to great effect in Mac Funamizu’s design studies on the future of Internet search from 2008. Many of the emerging mobile AR applications for Android and the iPhone 3GS, including Wikitude, Layar, Metro Paris, robotvision, Gamaray and Yelp’s Monocle, are magic lens apps which use the device’s integrated GPS and digital compass to provide location and orientation references (camera pose, more or less) for the overlay graphics.

The idea of a magic lens is visually intuitive and emotionally evocative, and there is understandable excitement surrounding the rollout of commercial AR applications. These apps are really cool looking, and they invoke familiar visual tropes from video games, sci-fi movies, and comics. We know what Terminator vision is, we’re experienced with flight sim HUDs, and we know how a speech balloon works. These are common, everyday forms of magical design fiction that we take for granted in popular culture.

And that’s going to be the biggest challenge for this kind of mobile augmented reality; we already know what a magic lens does, and our expectations are set impossibly high.

Less-than-magical capabilities

Compared to our expectations of magic lenses, today’s GPS+compass implementations of mobile AR have some significant limitations:

* Inaccuracy of position, direction, elevation – The inaccuracy of today’s GPS and compass devices in real world settings, combined with positional errors in geo-annotated data, mean that there will generally be poor correspondence between augmented graphical features and physical features. This will be most evident indoors, under trees, and in urban settings where location signals are imprecise or unavailable. Another consequence of location and orientation errors is that immediately nearby geo-annotations are likely to be badly misplaced. With typical errors of 3-30 meters, the augments for the shop you are standing right in front of are likely to appear behind you or across the street.

* Line of sight – Since we can’t see through walls and objects, and these AR systems don’t have a way to determine our line of sight, augmented features will often be overlaid on nearby obstructions instead of on the desired targets. For example, right now I’m looking at Yelp restaurant reviews floating in space over my bookshelf.

* Lat/long is not how we experience the world – By definition, GPS+compass AR presents you with geo-annotated data, information tied to geographic coordinates. People don’t see the world in coordinate systems, though, so AR systems need to correlate coordinate systems to world semantics. The quality of our AR experience will depend on how well that translation is done, and today it is not done well at all. Points Of Interest (POIs) only provide the barest minimum of semantic knowledge about any given point in space.

* Simplistic, non-standard data formats – POIs, the geo-annotated data that many of these apps display, are mostly very simple one-dimensional points of lat/long coordinates, plus a few bytes of metadata. Despite their simplicity there has been no real standardization of POI formats; so far, data providers and AR app developers are only giving lip service to open interoperability. Furthermore, they are not looking ahead to future capabilities that will require more sophisticated data representations. At the same time, there is a large community of GIS, mapping and Geoweb experts who have defined open formats such as GeoRSS, GeoJSON and KML that may be suitable for mobile AR use and standardization. I’ll have more to say about AR and the Geoweb in a future post. For now, I’ll just say that today’s mobile AR systems are starting to look like walled gardens and monocultures.

* Public gesture & social ambiguity – Holding your device in front of you at eye level and staring at it gives many of the same social cues as taking a photograph. It feels like a public gesture, and people in your line of sight are likely to be unsure of your intent. Contrast this with the head down, cradled position most people adopt when using their phone in a private way for email, games and browsing the web.

* Ergonomics - Holding your phone out in front of you at eye level is not a relaxed body position for extended viewing periods; nor is it a particularly good position for walking.

* Small screen visual clutter – If augmented features are densely populated in an area, they will be densely packed on the screen. A phone display with more than about 10 simultaneous augments will likely be difficult to parse. Some of Layar’s layer developers propose showing dozens of features at a time.

Design strategies for practical magic

Given these limitations, many of the initial wave of mobile AR applications are probably not going to see great adoption. The most successful apps will deliver experiences that take advantage of the natural technology affordances and don’t overreach the inherent limitations. Some design strategies to consider:

* Use augments with low requirements for precision and realism. A virtual scavenger hunt for imaginary monsters doesn’t need to be tied to the exact geometry of the city. A graphic overlay showing air pollution levels from a network of sensors can tolerate some imprecision. Audio augmentation can be very approximate and still deliver nicely immersive experiences. Searching for a nearby restroom may not need any augments at all.

* Design for context. The context of use matters tremendously. Augmenting a city experience is potentially very different from creating an experience in an open, flat landscape. Day is a different context than night. Alone is different than with a group. Directed search and wayshowing is different from open-ended flaneurism. Consider the design parameters and differences for a user who is sitting, standing, walking, running, cycling, driving and flying. It seems trivially obvious, but nonetheless important to ask who is the user, what is their situation, and what are they hoping will happen when they open up your app?

* Fail gracefully and transparently. When the accuracy of your GPS signal goes to hell, reduce the locative fidelity of your app, or ask the player to move where there is a clear view of the sky. When you are very close to a POI, drop the directional aspect of your app and just say that you are close.

* Use magic lens moments sparingly. Don’t make your player constantly chase the virtual monsters with the viewfinder, give her a head-down tricorder-style interaction mode too, and make it intuitive to switch modes. If you’re offering local search, consider returning the results in a text list or on a map. Reserve the visual candy for those interactions that truly add value and enhance the sense of magical experience.

* Take ownership for the quality of your AR experiences. Push your data providers to adopt open standards and richer formats. Beat the drum for improved accuracy of devices and geo-annotations. Do lots of user studies and experiments. Create design guidelines based on what works well, and what fails. Discourage shovelwARe. Find the application genres that work best, and focus on delivering great, industry-defining experiences.

We are at an early, formative stage of what will eventually become a connected, digitally enspirited world, and we are all learners when it comes to designing augmented experiences. Please share your thoughts in the comments below, or via @genebecker. YMMV as always.


[1] The idea of a metaphorical magic lens interface for computing was formulated at Xerox PARC in the early 1990′s; see Bier et al, “Toolglass and Magic Lenses: The See-Through Interface” from SIGGRAPH 1993. There is also a substantial body of previous work in mobile AR including many research explorations of the concept.

what is ubiquitous media?

Friday, June 26th, 2009

In the 2003 short paper “Creating and Experiencing Ubimedia“, members of my research group sketched a new conceptual model for interconnected media experiences in a ubiquitous computing environment. At the time, we observed that media was evolving from single content objects in a single format (e.g., a movie or a book), to collections of related content objects across several formats. This was exemplified by media properties like Pokemon and Star Wars, which manifested as coherent fictional universes of character and story across TV, movies, books, games, physical action figures, clothing and toys, and American Idol which harnessed large-scale participatory engagement across TV, phones/text, live concerts and the web. Along the same lines, social scientist Mimi Ito wrote about her study of Japanese media mix culture in “Technologies of the Childhood Imagination: Yugioh, Media Mixes, and Otaku” in 2004, and Henry Jenkins published his notable Convergence Culture in 2006. We know this phenomenon today as cross-media, transmedia, or any of dozens of related terms.

Coming from a ubicomp perspective, our view was that the implicit semantic linkages between media objects would also become explicit connections, through digital and physical hyperlinking. Any single media object would become a connected facet of a larger interlinked media structure that spanned the physical and digital worlds. Further, the creation and experience of these ubimedia structures would take place in the context of a ubiquitous computing technology platform combining fixed, mobile, embedded and cloud computing with a wide range of physical sensing and actuating technologies. So this is the sense in which I use the term ubiquitous media; it is hypermedia that is made for and experienced on a ubicomp platform in the blended physical/digital world.

Of course the definitions of ubicomp and transmedia are already quite fuzzy, and the boundaries are constantly expanding as more research and creative development occur. A few examples of ubiquitous media might help demonstrate the range of possibilities:

nikeplus430px

An interesting commercial application is the Nike+ running system, jointly developed between Nike and Apple. A small wireless pressure sensor installed in a running shoe sends footfall data to the runner’s iPod, which also plays music selected for the workout. The data from the run is later uploaded to an online service for analysis and display. The online service includes social components, game mechanics, and the ability to mashup running data with maps. Nike-sponsored professional athletes endorse Nike-branded music playlists on Apple’s iTunes store. A recent feature extends Nike+ connectivity to specially-designed exercise machines in selected gyms. Nike+ is a simple but elegant example of embodied ubicomp-based media that integrates sensing, networking, mobility, embedded computing, cloud services, and digital representations of people, places and things. Nike+ creates new kinds of experiences for runners, and gives Nike new ways to extend their value proposition, expand their brand footprint, and build customer loyalty. Nike+ has been around since 2006, but with the recent buzz about personal sensing and quantified selves it is receiving renewed attention including a solid article in the latest Wired.

mediascapes430px

A good pre-commercial example is HP Labs’ mscape system for creating and playing a media type called mediascapes. These are interactive experiences that overlay audio, visual and embodied media interactions onto a physical landscape. Elements of the experience are triggered by player actions and sensor readings, especially location-based sensing via GPS. In the current generation, mscape includes authoring tools for creating mediascapes on a standard PC, player software for running the pieces on mobile devices, and a community website for sharing user-created mediascapes. Hundreds of artists and authors are actively using mscape, creating a wide variety of experiences including treasure hunts, biofeedback games, walking tours of cities, historical sites and national parks, educational tools, and artistic pieces. Mscape enables individuals and teams to produce sophisticated, expressive media experiences, and its open innovation model gives HP access to a vibrant and engaged creative community beyond the walls of the laboratory.

These two examples demonstrate an essential point about ubiquitous media: in a ubicomp world, anything – a shoe, a city, your own body – can become a touchpoint for engaging people with media. The potential for new experiences is quite literally everywhere. At the same time, the production of ubiquitous media pushes us out of our comfort zones – asking us to embrace new technologies, new collaborators, new ways of engaging with our customers and our publics, new business ecologies, and new skill sets. It seems there’s a lot to do, so let’s get to it.

a remarkable confluence of technologies

Tuesday, June 23rd, 2009

ubicomp-tech-mandala

I have used versions of this picture in many of my talks over the last 10 years, and it just keeps getting more interesting. The overarching message is that we are in the midst of a remarkable wave of innovation, with major advances coming to fruition across many different technology domains, at more or less the same time. This is creating fertile conditions for all manner of new products, services and experiences to emerge through what economist Hal Varian calls “combinatorial innovation” (a nicely refined phrase for an incredibly messy process). Another way to put it is, this is the ubiquitous computing + digital media + internet + physical world supercollider, and we are starting to see the results of a very big, long-running experiment.

It has become somewhat of a cliche to promise radical transformation of businesses through technology, but just because it’s a cliche doesn’t mean it’s wrong. Over the next several years, much of the physical world will become connected into an Internet of people, places & things. This will fundamentally change the nature of our experiences with products, services, environments, organizations, and each other. There is no industry or institution that will be untouched; in retrospect we will see that traditional media companies were simply the low-hanging fruit. Just as the Nike+ running system transforms a shoe into a networked sensor node and an apparel company into the host of a worldwide social media community, combinatorial innovation will plant seeds of opportunity and disruption widely.

(Click image for ridiculously large version, cc-by-sa)

hello (connected) world

Monday, June 22nd, 2009

We live in a connected world, now. The web is real-time and social. The physical world of people, places and things is becoming digital, networked, sensate. Computing is a fabric of mobile, embedded and cloud systems woven with data and services. Media flows everywhere, innovation abounds at street-level, and we are all creators, authors, and makers. It’s an exciting time, but also one fraught with complexity and difficult choices for businesses, institutions and individuals.

This journal offers disruptive ideas, spirited perspectives and tools to think with. My goals are to help you make sense of the emerging landscape, engage you in an ongoing conversation about technology, strategy and society, and work with you to envision and build the kind of future we actually want to live in. Please explore the archives, immerse yourself in new ideas, and share your reactions in the comments.