Applying HCI principles to create the next generation of music production tools

In this blog post I am going to argue that music mixing tools need to be redesigned. They are no longer of interest only to highly trained sound engineers; they are now used by a new breed of DIY music producer, as well as by music fans. The problem is that these tools are currently so complex that they restrict participation in music production [1]. I will explain why this issue matters, and how knowledge from HCI might be applied to design a new generation of far more accessible music mixing tools, drawing specifically on HCI work around Interface Design Approaches, Tangible Interfaces, and Ambiguity as a Resource for Design.

Some background

The digital revolution has brought about a radical change in the way people create, interact with, and consume music [15]. This has led to exciting new applications. For example, people are now participating in collaborative, remote music making projects; interactive musical games like SingStar, Rock Band, and Guitar Hero have blossomed, and derivatives have even found applications in health areas such as stroke rehabilitation [17]; new instruments, like the world radio keyboard, are being developed, as are novel ways of interfacing with music that harness brain activity; and there have even been experiments into democratising the music listening experience for nightclub crowds.

Whilst the above examples offer a glimpse into the hugely diverse and rapidly expanding world of music technology, I am particularly interested in the ways music production is changing, and would like to help support new groups of people to participate in this area. A key component of the music production process is the ‘mix’. This is where all the individual sonic elements (such as drums, bass, guitar, vocals etc.) are blended together to create a single audio file. This process used to be conducted by highly skilled sound engineers. In recent years, however, we have seen a sharp decline in the number of professional recording studios where musicians and sound engineers would work collaboratively, and instead we have witnessed the rise of the DIY music producer: musicians writing, recording, mixing and distributing their music from the comfort of their own homes [15]. This change has come about because studio equipment (mixing desks and sound processing devices) has been replicated in digital forms that run on home computers and tablet devices, and is therefore within reach of more musicians, both amateur and professional, than ever before. This widening of access to music production tools has contributed to a blurring of the lines between the producers and consumers of recorded music. For example, established acts such as Nine Inch Nails have released versions of songs that can be remixed by fans, and the emergence of new audio file formats like the Interactive Music Application Format [12] means this trend for end-user interaction with the mixing process is only likely to grow.

The problem with music production tools

The tools used to mix music have not evolved to better fit these new user groups, and a number of authors have given good reasons why they should be improved. As Adams et al. state: “These tools are highly varied, often quite complicated, and difficult to learn. While trained professional sound engineers can adapt to new tools and devices with reasonable ease, amateurs face a high learning curve that may prevent them from pursuing their creative goals” [1].

Figure 1. Where do I start? A typical prohibitively complex music production interface

The design of most music production software follows the same paradigm that was established when the equipment existed in hardware form: the interface controls map directly onto the underlying mechanics of the hardware devices, and the complexity of the interfaces necessitates the involvement of a highly skilled sound engineer. But now that these tools are digital, there is no reason why their interfaces cannot be redesigned to offer greater accessibility to emerging user groups, for whom a low bar to entry is a far higher priority. I would argue that redesigning these tools represents fertile ground for HCI researchers.

Music’s relationship with HCI

Figure 2. Music and Human-Computer Interaction textbook

The study of music interaction is an important subfield within HCI, with its own yearly conference called NIME (New Interfaces for Musical Expression), and there is even a book on the relationship called ‘Music and Human-Computer Interaction’ [9]. Perhaps Bill Buxton best sums up exactly why the study of music interaction can be of such use to the HCI community as a whole: “Musicians had specialized skills, were highly creative, what they did could be generalized to other professions, and perhaps most of all – unlike doctors, lawyers and other ‘serious’ professions – they would be willing to do serious work on a flaky system at all hours of the day and night.” [3]

How could HCI knowledge shape the next generation of music mixing tools?

I think there are three areas of HCI that could have a significant impact on the design of new music mixing tools: Interface Design Approaches, Tangible Interfaces, and Ambiguity as a Resource for Design. I will now explain the contribution that knowledge from each of these areas could make.

Interface Design Approaches

We can draw interface design ideas from Eberts [5], who in 1994 described four HCI design approaches that offer potential for creating user-friendly, intuitive, and efficient interfaces. These approaches are categorised as ethnographic/anthropomorphic, cognitive, empirical, and predictive modelling.

Ethnographic/Anthropomorphic designs. Anthropomorphism is about attributing human characteristics to nonhuman things. Under this approach, human-computer interactions are designed around how humans would communicate with each other, with that human-human interaction serving as the model. A solid foundation for this design approach should therefore be a study of how humans want to communicate with each other around the task in question. Whilst researchers have studied how advanced music producers approach mixing tasks [8], more work is required to understand the thought processes and workflows of novice music producers. Perhaps a study could be made of how a musician would issue instructions to guide an expert engineer to produce a mix. Understanding this communication flow might help us design tools that more directly ‘speak’ to the musician’s intuitions, and provide a more direct method for them to achieve their goals.

Building on the idea of anthropomorphic design, we could potentially develop intelligent interfaces that ‘listen’ to the music on behalf of users (through signal analysis techniques) and then make ‘suggestions’. This could be useful because novice users’ perceptual listening skills are unlikely to be fully developed, so assistance could be beneficial. This approach could also draw inspiration from Weiser’s vision of calm computing [16], where the interfaces quietly assist the user. In the world of music production it could be argued that we have already seen one example of an intelligent interface: the now infamous Auto-Tune can ‘listen’ for notes sung out of tune and correct them automatically for the user. This was meant to make singing sound ‘better’, but there are significant side effects in the way the processing is applied, and as a result opinion is divided about whether this tool does more harm than good.
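
To make the ‘listening’ idea concrete, here is a minimal sketch in Python (using only numpy) of a naive suggestion engine that compares the loudness of each stem against the median of all the stems and flags outliers. The stem names and the ±6 dB tolerance are invented for illustration; a real system would need far more sophisticated perceptual models.

```python
import numpy as np

def rms_db(samples):
    """Root-mean-square level of a signal, in decibels relative to full scale."""
    rms = np.sqrt(np.mean(np.square(samples)))
    return 20 * np.log10(max(rms, 1e-9))

def suggest_balance(stems, tolerance_db=6.0):
    """Flag stems whose level strays far from the median of all stems.

    stems: dict mapping stem name -> 1-D numpy array of audio samples.
    Returns a list of human-readable suggestions.
    """
    levels = {name: rms_db(s) for name, s in stems.items()}
    median_level = np.median(list(levels.values()))
    suggestions = []
    for name, level in levels.items():
        if level > median_level + tolerance_db:
            suggestions.append(f"'{name}' is {level - median_level:.1f} dB above the others; try lowering it")
        elif level < median_level - tolerance_db:
            suggestions.append(f"'{name}' is {median_level - level:.1f} dB below the others; try raising it")
    return suggestions

# Toy example: a 'vocals' stem that is much quieter than the other stems.
rng = np.random.default_rng(0)
stems = {
    "drums": 0.5 * rng.standard_normal(44100),
    "bass": 0.4 * rng.standard_normal(44100),
    "vocals": 0.02 * rng.standard_normal(44100),
}
print(suggest_balance(stems))
```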

Anthropomorphic design ties in with the idea of perceived affordances [14], which asks the question: “does the user know what they can do with an object?” I believe that in many music mixing devices the affordances are not at all well understood by novice users.

Figure 3. A typical equaliser interface.

For example, equalisers can be used for several separate tasks (to fix bad sounds, to polish good sounds, and to create space between sounds), yet all of these affordances are accessed through one single set of controls (i.e. each control can do more than one job). The result is that most novices typically only use equalisers for the most basic of these tasks, and their mixes suffer as a result. By making these affordances clear, or by guiding users to consider each affordance in turn, the quality of users’ mixes might improve.
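
As a toy illustration of ‘guiding users to consider each affordance in turn’, here is a Python sketch of an equaliser interface that surfaces the three tasks explicitly rather than hiding them behind one set of knobs. The task names come from the list above; the prompts and gain ranges are invented placeholders, not engineering recommendations.

```python
# Each EQ task gets its own explicit step in the workflow.
EQ_TASKS = {
    "fix":    {"prompt": "Cut harsh or boomy frequencies",    "gain_range_db": (-12, 0)},
    "polish": {"prompt": "Gently boost pleasing frequencies", "gain_range_db": (0, 3)},
    "space":  {"prompt": "Carve room for other instruments",  "gain_range_db": (-6, 0)},
}

def guided_eq_session(track_name):
    """Walk a novice through each EQ affordance in turn."""
    for task, cfg in EQ_TASKS.items():
        lo, hi = cfg["gain_range_db"]
        print(f"[{task}] {cfg['prompt']} on '{track_name}' "
              f"(suggested gain range: {lo} to {hi} dB)")

guided_eq_session("vocals")
```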

Cognitive approaches are all about understanding the limitations of users’ brain power and sensory perception abilities, and it is this latter aspect that I think is key when designing music mixing tools for novice users. I believe the most important difference between novice and expert music producers is how highly developed their listening skills are: experts have developed the ability to listen ‘into’ a mix and discern subtleties about individual sound objects despite the presence of many other sonic elements. I would like to test this theory properly (I could not find any literature on the differences between novice and expert music producers), but if I am correct then it would support the case made above for intelligent ‘listening’ interfaces being advantageous to novice user groups.

Empirical approaches could be used to change one thing on the interface at a time and test the effect of that change experimentally. In mixing tool interface design, for example, we could give users slightly different interfaces during a mixing task, then run listening tests to see which resulting mix people prefer; this might be a good way to develop tools that produce good results. Currently a qualitative approach to this kind of development seems to be more popular, with users being asked which interface they prefer and why (see for example [7]), but there could certainly be a place for an empirical approach.
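
A minimal version of such an experiment could be analysed with a simple binomial test. The sketch below (Python, using scipy; the listener counts are invented) checks whether a blind preference for mixes made with one of two interface variants is likely to be more than chance.

```python
from scipy.stats import binomtest

# Suppose 30 listeners blind-compared a mix made with interface A against a
# mix of the same song made with interface B, and 22 preferred the B mix.
n_listeners = 30
prefer_b = 22

result = binomtest(prefer_b, n_listeners, p=0.5, alternative="two-sided")
print(f"{prefer_b}/{n_listeners} preferred B; p = {result.pvalue:.4f}")
# A small p-value suggests the preference is unlikely to be chance, i.e. the
# single interface change had a measurable effect on the resulting mixes.
```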

Predictive modelling approaches are all about adding up the time it takes a user to complete a given task. At first I didn’t think this would be relevant to creative tasks like music production, but I have come to consider that designing fast and efficient interfaces might help keep users in the ‘flow’.
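
The classic predictive modelling tool is the Keystroke-Level Model of Card, Moran and Newell, which estimates task time by summing standard operator times. Here is a small Python sketch comparing two hypothetical ways of adjusting a track’s volume; the operator times are the commonly quoted approximations, while the task breakdowns are my own invented examples.

```python
# Keystroke-Level Model operator times, in seconds (standard approximations).
KLM = {"K": 0.2,   # keystroke / key press
       "P": 1.1,   # point with the mouse
       "B": 0.1,   # mouse button press or release
       "M": 1.35}  # mental preparation

def task_time(operators):
    return sum(KLM[op] for op in operators)

# Menu-driven: think, point at a menu, click, think, point at a dialog
# field, click, then type a 3-digit value.
menu_based = "M P B B M P B B K K K".split()
# Direct manipulation: think, point at the fader, press, drag (modelled
# here as a second pointing action), release.
direct_drag = "M P B P B".split()

print(f"menu-based:  {task_time(menu_based):.2f} s")
print(f"direct drag: {task_time(direct_drag):.2f} s")
```

Even on this crude model the direct-manipulation fader comes out well ahead, which is one argument for keeping frequent mixing actions a single drag away.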

Tangibles

Tangible user interfaces (TUIs) are all about giving physical form to digital information [11]. TUI expert Hiroshi Ishii believes they have advantages over GUIs in that they play to people’s strengths in sensing and manipulating physical objects [10]. In the music production world, tangibles are used for mixing in professional recording studios, where they exist as computer controllers that replicate the form and function of analogue mixing desks; their motorised sliders (faders) move up and down to let users ‘feel’ and adjust volume changes programmed into the computer software. Whilst these interfaces are popular with expert users, they are rarely found in the studios inhabited by novices.

Figure 4. A tangible user interface for music mixing

Their disadvantages are that they are expensive, lack portability, and require a lot of studio space. On the positive side, they offer a workflow that many older sound engineers (who are used to working on analogue mixing desks) are familiar with; their large surface area lays out all the information for the user to see, instead of suffering from the GUI problem of limited screen real estate forcing information to be hidden away; and, perhaps most significant of all, they allow for bimanual input, where users can interact with information using both hands. For example, if a user placed each of their fingers on a slider then they could control the volume of 10 sound objects at once (see Figure 5).

Figure 5. Bimanual input

For some users this represents a big advantage compared with a typical GUI setup, where a user can only click and drag one interface control at a time with the mouse, and as [1] points out, there is a good deal of HCI evidence supporting the advantages of bimanual input.

The distinction between GUIs and TUIs is not fully agreed upon even by HCI experts (see 1:10:00 in the video of HCI expert Brygg Ullmer’s thesis defence for an example of this), and in the music production world we are just starting to see the emergence of a trend that fuses together the best aspects of GUIs and TUIs.

Figure 6. A multi-touch tablet interface – GUI or TUI?

Mixing desk apps have become popular on multi-touch tablet devices like the iPad. The idea behind these is that they connect wirelessly to a main computer running music production software; the tablet is then laid flat on the table and used just like a mixing desk, allowing bimanual operation and increasing screen real estate in a portable and relatively cheap package (compared to the big tangibles, at least). I think this trend is likely to continue to grow as people take the best of both worlds, and I will certainly try to bear this alternative interaction method in mind rather than thinking only in terms of GUIs controlled by a mouse and keyboard.
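
To illustrate the interaction model, here is a framework-agnostic Python sketch of routing simultaneous touch points to faders so that several volume controls can move at once. The event format and screen layout are invented for the example; a real app would receive touch events from the platform’s multi-touch API.

```python
NUM_FADERS = 10
fader_values = [0.8] * NUM_FADERS   # volume per sound object, 0.0-1.0
active_touches = {}                  # touch id -> fader index

def on_touch(event_type, touch_id, x, y):
    """Handle one touch event; x and y are normalised coordinates in [0, 1]."""
    if event_type == "down":
        # Faders are laid out side by side, so x selects which one is grabbed.
        active_touches[touch_id] = min(int(x * NUM_FADERS), NUM_FADERS - 1)
    elif event_type == "move" and touch_id in active_touches:
        # y controls the level: the top of the screen means full volume.
        fader_values[active_touches[touch_id]] = 1.0 - y
    elif event_type == "up":
        active_touches.pop(touch_id, None)

# Two fingers down at once, each moving a different fader simultaneously.
on_touch("down", 1, 0.05, 0.5)
on_touch("down", 2, 0.45, 0.5)
on_touch("move", 1, 0.05, 0.2)   # finger 1 pushes fader 0 up
on_touch("move", 2, 0.45, 0.9)   # finger 2 pulls fader 4 down
print(fader_values)
```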

As a note of caution, controlling more than one sound object at a time is likely to put a higher load on users’ cognitive capacities and therefore some (primarily novice users) may end up feeling more comfortable just controlling one sound object at a time even if their tools offer the option to control more.

Ambiguity

In class we read the paper ‘Ambiguity as a resource for design’ [6], which made the case that we should view ambiguity as a positive thing for HCI systems, offering opportunities to a designer. The opportunities the authors refer to lie in the ability of HCI systems to borrow some of the powers of the traditional arts in order to offer “rich aesthetic and conceptual potentials” and to “encourage close personal engagement with systems”. Ambiguity has also been shown to be appreciated by the users of music production tools [8], but for different reasons. In that case the benefit is not in the ambiguity triggering some kind of conceptual realisation; instead it serves to trigger creativity. The electronic musicians involved in the study said that many of their ideas come from playing with interface controls where the results of their interactions are not obvious, which leads in time to unexpected and interesting musical output, the best of which might form part of a new song. For these musicians the creative potential of the ambiguous music production tools they were using was so great that many viewed their musical output as a collaboration between themselves and the musical interfaces they used.

This study would suggest that it might be very beneficial to consider building in ambiguity when creating music production tools… but there is a big caveat. The musicians’ desire for ambiguity changes depending on which phase of the music production process they are in, with more desired at the start, when they are generating ideas, and less at the end, when they are refining their production and doing the mixing. This offers up quite a tricky challenge: could we design music production tools with flexible levels of ambiguity, allowing the ambiguity to be reduced as the production progresses?
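
As a thought experiment, here is a minimal Python sketch of a control with a tunable level of ambiguity. The specific mapping, a random offset scaled by an ‘ambiguity’ parameter, is just one naive way the idea could be realised.

```python
import random

def ambiguous_control(value, ambiguity, seed=None):
    """Map a control position (0.0-1.0) to a parameter value.

    ambiguity=1.0 -> the output wanders far from the control position,
                     inviting surprise during idea generation;
    ambiguity=0.0 -> the control behaves exactly and predictably,
                     suiting the final mixing stage.
    """
    rng = random.Random(seed)
    offset = rng.uniform(-0.5, 0.5) * ambiguity
    return min(max(value + offset, 0.0), 1.0)

# The same gesture becomes more predictable as the production progresses.
for stage, amb in [("idea generation", 1.0), ("arranging", 0.5), ("mixing", 0.0)]:
    print(f"{stage}: control at 0.6 -> {ambiguous_control(0.6, amb, seed=42):.2f}")
```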

The above study does appear to conflict a little with some of the other studies presented earlier in this blog post, as those earlier findings suggested that users do not like ambiguous and complex musical interfaces. It should be noted, however, that ambiguity was favoured by expert musicians, and mainly during the initial phase of production; novice users, on the other hand, did not like ambiguity, and the studies that produced those findings focussed on mixing, which is one of the later stages of the production process. The conflict is therefore minimal, but two challenges are presented:

  1. Which interface controls should we make ambiguous, which should we make exact, and at which stages of the production process?
  2. How could we design a set of tools that will be able to accommodate both novice and expert music producers?

Equally useful to the idea of creating ambiguity for advanced users could be the idea of creating greater constraints for novices. As mixing tools are typically designed for expert users, they possess a great deal of deep functionality and allow many manipulation options, not all of which are useful to a novice. In [1] the authors define the concept of ‘exploratory satisficing’ (an elaboration on Herbert Simon’s original satisficing concept): the user does not have enough attention to try out all the possible interaction options. If some of these extra controls are hidden, or it is at least made clear to the user which controls are most likely to be important, then we will be making the most efficient use of their limited resources. For example, we could make commonly used interface controls much bigger, and rarely used controls very small. This would also be consistent with Fitts’s law, as an increase in the size of regularly used controls should speed up the efficiency with which a user can interact with an interface.
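
Fitts’s law makes the size argument quantifiable. Below is a small Python sketch using the Shannon formulation; the device constants a and b are plausible placeholders for a mouse rather than measured values.

```python
import math

def fitts_time(distance, width, a=0.1, b=0.15):
    """Fitts's law (Shannon formulation): predicted time in seconds to hit
    a target of the given width at the given distance. The constants a and
    b are device- and user-dependent and would normally be fitted to data.
    """
    return a + b * math.log2(distance / width + 1)

# Making a frequently used control eight times wider markedly cuts the
# predicted time needed to acquire it.
print(f"small control  (8 px wide, 400 px away): {fitts_time(400, 8):.2f} s")
print(f"large control (64 px wide, 400 px away): {fitts_time(400, 64):.2f} s")
```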

Why this work matters and where to go next

Music making can offer a wide range of benefits to those who participate, encompassing health [2], developmental [4] and social [13] aspects. Whilst more work could be done to understand the specific benefits of participating in computer-based music production, I think the overarching benefits of music making are compelling enough to make the case that we should be encouraging people to participate in music making of any form, including computer-based production. As I have explained in this blog post, people can be put off computer-based music production by the complexity of the tools involved. I have tried to make the case that by applying knowledge from HCI we could redesign music production tools (and specifically music mixing tools) to offer much better usability to novices, and thus lower the bar to entry.

In the future I would like to look at how best to design these tools to suit both novices (who require walk-up usability) and experts (who require deep functionality). Solving this challenge could pave the way towards increased collaboration between novices and experts, which could in turn provide a pathway to help novices improve.

References

  1. Adams, A.T., Gonzalez, B., and Latulipe, C. SonicExplorer: Fluid exploration of audio parameters. Conference on Human Factors in Computing Systems – Proceedings, (2014), 237–246.
  2. Bittman, B., Bruhn, K.T., Stevens, C., Westengard, J., and Umbach, P.O. Recreational music-making: a cost-effective group interdisciplinary strategy for reducing burnout and improving mood states in long-term care workers. Advances in mind-body medicine 19, 3 (2003), 4–15.
  3. Buxton, W. My Vision Isn’t My Vision: Making a Career Out of Getting Back to Where I Started. Interactions, (2008), 7–12.
  4. Črnčec, R., Wilson, S.J., and Prior, M. The Cognitive and Academic Benefits of Music to Children: Facts and fiction. Educational Psychology 26, 4 (2006), 579–594.
  5. Cuevas, H.M. An Illustrative Example of Four HCI Design Approaches. Human Factors, (2004), 892–896.
  6. Gaver, W., Beaver, J., and Benford, S. Ambiguity as a resource for design. Proceedings of CHI 2003, (2003), 233–240.
  7. Gelineck, S. and Korsgaard, D. Stage Metaphor Mixing on a Multi-touch Tablet Device. Audio Engineering Society Convention 137, (2014), 1–10.
  8. Gelineck, S. and Serafin, S. From idea to realization-understanding the compositional processes of electronic musicians. Proc. Audio Mostly, (2009), 1–5.
  9. Holland, S. Music and human-computer interaction. Springer, London, 2013.
  10. Ishii, H. Tangible bits. Proceedings of the 2nd international conference on Tangible and embedded interaction – TEI ’08, (2008).
  11. Jacko, J.A. and Sears, A. The human-computer interaction handbook: fundamentals, evolving technologies, and emerging applications. Lawrence Erlbaum Associates, Mahwah, NJ, (2003), 466.
  12. Jang, I., Kudumakis, P., Sandler, M., and Kang, K. The MPEG Interactive Music Application Format Standard [Standards in a Nutshell]. Signal Processing Magazine, IEEE 28, 1 (2011), 150–154.
  13. Kokotsaki, D. and Hallam, S. The perceived benefits of participative music making for non-music university students: a comparison with music students. Music Education Research 13, 2 (2011), 149–172.
  14. Norman, D.A. Affordance, conventions, and design. Interactions 6, 3 (1999), 38–43.
  15. Prior, N. The rise of the new amateurs: Popular music, digital technology, and the fate of cultural production. Handbook of Cultural Sociology, (2010), 398–407.
  16. Rogers, Y. Moving on from Weiser’s Vision of Calm Computing: Engaging UbiComp Experiences. Lecture Notes in Computer Science UbiComp 2006: Ubiquitous Computing, (2006), 404–421.
  17. Wijck, F.V., Knox, D., Dodds, C., Cassidy, G., Alexander, G., and Macdonald, R. Making music after stroke: using musical activities to enhance arm function. Annals of the New York Academy of Sciences 1252, 1 (2012), 305–311.
