video lectures released

In an effort to extend the teaching resources on this website, I am happy to release a series of video lectures for my class on Audio Content Analysis. The structure closely follows that of the textbook, and the modules aim to split each chapter into digestible chunks.

The videos are currently streamed from a Georgia Tech service and allow you to see the slides and the video side by side in one player.

All content, including the slides (PDF), the LaTeX source code for the slides, the Matlab code used to produce the plots, and the audio examples, is available on github. Links to the relevant content can be found on the individual video module pages.

Furthermore, I have launched a forum in which you can ask (and answer) questions about the class content. This is an experiment, and we will see how well the plugin works.

I look forward to your feedback – let me know what you think.


The Georgia Tech Center for Music Technology (GTCMT) had a strong presence at the International Society for Music Information Retrieval (ISMIR) conference, with students, post-docs, and alumni attending.

The GTCMT group at the Spotify headquarters in NY

Contributions from the group at the conference:


Bachelor in Music Technology @ Georgia Tech

This fall we will launch a Bachelor of Science in Music Technology at Georgia Tech. Given our existing MS and PhD programs, the BS now allows us to offer Music Technology classes at all levels. The undergraduate program will be, just like the graduate programs, unique in its curriculum and focus; with concentrations/minors in subjects such as ECE or ME, it is clearly not just another audio engineering degree. Core classes such as Fundamentals of Musicianship and Introduction to Audio Technology, as well as many project-based classes, will give our graduates a rare and sought-after combination of skills and knowledge. We expect our bachelor's graduates to work at the same companies our graduate students work at, including Dolby, Google, Apple, Gracenote, Spotify, Pandora, Ableton, Moog, and many more.

Public teaching materials

As the Fall 2015 semester proceeds, I will make as many teaching materials as possible available online. Most notably, this includes all the slides, along with the LaTeX source for the slides and the Matlab code for generating the figures. The existing list of exercises and quiz questions will also be expanded from time to time.

I hope that, in combination with the original Matlab source code now being available in a repository as well, these materials will provide both teachers and students with helpful resources for their classes in Music Information Retrieval and Audio Analysis.

matlab code now available at github

All Matlab code has now been moved from my private repository to a public github repository. It can be downloaded directly from the repository instead of via the manually assembled zip archive I offered before.

This allows for easier updates and provides a publicly available version history. Furthermore, it makes it possible to embed code directly into a page (see, e.g., the feature computation function). Please let me know if you have any questions or if I made any mistakes updating the website.

academic textbook sales

When writing a book on Audio Content Analysis and Music Information Retrieval, you know that the field is so small that you can’t expect high sales numbers. But nobody starts an academic book project for the money, anyway.

Still, I had no good idea of what sales numbers to expect from an academic book in a niche market. Searching for textbook sales figures yields only a limited number of vague results that do not necessarily apply to your field, such as the numbers in this list and the answers to this question. I found one resource giving specific sales numbers for academic books on Nietzsche.

Without a point of comparison, my impression is that the sales numbers for my book are reasonable but far from exciting. Since there is no way to tell without a reference point, I decided to publish the current sales numbers for my book, in the hope that others might find the figures for the first two years interesting and perhaps publish comparable statistics. Without further ado, here they are:


academic text book sales numbers: “An Introduction to Audio Content Analysis”

A few notes: each 6-month cycle starts in September and March, respectively. The book is available both as a hardcover (blue) and as an ebook (yellow); while the ebook is available from the IEEE, Wiley, and Google Books (among many other shops), it is available neither on amazon nor in Apple’s iTunes store.

There was a minor publishing hiccup in Aug 2012, so the book’s final release was in November 2012. Of the nearly 600 copies sold since then, about one quarter are ebook sales. In 2013, I started to track amazon sales directly with novelrank; these data are plotted in red. Note that the amazon data covers only hardcover sales but might, as far as I understand, include used books as well.

Update 09/2015: here is the updated sales count.

half-annual text book sales (An Introduction to Audio Content Analysis)

A short history of music listening

Music is omnipresent in our daily lives, and it is hard to imagine that this has not always been the case. We rarely stop to wonder how listeners experienced music in past times and how technological innovation shaped our expectations and listening habits. In the 19th century, listening to (professionally performed) music required the listener to visit a dedicated venue such as a church or a concert hall at a specific time. The event character of such performances meant that the listener had no influence on the program, the performing artists, the time of the concert, or its location. Furthermore, there was no alternative to sharing the listening experience with an audience, and no option of listening repeatedly to the same performance. While we still enjoy concerts today, the majority of our listening experience is unrelated to live performances.

The first notable change to our listening habits was initiated at the end of the 19th century with the introduction of technology to record and reproduce a music performance. The gramophone (and its competitors, the graphophone and the phonograph) enabled listeners for the first time to listen to a music performance at home, at any desired time, and possibly alone. What previously was a unique, non-repeatable performance of pre-selected repertoire in a concert venue lost its temporal and spatial uniqueness. In addition to these contextual changes, listening to recorded music differs from a concert experience: on top of the obvious technical deficiencies of the recording and reproduction system (limited bandwidth and dynamic range, added distortion and noise, missing ambient envelopment), there is no longer any direct communication or interaction between the performers and the audience. This has implications for both the recording and the listeners: in the recording studio there is no audience, no applause, and no stage fright, and the reproduction misses the performers’ gestural and facial expressions as well as the interaction with other listeners. A recording also invites repeated listening, allowing a level of analytical listening unheard of before.

During the following decades, technological innovation focused on improving the quality of the listening experience: condenser microphones improved the recording quality, and the introduction of vinyl LPs improved reproduction quality significantly. At the same time, stereophony significantly enhanced the listening experience by creating an illusion of localization and spatial envelopment.

The compact cassette, introduced by Philips in the 1960s, was the first widespread medium that allowed consumers to copy LPs and, later, CDs. Although the quality of a copy could never match that of the original medium, steadily advancing magnetization techniques and noise reduction systems improved the cassette’s audio quality step by step. But the compact cassette had another advantage that made it so appealing: music could easily be recorded from a radio broadcast without having the distributed medium available. More importantly, users were no longer bound to the selection and order of songs chosen by musicians, labels, and DJs; everybody could create their own individual mix tape with their own playlist. While from today’s perspective this seems hardly noteworthy, at the time the easy creation of a personalized playlist gave consumers unprecedented freedom: they could select and combine individual songs, produce their own tapes for different occasions or friends, and even create simple mashups of songs. Listeners were no longer forced to accept the decisions of a DJ or to stick to a specific song order on an LP.

However, the impact of the compact cassette on our listening habits did not stop there. Since the medium was so compact and not prone to playback errors when in motion, it eventually enabled mobile listening devices with reasonable audio quality. With the Walkman, introduced in the 1980s, listeners could play their tapes anywhere. Music could finally be listened to throughout the whole day, regardless of the activity. Furthermore, the increased use of headphones turned the act of listening into a personal and intimate experience not necessarily shared with others.

At about the same time, the digital age of consumer music started with the introduction of the Compact Disc (CD). The CD soon replaced other media in the living room as a robust, easy-to-handle medium with high quality at a reasonable price point. The release of the CD also marks the stagnation of the trend for ever-increasing audio quality in the consumer market; attempts by the industry to introduce more advanced high-resolution media such as the SACD (Super Audio CD) and the DVD-Audio to consumers failed.

At the dawn of the new millennium, two more or less simultaneous technological developments disrupted the market in a way that doomed many established business models of the music industry: perceptual audio coding (with its most prominent representative, MP3) and growing internet communities and peer-to-peer networks (for example, Napster). MP3, or more precisely ISO MPEG-1 Layer 3, remains one of the most popular approaches to audio coding. It allows the transmission and storage of audio files at a fraction of the uncompressed bit rate (about a tenth of the bit rate of a CD) while maintaining the same or only a slightly reduced perceived audio quality. The MP3 format has become so popular that for many consumers there is no difference anymore between online music and “MP3”, regardless of the compression format the audio data has actually been encoded with.
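The “about a tenth” figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes CD-quality PCM (44.1 kHz, 16 bit, stereo) and an MP3 rate of 128 kbit/s; the 128 kbit/s value is an illustrative, commonly used rate, not a number taken from the text.

```python
# Compare the uncompressed CD-audio bit rate with a typical MP3 bit rate.

CD_SAMPLE_RATE = 44100   # samples per second
CD_BIT_DEPTH = 16        # bits per sample
CD_CHANNELS = 2          # stereo

# uncompressed PCM bit rate in bits per second
cd_bitrate = CD_SAMPLE_RATE * CD_BIT_DEPTH * CD_CHANNELS

# an illustrative, commonly used MP3 bit rate (assumption)
mp3_bitrate = 128_000

print(f"CD:    {cd_bitrate / 1000:.1f} kbit/s")        # 1411.2 kbit/s
print(f"MP3:   {mp3_bitrate / 1000:.1f} kbit/s")       # 128.0 kbit/s
print(f"ratio: {cd_bitrate / mp3_bitrate:.1f}")        # 11.0, i.e. about a tenth
```

The ratio of roughly 11:1 matches the “about a tenth” claim in the text.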

Perceptual audio coding allowed users to upload and download audio files at reasonable speeds, even over the slow dial-in internet connections of that time. Peer-to-peer networks such as Napster soon allowed users to exchange vast amounts of music data. Suddenly, the exchange of music was no longer limited to a small circle of personal friends but expanded to an international online community. Instant access to the music libraries of thousands of users led to new ways of browsing and discovering music. The perception of music as something you buy on a physical medium such as a CD started to disappear; consumers increasingly saw music as data content available online instantly and free of charge. Nowadays, music streaming services such as Pandora and Spotify are shifting paradigms again, as users no longer understand music as something to be owned (for example, in a vinyl collection or in a database of music files) but as something to be accessed and streamed whenever desired.
In addition to changed listening habits and a changed access to and concept of music, listeners’ expectations have changed significantly as well. This is particularly evident in recordings of traditional, ‘classical’ music. While historical recordings tend to contain at least minor playing errors and inaccuracies, the level of perfection has increased over time. Modern recordings have reached a level of technical perfection that is hard or even impossible to achieve in a live setting. The possibility of editing recordings by splicing together takes from different sessions leads to hundreds of edit points on a record, and the number of edit points has been increasing steadily for decades. The modern listener is so used to hearing perfect intonation and perfect timing that expectations not only for recordings but also for live performances have risen accordingly.

All aspects of how we listen to music have changed over the last one or two centuries. We listen to music all the time (instead of only occasionally in a concert), we listen to it in the privacy provided by headphones (instead of in an audience), we listen to it everywhere (instead of at specific event locations), we expect technically perfect renditions (instead of allowing the occasional glitch), we have access to all music all the time and can adjust the playlist to our liking (instead of listening to something predetermined by a music director), and we tend to understand music recordings as something that is available free of charge (instead of something to be bought and owned). All these changes have been triggered or at least amplified by the introduction of new technology. It is a fair assumption that our listening habits will change further with the introduction of more sophisticated technology; for instance, just as the borders between professional producers and hobbyists began to blur with modern production technology becoming more affordable, the distinction between the producer and the listener who only consumes music might blur with technological options that allow the listener to interact with, mix, and modify the content on the fly. It will be fascinating to observe how technology will influence the way we listen to music in the future.


funded MS & PhD positions in music technology at Georgia Tech

Georgia Tech is now accepting applications for the MS and PhD programs in music technology for matriculation in August 2015. All PhD students, and a limited number of MS students, receive graduate research assistantships that cover tuition and pay a competitive monthly stipend. The deadline for applications is January 31, 2015.

The MS in Music Technology is a two-year program that instills in students the theoretical foundation, technical skills, and creative aptitude to design the disruptive technologies that will enable new modes of music creation and consumption in a changing industry. Students take courses in areas such as music information retrieval, music perception and cognition, signal processing, interactive music, the history of electronic music, and technology ensemble. They also work closely with faculty on collaborative research projects and on their own MS project or thesis. Recent students in the program have worked and/or interned at companies such as Pandora, Spotify, Apple, Avid, Dolby, Harman, Bose, Gracenote, Rdio, Sennheiser, Ableton, and Smule, and have gone on to PhD studies at institutions such as Georgia Tech, MIT, the University of Michigan, and UPF. Applicants are expected to have an undergraduate degree in music, computing, engineering, or a related discipline, and they should possess both strong musical and technical skills.

Students in the PhD program in Music Technology pursue individualized research agendas in close collaboration with faculty in areas such as interactive music, robotic musicianship, music information retrieval, digital signal processing, mobile music, network music, and music education, focusing on conducting and disseminating novel research with a broad impact. PhD students are also trained in research methods, teaching pedagogy, and an interdisciplinary minor field as they prepare for careers in academia, at industry research labs, or in their own startup companies. PhD applicants are expected to hold a Master’s degree in music technology or an allied field, such as computing, music, engineering, or media arts and sciences. All applicants must demonstrate mastery of core master’s-level material covered in Music Technology, including music theory, performance, composition, and/or analysis; music information retrieval; digital signal processing and synthesis; interactive music systems design; and music cognition.

Both the MS and PhD programs are housed within the School of Music at Georgia Tech, in close collaboration with the Georgia Tech Center for Music Technology (GTCMT). The GTCMT is an international center for creative and technological research in music, focusing on the development and deployment of innovative musical technologies that transform the ways in which we create and experience music. Its mission is to provide a collaborative framework for committed students, faculty, and researchers from all across campus to apply their musical, technological, and scientific creativity to the development of innovative artistic and technological artifacts.

Core faculty in the music technology program include Gil Weinberg (robotic musicianship, mobile music, and sonification), Jason Freeman (participatory and collaborative systems, education, and composition), Alexander Lerch (music information retrieval and digital signal processing), Timothy Hsu (acoustics), Frank Clark (multimedia and network music), and Chris Moore (recording and production).


2015 Guthman Musical Instrument Competition

Semi-finalists have been announced for this year’s Guthman Musical Instrument Competition.

The Margaret Guthman Musical Instrument Competition, held at the Georgia Institute of Technology, seeks to find the world’s best new ideas in musical instrument design, engineering and musicianship. Entries represent a dozen countries and expand our assumed notion of what constitutes an instrument and the sounds it can produce.

Judging the 20 semi-finalists are: DJ Hurricane, a producer and rapper who is best known for his work with the Beastie Boys; Graham Marsh, a Grammy award-winning producer, mixer and engineer who has worked with Ludacris, Bruno Mars and CeeLo Green; and Joe Paradiso, a physicist who designs electronic music synthesizers and directs the MIT Media Lab’s Responsive Environments Group.

The Guthman Competition will be held February 19 and 20 at Georgia Tech, Atlanta. The finals will be held February 20 from 7:00 – 9:00 pm on campus and are free and open to the public. The finals will also be live streamed.