The GTZAN data set is probably one of the most prominent data sets used in research related to Music Information Retrieval and Audio Content Analysis. The data set contains 1000 excerpts from songs, sorted into 10 genres.
While this set is old and has well-documented shortcomings [1], it is well known, widely used, and, last but not least, easily available. This is why I chose it for annotating the musical keys of its excerpts. The complete annotation can be found here.
I used my tuning fork to identify the key of each song. I did not label excerpts that
- included modulations, or
- were particularly difficult to identify.
| Genre | Major Keys | Minor Keys | # Annotated |
|---|---|---|---|
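The per-genre counts in the table follow mechanically from the annotation: major and minor labels are tallied separately, and excerpts left unlabeled (modulating or ambiguous) are skipped. A minimal sketch, assuming a hypothetical in-memory format of `(genre, key)` pairs in which an unlabeled excerpt carries `None`; the published annotation file may be structured differently, and `summarize` is an illustrative name:

```python
from collections import Counter, defaultdict


def summarize(annotations):
    """Return {genre: (n_major, n_minor, n_annotated)} from (genre, key) pairs.

    Assumed (hypothetical) input format: key is a label such as "A major" or
    "F# minor", or None if the excerpt was not labeled.
    """
    counts = defaultdict(Counter)
    for genre, key in annotations:
        genre_counts = counts[genre]  # registers the genre even if nothing is labeled
        if key is None:
            continue  # skipped excerpt: modulation or hard-to-identify key
        mode = key.split()[-1].lower()  # "major" or "minor"
        genre_counts[mode] += 1
    return {
        genre: (c["major"], c["minor"], c["major"] + c["minor"])
        for genre, c in sorted(counts.items())
    }


print(summarize([("blues", "E major"), ("blues", "A minor"),
                 ("blues", None), ("jazz", None)]))
```

A genre whose excerpts are all unlabeled still appears, with zero counts, so every row of the table is present.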
Besides the obvious application of directly evaluating key detection systems, we also used this data set to indirectly evaluate the detection of tonal components in a spectrum [2].
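For direct evaluation, a common choice is the weighted score from the MIREX audio key detection task. It gives partial credit to closely related keys: 1.0 for the correct key, 0.5 for a key a perfect fifth above the reference in the same mode, 0.3 for the relative major/minor, and 0.2 for the parallel major/minor. The sketch below illustrates that scheme only. It is not necessarily the evaluation used with this annotation, and the label format (`"F# minor"`, `"Bb major"`) and the function names are assumptions. The mir_eval library provides a maintained implementation as `mir_eval.key.weighted_score`.

```python
# Pitch classes of the natural note names (C = 0).
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def parse_key(label):
    """Parse an assumed label format such as 'F# minor' into (pitch_class, mode)."""
    tonic, mode = label.split()
    pitch_class = NATURALS[tonic[0].upper()]
    for accidental in tonic[1:]:
        if accidental == "#":
            pitch_class += 1
        elif accidental == "b":
            pitch_class -= 1
        else:
            raise ValueError(f"unknown accidental in {label!r}")
    mode = mode.lower()
    if mode not in ("major", "minor"):
        raise ValueError(f"unknown mode in {label!r}")
    return pitch_class % 12, mode


def weighted_score(reference, estimate):
    """MIREX-style weighted score of one estimated key against the reference."""
    ref_pc, ref_mode = parse_key(reference)
    est_pc, est_mode = parse_key(estimate)
    if ref_mode == est_mode:
        if ref_pc == est_pc:
            return 1.0  # correct key
        if (est_pc - ref_pc) % 12 == 7:
            return 0.5  # estimate is a perfect fifth above, same mode
        return 0.0
    if ref_mode == "major" and (ref_pc - est_pc) % 12 == 3:
        return 0.3  # relative minor, e.g. C major -> A minor
    if ref_mode == "minor" and (est_pc - ref_pc) % 12 == 3:
        return 0.3  # relative major, e.g. A minor -> C major
    if ref_pc == est_pc:
        return 0.2  # parallel key, e.g. C major -> C minor
    return 0.0


def mean_score(pairs):
    """Average weighted score over (reference, estimate) label pairs."""
    scores = [weighted_score(ref, est) for ref, est in pairs]
    return sum(scores) / len(scores)


print(weighted_score("C major", "A minor"))
print(mean_score([("C major", "C major"), ("C major", "G major")]))
```

Unlabeled excerpts have no reference key, so they would simply be left out of the list of pairs.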
[1] Sturm, Bob: An Analysis of the GTZAN Music Genre Dataset, Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-Centered and Multimodal Strategies (MIRUM), Nara, Japan, November 2012.
[2] Kraft, Sebastian; Lerch, Alexander; Zoelzer, Udo: The Tonalness Spectrum: Feature-Based Estimation of Tonal Components, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), Maynooth, Ireland, September 2-5, 2013.