This is yet another attempt at maintaining a list of datasets directly related to MIR. Other lists that I have found are this wiki, the ISMIR page, this web page, and this web page. If you are interested in speech processing, you can find a table of speech datasets on this page. If you are interested in multitracks, the Open Multitrack Testbed should be a good starting point. UPF also has an excellent page with datasets for world music, including Indian art music, Turkish Makam music, and Beijing Opera. Two additional general resources are for MIDI files and for audio files.

If you know of other datasets that should be included in this list, and eventually in the book, please send me a note or post a comment.

dataset | meta data | contents | with audio
--- | --- | --- | ---
200DrumMachines | | 7371 one-shots | yes
ACM_MIRUM | tempo | 1410 excerpts (60s) | yes
ADC2004 | predominant pitch | 20 excerpts | yes
AED | 28 event classes | 5223 audio snippets | yes
Amg1608 | valence & arousal | 1608 excerpts (30s) | no
AMT-pilot | structure by multiple annotators | 8 songs | yes
APL | piano practice | 620 segments | yes
artist20 | 20 artists | 1413 songs | no
AudioSet | 632 event classes | 2084320 clips (10s) | no
bach10 | multitrack & aligned MIDI | 10 chorales | yes
ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes
beatboxset1 | perc. annotation | 14 clips | yes
C224a | 14 genres | 224 artists | no
C3ka | 18 genres | 3000 artists | no
C49ka-C111ka | genres | 48800/110588 artists | no
CAL500 | tags | 502 songs | yes
CAL10k | tags | 10870 songs | no
CarnaticRhythm | sama & beats | 176 pieces | on request
CASD | chords by 4 annotators | 50 songs | no
CCMixter | vocal & background track | 50 mixes | yes
Chopin22 | audio & aligned MIDI | 44 recordings | yes
CMMSD | note/rest/transition & onsets & vibrato | 36 excerpts | no
Coidach | 55 genres | 26420 songs | no
corpusCOFLA | editorial & predominant melody | 1800 flamenco recordings | no
covers80 | cover songs | 80 song pairs | yes
Cross-Composer | 11 composers | 1100 tracks | no
Cross-Era | 5 eras | 2000 tracks | no
DAMP | karaoke performances | 34000 monophonic recordings | yes
DEAM | valence & arousal | 1802 excerpts | yes
DEAPDataset | valence & arousal & dominance & physiological data | 120 music video excerpts | no
DREANSS | onset times & perc. instruments | 18 excerpts | yes
DrumPt | 4 playing techniques | app. 2000 annotations | yes (see ENST)
emoMusic | arousal & valence | 744 excerpts (45s) | yes
Emotify | induced emotion | 400 excerpts | yes
ENST-Drums | onset times & perc. instruments & playing technique | 318 segments | yes
Extendedballroom | 9 genres & tempo | 4000 excerpts (30s) | downloadable
ExtraSensory | 51 context labels | 300000 sensor recordings from 60 users | yes
ffuhrmann | 11 predom. instr. | 6951 excerpts/220 songs | yes/no
FlaBase | editorial & biographical & musicological information on flamenco | 1102 artists & 74 palos & 2860 albums & 13311 tracks | no
FMA-small | 8 genres | 8000 excerpts (30s) | yes
FMA-medium | 16 genres | 25000 excerpts (30s) | yes
FMA-large | 161 genres | 106574 excerpts (30s) | yes
FMA-full | 161 genres | 106574 songs | yes
Fugue | fugue analysis | 36 pieces | no
GiantStepsTempo | tempo | 664 files | no
GiantStepsKey | key | 604 files | no
GNMID14 | timestamp & country | 110M music ID matches | no
Good-sounds.org | 12 instruments, pitch, sound quality | 8750 notes | yes
GPT | 7 guitar playing techniques | 6580 clips | yes
GMD | genre & valence & arousal | 1400 songs | downloadable
GSD | start/stop of guitar solos | 60 songs | no
GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes
Hainsworth | tempo | 245 excerpts (60s) | yes
HHDS | multitrack & style & tempo | 18 songs | yes
HJDB | downbeat | 236 excerpts | yes
holzapfel:onset | onset times | 78 excerpts | yes
homburg | 9 genres | 1889 excerpts (10s) | yes
IADS | valence & arousal & dominance | 111 sound snippets | yes
IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes
IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes
IDMT-SMT-Bass-SINGLE-TRACK | style annotated bass lines | 17 bass lines (?) | yes
IDMT-SMT-Drums | onset times & perc. instruments | 518 files | yes
IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes
Multitrack | multitrack & style | 12 songs | yes
iKala | singing voice & background | 252 excerpts (30s) | yes
INRIA:DSD100 | multitrack | 100 songs | yes
INRIA:EuroVision | structure | 124 songs | no
INRIA:Quaero | structure | 159 songs | no
UIOWA:MIS | single instrument notes | many | yes
IRMAS | 11 instruments | 2874 excerpts | yes
ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes
ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes
Jamendo | voice activity | 61+16+16 songs | yes
JGDB | multitrack & MIDI | random generated excerpts | yes
Jordan:Classical | structure | 15 pieces | yes
Jordan:Jazz | structure | 15 pieces | yes
LabROSA:APT | MIDI | 29 piano excerpts | yes
LabROSA:MIDI | audio & MIDI | 4 songs | yes
Lakh MIDI Dataset | MIDI | 176581 MIDI files | no
last.fm | listening habits | 992 users | no
LFM-1b | listening habits | 120000 users | no
magnatagatune | similarity | 25863 excerpts (30s) | yes
LMD – Latin | 10 genres | 3160 songs | no
MAPS | piano notes/chords/pieces | 238 pieces | yes
MARD | album reviews | 66566 songs | no
MARG-AMT | MIDI pitch & onset/offset times | 30 melodies | yes
MAST | vocal performance assessment | 1018 performances | no
McGill Billboard | chords | 740 songs | no
MDBDrums | onset times & perc. instrument & playing technique | 23 excerpts | yes
MedleyDB | multitrack & genre & melody f0 & instrument activation | 122 songs | yes
MIR-1K | vocal and background | 1000 excerpts | yes
mirex05Train | predominant pitch | 13 excerpts | yes
mirex06Train | tempo & beats | 20 excerpts (30s) | yes
MMTD | listening behavior | 1086808 tweets | no
Modal | onset times | 71 snippets | yes
MOODetector:Bi-Modal | lyrics & valence & arousal | 133 excerpts | yes
MOODetector:Multi-Modal | lyrics & MIDI & mood | 903 excerpts (30s) | yes
moodswings | arousal & valence | 240 excerpts (30s) | no
MSD | meta data & proprietary features | 1000000 songs | no
The Meertens Tune Collections | phrases & key & meter | 3000-7000 melodies | yes
MTG-QBH | title & artist | 118 queries/481 songs | yes/no
musiclef2012 | tags | 1355 songs | no
MusicMicro | music listening patterns | 136866 users | no
MusicNet | pitch and onsets | 330 recordings | implicitly
NIN | multitrack | 66 songs | yes
NSynth | instrument and pitch | 305979 single notes | yes
NUS-48E | aligned phonemes | 48 pairs of sung and spoken | yes
ODB | onset times | 19 excerpts | yes
Onset_Leveau | onset times | 21 excerpts | yes
Orchset | predominant pitch | 64 excerpts | yes
Phenicx-Anechoic | audio & aligned MIDI | 4 pieces | yes
Phonation | pitch & vowel & phonation mode | 900 monophonic snippets | yes
PlaylistDataset | playlists | 75262 songs/2840553 transitions | no
QBT-Extended | taps | 3365 queries/51 songs | MIDI
QMUL:Beatles | structure & key & chords & beats | 181 songs | no
QMUL:King | structure & key & chords | 14 songs | no
QMUL:MichaelJackson | structure | 38 songs | no
QMUL:MixEvaluation | multitrack & mixes | 18 songs/180 mixes | yes
QMUL:Queen | structure/key & chords | 51/31 songs | no
QMUL:RSS | structure | 60 songs | no
QMUL:Zweieck | structure & key & chords & beats | 18 songs | no
QUASI | multitrack | 11 songs | yes
RockCorpus | chords & melody & bars | 200 songs | no
RWC | lyrics & 10 genres & 50 instruments & chords & structure & aligned MIDI | 115 songs/50 classical/100 songs | yes
SALAMI | structure | 779 songs | no
SDD | start of samples | 80 songs & 80 samples | no
Sargon | structure | 4 songs | yes
Schenker | MusicXML & Schenker analysis | 41 pieces | no
SASD | artist biographies & similarity | 268+2336 artists | no
Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes
Seyerlehner:Annotated | 19 genres | 190 songs | yes
Seyerlehner:Pop | tempo | 1105 songs | yes
Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes
SISEC | multitrack & mix | 5 excerpts | yes
SMC:MIREX | tempo & beat positions | 217 excerpts | yes
SMD | audio & aligned MIDI | 50 recordings | yes
SoundTracks | valence & energy & tension & mood | 360+110 excerpts | yes
SPAM | structure | 50 songs | no
Su-AMT | onset times & pitch | 10 excerpts | yes
ThisIsMyJam | favorite songs & artists | 131k users | no
TONAS | pitch | 72 single-voiced excerpts | yes
TPD | popularity rating | 23385 songs | no
Tunebot | title & artist | 10000 queries/? songs | yes/no
UMA-Piano | piano chords | 275040 recordings | yes
UrbanSound8k | 10 event classes | 8732 slices | yes
URBAN-SED | 9 event classes | 10000 recordings | yes
uspop2002 | tags & genre & chords | 8752 songs | no
Zanoni-Giorgi | chords & keys & beats | 65 songs | no


Last updated: March 23, 2018 at 19:47

27 thoughts on “datasets”

  1. Very extensive list. Thanks! Off the top of your head, do you know of any that contain MIDI for solos, or songs with solos in them? Could be jazz, or something else

  2. Could you please add some dataset containing “ornaments” in Western music (such as vibrato, portamento, etc.)?

    • The column indicates whether the audio files are within the dataset (yes) or have to be acquired individually, indicating that the dataset will only contain references or links (no).


  3. It is really a commendable job and helpful for all.
    Which of the above list is the dataset containing Singing Voices for Singer Identification?

    • Hi Ananya, I am not aware of any specific singer identification datasets. There are many datasets that label the artist; these might serve as a starting point. Not sure if they would provide enough data for training, though.


    • Hi Steve,

      thanks for your effort. I have a hard time understanding what data this dataset actually contains and where, and what the possible tasks might be. Is there a publication? I would appreciate it if you could send me a clarifying email. Thanks, Alexander

      • Hi Alex, I have been using the Smartscore OCR software to scan lead sheets into MusicXML: chords plus single-note melodies. Each lead sheet is transposed into 12 keys. There is a link in the GitHub README to a Go app that encodes the files into a format that can be easily fed into an RNN. I am using it to generate melodies given chords. I will continue to develop documentation as I add to the dataset.
