This is yet another attempt at maintaining a list of datasets directly related to music information retrieval (MIR). Other lists that I have found are this wiki, the ISMIR page, this web page, and this web page. If you are interested in speech processing, you can find a table of speech datasets on this page. If you are interested in multi-tracks, the Open Multitrack Testbed should be a good starting point.

If you know of other datasets that should be included in this list, and eventually in the book, please send me a note or post a comment.

dataset | metadata | contents | with audio
200DrumMachines |  | 7371 one-shots | yes
ACM_MIRUM | tempo | 1410 excerpts (60s) | yes
ADC2004 | predominant pitch | 20 excerpts | yes
Amg1608 | valence & arousal | 1608 excerpts (30s) | no
APL | piano practice | 620 segments | yes
artist20 | 20 artists | 1413 songs | no
AudioSet | 632 event classes | 2084320 clips (10s) | no
bach10 | multitrack & aligned MIDI | 10 chorales | yes
ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes
beatboxset1 | perc. annotation | 14 clips | yes
C224a | 14 genres | 224 artists | no
C3ka | 18 genres | 3000 artists | no
C49ka-C111ka | genres | 48800/110588 artists | no
CAL500 | tags | 502 songs | yes
CAL10k | tags | 10870 songs | no
CCMixter | vocal & background track | 50 mixes | yes
Chopin22 | audio & aligned MIDI | 44 recordings | yes
CMMSD | note/rest/transition & onsets & vibrato | 36 excerpts | no
Codaich | 55 genres | 26420 songs | no
corpusCOFLA | editorial & predominant melody | 1800 flamenco recordings | no
covers80 | cover songs | 80 song pairs | yes
DAMP | karaoke performances | 34000 monophonic recordings | yes
DEAM | valence & arousal | 1802 excerpts | yes
DEAPDataset | valence & arousal & dominance & physiological data | 120 music video excerpts | no
DREANSS | onset times & perc. instruments | 18 excerpts | yes
DrumPt | 4 playing techniques | approx. 2000 annotations | yes (see ENST)
emoMusic | arousal & valence | 744 excerpts (45s) | yes
Emotify | induced emotion | 400 excerpts | yes
ENST-Drums | onset times & perc. instruments & playing technique | 318 segments | yes
Extendedballroom | 9 genres & tempo | 4000 excerpts (30s) | downloadable
ffuhrmann | 11 predom. instr. | 6951 excerpts/220 songs | yes/no
FlaBase | editorial & biographical & musicological information on flamenco | 1102 artists & 74 palos & 2860 albums & 13311 tracks | no
FMA-small | 10 genres | 4000 excerpts (30s) | yes
FMA-medium | 20 genres | 14511 excerpts (30s) | yes
Fugue | fugue analysis | 36 pieces | no
GiantStepsTempo | tempo | 664 files | no
GiantStepsKey | key | 604 files | no
GNMID14 | timestamp & country | 110M music ID matches | no
Good-sounds.org | 12 instruments & pitch & sound quality | 8750 notes | yes
GPT | 7 guitar playing techniques | 6580 clips | yes
GMD | genre & valence & arousal | 1400 songs | downloadable
GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes
Hainsworth | tempo | 245 excerpts (60s) | yes
HJDB | downbeat | 236 excerpts | yes
holzapfel:onset | onset times | 78 excerpts | yes
homburg | 9 genres | 1889 excerpts (10s) | yes
IADS | valence & arousal & dominance | 111 sound snippets | yes
IDMT-SMT-Bass | bass performance styles | 4300 excerpts | yes
IDMT-SMT-Audio-Effects | effects on bass and guitar notes | 55044 recordings | yes
IDMT-SMT-Bass-SINGLE-TRACK | style-annotated bass lines | 17 bass lines (?) | yes
IDMT-SMT-Drums | onset times & perc. instruments | 518 files | yes
IDMT-SMT-Guitar | 9 guitar playing techniques | 4700+400 note events | yes
IDMT-MT | multitrack & style | 12 songs | yes
iKala | singing voice & background | 252 excerpts (30s) | yes
INRIA:EuroVision | structure | 124 songs | no
INRIA:Quaero | structure | 159 songs | no
IRMAS | 11 instruments | 2874 excerpts | yes
ISMIR2004Genre | 6 genres | 729 excerpts (30s) | yes
ISMIR2004Tempo | tempo | 465 excerpts (20s) | yes
Jamendo | voice activity | 61+16+16 songs | yes
JGDB | multitrack & MIDI | randomly generated excerpts | yes
Jordan:Classical | structure | 15 pieces | yes
Jordan:Jazz | structure | 15 pieces | yes
LabROSA:APT | MIDI | 29 piano excerpts | yes
LabROSA:MIDI | audio & MIDI | 4 songs | yes
Lakh MIDI Dataset | MIDI | 176581 MIDI files | no
last.fm | listening habits | 992 users | no
LFM-1b | listening habits | 120000 users | no
magnatagatune | similarity | 25863 excerpts (30s) | yes
LMD – Latin | 10 genres | 3160 songs | no
MAPS | piano notes/chords/pieces | 238 pieces | yes
MARD | album reviews | 66566 songs | no
MARG-AMT | MIDI pitch & onset/offset times | 30 melodies | yes
McGill Billboard | chords | 740 songs | no
MedleyDB | multitrack & genre & melody f0 & instrument activation | 122 songs | yes
MIR-1K | vocal & background | 1000 excerpts | yes
mirex05Train | predominant pitch | 13 excerpts | yes
mirex06Train | tempo & beats | 20 excerpts (30s) | yes
MMTD | listening behavior | 1086808 tweets | no
Modal | onset times | 71 snippets | yes
MOODetector:Bi-Modal | lyrics & valence & arousal | 133 excerpts | yes
MOODetector:Multi-Modal | lyrics & MIDI & mood | 903 excerpts (30s) | yes
moodswings | arousal & valence | 240 excerpts (30s) | no
MSD | metadata & proprietary features | 1000000 songs | no
The Meertens Tune Collections | phrases & key & meter | 3000-7000 melodies | yes
MTG-QBH | title & artist | 118 queries/481 songs | yes/no
musiclef2012 | tags | 1355 songs | no
MusicMicro | music listening patterns | 136866 users | no
MusicNet | pitch & onsets | 330 recordings | implicitly
ODB | onset times | 19 excerpts | yes
Onset_Leveau | onset times | 21 excerpts | yes
Orchset | predominant pitch | 64 excerpts | yes
Phenicx-Anechoic | audio & aligned MIDI | 4 pieces | yes
Phonation | pitch & vowel & phonation mode | 900 monophonic snippets | yes
PlaylistDataset | playlists | 75262 songs/2840553 transitions | no
QBT-Extended | taps | 3365 queries/51 songs | MIDI
QMUL:Beatles | structure & key & chords & beats | 181 songs | no
QMUL:King | structure & key & chords | 14 songs | no
QMUL:MichaelJackson | structure | 38 songs | no
QMUL:MultiTrack | structure & multitrack | 104 songs | partly
QMUL:Queen | structure/key & chords | 51/31 songs | no
QMUL:RSS | structure | 60 songs | no
QMUL:Zweieck | structure & key & chords & beats | 18 songs | no
QUASI | multitrack | 11 songs | yes
RockCorpus | chords & melody & bars | 200 songs | no
RWC | lyrics & 10 genres & 50 instruments & chords & structure & aligned MIDI | 115 songs/50 classical/100 songs | yes
SALAMI | structure | 779 songs | no
Sargon | structure | 4 songs | yes
Schenker | MusicXML & Schenker analysis | 41 pieces | no
SASD | artist biographies & similarity | 268+2336 artists | no
Seyerlehner:1517-Artists | 19 genres | 3180 songs | yes
Seyerlehner:Annotated | 19 genres | 190 songs | yes
Seyerlehner:Pop | tempo | 1105 songs | yes
Seyerlehner:Unique | 14 genres | 3115 excerpts (30s) | yes
SISEC | multitrack & mix | 5 excerpts | yes
SMC:MIREX | tempo & beat positions | 217 excerpts | yes
SMD | audio & aligned MIDI | 50 recordings | yes
SoundTracks | valence & energy & tension & mood | 360+110 excerpts | yes
SPAM | structure | 50 songs | no
Su-AMT | onset times & pitch | 10 excerpts | yes
TONAS | pitch | 72 single-voiced excerpts | yes
TPD | popularity rating | 23385 songs | no
TRIOS | multitrack & aligned MIDI | 5 excerpts | yes
Tunebot | title & artist | 10000 queries/? songs | yes/no
UMA-Piano | piano chords | 275040 recordings | yes
uspop2002 | tags & genre & chords | 8752 songs | no
Zanoni-Giorgi | chords & keys & beats | 65 songs | no
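With this many entries, it can help to filter the list by annotation type, for example to find all tempo datasets that come with audio. Below is a minimal Python sketch, assuming the rows are stored as text with the columns dataset | metadata | contents | with audio separated by "|". The `Dataset`, `parse`, and `select` names are hypothetical helpers written for this illustration, not part of any dataset's official tooling, and only a handful of rows are copied in for brevity.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    # One row of the list: name, annotations, size, and audio availability.
    name: str
    metadata: str
    contents: str
    audio: str


# A few rows copied from the list above (hypothetical storage format).
TABLE = """\
ACM_MIRUM | tempo | 1410 excerpts (60s) | yes
ballroom | 8 genres & tempo & (down-)beats | 698 excerpts (30s) | yes
GiantStepsTempo | tempo | 664 files | no
GTZAN | 10 genres & tempo & key1 & key2 & beat/downbeat & metrical levels | 1000 excerpts (30s) | yes
Hainsworth | tempo | 245 excerpts (60s) | yes
SALAMI | structure | 779 songs | no
"""


def parse(table: str) -> list[Dataset]:
    """Split each non-empty line on '|' into the four columns."""
    rows = []
    for line in table.splitlines():
        if not line.strip():
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(Dataset(*cells))
    return rows


def select(rows: list[Dataset], annotation: str, need_audio: bool = True) -> list[str]:
    """Return the names of datasets whose metadata mentions `annotation`.

    The metadata column is split on '&' and each field is checked for the
    search term as a case-insensitive substring. If `need_audio` is True,
    only rows whose audio column is exactly "yes" are kept.
    """
    hits = []
    for row in rows:
        fields = [field.strip().lower() for field in row.metadata.split("&")]
        has_annotation = any(annotation.lower() in field for field in fields)
        if has_annotation and (not need_audio or row.audio == "yes"):
            hits.append(row.name)
    return hits


if __name__ == "__main__":
    print(select(parse(TABLE), "tempo"))
```

Because `select` matches substrings inside each `&`-separated field, a search for "key" also finds "key1" and "key2". Rows whose audio column reads "yes/no", "partly", "downloadable", or "MIDI" count as having no audio unless `need_audio=False` is passed.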


Last updated: March 7, 2017 at 15:11

15 thoughts on “datasets”

  1. Very extensive list. Thanks! Off the top of your head, do you know of any that contain MIDI for solos, or songs with solos in them? Could be jazz, or something else.
