![]() ![]() ![]() The network performs a succession of convolution and average pooling operations in frequency to remove inharmonicity and timbre. Future work will look into alternative output representations, like attention seq2seq models where the output is a list of pitch events instead of piano rolls. Despite needing high time resolution, we choose to work with piano rolls to avoid having an encoder and decoder, and to benefit from forcing models to choose a MIDI note number regardless of tuning and inharmonicity. ConvLSTMs were later applied to spectrograms in ASR, although they pool over time before the RNN as their output is only a few symbols per sample (words in sentences) compared to the output resolution of piano rolls (hundreds of frames in order to put notes at accurate onsets). Instead, proposed replacing the recurrent connection with convolutions (ConvLSTM), and applied a seq2seq variant to precipitation nowcasting 1 1 1Loosely speaking, precipitation nowcasting is the problem of predicting if it will rain from satelite images. This is straightforward but still discards spatial structure in the recurrent part of the network. Many variants of combining convolutional neural networks with recurrent neural networks have been tried, often by stacking networks such that the output of a convolutional model is directly fed into a recurrent model, and trained jointly.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |