Motif-based Music Representation Learning
Musical structure is built largely on the repetition and variation of motifs. Understanding how these motifs appear and behave is crucial for effective music structure analysis and high-quality automatic music composition. However, motifs are often implicit, which makes them difficult to capture. In this study, we use deep learning to explore an effective method for learning robust representations of music motifs.
The Jazz Tutor
We use generative models and careful HCI design to make practicing jazz improvisation a better and more efficient experience for novices.
Timbre Transfer with Flexible Timbre Control
Timbre style transfer has been an intriguing but mysterious sub-topic in music style transfer. We use a concise autoencoder model conditioned on one-hot representations of instruments, together with a DiffWave model trained specifically for music synthesis. The results show that our method provides one-to-one style transfer outputs comparable with the existing GAN-based method, and can transfer among multiple timbres with a single model.
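As a rough illustration of the one-hot conditioning idea described above, the following minimal sketch shows how a target instrument could be encoded and attached to a content latent before decoding. It is not the project's implementation: the instrument list, the function names `one_hot` and `condition_latent`, and the placeholder latent values are all assumptions made for the example, and concatenation is only one possible way to apply the condition.

```python
from typing import List, Sequence

# Hypothetical instrument vocabulary; the project description does not list the instruments used.
INSTRUMENTS = ["piano", "violin", "flute", "guitar"]


def one_hot(instrument: str, vocab: Sequence[str] = INSTRUMENTS) -> List[float]:
    """Return a one-hot vector marking `instrument` within `vocab`."""
    if instrument not in vocab:
        raise ValueError(f"unknown instrument: {instrument!r}")
    return [1.0 if name == instrument else 0.0 for name in vocab]


def condition_latent(latent: Sequence[float], target: str) -> List[float]:
    """Concatenate a content latent with the target instrument's one-hot code.

    In a conditional autoencoder, a vector like this could serve as the decoder
    input (an assumption; other conditioning schemes are possible).
    """
    return list(latent) + one_hot(target)


# Placeholder values standing in for an encoder output.
latent = [0.12, -0.40, 0.93]

# One latent, several target timbres: only the condition changes.
decoder_inputs = {name: condition_latent(latent, name) for name in INSTRUMENTS}
```

Because only the one-hot part changes between targets, the same encoder and decoder can in principle serve every instrument pair, which is the sense in which a single model can cover multiple timbres.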
A to I
Nowadays, AI models perform impressively on various tasks that were once considered exclusive to humans. As AI models grow more powerful, their role in co-creation expands. However, the rise of AI also raises ethical concerns: will AI models replace or harm humans? In this song, we try to answer these open questions by exploring the roles AI models could play in the co-creation process, and we address the ethical concerns from the perspective of the AI models themselves. The AI models in this song act not only as tools but also as collaborators, song and lyrics writers, performers, storytellers, and even as mentor and first-person narrator.
Speech Anonymization with Pseudo Voice Conversion
The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice.
Project Ming
Project Ming is a Cycling '74 Max/MSP program that simulates the sound ambience of ancient Chinese cities. By moving your mouse over the map and pressing different keys on the keyboard, you can experience the colorful and realistic soundscape of ancient China. It was completed during my time at Berkeley summer school in 2019, and uses a variety of synthesis techniques.