Notes on Text-To-Speech

Recently, I’ve spent quite a lot of time studying Text-to-Speech (TTS) systems, more specifically, the acoustic model. To be fair, this module is the most actively researched in the whole pipeline. In the beginning, admittedly, I was not a big fan of speech processing because of the vagueness of some... [Read More]

Attention and self-attention

Another post inspired by an interview failure. During a chat about a position in the speech processing area, the interviewer asked me about the differences between attention and self-attention. I, who at that time had only a vague understanding of the two, started to invent something about them to make... [Read More]

SSD vs YOLO

To be honest, I’m truly fed up with revising knowledge before an interview, especially object detection algorithms like SSD and YOLO. Every time I prepare for a computer vision interview, I read about these two architectures, and I still can’t tell the exact differences between them. So today, I decide... [Read More]

SOLO-Segmenting Objects by Locations

In the previous post, I discussed YOLACT, a common instance segmentation architecture, and my grudges against it and its variants. In this post, I will introduce another architecture, which I believe is a more efficient design: SOLO. [Read More]

Instance Segmentation-YOLACT

Recently, I reluctantly took on a task involving image segmentation. Why the reluctance? As a deep learning engineer, I don’t really believe in the power of this branch, since convolutional networks don’t work well at the pixel level (I think!!!). Anyway, I gave it a try, and now I want to share with... [Read More]