Parallel Implementation of String-Based Clustering for HT-SELEX Data

Shintaro Kato; Takayoshi Ono; Masaki Ito; Koichi Ito; Hirotaka Minagawa; Katsunori Horii; Ikuo Shiratori; Iwao Waga; Takafumi Aoki

Research Article


Cite (BibTeX):

@ARTICLE{10.4108/eai.19-10-2020.166664,
    author={Shintaro Kato and Takayoshi Ono and Masaki Ito and Koichi Ito and Hirotaka Minagawa and Katsunori Horii and Ikuo Shiratori and Iwao Waga and Takafumi Aoki},
    title={Parallel Implementation of String-Based Clustering for HT-SELEX Data},
    journal={EAI Endorsed Transactions on Bioengineering and Bioinformatics},
    volume={1},
    number={1},
    publisher={EAI},
    journal_a={BEBI},
    year={2020},
    month={10},
    keywords={sequence analysis, clustering, SELEX, next-generation sequencing, aptamer, parallel implementation},
    doi={10.4108/eai.19-10-2020.166664}
}


Shintaro Kato¹,²,*, Takayoshi Ono², Masaki Ito², Koichi Ito², Hirotaka Minagawa¹, Katsunori Horii¹, Ikuo Shiratori¹, Iwao Waga¹, Takafumi Aoki²

1: NEC Solution Innovators, Ltd., 1-18-7, Shinkiba, Koto-ku, Tokyo, 136-8627, Japan
2: Graduate School of Information Sciences, Tohoku University, 6-6-05, Aramaki Aza Aoba, Aoba-ku, Sendai-shi, Miyagi, 980-8579, Japan

*Contact email: katou-s-mxn@nec.com

Abstract

INTRODUCTION: A clustering method for HT-SELEX data is crucial for selecting different types of aptamer candidates. We have developed the FSBC method for HT-SELEX data, implemented in R. FSBC achieved higher sequence-clustering accuracy than conventional methods, but its processing time is longer than that of AptaCluster.

OBJECTIVES: The objective of this study is to reduce the processing time of FSBC.

METHODS: We propose pFSBC, which reduces the processing time of the ORS estimation step in FSBC by implementing that step in parallel.
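The abstract does not say how the ORS estimation is parallelized, so the following is only a generic illustration of the kind of data-parallel string counting that such a step could rely on. It is written in Python rather than R, and it is not the authors' implementation: the function names, the chunking scheme, and the toy reads are all assumptions made for this sketch.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(seqs, k):
    """Count every length-k substring in one chunk of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def parallel_kmer_counts(seqs, k, workers=4):
    """Split the reads into chunks, count each chunk in a worker, then merge.

    Hypothetical helper: threads keep the sketch portable, but a process
    pool would be the usual choice for CPU-bound counting in Python.
    """
    chunks = [seqs[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker returns a partial Counter; merging adds the counts.
        for part in pool.map(count_kmers, chunks, [k] * workers):
            total.update(part)
    return total


reads = ["ACGTAC", "CGTACG", "TTTTTT"]  # toy reads, not real HT-SELEX data
counts = parallel_kmer_counts(reads, k=3, workers=2)
```

Because the partial counts are merged by addition, the result is the same as counting all reads serially; only the wall-clock time depends on the number of workers.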

RESULTS: Processing time and clustering accuracy were evaluated using the last-round data from NCBI SRA (SRR3279661, BioProject PRJNA315881) and compared with those of conventional clustering methods. pFSBC exhibited the highest clustering accuracy and the shortest processing time.

CONCLUSION: We expect that pFSBC will shorten the time-consuming clustering task while providing accurate clustering results for HT-SELEX data.

Keywords: sequence analysis, clustering, SELEX, next-generation sequencing, aptamer, parallel implementation

Received: 2020-06-30
Accepted: 2020-10-01
Published: 2020-10-19
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.19-10-2020.166664

Copyright © 2020 S. Kato et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.