ct 20(24): e5

Research Article

From web to SMS: A text summarization of Wikipedia pages with character limitation

  • @ARTICLE{10.4108/eai.11-6-2020.165277,
        author={J.L.E.K Fendji and B.A.H. Aminatou},
        title={From web to SMS: A text summarization of Wikipedia pages with character limitation},
        journal={EAI Endorsed Transactions on Creative Technologies},
        volume={7},
        number={24},
        publisher={EAI},
        journal_a={CT},
        year={2020},
        month={6},
        keywords={Character-limitation summarization, SMS, LSA, TextRank, ROUGE, TACOS, Wikipedia},
        doi={10.4108/eai.11-6-2020.165277}
    }
    
J.L.E.K Fendji¹,*, B.A.H. Aminatou¹
  • 1: Computer Engineering, University Institute of Technology, The University of Ngaoundéré – Cameroon
*Contact email: lfendji@gmail.com

Abstract

Wikipedia is one of the main sources of information on the Web. However, accessing its content can be difficult, especially from a basic telephone with no browsing capability and only a GSM network, where SMS is the only means of text-based communication. Because of the limit on the number of characters, a Wikipedia page cannot always be sent through SMS. This work therefore raises the issue of text summarization with character limitation. To address it, two extractive approaches have been combined: the LSA and TextRank algorithms. The generated summaries have been evaluated using ROUGE metrics. Since ROUGE metrics do not take character limitation into account, a new threshold named Threshold of Acceptability for Character-Oriented Summaries (TACOS) has been proposed to interpret ROUGE scores. The evaluation showed the relevance of the approach for pages of at most 2000 characters. The system has been tested using the SMS simulator of RapidSMS, without a GSM gateway, to simulate deployment in a real environment. To the best of our knowledge, this is the first work to tackle text summarization with character limitation.
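
The idea of an extractive summary constrained by a character budget can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only a simplified TextRank-style ranking (word-overlap similarity with a fixed number of power iterations), omits the LSA component entirely, and fills the budget greedily. The function names, the naive sentence splitter, and the default of 160 characters (the length of a single GSM 7-bit SMS) are assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by the log of each sentence's vocabulary size."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def rank_sentences(sentences, damping=0.85, iterations=30):
    """Simplified TextRank: weighted PageRank over the sentence similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, max_chars=160):
    """Greedily keep the best-ranked sentences that still fit in max_chars."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        # One extra character for the space that joins this sentence to the others.
        cost = len(sentences[i]) + (1 if chosen else 0)
        if used + cost <= max_chars:
            chosen.append(i)
            used += cost
    # Restore the original order so the summary reads naturally.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Calling `summarize(page_text, max_chars=160)` returns a string that never exceeds the budget, or an empty string if no single sentence fits. The greedy fill is only one possible way to respect the limit; it favors whole sentences over truncation, at the cost of sometimes leaving part of the budget unused.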