Data Mining Oriented Automatic Scientific Documents Summarization

BJD Kalyani
Jaishri Wankhede
Shaik Shahanaz

Abstract

Scientific research usually begins with a review of the state of the art, which may involve a large volume of publications. Automatically summarizing scientific articles can speed up this step. Summarizing a scientific article differs from summarizing general text because of the article's specific structure and the inclusion of cited sentences. Most of the important information in scientific articles is presented in tables, statistics, and algorithm pseudocode, features that rarely appear in general text. A number of methods that take the structure of a scientific article into account have therefore been proposed to improve the quality of the generated summary. This paper applies three well-known clustering algorithms to the CL-SciSumm 2020 and LongSumm 2020 scientific document summarization tasks. Several sentence scoring functions, including textual entailment, are then used to extract sentences from each cluster to generate the summary.
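
The cluster-then-extract idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three clustering algorithms or define the scoring functions, so everything below is an assumption made for illustration: k-means over term-frequency vectors stands in for the clustering step, and cosine similarity to the cluster centroid stands in for the sentence scoring functions (textual entailment is not modelled). The function names `tokenize`, `tf_vector`, `kmeans`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def tf_vector(tokens, vocab):
    """Build an L2-normalised term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of a unit-length vector a with a centroid b."""
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / norm_b


def kmeans(vectors, k, iterations=20):
    """Plain k-means using cosine similarity, seeded with the first k vectors.

    Assumption: k-means is only a stand-in; the paper's algorithms may differ.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each sentence vector to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k=2):
    """Pick the highest-scoring sentence from each cluster, in document order."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    vectors = [tf_vector(tokenize(s), vocab) for s in sentences]
    k = min(k, len(vectors))
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Hypothetical scoring function: similarity to the cluster centroid.
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=3)` on a list of sentence strings returns at most three sentences, one per non-empty cluster, in their original order. Choosing one sentence per cluster is a simple way to spread the summary across topics; a real system would likely combine several scoring functions and a length budget.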

How to Cite
Kalyani, B., Wankhede, J., & Shahanaz, S. (2023). Data Mining Oriented Automatic Scientific Documents Summarization. International Journal on Recent and Innovation Trends in Computing and Communication, 11(4), 126–130. https://doi.org/10.17762/ijritcc.v11i4.6395