Publication

IEEE White Paper

Current

Note: Latest edition: IEEE White Paper

Existing or future amendments and versions must be purchased separately.


Abstract

A great deal of research has been done on natural language processing (NLP) tasks and standards, both internationally and for Indian languages (ILs). Much software has been built around these tasks and is widely used in products. However, different research and product groups have often created their own standards for the same problem, which causes issues in data sharing, information representation, and related areas. In this report, the authors investigate the NLP text processing tasks for which standardization is required and then survey the standards available for them, either for ILs or internationally. For this study, they categorize the tasks primarily by their input and output. They also conduct a few case studies based on downstream applications.

Product specifications

  • Publication by IEEE
  • Document type: IS
  • Publisher: IEEE
  • Supplier: IEEE
  • National committee: IEEE-SASB / Industry Connections Committee

Product relationships

  • References: [1] K. van Deemter and R. Kibble, “On coreferring: Coreference in MUC and related annotation schemes,” Comput. Linguistics, vol. 26, no. 4, pp. 629–637, 2000.
  • References: [2] M. Poesio, R. Stevenson, B. Di Eugenio, and J. Hitzeman, “Centering: A parametric theory and its instantiations,” Comput. Linguistics, vol. 30, no. 3, pp. 309–363, Sep. 2004.
  • References: [3] S. Gerani et al., “Abstractive summarization of product reviews using discourse structure,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1602–1613.
  • References: [4] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, Dec. 2014.
  • References: [5] G. V. Garje and G. K. Kharate, “Survey of machine translation systems in India,” Int. J. Natural Lang. Comput., vol. 2, no. 4, pp. 47–65, 2013.
  • References: [6] P. Gupta and V. Gupta, “A survey of text question answering techniques,” Int. J. Comput. Appl., vol. 53, no. 4, pp. 1–8, Sep. 2012.
  • References: [7] M. Marcus, B. Santorini, and M. A. Marcinkiewicz, “Building a large annotated corpus of English: The Penn Treebank,” Comput. Linguistics, vol. 19, no. 2, Jun. 1993. [Online]. Available: https://aclanthology.org/J93-2004/
  • References: [8] A. Agarwal et al., “Automatic extraction of multiword expressions in Bengali: An approach for miserly resource scenario,” in Proc. Int. Conf. Natural Lang. Process. (ICON), Dec. 2004, pp. 165–174.
  • References: [9] S. Dandapat, S. Sarkar, and A. Basu, “Automatic part-of-speech tagging for Bengali: An approach for morphologically rich languages in a poor resource scenario,” in Proc. 45th Annu. Meeting Assoc. Comput. Linguistics Companion Volume Proc. Demo Poster Sessions, 2007, pp. 221–224.
  • References: [10] S. Baskaran et al., “A common parts-of-speech tagset framework for Indian languages,” in Proc. LREC, 2008, pp. 1–7.
  • References: [11] D. Chandrasekaran and V. Mago, “Domain specific complex sentence (DCSC) semantic similarity dataset,” 2020, arXiv:2010.12637.
  • References: [12] B. Webber, “D-LTAG: Extending lexicalized TAG to discourse,” Cognit. Sci., vol. 28, no. 5, pp. 751–779, Sep. 2004.
  • References: [13] R. Prasad et al., “The Penn discourse treebank 2.0,” in Proc. LREC, 2008, pp. 1–8.
  • References: [14] W. C. Mann and S. A. Thompson, “Rhetorical structure theory: Toward a functional theory of text organization,” The Structure of Discourse, vol. 8, no. 3, pp. 243–281, 1987.
  • References: [15] R. Subba and B. Di Eugenio, “An effective discourse parser that uses rich linguistic information,” in Proc. Hum. Lang. Technol., Annu. Conf. North Amer. Chapter Assoc. Comput. Linguistics (NAACL), 2009, pp. 566–574.
  • References: [16] R. Barzilay and M. Lapata, “Modeling local coherence: An entity-based approach,” Comput. Linguistics, vol. 34, no. 1, pp. 1–34, 2008.
  • References: [17] M. Elsner and E. Charniak, “Extending the entity grid with entity-specific features,” in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, Hum. Lang. Technol., 2011, pp. 125–129.
  • References: [18] C. Guinaudeau and M. Strube, “Graph-based local coherence modeling,” in Proc. 51st Annu. Meeting Assoc. Comput. Linguistics, vol. 1, 2013, pp. 93–103.
  • References: [19] M. Lapata and R. Barzilay, “Automatic evaluation of text coherence: Models and representations,” in Proc. IJCAI, vol. 5, 2005, pp. 1085–1090.
  • References: [20] Z. Lin, H. T. Ng, and M. Y. Kan, “Automatically evaluating text coherence using discourse relations,” in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, Hum. Lang. Technol., 2011, pp. 997–1006.
  • References: [21] E. Pitler and A. Nenkova, “Revisiting readability: A unified framework for predicting text quality,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2008, pp. 186–195.
  • References: [22] M. Mesgar and M. Strube, “A neural local coherence model for text quality assessment,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2018, pp. 4328–4339.
  • References: [23] T. Mohiuddin, S. Joty, and D. T. Nguyen, “Coherence modeling of asynchronous conversations: A neural entity grid approach,” 2018, arXiv:1805.02275.