
[InetBib] [1st CfP] SDP@EMNLP 2020: 1st Workshop on Scholarly Document Processing and Shared Tasks (SDP 2020)

Date: Thu, 5 Mar 2020 07:48:22 +0000
From: "Mayr-Schlegel, Philipp via InetBib" <inetbib@xxxxxxxxxx>
Subject: [InetBib] [1st CfP] SDP@EMNLP 2020: 1st Workshop on Scholarly Document Processing and Shared Tasks (SDP 2020)

Dear colleagues,

You are invited to participate in the 1st Workshop on Scholarly Document
Processing (SDP 2020), to be held in conjunction with the 2020 Conference on
Empirical Methods in Natural Language Processing (EMNLP 2020) on November 11 or
12 in Punta Cana, Dominican Republic.

The workshop will consist of a research track and a shared task track. The
shared task track includes the 6th edition of the CL-SciSumm shared task
(https://github.com/WING-NUS/scisumm-corpus) and two new summarization tasks,
CL-LaySumm and LongSumm, geared towards easier access to scientific methods
and results.

The tentative submission deadline is July 15, 2020.

SDP is a continuation of the BIRNDL (https://philippmayr.github.io/BIRNDL-WS/)
and WOSP (https://wosp.core.ac.uk/) workshop series.

Workshop Date and Venue: November 11/12, Punta Cana, Dominican Republic

Website: https://ornlcda.github.io/SDProc/

This CfP has also been posted on Twitter; please share it:

<https://twitter.com/SDProc/status/1235405786068602880>

** Introduction **

In addition to the long-standing challenge faced by scholars of keeping up with
the growing literature in their own and related fields, they must now compete
with malign pseudo-science and disinformation in informing public policy and
behavior. This has stimulated workshops and research focused on enhancing
search, retrieval, summarization, and analysis of scholarly documents. However,
the general research community on scholarly document processing remains
fragmented, and efforts towards natural language understanding of scholarly
text, which is central to substantially improving all of these downstream
applications, are not widespread.

To address these gaps, we, the organizers of the BIRNDL and WOSP workshops,
propose the first Workshop on Scholarly Document Processing. We seek to reach
out to the broader NLP and AI/ML communities to pool distributed efforts to
improve scholarly document understanding and enable intelligent access to
published research. The goal of SDP is two-fold: to increase collaboration
between communities interested in leveraging knowledge stored in scientific
literature and data, and to establish SDP as the primary venue dedicated to the
field.

We hope to encourage the mainstream NLP and ML community working on SDP tasks,
which are themselves NLP tasks, to publish at SDP as we seek to establish it as
the premier integrated venue for the field. We have established a steering
committee to help us turn SDP into a conference in the coming years.

** Topics of Interest **

We invite submissions from all communities interested in natural language
processing, information retrieval, and data mining problems in scientific
documents, and in processing scientific documents to make them more accessible
to various audiences. The topics of interest include, but are not limited to:

· Information extraction, text mining, and parsing of scholarly literature

· Reproducibility and peer review

· Lay summarization (i.e., summaries created for non-experts) of individual
scholarly documents and of collections of scholarly documents

· Discourse modeling and argument mining

· Summarization and question-answering for scholarly documents

· Semantic and network-based indexing, search and navigation in structured text

· Graph analysis/mining including citation and co-authorship networks

· Analysis and mining of citation contexts for document understanding and
retrieval

· New scholarly language resources and evaluation

· Connecting and interlinking publications, data, tweets, blogs or their parts

· Disambiguation, metadata extraction, enrichment, and data quality assurance
for scholarly documents

· Bibliometrics, scientometrics, and altmetrics approaches and applications

· Other aspects of scientific workflows, including open access/open science and
research assessment

· Infrastructures for accessing scientific publications and/or research data

** The 6th Computational Linguistics Scientific Document Summarization Shared
Task (CL-SciSumm 2020) **

(Organiser: Muthu Kumar Chandrasekaran)

CL-SciSumm is the first medium-scale shared task on scientific document
summarization, with over 500 annotated documents. Last year's CL-SciSumm shared
task introduced large-scale training datasets, both annotated from ScisummNet
and auto-annotated. In that task, systems were provided with a Reference Paper
(RP) and 10 or more Citing Papers (CPs) that all contain citations to the RP,
and used them to summarise the RP. The resulting summaries were evaluated with
ROUGE against the RP's abstract and against human-written summaries.

The task is defined as follows:

· Given: A topic consisting of a Reference Paper (RP) and Citing Papers (CPs)
that all contain citations to the RP. In each CP, the text spans (i.e.,
citances) that pertain to a particular citation to the RP have been identified.

· Task 1A: For each citance, identify the spans of text (cited text spans) in
the RP that most accurately reflect the citance. These are of the granularity
of a sentence fragment, a full sentence, or several consecutive sentences (no
more than 5).

· Task 1B: For each cited text span, identify what facet of the paper it
belongs to, from a predefined set of facets.

· Task 2 (optional bonus task): Finally, generate a structured summary of the
RP from the cited text spans of the RP. The length of the summary should not
exceed 250 words.

This year, CL-SciSumm '20 will have two new tracks: LaySumm and LongSumm.

** CL-LaySumm 2020: The 1st Computational Linguistics Lay Summary Challenge
Shared Task **

(Organisers: Anita de Waard, Ed Hovy)

To ensure and increase the relevance of science for all of society, and not
just a small group of niche practitioners, researchers are increasingly asked
by funders and publishers to outline the scope of their research for the
general public by writing a summary for a lay audience, or lay summary. The
LaySumm task aims to automate this responsibility by enabling systems to
generate lay summaries automatically. A lay summary explains, succinctly and
without technical jargon, the overall scope, goal, and potential impact of a
scientific paper.

The corpus for this task will comprise full-text papers with lay summaries, in
a variety of domains, and from a number of journals. Elsevier will make
available a collection of Lay Summaries from a multidisciplinary collection of
journals, as well as the abstracts and full texts of the corresponding papers.

The task is defined as follows:

· Given: A full-text paper, its Abstract, and its Lay Summary

· Task: For each paper, generate a Lay Summary of the specified length

Evaluation

The Lay Summary task will be scored using several ROUGE metrics that compare
the system output with the gold-standard Lay Summary. As a follow-up to this
intrinsic evaluation, a number of automatically generated lay summaries will
be assessed through crowdsourcing by a panel of judges and a lay audience.
Details of the crowdsourced evaluation will be announced when the final test
corpus is released on July 1st.

All nominated entries will be invited to publish a paper in Open Access
(Author-Payment Charges will be waived) in a selected Elsevier publication.
Authors will be asked to provide an automatically generated lay summary of
their paper, together with their contribution.

** LongSumm 2020: Shared Task on Generating Long Summaries for Scientific
Documents **

(Organisers: Michal Shmueli-Scheuer, Guy Feigenblat)

Most work on scientific document summarization focuses on generating
relatively short summaries (250 words or less). While such a length constraint
can be sufficient for summarizing news articles, it is far from sufficient for
summarizing scientific work. In fact, such a short summary resembles an
abstract more than a summary that aims to cover all the salient information
conveyed in a given text. Writing such longer summaries requires expertise and
a deep understanding of a scientific domain, as can be found in some
researchers' blogs.

The LongSumm task leverages blogs created by researchers in the NLP and
machine learning communities, using these summaries as the reference summaries
against which submissions are compared.

The corpus for this task includes a training set that consists of 1705
extractive summaries and around 700 abstractive summaries of NLP and machine
learning scientific papers. These summaries are drawn from video talks about
the papers at associated conferences (Lev et al. 2019,
TalkSumm<https://arxiv.org/abs/1906.01351>) and from blogs created by NLP and
ML researchers. In addition, we will create a test set of abstractive
summaries. Each submission should not exceed 600 words and will be evaluated
with ROUGE against a single reference (gold) summary.
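As a rough illustration only, the Python sketch below computes a simplified
ROUGE-N score (clipped n-gram overlap between a system summary and a single
reference summary) and checks the 600-word LongSumm limit. The function names
are hypothetical, and the sketch assumes lowercase whitespace tokenization
without stemming; it is not the scorer the organisers will use, whose exact
settings are not specified in this call.

```python
from collections import Counter

# Simplified ROUGE-N sketch (hypothetical helper names, not the official scorer).
# Assumption: lowercase whitespace tokenization, no stemming or stopword handling.


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap between a system summary and one reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


MAX_WORDS = 600  # LongSumm length limit stated in this call


def within_word_limit(summary, max_words=MAX_WORDS):
    """Check the word limit, counting whitespace-separated tokens (an assumption)."""
    return len(summary.split()) <= max_words


if __name__ == "__main__":
    print(rouge_n("the cat sat", "the cat sat down", n=1))
    print(within_word_limit("word " * 600))
```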

** Submission Information **

Authors are invited to submit long and short papers describing original,
unpublished work. Submissions will be subject to a double-blind peer review
process. Accepted papers will be presented by the authors at the workshop either
as a talk or as a poster. All accepted papers will be published in the workshop
proceedings.

The submissions should be in PDF format and anonymized for review.

All submissions must be written in English and follow the EMNLP 2020 formatting
requirements<https://2020.emnlp.org/call-for-papers>, which EMNLP will make
available soon.

Long paper submissions: up to 8 pages of content, plus unlimited references.

Short paper submissions: up to 4 pages of content, plus unlimited references.

Final versions of accepted papers will be allowed 1 additional page of content
so that reviewer comments can be taken into account.

Submission Website: Submission is electronic, using the Softconf START
conference management system. EMNLP will make the submission site available soon.

Shared Task registration: Participants in all shared tasks need to register
here<https://docs.google.com/forms/d/e/1FAIpQLScfHzByrog-k299qBuCp3SbPWcb905_kmOWMvHpDH57VLpVrg/viewform>
before March 31, 2020.

** Important Dates **

Research track:

Submission deadline – July 15, 2020

Notification of Acceptance – August 17, 2020

Camera-ready submission due – August 31, 2020

Workshop – November 11 or 12, 2020

Shared task track:

Training set release – February 15, 2020

Deadline for registration and short system description – March 31, 2020

Test set release (Blind) – July 1, 2020

System runs due – August 1, 2020

Preliminary system reports due – August 16, 2020

Camera-ready submission due – August 31, 2020

Workshop – November 11 or 12, 2020

At this stage, the dates are indicative only and may change.

** Keynote Speakers **

1. Kuansan
Wang<https://www.microsoft.com/en-us/research/people/kuansanw/>, Managing
Director, Microsoft Research Outreach Academic Services

2. The second keynote speaker will be announced shortly

** Journal Extension **

In the past, authors of accepted papers were invited to submit an extended
version of their work to a special issue of a selected journal. The organizers
are currently in the process of identifying appropriate journals to host a
similar special issue this year. Relevant updates, including topics and
requirements for this special issue, will be shared on the workshop website in
due time.

** Organizing Committee **

Muthu Kumar Chandrasekaran, Amazon, Seattle, USA

Anita de Waard, Elsevier, USA

Guy Feigenblat, IBM Research AI, Haifa Research Lab, Israel

Dayne Freitag, SRI International, San Diego, USA

Tirthankar Ghosal, Indian Institute of Technology Patna, India

Drahomira Herrmannova, Oak Ridge National Laboratory, USA

Eduard Hovy, Research Professor, LTI, Carnegie Mellon University, USA

Petr Knoth, Open University, UK

David Konopnicki, IBM Research AI, Haifa Research Lab, Israel

Philipp Mayr, GESIS – Leibniz Institute for the Social Sciences, Germany

Robert M. Patton, Oak Ridge National Laboratory, USA

Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel

Dominika Tkaczyk, Crossref, UK

** Steering Committee **

Edward Fox, Professor, Department of Computer Science and Director, Digital
Library Research Laboratory, Virginia Tech

C. Lee Giles, David Reese Professor, College of Information Sciences and
Technology, Pennsylvania State University

Min-Yen Kan, Associate Professor, School of Computing, National University of
Singapore

Dragomir Radev, A. Bartlett Giamatti Professor of Computer Science, Yale
University

Jie Tang, Professor and Associate Chair of the Department of Computer Science
and Technology, Tsinghua University

Alex Wade, Group Technical Program Manager, Chan Zuckerberg Initiative

Kuansan Wang, Managing Director, Microsoft Research Outreach Academic Services

Bonnie Webber, Professor, School of Informatics, University of Edinburgh

** Programme Committee **

1. Akiko Aizawa, National Institute of Informatics, Japan

2. Colin Batchelor, Cambridge, UK

3. Joeran Beel, Trinity College Dublin, Ireland

4. Katarina Boland, GESIS, Germany

5. Guillaume Cabanac, University of Toulouse, France

6. Cornelia Caragea, University of Illinois at Chicago, US

7. Zeljko Carevic, GESIS, Germany

8. Tanmoy Chakraborty, IIIT Delhi, India

9. Richard Eckart de Castilho, TU Darmstadt, Germany

10. Helena Deus<https://www.linkedin.com/in/helenadeus>, Elsevier Labs

11. Daniel Duma, University of Edinburgh, UK

12. Ed A. Fox, Virginia Tech, USA

13. Norbert Fuhr, University of Duisburg, Germany

14. C. Lee Giles, Penn State University, USA

15. Bela Gipp, University of Wuppertal, Germany

16. Goran Glavas, University of Mannheim, Germany

17. Hannaneh Hajishirzi, University of Washington, USA

18. Monica Ihli, University of Tennessee Knoxville, USA

19. Ameni Kacem, iCOVER, France

20. Roman Kern, Graz University of Technology, Austria

21. Atsushi Keyaki, Denso IT Laboratory Inc, Tokyo, Japan

22. Martin Klein, Los Alamos National Laboratory, USA

23. Ilia Kuznetsov, Techn. Univ. Darmstadt, Germany

24. Birger Larsen, Aalborg University, Denmark

25. Anne Lauscher, University of Mannheim, Germany

26. Paolo Manghi, National Research Council of Italy, Italy

27. Bruno Martins, University of Lisbon, Portugal

28. Norman Meuschke, University of Wuppertal, Germany

29. Diego Molla-Aliod, Macquarie University, Australia

30. Preslav Nakov, Qatar Computing Research Inst., Qatar

31. Federico Nanni, University of Mannheim, Germany

32. Jumana Nassour, Ben-Gurion University, Israel

33. Paco Nathan, Derwen Inc., US

34. Manabu Okumura, Tokyo Institute of Technology, Japan

35. Francesco Osborne, Open University, UK

36. Arzucan Ozgur, Bogazici University, Turkey

37. Sujit Pal, Elsevier Labs

38. Rajesh Piryani, South Asian University, India

39. Silvio Peroni, University of Bologna, Italy

40. Animesh Prasad, Amazon, Cambridge, UK

41. Horacio Saggion, Universitat Pompeu Fabra, Spain

42. Angelo Antonio Salatino, The Open University, UK

43. Philipp Schaer, TH Cologne, Germany

44. Vivek Kumar Singh, Banaras Hindu University, India

45. Kazunari Sugiyama, Kyoto University, Japan

46. Saeed Ul Hassan, IT University, Pakistan

47. Lucy Vanderwende, University of Washington, USA

48. Stephen Wan, CSIRO, Australia

49. Bonnie Webber, University of Edinburgh, UK

50. Ivana Williams, Chan Zuckerberg Initiative, USA

51. Dietmar Wolfram, University of Wisconsin-Milwaukee, USA

52. Jian Wu, Old Dominion University, USA

More details available on the workshop website:

https://ornlcda.github.io/SDProc/

With kind regards,

SDP 2020 organizing committee

Muthu Kumar Chandrasekaran
Research Scientist II
Amazon, Day 1
2121 7th Ave,
Seattle, WA 98121
LinkedIn <https://linkedin.com/in/cmkumar/> | Google Scholar
Profile<https://scholar.google.com/citations?user=TNXPTz0AAAAJ&hl=en>


List information at http://www.inetbib.de.