InteNSE'24: The 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering

(Second Edition: Large Language Models)


April 15, 2024

Congress and Conference Centre: Centro Cultural de Belém (Daciano da Costa)
[Co-located with ICSE'24, Lisbon, Portugal]

Call for Papers

Important Dates:

  • Submission Deadline: Dec 15, 2023
  • Notification: Jan 11, 2024
  • Camera-Ready: Jan 25, 2024

Important Links:

  • Submission Website: HotCRP
  • Workshop Website: https://intense24.github.io/
  • Twitter: @IntenseWorkshop

Why InteNSE?

Do you have preliminary results on leveraging Large Language Models (LLMs) in a code-related task? Then, we invite you to submit your new ideas and preliminary results to the InteNSE workshop!

The InteNSE workshop aims to build the awareness and knowledge the SE research community needs to interpret neural models and design robust AI artifacts for software analysis and engineering. The InteNSE program brings together academia and industry to seek well-founded practical solutions. Specifically, the workshop invites international experts from academia and industry with both AI and SE backgrounds to

  • Discuss their research,
  • Evaluate prior AI4SE research, and
  • Identify a research roadmap.

This year, we will focus on LLMs, including general-purpose LLMs such as GPT-4 and Llama 2, and code LLMs such as StarCoder, CodeGeeX, and CodeLlama. The workshop will consist of invited talks, presentations of research papers, a hackathon-style session in which attendees interpret various aspects of LLMs and assess their robustness and generalizability, and a panel discussion in which all participants are invited to share insights and ideas toward a research roadmap.

Research Papers: We accept three types of papers:

  • Full Workshop Papers: 6-page papers related to the workshop's topics of interest that (1) present novel research ideas and preliminary results or (2) reproduce the results of a previously published paper with a new dataset (to assess robustness and generalizability).
  • Posters: 2-page papers related to the workshop's topics of interest that (1) state a vision or position or (2) present novel ideas without preliminary results. For the second category, authors should clearly describe their evaluation plan.
  • Journal-First Papers: Authors of papers accepted to TSE, TOSEM, EMSE, JSS, or IST are welcome to present their work at the workshop. Journal-first papers do not appear in the workshop proceedings, but their authors can advertise the work through the InteNSE workshop program.

Topics of Interest

We welcome research related to different aspects of software/code, including code completion and synthesis, program analysis, software testing and debugging, formal verification and proof synthesis, neurosymbolic programming, and prompting. Specifically, we are interested in both theoretical and empirical papers that explore one or more of the following perspectives related to ML4Code (while the focus of this year is LLMs, we welcome research on smaller models as well):

  • Interpretability:
    • Why interpret neural models of code?
    • How to interpret neural models of code?
    • What are the limitations of ML4Code models?
    • How to leverage interpretability for improving neural models of code?
    • Do the neural models perceive the code the same way humans do?
  • Robustness:
    • What are the consequences of brittle neural models of code?
    • How to assess the robustness/generalizability of neural models of code?
    • How to quantify the robustness/generalizability of neural models of code?
    • How to develop/train robust neural models of code?
    • What are the impacts of robustness on other requirements (e.g., generalization)?
  • Benchmarking:
    • How to collect benchmarks to better evaluate code language models?
    • Why and how can hand-crafted, small benchmarks be misleading when evaluating code language models?
  • Application:
    • False positives and false negatives that result in performance degradation.
    • Model uncertainty, out-of-distribution, and out-of-sample detection.
    • Biases in the model or data that question the fairness of the model.
    • Privacy and confidentiality issues.
    • Accountability and legal challenges.
    • Security-critical applications (e.g., addressing adversarial examples and data reconstruction).
    • Other applications for neural software engineering/analysis.

Submission Format

Submissions must conform to the ICSE 2024 Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[sigconf,review,anonymous]{acmart} without the `compsoc` or `compsocconf` options). The page limit is strict, and the purchase of additional pages in the proceedings is not allowed. The official publication date of the workshop proceedings is the date the proceedings are made available, which may be up to two weeks prior to the first day of ICSE 2024. InteNSE will employ a double-blind review process, and no submission may reveal its authors' identities. Authors must make every effort to honor this process: in particular, the authors' names must be omitted from the submission, and references to their prior work should be in the third person. The workshop will follow the ACM SIGSOFT rules on Conflicts of Interest and Confidentiality of Submissions.

Program (Monday 15 April)

9:00-9:20 (Paper) An Empirical Comparison of Code Generation Approaches for Ansible
Benjamin Darnell (University of California, Santa Barbara), Hetarth Chopra (University of Illinois at Urbana-Champaign), Aaron Councilman (Univ of Illinois Urbana-Champaign), David Grove (IBM Research), Vikram S. Adve (University of Illinois at Urbana-Champaign)
9:20-10:30 (Keynote) Towards an Interpretable Science of Deep Learning for Software Engineering: A Causal Inference View
Denys Poshyvanyk (William & Mary)
11:00-12:10 (Keynote) Assured LLM-Based Software Engineering
Mark Harman (Meta Platforms, Inc. and UCL)
12:10-12:30 (Paper) An Exploratory Study on How Non-Determinism in Large Language Models Affects Log Parsing
Merve Astekin (Simula Research Laboratory), Max Hort (Simula Research Laboratory), and Leon Moonen (Simula Research Laboratory and BI Norwegian Business School)
13:30-15:30 (Tutorial) Tutorial on Neuro-symbolic Programming
Swarat Chaudhuri (University of Texas at Austin) and Atharva Sehgal (University of Texas at Austin)
16:00-16:30 (Talk) Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
Shizhuo Zhang (University of Illinois Urbana-Champaign)
16:30-17:00 (Talk) SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
John Yang (Princeton)
17:00-17:30 (Day Closing) InteNSE 2024 (Closing Remarks)
Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)

Organization Committee

  • Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)
  • Saeid Tizpaz-Niari (University of Texas at El Paso)
  • Earl T. Barr (University College London)
  • Satish Chandra (Google)