ANALYZING THE CONSEQUENCES OF PROMPTS ON SOFTWARE DEVELOPERS USING DEEP LEARNING AND GEN-AI

Muhammad Murtaza Khan; Naeem Aslam; Umair Rashid; Ahmad Naeem; Syed Shahid Abbas; Mohsin Ikhlaq Jaam

Keywords:

Generative AI, prompt engineering, software development, AI-generated code, developer satisfaction

Abstract

This paper reports an empirical investigation of how prompt engineering affects the performance of generative artificial intelligence (Gen-AI) tools for software development. Specifically, it compares vague, goal-specific, and step-by-step prompts and examines how prompt structure can make AI-generated code more accurate, more efficient to produce, and more satisfying for developers. Our experiments find that structured prompts substantially outperform vague prompts across several metrics. Goal-specific prompts yielded an accuracy improvement of close to 25% and step-by-step prompts an improvement of 40%, and both cut task completion time in half. Structured prompts also raised developer satisfaction by 30-40%, suggesting that developers prefer clearer, more direct instructions. Beyond accuracy and time efficiency, structured prompts provided a 20-25% gain in precision and recall, with step-by-step prompts showing the greatest improvement in both measures. F1-scores were consistently higher for step-by-step prompts, indicating a better balance between precision and recall. Goal-specific and step-by-step prompts were also associated with a 25-30% reduction in error rates, further supporting their use as a way to reduce the need for correction. This paper presents, for the first time, a systematic comparison of prompt types and a quantitative analysis of their contribution to AI-generated code. By empirically demonstrating the effects of prompt structure on the performance of Gen-AI tools, it provides concrete evidence for how to structure human-AI collaboration in software development effectively. The results show that prompt engineering is a critical enabler for realizing the full potential of Gen-AI tools and for improving developer productivity and satisfaction.
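
For readers less familiar with the measures named in the abstract, precision, recall, and the F1-score are conventionally computed from counts of true positives, false positives, and false negatives. The short Python sketch below assumes these standard definitions; the abstract does not state how the counts were obtained in the experiments, and the function name `precision_recall_f1` and the example counts are hypothetical illustrations, not values from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    Standard textbook definitions (an assumption; not taken from the paper).
    Any ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only (hypothetical, not results from the paper).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high, which is why a higher F1-score is read as a more balanced result.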