Q&A: How are researchers optimizing AI systems for science?

A group led by researchers at Penn State authored three papers detailing their improvements to AI systems

July 30, 2025

By Ty Tkacik

UNIVERSITY PARK, Pa. — Using services like ChatGPT or Microsoft Copilot can sometimes seem like magic — to the point it can be easy to forget about the advanced science running behind the scenes of any artificial intelligence (AI) system. Like any complex system, however, there is always room for improvement and optimization, according to Rui Zhang, assistant professor of computer science and engineering in the Penn State School of Electrical Engineering and Computer Science. 

Zhang and his research group recently authored three papers introducing new approaches to processing high-resolution images and automatically prompting better responses from AI systems. The papers, which are currently available online, are set to be published at the 63rd Annual Meeting of the Association for Computational Linguistics, July 27 through Aug. 1 in Vienna, Austria; the 2025 International Conference on Computer Vision, Oct. 19-23 in Honolulu, Hawaii; and the 13th International Conference on Learning Representations, April 24-28 in Singapore.  

In the following Q&A, Zhang discussed his group's work, how it can improve the efficiency and usefulness of AI and some strategies individuals can employ to get more value out of their personal AI use. 

Q: What is prompt engineering? Are there specific things readers can do to write better prompts for an AI system? 

Zhang: Prompt engineering is the process of designing effective inputs — or “prompts” — that guide AI systems like ChatGPT to produce better responses. Since these systems are sensitive to how questions are asked, a well-crafted prompt can significantly improve the system’s output. For example, instead of asking, “summarize this article,” you might say, “summarize this article in three bullet points for a high school student.” The extra context helps the AI tailor its response. For everyday users, the key strategies are to be clear, specific and goal-oriented — don’t be afraid to try multiple prompt versions to refine the results. 
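
For readers who want to experiment, the short Python sketch below (an editorial illustration, not part of Zhang's work) shows the same idea in code: a bare request versus one that adds an audience and an output format.

```python
# Illustrative only: turning a vague request into a clearer, more specific
# prompt by adding an audience and an output format.

def build_prompt(task, audience=None, output_format=None):
    """Assemble a prompt from a task plus optional context cues."""
    parts = [task]
    if audience:
        parts.append(f"Write the answer for {audience}.")
    if output_format:
        parts.append(f"Format the answer as {output_format}.")
    return " ".join(parts)


# The vague prompt versus the more specific version from the example above.
print(build_prompt("Summarize this article."))
print(build_prompt("Summarize this article.",
                   audience="a high school student",
                   output_format="three bullet points"))
```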

Q: What are the benefits of automating and optimizing prompt generation? 

Zhang: While good prompt engineering can greatly improve AI performance, writing the best prompt often takes time, experimentation and expertise in the subject matter at hand. In our research, we developed a method called GReaTer that allows AI systems to automatically generate and refine prompts using gradient-based optimization, the same class of algorithm used to train AI models themselves.  

We also developed GReaTerPrompt, a user-friendly and open-source toolkit built on the GReaTer method, which enables models to automatically generate and refine prompts for a wide range of tasks. Automating this process means AI can adapt to new tasks with less human input, improving accuracy, saving time and lowering costs. This is especially valuable for users who lack the time or expertise in a subject to come up with a better prompt. By providing an open-source toolkit, which is freely available for anyone to download, modify or share, we make our work accessible to all interested users. 
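
GReaTer itself optimizes the actual tokens of a prompt using gradients computed through a language model; the full procedure is described in the paper. As a rough editorial illustration of the underlying idea (gradient-based optimization applied to the prompt rather than to the model), the PyTorch sketch below tunes a few learnable prompt embeddings against a frozen stand-in model. The tiny "model," the dimensions and all names are invented for the sketch.

```python
# Toy sketch of gradient-based prompt optimization (illustrative only; the
# actual GReaTer method works on a language model's prompt tokens).
import torch

torch.manual_seed(0)

EMBED_DIM, PROMPT_LEN, NUM_CLASSES = 16, 4, 3

# Stand-in for a frozen pretrained model: maps a pooled prompt-plus-input
# embedding to class scores. Its weights are never updated.
frozen_model = torch.nn.Linear(EMBED_DIM, NUM_CLASSES)
for p in frozen_model.parameters():
    p.requires_grad_(False)

# A fixed "input" embedding and the answer we want the model to produce.
task_input = torch.randn(EMBED_DIM)
target = torch.tensor([2])

# The prompt is the only thing being optimized: a few learnable embeddings.
prompt = torch.randn(PROMPT_LEN, EMBED_DIM, requires_grad=True)
optimizer = torch.optim.Adam([prompt], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    # Pool the prompt embeddings with the input and score the answer.
    pooled = prompt.mean(dim=0) + task_input
    logits = frozen_model(pooled)
    loss = torch.nn.functional.cross_entropy(logits.unsqueeze(0), target)
    loss.backward()   # gradients flow into the prompt, not the frozen model
    optimizer.step()

print("final loss:", loss.item())
```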

Q: How did you measure the effectiveness of GReaTer? Are there real-world tools that could improve with its implementation? 

Zhang: We evaluated GReaTer on a wide variety of language reasoning and mathematical problem-solving tasks, such as answering complex questions, solving logic puzzles and performing mathematical computations. The results showed that GReaTer significantly improved performance compared to standard prompting, especially for smaller language models, which typically struggle with these tasks because their limited number of parameters leaves less capacity for specialized reasoning. In some cases, these GReaTer-optimized smaller models rivaled much larger ones in quality. Real-world applications that could benefit include AI-powered tutors, writing assistants, customer support agents and any tool that needs to adapt quickly to different users or topics without manual reprogramming. 
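
In practice, a head-to-head comparison like the one described here comes down to measuring answer accuracy under different prompts. The sketch below is an editorial stand-in for that kind of harness, not the paper's evaluation code; the tiny task list and the fake model are placeholders for a real benchmark and a real language model.

```python
# Editorial sketch of comparing two prompting strategies on a small task set.
# ask_model is a stand-in for querying a real language model.

def accuracy(prompt_template, tasks, ask_model):
    """Fraction of tasks answered correctly under a given prompt template."""
    correct = 0
    for question, expected in tasks:
        answer = ask_model(prompt_template.format(question=question))
        correct += answer.strip().lower() == expected.strip().lower()
    return correct / len(tasks)


# Placeholder "model" and tasks so the sketch runs end to end.
answers = {"2 + 2": "4", "3 * 3": "9", "10 - 7": "3"}

def fake_model(prompt):
    question = prompt.rsplit(":", 1)[-1].strip()
    return answers.get(question, "unknown")

tasks = [("2 + 2", "4"), ("3 * 3", "9"), ("10 - 7", "3")]
baseline = "Answer the question: {question}"
optimized = "Solve step by step, then give only the final number: {question}"

print("baseline accuracy :", accuracy(baseline, tasks, fake_model))
print("optimized accuracy:", accuracy(optimized, tasks, fake_model))
```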


Rui Zhang was joined by two members of his research group, Ryo Kamoi and Yusen Zhang, to present some of their previous work at last year's Conference on Language Modeling in Philadelphia. Credit: Provided by Rui Zhang.

Q: What is HRScene, and why do researchers care about “high-resolution image understanding”? 

Zhang: HRScene is a new benchmark we developed to evaluate how well modern vision-language models like GPT-4V, Gemini or Claude can understand high-resolution, information-dense images with millions of pixels. Although these models can answer questions about images using natural language, they often fall short when dealing with large, detailed visuals. High-resolution image understanding is critical because many real-world scientific and societal applications depend on subtle, localized details that may be missed by models not equipped to handle large-scale visual input. HRScene includes curated examples from domains like radiology, plant phenotyping, remote sensing and astronomy, which will help researchers assess these models more accurately and accelerate the development of AI systems capable of interpreting detailed visuals. 
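
To make the resolution problem concrete: many vision encoders resize every image to a small, fixed input size, so a multi-megapixel scan loses most of its pixels before the model ever sees it. The Pillow sketch below is an editorial illustration; the file name, the 336-pixel encoder size and the tiling step are assumptions for the sketch, not details of HRScene.

```python
# Illustration of why resolution matters: a fixed-size resize discards most
# of the detail in a high-resolution image. Names here are hypothetical.
from PIL import Image

ENCODER_SIZE = 336  # assumed fixed input side length for a vision encoder

image = Image.open("telescope_scan.png")  # hypothetical high-resolution image
width, height = image.size
pixels_original = width * height
pixels_after_resize = ENCODER_SIZE * ENCODER_SIZE

print(f"original: {width} x {height} = {pixels_original:,} pixels")
print(f"after a {ENCODER_SIZE}x{ENCODER_SIZE} resize: {pixels_after_resize:,} "
      f"pixels ({pixels_original / pixels_after_resize:.0f}x fewer)")

# One common mitigation is to crop the image into tiles and query the model
# tile by tile, so local detail survives the encoder's fixed input size.
TILE = 1024
tiles = [image.crop((x, y, min(x + TILE, width), min(y + TILE, height)))
         for y in range(0, height, TILE)
         for x in range(0, width, TILE)]
print(f"{len(tiles)} tiles of up to {TILE} x {TILE} pixels")
```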

Q: What are the applications of accurate and efficient high-resolution image processing? 

Zhang: The potential impact spans many scientific and social domains. In health care, high-resolution AI tools could help interpret radiology scans like MRIs or CTs more effectively, leading to earlier and more accurate diagnoses. In agriculture, AI could assist with plant phenotyping — analyzing traits like leaf structure or disease presence from detailed images — to improve crop yields and sustainability. In environmental science and public safety, high-resolution satellite imagery is used for disaster monitoring, urban planning and climate research. Astronomy research could also benefit, as researchers currently analyze telescope imagery at extremely high resolutions to detect faint or distant celestial objects. AI systems that can reliably process such data could accelerate scientific discovery, enhance public health tools and improve responses to global challenges. 

The articles’ co-authors at Penn State include Wenpeng Yin, assistant professor of computer science; Yusen Zhang, a computer science doctoral candidate who led research on HRScene; Sarkar Snigdha Sarathi Das, a computer science doctoral candidate who led research on GReaTer; Wenliang Zheng, a third-year computer science undergraduate student who led research on GReaTerPrompt; doctoral candidates Ryo Kamoi, Xiaoxin Lu, Nan Zhang, Ranran Haoran Zhang, Hao Zhou, Zhuoyang Zou, Shu Zhao, Vipul Gupta and Renze Lou; and graduate students Avitej Iyer and Aashrith Madasu. 

Additionally, Bo Pang, lead research scientist at Salesforce, and Caiming Xiong, vice president of AI research and applied AI at Salesforce, also contributed to this research. Funding from the U.S. National Science Foundation and Salesforce helped support this research. 

At Penn State, researchers are solving real problems that impact the health, safety and quality of life of people across the commonwealth, the nation and around the world.   

For decades, federal support for research has fueled innovation that makes our country safer, our industries more competitive and our economy stronger. Recent federal funding cuts threaten this progress.   

Learn more about the implications of federal funding cuts to our future at Research or Regress.    

 


MEDIA CONTACT:

College of Engineering Media Relations

communications@engr.psu.edu
