Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Skip large diffs when generating context for the LLM #2217

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions packages/cli/src/lib/processPatchset.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/**
* Examine all the diffs in a git diff output and replace long ones with a simple
* message indicating that there is a change and its size.
* @param patchset - The patchset to process.
* @param maxDiffLength - The maximum length of a diff to include in the patchset.
* @returns The processed patchset.
*/

export default function processPatchset(patchset: string, maxDiffLength = 5000): string {
const parts = patchset.split(/^(?=diff|commit)/m);
return parts
.map((part) => {
if (part.startsWith('diff')) {
const lines = part.split('\n');
const header = lines[0];
const body = lines.slice(1).join('\n');
if (body.length > maxDiffLength) {
return `${header}\n[Change of size ${body.trimEnd().length}]\n`;
}
}
return part;
})
.join('');
}
6 changes: 5 additions & 1 deletion packages/cli/src/rpc/explain/review.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import { parseOptions, REVIEW_DIFF_LOCATION, UserContext } from '@appland/navie';
import configuration from '../configuration';
import { getDiffLog, getWorkingDiff } from '../../lib/git';
import processPatchset from '../../lib/processPatchset';

/**
* This function is responsible for transforming user context to include diff content when the
Expand All @@ -27,7 +28,10 @@ export default async function handleReview(
{
type: 'code-snippet',
location: REVIEW_DIFF_LOCATION, // eslint-disable-line @typescript-eslint/no-unsafe-assignment
content: diffContent.filter(Boolean).join('\n\n'),
content: diffContent
.filter(Boolean)
.map((diff) => processPatchset(diff))
.join('\n\n'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been looking into this type of limiting as well, and one of the issues I see is that getDiffLog returns, as the name suggests, git log output not git diff output. So, splitting this output would seem to be more naturally done on the commit <sha> (author) rather than on the diff --git line.

WDYT?

commit 5f7fed0d7fab55e9b2cdc6cddbebd9644bca416c
Author: Kevin Gilpin <[email protected]>
Date:   Thu Jan 23 09:49:53 2025 -0500

    refactor: Remove unused code
    
    Locating the git repository - can't find any references to this any more

diff --git a/packages/cli/src/lib/git.ts b/packages/cli/src/lib/git.ts
index 5bd8b0a03..139b10eca 100644
--- a/packages/cli/src/lib/git.ts
+++ b/packages/cli/src/lib/git.ts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, maybe it would be better / easier if getDiffLog and getWorkingDiff returned an output that had some structure (at least an array of chunks?) rather than just a string that the caller then has to figure out how to split.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've actually noticed this too and fixed it in the amend below :)

},
],
};
Expand Down
3 changes: 2 additions & 1 deletion packages/cli/src/rpc/navie/welcome-suggestion.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import configuration from '../configuration';
import { getDiffLog, getWorkingDiff } from '../../lib/git';
import processPatchset from '../../lib/processPatchset';
import INavie, { INavieProvider } from '../explain/navie/inavie';
import { UserContext } from '@appland/navie';
import isCustomWelcomeMessageEnabled from './isCustomWelcomeMessageEnabled';
Expand Down Expand Up @@ -40,7 +41,7 @@ async function getChangeDiffs(projectDirectories: string[]): Promise<string[]> {
);
return diffs
.filter((result): result is PromiseFulfilledResult<string> => result.status === 'fulfilled')
.map((result) => result.value);
.map((result) => processPatchset(result.value));
}

export async function getWelcomeSuggestion(
Expand Down
161 changes: 161 additions & 0 deletions packages/cli/tests/unit/lib/processPatchset.spec.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
import processPatchset from '../../../src/lib/processPatchset';

describe('processPatchset', () => {
it('should handle patchset with small diffs correctly', () => {
const patchset = `diff --git a/file1.txt b/file1.txt
index 83db48f..f735c4e 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is a test file.`;

const result = processPatchset(patchset, 1000);
expect(result).toContain('Hello AppMap');
});

it('should have the correct output format', () => {
const largeDiff = 'a'.repeat(2000);
const patchset = `Here's some header.
diff --git a/file1.txt b/file1.txt
index 83db48f..f735c4e 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is a test file.
diff --git a/file2.txt b/file2.txt
index 83db48f..f735c4e 100644
--- a/file2.txt
+++ b/file2.txt
@@ -1,3 +1,3 @@
${largeDiff}
diff --git a/file3.txt b/file3.txt
index 83db48f..f735c4e 100644
--- a/file3.txt
+++ b/file3.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is another test file.`;

const result = processPatchset(patchset, 1000);
expect(result).toMatchInlineSnapshot(`
"Here's some header.
diff --git a/file1.txt b/file1.txt
index 83db48f..f735c4e 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is a test file.
diff --git a/file2.txt b/file2.txt
[Change of size 2078]
diff --git a/file3.txt b/file3.txt
index 83db48f..f735c4e 100644
--- a/file3.txt
+++ b/file3.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is another test file."
`);
});

it('should handle patchsets with multiple commit headers like from git log', () => {
const patchset = `commit 1
Author: John Doe <[email protected]>
Date: Mon Jan 1 12:00:00 2023 +0000

Initial commit

diff --git a/file1.txt b/file1.txt
index 83db48f..f735c4e 100644
--- a/file1.txt
+++ b/file1.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is a test file.

commit 2
Author: Jane Doe <[email protected]>
Date: Tue Jan 2 12:00:00 2023 +0000

Second commit

diff --git a/file2.txt b/file2.txt
index 83db48f..f735c4e 100644
--- a/file2.txt
+++ b/file2.txt
@@ -1,3 +1,3 @@
-Hello World
+Hello AppMap
This is another test file.`;

const result = processPatchset(patchset, 10);
expect(result).toMatchInlineSnapshot(`
"commit 1
Author: John Doe <[email protected]>
Date: Mon Jan 1 12:00:00 2023 +0000

Initial commit

diff --git a/file1.txt b/file1.txt
[Change of size 126]
commit 2
Author: Jane Doe <[email protected]>
Date: Tue Jan 2 12:00:00 2023 +0000

Second commit

diff --git a/file2.txt b/file2.txt
[Change of size 132]
"
`);
});

it('should handle patchset with large diffs correctly', () => {
const largeDiff = 'a'.repeat(2000);
const patchset = `diff --git a/file2.txt b/file2.txt
index 83db48f..f735c4e 100644
--- a/file2.txt
+++ b/file2.txt
@@ -1,3 +1,3 @@
${largeDiff}`;

const result = processPatchset(patchset, 1000);
expect(result).toContain('[Change of size 2078]');
});

it('should handle patchset with mixed diff sizes correctly', () => {
const largeDiff = 'a'.repeat(2000);
const smallDiff = 'b'.repeat(500);
const patchset = `diff --git a/file3.txt b/file3.txt
index 83db48f..f735c4e 100644
--- a/file3.txt
+++ b/file3.txt
@@ -1,3 +1,3 @@
${largeDiff}
diff --git a/file4.txt b/file4.txt
index 83db48f..f735c4e 100644
--- a/file4.txt
+++ b/file4.txt
@@ -1,3 +1,3 @@
${smallDiff}`;

const result = processPatchset(patchset, 1000);
expect(result).toContain('[Change of size 2078]');
expect(result).toContain(smallDiff);
});

it('should return other text unchanged', () => {
const diffs = ['This is some text with no diffs.', '', '- this\n+ that'];
const result = diffs.map((diff) => processPatchset(diff));
expect(result).toEqual(diffs);
});
});

Loading