I've often found the numbers displayed when using git diff to compare two files confusing. What exactly do those numbers represent? Recently, while working on a project that gathers code review statistics from GitHub pull requests, I realized that understanding these numbers is crucial. Lines of code (LOC) information, for instance, is something I wanted to retrieve. Initially, I struggled to obtain this data through the GitHub API. However, I suspected it might be related to the numbers in git diff, which prompted me to dig into the code patches within pull requests.
Diff result
Each set of changes displayed in a git diff is referred to as a "Hunk." This concept isn't unique to Git. Let's take an example from one of my projects, where I generated the output by comparing two commits using the command line. You can obtain the same result on GitHub.
diff --git a/src/models/event-data.ts b/src/models/event-data.ts
index f658fec..9e7f48e 100644
--- a/src/models/event-data.ts
+++ b/src/models/event-data.ts
@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"
export class EventData {
context: Context;
db: EventDB;
+ dataFilePath: string;
constructor(context: Context) {
this.context = context;
this.db = { ranking: [] };
+ this.dataFilePath = process.env.DATA_FILE_PATH || '';
+
+ if (this.dataFilePath == null) {
+ throw new Error('DATA_FILE_PATH is missing in the environment variable.')
+ }
}
async load(context?: Context) {
@@ -18,11 +24,10 @@ export class EventData {
}
const repo = new Repo(context as any);
- const dataFilePath = 'data/ranking.json';
- let contentResponse: OctokitResponse<any, number> = await repo.getContent(dataFilePath)
+ let contentResponse: OctokitResponse<any, number> = await repo.getContent(this.dataFilePath)
let buffer = Buffer.from(contentResponse.data.content, 'base64');
- let data = buffer.toString('ascii');
+ let data = buffer.toString('utf-8');
this.db = JSON.parse(data);
@@ -45,6 +50,28 @@ export class EventData {
console.log('type: ', eo.type);
console.log('will save eo to data.json');
console.log('>>>>> db is looks like:', this.db);
+
+ let message = `rank: ${eo.receiver} -> ${eo.points} point(s)`;
+
+ this.sync(message, 'main');
+ }
+
+ async sync(message: string, branch: string = 'main', context?: Context) {
+ if (context == null) {
+ context = this.context
+ }
+
+ const content = JSON.stringify(this.db);
+ const repo = new Repo(context as any);
+ const currentCommit = await repo.getCurrentCommit(branch);
+ const fileBlob = await repo.createBlob(content, 'utf-8');
+ const pathsForBlobs = [this.dataFilePath];
+ const newTree = await repo.createNewTree([fileBlob], pathsForBlobs, currentCommit.treeSha);
+ const newCommit = await repo.createCommit(message, newTree.sha, currentCommit.commitSha);
+
+ await repo.updateRef(branch, newCommit.data.sha);
+
+ console.log('database sync done.');
}
private add(eo: EventObject) {
Let's start from the header.
diff --git a/src/models/event-data.ts b/src/models/event-data.ts
index f658fec..9e7f48e 100644
The first two lines tell us that the diff is in Git's format (--git) and which files are being compared. The index line then gives the abbreviated hashes of the two file versions (f658fec..9e7f48e), followed by the file mode (100644).
--- a/src/models/event-data.ts
+++ b/src/models/event-data.ts
The next two lines repeat the file name, each with a marker. The base file (---) comes first and the compare file (+++) second. Lines that exist in the base file but not in the compare file are prefixed with - and are usually displayed in red. Lines that exist in the compare file but not in the base file are prefixed with + and are usually displayed in green.
@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"
export class EventData {
context: Context;
db: EventDB;
+ dataFilePath: string;
constructor(context: Context) {
this.context = context;
this.db = { ranking: [] };
+ this.dataFilePath = process.env.DATA_FILE_PATH || '';
+
+ if (this.dataFilePath == null) {
+ throw new Error('DATA_FILE_PATH is missing in the environment variable.')
+ }
}
async load(context?: Context) {
Now, let's figure out what those numbers mean.
The first line of the hunk is a header. @@ -6,10 +6,16 @@ indicates that the hunk shows 10 lines of the base file, starting from line 6, and 16 lines of the compare file, also starting at line 6. We will come back to the rest of the header later.
The body of the hunk contains 10 lines without decoration (blank lines included); they appear in both the base file and the compare file. There are 0 lines decorated with -. 10 + 0 = 10, which is why we get -6,10. A total of 6 lines from the compare file are decorated with +. 10 + 6 = 16, which is why we get +6,16.
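To make the header layout concrete, here is a minimal sketch that pulls the four numbers out of a hunk header. The names HunkRange and parseHunkHeader are my own, not part of Git or any library, and the sketch assumes the usual unified-diff convention that a missing count means 1.

```typescript
// Parsed form of a hunk header such as "@@ -6,10 +6,16 @@".
// HunkRange and parseHunkHeader are illustrative names, not a Git API.
interface HunkRange {
  oldStart: number; // first line shown from the base file
  oldCount: number; // number of base-file lines in the hunk
  newStart: number; // first line shown from the compare file
  newCount: number; // number of compare-file lines in the hunk
}

// Returns null when the line is not a hunk header.
// A missing count (e.g. "@@ -3 +3 @@") is treated as 1.
function parseHunkHeader(header: string): HunkRange | null {
  const match = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/.exec(header);
  if (match === null) {
    return null;
  }
  return {
    oldStart: Number(match[1]),
    oldCount: match[2] === undefined ? 1 : Number(match[2]),
    newStart: Number(match[3]),
    newCount: match[4] === undefined ? 1 : Number(match[4]),
  };
}

const range = parseHunkHeader(
  '@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"'
);
console.log(range);
// { oldStart: 6, oldCount: 10, newStart: 6, newCount: 16 }
```

Anything after the second @@ is ignored here, since it is only the context text Git appends to the header.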
The next hunk is a little bit confusing.
@@ -18,11 +24,10 @@ export class EventData {
}
const repo = new Repo(context as any);
- const dataFilePath = 'data/ranking.json';
- let contentResponse: OctokitResponse<any, number> = await repo.getContent(dataFilePath)
+ let contentResponse: OctokitResponse<any, number> = await repo.getContent(this.dataFilePath)
let buffer = Buffer.from(contentResponse.data.content, 'base64');
- let data = buffer.toString('ascii');
+ let data = buffer.toString('utf-8');
this.db = JSON.parse(data);
If you open this commit on GitHub, you will find that the first line of this hunk is line 24, not 18. The @@ -18,11 +24,10 @@ here means Git takes 11 lines of code (starting from line 18) from the base file and compares them with 10 lines of code (starting from line 24) from the compare file. Remember the 6 additions in the first hunk? 18 + 6 = 24. I hope this picture helps if you are still confused.
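The 18 + 6 = 24 arithmetic generalizes: a hunk's starting line in the compare file is its starting line in the base file plus the net growth of every hunk before it. The sketch below checks this against the three hunk headers from the diff in this post. predictNewStarts is a name I made up, and the rule as written assumes hunks whose line counts are not zero.

```typescript
// The four numbers of a hunk header (illustrative type, not a Git API).
interface HunkRange {
  oldStart: number;
  oldCount: number;
  newStart: number;
  newCount: number;
}

// The three hunk headers from the diff in this post.
const hunks: HunkRange[] = [
  { oldStart: 6, oldCount: 10, newStart: 6, newCount: 16 },
  { oldStart: 18, oldCount: 11, newStart: 24, newCount: 10 },
  { oldStart: 45, oldCount: 6, newStart: 50, newCount: 28 },
];

// Predict each hunk's start line in the compare file from its start line in
// the base file plus the net growth (newCount - oldCount) of earlier hunks.
function predictNewStarts(ranges: HunkRange[]): number[] {
  let offset = 0;
  return ranges.map((h) => {
    const predicted = h.oldStart + offset;
    offset += h.newCount - h.oldCount;
    return predicted;
  });
}

console.log(predictNewStarts(hunks)); // [ 6, 24, 50 ]
```

The third value works the same way: the first hunk grows the file by 6 lines and the second shrinks it by 1, so 45 + 6 - 1 = 50.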
@@ -45,6 +50,28 @@ export class EventData {
console.log('type: ', eo.type);
console.log('will save eo to data.json');
console.log('>>>>> db is looks like:', this.db);
+
+ let message = `rank: ${eo.receiver} -> ${eo.points} point(s)`;
+
+ this.sync(message, 'main');
+ }
+
+ async sync(message: string, branch: string = 'main', context?: Context) {
+ if (context == null) {
+ context = this.context
+ }
+
+ const content = JSON.stringify(this.db);
+ const repo = new Repo(context as any);
+ const currentCommit = await repo.getCurrentCommit(branch);
+ const fileBlob = await repo.createBlob(content, 'utf-8');
+ const pathsForBlobs = [this.dataFilePath];
+ const newTree = await repo.createNewTree([fileBlob], pathsForBlobs, currentCommit.treeSha);
+ const newCommit = await repo.createCommit(message, newTree.sha, currentCommit.commitSha);
+
+ await repo.updateRef(branch, newCommit.data.sha);
+
+ console.log('database sync done.');
}
private add(eo: EventObject) {
The last hunk is straightforward. Comparing the 6 lines from the base file with the 28 lines from the compare file, we get 22 lines of additions.
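The same bookkeeping can be done by counting prefixes in a hunk body. The following is a rough sketch, not how Git computes anything internally: countHunkLines is a hypothetical helper, and the sample hunk is hand-written for illustration rather than taken from the diff above.

```typescript
interface LineCounts {
  context: number;
  additions: number;
  deletions: number;
}

// Count the lines of one hunk body, i.e. the lines after the "@@ ... @@"
// header. File header lines ("---" / "+++") must not be passed in.
function countHunkLines(body: string[]): LineCounts {
  const counts: LineCounts = { context: 0, additions: 0, deletions: 0 };
  for (const line of body) {
    const marker = line.charAt(0);
    if (marker === "+") {
      counts.additions += 1;
    } else if (marker === "-") {
      counts.deletions += 1;
    } else if (marker !== "\\") {
      // Anything else is context; "\ No newline at end of file" is skipped.
      counts.context += 1;
    }
  }
  return counts;
}

// Hypothetical hunk whose header would be "@@ -1,3 +1,4 @@".
const sampleBody = [
  " const a = 1;",
  "-const b = 2;",
  "+const b = 3;",
  "+const c = 4;",
  " export { a };",
];

const sampleCounts = countHunkLines(sampleBody);
console.log(sampleCounts); // { context: 2, additions: 2, deletions: 1 }
console.log(sampleCounts.context + sampleCounts.deletions); // 3, the base-file count
console.log(sampleCounts.context + sampleCounts.additions); // 4, the compare-file count
```

Context plus deletions gives the count on the - side of the header, and context plus additions gives the count on the + side, which is the same arithmetic used for the hunks above.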
Hunk header
Let's break down the hunk header:
@@ -6,10 +6,16 @@ import { EventDB, EventObject } from "../types/ranking-board"
@@ -18,11 +24,10 @@ export class EventData {
@@ -45,6 +50,28 @@ export class EventData {
The hunk header provides not only the line range but also a piece of context text. Here it shows an import or export line after the second @@. This may not be the line you expect, because Git follows its own rules for selecting the line of text shown in the hunk header (see Defining a custom hunk-header). I won't go further into it because it doesn't play an important role here.
GitHub API
GitHub lets developers get the additions and deletions easily through the GitHub API. Somehow, I didn't figure that out at first. But thanks to that, I learned how to read git diffs.
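As a rough illustration, the REST endpoint that lists the files of a pull request returns, for each file, fields such as filename, additions, deletions, and changes. The sketch below only sums those fields over entries that have already been fetched. PullRequestFile and sumLineChanges are my own names, and the sample data is hand-written: the first entry uses the counts from the diff in this post (6 + 2 + 22 additions, 3 deletions), the second is hypothetical.

```typescript
// Subset of the per-file fields, as I understand the response shape.
interface PullRequestFile {
  filename: string;
  additions: number;
  deletions: number;
  changes: number;
}

// Total the additions and deletions across all files of a pull request.
function sumLineChanges(files: PullRequestFile[]): { additions: number; deletions: number } {
  let additions = 0;
  let deletions = 0;
  for (const file of files) {
    additions += file.additions;
    deletions += file.deletions;
  }
  return { additions, deletions };
}

// Hand-written sample data, not real API output.
const sampleFiles: PullRequestFile[] = [
  { filename: "src/models/event-data.ts", additions: 30, deletions: 3, changes: 33 },
  { filename: "README.md", additions: 2, deletions: 1, changes: 3 }, // hypothetical file
];

console.log(sumLineChanges(sampleFiles)); // { additions: 32, deletions: 4 }
```

Fetching the entries themselves is left out on purpose, since authentication and pagination depend on the project.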
References:
Howto: Reading Git Diffs and Staging Hunks
Where does the excerpt in the git diff hunk header come from?