Git History/Repository (Partial) Compaction

[2023-03-29]

Mmmh…

git count-objects -vH
# [output]
#count: 2535
#size: 12.09 MiB
#in-pack: 1589396
#packs: 43
#size-pack: 1.67 GiB

Don’t ask me where and how I ended up dealing with such a behemoth of a Git(Hub) repository. Suffice it to say that it utterly ruined the user-friendliness of the Git(Hub) experience, with the latency of everyday operations growing way past my Zen-master patience threshold.

Now, squashing the Git history and compacting a repository is something that is widely documented and discussed on the Wild Wicked Web, by minds way more enlightened than mine. What is less obvious is how to do so while keeping a portion of the history.

So bear with me (and my feeble mind)…

Partial Git Compaction Procedure (Proof of Concept)

# Create a throwaway repository with ten commits
mkdir -p /path/to/git/poc
cd /path/to/git/poc
git init
for n in 1 2 3 4 5 6 7 8 9 10; do
  echo "${n}" >> history
  git add history
  git commit -m "Commit N.${n}"
done
# [output]
#[master (root-commit) 7213819] Commit N.1
# [...]
#[master df39488] Commit N.5
# [...]
#[master c38f922] Commit N.10

# Check the repo content
cat history
# [output]
#1
# [...]
#10

# Show the repo statistics
git count-objects -vH
# [output]
#count: 30
#size: 120.00 KiB

# Tag the squash point: everything up to and including this commit gets squashed
git tag HISTORY_SQUASH df39488  # Commit N.5
git log --format=oneline
# [output]
#c38f922 (HEAD -> master) Commit N.10
# [...]
#df39488 (tag: HISTORY_SQUASH) Commit N.5
# [...]
#7213819 Commit N.1

# Check out the squash point (detached HEAD)
git checkout HISTORY_SQUASH
# [output]
#HEAD is now at df39488 Commit N.5

# Check the repo content
cat history
# [output]
#1
# [...]
#5

# Create a parentless (orphan) branch holding the content of the squash point
git checkout --orphan NEW
# [output]
#Switched to a new branch 'NEW'
git add .
git commit -m "HISTORY SQUASH ($(TZ=UTC date +'%FT%TZ'))"
# [output]
#[NEW (root-commit) 2ec572b] HISTORY SQUASH (2023-03-29T06:55:54Z)

# Replay the commits that came after the squash point
git cherry-pick HISTORY_SQUASH..master
# [output]
#[NEW 47a79c2] Commit N.6
# [...]
#[NEW 8209fcc] Commit N.10

# Replace the old master branch with the rewritten one
git branch -D master
# [output]
#Deleted branch master (was c38f922).

git branch -m NEW master
git log --format=oneline
# [output]
#8209fcc (HEAD -> master) Commit N.10
# [...]
#47a79c2 Commit N.6
#2ec572b HISTORY SQUASH (2023-03-29T06:55:54Z)

# Check the repo content
cat history
# [output]
#1
# [...]
#10

# Show the repo statistics
git count-objects -vH
# [output]
#count: 36
#size: 144.00 KiB

# Clean up: delete the tag, which still points at the old history
git tag -d HISTORY_SQUASH
# [output]
#Deleted tag 'HISTORY_SQUASH' (was df39488)  # Commit N.5

# Disconnect remote branch(es)
for remote in $(git remote show); do
  echo "Removing remote: ${remote}"
  echo "Add again: git remote add ${remote} $(git remote get-url "${remote}")"
  git remote remove "${remote}"
done

# Delete unreferenced objects
git reflog expire --all --expire=now --expire-unreachable=now --verbose
# [very verbose output]

# Garbage collect
git gc --aggressive --prune=now
# [output]
#Enumerating objects: 18, done.
#Counting objects: 100% (18/18), done.
#Delta compression using up to 8 threads
#Compressing objects: 100% (6/6), done.
#Writing objects: 100% (18/18), done.
#Total 18 (delta 0), reused 0 (delta 0), pack-reused 0

# Show the repo statistics
git count-objects -vH
# [output]
#count: 0
#size: 0 bytes
#in-pack: 18
#packs: 1
#size-pack: 2.94 KiB

This is it!
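
For the lazy ones (me included), the rewrite part of the procedure above can be condensed into a small shell function. Take it as a rough sketch rather than a polished tool: the function name `squash_history_before` and the temporary branch name are made up for the occasion, and it assumes a linear history (no merge commits) after the squash point, with at least one commit to replay on top of it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): squash everything up to and including
# SQUASH_POINT into a single root commit and replay the later commits of
# BRANCH on top of it.
# Assumptions: linear history after SQUASH_POINT, which is not the branch tip.
squash_history_before() {
  squash_point="$1"
  branch="${2:-master}"
  tmp_branch="squash-tmp-$$"

  # Start a parentless branch from the content of the squash point;
  # the index already holds that content, so no 'git add' is needed
  git checkout --quiet "${squash_point}" || return 1
  git checkout --quiet --orphan "${tmp_branch}" || return 1
  git commit --quiet -m "HISTORY SQUASH ($(TZ=UTC date +'%FT%TZ'))" || return 1

  # Replay the commits that come after the squash point
  git cherry-pick "${squash_point}..${branch}" >/dev/null || return 1

  # Replace the old branch with the rewritten one
  git branch --quiet -D "${branch}" || return 1
  git branch -m "${tmp_branch}" "${branch}"
}
```

In the proof of concept above, this would be called as `squash_history_before HISTORY_SQUASH master`, right after tagging the squash point. The clean-up steps (tag, remotes, reflog, garbage collection) are deliberately left out, since those are the destructive ones and deserve a pair of human eyes.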

PS: Before running git reflog expire and git gc, make sure to delete every tag, branch, remote, etc. that may still refer to commits older than the squash point you chose. Anything those references keep reachable will not be pruned.
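
One way to check for such leftovers is to ask Git which references still contain a commit from the old history, for example the original root commit (7213819 in the proof of concept). The sketch below relies on `git for-each-ref --contains`; the function name `refs_still_holding` is hypothetical. It has to run before the garbage collection, since the old commit must still exist in the object database.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): list every ref (branch, tag,
# remote-tracking branch, ...) whose history still contains OLD_COMMIT.
# An empty output means no ref keeps that commit reachable anymore.
refs_still_holding() {
  old_commit="$1"
  git for-each-ref --contains "${old_commit}" --format='%(refname)'
}
```

For instance, `refs_still_holding 7213819` run right after the branch swap would still list refs/tags/HISTORY_SQUASH, and nothing once that tag is deleted. Reflog entries are not refs and are not covered by this check; they are what git reflog expire takes care of.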