Rewriting Git history is a powerful, yet potentially perilous, endeavor. When Git Large File Storage (LFS) enters the picture, the complexity escalates significantly. Whether you’re aiming to retroactively migrate large files to LFS, remove sensitive data, or clean up a repository’s past, understanding the right tools and techniques is crucial. While git filter-branch is a command many recall for history manipulation, its use is now strongly discouraged, especially in LFS contexts. Similarly, git replace offers unique capabilities but isn’t a silver bullet for LFS migrations.
This guide provides experienced developers and DevOps engineers with a definitive path to navigating advanced Git LFS history rewriting. We’ll explore the pitfalls of outdated methods and detail modern, safer, and more efficient solutions using git lfs migrate and git-filter-repo, ensuring your repository remains clean, performant, and correctly configured.
The Perils of git filter-branch with LFS
For years, git filter-branch was the go-to tool for complex history rewrites. However, it’s notoriously slow, cumbersome, and fraught with risks, including potential repository corruption if misused. The official Git documentation itself now heavily advises against its use, recommending alternatives like git-filter-repo.
When LFS is involved, filter-branch’s shortcomings are magnified:
- LFS Unawareness: git filter-branch doesn’t inherently understand LFS. Rewriting history might correctly remove a large file’s blob from Git history but fail to replace it with a proper LFS pointer or update the .gitattributes file consistently across all rewritten commits.
- Pointer Corruption: Incorrectly manipulating commits can lead to broken LFS pointers, where Git expects an LFS object that doesn’t exist or is misreferenced.
- .gitattributes Inconsistency: LFS relies on .gitattributes to define tracked files. filter-branch scripts would need extremely careful, custom logic to manage this file correctly through history, which is error-prone.
- Performance Nightmare: On large repositories, filter-branch can take hours or even days, making iterative refinements or corrections impractical.
Attempting to use filter-branch to, for example, move all .zip files to LFS retroactively would involve complex scripting with --tree-filter or --index-filter, manually adding files to LFS tracking, and committing changes for each affected commit. This process is highly susceptible to errors.
Even a carefully written filter script only hints at the complexity and doesn’t fully address replacing file content with pointers. Do not run git filter-branch for LFS tasks; use the modern tools discussed next.
Modern Gold Standard: git lfs migrate and git-filter-repo
Fortunately, robust and LFS-aware tools are available that make history rewriting safer and more efficient.
git lfs migrate: The Purpose-Built LFS History Rewriter
The git lfs migrate command is specifically designed by the LFS team for converting files in Git history to and from LFS. As detailed in the git-lfs-migrate(1) man page, it correctly handles LFS pointer creation/removal and .gitattributes updates.
Migrating Files to LFS (import)
To move existing large files from Git history into LFS, run git lfs migrate import with --everything and an --include filter, for example git lfs migrate import --everything --include="*.psd,*.mov". This command scans all reachable commits across all local and remote references (--everything) and converts any files matching *.psd or *.mov into LFS pointers. It also adds or updates the .gitattributes file accordingly throughout the history.
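Because the import rewrites every matching commit, it can help to preview the effect first. The sketch below assumes git-lfs is installed and that you are working in a disposable clone; the helper names join_patterns and preview_then_import are illustrative and not part of Git LFS. It relies on git lfs migrate info, which reports the file types that would be affected without rewriting anything.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; run inside a backed-up clone.

# Join glob patterns into the comma-separated list that --include expects.
join_patterns() {
  local IFS=,
  printf '%s\n' "$*"
}

# Report what would be migrated, then rewrite history for the given patterns.
preview_then_import() {
  local include
  include=$(join_patterns "$@")
  # Read-only: summarizes matching file types and their total size.
  git lfs migrate info --everything --include="$include"
  # Destructive: rewrites the matching commits and updates .gitattributes.
  git lfs migrate import --everything --include="$include"
}

# Usage (not executed here): preview_then_import '*.psd' '*.mov'
```

Keeping the preview and the import behind one pattern list avoids the two commands drifting apart when the list changes.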
Ensure your .gitattributes file itself is committed before running migrate operations if you want it to be part of the history from the beginning.
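For more targeted migrations, you can restrict the rewrite to specific branches (for example with --include-ref) instead of using --everything. If .gitattributes already lists patterns whose files were committed as regular blobs, the --fixup option of import repairs them.
The --fixup option infers the include and exclude filters for each commit from the .gitattributes files in the repository and converts any matching files that are not yet pointers. It cannot be combined with explicit --include or --exclude filters.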
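A minimal sketch of both variants follows. The branch names, the *.bin pattern, and the helper function names are placeholders chosen for illustration; only the git lfs migrate options themselves (--include-ref, --include, --everything, --fixup) come from Git LFS.

```shell
#!/usr/bin/env bash
# Sketch only: branch names and the *.bin pattern are placeholders.

# Turn branch names into --include-ref arguments, one per line.
include_ref_args() {
  local branch
  for branch in "$@"; do
    printf '%s%s\n' '--include-ref=refs/heads/' "$branch"
  done
}

# Import *.bin files on the named branches only.
import_on_branches() {
  local args
  args=$(include_ref_args "$@")
  # Word splitting is intended: branch names contain no whitespace.
  # shellcheck disable=SC2086
  git lfs migrate import $args --include="*.bin"
}

# Convert files that .gitattributes already marks for LFS but that were
# committed as regular blobs. --fixup takes no --include/--exclude.
fixup_tracked_files() {
  git lfs migrate import --everything --fixup
}

# Usage (not executed here): import_on_branches main develop
```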
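Migrating Files from LFS (export)
To convert LFS pointers back into regular Git blobs (removing them from LFS tracking), use git lfs migrate export, which requires an --include filter, for example git lfs migrate export --everything --include="*.tmp". This command replaces LFS pointers for *.tmp files with their actual content and updates .gitattributes to stop tracking them.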
git-filter-repo: The Superior General History Rewriter
For general history rewriting tasks beyond simple LFS conversions, git-filter-repo is the recommended tool. It’s significantly faster and safer than git filter-branch. You can find its documentation and installation instructions on GitHub.
While git lfs migrate is often preferred for pure LFS tasks, git-filter-repo can be invaluable:
- Pre-LFS Cleanup: Before an LFS migration, you might want to remove accidentally committed large files or sensitive data entirely from history.

    # BACKUP YOUR REPOSITORY FIRST!
    # Example: Completely remove a mistakenly committed large log file
    git filter-repo --invert-paths --path my-large-accidental-log.txt

  This removes my-large-accidental-log.txt from all commits.
- LFS Conversion Alongside Other Rewrites: git-filter-repo’s documented options do not include a --to-git-lfs flag or any other built-in conversion of blobs to LFS pointers (check git filter-repo --help for your installed version). If you’re already using git-filter-repo for other transformations, run those first and then convert the remaining files with git lfs migrate import.
Choosing Between git lfs migrate and git-filter-repo:
For LFS conversions, git lfs migrate is purpose-built and usually simpler. If you also need other history manipulations (e.g., changing commit messages, altering paths), do those with git-filter-repo and leave the pointer conversion to git lfs migrate, accepting that this means two rewrite passes.
Understanding git replace: Non-Destructive History Alteration
The git replace command offers a different approach: it allows you to substitute one Git object (like a commit) with another without actually rewriting history. The original object remains, but Git commands will operate on the replacement when asked for the original. The git-replace(1) man page provides details.
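The behavior is easiest to see in a throwaway repository. The following sketch needs only plain Git; the repository, file name, and commit messages are invented for the demonstration. It replaces a commit that has a typo in its message and then shows that the branch tip keeps its SHA while Git reads the replacement.

```shell
#!/usr/bin/env bash
set -eu
# Throwaway repository; all names and messages here are illustrative.
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name="Demo" -c user.email="demo@example.com" "$@"; }

g init -q
echo one > "$repo/file.txt"
g add file.txt
g commit -q -m "Add flie"            # message with a typo
echo two >> "$repo/file.txt"
g commit -q -am "Extend file"

old=$(g rev-parse HEAD~1)
head_before=$(g rev-parse HEAD)

# Build a corrected commit with the same tree, then register the replacement.
new=$(g commit-tree -m "Add file" "$old^{tree}")
g replace "$old" "$new"

subject_with=$(g log -1 --format=%s "$old")                          # reads the replacement
subject_without=$(g --no-replace-objects log -1 --format=%s "$old")  # reads the original
head_after=$(g rev-parse HEAD)                                       # branch tip is untouched

echo "with replace:    $subject_with"
echo "without replace: $subject_without"
```

The replacement lives in refs/replace/ and is local until that ref is pushed or fetched explicitly, which is why the original object and every existing SHA stay intact.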
Key Use Cases:
- Temporary Fixes: “Patching” a bad commit in history without forcing all collaborators to deal with rewritten SHAs (though the replacements need to be shared).
- Grafting History: Connecting unrelated lines of history.
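- Keeping old SHAs resolvable after git-filter-repo: git-filter-repo can add refs/replace/ entries that map each old commit ID to its rewritten counterpart (see its --replace-refs option), so references to pre-rewrite SHAs still resolve in that repository. This is bookkeeping for a rewrite that has already happened, not a preview of one.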
Limitations for LFS Migration:
git replace by itself doesn’t solve the core LFS migration problem, which involves changing blobs into LFS pointers within commits. If you replace a commit A (with a large file) with commit B (where the large file is an LFS pointer), commit A still contains the large file, and the repository is no smaller. In an LFS context, replacements are mainly useful for keeping old commit IDs resolvable after a real rewrite.
To preview a git-filter-repo rewrite without touching the repository, use its --dry-run option rather than replacements. It leaves the repository unchanged and saves the original and filtered fast-export streams under .git/filter-repo/ for comparison.
Note that git-filter-repo does not keep backup refs such as refs/original/refs/heads/main; that was git filter-branch behavior. Once you run it without --dry-run, history is rewritten in place, so rehearse on a fresh clone and force-push only after verifying the result. Refer to the git-filter-repo documentation for the exact options.
Critical Considerations for LFS History Rewrites
Any history rewrite, especially with LFS, demands meticulous planning and execution.
Backup, Backup, Backup! Before any operation, create a full, fresh clone (preferably a mirror clone) of your repository.

    git clone --mirror /path/to/original/repo.git /path/to/backup.git

There is no undo button for a botched history rewrite that’s been force-pushed without a backup.
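A backup is only useful if it is complete, so it is worth checking before the rewrite starts. The sketch below builds a tiny source repository so that it runs on its own; in practice src and backup would be your real paths. The list_refs helper is a hypothetical name. It compares every ref in the source against the mirror and runs git fsck on the backup.

```shell
#!/usr/bin/env bash
set -eu
# Self-contained demo: a tiny source repository stands in for the real one.
work=$(mktemp -d)
src="$work/src"
backup="$work/backup.git"

git init -q "$src"
git -C "$src" -c user.name=Demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "Initial commit"
git -C "$src" tag v1.0

git clone -q --mirror "$src" "$backup"
# Assumption for LFS repositories: the mirror also needs the LFS objects,
# which can be fetched with: git -C "$backup" lfs fetch --all

# Print "<sha> <refname>" for every ref in a repository.
list_refs() {
  git -C "$1" for-each-ref --format='%(objectname) %(refname)'
}
src_refs=$(list_refs "$src")
backup_refs=$(list_refs "$backup")

if [ "$src_refs" = "$backup_refs" ] && git -C "$backup" fsck >/dev/null 2>&1; then
  backup_status="ok"
else
  backup_status="mismatch"
fi
echo "backup: $backup_status"
```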
Team Communication & Coordination: Rewriting shared history changes commit SHAs, which will disrupt collaborators.
- Announce a maintenance window.
- Ensure everyone has pushed their changes.
- After the rewrite and force-push, collaborators will need to re-clone or perform a more complex rebase/reset of their local repositories. Provide clear instructions.
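For collaborators who prefer not to re-clone, the instructions can be as short as a fetch followed by a hard reset. The sketch below simulates this with two local repositories; the paths, the branch name main, and the backup/pre-rewrite branch name are assumptions for illustration. A hard reset discards local commits, which is why the old tip is kept on a backup branch first.

```shell
#!/usr/bin/env bash
set -eu
# Simulated setup: "upstream" plays the shared remote, "collab" a teammate's clone.
work=$(mktemp -d)
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

g init -q "$work/upstream"
g -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo data > "$work/upstream/asset.bin"
g -C "$work/upstream" add asset.bin
g -C "$work/upstream" commit -q -m "Add asset"
g clone -q "$work/upstream" "$work/collab"

# The maintainer rewrites history (an amend stands in for the real migration).
g -C "$work/upstream" commit -q --amend -m "Add asset (rewritten)"

# Collaborator steps after the rewrite:
g -C "$work/collab" fetch -q origin                    # pick up the rewritten refs
g -C "$work/collab" branch backup/pre-rewrite main     # keep the old tip reachable
g -C "$work/collab" reset -q --hard origin/main        # move main to the new history
# In an LFS repository, follow up with: git lfs pull

echo "collaborator now at $(g -C "$work/collab" rev-parse --short HEAD)"
```

Unpushed work on the backup branch can then be cherry-picked onto the rewritten main.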
Thorough Verification: After rewriting, meticulously verify the changes:
- Check out various branches and historical commits.
- Verify LFS-tracked files: Are they pointers? Can their content be fetched?

    # List LFS tracked files in the current checkout
    git lfs ls-files
    # List all LFS files in a specific commit's tree (the ref is a positional argument)
    git lfs ls-files <commit-sha-or-branch>
    # Show each LFS file together with its size
    git lfs ls-files -s

- Ensure .gitattributes is correct at various points in history:

    git show <commit-sha>:.gitattributes

- Confirm LFS objects are present on the LFS server and accessible. Perform a fresh clone in a new directory and run git lfs pull.
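Some of these checks can be scripted with plain Git, because an LFS pointer is a small text blob whose first line is "version https://git-lfs.github.com/spec/v1". The helpers below, is_lfs_pointer and commits_missing_rule, are hypothetical names, and the demo repository with its hand-written pointer file exists only so the sketch runs without git-lfs installed.

```shell
#!/usr/bin/env bash
set -eu

# Succeeds if <rev>:<path> is stored in Git as an LFS pointer, not real content.
is_lfs_pointer() {   # usage: is_lfs_pointer <repo> <rev> <path>
  git -C "$1" cat-file -p "$2:$3" 2>/dev/null | head -c 200 |
    grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Print commits reachable from <rev> whose .gitattributes has no LFS rule
# for <pattern> (including commits with no .gitattributes at all).
commits_missing_rule() {   # usage: commits_missing_rule <repo> <rev> <pattern>
  local commit
  for commit in $(git -C "$1" rev-list "$2"); do
    git -C "$1" show "$commit:.gitattributes" 2>/dev/null |
      grep -F -- "$3" | grep -q 'filter=lfs' || echo "$commit"
  done
}

# Demo repository. Any installed LFS filters are disabled so that the
# hand-written pointer below is committed byte for byte.
repo=$(mktemp -d)
g() {
  git -C "$repo" -c user.name=Demo -c user.email=demo@example.com \
    -c filter.lfs.clean= -c filter.lfs.smudge= -c filter.lfs.process= \
    -c filter.lfs.required=false "$@"
}
g init -q
echo "plain text" > "$repo/notes.txt"
g add notes.txt
g commit -q -m "Commit without LFS rules"
printf '*.mov filter=lfs diff=lfs merge=lfs -text\n' > "$repo/.gitattributes"
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%064d\nsize 12345\n' 0 \
  > "$repo/clip.mov"
g add .gitattributes clip.mov
g commit -q -m "Add pointer and tracking rule"

missing=$(commits_missing_rule "$repo" HEAD '*.mov')
echo "commits without a *.mov LFS rule: $missing"
```

In this demo only the first commit is reported, since it predates the .gitattributes rule. After a full migration of a real repository, the expectation is an empty list for every migrated pattern.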
.gitattributes Consistency: The .gitattributes file is the heart of LFS tracking. Ensure it’s correctly defined and committed throughout the rewritten history for all LFS-tracked patterns. git lfs migrate generally handles this well.
LFS Server Integrity and Object Uploads: After rewriting history locally, when you force-push, ensure all necessary LFS objects are also pushed to the LFS server.

    # After local rewrite and verification:
    git push --all --force
    git push --tags --force
    # Make sure LFS objects are uploaded for all rewritten commits.
    # With --all and no ref arguments, this covers every local ref:
    git lfs push --all origin
    # To limit the upload to one branch, name it:
    git lfs push --all origin <branch-name>
    # `git push --force-with-lease` is generally safer than `--force`.

Passing several refs, as in git lfs push --all origin refs/heads/main refs/heads/develop, restricts the upload to the LFS objects reachable from those branches.
- Impact on CI/CD and Integrations: Commit SHAs are fundamental identifiers. Rewriting history will break existing build triggers, links in issue trackers, and any system referencing old SHAs. Plan to update these configurations.
Advanced Scenarios & Troubleshooting
- Complex Branching and Merges: git lfs migrate --everything and git-filter-repo (when operating on all refs, its default) are designed to handle complex histories, including branches and merges, correctly. Always verify merge points post-rewrite.
- Incomplete Migrations: If some files were missed, or .gitattributes was incorrect, you might have a mixed state.
  - Use git lfs ls-files and manual inspection (checking file sizes, opening files to see if they are pointers or actual content) to diagnose.
  - You may need to run another targeted git lfs migrate import or git-filter-repo pass.
- Performance on Large Repositories: git-filter-repo is significantly faster than git filter-branch. For extremely large repositories, ensure you have sufficient disk space and RAM. Breaking down the rewrite into smaller, manageable chunks (e.g., by path or file type) might be feasible if one single pass is too resource-intensive, though this adds complexity.
- Undoing LFS Tracking for Specific Paths: If files were mistakenly added to LFS, use git lfs migrate export to convert them back to regular Git blobs. git-filter-repo has no documented --from-git-lfs option, so it is not a substitute here.

    # Using git lfs migrate export to remove LFS tracking for specific text files
    # (assuming they were incorrectly tracked)
    git lfs migrate export --everything --include="*.config"
Conclusion
Rewriting Git history, particularly with LFS, is a task that demands respect for its potential impact. The era of git filter-branch for such operations is definitively over due to its inherent risks and inefficiencies. Modern tools like git lfs migrate and git-filter-repo provide powerful, safer, and more efficient mechanisms for these complex tasks. git replace serves a different, niche purpose, primarily for non-destructive previews or temporary alterations.
By prioritizing thorough backups, clear team communication, meticulous verification, and leveraging the appropriate modern tools, you can confidently refactor your LFS-enabled repositories. This ensures they remain lean, performant, and correctly structured, reflecting best practices in version control management. Remember that a clean history is not just an aesthetic concern but a foundation for a more maintainable and efficient development lifecycle.