Gregory Szorc Digital Home | Pull request issues and how to fix them (2023)

You've probably used them, or at least heard of them: pull requests. The pull request is the contribution workflow practiced on and popularized by code collaboration sites like GitHub, GitLab, Bitbucket, and others. Someone creates a fork, authors some commits, pushes them to a branch, and then creates a pull request to track integrating those commits into a target repository and branch. The pull request is then used as a vehicle for code review, tracking automated checks, and discussion until it is ready to be integrated. Integration is typically performed by a project maintainer, often with the click of a merge button on the pull request web page.

It's worth noting that the term pull request is not used universally: GitLab calls them merge requests, for example. Furthermore, I consider the terms pull request and merge request misnomers, because the terms can be confused with terminology used by your version control tool (e.g. git pull or git merge), and implementations of a pull or merge request may not even perform a pull or a merge. (You can also rebase a pull/merge request, but nobody calls them rebase requests.) A modern-day pull request is much more than a version control tool operation or even a simple request to pull or merge a branch: it is a nexus to track the integration of a proposed change before, during, and after that change is integrated. But alas. Since GitHub coined the term and is the most popular collaboration platform implementing this functionality, I'll refer to this general workflow as implemented on GitHub, GitLab, Bitbucket, and others as pull requests for the remainder of this post.

Pull requests have existed in essentially their current form for over a decade. The core workflow has remained mostly unchanged. What is different is the addition of value-added features, such as the integration of status checks like CI results, the ability to rebase or squash commits instead of merging them, improved code review tooling, and lots of UI polish. GitLab deserves a mention here, because its implementation of merge requests tracks much more than other tools do. (This is a side effect of GitLab having more built-in functionality than comparable tools.) I'll also give GitLab credit for adding new functionality to merge requests while GitHub was asleep at the wheel as a company a few years ago. (Not having a CEO providing clear product/company leadership really showed.) Fortunately, both companies (and others) are now shipping new, useful features at an impressive pace, which is great for the industry!

While I have no evidence of this, I suspect pull requests (and the fork model used by the services that implement them) arose when someone thought: how do I design a collaboration web site built on top of Git's new and novel distributed nature and branching features? Then they went and invented forks and pull requests. After all, the pull request as implemented by GitHub was initially a veneer over a common Git workflow: make a clone, create a branch, and push it somewhere. Without GitHub, you would run git clone, git branch, then some other command like git request-pull (where have I seen those words before?) to generate/send your branch somewhere. On GitHub, the comparable steps are roughly create a fork, create a branch on your fork, and submit a pull request. Today, you can even do all of this straight from the web interface without having to run git directly! This means GitHub can conceptually be thought of as a purely server-side abstraction/implementation of the Git feature branch workflow.

At its core, the pull request is fundamentally a nice UI and feature layer built around the common Git feature branch workflow. It was likely initially conceived as polish and value-added features on top of this historically client-side workflow. And this core property of pull requests has been copied since inception by vendors like Bitbucket and GitLab (and in Bitbucket's case it was implemented for Mercurial - not Git - since Bitbucket was initially Mercurial only).

A decade is an eternity in the computer industry. As the saying goes, if you aren't moving forward, you are moving backward. I think it is time for the industry to take a hard look at the pull request model and evolve it into something better.

I know what you're thinking: you think pull requests work just fine and that they're popular because they're a superior model compared to what came before. These statements are - a few nuances aside - true. But if you live in the version control space (like I do) or are paid to deliver tools and workflows to developers to improve productivity and code/product quality (which I am), the shortcomings of the pull request workflow and its implementations on services like GitHub, GitLab, Bitbucket, etc. are obvious, and they scream for revisiting, if not outright replacement.

So buckle in: you've started a ten-thousand-word adventure about everything you didn't think you wanted to know about pull requests!

Problems with pull requests

To build a better workflow, we first have to understand what is wrong or suboptimal about pull requests.

I posit that the primary goal of a pull request is to foster the incorporation of a desired, high-quality change into a target repository with minimal overhead and complexity for the submitter, integrator, and everyone in between. Pull requests achieve this goal by fostering collaboration to discuss the change (including code review), tracking automated checks against the change, linking to related issues, and so on. Pull requests are, at the end of the day, an implementation detail. And like all implementation details, they should be regularly reviewed and changed as necessary.

Let's start dissecting the problems with pull requests by focusing on the size of the unit of review. Research from Google, Microsoft (here and here), and others has found an inverse correlation between the size of the unit of review and defect rate. In Google's words (emphasis mine):

The size distribution of changes is an important factor in the quality of the code review process. Prior work has found that the number of useful comments decreases and review latency increases as change size grows. Size also influences developers' perceptions of the code review process; a survey of Mozilla contributors found that developers feel size-related factors have the greatest effect on review latency. A correlation between change size and review quality is acknowledged by Google, and developers are strongly encouraged to make small, incremental changes (with the exception of large deletions and automated refactoring). These findings and our study support the value of reviewing small changes, and the need for research and tools that help developers create such small, self-contained code changes for review.

In short, larger changes result in less useful feedback during review (meaning quality suffers) and take longer to review (meaning velocity suffers). Takeaway: if defect rate/quality and/or velocity matter to you, you should be authoring and reviewing more, smaller changes rather than fewer, larger ones.

I wholeheartedly agree with Google's position here and heartily encourage authoring more, smaller changes. Having practiced both forms of change authorship, I can say without a doubt that smaller changes are superior: better for authors, better for code reviewers, and better for people spelunking through repository history later. The primary downside of this model is that it requires a bit more knowledge of your version control tool to execute. And it requires corresponding tooling to play well with this change authorship model while introducing as little friction as possible, since the number of interactions with tools increases as change size decreases, velocity increases, and there are more distinct units of change to integrate.

This last point is important and germane to this post, because the implementation of pull requests today is not very compatible with the many smaller changes workflow. As I will argue, the current implementation of pull requests actively discourages the workflow of many smaller changes. And since smaller changes yield faster, higher-quality reviews, current pull request implementations are undermining quality and velocity.

I don't mean to single them out, but since they're the most popular and the ones who popularized pull requests, let's use GitHub's implementation of pull requests to demonstrate my points.

I posit that in order to author more, smaller changes, we must either a) create more, smaller pull requests or b) have pull request reviews that emphasize individual commits (as opposed to the overall merge diff). Let's examine each.

If we were to author more, smaller pull requests, we'd obviously need dependencies between pull requests in order to maintain velocity. And dependencies between pull requests add a potentially prohibitive amount of overhead. Let me explain. We don't want to sacrifice the overall rate at which authors and maintainers can integrate proposed changes. If we split existing proposed changes into more, smaller pull requests, we'd have a lot more pull requests. Without dependencies between them, authors would have to wait for each pull request to be integrated before submitting the next one. That would mean more round trips between author and integrator, and would almost certainly slow down the overall process. That's not desirable. The obvious mitigation is to allow multiple related pull requests in flight at once. But that would require inventing dependencies between pull requests in order to track relationships, so that one pull request isn't integrated before another it logically depends on. Technically, this is certainly feasible. But it imposes considerable overhead of its own. How do you define dependencies? Are dependencies automatically detected or updated based on the commit DAG? If so, what happens when there's a force push and it's ambiguous whether a new commit is a logically new commit or a successor of a previous one? If not, do you really want to impose additional hurdles on submitters by making them define dependencies between every pull request? In the extreme case of one pull request per commit, would someone submitting a series of, say, twenty commits really have to annotate nineteen dependencies across pull requests? That's insane!

There's another, more practical problem: the interplay between Git branches and pull requests. As implemented on GitHub, a pull request is tracked by a Git branch. If we have N inter-dependent pull requests, that means N Git branches. In the worst case, we have a Git branch for every Git commit. Maintaining N in-flight Git branches would be absurd. It would impose considerable overhead on pull request submitters. And it would perfectly highlight the inefficiency of Git's branch management, which I blogged about a couple of years ago. (In short, once you're accustomed to workflows - such as Mercurial's - that don't require naming commits or branches, Git's forced naming of branches and all the commands that require those branch names feel downright inefficient, and like a mountain of overhead.) Some tooling could certainly be built to enable efficient submission of a series of pull requests. (See ghstack for an example.) But I think the interplay between Git branches and GitHub pull requests is complicated enough that the tooling and workflow are intractable for all but the most trivial, best-case scenarios. Keep in mind that any robust solution to this problem would also entail improving git rebase so it moves branches onto rewritten ancestor commits instead of leaving them on the old versions of those commits. (Seriously, someone should implement this feature: it arguably makes sense as the default behavior for local branches.) In other words, I don't think you can implement the multiple-pull-request model reliably, and without driving people crazy, without fundamentally changing the requirement that a pull request be a Git branch (I would love to be proven wrong).
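To make the branch-juggling overhead concrete, here's a sketch of maintaining even a two-deep stack of dependent branches with stock Git. The branch names and file contents are hypothetical, and the script builds a throwaway repository so the commands can actually run:

```shell
# Create a throwaway repository with a stack of two dependent branches,
# one per would-be pull request.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name You
echo base > base.txt && git add . && git commit -qm 'base'

git checkout -qb part1                     # branch backing PR #1
echo one > one.txt && git add . && git commit -qm 'commit 1'
git checkout -qb part2                     # branch backing PR #2
echo two > two.txt && git add . && git commit -qm 'commit 2'

# A reviewer requests a fix to commit 1. Amending it strands part2 on
# the old commit: git rebase does not follow the rewrite automatically.
old_part1=$(git rev-parse part1)
git checkout -q part1
echo fixed >> one.txt && git commit -qa --amend --no-edit

# Every dependent branch must be transplanted by hand (and then
# force-pushed to update its pull request):
git rebase -q --onto part1 "$old_part1" part2
git log --oneline part1..part2
```

(For what it's worth, newer Git releases ship a `git rebase --update-refs` option that automates moving dependent branches, which speaks to how real this pain is.)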

For these reasons, I don't think the many smaller changes workflow can easily be practiced via multiple pull requests using the common GitHub model without effectively moving the definition of a pull request away from its equivalence with a Git branch (more on this later). And I'm not saying dependencies between pull requests can't be implemented: they can, and GitLab is proof. But GitLab's implementation is pretty simple and crude (likely because doing anything more complicated is really difficult, I speculate).

So, without fundamentally changing the relationship between a pull request and a branch, that leaves us with our other alternative: pull request reviews that emphasize individual commits rather than the merge diff. Let's talk about that now.

Pull requests have historically emphasized the merge diff. That is, GitHub (or another provider) takes the Git branch you pushed, runs a git merge against the target branch behind the scenes, and displays that diff front and center for review as the proposed unit of change: when you click the changed files tab to commence review, you see the merge diff. You can click on the commits tab and then select an individual commit to review just that commit. Or you can use the dropdown menu on the changed files tab to select an individual commit to review. These (relatively new) features are a very welcome improvement and make it easier to conduct commit-by-commit review, which is a prerequisite for realizing the benefits of a smaller-changes workflow. Unfortunately, they are far from sufficient to fully reap the benefits of that workflow.
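To illustrate the difference between the two review modes with plain Git (hypothetical repository and branch names; the script builds a throwaway repo), the three-dot diff approximates the combined merge diff a pull request shows by default, while walking the log shows the per-commit units a commit-centric review would examine:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name You
echo base > a.txt && git add . && git commit -qm 'base'
base=$(git symbolic-ref --short HEAD)   # master or main
git checkout -qb feature
echo one >> a.txt && git commit -qam 'commit 1'
echo two >> a.txt && git commit -qam 'commit 2'

# Roughly what the "changed files" tab shows: one combined diff
# against the merge base, with per-commit boundaries collapsed.
git diff "$base"...feature

# Commit-centric review examines each unit of change on its own:
git log --reverse -p "$base"..feature
```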

Defaults matter, and GitHub's default is to show the merge diff when conducting review. (I'd wager a large percentage of users don't even know they can review individual commits.) Since larger changes result in a higher defect rate and slower review, GitHub's default of the merge diff effectively means GitHub is defaulting to lower-quality, longer-lasting reviews. (I suppose this is good for engagement numbers, since it translates to more service usage both immediately and over the long term, as subsequent defects drive more usage. But I sincerely hope no product manager is thinking let's design a product that undermines quality to drive engagement.)

Unfortunately, a trivial change of the default from the merge diff to individual commits isn't that simple, because many authors and projects don't practice clean commit authorship, where individual commits are authored such that they can be reviewed in isolation.

(One way to categorize commit authorship styles is whether a series of commits is authored such that each commit is good in isolation, or whether only the effect of applying the entire series matters. A handful of mature projects - like the Linux kernel, Firefox, Chrome, Git, and Mercurial - practice the series of individually-good commits model, which I'll call a commit-centric workflow. I'd wager most projects on GitHub and similar services practice the we only care about the end state of the series of commits model. A litmus test for the latter model is whether pull requests contain commits like fix foo, or whether subsequent revisions to a pull request create new commits instead of amending existing ones. I'm a strong proponent of a clean commit history, where each commit in the final repository history stands as good in isolation. But I concede this preference reflects more mature software development practices and my being a version control guru. The issue/debate is a topic for another post, however.)
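As a sketch of what the cleaner style looks like in practice with stock Git (the repository, files, and commit messages here are hypothetical), review feedback can be folded back into the commit it belongs to instead of landing as a standalone fix foo commit:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name You
echo v1 > foo.txt && git add . && git commit -qm 'add foo'
echo bar > bar.txt && git add . && git commit -qm 'add bar'

# Review finds a bug in "add foo". In the end-state-only style this
# would become a standalone "fix foo" commit. In a commit-centric
# style, record a fixup targeted at the original commit instead...
echo v2 > foo.txt
git commit -qa --fixup ':/add foo'

# ...and fold it in, so every commit in the final history remains
# good in isolation. (The no-op editor accepts the generated plan.)
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash --root
git log --oneline
```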

If GitHub (or anyone else) changed the pull request default to commit-centric review without changing the relationship between a pull request and a Git branch, it would force many less experienced users to familiarize themselves with history rewriting in Git. This would impose significant pain and suffering on pull request authors, which would in turn upset users, hurt engagement, etc. So I don't think a global change of the default is feasible. Maybe we'd be in a better position if Git's history rewriting experience were better, or if we didn't have a decade of ingrained behavior to undo. What providers could do is offer projects that practice clean commit authorship an option to change the review default to emphasize individual commits instead of the merge diff. This would go a long way toward promoting the authorship and review of individual commits, which should have a positive impact on review velocity and code quality outcomes.

But even if these services emphasized individual commits by default in pull request reviews, there would still be a handful of significant deficiencies standing in the way of the more, smaller changes workflow we want.

While it is possible to review individual commits, all review comments are still funneled into a single per-pull-request timeline activity view. If the submitter and reviewer go to the effort of authoring and then reviewing individual commits, your reward is that all the feedback for every unit of change gets lumped into one giant pile of feedback for the pull request as a whole. This unified pile of feedback does a poor job (currently) of identifying which commit it applies to, and gives the author little assistance in identifying which commits need amending to address the feedback. This undermines the value of commit-centric workflows and effectively nudges commit authors toward a fixup style of commit authorship. To effectively review by commit, review comments and discussion need to be grouped by commit rather than merged into a unified pull request timeline. That would be a big change to the pull request UI and a daunting undertaking, so it's understandable that it hasn't happened yet. Such an endeavor would also require tackling subtly complex problems, such as how to preserve reviews in the face of force pushes. Today, GitHub review comments can lose context when force pushes occur. Things are better than they used to be, when review comments left on individual commits would be outright deleted (yes: GitHub did lose years of code review comments this way). But for robust per-commit review tracking, these technical problems would likely need to be solved to give users of this workflow confidence.

Even if GitHub (or someone else) implemented robust commit-by-commit review for pull requests, there would still be a velocity problem. That problem is: if the pull request is your unit of integration (read: merge), then you have to wait until every commit is reviewed before integration occurs. This may sound tolerable (it's how things are practiced today, after all). But I argue this is less than ideal compared to a world where a change is integrated as soon as it is ready, without waiting on the changes after it. As an author and maintainer, if I see a change that is ready to integrate, I prefer to integrate it as soon as possible, without delay. The longer a change sits unintegrated, the more susceptible it is to bit rot (where the change is no longer valid/good due to other changes in the system). Integrating good changes earlier also shortens the time to meaningful feedback: if there is a fundamental problem early in a series of changes that isn't caught before integration, integrating earlier changes without waiting on the ones that follow surfaces the problem sooner. This minimizes the deltas on changed systems (which often makes regression hunting easier), generally minimizes the blast radius when something goes wrong, and gives the author more time and less pressure to amend subsequent commits that haven't yet been integrated. And on top of all that, integrating more often simply feels better. The progress principle states that people feel and perform better when they are making continuous progress. But setbacks more than offset the power of small wins.

While I'm not aware of any explicit research in this area, my reading of the progress principle as applied to change authorship and project maintenance (supported by anecdotal observations) is that a steady stream of integrated changes feels a lot better than a single monolithic change lingering in review purgatory for what often seems like an eternity. While you need to be careful not to mistake motion for meaningful progress, I think there is real power in the progress principle, and we should aim to integrate changes as soon as they are ready, not later. Applied to version control and code review, this means a commit is integrated as soon as the author, the reviewer, and the machines reporting status checks all agree it is ready, without waiting on a larger unit of work like a pull request. In short: integrate early, integrate often!

This desire to integrate faster has significant implications for pull requests. Again looking at GitHub's implementation, I don't see how today's pull requests could adapt to this desired end state without significant structural changes. First, review would need to grow the ability to track state on a per-commit basis, otherwise integrating individual commits without the entirety of the parts makes little sense. That entails all the complexity described above. Then there's the problem of the Git branch effectively defining the pull request. What happens when some commits in a pull request are integrated and the author rebases or merges their local branch against those new changes? This may or may not just work. And when it doesn't just work, the author can easily find themselves in merge conflict hell, where commit after commit fails to apply cleanly and their carefully curated stack of commits rapidly becomes a liability and an impediment to further progress. (As an aside, the Mercurial version control tool has a concept called changeset evolution, where it tracks which commits - changesets in Mercurial parlance - have been rewritten as other commits and reacts gracefully in situations like a rebase. For example, if you have commits X and Y, and X is integrated via a rebase as X', an hg rebase of Y onto X' will see that X was rewritten as X' and skip attempting to rebase X, because it is already applied! This sidesteps many problems with history rewriting - such as merge conflicts - and can make the end-user experience much smoother as a result.) While it is certainly possible to integrate changes as soon as they are ready within a pull request workflow, my contention is that by the time you've made enough changes to accommodate that workflow, very little remains of the pull request workflow as we know it, and it is effectively a completely different workflow.

The arguments above lean heavily on the assertion that smaller changes are superior for quality and/or velocity, and that we should design workflows around that assertion. While I firmly believe in the merits of smaller units of change, others may disagree. (If you do disagree, ask yourself whether you believe the inverse: that larger units of change are better for quality and velocity. I suspect most people can't defend that position. But I can see people arguing that smaller units of change impose a per-unit cost or second-order effects that undermine the claimed quality or velocity benefits.)

But even if you don't buy the change size arguments, there is still a very compelling reason we should think beyond pull requests as implemented today: tooling scalability.

The implementation of pull requests today is heavily influenced by how Git works out of the box. A pull request is initiated from a Git branch pushed to a remote Git repository. When the pull request is created, the server creates a Git branch/ref referring to that pull request's head commit. On GitHub, these refs are named pull/ID/head (you can fetch them from the remote Git repository, but they are not fetched by default). Also, when a pull request is created or updated, a git merge is performed to produce the diff for review. On GitHub, the resulting merge commit is saved and pointed to from the open pull request via a pull/ID/merge ref, which can also be fetched locally. (The merge ref is deleted when the pull request is closed.)

Herein lies our scalability problem: unbounded growth of Git refs, at an ever-increasing rate of change as a project grows. Every Git ref adds overhead to graph walking and data exchange operations. While many of these costs can be mitigated (with better data structures or algorithms), there are scaling challenges inherent to this unbounded growth that, speaking as a maintainer of version control tools, I don't want to have to deal with. Are technical solutions that scale to millions of Git refs feasible? Yes. But they require heavy-investment solutions like reftable in JGit, which took about 90 rounds of review over roughly 4 months to land. And that's after the feature's design was first proposed at least as far back as July 2017. Don't get me wrong: I'm glad reftable exists. It's a fantastic solution to a real problem, and reading how it works will probably make you a better engineer. But at the same time, it's a solution to a problem that doesn't need to exist. There is a place for scaling graph data structures and algorithms to millions or even billions of nodes, edges, and paths: your version control tool shouldn't be it. Millions or billions of commits and files: that's fine. But scaling the number of distinct paths through that graph by introducing millions of DAG heads is insane given the complexity it foists onto random areas of the tool. In my opinion, it requires an unreasonably large investment to make work at scale. As an engineer, I prefer to avoid such problems in the first place. The easiest problems to solve are the ones you don't have.
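As a toy illustration of the unbounded growth (the numbers here are tiny and the repository is a hypothetical throwaway; real hosting services accumulate refs into the millions), every ref a server retains is part of what gets enumerated whenever a client talks to it:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m 'root'

# Simulate a server that accumulated head + merge refs for 100
# pull requests (GitHub-style ref names, never garbage collected).
for i in $(seq 1 100); do
  git update-ref "refs/pull/$i/head" HEAD
  git update-ref "refs/pull/$i/merge" HEAD
done

# The ref advertisement grows without bound as pull requests
# accumulate, and graph walks must consider every one of these tips.
git ls-remote . | wc -l
```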

Unfortunately, the tight coupling of pull requests to Git branches/refs invites this unbounded growth and the myriad of problems associated with it. Most projects may never grow to a size where these problems surface. But as someone who has experienced the pain in this space at multiple companies, I can tell you the problem is very real, and the performance and scalability issues it creates erode the viability of the current implementation of pull requests once you've reached a certain scale. Since the underlying scaling problems in Git can probably be fixed, I don't think Git ref explosion is a long-term problem for pull request scalability. But it is a problem today, and will remain one until Git and the tools built on top of it improve.

In summary, some of the common problems with pull requests are as follows:

  • Reviewing the merge diff by default encourages larger units of review, undermining the quality and velocity of outcomes.
  • The inability to incrementally integrate commits within a pull request reduces velocity and time to meaningful feedback, and can lower morale.
  • The tight coupling of pull requests with Git branches adds rigidity to workflows and shoehorns users into less flexible and less desirable workflows.
  • Weaknesses in Git's user experience - particularly around what happens when rewrites (including rebases) occur - substantially limit which workflows can safely be practiced with pull requests.
  • Tightly coupling pull requests with Git branches can cause performance issues at scale.

We can invert this language to arrive at more ideal outcomes:

  • The review experience is optimized for individual commits - not the merge diff - so units of review are smaller and quality and velocity outcomes improve.
  • The ability to incrementally integrate individual commits from a larger set, so ready changes are integrated sooner, improving velocity, time to meaningful feedback, and morale.
  • How you use Git branches does not substantially constrain how pull requests are handled.
  • You are free to use your version control tool however you like, without worrying that your local workflow will be incompatible with in-flight pull requests.
  • The pull request server is relatively easy to scale to the most demanding of use cases.

Let's talk about how we might achieve these more desirable outcomes.

Exploring alternative models

A pull request is merely one implementation pattern in the general problem space of integrating a proposed change. There are other patterns used by other tools. Before I describe them, let me coin the term integration request to refer to the generic concept of requesting that a change be integrated somewhere else. GitHub pull requests and GitLab merge requests are implementations of integration requests, for example.

Rather than detail the alternative tools one by one, I'll describe the main areas in which the various integration request tools differ, and evaluate the pros and cons of the different approaches.

Use of the VCS for data exchange

One can classify implementations of integration requests by how they use the underlying version control tool.

Before Git and GitHub came along, you were likely running a centralized version control tool that didn't support offline commits or feature branches (e.g. CVS or Subversion). In that world, the common mechanism for integration requests was exchanging diffs or patches through various media - email, uploading to a web service of a code review tool, etc. Your version control tool didn't talk directly to a VCS server to initiate an integration request. Instead, you would run a command that exported a text-based representation of the change and sent it somewhere.

Today, we can classify integration requests by whether they speak the version control tool's native wire protocol for exchanging data, or whether they exchange patches through some other mechanism. Pull requests speak the native VCS wire protocol. Tools like Review Board and Phabricator exchange patches via custom HTTP web services. Typically, tools that don't use the native exchange mechanism require additional client-side configuration, possibly including the installation of a custom tool (e.g. RBTools for Review Board or Arcanist for Phabricator). Although modern version control tools sometimes have this functionality built in. (In a nod to Zawinski's law, both Git and Mercurial can send patches via email out of the box, and Mercurial ships a Phabricator extension in its official distribution.)

An interesting outlier is Gerrit, which ingests its integration requests via git push. (See the docs.) But the way Gerrit's git push ingestion works is fundamentally different from how pull requests work! With pull requests, you push your local branch to a remote branch, and a pull request is built around that remote branch. With Gerrit, your push command is like git push gerrit HEAD:refs/for/master. For the non-gurus, the HEAD:refs/for/master syntax means push the HEAD commit (effectively the commit corresponding to the working directory) to the refs/for/master ref on the Gerrit remote (the SOURCE:TARGET syntax specifies a mapping of a local revision identifier to a remote ref). The wizard behind the curtain here is that Gerrit runs a special Git server that implements non-standard behavior for refs/for/* refs. When you push to refs/for/master, Gerrit receives your push like a normal Git server would. But instead of writing a ref named refs/for/master, it takes the incoming commits and ingests them into a code review request! Gerrit does create Git refs for the pushed commits. But it does so mainly for its own internal tracking (Gerrit stores all its data in Git - everything from Git data to review comments). And if that functionality isn't magical enough for you, you can also pass parameters to Gerrit via the ref name! e.g. git push gerrit HEAD:refs/for/master%private creates a private review request that requires special permissions to see. (It's debatable whether overloading the ref name for additional functionality like this is a good user experience for regular users. But you can't deny that it's a cool trick!)
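Incidentally, the SOURCE:TARGET refspec is plain Git; Gerrit merely attaches special server-side meaning to the refs/for/* namespace. Here's a sketch against a stock bare repository standing in for the server (paths are throwaway; on real Gerrit, the push would open a review rather than create the ref):

```shell
set -e
server=$(mktemp -d) && git init -q --bare "$server"
client=$(mktemp -d) && cd "$client" && git init -q
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m 'proposed change'

# Push the working commit to an arbitrary ref name on the server.
# A stock Git server just creates the ref; Gerrit's special server
# instead intercepts pushes to refs/for/* and turns them into reviews.
git push -q "$server" HEAD:refs/for/master
git ls-remote "$server" 'refs/for/*'
```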

On the surface, it might seem that using the version control tool's native data exchange is the superior workflow, because it is native and more modern. (Emailing patches is so old school.) Gone are the days of configuring client-side tools to export and submit patches. Instead, you run git push and your changes can be turned into an integration request automatically or with a few mouse clicks. And from a technical standpoint, this exchange method is likely more robust, since round-tripping a change through a text-based representation without data loss is surprisingly finicky. (e.g. JSON's inability to carry binary data losslessly without e.g. base64 encoding means that many text-based patch exchange services are lossy, especially in the presence of content that doesn't conform to UTF-8, which can show up in tests, for example. You would be surprised how many tools experience data loss when converting version control commits/diffs to text. But I digress.) Exchanging binary data via Git's wire protocol is safer than exchanging text patches, and is probably easier to use, since no additional client-side configuration is needed.
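A small illustration of that finickiness, using a throwaway repository: Git's default text diff cannot represent a binary change at all, and only an opt-in encoding (the base85 "GIT binary patch" format) makes the change survive text transport losslessly:

```shell
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m 'init'

# Commit a file containing raw bytes that are not valid UTF-8 text.
printf 'BIN\000\377data' > blob.bin
git add blob.bin && git commit -qm 'add binary blob'

# The default text diff drops the content entirely...
git diff HEAD~ HEAD | grep 'Binary files'
# ...while --binary emits a lossless, text-safe encoding of it.
git diff --binary HEAD~ HEAD | grep 'GIT binary patch'
```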

But while it's more native, modern, and arguably more robust, exchanging commits via the version control tool may not actually be better.

First, using the version control tool's native wire protocol locks clients into using that version control tool. If your integration request requires the use of a version control tool's wire protocol, the client almost certainly needs to run that version control tool. With other approaches, such as text-based patch exchange, the client could run any software it wanted: as long as it could emit a patch or API request in the format the server needed, an integration request could be created! This meant there was less potential for lock-in, since people could use their own tools on their own machines if they wanted and (hopefully) not impose their choices on others. For example, most Firefox developers use Mercurial - the VCS of the canonical repository - but a significant number use Git on the client. Because Firefox uses Phabricator (Review Board and Bugzilla before it) for code review, and because Phabricator ingests text-based patches, the choice of VCS on the client doesn't matter much, and the choice of server-side VCS can be made without igniting a holy war among developers who would be forced to use a tool they don't prefer. Yes, there are good reasons for using a consistent tool (including organizational overhead) and sometimes mandating a tool is justified. But in many cases (such as casual open source contributions) it probably doesn't or shouldn't matter. And in cases like Git and Mercurial, where excellent tools like git-cinnabar make it easy to convert between repositories without data loss and with acceptable overhead, adopting the version control tool's native wire protocol can shut out or inhibit contributors, since it can force the use of specific, undesired tooling.

Another problem with using the version control tool's wire protocol is that it often forces or encourages you to work a certain way. Take GitHub pull requests, for example. The pull request is built around the remote Git branch that you git push. If you want to update that branch, you need to know its name. So there's overhead to create and track that branch, or to find its name when you want to update it. Contrast with Gerrit, where you don't have an explicit remote branch to push to: you simply git push gerrit HEAD:refs/for/master and it figures things out automatically (more on that later). With Gerrit, I don't have to create a local Git branch to initiate an integration request. With pull requests, I'm compelled to. And that can sap my productivity by forcing me into less efficient workflows!
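The bookkeeping this imposes can be sketched locally, with a plain bare repository standing in for GitHub (branch and commit names are invented; a real pull request would be built around the pushed branch):

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/hub.git"        # stands in for GitHub
git init -q "$tmp/work"
cd "$tmp/work"
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
# The branch you must create, name, and remember:
git checkout -q -b my-feature
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "proposed change"
git push -q "$tmp/hub.git" my-feature    # the PR revolves around this branch
# To update the proposal, you must push the very same branch name again:
git -c user.email=a@b -c user.name=a commit -q --amend --allow-empty -m "proposed change v2"
git push -q -f "$tmp/hub.git" my-feature
git ls-remote "$tmp/hub.git" refs/heads/my-feature
```

Forgetting the branch name - or which remote it lives on - means digging it up before you can iterate, which is exactly the friction Gerrit's push target avoids.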

Our final point of comparison is scalability. If you use the version control tool's wire protocol as part of integration requests, you have introduced the problem of scaling your version control server. Take it from someone who has spent years dealing with scaling version control servers and who knows the low-level details of both Git's and Mercurial's wire protocols: you don't want to be in the business of scaling a version control server. The wire protocols of both Git and Mercurial were designed in a now-ancient era of computing and were not designed by network protocol experts. They are inherently difficult to scale at just the wire protocol level. I've heard stories that the most expensive single server at Google was once their Perforce or Perforce-derived server (this was several years ago - Google has since moved on to a better architecture).

The poor network protocols of version control tools have many side effects, including the inability or outright difficulty of using distributed storage on the server. So in order to scale compute horizontally, you need to invest in expensive network storage solutions or devise a replication and synchronization strategy. And take it from someone who has worked on data synchronization products (outside the source control space) at three companies: this is a problem you don't want to solve yourself. Data synchronization is intrinsically difficult and riddled with hard trade-offs. It's almost always a problem best avoided if you have a choice in the matter.

If creating Git refs is part of creating an integration request, you've introduced a scaling challenge with the number of Git refs. Do those Git refs live forever? What happens when you have thousands of developers - possibly all working in the same repository - and the number of refs or ref mutations grows to hundreds of thousands or millions per year?

Can your version control server handle a reasonably performant push every second or two? Unless you are Google, Facebook, or a handful of other companies I'm aware of, it can't. And before you protest that I'm talking about problems that only affect the 0.01% of companies, I can name a handful of companies less than 10% the size of these giants where this is a problem for them. And I also guarantee that many people don't have client-side metrics for their git push P99 times or reliability and don't even realize there is a problem! Scaling version control is probably not a core part of your company's business. Unfortunately, it often becomes something companies have to allocate resources to because of poorly designed or misused tools.

Contrast the scaling challenges of integration requests built on a native version control server with those of merely exchanging patches. With the more primitive approach, you're probably pushing the patch to an HTTP web service. And with tools like Phabricator and Review Board, that patch gets turned into rows in a relational database. I guarantee it will be easier to scale an HTTP web service fronting a relational database than it will be to scale your version control server. If nothing else, it should be easier to manage and debug, since there are far more experts in those domains than in the version control server domain!

Yes, it's true that many will never hit the scaling limits of a version control server. And there are some nifty solutions out there for scaling. But large parts of this problem space - including the burden on version control tool maintainers of having to support insane scaling vectors in their tools - could be avoided entirely if integration requests didn't lean so heavily on the standard mode of operation of version control tools. Unfortunately, solutions like GitHub pull requests and Gerrit's use of Git refs for storage instill substantial pressure on the version control server to scale and make this a very real problem once you reach a certain size.

Hopefully the paragraphs above have shed some light on the implications of the choice of a data exchange mechanism for integration requests! Let's move on to another point of comparison.

Commit tracking

One way to classify implementations of integration requests is by how they track commits through their integration lifecycle. What I mean by that is how the integration request follows the same logical change as it evolves. For example, if you submit a commit and then amend it, how does the system know that the commit evolved from X to X'?

Pull requests don't track commits directly. Instead, a commit is part of a Git branch, and that branch is tracked as the entity the pull request revolves around. The review interface shows the merge diff front and center. It is possible to view individual commits. But as far as I know, none of these tools have the intelligence to explicitly track or map commits across new submissions. Instead, they simply assume the commit order will be the same. If commits are reordered, added, or removed in the middle of an existing series, the tool easily becomes confused. (With GitHub, it was once possible for a review comment left on a commit to disappear entirely. The behavior has since been fixed, and if GitHub doesn't know where to print a comment from a previous commit, it prints it as part of the pull request's overall timeline.)

If pull requests are all you know, you may not realize there are alternatives for tracking commits! In fact, the most common alternative (that isn't do nothing) predates pull requests entirely and is still practiced by various tools today.

Gerrit, Phabricator, and Review Board all work by having the commit message contain a unique token identifying the integration request for that commit. e.g. a commit message for a Phabricator revision contains a Differential Revision: line pointing at the review. Gerrit's looks like Change-Id: I9bfca21f7697ebf85d6a6fa7bac7de4358d7a43.

How this annotation makes it into the commit message varies by tool. Gerrit's web UI advertises a shell one-liner for cloning repositories that not only runs git clone but also uses curl to download a shell script from the Gerrit server and install it as the commit-msg hook in the freshly cloned repository. This Git hook ensures that every newly created commit has a Change-Id: XXX line containing a randomly generated unique identifier. Phabricator and Review Board use client-side tooling to rewrite commit messages after submission to their respective tool so the commit message contains the code review URL. Which approach is better is debatable - each has its advantages and disadvantages. Fortunately, that debate isn't germane to this post, so we won't cover it here.
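For illustration, here is a heavily simplified sketch of what such a commit-msg hook might do. The real hook Gerrit serves is more elaborate, and the ID derivation below is an assumption of mine, not Gerrit's actual algorithm:

```shell
# add_change_id mimics, in simplified form, a Gerrit-style commit-msg hook:
# append a unique Change-Id trailer unless one already exists. Installed as
# .git/hooks/commit-msg it would receive the commit message file path as $1.
add_change_id() {
  msg_file="$1"
  if ! grep -q '^Change-Id: I' "$msg_file"; then
    # Derive a unique-ish 40-hex identifier, "I"-prefixed like Gerrit's.
    id=$( { cat "$msg_file"; date +%s; echo $$; } | git hash-object --stdin)
    printf '\nChange-Id: I%s\n' "$id" >> "$msg_file"
  fi
}

msg=$(mktemp)
printf 'fix frob handling\n' > "$msg"
add_change_id "$msg"
add_change_id "$msg"             # second run is a no-op: trailer already there
grep -c '^Change-Id: I' "$msg"   # prints 1
```

The idempotence matters: amending a commit re-runs the hook against a message that already carries the identifier, and the identifier must be preserved, not replaced.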

What is important is how this metadata in commit messages is used.

The commit message metadata comes into play when a commit is ingested into an integration request. If a commit message lacks the metadata, or references a nonexistent entity, the receiving system assumes the commit is new. If the metadata matches an entity on file, the incoming commit is often automatically matched to an existing commit, even if its Git SHA is different!
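That matching rule can be sketched as follows (the classify_commit helper and the message content are invented for illustration; the trailer name follows Gerrit's convention):

```shell
# Decide, from the commit message alone, whether an incoming commit starts a
# new review unit or updates an existing one. The identifier in the message,
# not the commit SHA, is what matters.
classify_commit() {
  id=$(printf '%s\n' "$1" | sed -n 's/^Change-Id: //p')
  if [ -n "$id" ]; then
    echo "update review unit $id"
  else
    echo "new review unit"
  fi
}

msg='widget: avoid double free

Change-Id: I9bfca21f7697ebf85d6a6fa7bac7de4358d7a43'
classify_commit "$msg"                      # matches an existing review unit
classify_commit 'widget: no trailer here'   # treated as a brand new change
```

A real server would additionally verify that the referenced entity exists and that the pusher may update it, but the dispatch logic is this simple at its core.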

This approach of annotating commit messages with a tracking identifier works surprisingly well for tracking the evolution of commits! Even if you amend, reorder, insert, or remove commits, the tool can usually figure out what matches up with previous submissions and reconcile state accordingly. Support for this does vary by tool, though. Mercurial's extension for pushing to Phabricator is smart enough to take the local commit DAG into account and change the dependencies of revision units in Phabricator to reflect the new DAG shape, for example.

Commit tracking is another area where the simpler, more modern features of pull requests often don't work as well as the solutions that preceded them. Yes, annotating commit messages with an identifier feels cumbersome and can be brittle at times (some tools don't implement commit rewriting very well, and this can lead to a poor user experience). But you can't argue with the results: using explicit, stable identifiers to track commits is far more robust than the heuristics pull requests rely on. The false negative/positive rate is much lower. (I know this from experience, because at Mozilla we attempted to implement commit tracking heuristics for a code review tool before Phabricator was deployed, and there was a surprising number of corner cases we couldn't handle properly. And that was with Mercurial's obsolescence markers, which gave us commit evolution data derived directly from the version control tool! If that didn't work well enough, it's hard to imagine a heuristic that would. Eventually we gave up and used stable identifiers in commit messages, which fixed most of the pesky corner cases.)
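The core property - the SHA changes under history rewriting while the identifier survives - is easy to demonstrate with plain Git (message and identifier are illustrative):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/r"
cd "$tmp/r"
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m 'fix frob handling

Change-Id: I9bfca21f7697ebf85d6a6fa7bac7de4358d7a43'
before=$(git rev-parse HEAD)
# Rewrite the commit (new message, hence a new SHA)...
git -c user.email=a@b -c user.name=a commit -q --amend --allow-empty -m 'fix frob handling (reworked after review)

Change-Id: I9bfca21f7697ebf85d6a6fa7bac7de4358d7a43'
after=$(git rev-parse HEAD)
test "$before" != "$after" && echo "SHA changed"
# ...but the stable identifier survives, so a tool can match old to new:
git log -1 --format=%B | grep '^Change-Id:'
```

Any heuristic keyed on the SHA loses track at the amend; the trailer-based approach does not.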

Using explicit commit tracking identifiers may not seem like a meaningful difference. But its impact is profound.

The obvious benefit of tracking identifiers is that they allow you to rewrite commits without confusing the integration request tool. This means people can rewrite in-flight history with near impunity as far as the integration request is concerned. I am a heavy history rewriter. I like curating a series of high-quality, individually isolated commits. When I submit such a series as a GitHub pull request and get feedback on something I need to change, I have to stop and think: will rewriting history here make re-review difficult? (I try to empathize with the reviewer and make their life easier whenever possible. I ask what I would want if I were reviewing the change, and I usually do that.) With GitHub pull requests, if I reorder commits, or add or remove a commit in the middle of a series, I realize this may make the review comments left on those commits difficult to find, since GitHub will fail to resolve the history rewrite. And that could mean those review comments are lost and ultimately never acted upon, resulting in bugs or otherwise deficient changes. This is a textbook example of tool deficiencies dictating a sub-optimal workflow and outcome: because pull requests don't explicitly track commits, I'm forced to either adopt a sub-optimal workflow or sacrifice something like commit quality in order to minimize the risk that the review tool becomes confused. In general, tools shouldn't offload these kinds of costs or trade-offs onto users: they should just work and optimize for generally-accepted ideal outcomes.

Another benefit of tracking identifiers is that they enable per-commit review. Once you can follow the logical evolution of an individual commit, you can start attaching things like review comments to individual commits with a high degree of confidence. With pull requests (as implemented today), you can attempt to associate comments with commits. But because you can't track commits across rewrites with any reasonable success rate, rewritten commits usually fall through the cracks, orphaning associated data like review comments along with them. Data loss is bad, so you need a place to collect that orphaned data. The main pull request activity timeline facilitates exactly that.

But once you can track commits reliably (and tools like Gerrit and Phabricator prove you can), you no longer have this severe data loss problem, and therefore don't need to worry about finding a place to collect orphaned data. You are then free to create per-commit review units, each loosely coupled to other commits and to an overall series, if you so desire!

It's interesting to examine how the various tools approach this. It's doubly interesting to look at what behavior the review tool is capable of versus what it does by default!

Let's examine Phabricator. Phabricator's review unit is the Differential revision. (Differential is the name of the code review tool within Phabricator, which is actually a suite of functionality - like GitLab, but not as far along on the feature-completeness spectrum.) A Differential revision represents a single diff. Differential revisions can have parent-child relationships with others. Multiple revisions associated in this way form a stack in Phabricator terminology. See this search to see it in action. (Stack is a bit of a misnomer, since the parent-child relationships actually form a DAG, and Phabricator is capable of rendering things like multiple children in its graphical view.) Phabricator's official client-side tool for submitting to Phabricator - Arcanist, or arc - has the default behavior of collapsing all Git commits into a single Differential revision.

Phabricator can preserve the metadata of individual commits (at minimum it can render their commit messages in the web UI so you can see where the Differential revision came from). In other words, by default Arcanist does not create multiple Differential revisions per commit, and therefore does not create parent-child relationships for them. So no stacks get rendered. To be honest, I'm not sure whether modern versions of Arcanist support this. I know Mercurial and Mozilla wrote custom client-side tooling for pushing to Phabricator to work around deficiencies like this in Arcanist. Whether that tooling is generally suitable for users outside those projects, I'm not sure.

Another interesting aspect of Phabricator is that there is no overall series concept. Instead, each Differential revision stands in isolation. They can form parent-child relationships and constitute a stack. But there is no user interface or primary APIs for operating on the stack as a whole (last I looked, anyway). This may seem radical. You may be asking questions like how do I track the overall state of a series? or how do I convey information relevant to the series as a whole? Those are good questions. But without diving into them, the answer, radical as it may sound, is that an overall tracking entity for a stack of Differential revisions isn't strictly necessary: it works out. And having used this workflow with the Mercurial project for a few years, I can say I don't miss the functionality much.

Gerrit is also worth examining. Like Phabricator, Gerrit uses an identifier in commit messages to track commits. But while Phabricator rewrites commit messages at initial submission time to include the URL it created, Gerrit peppers the commit message with a unique identifier at commit creation time. The server then maintains the mapping of commit identifier to review unit. Implementation details aside, the end result is similar: individual commits are easily tracked.

Where Gerrit differs from Phabricator is that Gerrit has stronger grouping around multiple commits. Gerrit tracks when commits are pushed together and automatically renders both a Relation Chain and a Submitted Together list. While it lacks the visual beauty of Phabricator's implementation, it is effective and, unlike Phabricator's, surfaced in the UI by default.

Another difference from Phabricator is that Gerrit reviews per commit by default. Whereas you need non-official client tooling for Phabricator to submit a series of commits that form a linked chain, Gerrit does this by default. If you want a single review to appear, you must first squash your commits locally and then push the squashed commit. (More on this topic later in the post.)

Another benefit of per-commit review is that the model enables incremental integration workflows, where some commits in a series or set can be integrated before others, without waiting on the entire batch. Landing commits incrementally can substantially speed up certain workflows, since commits can integrate sooner rather than later. The benefits of this model can be incredible. But actually deploying such a workflow can be tricky. One problem is that your version control tool may get confused when rebasing or merging partially-landed state. Another is that it can increase the overall change rate of the repository, which may overwhelm version control, CI, and deployment mechanisms. Yet another potential problem concerns the semantics of review approval before landing. Many tools/workflows conflate I sign off on this change and I sign off on landing this change. While they're effectively identical in many cases, there are valid cases where you want to track them differently. And adopting a workflow where commits can be integrated incrementally will expose those corner cases. So before going down this road, think about who will be landing commits and when they will land. (You should probably be thinking about that anyway, because it's important.)

Designing a better integration request

Having described some problems with pull requests and alternative ways of solving the general problem of integration requests, it's time to tackle the million dollar problem: designing a better integration request. (When you factor in the time people spend on pull requests plus the cost of poor-quality bugs/changes slipping through due to the design of existing tools, improving integration requests across the industry would be worth a lot more than $1 million.)

As a reminder, the pull request is fundamentally a nice user interface and feature set built around Git's common feature branch workflow. This property has persisted since the earliest days of pull requests in 2007-2008 and has since been copied by vendors like Bitbucket and GitLab. In my opinion, pull requests are due for an overhaul.

Replace forks

The first change I would make to pull requests is to move away from forks being a required part of the workflow. This may seem radical. But it isn't!

A fork on services like GitHub is a complete project - just like the canonical project it was forked from. It has its own issues, wiki, releases, pull requests, etc. Now raise your hand: how often do you use those features on a fork? Me neither. In the vast majority of cases, a fork exists solely as a vehicle to initiate a pull request against the repository it was forked from. It serves little to no other meaningful functionality. Now, I'm not saying forks serve no purpose - they certainly do! But if someone wants to propose a change to a repository, a fork isn't strictly required, and its existence is being forced on us by the current implementation of pull requests.

I said meaningful in the previous paragraph because forks introduce overhead and confusion. The existence of a fork can confuse people about where the canonical project lives. Forks also add overhead to the version control tool. Their existence forces the user to manage an additional Git remote and extra branches. It forces people to remember to keep the branches on their fork in sync. As if remembering to keep your local repository in sync weren't hard enough! And to push to a fork, you need to re-upload data that already exists on the server (just in a different view of the Git repository). (I believe Git is growing wire protocol improvements to mitigate this.)

When used merely as a vehicle to initiate integration requests, I don't believe forks provide enough value to justify their existence. Should forks exist: yes. Should people be required to use them in order to contribute changes: no. (Valid use cases for a fork would be things like performing a community fork of a project - spawning an independent entity - or wanting a separate copy for reasons like stronger guarantees of data availability and integrity.)

Forks are essentially a veneer over a server-side git clone. And the reason a separate Git repository is used at all is probably because the earliest versions of GitHub were just a pile of abstractions over git commands. The service took off in popularity, people copied its features almost verbatim, and nobody ever looked back and asked why are we doing it this way in the first place.

To answer what we would replace forks with, we have to go back to first principles and ask what are we trying to do. And that is: propose a unit of change against an existing project. And for version control tools, all you need to propose a change is a patch/commit. So, to replace forks, we simply need an alternative mechanism to submit patches/commits to an existing project.

My favorite alternative to forks is to use git push directly to the canonical repository. This could be implemented the way Gerrit does it, where you push to a special ref. e.g. git push origin HEAD:refs/for/master. Or - and this is my preferred solution - version control servers could grow more smarts about how pushes work - possibly even changing what commands like git push do when the server is operating in special modes.

One idea would be for the Git server to expose different ref namespaces depending on the authenticated user. For example, I'm indygreg on GitHub. Say I want to propose a change to a project - let's say python/cpython. I would git clone it. I would create a branch - say indygreg/proposed-change. I would then git push origin indygreg/proposed-change, and because the branch prefix matches my authenticated username, the server lets it through. I can then open a pull request without a fork! (Using branch prefixes isn't ideal, but it should be relatively easy to implement on the server. A better approach would be to remap Git ref names, though it's unclear whether users would put up with that. An even better solution would be for Git to grow functionality to make this easier. e.g. git push --workspace origin proposed-change would push proposed-change to a workspace on the origin remote, which Git knows how to translate to a corresponding remote ref update.)
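The server-side check for the branch-prefix scheme could be as simple as a ref-name match. A sketch (allowed_push and the authenticated-username plumbing are hypothetical; a real server would derive the user from the connection):

```shell
# Accept a push only when the branch name is prefixed with the
# authenticated username, e.g. refs/heads/indygreg/proposed-change.
allowed_push() {
  refname="$1"; user="$2"
  case "$refname" in
    refs/heads/"$user"/*) return 0 ;;
    *) return 1 ;;
  esac
}

allowed_push refs/heads/indygreg/proposed-change indygreg && echo accepted
allowed_push refs/heads/master indygreg || echo rejected
```

Wired into a pre-receive hook, this would carve out a per-user namespace inside the canonical repository, with no fork involved.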

Another idea would be for the version control server to invent a new concept for exchanging commits - one based on sets of commits instead of DAG synchronization. Instead of performing a complicated discovery dance to synchronize commits with the underlying Git repository, the server would ingest representations of them and make those commit sets available, stored next to - but not within - the repository itself. This way you aren't scaling the repository's DAG to an infinite number of heads - which is a hard problem! A concrete implementation of this might have the client run git push --workspace origin proposed-change to tell the remote server to store your proposed-change branch in your personal workspace (apologies for reusing the term from the previous paragraph). The Git server would take your commits, produce a standalone blob to hold them, store that blob in a key-value store like S3, and update a mapping of which commits/branches live in which blobs in a datastore like a relational database somewhere. This would effectively separate the project's core data from the more transient branch data, keeping the core repository clean and pristine. It also lets the server lean on datastores that are easier to scale - key-value blob stores and relational databases - instead of the version control tool. I know this idea is feasible because Facebook implemented it for Mercurial. The infinitepush extension essentially siphons Mercurial bundles (standalone files holding commit data) off to blob storage when pushes arrive over the wire. On the hg pull side, when a requested revision doesn't exist in the repository, the server consults a database-backed blob index to see whether the revision exists anywhere.
While the infinitepush extension never made it into the official Mercurial project (through no fault of Facebook's), the core idea is solid, and I wish someone would spend the time to flesh out the design a bit more, because it could allow repositories to scale to a logically infinite number of DAG heads without the complexity of actually scaling infinite heads in DAG walking algorithms, repository storage, and version control tool algorithms. Getting back to the topic of integration requests, one can imagine having a target for workspace pushes. For example, git push --workspace=review origin proposed-change would push to the review workspace, which automatically initiates a code review.
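Plain Git bundles give a rough local analogy for the blob-storage idea (this is not infinitepush, just the concept in miniature: in-flight commits live in a standalone file outside the repository's ref namespace and are re-materialized on demand):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
base=$(git symbolic-ref --short HEAD)
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m base
git checkout -q -b proposed-change
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "proposed change"
# "Siphon" the in-flight commits into a standalone file...
git bundle create "$tmp/proposed-change.bundle" "$base"..proposed-change
# ...and drop the ref so the repository's own DAG stays clean:
git checkout -q "$base"
git branch -q -D proposed-change
# Later, the server (or anyone) can re-materialize the change on demand:
git fetch -q "$tmp/proposed-change.bundle" refs/heads/proposed-change:refs/heads/restored
git log -1 --format=%s restored
```

A real implementation would stash the bundle in a blob store and keep a commit-to-blob index in a database, per the description above; the bundle file is the stand-in for that blob.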

These ideas may sound familiar to attentive readers of this blog: I proposed user namespaces in my /blog/2017/12/11/common-git-problems-and-how-to-fix-them/ post a few years ago. Read that post for more on the implications of doing away with forks.

Could forks be eliminated as a requirement for submitting pull requests? Gerrit's git push origin HEAD:refs/for/master mechanism proves they can. Is Gerrit's approach too magical or confusing for regular users? I'm not sure. Could Git grow features to make the user experience much better, so users don't need to be burdened with complexity or magic and could simply run commands like git submit --for review? Definitely!

Shift the focus from branches to individual commits

My ideal integration request revolves around individual commits, not branches. While a client may submit a branch to initiate or update an integration request, the integration request consists of a set of loosely coupled commits, where parent-child relationships can exist to express a dependency between commits. Each commit is reviewed individually. Although a reviewer may need to inspect multiple commits to gain a full understanding of the proposed change. And some UI operations on a group of related commits (such as mass-dropping abandoned commits) may be warranted.

In this world, branches wouldn't matter. Instead, commits are king. Since we would no longer be using the branch name to track the integration request, we would need something to replace it, or we would have no way of knowing how to update an existing integration request! We should do what tools like Phabricator, Gerrit, and Review Board do and annotate commits with a persistent identifier that survives history rewriting. (Without it, review comments would end up orphaned - see above.)

It's worth noting that a commit-centric integration request model does not imply that everybody has to write or review a bunch of smaller commits! While I and various industry titans strongly encourage writing smaller commits, commit-centric integration requests don't force the issue. That's because commit-centric integration requests don't force you to change your local workflow! If you're the type of person who doesn't want to curate lots of small, isolated commits (it is a bit more work, after all), nobody would force you to do so. If that's your commit authorship pattern, you could instead submit the proposed change by squashing those commits together as part of the submission, optionally rewriting your local history in the process. If you want to keep dozens of fixup commits around in your local history, that's fine: just have the tooling collapse them together on submission. While I don't think fixup commits are that valuable and reviewers shouldn't have to see them, if we wanted, tooling could continue to submit them and make them visible (like GitHub pull requests do today). But they wouldn't be the focus of review (again, like GitHub pull requests today). Making integration requests commit-centric doesn't force people to adopt a different commit authorship workflow. But it does enable projects that wish to adopt more mature commit hygiene to do so. The way particular tools are implemented can impose restrictions, of course. But there's nothing about commit-centric review that inherently prohibits the use of fixup commits in local workflows.

While the full case for the merits of commit-centric workflows deserves its own post, I'll argue by proxy and note that some projects refuse to use modern pull requests precisely because commit-centric workflows aren't viable with them. When I was at Mozilla, one of the blockers to migrating to GitHub was that the pull request review tooling was incompatible with our worldview that review units should be small. (This view is generally shared by Google, Facebook, and some prominent open source projects, among others.) Because pull requests revolve around the merge diff and aren't robust in the face of history rewriting (due to the lack of robust commit tracking), projects that insist on more refined practices will continue to avoid pull requests. Review unit size has repeatedly been linked to review quality. And better quality - along with the long-term effect of lowering development costs through fewer bugs - can tip the scales in its favor, even against whatever benefits you would derive from using a product like GitHub, GitLab, or Bitbucket.

The best there is

Aspects of a better integration request exist in tools today. Unfortunately, many of these features are absent from the pull requests implemented by GitHub, GitLab, Bitbucket, etc. So to improve the pull request, these products will need to borrow ideas from other tools.

Integration requests that aren't built around Git branches (Gerrit, Phabricator, Review Board, etc.) use identifiers in commit messages to track commits. This helps track commits as changes are made. The model has compelling advantages. Robust commit tracking is a prerequisite for commit-centric workflows. And it would even improve the functionality of branch-based pull requests. A well-designed integration request would have a robust commit tracking mechanism.

Gerrit has a world-class experience for commit-centric workflows. It is the only popular implementation of integration requests I'm aware of that supports and caters to this workflow by default. In fact, I don't think you can change that! (This behavior is user-hostile in some cases, since it forces users to know how to rewrite commits, which is often dangerous in Git land. It would be nice if you could tell Gerrit to automatically squash commits into the same review unit. But I understand the reluctance to implement that feature, since it raises its own commit tracking challenges, which I won't bore you with.) Gerrit also displays groups of related commits front and center when viewing a proposed change.

Phabricator is the only other tool I'm aware of where one can achieve a decent commit-centric workflow without the pitfalls of orphaned comments, context overload, etc. mentioned earlier in this post. However, doing so requires non-standard submission tooling, and the commit series isn't featured prominently in the web UI. So Phabricator's implementation isn't as solid as Gerrit's.

Another commendable Gerrit feature is the submission mechanism: you simply `git push` to a special ref. That's it. There's no fork to create. There's no Git branch to create. There's no separate pull request to create after pushing. Gerrit simply takes the commits you push and turns them into a request for review. And it requires no additional client-side tooling!

Using a single, common `git` command to submit and update an integration request is easier and arguably more intuitive than what other tools do. Is Gerrit's implementation perfect? No. The `git push origin HEAD:refs/for/master` syntax is not intuitive. And overloading the submission options by effectively encoding URL parameters in the ref name is a crude, albeit effective, hack. But users will likely quickly learn the one-liner or create more intuitive aliases.
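For example, the one-liner can be hidden behind an alias. A minimal sketch, assuming a remote named `origin` and a target branch of `master` (the alias name `gpush` is made up):

```ini
# ~/.gitconfig
[alias]
    # "git gpush" submits the current commits to Gerrit for review.
    # Options ride along after "%" in the ref name, e.g.:
    #   git push origin HEAD:refs/for/master%topic=widget,r=alice@example.com
    gpush = push origin HEAD:refs/for/master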

The elegance of using just a `git push` to submit an integration request puts Gerrit in a league of its own. I would be thrilled if the GitHubs of the world reduced the complexity of submitting pull requests to a minimum: clone the canonical repository, make some commits, and run a `git` command. The future of submitting integration requests, we can hope, looks more like Gerrit than the alternatives.

What Needs to Be Built

Some aspects of the best integration request do not yet exist, or require significant work before they can be considered viable.

For tools that use the native version control tool for submission (e.g. via `git push`), some work is needed to support submission via a more generic HTTP endpoint. I'm fine with supporting `git push` as a submission mechanism because it makes the end-user experience so turnkey. But making it the only submission mechanism is a bit unfortunate. There is some support for this: I believe you can, for example, compose a pull request from scratch using the GitHub APIs. But it's not as simple as send a patch to an endpoint, which it arguably should be. Even Gerrit's robust HTTP API doesn't appear to allow creating new commits/diffs via that API. In any case, this restriction not only excludes non-Git tools from using these products, it also constrains tooling that doesn't want to push via Git. For example, you might want to write a bot that proposes automated changes, and producing a diff is much easier than driving `git`, since the former doesn't need a filesystem (which matters in serverless environments, for example).

A major problem with many implementations is over-reliance on Git for server-side storage. This is most pronounced with Gerrit, where not only are the commits you `git push` stored in a Git repository on the Gerrit server, but all comments and code review replies are stored in Git as well! Git *is* a generic key-value store, and you can store whatever data you want in it if you try hard enough. And it's cool that all your Gerrit data can be replicated via `git clone`: this practically eliminates the *we took a decentralized tool and centralized it via GitHub* class of arguments. But applying this *store everything in Git* approach at scale means you will be running a Git server at scale. And not just any Git server, a write-heavy Git server! When you have thousands of developers potentially all working in the same repository, you're potentially looking at millions of new Git refs per year. While the folks at Git, Gerrit, and JGit have done a fantastic job of scaling these tools, I would feel a lot better if we sidestepped the *scale Git to infinite pushes and refs* problem entirely and used a more scalable approach, such as an HTTP ingestion endpoint that writes data to key-value stores or relational databases. In other words, using a version control tool as the store for integration requests at scale is a self-inflicted wound, and an avoidable one.
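A hedged sketch of that alternative, with all names hypothetical and a plain dict standing in for whatever key-value or relational backend you'd actually deploy: the ingestion endpoint keys each submitted revision of a change directly, so review traffic never mints Git refs on the server.

```python
# Hypothetical sketch: store submitted revisions in a key-value store
# instead of creating a Git ref per push.
store: dict[str, bytes] = {}

def submit_revision(repo: str, change_id: str, revision: int, patch: bytes) -> str:
    """Record one revision of a proposed change; returns its storage key."""
    key = f"{repo}/{change_id}/{revision}"
    store[key] = patch
    return key

key = submit_revision("mozilla-central", "I0123abcd", 1, b"--- a/x\n+++ b/x\n")
assert key == "mozilla-central/I0123abcd/1"
assert store[key].startswith(b"--- a/x")
# Millions of submissions a year become rows/keys that scale like any
# other database workload, not refs a Git server must enumerate.
```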


Congratulations on making it through my brain dump! As long as this wall of text is, there are still plenty of topics I could have covered but didn't. These include the more specific topic of code review and the various features associated with it. I've also largely ignored the broader value an integration request can serve across the development lifecycle: integration requests are more than just code review vehicles; they serve as a conduit for tracking the evolution of a change over time.

I hope this post gave you an appreciation for some of the structural issues with pull requests and integration requests. And if you are someone in a position to design or implement a better integration request or the tooling around them (including the version control tools themselves), I hope it gave you some good ideas about what to do next.


