Character Profiles Unlocked: The 20 Common Amino Acids

Amino acids make a great companion for a biochemist in training. They are the building blocks of proteins, which are the molecular machinery of life. Here, we present the most ubiquitous of them all - the 20 amino acids directly encoded by the human genome.

But first - what constitutes an amino acid?

From a chemical perspective, an amino acid is defined by its amino and carboxylic acid functional groups. Both are attached to a central carbon atom known as the alpha carbon (Cα). The Cα is also bonded to a variable side chain (or R-group), and successive carbons along the side chain are labelled Cβ, Cγ, and so on. It is the side chain that makes each of the 20 amino acids unique.

Amino acids are joined together by peptide bonds to form proteins. But how does the cell know to join together the correct sequence of amino acids?

This is where the genetic code comes in. Each amino acid is encoded by three DNA bases, together known as a codon. Recall that four nucleotides make up DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). This allows for 4^3 = 64 unique ordered combinations, which is more than enough for the 20 amino acids! As a result, whilst each coding codon specifies a single amino acid, most amino acids can be encoded by multiple codons (a few codons instead act as stop signals that end translation).

Figure 1: A hypothetical DNA sequence and its translated amino acid sequence. Each codon corresponds to one amino acid (represented by its abbreviation). Note that leucine (leu) can be encoded by both the CTC codon and the CTG codon.
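The codon arithmetic above can be checked with a short Python sketch. It builds the standard genetic code from the conventional T, C, A, G ordering and translates a DNA sequence. The function name `translate` and the example sequence are illustrative assumptions and are not taken from Figure 1.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# for each codon position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and ends at the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))  # number of distinct codons
    # All codons that specify leucine (one-letter code L)
    print(sorted(c for c, aa in CODON_TABLE.items() if aa == "L"))
    print(translate("ATGCTCCTGTAA"))  # hypothetical sequence
```

Counting the entries per amino acid in `CODON_TABLE` shows the redundancy directly: leucine, for instance, is reached by both CTC and CTG, among others.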

As mentioned previously, the central Cα is attached to an amino group, a carboxylic acid group, an R-group and a hydrogen. These four bonds to Cα adopt a tetrahedral geometry. When the four attached groups are all different (i.e. when the R-group is not a lone hydrogen), they can be arranged in two distinct stereochemical configurations. Such a molecule is chiral, which means that the two configurations are non-superimposable mirror images of each other. Intriguingly, all 20 common amino acids (apart from glycine, as we will discuss later) are naturally synthesised in the L-configuration, with the central Cα being the chiral centre (indicated by an asterisk *).

Take a look at Figure 2 to help you picture this better.

Figure 2: The stereochemistry of amino acids. Both amino acids can also be represented as a Fischer projection (right), where the vertical bonds (C-COO- and C-R) project into the page and the horizontal bonds (C-NH3+ and C-H) project out of the page. (a) In L-alanine, the NH3+ group is on the left-hand side in the Fischer projection, a convention defined relative to L-glyceraldehyde. When looking down the Cα-H bond of an L-amino acid (H projecting into the page), the COO-, R-group and NH3+ are arranged in an anti-clockwise manner. (b) In a D-amino acid, the NH3+ group is on the right-hand side in the Fischer projection. The COO-, R-group and NH3+ are arranged in a clockwise manner.
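The clockwise/anti-clockwise rule in the caption can be turned into a small calculation. The sketch below is one possible approach, not a method described in the figure: it takes 3D coordinates for Cα and three of its substituents and uses the sign of a scalar triple product to tell the two configurations apart. The function name `configuration` and the idealised coordinates are assumptions made for illustration.

```python
def subtract(a, b):
    """Vector from point b to point a."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def configuration(c_alpha, carboxyl, side_chain, amino):
    """Return "L" or "D" from the positions of the groups around C-alpha.

    With H pointing away from the viewer, COO-, R and NH3+ run
    anti-clockwise in an L-amino acid. That arrangement gives a positive
    triple product CO . (R x N); the mirror image gives a negative one.
    """
    co = subtract(carboxyl, c_alpha)
    r = subtract(side_chain, c_alpha)
    n = subtract(amino, c_alpha)
    signed_volume = dot(co, cross(r, n))
    if abs(signed_volume) < 1e-9:
        raise ValueError("groups are coplanar; configuration is undefined")
    return "L" if signed_volume > 0 else "D"


if __name__ == "__main__":
    # Idealised, hypothetical coordinates: viewer on the +z axis,
    # hydrogen (not needed for the calculation) pointing along -z.
    c_alpha = (0.0, 0.0, 0.0)
    carboxyl = (0.0, 1.0, 0.33)
    side_chain = (-0.87, -0.5, 0.33)
    amino = (0.87, -0.5, 0.33)
    print(configuration(c_alpha, carboxyl, side_chain, amino))
    # Swapping two substituents produces the mirror-image configuration.
    print(configuration(c_alpha, carboxyl, amino, side_chain))
```

The test is only meaningful when the side chain is not a lone hydrogen, so it does not apply to glycine.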

Finally, free amino acids exist as zwitterions, which are molecules carrying both a positive and a negative charge. At physiological pH (pH 7.2), the amino group is protonated and positively charged, whilst the carboxyl group is deprotonated and negatively charged.

Overall, whilst all 20 amino acids share many chemical properties, they are made unique by their individual side chains. In a biological system, it is largely the interactions between side chains that allow proteins to fold into distinct structures and carry out a wide range of biological functions.

In this article, the 20 amino acids are grouped into aliphatic, aromatic, polar, and charged amino acids. Do also look out for the other traits listed for each one, which can be used for alternative classifications of amino acids.
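For readers who like to see the grouping as data, here is a minimal Python sketch of the four groups as they are used in the profiles that follow, keyed by one-letter code. The names `GROUPS`, `GROUP_OF` and `group_composition` are illustrative, and other textbooks draw the group boundaries differently.

```python
from collections import Counter

# Grouping used in this article (one-letter codes).
GROUPS = {
    "aliphatic": "AGPLIVM",
    "aromatic": "FYW",
    "polar": "CSTNQ",
    "charged": "RKHDE",
}

# Reverse lookup: one-letter code -> group name.
GROUP_OF = {
    residue: group
    for group, residues in GROUPS.items()
    for residue in residues
}


def group_composition(sequence):
    """Count how many residues of a protein sequence fall in each group."""
    counts = Counter(GROUP_OF[residue] for residue in sequence.upper())
    return {group: counts.get(group, 0) for group in GROUPS}


if __name__ == "__main__":
    print(len(GROUP_OF))                  # every amino acid appears once
    print(group_composition("MKTAYW"))    # hypothetical peptide
```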

Aliphatic amino acids

You may remember from organic chemistry that compounds are classed as aliphatic or aromatic (discussed in the next section). Aliphatic molecules consist of an open hydrocarbon chain, as seen in alkanes and alkenes. Similarly, aliphatic amino acids have side chains consisting of single-bonded carbon and hydrogen.

Alanine

Three-letter abbreviation: Ala

One letter code: A

Dietary status: Non-essential (i.e. can be synthesised by the human body)

Other traits: Non-polar/hydrophobic

Some may think alanine lacks character because of its simple methyl side chain. Indeed, methyl groups do not readily participate in chemical reactions. It is also one of the smallest side chains, as it consists only of a single carbon bonded to three hydrogens. However, it is exactly these properties that make it the ideal candidate for mutagenesis studies.

Mutagenesis is a technique used to characterise the function of individual amino acid residues in a protein. In alanine scanning mutagenesis, other amino acids are substituted by alanine, one at a time. As the methyl side chain is unreactive and does not form hydrogen bonds, the substitution removes the chemical and physical contributions of the original side chain. Additionally, since alanine has a less bulky side chain than many of the amino acids we will come across later, alanine substitution avoids introducing steric clashes that could dramatically change the overall structure of the protein. In essence, if a given mutation alters the protein’s activity or its local structure, the amino acid that was replaced is likely to contribute to the function or stability of the protein.
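To make the idea concrete, the sketch below lists the single-residue variants that an alanine scan of a short sequence would cover. It is only a bookkeeping aid written in Python; the function name `alanine_scan` and the example peptide are assumptions, and mutations are labelled in the common "original residue, position, new residue" style (for example, K3A).

```python
def alanine_scan(sequence):
    """List every single alanine substitution of a protein sequence.

    Returns (label, mutant_sequence) pairs. Positions that are already
    alanine are skipped, since substituting them changes nothing.
    """
    sequence = sequence.upper()
    variants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue
        label = f"{residue}{index + 1}A"  # positions are 1-based
        mutant = sequence[:index] + "A" + sequence[index + 1:]
        variants.append((label, mutant))
    return variants


if __name__ == "__main__":
    for label, mutant in alanine_scan("MAKL"):  # hypothetical peptide
        print(label, mutant)
```

Each variant would then be expressed and compared with the original protein in whatever activity or stability assay suits the system.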

Glycine

Three-letter abbreviation: Gly

One letter code: G

Dietary status: Non-essential

Other traits: No side chain attached

With two hydrogens attached to Cα, glycine is the only non-chiral amino acid. Because its R-group is just a hydrogen, a glycine residue is less bulky and more flexible than any other amino acid. As you may recall, steric clashes are the unfavourable overlap of electron clouds, which hinders bond rotation and prevents non-bonded atoms from coming into close proximity. Hydrogen, with an atomic number of one, has the smallest electron cloud of any atom, minimising steric clashes between glycine and other residues.

As glycines are less conformationally constrained than other amino acids, they are often found in flexible loop regions and sharp beta-turns within the protein. All in all, the lack of side chains not only contributes to the flexibility of glycine but also exposes its backbone amide (NH). The P-loop or Walker A motif of nucleotide-binding proteins, in particular, consists of multiple conserved glycine residues. The backbone amide of these residues forms hydrogen bonds with phosphate oxygens in the nucleotide, stabilising nucleotide binding to the protein.

Proline

Three-letter abbreviation: Pro

One letter code: P

Dietary status: Non-essential

Other traits: Cyclic side chain

Proline is unique in that its side chain also bonds to the amide nitrogen, making it more rigid than the other amino acids. It is often found in regions that are also rich in glycine, despite their contrasting properties.

Although proline has a much more constrained conformation, its bent structure is particularly suited to creating sharp turns. Thus, like glycine, proline is often found in beta-turns. Repeated Gly-Pro-X motifs, where X is any amino acid, are also prevalent in collagen, a fibrous protein commonly found in connective tissues. The rigidity of proline compensates for the flexibility of glycine, enabling the formation of linear fibrils.
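Repeated Gly-Pro-X motifs are easy to spot computationally. As a rough sketch, the Python snippet below uses a regular expression to find runs of consecutive Gly-Pro-X triplets in a one-letter sequence. The function name, the `min_repeats` threshold and the example sequence are assumptions chosen for illustration.

```python
import re

# One-letter codes of the 20 common amino acids, used for position X.
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"


def find_gly_pro_x_repeats(sequence, min_repeats=2):
    """Find runs of at least `min_repeats` consecutive Gly-Pro-X triplets.

    Returns (start_position, matched_text) pairs, with 1-based positions.
    """
    pattern = re.compile("(?:GP%s){%d,}" % (ANY_RESIDUE, min_repeats))
    return [
        (match.start() + 1, match.group())
        for match in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Hypothetical sequence containing three consecutive Gly-Pro-X triplets
    print(find_gly_pro_x_repeats("MKGPAGPPGPQLL"))
```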

Additionally, proline is abundant in intrinsically disordered proteins (IDPs). As the name suggests, IDPs lack regular structure and are highly dynamic. Perhaps counterintuitively, the rigidity of proline often disrupts regular structures. This is because its limited range of conformations is often incompatible with those required in regular secondary structures, such as alpha helices and beta sheets. However, this does not prevent a series of prolines from forming unconventional structures such as the polyproline helix in disordered regions.

Leucine

Three-letter abbreviation: Leu

One letter code: L

Dietary status: Essential (i.e. cannot be synthesised by human cells, obtained through the diet)

Other traits: Hydrophobic

Leucine is the most abundant amino acid in proteins. Having a large non-polar side chain, leucine is often found in the hydrophobic core of proteins so that it is shielded from the aqueous cellular environment.

Leucine is also found in a DNA binding motif, appropriately named the leucine zipper. The leucine zipper binds DNA between two helical domains, which forms a Y-shaped structure. Hydrophobic interactions between leucine residues act as a zipper to drive the coiling of the two helices, forming the long stalk region of the “Y”. This brings the shorter arms of the Y-shaped protein in close proximity to the DNA molecule.

Isoleucine

Three-letter abbreviation: Ile

One letter code: I

Dietary status: Essential

Other traits: Hydrophobic

Isoleucine, as the name suggests, is an isomer of leucine, with a methyl group attached to the first rather than second side chain carbon. Similar to leucine, isoleucine is also commonly found in the hydrophobic core.

Valine

Three-letter abbreviation: Val

One letter code: V

Dietary status: Essential

Other traits: Non-polar/hydrophobic

The side chain of valine may be shorter, but valine is still relatively hydrophobic and commonly found with leucine and isoleucine in the hydrophobic protein core.

Valine is also found in the substrate recognition sites of serine proteases, which catalyse the cleavage of peptide bonds in proteins. Serine proteases do not cleave the protein at random. Instead, they contain a substrate recognition site (the S1 pocket), which binds to specific amino acid side chains. Binding of the S1 pocket to its substrate positions the protease to only cleave the peptide bond after these residues. Elastase, for example, cleaves after residues with non-bulky side chains such as alanine and glycine. This is because its S1 pocket consists of two valine residues, making it too narrow to accommodate larger side chains.
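The idea of cleaving "after" particular residues can be illustrated with a toy model. The Python sketch below splits a sequence after every occurrence of a chosen set of residues. It is a deliberate simplification, since real protease specificity depends on more than a single residue, and the function name `cleave_after` and the example peptide are hypothetical.

```python
def cleave_after(sequence, residues):
    """Split a peptide after each occurrence of the given residues.

    `residues` is a string of one-letter codes, e.g. "AG" for a toy
    elastase-like preference for small side chains. A matching residue
    at the very end of the sequence has no peptide bond after it.
    """
    fragments = []
    start = 0
    last_index = len(sequence) - 1
    for index, residue in enumerate(sequence):
        if residue in residues and index < last_index:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    print(cleave_after("MKALLGWVA", "AG"))  # hypothetical peptide
```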

Methionine

Three-letter abbreviation: Met

One letter code: M

Dietary status: Essential

Other traits: Sulfur-containing

Methionine is a non-polar amino acid with a linear side chain. Most notably, methionine is the first amino acid added during eukaryotic protein synthesis. In bacteria, a modified N-formylmethionine is added. Recall that each amino acid is encoded by a three-nucleotide codon. For methionine, this is the AUG start codon (ATG in the DNA sequence), which signals the translation of mRNA into protein.

This is not to say that methionine cannot be present in other positions of the amino acid sequence. Methionine is relatively hydrophobic and is therefore generally buried in the hydrophobic core. However, increasing experimental evidence suggests that the thioether (-S-) group can be oxidised, allowing methionine to serve as an antioxidant.

Oxidation of most amino acid side chains is disruptive to the protein. However, some surface methionine residues can be oxidised without compromising the protein’s structure and function.

Aromatic amino acids

In contrast to aliphatic amino acids, aromatic amino acids contain cyclic and conjugated side chains. In an aromatic ring structure, the carbon atoms are sp2 hybridised. Thus, the p-orbital electrons, perpendicular to the ring, are shared evenly (or delocalised) between all participating carbons, forming a π bond above and below the plane of the ring. You may be more familiar with the aromatic compound benzene; similarly, aromatic side chains are planar and relatively hydrophobic.

Figure 3: (a) Resonance structures of benzene. (b) An alternative representation of delocalised electrons in benzene. (c) Delocalisation of p-orbital electrons in benzene. Whilst electrons in sp2 hybrid orbitals form σ bonds between adjacent carbons, the p-orbitals form a delocalised π-bond above and below the planar ring, shared between all six carbons.

Phenylalanine

Three-letter abbreviation: Phe

One letter code: F

Dietary status: Essential

Other traits: Hydrophobic

As the name may suggest, phenylalanine consists of an alanine residue where one of the methyl hydrogens is replaced by an aromatic phenyl group. The delocalisation of electrons in the phenyl group makes the faces of the ring relatively negative, and the edges relatively positive. Thus, phenyl groups are able to interact favourably with other aromatic rings via π-stacking.

Phenylalanine is also associated with the metabolic disorder phenylketonuria (PKU), where unmetabolised phenylalanine builds up to dangerously high levels that are toxic to the nervous system. Brain dysfunction may arise from high phenylalanine levels interfering with the normal function of cerebral enzymes. Additionally, since phenylalanine is the precursor of the amino acid tyrosine and various neurotransmitters, impaired phenylalanine metabolism may also decrease other factors essential to normal brain function. There is currently no cure for PKU; however, limiting dietary intake of phenylalanine can reduce its effects.

Tyrosine

Three-letter abbreviation: Tyr

One letter code: Y

Dietary status: Non-essential

Other traits: Relatively reactive

Tyrosine is synthesised from phenylalanine. However, the addition of a hydroxyl group dramatically changes the properties of tyrosine compared to its precursor. In particular, this polar and ionisable group enables it to form hydrogen bonds and directly participate in enzyme catalysis.

In addition to participating in catalysis, the tyrosine hydroxyl group itself can be covalently modified by phosphate. Phosphorylation of a key tyrosine residue in a tyrosine kinase activates the enzyme, which can, in turn, catalyse the phosphorylation of other proteins, including other kinases. In addition to modifying enzyme activity, phosphorylation can also generate new binding sites for other proteins, and is thus a key component of cell signalling.

Figure 4: The phosphorylation of tyrosine. Enzymes known as kinases transfer a phosphate from ATP to the hydroxyl group in tyrosine. Phosphorylation typically acts as an activation signal. The phosphate group can also be removed by another group of enzymes known as phosphatases.

Tryptophan

Three-letter abbreviation: Trp

One letter code: W

Dietary status: Essential

Other traits: Hydrophobic

Tryptophan, with its bicyclic indole ring, is the largest amino acid. Although the indole nitrogen (N-H) can form hydrogen bonds, tryptophan remains a hydrophobic amino acid as the majority of the side chain is non-polar. Meanwhile, the planar ring structure enables it to interact with ring structures in other biological macromolecules.

Cell surface proteins such as those of the C-type lectin family bind sugars on other cells as a mechanism of cell adhesion. Many sugars, including galactose, consist of a six-membered ring. Unsurprisingly, a conserved tryptophan residue is found in the binding site of galactose-binding C-type lectins. The tryptophan ring is positioned parallel to the galactose ring, thus stabilising it via hydrophobic stacking. In addition to sugars, tryptophan is also able to interact with nucleotides via π-stacking.

Polar amino acids

In polar compounds, electrons are unevenly shared between atoms in a covalent bond, thereby resulting in a separation of positive and negative charges, which is otherwise known as a dipole.

Water, the most common solvent in chemical and biological systems, is a great example of a polar compound. The more electronegative oxygen attracts the electrons in the O-H bond and is thus partially negatively charged. Hydrogen, in contrast, is partially positively charged.

A similar effect occurs in polar amino acids, whose side chains contain electronegative atoms such as oxygen and nitrogen. As expected, these amino acid side chains are relatively hydrophilic as they readily interact with water and other polar molecules.

Cysteine

Three-letter abbreviation: Cys

One letter code: C

Dietary status: Non-essential

Other traits: Sulfur-containing, reactive

Cysteine and methionine are the only two sulfur-containing amino acids among the 20. Of the two, cysteine is the more reactive owing to its thiol (-SH) group. In many proteins, two cysteine thiols react to form a disulfide bond. Disulfide bonds are covalent bonds that play an important role in stabilising protein structures.

Intramolecular and intermolecular disulfide bonds are both prevalent in immunoglobulins. Immunoglobulins consist of two smaller subunits, known as light chains, and two larger subunits, known as heavy chains. Intramolecular disulfide bonds within each subunit stabilise the three-dimensional fold of the protein, whilst intermolecular disulfide bonds covalently link each light chain to a heavy chain, and the two heavy chains to each other. Thus, the four subunits of immunoglobulin can cooperate in binding foreign antigens and eliciting an immune response.

Figure 5: The formation of a disulfide bond between two cysteine residues. The two residues initially located in different regions of the protein are linked by a covalent bond. Meanwhile, the oxidised form of two cysteine residues is also known as cystine.

Serine

Three-letter abbreviation: Ser

One letter code: S

Dietary status: Non-essential

Other traits: Reactive

Serine contains a hydroxyl group, which can act as a reactive nucleophile in the presence of appropriate nearby residues or cofactors. Thus, it is often found in the active sites of enzymes such as serine proteases. In fact, serine is the main catalytic residue that cleaves peptide bonds in these enzymes, attacking the bond with a lone pair of electrons from its hydroxyl oxygen.

The reactive hydroxyl group can also be modified by other chemical moieties during post-translational modification (PTM). Unlike the amino acid itself, PTMs are not encoded in the DNA but are added by specific enzymes after the amino acid has been incorporated into a polypeptide.

Similar to tyrosine, serine can also be phosphorylated. Additionally, serine can be modified by sugars in a process known as O-linked glycosylation. O-glycosylation is especially prominent in mucins, the proteins found in mucus. The hydrophilic sugars attached to mucins absorb water, enabling mucus to act as a gel-like barrier to invading pathogens.

Figure 6: O-linked glycosylation of serine. Only the first sugar (GalNAc) is shown; more sugars can be added by attaching to the hydroxyl groups highlighted.

Threonine

Three-letter abbreviation: Thr

One letter code: T

Dietary status: Essential

Other traits: Relatively reactive

Threonine contains an additional methyl group compared to serine. Like serine, its hydroxyl group is a site for phosphorylation and O-glycosylation. In fact, protein regions that are heavily O-glycosylated are often abundant in both threonine and serine residues.

Asparagine

Three-letter abbreviation: Asn

One letter code: N

Dietary status: Non-essential

Other traits: Amidic, relatively reactive

Asparagine is a derivative of aspartic acid. With the hydroxyl of the side chain carboxyl group replaced by an amino group to form an amide, asparagine is no longer able to deprotonate and act as an acid. However, asparagine is a polar residue and can participate in hydrogen bonding and in coordinating metal ions in enzyme active sites.

Like serine and threonine, asparagine can also be modified by sugar residues. However, as the sugars are attached to the amide nitrogen of asparagine rather than to an oxygen, this type of glycosylation is termed N-linked glycosylation.

N-linked sugars are typically present on proteins at the cell surface, where they serve as attachment sites for sugar-binding proteins (or lectins) expressed on other cells. This has many applications in the immune system, such as recruiting leukocytes to sites of infection.

Figure 7: N-linked glycosylation of asparagine. Only the first GlcNAc sugar attached is shown.

Glutamine

Three-letter abbreviation: Gln

One letter code: Q

Dietary status: Non-essential

Other traits: Amidic

Glutamine is an amide derivative of glutamic acid. Similar to asparagine, the polar glutamine residue readily participates in hydrogen bonding. This ability to form polar interactions means that glutamine is often found on solvent-exposed protein surfaces. Moreover, surface glutamine residues can form hydrogen bonds with residues either on other proteins or on distal domains of the same protein.

These interactions often alter the activity or substrate affinity of the protein. Hsp70, a protein-folding chaperone, binds its substrate when there is minimal non-covalent interaction between its two subdomains. In order to release the substrate, surface glutamines mediate interactions between the two subdomains, lowering their substrate affinity.

Charged amino acids

Some amino acids contain side chains that can act as acids or bases in an aqueous environment. Acidic side chains deprotonate (lose an H+), becoming negatively charged, whilst basic side chains protonate (gain an H+), becoming positively charged.

This behaviour is determined by the pKa of the side chain and the pH of its environment. As a quick recap, pH measures the proton concentration of a solution (pH = -log10[H+]), whilst the pKa of a group is the pH at which half of its molecules are deprotonated. If a side chain has a pKa lower than the pH of its surroundings, which we will take as physiological pH 7.2, it is mostly deprotonated, having acted as an acid; if its pKa is higher, it is mostly protonated, having acted as a base.
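The relationship between pH, pKa and charge can be made quantitative with the Henderson-Hasselbalch equation, which gives the protonated fraction of a group as 1 / (1 + 10^(pH - pKa)). The Python sketch below applies it to the side chain pKa values quoted in the profiles of this article. The function names are illustrative, and the calculation assumes an isolated side chain in solution, ignoring the effect of the local protein environment on pKa.

```python
# Side chain pKa values as quoted in this article's profiles.
SIDE_CHAIN_PKA = {
    "Arg": (12.5, "basic"),
    "Lys": (10.8, "basic"),
    "His": (6.0, "basic"),
    "Asp": (3.65, "acidic"),
    "Glu": (4.25, "acidic"),
}


def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a group carrying its proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def average_charge(pka, kind, ph):
    """Average charge of a side chain at a given pH.

    Basic side chains are +1 when protonated and neutral otherwise;
    acidic side chains are neutral when protonated and -1 otherwise.
    """
    protonated = fraction_protonated(pka, ph)
    if kind == "basic":
        return protonated
    return protonated - 1.0


if __name__ == "__main__":
    ph = 7.2  # physiological pH as used in this article
    for name, (pka, kind) in SIDE_CHAIN_PKA.items():
        print(name, round(average_charge(pka, kind, ph), 3))
```

At pH 7.2 the calculation puts arginine and lysine close to +1 and aspartate and glutamate close to -1, whereas histidine is only slightly protonated, which is why small shifts in its environment can flip its state.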

Arginine

Three-letter abbreviation: Arg

One letter code: R

Dietary status: Non-essential

Other traits: Basic (pKa = 12.5)

Arginine is a highly basic amino acid. It is easily protonated because delocalisation of the positive charge across the guanidinium group stabilises the additional charge. Because of its stable positive charge, arginine can often be found in voltage-sensing systems, such as voltage-gated ion channels in the cell membrane. The concentrations of different cation and anion species differ between the inside and the outside of the cell, generating a small voltage across the cell membrane. Voltage-gated channels open in response to a change in this voltage, as an arginine-rich transmembrane domain becomes attracted to the more negatively charged side of the membrane. This mechanism is particularly widely used by neurons, where the concerted opening and closing of voltage-gated ion channels modulates signal transduction.

Additionally, the positively charged arginine can form salt bridges with the negatively charged aspartate and glutamate. Salt bridges are non-covalent interactions that involve both hydrogen bonding and electrostatic attraction. The formation of salt bridges can stabilise protein structure as well as protein-protein interactions.

Lysine

Three-letter abbreviation: Lys

One letter code: K

Dietary status: Essential

Other traits: Basic (pKa = 10.8), positively charged

Lysine has a lower pKa than arginine but is also positively charged at physiological pH. Similar to arginine, lysine forms salt bridges with negatively charged amino acid residues. Lysine is also found in nucleotide-binding sites. Proteins bind and hydrolyse nucleotides such as adenosine triphosphate (ATP) as an energy source for enzymatic reactions, or simply to change the conformation and activity of the protein. During hydrolysis, one of the three phosphates in ATP reacts with water, cleaving ATP into a free phosphate and adenosine diphosphate (ADP). Hydrolase enzymes contain a range of ATP-binding residues, including a positively charged lysine residue, which interacts with the negatively charged phosphate, orientating it in the enzyme active site.

Histidine

Three-letter abbreviation: His

One letter code: H

Dietary status: Essential

Other traits: Basic (pKa = 6.0)

With its near-neutral pKa, histidine can be easily protonated or deprotonated depending on its local chemical environment. This allows histidine to act as either an acid or a base during different stages of enzyme catalysis. Thus, it is commonly found in the active site of enzymes such as serine proteases. During the early stages of catalysis, histidine acts as a base to deprotonate an adjacent serine residue, which in turn catalyses the cleavage of peptide bonds. During the later stages, the role of histidine is reversed: it acts as an acid, donating the proton back to restore the enzyme to its ground state.

Figure 8: The catalytic triad of serine proteases. Aspartate modifies the pKa of histidine, allowing it to accept a proton from serine. The deprotonated serine acts as the main catalytic residue for peptide bond cleavage.

Aspartic acid (aspartate)

Three-letter abbreviation: Asp

One letter code: D

Dietary status: Non-essential

Other traits: Acidic (pKa = 3.65)

Aspartic acid contains a carboxylic acid group, which is often deprotonated at physiological pH, thus it is also commonly referred to as aspartate. Deprotonated aspartate carries a negative charge, which allows it to form salt bridges with the aforementioned positively charged residues.

Not only does aspartate serve a structural function, but it is also prevalent in the active site of enzymes. In serine proteases, aspartate is able to increase the pKa of the key histidine, by stabilising its protonated (positively charged) state through a hydrogen bond. In addition to interacting with other amino acids, the deprotonated carboxyl group can also coordinate water molecules, which are required for reactions such as ATP hydrolysis.

Glutamic acid (glutamate)

Three-letter abbreviation: Glu

One letter code: E

Dietary status: Non-essential

Other traits: Acidic (pKa = 4.25)

Similar to aspartate, glutamate is usually negatively charged at physiological pH. However, glutamate has a slightly higher pKa, allowing it to be more easily protonated. Thus, in addition to acting as a conventional acid, glutamate in the transmembrane domain of ATP synthase relays protons across the inner mitochondrial membrane. ATP synthase uses the translocation of protons across the membrane as a source of energy for ATP synthesis. However, biological membranes are not permeable to charged particles such as protons. Instead, a series of glutamate residues becomes protonated on one side of the membrane and deprotonated on the other, thereby providing an indirect mechanism for translocating protons.

Why do we need to know about the 20 amino acids?

Amino acids are fundamental to living organisms as they are the building blocks of proteins. The unique properties of each of the 20 amino acids in the protein sequence enable it to fold into diverse three-dimensional structures, thereby carrying out essential biological functions.

Here, we discussed some examples of where each amino acid is found in proteins, though the list is by no means comprehensive. Although there are rarely any firm rules in biology, amino acids generally act “in character” for their key chemical and physical properties. Thus, we may use these key characteristics as a starting point to decipher their role in a variety of proteins.

Author

Amy Cheng

BSc Biochemistry

Imperial College London
