Analysis

Linkers in SynLinker

Three sets of linkers were included in SynLinker. The first set contained 2150 natural linkers (18081 residues in total) extracted from a non-redundant multi-domain PDB chain set of 647 proteins (239176 residues in total). The second set contained 53 literature-reported artificial linkers that have been used empirically to construct various recombinant fusion proteins. The third set comprised 57 linker sequences collected from patent information on constructing fusion proteins.


Natural linkers' properties

Several properties of the natural linkers, including linker length, amino acid composition, hydrophobicity and secondary structure, were examined as follows.


1 Linker length distribution
The average length of our natural linkers was 8.4 ± 5.9 residues (Figure 1). The linkers were divided into three subsets by length: small (fewer than 6 residues), medium (6 to 14 residues) and large (more than 14 residues). These subsets had average lengths of 4.1 ± 0.8, 8.5 ± 2.4 and 21.1 ± 8.2 residues, respectively.
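The length-based split described above can be sketched in a few lines of Python. This is only an illustration: the function names `length_class` and `summarize_by_length` are hypothetical, the example sequences are artificial linkers borrowed from Table 1 rather than members of the natural linker set, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def length_class(length: int) -> str:
    """Assign a linker length to the small / medium / large subset."""
    if length < 6:
        return "small"
    if length <= 14:
        return "medium"
    return "large"

def summarize_by_length(linkers):
    """Return {subset: (mean length, standard deviation, count)}."""
    lengths = {"small": [], "medium": [], "large": []}
    for seq in linkers:
        lengths[length_class(len(seq))].append(len(seq))
    summary = {}
    for subset, values in lengths.items():
        if not values:
            continue  # skip subsets with no linkers
        # stdev needs at least two values; report 0.0 for a single linker
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[subset] = (mean(values), spread, len(values))
    return summary

# Illustrative input only (artificial linkers from Table 1)
example = ["GGGGS", "PAPAP", "GSAGSAAGSGEF", "KESGSVSSEQLAQFRSLD"]
print(summarize_by_length(example))
```

The boundaries 6 and 14 are treated as belonging to the medium subset, which follows from "less than 6" and "more than 14" in the text.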
 
2 Amino acid propensity
The amino acid composition of our natural linker set is shown in Figure 2. The set showed a significantly higher propensity for Pro and Gly and, to a lesser extent, for Asp, Asn, Thr and Ser (in decreasing order). Proline was the most favoured amino acid in our natural linkers. Proline has an unusual structure: its side chain is cyclized back onto the backbone amide position, which gives proline a very restricted backbone conformation. A linker sequence enriched in prolines would therefore tend to adopt a rigid, extended conformation. Glycine was also very common in our natural linkers. As a small, non-polar amino acid, glycine can give linkers flexibility and freedom of movement.
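As a rough illustration of how such a composition can be computed, the sketch below counts residues over a set of sequences and converts the counts to fractions. The `propensity` helper divides the linker composition by a background composition; this ratio is one common way to express propensity and is an assumption here, not a description of how SynLinker derived Figure 2. All names and the example inputs are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(sequences):
    """Fraction of each standard amino acid over all given sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore anything that is not a standard one-letter code
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def propensity(linker_comp, background_comp):
    """Ratio of linker frequency to background frequency (assumed definition)."""
    return {
        aa: linker_comp[aa] / background_comp[aa]
        for aa in AMINO_ACIDS
        if background_comp.get(aa, 0) > 0
    }

# Illustrative input only: two artificial linkers and a uniform background
comp = composition(["GGGGS", "PAPAP"])
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
print(round(comp["G"], 2), round(propensity(comp, uniform)["G"], 2))
```

A value above 1 from `propensity` would indicate a residue that is enriched in linkers relative to the chosen background, for instance the full chains from which the linkers were extracted.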
 
3 Hydrophobicity
Eisenberg's normalized consensus residue hydrophobicity scale was used to calculate the average residue hydrophobicity of the natural linker set and its subsets. The derived values range from 0 (most hydrophilic) to 1 (most hydrophobic). The entire natural linker set had an average hydrophobicity of 0.63 ± 0.10. Small, medium and large linkers had average hydrophobicities of 0.64 ± 0.13, 0.63 ± 0.08 and 0.64 ± 0.05, respectively. The similarity of these values suggests that the hydrophobicity of a linker sequence is independent of its length.
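A minimal sketch of an average-hydrophobicity calculation on a 0 to 1 scale is given below. Two assumptions should be noted. First, the per-residue values are the Eisenberg consensus values as they are commonly tabulated; they should be checked against the original publication before any serious use. Second, the text does not say how the 0 to 1 range was obtained, so the linear min-max rescaling used here is a guess. The function names are hypothetical.

```python
# Eisenberg consensus hydrophobicity values as commonly tabulated
# (assumption: verify against the original publication before use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}

_LOW = min(EISENBERG.values())
_HIGH = max(EISENBERG.values())

def rescaled(residue: str) -> float:
    """Map a residue's value linearly onto [0, 1] (assumed rescaling)."""
    return (EISENBERG[residue] - _LOW) / (_HIGH - _LOW)

def mean_hydrophobicity(sequence: str) -> float:
    """Average rescaled hydrophobicity over all residues of a sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    # A non-standard residue code raises KeyError rather than being skipped
    values = [rescaled(residue) for residue in sequence.upper()]
    return sum(values) / len(values)

# Illustrative input only: an artificial flexible linker from Table 1
print(round(mean_hydrophobicity("GGGGS"), 2))
```

Under this rescaling the most hydrophilic residue in the table maps to 0 and the most hydrophobic to 1, which matches the range described in the text.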
 
4 Secondary structure
Most natural linkers in our study corresponded to coil or bend elements (57.1%) as assigned by DSSP. The remainder corresponded to turn (17.3%), β-strand (14.4%) and helix (11.1%) elements.
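The grouping of DSSP states into these four classes can be sketched as follows. The mapping is an assumption: DSSP reports eight single-letter states, and the text does not say how they were merged or how a whole linker was assigned to one class. Here the three helix states are pooled, the isolated bridge is counted with the strand, and a linker is labelled by its most frequent class. All names are hypothetical.

```python
from collections import Counter

# Assumed grouping of DSSP one-letter states into the four classes in the text
DSSP_CLASS = {
    "H": "helix", "G": "helix", "I": "helix",
    "E": "strand", "B": "strand",
    "T": "turn",
    "S": "coil_or_bend", "-": "coil_or_bend", " ": "coil_or_bend",
}
CLASSES = ("coil_or_bend", "helix", "strand", "turn")

def class_fractions(dssp_states: str):
    """Fraction of residues in each class for one linker's DSSP string."""
    if not dssp_states:
        raise ValueError("empty DSSP string")
    counts = Counter(DSSP_CLASS[state] for state in dssp_states)
    return {cls: counts[cls] / len(dssp_states) for cls in CLASSES}

def dominant_class(dssp_states: str) -> str:
    """Label a linker by its most frequent class (assumed rule)."""
    fractions = class_fractions(dssp_states)
    return max(fractions, key=fractions.get)

# Illustrative DSSP string, not taken from the SynLinker data
print(dominant_class("SS--TTH"))
```

When two classes are equally frequent, `max` returns the first one in `CLASSES`, so a real analysis would need an explicit tie-breaking rule.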

Artificial linkers' characteristics

Chen et al. (2012) grouped the empirical artificial linkers into three types, flexible, rigid and cleavable, according to the linker's conformation and its availability for in vivo cleavage. Flexible linkers, for example G/S-rich linkers, are rich in small or polar amino acids. Rigid linkers usually adopt either helical conformations or extended conformations that are rich in prolines. Cleavable linkers contain sites that can be cleaved by a reductase or protease. Table 1 lists the artificial linkers collected from the literature.
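Many of the artificial linkers in Table 1 are written in a compact repeat notation such as (GGGGS)3 or A(EAAAK)4ALEA(EAAAK)4A. As a small convenience, the sketch below expands this notation into a plain sequence. The function name `expand_linker` is hypothetical, and the parser only handles a parenthesized unit followed by an explicit number; patterns with a symbolic count, such as (GGS)n, are left unchanged.

```python
import re

# A repeat unit in parentheses followed by an explicit integer count
_REPEAT = re.compile(r"\(([A-Z]+)\)(\d+)")

def expand_linker(pattern: str) -> str:
    """Expand repeat notation, e.g. '(GGGGS)3' -> 'GGGGSGGGGSGGGGS'."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pattern)

for pattern in ["(GGGGS)3", "A(EAAAK)4ALEA(EAAAK)4A", "(GGS)n"]:
    sequence = expand_linker(pattern)
    print(pattern, "->", sequence)
```

Expanding the notation first makes it straightforward to apply length or composition calculations to artificial linkers in the same way as to natural ones.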


Linkers from patent information

Table 2 lists some of the linkers collected from the patent search. The patents show that many short linker sequences from natural proteins that are absent from our natural linker set have been used to construct recombinant fusion proteins. For example, the flexible linkage between the variable domain and the CH1/CL constant domain in antibody Fab, the hinge regions between CH1 and CH2 of human or mouse IgGs, and cellulase linkers have all been used to join the domain components of recombinant antibodies and enzymes. Several synthetic linkers, such as serine-rich linkers, were also documented for designing fusion proteins. These linkers proved useful in each individual fusion protein design; however, they have not been shown to be generally applicable.

Figures
Tables

Table 1: A list of artificial linkers from empirical fusion proteins

| Artificial linkers | Characteristics | Examples |
|---|---|---|
| Flexible | G/S-rich | (GGGGS)n, where n=1-4 with n=3 as most common |
| | | (GGS)n |
| | | (G)n, where n=6 or 8 |
| | | KESGSVSSEQLAQFRSLD |
| | | EGKSSGSGSESKST |
| | | GSAGSAAGSGEF |
| Rigid | Helical | (EAAAK)n, where n=1-3 |
| | | A(EAAAK)nA, where n=2-5 |
| | | A(EAAAK)4ALEA(EAAAK)4A |
| | P-rich/Extended | (XP)n, where X is preferably A, K and E |
| | | (AP)n, where n=5-17 |
| | | PAPAP |
| Cleavable | Disulfide | LEAGCKNFFPRSFTSCGSLE |
| | | CRRRRRREAEAC |
| | Protease sensitive | VSQTSKLTRAETVFPDV |
| | | PLGLWA |
| | | RVLAEA |
| | | EDVVCCSMSY |
| | | GGIEGRGS |
| | | TRHRQPRGWE |
| | | AGNRVRRSVG |
| | | RRRRRRRRR |
| | | GFLG |
| | Dipeptide | LE |

Table 2: A list of linker sequences from patent search

| Natural sequences | Examples | Publication No. | Publication Date |
|---|---|---|---|
| A helical segment from human hemoglobin | AQGTLSPADKTNVKAAWGKVMT | US4946778A | Aug 7, 1990 |
| Human IgG1 Upper Hinge | EPKSCDKTHT | US6165476A | Dec 26, 2000 |
| Human IgG1 Middle Hinge | CPPCP | | |
| Human IgG1 Lower Hinge | APELLGGP | | |
| Human IgG2 Upper Hinge | ERK | | |
| Human IgG2 Middle Hinge | CCVECPPCP | | |
| Human IgG2 Lower Hinge | APPVAGP | | |
| Human IgG3 Upper Hinge | ELKTPLGDTTHT | | |
| Human IgG3 Middle Hinge | CPRCP | | |
| | (EPKSCDTPPCPRCP)3 | | |
| Human IgG3M15 Middle Hinge | CDTPPPCPRCP | | |
| Human IgG4 Upper Hinge | ESKYGPP | | |
| Human IgG4 Middle Hinge | CPSCP | | |
| Human IgG4 Lower Hinge | APEFLGGP | | |
| Mouse IgG1 Upper Hinge | VPRDCG | | |
| Mouse IgG1 Middle Hinge | CKPCICT | | |
| Mouse IgG1 Lower Hinge | VPSEVS | | |
| Mouse IgG2A Upper Hinge | EPRGPTIKP | | |
| Mouse IgG2A Middle Hinge | CPPCKCP | | |
| Mouse IgG2A Lower Hinge | APNLLGGP | | |
| Cellulase linkers | PGNPTTTVVPPASTSTSRPTSSTSSPVSTPTGQPGG | US20100221778A1 | Sep 2, 2010 |
| | PDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGT | | |
| | GTTPNPPASSSTTGSSTPTNPPAG | | |
| | PGAGNGGNNGGNPPPPTTTTSSAPATTTTASAG | | |
| | GGNPPGGNRGTTTTRRPATTTGSSPG | | |
| | TGTGTGTGTGTGTGTGTTTSSAPAA | | |
| | GSSGTPPSNPSSSASPTSSTAKPSSTSTASNPSG | | |
| | GTSTGGSTTTTASGTTSTKASTTSTSSTSTGTG | | |
| | TVSSSSVSSSHSSTSTSSSHSSSSTPPTQPTGV | | |
| | PSSGGTSSSSSAAPQSTSTKASTTTSAVRTTSTATTKTTSSAPAQGTN | | |
| | GGNPPGGNPPGTTTPRPATSTGSSPGP | | |
| | ASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVG | | |
| | PPPPPASSTTFSTTRRSSTTSSSPS | | |
| 5-13 amino acids from the N termini of human Ck and CH1 domains | QPKAAP | US20120034160A1 | Feb 9, 2012 |
| | TVAAP | | |
| | ASTKGP | | |
| | QPKAAPSVTLFPP | | |
| | TVAAPSVFIFPP | | |
| | ASTKGPSVFPLAP | | |

| Artificially designed sequences | Examples | Publication No. | Publication Date |
|---|---|---|---|
| Double linker | PGS | US4946778A | Aug 7, 1990 |
| | IAKAFKN | | |
| Single linker | KESGSVSSEQLAQFRSLD | | |
| Single linker | VRGSPAINVAVHVF | | |
| Serine-rich linker | SSSSG | US5525491A | Jun 11, 1996 |
| | SSSSGSSSSG | | |
| | SSSSGSSSSGSSSSG | | |
| | SSSSGSSSSGSSSSGSSSSG | | |
| | SSSSGSSSSGSSSSGSSSSGSSSSG | | |
| | SGSSSSGSSSSGS | | |
| | SVTVSSSGSSSSGSSSSGS | | |
| | GSTSGSGKPGSGEGSTKG | US5856456A | Jan 5, 1999 |
| | GSTSGSGRPGSGEGSTKG | | |
| | ATK | WO2012083424A1 | Jun 28, 2012 |
| | ASK | | |
| | ATKASK | | |
| | ATKGATK | | |
| Modified hinge region of the human CD8 alpha-chain | ALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAPTIASQPLSLRPEASRPAAGGAVHTRGLD | US20130280285A1 | Oct 24, 2013 |

References
Chen, X., J.L. Zaro, and W.C. Shen, Fusion protein linkers: Property, design and functionality. Adv Drug Deliv Rev, 2012.
de Bold, M.K., W.P. Sheffield, A. Martinuk, V. Bhakta, L. Eltringham-Smith, and A.J. de Bold, Characterization of a long-acting recombinant human serum albumin-atrial natriuretic factor (ANF) expressed in Pichia pastoris. Regulatory Peptides, 2012. 175(1-3): p. 7-10.
Feldman, H.J. and C.W. Hogue, Probabilistic sampling of protein conformations: new hope for brute force? Proteins, 2002. 46(1): p. 8-23.
Laskowski, R.A., M.W. MacArthur, D.S. Moss, and J.M. Thornton, PROCHECK - a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 1993. 26: p. 283-291.
© 2013-2019 SynCTI, National University of Singapore (NUS). All Rights Reserved.
SynLinker is free for academic and non-commercial use. For a commercial license please contact us.
The project is partly supported by the Next-Generation BioGreen 21 Program (SSAC, No. PJ01109405), RDA, Republic of Korea.