can protein size be predicted given genome info?

September 6, 2011 at 2:21 am #15342

Participant

Ok, I am looking at old questions from my teacher and have come across one that I believe is misleading. I know that eukaryotes generally have larger proteins for many different reasons, but based on the information given in this question I do not think you can accurately predict which proteins will be larger without outside knowledge. Here’s the question.

Assume a prokaryote encodes 3,000 proteins with a total genome size of 3×10^6 base pairs. In this prokaryote, about 90% of the genome is actually protein-encoding (equivalent to our exons).

Assume human to encode 30,000 proteins from a genome of 3×10^9 base pairs. In human, about 1.5% of the genome is exon.

From this information, is there any reason to believe that human proteins might actually be larger on average than prokaryotic ones? Please explain your reasoning.

Just based on this info I am pretty sure that you can’t predict protein sizes. I know that eukaryotes have lots of introns and prokaryotes generally don’t, and that humans have alternative splicing, which allows for different forms of a protein from the same gene. Is there any way to predict the protein size? Thanks for any help.

September 6, 2011 at 5:16 am #106188

JackBean

Participant

well, 90% of 3×10^6 is approx. 3×10^6 (nt; in AA it’s 3 times less); if this is in 3000 proteins, one protein has on average 1000 AAs
1.5% of 3×10^9 is approx. 4.5×10^7; if this is in 30 000 proteins, one protein has on average 1500 AAs

if you do more accurate calculations, the difference will be a little more

September 6, 2011 at 4:53 pm #106199

vulpes

Participant

Ok I see what you did there. But when you divide the 4.5×10^7 bp by the 30,000 proteins, it comes to 1500 base pairs per gene, so that would mean 500 amino acids per gene, right? 3 base pairs (3 nucleotides) code for 1 amino acid. Thanks for the help too.

September 7, 2011 at 6:01 am #106203

JackBean

Participant

yeah, I see I wrote it, but didn’t calculate it 😳 and the same goes for the bacteria, so it will be approx. 300 and 500 AAs on average 😉
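The corrected arithmetic can be checked with a short script. This is only a sketch of the back-of-envelope estimate used in the thread: the function name `mean_protein_length` is made up for illustration, and it assumes every protein-coding (or exon) base pair is translated, that genes do not overlap, and that each gene gives one protein. Stop codons and untranslated regions are ignored.

```python
def mean_protein_length(genome_bp, coding_fraction, n_proteins):
    """Return (mean coding bp per protein, mean amino acids per protein).

    Hypothetical helper: a rough average only, using the assumptions
    stated above (all coding/exon sequence is translated, one protein per gene).
    """
    coding_bp = genome_bp * coding_fraction    # total coding nucleotides
    bp_per_protein = coding_bp / n_proteins    # average gene (coding) length
    aa_per_protein = bp_per_protein / 3        # one codon = 3 nucleotides
    return bp_per_protein, aa_per_protein


# Numbers taken from the question in the first post.
for name, genome_bp, fraction, n in [
    ("prokaryote", 3e6, 0.90, 3000),
    ("human", 3e9, 0.015, 30000),
]:
    bp, aa = mean_protein_length(genome_bp, fraction, n)
    print(f"{name}: {bp:.0f} bp, {aa:.0f} aa")

# prints:
# prokaryote: 900 bp, 300 aa
# human: 1500 bp, 500 aa
```

Under these assumptions the human average comes out larger (about 500 versus 300 amino acids), which is the comparison the question is asking for.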

September 7, 2011 at 7:21 am #106204

greatmicrobiologist

Participant

Never mind guys I am very weak in calculations and giving here a simple logic. 😀 Until we know the coding sequence its hard to assume the protein length. As we can easily calculate the number of amino acids easily but not the protein length. As in the coding sequence there will be many stop codons. So where is the stop codons present can’t be estimated and also along with the start condons respectively. Hence its not easy to estimate the size but the number of amino acids may present can be calculated.! 😉

Hope I’m correct. Am I? 💡

September 7, 2011 at 8:36 am #106207

JackBean

Participant

what’s the difference between protein length and number of amino acids?

September 7, 2011 at 7:07 pm #106225

jonmoulton

Participant

Once a full ribosome assembles at the start codon, proceeds through translation and encounters a stop codon, that’s it: that is the end of the coding sequence. The ribosome leaves the mRNA and the sequence downstream of the stop is the 3′-UTR. So, you normally don’t find multiple stop codons within a coding sequence (though there are always exceptions; e.g. a "slippery" sequence can trigger translational frameshift and bring alternative stop codons in-frame).

Jack’s question gets to the core of this — protein length and # of amino acids are the same thing. Keep in mind too that the mature form of a protein might be digested by a protease to a smaller form than was originally translated. In that case, amino acids are clipped away (and with that, the polypeptide is shortened). Post-translational modifications like glycosylation can also affect protein mass, but I would not say that is a change in protein length — length refers to the number of amino acid residues in the polypeptide.
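The point that the coding sequence ends at the first in-frame stop codon, and that protein length is simply the number of codons translated, can be illustrated with a small sketch. The function name `coding_sequence` and the example sequence are invented for illustration. The sketch assumes translation starts at the first AUG in the sequence, which is a simplification, and it ignores frameshifting and any later protease processing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence(mrna):
    """Return the codons from the first ATG/AUG up to, but not including,
    the first in-frame stop codon. Return None if there is no start codon
    or no in-frame stop (i.e. no complete open reading frame).

    Hypothetical helper; assumes the first AUG is the start codon.
    """
    seq = mrna.upper().replace("U", "T")   # accept RNA or DNA letters
    start = seq.find("ATG")
    if start == -1:
        return None
    codons = []
    # Step through the sequence three nucleotides at a time, in frame.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return codons                  # translation ends here
        codons.append(codon)
    return None                            # ran off the end without a stop


codons = coding_sequence("CCATGGCTTGGTAAGGTGA")
print(codons)        # ['ATG', 'GCT', 'TGG']
print(len(codons))   # 3 amino acids; everything after TAA is 3' UTR
```

Here the protein length and the number of amino acids are the same number, `len(codons)`. The TGA near the end of the example is downstream of the first stop, so it is never read as part of the coding sequence.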

September 8, 2011 at 11:55 am #106237

greatmicrobiologist

Participant

quote jonmoulton:

Once a full ribosome assembles at the start codon, proceeds through translation and encounters a stop codon, that’s it: that is the end of the coding sequence. The ribosome leaves the mRNA and the sequence downstream of the stop is the 3′-UTR. So, you normally don’t find multiple stop codons within a coding sequence (though there are always exceptions; e.g. a “slippery” sequence can trigger translational frameshift and bring alternative stop codons in-frame).

oooh got my conceptions correct. 🙂 thanks.! 🙂

October 3, 2011 at 12:06 am #106578

merv

Participant

quote greatmicrobiologist:

Never mind guys I am very weak in calculations and giving here a simple logic. 😀 Until we know the coding sequence its hard to assume the protein length. As we can easily calculate the number of amino acids

I think you mean nucleotides, not amino acids

quote greatmicrobiologist:

easily but not the protein length. As in the coding sequence there will be many stop codons.

no there won’t, as explained following your original post but overlooked by you in your self-congratulation. Are you a time waster? If not, then I kindly suggest you work on both your English writing and comprehension, and I wish you good luck in these endeavours. False self-gratification is not going to help you, which is the only reason I point it out.

quote greatmicrobiologist:

So where is the stop codons present can’t be estimated

I don’t understand your English there, son

quote greatmicrobiologist:

and also along with the start condons respectively

I am not sure if this is stated elsewhere, but some genes use alternative start codons (well, one per polypeptide synthesised).

quote greatmicrobiologist:

. Hence its not easy to estimate the size but the number of amino acids may present can be calculated.! 😉

errrr, no. Although one can use software to predict each gene’s primary amino acid sequence, and this can be done from a cDNA clone quite accurately (not completely so), it is one of the most difficult problems in biology to predict which exons are used without the cDNA sequence.

In fact, this is one of the most powerful arguments for saying there are an infinite number of genes, not 30,000, because a cell might always opt for a piece of DNA it has never used before (or not for a million years) that has lain idle, and who is to say it is not entitled to do so? As such, a relatively new protein is made, and by many distinguished professors’ reckoning it could still be the old gene, since it uses the new exon in combination with the old promoter and some fraction of its ‘usual’ exons (after all, the new exon might be 0.01% of the final protein, or even 0% if it is used only for regulation, or 99.9% of it). Yet if it has a different function, why call it the same gene? Perhaps a gene, in the sense of the 30,000-odd gene estimates, would best be defined by its promoter, although defining promoters is just as debatable, partly due to the plasticity of the requirements of the RNA polymerases (promoter co-factors etc.), and the fact that there are many pseudo-genes which object to the label pseudo!!

quote greatmicrobiologist:

Hope I’m correct. Am I? 💡

No, you weren’t. You must be human.
