OMICS: A Journal of Integrative Biology 6, 115-121 (2002)
A Theoretical Limit to Coding Space in Chromosomes of Bacteria
Julius H. Jackson, Scott H. Harrison, & Patricia A. Herring
Theoretical & Computational Biology Group, Department of Microbiology & Molecular Genetics, Michigan State University, East Lansing, Michigan 48824
A mathematical model of cluster patterns for mapped genes with known
phenotypes in Escherichia coli predicted that functional genes may
account for a maximum of two-thirds of the total chromosomal space. The
corollary prediction was that one-third of the chromosome comprised noncoding
space. Open reading frame (ORF) analyses for 15 phylogenetically diverse
bacterial genomes and for 30 fully sequenced prokaryotic genomes supported
the gene cluster model prediction of a two-thirds tendency for coding space.
Our results suggest that only 3–4% of unassigned ORFs in E. coli
represent genes with potential phenotype and that ORFs marking novel genes
in prokaryotes are far fewer than previously thought.

Microbial & Comparative Genomics 5, 75-87 (2000)
Theoretical Indicators of Enzyme Reaction Specificity from Conserved Information in Amino Acid Side-Chains
Patricia A. Herring and Julius H. Jackson
Theoretical & Computational Biology Group, Department of Microbiology & Molecular Genetics, Michigan State University, East Lansing, Michigan 48824
Amino acid sequences for 11 acetohydroxy acid synthase (EC 4.1.3.18;
AHS) polypeptides with experimentally established activity were chosen
for computational comparisons to detect conserved local information associated
with reaction specificity for each sequence. Windowed analysis by
Pearson product moment cross-correlation of six amino acid side-chain properties
revealed locally conserved segments common to all proteins with AHS activity.
Seven information-segments were detected in the same arrangement in sequences
for the large subunit polypeptides of prokaryotes, and in the sequences
for single polypeptides of eukaryotic AHS. The information-segments
were numbered 1-7 according to sequential position, and sequence features
such as cofactor binding sites were defined for specific segments.
Extension of the information-segment analysis to seven other proteins of
the pyruvate decarboxylase superfamily permitted use of the content and
organization of information-segments to recognize four classes of enzyme
reaction specificity. Estimates of information entropy, based upon
a state space defined by reaction specificity, directly reflected the known
reaction complexity for all but one enzyme examined. Our data suggest
that development of information-segment models for enzyme superfamilies
may improve the accuracy of inferring protein activity from sequence.
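
The windowed comparison described in this abstract can be illustrated with a minimal sketch. It assumes two pre-aligned, equal-length sequences and uses a single property scale (Kyte-Doolittle hydropathy) as a stand-in for the six side-chain properties, which the abstract does not list. The function names and the window size are hypothetical and are not taken from the paper.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values. ASSUMPTION: used only as a stand-in
# property scale; the abstract does not name the six properties analyzed.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length number lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant window
    return sxy / sqrt(sxx * syy)


def windowed_correlation(seq_a, seq_b, window=15, scale=HYDROPATHY):
    """Correlate the property profiles of two aligned sequences, window by window.

    Returns one Pearson r per window start position (hypothetical helper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    profile_a = [scale[residue] for residue in seq_a]
    profile_b = [scale[residue] for residue in seq_b]
    return [
        pearson(profile_a[i:i + window], profile_b[i:i + window])
        for i in range(len(profile_a) - window + 1)
    ]
```

Runs of consecutive windows with high r across every pairwise comparison would be candidates for locally conserved segments. Repeating the scan for each property scale, and choosing a threshold for "conserved", are left open because the abstract does not specify them.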

Biochemical and Biophysical Research Communications 268, 289-292 (2000)
Vectors of Shannon Information from Fourier Signals Characterizing Base Periodicity in Genes and Genomes
Equal Symbol
Fourier Transforms (FTES), characterizing nucleotide periodicity,
comprise components of 5-D vectors that define base-repeat
properties of a genomic sequence. This report describes
a conversion of the FTES signals to a common platform
of Shannon information content to facilitate comparisons
of periodic data with other measures of information
for genes and genomes. The autocorrelation used to
compute the discrete FTES formed the basis to define
repeating bases in terms of conditional probabilities.
We derived a vector equation to express the Shannon
information content of a sequence in a way that preserves the distinct
specificity of base repeat patterns characterized by FTES vectors.
We suggest application of such information vectors
to study the structure of information in genes, chromosomes,
and genomes by (doi:10.1006/bbrc.2000.2112)

Journal of Biological Systems 6:49-70 (1998)
Characterization Of Base Periodicities In Protein-Coding Genes
J. H. Jackson1,6,7*, R. George1,2, H. O. Adeyemi1,3, M. A. Winrow1,6, P. A. Herring1,4,6, J. J. Caguiat1,6,7, C. F. Mulks6, R. Srikanth1,2, S. H. Harrison1,6,7 & R. E. Mickens1,5
1Theoretical & Computational Biology Group, Clark Atlanta University and Michigan State University; 2Department of Computer Science, 3Department of Mathematical Sciences, 4Department of Biological Sciences, and 5Department of Physics, Clark Atlanta University, Atlanta, GA 30314, USA; 6Department of Microbiology, and 7Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA.
A Fourier Transform of Equal Symbols (FTES) was applied as a spectral density analysis method to identify DNA bases that repeat at any frequency in selected protein-coding genes. The analysis especially focused on identification of bases responsible for the dominant signal at frequency f = 1/3 found in all protein-coding genes. The study included homologous sequences from two gene families and multiple unrelated sequences from single organisms. No signal pattern or spectrum specifically characterized either gene family. However, the patterns of bases comprising the signal at f = 1/3 suggested the presence of a genome-specific label for protein-coding genes from the same genome. Data suggest that three factors form the informational basis for the signal structure at f = 1/3: 1) codon base positional bias; 2) codon preference; and 3) codon arrangement. Quantitative measure of the contribution of each base to the period-3 signal suggests a basis to distinguish protein-coding genes from different organisms. Application of the FTES analysis characterized genes from Escherichia coli as different from the genes from Pseudomonas aeruginosa.
Preliminary analyses of genes from these and three other bacteria by artificial neural nets, using FTES parameters, support our suggestion that the period-3 informational structure contains labels for the genomic origins of protein-coding genes. FTES analysis alone or in combination with other informational measures may reveal pathways and processes of gene flow into and through natural systems of microbial cell populations.
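
The per-base contribution to the period-3 signal can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure: each base is turned into a 0/1 indicator sequence and the power of its discrete Fourier component at f = 1/3 is computed directly. The normalization by sequence length and the function names are hypothetical choices; the published FTES is computed from an autocorrelation and may be scaled differently.

```python
import cmath


def base_power(seq, base, freq):
    """Spectral power of one base's 0/1 indicator sequence at a frequency.

    ASSUMPTION: power is normalized by sequence length; the paper's FTES
    normalization is not given in the abstract.
    """
    total = sum(
        cmath.exp(-2j * cmath.pi * freq * n)
        for n, symbol in enumerate(seq)
        if symbol == base
    )
    return abs(total) ** 2 / len(seq)


def period3_contributions(seq):
    """Power at f = 1/3 for each of A, C, G, T (hypothetical helper)."""
    seq = seq.upper()
    return {base: base_power(seq, base, 1 / 3) for base in "ACGT"}
```

For a toy sequence such as "ATG" repeated 30 times, A, T, and G each sit at a fixed codon position and each contributes power of about 10, while C contributes none. A sequence whose bases fall evenly across all three codon positions gives power near zero for every base, which is how codon positional bias shows up in this kind of measure.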

Biochem Biophys Res Commun 1995 Feb 6; 207(1):48-54
Channeling behavior and activity models for Escherichia coli K-12 acetohydroxy acid synthases at physiological substrate levels.
Patricia A. Herring, Bette L. McKnight, & Julius H. Jackson
Department of Microbiology, Michigan State University, East Lansing 48824.
The channeling behavior of acetohydroxy acid synthases I and III (EC 4.1.3.18; AHAS) was studied by computer simulation of activities over a wide range of concentrations for the substrates pyruvate and 2-ketobutyrate. The ratios of reaction rates for both channels and three-dimensional plots of single-channel reaction rates versus substrate concentrations were introduced to compare the substrate channeling properties of the isozymes. Substrate ranges were identified in which AHAS I and III operated both channels, and in which they used only one. Kinetic constants were varied to simulate whether and how AHAS might be made channel-specific. Our study suggests that AHAS might be made channel-specific for acetolactate but not for acetohydroxybutyrate. We postulate specific physiological roles for AHAS I and III to support cell growth under conditions that vary the levels and balance of substrates.
PMID: 7857304, UI: 95160716

Biochimie 1993;75(9):759-765
A mechanism for valine-resistant growth of Escherichia coli K-12 supported by the valine-sensitive acetohydroxy acid synthase IV activity from ilvJ662.
Julius H. Jackson, Patricia A. Herring, Eric B. Patterson, & Joel M. Blatt
Department of Microbiology and Public Health, Michigan State University, East Lansing.
Acetohydroxy acid synthase (EC 4.1.3.18; AHAS) isozymes I and III are expressed in Escherichia coli strain K-12 and, when inhibited by L-valine, cannot support cell growth. AHAS IV, expressed from mutation at ilvJ662, exhibits valine-sensitivity similar to that of AHAS III, yet AHAS IV does support cell growth in valine minimal medium. Rate equations were derived for AHAS III and AHAS IV reaction in crude extracts and for partially purified AHAS IV. Values of kinetic constants in these equations were determined in order to model a probable reaction mechanism. Computer modeling of initial velocity reactions at physiological substrate concentrations simulated consequences of valine-inhibition and revealed that AHAS IV synthesized AHB at a maximal rate over four times faster than AHAS III under these conditions. The simulations predicted that cells depending upon AHAS III for growth in valine minimal medium would accumulate higher levels of 2-ketobutyrate than cells using AHAS IV. Experiments on growth inhibition by valine revealed more than a five-fold difference in 2-ketobutyrate accumulation, thus confirming these predictions. These data support the hypothesis that valine inhibition of growth is a consequence of 2-ketobutyrate accumulation to toxic levels. We propose that the valine-inhibited AHAS IV activity prevents growth inhibition by keeping 2-ketobutyrate accumulation to a lower level than resulting from AHAS III activity.
PMID: 8274527, UI: 94100267

Journal of Molecular Evolution 36:347-360 (1993)
Detection of Fundamental Principles and A Level of Order for Large-Scale Gene Clustering on the Escherichia coli Chromosome
Rufus M. Williamson1,2, Jack Hetherington3 and Julius H. Jackson1,4
1Department of Microbiology and Public Health, and 3Department of Physics and Astronomy,
The Escherichia coli K-12 genetic map was divided into intervals of equal length to count the number of genes per interval. Plots of genes per interval at four sets of interval lengths revealed large-scale clustering of genes, with the major clusters occurring at regularly spaced distances. Major gene cluster properties were analyzed at a scale of 100 intervals, wherein each interval corresponded to a genetic map unit length of one minute. In any major gene cluster, the highest gene concentration was observed at or near the midpoint interval, and the number of genes per interval was found to decline exponentially as a function of the linear distance from the midpoint or interval of peak gene concentration of that cluster. An autocorrelation analysis of gene content in first neighbor intervals throughout the chromosome revealed an ordered, first neighbor relationship by comparison to 2000 randomized interval versions of the chromosome. Attempts to simulate gene placement by a Gaussian model did not produce large-scale gene clustering in any way comparable to that observed on the chromosome. We propose that major gene clusters formed from smaller gene clusters, and the contemporary chromosome formed from fusion of homologous or heterologous major gene clusters.
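
The first-neighbor comparison against randomized chromosomes can be sketched as a permutation test. This is a minimal illustration under assumptions: the statistic is a lag-1 circular autocorrelation of genes-per-interval counts, and each randomized version is a shuffle of the same counts. The abstract does not give the exact statistic or randomization scheme, so the function names, the seed, and the one-sided p-value are hypothetical.

```python
import random


def lag1_autocorrelation(counts):
    """Lag-1 autocorrelation of interval counts on a circular map.

    ASSUMPTION: the last interval is treated as adjacent to the first,
    as on a circular chromosome.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts)
    if variance == 0:
        return 0.0  # no variation between intervals, nothing to correlate
    covariance = sum(
        (counts[i] - mean) * (counts[(i + 1) % n] - mean) for i in range(n)
    )
    return covariance / variance


def randomization_test(counts, n_shuffles=2000, seed=0):
    """Compare the observed statistic with shuffled versions of the map.

    Returns (observed, p), where p is the fraction of shuffles whose
    statistic is at least the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = lag1_autocorrelation(counts)
    shuffled = list(counts)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if lag1_autocorrelation(shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_shuffles + 1)
```

With 100 one-minute intervals as input, a strongly positive observed value combined with a small p would indicate that gene-rich intervals tend to neighbor other gene-rich intervals more often than chance placement of the same counts would produce.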