Global Change Master Directory (GCMD)
Metagenomics of Antarctic Lakes: a Model for Defining Microbial Biogeochemical Processes in the Cold
Entry ID: ASAC_2899

Abstract: Metadata record for data from ASAC Project 2899
See the link below for public details on this project.

We conducted a genomic analysis of Archaea and Bacteria collected from lakes in the Vestfold Hills, Antarctica. This provided a new level of understanding about the life forms inhabiting these cold lakes. Linked to the meteorological, geological, chemical and physical data that have been collected over years of previous research, the new genomic data will generate a complete understanding of how the microorganisms have evolved, how they have transformed the Antarctic environment, and how they presently interact with it. Deriving an integrated understanding of microbial ecology is essential for determining ways of preserving the health of the world's ecosystems.

The data are available for download as an Excel spreadsheet and a Word document from the URL given below.

The GPS coordinates of the sampling sites are as follows:

(Note: these are Universal Transverse Mercator (UTM) coordinates in zone 44D.)

Ace Lake: 44D 0384881 (easting), 2401821 (northing)
Deep Lake: 44D 0385351, 2391772
Organic Lake: 44D 0384928, 2403550
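The sites are given only as UTM coordinates, so converting them to latitude and longitude makes them easier to compare with the geographic coverage listed further down. The following is a minimal sketch. It assumes the WGS84 datum, which the record does not state, and applies the southern-hemisphere false northing of 10,000,000 m. It uses the standard inverse transverse Mercator series from Snyder's *Map Projections: A Working Manual*. In practice a library such as pyproj would normally be used instead.

```python
import math

# WGS84 ellipsoid: an assumption, since the record does not state a datum.
A = 6378137.0                 # semi-major axis (m)
F = 1 / 298.257223563         # flattening
E2 = F * (2 - F)              # first eccentricity squared
EP2 = E2 / (1 - E2)           # second eccentricity squared
K0 = 0.9996                   # UTM scale factor on the central meridian


def utm_to_latlon(zone, easting, northing, southern=True):
    """Inverse transverse Mercator (Snyder's series) for UTM; returns (lat, lon) in degrees."""
    x = easting - 500_000.0                                  # remove false easting
    y = northing - (10_000_000.0 if southern else 0.0)       # remove false northing
    lon0 = math.radians(zone * 6 - 183)                      # central meridian of the zone

    # Footpoint latitude from the meridional distance.
    mu = (y / K0) / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    c1 = EP2 * cos1**2
    t1 = tan1**2
    n1 = A / math.sqrt(1 - E2 * sin1**2)                    # prime-vertical radius
    r1 = A * (1 - E2) / (1 - E2 * sin1**2) ** 1.5           # meridional radius
    d = x / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2) * d**6 / 720)
    lon = lon0 + (d
                  - (1 + 2 * t1 + c1) * d**3 / 6
                  + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2) * d**5 / 120) / cos1
    return math.degrees(lat), math.degrees(lon)


# Sampling sites from this record (zone 44, southern hemisphere).
SITES = {"Ace Lake": (384881, 2401821),
         "Deep Lake": (385351, 2391772),
         "Organic Lake": (384928, 2403550)}

for name, (e, n) in SITES.items():
    lat, lon = utm_to_latlon(44, e, n)
    print(f"{name}: {lat:.4f}, {lon:.4f}")
```

The series is accurate to well below a metre this close to the central meridian. That is far finer than any plausible ambiguity in the datum.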

The fields in this dataset are:

Water temperature - degrees Celsius

Specific conductivity (conductivity corrected to 25 °C) - microsiemens per centimetre (µS/cm)

Conductivity - microsiemens per centimetre (µS/cm)

Salinity - parts per thousand (ppt)

Dissolved oxygen % (percent saturation) - %

Dissolved oxygen concentration - milligrams per litre

Dissolved oxygen charge - An engineering value (unitless). The recommended reading is 50 ± 25. A low reading generally means the membrane needs replacing; a high reading means the probe needs reconditioning.

PressureA (the sonde's depth reading, expressed as absolute pressure) - pounds-force per square inch absolute (psia)

Water depth - metres


pHmV (the millivolt reading that the pH probe outputs to the sonde) - millivolts

Turbidity - nephelometric turbidity units (NTU)

BP (Barometric Air Pressure) - psi (pounds per square inch)
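The record does not say how the sonde derives its secondary fields. The sketch below shows three relationships implied by the field list, each under a stated assumption:

- A QC flag for the dissolved oxygen charge, using the 50 ± 25 window given above.
- Depth computed from the difference between PressureA and BP. This assumes a uniform water density of 1000 kg/m³, a freshwater placeholder; saline water is denser.
- Linear temperature compensation of conductivity to 25 °C. This assumes a coefficient of 1.91 %/°C, a commonly used default.

None of this is claimed to be the YSI 6600 firmware's exact algorithm.

```python
PSI_TO_PA = 6894.757   # pascals per pound-force per square inch
G = 9.80665            # standard gravity (m/s^2)


def do_charge_status(charge: float) -> str:
    """Flag the dissolved oxygen charge against the recommended 50 +/- 25 window."""
    if charge < 25:
        return "low: replace membrane"
    if charge > 75:
        return "high: recondition probe"
    return "ok"


def depth_from_pressure(pressure_a_psia: float, bp_psi: float,
                        density_kg_m3: float = 1000.0) -> float:
    """Hydrostatic depth (m): (absolute pressure - barometric pressure) / (rho * g).

    Assumes uniform water density; 1000 kg/m^3 is a freshwater placeholder.
    """
    gauge_pa = (pressure_a_psia - bp_psi) * PSI_TO_PA
    return gauge_pa / (density_kg_m3 * G)


def specific_conductance(cond_us_cm: float, temp_c: float,
                         alpha: float = 0.0191) -> float:
    """Linearly compensate conductivity (uS/cm) to 25 degC.

    alpha is the fractional change per degC; 0.0191 is an assumed default,
    not a value taken from this record.
    """
    return cond_us_cm / (1.0 + alpha * (temp_c - 25.0))


# Illustrative values only, not data from this record.
print(do_charge_status(48))
print(depth_from_pressure(16.2, 14.7))
print(specific_conductance(1000.0, 2.0))
```

In cold lake water, specific conductivity is noticeably higher than raw conductivity, because the compensation scales the reading up to what it would be at 25 °C.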

Taken from the 2008-2009 Progress Report:
Progress against objectives:
New lake and ocean samples, including additional opportunistic samples from Heard Island, were obtained from October to December 2008. All samples from 2006 onward are being processed for both DNA (metagenomics) and protein (proteomics). Extensive bioinformatic analyses have been performed on the metagenome data, and metaproteomics has also progressed well. Details of some of the progress are as follows:

In the reporting period, 1,064,488 Sanger sequencing reads were produced, of which 967,410 passed quality control. At an average read length of 700 bp, these provided 677 Mb of sequence data. The reads were produced in batches for each sample, and assembly statistics and phylogenetic profiles were generated after each batch was completed. Sample diversity then guided how much sequencing was allocated to each sample. A number of pragmatic software tools have been created to perform the analyses. For example, for one sample the whole-sample assembly was characterised on a per-scaffold basis by read depth, GC content, di-nucleotide frequency and tri-nucleotide frequency. These intrinsic properties then formed vectors in a feature space on which a self-organising map (SOM) clustering analysis was performed. The cluster comprising the most abundant species was isolated and its genes annotated. This cluster represented 9 contigs totalling 1.7 Mb, with 1683 predicted genes. For this sample, proteins were extracted and metaproteomics performed, yielding 3970 confidently matched peptides. These identified 504 proteins (at least 2 peptide matches per protein), covering about 30% of the 1683 predicted genes. In comparison, 170 proteins were identified by searching against the non-redundant database.
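The per-scaffold characterisation and SOM clustering described above can be sketched in pure Python. This is a minimal illustration, not the project's software tools. For each scaffold it builds a feature vector of read depth, GC fraction, and di- and tri-nucleotide frequencies. It then z-scores each feature and trains a small online self-organising map. The grid size and the learning-rate and neighbourhood schedules are arbitrary choices. Read depths would come from read mapping and are simply passed in here.

```python
import itertools
import math
import random


def kmer_freqs(seq, k):
    """Relative frequencies of all 4**k k-mers (lexicographic order); non-ACGT k-mers are skipped."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def scaffold_features(seq, depth):
    """Read depth, GC fraction, 16 dinucleotide and 64 trinucleotide frequencies (82 values)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return [float(depth), gc] + kmer_freqs(seq, 2) + kmer_freqs(seq, 3)


def standardise(vectors):
    """Z-score each feature so that read depth does not dominate the distances."""
    cols = list(zip(*vectors))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # constant column -> 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, sds)] for v in vectors]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, weights):
    """Index of the SOM node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, seed=0):
    """Online SOM with a Gaussian neighbourhood; learning rate and radius decay linearly."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]   # initialise nodes from random samples
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)          # keep a minimal neighbourhood
        for x in rng.sample(data, len(data)):           # visit samples in random order
            br, bc = grid[best_matching_unit(x, weights)]
            for i, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
    return weights


# Hypothetical usage, where `scaffolds` is a list of (sequence, read_depth) pairs:
# X = standardise([scaffold_features(s, d) for s, d in scaffolds])
# weights = train_som(X)
# nodes = [best_matching_unit(x, weights) for x in X]
```

Scaffolds that map to the same or neighbouring SOM nodes are candidates for the same bin. Drawing cluster boundaries on the trained map is a separate step and is not shown.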

In other metaproteomic analyses, samples from 4 lake depths yielded a total of 7,925 peptides, identifying 1015 proteins against the NCBI non-redundant protein database (matches to the annotated metagenome data have not yet been performed). To test detection limits and the accuracy of identifications using a metaproteomics approach, a simulated mixed-community study was performed using S. alaskensis and E. coli. This showed that cell numbers, protein abundance and cell volumes all affect the ability to detect the proteins of individual microorganisms within a population. The type and size of the database against which the metaproteomic dataset is searched (non-redundant versus an S. alaskensis + E. coli protein database) also affected protein detection. This work has been useful for optimising the parameters used for metaproteomics of the Antarctic samples.
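The identification rule reported above, at least two peptide matches per protein, can be expressed as a small filter. In this sketch the input format (a list of (peptide, protein) pairs) and the gene identifiers are hypothetical. Interpreting "two peptide matches" as two distinct peptide sequences is also an assumption. The coverage helper reproduces the roughly 30% figure: 504 of 1683 predicted genes.

```python
from collections import defaultdict


def confident_proteins(matches, min_peptides=2):
    """Keep proteins supported by at least `min_peptides` distinct peptide sequences.

    `matches` is an iterable of (peptide_sequence, protein_id) pairs
    (a hypothetical, simplified view of search-engine output).
    """
    support = defaultdict(set)
    for peptide, protein in matches:
        support[protein].add(peptide)      # a set, so repeated peptides count once
    return {prot: peps for prot, peps in support.items() if len(peps) >= min_peptides}


def proteome_coverage(n_identified, n_predicted):
    """Fraction of predicted genes with a confident protein identification."""
    return n_identified / n_predicted


matches = [("PEPTIDEK", "gene_001"), ("SAMPLEK", "gene_001"),
           ("PEPTIDEK", "gene_001"),   # same peptide again: still one distinct peptide
           ("TESTK", "gene_002")]      # single-peptide hit: rejected
print(sorted(confident_proteins(matches)))         # ['gene_001']
print(f"{proteome_coverage(504, 1683):.1%}")       # 29.9%
```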

An interesting eukaryotic virus that dominates the biomass of one of the samples is being analysed, with the present work focusing on classifying and characterising it. Transmission electron microscopy of the water sample revealed virus-like particles of approximately 150 nm. It was unclear from their morphology, however, whether they represented a single virus type or several. Two complementary metagenomic assembly approaches are being used to produce the most complete possible assembly of the large viral sequences. The first assembly strategy follows a conventional metagenomic workflow: the whole metagenomic dataset is assembled, and the resulting constructs are then binned taxonomically. An initial assembly has been constructed after determining the optimum acceptable degree of error. A high degree of assembly was evident, with the largest scaffold spanning 108 kb at 6× coverage. A BLASTx search of the five largest contigs (greater than 10 kb) produced two alignments to Major Capsid Protein (MCP) genes. One was to the short MCP gene of Chrysochromulina ericina virus (28% identity) and the other to the full MCP gene of Phaeocystis pouchetii virus (76% identity). Sequence flanking the full MCP gene corresponds to conserved hypothetical protein sequences from Ostreococcus virus 5 (45% identity) and Paramecium sp. Chlorella virus AR158 (39% identity). These large, deeply assembled contigs will be used to 'tune' the parameters to improve assembly of the entire metagenome. A preliminary attempt to bin the scaffolds from the initial assembly using tetranucleotide frequencies has not fully resolved into clear taxonomic clusters. A multi-dimensional binning approach is being developed that combines sequence coverage, GC content and nucleotide frequencies with the identification of marker genes. It will be applied once an optimum whole-metagenome assembly has been completed.
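Tetranucleotide-frequency binning, mentioned above, usually counts each 4-mer together with its reverse complement. This makes a scaffold's profile independent of which strand was assembled. The sketch below computes profiles under that common convention. This is an assumption, since the record does not say how the frequencies were computed. Scaffolds with similar profiles (small distances) are candidates for the same bin. For the multi-dimensional approach, coverage and GC content could be appended to each vector.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement; non-ACGT characters (e.g. N) are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


# 256 tetranucleotides collapse to 136 canonical forms
# (120 reverse-complement pairs plus 16 palindromes).
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})
INDEX = {k: i for i, k in enumerate(CANONICAL)}


def tetranucleotide_frequencies(seq):
    """Relative frequency of each canonical tetranucleotide in a scaffold (strand-independent)."""
    seq = seq.upper()
    counts = [0] * len(CANONICAL)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):            # skip windows containing N or other symbols
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def distance(u, v):
    """Euclidean distance between two TNF profiles; small values suggest a shared genome."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```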
Although the presence of conserved genes is a promising sign of accurate assembly, validating the scaffolds by comparison with sequenced virus genomes is uninformative. Viruses are extremely diverse and poorly represented in public databases. Instead, a second assembly strategy is underway that will conservatively extract and compile the viral sequence. Reads assigned to the large dsDNA viral clade in an initial MEGAN analysis were used in a preliminary round of assembly. This first assembly will be used as a reference to recruit further overlapping fragments. These will be combined in another round of assembly, extending the construct outward from the high-confidence 'seeds'. Cycles of recruitment and assembly will continue until the assembly reaches an end point. This new assembly method could potentially be used to produce confident assemblies of other species that have no sequenced representatives. Comparing this virus-specific assembly with the conventional metagenomic assembly will allow the fidelity of both processes to be evaluated.
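The recruit-and-extend cycle described above can be illustrated with a toy, exact-match version. This sketch is not the project's method: it assumes error-free reads, ignores repeats, and extends greedily with the first overlapping read it finds. A real pipeline would instead run a full assembler at each round. The function names and the 20-base minimum overlap are illustrative choices.

```python
import random


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`, if >= min_len; else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def recruit_and_extend(seed, reads, min_overlap=20, max_rounds=100):
    """Grow a contig from a seed by repeatedly recruiting reads that overlap its ends.

    Stops when a full pass recruits nothing new (the end point) or after max_rounds.
    Returns the contig and the indices of recruited reads.
    """
    contig, recruited = seed, set()
    for _ in range(max_rounds):
        changed = False
        for i, read in enumerate(reads):
            if i in recruited:
                continue
            if read in contig:                       # already covered by the contig
                pass
            elif contig in read:                     # read spans the whole contig
                contig = read
            elif (n := overlap(contig, read, min_overlap)):
                contig += read[n:]                   # extend the right end
            elif (n := overlap(read, contig, min_overlap)):
                contig = read[:-n] + contig          # extend the left end
            else:
                continue                             # no overlap yet; retry next round
            recruited.add(i)
            changed = True
        if not changed:
            break
    return contig, recruited


# Demo: 11 error-free 50-bp reads tiling a random 300-bp "genome" with 25-bp overlaps.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(300))
reads = [genome[i:i + 50] for i in range(0, 251, 25)]
contig, used = recruit_and_extend(genome[100:150], reads)
print(len(contig), contig == genome, len(used))
```

Each pass can only add reads that touch the current ends. The construct therefore grows outward from the seed over several rounds, mirroring the cycles of recruitment and assembly described in the report.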

Related URL
Link: View Related Information
Description: Download point for the dataset.

Link: View Related Information
Description: Public information for ASAC project 2899

Link: View Related Information
Description: Citation reference for this metadata record and dataset

Geographic Coverage
 N: -67.0 S: -68.0  E: 79.0  W: 77.0
 N: -53.11 S: -53.19  E: 73.59  W: 73.51

Data Set Citation
Dataset Creator: Cavicchioli, R.
Dataset Title: Metagenomics of Antarctic Lakes: a Model for Defining Microbial Biogeochemical Processes in the Cold
Dataset Series Name: CAASM Metadata
Dataset Release Date: 2016-08-25
Dataset Publisher: Australian Antarctic Data Centre
Online Resource:

Temporal Coverage
Start Date: 2006-10-01
Stop Date: 2009-03-31

Location Keywords

Science Keywords

FIELD SURVEYS

The data from Ace Lake were collected using a YSI Sonde 6600 Multifunction Probe on 20 December 2006.

Biomass was collected between 20 and 25 December. In addition to the Ace Lake samples, surface water from Organic and Deep lakes was collected for filtering. The biomass samples were collected using a Millipore filter system with three 293 mm filter disk holders fitted with sequential 3.0, 0.8 and 0.1 micrometre filters. For concentrated filtrate samples we used a Millipore Pellicon-2 Tangential Flow Filtration System with a Pellicon II Ultrafiltration Biomax 50 kD cassette. The sediment was sampled with an Ekman grab.
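Sequential filtration partitions the biomass by size. Each filter retains what the filters upstream of it let through, and anything smaller than 0.1 µm passes into the filtrate that is then concentrated by tangential flow filtration. The sketch below encodes that idealised picture and assumes perfectly sharp size cut-offs. It ignores filter clogging, cell deformability and other real-world effects. In this idealised view, the ~150 nm virus-like particles described in the progress report would be retained on the 0.1 µm filter.

```python
from typing import Optional

# Pore sizes (micrometres) of the sequential filters, in the order water passes through them.
FILTER_PORES_UM = (3.0, 0.8, 0.1)


def retaining_filter(diameter_um: float) -> Optional[float]:
    """Pore size of the first filter that would retain a particle of this diameter,
    or None if it passes all three filters into the filtrate.

    Idealised: assumes every filter retains exactly the particles at or above its pore size.
    """
    for pore in FILTER_PORES_UM:
        if diameter_um >= pore:
            return pore
    return None


for d in (5.0, 1.0, 0.15, 0.05):
    print(d, retaining_filter(d))
```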

Taken from the 2008-2009 Progress Report:
Variations to work plan or objectives:
Samples were opportunistically obtained from Heard Island. This will usefully broaden the scope of the project.

Difficulties affecting project:
The season was logistically challenging, owing both to the impact of the medical incident at Davis in October and to the unusually overcast and windy weather experienced at Davis during our field period (5 periods with winds exceeding 100 km/h). Despite these challenges, our field work was remarkably successful.

Access Constraints
These data are publicly available for download from the provided URL.

Use Constraints
This data set conforms to the PICCCBY Attribution License (

Please follow instructions listed in the citation reference provided at when using these data.

vestfold hills

Data Set Progress

Originating Center
Australian Antarctic Division

Data Center
Australian Antarctic Data Centre, Australia
Data Center URL:

Data Center Personnel
Phone: +61 3 6232 3244
Fax: +61 3 6232 3351
Email: metadata at
Contact Address:
Australian Antarctic Division
203 Channel Highway
City: Kingston
Province or State: Tasmania
Postal Code: 7050
Country: Australia

Distribution_Media: HTTP
Distribution_Size: 65 kb
Distribution_Format: Excel, Word
Fees: Free

Phone: +61 2 9385 3516
Email: r.cavicchioli at
Contact Address:
School of Microbiology and Immunology
University of New South Wales
City: Sydney
Province or State: NSW
Postal Code: 2052
Country: Australia

Allen, M., Lauro, F.M., Williams, T.J., Burg, D., Siddiqui, K.S., De Francisci, D., Chong, K.W.Y., Pilak, O., Chew, H.H., De Maere, M.Z., Ting, L., Katrib, M., Ng, C., Sowers, K.R., Galperin, M.Y., Anderson, I.J., Ivanova, N., Dalin, E., Martinez, M., Lapidus, A., Hauser, L., Land, M., Thomas, T. and Cavicchioli, R. (2009). The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold-adaptation. The ISME Journal, in press.

Cavicchioli, R. and Lauro, F. (2009). Effects of climate change on polar microbes. Microbiology Australia, 30, 2.
Extended Metadata Properties

Creation and Review Dates
DIF Creation Date: 2007-04-25
Last DIF Revision Date: 2017-08-23