% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vcf2tree.R
\name{vcf2tree}
\alias{vcf2tree}
\title{Generate phylogenetic tree from samples of a VCF file}
\usage{
vcf2tree(inputFile, threads = 1, verbose = FALSE, bootstrap = 0)
}
\arguments{
\item{inputFile}{Path to the input VCF file (uncompressed or gzip-compressed).}

\item{threads}{Number of Java threads to use (default 1).}

\item{verbose}{Logical. If \code{TRUE}, enables verbose output from the Java backend (default \code{FALSE}).}

\item{bootstrap}{Number of bootstrap replicates to perform (default 0, no bootstrapping).}
}
\value{
A \code{\link[base]{character}} vector containing the generated
phylogenetic tree in Newick format.
}
\description{
This function calculates a distance matrix between the samples of a VCF file,
as in \code{\link[fastreeR]{vcf2dist}},
and performs hierarchical clustering on this distance matrix,
as in \code{\link[fastreeR]{dist2tree}}.
A phylogenetic tree is calculated with
the agglomerative Neighbor Joining method (complete linkage).
}
\details{
If the \code{bootstrap} parameter is set to a positive integer, the
Java backend performs streaming bootstrap sampling of variants for the
requested number of replicates. Bootstrap support values are encoded in
the returned Newick string at internal nodes (percent support across
replicates). Note that enabling bootstrapping increases runtime and
memory usage proportionally to the number of replicates.

Biallelic or multiallelic (maximum 7 alternate alleles) SNP and/or INDEL
variants are considered, phased or not. Some VCF encoding examples are:

    \itemize{
        \item heterozygous variants: \code{1/0}, \code{0/1}, \code{0/2},
        \code{1|0}, \code{0|1} or \code{0|2}
        \item variants homozygous for the reference allele: \code{0/0}
        or \code{0|0}
        \item variants homozygous for the first alternate allele: \code{1/1}
        or \code{1|1}
    }

If there are \code{n} samples and \code{m} variants, an \code{n x n}
zero-diagonal symmetric distance matrix is calculated.
The calculated cosine-type distance, \code{(1 - cosine_similarity) / 2},
lies in the range [0,1], where 0 means completely identical samples
(cosine is 1), 0.5 means perpendicular samples (cosine is 0)
and 1 means completely opposite samples (cosine is -1).
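
As a sketch of the formula above, assuming each sample is represented by a
numeric genotype vector (the symbols \eqn{x}, \eqn{y} and the index \eqn{k}
are illustrative, and the exact encoding used by the Java backend is not
specified here), the distance between two samples can be written as:

\deqn{d(x, y) = \frac{1 - \cos(x, y)}{2}, \qquad \cos(x, y) = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^2}\,\sqrt{\sum_{k} y_k^2}}}{d(x, y) = (1 - cos(x, y)) / 2, cos(x, y) = sum_k(x_k * y_k) / (sqrt(sum_k(x_k^2)) * sqrt(sum_k(y_k^2)))}

Because \eqn{\cos(x, y)}{cos(x, y)} lies in [-1,1], the distance
\eqn{d(x, y)} lies in [0,1], consistent with the three reference values
listed above.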

The calculation is performed by a Java backend implementation
that supports multi-core CPU utilization
and can be demanding in terms of memory resources.
By default, a JVM is launched with a maximum memory allocation of 512 MB.
When this amount is not sufficient,
the user needs to reserve additional memory,
before loading the package,
by updating the value of the \code{java.parameters} option.
For example, to allocate 4 GB of RAM,
the user needs to issue \code{options(java.parameters = "-Xmx4g")}
before \code{library(fastreeR)}.
}
\examples{
my.tree <- vcf2tree(
    inputFile = system.file("extdata", "samples.vcf.gz",
        package = "fastreeR"
    )
)
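
# A further sketch, assuming the installed fastreeR version provides the
# threads and bootstrap arguments shown in the usage section above.
# The replicate count of 10 is an arbitrary illustrative value.
boot.tree <- vcf2tree(
    inputFile = system.file("extdata", "samples.vcf.gz",
        package = "fastreeR"
    ),
    threads = 1,
    bootstrap = 10
)

# The returned Newick string can be parsed with ape::read.tree.
# Assumption: ape is an optional package and may not be installed,
# so the call is guarded.
if (requireNamespace("ape", quietly = TRUE)) {
    boot.phylo <- ape::read.tree(text = boot.tree)
    print(boot.phylo)
}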
}
\references{
Java implementation:
\url{https://github.com/gkanogiannis/BioInfoJava-Utils}
}
\author{
Anestis Gkanogiannis, \email{anestis@gkanogiannis.com}
}
