Commonly Used Document Review Terms
This glossary serves as a Document Review 101 guide to the basic terms involved in this portion of discovery. These key words and phrases will help you speak the same language as your account executives and project managers and give you a better understanding of their questions, recommendations and decisions.
Archival Data
Digital information an organization holds for long-term storage, which is not immediately available or directly accessible to the user of a computer. This information is stored on backup tapes or other removable media for record keeping and disaster recovery purposes.
Bandwidth
Your connection speed: the amount of information or data that can be sent over a network connection within a certain period of time. This is typically measured in bits per second (bps) or megabits per second (Mbps).
Example: If you were to download 100 MB worth of documents, you could expect it to take the following amount of time at these sample bandwidths:
| BANDWIDTH | CAPACITY | DOWNLOAD TIME |
| --- | --- | --- |
| 56 K (Modem) | 56,000 bps | 4 hours, 9 minutes and 39 seconds |
| 256 K (DSL) | 256,000 bps | 54 minutes and 36 seconds |
| T1, DS-1 | 1.544 Mbps | 9 minutes and 3 seconds |
| T3, DS-3 | 44.736 Mbps | 18 seconds |
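The download times in this example come from dividing the file size in bits by the bandwidth in bits per second. The following is a minimal sketch of that arithmetic. It assumes 1 MB = 1,048,576 bytes (the convention that reproduces the modem and T1 figures) and ignores protocol overhead and latency, so real transfers will be slower. The function names are illustrative only.

```python
def download_time_seconds(size_megabytes: float, bits_per_second: float) -> float:
    """Return the ideal transfer time, ignoring protocol overhead and latency."""
    # Assumption: 1 MB = 1,048,576 bytes, and 1 byte = 8 bits.
    size_bits = size_megabytes * 1024 * 1024 * 8
    return size_bits / bits_per_second


def format_duration(seconds: float) -> str:
    """Format a duration as 'H h M min S s', dropping any fraction of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


if __name__ == "__main__":
    for label, bps in [("56 K modem", 56_000), ("T1", 1_544_000)]:
        print(label, format_duration(download_time_seconds(100, bps)))
```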
Bates Number
A unique number assigned to each page of a document within a paper or near-paper production set. Bates numbering, named for the Bates automatic numbering machine patented in the 1890s, allows any page within a document review to be referenced or retrieved. (See Hash Value.)
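As a rough illustration, sequential page numbering can be sketched as below. The prefix "ABC" and the six-digit zero padding are assumptions chosen for the example; actual numbering formats are set by the parties or the review platform.

```python
def bates_numbers(prefix: str, start: int, page_count: int, width: int = 6) -> list[str]:
    """Return one zero-padded identifier per page, numbered consecutively.

    The prefix and width are hypothetical conventions used for illustration.
    """
    return [f"{prefix}{number:0{width}d}" for number in range(start, start + page_count)]


if __name__ == "__main__":
    # A three-page document that starts at page number 1 of the production.
    print(bates_numbers("ABC", 1, 3))
```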
Cloud Computing
In its simplest form, cloud computing describes services that are hosted over the internet and are generally pay-as-you-go. There are three main categories of cloud computing: 1) SaaS or Software as a Service, which is software offered by a third-party vendor that can be accessed over the internet; 2) PaaS or Platform as a Service, which allows the user to develop new applications that are hosted on the provider’s infrastructure and accessed through the internet; and 3) IaaS or Infrastructure as a Service, which can best be described as renting storage space or computing power through a secure internet connection. A cloud service can be either private or public. A private cloud provider sells its services only to a select group of users, while a public cloud provider sells its services to anyone over the internet.
Coding Documents
The process of capturing pertinent information from a document as it relates to your litigation. This is typically used to segregate documents into three categories: potentially privileged, potentially responsive or potentially non-responsive. Further issue codes specific to the matter can be captured at the same time.
De-duplication or De-duping
The process of comparing documents and other files based on their characteristics and removing duplicate or nearly identical records from the data set. De-duping keeps excess files out of review, thus reducing the overall discovery cost.
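One common way to find exact duplicates is to compare a hash of each file's contents. The sketch below assumes SHA-256 and an in-memory mapping of file names to contents; both are choices made for illustration, not a description of any particular review platform. It catches only byte-for-byte duplicates, so nearly identical records would need a separate similarity comparison.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the first file seen for each distinct content hash."""
    seen_hashes: set[str] = set()
    unique: dict[str, bytes] = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique[name] = content
    return unique


if __name__ == "__main__":
    collected = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
    print(list(deduplicate(collected)))
```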
Early Case Assessment (ECA)
Early Case Assessment (ECA) can have different meanings relative to where in the discovery lifecycle it is deployed. ECA as an investigational tool is a methodology or business process during which a party or potential party to a lawsuit evaluates the merits of the underlying matter. ECA in this setting would involve conducting a risk-benefit analysis, issuing legal holds and taking the necessary steps to preserve potentially relevant data (the Zubulake duty). Further down the discovery timeline, ECA deployment would involve the use of specialty software to help cull data collections so that fewer documents are processed for review and production. Collectively, the use of ECA as a business process and as a software solution can provide significant cost savings.
ESI - Electronically Stored Information
This is defined by the Federal Rules of Civil Procedure as computer-based information and other digitally stored data, including writings, drawings, graphs, charts, photographs, sound recordings, images, and other data or data compilations stored in any medium from which information can be obtained.
Extracted Text
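The underlying text of a native file. Because it is read directly from the file rather than recognized from an image, extracted text is generally more accurate than OCR and therefore provides more defensible search results. (See Optical Character Recognition.)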
Family Relationship
The various connections between two or more documents.
Family Range
The range of documents from the first Bates production number of the first page of the parent document through the last Bates production number of the last page of the last related document.
Parent/Child Relationship
The chain of documents that originate from one email or file folder. Note that a document may be both a parent and child depending on its relationship with surrounding documents. Example: In an email with a file attached, the email is the “parent” and the attached file is the “child.”
Sibling
A document or file that shares a common parent with another document or file. Example: In an email with two files attached, the email is the “parent” and the attachments are “siblings” of each other.
Orphan
A document that was created as a child and subsequently separated from its parent. Example: An emailed attachment without the email, or a faxed letter without its fax transmittal sheet.
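The family terms above can be modeled with a small data structure. The sketch below is one possible representation, not a standard schema: the `Document` fields, the integer Bates numbers and the sample IDs are all assumptions made for illustration, and it handles only one level of parent and child.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    doc_id: str
    bates_begin: int                 # Bates number of the first page
    bates_end: int                   # Bates number of the last page
    parent_id: Optional[str] = None  # None for a document with no parent


def family_range(parent_id: str, documents: list[Document]) -> tuple[int, int]:
    """Return (first, last) Bates numbers covering the parent and its children."""
    family = [d for d in documents if d.doc_id == parent_id or d.parent_id == parent_id]
    if not family:
        raise ValueError(f"no documents found for family {parent_id!r}")
    return min(d.bates_begin for d in family), max(d.bates_end for d in family)


def orphans(documents: list[Document]) -> list[str]:
    """Return IDs of children whose parent is missing from the set."""
    known_ids = {d.doc_id for d in documents}
    return [d.doc_id for d in documents
            if d.parent_id is not None and d.parent_id not in known_ids]


if __name__ == "__main__":
    docs = [
        Document("email-1", 1, 2),
        Document("attach-1", 3, 7, parent_id="email-1"),
        Document("attach-2", 8, 9, parent_id="email-1"),
        Document("attach-3", 10, 12, parent_id="email-9"),  # parent not collected
    ]
    print(family_range("email-1", docs), orphans(docs))
```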
Forensic Collection
The collection of ESI using special software and techniques to create a bit-by-bit image of a computer or other electronic device. Careful planning and strict protocols are critical during this process to avoid spoliation.
Hash Value
Hashing computes a value from the contents of an electronic file that is, for practical purposes, unique to that file. If the value computed later matches the one recorded earlier, this can demonstrate (or even authenticate) that the file has not been modified. It is like a digital fingerprint for ESI. Hash values have slowly started to replace Bates numbering during document productions. (See Bates Number.)
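A minimal sketch of the fingerprint idea follows. It assumes SHA-256 from Python's standard `hashlib` module; the algorithm actually used in a given matter may differ, and the helper names are illustrative.

```python
import hashlib


def file_hash(content: bytes) -> str:
    """Return the SHA-256 hash of the content as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


def is_unmodified(content: bytes, recorded_hash: str) -> bool:
    """Compare the current hash against the value recorded at collection."""
    return file_hash(content) == recorded_hash


if __name__ == "__main__":
    original = b"Quarterly report"
    recorded = file_hash(original)
    print(is_unmodified(original, recorded))              # same bytes, same hash
    print(is_unmodified(b"Quarterly report.", recorded))  # one added character
```

Even a one-character change produces a different hash, which is why a matching value is treated as evidence that the file is unchanged.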
Metadata
Commonly referred to as data about data, metadata comprises the various layers of information about a native file, such as author, date created or time sent. Think of this as the DNA code of a document. When document review software analyzes your files, it looks at the metadata that was automatically encoded into them when they were created.
Native File
A file saved in the format of its original application. Reviewing native files can save money on the per-page costs of traditional TIFF or PDF processing. Example: The native file of a PDF document that was originally created in Microsoft Word would have the .doc file extension.
Sample native files include:
- Microsoft Word (.doc)
- Microsoft PowerPoint (.ppt)
- Adobe Photoshop (.psd)
- Microsoft Excel (.xls)
- Microsoft Publisher (.pub)
- Adobe Illustrator (.ai)
Optical Character Recognition (OCR)
Technology used to scan static files and images to identify text. OCR software turns otherwise unsearchable letters, numbers and other characters into searchable, editable text. (See Extracted Text.)
Sampling
The process of pulling random documents from a data set and statistically testing them to assess the frequency or likelihood of relevant information. Sampling can be a critical step in establishing the validity of the search processes, determining which sets of data should be preserved and reviewed, forecasting the time it will take to perform a legal document review and subsequently projecting the cost of the review.
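The statistical test can be as simple as estimating what fraction of the data set is relevant from a random sample. The sketch below assumes a simple random sample and a 95% margin of error from the normal approximation, with no finite-population correction. The function name and the `is_relevant` callback (standing in for a reviewer's decision) are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def estimate_prevalence(
    documents: Sequence[str],
    is_relevant: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (estimated relevant fraction, 95% margin of error)."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(documents, sample_size)
    hits = sum(1 for doc in sample if is_relevant(doc))
    proportion = hits / sample_size
    # Normal approximation to the binomial; 1.96 is the 95% z-value.
    margin = 1.96 * math.sqrt(proportion * (1 - proportion) / sample_size)
    return proportion, margin


if __name__ == "__main__":
    data_set = ["relevant"] * 200 + ["other"] * 800
    fraction, margin = estimate_prevalence(data_set, lambda d: d == "relevant", 400)
    print(f"{fraction:.1%} relevant, plus or minus {margin:.1%}")
```

Multiplying the estimated fraction by the size of the full data set gives a rough count of relevant documents, which in turn feeds the time and cost forecasts mentioned above.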
Spoliation
The accidental or sometimes intentional destruction or alteration of evidence that may be relevant to a pending or reasonably anticipated lawsuit or investigation. In electronic discovery, spoliation concerns typically center on whether metadata is properly captured and preserved during collection and processing.
Static File
A fixed image of a native file, typically a TIFF or PDF, which reflects a picture of a native file at a particular point in time.
Sample static files include:
- Bitmap / BMP
A bitmap, or bitmap image, is a file format that stores color data for each pixel in an image without compressing it. Because the image is not compressed, the file size tends to be large. However, it is sometimes preferable to other image storage formats as it reproduces crisp, high-quality graphics. Files in this format have a .bmp extension.
- JPEG – Joint Photographic Experts Group
Named for the team that developed this standard, JPEG stores high-resolution photographs and other graphic images in a compressed format. Files in this format have a .jpg or .jpeg extension.
- PDF – Portable Document Format
A widely used format for sharing electronic documents. Documents are rarely created in PDF, but are instead saved in this format to preserve the image of what was created in another software program. PDF files may retain metadata from the native file. Files in this format have a .pdf extension.
- TIFF – Tagged Image File Format
A format in which scanned documents and electronic files are saved as images. Files in this format have a .tif or .tiff extension.