IFieldExtractor

public interface IFieldExtractor

Provides methods for extracting fields from a document.

The following example demonstrates how to implement the interface for plain-text log files.

 ```

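 import java.io.BufferedReader;
 import java.io.File;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.InputStreamReader;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Files;
 import java.nio.file.Paths;
 import java.util.List;
 import java.util.stream.Collectors;
 
 import com.groupdocs.search.common.DocumentField;
 // IFieldExtractor must also be imported from the GroupDocs.Search library.
 
 public class LogExtractor implements IFieldExtractor {
     // Leading characters dropped from each line (assumed to be a fixed-width prefix such as a timestamp)
     private static final int PREFIX_LENGTH = 12;
     private final String[] extensions = new String[] { ".log" };
 
     @Override
     public final String[] getExtensions() { return extensions; }
 
     @Override
     public final DocumentField[] getFields(String filePath) {
         try {
             List<String> lines = Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
             return new DocumentField[] {
                 new DocumentField("FileName", new File(filePath).getAbsolutePath()),
                 new DocumentField("Content", extractContent(lines)),
             };
         } catch (IOException ex) {
             throw new RuntimeException(ex);
         }
     }
 
     @Override
     public final DocumentField[] getFields(InputStream stream) {
         // A stream carries no file name, so only the content field is returned.
         BufferedReader reader = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
         List<String> lines = reader.lines().collect(Collectors.toList());
         return new DocumentField[] { new DocumentField("Content", extractContent(lines)) };
     }
 
     private String extractContent(List<String> lines) {
         StringBuilder result = new StringBuilder();
         for (String line : lines) {
             // Lines no longer than the prefix have no content and are skipped.
             if (line.length() > PREFIX_LENGTH) {
                 result.append(line.substring(PREFIX_LENGTH)).append('\n');
             }
         }
         return result.toString();
     }
 }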
 
```
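
The extractor above drops the first 12 characters of every line, presumably a fixed-width prefix such as a timestamp. The sketch below isolates that step so its behavior on short lines is easy to check. The class name `PrefixStripper`, the method `stripPrefix`, and the sample line format are illustrative assumptions and not part of the GroupDocs.Search API.

```java
public class PrefixStripper {
    // Number of leading characters to drop; 12 matches the example extractor.
    public static final int PREFIX_LENGTH = 12;

    // Returns the line without its fixed-width prefix, or an empty string
    // when the line is too short to contain anything after the prefix.
    public static String stripPrefix(String line) {
        return line.length() > PREFIX_LENGTH ? line.substring(PREFIX_LENGTH) : "";
    }

    public static void main(String[] args) {
        // Assumed log format: a 12-character "yyyy-MM-dd: " prefix, then the message.
        System.out.println(stripPrefix("2024-01-15: Server started")); // prints "Server started"
        // A bare substring(12) would throw StringIndexOutOfBoundsException on this line.
        System.out.println(stripPrefix("short").isEmpty()); // prints "true"
    }
}
```

Guarding on the line length keeps a single malformed or blank line from aborting extraction of the whole file.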
 

The following example demonstrates how to use the custom extractor for indexing.

 ```

 String indexFolder = "c:\\MyIndex\\"; // Specify path to the index folder
 String documentsFolder = "c:\\MyDocuments\\"; // Specify path to a folder containing documents to search
 Index index = new Index(indexFolder); // Creating or loading an index
 index.getIndexSettings().getCustomExtractors().addItem(new LogExtractor()); // Adding custom text extractor to index settings
 index.add(documentsFolder); // Indexing documents from the specified folder
 
```
 

Methods

| Method | Description |
| --- | --- |
| getExtensions() | Gets the supported extensions. |
| getFields(String filePath) | Extracts all fields from the document at the specified file path. |
| getFields(InputStream stream) | Extracts all fields from the specified document stream. |

getExtensions()

public abstract String[] getExtensions()

Gets the supported extensions.

Returns: java.lang.String[] - The supported extensions.
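
The example extractor returns extensions with a leading dot (`".log"`). This page does not describe how the library matches a file against them, so the following is only a hypothetical helper for checking a file name against such an array in your own code. `ExtensionMatcher` and `matches` are illustrative names, and the case-insensitive comparison is an assumption.

```java
import java.util.Locale;

public class ExtensionMatcher {
    // Returns true when fileName ends with one of the given extensions,
    // ignoring case. Extensions are expected with a leading dot, e.g. ".log".
    public static boolean matches(String fileName, String[] extensions) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : extensions) {
            if (lower.endsWith(extension.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] extensions = { ".log" };
        System.out.println(matches("server.LOG", extensions)); // prints "true"
        System.out.println(matches("notes.txt", extensions));  // prints "false"
    }
}
```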

getFields(String filePath)

public abstract DocumentField[] getFields(String filePath)

Extracts all fields from the specified document.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| filePath | java.lang.String | The document file path. |

Returns: com.groupdocs.search.common.DocumentField[] - The extracted fields.

getFields(InputStream stream)

public abstract DocumentField[] getFields(InputStream stream)

Extracts all fields from the specified document.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| stream | java.io.InputStream | The document stream. |

Returns: com.groupdocs.search.common.DocumentField[] - The extracted fields.
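
A stream-based implementation cannot rely on a file path, so the text has to be read from the stream itself. The sketch below shows only that reading step, assuming UTF-8 encoded text. Wrapping the result in a `DocumentField` is left out so the snippet runs without the library. `StreamContentReader` and `readContent` are illustrative names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class StreamContentReader {
    // Reads the whole stream as UTF-8 text and joins the lines with '\n'.
    // The stream is left open: this page does not state whether the caller
    // or the extractor is responsible for closing it.
    public static String readContent(InputStream stream) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8));
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        byte[] bytes = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(readContent(new ByteArrayInputStream(bytes)));
    }
}
```

In an extractor, the returned string could be passed to `new DocumentField("Content", ...)` in the same way as in the file-path example.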