Friday, August 18, 2017

GPG key administration for Middleware Administrators

GnuPG is a tool for secure communication. With it, administrators can create a public/private key pair that can be used to encrypt and decrypt critical information. GnuPG uses public-key cryptography so that users may communicate securely. In a public-key system, each user has a pair of keys consisting of a private key and a public key. A user's private key is kept secret; it need never be revealed. The public key may be given to anyone with whom the user wants to communicate.



In this article I intend to provide some basic steps to create and use such key pairs, along with some useful links on the topic. Often you may receive requests to generate the key pairs required for PGP encryption. How do you do that?

OK, first things first. To create the key pairs you need to install GnuPG on the system where you intend to encrypt the message. You can also install GnuPG on your local machine, but then you will need to import the key on the source system where the message will be encrypted. Once installed, GnuPG keeps a key ring made up of the files below (file names as used by GnuPG 1.x and 2.0; GnuPG 2.1 and later use pubring.kbx and a private-keys-v1.d directory instead).


  • pubring.gpg     # stores the public keys
  • secring.gpg     # has your secret keys
  • trustdb.gpg     # the levels of trust for signed keys


On Windows the key ring is stored at C:\Users\<>\AppData\Roaming\gnupg. On Linux-based systems the default location is .gnupg (a hidden directory) under the OS user's home directory. Use ls -la to view this directory on Linux/Solaris systems.

For installing the GnuPG tool I highly recommend you follow this blog. It is very detailed and takes you through the topics below.


  • Downloading the software
  • Installing the software
  • Generating the key pair
  • Exporting Public/Private keys
  • Obtaining private keys
  • Importing private/public keys
  • Encrypting/Decrypting message


The blog shows how to do the above using Kleopatra, a GUI tool for certificate management. I personally like the tool since it is very intuitive and easy to use. It is installed along with GnuPG (as described in the blog above). If you want to use the command line instead (for example on UNIX-based systems), you can easily do so with the commands mentioned here.

The GNU Privacy Handbook covers all these steps and more. I highly recommend you go through it to understand more on this subject. The documentation is available here.

I hope this post was helpful. Please let me know in the comment section!
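For readers who prefer the command line, the steps listed above can be sketched with the gpg CLI. This is only a minimal sketch and not the content of the linked page: the function names, the recipient address and the file names are hypothetical, and setting DRY_RUN=1 prints each command instead of running it so you can review it first.

```shell
# Minimal sketch of the key-management steps using the gpg command line.
# Function names, user IDs and file names are hypothetical placeholders.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Generate a key pair interactively (gpg prompts for key type, size, name and passphrase).
generate_key_pair() { run gpg --gen-key; }

# Export the public key of user ID $1 to file $2 in ASCII-armored form.
export_public_key() { run gpg --armor --output "$2" --export "$1"; }

# Export the private key of user ID $1 to file $2 (handle this file with care).
export_private_key() { run gpg --armor --output "$2" --export-secret-keys "$1"; }

# Import a public or private key file into the local key ring.
import_key() { run gpg --import "$1"; }

# Encrypt file $2 for recipient $1; the result is written to "$2.gpg".
encrypt_file() { run gpg --output "$2.gpg" --encrypt --recipient "$1" "$2"; }

# Decrypt file $1 and write the plain text to file $2.
decrypt_file() { run gpg --output "$2" --decrypt "$1"; }
```

For example, `DRY_RUN=1 encrypt_file partner@example.com report.txt` only prints the gpg command it would run.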

Happy Learning
Soumya Mishra

Tuesday, August 8, 2017

Received fatal alert: handshake_failure error while making outbound connection with TLS version v1.2 [TLSv1.2] using Java 1.7.x

The issue described below affects the following versions of SOA/WebLogic:

  • Oracle SOA Suite - Version 11.1.1.6.0 to 11.1.1.9.0 [Release 11gR1 to 11g]
  • Oracle WebLogic Server - Version 10.3.6 and later


While trying to make an outbound connection using TLS 1.2 from SOA 11.1.1.7 running on WLS 10.3.6/JDK 1.7U80, as per my article here, one gets the error below.

"javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure"

As per the article and the Oracle notes, the startup argument below should enable the TLS 1.2 protocol for outbound connections. It enables TLS 1.0 through TLS 1.2 for outbound connections made from the server.

-Dhttps.protocols="TLSv1,TLSv1.1,TLSv1.2"

However, there is a bug that the system will in most cases run into: bug 22612527 may cause the JVM to ignore the flag above.

How do we solve this?

Option A:

Install Patch 22612527 (please note: Patch 13866584 is a prerequisite and must be installed before 22612527).

After installing the patch, add -Dhttps.protocols="TLSv1.2" to the SOA JVM startup arguments and test whether the issue is gone. This approach is recommended if you do not want to upgrade the JDK and retest the code.
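One way to add the flag is sketched below. It assumes your startup scripts pick up the JAVA_OPTIONS environment variable; the helper name add_java_option is hypothetical. It appends the flag only when it is not already present, so running it twice does not duplicate the argument.

```shell
# Append a JVM flag to JAVA_OPTIONS only if it is not already present.
# add_java_option is a hypothetical helper name used for illustration.
add_java_option() {
    case " ${JAVA_OPTIONS:-} " in
        *" $1 "*) ;;   # flag already present, nothing to do
        *) JAVA_OPTIONS="${JAVA_OPTIONS:+$JAVA_OPTIONS }$1" ;;
    esac
    export JAVA_OPTIONS
}

# Restrict outbound HTTPS connections to TLS 1.2 (after the patch is installed).
add_java_option "-Dhttps.protocols=TLSv1.2"
```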

Option B:

Upgrade Java to version 1.8, or to 1.7 update 131 b12 or later. Both of these versions use TLSv1.2 by default.

Voila, you just solved a critical issue and made your integration server more secure! Please let me know in the comment section if this article helped you in any way.
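To see quickly whether a given JDK falls on the right side of that line, a small check like the one below may help. It is a sketch under assumptions: the function names are hypothetical, the version string is expected in the classic "1.7.0_80" form printed by java -version, and only the update number is compared (the build number, b12, is ignored).

```shell
# Succeed (exit status 0) if the Java version string is 1.8 or later, or 1.7 update 131 or later.
# Accepts strings such as "1.7.0_80", "1.7.0_131-b12" or "1.8.0_144".
# Note: the build number (e.g. b12) is not compared in this sketch.
jdk_defaults_to_tls12() {
    version=$1
    minor=${version#1.}     # "1.7.0_80" -> "7.0_80"
    minor=${minor%%.*}      # "7.0_80"   -> "7"
    case $version in
        *_*) update=${version#*_}; update=${update%%-*} ;;   # "80" or "131"
        *)   update=0 ;;
    esac
    if [ "$minor" -ge 8 ]; then return 0; fi
    if [ "$minor" -eq 7 ] && [ "$update" -ge 131 ]; then return 0; fi
    return 1
}

# Print the version string reported by the java binary on the PATH.
current_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

Usage: `jdk_defaults_to_tls12 "$(current_java_version)" && echo "TLSv1.2 is the default"`.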

Happy Learning ...

Soumya Mishra

Saturday, July 29, 2017

Disabling SSL 2.0/3.0 and enabling TLS 1.0 or higher in Weblogic 10.3.6 & JDK 7

SSL in Weblogic
Secure Sockets Layer (SSL) provides secure connections by allowing two applications connecting over a network to authenticate each other's identity and by encrypting the data exchanged between the applications.
SSL in WebLogic Server 10.3.6 is an implementation of the SSL 3.0 and Transport Layer Security (TLS) 1.0 specifications.
Certicom is currently the default SSL implementation in Weblogic Server.
JSSE may be enabled as an alternative SSL implementation.



So why disable SSL 2.0/3.0?
Oracle WebLogic Server should be configured to exclude SSL 2.0 and SSL 3.0 in order to mitigate the POODLE vulnerability. This often comes as a direction from security teams. The recommended protocols are TLS 1.0 and, preferably, the more recent TLS 1.2.

What protocol to use if not SSL 2.0/3.0?
Transport Layer Security (TLS) 1.0, and preferably 1.1 or 1.2.
Key Question 1 - What configuration changes would a Middleware Admin make to disable SSL V2/V3 support on WebLogic and enable the TLS protocols, which are safer and recommended by Oracle?
Key Question 2 - How do we enable WebLogic to use TLS 1.0 and above for inbound and outbound connections?

Let’s try and answer it.

Assuming the WebLogic version is 10.3.6 and the JDK is 1.7 or later, below are some facts to know and consider before making changes.
  • Before 10.3.3 (11g), Certicom SSL was the only SSL implementation.
  • In 10.3.3 through 10.3.6 (11g), Certicom SSL is the default SSL implementation, with JSSE available by enabling a property switch.
  • TLS 1.1 and 1.2 are supported with a combination of JDK 7 Update 1 (or later) and JSSE enabled.
  • TLS 1.0 is supported on all releases using either the Certicom or the JSSE implementation.
  • WebLogic Server versions 10.3.6 and 12.1.1 and later are certified with JDK 7 in order to enable JSSE and TLS 1.1/1.2.

Inbound
-Dweblogic.security.SSL.protocolVersion=TLS1
The interpretation of this property differs depending on whether the Certicom or the JSSE implementation is used.
  • For Certicom, setting -Dweblogic.security.SSL.protocolVersion=TLS1 enables only TLS 1.0.
  • For JSSE, setting -Dweblogic.security.SSL.protocolVersion=TLS1 enables any protocol starting with "TLS", for example TLS 1.0, TLS 1.1, and TLS 1.2.
You may also disable older protocols by configuring a higher minimum protocol. For example, to allow only TLS 1.1 and 1.2 (if supported by the JDK version), use the following as a JAVA_OPTION:
      -Dweblogic.security.SSL.minimumProtocolVersion=TLSv1.1

Outbound
To enable TLS 1.2 for outbound connections, use the following -D flag:
       -Dhttps.protocols="TLSv1.2"
Or as a list of choices (handshake is first attempted at the highest level protocol):
       -Dhttps.protocols="TLSv1,TLSv1.1,TLSv1.2"
You may also disable older protocols by configuring a higher minimum protocol.
      Add -Dweblogic.security.SSL.minimumProtocolVersion=TLSv1.2
You may also have applications running as a client (e.g. web services, scripts, or command-line tools) that make outbound SSL connections. Within a Fusion Middleware environment there are also internal processes that make SSL connections (e.g. OPMN, DMS, EM/FMW Control). To control these outbound connections the following system property is available:

       -Djdk.tls.client.protocols=TLSv1,TLSv1.1,TLSv1.2
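After changing the inbound settings it helps to check which protocol versions the listener actually accepts. The sketch below uses openssl s_client for this, which the post itself does not mention; it is my assumption that openssl is installed on the machine you test from, and the function names are hypothetical. Note that "rejected" is also printed when the host is unreachable or when the local openssl build lacks the requested protocol switch (many builds no longer include -ssl3).

```shell
# Probe whether a TLS/SSL listener accepts a given protocol version.
# Usage: probe_protocol HOST PORT FLAG, where FLAG is an openssl s_client
# protocol switch such as -ssl3, -tls1, -tls1_1 or -tls1_2.
probe_protocol() {
    if openssl s_client -connect "$1:$2" "$3" </dev/null >/dev/null 2>&1; then
        echo "accepted"
    else
        echo "rejected"
    fi
}

# Probe the common protocol versions against HOST ($1) and PORT ($2).
probe_all() {
    for flag in -ssl3 -tls1 -tls1_1 -tls1_2; do
        printf '%s %s\n' "$flag" "$(probe_protocol "$1" "$2" "$flag")"
    done
}
```

Usage: `probe_all myhost.example.com 7002`, where the host name and port are placeholders for your own SSL listen address.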

Hope this post has been helpful to you. I have tried answering the two questions I put to start our discussion.
For understanding the changes needed for disabling SSL V2/3 or enabling TLS on Weblogic 12C/JDK8 please refer here

Made the above changes and still not able to connect using TLS 1.2? Do you get the error below?

"javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure"

If so, please refer to the solution here.
Please feel free to ask any questions you may have in the comment section. Keep learning and spread the word! 


Soumya

TLS Support on Weblogic 12C & JDK8


SSL in Weblogic
Secure Sockets Layer (SSL) provides secure connections by allowing two applications connecting over a network to authenticate each other's identity and by encrypting the data exchanged between the applications.
SSL in WebLogic Server 12.1/12.2 is an implementation of the Transport Layer Security (TLS) 1.2 specifications (backward compatible hence 1.0 & 1.1 supported)
JSSE is currently the default SSL implementation in WebLogic Server. (Certicom is deprecated; it was supported in WebLogic 10.3.6.)


So why are we having this discussion?
Oracle WebLogic Server should be configured to exclude SSL 2.0 and SSL 3.0 in order to mitigate the POODLE vulnerability. This often comes as a direction from security teams. On WebLogic 10.3.6 and JDK 7 installs, configuration changes were needed to exclude these protocols.
So are any such configuration changes needed for WebLogic 12c (12.1 and 12.2) installed with JDK 8? For WebLogic 10.3.6 and JDK 1.7 please refer to my post here.

So what’s the answer?

The answer is NO. Let's look at this in a bit more detail, covering both inbound and outbound connections.

Inbound
  • JDK 8 will use TLS 1.2 as default (No external setting needed) 
  • Supports TLS 1.0/1.1 as well – (backward compatible)
  • You may also disable older protocols by configuring a higher minimum protocol. For example, to allow only TLS 1.1 and 1.2 (if supported by the JDK version), use the following as a JAVA_OPTION:
         -Dweblogic.security.SSL.minimumProtocolVersion=TLSv1.1 
Outbound
  • By default JDK 8 allows TLS 1.0, 1.1 and 1.2.
  • You may raise the minimum by removing the older versions, but it is important to consider the external servers the application connects to.
  • The protocol will always be negotiated to the highest level supported by both the client and the server.
  • For example, to stop supporting TLS 1.0, list only the newer versions as shown below.
         -Djdk.tls.client.protocols=TLSv1.1,TLSv1.2
So in short, if your WebLogic/JDK versions are 12.2.1/1.8, the default SSL implementation is JSSE and the default TLS version is TLS 1.2. TLS 1.0/1.1 are also supported (for backward compatibility).
Hence, unlike WebLogic 10.3.6/JDK 1.7, no extra Java parameters are needed to disable SSL V2/V3. Use the parameters shown above only if you want to restrict certain older TLS versions. The 12c/JDK 1.8 install supports all TLS versions (1.0 to 1.2), and the protocol will always be negotiated to the highest level supported by both the client and the server.
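If you do decide to drop older versions for outbound connections, the value of -Djdk.tls.client.protocols is simply the list of versions you still want. The sketch below builds that list from a chosen minimum; the function name client_protocols_from is hypothetical, and the protocol names are the ones used in the flags above.

```shell
# Print the comma-separated protocol list for -Djdk.tls.client.protocols,
# keeping only the versions at or above the requested minimum ($1).
client_protocols_from() {
    list=""
    keep=0
    for p in TLSv1 TLSv1.1 TLSv1.2; do
        if [ "$p" = "$1" ]; then keep=1; fi
        if [ "$keep" -eq 1 ]; then
            list="${list:+$list,}$p"
        fi
    done
    printf '%s\n' "$list"
}
```

Usage, assuming your startup scripts honour JAVA_OPTIONS: `export JAVA_OPTIONS="${JAVA_OPTIONS} -Djdk.tls.client.protocols=$(client_protocols_from TLSv1.1)"`.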

Please feel free to ask any questions you may have in the comment section. Keep learning and spread the word! 

Soumya


Thursday, July 13, 2017

Connection Testing cannot be enabled for a pool when the ManagedConnectionFactory does not implement ValidatingManagedConnectionFactory




Applies To: Oracle SOA Suite/Weblogic 10g,11g,12C

I was trying to set the JCA adapter connection pool properties below. The values I was setting are shown.

Test Frequency Seconds: 300 (Default 0)
Test Connections On Reserve: True (Default False)

I was setting these for an Oracle MQ Series adapter connection pool. The customer reported that the MQ adapter stopped picking up messages from WebSphere MQ after the MQ system went down for maintenance (e.g. queue manager unavailable). Even after the MQ system was confirmed to be up and running, the connecting SOA interface/composite had to be restarted before it picked up messages again. From my past experience with data source connection pools, I thought I could fix this by forcing the adapter to validate its connections to the MQ system periodically, which the connection pool parameters above should achieve. Ideally, these settings mean that the pool checks the MQ connections every 300 seconds and hands only valid connections to the application when it reserves one. In theory I had it all figured out, until I applied the settings and the logs reported the issue below!


weblogic.management.DeploymentException: Connection Testing cannot be enabled for a pool when the ManagedConnectionFactory does not implement ValidatingManagedConnectionFactory. The following invalid settings have been detected:test-frequency-seconds is set to non-zero
   [Connector:199167]test-connections-on-reserve is set to true


The adapter would not come up until I reverted my settings! So what happened?

On further research (googling, of course) I found that the application server can test the validity of existing connections only if the resource adapter's ManagedConnectionFactory implements the ValidatingManagedConnectionFactory interface. Refer to the Testing Connections section here.

As per Oracle Metalink Doc ID 957853.1, the Oracle MQ adapter does not implement this interface, so, as the error in the logs says, connection validity can't be tested. Enabling connection test settings such as the ones below for the MQ Series adapter therefore won't work, and you will get the error above.

Test Connections On Create:
Test Connections On Reserve:
Test Connections On Release:

So the question remains: is there a way to work around this known limitation? I think it is possible. Oracle Metalink Doc ID 957853.1 tracks this issue as Bug 8913481 and Bug 8918056, and prescribes applying patch 8918056 (MQAdapter fix). However, that patch applies to an older version of BPEL (10.1.3.4).

I would encourage you to present this analysis to Oracle and request a patch for these bugs for newer versions of Oracle SOA Suite/Middleware such as 11g/12c. I hit the issue on Oracle SOA Suite version 11.1.1.7.8; when I reached out to Oracle, they asked me to apply the patch for bug 21689260.

On a side note, this issue is also relevant for other Oracle adapters available with SOA Suite. As per Oracle Metalink Doc ID 1282064.1, testing the Apps adapter and DB adapter connection factories via the WebLogic console is not supported, since the ManagedConnectionFactory of both adapters does not implement ValidatingManagedConnectionFactory. This is due to performance considerations (during runtime) and to the fact that the DB/Apps JCA connection is just a shallow wrapper around a data source connection handle.

Soumya Mishra



Thursday, June 29, 2017

JMS Store declared unhealthy and unavailable: start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

So what happened? 

The SOA JMS store (handled as an XA resource by WLS) became unavailable (declared unhealthy) for 30 minutes. During this time the logs were full of the error below.

start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

After 30 minutes the persistent store became available again. The JVM was then flooded with the backlog of pending messages, which resulted in a Full GC condition. Full GCs are stop-the-world events and rendered the JVM inaccessible to applications. To make the server available again, we had to restart the SOA server.

This error applies to Oracle SOA Suite 11g/12c.

What were the errors in the logs?

The JTA health state has changed from HEALTH_OK to HEALTH_WARN with reason codes: Resource WLStore_XXX_base_domain_SOAJMSFileStore declared unhealthy

start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

Exception occured when binding was invoked.
Exception occured during invocation of JCA binding: "JCA Binding execute of Reference operation 'Produce_Message' failed due to: ERRJMS_PROVIDER_ERR.
ERRJMS_PROVIDER_ERR.
Unable to produce message due to JMS provider internal error.
Please examine the log file to determine the problem.
".
The invoked JCA adapter raised a resource exception.
Please examine the above error message carefully to determine a resolution.
" . Root cause :
javax.transaction.SystemException: start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable
javax.transaction.xa.XAException: Internal error: XAResource 'WLStore_XXX_base_domain_SOAJMSFileStore' is unavailable
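To get a feel for how often the error occurred and which stores were affected, the server log can be scanned for the XAER_RMFAIL entries shown above. This is a minimal sketch; the function names are hypothetical and the log file path is whatever your server writes to.

```shell
# Count the lines in log file $1 that contain XAER_RMFAIL.
count_rmfail() {
    grep -c "XAER_RMFAIL" "$1"
}

# List the distinct resource names reported as failed in log file $1.
affected_resources() {
    grep "XAER_RMFAIL" "$1" \
        | sed -n "s/.*failed on resource '\([^']*\)'.*/\1/p" \
        | sort -u
}
```

Usage: `count_rmfail /path/to/server.log` followed by `affected_resources /path/to/server.log`.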


So why was the persistent store declared unhealthy?

By default, if an XA resource that is participating in a global transaction fails to respond to an XA call from the WebLogic Server transaction manager, WebLogic Server flags the resource as unhealthy and unavailable, and blocks any further calls to the resource in an effort to preserve resource threads. The failure can be caused by either an unhealthy transaction or an unhealthy resource—there is no distinction between the two causes. In both cases, the resource is marked as unhealthy (Doc ID 1484996.1)

Here the JMS store (the XA resource) did not respond to a request from the WebLogic transaction manager within 120 seconds, the limit set by MaxXACallMillis. When this happened, the WLS transaction manager marked the XA resource as unhealthy and stopped all further communication with it until the time set by MaxResourceUnavailableMillis had passed, which is 30 minutes in a default install.

Q. Why did the persistent store go inaccessible for 30 minutes? 

A. MaxResourceUnavailableMillis defines the maximum duration (in milliseconds) that an XA resource is marked as unhealthy. It is set to 30 minutes by default. After this duration, the XA resource is declared available again.

Q. Why did the JMS store not respond to the transaction manager in time?

A. There could be various reasons:

1. As per Oracle Note 1358303.1, which describes the same error code we faced, the file store itself may have had an issue: it had grown very big, so it was showing as unhealthy and compromising the JTA health, since it is a participating resource in the transaction.

2. A minor network issue could have affected access between the server and the JMS store, which resides on disk. I could not see any network connectivity errors in the logs so far.

3. The JMS store could have been busy processing other transactions and needed more time to respond than MaxXACallMillis allows. Talk to the developers to understand the code design and how busy the JMS queues/topics are.

Q. What are the tuning recommendations to prevent this error/issue in future?

1. Lower the WLS domain parameter MaxResourceUnavailableMillis from the existing 30 minutes; I would start with 10 (this recommendation is as per Metalink Note 1320141.1). WLS will then retry the resource after 10 minutes instead of 30, reducing system downtime. Fewer messages will also queue up for processing while the store is unavailable in similar failures, and a smaller backlog keeps the server from going into the long GCs seen in the case above.

2. If you see this issue reappearing and anticipate a busy store, increase MaxXACallMillis to 3 minutes and check whether the issue recurs. This change gives the store more time to respond before it is declared unhealthy. Keep tuning this parameter until you see optimal performance in your environment, based on the application design and usage. No one size fits all, so come up with a number that works for your environment/application.

3. Compacting the file store defragments the space it occupies. The compact command does not delete current data, and works only when the WebLogic Server that hosts the store is offline. Make sure you back up the old store file before you run the compact command. Refer here to see how to run the compaction commands.

4. In most situations, file stores do not grow too large. After a message is consumed, it is deleted from the file store and the space it used is made available for other messages. However, if so many messages accumulate that the file store repeatedly grows too large, set lower quotas so that producers are blocked from sending more messages to the destination until the consumers have consumed and deleted messages. It is recommended to configure quotas on each JMS server; the quota can be set based on application requirements. I will try to discuss this at length in another post.
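Both JTA parameters discussed in recommendations 1 and 2 are expressed in milliseconds, so the minute values have to be converted before they are entered. A small worked example (the helper name is hypothetical):

```shell
# Convert a duration in minutes to milliseconds.
minutes_to_millis() {
    echo $(( $1 * 60 * 1000 ))
}

# Defaults described in this post:
#   MaxXACallMillis              = 2 minutes  (120 seconds)
#   MaxResourceUnavailableMillis = 30 minutes
# Values suggested as a starting point for tuning:
NEW_MAX_RESOURCE_UNAVAILABLE_MILLIS=$(minutes_to_millis 10)   # 600000
NEW_MAX_XA_CALL_MILLIS=$(minutes_to_millis 3)                 # 180000
```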


Please let me know in the comment section if the above tuning helped you. I will be glad to hear your stories and experiences. Happy learning!

Soumya Mishra 


Tuesday, August 23, 2016

Faster Weblogic Server Startup on Linux VM


Problem


A WebLogic server running on a Linux virtual machine appears stuck during startup. For instance, an Oracle SOA 12.2.1 WebLogic managed server may take up to 12 minutes to start on a Linux VM. A WebLogic 12c managed server running nothing may take up to 4 minutes to start. The virtual machines are all equipped with enough CPUs and memory.


Applies To 


Weblogic 12.1.x,12.2.x
RHEL X86,X86_64 Virtual Machines


Cause


Linux has two devices that provide random data at any time: /dev/random and /dev/urandom. Both should be secure enough for generating PGP keys, SSH challenges, and other applications where secure random numbers are required. Starting with kernel 2.6, the default entropy pool size is 4096 bits, and the problem arises when the entropy available on the system is very low (around 100 bits or less): reads from /dev/random block until more entropy is gathered, whereas /dev/urandom does not block, so a JVM that seeds its secure random generator from /dev/random stalls during startup.


How to verify if you are encountering this issue?


1. Check the default system entropy

$ cat /proc/sys/kernel/random/poolsize 
4096

2. Check the available entropy.

$ cat /proc/sys/kernel/random/entropy_avail 
160

3. In the example above, the entropy is too low.

Monitor the current entropy of the system by using the following command:


$ for i in $(seq 500); do cat /proc/sys/kernel/random/entropy_avail ; sleep 5; done

4. Start a WebLogic server instance. You should see the entropy decrease or stall (use the script from step 3).
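The monitoring loop from step 3 can be extended to print a timestamp and flag low readings, which makes it easier to line the numbers up with the server startup. This is a sketch only: the function names are hypothetical and the default threshold of 200 bits is an arbitrary illustrative value, not an Oracle recommendation.

```shell
# Classify an entropy reading ($1) against a threshold ($2, default 200 bits).
# The default threshold is an arbitrary illustrative value; tune it for your system.
entropy_status() {
    if [ "$1" -lt "${2:-200}" ]; then
        echo "LOW"
    else
        echo "OK"
    fi
}

# Sample the available entropy COUNT times ($1, default 500),
# INTERVAL seconds apart ($2, default 5), printing time, value and status.
monitor_entropy() {
    count=${1:-500}
    interval=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        avail=$(cat /proc/sys/kernel/random/entropy_avail)
        printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$avail" "$(entropy_status "$avail")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Usage: run `monitor_entropy 60 5` in one terminal while starting the server in another.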

Solution


1. Temporary Solution (Use for testing purpose)

Start the WLS server with the startup argument below.

-Djava.security.egd=file:/dev/./urandom

To do so, add it to the JAVA_OPTIONS environment variable before starting WebLogic Server via the shell scripts.

export JAVA_OPTIONS="${JAVA_OPTIONS} -Djava.security.egd=file:/dev/./urandom"

Start the Weblogic Server and note the timings!


2. Permanent Solution (Use if Step 1 works)

If the above solution works, it is time to set up the fix permanently in the environment. The fix is applied in the JAVA_HOME that the WebLogic server uses.


i.   Edit the Java Security Properties file ($JAVA_HOME/jre/lib/security/java.security)

ii.  The securerandom.source property specifies the source of seed data for secure random.

Change
securerandom.source=file:/dev/random

To

securerandom.source=file:/dev/urandom

iii.  Save changes and start the WebLogic Server instances.
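The edit in steps i to iii can also be scripted, which helps when several hosts need the same change. The sketch below is one possible way to do it with sed; the function name is hypothetical, and it keeps a backup copy of the original file with a .bak suffix next to it.

```shell
# Switch securerandom.source from /dev/random to /dev/urandom in the
# java.security file given as $1, keeping a backup copy as "$1.bak".
patch_securerandom() {
    sed -i.bak \
        's|^securerandom\.source=file:/dev/random$|securerandom.source=file:/dev/urandom|' \
        "$1"
}
```

Usage: `patch_securerandom "$JAVA_HOME/jre/lib/security/java.security"`, then confirm the result with `grep '^securerandom.source' "$JAVA_HOME/jre/lib/security/java.security"`.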

Observation


We could see the startup timings for the WebLogic server improve dramatically. For instance, the SOA managed server now took less than 3 minutes, as against 12 minutes before the fix was applied! The bare WebLogic managed servers took less than 20 seconds to start!



References


How to Diagnose a Linux Entropy Issue on WebLogic Server Instances (Doc ID 1574979.1)